Functions in R are just like what you
remember from math class. Most function calls have the following form:
f(argument1, argument2, ...) where f is the name of the function, and
argument1, argument2, ... are the arguments
to the function. Here are a few examples of built-in functions:
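As an illustration, a few built-in calls might look like the sketch below. The particular functions (exp, round, log) are only example choices; the last two lines show the same call with and without argument names.

```r
# Illustrative built-in calls; the functions chosen are only examples.
exp(1)                       # e raised to the power 1
round(3.14159, digits = 2)   # round to two decimal places
log(x = 8, base = 2)         # arguments passed by name
log(8, 2)                    # same call, names omitted (default order)
```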
Note in the last example that if you
give the arguments in their default order, you can omit the names. Some
built-in functions also have an operator form, as in the following examples:
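A small sketch of the idea: an operator such as + or [ is an ordinary function underneath, so it can be called by its backquoted name. The values used here are arbitrary.

```r
17 + 2        # operator form
`+`(17, 2)    # the same addition written as an ordinary function call

x <- c(10, 20, 30)
x[2]          # operator form of indexing
`[`(x, 2)     # equivalent function call
```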
A function in R is just another object
that is assigned to a symbol. You can define your own functions in R,
assign them a name, and then call them just like the built-in functions.
Writing function code directly in the R Console is awkward, so R provides a
simple text editor for that. To write your code, go to File >> New
Script. This opens the R Editor, where you can enter the following code:
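A minimal function to type into the editor could look like this. The name say_hello and the text it returns are made up for illustration; the only property that matters is that it takes no arguments.

```r
# say_hello is a hypothetical name used only for illustration.
# It takes no arguments and returns a fixed string.
say_hello <- function() {
  "Hello from my first function"
}
```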
You can select the code, copy it, and
paste it into the R console. Now you can call the function to get its result.
(Note that entering the function name without parentheses and hitting Enter
prints the function's code. This is a useful trick for viewing a function's
code before using it.)
Now let's go back to the editor and
save the code in our working directory as MyCode.R (.R is the extension
for R code files). To load the code from any R code file into the
console, call source() and pass it the file name. Then you
can use the code inside that file in the console. Each time you edit
that code file, you have to call source() again to load the latest
code.
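The round trip can be sketched as follows. To keep the example self-contained it writes the script to a temporary directory instead of the working directory, and the function inside the file (say_hello) is a made-up placeholder.

```r
# Sketch: create a small script file, then load it with source().
# In practice the file would be MyCode.R in your working directory.
script_path <- file.path(tempdir(), "MyCode.R")
writeLines("say_hello <- function() { \"Hello from MyCode.R\" }", script_path)

source(script_path)   # runs the file, defining say_hello in the workspace
say_hello()

# After editing the file, source() must be called again to pick up the change.
writeLines("say_hello <- function() { \"Hello again\" }", script_path)
source(script_path)
say_hello()
```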
A function definition in R includes
the arguments’ names (in the previous example we didn’t use any
arguments).
Optionally, you can include default
values for arguments. If you specify a default value for an argument,
it is considered optional (it can be omitted from the function call).
If you provide a value for an argument that has a default, your value
overrides the default one. Arguments without a default have to be
provided in the function call if the function uses them.
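For instance, a function with one required and one optional argument might look like this. The function power and its argument names are hypothetical.

```r
# power is a hypothetical function; exponent is optional because it has a default.
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                 # exponent omitted, the default 2 is used: 9
power(3, exponent = 3)   # default overridden: 27
```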
If you want to specify a
variable-length argument list, include an ellipsis (...) in the arguments to the
function. Everything other than the named arguments will be captured by
the ellipsis. You can then convert the ellipsis to a list to work with
it.
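A short sketch of capturing the ellipsis as a list; count_extras and its label argument are made-up names for illustration.

```r
# count_extras is hypothetical: it reports how many extra arguments it received.
count_extras <- function(label, ...) {
  extras <- list(...)            # capture everything matched by the ellipsis
  paste(label, length(extras))
}

count_extras("extras:", 1, "a", TRUE)   # "extras: 3"
```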
You can also refer directly to items
within the ellipsis using the variables ..1 for the first item, ..2 for
the second, and so on. Any argument that appears after the ellipsis in
the function definition has to be named explicitly in the function call.
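Both points can be seen in one small, hypothetical function: first_two reads the first two items of the ellipsis directly, and its sep argument, which comes after the ellipsis, is only matched when named.

```r
# first_two is a made-up example; sep is declared after the ellipsis.
first_two <- function(..., sep = "-") {
  paste(..1, ..2, sep = sep)     # ..1 and ..2 are the first two extra arguments
}

first_two("a", "b")              # "a-b"
first_two("a", "b", sep = "+")   # "a+b"
first_two("a", "b", "+")         # "+" is absorbed by the ellipsis: still "a-b"
```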
To get the set of arguments
accepted by a function, use the args function. The NULL printed at the end
stands in for the function body.
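For example, inspecting the built-in sapply (chosen here only as a convenient example) looks like this:

```r
args(sapply)
# function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
# NULL
```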
You can pass named arguments anywhere
in the function call by their name. Unnamed arguments are then matched, in
order, to the remaining arguments listed in the function definition. The following
lm() function calls are equivalent:
lm(data = mydata, y ~ x, model = FALSE, 1:100)
lm(y ~ x, mydata, 1:100, model = FALSE)
Named arguments are helpful when a function has
a long argument list and you remember the arguments by name rather than by
position.
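To check the equivalence of the two lm() calls, here is a runnable sketch. The data frame mydata is simulated purely for illustration; only the matching of arguments matters.

```r
# mydata is simulated here; its contents are an assumption for the sketch.
set.seed(1)
mydata <- data.frame(x = rnorm(200))
mydata$y <- 2 * mydata$x + rnorm(200)

# Named arguments are matched first; the unnamed ones then fill
# formula and subset in order.
fit1 <- lm(data = mydata, y ~ x, model = FALSE, 1:100)
fit2 <- lm(y ~ x, mydata, 1:100, model = FALSE)

all.equal(coef(fit1), coef(fit2))   # TRUE
```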
Arguments to functions are evaluated lazily, so they are evaluated
only as needed. The function below never uses the argument b, so calling
f(2) will not produce an error because the 2 gets positionally matched
to a (the only variable needed).
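A function of that shape could be written as follows (a minimal sketch):

```r
f <- function(a, b) {
  a^2      # b is never touched, so it is never evaluated
}

f(2)       # 4
```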
Even if the function does use a missing argument, R will not raise an error until
the first use of that argument. Everything before that point
executes normally.
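The sketch below shows this: the first print() runs, and the error only appears when b is touched. tryCatch() is used here just so the example keeps running and the message can be inspected.

```r
f <- function(a, b) {
  print(a)   # works: a was supplied
  print(b)   # fails here: b is missing and has no default
}

# Capture the error message instead of stopping the script.
result <- tryCatch(f(45), error = function(e) conditionMessage(e))
result
```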
You can use the return function to specify the value to be returned by
the function. If no return() call is reached, R returns the value of the
last evaluated expression as the result of the function.
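Both styles can appear in the same function. In this hypothetical sign_label, one branch returns explicitly and the other relies on the last evaluated expression.

```r
# sign_label is a made-up example function.
sign_label <- function(x) {
  if (x < 0) {
    return("negative")   # explicit early return
  }
  "non-negative"         # last evaluated expression is returned implicitly
}

sign_label(-5)   # "negative"
sign_label(3)    # "non-negative"
```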
Many functions in R can take other functions as arguments. One example is
the sapply function, which iterates through each element in a
vector, applies another function to each element, and
returns the results.
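A one-line sketch, passing the built-in sqrt to sapply over an arbitrary vector:

```r
sapply(c(1, 4, 9, 16), sqrt)   # 1 2 3 4
```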
You can also create functions that do not have names. These are called anonymous
functions. Anonymous functions are usually passed as arguments to other
functions.
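A sketch of this pattern follows. The body of apply.to.three is an assumption here: it simply calls whatever function it receives with the value 3.

```r
# Assumed definition: apply.to.three calls its argument f with the value 3.
apply.to.three <- function(f) {
  f(3)
}

# The anonymous function is created inline and never given a name.
apply.to.three(function(x) {x * 7})   # 21
```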
In a call such as apply.to.three(function(x) {x * 7}), the R interpreter
assigns the anonymous function function(x) {x * 7}
to the argument f of the function apply.to.three, then assigns 3 to the
argument x of the anonymous function. So it ends up evaluating
3 * 7 and returns the result.
Anonymous functions can also be used with sapply().
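For example, adding one to each element of a vector (the vector a is arbitrary):

```r
a <- c(1, 2, 3, 4, 5)
sapply(a, function(x) {x + 1})   # 2 3 4 5 6
```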
It is also possible to define an anonymous function and apply it
directly to an argument.
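The definition is wrapped in parentheses and then called like any other function:

```r
(function(x) {x + 1})(1)   # 2
```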
How does R know which value to assign to which symbol? For example, if you
define your own function named lm, how does R know what value to give the
symbol lm? Why doesn't it give it the value of lm that is in the stats package?
When R tries to bind a value to a symbol, it searches through a series
of environments (sets of symbols, objects, ...) to find the appropriate
value. When you are working on the command line and need to retrieve the
value of an R object, the search begins in the global environment, where R
looks for a symbol name matching the one requested. If it is
not found there, R searches the namespaces of each of the packages on
the search list. You can get the search list using the search() function.
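A quick way to inspect the list is shown below; no output is given because the result depends on which packages are attached in your session.

```r
# The contents vary from session to session.
search()

# The first and last entries can be checked directly.
search()[1]                   # the global environment
search()[length(search())]    # the base package
```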
.GlobalEnv represents your current working environment on the R
command line, and it is always the first element of the search list. The
base package is always the last one. The order of the list matters.
If you load a package with library(), the namespace of that package is
placed in the 2nd position of the search list, and everything else is
shifted down the list.
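A sketch of the effect, using the splines package only because it ships with R and is not attached by default; any other package would behave the same way.

```r
# Assumes splines is not already attached in this session.
library(splines)

after <- search()
after[1:2]   # ".GlobalEnv" "package:splines"
```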
You can also load a package from the console window's menu by going to
Packages >> Load package, then selecting the desired package
and clicking OK.
You can configure which packages are loaded automatically on startup
so that they are always available to you. To do that, open C:\Program
Files\R\<Your-R-Version>\etc\Rprofile.site using Notepad and
append the following to the bottom of the file. You can add whatever
packages you want to the vector c(...) and they will be loaded for you on
startup.
local({
  old <- getOption("defaultPackages")
  options(defaultPackages = c(old, "car", "RODBC", "foreign", "DAAG",
                              "MASS", "lattice", "latticedl", "sciplot",
                              "tree", "lme4"))
})
Lexical scoping rules (or static scoping rules) determine how a
value is associated with a free variable in a function. The values of free
variables are searched for in the environment in which the function was
defined.
So what is a free variable? A free variable is neither a formal argument
(an argument declared in the function signature) nor a local variable that is
declared and assigned in the function body. In the following example, x
and y are formal arguments, and z is a free variable.
f <- function(x, y) { x^2 + y / z }
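To see the free variable being resolved, z can be defined in the environment where f was defined. The value 2 is arbitrary.

```r
f <- function(x, y) { x^2 + y / z }

z <- 2      # z lives in the environment where f was defined
f(2, 4)     # 2^2 + 4 / 2 = 6
```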
So what is an environment? An environment is a collection of (symbol,
value) pairs. Every environment has a parent environment, and it is
possible for an environment to have multiple children. A function together
with its environment is called a closure, or function closure.
So the search for the value of a free variable starts in the
environment in which the function was defined. If it is not found there, the search
continues in the parent environment, and so on until it reaches
the top-level environment (the workspace or the namespace of a package).
After that the search continues down the search list until it reaches the
empty environment. If the value is still not found, an error is thrown.
You can get the environment of a function using environment() (for
functions written on the command line, that will be the global
environment). You can get the parent of an environment using
parent.env() (for the global environment, it will be the second item
in the search list).
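Both calls can be tried on a throwaway function; g is a placeholder name, and as.environment(2) is used here to fetch the second entry of the search list for comparison.

```r
g <- function() NULL

environment(g)             # the environment in which g was defined
parent.env(globalenv())    # the parent of the global environment

# The parent of the global environment is the second entry on the search list.
identical(parent.env(globalenv()), as.environment(2))   # TRUE
```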
Why does knowing the lexical scoping rules matter? Typically, a
function is defined in the global environment, so the values of
free variables are found in the user's workspace (which is usually what you
expect). However, in R you can define functions inside other
functions. In that case the environment in which the function is defined
is the body of another function.
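A common way to illustrate this is a function that builds and returns another function. In the sketch below (the names make.power, pow, cube and square are illustrative), n is a free variable inside pow, and its value comes from the call to make.power that created it.

```r
make.power <- function(n) {
  pow <- function(x) {
    x^n          # n is a free variable inside pow
  }
  pow            # return the inner function
}

cube <- make.power(3)
square <- make.power(2)

cube(3)     # 27
square(3)   # 9

# Each returned function carries the environment it was defined in.
ls(environment(cube))          # "n"   "pow"
get("n", environment(cube))    # 3
```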
In this post we talked about functions and how to use them, whether from the
console or from external files, functions as parameters, anonymous
functions, and a good deal of other low-level stuff such as scoping.
Stay tuned for more R notes.