Assignment Operator In R

We share our opinion that “=” should be preferred to the more standard “<-” for assignment in R. This is from a draft of the appendix of our upcoming book. This risks becoming an R version of JavaScript’s semicolon controversy, but here you have it.

R has five common assignment operators: “<-”, “=”, “<<-”, “->” and “->>”. Traditionally in R, “<-” is the preferred assignment operator and “=” is thought of as an amateurish alias for it.
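A minimal sketch of the five operators side by side. The variable names and the helper function set_outer() are illustrative, not from the original:

```r
a <- 1        # leftward assignment
b = 2         # leftward assignment with "=" (at top level)
3 -> d        # rightward assignment: value on the left, name on the right

# Hypothetical helper: "<<-" and "->>" assign in an enclosing environment
# (here the global one) instead of the function's own local environment.
set_outer <- function() {
  e1 <<- 4    # leftward superassignment
  5 ->> e2    # rightward superassignment
  invisible(NULL)
}
set_outer()

c(a, b, d, e1, e2)   # 1 2 3 4 5
```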

The “<-” notation is preferred by some for the very good reason that “<-” always means assignment, whereas “=” can mean assignment, function argument binding or a case statement (as in switch()), depending on context. However, in our opinion, R lets you type “<-” in too many places (such as inside expressions), and typing “=” when you meant “<-” usually produces an easier-to-find bug than the other way around.
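A short sketch of the three contexts in which “=” appears, plus the kind of slip that R does report. The values and variable names are illustrative assumptions:

```r
# 1. "=" as assignment (at top level).
x = 10

# 2. "=" as argument binding: it names round()'s digits parameter
#    and does not create a variable called digits.
r = round(3.14159, digits = 2)
digits_created = exists("digits", inherits = FALSE)   # FALSE

# 3. "=" as a case label inside switch().
size = switch("b", a = "small", b = "medium", "other")   # "medium"

# Typing "=" where "<-" was meant inside an if() condition is rejected
# by the parser, so this slip is reported rather than silent.
parse_result = tryCatch(parse(text = "if (x = 5) print(x)"),
                        error = function(e) e)
inherits(parse_result, "error")   # TRUE
```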

We prefer to get into the habit of never typing “<-”, because accidentally typing “<-” instead of “=” in a function call can cause an unreported error. Consider the following code fragment demonstrating how we can use “=” to bind values to function arguments:

```r
> divide = function(numerator,denominator) { numerator/denominator }
> divide(1,2)
[1] 0.5
> divide(2,1)
[1] 2
> divide(denominator=2,numerator=1)
[1] 0.5
```

Now consider the following (deliberate) error, where out of habit we typed “<-” instead of “=”:

```r
> divide(denominator<-2,numerator<-1)
[1] 2
> denominator
[1] 2
```

We quietly get the wrong answer (2 instead of 0.5, because the arguments were matched by position rather than by name) and contaminate the values of numerator and denominator in the global namespace. This is a simple example of where typing “<-” where “=” was intended causes a non-signaling bug. We don’t know of any simple example (other than building examples that intend side effects) where typing “=” where you meant “<-” causes a similarly silent error. So we prefer “=”.
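To see exactly which variables each call leaves behind, the sketch below evaluates both calls in a fresh scratch environment instead of the global one. The helper run_in_scratch() is hypothetical, introduced only for this illustration:

```r
divide <- function(numerator, denominator) { numerator / denominator }

# Hypothetical helper: evaluate an unevaluated call in a brand-new
# environment and report its value plus any variables it created there.
run_in_scratch <- function(expr) {
  scratch <- new.env()
  value <- eval(expr, envir = scratch)
  list(value = value, created = sort(ls(scratch)))
}

ok  <- run_in_scratch(quote(divide(denominator = 2, numerator = 1)))
bad <- run_in_scratch(quote(divide(denominator <- 2, numerator <- 1)))

ok$value      # 0.5
ok$created    # character(0): nothing leaked
bad$value     # 2: arguments were matched by position
bad$created   # "denominator" "numerator"
```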

The “->” operator is just a left-to-right assignment that lets you write things like “5 -> x”. It is cute, but not game changing. The “<<-” and “->>” operators are to be avoided unless you actually need their special abilities. They undo one of the important safety points about functions. When a variable is assigned inside a function, this assignment is local to the function: nobody outside of the function ever sees the effect, so the function can safely use variables to store intermediate calculations without clobbering same-named outside variables. The “<<-” and “->>” operators are the ones that reach outside of this protected scope and cause outside side effects. Side effects seem great when you need them, but on balance they make code maintenance, debugging and documentation much harder.
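A minimal sketch of the difference between local assignment and “<<-”. The counter variable total and the functions add_local() and add_global() are hypothetical:

```r
5 -> x   # "->" is ordinary assignment written left to right; x is now 5

total <- 0

# Ordinary assignment inside a function only touches a local copy.
add_local <- function(v) {
  total <- total + v   # creates a local 'total'; the outer one is untouched
  total
}
local_result <- add_local(3)   # 3
after_local <- total           # still 0

# "<<-" reaches outside the function and modifies the outer 'total'.
add_global <- function(v) {
  total <<- total + v
  invisible(total)
}
add_global(3)
after_global <- total          # now 3
```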


Edit 5/23/2016: Frankly, I still believe there is good evidence that the currently observed semantics of R, as it is implemented, make “=” the better choice. The “<-” operator currently does not preserve differences that really improve readability or code safety. Obviously it may have been different in the past (before some language changes, and when interoperating with S-PLUS was critical). However, we no longer encourage students to use “=”, as we can’t in good conscience risk wasting students’ time on dealing with the inevitable bickering and backlash. I have enough experience to try to make my case (right or wrong), but a newer R user will not be able to respond to a number of the standard criticisms.


For R beginners, the first operator they use is probably the assignment operator. Google’s R Style Guide suggests using “<-” rather than “=”, even though the equal sign is also allowed in R and does exactly the same thing when we assign a value to a variable. However, this may feel inconvenient, because you need to type two characters to represent one symbol, which is different from many other programming languages.

As a result, many users ask: why should we use “<-” as the assignment operator?

Here I provide a simple explanation of the subtle difference between “<-” and “=” in R.

First, let’s look at an example.
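A minimal sketch of such an example, assuming simulated data and R’s lm() as the function that takes a formula (the variable names are illustrative):

```r
x <- rnorm(100)             # "<-" assigns a vector to x
y <- 2 * x + rnorm(100)     # "<-" assigns a vector to y
lm(formula = y ~ x)         # "=" binds y ~ x to lm()'s formula parameter
```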

The above code uses both “<-” and “=”, but they do different work. The “<-” in the first two lines is used as the assignment operator, while the “=” in the third line is not an assignment operator; it specifies a named parameter of the function.

In other words, “<-” evaluates the expression on its right side and assigns the evaluated value to the symbol (variable) on its left side in the current environment, whereas “=” in a function call evaluates the expression on its right side and binds the evaluated value to the function parameter whose name appears on its left side.

We know that “<-” and “=” are equivalent when they are used as top-level assignment operators.

Therefore, the above code is equivalent to the following code:
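A sketch of the version that uses only “=”, assuming simulated data and lm() as the formula-taking function:

```r
x = rnorm(100)              # "=" as assignment
y = 2 * x + rnorm(100)      # "=" as assignment
lm(formula = y ~ x)         # "=" as named-parameter specifier
```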

Here, we use only “=”, but for two different purposes: in the first and second lines we use “=” as the assignment operator, and in the third line we use it as a named-parameter specifier.

Now let’s see what happens if we change all “=” symbols to “<-”.
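A sketch of the all-“<-” version, assuming simulated data and lm(). To make the side effect easy to see, the three lines run inside a fresh environment (the name scratch is illustrative) whose contents are listed afterwards:

```r
scratch <- new.env()
fit <- evalq({
  x <- rnorm(100)
  y <- 2 * x + rnorm(100)
  lm(formula <- y ~ x)      # "<-" creates a variable, whose value is then passed by position
}, scratch)

ls(scratch)                 # "formula" "x" "y"
deparse(scratch$formula)    # "y ~ x"
```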

If you run this code, you will find that the output is similar. But if you inspect the environment, you will notice a difference: a new variable has been defined in the environment, holding the formula that was passed to the function. So what happened?

Actually, two things happened in the third line. First, we introduced a new symbol (variable) into the environment and assigned it a formula-typed value. Then, that value was passed by position to the first parameter of the function rather than, strictly speaking, to the parameter whose name we typed, although this time the two happen to be the same parameter.
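With lm() the positional and the named parameter coincide, so the mix-up only leaves a stray variable behind. A sketch with a hypothetical function subtract(), whose two parameters are not interchangeable, shows when it changes the result:

```r
subtract <- function(a, b) a - b    # hypothetical example function

by_name <- subtract(b = 1, a = 5)   # 4: arguments are matched by name

scratch <- new.env()                # keeps the side-effect variables out of the global environment
by_position <- evalq(subtract(b <- 1, a <- 5), scratch)   # -4: a gets 1, b gets 5
ls(scratch)                         # "a" "b"
```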

To test it, we conduct an experiment. This time we first prepare the data.

Basically, we do similar things as before, except that we store all vectors in a data frame and remove those numeric vectors from the environment. We know that the function accepts a data frame as its data source when a formula is specified.

Standard usage:

Working alternative where two named parameters are reordered:

Working alternative with the side effect that two new variables are defined:

Nonworking example:
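A sketch of the four calls above, assuming lm() and simulated data stored only in a data frame (the object names df, scratch and fit_* are illustrative). The calls run in a scratch environment so that any variables they create can be listed, and the nonworking call is wrapped in tryCatch() so the error is captured rather than stopping the script:

```r
# Illustrative data: the numeric vectors exist only as columns of df.
df <- data.frame(x = rnorm(100))
df$y <- 2 * df$x + rnorm(100)

scratch <- new.env()   # collects any variables the calls create
scratch$df <- df

evalq({
  fit_standard  <- lm(formula = y ~ x, data = df)    # standard usage
  fit_reordered <- lm(data = df, formula = y ~ x)    # named arguments reordered
  fit_side      <- lm(formula <- y ~ x, data <- df)  # works, but defines formula and data
}, scratch)

# Nonworking: a data frame lands in the formula slot and a formula in the data slot.
failure <- tryCatch(
  evalq(lm(data <- df, formula <- y ~ x), scratch),
  error = function(e) e
)
inherits(failure, "error")   # TRUE
```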

The reason is exactly what I mentioned previously. In the nonworking call, we assign the data frame to a new variable and pass its value, by position, as the first argument, which only accepts a formula-typed value. We also assign the formula to a new variable and pass it, by position, as the second argument, which only accepts a data frame-typed value. Both arguments have the wrong type, so the call fails with an error.
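To see the type mismatch directly, here is a sketch with a hypothetical stand-in function show_arg_classes(), which has the same first two parameter names as lm() and simply reports the class of what it receives:

```r
# Hypothetical stand-in with the same first two parameter names as lm().
show_arg_classes <- function(formula, data) {
  c(formula = class(formula)[1], data = class(data)[1])
}

scratch <- new.env()
scratch$df <- data.frame(x = 1:3, y = c(2, 4, 6))

received <- evalq(show_arg_classes(data <- df, formula <- y ~ x), scratch)
received   # formula: "data.frame", data: "formula"
```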

From the above examples and experiments, the bottom line becomes clear: to reduce ambiguity, we should use either “<-” or “=” as the assignment operator, and use “=” only as the named-parameter specifier for functions.

In conclusion, for better readability of R code, I suggest that we use only “<-” for assignment and “=” for specifying named parameters.
