How-To
Part 5 of our hands-on guide covers some R mysteries you'll need to understand.
By Sharon Machlis
Executive Editor, Data & Analytics, Computerworld

R syntax can seem a bit quirky, especially if your frame of reference is, well, pretty much any other programming language. Here are some unusual traits of the language you may find useful to understand as you embark on your journey to learn R.
[This story is part of Computerworld's "Beginner's guide to R." To read from the beginning, check out the introduction; there are links on that page to the other pieces in the series.]
Assigning values to variables
In most other programming languages I know, the equals sign assigns a certain value to a variable. You know, x = 3 means that x now holds the value of 3.
But in R, the primary assignment operator is <-, as in:
x <- 3
Not:
x = 3
To add to the potential confusion, the equals sign actually can be used as an assignment operator in R — most (but not all) of the time.
The best way for a beginner to deal with this is to use the preferred assignment operator <- and forget that equals is ever allowed. That's recommended by the tidyverse style guide (tidyverse is a group of extremely popular packages), which in turn is used by organizations like Google for its R style guide, and it's what you'll see in most R code.
If this isn't a good enough explanation for you and you really, really want to know the ins and outs of R's 5 (yes, count 'em, 5) assignment options, check out the R manual's Assignment Operators page.
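Here is a minimal sketch of three of those operators side by side, plus the == comparison that newcomers sometimes confuse with assignment. The variable names are just for illustration.

```r
x <- 3      # the preferred assignment operator
y = 3       # also works here, but is discouraged by the style guides
3 -> z      # rightward assignment, rarely seen in practice

all_same <- identical(x, y) && identical(y, z)
all_same    # TRUE

is_three <- x == 3   # == tests equality and assigns nothing to x
is_three    # TRUE
```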
You'll see the equals sign in a few places, though. One is when assigning default values to an argument in creating a function, such as
myfunction <- function(myarg1 = 10) {
# some R code here using myarg1
}
Another is within some functions, such as the dplyr package's mutate() function (creates or modifies columns in a data frame).
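To show the idea without installing anything, this sketch uses base R's transform() as a stand-in for mutate(); the data frame and column names are made up. Inside the call, the equals sign names the new column rather than creating a variable in your session.

```r
scores <- data.frame(name = c("Ann", "Bo"), score = c(10, 20))

# The = here names the new column, much as it would inside mutate()
scores <- transform(scores, double_score = score * 2)
scores$double_score            # 20 40

# No free-standing variable called double_score was created
made_variable <- exists("double_score")
made_variable                  # FALSE
```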
One more note about variables: R is a case-sensitive language. So, variable x is not the same as X. That applies to just about everything in R; for example, the function subset() would not be the same as Subset().
c is for combine (or concatenate, and sometimes convert/coerce.)
When you create an array in most programming languages, the syntax goes something like this:
myArray = array(1, 1, 2, 3, 5, 8);
Or:
int myArray[] = {1, 1, 2, 3, 5, 8};
Or maybe:
myArray = [1, 1, 2, 3, 5, 8]
In R, though, there's an extra piece: To put multiple values into a single variable, you use the c() function, such as:
my_vector <- c(1, 1, 2, 3, 5, 8)
If you forget that c(), you'll get an error. When you're starting out in R, you'll probably see errors relating to leaving out that c() a lot. (At least, I did.) It eventually does become something you don't think much about, though.
And now that I've stressed the importance of that c() function, I (reluctantly) will tell you that there's a case when you can leave it out: if you're referring to consecutive values in a range with a colon between minimum and maximum, like this:
my_vector <- (1:10)
You'll likely run into that style quite a bit in R tutorials and texts, and it can be confusing to see the c() required for some multiple values but not others. Note that it won't hurt anything to use the c() with a colon-separated range, though, even if it's not required, such as:
my_vector <- c(1:10)
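A quick check, sketched with throwaway variable names, suggests the two forms really do produce the same vector. It also shows one subtle difference between a colon range and numbers typed out inside c().

```r
a <- 1:10
b <- c(1:10)
same_vector <- identical(a, b)
same_vector          # TRUE

# A colon range holds integers; numbers typed inside c() are numeric
range_class <- class(1:3)        # "integer"
typed_class <- class(c(1, 2, 3)) # "numeric"
```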
One more important point about the c() function: It assumes that everything in your vector is of the same data type, that is, all numbers or all characters. If you create a vector such as:
my_vector <- c(1, 4, "hello", TRUE)
You will not have a vector with two numeric objects, one character object and one logical object. Instead, c() will do what it can to convert them all into the same object type, in this case all character objects. So my_vector will contain "1", "4", "hello" and "TRUE". You can also think of c() as standing for "convert" or "coerce."
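You can watch the coercion happen with class(). This sketch uses illustrative variable names; the second vector shows that without a string in the mix, a logical value is turned into a number instead.

```r
mixed <- c(1, 4, "hello", TRUE)
mixed_class <- class(mixed)
mixed_class   # "character"
mixed         # "1" "4" "hello" "TRUE"

# With only numbers and a logical, everything becomes numeric
nums <- c(1, 4, TRUE)
nums_class <- class(nums)
nums_class    # "numeric"
nums          # 1 4 1
```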
To create a collection with multiple object types, you need an R list, not a vector. You create a list with the list() function, not c(), such as:
my_list <- list(1, 4, "hello", TRUE)
Now, you've got a variable that holds the number 1, the number 4, the character object "hello" and the logical object TRUE.
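As a rough check, you can ask each list item for its class. The double brackets pull out a single item by position.

```r
my_list <- list(1, 4, "hello", TRUE)

item_classes <- sapply(my_list, class)
item_classes     # "numeric" "numeric" "character" "logical"

third_item <- my_list[[3]]
third_item       # "hello"
```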
Vector indexes in R start at 1, not 0
In most computer languages, the first item in a vector, list, or array is item 0. In R, it's item 1. my_vector[1] is the first item in my_vector. If you come from another language, this will be strange at first. But once you get used to it, you'll likely realize how incredibly convenient and intuitive it is, and wonder why more languages don't use this more human-friendly system. After all, people count things starting at 1, not 0!
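A short sketch of 1-based indexing, reusing the vector from earlier. Note that asking for item 0 does not throw an error; it quietly returns an empty vector.

```r
my_vector <- c(1, 1, 2, 3, 5, 8)

first_item <- my_vector[1]     # 1
last_item  <- my_vector[6]     # 8
middle     <- my_vector[2:4]   # 1 2 3
zeroth     <- my_vector[0]     # numeric(0), an empty vector
```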
Loopless loops
Iterating through a collection of data with loops like "for" and "while" is a cornerstone of many programming languages. That's not the R way, though. While R does have for, while, and repeat loops, you'll more likely see operations applied to a data collection using apply() functions or the purrr tidyverse package.
But first, some basics.
If you've got a vector of numbers such as:
my_vector <- c(7,9,23,5)
and, for example, you want to multiply each by 0.01 to turn them into percentages, how would you do that? You don't need a for, foreach, or while loop at all. Instead, you can create a new vector called my_pct_vector like this:
my_pct_vector <- my_vector * 0.01
Performing a mathematical operation on a vector variable will automatically loop through each item in the vector. Many R functions are already vectorized, but others aren't, and it's important to know the difference. if() is not vectorized, for example, but there's a version, ifelse(), that is.
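If you attempt to use a non-vectorized function on a vector, you'll see a message such as
the condition has length > 1
(That one comes from if(). It is an error in R 4.2.0 and later; older versions gave a warning instead, ending with "and only the first element will be used.")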
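Here is a small sketch of the difference, using the same numbers as above. The cutoff of 8 and the labels are arbitrary choices for illustration.

```r
my_vector <- c(7, 9, 23, 5)

# Arithmetic is vectorized: every item is multiplied
my_pct_vector <- my_vector * 0.01
my_pct_vector    # 0.07 0.09 0.23 0.05

# ifelse() tests every item and returns one answer per item
size <- ifelse(my_vector > 8, "big", "small")
size             # "small" "big" "big" "small"

# if() is fine when the condition is a single value
first_size <- if (my_vector[1] > 8) "big" else "small"
first_size       # "small"
```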
Typically in data analysis, though, you want to apply functions to more than one item in your data: finding the mean salary by job title, for example, or the standard deviation of property values by community. The apply() group of functions in base R and the functions in the tidyverse purrr package are designed for this. I learned R using the older plyr package for this, and while I like that package a lot, it's essentially been retired.
There are more than half a dozen functions in the apply family, depending on what type of data object is being acted upon and what sort of data object is returned. "These functions can sometimes be frustratingly difficult to get working exactly as you intended, especially for newcomers to R," says a blog post at Revolution Analytics, which focuses on enterprise-class R, in touting plyr over base R.
Plain old apply() runs a function on every row or every column of a 2-dimensional matrix or data frame where all columns are the same data type. You specify whether you're applying by rows or by columns by adding the argument 1 to apply by row or 2 to apply by column. For example:
apply(my_matrix, 1, median)
returns the median of every row in my_matrix and
apply(my_matrix, 2, median)
calculates the median of every column.
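To try this yourself, here is a sketch with a tiny made-up matrix. matrix() fills column by column, so the rows are 1, 3, 5 and 2, 4, 6.

```r
my_matrix <- matrix(1:6, nrow = 2)

row_medians <- apply(my_matrix, 1, median)
row_medians      # 3 4

col_medians <- apply(my_matrix, 2, median)
col_medians      # 1.5 3.5 5.5
```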
Other functions in the apply() family such as lapply() or tapply() deal with different input/output data types. Australian statistical bioinformatician Neal F.W. Saunders has a nice brief introduction to apply in R in a blog post if you'd like to find out more and see some examples.
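As a taste of those other family members, this sketch uses invented salary data. tapply() computes a summary for each group, lapply() returns a list, and sapply() simplifies the result to a vector where it can.

```r
salaries <- c(50000, 60000, 70000, 90000)
titles   <- c("analyst", "analyst", "engineer", "engineer")

# Mean salary by job title
mean_by_title <- tapply(salaries, titles, mean)
mean_by_title    # analyst 55000, engineer 80000

groups <- list(a = 1:3, b = 4:6)
sums_list   <- lapply(groups, sum)   # a list: $a is 6, $b is 15
sums_vector <- sapply(groups, sum)   # a named vector: a = 6, b = 15
```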
purrr is a bit beyond the scope of a basic beginner's guide. But if you'd like to learn more, head to the purrr website and/or Jenny Bryan's purrr tutorial site.
R data types in brief (very brief)
Should you learn about all of R's data types and how they behave right off the bat, as a beginner? If your goal is to be an R expert then, yes, you've got to know the ins and outs of data types. But my assumption is that you're here to try generating quick plots and stats before diving in to create complex code.
So this is what I'd suggest you keep in mind for now: R has multiple data types. Some of them are especially important when doing basic data work. And most functions require your data to be in a particular type and structure.
More specifically, R data types include integer, numeric, character and logical. Missing values are represented by NaN (if a mathematical function won't work properly) or NA (missing or unavailable).
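A brief sketch of how the two kinds of missing value behave. The vector x is invented for illustration; na.rm = TRUE is the usual way to tell a summary function to skip missing values.

```r
not_a_number <- 0 / 0
is.nan(not_a_number)     # TRUE
is.na(not_a_number)      # TRUE: NaN also counts as missing

x <- c(1, NA, 3)
mean_with_na    <- mean(x)                 # NA
mean_without_na <- mean(x, na.rm = TRUE)   # 2
```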
As mentioned in the prior section, you can have a vector with multiple items of the same type, such as:
1, 5, 7
or
"Bill", "Bob", "Sue"
A single number or character string is also a vector -- a vector of length 1. When you access the value of a variable that's got just one value, such as 73 or "Learn more about R at Computerworld.com," you'll also see this in your console before the value:
[1]
That's telling you that your screen printout is starting at vector item number one. If you've got a vector with lots of values so the printout runs across multiple lines, each line will start with a number in brackets, telling you which vector item number that particular line is starting with. (See the screen shot, below.)

As mentioned earlier, if you want to mix numbers and strings or numbers and TRUE/FALSE types, you need a list. (If you don't create a list, you may be unpleasantly surprised that your variable containing (3, 8, "small") was turned into a vector of characters ("3", "8", "small").)
And by the way, R assumes that 3 is the same class as 3.0: numeric (i.e., with a decimal point). If you want the integer 3, you need to signify it as 3L or with the as.integer() function. In a situation where this matters to you, you can check what type of number you've got by using the class() function:
class(3)
class(3.0)
class(3L)
class(as.integer(3))
There are several as.*() functions for converting one data type to another, including as.character(), as.list() and as.data.frame().
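A few conversions, sketched with arbitrary values. Note that as.integer() drops the decimal part rather than rounding.

```r
as_text    <- as.character(42)    # "42"
as_number  <- as.numeric("3.14")  # 3.14
as_whole   <- as.integer(3.9)     # 3, not 4
as_a_list  <- as.list(1:3)        # a list of three integers

class(as_text)     # "character"
class(as_a_list)   # "list"
```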
R also has special data types that are of particular interest when analyzing data, such as matrices and data frames. A matrix has rows and columns; you can find a matrix dimension with dim() such as
dim(my_matrix)
A matrix needs to have all the same data type in every column, such as numbers everywhere.
Data frames are much more commonly used. They're similar to matrices except one column can have a different data type from another column, and each column must have a name. If you've got data in a format that might work well as a database table (or well-formed spreadsheet table), it will also probably work well as an R data frame.
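Here is a minimal sketch of a data frame with made-up employee data. stringsAsFactors = FALSE is spelled out so the text columns stay as characters in older versions of R too.

```r
employees <- data.frame(
  name       = c("Ann", "Bo", "Cy"),
  department = c("Sales", "IT", "IT"),
  salary     = c(52000, 61000, 58000),
  stringsAsFactors = FALSE
)

frame_size <- dim(employees)
frame_size        # 3 3 (rows, then columns)

column_classes <- sapply(employees, class)
column_classes    # name and department are "character", salary is "numeric"

employees$salary  # one column, pulled out as a vector
```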
Unlike in Python, where this two-dimensional data type requires an add-on package (pandas), data frames are built into R. There are packages that extend the basic capabilities of R data frames, though. One, the tibble tidyverse package, creates basic data frames with some extra features. Another, data.table, is designed for blazing speed when handling large data sets. It adds a lot of functionality right within the brackets of the data table object:
mydt[code to filter rows, code to create new columns, code to group data]
A lot of data.table will feel familiar to you if you know SQL. For more on data.table, check out the package website or this intro video:
When working with a basic data frame, you can think of each row as similar to a database record and each column like a database field. There are lots of useful functions you can apply to data frames, such as base R's summary() and the dplyr package's glimpse().
Back to base R quirks: There are several ways to find an object's underlying data type, but not all of them return the same value. For example, class() returns data.frame on a data frame object, and str() reports 'data.frame' at the top of its output, but mode() returns the more generic list.
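You can see the disagreement for yourself with a one-column data frame. typeof() is added here as a third point of comparison.

```r
df <- data.frame(x = 1:3)

df_class  <- class(df)    # "data.frame"
df_mode   <- mode(df)     # "list"
df_typeof <- typeof(df)   # "list"

# The same functions disagree about a plain integer vector, too
int_class  <- class(1:3)   # "integer"
int_mode   <- mode(1:3)    # "numeric"
```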
If you'd like to learn more details about data types in R, you can watch this video lecture by Roger Peng, associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health:
One more useful concept to wrap up this section (hang in there, we're almost done): factors. These represent categories in your data. So, if you've got a data frame with employees, their department and their salaries, salaries would be numerical data and employees would be characters (strings in many other languages); but you might want department to be a factor, a category you may want to group or model your data by. Factors can be unordered, such as department, or ordered, such as "poor," "fair," "good," and "excellent."
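A sketch of both kinds of factor, with invented values. For the ordered factor, the levels argument sets the ranking, which is what makes comparisons like "greater than" meaningful.

```r
# Unordered factor: levels default to sorted unique values
department <- factor(c("Sales", "IT", "IT"))
dept_levels <- levels(department)
dept_levels        # "IT" "Sales"
table(department)  # IT: 2, Sales: 1

# Ordered factor: the levels are ranked from lowest to highest
rating <- factor(c("good", "poor", "excellent"),
                 levels = c("poor", "fair", "good", "excellent"),
                 ordered = TRUE)
good_beats_poor <- rating[1] > rating[2]
good_beats_poor    # TRUE
```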
R command line differs from the Unix shell
When you start working in the R environment, it looks quite similar to a Unix shell. In fact, some R command-line actions behave as you'd expect if you come from a Unix environment, but others don't.
Want to cycle through your last few commands? The up arrow works in R just as it does in Unix -- keep hitting it to see prior commands.
The list function, ls(), will give you a list, but not of files as in Unix. Rather, it will provide a list of objects in your current R session.
Want to see your current working directory? pwd, which you'd use in Unix, just throws an error; what you want is getwd().
rm(my_variable) will delete a variable from your current session.
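Put together, a short session might look something like this sketch. The variable name is arbitrary, and the directory that getwd() reports will depend on your own setup.

```r
my_variable <- 42

# ls() lists objects in the session, not files
listed_before <- "my_variable" %in% ls()
listed_before      # TRUE

current_dir <- getwd()   # a character string with the working directory

rm(my_variable)
still_there <- exists("my_variable")
still_there        # FALSE
```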