Ayushi Gupta — February 4, 2022

Overview

  • Motivation to Learn R
  • Covering the BASICS & MUST KNOW Concepts in R

Introduction

Since you are reading this article, I assume you are on your journey to becoming a data scientist. You are probably already familiar with some data visualization and analytics tools like Excel, SQL and Tableau, and you might have heard the name ‘Python’. In this article, we bring out another fruit from the data scientist’s basket and introduce R. But you might be thinking: ‘What is R?’, ‘Why R?’, ‘Is it not just an alternative to Python?’

Don’t worry! Before moving on to the wide range of topics we are going to cover in this article, we will begin with a very basic question: ‘What is R?’

Table of Contents

  • What is R?
  • What can R do?
  • Code Execution in R
  • Syntax Rules in R
  • Operators in R
  • Data Types in R
  • Type Casting (Data Types)
  • Libraries in R
  • Data Structures in R
  • Vector: How to create a Vector in R?
  • Data Frame: How to create a Data Frame in R?
  • Importing Data Frames in R
  • Functions in R
  • How to define functions in R : User Defined Functions

 

What is R?

Such an easy question! It is a programming language.

No! R is not just a programming language; it is a statistical programming language.

R is not a typical programming language, and this is what sets it apart from Python. Python is a general-purpose programming language that was not designed specifically for data manipulation, statistical analysis or solving statistical problems, whereas R was built for exactly these tasks.

Essentially , R is:

  1. A Statistical Language: It is popularly said that ‘R is a language made by statisticians for statisticians’.

  2. A Programming Language: Like every other programming language, including Python, we write code to derive the desired results and accomplish our tasks.

  3. An Object-Based Language: In R, everything we create is saved as an object, and all operations are performed on those objects.

  4. A Dynamically Typed Language: Unlike languages such as C and C++, R does not require us to declare the type of the objects we create. It works out the data type automatically from the value assigned.

  5. An Open Source Language: It is available to the public and completely free to use.

  6. A Modular Language: Various libraries (more precisely called packages) are available that contain pre-written code, which we can use for many purposes and to expand the capabilities of the language.
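Points 3 and 4 above can be seen in a minimal sketch (the object name value is just an illustrative choice): the same name can hold a number and later a piece of text, and R works out the type each time without any declaration.

```r
# No type declaration needed: R infers the type from the assigned value
value <- 42
class(value)   # [1] "numeric"

# Reassigning the same name to text simply replaces the object
value <- "forty-two"
class(value)   # [1] "character"

# Even functions are objects that can be stored and inspected
class(mean)    # [1] "function"
```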

What can R do?

R can handle a wide range of tasks. A few of them are:

  1. Descriptive Statistics
  2. Inferential Statistics
  3. Statistical Modeling
  4. Data Visualization
  5. Data Manipulation
  6. R Programming
  7. Machine Learning Modeling
  8. Deep Learning Modeling

If you have found it interesting so far, the next step in learning R is installing the software. Download R from CRAN (the Comprehensive R Archive Network) and then download RStudio. Alternatively, you can install RStudio from Anaconda as well.

Get familiar with the RStudio interface, mainly the Script, Console, Environment, History, Plots, Packages and Help panes.

We will be using an R script for this article. To open an R script, follow these steps:

RStudio → File → New File → R Script

Remember: We will write the commands/code in the R script and look for the output in the Console. (There are reasons for this, trust me!)

You are good to go now! Let’s begin!

Code Execution

In RStudio, we can execute commands in multiple ways. To execute:

  1. A single line of code: Put the cursor on that line and click Run.
  2. Multiple lines of code: Select the lines you want to execute and click Run.
  3. An entire R file: There are 2 ways to do this:
    1. Click Source: This executes the whole R file but shows only the output you explicitly print (for example with print() or cat()).
    2. Click Source with Echo: This executes the whole R file and prints each command along with its output.

[Screenshot: the Run, Source and Source with Echo icons in RStudio]

Shortcut for the Run button : Ctrl + Enter

Shortcut for the Source button : Ctrl + Shift + S

Shortcut for the Source with Echo button : Ctrl + Shift + Enter
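Behind the scenes, the Source and Source with Echo buttons call R’s source() function. As a small sketch (the script contents and the file name demo.R are made up for illustration), we can write a three-line script to a temporary file and source it both ways:

```r
# Write a tiny hypothetical script to a temporary file
script <- file.path(tempdir(), "demo.R")
writeLines(c("x <- 2 + 3", "x", "print(x * 10)"), script)

# Like the Source button: only the explicit print() produces output
source(script)               # [1] 50

# Like Source with Echo: each command is shown, and the bare `x` is printed too
source(script, echo = TRUE)
```

Notice that the second line of the script, a bare x, prints nothing with plain source() but shows [1] 5 when echo = TRUE.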

Syntax Rules in R

Since the code we write in any programming language consists of logic, functions and syntax, it is necessary to learn the R syntax.

Point 1: R is a case-sensitive language. Both user-defined and predefined objects must be written exactly as they are defined; for example, sum and Sum are two different names.

Point 2: Since the point above is about how object names must be written, it is also worth listing the rules we need to follow when creating our own (user-defined) objects. These are known as the Object Naming Rules, and they state that an object name:

  1. Can contain only letters

  2. Can contain both letters and numbers

  3. Cannot contain only numbers

  4. Cannot start with a number

  5. Cannot contain any special characters except . and _ (a name cannot start with _, and if it starts with . the next character cannot be a number)

  6. Cannot contain any spaces

  7. Should not coincide with any other object name, whether predefined or user-defined

Breaking rules 3 to 6 leads to an error. Rule 7 is partly a convention: reserved words such as if, else, TRUE and function cannot be used as names at all, while reusing an existing name (for example mean) is allowed but should be avoided because it makes the code confusing.
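One way to check these rules in practice is base R’s make.names(), which turns any string into a syntactically valid name; if the result equals the input, the name was already valid. The helper is_valid_name() below is a hypothetical convenience wrapper for this sketch, not a built-in function:

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(name) make.names(name) == name

is_valid_name("sales_2022")   # [1] TRUE    letters, numbers and _
is_valid_name("2022sales")    # [1] FALSE   starts with a number
is_valid_name("_sales")       # [1] FALSE   starts with _
is_valid_name("total sales")  # [1] FALSE   contains a space
is_valid_name("if")           # [1] FALSE   reserved word

# Point 1, case sensitivity: these are two different objects
Total <- 10
total <- 20
Total == total                # [1] FALSE
```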

Writing Comments in R:

A) Single-line comments: Use the # symbol. It can be placed at the beginning of a line or anywhere in it, and everything after the symbol on that line is treated as a comment, i.e. it is not executed.

B) Multiple-line comments: R has no separate syntax for block comments. To comment out a set of lines (make them non-executable) in RStudio, select those lines and press Ctrl+Shift+C, which adds # to the start of each selected line; pressing it again removes them.
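A short illustration of both kinds of comments; the object names price and discount are arbitrary:

```r
# This whole line is a comment and is never executed
price <- 250      # everything after the # is ignored
# The next two lines are commented out, just as Ctrl+Shift+C would do it:
# discount <- 0.1
# price <- price * (1 - discount)
print(price)      # [1] 250, because the discount lines never ran
```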

Fun Fact: We can write multiple commands on a single line just by separating them with a semicolon (;).

How? For the following code:

var1=100

var2=200

var3=var1-var2

We can simply write:

var1=100 ; var2=200 ; var3=var1-var2

Operators in R

Operators are symbols used to perform certain operations on operands. The categories of operators, and the symbols within each category, are quite similar across most data science tools and languages, so they should not surprise you. Let’s quickly look at the operators R has!

Arithmetic Operators: + , - , * , / (division, gives the exact answer) , %% (the modulus operator, which gives the remainder) , %/% (integer division)
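A minimal sketch with two arbitrary numbers, a and b; each comment shows what the Console prints:

```r
a <- 17
b <- 5

a + b    # [1] 22
a - b    # [1] 12
a * b    # [1] 85
a / b    # [1] 3.4   exact division
a %% b   # [1] 2     remainder of 17 / 5
a %/% b  # [1] 3     integer part of the quotient
```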


 

Relational Operators: < , > , <= , >= , == , !=

Relational operators return TRUE or FALSE as their output.
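A quick sketch of each relational operator with two arbitrary values:

```r
x <- 10
y <- 20

x < y    # [1] TRUE
x > y    # [1] FALSE
x <= 10  # [1] TRUE
x >= y   # [1] FALSE
x == y   # [1] FALSE   (== compares; a single = would assign)
x != y   # [1] TRUE
```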

Sometimes we need to combine the results of several relational operations, and for that we have logical operators.

Logical Operators: & (and), | (or), ! (not)
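A minimal sketch of combining comparisons with arbitrary values: & needs both sides to be TRUE, | needs at least one, and ! (not) flips a single result.

```r
x <- 10
y <- 20

(x < y) & (y < 15)  # [1] FALSE   TRUE & FALSE
(x < y) | (y < 15)  # [1] TRUE    TRUE | FALSE
!(x < y)            # [1] FALSE   not TRUE
```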


Assignment Operators: = , <- , ->

<- is going to be the most used operator when writing R commands, whereas -> is going to be the least used.

What about = ?

You have probably seen = used as an assignment operator in most other tools, including Python of course.

Well, in R both = and <- can be used to assign values, but they do differ in some situations, which you will get to know when you learn about data manipulation in R. For example, inside a function call, = only matches a value to an argument by name, while <- actually creates an object.
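Here is a small sketch of how = and <- behave differently inside a function call. double_it() is a hypothetical function defined only for this example:

```r
# Hypothetical one-line function used only to illustrate argument matching
double_it <- function(n) n * 2

double_it(n = 5)   # [1] 10   = just passes 5 to the argument n
exists("n")        # [1] FALSE in a fresh session: no object n was created

double_it(n <- 5)  # [1] 10   <- also creates an object n in the workspace
exists("n")        # [1] TRUE
```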

Let’s create some objects using the <- operator:

The objects x, y and z now hold their values. Nothing is printed in the Console; the values are simply stored in the objects, which we can see in the Environment pane.
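A minimal sketch with hypothetical values for x, y and z, using one assignment operator for each:

```r
x <- 25        # leftward assignment: the most common style
y = "R"        # = also assigns when used at the top level
TRUE -> z      # rightward assignment: valid but rarely used

# Typing an object's name prints its value
x   # [1] 25
y   # [1] "R"
z   # [1] TRUE
```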

Lastly, let me introduce an operator we need whenever we want to use a function from a particular library without loading it first, or want to make clear which library a function comes from. (And we need this A LOT!)

Package Reference Operator: It is made up of 2 colons, written as ::, and lets us use the functions defined in R libraries.

Want to see how? The syntax is:

LibraryName::FunctionName()
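As a sketch, here are two calls to functions from libraries that ship with every R installation (stats and base), so nothing extra needs to be installed:

```r
stats::median(c(4, 1, 7))  # [1] 4    median() from the stats library
base::sum(1:10)            # [1] 55   sum() from the base library
```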

By now we have talked a bit about ‘objects’ in R. Every object in R has a class, and this class can be a data type or a data structure. The concept of class is important in R mainly because the class of an object helps determine which functions can be used with it.

So we have 2 new terms: data type and data structure. Let’s look at each of them in detail.

Data Types in R

The concept of data types is a long-established one, and by now we are quite familiar with the major data types used in various tools and technologies. When I come across this term, what immediately comes to mind is text, numbers, dates and booleans. That’s pretty much it.

Most structured data can be placed into these 4 categories, although the names may differ slightly from tool to tool. In R, we can classify them as:

  1. Text: This category can be further classified into character and factor. Anything enclosed in double quotes (" ") or single quotes (' ') has the class character.

  2. Numbers: This category can be further classified into numeric, integer and complex. We will encounter the numeric type the most.

  3. Boolean: Here we have the logical data type, which holds the values TRUE and FALSE.

  4. Date: The most complicated data type in R. It is not a basic data type but a derived one (stored internally as a number of days), and hence a whole topic that needs to be discussed separately.

To understand the concept of dates in R, refer to this article: Link to Article to be added before publishing

Let’s determine the class (here, the data type) of the objects we created earlier. We pass each object to the class() function, and the data types are printed in the Console.
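A minimal sketch that creates one hypothetical object for each data type discussed above and asks class() about each; the last two lines peek at what a factor and a Date hold internally:

```r
# Hypothetical example objects, one per data type
num  <- 25                               # numeric: the default for numbers
int  <- 25L                              # integer: the L suffix makes it one
cplx <- 2 + 3i                           # complex
chr  <- "data"                           # character: anything in quotes
fct  <- factor(c("low", "high", "low"))  # factor: text stored as categories
lgl  <- TRUE                             # logical
day  <- as.Date("2022-02-04")            # Date: a derived type

class(num)    # [1] "numeric"
class(int)    # [1] "integer"
class(cplx)   # [1] "complex"
class(chr)    # [1] "character"
class(fct)    # [1] "factor"
class(lgl)    # [1] "logical"
class(day)    # [1] "Date"

levels(fct)   # [1] "high" "low"   the distinct categories
unclass(day)  # [1] 19027          days since 1970-01-01
```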

Type Casting

Does the name ‘Type Casting’ suggest anything about the concept?

It does! Type refers to the data type we just learnt about and casting refers to the conversion of this data type from one to another.

Essentially , Type Casting is the process of changing the data type of an object in R to another data type.

Suppose we have an object called demo with any particular data type. To view this object as another data type, say new_datatype, we write the command as.new_datatype(demo), and we are done.

Note: We can also type commands directly into the Console, which then shows both the input and the output. Let us use the Console for the rest of the article!

a is an object with the value 100 (for example, a <- 100), so its class is numeric.

To display this object as a character, we can write:

as.character(a)

And we get the value of the numeric object a as a character, i.e. "100".

Remember that the object a remains numeric as we didn’t save our result into any object.

Similarly, we can write commands like:

as.character(demo)

as.numeric(demo)

as.logical(demo)

as.integer(demo)

and so on.

But this process has some rules: not every value can be converted meaningfully to every other type. The data types follow a precedence, and type casting works according to it.

Taking the most used data types, in order of precedence from highest to lowest: character, numeric, logical.

Type casting from bottom to top (logical → numeric → character) always works. From top to bottom, it works only when the value has a sensible meaning in the lower type.

In general, a character value cannot be converted to numeric or logical unless the text actually represents a number or a logical value: "100" can become 100, but "abc" becomes NA (missing). Converting numeric to logical is a special case: 0 becomes FALSE and every other number becomes TRUE.

However, there are various other exceptions and special cases to this.
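A small sketch of these rules with arbitrary values. Note that as.numeric("abc") returns NA and also prints a warning ("NAs introduced by coercion"):

```r
# Bottom to top (logical -> numeric -> character): always works
as.numeric(TRUE)     # [1] 1
as.character(100)    # [1] "100"

# Top to bottom: works only if the value makes sense in the new type
as.numeric("100")    # [1] 100
as.numeric("abc")    # [1] NA, plus a warning
as.logical("abc")    # [1] NA
as.logical("TRUE")   # [1] TRUE

# Numeric to logical: 0 is FALSE, any other number is TRUE
as.logical(0)        # [1] FALSE
as.logical(-3.5)     # [1] TRUE
```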

To learn about type casting in detail, I strongly suggest you go through this article: Link to Article to be added before publishing

Libraries in R

We touched on this at the beginning when we discussed that R is a modular language and therefore has libraries (also called packages). R has two kinds of libraries: system libraries, which come with R itself, and user libraries, which number more than 18,000.

These libraries are available on CRAN (the Comprehensive R Archive Network), a global repository of open source packages.

Now, when it comes to libraries in R, there are 3 things to keep in mind: available libraries, installed libraries and loaded libraries.

Available libraries are all the packages on CRAN, installed libraries are those installed on your system, and loaded libraries are the ones you explicitly load each time you open R in order to use the functions they contain.

How to Install a Library?

I will use the GUI method here, which is quite easy!

Go to Packages → Install → Give the Package Name

(The equivalent command in the Console is install.packages("package_name").)

How to Load a Library?

Having a library installed on your system doesn’t mean you can use its functions right away; you need to explicitly load that library, and the preferred way of doing so is: library(library_name)


(Note: dplyr is one of the most commonly used libraries in R. It contains various important predefined functions for data manipulation, such as mutate(), rename() and filter().)
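The difference between installed and loaded libraries can be checked from the Console. This sketch uses splines, a library that ships with R but is not loaded by default, so it runs without installing anything; the full list of installed packages will differ from system to system:

```r
# Installed libraries: the names of every package installed on this system
installed <- rownames(installed.packages())
"splines" %in% installed          # [1] TRUE, splines ships with R

# Loaded libraries: search() lists what is attached in this session
"package:splines" %in% search()   # FALSE in a fresh session
library(splines)                  # load the installed library
"package:splines" %in% search()   # [1] TRUE
```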

Since we already introduced the package reference operator earlier, let me shed some more light on it. It is used when calling a particular function from a particular library.

For example, in dplyr::mutate(), the operator :: denotes that we are referring to the mutate() function from the dplyr library.

Data Structures in R

A few minutes ago you read about data types in R; similarly, we have another concept known as data structures. The major ones are the vector and the data frame.

Vector

A vector in R is an object that stores multiple elements in a single object. Vectors are also an integral part of the data frame, the structure around which almost everything in R revolves.

How to create a vector?

Use the c() function (c stands for combine) and pass the values inside it, separated by commas.

This gives us two vectors: name and numbers.
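A minimal sketch of creating the two vectors; the values are hypothetical:

```r
# Hypothetical values for the two vectors
name    <- c("Asha", "Ravi", "Meera")  # character vector
numbers <- c(10, 20, 30)               # numeric vector

numbers       # [1] 10 20 30
length(name)  # [1] 3
name[2]       # [1] "Ravi"   (indexing in R starts at 1)
class(name)   # [1] "character"
```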

Here we used values of a single type to create each vector: all character in name and all numeric in numbers. We can also create vectors with mixed values (character, numeric, logical, etc.), but a vector holds only one type, so R converts every element to the highest data type according to the precedence rules we learnt above (type casting occurs automatically).
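A sketch of that automatic conversion with arbitrary values:

```r
mixed1 <- c(1, "a", TRUE)    # character is highest, so everything becomes text
mixed2 <- c(1, TRUE, FALSE)  # numeric beats logical: TRUE -> 1, FALSE -> 0

class(mixed1)  # [1] "character"
class(mixed2)  # [1] "numeric"
mixed2         # [1] 1 1 0
```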

Data Frame

It is a two-dimensional data structure made up of multiple one-dimensional data structures, the vectors. Since it is 2D, it has rows and columns: each column is a vector, and each row is made up of the corresponding elements of those vectors.

There are 2 ways to get a data frame:

Method 1: Importing data from some source and saving it as a data frame. We will talk about this at the end.
Method 2: Combining vectors

For this, we use the data.frame() function.

Let’s create three vectors, v1, v2 and v3, and combine them to form a data frame called details.
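A minimal sketch with three hypothetical vectors of equal length:

```r
# Three hypothetical vectors of equal length
v1 <- c("Asha", "Ravi", "Meera")  # character column
v2 <- c(24, 31, 28)               # numeric column
v3 <- c(TRUE, FALSE, TRUE)        # logical column

# Each vector becomes a column; row i holds the i-th element of each vector
details <- data.frame(v1, v2, v3)

dim(details)    # [1] 3 3   (rows, columns)
names(details)  # [1] "v1" "v2" "v3"
details$v2      # [1] 24 31 28
```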

Our data frame is ready. To display it, simply type details in the Console.

To view the data frame properly in a new tab, we can use the View() function: simply write View(details).

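Did you notice something?

Yes! We took all 3 vectors of the same size. That’s necessary here: data.frame() stops with an error when the vectors have different lengths, unless a shorter length divides the longer one exactly, in which case R recycles (repeats) the shorter vector.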
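A quick sketch of both cases, using arbitrary vectors; tryCatch() is used here so the error message can be inspected instead of stopping the script:

```r
# Lengths 3 and 2: R cannot line the vectors up, so data.frame() fails
msg <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
msg    # the message says the arguments imply a differing number of rows

# Lengths 4 and 2: 2 divides 4, so b is recycled as 1, 2, 1, 2
ok <- data.frame(a = 1:4, b = 1:2)
ok$b   # [1] 1 2 1 2
```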

So this is how we can create data frames in R.

Importing Data Frames in R

In R, we can import data (our data frame) from various sources.

To name a few, we can import CSV files, delimited files (e.g. tab-delimited files), Excel files (spreadsheets), SAS files, XML files, etc.

The format we follow to import these data frames in R is:

df <- function_name("file_absolute_path/filename.extension")

R has different functions to read different types of files, hence function_name.

file_absolute_path is the absolute path of the file. On Windows, copy the path and replace every \ with /, because R treats a single \ inside a string as an escape character.

Finally, df is the name of the data frame in which we store the imported data.

Additional arguments can be added towards the end of the command for some file types.

Try importing the data frames yourself with a little bit of research on the functions and arguments required for various types of files.
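As a starting point for that research, here is a sketch for CSV files using read.csv() from the utils library that ships with R. To keep it runnable anywhere, it first writes a tiny hypothetical file (students.csv) to R’s temporary folder; the last lines show the \ to / replacement for a made-up Windows path:

```r
# Create a small CSV file in R's temporary folder (hypothetical data)
path <- file.path(tempdir(), "students.csv")
writeLines(c("name,score", "Asha,82", "Ravi,91"), path)

# Pattern: df <- function_name("file_absolute_path/filename.extension")
students <- read.csv(path)
str(students)   # a data frame with 2 rows and the columns name and score

# Tab-delimited files follow the same pattern with read.delim()

# Converting a hypothetical Windows path: "\\" in R code is one backslash
win_path <- "C:\\Users\\me\\data.csv"
gsub("\\", "/", win_path, fixed = TRUE)  # [1] "C:/Users/me/data.csv"
```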

Functions in R

We have already used the word ‘function’ a few times. Functions come in handy when we want to perform a task multiple times. R has some predefined functions like sum() and min(), which we can use directly to find the sum of a set of numbers and the smallest number among them, respectively. We can also create our own functions in R, popularly known as UDFs (user-defined functions).

Functions can be broadly categorized as:

  1. Built-in Functions: Functions that are already defined in R

  2. User-Defined Functions: Functions that we write on our own
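A quick sketch of a few built-in functions applied to a hypothetical vector of marks:

```r
marks <- c(3, 8, 1)

sum(marks)   # [1] 12
min(marks)   # [1] 1
max(marks)   # [1] 8
mean(marks)  # [1] 4
```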

Let’s get to know how we can write our own functions !

How to Define Functions in R : User Defined Functions

Quite a simple process! To define our own function in R, we use the function keyword. The syntax is:

Your_function_name <- function(argument(s)){
  Statement 1
  Statement 2
  .
  .
}

Your_function_name → Any identifier name you want to give the function

argument(s) → The input(s) the user passes to the function

Let’s create a function cube that finds the cube of the number the user inputs:
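A minimal sketch of cube(), following the syntax above; naming the argument x is our own choice:

```r
# cube(): returns the cube of the number passed as x
cube <- function(x) {
  x^3   # the value of the last expression is returned automatically
}

# Calling the function
cube(3)           # [1] 27
cube(c(1, 2, 4))  # [1]  1  8 64   works element-wise on a vector too
```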


With this, we have created our own function cube, which can be called with the statement cube(), passing the number whose cube we want as the argument.

Similarly, we can create more such functions according to our usage.

So here we come to the end of this article covering the basics of R. But this is just the start; to get a real taste of programming in R, we need to cover many more advanced topics.

Endnotes

In this article, we covered the introductory part of programming in R. I hope that by now you feel confident enough to write commands in R and perform basic tasks, but this isn’t the end! With R we can perform many more sophisticated tasks, so the learning must not stop here. Let’s move on to learning statistical analysis and machine learning with R.

If you still have any queries about R, do let me know in the comments below. We can get on a quick chat there.

You can connect with me on LinkedIn: https://www.linkedin.com/in/ayushi-gupta25/

About the Author

Ayushi Gupta
