Data Analysis with R

A. Installation of R and R Studio 

R can be installed by clicking the following link and choosing your operating system.

https://cloud.r-project.org/

After you choose the operating system (e.g. Windows), the site takes you to the download page. Select "install R for the first time" and, on the page that opens, choose "Download R 4.3.0 for Windows" (the version number changes with each release).

Alternatively, you can download R for Windows directly by clicking here.

Run the installer after the download completes.

RStudio can be downloaded from:

https://www.rstudio.com/products/rstudio/download/ 

Choose the free version and, on the page that opens, click the "Download RStudio for Windows" button.
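
Once both programs are installed, a quick check from the RStudio console can confirm that RStudio has found R. This is only a minimal sketch; the 4.0.0 threshold is an arbitrary value chosen for illustration, not a requirement stated in this tutorial.

```r
# Print the version of R that RStudio is using
print(R.version.string)

# Compare the running version with an illustrative minimum (assumed value)
min_version <- "4.0.0"
is_recent <- getRversion() >= min_version
print(is_recent)
```

If the first line prints a version string, R is installed and RStudio is connected to it.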

B. Download the Data and R script file from the Link Below 

Download the data file and the R script file from the link below.

Data File and R File: Download 

 ..................................................................................................................................................

1. Introduction to R

R is a powerful program for data analysis and graphics. It was initiated by Robert Gentleman and Ross Ihaka at the University of Auckland, New Zealand, in the early 1990s, and has been developed by an international team since mid-1997. 

More about R can be found at https://www.r-project.org/about.html

2. Why Use R?

1. It is free to download and use. Unlike Stata and SAS, it requires no paid license.
2. It is open source. Anyone can extend its functionality through packages.
3. It can handle almost all data formats.
4. It has extensive tools for graphical visualization.
5. Output is easy to share.
6. R script files make an analysis reproducible.
7. One can follow the calculations step by step.

3. R Studio   

RStudio is an integrated development environment (IDE) for R that helps us write, edit and execute R code. Its graphical interface makes our life easier. RStudio runs on top of R, so R must be installed first. However, while working in RStudio, it is not necessary to open the R application separately.

 4. R Studio Window  

We can divide the RStudio window into four parts, as described below:

Code Editor / R Script: This is where we type code for data analysis. It does not open by default in RStudio but can be opened by going to File > New File > R Script or by pressing Ctrl+Shift+N.

R Console: This is where the results of executed commands are displayed. We can also type commands directly here instead of in an R script file. The difference is that commands typed in the console are not saved with our work, whereas commands typed in an R script can be saved in a file and run again in the future.

Environment/History: The 'Environment' tab lists the names of all the objects we create during data analysis, such as data files, graphs, tables and estimation results. The 'History' tab records the code run in the R console.

Files/Packages/Plots/Help: The 'Files' tab lists the files in our working folder (directory). The 'Packages' tab shows the list of installed packages and offers an option to install new ones. The 'Plots' tab shows the charts we create, and the 'Help' tab shows help on various commands with examples.
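
The link between the console and the other panes can be seen with a few commands. The following is a minimal sketch; the object names x and m are made up for illustration.

```r
# Objects created here appear in the Environment tab
x <- c(2, 4, 6)
m <- mean(x)
print(m)

# ls() lists the same object names that the Environment tab shows
print(ls())

# help() opens the documentation in the Help tab (only run in an interactive session)
if (interactive()) help(mean)

# rm() removes an object, and it disappears from the Environment tab
rm(x)
```

Typing these lines in an R script rather than directly in the console keeps them available for later runs.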

5. Shortcuts in R Studio

Ctrl+1: Move Focus to Source

Ctrl+2: Move Focus to R Console

Ctrl+3: Move Focus to Help

Ctrl+4: Show History

Ctrl+5: Show Files

Ctrl+6: Show Plots

Ctrl+7: Show Packages

Ctrl+8: Show Environment

Ctrl+S: To save R file

Ctrl+Shift+N: New R script 

Ctrl+O: Open file 

Ctrl+W: Close file

Shift+Alt+G: Go to line

Ctrl+Enter: Run the current line or selection

Ctrl+Alt+R: Run all

Ctrl+Alt+B: Run from the beginning to the current line

Ctrl+Alt+E: Run from the current line to the end

Ctrl+L: Clear console

Ctrl+Q: Quit session

Ctrl+Shift+H: Choose working directory

6. Things to Remember while Preparing Data for R

  • Use the first row for variable names.
  • Do not use spaces in variable names. For instance, instead of 'age of the respondents', use 'age_of_the_respondents'.
  • Do not use special characters such as /, @, #, +, -, *, & in variable names.
  • Column names must be unique.
  • Do not keep blank rows in your data.
  • R is case sensitive, so R treats 'Age' and 'age' as different variables.
  • Code missing values consistently, either with a missing value code such as -999 or with 'NA'.
  • It is better to save the data file as a .csv or .txt file.
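
Some of these rules can also be checked or repaired from within R. The sketch below is illustrative only: the small CSV text and its column names are invented, and it assumes -999 was used as the missing value code.

```r
# Hypothetical data typed in as text so the example is self-contained
csv_text <- "Age,income\n25,1000\n-999,2000\n40,NA"

# na.strings tells read.csv which codes to convert to R's missing value NA
survey <- read.csv(text = csv_text, header = TRUE, na.strings = c("NA", "-999"))
print(survey)

# Count the missing values in each column
print(colSums(is.na(survey)))

# Column names must be unique: anyDuplicated() returns 0 when they are
print(anyDuplicated(names(survey)))

# R is case sensitive: the column is 'Age', so 'age' does not exist
print("age" %in% names(survey))

# Replace spaces in a variable name with underscores
clean_name <- gsub(" ", "_", "age of the respondents")
print(clean_name)
```

Converting a code such as -999 to NA at import time matters because R would otherwise treat -999 as a real number in calculations such as a mean.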

7. Setting Working Directory in R

It is useful to tell R the folder in which we have saved our data for analysis and where we want the output to be saved. This is done by setting a folder on our computer as the 'Working Directory' for R.

To set a folder as the working directory, we should type:

 setwd("folderpath") # put the folder path inside the quotation marks

For instance: 

setwd("C:/Users/siddhabhatta/Desktop/October31") 

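Note that we use forward slashes (/) rather than backslashes (\) when specifying the folder path. R treats a single backslash as an escape character, so a Windows-style path works only if every backslash is doubled (\\).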
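
To confirm that the working directory has changed, getwd() reports the current one. The sketch below is a hedged illustration that uses a temporary folder so it runs on any computer; the folder name analysis_demo is hypothetical and stands in for your own project folder.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Build a path in a portable way; 'analysis_demo' is a made-up folder name
demo_dir <- file.path(tempdir(), "analysis_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Switch to the new folder and confirm the change
setwd(demo_dir)
print(getwd())

# List the files R can now see without a full path (empty for a new folder)
print(list.files())

# Go back to the original working directory
setwd(old_wd)
```

file.path() joins folder names with forward slashes, which avoids the backslash problem altogether.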

 8. Importing Data in R Studio 

Data can be imported into RStudio through the graphical interface as well as through the command line. To import data through the graphical interface, we should click the 'Import Dataset' button in the 'Environment' tab at the top right and select the option that matches our data. For instance, to import data from an Excel file, we should go to 'Import Dataset', select 'From Excel', browse to the Excel file on our computer and click 'Import'.
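
The code-based imports in this section rely on the add-on packages readxl, haven and foreign, which must be installed before library() can load them. The following is a minimal sketch of one way to check for them; it only reports what is missing and leaves the actual installation to you.

```r
# Packages used by the import commands in this section
needed <- c("readxl", "haven", "foreign")

# requireNamespace() returns TRUE when a package is installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report the packages that still need to be installed
missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Run install.packages() for: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

A missing package can then be installed once with, for example, install.packages("readxl"), which requires an internet connection.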

To import the dataset through code, we use the following commands (data3 is the name of the data file):

# Importing the data with code

# From an Excel file (imports the sheet named "data" from data3.xlsx and stores it as datafile)
library(readxl)
datafile <- read_excel("data3.xlsx", sheet = "data")

# From a tab-delimited text file
datafile <- read.delim("data3.txt")

# From a CSV file
datafile <- read.csv("data3.csv", header = TRUE)

# From a Stata file
library(haven)
datafile <- read_dta("data3.dta")

# From an R data file (load() restores the saved objects under their original names)
load("data3.RData")

# From the clipboard (copy the dataset first; on macOS use pipe("pbpaste") in place of "clipboard")
datafile <- read.table("clipboard", header = TRUE)

# From a CSV file on a website
datafile <- read.csv("https://sites.google.com/site/siddhabhatta/data/data3.csv", header = TRUE)

# From a Stata file on the internet (foreign::read.dta reads Stata versions 5 to 12 only)
library(foreign)
datafile <- read.dta("https://sites.google.com/site/siddhabhatta/data/data3.dta")

# By choosing the file manually in a dialog box
datafilemanual <- read.csv(file.choose(), header = TRUE)
datafile <- read.dta(file.choose())  # uses the foreign package loaded above
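
After any import, it is worth checking that the data arrived as expected. The sketch below is a hedged, self-contained illustration: it writes a small invented data frame to a temporary CSV file and an .RData file (the names sample_data, data3_demo.csv and data3_demo.RData are hypothetical) and then reads them back with base R functions only.

```r
# A small made-up dataset standing in for data3
sample_data <- data.frame(id = 1:3, age = c(25, 31, 47))

# Write it to a temporary CSV file and import it again
csv_path <- file.path(tempdir(), "data3_demo.csv")
write.csv(sample_data, csv_path, row.names = FALSE)
datafile <- read.csv(csv_path, header = TRUE)

# Inspect the imported data: structure, first rows, and dimensions
str(datafile)
print(head(datafile))
print(dim(datafile))

# Save the object to an .RData file, remove it, and restore it with load()
rdata_path <- file.path(tempdir(), "data3_demo.RData")
save(sample_data, file = rdata_path)
rm(sample_data)
restored <- load(rdata_path)

# load() returns the names of the objects it restored, not the data itself
print(restored)
print(exists("sample_data"))
```

Unlike read.csv(), load() should not be assigned to datafile in the hope of getting the data: the restored object comes back under the name it was saved with.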