ECON403 R lab01

2019-01-30

Why R?

  • Free
  • Best statistics packages
  • Great visualization tools
  • Strong numerical computational tools
  • tidyverse

Intended Learning Outcomes

After this lecture, you will be able to

  • use basic R syntax
  • plot functions
  • solve equations

Pre-assessment

  • How many of you have used R before?
  • How many of you have used any programming language before?

1. Getting Help

1.1 Accessing the help files

  • Get help on a particular function.
In [3]:
?mean
  • Search the help files for a word or phrase.
In [5]:
help.search('weighted mean')
  • Find help for a package.
In [6]:
help(package = 'dplyr')

1.2 More about an object

  • Get a summary of an object’s structure.
In [69]:
str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
  • Find the class an object belongs to.
In [70]:
class(iris)
'data.frame'

2. Using Libraries

  • Download and install a package from CRAN.
In [71]:
install.packages("tidyverse")
Installing package into ‘/home/edubc2018/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
  • Load the package into the session, making all its functions available to use.
In [3]:
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.5
✔ tibble  2.0.1     ✔ dplyr   0.7.8
✔ tidyr   0.8.2     ✔ stringr 1.2.0
✔ readr   1.1.1     ✔ forcats 0.2.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

3. Working Directory

  • Find the current working directory (where inputs are found and outputs are sent).
In [12]:
getwd()
'/home/edubc2018/econ/econ403'
  • Change the current working directory.

setwd('C:/file/path') # Use single forward slashes, also on Windows.
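
Changing the working directory affects every relative path that follows, so it is often useful to remember the old directory and restore it afterwards. The lines below are a minimal sketch using only base R; the directory (tempdir()) and the file location 'data/wage1.csv' are placeholders chosen for illustration, not paths used in this lab.

```r
# Remember the current directory, switch to a temporary one, then switch back.
old_wd <- getwd()
setwd(tempdir())             # tempdir() exists for the whole R session
file.exists("file.csv")      # relative paths now resolve inside tempdir()
setwd(old_wd)                # restore the original working directory

# file.path() joins path components with "/" on every platform.
data_file <- file.path("data", "wage1.csv")   # hypothetical location
data_file
```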

4. Vectors

4.1 Creating Vectors

  • Join elements into a vector
In [13]:
c(2, 4, 6)
  1. 2
  2. 4
  3. 6
  • An integer sequence
In [14]:
2:6
  1. 2
  2. 3
  3. 4
  4. 5
  5. 6
  • A sequence with a custom step size
In [15]:
seq(2, 3, by=0.5)
  1. 2
  2. 2.5
  3. 3
  • Repeat a vector
In [16]:
rep(1:2, times=3)
  1. 1
  2. 2
  3. 1
  4. 2
  5. 1
  6. 2
  • Repeat elements of a vector
In [72]:
x=rep(1:2, each=3)
x
  1. 1
  2. 1
  3. 1
  4. 2
  5. 2
  6. 2
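
A few further ways to build vectors, shown as a minimal sketch with base R functions only (seq, seq_len, rep). The particular values are arbitrary illustrations.

```r
# Sequence defined by its length rather than its step size.
s <- seq(0, 1, length.out = 5)
s                                        # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) gives 1, 2, ..., n and returns an empty vector when n is 0.
seq_len(3)                               # 1 2 3

# 'times' can be a vector: repeat each element a different number of times.
r <- rep(c("a", "b"), times = c(2, 1))
r                                        # "a" "a" "b"
```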

4.2 Selecting Vector Elements

In [76]:
x
x[4] # The fourth element.
  1. 1
  2. 1
  3. 1
  4. 2
  5. 2
  6. 2
2
In [77]:
x[-4] # All but the fourth.
  1. 1
  2. 1
  3. 1
  4. 2
  5. 2
In [78]:
x[2:4]# Elements two to four.
  1. 1
  2. 1
  3. 2
In [79]:
x[-(2:4)]# All elements except two to four.
  1. 1
  2. 2
  3. 2
In [80]:
x[c(1,5)] # Elements one and five.
  1. 1
  2. 2
In [87]:
x[x < 2] # All elements less than two.
  1. 1
  2. 1
  3. 1
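
Selection by condition works because the comparison itself returns a logical vector, which is then used as the index. The sketch below makes that intermediate step visible and also shows indexing by name; the named vector wages and its names are made up for illustration.

```r
x <- rep(1:2, each = 3)      # same x as above: 1 1 1 2 2 2

x == 2                       # logical vector: FALSE FALSE FALSE TRUE TRUE TRUE
idx <- which(x == 2)         # positions where the condition holds
idx                          # 4 5 6
x[x %in% c(2, 5)]            # elements that appear in a given set: 2 2 2

# Named vectors can also be indexed by name (hypothetical names and values).
wages <- c(anna = 3.10, ben = 3.24, carl = 3.00)
wages["ben"]
wages[wages > 3.05]
```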

5. Programming

5.1 For Loop

for (variable in sequence/vector){
Do something 
}
In [19]:
# Example
for (i in 1:4) {
  j <- i + 10
  print(j)
}
[1] 11
[1] 12
[1] 13
[1] 14

5.2 If Statements

if (condition){ 
Do something 
} else { 
Do something different  
}
In [21]:
# Example (i is still 4 from the loop above)
if (i > 3) {
  print('Yes')
} else {
  print('No')
}
[1] "Yes"

5.3 Functions

function_name <- function(var){ 
Do something 
return(new_variable) 
}
In [23]:
# Example
square <- function(x) {
  squared <- x * x
  return(squared)
}
square(2)
4
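
Loops, conditions, and functions are often combined, but many R operations are vectorized, so an explicit loop is not always needed. The following is a minimal sketch comparing the two styles with the same square function; the variable names out and squares_vec are chosen here for illustration.

```r
square <- function(x) {
  x * x                      # the last evaluated expression is returned
}

# Loop version: fill a result vector one element at a time.
out <- numeric(4)
for (i in 1:4) {
  out[i] <- square(i)
}
out

# Vectorized version: the same function applied to the whole vector at once.
squares_vec <- square(1:4)
squares_vec

# ifelse() is the vectorized counterpart of if/else.
labels <- ifelse(1:4 > 3, "Yes", "No")
labels
```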

6. Reading and Writing Data

6.1 Read and write a comma-separated values (CSV) file

In [118]:
df <- read.csv('https://nb.vse.cz/~zouharj/econ/wage1.csv')
write.csv(df, 'file.csv')
head(df, 3)
wage  educ  exper  tenure  nonwhite  female  married  numdep  smsa  northcen  ⋯  trcommpu  trade  services  profserv  profocc  clerocc  servocc  lwage     expersq  tenursq
3.10  11    2      0       0         1       0        2       1     0         ⋯  0         0      0         0         0        0        0        1.131402  4        0
3.24  12    22     2       0         1       1        3       1     0         ⋯  0         0      1         0         0        0        1        1.175573  484      4
3.00  11    2      0       0         0       0        2       0     0         ⋯  0         1      0         0         0        0        0        1.098612  4        0
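
By default write.csv also writes the row names, which come back as an extra column when the file is read again. The sketch below shows a round trip with row.names = FALSE on a small made-up data frame written to a temporary file. The data values and the file location are illustrative assumptions, not part of the wage1 data.

```r
# A small made-up data frame standing in for a real dataset.
df_small <- data.frame(wage = c(3.10, 3.24, 3.00),
                       educ = c(11L, 12L, 11L))

path <- tempfile(fileext = ".csv")              # temporary file for this session
write.csv(df_small, path, row.names = FALSE)    # do not write row names as a column
df_back <- read.csv(path)

names(df_back)                 # only the original columns
all.equal(df_small, df_back)   # the round trip preserves the data
```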

6.2 Tidyverse: read_csv and read_excel

In [119]:
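library(readxl) # Provides read_excel(), used below.
# read_csv() comes from readr, which library(tidyverse) has already loaded.
wage1 = read_csv('https://nb.vse.cz/~zouharj/econ/wage1.csv')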
head(wage1,3)
Parsed with column specification:
cols(
  .default = col_integer(),
  wage = col_double(),
  lwage = col_double()
)
See spec(...) for full column specifications.
wage  educ  exper  tenure  nonwhite  female  married  numdep  smsa  northcen  ⋯  trcommpu  trade  services  profserv  profocc  clerocc  servocc  lwage     expersq  tenursq
3.10  11    2      0       0         1       0        2       1     0         ⋯  0         0      0         0         0        0        0        1.131402  4        0
3.24  12    22     2       0         1       1        3       1     0         ⋯  0         0      1         0         0        0        1        1.175573  484      4
3.00  11    2      0       0         0       0        2       0     0         ⋯  0         1      0         0         0        0        0        1.098612  4        0
In [120]:
# Note that the Excel spreadsheet must be local (a URL does not work).
wage2 = read_excel('wage2.xls', sheet = 1)
head(wage2,3)
wage  hours  IQ   KWW  educ  exper  tenure  age  married  black  south  urban  sibs  brthord  meduc  feduc
769   40     93   35   12    11     2       31   1        0      0      1      1     2        8      8
808   50     119  41   18    11     16      37   1        0      0      1      1     .        14     14
825   40     108  46   14    11     9       33   1        0      0      1      1     2        14     14

7. Maths Functions

In [123]:
x = wage1$wage[1:10]
t(x)
3.1 3.24 3 6 5.3 8.75 11.25 5 3.6 18.18
In [124]:
t(round(x, 1)) # Round to 1 decimal place; round(x, n) rounds to n places.
3.1 3.2 3 6 5.3 8.8 11.2 5 3.6 18.2
In [125]:
max(x)    # Largest element.
sum(x)    # Sum.
mean(x)   # Mean.
median(x) # Median.
min(x)    # Smallest element.
18.1800003051758
67.4200003147125
6.74200003147125
5.15000009536743
3
In [126]:
var(x) # The variance.
cor(x, x) # Correlation.
sd(x) # The standard deviation
23.3610408509191
1
4.83332606503215
In [127]:
t(log(x)) # Natural log.
1.131402 1.175573 1.098612 1.791759 1.667707 2.169054 2.420368 1.609438 1.280934 2.900322
In [128]:
t(exp(x)) # Exponential.
22.19795 25.53372 20.08554 403.4288 200.3368 6310.688 76879.92 148.4132 36.59823 78609279
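
The summary functions above are related to each other: var divides the sum of squared deviations by n - 1, sd is the square root of var, and exp undoes log. The following sketch checks these relations on the ten wages shown above, typed in by hand here (rounded to two decimals) so that the block runs without the data file; manual_var is a name introduced for illustration.

```r
# First ten wages from wage1, entered by hand.
x <- c(3.10, 3.24, 3.00, 6.00, 5.30, 8.75, 11.25, 5.00, 3.60, 18.18)

n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)   # sample variance divides by n - 1
manual_var
var(x)                                         # same value as manual_var

sqrt(var(x))                                   # same value as sd(x)
exp(log(x))                                    # log() and exp() are inverses
```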

8. Plotting

8.1 Values of x in order.

In [100]:
plot(x)

8.2 Values of y against x.

In [110]:
plot(wage1$educ,
     wage1$wage,
    xlab= 'educ',
    ylab = 'wage',
    col = 'blue')

8.3 Curve for Functions

In [116]:
curve(15+6*x -3*x^2,
      xlab= 'experience',
      ylab = 'income'      
     )

8.4 Adding lines, points, and text with 'abline', 'points', and 'text'

In [173]:
curve(15+6*x -3*x^2,
      xlab= 'experience',
      ylab = 'income',
      col = 'purple'
     )
abline(v = 0.15, col = "blue")         # vertical line at x = 0.15
abline(h = 0.40, col = "red")          # horizontal line at y = 0.40 (below the plotted y-range of about 15 to 18, so not visible)
abline(a = 15, b = 2, col = 'black')   # line with intercept a = 15 and slope b = 2
abline(a = 18, b = -2, col = 'gray')   # line with intercept a = 18 and slope b = -2
points(x = 0.8, y = 16.5, type = 'p', col = "red")   # x, y: coordinates of the point to plot
text(x = 0.8, y = 16.2, labels = "solution: 15, 2", col = 'green')   # x, y: coordinates where the label is written
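
The two straight lines drawn with abline above, y = 15 + 2x and y = 18 - 2x, cross at a single point. As a minimal sketch, that point can be computed with base R's solve() by writing the two lines as a linear system; the names A, b, and crossing are introduced here for illustration.

```r
# Rewrite the lines as  -2x + y = 15  and  2x + y = 18.
A <- matrix(c(-2, 1,
               2, 1), nrow = 2, byrow = TRUE)
b <- c(15, 18)

crossing <- solve(A, b)   # solves A %*% c(x, y) = b
crossing                  # x = 0.75, y = 16.5
```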

9. Solve equations

In [174]:
# install.packages("rootSolve")
library(rootSolve)
In [170]:
## =======================================================================
##  simultaneous equations
## =======================================================================
model <- function(x){
    c(F1 = 500-0.1*x[1] - x[2], 
      F2 = 0.05*x[1]- x[2])
} 

(ss <- multiroot(f = model, start = c(1, 1)))
$root
  1. 3333.33333333347
  2. 166.66666666667
$f.root
F1
-1.66551217262167e-11
F2
3.69482222595252e-12
$iter
3
$estim.precis
1.01749719760846e-11
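
Because this system is linear, the multiroot result can be cross-checked with base R alone. The sketch below rewrites the two equations in matrix form and solves them with solve(). It then uses uniroot() for a single nonlinear equation; the target value 17 in that second part is an arbitrary choice for illustration.

```r
# 500 - 0.1*x1 - x2 = 0   ->   0.1*x1  + x2 = 500
# 0.05*x1 - x2 = 0        ->   0.05*x1 - x2 = 0
A <- matrix(c(0.10,  1,
              0.05, -1), nrow = 2, byrow = TRUE)
b <- c(500, 0)
lin_sol <- solve(A, b)
lin_sol                       # about 3333.33 and 166.67, matching $root above

# One equation in one unknown: where does 15 + 6x - 3x^2 equal 17 on [0, 1]?
g <- function(x) 15 + 6 * x - 3 * x^2 - 17
one_root <- uniroot(g, interval = c(0, 1), tol = 1e-8)$root
one_root
```

uniroot requires g to have opposite signs at the two ends of the interval, which holds here (g(0) = -2, g(1) = 1).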

10. Post-assessment

  1. What is the standard deviation function in R?

    • a. std()
    • b. sd()
  2. How do you get help on a function in R?

    • a. help(myfunction)
    • b. ?myfunction
  3. Subset the first element of vector x.

    • a. x[1]
    • b. x(1)
  4. https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210040301
    • Download the CSV file and load it into R.

11. Summary

  • R syntax
    • R cheat sheet
  • Plot
    • plot(x, y)
  • Solve equations
    • rootSolve

Reference