Video 3

3.1. Current folder

R can import data from local storage or from the Internet, and export it locally. Let us first set up the current folder, where our results will be stored.

getwd()   # shows current folder
dir()     # shows files in the current folder
dir.create("D:/Data/R", recursive = TRUE) # create a folder (with parent folders, if missing)
setwd("D:/Data/R") # sets the current folder
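
The drive path above is specific to one Windows machine. As a minimal sketch of the same steps on any system, the folder can be created under tempdir(). The folder name R_course is hypothetical and used for illustration only.

```r
# Hypothetical working folder under the session's temporary directory
work_dir <- file.path(tempdir(), "R_course")

# Create the folder (and any missing parents) only if it is not there yet
if (!dir.exists(work_dir)) {
  dir.create(work_dir, recursive = TRUE)
}

old_dir <- setwd(work_dir)  # setwd() invisibly returns the previous folder
getwd()                     # shows the new current folder
setwd(old_dir)              # go back to where we started
```

Keeping the value returned by setwd() makes it easy to restore the previous folder at the end of a script.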

3.2. Read values from a text file

Here is the EUR/USD exchange rate from January 1999 to April 2017.

We can read values from an unformatted text file using scan().

SomeData = scan("http://edu.modas.lu/data/txt/currency.txt",what = "") # what - defines the value class
head(SomeData)
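
To see what scan() returns without an Internet connection, here is a small sketch that reads from an in-memory string through the text argument. The sample string is made up from the first values of the currency file.

```r
# A few whitespace-separated tokens, as they would appear in a text file
txt <- "Date EUR 1999-01-04 1.1867 1999-01-05 1.1760"

# what = "" asks scan() to return every token as a character string
tokens <- scan(text = txt, what = "", quiet = TRUE)
print(tokens)

# what = numeric() returns numbers instead (all tokens must be numeric)
rates <- scan(text = "1.1867 1.1760 1.1629", what = numeric(), quiet = TRUE)
print(mean(rates))
```

With what = "" the header, dates and rates all end up in one flat character vector, which is why a table-aware reader is preferable for this file.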

In fact, you can download an entire web page with scan() and parse it afterwards. That is fun, but here we need the data in a readable, structured form.


3.3. Read text tables

We will use read.table() to import the data as a data frame. The file starts like this:

Date EUR
1999-01-04 1.1867
1999-01-05 1.1760
1999-01-06 1.1629
1999-01-07 1.1681
1999-01-08 1.1558

Some parameters are important in read.table():

  • header - set it TRUE if there is a header line
  • sep - separator character. "\t" stands for tabulation
  • as.is - if TRUE, prevents transforming character columns to factors.

Currency = read.table("http://edu.modas.lu/data/txt/currency.txt", header=TRUE, sep="\t", as.is=TRUE)
str(Currency)
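
The effect of these parameters can be checked offline. The sketch below feeds the first rows of the table to read.table() through its text argument; the object name Sample is hypothetical. Converting the Date column with as.Date() is an optional extra step, not something the import does by itself.

```r
# First rows of the currency table, tab-separated, with a header line
sample_text <- "Date\tEUR\n1999-01-04\t1.1867\n1999-01-05\t1.1760\n1999-01-06\t1.1629\n"

Sample <- read.table(text = sample_text, header = TRUE, sep = "\t", as.is = TRUE)
str(Sample)  # Date is read as character, EUR as numeric

# Optional: turn the character dates into Date objects
Sample$Date <- as.Date(Sample$Date)
str(Sample)
```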

Do not forget the functions that let you see what is inside your data:

head(Currency)
summary(Currency)
View(Currency)

Let’s make the first plot.

plot(Currency$EUR)

Hmm… it’s quite ugly… We will improve it later.


3.4. Read values from a binary file

R can store data in a compressed (gzip) binary form and load the saved variables straight back into memory. Such files have the .RData extension. This is a fast and easy way to store your data. Let us first download the data in RData format into your working directory using download.file() and then load it with load(). Parameters of download.file():

  • destfile - the file name, under which you would like to store the downloaded file.
  • mode - the way you would like to treat the data (as text or binary). To keep binary data unchanged, use "wb"!

download.file("http://edu.modas.lu/data/rda/all.RData",
              destfile="all.RData",mode = "wb")

getwd()                # show current folder
dir(pattern="\\.RData$")  # show .RData files in the current folder (pattern is a regular expression)
load("all.RData")      # load the data

ls()                   # you should see 'GE.matrix' among variables

View(GE.matrix)         

You can see the row and column names of the loaded object:

attr(GE.matrix,"dimnames")    # annotation of the dimensions
rownames(GE.matrix)
colnames(GE.matrix)
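
The save/load cycle can also be tried without the downloaded file. This is a minimal sketch on a small made-up matrix; Toy.matrix, its row and column names, and the file name are hypothetical.

```r
# A small matrix with row and column names
Toy.matrix <- matrix(1:6, nrow = 2,
                     dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))

rdata_file <- file.path(tempdir(), "toy.RData")
save(Toy.matrix, file = rdata_file)  # write the variable to a binary file
rm(Toy.matrix)                       # remove it from memory

# load() restores the variable under its original name
# and returns the names of everything it restored
loaded_names <- load(rdata_file)
print(loaded_names)
print(rownames(Toy.matrix))
print(colnames(Toy.matrix))
```

Note that load() restores objects under the names they had when saved, so it can silently overwrite a variable with the same name.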

ToDo: RDS file
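
For the RDS format mentioned above, a minimal sketch might look as follows. It uses base R's saveRDS() and readRDS(), which store a single object without its name; the data and file name are made up for illustration.

```r
# A tiny made-up data frame
Toy.rates <- data.frame(Date = c("1999-01-04", "1999-01-05"),
                        EUR  = c(1.1867, 1.1760))

rds_file <- file.path(tempdir(), "toy.rds")
saveRDS(Toy.rates, file = rds_file)  # save one object

# Unlike load(), readRDS() returns the object, so you choose its name
Restored <- readRDS(rds_file)
print(all.equal(Toy.rates, Restored))
```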


3.5. Read Excel tables

R can read Excel files using one of the tidyverse packages: readxl. Install it and attach the library:

# install.packages("readxl")
library(readxl)

Note: read_excel() can only read local files, not URLs! So we will first download the Excel file:

download.file("http://edu.modas.lu/data/xls/cancer.xlsx",destfile="cancer.xlsx",mode = "wb")
getwd()

Function read_excel() can be used to read both “xls” and “xlsx” files. Some parameters:

  • path - path and file name
  • sheet - either name of the sheet or its number
  • col_names - are there column names? (default = TRUE)
  • col_types - types of the columns. Automatically detected by default

It reads the Excel file into a tibble object, the tidyverse version of a data.frame. If you wish, you can transform it with the as.data.frame() function.

Cancer = read_excel("cancer.xlsx") 
str(Cancer)
## now Cancer is a 'tibble' - tidyverse object for data.frame
## if you prefer standard data.frame:
Cancer = as.data.frame(Cancer)
str(Cancer)
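
The sheet parameter can be tried on an example workbook that ships with readxl, so no download is needed. This sketch assumes readxl is installed and uses its bundled datasets.xlsx file, not the course data; the object names are hypothetical.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook bundled with the readxl package
  example_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets available in the workbook
  sheets <- readxl::excel_sheets(example_path)
  print(sheets)

  # A sheet can be selected by name...
  Cars <- readxl::read_excel(example_path, sheet = "mtcars")
  # ...or by its position
  First <- readxl::read_excel(example_path, sheet = 1)

  # Convert the tibble to a standard data.frame
  Cars.df <- as.data.frame(Cars)
  print(class(Cars))
  print(class(Cars.df))
}
```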

3.6. Data export

There are several ways to export your data. Let’s consider the simplest ones.

  • write() - writes a column of numbers / characters
  • write.table() - writes a data table
  • save() - saves one or several variables into a binary RData file.

Parameters of write.table are:

  • eol - character for the end of line (can differ between operating systems). The standard one is "\n"
  • dec - decimal separator
  • quote - do we put "" around character values or not
  • row.names - do we put row names as a column or not

write.table(Currency,file = "curr.txt",sep = "\t",
            eol = "\n", na = "NA", dec = ".",
            row.names = FALSE, quote=FALSE)
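
A quick way to check an export is to read the file back and compare it with the original. The sketch below does this round trip on a small made-up table written to tempdir(); all object and file names are hypothetical.

```r
# A tiny made-up table
Toy.curr <- data.frame(Date = c("1999-01-04", "1999-01-05"),
                       EUR  = c(1.1867, 1.1760),
                       stringsAsFactors = FALSE)

# write.table(): export the whole table as tab-separated text
tab_file <- file.path(tempdir(), "toy_curr.txt")
write.table(Toy.curr, file = tab_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
print(readLines(tab_file))  # raw content of the exported file

# Read it back with matching parameters and compare
Back <- read.table(tab_file, header = TRUE, sep = "\t", as.is = TRUE)
print(all.equal(Toy.curr, Back))

# write(): export a single column of numbers, one value per line
vec_file <- file.path(tempdir(), "toy_eur.txt")
write(Toy.curr$EUR, file = vec_file, ncolumns = 1)
eur_back <- scan(vec_file, quiet = TRUE)
```

Using the same sep and header settings for writing and reading is what makes the round trip work.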

You can also save objects in binary format (faster, and the file is smaller):

save(Currency,file="Currency.RData") # save as binary file

save(list=ls(),file="workspace.RData") # save all variables as binary file

getwd()
dir()      # see the results

Exercises 1.3

  a. The dataset at http://edu.modas.lu/data/txt/shop.txt contains records about customers, collected by a women’s apparel store. Check its structure. View its summary.

read.table, View, str, summary, head

  b. For the “shop” table, save into a new text file only the records for customers who paid with a Visa card.

write.table

  c. Calculate mean Sales for men and women.

mean

  d. Build a table showing mean Sales for single/married men/women (contingency table).

mean

  e. Download and open the beer.xlsx file from http://edu.modas.lu/data/xls/beer.xlsx. Build a contingency table using the function table() to check beer preferences among men and women in the study.

download.file, read_excel, table

  f*. Export the contingency table from (e) into a CSV file.

table, unclass, as.data.frame, write.table



By Petr Nazarov