Video 3

3.1. Current folder

R can import data from local storage or from the Internet, and export it locally. Let us first set up the current folder, where our results will be stored.

getwd()   # shows current folder
dir()     # shows files in the current folder
dir.create("D:/Data/R", recursive = TRUE) # creates a folder (and missing parent folders)
setwd("D:/Data/R") # sets the current folder
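The path above is specific to one Windows machine. Here is a minimal sketch of a more portable setup, assuming a hypothetical folder name R_course_data inside the session's temporary directory (replace it with any location where you have write access):

```r
# Hypothetical folder for this example; replace with your own path
work_dir = file.path(tempdir(), "R_course_data")

# Create the folder (and any missing parent folders) only if it does not exist
if (!dir.exists(work_dir)) {
  dir.create(work_dir, recursive = TRUE)
}

old_dir = setwd(work_dir)  # setwd() invisibly returns the previous folder
getwd()                    # now points to work_dir
setwd(old_dir)             # go back to where we started
```

Storing the value returned by setwd() makes it easy to restore the previous folder at the end of a script.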

3.2. Read values from a text file

The file currency.txt contains the EUR/USD exchange rate from January 1999 till April 2017.

We can read values from an unformatted text file using scan().

SomeData = scan("http://edu.modas.lu/data/txt/currency.txt",what = "") # what - defines the value class
head(SomeData)

In fact, you can download an entire web page with scan() and parse it afterwards. That is fun, but here we need the data in a readable, structured form.
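To see how the what argument of scan() changes the result, here is a small sketch that does not need an Internet connection. It assumes a temporary file holding only the first five EUR values of the currency table; the variable names are chosen for illustration.

```r
# Write five EUR values to a temporary text file (free layout, space-separated)
tmp = tempfile(fileext = ".txt")
writeLines(c("1.1867 1.1760", "1.1629 1.1681 1.1558"), tmp)

as_text    = scan(tmp, what = "")         # read every item as character
as_numbers = scan(tmp, what = numeric())  # read every item as a number

class(as_text)
class(as_numbers)
mean(as_numbers)   # arithmetic only works on the numeric version
```

Note that scan() ignores line breaks and returns a flat vector, which is why it suits plain lists of values better than tables.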


3.3. Read text tables

We will use read.table() to import the data as a data frame. The first lines of the file look like this:

Date EUR
1999-01-04 1.1867
1999-01-05 1.1760
1999-01-06 1.1629
1999-01-07 1.1681
1999-01-08 1.1558

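Some parameters are important in read.table(): header (whether the first line holds the column names), sep (the field separator, here a tab) and as.is (keep text columns as character instead of converting them to factors).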

Currency = read.table("http://edu.modas.lu/data/txt/currency.txt", header=TRUE, sep="\t", as.is=TRUE)
str(Currency)
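The same call can be tried offline. The sketch below is only an illustration: it feeds the five rows shown above to read.table() through its text argument, then converts the Date column from character to the Date class. The name Small is chosen for this example.

```r
# The first rows of the currency table, tab-separated as in the file
lines = c("Date\tEUR",
          "1999-01-04\t1.1867",
          "1999-01-05\t1.1760",
          "1999-01-06\t1.1629",
          "1999-01-07\t1.1681",
          "1999-01-08\t1.1558")

# text= lets read.table() parse a character vector instead of a file
Small = read.table(text = lines, header = TRUE, sep = "\t", as.is = TRUE)
str(Small)                        # Date is character, EUR is numeric

Small$Date = as.Date(Small$Date)  # convert text such as "1999-01-04" to Date
str(Small)
```

With as.is = TRUE the dates arrive as plain text, so the conversion to Date is an explicit step.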

Do not forget the functions that let you see what is inside your data:

head(Currency)
summary(Currency)
View(Currency)

Let’s make the first plot.

plot(Currency$EUR)

Hmm… it’s quite ugly… We will improve it later.
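One reason the plot looks rough is that the values are drawn as separate points against their row index. As a small preview, and only a sketch, the code below plots the five rows shown earlier as a line against the date; the labels and the name Small are assumptions made for this example.

```r
# Five rows from the currency table, with Date converted to the Date class
Small = data.frame(
  Date = as.Date(c("1999-01-04", "1999-01-05", "1999-01-06",
                   "1999-01-07", "1999-01-08")),
  EUR  = c(1.1867, 1.1760, 1.1629, 1.1681, 1.1558)
)

# type = "l" draws a line; the x axis now shows dates instead of row numbers
plot(Small$Date, Small$EUR, type = "l",
     xlab = "Date", ylab = "EUR/USD", main = "EUR/USD exchange rate")
```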


3.4. Read values from a binary file

R can store data in a compressed (gzip) binary format and load the saved variables directly into memory. Such files have the .RData extension. This is a fast and easy way to store your data. Let us first download the data in RData format into your working directory using download.file() and then load it with load(). The download parameters are destfile (the name of the local file) and mode (use "wb" to write a binary file, which matters on Windows):

download.file("http://edu.modas.lu/data/rda/all.RData",
              destfile="all.RData",mode = "wb")

getwd()                # show current folder
dir(pattern="\\.RData$")  # show .RData files in the current folder (pattern is a regular expression)
load("all.RData")      # load the data

ls()                   # you should see 'GE.matrix' among variables

View(GE.matrix)         

You can see the row and column names of the loaded object:

attr(GE.matrix,"dimnames")    # annotation of the dimensions
rownames(GE.matrix)
colnames(GE.matrix)
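Because load() puts objects straight into the workspace, it can silently overwrite variables with the same name. A minimal sketch of a safer pattern, assuming a small hypothetical matrix M in place of the downloaded data:

```r
# A small hypothetical matrix with row and column names
M = matrix(1:6, nrow = 2,
           dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))

tmp = tempfile(fileext = ".RData")
save(M, file = tmp)   # write the object to a binary file
rm(M)                 # remove it from the workspace

# Load into a separate environment so existing variables are not overwritten
e = new.env()
loaded = load(tmp, envir = e)  # load() returns the names of the restored objects
loaded
dimnames(e$M)                  # same information as attr(e$M, "dimnames")
```

Printing the value returned by load() is a quick way to find out what a file contains without searching through ls().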

3.5. Read Excel tables

R can read Excel files using one of the tidyverse packages: readxl. Install it and attach the library:

# install.packages("readxl")
library(readxl)

Note: read_excel() can only read local files, not URLs. So we will first download the Excel file:

download.file("http://edu.modas.lu/data/xls/cancer.xlsx",destfile="cancer.xlsx",mode = "wb")
getwd()

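The function read_excel() can read both “xls” and “xlsx” files. Useful parameters include sheet (which worksheet to read), col_names (whether the first row holds the column names) and skip (number of rows to skip before reading).

It reads the Excel file into a tibble, the tidyverse version of a data.frame. If you wish, you can convert it with the as.data.frame() function.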

Cancer = read_excel("cancer.xlsx") 
str(Cancer)
## now Cancer is a 'tibble' - tidyverse object for data.frame
## if you prefer standard data.frame:
Cancer = as.data.frame(Cancer)
str(Cancer)
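When a workbook has several sheets, it helps to list them before reading. The sketch below assumes the readxl package is installed and uses datasets.xlsx, an example file shipped with readxl, rather than the cancer data; the name First is chosen for illustration.

```r
# Run only if readxl is available on this machine
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook that is installed together with readxl
  path = readxl::readxl_example("datasets.xlsx")

  print(readxl::excel_sheets(path))   # names of the worksheets in the file

  # Read the first 5 data rows of the first sheet
  First = readxl::read_excel(path, sheet = 1, n_max = 5)
  First = as.data.frame(First)        # tibble -> standard data.frame
  str(First)
}
```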

3.6. Data export

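There are several ways to export your data. Let’s consider the simplest.

The main parameters of write.table() are sep (field separator), eol (end-of-line characters), na (string written for missing values), dec (decimal mark), row.names (whether to write row names) and quote (whether to put character values in quotes):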

write.table(Currency,file = "curr.txt",sep = "\t",
            eol = "\n", na = "NA", dec = ".",
            row.names = FALSE, quote=FALSE)

You can also save objects in binary format (faster, and the file is smaller):

save(Currency,file="Currency.RData") # save as binary file

save(list=ls(),file="workspace.RData") # save all variables as binary file

getwd()
dir()      # see the results
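A quick way to check an export is to read the file back and compare it with the original. This is a minimal sketch, assuming a two-row data frame built from the first rows of the currency table and temporary file names generated by tempfile():

```r
# Two rows from the currency table, dates kept as text
Small = data.frame(Date = c("1999-01-04", "1999-01-05"),
                   EUR  = c(1.1867, 1.1760),
                   stringsAsFactors = FALSE)

# Export as a tab-separated text file, then import it again
txt_file = tempfile(fileext = ".txt")
write.table(Small, file = txt_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
Back = read.table(txt_file, header = TRUE, sep = "\t", as.is = TRUE)
all.equal(Small, Back)   # compares the original and the re-imported table

# Export the same object in binary format
rda_file = tempfile(fileext = ".RData")
save(Small, file = rda_file)
file.exists(rda_file)
```

Setting row.names = FALSE matters for the round trip: otherwise the row names are written as an extra unnamed column.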

Exercises

  1. The dataset at http://edu.modas.lu/data/txt/shop.txt contains records about customers, collected by a women’s apparel store. Check its structure. View its summary.

read.table, View, str, summary, head

  2. For the “shop” table, save into a new text file only the records for customers who paid with a Visa card.

write.table



By Petr Nazarov