If you are new to programming, think of a variable as a box with a label in which you can store some information. R offers several ways to assign a value to a variable.
# assign 2 to x
x = 2
x
# assign 3 to y
y <- 3
y
# assign x+y to z
x + y -> z
z
Variable names are case-sensitive. Try typing Y instead of y and you will see an error.
As you can see, typing a variable name shows what is inside.
A more flexible way to show data is to use the functions print(), cat(), and View().
print(z)
cat("x=",x,", y=",y,", z=",z,"\n")
To see which variables are defined, type ls(). To remove a variable, use rm(). Try:
ls() # here are the variables
## [1] "x" "y" "z"
rm(list=ls()) # remove them
ls() # check
## character(0)
i = 5 # i assigned the value of 5
i <- 5 # i assigned the value of 5
i*2
i/2
i^2 # power
i%/%2 # integer division
i%%2 # modulo - the remainder of integer division
round(1.5) # round the results
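The integer-division and modulo operators above fit together: multiplying the quotient by the divisor and adding the remainder gives back the original number. The following is a minimal sketch; the names q and rem are chosen here only for illustration. It also shows that round() sends halves to the even digit, which may be unexpected.

```r
i <- 5
q <- i %/% 2        # integer quotient: 2
rem <- i %% 2       # remainder: 1
q * 2 + rem == i    # TRUE: quotient and remainder reconstruct i

round(1.5)          # 2
round(2.5)          # 2 as well: halves go to the even digit
round(2.567, 1)     # 2.6: the second argument sets the number of decimals
```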
Atomic types of data
Atomic types of data are what we call scalars in math. An atomic value is a simple, single value. You can get the class of the data with the functions class() or mode().
Numbers can be represented by the integer or numeric data types. They are numeric by default.
r = 1.5
len = 2 * pi * r # note: 'pi' - predefined constant 3.141592653589793
len
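To see the difference between the integer and numeric types described above, it may help to ask R for the class of a few values. This is a small sketch using only base R; the values are arbitrary.

```r
class(1.5)        # "numeric"
class(5)          # "numeric": a plain number is not an integer by default
class(5L)         # "integer": the L suffix requests an integer
class(1:3)        # "integer": sequences built with ':' are integers
mode(5L)          # "numeric": mode() does not separate integers from doubles
is.integer(5)     # FALSE
as.integer(5.9)   # 5: conversion truncates toward zero
```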
Logical (Boolean) variables take two values: TRUE (or just T) and FALSE (or F).
b1 = TRUE # try b1=T
b2 = FALSE # try b2=F
b1 & b2 # logical AND
b1 | b2 # logical OR
!b1 # logical NOT
xor(b1,b2) # logical XOR
r == len # is the value in `r` equal to the one in `len`?
r < len # is `r` smaller than `len`?
r <= len # is `r` smaller than or equal to `len`?
r != len # is `r` different from `len`?
In R, text is stored in variables of class character. Unlike in many other languages, a single atomic character variable can contain an entire text. In other words, the value “hello” is not considered a vector of letters, but a whole.
Use "..." or '...' to define a character value.
There are many functions that work with text in R. Let’s consider some of them:
st = 'Hello, world!'
paste("We say:",st) # concatenation
## [1] "We say: Hello, world!"
# a more powerful method to create text (as in C):
sprintf("We say for the %d-nd time: %s..",2,st) # returns the formatted string, which is printed here
## [1] "We say for the 2-nd time: Hello, world!.."
st = sprintf("By the way, pi=%f and N_Avogadro=%.2e",pi,6.02214085e23) # set output to `st` variable
print(st)
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
casefold(st, upper=T) # change the case
## [1] "BY THE WAY, PI=3.141593 AND N_AVOGADRO=6.02E+23"
nchar(st) # number of characters
## [1] 47
strsplit(st," ") # split the string at each space
## [[1]]
## [1] "By" "the" "way,"
## [4] "pi=3.141593" "and" "N_Avogadro=6.02e+23"
Two very powerful functions are sub() and gsub(). They replace the text that matches a regular expression with a given character value. sub() replaces only the first match, gsub() replaces all matches.
sub(".+and ","",st)
## [1] "N_Avogadro=6.02e+23"
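The difference between sub() and gsub() is easiest to see on a string with several matches. A minimal sketch follows; the example string and the variable names are made up for illustration.

```r
s <- "one fish, two fish, red fish"   # hypothetical example string
first_only <- sub("fish", "cat", s)   # replaces the first match only
all_hits <- gsub("fish", "cat", s)    # replaces every match
first_only   # "one cat, two fish, red fish"
all_hits     # "one cat, two cat, red cat"

# with a regular expression: collapse any run of spaces into one
gsub(" +", " ", "too    many   spaces")   # "too many spaces"
```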
In R, there is a special value that denotes missing data. This value is NA, and it can be assigned to a variable of any class. Almost any operation involving NA returns NA. To test for it, use the function is.na(), which returns TRUE for a missing value. Try:
na = NA # create variable `na` with NA inside
na + 1 # result is NA
100>na # result is still NA
na==na # result is still NA
is.na(na) # TRUE
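In practice, missing values usually sit inside a vector, and many base R functions accept an na.rm argument to skip them. The sketch below uses a made-up vector v. It also shows two logical operations whose result does not depend on the missing value, so they do not return NA.

```r
v <- c(1, NA, 3)          # hypothetical vector with one missing value
mean(v)                   # NA: the missing value propagates
mean(v, na.rm = TRUE)     # 2: drop NA before computing
sum(is.na(v))             # 1: count the missing values
v[!is.na(v)]              # 1 3: keep only the observed values

# a few logical operations have a known result whatever NA hides
NA | TRUE                 # TRUE
NA & FALSE                # FALSE
```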
Another special value is NULL. It shows that the variable is defined, but contains nothing yet. is.null() or length() can help to check for this value.
Numeric values can, in addition, be infinite (Inf, -Inf) or undefined, not-a-number (NaN). The functions is.infinite(), is.finite() and is.nan() help to detect such values.
1/0 # Inf
-1/0 # -Inf
is.infinite(1/0)
is.finite(1/0)
0/0 # undefined value NaN
sqrt(-1) # NaN (with a warning): not a real number
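The checks for NULL and for the special numeric values can be tried side by side. This is a small sketch with made-up variable names (nothing, vals); note that is.na() is also TRUE for NaN, while is.nan() is TRUE only for NaN.

```r
nothing <- NULL       # defined, but holds no value
is.null(nothing)      # TRUE
length(nothing)       # 0
length(NA)            # 1: NA is a value of length one, NULL is not

vals <- c(1, NA, NaN, Inf)   # hypothetical test values
is.na(vals)           # FALSE TRUE TRUE FALSE
is.nan(vals)          # FALSE FALSE TRUE FALSE
is.finite(vals)       # TRUE FALSE FALSE FALSE
```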
Vectors combine atomic elements of a single class. You can have a vector of numbers, logical values, or characters, but not a mix of them. Numeric vectors can be created as a simple sequence, e.g. 1:5. The generic function is c(), which takes an enumeration of elements and combines them. You can access an element of a vector using [i], where i is the element number (starting from 1).
a = c(1,2,3,4,5) # creating vector by enumeration
a
## [1] 1 2 3 4 5
a[1]+a[5]
## [1] 6
b=5:9
a+b
## [1] 6 8 10 12 14
length(a) # get length of `a`
## [1] 5
txt = c(st, "Let's try vectors", "bla-bla-bla")
txt
## [1] "By the way, pi=3.141593 and N_Avogadro=6.02e+23"
## [2] "Let's try vectors"
## [3] "bla-bla-bla"
boo = c(T,F,T,F,T)
boo
## [1] TRUE FALSE TRUE FALSE TRUE
Vectors of different lengths can also be combined: try a + 1:3. The missing values are circularly repeated.
More advanced ways to define sequences:
seq(from=1,to=10,by=0.5) # a numeric sequence
rep(1:4, times = 2) # any sequence defined by repetition
rep(1:4, each = 2) # similar, but not the same
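The circular repetition of the shorter vector mentioned earlier can be checked directly. In this sketch the warning that R issues for mismatched lengths is silenced with suppressWarnings() only to keep the output clean; the variable names are illustrative.

```r
a <- c(1, 2, 3, 4, 5)
b <- 1:3
# b is reused as 1 2 3 1 2 to match the length of a;
# R warns because 5 is not a multiple of 3
res <- suppressWarnings(a + b)
res                    # 2 4 6 5 7

# no warning when the longer length is a multiple of the shorter one
1:6 + c(10, 20)        # 11 22 13 24 15 26
```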
And here is one of the strongest features of R: we can easily work with the elements of a vector. The indexes of a vector can be vectors themselves.
a
## [1] 1 2 3 4 5
a[1:3] # take a part of vector by index numbers
## [1] 1 2 3
a[boo] # take a part of vector by logical vector
## [1] 1 3 5
a[a>2] # take a part by a condition
## [1] 3 4 5
a[-1] # removes the first element
## [1] 2 3 4 5
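The same index forms can also appear on the left side of an assignment, which changes only the selected elements. A minimal sketch, using a fresh copy of the vector a:

```r
a <- c(1, 2, 3, 4, 5)
which(a > 2)        # 3 4 5: positions where the condition holds
a[c(1, 5)]          # 1 5: pick arbitrary positions
a[-(1:2)]           # 3 4 5: drop the first two elements

a[a > 3] <- 0       # assign to the elements selected by a condition
a                   # 1 2 3 0 0
```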
Matrices are very similar to vectors, but defined in 2 dimensions. They also contain atomic values of a single class. Arrays are multidimensional matrices.
Let us define a matrix with 5 rows and 3 columns:
A=matrix(0,nrow=5, ncol=3)
A
A=A-1 # subtract a scalar
A
A=A+1:5 # add a vector (recycled down the columns)
A
t(A) # transpose
A*A # by-element product
A%*%t(A) # matrix product
# alternative ways to create matrix:
cbind(c(1,2,3,4),c(10,20,30,40))
rbind(c(1,2,3,4),c(10,20,30,40))
ToDo: mention access to elements. [i,j], [,j]
ToDo: mention access by names and naming
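As a possible illustration of the element access noted in the ToDo items, matrix elements can be addressed with [i,j], whole columns with [,j], and whole rows with [i,]. The sketch below uses a small made-up matrix M and hypothetical row and column names.

```r
M <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
M[2, 3]         # 6: row 2, column 3
M[, 2]          # 3 4: the whole second column
M[1, ]          # 1 3 5: the whole first row

# naming rows and columns allows access by name
rownames(M) <- c("r1", "r2")           # hypothetical names
colnames(M) <- c("a", "b", "c")
M["r2", "c"]    # 6
M[, "b"]        # second column, returned as a named vector
```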
Data frames are two-dimensional tables that can contain values of different classes in different columns.
Data = data.frame(matrix(NA, nrow=5, ncol=5)) # 5 x 5 table filled with NA
# let us add a column to Data
mice = sprintf("Mouse_%d",1:5)
Data = cbind(mice,Data)
# name the columns
# NOTE: some functions, such as names(), can appear on the left side of an assignment
names(Data) = c("name","sex","weight","age","survival","code")
Data
## name sex weight age survival code
## 1 Mouse_1 NA NA NA NA NA
## 2 Mouse_2 NA NA NA NA NA
## 3 Mouse_3 NA NA NA NA NA
## 4 Mouse_4 NA NA NA NA NA
## 5 Mouse_5 NA NA NA NA NA
# put in the data manually
Data$name=sprintf("Mouse_%d",1:5)
Data$sex=c("Male","Female","Female","Male","Male")
Data$weight=c(21,17,20,22,19)
Data$age=c(160,131,149,187,141)
Data$survival=c(T,F,T,F,T)
Data$code = 1:nrow(Data)
Data
## name sex weight age survival code
## 1 Mouse_1 Male 21 160 TRUE 1
## 2 Mouse_2 Female 17 131 FALSE 2
## 3 Mouse_3 Female 20 149 TRUE 3
## 4 Mouse_4 Male 22 187 FALSE 4
## 5 Mouse_5 Male 19 141 TRUE 5
If you wish to select only male mice from the dataset, use indexes:
Data[Data$sex == "Male",]
## name sex weight age survival code
## 1 Mouse_1 Male 21 160 TRUE 1
## 4 Mouse_4 Male 22 187 FALSE 4
## 5 Mouse_5 Male 19 141 TRUE 5
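Row conditions can be combined with & or |, and the second index can select columns by name. The following sketch rebuilds a reduced copy of the mouse table so that it runs on its own; the variable heavy is introduced here only for illustration.

```r
# a small stand-in for the mouse table above
Data <- data.frame(
  name   = sprintf("Mouse_%d", 1:5),
  sex    = c("Male", "Female", "Female", "Male", "Male"),
  weight = c(21, 17, 20, 22, 19)
)
# rows: males heavier than 20; columns: name and weight only
heavy <- Data[Data$sex == "Male" & Data$weight > 20, c("name", "weight")]
heavy
nrow(heavy)                              # 2
mean(Data$weight[Data$sex == "Female"])  # 18.5
```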
Useful functions to see what is inside your data frame:
View(Data) # visualize data as a table
str(Data) # see the structure of the table or other variables
head(Data) # see the head of the table
summary(Data) # summary on the data
Factors are used instead of character vectors with repeated values, e.g. Data$sex. A factor variable consists of a vector of integer indexes and a short character vector that holds the levels of the factor.
# Let's use factors
Data$sex = factor(Data$sex)
summary(Data)
# useful commands when working with factors:
levels(Data$sex) # returns levels of the factor
nlevels(Data$sex) # returns number of levels
as.character(Data$sex) # transform into character vector
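To see the two parts of a factor, the integer indexes and the levels, one can convert it with as.integer(). A minimal sketch on a standalone vector with the same values as Data$sex:

```r
sex <- factor(c("Male", "Female", "Female", "Male", "Male"))
levels(sex)        # "Female" "Male": levels are sorted alphabetically
as.integer(sex)    # 2 1 1 2 2: the integer codes stored internally
table(sex)         # counts per level: Female 2, Male 3
```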
Lists are the most general containers in classical R. Elements (fields) of a list can be atomic values, vectors, matrices, data frames or other lists. Let’s create a list that includes the data and a description of an experiment.
L = list() # creates an empty list
L$Data = Data
L$description = "A fake experiment with virtual mice"
L$num = nrow(Data)
str(L)
Access to list elements:
L$Data
L$num
# or by index:
L[[1]]
L[[3]]
# other ways:
L[["num"]]
L$'num'
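Single and double brackets behave differently on lists: [[ ]] returns the element itself, while [ ] returns a shorter list that still wraps it. The sketch below uses a small hypothetical list rather than the one built above, and also shows that assigning NULL removes an element.

```r
L <- list(description = "A fake experiment", num = 5)   # hypothetical list
L[["num"]]          # 5: the element itself
class(L[["num"]])   # "numeric"
L["num"]            # a list of length one that still wraps the element
class(L["num"])     # "list"
names(L)            # "description" "num"
L$num <- NULL       # assigning NULL removes an element
names(L)            # "description"
```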
Please do the following tasks:
a. Compare two numbers: \(e^\pi\) and \(\pi^e\). Print the results using cat(). Use: pi, exp(), ^, >, cat()
b. Create a vector of powers of 2: \(2^0\), \(2^1\), \(2^2\), …, \(2^{10}\). Use: i:j, ^
c. Output the results of Task b as a character vector with the template “2^i = x”. Use: print(), sprintf(), e.g. sprintf("2^%d = %d", i, x)
d. Output the results of Task c, showing only even exponents. Use: print(), seq() or %%
e. Create a character vector containing the names of 10 cities. Filter out and return only those names that have more than 5 characters. Use: nchar()
f. Create two vectors of length 5 containing random numbers. Perform element-wise multiplication and calculate the dot product of the two vectors. Use: sum()
g. Create a 4x4 matrix filled with numbers from 1 to 16. Extract the second and fourth rows, and then calculate the sum of each row. Use: sum()
h. Create a data frame that contains information about 10 people: their names, ages, heights, and weights. Summarize the data frame. Use: data.frame(), summary()
i. Add a new column to the data frame from (h) that categorizes people by age group (“young”: less than or equal to the median age, “old”: older than the median age). Transform the column to a factor. Use: data.frame(), median(), factor()
j. Create a list that contains the data frame from (i) and a text field with an arbitrary description (you can use the task text as an example). Now print the mean age of the people. Use: list(), $, mean()