Module 49 Conditional statements
Learning goals
- Understand what conditional statements are, and why they are so awesome.
- Be able to write your own conditional statements in
R
.
First steps
An example of a conditional statement is, “If ______ happens, do _____. Otherwise, do _____.”
In R
code, conditional statements work a similar way: they let a variable’s value determine which process to carry out next.
Here is a basic example:
Let’s break this if
statement down.
- The
if
command opens up a conditional statement. - The parenthetical
(x==3)
is where the logical test happens. If the result of this test isTRUE
, then theif
statement will be processed; if not, theelse
statement will be run instead. - The curly brackets,
{ }
serve to contain the code that will be run, depending on the outcome of the logical test. - The
else
command indicates the start of the code that will be run if the logical test’s result isFALSE
.
This code ran the logical test x==3
, determined its outcome to be FALSE
, and so it skipped the if
code and ran the else
code instead.
Here is another example of the same idea:
Since x==3
returned FALSE
, y
was defined as 0
instead of 100
.
Note that you do not need to define an else
statement. If you do not, R
will simply do nothing if the logical test is FALSE
.
(Nothing happened)
You can also feed the if
statement a logical object instead of a test.
x <- 4
x_test <- x == 3
x_test # check out the value of x_stest
[1] FALSE
if(x_test){
message("x is equal to 3!")
}
Not that, since x_test
is a logical value (TRUE
or FALSE
), you do not need to write out a logical test within the if
parenthetical. But you are free to do so if you wish:
This if
statement is saying that, if it is TRUE
that x_test
is FALSE
, print a message saying so.
Next steps
Nested conditions
You can nest conditional statements within others as many times as you wish:
x <- 6
if(x==3){
message("x equals 3")
}else{
if(x < 3){
message("x is less than 3")
}else{
message("x is greater than 3")
}
}
Note that every open bracket, {
, needs a corresponding closing bracket ,}
. Most errors associated with if
statements involve missing brackets.
Handling NA
s, NaN
’s, Inf
s, and NULL
s
if
statements can be particularly helpful when your dataset contains missing or broken values. R
includes base functions that help you carry out logical tests concerning missing values. These three test below can be very helpful within if
statements.
is.finite()
tests whether a numeric object is a real, finite number.
Here is an if
statement making use of is.finite()
:
x <- 0
y <- 10/x
if(is.finite(y)){
message("y is indeed a finite number!")
}else{
message("y is not a finite number!")
}
is.na()
tests whether a variable contains a missing or broken object.
x <- "eric"
is.na(x)
[1] FALSE
y <- as.numeric(x) # try converting `x` to a numeric value
is.na(y)
[1] TRUE
is.finite(y)
[1] FALSE
NA
stands for “Not Available”. NaN
is also a common sign of a broken value. It stands for Not a Number.
is.null()
tests whether a variable is empty. That is, it has been initiated, but it contains no data.
Joint conditions
if
statements can also accommodate joint logical tests. For example, the follow if
statement only returns if a message if two tests are true:
The next if
statement returns a message if either of the logical statements are true:
NULL
as a default
Setting the default value for an input to NULL
can be useful in certain use cases. For example, let’s say that if b
is not defined by the user, you want its value to be set to five times the value of x
.
my_function <- function(x,a=1.3,b=NULL){
# Handle input `b`
if(is.null(b)){
b <- x*5
print(paste("b was NULL! Setting its value to ", b))
}
# Now perform process
y <- a*x + b
return(y)
}
In this function, a conditional statement is used – if(is.null(b)){ ... }
– to handle the input b
when the user does not specify a value for it. When b
is NULL
, the logical test is.null(b)
will be TRUE
, which will trigger the conditional statement and case b
to be defined as x*5
. Conditional statements will be convered in detail in the next modules.
Try running the function with and without providing a value for b
.
Conditional statements such as if(is.null(x)){ ... }
or if(is.na(x)){ ... }
will be helpful in dealing with all the possible values that a user can pass to your custom functions.
Complex inputs
You can pass vectors, dataframes, and any other data structure as inputs in your own custom functions. For example:
Complex function outputs
At some point you will want multiple objects to be returned by your function. For example, perhaps you want both y
and b
to be returned now that you can define b
according to the value of x
.
Unfortunately, the return()
command does not let you include multiple objects. return(y,b)
will not work. To make it work, you have to place your output objects within a single object, such as a vector, dataframe, or list.
Here is a modification of my_function()
that allows multiple outputs:
my_function <- function(x,a=1.3,b=NULL){
# Handle input `b`
if(is.null(b)){
b <- x*5
}
# Now perform process
y <- a*x + b
output <- c("y"=y,"b"=b)
return(output)
}
Now my_function()
works like this:
To get the value of just y
or just b
, you can treat the output just like any other vector:
Exercise 2
Write a nested if
statement that produces a message reporting the hemisphere for any GPS position you provide it (a pair of latitude and longitude coordinates, in decimal degrees). The four hemisphere options are as follows:
- Northwest (positive latitudes and negative longitudes, e.g., USA)
- Northeast (positive latitudes and longitudes, e.g., Russia)
- Southwest (negative latitudes and longitudes, e.g., Brazil)
- Southeast (negative latitudes and positive longitudes, e.g., New Zealand).
Include the ability to handle missing values (i.e., if an NA
is provided, return a message saying that values are missing and the hemisphere cannot be determined.)
Provide five examples that demonstrate the functionality for all the possible message options.
Review assignment
Note: This exercise will combine all the skills you’ve learned for for
loops, if
statements, and writing functions into a real-world data science scenario. Buckle up!
You are working at the Center for Disease Control. Your supervisor has asked you to take a look at state-level data on infectious diseases within the United States in the last century. Specifically, she wants you to address the following questions and requests:
In the last 90 years, which states have had the highest average prevalence of measles, pertussis (whooping cough), and smallpox in proportion to their population sizes? Which have had the lowest?
Provide beautiful plots showing trends in the prevalence of these diseases over the last century. Produce a single plot for each state, with three lines representing the time series for each disease of interest. Save each plot as a
pdf
into a folder namedstate-level-summaries
. Name eachpdf
using the state’s name.
To do this work, your supervisor asks you to use the us_contagious_diseases
dataset contained within the R
package dslabs
, which contains disease data from 1928-2011 for all states. To make sure the numbers reflect actual patterns, she asks you to only use prevalence numbers for years in which counts were made in more than 20 weeks out of the year.