Module 26 RMS Titanic
Use the dplyr verbs to answer these questions.
First, download the following mystery dataset by running this code.
library(readr)
library(dplyr)
df <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/deaths.csv')
1. Review the dataset and try to figure out what each row represents. “Each row is a ____.”
2. How many people are in the dataset?
3. Use summarize() to count the number of men and women.
4. Use summarize() to count the number of people in each class.
5. Use summarize() to count the number of men and women in each class.
6. What is the average age of men in the dataset?
7. What is the average age of women in the dataset?
8. Use mutate() to create a variable called died. This should be a boolean based on the Survived column (in which 1 means the person survived, and 0 means the person died).
9. Use mutate() to create a variable called child. This should be a boolean based on the Age column, indicating whether someone was less than 18 years old.
10. Create a separate dataframe for men and for women. Name them accordingly.
11. Create a separate dataframe for each of class 1, class 2, and class 3. Name them accordingly.
12. For each of the 5 datasets you’ve just created, what is the death rate?
13. For each of the 5 datasets, how many children died?
14. Now, using the full original dataset, calculate the child-specific death rate for each combination of class and sex (i.e., “first class females”, “third class males”, etc.).
15. What did you find? What might explain that?
16. What is the average age of men and women, separately, in each class?
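If the dplyr patterns these questions rely on feel unfamiliar, the sketch below may help. It runs on a tiny made-up dataframe, not the course dataset, so the names toy, group, score and passed are all hypothetical and are not columns of df. It shows counting rows per group, building boolean columns with mutate(), splitting a dataframe with filter(), and computing a rate as the mean of a boolean column.

```r
library(dplyr)

# Made-up toy data, NOT the course dataset; every column name is hypothetical
toy <- tibble(
  group  = c("a", "a", "b", "b", "b"),
  score  = c(10, 14, 7, 9, 20),
  passed = c(1, 0, 1, 1, 0)
)

# Count rows per group: group_by() followed by summarize() with n()
counts <- toy %>%
  group_by(group) %>%
  summarize(n = n())

# mutate() adds boolean (logical) columns derived from existing columns
toy <- toy %>%
  mutate(failed = passed == 0,
         low    = score < 10)

# filter() keeps only the matching rows, one way to make a separate dataframe
toy_a <- toy %>% filter(group == "a")

# The mean of a boolean column is the share of TRUE values, i.e. a rate;
# na.rm = TRUE skips missing values if a numeric column has any
rates <- toy %>%
  group_by(group) %>%
  summarize(fail_rate = mean(failed),
            avg_score = mean(score, na.rm = TRUE))
```

Passing two column names to group_by() gives one row per combination of the two, which is the shape questions about "each combination" call for.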