Module 32 Trump tweets
Learning goals
- This is a review exercise: apply the
dplyr
andggplot
skills introduced in the previous modules and theDeep R
modules on Working with Dates and Times and Working with Text to doing some text mining of former-President Trump’s tweets.
Let’s run the below to get started.
library(dplyr)
library(readr)
library(tidytext)
trump <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/trumptweets.csv')
stop_words <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/stopwords.csv')
1. In the current format, one row of data is equal to one
2. Create a variable called line
. This should be 1, 2, 3, 4, etc.
3. Create a variable called text
. This should be an exact copy of content
.
4. Use the unnest_tokens
function to reshape the data for better text processing.
simple <- trump %>%
select(-mentions, -hashtags, -geo, -content, -link, -id) %>%
unnest_tokens(word, text)
5. What format is the data in now (ie, one row is equal to
6. Take a minute to read about the tidytext
package at https://www.tidytextmining.com/tidytext.html.
7. What is the most common word used by Trump?
8. Use substr
to create a year
variable.
9. What is the most common word used by Trump each year?
10. Create a variable named month
using substr
.
11. What is the most common word used by Trump each month?
12. Create a dataframe with one word per row, and a column called freq
saying how many times that word was used.
13. Load up the wordcloud
library.
14. Subset the dataframe created in number 12 to only include the top 100 words.
15. Create a wordcloud of Trump’s top 100 words.
16. Are you ready to do some sentiment analysis? Great.
17. Create a dataframe named sentiments
by running the following: sentiments <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/sentiments.csv')
18. What is the sentiments
dataset?
19. Create another dataset named polarity
by running the following: polarity <- get_sentiments("afinn")
20. Use left_join
to combine polarity and sentiments into one dataset named emotions
.
21. Use left_join
to combine the trump
data and the emotions
data.
22. Have a look at the simple
(Trump) data. What do you see?
23. Get an overall polarity score (using the value
variable) for the entire dataset. Is it positive or negative?
24. How many words were emotionally associated with “anger” in 2015?
25. What percentage of words were associated with “fear” by year?
26. What is the average sentiment polarity by year?
27. What is Trump’s most positive tweet?
28. What month was Trump’s most negative month?
29. What percentage of Trump tweets have more sadness than joy by year/month?
30. Read in data on full moons by running the following: moon <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/full-moon.csv')
31. Create a date
column with a correctly formatted date.
32. What day of the week has the most full moons?
33. Use left_join
to bring the moon data into the Trump data.
34. Does Trump have more negative emotions on full moon days?
35. Read in “stop words” by running the following: sw <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/stopwords.csv')
36. Join the sw
data to the simple
data, and remove the stop words.
37. Create a new word cloud.
38. Do a new analysis of sentimentality.