Module 32 Trump tweets

Learning goals

This is a review exercise. You will apply the dplyr and ggplot skills introduced in the previous modules, together with the Deep R modules on Working with Dates and Times and Working with Text, to do some text mining of former President Trump’s tweets.

Run the code below to get started.

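library(dplyr)     # data manipulation verbs and the %>% pipe
library(readr)     # read_csv
library(tidytext)  # unnest_tokens, get_sentiments
# Trump's tweets
trump <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/trumptweets.csv')
# Stop words, used near the end of the exercise
stop_words <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/stopwords.csv')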

1. In the current format, one row of data is equal to one ____?

2. Create a variable called line. This should be 1, 2, 3, 4, etc.

3. Create a variable called text. This should be an exact copy of content.

4. Use the unnest_tokens function to reshape the data for better text processing.

simple <- trump %>%
  select(-mentions, -hashtags, -geo, -content, -link, -id) %>%
  unnest_tokens(word, text)
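As a minimal sketch of steps 2 to 4, the same reshaping can be tried on a tiny made-up data frame before running it on the real tweets. The toy tibble, its two invented tweets, and the date column are assumptions for illustration only; the names content, line, text, and word come from the exercise.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the trump data: two invented tweets
toy <- tibble(
  date = c("2015-06-16 10:00:00", "2016-11-09 08:30:00"),
  content = c("Make America great again", "Great night, thank you America")
)

toy_words <- toy %>%
  mutate(line = row_number(),   # step 2: 1, 2, 3, ...
         text = content) %>%    # step 3: exact copy of content
  select(-content) %>%
  unnest_tokens(word, text)     # step 4: split text into lowercase words

# Count how often each word appears, most frequent first
toy_counts <- toy_words %>% count(word, sort = TRUE)
toy_counts
```

By default unnest_tokens lowercases the text and strips punctuation, so "Great" and "great" are counted as the same word.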

5. What format is the data in now (i.e., one row is equal to one ____)?

6. Take a minute to read about the tidytext package at https://www.tidytextmining.com/tidytext.html.

7. What is the most common word used by Trump?

8. Use substr to create a year variable.

9. What is the most common word used by Trump each year?

10. Create a variable named month using substr.

11. What is the most common word used by Trump each month?

12. Create a dataframe with one word per row, and a column called freq saying how many times that word was used.

13. Load up the wordcloud library.

14. Subset the dataframe created in number 12 to only include the top 100 words.

15. Create a wordcloud of Trump’s top 100 words.

16. Are you ready to do some sentiment analysis? Great.

17. Create a dataframe named sentiments by running the following: sentiments <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/sentiments.csv')

18. What is the sentiments dataset?

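19. Create another dataset named polarity by running the following (the first time, tidytext will ask to download the AFINN lexicon, which requires the textdata package): polarity <- get_sentiments("afinn")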

20. Use left_join to combine polarity and sentiments into one dataset named emotions.

# Join on the shared column(s), then keep only the first row for each word
emotions <- left_join(sentiments, polarity) %>% filter(!duplicated(word))

21. Use left_join to combine the Trump data (simple) and the emotions data.

simple <- left_join(simple, emotions)
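To see what this kind of join produces, here is a small sketch on invented data. The toy tables, the sentiment column name, and the scores are assumptions made up for illustration (they are not real AFINN values); value is the polarity column named in the exercise, and the date format is assumed to start with a four-digit year.

```r
library(dplyr)

# Hypothetical one-word-per-row data, like simple after unnest_tokens
toy_simple <- tibble(
  date = c("2015-06-16", "2015-06-16", "2016-11-09", "2016-11-09"),
  word = c("great", "again", "sad", "night")
)

# Hypothetical lexicon: one emotion label and one made-up score per word
toy_emotions <- tibble(
  word = c("great", "sad"),
  sentiment = c("joy", "sadness"),
  value = c(3, -2)
)

# Words missing from the lexicon keep their row but get NA for sentiment and value
toy_joined <- left_join(toy_simple, toy_emotions, by = "word")

# Average polarity per year, ignoring words that have no score
by_year <- toy_joined %>%
  mutate(year = substr(date, 1, 4)) %>%
  group_by(year) %>%
  summarise(avg_polarity = mean(value, na.rm = TRUE))
by_year
```

Because most words are not in a sentiment lexicon, expect many NA values after the join; summaries of value generally need na.rm = TRUE.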

22. Have a look at the simple (Trump) data. What do you see?

23. Get an overall polarity score (using the value variable) for the entire dataset. Is it positive or negative?

24. How many words were emotionally associated with “anger” in 2015?

25. What percentage of words were associated with “fear” by year?

26. What is the average sentiment polarity by year?

27. What is Trump’s most positive tweet?

28. Which month was Trump’s most negative?

29. What percentage of Trump tweets have more sadness than joy by year/month?

30. Read in data on full moons by running the following: moon <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/full-moon.csv')

31. Create a date column with a correctly formatted date.

32. What day of the week has the most full moons?

33. Use left_join to bring the moon data into the Trump data.

34. Does Trump have more negative emotions on full moon days?

35. Read in “stop words” by running the following: sw <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/stopwords.csv')

36. Join the sw data to the simple data, and remove the stop words.

37. Create a new word cloud.

38. Redo the sentiment analysis with the stop words removed.
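One possible way to remove stop words, sketched on invented data, is anti_join, which keeps only the rows of the first table that have no match in the second. This is an alternative to a left_join followed by a filter, not necessarily the approach the exercise intends. The toy tables are made up, and the stop word column being called word is an assumption.

```r
library(dplyr)

# Hypothetical one-word-per-row data
toy_words <- tibble(word = c("the", "wall", "is", "great", "the", "wall"))

# Hypothetical stop word list (assumed column name: word)
toy_sw <- tibble(word = c("the", "is"))

freq <- toy_words %>%
  anti_join(toy_sw, by = "word") %>%         # drop rows whose word is a stop word
  count(word, sort = TRUE, name = "freq") %>% # one row per word with its frequency
  slice_head(n = 100)                         # keep at most the top 100 words
freq
```

A table shaped like freq could then be passed to wordcloud(words = freq$word, freq = freq$freq); note that wordcloud only draws words with a frequency of at least 3 unless min.freq is lowered.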