Module 31 Shakespeare text mining

Learning goals

This is a review exercise: apply the skills introduced in the previous modules and the Deep R modules on Working with Text by text mining the works of Shakespeare.

library(dplyr)
library(readr)
shake <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/Shakespeare_data.csv')

All good? Great. Let’s go.

1. How may rows are in the data?

2. Create a dataframe named spoken. This should be those lines which are spoken by an actor/actress (figure it out).

3. How many lines are spoken?

4. Create a column called first_word. This should be the first word of each spoken line.

5. What is the most common first word spoken?

6. Create a boolean column named “King”. This should indicate whether or not the word “King” was spoken in any given line.

7. Improve the above by making sure that it includes both lower and uppercase variations of “king”.

8. Figure out which play has the word “king” mentioned most?

9. What percentage of lines in Hamlet mention the word "king?

10. How many times does the word “woman” appear in each play?

11. How many words are there in all Shakespeare plays?

12. How many letters are there in each Shakespeare play?

13. Which character says the most words?

14. Which character says the least words?

15. What is the lines(s) of the character who says the least words?

16. Make a table of plays with one row per play and variables being: (a) number of lines, (b) number of words, (c) number of characters, (d) number of letters, (e) number of mentions of “Brew”.