A class project testing sentiment dictionaries on the lyrics of Peach Pit's album “Magpie.”
Author
Jeneta Nwosu
Published
May 16, 2025
Modified
May 17, 2025
Question
“Magpie” by Peach Pit was my favorite album of the year in 2024. Peach Pit is known for having creative (maybe even zany?) lyricism, and I’m curious if the meaning behind what they write is parsable by a machine. Below, I predict what sentiments will be identified and then see if the results line up with my expectations.
As an example of the problem, the first track “Every Little Thing” sounds upbeat and tender. But when I looked up what it meant, I learned it’s written from the Devil’s perspective. Spooky! On a second read, the malevolence and sinister themes of temptation become clear.
Data
The lyrics were scraped from Genius using its API and the lyricsgenius Python package. Before importing them, I replaced the right single quotation mark (’) with a plain apostrophe (') because it was breaking the cleaning process.
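The curly quote (U+2019) is a different character from the ASCII apostrophe, so any cleaning step that looks for `'` simply doesn't see it. I did the swap before importing; a minimal R equivalent, assuming the lyrics are already in a character vector, might look like the sketch below. `normalize_apostrophes` is a hypothetical helper, not part of the original project code.

```r
# Hypothetical helper: replace the typographic right single quotation mark
# (U+2019) with a plain ASCII apostrophe, so that contraction handling
# treats "I’ll" and "I'll" the same way.
normalize_apostrophes <- function(text) {
  gsub("\u2019", "'", text, fixed = TRUE)
}

lyric <- "I\u2019ll be fine"
grepl("'", lyric, fixed = TRUE)                         # FALSE: no ASCII apostrophe yet
grepl("'", normalize_apostrophes(lyric), fixed = TRUE)  # TRUE after the swap
```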
Prediction
library(dplyr)  # for %>% and select()
# What sentiments are possible?
nrc_sentiments <- tidytext::get_sentiments("nrc")
nrc_sentiments %>%
  select(sentiment) %>%
  unique()
# A tibble: 10 × 1
sentiment
<chr>
1 trust
2 fear
3 negative
4 sadness
5 anger
6 surprise
7 positive
8 disgust
9 joy
10 anticipation
library(SentimentAnalysis)
The Harvard IV dictionary assigns each word either a negative or a positive sentiment, while the NRC dictionary has 10 categories: eight emotions plus positive and negative. The binary Harvard IV labels give a simplified overall reading of each song, and the NRC categories can capture more nuance. My predictions below are mostly about the literal meaning of the lyrics, not the figurative meaning.
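To make the category split concrete, the short base-R sketch below takes the ten NRC categories printed by the listing and separates the two polarities (the same two labels Harvard IV uses) from the eight emotions. It only restates that listing; nothing here is read from the dictionaries themselves.

```r
# The ten NRC categories, copied from the printed listing.
nrc_categories <- c("trust", "fear", "negative", "sadness", "anger",
                    "surprise", "positive", "disgust", "joy", "anticipation")

# Two of them are polarities, the only labels Harvard IV uses ...
polarities <- intersect(nrc_categories, c("positive", "negative"))
# ... and the remaining eight are emotions.
emotions <- setdiff(nrc_categories, polarities)

polarities        # "negative" "positive"
length(emotions)  # 8
```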
| Document | NRC Sentiment | Harvard Sentiment |
|---|---|---|
| Every Little Thing | anticipation, trust | positive |
| Yasmina | positive, joy | positive |
| Am I Your Girl | negative | negative |
| Little Dive | joy, positive* | positive |
| Outta Here | fear, anger, negative | negative |
| Did You Love Somebody | negative, anger, fear | negative |
| St. Mark’s Funny Feeling | negative, sadness | negative |
| Magpie | negative | negative |
| Nowhere Next To Me | disgust, sadness, negative | negative |
| Wax & Wane | negative, disgust | negative |
| Your Long Black Hair | negative, anticipation | negative |

*The song is about doing drugs, but its words are positive.
Descriptive
library(readr)
# Read in with track numbers to preserve the album order for later
magpie_album_order <- read_csv("magpie.csv", col_select = number:song.lyrics)
# Keep the title and lyrics columns (drop the track number)
magpie <- magpie_album_order[, 2:3]
# Remove punctuation that isn't inside a word, for the word cloud.
# E.g. "me," becomes "me", but "I'll" doesn't become "ill"
lyrics_no_punct <- tm::removePunctuation(magpie$song.lyrics,
                                         preserve_intra_word_contractions = TRUE,
                                         preserve_intra_word_dashes = TRUE)
# One document per song; lowercase and drop English stop words
docs <- tm::Corpus(tm::VectorSource(lyrics_no_punct))
dtm <- tm::DocumentTermMatrix(docs, control = list(stopwords = TRUE, tolower = TRUE))
# Total count of each term across the album
freq <- colSums(as.matrix(dtm))
library(wordcloud)
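The punctuation rule in the comment ("me," loses its comma, "I'll" keeps its apostrophe) can be illustrated with a single base-R regular expression: drop any run of punctuation unless it sits between two letters or digits. This is a rough sketch of the rule only, not how `tm::removePunctuation()` is implemented, and `strip_edge_punct` is a hypothetical name. One side effect of this version: a leading apostrophe, as in "'cause", is removed too.

```r
# Hypothetical base-R sketch (not tm's implementation): remove punctuation
# that is not sandwiched between two alphanumeric characters.
#   (?<![[:alnum:]])[[:punct:]]+  -> punctuation with no letter/digit before it
#   [[:punct:]]+(?![[:alnum:]])   -> punctuation with no letter/digit after it
strip_edge_punct <- function(text) {
  gsub("(?<![[:alnum:]])[[:punct:]]+|[[:punct:]]+(?![[:alnum:]])",
       "", text, perl = TRUE)
}

strip_edge_punct("Call me, I'll take the one-night train!")
# "Call me I'll take the one-night train"
```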
# Turn the lyrics into one-word tokens. Stop words are kept so that as many
# words as possible survive, since there are only 11 songs.
magpie <- magpie %>%
  tidytext::unnest_tokens(word, song.lyrics)
# Join the NRC dictionary with the Magpie words. Many-to-many is expected:
# a word can repeat in the lyrics and can carry several NRC sentiments.
nrc_words <- magpie %>%
  dplyr::inner_join(nrc_sentiments, by = "word", relationship = "many-to-many")
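The many-to-many relationship in this join is expected: a word can occur several times in the lyrics, and the NRC table can list the same word under several sentiments, so every occurrence pairs with every label. The toy data below is invented purely to show that row arithmetic with base R's `merge()`; it does not come from the lyrics or from the real dictionary.

```r
# Invented toy data: one song's tokens and a tiny multi-label dictionary,
# both in the same long format as the tidytext tables.
tokens <- data.frame(
  song.full_title = "Example Song",
  word = c("love", "dark", "love"),
  stringsAsFactors = FALSE
)
dictionary <- data.frame(
  word = c("love", "love", "dark"),
  sentiment = c("positive", "joy", "negative"),
  stringsAsFactors = FALSE
)

# Each token row pairs with every dictionary row for the same word:
# "love" appears 2 times in the lyrics x 2 labels = 4 rows, plus 1 for "dark".
joined <- merge(tokens, dictionary, by = "word")
nrow(joined)             # 5
table(joined$sentiment)  # joy 2, negative 1, positive 2
```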
# A tibble: 11 × 3
song.full_title n sentiment_harv
<chr> <int> <chr>
1 Am I Your Girl 3 negative
2 Did You Love Somebody 25 negative
3 Every Little Thing 13 positive
4 Little Dive 20 positive
5 Magpie 16 negative and positive
6 Nowhere Next To Me 15 positive
7 Outta Here 16 positive
8 St. Mark's Funny Feeling 9 positive
9 Wax & Wane 12 positive
10 Yasmina 18 negative
11 Your Long Black Hair 15 positive
# ^^ "Magpie" has as many positive words as negative ones, so it is labeled "negative and positive"
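The code behind the Harvard IV labels isn't shown here. One plausible rule that yields a label like “negative and positive” is to compare each song's positive and negative counts and report a tie explicitly; the sketch below assumes exactly that rule. `label_polarity` and the per-song data are hypothetical.

```r
# Hypothetical helper: label a song by whichever polarity occurs more often,
# reporting a tie as "negative and positive".
label_polarity <- function(sentiments) {
  n_pos <- sum(sentiments == "positive")
  n_neg <- sum(sentiments == "negative")
  if (n_pos > n_neg) {
    "positive"
  } else if (n_neg > n_pos) {
    "negative"
  } else {
    "negative and positive"
  }
}

# Applied per song with tapply(); these matched words are invented.
matched <- data.frame(
  song = c("A", "A", "B", "B", "B"),
  sentiment = c("positive", "negative", "negative", "negative", "positive"),
  stringsAsFactors = FALSE
)
tapply(matched$sentiment, matched$song, label_polarity)
# A: "negative and positive", B: "negative"
```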
Final Results
# Combine sentiments from both dictionaries and restore the album's track order
mag_sent <- mag_sent_nrc %>%
  left_join(mag_sent_harv[1:nrow(magpie_counts_2), c(1, 3)], by = "song.full_title") %>%
  right_join(magpie_album_order[, 1:2], by = "song.full_title") %>%
  arrange(number)
mag_sent
# A tibble: 11 × 4
song.full_title sentiment_nrc sentiment_harv number
<chr> <chr> <chr> <dbl>
1 Every Little Thing positive (8), trust (7), joy … positive 1
2 Yasmina negative (6), anger (4), joy … negative 2
3 Am I Your Girl negative (2), sadness (2), tr… negative 3
4 Little Dive negative (15), positive (14),… positive 4
5 Outta Here negative (14), anticipation (… positive 5
6 Did You Love Somebody negative (9), anger (6), fear… negative 6
7 St. Mark's Funny Feeling negative (14), anticipation (… positive 7
8 Magpie negative (9), positive (7), j… negative and … 8
9 Nowhere Next To Me negative (12), sadness (9), a… positive 9
10 Wax & Wane negative (3), positive (3), s… positive 10
11 Your Long Black Hair anticipation (11), negative (… positive 11
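The NRC column lists each sentiment with its count, most frequent first. A minimal base-R sketch of building such a string from one song's matched sentiments is below. It assumes ties are broken alphabetically (here they inherit the alphabetical order of `table()`, which is consistent with rows such as “negative (5), sadness (5)” in the full table). `summarize_sentiments` is a hypothetical helper.

```r
# Hypothetical helper: turn a vector of matched sentiments into a string like
# "joy (3), trust (2), fear (1)". table() orders names alphabetically, and the
# sort by count keeps that order for ties.
summarize_sentiments <- function(sentiments) {
  counts <- sort(table(sentiments), decreasing = TRUE)
  paste0(names(counts), " (", as.integer(counts), ")", collapse = ", ")
}

summarize_sentiments(c("joy", "trust", "joy", "fear", "joy", "trust"))
# "joy (3), trust (2), fear (1)"
```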
Discussion
NRC picked up on a broader range of sentiments per song than I expected. I made only a few predictions per song, and many of them were correct but incomplete. I also underestimated how many songs would come out positive with both dictionaries. On that front, I think the NRC assignments are better than the Harvard IV ones: Harvard IV labels seven of the eleven songs positive, including “Wax & Wane,” which is in no way a positive song.
However, I think both dictionaries' correct results owe something to luck, since they match words only at the surface level. For example, in “Little Dive” the NRC dictionary labels “dandy” as “disgust” and “negative,” presumably drawing on the pejorative term for a fashionable man rather than the positive sense in which the song actually uses it. For some reason, the Harvard IV dictionary labels “train” as positive in “Your Long Black Hair,” when I would argue it is a neutral word by default, and negative in context (“I been thinking up a one-night train from this world”). There is a valid argument for reading the song as positive, but that word isn't it.
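Both misreadings come from the same mechanism: a dictionary lookup sees a word, not its sense. The sketch below uses tiny lookup tables holding only the labels reported above (the real dictionaries are far larger) and two invented phrases to show that “dandy” gets identical labels in either sense. `lookup` is a hypothetical helper, not a tidytext or SentimentAnalysis function.

```r
# Toy lookup tables restating only the labels reported in the discussion.
nrc_demo <- data.frame(word = c("dandy", "dandy"),
                       sentiment = c("disgust", "negative"),
                       stringsAsFactors = FALSE)
harvard_demo <- data.frame(word = "train", sentiment = "positive",
                           stringsAsFactors = FALSE)

# Hypothetical helper: split a line into lowercase words and return the
# labels of any dictionary words present. Context is never consulted.
lookup <- function(line, dictionary) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  dictionary$sentiment[dictionary$word %in% words]
}

lookup("everything is fine and dandy", nrc_demo)  # "disgust" "negative"
lookup("a dandy in a velvet coat", nrc_demo)      # "disgust" "negative"
lookup("I been thinking up a one-night train from this world", harvard_demo)  # "positive"
```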
I conclude that dictionary-based text analysis like this is a fun tool for a cursory look at themes in song lyrics, but for complex works like “Magpie” especially, it's probably best to use a more advanced tool, like the human brain. And if you must use a dictionary, NRC performed better here!
See the full final table below.
Document
NRC Sentiment
Harvard Sentiment
Every Little Thing
positive (8), trust (7), joy (6), negative (5), sadness (5), anticipation (4), fear (3), disgust (1)
positive
Yasmina
negative (6), anger (4), joy (4), positive (4), trust (4), anticipation (1), disgust (1), fear (1), sadness (1)