Needed Packages

library(tidytext)  # for getting sentiments and manipulating the text
library(textdata)  # for getting lexicon
library(ggplot2)  # for plotting graphs
library(tidyverse)  # for manipulating the data
library(sentimentr)  # for sentence-based sentiment analysis
library(wordcloud)  # for generating word clouds
library(reshape2)  # for generating matrix to plot comparison word cloud

Introduction

The book I want to analyze is The Remains of the Day by Kazuo Ishiguro. It is set against the background of post-war England. It tells, in first-person narration, a story of a butler, Stevens, who has a long record of service at Darlington Hall. As he embarks on a journey to the West Country, he reflects on his decades of loyal service to Lord Darlington and examines the choices he made.

Ishiguro masterfully delves into themes of memory, regret, and the passage of time, creating a narrative that is both emotionally rich and thematically profound. What impresses me most is that Kazuo Ishiguro portrays a man full of contradictions. Stevens loves Miss Kenton, the housekeeper at Darlington Hall. He reads her letters repeatedly and drives all the way to the West Country only to see her. Additionally, as there are always meetings at Darlington Hall regarding the world situation, Stevens is well-informed about what is happening both in Darlington Hall and in the world. He even knows that his employer has dealings with the Nazis. However, he never expresses his affection for Miss Kenton, and when people ask him about the world situation, he pretends to be ignorant in the name of being a good butler. As a reader, I can feel his struggle through his elegant words. Although he experiences intense emotions, all he tries to do is follow his father’s will: to be a great butler. He restrains himself and ignores his employer’s problematic standpoint, which eventually leads to his estrangement from Kenton, as well as to Darlington’s infamy.

The Remains of the Day is characterized by Kazuo Ishiguro’s exquisite and restrained prose. The language in the novel is marked by a sense of formality and precision, mirroring the protagonist Stevens’ identity as a dedicated English butler. The author skillfully employs subtle nuances and understatements, allowing readers to glean deeper emotions and meaning beneath the surface, which at the same time makes sentiment analysis more difficult.

I downloaded the .txt file of the book and entered the text into a .csv file, with each row representing a chapter. After reading in the book using read.csv, we can use the get_sentences() function from the sentimentr package to split the text into sentences and the unnest_tokens() function from the tidytext package to split each sentence into words.

book <- read.csv('../data/lab4/the_remains_of_the_day.csv')  # read book
str(book)  # structure of data
## 'data.frame':    8 obs. of  4 variables:
##  $ X      : int  1 2 3 4 5 6 7 8
##  $ Chapter: chr  "Prologue" "Chapter 1" "Chapter 2" "Chapter 3" ...
##  $ Title  : chr  "July 1956: Darlington Hall" "Day One · Evening: Salisbury" "Day Two · Morning: Salisbury" "Day Two · Afternoon: Mortimer’s Pond, Dorset" ...
##  $ Content: chr  "It seems increasingly likely that I really will undertake the expedition that has been preoccupying my imaginat"| __truncated__ "Tonight, I find myself here in a guest house in the city of Salisbury. The first day of my trip is now complete"| __truncated__ "Strange beds have rarely agreed with me, and after only a short spell of somewhat troubled slumber, I awoke an "| __truncated__ "It would seem there is a whole dimension to the question ‘what is a “great” butler?’ I have hitherto not proper"| __truncated__ ...
book_sentences <- get_sentences(book)  # get book sentences
head(book_sentences,1)
##    X  Chapter                      Title
## 1: 1 Prologue July 1956: Darlington Hall
##                                                                                                                                  Content
## 1: It seems increasingly likely that I really will undertake the expedition that has been preoccupying my imagination now for some days.
##    element_id sentence_id
## 1:          1           1
book_words <- unnest_tokens(book_sentences, word, Content)  # get book words
head(book_words)
##    X  Chapter                      Title element_id sentence_id         word
## 1: 1 Prologue July 1956: Darlington Hall          1           1           it
## 2: 1 Prologue July 1956: Darlington Hall          1           1        seems
## 3: 1 Prologue July 1956: Darlington Hall          1           1 increasingly
## 4: 1 Prologue July 1956: Darlington Hall          1           1       likely
## 5: 1 Prologue July 1956: Darlington Hall          1           1         that
## 6: 1 Prologue July 1956: Darlington Hall          1           1            i

Sentiment Analysis

Different Lexicons

Sentiment analysis using lexicons is a technique that involves evaluating the sentiment or emotion expressed in a piece of text by leveraging predefined lexicons or dictionaries. Lexicons are collections of words or phrases associated with sentiment scores or polarities (positive, negative, or neutral). In this approach, each word in the text is matched against the entries in the lexicon, and the sentiment score is assigned based on the word’s sentiment polarity. The sentiment scores are then aggregated to determine the overall sentiment of the text.

This approach treats the text as a combination of its individual words and the sentiment of the whole text as the sum of the sentiment of the individual words. It is a simple and straightforward algorithm, but it can be misleading when the text contains sarcasm or negation, or when a long text has many passages with both positive and negative sentiment that eventually average out to zero.
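To make the word-by-word scoring concrete, here is a minimal base R sketch. The lexicon `toy_lexicon` and the helper `score_text()` are hypothetical names made up for illustration; they are not part of tidytext or of any real lexicon, and real lexicons are far larger.

```r
# Hypothetical mini-lexicon; the words and scores are invented for illustration.
toy_lexicon <- c(good = 1, warm = 1, triumph = 1, sad = -1, grief = -1, bad = -1)

# Score a text as the sum of the lexicon scores of its individual words.
score_text <- function(text, lexicon) {
  # lowercase, then split on anything that is not a letter or an apostrophe
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[words != ""]
  matched <- lexicon[words]  # NA for words that are not in the lexicon
  sum(matched, na.rm = TRUE)
}

score_text("A warm and good evening", toy_lexicon)       # 2
score_text("A sad evening, yet a triumph", toy_lexicon)  # 0: mixed feelings cancel out
score_text("It was not a good evening", toy_lexicon)     # 1: the negation is ignored
```

The last two calls show the weaknesses mentioned above: opposite sentiments cancel to zero, and "not a good evening" is still scored as positive.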

# three different lexicons
get_sentiments('afinn')  # AFINN
## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ℹ 2,467 more rows
get_sentiments('bing')  # Bing
## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ℹ 6,776 more rows
get_sentiments('nrc')  # NRC
## # A tibble: 13,872 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ℹ 13,862 more rows

I will first use the Bing lexicon to analyze the sentiment of the whole book, since it is simple, with only positive and negative labels, yet broad, covering 6,786 words.

Sentiment Analysis Using Bing

# net sentiment (positive minus negative word counts) for each sentence
book_sentiment <- book_words %>%
  inner_join(get_sentiments("bing")) %>%  # find the rows that exist in both tables
  count(Chapter, sentence_id, sentiment) %>%  # count in each group
  spread(sentiment, n, fill = 0) %>%  # spread the data
  mutate(sentiment = positive - negative)  # get sentence sentiment
## Joining with `by = join_by(word)`
# plot the result
ggplot(book_sentiment, aes(sentence_id, sentiment, fill = Chapter)) +
  geom_col(show.legend = FALSE) +  # hide chapter legend
  facet_wrap(~Chapter, scales = "free_x")  # set each plot using the most suitable x scale

As can be seen in the plots, Chapter 4 has lots of positive sentiments and some negative ones. This is a really interesting chapter! In this chapter, Stevens recalls an incident in which he stayed at a local inn and was warmly welcomed. Seeking some social interaction, he went down to the bar and tried to engage in light banter, offering a humorous comment on the landlord’s wife’s shouting: “A local variation on the cock crow, no doubt”. He intended it as a witty remark, but the locals neither quite understood his humor nor responded positively. That night Stevens stayed up late reflecting on his remark, worrying about whether it had been an affront. He also reveals that he has been working on his bantering skills by listening to a radio program and practicing formulating witticisms based on his surroundings, in order to meet the expectations of his new employer, Mr. Farraday, and to adapt to different social situations.

We can feel from this passage that Stevens is a meticulous and somewhat reserved professional. His character is defined by a combination of professionalism, social awkwardness, and sensitivity. The passage highlights his discomfort in social interactions and his sensitivity to the reactions of others.

Interesting psychological depiction!

Furthermore, he proudly highlights his skill in silver polishing at Darlington Hall, which earned numerous compliments from Lord Darlington’s notable guests and played a pivotal role in facilitating smooth negotiations between Lord Darlington and those guests.

Therefore, this chapter carries a predominantly positive tone, with some negative sentiment arising from the protagonist’s social awkwardness and his difficulty adapting to different social situations.
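For readers who want to see how the per-sentence scores behind these plots are formed, the count, spread, and subtract steps of the Bing pipeline can be sketched in base R. The data frame `toy_hits` is invented for illustration, and `table()` is used here as a stand-in for `count()` followed by `spread()`; it is not the code that produced the plots.

```r
# Toy input: one row per sentiment-bearing word, as left after the inner join.
# The sentence ids and labels are invented for illustration.
toy_hits <- data.frame(
  sentence_id = c(1, 1, 1, 2, 3, 3),
  sentiment   = c("positive", "positive", "negative",
                  "negative", "positive", "negative")
)

# Fix both levels so a polarity that never occurs still gets a column of zeros
toy_hits$sentiment <- factor(toy_hits$sentiment,
                             levels = c("negative", "positive"))

# Cross-tabulate: rows are sentences, columns are polarities
counts <- table(toy_hits$sentence_id, toy_hits$sentiment)

# Net sentiment per sentence: positive count minus negative count
net <- counts[, "positive"] - counts[, "negative"]
net
```

Here sentence 1 scores 1, sentence 2 scores -1, and sentence 3 scores 0. Note that in the real pipeline the inner join drops any sentence that contains no lexicon word, so such sentences do not appear in the plots at all.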

Word Clouds

Here I use word clouds to visualize the frequency of words in the book. Using the with() function, I can refer to the columns of the data frame directly by name instead of extracting each one with $.

book_words %>%
  anti_join(stop_words) %>%  # remove irrelevant stop words
  count(word) %>%  # get counts of each unique word
  with(wordcloud(word, n, max.words = 100))  # plot word cloud
## Joining with `by = join_by(word)`
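As a small aside on the with() call in the pipeline above: the pipe passes the counted data frame as the first argument of with(), which then evaluates the expression with the data frame's columns available as variables. The sketch below uses a made-up data frame `toy_counts` and a hypothetical stand-in function `describe()` in place of wordcloud(), so that it runs without any packages.

```r
# Toy word counts; the values are invented for illustration.
toy_counts <- data.frame(word = c("butler", "dignity", "hall"), n = c(12, 7, 5))

# Stand-in for a function that takes vectors rather than a data frame
describe <- function(words, freq) paste0(words, " (", freq, ")")

# with() evaluates the expression using the data frame's columns as variables ...
a <- with(toy_counts, describe(word, n))
# ... which is the same as spelling out each column with $
b <- describe(toy_counts$word, toy_counts$n)

identical(a, b)  # TRUE
```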

As the word cloud shows, the words ‘miss’ and ‘Kenton’ are among the most frequently mentioned, revealing Stevens’s affection for Miss Kenton.

I also want to visualize the frequency of words related to sentiments.

book_words %>%
  inner_join(get_sentiments("bing")) %>%  # get sentiment words
  filter(word != 'miss') %>%  # 'miss' in this book is the title in 'Miss Kenton', not a sentiment word
  count(word, sentiment, sort = TRUE) %>%  # get counts of each unique combination of word and sentiment
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%  # get word sentiment table, fill no data with 0
  comparison.cloud(colors = c("gray20", "gray80"), max.words = 100)  # get comparison cloud
## Joining with `by = join_by(word)`

Ambiguous Text Analysis

The Remains of the Day is a novel rich in ambiguous text that contributes to the complexity of the narrative.

For instance, when Stevens recalls his past experiences at Darlington Hall and his interactions with Miss Kenton, the text is imbued with a sense of restrained emotion and unspoken desires.

Also, Lord Darlington plays a role in facilitating political discussions that ultimately serve the interests of Nazi Germany. His actions, driven by a misguided sense of honor and a desire to prevent another war, lead him down a path of appeasement towards Nazi Germany. While discussing Lord Darlington, Stevens consistently portrays him as a decent and kind man, generously offering praise for him. However, readers can feel subtle criticisms in the text regarding Lord Darlington’s problematic standpoint.

This ambiguity adds difficulty to sentiment analysis.

Here, I want to analyze what Stevens calls the ‘turning point’ in his career. This moment occurred when his father fell seriously ill. Despite the situation, Stevens did not stay at his father’s side because an important international conference was under way at Darlington Hall. Although he was overwhelmed with grief during this time, he restrained himself and continued to serve the guests of Lord Darlington, maintaining his so-called ‘dignity’.

In the following paragraph, Stevens does not directly comment on his grief; he even expresses pride. As readers though, we can sense the underlying self-doubt and sadness in his narrative.

Of course, it is not for me to suggest that I am worthy of ever being placed alongside the likes of the ‘great’ butlers of our generation, such as Mr Marshall or Mr Lane - though it should be said there are those who, perhaps out of misguided generosity, tend to do just this. Let me make clear that when I say the conference of 1923, and that night in particular, constituted a turning point in my professional development, I am speaking very much in terms of my own more humble standards. Even so, if you consider the pressures contingent on me that night, you may not think I delude myself unduly if I go so far as to suggest that I did perhaps display, in the face of everything, at least in some degree a ‘dignity’ worthy of someone like Mr Marshall - or come to that, my father. Indeed, why should I deny it? For all its sad associations, whenever I recall that evening today, I find I do so with a large sense of triumph.

# get ambiguous paragraph
amb_sentences <- book_sentences[1287:1291,]
amb_words <- amb_sentences %>%
  unnest_tokens(word, Content)
# sentiment analysis using NRC
amb_nrc <- amb_words %>%
  inner_join(get_sentiments("nrc")) %>%
  count(Chapter, sentence_id, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`
## Warning in inner_join(., get_sentiments("nrc")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 13 of `x` matches multiple rows in `y`.
## ℹ Row 12042 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
# plot
amb_nrc %>%
  mutate(sentence=row_number()) %>%
  ggplot(aes(sentence, sentiment)) +
  geom_col(fill='red',show.legend=FALSE) +
  labs(title='Ambiguous Text Sentiment Analysis Using NRC',
       x='Sentence Number', y='Sentiment')
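The many-to-many warning in the output above is expected with NRC: as the lexicon listing earlier shows, a single word such as 'abandon' carries several categories, and the same word can also occur more than once in the text. The following base R sketch reproduces the effect with merge(), which performs the same kind of inner join on the shared word column. The three 'abandon' rows and the 'abacus' row are copied from the NRC listing; the toy text `toy_words` is invented for illustration.

```r
# Rows copied from the NRC listing shown earlier
toy_nrc <- data.frame(
  word      = c("abandon", "abandon", "abandon", "abacus"),
  sentiment = c("fear", "negative", "sadness", "trust")
)

# Toy text in which 'abandon' occurs twice (invented for illustration)
toy_words <- data.frame(
  sentence_id = c(1, 1, 2),
  word        = c("abandon", "abacus", "abandon")
)

# Inner join on 'word': every token is paired with every matching category
joined <- merge(toy_words, toy_nrc, by = "word")
nrow(joined)  # 7 rows: each 'abandon' token is repeated once per NRC category
```

Because one token can contribute to several categories, NRC counts are counts of word-category pairs rather than of words. As the warning itself states, passing `relationship = "many-to-many"` to inner_join() silences it when this behavior is intended.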

# get sentiment cloud
sent_col <- setdiff(names(amb_nrc), c("Chapter", "sentence_id", "sentiment"))  # NRC category columns, without the id columns and the net score
sent_count <- amb_nrc %>%
  select(all_of(sent_col)) %>%  # leave only sentiment columns
  pivot_longer(cols = everything(), names_to = "sentiment", values_to = "count") %>%  # get long format
  filter(count != 0) %>%  # remove sentiment of 0 count
  group_by(sentiment) %>%  # group sentiment
  summarize(count=sum(count))  # get sum of each sentiment

with(sent_count, wordcloud(sentiment,count,min.freq=0,max.words=20))  # plot sentiment cloud

# plot
sent_count %>%
  ggplot(aes(x=sentiment,y=count))+  # set X and Y values
  geom_col(fill='red')+  # set stat to show original count and fill color
  labs(title='Sentiment Frequency Using NRC',  # set title
       x='Sentiment',y='Frequency')  # set X Y axes labels

Sentiment analysis solely based on unigrams might overlook the actual sentiment of an entire sentence. Therefore, to gain a more comprehensive understanding of sentence-level sentiment, one option is to utilize more sophisticated algorithms. Here I use the sentimentr package to deploy a sentence-level sentiment analysis.

# sentiment analysis using sentimentr
amb_sentr <- amb_sentences %>%
  sentiment_by('Content') %>%  # do the sentence-level sentiment analysis
  mutate(sentence=row_number())
# plot
ggplot(amb_sentr, aes(x=sentence,y=ave_sentiment)) +
  geom_col(fill='red') +
  labs(title='Sentence-based Ambiguous Text Sentiment Analysis',
       x='Sentence Number', y='Sentiment')

As shown in the graphs, the unigram-based method scores each word in the text separately. This makes it fine-grained, capturing subtle emotional fluctuations in the text, as shown in the sentiment cloud generated by the NRC analysis. However, it may also overlook context, since it considers individual words rather than the sentence as a whole. Sentence-level analysis, by contrast, focuses on the overall emotional expression of an entire sentence. It considers the relationships between words in a sentence and their context to capture the overall emotional trend. This approach gives a better understanding of the overall meaning of the sentence but may overlook some subtle emotional changes.
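The difference between the two views can be illustrated with a deliberately simplified base R sketch. It contrasts a plain unigram sum with a score that flips a word's polarity when the word immediately before it is a negator. This is only a toy illustration of using context: it is not the algorithm sentimentr implements, whose handling of valence shifters is more elaborate. The lexicon, the negator list, and all function names are hypothetical.

```r
# Hypothetical mini-lexicon and negator list, invented for illustration
toy_lexicon  <- c(good = 1, proud = 1, triumph = 1, sad = -1, grief = -1)
toy_negators <- c("not", "never", "no")

tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[words != ""]
}

# Unigram score: each word is looked up in isolation
unigram_score <- function(text) {
  sum(toy_lexicon[tokenize(text)], na.rm = TRUE)
}

# Context-aware score: flip a word's polarity when the previous word is a negator
negation_score <- function(text) {
  words <- tokenize(text)
  if (length(words) == 0) return(0)
  scores <- unname(toy_lexicon[words])
  scores[is.na(scores)] <- 0
  previous <- c("", head(words, -1))  # the word immediately before each token
  flip <- previous %in% toy_negators
  sum(ifelse(flip, -scores, scores))
}

sentence <- "I was not proud of that evening"
unigram_score(sentence)   # 1: 'proud' counts as positive
negation_score(sentence)  # -1: 'not proud' is flipped to negative
```

Even this one-word window misses cases such as "not a good evening", where another word sits between the negator and the sentiment word, which hints at why sentence-level tools need a richer notion of context.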


Conclusion

Sentiment analysis serves as a powerful tool in understanding textual data. Through unigram-based and sentence-based methods, I analyzed the sentiment in the book The Remains of the Day, revealing the overall sentiment of each chapter as well as the emotion in ambiguous paragraphs.

While sentiment analysis has shown effectiveness, it is essential to acknowledge its shortcomings and challenges. When analyzing ambiguous text, although both unigram-based and sentence-based methods discerned the overall emotion in the text, they still have the following drawbacks:

  • Lack of Context: Unigram-based methods overlook the relationships between words and the overall context of a sentence, while sentence-based methods also do not consider the whole passage.
  • Inability to Identify Different Tones: The same set of words can convey vastly different emotions depending on the tone in which they are presented or the context in which they are used. A seemingly neutral statement can take on a positive or negative connotation based on the surrounding words or the overall context.
  • Inability to Identify Irony and Sarcasm: Saying one thing may actually mean the opposite. Failure to recognize such nuances can lead to misinterpretations and thus undermine the reliability of sentiment analysis. For instance, when Stevens mentions a ‘large sense of triumph,’ it may, from the author’s or reader’s perspective, be a form of sarcasm.

In summary, sentiment analysis is a complex task, especially when handling ambiguous text. Combining different approaches with other NLP techniques and machine learning models to accommodate the diversity and complexity of the text can lead to a more accurate result.