Social media and blogging sites have grown enormously in popularity over the past decade, revolutionizing the digital landscape. They have empowered individuals and communities to freely express their opinions, feelings, and thoughts on a wide variety of topics. Users of these sites typically post short, character-limited texts. Although limited in size, these texts hold a wealth of information: they convey thoughts, emotions, and feelings on a particular topic within the author's social network. By mining these texts, it is possible to determine the emotions and feelings an individual expresses on a particular topic, including anger, disgust, fear, joy, love, sadness, and surprise. These texts can be examined at different levels, from the individual to the larger community, and a single text may express more than one emotion. Given the inherent lack of structure and the variability in the size of these texts, identifying these emotions, whether from an individual's or a larger group's perspective (a task known as emotion classification), can be challenging. Further, despite significant breakthroughs in sentiment analysis within the field of Data Mining and Machine Learning, the wide range of emotions associated with human behavior has yet to be fully addressed. Knowing the exact emotion underlying a topic of investigation, rather than a generic sentiment, is critical. Because more than one emotion can be expressed in a text, each sentence must be analyzed individually to gain a better grasp of the overall emotion associated with it. Additionally, the popularity of social media is encouraging users to submit short messages, replacing traditional electronic documents and article styles for expressing views and opinions.
Tweets are short messages, unlike conventional texts, and are peculiar in both size and structure. They are restricted to 280 characters, so users must express their feelings and emotions in highly compressed language. The language used in tweets differs considerably from that of other digitized documents such as blogs, articles, and news [1]. A very large variety of features (i.e., words) appears in these texts, which poses a significant challenge: when the texts are represented as feature vectors, the feature space grows rapidly, because a corpus on a single topic can contain millions of features [
2]. Another major challenge lies in manually classifying the text of tweets into different emotion classes. Manual annotation has been attempted previously, but it is not free from ambiguity and does not guarantee 100% accuracy [2]. The inherent complexity of the various emotional states, and the difficulty of differentiating one emotion class from another, pose further challenges. According to the Circumplex model [
3], human beings experience 28 distinct affect concepts. To explain this, Russell proposed a two-dimensional circular space model demonstrating that these 28 affect concepts differ from each other by only a slight angle, with several clustered so closely together that it becomes difficult to differentiate between them. Consequently, when humans label these texts, there is a notable risk of mislabeling emotions that are subtly different or close to each other. This is a serious issue because it ultimately prevents a classifier from learning the critical features needed to identify the emotions hidden in the texts.
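Russell's point about closely spaced affect concepts can be illustrated with a small numerical sketch. The angles assigned below are illustrative assumptions for demonstration only, not Russell's published coordinates:

```python
# Illustrative valence-arousal angles (degrees) for a few affect terms,
# loosely in the spirit of Russell's circumplex model. The exact angles
# are assumptions chosen for this sketch.
affect_angles = {
    "happy": 10,
    "delighted": 25,
    "excited": 50,
    "tense": 100,
    "distressed": 130,
    "sad": 190,
    "calm": 320,
}

def angular_distance(a, b):
    """Smallest angle (degrees) between two affect terms on the circle."""
    diff = abs(affect_angles[a] - affect_angles[b]) % 360
    return min(diff, 360 - diff)

# Nearby affect terms are separated by only a slight angle, which is
# why human annotators can easily confuse them.
print(angular_distance("happy", "delighted"))  # 15
print(angular_distance("happy", "sad"))        # 180
```

Terms separated by only a few degrees (such as "happy" and "delighted") are precisely the ones most prone to annotator mislabeling, whereas nearly opposite terms are rarely confused.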
In this article, we focus on analyzing tweets collected during the 2020 U.S. presidential election. Using the lexicon-based NRC classifier, we analyzed the emotions and sentiments expressed by people toward the two presidential candidates, Donald Trump and Joe Biden, on various topics. Based on these emotions and sentiments, we predicted the swing direction of the election in a subset of states deemed battleground states and key to the outcome. The remainder of this article first provides a short review of prior work on emotion classification, then describes the materials and methods employed in this study, presents the results and discussion, and concludes with the scope for future work.
Literature Review
Studies related to sentiment and emotion classification have recently garnered considerable empirical attention. This popularity is due to the growing amount of unstructured, opinion-rich text from social media, blogs, and other textual corpora, which gives researchers and companies access to the opinions of a large group of individuals around the globe. Meanwhile, advances in machine learning (ML) and natural language processing (NLP) have also sparked increased interest in sentiment and emotion classification. For example, Hasan, Rundensteiner, and Agu (2014) have proposed EMOTEX, which detects emotions in text messages using supervised classifiers. Using Naïve Bayes (N.B.), Support Vector Machine (SVM), decision trees, and KNN (
k-nearest neighbor), they have demonstrated 90% precision for a four-class emotion classification on the Twitter dataset [
2]. Other studies, including the work by Pak et al. (2010) and Barbosa et al. (2010), have considered using ML techniques on Twitter datasets. They both have demonstrated accuracies ranging between 60% and 80% for distinguishing between positive and negative classes [
4,
5]. Go et al. (2009) have also performed sentiment analysis on the Twitter dataset using Western-style emoticons. They have used the N.B., SVM, and Maximum Entropy and have reported an accuracy of 80% [
6].
Furthermore, Brynielsson et al. (2014) have demonstrated close to 60% accuracy on a four-class emotion classification (positive, fear, anger, and other) of tweets related to Hurricane Sandy using the SVM classifier [
7]. Finally, Roberts et al. (2012) have proposed EmpaTweet, which can be used to annotate and detect emotions in Twitter posts. In their work, they discuss constructing a corpus containing tweets for seven different emotion types (anger, disgust, fear, joy, love, sadness, and surprise). On this corpus, they trained seven binary SVM classifiers, one per emotion, and classified each tweet to determine whether a particular emotion was present. They also report that their corpus contained tweets with multiple emotion labels [
8].
Emotion and sentiment classification has been widely researched using various machine learning and deep learning techniques.
Bhowmick et al. (2010) performed an experiment in which humans and machine learning models exhibited very similar performance for emotion and sentiment classification across multiple datasets; they therefore concluded that machine learning models can be trusted for this task [
9]. Chatterjee et al. (2019) also confirmed through their study that methods employing Deep neural networks outperform other off-the-shelf models for emotion classification in textual data [
10]. Kim (2014) performed several experiments on emotion classification using CNN on multiple benchmark datasets, including the fine-grained Stanford Sentiment Treebank. A simple CNN with slight hyperparameter tuning demonstrated excellent results for binary classifications of different emotions [
11]. Kalchbrenner et al. (2014) have explored the Dynamic Convolutional Neural Network (DCNN) for sentiment classification on the Twitter dataset. According to them, DCNN is capable of handling input texts of varying lengths in any language, and they reason that its use of dynamic k-max pooling makes DCNN a promising method for sentiment analysis of Twitter data [
12]. Acharya et al. (2018) have explored emotion detection in EEG signals. In their study, they have explored and demonstrated the potential of using the complex 13-layer CNN architecture [
13]. In one of the studies, Hamdi et al. (2020) utilized the CNN streams and the pre-trained word embeddings (Word2Vec) to achieve a staggering 84.9% accuracy on the Stanford Twitter Sentiment dataset [
14]. In contrast, Zhang et al. (2016) have proposed Dependency Sensitive Convolutional Neural Networks (DSCNN), which outperform traditional CNNs; they report 81.5% accuracy for sentiment analysis of the Movie Review (MR) dataset [
15]. Zhou et al. (2015) have proposed C-LSTM, which utilizes both the CNN and LSTM for a 5-class classification task. However, they have only reported an accuracy of 49.2% [
16].
Since emotion and sentiment classification is a sequence problem, several studies have focused on recurrent neural networks (RNNs). Lai et al. (2015) have explored RNNs and determined that they can capture the key features and phrases in texts, which helps boost performance for emotion and sentiment classification [
17]. Abdul-Mageed and Ungar (2017) have explored Gated RNN or GRNN for emotion and sentiment classification in several dimensions and have demonstrated significantly high accuracies [
18]. Kratzwald et al. (2018) have explored six benchmark datasets for emotion classification using a combination of RNNs and sent2affect, reporting exceptional performance for this combination compared with traditional machine learning algorithms [
19].
Using the Recursive Neural Tensor Network (RNTN) for the famous Stanford Sentiment Treebank dataset (SST), Socher et al. (2013) have reported 85.4% accuracy for sentiment classification [
20]. Zhou et al. (2016) used the BLSTM-2DCNN architecture for the Stanford Sentiment Treebank binary and fine-grained classification tasks, achieving 52.4% accuracy on the fine-grained task. In their study, they observed that the BLSTM-2DCNN architecture is very efficient at capturing long-term sentence dependencies [
21]. Czarnek et al. (2022) used the Linguistic Inquiry and Word Count (LIWC) and the NRC Word-Emotion Association Lexicon (NRC) to investigate whether older people express more positivity through their language use. Examining nearly five million tweets created by 3573 people between 18 and 78 years old, they found that both methods show an increase in positive affect until age 50; according to NRC, positive affect continues to increase steadily until age 65 and then levels off [
22]. Barnes (2023) has presented a systematic comparison of sentiment and emotion classification methods, ranging from rule- and dictionary-based approaches to recently proposed few-shot and prompting methods using large language models. The study reports that across in-domain, out-of-domain, and cross-lingual settings, the rule- and dictionary-based methods outperformed the few-shot and prompting methods in low-resource settings [
23].
There are three types of classifiers for emotion and sentiment classification: supervised, unsupervised, and lexicon-based classifiers. Supervised classifiers are more commonly used to address the emotion and sentiment classification problem [
2,
4,
5,
6,
8,
24,
25,
26,
27,
28,
29,
30,
31,
32]. However, a supervised classifier requires a training dataset, and more specifically a domain-specific one. Obtaining a domain-specific training dataset for the task at hand is difficult, as one might not always be available. It is therefore worthwhile to explore unsupervised or lexicon-based classifiers.
Unsupervised classifiers model the underlying structure or distribution of the data; the algorithms are left on their own to discover and present interesting patterns, which users then examine in order to assign class labels. The lexicon-based approach aims to identify patterns that co-occur with a seed list of sentiment- and emotion-related words; more specifically, similar sentiment- and emotion-related words with the same feature-specific orientations are identified from a large corpus. For this study, the unavailability of a domain-specific corpus is a major challenge [
33]. Therefore, we have opted for a lexicon-based classifier, NRC [
34,
35], to determine the emotions and sentiments expressed within the collected tweets. In previous works related to US presidential elections [
36,
37,
38,
39,
40,
41], it has been clearly demonstrated that NRC classifiers are better suited for emotion and sentiment classification of tweets than several different supervised learning techniques.
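The core of the lexicon-based approach can be sketched in a few lines. The toy word-emotion lexicon below is an illustrative assumption, not the actual NRC lexicon, which maps roughly 14,000 English words to eight emotions and two sentiments:

```python
from collections import Counter
import re

# Toy word-emotion lexicon in the spirit of the NRC Emotion Lexicon.
# These few entries are illustrative assumptions for this sketch.
toy_lexicon = {
    "win":   {"joy", "anticipation", "positive"},
    "great": {"joy", "positive"},
    "fraud": {"anger", "disgust", "negative"},
    "fear":  {"fear", "negative"},
    "hope":  {"anticipation", "joy", "positive"},
}

def score_tweet(text):
    """Tally emotion/sentiment labels for every lexicon word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for label in toy_lexicon.get(tok, ()):
            counts[label] += 1
    return counts

scores = score_tweet("Great rally tonight, we will win! No fear, only hope.")
print(sorted(scores.items()))
# [('anticipation', 2), ('fear', 1), ('joy', 3), ('negative', 1), ('positive', 3)]
```

Because no training data is needed, this style of classifier sidesteps the domain-specific corpus problem: the dominant labels in the tally serve directly as the emotion and sentiment assignment for the tweet.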