Sentiment Analysis: Effects and underlying Mechanisms – Part II

How Does Sentiment Analysis Work?

SA uses various NLP methods and algorithms, which we’ll go over in more detail in this section. The main types of algorithms used include:

    • Rule-based systems that perform SA based on a set of manually created rules.
    • Automatic systems that rely on Machine Learning techniques to learn from data.
    • Hybrid systems that combine both rule-based and automatic approaches.

If you haven’t read Part I yet, you can find it here: Sentiment Analysis – Effects and underlying Mechanisms.

by Rashed Sabra

Rashed Sabra - AI Engineer at L-One Systems

Rule-based Approaches

Usually, a rule-based approach uses a set of human-designed rules to help identify subjectivity, polarity, or the subject of an opinion.

These rules may include various methods developed in computational linguistics, such as:

    • Stemming, tokenization, part-of-speech tagging, and parsing.
    • Lexicons (i.e., lists of words and phrases).

Here’s an actual example of how a rule-based system works:

    1. Define two lists of polarized words: negative words such as sour, worst, and ugly, and positive words such as sound, best, and excellent.
    2. Count the number of positive and negative words that appear in an input text.
    3. If positive words outnumber negative words, the algorithm returns a positive sentiment, and vice versa. If the counts are equal, it returns a neutral sentiment.
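
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production system: the two word lists reuse the example words from step 1, and the function name `rule_based_sentiment` and the simple regex tokenizer are assumptions made for this sketch.

```python
import re

# Tiny illustrative lexicons (step 1); a real system would use far larger lists.
POSITIVE_WORDS = {"sound", "best", "excellent"}
NEGATIVE_WORDS = {"sour", "worst", "ugly"}


def rule_based_sentiment(text: str) -> str:
    """Classify a text by comparing counts of positive and negative words."""
    # Lowercase and split into word tokens (a deliberately simple tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())

    # Step 2: count how many tokens appear in each list.
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)

    # Step 3: the larger count wins; a tie is reported as neutral.
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The best pizza, excellent service"))             # positive
print(rule_based_sentiment("An ugly room and the worst breakfast"))          # negative
print(rule_based_sentiment("The soup was sour but the view was excellent"))  # neutral
```

Note that the last sentence comes out as neutral only because one positive and one negative word cancel each other out; the scorer has no notion of word order or of which statement matters more.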

Rule-based approaches are naive because they don’t consider how words are combined in a sequence. More advanced processing techniques can be used, and new rules can be added to support new terms and vocabulary. However, adding new rules may affect previous results, and the whole system can become very complex. Because rule-based systems often require fine-tuning and maintenance, they also need regular investment.

Automatic Approaches

In contrast to rule-based systems, automatic methods rely on Machine Learning techniques rather than manually crafted rules. The existing methods can be roughly categorized into supervised, semi-supervised, and unsupervised learning.
The idea of supervised learning is to learn features from both positive and negative examples. Existing approaches consist of reading an annotated corpus, memorizing lists of entities, and creating disambiguation rules. The shortcoming of supervised learning is that it requires a large annotated corpus, which motivates the two alternatives: semi-supervised and unsupervised learning.
The primary technique for semi-supervised learning, “bootstrapping,” involves a small degree of supervision, such as a set of seeds for starting the learning process.
The typical approach in unsupervised learning is clustering, for example grouping sentiment words with the help of a dictionary. These techniques rely on lexical resources, lexical patterns, or statistics computed on a large unannotated corpus.
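
To make the bootstrapping idea more concrete, here is a small sketch that grows a sentiment lexicon from a handful of seed words using an unannotated corpus. The expansion rule is an assumption chosen for illustration: two words joined by "and" are taken to share the same polarity. The function name `bootstrap_lexicon`, the seeds, and the corpus are all hypothetical.

```python
import re

# Assumed lexical pattern: "<word> and <word>" suggests both words share a polarity.
CONJUNCTION = re.compile(r"\b([a-z]+) and ([a-z]+)\b")


def bootstrap_lexicon(seeds: dict, corpus: list, max_rounds: int = 5) -> dict:
    """Grow a {word: polarity} lexicon from seed words and unlabeled sentences."""
    lexicon = dict(seeds)
    for _ in range(max_rounds):
        added = False
        for sentence in corpus:
            for left, right in CONJUNCTION.findall(sentence.lower()):
                # Propagate polarity from the known word to the unknown one.
                for known, unknown in ((left, right), (right, left)):
                    if known in lexicon and unknown not in lexicon:
                        lexicon[unknown] = lexicon[known]
                        added = True
        if not added:
            break  # No new words were learned in this round, so stop.
    return lexicon


seeds = {"friendly": "positive", "dirty": "negative"}
corpus = [
    "A helpful and knowledgeable guide.",
    "The staff were friendly and helpful.",
    "The room was dirty and noisy.",
]
print(bootstrap_lexicon(seeds, corpus))
```

In this toy run, "helpful" and "noisy" are learned in the first round, and "knowledgeable" only in the second, once "helpful" is known. A heuristic this simple will also propagate errors (for example in phrases where the two joined words have opposite polarity), so real systems add filtering and confidence thresholds.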

Here’s how a classifier (supervised learning use case) can be implemented:

[Figure: Training and Testing Process]
  • The Training and Prediction Processes

    In the training process (a), the model learns to associate a text input with the corresponding output (tag) based on the labeled sample pairs used for training. The feature extractor transforms each input text into a numeric feature vector. Pairs of feature vectors and tags (e.g., positive (pos), negative (neg), or neutral (n)) are then fed into the ML algorithm to train the model.

    In the prediction process (b), the feature extractor transforms unseen input samples into feature vectors. These vectors are then fed into the trained model, which generates the predicted tags.

  • Feature Extraction from Text

    The first step in an ML classifier is feature extraction, also called text vectorization. The classical approach has been bag-of-words or bag-of-n-grams with their frequencies.

    More recently, new feature extraction techniques based on word embeddings have been applied. This kind of representation makes it possible for words with similar meanings to have similar representations, which can improve the performance of classifiers.
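
The training process, the prediction process, and the classical bag-of-n-grams feature extraction can be sketched together in plain Python. The choice of a multinomial Naive Bayes model with add-one smoothing is an assumption made for this illustration (the text above does not prescribe a particular ML algorithm), and the names `extract_features` and `NaiveBayesClassifier` as well as the four training sentences are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text, n=1):
    """Feature extractor: bag of n-grams with their frequencies."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, tags):
        # (a) Training: accumulate feature counts for each tag.
        self.tag_counts = Counter(tags)
        self.feature_counts = defaultdict(Counter)
        for text, tag in zip(texts, tags):
            self.feature_counts[tag].update(extract_features(text))
        self.vocabulary = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, text):
        # (b) Prediction: score every tag and return the most probable one.
        features = extract_features(text)
        total_docs = sum(self.tag_counts.values())
        scores = {}
        for tag, doc_count in self.tag_counts.items():
            total = sum(self.feature_counts[tag].values())
            score = math.log(doc_count / total_docs)
            for feature, count in features.items():
                if feature not in self.vocabulary:
                    continue  # Ignore features never seen during training.
                likelihood = (self.feature_counts[tag][feature] + 1) / (total + len(self.vocabulary))
                score += count * math.log(likelihood)
            scores[tag] = score
        return max(scores, key=scores.get)


training_texts = [
    "I love this phone",
    "What a great experience",
    "I hate the battery",
    "Terrible customer service",
]
training_tags = ["pos", "pos", "neg", "neg"]

model = NaiveBayesClassifier().fit(training_texts, training_tags)

print(extract_features("not good not bad", n=2))  # bigram counts
print(model.predict("I love this great phone"))
print(model.predict("Terrible battery"))
```

On this toy data the two unseen sentences are tagged `pos` and `neg` respectively. With only four training sentences the model is extremely brittle; the sketch is meant to show the data flow (text, feature vector, tag), not realistic accuracy. Embedding-based features would replace `extract_features` with a lookup into learned word vectors, which is not shown here.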

Challenges

Many factors affect the performance of an SA system.

Contextual understanding
Contextual understanding is crucial for a system to be able to reach human-level accuracy.
For example: “I am craving McDonald’s so bad”.

Many systems will misinterpret this statement as negative because it contains the phrase “so bad”.

Sentiment Ambiguity
“Can you recommend any nice vacation destinations?”
This sentence doesn’t show any sentiment, although it uses the positive sentiment word “nice.”

Sarcasm
“Sure, I’m so happy for my browser to crash right in the middle of my coursework.”

This sentence is negative, even though it has the positive word “happy.”

»Sentiment Analysis (SA) is an important NLP field. It extracts and categorizes sentiments in written texts: positive, negative and neutral. We extract them with the help of text analysis methods. It allows, for example, customer feedback to attain objective weight as a quantifiable criterion.«

Rashed Sabra - AI Engineer at L-One Systems

Comparatives
“iPhone is much better than Samsung.”

Most sentiment analysis systems cannot “pick sides” when they encounter comparative statements like the one above; they can only pick the sentiment based on keywords. This example would be classified as “positive” because it contains the keyword “much better,” which is scored as positive regardless of which company is looking at the data.
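
The four challenges in this section can be reproduced with a naive keyword scorer. The sketch below is a hypothetical illustration: the function `keyword_sentiment` and its two mini-lexicons are invented here and contain only the sentiment words from the example sentences.

```python
import re

# Hypothetical mini-lexicons, limited to the keywords in the examples.
POSITIVE_KEYWORDS = {"nice", "happy", "better"}
NEGATIVE_KEYWORDS = {"bad"}


def keyword_sentiment(text):
    """Score a text purely by counting positive and negative keywords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE_KEYWORDS for t in tokens) - sum(t in NEGATIVE_KEYWORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Each example is paired with the reading given in the text.
examples = [
    ("I am craving McDonald's so bad", "not negative"),
    ("Can you recommend any nice vacation destinations?", "neutral"),
    ("Sure, I'm so happy for my browser to crash right in the middle of my coursework.", "negative"),
    ("iPhone is much better than Samsung.", "depends on the brand"),
]

for text, human_reading in examples:
    print(f"keyword scorer: {keyword_sentiment(text):8} | human reading: {human_reading:20} | {text}")
```

The scorer labels the first sentence negative and the other three positive, so it disagrees with the human reading in the first three cases and cannot express the brand-dependent reading in the fourth.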

To conclude, Sentiment Analysis (SA) is the process of mining meaningful patterns from text data. This includes interpreting and classifying emotions (positive, negative, and neutral) within text data using text analysis techniques. Several approaches are used for the SA task, including rule-based and automatic approaches. The first uses a set of human-designed rules to help identify subjectivity, polarity, or the subject of an opinion. In contrast, the second relies on Machine Learning techniques rather than manually crafted rules. Finally, many factors affect the performance of an SA system, such as contextual understanding, sentiment ambiguity, sarcasm, and comparatives. Recent research has focused on reducing the negative impact of these challenges on the performance of SA systems.