Tutorial: Sentiment Analysis

As part of my group’s final project, a text analysis of the Carletonian newspaper, I am conducting sentiment analysis across our corpus of every Carletonian issue from 1877 to 2025. Sentiment analysis is, broadly, a computational way of extracting sentiment from a snippet of text. For example, a sentiment analysis tool would label the sentence “I am really excited for this weekend!” as POSITIVE, generally with some kind of confidence score. On the other hand, a sentence like “I am so tired and I don’t want to do homework” would receive a NEGATIVE label, also with an associated confidence score. Depending on the model, a sentence that is simply declarative and does not convey any sentiment could receive a NEUTRAL label. This is a powerful technique: it lets us extract the general sentiment of large amounts of text efficiently and accurately, a task that would take people an enormous amount of time by hand. In this tutorial, I am going to show you how to use an existing sentiment analysis model based on the language model RoBERTa-base (which is itself derived from the BERT model).

1. Create a Google Colab Notebook

Because the sentiment analysis model we are going to use is based on the transformer architecture, it takes a significant amount of compute power to run. If your laptop has a dedicated GPU, feel free to run the code locally; however, I will be using a Colab notebook, as it provides cloud access to an NVIDIA T4 GPU, which is more than powerful enough to run the model. Colab provides a hosted Jupyter notebook to interact with, making it especially easy to run code iteratively and see the output. Navigate to Google Colab and select File > New notebook in Drive. You should now see the following blank notebook:

2. Select Your Model

Hugging Face is a large online repository for sharing machine learning models, containing many pretrained and fine-tuned models that users have contributed for specific tasks. For sentiment analysis, the best approach is generally to use an encoder model, such as BERT or RoBERTa, that has been fine-tuned on a large sentiment analysis dataset. For this tutorial, I will use the cardiffnlp/twitter-roberta-base-sentiment-latest model, a RoBERTa-base model trained on about 124M tweets and fine-tuned for sentiment analysis. Depending on the task, however, different models may be more suitable, or you may even need to fine-tune a base model yourself. For example, the mrm8488/deberta-v3-ft-financial-news-sentiment-analysis model is tailored specifically to sentiment analysis of financial news, making it better suited to text written in that style. After navigating to the link above, you can see more details on Hugging Face:

3. Import Libraries and Enable GPU

Now that you have selected your model, it is time to import it into Colab. Luckily, Hugging Face provides the transformers Python library and its pipeline() function, which make this incredibly simple. First, we want to import both the torch library and the aforementioned pipeline function (both are preinstalled in the Colab environment):

import torch
from transformers import pipeline

Go ahead and run the cell. Now that we have these imported, we want to ensure that our code uses the GPU when one is available. We can do this with the following lines of code:

if torch.cuda.is_available():
    device = 0   # use the first GPU (CUDA device 0)
    print('Utilizing GPU')
else:
    device = -1  # fall back to the CPU
    print('No GPU found')

Finally, before running the above cell, ensure that a GPU runtime is enabled by going to Runtime > Change runtime type, and selecting a runtime that provides a GPU.

4. Import Model into Colab

Now, we need to set up our program to use the model we selected earlier. The pipeline() function provided by Hugging Face makes this extremely simple. Create a new code cell, and write the following lines of code:

model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"
sentiment_task = pipeline(
    "sentiment-analysis",  # the task; an alias for "text-classification"
    model=model_path,      # model weights, downloaded from Hugging Face
    tokenizer=model_path,  # use the tokenizer that matches the model
    device=device          # 0 = GPU, -1 = CPU (from the previous step)
)

Run the cell. Now that the pipeline object has been created, we are ready to move on to actually evaluating the sentiment of some text.

5. Evaluate Sentiment

The Hugging Face pipeline object makes this very straightforward to do! In order to evaluate the sentiment of the sentence “I am very happy today!”, write the following lines of code in a new cell:

sentiment = sentiment_task("I am very happy today!")
print(sentiment)

After running the cell, you should see output that looks something like this:

[{'label': 'positive', 'score': 0.9863167405128479}]

The pipeline returns a list with one dictionary per input. The 'label' field of each dictionary gives the assigned class, while the 'score' field gives the model’s confidence that its predicted label is the correct one for the sentence. Feel free to change the sentence around to see what the model outputs for sentences of varying sentiment. Also, feel free to reference the documentation for transformers here if you need any more information about the library. Additionally, if you need more clarification on the exact code to write, I have a screenshot of my Colab notebook below for reference:
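If you want to try several sentences at once, the pipeline also accepts a list of strings and returns one dictionary per input. Here is a small sketch of that idea; the helper names format_result and print_batch are my own, not part of the transformers API, and the example sentences are just illustrations:

```python
def format_result(sentence, result):
    """Render one pipeline result dict as a readable line."""
    return f"{result['label']} ({result['score']:.2f}): {sentence}"

def print_batch(task, sentences):
    """Score a list of sentences with a sentiment pipeline and print each result."""
    # The pipeline accepts a list and returns one {'label': ..., 'score': ...}
    # dict per input sentence, in the same order.
    for sentence, result in zip(sentences, task(sentences)):
        print(format_result(sentence, result))

sentences = [
    "I am really excited for this weekend!",
    "I am so tired and I don't want to do homework.",
    "The library closes at 10pm.",
]
```

In your notebook, after creating the pipeline in step 4, you would call `print_batch(sentiment_task, sentences)` to see one labeled line per sentence.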

Conclusion

Sentiment analysis is a powerful tool that allows digital humanists to quickly extract sentiment from large amounts of text, such as the Carletonian text corpus. While this tutorial is just a simple introduction, much more can be done: for example, you can programmatically combine sentiment analysis with other text analysis tools, such as the sentence tokenizer from nltk, to get sentiment scores for each sentence in large corpora. Additionally, if you have a large labeled dataset, you could fine-tune an existing base model so that it performs better on the specific task you are trying to accomplish.
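As a rough sketch of the nltk idea mentioned above (the function names here are my own, and using nltk’s sentence tokenizer requires downloading its punkt data first with nltk.download):

```python
from collections import Counter

def sentence_sentiments(task, text):
    """Split a text into sentences with nltk and score each one with the pipeline."""
    # Imported here so the rest of the sketch works without nltk installed;
    # requires nltk.download('punkt') (or 'punkt_tab' on newer nltk versions).
    from nltk import sent_tokenize
    sentences = sent_tokenize(text)
    return list(zip(sentences, task(sentences)))

def label_counts(scored):
    """Tally how many sentences fell into each sentiment class."""
    return Counter(result["label"] for _, result in scored)
```

For instance, `label_counts(sentence_sentiments(sentiment_task, article_text))` would give a count of positive, negative, and neutral sentences for an article, which could then be aggregated across the whole corpus.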
