Group Members: Cam, Warren, Emilia, Daya, and Luha
Progress
We’ve started analyzing the txt files from the Carletonian archive. David Bliss, a digital archivist at the Carleton College Archives, helped us obtain these files from their original PDF forms. We now have a folder of txt files of Carletonian publications from 1877 to 2008. We have all started working on our individual elements, though some of us have gotten further than others because part of our analysis has to wait until we receive the remaining txt files, covering 2008 to the present day.
Problems
We initially intended to personally convert the PDF versions of the Carletonian into txt files, but after some testing, we determined that it would not be feasible to do it ourselves. Luckily, we were able to get help from the Archives staff who downloaded and shared the files with us instead. This did take some time and we weren’t able to do much in the meantime, but we are still on track with our proposed timeline so it wasn’t a big issue.
Tools and Techniques
We are using WordPress for our final deliverable, TimelineJS for the timeline, Flourish and RStudio for the graphs, and sentiment analysis for the maps. We will embed these visualizations in the WordPress website.
Deliverables
Our project is still on track and we are working on conducting analysis and creating visualizations. Our current timeline for deliverables is:
- Week 9: After data cleaning, we will run the text through textual analysis tools and begin creating deliverables.
- Week 10: During the final week we will finalize our deliverables and create a WordPress site to showcase our results.
Personal Messages
Cam: I have been working on building the WordPress site that will host our final visualizations. I decided to use the GeneratePress theme, as it is clean, easy to use, and offers a lot of nice customization features. I built multiple pages that can be accessed from the home screen of the site: the home screen itself, which introduces our project; a page for each visualization made by my fellow team members; an about page, which gives a brief background on each of us; and a source page that holds our bibliography for the project.
Warren: I have been looking into methods of sentiment analysis for use on the Carletonian corpus of text. I am currently thinking of using a model from the BERT family of encoder-only language models, specifically the SiEBERT model, which is a fine-tuned version of the RoBERTa-large model. This specific model strikes a good balance between performance and high accuracy of sentiment classification. I was thinking that it would be interesting to plot these sentiment scores over time, to see how the average sentiment of the Carletonian has changed, or perhaps to use some kind of wind graph (similar to the one Google project we looked at), to see how the sentiment of the Carletonian has shifted seasonally. Currently, I have a small script that can successfully produce a sentiment score for a short snippet of text, and I think my next step will be to try to chunk the Carletonian to perform analysis.
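The chunking step described above could look roughly like the Python sketch below. It is not Warren's actual script: the function names, the 200-word chunk size, and the idea of averaging chunk scores into one score per issue are all assumptions made for illustration. The loader assumes the Hugging Face transformers package and the model ID siebert/sentiment-roberta-large-english, and it assumes the model reports POSITIVE and NEGATIVE labels.

```python
from statistics import mean
from typing import Callable


def chunk_words(text: str, max_words: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def issue_sentiment(text: str, classify: Callable[[str], float], max_words: int = 200) -> float:
    """Average the per-chunk scores of one issue.

    classify is any function mapping a chunk to a signed score
    (assumed here to be positive for positive sentiment).
    """
    chunks = chunk_words(text, max_words)
    if not chunks:
        raise ValueError("cannot score an empty text")
    return mean(classify(chunk) for chunk in chunks)


def load_siebert_classifier() -> Callable[[str], float]:
    """Wrap the SiEBERT pipeline as a signed-score function.

    Assumption: the transformers package is installed and the model
    can be downloaded on first use.
    """
    from transformers import pipeline  # imported lazily so chunking works without it

    pipe = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")

    def classify(chunk: str) -> float:
        result = pipe(chunk, truncation=True)[0]  # truncate chunks that are still too long
        return result["score"] if result["label"] == "POSITIVE" else -result["score"]

    return classify
```

Passing the classifier in as an argument keeps the chunking logic testable with a stand-in function before the full model is loaded. Scores per issue could then be grouped by year or season for the plots.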
Emilia: I have been working on creating the timeline aspect. This is more of a manual process than some of the other aspects, as we are personally choosing relevant historical events rather than statistically identifying the most frequently or densely discussed events. I created a timeline based on the historical timeline Carleton has on its website and have started finding Carletonian articles from each time period that mention the event on the front page. After finding a suitable article, I copy the image, image address, and collection address into a Google Doc and then add all the information needed to create a timeline using TimelineJS to a Google Sheet. With the way TimelineJS works, you just have to attach a published Google Sheet and the timeline will update as you update the sheet, so I’m able to start to see what the timeline will look like as I slowly add to it. I will spend this week continuing this process and then review with the team to see if there is any historical event I’ve missed or anything I’ve included that doesn’t seem relevant.
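For anyone curious what one row of that sheet holds, here is a small Python sketch that builds rows and renders them as CSV text that could be pasted into a sheet. The column names are what I understand the TimelineJS Google Sheets template to use, so check them against the template before relying on this. The helper names and the placeholder values in the usage note are made up for illustration.

```python
import csv
import io

# Column headers assumed to match the TimelineJS spreadsheet template.
COLUMNS = [
    "Year", "Month", "Day", "Time",
    "End Year", "End Month", "End Day", "End Time",
    "Display Date", "Headline", "Text",
    "Media", "Media Credit", "Media Caption", "Media Thumbnail",
    "Type", "Group", "Background",
]


def timeline_row(year, headline, text="", media="", media_credit="", month="", day=""):
    """Build one timeline row; columns that are not supplied stay blank."""
    row = dict.fromkeys(COLUMNS, "")
    row.update({
        "Year": str(year),
        "Month": str(month),
        "Day": str(day),
        "Headline": headline,
        "Text": text,
        "Media": media,                # e.g. the image address of the front page
        "Media Credit": media_credit,  # e.g. the collection address
    })
    return row


def rows_to_csv(rows):
    """Render rows as CSV text with the template's header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A call such as rows_to_csv([timeline_row(1877, "Placeholder headline")]) produces a header line plus one row, with every unused column left empty.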
Daya: I have been brainstorming ideas for creative and insightful visualizations to create with these text files, as well as figuring out what other data will be required to create these visualizations. I have also started working on creating an R script to combine, clean, and tokenize the text files so that we can perform these analyses. My next steps are to finalize the cleaned datasets, gather any other data necessary, and begin actually creating the visualizations.
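As an illustration of the combine, clean, and tokenize steps, here is a minimal Python sketch. Daya's script is in R, so this is a stand-in rather than the actual code, and the specific cleaning rules (rejoining words hyphenated across line breaks, collapsing whitespace, lowercasing, keeping only alphabetic tokens) are assumptions about what the converted files might need. The folder layout of one txt file per issue is also assumed.

```python
import re
from pathlib import Path

# Lowercase alphabetic words, optionally with one internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def clean_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks, then collapse whitespace."""
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", joined).strip()


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def load_corpus(folder: str) -> dict[str, list[str]]:
    """Map each .txt file name (without extension) to its cleaned tokens."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="replace")
        corpus[path.stem] = tokenize(clean_text(raw))
    return corpus
```

Keeping the tokens keyed by file name makes it straightforward to group issues by year later, provided the file names encode the date.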
Luha: I’ve been brainstorming possible methods for identifying events, especially computational ways to extract keywords from the Carletonian text files. I’ve also been exploring TimelineJS examples to understand how to structure and present our data effectively. My next steps will be to try text analysis tools for extracting events from the text files and to work on TimelineJS spreadsheets to test the formatting.
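One simple computational way to pull keywords out of each issue is TF-IDF, which favors words that are frequent in one document but rare across the rest. The Python sketch below is only one possible approach, not a method the group has settled on. The function name and the input format (a dictionary from document name to a list of tokens) are assumptions.

```python
import math
from collections import Counter


def top_keywords(docs: dict[str, list[str]], k: int = 5) -> dict[str, list[str]]:
    """Return the k highest-scoring TF-IDF words for each document."""
    n_docs = len(docs)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    result = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[name] = ranked[:k]
    return result
```

With two toy documents such as ["library", "library", "campus"] and ["election", "campus", "election"], the shared word "campus" scores zero, so "library" and "election" come out as the top keyword of their respective documents. Keywords found this way would still need a human check before being treated as events.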
This sounds like such an interesting project and I look forward to seeing your final project. The Carletonian is so uniquely tied to Carleton’s identity and comparing the newspaper throughout Carleton’s history will be so interesting to examine. I appreciate that you are using a number of different techniques to analyze the data and observe the information, using tools that we have used throughout the class. I’m glad that you were able to convert all of the many PDFs into txt files, and use the previous newspapers easily.
This sounds really cool. I am specifically interested in seeing how your project turns out because there are so many directions you can take with the data. I am sure you will uncover many trends and changes over time through text analysis, but then you will have to decide which are most notable. It is cool that you will be able to tell a story of your choice. Furthermore, it will be interesting to analyze why these changes actually occurred. Changes in political climate, social norms, pandemics, conflicts, changes in leadership?
This sounds like an exciting project! It’s great that you were able to collaborate with the archivist to access the Carletonian files in text format. That must have saved a lot of time. I really like how each team member is focusing on a distinct but connected aspect, such as sentiment analysis, timelines, and visualization design. I’m especially looking forward to seeing how the trends and tone of the Carletonian change over time.
I think the idea of analyzing and tracing back the Carletonian archive is super interesting, and I look forward to seeing the final product! It’s nice to hear that you got help from the librarian, cause that’s what our group is going to need too. I also like how your group split up the tasks. It looks like everyone has something important and interesting to work on!
Everything sounds really great so far! We are also working with David Bliss from the archives and he has been amazing to work with. I am interested to see how your text analysis and visualizations turn out. I am especially interested to see how the sentiment trends and timeline elements come together on the website.
I really like your update on the “Carletonian Through Time” project. It’s clear you’ve put thoughtful effort into showing how the newspaper has evolved and what those changes say about campus history and culture. Your writing helps make a complex source accessible and interesting — it made me curious to explore the archives myself. Great job!
I am glad the Archives staff were able to help you all so far! There are a lot of moving parts in this project and you are using a Digital Humanist skill of finding the necessary connections (in this case the staff) that will help you carry out your project! There are so many years of data so it will be really interesting to see what comes of it.