Friday, July 29, 2016
In the prior two posts, I described how I gathered Twitter data from @HillaryClinton and @realDonaldTrump, how I ran a sentiment analysis on the individual tweets, and how I performed a principal component analysis on the most commonly used words. Today, I’ll tie everything together and describe how I created a model to predict which of the two candidates a given tweet belongs to.
Friday, July 22, 2016
As described in my last blog post on this topic, I've been tracking tweets from the US presidential candidates, Hillary Clinton and Donald Trump. I've looked at the top words they used and the sentiments expressed in their tweets given their word choice. However, some words almost always appear together, a notable example being a slogan like Make America Great Again. As such, it may be beneficial to look at groups of words rather than individual words. For that, I applied a Principal Component Analysis. Below I describe what this is, how I used it, and what it reveals. Do note, however, that I'm applying techniques I learned in astronomy to this problem rather than methods from courses specific to text mining. There may be better tools out there than the ones I've used.
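To make the idea of word groups concrete, here is a minimal sketch of a principal component analysis on a tiny word-count matrix. It is not the code used for this project: the vocabulary, the counts, and the function name `first_principal_component` are all made up for illustration, and it finds only the first component using plain power iteration rather than a library routine.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return the leading eigenvector of the covariance matrix of `rows`.

    `rows` is a list of equal-length lists (one row per tweet,
    one column per word). Uses simple power iteration.
    """
    n, m = len(rows), len(rows[0])
    # Center each column (word) on its mean count.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix between words.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration from a fixed, normalized starting vector.
    vec = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # starting vector carries no variance; give up
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical data: each row is one tweet, each column a word count.
vocab = ["make", "america", "great", "again", "jobs"]
counts = [
    [1, 1, 1, 1, 0],  # a tweet containing the full slogan
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],  # a tweet that mentions only "jobs"
    [0, 0, 0, 0, 1],
]

loadings = dict(zip(vocab, first_principal_component(counts)))
for word, weight in loadings.items():
    print(f"{word:>8}: {weight:+.3f}")
```

In this toy data the four slogan words always occur together, so they receive the same weight in the first component, while "jobs" gets the opposite sign. That is the sense in which a component can stand in for a group of words.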
Friday, July 15, 2016
Over the past few months, I've been working on a little hobby data science project to explore Twitter data related to the upcoming presidential election in the United States. The project has evolved quite a bit, and detailing it in full is beyond the scope of a single blog post. As such, I've decided to split it into (at least) 3 posts. This post is the first of the series and will go over the basics of gathering data from Twitter and doing some simple text mining. The second and third posts will discuss more details of the project and show some neat visualizations I've created. I'll release all my code after the third post for any curious coders. For now, let's get started seeing what Hillary Clinton's and Donald Trump's Twitter accounts are talking about.
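As a rough idea of what "simple text mining" can look like, here is a small sketch that counts the most frequent words in a handful of tweets. It is only an illustration, not the project's code: the sample tweets are invented, the stop-word list is a tiny hypothetical one, and the helper name `top_words` is my own. It also skips the step of actually downloading tweets.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is",
              "we", "our", "for", "will"}

def top_words(tweets, n=3):
    """Return the n most common non-stop-words across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lowercase and drop links before splitting into words.
        text = re.sub(r"https?://\S+", "", text.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Invented sample text, standing in for downloaded tweets.
sample_tweets = [
    "We will make America great again!",
    "America is great because America is good. https://example.com",
    "Stronger together: an economy that works for everyone.",
]

top = top_words(sample_tweets, n=2)
print(top)
```

Running the same function separately on each account's tweets would give a first look at what each candidate talks about most.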