Agenda detector: labeling tweets with political policy agenda

Kaul, Sheetal
Major Professor
Wallapak Tavanapong
Computer Science

In the nearly one decade of its existence, Twitter has seen its user base grow steadily across many domains, politics among them. In the political domain, Twitter serves as a vital tool for communication, for running effective e-campaigns, and for mining and influencing public opinion, to name a few uses. We study the problem of automatically detecting whether a tweet posted by the Twitter handle of a US state Senate refers to one or more policy agendas. Such a capability can help identify the policy agendas a state focuses on and capture the inception of ideas that lead to the framing of a bill or law. Furthermore, analyzing the spatial and temporal dynamics of tweets that carry policy agendas can facilitate the study of policy diffusion among states and help in understanding how states learn policy-making from one another.

Currently, no study has analyzed Twitter data to detect whether a tweet refers to a policy agenda. We present our analysis of 122,965 tweets collected from the verified Twitter handles of US state upper houses (the state Senates). Our high-level analysis covers (a) how far Twitter has penetrated state politics and (b) how states use the medium differently in terms of the messages they broadcast. Our proposed approach automates the classification of a tweet according to whether it refers to a policy agenda (Has Agenda) or not (No Agenda). We accomplish this by leveraging existing text classification methodology, achieving a recall of 89.1% and a precision of 77.2% for the “Has Agenda” class. We investigate several machine learning algorithms to determine the best-performing one for our binary classification problem and conclude that a support vector machine with a linear kernel performed best on our dataset. Lastly, we propose a set of hand-crafted features that, together with feature selection and stemming, improved our classifier’s performance. Before these features were included, the classifier was built using basic preprocessing techniques and term occurrence for feature extraction. An overall improvement of 5.187% was achieved at a significance level of α = 0.05.
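As an illustration of two ideas named in the abstract, the following minimal Python sketch shows how binary term-occurrence features might be extracted from a tweet and how precision and recall for the “Has Agenda” class can be computed. The vocabulary, the example tweet, the tokenization rule, and the labels are all made up for illustration; they are not taken from the thesis data and do not reproduce its classifier or its reported figures.

```python
import re

def term_occurrence(text, vocabulary):
    """Return a binary vector: 1 if the vocabulary term occurs in the text, else 0."""
    # Assumed tokenization: lowercase runs of letters, digits, '#', '@' and apostrophes.
    tokens = set(re.findall(r"[a-z0-9#@']+", text.lower()))
    return [1 if term in tokens else 0 for term in vocabulary]

def precision_recall(y_true, y_pred, positive="Has Agenda"):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical vocabulary and tweet, for illustration only.
vocabulary = ["bill", "tax", "education", "budget"]
features = term_occurrence("Senate passes education budget bill today", vocabulary)

# Hypothetical gold labels and predictions, for illustration only.
y_true = ["Has Agenda", "Has Agenda", "No Agenda", "Has Agenda", "No Agenda"]
y_pred = ["Has Agenda", "No Agenda", "Has Agenda", "Has Agenda", "No Agenda"]
precision, recall = precision_recall(y_true, y_pred)
```

In a full pipeline, vectors like `features` would be the input to a classifier such as a linear-kernel support vector machine, and the two metrics would be computed on held-out tweets rather than on a toy list.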