Graph Data Modeling for Political Communication on Twitter

dc.contributor.advisor Wallapak Tavanapong
dc.contributor.author Kumar, Prashant
dc.contributor.department Computer Science
dc.date 2018-08-11T17:30:24.000
dc.date.accessioned 2020-06-30T03:07:38Z
dc.date.available 2020-06-30T03:07:38Z
dc.date.copyright Fri Jan 01 00:00:00 UTC 2016
dc.date.embargo 2001-01-01
dc.date.issued 2016-01-01
dc.description.abstract <p>Twitter has become an established part of political life, where political parties, presidential candidates, legislators, and journalists post tweets about the latest events, sharing text, pictures, hashtags, and URLs and mentioning other users. Gaining insight from the vast amount of political data on Twitter requires proper computational tools.</p> <p>We propose to store and manage Twitter data in an optimized Neo4j graph database that serves queries about political communication among state legislators of all 50 U.S. states, state reporters, and presidential candidates for the 2016 presidential election. Our rationale for selecting this relatively new database technology is threefold: (1) ease of explicitly modeling and visualizing communication relationships among entities of interest; (2) flexibility to evolve the database over time and adapt quickly to changes in user requirements; and (3) a user-friendly, intuitive query interface. We developed a Python-based Google App Engine application that uses the Twitter API to collect tweets from the Twitter handles of the aforementioned political actors. We followed best-practice guidelines in graph database design to develop five different database models, which let us isolate the impact of each query optimization technique. We evaluated each model on the same set of tweets, posted between January 1, 2016 and November 11, 2016, using the same set of queries of interest to political communication scholars, and compared their average query response times. Our experimental results confirm the benefits of the best-practice design guidelines and show that the optimized database model significantly improves query response times.
Reducing the number of hops in graph queries and indexing the most commonly used attributes reduced the average query response time on our dataset by as much as 74.52% and 85.27%, respectively, compared to the reference model. However, this reduction comes at the cost of a 5.49% increase in the size of the graph database relationship store relative to the reference model.</p> <p>Our contributions are as follows: (1) an optimized Neo4j graph database that will be updated weekly with new tweets and that can be made accessible to political communication scholars; (2) the above findings, which add to the currently limited guidelines on graph database design; and (3) findings about political communication prior to the Iowa caucus of the 2016 presidential primary election.</p>
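As a rough illustration of the two optimizations named in the abstract, the sketch below uses plain Python dictionaries instead of Neo4j. The relationship names (POSTED, MENTIONS, MENTIONED), the toy data, and all function names are hypothetical and do not come from the thesis. The sketch compares a two-hop traversal through tweet nodes with a one-hop read of a precomputed user-to-user edge. It also compares a full scan with a lookup in an attribute index.

```python
from collections import defaultdict

# Hypothetical toy data: (tweet_id, author_handle, mentioned_handles)
TWEETS = [
    ("t1", "alice", ["bob", "carol"]),
    ("t2", "alice", ["bob"]),
    ("t3", "bob", ["alice"]),
]

# Hypothetical user records with one commonly queried attribute ("role")
USERS = [
    {"handle": "alice", "role": "legislator"},
    {"handle": "bob", "role": "reporter"},
    {"handle": "carol", "role": "candidate"},
]


def build_reference_model(tweets):
    """Reference shape: User-[:POSTED]->Tweet-[:MENTIONS]->User (two hops)."""
    posted = defaultdict(list)    # author handle -> tweet ids
    mentions = defaultdict(list)  # tweet id -> mentioned handles
    for tweet_id, author, mentioned in tweets:
        posted[author].append(tweet_id)
        mentions[tweet_id].extend(mentioned)
    return posted, mentions


def mention_counts_two_hop(posted, mentions, user):
    """Count whom `user` mentions by walking user -> tweet -> user."""
    counts = defaultdict(int)
    for tweet_id in posted.get(user, []):          # hop 1: user to tweets
        for target in mentions.get(tweet_id, []):  # hop 2: tweet to users
            counts[target] += 1
    return dict(counts)


def build_optimized_model(tweets):
    """Hypothetical derived edge User-[:MENTIONED {count}]->User.

    These edges are stored in addition to the reference relationships,
    trading extra storage for shorter traversals.
    """
    mentioned_edge = defaultdict(lambda: defaultdict(int))
    for _, author, mentioned in tweets:
        for target in mentioned:
            mentioned_edge[author][target] += 1
    return mentioned_edge


def mention_counts_one_hop(mentioned_edge, user):
    """Read the precomputed counts directly (one hop)."""
    return dict(mentioned_edge.get(user, {}))


def find_by_scan(users, key, value):
    """Without an index: inspect every record."""
    return [u for u in users if u[key] == value]


def build_index(users, key):
    """Map each attribute value to the records that carry it."""
    index = defaultdict(list)
    for u in users:
        index[u[key]].append(u)
    return index


def find_by_index(index, value):
    """With an index: a single dictionary lookup."""
    return index.get(value, [])
```

Both query paths return the same answer, for example `{"bob": 2, "carol": 1}` for `"alice"`. The one-hop version gets its speed from storing extra derived edges, which is the same storage-versus-speed trade-off the abstract reports. These toy numbers say nothing about the thesis's measured results.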
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/15949/
dc.identifier.articleid 6956
dc.identifier.contextkey 11169390
dc.identifier.doi https://doi.org/10.31274/etd-180810-5576
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/15949
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/30132
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/15949/Kumar_iastate_0097M_16251.pdf|||Fri Jan 14 20:49:01 UTC 2022
dc.subject.disciplines Computer Sciences
dc.title Graph Data Modeling for Political Communication on Twitter
dc.type article
dc.type.genre thesis
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
thesis.degree.discipline Computer Science
thesis.degree.level thesis
thesis.degree.name Master of Science
File (original bundle)
Name: Kumar_iastate_0097M_16251.pdf
Size: 1.18 MB
Format: Adobe Portable Document Format