An introductory guide to Data science: The terminological landscape Yedla, Abhinav Dorius, Shawn
dc.contributor.department Computer Science
dc.contributor.department Sociology 2018-08-13T23:31:59.000 2020-07-02T06:50:24Z 2020-07-02T06:50:24Z Sun Jan 01 00:00:00 UTC 2017 2017-01-01
dc.description.abstract <p>The emerging field of data science has rapidly evolved into an extremely diverse field equipped with multi-disciplinary techniques to extract, analyze and classify structured and unstructured data. These methods offer researchers, policy analysts, and the lay public evidence-based insights into a tremendous range of human, organizational, and societal activities on a scale and scope that has rarely been possible with conventional scientific methods. At present, however, the multi-disciplinary nature of the data science space suffers a ‘language’ problem insofar as data scientists from different fields often use different terms to describe common methods and concepts. The aim of the present research is threefold. First, we report results of a literature review that identifies and defines the essential content domain of data science, with special focus on the classification of data collection techniques. Second, we establish a preliminary set of relationships among the most trafficked terms of data science to facilitate interdisciplinary communication among scientists from heterogeneous fields. And third, we develop a classification scheme of web-scraping methods based on their availability, the quality of the data procured by the method, the ease of data extraction, reproducibility, the technical skills required to leverage each method, and the types of data collected by each method.</p>
dc.description.comments <p>This is a pre-print made available through <em>Social Science Research Network</em>.</p>
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 1031
dc.identifier.contextkey 12650531
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath soc_las_pubs/32
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 23:34:40 UTC 2022
dc.subject.disciplines Computer Sciences
dc.subject.disciplines Software Engineering
dc.subject.disciplines Theory, Knowledge and Science
dc.title An introductory guide to Data science: The terminological landscape
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication 6d3ca941-cb6e-48a3-bbae-ee96987df91b
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
relation.isOrgUnitOfPublication 84d83d09-42ff-424d-80f2-a35244368443