Text Classification for Businesses via Microsoft Azure Machine Learning
Date
2023-05
Authors
Masrol, Fazrul Nazrin Bin
Major Professor
Townsend, Anthony
Abstract
The “Corporate Messaging” dataset, retrieved from GitHub, consists of over 3,000 lines of text from online articles about various companies such as Barclays, Nestle, and Pfizer. These texts are split into three categories: Information, Action, and Dialogue. To extract the performance metrics, the Dialogue category was excluded from the analysis so that only binary classification is performed.
Text classification on this large dataset can help businesses transform text data into quantitative data that drives business decisions and yields valuable insights. The dataset is first loaded into Microsoft Azure Machine Learning Studio, where it undergoes text preprocessing to clean and simplify the text. Then, feature selection methods are applied in Azure ML Studio to avoid overfitting and produce more accurate results. Next, several machine learning models are trained and evaluated on the dataset. Once the models have been evaluated, the most accurate one is deployed and its results are visualized.
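The text preprocessing step described above can be illustrated with a minimal Python sketch. This is not the Azure ML Studio component used in the study; the function name `preprocess`, the stop-word list, and the sample sentence are all hypothetical and chosen only to show what "cleaning and simplifying" text typically involves.

```python
import re
from typing import List

# Hypothetical stop-word list for illustration only; it is NOT the list
# used by Azure ML Studio or by the study.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "are", "our", "we"}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, strip URLs and non-letters, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


# Made-up example sentence, not taken from the Corporate Messaging dataset.
example = "We are launching the NEW product in 2023! Visit https://example.com"
tokens = preprocess(example)
print(tokens)  # ['launching', 'new', 'product', 'visit']
```

The resulting tokens would then feed a feature extraction and selection step before model training.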
In this study, the Decision Tree model proved to be the best machine learning model based on metrics such as accuracy, precision, and recall. Finally, the deployed model is tested by uploading independent data to predict the category of each text, and the results are then visualized in Power BI.
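For readers unfamiliar with the metrics named above, the following sketch shows how accuracy, precision, and recall are computed for a binary Information/Action task. The labels are made up for illustration and are not results from the study; treating "Action" as the positive class is likewise an assumption, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive="Action"):
    """Compute accuracy, precision, and recall for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels, NOT outputs of the deployed model.
y_true = ["Action", "Information", "Action", "Information", "Action"]
y_pred = ["Action", "Information", "Information", "Information", "Action"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the classifier never predicts Action incorrectly but misses one true Action text, which is why precision and recall differ.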
Type
creative component
Copyright
2023