Text Classification for Businesses via Microsoft Azure Machine Learning

Masrol, Fazrul Nazrin Bin

File

MIS 599_Azure Text Classification_Fazrul Masrol.pdf (2.72 MB)

Date

2023-05

Authors

Masrol, Fazrul Nazrin Bin

Major Professor

Townsend, Anthony

Abstract

The “Corporate Messaging” dataset, retrieved from GitHub, consists of over 3,000 lines of online articles for companies such as Barclays, Nestle, and Pfizer. The texts are split into three categories: Information, Action, and Dialogue. The Dialogue category is excluded from the analysis so that the task is a binary classification, from which the performance metrics are extracted. Text classification on a dataset of this size can help businesses turn text into quantitative data that supports business decisions and yields insights for improving their operations. The dataset is first loaded into Microsoft Azure Machine Learning Studio, where text preprocessing cleans and simplifies the text. Feature selection is then performed in Azure ML Studio to reduce overfitting and improve accuracy. Next, several machine learning models are trained and evaluated on the dataset, and the most accurate model is deployed and visualized. In this study, the Decision Tree model proved to be the best model based on accuracy, precision, and recall. Finally, the deployed model is tested by uploading independent data to predict the category of each text, and the results are visualized in Power BI.
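The three metrics named in the abstract can be illustrated with a minimal Python sketch for the binary Information/Action case. This is not the author's Azure ML Studio pipeline. The function name `binary_metrics`, the choice of "Action" as the positive class, and the toy labels are all assumptions made for illustration, and none of the values come from the Corporate Messaging dataset.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "Action",  # assumed positive class; the abstract does not name one
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for a two-class problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fall back to 0.0 when a denominator is empty (no predicted or no actual positives).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for illustration only; not taken from the dataset.
y_true = ["Action", "Information", "Action", "Information", "Information", "Action"]
y_pred = ["Action", "Information", "Information", "Information", "Action", "Action"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Comparing models on all three values, rather than accuracy alone, matters when the two categories are unevenly represented, since a model can score high accuracy while missing most texts of the rarer class.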

Academic or Administrative Unit

College of Business

Type

creative component

Copyright

2023