Text Classification Based on Machine Learning: A Review

Yang Sun

Shenyang Normal University

Keywords:

Automatic text classification, Machine learning, Pre-trained language model

Abstract

With the rapid development of the Internet, the volume of text information has grown explosively. Massive amounts of text, such as news articles, social media posts, and academic literature, are constantly being produced. Classifying and managing these texts manually has become time-consuming and inefficient, and can no longer keep up with practical needs. Continuous progress in natural language processing, especially the rise of deep learning methods, provides strong technical support for automatic text classification. Deep learning models can automatically mine the essential features of text from large numbers of samples and capture deep semantic representations, avoiding the tedious manual design of rules and features. In practical applications, text data often coexists with data of other modalities (such as images and audio). Through feature learning on multimodal data, information from multiple modalities can be mapped into a joint vector space to obtain a unified representation of the data, which can make text classification more accurate. In recent years, pre-trained language models such as BERT and GPT have achieved remarkable results. These models learn general-purpose language representations through unsupervised pre-training on large-scale corpora and are then fine-tuned on specific text classification tasks. This can significantly improve classification performance and has further advanced research on automatic text classification. Automatic text classification can sort massive amounts of text into categories quickly and accurately, which facilitates information storage, retrieval, and management. For example, in library and enterprise document management, automatic classification can greatly improve work efficiency and reduce labor costs.
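
The idea of mapping several modalities into a joint vector space can be illustrated with a minimal late-fusion sketch. It assumes that text and image feature vectors have already been extracted by some upstream encoders, and it uses hand-picked, hypothetical projection and classifier weights; in a real model these weights would be learned jointly during training. Each modality is projected into the same 2-dimensional space, L2-normalized, averaged into one joint vector, and passed to a softmax classifier.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2_normalize(v):
    """Scale v to unit length (returns v unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(a - m) for a in z]
    s = sum(exps)
    return [e / s for e in exps]

def joint_representation(text_feat, image_feat, W_text, W_image):
    """Project each modality into the shared space, normalize, then average."""
    t = l2_normalize(matvec(W_text, text_feat))
    i = l2_normalize(matvec(W_image, image_feat))
    return [(a + b) / 2 for a, b in zip(t, i)]

def classify(joint, W_cls, b_cls):
    """Linear layer followed by softmax, giving class probabilities."""
    logits = [z + b for z, b in zip(matvec(W_cls, joint), b_cls)]
    return softmax(logits)

# Hypothetical toy inputs: 4-dim text features, 3-dim image features,
# a 2-dim joint space, and 3 output classes. Real weights would be learned.
text_feat = [0.2, 0.0, 1.0, 0.5]
image_feat = [0.9, 0.1, 0.3]
W_text = [[0.5, 0.1, 0.3, 0.0],
          [0.0, 0.4, 0.2, 0.6]]
W_image = [[0.7, 0.0, 0.2],
           [0.1, 0.5, 0.3]]
W_cls = [[1.0, -1.0],
         [-1.0, 1.0],
         [0.5, 0.5]]
b_cls = [0.0, 0.0, 0.0]

joint = joint_representation(text_feat, image_feat, W_text, W_image)
probs = classify(joint, W_cls, b_cls)
print("joint vector:", [round(v, 3) for v in joint])
print("class probabilities:", [round(p, 3) for p in probs])
```

Averaging the normalized projections is only one fusion choice; concatenating them, or learning attention weights over modalities, are common alternatives that trade simplicity for flexibility.
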
In social media and online public opinion monitoring, automatic text classification can quickly identify texts with different topics and sentiment orientations. This helps track the dynamics of public opinion in a timely manner and provides a basis for governments, enterprises, and other institutions to formulate appropriate response strategies. In customer service, for example online support and customer feedback processing, automatic text classification can identify the type of each customer's question and its sentiment, enabling automated handling and categorization of customer inquiries and improving the efficiency and quality of service. Automatic text classification is an important task in natural language processing, and progress in this area can inform other NLP tasks: its techniques and methods can be applied and extended to tasks such as sentiment analysis, machine translation, and question answering. The technology is also widely applicable in fields such as financial risk assessment, medical text analysis, and legal document classification, where it can help professionals quickly sift through and process large amounts of text, improving work efficiency and the accuracy of decisions.
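
As a concrete illustration of identifying customer question types, the following is a minimal sketch of a classical machine-learning baseline: a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch. The training sentences and the labels `shipping`, `billing`, and `technical` are hypothetical, and this baseline is not presented as the method surveyed in the review; it simply shows the fit/predict structure shared by most text classifiers.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        V = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus summed log likelihoods, to avoid underflow.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + V))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical customer-service training data.
docs = [
    "my package has not arrived yet",
    "where is my order delivery",
    "i was charged twice for my order",
    "please refund the payment",
    "the app crashes when i log in",
    "cannot log in to my account",
]
labels = ["shipping", "shipping", "billing", "billing", "technical", "technical"]

clf = NaiveBayesTextClassifier().fit(docs, labels)
for query in ["refund my payment please", "my order has not arrived", "i cannot log in"]:
    print(query, "->", clf.predict(query))
```

Hand-built bag-of-words features like these are exactly what deep and pre-trained models replace with learned representations, which is why such a baseline is a useful point of comparison.
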
