Over the past decades, one thread has run through the entire research spectrum of natural language processing (NLP): knowledge. With various kinds of knowledge, such as linguistic knowledge, world knowledge, and commonsense knowledge, machines can understand complex semantics at different levels. In this article, we introduce a framework named "knowledgeable machine learning" and use it to revisit existing efforts to incorporate knowledge into NLP, especially the recent breakthroughs in the Chinese NLP community.
Since knowledge is closely tied to human language, the ability to capture and use knowledge is crucial for making machines understand language. As shown in the accompanying figure, before 1990 NLP researchers relied widely on symbolic knowledge formalized by human beings, for example by applying grammar rules for linguistic theories3 and building knowledge bases for expert systems.1 Since 1990, statistical learning and deep learning methods have been widely explored in NLP; in these methods, knowledge is captured automatically from data and stored implicitly in model parameters. The success of recent pretrained language models (PLMs)4,13 on a series of NLP tasks demonstrates the effectiveness of this implicit knowledge. NLP researchers have gradually reached a consensus that making full use of knowledge, both human-friendly symbolic knowledge and machine-friendly model knowledge, is essential for better language understanding.