Research by: Michelle P. Banawan, Jinnie Shin, Tracy Arner, Renu Balyan, Walter L. Leite, & Danielle S. McNamara
EXECUTIVE SUMMARY
Introduction
Academic discourse communities and learning circles are characterized by collaboration, sharing commonalities in terms of social interactions and language. The discourse of these communities is composed of jargon, common terminologies, and similarities in how they construe and communicate meaning.
To communicate effectively, parties in a conversation act cooperatively, and this cooperation is reflected in lexical choices that increase understanding and reduce miscommunication. The conversing parties use common terminologies or, at times, agree to a change in terminology to overcome language ambiguities as conversations progress. Linguistic or lexical mimicry describes coordinated communication that involves the repetition of words and phrases among discourse participants. Semantic entrainment is a related form of linguistic adaptation among discourse participants, and it is measured using semantic similarity. Semantics, in linguistics, is the study of the meanings of words and sentences. Lexical semantics describes how the meanings of words are understood in the context in which they are used, and it is derived from the relationships between words in a sentence. Semantic similarity, which serves as the measure of semantic entrainment, indicates whether two texts have similar or different meanings. It is computed from word embeddings, sentence embeddings, or both, and yields a similarity metric that represents the distance between the texts being compared.
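As a rough illustration of how a similarity metric can be derived from embeddings, the sketch below computes cosine similarity between vectors. This is an assumption for illustration only: the summary does not state which metric the study used, and the four-dimensional vectors are made-up stand-ins for real embeddings, which typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for three posts (illustrative numbers only).
post_a = [0.9, 0.1, 0.3, 0.0]
post_b = [0.8, 0.2, 0.4, 0.1]
post_c = [0.0, 0.9, 0.1, 0.7]

sim_ab = cosine_similarity(post_a, post_b)  # vectors pointing in similar directions
sim_ac = cosine_similarity(post_a, post_c)  # vectors pointing in different directions
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Values near 1 indicate vectors that point in nearly the same direction, which is read as similar meaning; values near 0 indicate little overlap.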
The objective of this study is to measure the similarity of words and texts in threaded discussions on an online math tutoring platform using Deep Natural Language Processing (Deep NLP). However, scaling state-of-the-art Deep NLP models so that they accurately capture the subtle nuances inherent in mathematical discourse remains a significant challenge.
Methods
A multi-method analytical approach is used, including (1) feature engineering using state-of-the-art natural language processing techniques to select the most appropriate features, (2) a bag-of-words classification model to predict linguistic similarity, (3) explainable AI using Local Interpretable Model-agnostic Explanations (LIME) to explain the model, and (4) a two-step cluster analysis to extract innate groupings based on linguistic similarity and emotion.
The NLP methods used in this study were: (1) textual similarity calculation using pre-trained deep NLP and neural language models (i.e., spaCy language model and Universal Sentence Encoder); (2) emotion feature extraction (i.e., valence, arousal, polarity) using VADER; (3) derivation of trust affect measures using EmoLex; (4) textual classification modeling using TF-IDF word vectors as features; and (5) explainable AI using the LIME package for model interpretation.
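To make the TF-IDF feature step more concrete, the following is a minimal sketch of how posts can be turned into TF-IDF word vectors. It is not the study's pipeline: the function name `tfidf_vectors`, the whitespace tokenizer, the example posts, and the specific weighting (relative term frequency multiplied by `log(N / df) + 1`) are all assumptions chosen for illustration, and library implementations use slightly different smoothing and normalization.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, vectors) with weights tf * (log(N / df) + 1)."""
    n_docs = len(tokenized_docs)
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([
            (counts[tok] / total) * (math.log(n_docs / df[tok]) + 1.0)
            for tok in vocab
        ])
    return vocab, vectors


# Hypothetical forum posts (not taken from the study's data).
posts = [
    "solve for x in the equation",
    "factor the equation to solve for x",
    "thanks that helped a lot",
]
tokenized = [post.lower().split() for post in posts]
vocab, vectors = tfidf_vectors(tokenized)

for post, vec in zip(posts, vectors):
    # Show the three highest-weighted terms for each post.
    top = sorted(zip(vocab, vec), key=lambda pair: pair[1], reverse=True)[:3]
    print(post, "->", [(tok, round(weight, 3)) for tok, weight in top])
```

Vectors of this kind can then be passed as features to a classifier, which is the role TF-IDF plays in item (4) above.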
Summary of Findings
We found that linguistic similarity varied significantly within and between the threaded discussions, revealing the dynamic and unconstrained nature of the discourse. Further, word choice moderately predicted linguistic similarity between posts within threaded discussions (accuracy = 0.73; F1-score = 0.67), indicating that discourse participants’ lexical choices discriminate between posts in terms of similarity. Lastly, cluster analysis revealed profiles that are distinctly characterized in terms of linguistic similarity, trust, and affect. Our findings demonstrate the potential role of linguistic similarity in supporting social cohesion and affinity within online discourse communities.
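For readers less familiar with the reported metrics, the sketch below shows how accuracy and F1-score are computed for a binary classifier. The labels are toy values invented for this example and are unrelated to the study's data; the summary also does not say whether the reported F1-score is binary or averaged across classes, so the binary form used here is an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy labels: 1 = "similar" post pair, 0 = "dissimilar" (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, F1 = {f1:.2f}")
```

Accuracy counts all correct predictions, whereas F1 balances precision and recall for the target class, which is why the two values can differ on the same predictions.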
Implications for the Scalability of NLP Language Models in the Mathematics Domain
State-of-the-art semantic extraction and information retrieval have made great strides in capturing context across domains. In mathematical domains, however, scalability still falls well short of what is desired. Capturing mathematical context within deep NLP remains challenging because of the inherent structure of mathematical texts. Our findings corroborate this scalability challenge.
To cite this article: Banawan, M. P., Shin, J., Arner, T., Balyan, R., Leite, W. L., & McNamara, D. S. (2023). Shared language: Linguistic similarity in an algebra discussion forum. Computers, 12(3), 53. http://dx.doi.org/10.3390/computers12030053
To access this article: http://dx.doi.org/10.3390/computers12030053
About the Journal
Computers is an international, scientific, peer-reviewed, open-access journal of computer science, including computer and network architecture and computer–human interaction as its main foci, published monthly online by MDPI.
Journal ranking
Chartered Association of Business Schools Academic Journal Guide 2021 | Not ABS rank |
Scimago Journal & Country Rank | h-Index: 24; SJR 2021: 0.56 |
Scopus | CiteScore 2021: 3.7 |
Australian Business Deans Council Journal List | Not ABDC rank |
Journal Citation Reports (Clarivate) | JCI 2021: 0.44 |