Design and implementation of a Chatbot for answering questions on scientometric indicators
Abstract
Scientometrics is the study and evaluation of measures of scientific activity, such as the impact of research papers and academic journals. The field is essential because university rankings are built on scientometric indicators, and universities themselves use those indicators as Key Performance Indicators (KPIs). The first objective of this research is to propose a semantic model of scientometric indicators by building a statistical ontology that extends the Statistical Data and Metadata Exchange (SDMX) standard. We develop a case study at Tecnologico de Monterrey following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. We evaluate the benefits of storing and querying scientometric indicators as linked data in Neo4j, which provides a flexible, quickly accessible knowledge representation that supports indicator retrieval, discovery, and composition based on a self-knowledge strategy. The semantic representation can answer four kinds of request: simple queries filtered by dimensions, queries that return values over time intervals, aggregation functions such as average and standard deviation, and the calculation of a new scientometric indicator from data stored in the ontology.
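The kinds of request listed above can be illustrated with a minimal in-memory sketch. It is not the ontology or the Neo4j store described in this work: the `Observation` record, the `school` dimension, the helper names, and all values are hypothetical and chosen only to show the shape of each request.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Observation:
    """One indicator value, loosely modeled on an SDMX-style observation."""
    indicator: str  # hypothetical indicator name
    school: str     # hypothetical dimension
    year: int       # time dimension
    value: float


# Made-up illustrative values, not data from the case study.
OBSERVATIONS = [
    Observation("publications", "Engineering", 2018, 100.0),
    Observation("publications", "Engineering", 2019, 120.0),
    Observation("publications", "Engineering", 2020, 140.0),
    Observation("citations", "Engineering", 2018, 300.0),
    Observation("citations", "Engineering", 2019, 480.0),
    Observation("citations", "Engineering", 2020, 700.0),
]


def select(indicator, school, start_year, end_year):
    """Return values for one indicator and dimension over a closed year interval."""
    return [
        o.value
        for o in OBSERVATIONS
        if o.indicator == indicator
        and o.school == school
        and start_year <= o.year <= end_year
    ]


def derived_ratio(numerator, denominator, school, year):
    """Compose a new indicator as the ratio of two stored indicators."""
    num = select(numerator, school, year, year)
    den = select(denominator, school, year, year)
    if not num or not den or den[0] == 0:
        raise ValueError("missing or zero-valued observation")
    return num[0] / den[0]


single = select("publications", "Engineering", 2019, 2019)   # simple query
series = select("publications", "Engineering", 2018, 2020)   # time interval
average, spread = mean(series), stdev(series)                # aggregation
per_paper = derived_ratio("citations", "publications", "Engineering", 2019)
```

In a linked-data store the same four requests would be expressed as graph queries; the sketch only fixes what each request takes as input and returns.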
The second objective of this research is to integrate the proposed statistical ontology model of scientometric indicators into a chatbot. Building a chatbot requires Natural Language Processing (NLP) to recognize the user's intent and extract entities from the user's question. We propose a method that recognizes the requested indicator and transforms the question, expressed in natural language, into a query against the semantic model. Together, the chatbot and the ontology model form a novel framework that can answer questions from the Research Office about scientometric indicators. The chatbot is evaluated in two ways. The first is the Goal Completion Rate (GCR), which measures how many questions the chatbot answers correctly, with the intent correctly identified and the entities correctly extracted. The second is a survey that focuses on usability, the strictness of language variations, chatbot comprehension, correlation in chatbot responses, and user satisfaction.
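As a rough illustration of the GCR described above, the sketch below counts a question as completed only when the intent, the entities, and the final answer are all correct, and divides by the number of questions asked. That reading of the metric, the `Outcome` record, and the sample outcomes are assumptions made for illustration, not the evaluation protocol or results of this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-question evaluation record."""
    intent_ok: bool    # intent correctly identified
    entities_ok: bool  # entities correctly extracted
    answer_ok: bool    # returned answer is correct


def goal_completion_rate(outcomes):
    """Fraction of questions for which every check passed."""
    if not outcomes:
        raise ValueError("at least one evaluated question is required")
    completed = sum(
        1 for o in outcomes if o.intent_ok and o.entities_ok and o.answer_ok
    )
    return completed / len(outcomes)


# Made-up outcomes for four test questions.
sample = [
    Outcome(True, True, True),
    Outcome(True, False, False),  # entity extraction failed
    Outcome(True, True, True),
    Outcome(True, True, True),
]
gcr = goal_completion_rate(sample)
```

Keeping the three checks separate makes it possible to see whether a failed question was lost at intent recognition, at entity extraction, or at query execution.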
The main contribution of this research is a structural representation of the types of questions that can be asked about indicators modeled with SDMX. We simplify model training and question interpretation by defining complexity levels and extracting entities from the question. We demonstrate how a chatbot can answer questions about any indicator modeled with SDMX. The chatbot can be trained to recognize other ways of formulating a question without affecting the semantic representation of the indicators. The model is scalable because more indicators can be added using RDF, and the chatbot then requires only minor changes (e.g., adding new dimensions).