Analysis and use of textual definitions through a transformer neural network model and natural language processing
Baltazar Reyes, Germán Eduardo
There is currently an information overload problem: data is excessive, disorganized, and presented statically. These three problems are closely tied to the vocabulary used in each document, since the usefulness of a document depends directly on how much of its vocabulary the reader understands. At the same time, there are multiple Machine Learning algorithms and applications that analyze the structure of written information. However, most implementations focus on the bigger picture of text analysis, namely understanding the structure and use of complete sentences and generating new documents as long as the originals. This focus directly affects the static presentation of data. For these reasons, this proposal evaluates the semantic similarity between a complete phrase or sentence and a single keyword, following the structure of a regular dictionary, where a descriptive sentence explains and shares the exact meaning of a single word. The model uses a GPT-2 Transformer neural network to interpret a descriptive input phrase and generate a new phrase that addresses the same abstract concept and is similar to a particular keyword. The generated text is validated by a Universal Sentence Encoder network, which was fine-tuned to relate the semantic similarity between the full set of words in a sentence and its corresponding keyword. The results demonstrated that the proposal could generate new phrases that resemble the general context of the descriptive input sentence and the ground truth keyword. At the same time, the validation of the generated text assigned a higher similarity score to these phrase-word pairs. Nevertheless, this process also showed that deeper analysis is still needed to weigh and separate the context of different pairs of textual inputs.
In general, this proposal opens a new area of study for analyzing the abstract relationship of meaning between sentences and particular words, and how an ordered sequence of words can be detected as similar to a single term. This is a different direction of text analysis from the one currently pursued by most of the Natural Language Processing community.
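As an illustration of the validation step described in the abstract, the sketch below scores candidate phrases against a keyword by cosine similarity between embedding vectors. It is a minimal sketch under stated assumptions, not the thesis implementation: the three-dimensional vectors are hypothetical toy values standing in for the outputs of a sentence encoder, the function names `cosine_similarity` and `rank_phrases` are invented for this example, and cosine similarity is assumed as the comparison measure.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_phrases(keyword_vec, phrase_vecs):
    """Sort (phrase, score) pairs from most to least similar to the keyword."""
    scored = [
        (text, cosine_similarity(keyword_vec, vec))
        for text, vec in phrase_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings (assumption): a real pipeline would obtain
# these vectors from a sentence encoder instead of writing them by hand.
TOY_KEYWORD_VEC = [0.9, 0.1, 0.0]  # stands in for the keyword "dog"
TOY_PHRASE_VECS = {
    "a domesticated animal that barks": [0.8, 0.2, 0.1],
    "a vehicle with four wheels": [0.1, 0.1, 0.9],
}

if __name__ == "__main__":
    for text, score in rank_phrases(TOY_KEYWORD_VEC, TOY_PHRASE_VECS):
        print(f"{score:.3f}  {text}")
```

With these toy vectors, the phrase about a barking animal ranks above the phrase about a vehicle, which mirrors the dictionary-style pairing of a descriptive sentence with a single word.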