Emotion Analysis in Spanish Texts Using Machine Translation and Multilingual BERT Models (Análisis de emociones en textos en español mediante traducción Automática y modelos BERT Multilingües)

Abraham Jorge Jiménez Alfaro; Griselda Cortes Barrera; Norma Karen Valencia Vázquez; Jhacer-Kharen Ruiz-Garduño; Claudia-Teresa González-Ramírez

doi:10.5281/zenodo.17525359

Authors

Abraham Jorge Jiménez Alfaro TECNM/TES Ecatepec https://orcid.org/0000-0003-3058-9082
Griselda Cortes Barrera Tecnológico Nacional de México/TES Ecatepec https://orcid.org/0000-0002-1159-0769
Norma Karen Valencia Vázquez TECNM/TESCHI https://orcid.org/0000-0002-6000-5925
Jhacer-Kharen Ruiz-Garduño TECNM/ITZ https://orcid.org/0000-0002-3003-8072
Claudia-Teresa González-Ramírez TECNM/ITZ https://orcid.org/0000-0002-1159-0769

Keywords:

Emotion analysis, Natural Language Processing (NLP), Pre-trained models (BERT)

Abstract

Emotion analysis in written text using natural language processing (NLP) techniques is an expanding research area with applications in mental health, marketing, education, and recommendation systems. This article proposes a systematic approach, implemented as an NLP pipeline, that classifies emotions in Spanish texts by leveraging pretrained models originally developed for English. Because the most advanced emotion detection models, such as those based on BERT (Bidirectional Encoder Representations from Transformers), have been trained primarily on English datasets, the proposed solution first translates the Spanish texts into English automatically using the Helsinki-NLP/opus-mt-es-en model. The translated texts are then processed by a DistilRoBERTa model fine-tuned for emotion classification (j-hartmann/emotion-english-distilroberta-base), which predicts an emotional category from labels such as joy, sadness, anger, fear, love, and surprise. The pipeline is implemented in Python using the Hugging Face Transformers library for translation and classification and Scikit-learn for the statistical evaluation of model performance. Predictions are compared against ground-truth labels, and the confusion matrix, precision, recall, specificity, accuracy, and F1-scores (macro and weighted) are computed to assess system effectiveness.
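The translate-then-classify flow described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (build_hf_components, classify_spanish) are hypothetical, and the sketch assumes the Hugging Face Transformers library is installed and that both checkpoints named in the abstract can be downloaded. The two steps are passed in as plain callables so that the core logic can be exercised without loading any model.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for the two stages of the pipeline.
Translator = Callable[[List[str]], List[str]]
Classifier = Callable[[List[str]], List[str]]


def build_hf_components() -> Tuple[Translator, Classifier]:
    """Load the two checkpoints named in the abstract (downloaded on first use)."""
    # Imported lazily so the rest of this sketch works without transformers.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
    classifier = pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
    )

    def translate(texts: List[str]) -> List[str]:
        # Each result is a dict holding the English text.
        return [out["translation_text"] for out in translator(texts)]

    def classify(texts: List[str]) -> List[str]:
        # With default settings each result holds only the top-scoring label.
        return [out["label"] for out in classifier(texts)]

    return translate, classify


def classify_spanish(
    texts: List[str], translate: Translator, classify: Classifier
) -> List[Tuple[str, str, str]]:
    """Translate Spanish texts to English, then label each with an emotion.

    Returns one (spanish_text, english_text, predicted_label) tuple per input.
    """
    english = translate(texts)
    labels = classify(english)
    return list(zip(texts, english, labels))
```

A possible call, assuming the models are available locally or can be fetched, would be `translate, classify = build_hf_components()` followed by `classify_spanish(["Estoy muy feliz hoy."], translate, classify)`. Keeping the English translation in the output makes it easier to tell whether a wrong label comes from the translation step or from the classifier.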

The results show an overall accuracy of 83%, indicating that, despite the language barrier, combining automatic translation with robust pretrained models can produce reliable and replicable emotion classification for Spanish texts. This study highlights the potential of integrating multilingual NLP tools into real-world affective analysis applications.
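To make the evaluation metrics named in the abstract concrete, the sketch below computes them in plain Python from paired ground-truth and predicted labels. It is an illustration of the standard one-vs-rest definitions, not the authors' evaluation script, which uses Scikit-learn. The function names and the toy labels are invented for this example and are unrelated to the reported 83% accuracy.

```python
from collections import Counter
from typing import Dict, List


def per_class_metrics(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall, specificity and F1 for each label."""
    total = len(y_true)
    # Counts of (true, predicted) pairs, i.e. the cells of the confusion matrix.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
            "support": tp + fn,  # number of true examples of this label
        }
    return report


def summarize(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, float]:
    """Accuracy plus macro- and support-weighted F1 over the given labels."""
    report = per_class_metrics(y_true, y_pred, labels)
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    weighted_f1 = sum(m["f1"] * m["support"] for m in report.values()) / total
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}


# Toy data, invented for illustration only.
toy_true = ["joy", "joy", "sadness", "anger"]
toy_pred = ["joy", "sadness", "sadness", "anger"]
toy_labels = ["joy", "sadness", "anger"]
print(per_class_metrics(toy_true, toy_pred, toy_labels))
print(summarize(toy_true, toy_pred, toy_labels))
```

Macro F1 averages the per-class scores equally, while weighted F1 scales each class by its number of true examples, so the two diverge when the emotion classes are imbalanced. Specificity is derived here from the true-negative and false-positive counts of each class.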

Author Biographies

Abraham Jorge Jiménez Alfaro, TECNM/TES Ecatepec

Research Professor

Griselda Cortes Barrera, Tecnológico Nacional de México/TES Ecatepec

National Laboratory

Norma Karen Valencia Vázquez, TECNM/TESCHI

https://orcid.org/0000-0002-6000-5925

Jhacer-Kharen Ruiz-Garduño, TECNM/ITZ

https://orcid.org/0000-0002-3003-8072

Claudia-Teresa González-Ramírez, TECNM/ITZ

https://orcid.org/0000-0002-1159-0769

References

Cañete, J., Chaperon, G., Fuentes, R., Ho, J.-H., Kang, H., & Pérez, J. (2020). Spanish pre-trained BERT model and evaluation data. Proceedings of the Practical ML for Developing Countries Workshop at ICLR 2020.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171–4186.

Dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 69–78.

Hartmann, J. (2022). j-hartmann/emotion-english-distilroberta-base. Hugging Face.
