Research Projects

Project ROBOT-TALK

Recognizing the Origin of roBOtic Texts. Task Automatization and Linguistic Knowledge (ROBOT-TALK)

Project PID2022-140897OB-I00 funded by MICIU/AEI/10.13039/501100011033/ and FEDER, “Una manera de hacer Europa”

September 2023-2026

ROBOT-TALK is a multidisciplinary applied research project that addresses the digital transition by determining semi-automatically whether digital texts “have been produced by automated processes without human intervention” (Carta Derechos Digitales 2021, Chapter XV. 2). In essence, it consists of applying a “Turing Test” to digital texts (Turing 1950). Its results will help to guarantee the right to receive accurate information (Carta Derechos Digitales 2021) and to ensure information transparency. By detecting whether a text has been generated by a human agent or a machine, the project will also contribute to scientific knowledge in both forensic linguistics and computational linguistics, with applications in cybersecurity.

The objective of the project is to define a methodology based on linguistic knowledge to detect auto-generated texts (by robots) in Spanish, in order to help improve current approaches to the identification and classification of such texts. Our hypothesis is that, by applying methods from forensic linguistics, it will be possible to obtain the idiolect, i.e. the linguistic signature, of the author of an anonymous text by means of linguistic analysis/profiling, whether that author is a human or a machine. To achieve this, six goals have been established:

  1. The creation of a (comparable) corpus of texts generated by humans vs. machines
  2. The study of the linguistic capabilities/knowledge of generative LLMs (here called robots)
  3. The analysis of the strengths and weaknesses of the methods employed in forensic linguistics in order to characterize and identify robotic authorship
  4. The analysis of the strengths and weaknesses of the current automatic systems employed to identify robotic texts generated in Spanish
  5. The elaboration of a methodological proposal for the identification of robotic authorship of texts
  6. The development of a proof of concept to verify the effectiveness of the proposal