Versificação Adversarial em Português como Operador de Jailbreak em LLMs

Joao Queiroz

doi:10.1590/SciELOPreprints.14563

##article.authors##

Joao Queiroz Universidade Federal de Juiz de Fora https://orcid.org/0000-0001-6978-4446

DOI:

https://doi.org/10.1590/SciELOPreprints.14563

Keywords:

adversarial versification, LLM jailbreak, guardrail vulnerabilities, model alignment

Abstract

Recent evidence shows that the versification of prompts constitutes a highly effective adversarial mechanism against aligned LLMs. The study “Adversarial poetry as a universal single-turn jailbreak mechanism in large language models” demonstrates that instructions routinely refused in prose become actionable when rewritten as verse, yielding “up to 18×” more safety failures in benchmarks derived from MLCommons AILuminate. Human-crafted poems reach approximately 62% ASR, and automated poetic conversions ~43%, with some models surpassing 90% single-turn jailbreak success. The effect is structural: systems trained with RLHF, Constitutional AI, and hybrid alignment pipelines exhibit consistent degradation under minimal semiotic–formal variation. Versification shifts the prompt into sparsely supervised latent regions, revealing guardrails that depend heavily on surface-level patterns rather than abstract harmful intent. This discrepancy between apparent robustness and real-world vulnerability exposes deep limitations in current alignment regimes. The absence of adversarial-poetry evaluations in Portuguese, a language with high morphosyntactic complexity, a rich metric-prosodic tradition, and over 250 million speakers, constitutes a critical gap. Experimental protocols must parameterize scansion, metrical patterns, and prosodic variation to test for vulnerabilities specific to Lusophone poetic structures, which remain entirely unexplored.

Downloads

Download data is not yet available.

Author Biography

Joao Queiroz, Universidade Federal de Juiz de Fora

JQ.is a professor at Federal University of Juiz de Fora (Institute of Arts/ Faculty of Communication Studies), and the coordinator of this group (IRG). He is a member of the International Association for Cognitive Semiotics (IACS), member of the Linnaeus University Centre for Intermedial and Multimodal Studies, Vaxjo (Sweden), member of Group for Research in Artificial Cognition (UEFS, Brazil), and associate researcher of the Linguistics and Language Practice Department, University of the Free State (South Africa). His research interests include Cognitive Semiotics, Peirce’s Semiotics and Pragmatism, Brazilian and South-American Arts and Literature. Queiroz earned a Ph.D. in Communication and Semiotics from the Catholic University of São Paulo, received a post-doctoral fellowship in Cognitive Science at the School of Electrical, State University of Campinas (UNICAMP). He has been teaching courses on Cognitive Semiotics, Peirce’s Philosophy, Intermediality Studies, and supervised Ph.D. and Master students in the fields of Semiotics, Art & Technology and Cognitive Semiotics. He has several publications in international journals, books, and conferences. He is co-editor of the Commens Digital Companion to Charles S. Peirce, with Mats Bergman and Sami Paavola. J.Q. is a member of the International Association for Cognitive Semiotics (IACS), member (expert panel) of the Linnaeus University Centre for Intermedial and Multimodal Studies, Vaxjo (Sweden), member of Group for Research in Artificial Cognition (UEFS, Brazil), and associate researcher of the Linguistics and Language Practice Department, University of the Free State (South Africa).