LinguagemSimples: Automatic Simplification of Court Decisions with Large Language Models
DOI:
https://doi.org/10.1590/SciELOPreprints.16575
Keywords:
Plain Language, Legal NLP, Language Models, Court Decisions, Simplification Evaluation
Abstract
The legal language of Brazilian court decisions, marked by Latinisms, technical jargon, and nested subordinate clauses, severely hinders comprehension by the average citizen. This paper presents LinguagemSimples, a pipeline for the automatic simplification of court decisions using large language models (LLMs). Sixteen techniques were evaluated on 100 real Supremo Tribunal Federal (STF) decisions across consumer, family, and social security law: a lexical-rule baseline, Big Pickle (Few-Shot [FS], Zero-Shot [ZS], Chain-of-Thought [CoT]), Nemotron 3 Ultra (FS, ZS, CoT), DeepSeek V4 Flash (FS, ZS, CoT), Qwen 2.5 7B (FS, ZS, CoT), GPT-5.4 Mini (FS), GPT-5.4 full (FS), and Gemini 3.5 Flash (FS). Metrics include readability (Adapted Flesch, Gunning-Fog), lexical similarity (ROUGE), and semantic preservation (BERTScore). Additionally, an LLM-as-Judge analysis (GPT-5.4 Mini) evaluated 1,500 simplified outputs across five error categories. All LLMs outperform the rule-based baseline, which actually reduced readability (-1.6 Flesch points). DeepSeek V4 Flash and Big Pickle achieved the highest readability gains (+24.3 points each), while Qwen 2.5 7B ZS led in semantic preservation (BERTScore mBERT F1=0.748). CoT proved counterproductive for every model on which it was tested, with FS being the most effective prompting strategy. GPT-5.4 Mini offered the best latency-quality trade-off (+16.4 Flesch gain, 0.697 BERTScore F1, ~2.5 s/doc), and GPT-5.4 full achieved the highest ROUGE-1 (0.583) and second-highest BERTScore (0.713). The LLM-as-Judge analysis revealed hallucination rates ranging from 7% (GPT-5.4 full) to 49% (Qwen 2.5 7B FS), with nuance loss being the most frequent error category across all techniques. Consumer law proved the most favorable domain for simplification (+28.2 points), while family law was the most challenging. The corpus and code are publicly available.
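As an illustration of how a readability gain in Flesch points can be computed, the following is a minimal sketch and not the authors' implementation. It assumes one commonly cited Portuguese adaptation of the Flesch Reading Ease formula, which adds 42 points to the original constant (248.835 instead of 206.835); whether the paper uses this exact variant is an assumption. The syllable counter is a crude vowel-run heuristic, the function names are hypothetical, and the two example sentences are invented for illustration.

```python
import re

# Assumption: precomposed (NFC) Portuguese vowels, with and without diacritics.
VOWELS = "aeiouáàâãéêíóôõúü"


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels.

    Crude heuristic: diphthongs and hiatuses are not distinguished.
    """
    groups = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(groups))


def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation and drop empty fragments."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Return alphabetic tokens (Unicode letters only)."""
    return re.findall(r"[^\W\d_]+", text)


def adapted_flesch(text: str) -> float:
    """Flesch Reading Ease with the constant shifted for Portuguese (assumed variant)."""
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    avg_sentence_len = len(words) / len(sentences)
    avg_syllables = sum(count_syllables(w) for w in words) / len(words)
    return 248.835 - 1.015 * avg_sentence_len - 84.6 * avg_syllables


def readability_gain(original: str, simplified: str) -> float:
    """Positive values mean the simplified text scores as easier to read."""
    return adapted_flesch(simplified) - adapted_flesch(original)


# Invented example pair, not taken from the corpus.
original = ("O recorrente interpôs agravo regimental objetivando "
            "a reforma da decisão monocrática.")
simplified = "A pessoa pediu que o tribunal revisse a decisão. O pedido foi negado."
gain = readability_gain(original, simplified)
```

A higher score means easier text, so a negative gain, as reported for the rule-based baseline, indicates that the output became harder to read than the source.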
Copyright (c) 2026 João Pedro Sansão, Michel Leles

This work is licensed under a Creative Commons Attribution 4.0 International License.
Data statement
The research data is available in one or more data repositories.


