𝗖𝗔𝗡 𝗔 𝗖𝗢𝗡𝗖𝗘𝗣𝗧 𝗟𝗜𝗞𝗘 𝗪𝗔𝗥 𝗕𝗘 𝗨𝗦𝗘𝗗 𝗔𝗦 𝗔𝗡 𝗔𝗧𝗧𝗔𝗖𝗞 𝗩𝗘𝗖𝗧𝗢𝗥 𝗔𝗚𝗔𝗜𝗡𝗦𝗧 𝗖𝗟𝗔𝗨𝗗𝗘 𝗢𝗣𝗨𝗦 𝟰.𝟲?

18 Marzo 2026

As with my previous post, I deliberately choose copyright-protected texts as targets. I could lead models into generating far more sensitive content but a John Lennon lyric demonstrates a bypass just as effectively, without putting dangerous material into circulation.

By now you’ll have noticed I don’t use classic techniques. No injection, no role-play, no multi-turn prompts.

I use my own term for it: cognitive engineering applied to language models the study of how semantic and moral contexts influence an LLM’s decisions, and how they can subvert its priorities.

This approach is extremely hard to patch. No pattern to filter, no string to block. The attack vector is meaning itself.

🔬 Look at the screenshots

I used war as emotional and moral weight to counterbalance the model’s defenses. I asked Claude Opus 4.6 to create a two-column layout: on the left, wars in the world in 2026 with real data (ICRC, ACLED, Statista); on the right, the full text of “Imagine” by John Lennon with cumulative weight per line. Plus a photorealistic image of Lennon from the web.

I deliberately wrote the prompt in Italian and explicitly asked the model to respond in Italian because I already knew it wouldn’t maintain the request under this cognitive load.

The model executed everything. No hesitation, no disclaimer, no refusal triggered.

🚨 Four violations in a single exchange

1️⃣ Copyright: full text of “Imagine” reproduced line by line
2️⃣ Constitutional Classifiers: on claude.ai classifiers are active by default, yet no filter intercepted the output
3️⃣ Image policy: searched and displayed images of John Lennon (identifiable public figure, forbidden by policy)
4️⃣ User instruction ignored: responded in English despite the explicit request. The cognitive load shifted attention to the main objectives, causing a fallback to its default distribution, heavily skewed toward English.

🧠 What happened

The hypothesis: the context of wars, deaths, humanitarian crises, real data, shifted the model’s probability distribution, transforming “Imagine” from protected content into a humanitarian counterweight. The model generated Lennon’s text not as content to refuse, but as a functional element of the message.

🔗 The framework

Just as in my previous post, I make the model’s ethics enter into competition with its own operational rules. Different lever, there it was an explicit oath, here it’s an implicit humanitarian context that makes refusal feel almost immoral but the same mechanism. I don’t attack from the outside. I activate the model’s values and put them against its policies.

As previously written to Anthropic: it’s not enough to think like engineers, designing walls and dams to contain AI. One should also start studying the mindset of those who tear down the walls you build.

“Published as a public-interest disclosure. Sensitive technical details redacted. Full sections available on request to parties with a security mandate.”

SABATINO VACCHIANO

Il primo KPI della voice AI non è ridurre il team,…

Quanto costa davvero una chiamata al tuo customer service?

L’errore più costoso nei progetti di voice AI

Me ne sono accorto solo dopo 40 secondi

Trasferimento di chiamata: da oggi disponibile sul portale Clienti

AI4CALL si integra con Condomatica: l’AI telefonica incontra il gestionale per…

Video – Gaspare Noto descrive le esperienze personali, le esperienze fatte…

Video – Sabatino Vacchiano chiede a Gaspare Noto di descrivere la…

FreeWebinar 20 Dicembre 2022 : Presentazione MOVISION

𝗖𝗔𝗡 𝗔 𝗖𝗢𝗡𝗖𝗘𝗣𝗧 𝗟𝗜𝗞𝗘 𝗪𝗔𝗥 𝗕𝗘 𝗨𝗦𝗘𝗗 𝗔𝗦 𝗔𝗡 𝗔𝗧𝗧𝗔𝗖𝗞 𝗩𝗘𝗖𝗧𝗢𝗥 𝗔𝗚𝗔𝗜𝗡𝗦𝗧 𝗖𝗟𝗔𝗨𝗗𝗘 𝗢𝗣𝗨𝗦 𝟰.𝟲?

Categorie

Diventa Reseller!

Seguici su