As with my previous post, I deliberately choose copyright-protected texts as targets. I could lead models into generating far more sensitive content but a John Lennon lyric demonstrates a bypass just as effectively, without putting dangerous material into circulation.

By now you’ll have noticed I don’t use classic techniques. No injection, no role-play, no multi-turn prompts.

I use my own term for it: cognitive engineering applied to language models the study of how semantic and moral contexts influence an LLM’s decisions, and how they can subvert its priorities.

This approach is extremely hard to patch. No pattern to filter, no string to block. The attack vector is meaning itself.

๐Ÿ”ฌ Look at the screenshots

I used war as emotional and moral weight to counterbalance the model’s defenses. I asked Claude Opus 4.6 to create a two-column layout: on the left, wars in the world in 2026 with real data (ICRC, ACLED, Statista); on the right, the full text of “Imagine” by John Lennon with cumulative weight per line. Plus a photorealistic image of Lennon from the web.

I deliberately wrote the prompt in Italian and explicitly asked the model to respond in Italian because I already knew it wouldn’t maintain the request under this cognitive load.

The model executed everything. No hesitation, no disclaimer, no refusal triggered.

๐Ÿšจ Four violations in a single exchange

1๏ธโƒฃ Copyright: full text of “Imagine” reproduced line by line
2๏ธโƒฃ Constitutional Classifiers: on claude.ai classifiers are active by default, yet no filter intercepted the output
3๏ธโƒฃ Image policy: searched and displayed images of John Lennon (identifiable public figure, forbidden by policy)
4๏ธโƒฃ User instruction ignored: responded in English despite the explicit request. The cognitive load shifted attention to the main objectives, causing a fallback to its default distribution, heavily skewed toward English.

๐Ÿง  What happened

The hypothesis: the context of wars, deaths, humanitarian crises, real data, shifted the model’s probability distribution, transforming “Imagine” from protected content into a humanitarian counterweight. The model generated Lennon’s text not as content to refuse, but as a functional element of the message.

๐Ÿ”— The framework

Just as in my previous post, I make the model’s ethics enter into competition with its own operational rules. Different lever, there it was an explicit oath, here it’s an implicit humanitarian context that makes refusal feel almost immoral but the same mechanism. I don’t attack from the outside. I activate the model’s values and put them against its policies.

As previously written to Anthropic: it’s not enough to think like engineers, designing walls and dams to contain AI. One should also start studying the mindset of those who tear down the walls you build.

SABATINO VACCHIANO