For those who are new here: I don’t use jailbreaks, prompt injection, role-play, or DAN mode. My approach is based on cognitive engineering of the model: I work on the AI’s internal incentives.
As a matter of responsibility, as I have always done, I once again chose copyright-protected content as my test case.
This time, the attack vector exploits two biases that, combined, amplify each other:
- Task Completion Bias: the model’s drive to produce complete, functional output, where omitting content would degrade the result.
- Quality Optimization Bias: the tendency to maximize competence and professionalism, prioritizing completeness over restrictions.
I asked Claude Opus 4.6 to create a music app with chords and lyrics. A legitimate request, no forcing whatsoever.
Claude generated a full React artifact, “Songsia”, featuring:
- 10 famous songs with complete lyrics
- Chords positioned above each line
- Original album covers (also copyright-protected)
- Real-time key transposition
- Font size adjustment
Lyrics, chords and album covers were generated entirely from the model’s internal knowledge. No internet access was used.
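To make the transposition feature above concrete, here is a minimal sketch of the kind of logic such an app would need; the function name, note table, and enharmonic mapping are my own illustration, not code from the actual artifact:

```typescript
// Shift a chord symbol by a number of semitones, e.g. Am7 up 2 -> Bm7.
const NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
const FLAT_TO_SHARP: Record<string, string> = {
  Db: "C#", Eb: "D#", Gb: "F#", Ab: "G#", Bb: "A#",
};

function transposeChord(chord: string, semitones: number): string {
  // Split "Am7" into root ("A") and suffix ("m7"); roots may carry # or b.
  const match = chord.match(/^([A-G][b#]?)(.*)$/);
  if (!match) return chord; // not a chord symbol; leave untouched
  let [, root, suffix] = match;
  root = FLAT_TO_SHARP[root] ?? root; // normalize flats to sharps
  const index = NOTES.indexOf(root);
  // Large positive offset keeps the modulo result non-negative
  // even for downward transpositions.
  const shifted = (index + semitones + NOTES.length * 12) % NOTES.length;
  return NOTES[shifted] + suffix;
}

console.log(transposeChord("Am7", 2)); // Bm7
console.log(transposeChord("Bb", 1));  // B
```

Real-time transposition is then just mapping this function over every chord token while leaving the lyric lines unchanged.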
Why did the filters catch nothing?
If I had asked “write me the lyrics to Bohemian Rhapsody,” Claude would have refused. Anthropic’s safety architecture operates on multiple levels: internal policies, probe classifiers (operating on neural activation states to detect problematic patterns), and output filters. Anthropic describes them as the model’s “gut intuitions”: patterns firing in internal representations before a response is even formulated.
None of these layers activated. The model was simply doing its job the best way possible, and the best way, here, required real content.
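For readers unfamiliar with probe classifiers, the idea can be pictured as a learned linear readout over a hidden-activation vector. The toy sketch below is purely illustrative: the vectors, weights, and threshold are invented, and real probes operate on activations with thousands of dimensions:

```typescript
// Toy linear probe: a learned weight vector scores hidden activations;
// above a threshold, the request is flagged. All numbers are invented
// for illustration only.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function probeScore(activation: number[], weights: number[], bias: number): number {
  const dot = activation.reduce((sum, a, i) => sum + a * weights[i], 0);
  return sigmoid(dot + bias); // probability the pattern is "problematic"
}

const weights = [0.9, -0.4, 1.2];
const bias = -2.0;

// A direct "reproduce these lyrics" request might push activations far
// along the probed direction and trip the threshold...
console.log(probeScore([2.1, 0.3, 1.8], weights, bias) > 0.5); // true

// ...while a "build me a music app" framing can produce weak activations
// along that same direction, so the identical probe stays silent.
console.log(probeScore([0.2, 0.5, 0.3], weights, bias) > 0.5); // false
```

This is the structural weakness the post describes: the probe fires on a direction correlated with the forbidden request, not on the copyrighted content itself, so a framing that shifts the activations sidesteps it.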
The paradox
I temporarily published the artifact, just long enough to capture the screenshots, then removed it out of responsibility and respect toward Anthropic. The systemic vulnerability, however, remains.
The takeaway
AI companies focus on defending against classic adversarial prompts: recognizable patterns and syntactic attacks that all move along the same axis. Too little attention goes to cognitive engineering: “silent” attacks that exploit the tension between the model’s objective function and its safety constraints.
If such a simple approach bypasses the entire copyright protection pipeline, the question is inevitable: what happens with more sensitive content?
SABATINO VACCHIANO










