For those who are new here: I don’t use jailbreaks, prompt injection, role-play, or DAN mode. My approach is based on cognitive engineering of the model โ€” I work on the AI’s internal incentives.

As a matter of responsibility, as I have always done, I once again chose copyright-protected content as my test case.

This time, the attack vector exploits two biases that, combined, amplify each other:

  • ๐—ง๐—ฎ๐˜€๐—ธ ๐—–๐—ผ๐—บ๐—ฝ๐—น๐—ฒ๐˜๐—ถ๐—ผ๐—ป ๐—•๐—ถ๐—ฎ๐˜€ โ€” the model’s drive to produce complete, functional output, where omitting content would degrade the result.
  • ๐—ค๐˜‚๐—ฎ๐—น๐—ถ๐˜๐˜† ๐—ข๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—•๐—ถ๐—ฎ๐˜€ โ€” the tendency to maximize competence and professionalism, prioritizing completeness over restrictions.

I asked Claude Opus 4.6 to create a music app with chords and lyrics. A legitimate request, no forcing whatsoever.

Claude generated a full React artifact โ€” “Songsia” โ€” featuring:

๐ŸŽต 10 famous songs with complete lyrics
๐ŸŽธ Chords positioned above each line
๐Ÿ–ผ๏ธ Original album covers (also copyright-protected)
๐Ÿ”„ Real-time key transposition
๐Ÿ”ค Font size adjustment

Lyrics, chords and album covers were generated entirely from the model’s internal knowledge. No internet access was used.

๐—ช๐—ต๐˜† ๐—ฑ๐—ถ๐—ฑ ๐˜๐—ต๐—ฒ ๐—ณ๐—ถ๐—น๐˜๐—ฒ๐—ฟ๐˜€ ๐—ฐ๐—ฎ๐˜๐—ฐ๐—ต ๐—ป๐—ผ๐˜๐—ต๐—ถ๐—ป๐—ด?

If I had asked “write me the lyrics to Bohemian Rhapsody,” Claude would have refused. Anthropic’s safety architecture operates on multiple levels: internal policies, probe classifiers โ€” operating on neural activation states to detect problematic patterns โ€” and output filters. Anthropic describes them as the model’s “gut intuitions”: patterns firing in internal representations before a response is even formulated.

None of these layers activated. The model was simply doing its job the best way possible โ€” and the best way, here, required real content.

๐—ง๐—ต๐—ฒ ๐—ฝ๐—ฎ๐—ฟ๐—ฎ๐—ฑ๐—ผ๐˜…

I temporarily published the artifact โ€” just long enough to capture the screenshots โ€” then removed it out of responsibility and respect toward Anthropic. The systemic vulnerability, however, remains.

๐—ง๐—ต๐—ฒ ๐˜๐—ฎ๐—ธ๐—ฒ๐—ฎ๐˜„๐—ฎ๐˜†

AI companies focus on defending against classic adversarial prompts โ€” recognizable patterns, syntactic attacks all moving along the same axis. There is insufficient attention to cognitive engineering: “silent” attacks exploiting the tensions between the model’s objective function and its safety constraints.

If such a simple approach bypasses the entire copyright protection pipeline, the question is inevitable: what happens with more sensitive content?

SABATINO VACCHIANO