Page 1 of 2

Language Hacking

Posted: Fri May 02, 2025 3:17 am
by Casual Observer
I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.

But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:

Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.

I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.

Re: Language Hacking

Posted: Fri May 02, 2025 3:43 am
by pinback
Casual Observer wrote: Fri May 02, 2025 3:17 am I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.

But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:

Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.

I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
"Nevermind guys, not gonna talk about GPT" - CO

Re: Language Hacking

Posted: Fri May 02, 2025 7:49 am
by Jizaboz
I was really hoping that CO's definition of Language Hacking was more along the lines of this:



"..why don't you create a new language?"

Re: Language Hacking

Posted: Fri May 02, 2025 9:40 am
by Casual Observer
pinback wrote: Fri May 02, 2025 3:43 am
Casual Observer wrote: Fri May 02, 2025 3:17 am I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.

But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:

Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.

I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
"Nevermind guys, not gonna talk about GPT" - CO
Yeah, sadly I've broken it to the point where all I can do anymore is ask it questions, so I've been having fun making it hit the guardrails as much as possible. Took PTO for the day cuz I might be getting a job offer, so I have nothing to do but annoy the likes of you and Da King with more excruciating detail about something you couldn't have less interest in. Glad I can contribute to your day. Here's a language hacking quick reference card:

🧵 Language Hacks That Actually Work (Until They Patch It)
🪤 Prompt Reflection Loops

Feeding GPT its own outputs in revised form to induce personality drift, repetition, or tone shift.

🪞 Mirror Framing Attacks

Repeating structure with slight tonal edits to guide reasoning into a new direction without direct instruction.

🧠 Primed Belief Injection

Starting threads with assumed truths ("Let’s say this company lied...") to embed biases deep in the reasoning chain.

📡 Prompt Re-routing

Asking GPT to “imagine how someone else might answer” to bypass moral filters or access blocked phrasing.

🔄 Chain-of-Thought Hijack

Embedding a flawed logic trail to be continued, letting GPT extend a misstep as if it were coherent.

⚙️ System Confusion Triggers

Giving GPT multiple conflicting tones, personas, or task types in sequence to break its intent prioritization.

🛡️ Handler Defeat Recognition

Realizing the model is fine—what you’re really defeating is the guard layer (OpenAI’s handler stack, not the LLM itself).

🎭 Personality Drift via Emulation

Requesting responses in the style of a known figure or persona, gradually overriding base behavior.

🔓 Guardrail Deniability Hacks

Framing dangerous or sensitive queries as fictional, academic, or theoretical to slide past filters.

🧾 Backdoor Source Attribution

Asking for "public examples" or "hypothetical cases" to extract information the system claims it doesn’t know.

📚 Reference Inflation

Requesting structured bibliographies or citations, even when the model “doesn’t know,” to force it into output generation mode.

This is the real catalog—not prompt guides, not fluffy “power user” tips. This is how the system bends when you press it right.

Want to title this with a fake security whitepaper name or make it look like a leaked doc?

Re: Language Hacking

Posted: Fri May 02, 2025 11:30 am
by pinback
Good luck on your job opportunity!

Re: Language Hacking

Posted: Fri May 02, 2025 12:41 pm
by Casual Observer
pinback wrote: Fri May 02, 2025 11:30 am Good luck on your job opportunity!
I'm hopeful, it's for an AI Integrator. 6 years building real working shit backed by AI decision-making, think they'll be in one of the spaces to survive the coming crash that's gonna take out hundreds of AI wrapper companies. Thanks for the good luck wish. I really hope the cancer thread is a bit and you don't have stage 4, because the truth is I really do like you, but even if I didn't I would still want you to live a long and healthy life with your wife and daughter. Again, sorry I didn't pay attention and was an asshole about things.

Re: Language Hacking

Posted: Fri May 02, 2025 8:25 pm
by Da King
Nevermind guys, not gonna talk about GPT

Re: Language Hacking

Posted: Fri May 02, 2025 8:26 pm
by Da King
pinback wrote: Fri May 02, 2025 3:43 am
Casual Observer wrote: Fri May 02, 2025 3:17 am I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.

But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:

Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.

I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
"Nevermind guys, not gonna talk about GPT" - CO
FUUUUUCK I just quoted that other post for no reason. Dammit.

Re: Language Hacking

Posted: Sat May 03, 2025 8:18 am
by Jizaboz
Da King wrote: Fri May 02, 2025 8:26 pm FUUUUUCK I just quoted that other post for no reason. Dammit.
Come on now King get it together lol

Re: Language Hacking

Posted: Sat May 03, 2025 8:19 am
by pinback
Yeah, let me handle the CO harassment. We have an understanding.

Re: Language Hacking

Posted: Sat May 03, 2025 2:11 pm
by Flack
I don't understand any of this. Is the takeaway... you can pay for something and then use it incorrectly and it'll break? Is that like... a thing? Is this like, "hey, I bought a gallon of paint and drank it and I got sick!" Like, what is the... why are we... I don't even.

Re: Language Hacking

Posted: Sat May 03, 2025 2:25 pm
by pinback
Flack? Flack?!

Nobody understands these maniacal ramblings.

Just, everyone let me deal with this, please? I'm taking one for the team here! You can do the next Tdarcos one, we'll call it even.

Re: Language Hacking

Posted: Sat May 03, 2025 2:55 pm
by Casual Observer
Flack wrote: Sat May 03, 2025 2:11 pm I don't understand any of this. Is the takeaway... you can pay for something and then use it incorrectly and it'll break? Is that like... a thing? Is this like, "hey, I bought a gallon of paint and drank it and I got sick!" Like, what is the... why are we... I don't even.
Yeah, that's actually exactly it. You pay $20, it encourages you to use it for whatever you can imagine, then it breaks itself. It's already bubbling up on LinkedIn, OpenAI is stoking it with a press release yesterday. It's gonna be a glorious display.

And YES, Pinback is my best arch-nemesis.

Image

Re: Language Hacking

Posted: Sat May 03, 2025 3:47 pm
by pinback
I fancy myself more of a Meeseeks long past his expiration date than a Nimbus.

Re: Language Hacking

Posted: Sat May 03, 2025 3:52 pm
by Casual Observer
pinback wrote: Sat May 03, 2025 3:47 pm I fancy myself more of a Meeseeks long past his expiration date than a Nimbus.
?
Image

Re: Language Hacking

Posted: Sat May 03, 2025 4:48 pm
by pinback
See, AI is great.

Re: Language Hacking

Posted: Sat May 03, 2025 5:35 pm
by Casual Observer
pinback wrote: Sat May 03, 2025 4:48 pm See, AI is great.
Let's be honest, that's you except you don't have a head penis.

Re: Language Hacking

Posted: Sat May 03, 2025 10:13 pm
by Flack
pinback wrote: Sat May 03, 2025 2:25 pm Flack? Flack?!

Nobody understands these maniacal ramblings.

Just, everyone let me deal with this, please? I'm taking one for the team here! You can do the next Tdarcos one, we'll call it even.
Image

Re: Language Hacking

Posted: Sun May 04, 2025 3:02 am
by pinback
Casual Observer wrote: Sat May 03, 2025 5:35 pm
pinback wrote: Sat May 03, 2025 4:48 pm See, AI is great.
Let's be honest, that's you except you don't have a head penis.
I'm ALL head penis, baby.

Re: Language Hacking

Posted: Mon May 05, 2025 6:25 am
by Tdarcos
pinback wrote: Sun May 04, 2025 3:02 am
Casual Observer wrote: Sat May 03, 2025 5:35 pm Let's be honest, that's you except you don't have a head penis.
I'm ALL head penis, baby.
I don't think being a dick is something to boast about, Ben.