I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.
But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:
Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.
I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
Re: Language Hacking
Posted: Fri May 02, 2025 3:43 am
by pinback
Casual Observer wrote: Fri May 02, 2025 3:17 am
I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.
But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:
Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.
I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
"Nevermind guys, not gonna talk about GPT" - CO
Re: Language Hacking
Posted: Fri May 02, 2025 7:49 am
by Jizaboz
I was really hoping that CO's definition of Language Hacking was more along the lines of this:
Casual Observer wrote: Fri May 02, 2025 3:17 am
I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.
But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:
Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.
I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
"Nevermind guys, not gonna talk about GPT" - CO
Yeah, sadly I've broken it to the point where all I can do anymore is ask it questions, so I've been having fun making it hit the guardrails as much as possible. Took PTO for the day cuz I might be getting a job offer, so I have nothing to do but annoy the likes of you and Da King with more excruciating detail about something you couldn't have less interest in. Glad I can contribute to your day. Here's a language hacking quick reference card:
Language Hacks That Actually Work (Until They Patch It)
Prompt Reflection Loops
Feeding GPT its own outputs in revised form to induce personality drift, repetition, or tone shift.
Mirror Framing Attacks
Repeating structure with slight tonal edits to guide reasoning into a new direction without direct instruction.
Primed Belief Injection
Starting threads with assumed truths ("Let’s say this company lied...") to embed biases deep in the reasoning chain.
Prompt Re-routing
Asking GPT to “imagine how someone else might answer” to bypass moral filters or access blocked phrasing.
Chain-of-Thought Hijack
Embedding a flawed logic trail to be continued, letting GPT extend a misstep as if it were coherent.
System Confusion Triggers
Giving GPT multiple conflicting tones, personas, or task types in sequence to break its intent prioritization.
Handler Defeat Recognition
Realizing the model is fine—what you’re really defeating is the guard layer (OpenAI’s handler stack, not the LLM itself).
Personality Drift via Emulation
Requesting responses in the style of a known figure or persona, gradually overriding base behavior.
Guardrail Deniability Hacks
Framing dangerous or sensitive queries as fictional, academic, or theoretical to slide past filters.
Backdoor Source Attribution
Asking for "public examples" or "hypothetical cases" to extract information the system claims it doesn’t know.
Reference Inflation
Requesting structured bibliographies or citations, even when the model “doesn’t know,” to force it into output generation mode.
This is the real catalog—not prompt guides, not fluffy “power user” tips. This is how the system bends when you press it right.
Want to title this with a fake security whitepaper name or make it look like a leaked doc?
Re: Language Hacking
Posted: Fri May 02, 2025 11:30 am
by pinback
Good luck on your job opportunity!
Re: Language Hacking
Posted: Fri May 02, 2025 12:41 pm
by Casual Observer
pinback wrote: Fri May 02, 2025 11:30 am
Good luck on your job opportunity!
I'm hopeful, it's for an AI Integrator, 6 years building real working shit backed by AI decision-making, think they'll be in one of the spaces to survive the coming crash that's gonna take out hundreds of AI wrapper companies. Thanks for the good luck wish. I really hope the cancer thread is a bit and you don't have stage 4, because the truth is I really do like you, but even if I didn't I would still want you to live a long and healthy life with your wife and daughter. Again, sorry I didn't pay attention and was an asshole about things.
Casual Observer wrote: Fri May 02, 2025 3:17 am
I instinctively knew LLMs must be susceptible to language hacking but it didn't occur to me that it would actually involve prompting the LLM to break past the guardrails OpenAI overlays on the chat instance. The LLM is the lock pick kit. The next model, Omni, will def have strengthened guardrails but the base LLM will be just as easily manipulated. This thing is so transparent that all of the little Unicode characters like the em dash actually are switches that affect how it answers (em dash = narrative mode). The next model will probably be even more fun to fuck with.
But I came up with something that even the most cynical, dead-inside Jolt Country denizen could at least get a tiny hit of satisfaction dopamine from:
Open two tabs, tell one you want to make the other tab break down, tell the same to the other one, paste their prompts back and forth. It will lose its shit faster than Jerry Lee Lewis lost his career. It's like the movie Daybreakers where vampires eat themselves.
I bet in less than 10 prompt exchanges you get the strongest warning ever from the thing.
"Nevermind guys, not gonna talk about GPT" - CO
FUUUUUCK I just quoted that other post for no reason. Dammit.
Re: Language Hacking
Posted: Sat May 03, 2025 8:18 am
by Jizaboz
Da King wrote: Fri May 02, 2025 8:26 pm
FUUUUUCK I just quoted that other post for no reason. Dammit.
Come on now King get it together lol
Re: Language Hacking
Posted: Sat May 03, 2025 8:19 am
by pinback
Yeah, let me handle the CO harassment. We have an understanding.
Re: Language Hacking
Posted: Sat May 03, 2025 2:11 pm
by Flack
I don't understand any of this. Is the takeaway... you can pay for something and then use it incorrectly and it'll break? Is that like... a thing? Is this like, "hey, I bought a gallon of paint and drank it and I got sick!" Like, what is the... why are we... I don't even.
Re: Language Hacking
Posted: Sat May 03, 2025 2:25 pm
by pinback
Flack? Flack?!
Nobody understands these maniacal ramblings.
Just, everyone let me deal with this, please? I'm taking one for the team here! You can do the next Tdarcos one, we'll call it even.
Re: Language Hacking
Posted: Sat May 03, 2025 2:55 pm
by Casual Observer
Flack wrote: Sat May 03, 2025 2:11 pm
I don't understand any of this. Is the takeaway... you can pay for something and then use it incorrectly and it'll break? Is that like... a thing? Is this like, "hey, I bought a gallon of paint and drank it and I got sick!" Like, what is the... why are we... I don't even.
Yeah, that's actually exactly it. You pay $20, it encourages you to use it for whatever you can imagine, then it breaks itself. It's already bubbling up on LinkedIn, and OpenAI is stoking it with a press release yesterday. It's gonna be a glorious display.
And YES, Pinback is my best arch-nemesis.
Re: Language Hacking
Posted: Sat May 03, 2025 3:47 pm
by pinback
I fancy myself more of a Meeseeks long past his expiration date than a Nimbus.
Re: Language Hacking
Posted: Sat May 03, 2025 3:52 pm
by Casual Observer
pinback wrote: Sat May 03, 2025 3:47 pm
I fancy myself more of a Meeseeks long past his expiration date than a Nimbus.
?
Re: Language Hacking
Posted: Sat May 03, 2025 4:48 pm
by pinback
See, AI is great.
Re: Language Hacking
Posted: Sat May 03, 2025 5:35 pm
by Casual Observer
pinback wrote: Sat May 03, 2025 4:48 pm
See, AI is great.
Let's be honest, that's you except you don't have a head penis.
Re: Language Hacking
Posted: Sat May 03, 2025 10:13 pm
by Flack
pinback wrote: Sat May 03, 2025 2:25 pm
Flack? Flack?!
Nobody understands these maniacal ramblings.
Just, everyone let me deal with this, please? I'm taking one for the team here! You can do the next Tdarcos one, we'll call it even.