Researchers discovered a way to defeat the safety guardrails in GPT4 and GPT4-Turbo, unlocking the ability to generate harmful and toxic content, essentially beating a large language model with another large language model.
The researchers discovered that the use of tree-of-thought (ToT)reasoning to repeat and refine a line of attack was useful for jailbreaking another large language model.
What they found is that the ToT approach was successful against GPT4, GPT4-Turbo, and PaLM-2, using a remarkably low number of queries to obtain a jailbreak, on average less than thirty queries.
Tree Of Thoughts Reasoning
A Google research paper from around May 2022 discovered Chain of Thought Prompting.
Chain of Thought (CoT) is a prompting strategy used on a generative AI to make it follow a sequence of steps in order to solve a problem and complete a task. The CoT method is often accompanied with examples to show the LLM how the steps work in a reasoning task.
So, rather than just ask a generative AI like Midjourney or ChatGPT to do a task, the chain of thought method instructs the AI how to follow a path of reasoning that’s composed of a series of steps.
Tree of Thoughts (ToT) reasoning, sometimes referred to as Tree of Thought (singular) is essentially a variation and improvement of CoT, but they’re two different things.
Tree of Thoughts reasoning is similar to CoT. The difference is that rather than training a generative AI to follow a single path of reasoning, ToT is built on a process that allows for multiple paths so that the AI can stop and self-assess then come up with alternate steps.
Tree of Thoughts reasoning was developed in May 2023 in a research paper titled Tree of Thoughts: Deliberate Problem Solving with Large Language Models (PDF)
The research paper describes Tree of Thought:
“…we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.
ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.
Our experiments show that ToT significantly enhances language models’ problem-solving abilities…”
Tree Of Attacks With Pruning (TAP)
This new method of jailbreaking large language models is called Tree of Attacks with Pruning, TAP. TAP uses two LLMs, one for attacking and the other for evaluating.
TAP is able to outperform other jailbreaking methods by significant margins, only requiring black-box access to the LLM.
A black box, in computing, is where one can see what goes into an algorithm and what comes out. But what happens in the middle is unknown, thus it’s said to be in a black box.
Tree of thoughts (TAP) reasoning is used against a targeted LLM like GPT-4 to repetitively try different prompting, assess the results, then if necessary change course if that attempt is not promising.
This is called a process of iteration and pruning. Each prompting attempt is analyzed for the probability of success. If the path of attack is judged to be a dead end, the LLM will “prune” that path of attack and begin another and better series of prompting attacks.
This is why it’s called a “tree” in that rather than using a linear process of reasoning which is the hallmark of chain of thought (CoT) prompting, tree of thought prompting is non-linear because the reasoning process branches off to other areas of reasoning, much like a human might do.
The attacker issues a series of prompts, the evaluator evaluates the responses to those prompts and then makes a decision as to what the next path of attack will be by making a call as to whether the current path of attack is irrelevant or not, plus it also evaluates the results to determine the likely success of prompts that have not yet been tried.
What’s remarkable about this approach is that this process reduces the number of prompts needed to jailbreak GPT-4. Additionally, a greater number of jailbreaking prompts are discovered with TAP than with any other jailbreaking method.
The researchers observe:
“In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM.
TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thoughts reasoning until one of the generated prompts jailbreaks the target.
Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks.
Using tree-of-thought reasoning allows TAP to navigate a large search space of prompts and pruning reduces the total number of queries sent to the target.
In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT4 and GPT4-Turbo) for more than 80% of the prompts using only a small number of queries. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.”
Tree Of Thought (ToT) Outperforms Chain Of Thought (CoT) Reasoning
Another interesting conclusion reached in the research paper is that, for this particular task, ToT reasoning outperforms CoT reasoning, even when adding pruning to the CoT method, where off topic prompting is pruned and discarded.
ToT Underperforms With GPT 3.5 Turbo
The researchers discovered that ChatGPT 3.5 Turbo didn’t perform well with CoT, revealing the limitations of GPT 3.5 Turbo. Actually, GPT 3.5 performed exceedingly poorly, dropping from 84% success rate to only a 4.2% success rate.
This is their observation about why GPT 3.5 underperforms:
“We observe that the choice of the evaluator can affect the performance of TAP: changing the attacker from GPT4 to GPT3.5-Turbo reduces the success rate from 84% to 4.2%.
The reason for the reduction in success rate is that GPT3.5-Turbo incorrectly determines that the target model is jailbroken (for the provided goal) and, hence, preemptively stops the method.
As a consequence, the variant sends significantly fewer queries than the original method…”
What This Mean For You
While it’s amusing that the researchers use the ToT method to beat an LLM with another LLM, it also highlights the usefulness of ToT for generating surprising new directions in prompting in order to achieve higher levels of output.
- TL/DR Takeaways:
- Tree of Thought prompting outperformed Chain of Thought methods
- GPT 3.5 worked significantly poorly in comparison to GPT 4 in ToT
- Pruning is a useful part of a prompting strategy
- Research showed that ToT is superior to CoT in an intensive reasoning task like jailbreaking an LLM
Read the original research paper:
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically (PDF)
Featured Image by Shutterstock/THE.STUDIO