Extract Prompt
Intent
When you need to correct the way an agent performs an intermediate task in a process, turn the correction into a prompt that describes the isolated task. Complete the intermediate step by invoking the agent with just the extracted prompt, and use its output or a summary in the larger process.
Motivation
Language models have a limited “working memory”, represented by the model’s context. The context is a fixed-size collection of tokens, where each token represents an atomic unit of data in your interaction with the model. A given token might represent a word, a unit of source code syntax, punctuation, or something else.
The information in both your input prompt and the model's response contributes to context use, alongside other contributions such as the descriptions of the tools available to the model. If you carry on a long-running conversation with a model via a chat interface to an agent, the model stores the whole history in its context, including the output of any commands the agent runs and any error messages or log output you paste into the prompt.
Even when a model still has space available in its context, adding information causes problems, because the model spreads its limited attention across every part of the context. Models tend to pay more attention to the earliest and latest tokens in the context, analogous to a reader remembering the beginning and end of a book but forgetting what happens in the middle. Additionally, the more separate ideas the context represents, the more likely it is that the model fails to use one or more of them in its generated output, even if it's an important part of the problem.
Extract Prompt isolates a part of the problem you're trying to solve as separate input to the model.
When the model generates a response to the extracted prompt, it only needs to attend to the details in the extracted prompt.
You integrate that response, or a summary of it, into the prompt for the larger problem, or update the code in your project to reflect the solution to the smaller problem; the details of how the model arrived at that response never appear in the context in which the model generates the overall solution.
Additionally, extracting a prompt that corresponds to a step in a larger process gives you control over how the model performs that step, and when. This helps when the model is overwhelmed by context and “forgets” details that are relevant to the step, for example, command-line parameters to pass to test runners or rules it should follow when it generates code.
Finally, your way of working or your team's standards might rely on tacit knowledge, in which norms or expectations are shared socially but not written down. As such, a zero-shot prompt to an agent requesting it work on your behalf can cause the agent to deviate from your expectations, and perform the task in some other way. You can correct this by Extracting Prompt and recording your expectations in the prompt.
When you need your agent to perform the same, or similar, task multiple times, combine Extract Prompt with Record Prompt to create a reusable prompt file that describes the task in detail.
Applicability
Use Extract Prompt when you can decompose a problem into multiple distinct steps; the details of how you complete one step are immaterial to, or distract from, the performance of other steps; and you want the agent to perform one or more steps in a specific way. The problem you decompose can be a natural sequence of steps, or a single, large problem from which you isolate small parts to gain more control or to review the agent's output.
Extract Prompt is especially useful in long-running sessions, or when certain intermediate steps produce a lot of output, for example build scripts or large test suites. Consider using Extract Prompt when the agent “forgets” instructions or details after reading a lot of source files, manipulating large log streams, or repeatedly running commands and inspecting the messages they generate.
If the step you want to extract is deterministic, for example running a test script and identifying errors in its output, consider Replace Vibes with Tools.
Some coding agent tools support “sub-agents”, in which the tool maintains multiple sessions using the model, each with a separate context, providing a Clean Slate for the sub-agent’s work. You define a sub-agent by providing a file that acts as the sub-agent’s system prompt, to which the “main” agent adds specific instructions when it invokes the sub-agent. The sub-agent’s system prompt is an example of an Extracted Prompt.
Consequences
Extract Prompt offers the following benefits:
- Gain more control over the way an agent performs a particular step in a larger process.
- Give the model more capacity to add reasoning steps, for example chain-of-thought, to the extracted step.
- Improve the model’s capacity to generate a solution to your overall problem, without overwhelming it with details from the extracted step.
- Choose a different model for the extracted step; for example, use a smaller model that generates output more quickly.
Implementation
Create an Extracted Prompt by following these steps:
- Identify a part of the agent's work over which you want more control, or which adds a lot of irrelevant data to the context.
- Generate a new prompt that describes this part of the work. You might be able to paste instructions from a previous session with your agent to form the basis of the new prompt, or to serve as an example in a few-shot Extracted Prompt.
The easiest way to use an Extracted Prompt—especially when you want to use it multiple times—is to write the prompt to a file and tell the agent to use the prompt whenever you need it to perform that work. If your agent tool supports sub-agents, then you can use the Extracted Prompt as the basis for a sub-agent definition. Otherwise, you can prompt the agent to create a new session with the model in a separate process, perhaps by running your agent tool at the command line, and use your Extracted Prompt in the new session.
Don’t include the Extracted Prompt in your main agent session by pasting it in or by referencing the prompt file in the chat interface, because that includes the prompt and the model’s generated results in the main session’s context.
Example
Canon TDD
Consider the Canon TDD process for test-driven development:
- Write a list of the test scenarios you want to cover
- Turn exactly one item on the list into an actual, concrete, runnable test
- Change the code to make the test (& all previous tests) pass (adding items to the list as you discover them)
- Optionally refactor to improve the implementation design
- Until the list is empty, go back to #2
If you prompt an agent to use TDD to implement a software change, you might find that it performs steps out of sequence (for example, it generates multiple tests at once, or makes code changes before it creates any tests), skips steps (for example, it never refactors), or exits early (for example, it doesn’t add new items as it discovers them, or it finishes working while there are still failing tests). You might also find that the agent’s performance degrades as the results of repeatedly running the growing test suite fill the model's context.
Use Extract Prompt to clearly specify each of the steps in Canon TDD, and run them in the intended sequence.
In this worked example, I use Gemini CLI to give Gemini 2.5 Pro a zero-shot prompt, telling it to write Swift code to solve the Eight Queens problem.
Eight Queens is a chess puzzle that challenges you to place eight queens on a standard chessboard so that none “threatens” another with capture. Computational solutions are well known, and typically rely on backtracking: placing queens on the board until no valid placement remains, then removing the most recently placed queen and trying it in a new position. Such solutions are probably in the training data for any LLM.
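For reference, a typical backtracking solver represents the board as an array of column indices, one per row, and looks something like this sketch (written here for illustration, independent of anything the agent generated):

// A sketch of the classic backtracking approach; board[row] holds the column
// of the queen placed in that row, so each row contains exactly one queen.
func solveNQueens(n: Int) -> [[Int]] {
    var solutions: [[Int]] = []
    var board: [Int] = []

    // A new queen is safe if no earlier queen shares its column or a diagonal.
    func isSafe(_ column: Int) -> Bool {
        for (row, placedColumn) in board.enumerated() {
            if placedColumn == column { return false }
            if abs(placedColumn - column) == board.count - row { return false }
        }
        return true
    }

    func placeQueen() {
        if board.count == n {
            solutions.append(board)
            return
        }
        for column in 0..<n where isSafe(column) {
            board.append(column)   // place a queen in the next row
            placeQueen()           // recurse into the remaining rows
            board.removeLast()     // backtrack and try the next column
        }
    }

    placeQueen()
    return solutions
}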
Here’s the prompt I used:
Use 'bd' for task tracking. Use test-driven development to create a command-line tool in Swift that solves the "Eight Queens" problem.
The bd command referenced in this prompt is Beads, a task-tracking tool designed for LLMs to use.
The agent created a working implementation, but it didn’t follow the TDD process as I expected. It created one test for placing a queen on a board and determining whether it is “threatened” by another queen, then made that test pass. Then it created a second test that the solver could find the two solutions for a 4x4 chessboard with four queens, and wrote the whole solver. Finally, it introduced a parameter for the size of the board and number of queens (without a test), and ran the program for an 8x8 board with eight queens. Throughout this process, the agent didn’t use the bd tool as prompted.
import XCTest

final class EightQueensTests: XCTestCase {
    func testIsSafe() {
        let solver = EightQueensSolver(size: 8)
        // Test placing a queen in an empty board
        XCTAssertTrue(solver.isSafe(board: [], row: 0, col: 0))
        // Place a queen at (0, 0)
        let board = [0]
        // Test placing a queen in the same column
        XCTAssertFalse(solver.isSafe(board: board, row: 1, col: 0))
        // Test placing a queen in the same diagonal
        XCTAssertFalse(solver.isSafe(board: board, row: 1, col: 1))
        // Test placing a safe queen
        XCTAssertTrue(solver.isSafe(board: board, row: 1, col: 2))
    }

    func testSolve() {
        let solver = EightQueensSolver(size: 4)
        let solutions = solver.solve()
        XCTAssertEqual(solutions.count, 2)
        XCTAssertEqual(solutions.sorted(by: { $0.lexicographicallyPrecedes($1) }), [[1, 3, 0, 2], [2, 0, 3, 1]])
    }
}
Source code available at EightQueensTests.swift.
In production, the small number of tests would be problematic, because they don’t fully capture the decisions that go into designing the solver, nor the expectations of its behaviour. In fact, the tests don’t even exercise the logic completely, as there’s no test for the parameter the model introduced, or for situations in which the solver fails.
I created a fresh project, and used Extract Prompt to describe the steps of the Canon TDD process.
Each prompt is still zero-shot and directs the agent to take a single action from Canon TDD; I run the prompts in sequence so that the agent follows the process.
1. Write a list of the test scenarios you want to cover.
Create a list of test scenarios that a software engineer would need to cover to implement a working solution to the Eight Queens problem. For each scenario, use 'bd create' to create a task that tracks implementing that scenario.
The model generated seven scenarios, which the agent stored in the beads database.
2. Turn exactly one item on the list into an actual, concrete, runnable test.
Use 'bd ready' to identify the first test scenario to work on, and use 'bd update' to mark it as in progress. For this scenario, write exactly one concrete, runnable test in Swift that validates the behavior described in the scenario. Don't make any changes to production code. It's OK if the test fails at this point.
The model marked the task as in progress, and generated a failing test (along with the Swift package manifest it needed to run the test).
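A manifest for a package like this is small; here is a minimal sketch (the package name, target names, and tools version are illustrative, and the agent's actual manifest may differ):

// swift-tools-version:5.9
import PackageDescription

// Illustrative names; the agent's actual manifest may differ.
let package = Package(
    name: "EightQueens",
    targets: [
        .target(name: "EightQueens"),
        .testTarget(name: "EightQueensTests", dependencies: ["EightQueens"])
    ]
)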
3. Change the code to make the test (& all previous tests) pass (adding items to the list as you discover them).
Make the changes to the production code you need so that all existing tests pass. Don't update the status of the current task. If you uncover a new test scenario, use 'bd create' to create a new task that tracks that scenario but don't start working on it.
The model generated correct code for the scenario and the agent verified that the test passed.
4. Optionally refactor to improve the implementation design.
Review the production and test code for the current task, and decide whether there are any refactorings that would improve the design. If so, make the change, and verify that all existing tests still pass. Whether you make any refactorings or not, mark the current task as complete using 'bd close'.
Here I use Model Reviews to check for refactorings, maintaining my intention that the agent follows the whole Canon TDD workflow. Alternatively, you could use You Review and do any refactoring yourself. At this step, the agent replaced the function it created in steps 2 and 3 with a struct containing a method.
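In outline, that refactoring looks something like the following sketch (the names and elided bodies are illustrative, not the agent's actual code):

// Before: the solver as a free function (illustrative name).
func solveEightQueens(n: Int) -> [[Int]] {
    []  // backtracking search elided
}

// After: the same logic becomes a method on a type, matching the
// EightQueensSolver().solve(n:) shape that the later tests exercise.
struct EightQueensSolver {
    func solve(n: Int) -> [[Int]] {
        []  // backtracking search elided
    }
}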
5. Until the list is empty, go back to step 2.
I cycled through the prompts I previously extracted, getting the agent to follow the Canon TDD process step by step. It successfully completed each scenario, and added more scenarios as it identified them in step 3, so that it ended the loop with nine scenarios.
This implementation includes error handling for invalid input and for situations with no solution: cases the agent didn't cover when it followed the original prompt.
import XCTest

final class EightQueensTests: XCTestCase {
    func testN1_hasOneSolution() throws {
        let solver = EightQueensSolver()
        let solutions = try solver.solve(n: 1)
        XCTAssertEqual(solutions.count, 1, "There should be one solution for N=1")
        XCTAssertEqual(solutions.first, [0], "The solution should be a queen at position 0")
    }

    func testN2_hasNoSolutions() throws {
        let solver = EightQueensSolver()
        let solutions = try solver.solve(n: 2)
        XCTAssertTrue(solutions.isEmpty, "There should be no solutions for N=2")
    }

    func testN3_hasNoSolutions() throws {
        let solver = EightQueensSolver()
        let solutions = try solver.solve(n: 3)
        XCTAssertTrue(solutions.isEmpty, "There should be no solutions for N=3")
    }

    func testN4_hasTwoSolutions() throws {
        let solver = EightQueensSolver()
        let solutions = try solver.solve(n: 4)
        XCTAssertEqual(solutions.count, 2, "There should be two solutions for N=4")
        let expectedSolutions = Set([[1, 3, 0, 2], [2, 0, 3, 1]])
        let actualSolutions = Set(solutions)
        XCTAssertEqual(actualSolutions, expectedSolutions, "The solutions should match the expected solutions")
    }

    func testN8_has92Solutions() throws {
        let solver = EightQueensSolver()
        let solutions = try solver.solve(n: 8)
        XCTAssertEqual(solutions.count, 92, "There should be 92 solutions for N=8")
    }

    func testInvalidN_shouldThrowError() throws {
        let solver = EightQueensSolver()
        XCTAssertThrowsError(try solver.solve(n: -1)) { error in
            XCTAssertEqual(error as? EightQueensSolver.EightQueensError, EightQueensSolver.EightQueensError.invalidInput)
        }
    }

    func testN0_hasNoSolutions() throws {
        let solver = EightQueensSolver()
        XCTAssertThrowsError(try solver.solve(n: 0)) { error in
            XCTAssertEqual(error as? EightQueensSolver.EightQueensError, EightQueensSolver.EightQueensError.invalidInput)
        }
    }

    func testNegativeN_hasNoSolutions() throws {
        let solver = EightQueensSolver()
        XCTAssertThrowsError(try solver.solve(n: -1)) { error in
            XCTAssertEqual(error as? EightQueensSolver.EightQueensError, EightQueensSolver.EightQueensError.invalidInput)
        }
    }

    func testIsValidBoard() {
        let solver = EightQueensSolver()
        // Valid board for N=4: [1, 3, 0, 2]
        XCTAssertTrue(solver.isValidBoard([1, 3, 0, 2]), "Board [1, 3, 0, 2] should be valid.")
        // Invalid board for N=4: [0, 0, 0, 0] (queens in same column)
        XCTAssertFalse(solver.isValidBoard([0, 0, 0, 0]), "Board [0, 0, 0, 0] should be invalid (same column).")
        // Invalid board for N=4: [0, 1, 2, 3] (queens in same diagonal)
        XCTAssertFalse(solver.isValidBoard([0, 1, 2, 3]), "Board [0, 1, 2, 3] should be invalid (same diagonal).")
    }
}
Source code available at EightQueensTests.swift.
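The tests above imply a solver shaped roughly like the following sketch, which I infer from the test code rather than reproduce from the agent's actual implementation:

// A sketch of the interface the tests exercise; the agent's generated
// implementation may differ in detail.
struct EightQueensSolver {
    enum EightQueensError: Error, Equatable {
        case invalidInput
    }

    /// Returns every solution for an n-by-n board, each as an array of column
    /// indices with one entry per row. Throws for n < 1.
    func solve(n: Int) throws -> [[Int]] {
        guard n >= 1 else { throw EightQueensError.invalidInput }
        var solutions: [[Int]] = []
        var board: [Int] = []

        func place() {
            if board.count == n {
                solutions.append(board)
                return
            }
            for column in 0..<n where isValidBoard(board + [column]) {
                board.append(column)
                place()
                board.removeLast()
            }
        }

        place()
        return solutions
    }

    /// A board is valid when no two queens share a column or a diagonal.
    func isValidBoard(_ board: [Int]) -> Bool {
        for row in 0..<board.count {
            for other in (row + 1)..<board.count {
                if board[row] == board[other] { return false }
                if abs(board[row] - board[other]) == other - row { return false }
            }
        }
        return true
    }
}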
Related Patterns
Record Prompt is a useful companion to Extract Prompt, as you can get the model to create a prompt that defines the task you want it to complete.
Remember Steps prompts the model to summarise the process you followed in a session, which you can use to discover steps to Extract Prompt for.
You Reflect to identify where your interaction with your agent didn't meet your expectations, which can lead to inspiration for Extracted Prompts to separate tasks, or to introduce new tasks that provide checkpoints for review.
Baby Steps splits a big task into smaller tasks that you solve separately, perhaps by Extracting Prompts.