We’ve all been there: you spin up a cutting-edge LLM or AI agent, give it a task, and within a few iterations, it hallucinates, wanders off track, or completely forgets the original architecture. Left to their own devices, AI models are like cats—brilliant, but notoriously difficult to herd.
If you want an AI agent to build production-grade software instead of a mountain of technical debt, you need a strict framework. Here is my battle-tested workflow for herding AI models and keeping them relentlessly productive.
1. The Blueprinting Phase: Invest in the Master Plan
Never let an AI start writing code without a strategy. Treat token usage at this stage as an investment, not an expense.
- The High-Level Plan: Start by putting a cutting-edge model into “plan mode” on its highest creativity/reasoning settings. Have it map out the entire architecture.
- The Single Source of Truth: Crucially, instruct the model to save this roadmap into a masterPlan.md file. This file becomes your anchor for the rest of the project.
- The Micro-Plans: Break down the master plan into major domain areas. For each area, generate a dedicated markdown file (e.g., dataLayerPlan.md, authPlan.md).
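Before moving on to execution, it can help to confirm that every plan file actually exists. Here is a minimal Python sketch of such a check. The `missing_plans` helper and the `<area>Plan.md` naming rule are my own assumptions for illustration, modeled on the example file names above; they are not part of any tool.

```python
from pathlib import Path


def missing_plans(project_dir, areas):
    """Return the plan files that are expected but not yet on disk.

    Assumes (hypothetically) that the master plan is named masterPlan.md
    and that each domain area has a file named <area>Plan.md.
    """
    root = Path(project_dir)
    expected = ["masterPlan.md"] + [f"{area}Plan.md" for area in areas]
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Example: check the current directory for two domain areas.
    for name in missing_plans(".", ["dataLayer", "auth"]):
        print(f"Still missing: {name}")
```

If the list comes back non-empty, that is a signal to go back to plan mode before any code gets written.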
2. The Execution Phase: Step-by-Step Implementation
With your documentation in place, execute the plans in strict logical order. For the specific area you are tackling, have the AI generate a granular execution document—for example, step1Implementation.md.
When you prompt your agent or coding model, always dual-reference your files:
- Point to the specific implementation step (step1Implementation.md).
- Remind it of the big picture by referencing masterPlan.md.
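To make the dual reference a habit rather than something you retype, you could generate the opening of every prompt from a small template. The sketch below is one possible shape, not a prescribed format; the `build_prompt` function and its wording are hypothetical.

```python
def build_prompt(instruction,
                 step_file="step1Implementation.md",
                 master_file="masterPlan.md"):
    """Prefix a task instruction with references to both plan files.

    The exact phrasing is an assumption; adapt it to how your agent
    expects files to be referenced.
    """
    return "\n".join([
        f"Implement the step described in {step_file}.",
        f"Stay consistent with the overall architecture in {master_file}.",
        "",
        instruction.strip(),
    ])


if __name__ == "__main__":
    print(build_prompt("Start with the repository interfaces."))
```

Because the file names are parameters, the same template works unchanged when you move on to the next implementation step.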
3. The Iteration Loop: Accumulating Constraints
As you iterate through the implementation, every prompt should follow the same strict pattern: give clear direction for the current task, then restate the full, growing list of constraints.
Never assume an AI knows your coding style or will maintain it naturally—even if it has full context of your codebase. You must explicitly fence it in. Accumulate constraints with every single prompt, including non-negotiables like:
- “Always write comprehensive unit tests.”
- “Always include architecture tests (e.g., ArchUnit) to enforce package boundaries.”
- “Do not write temporary or placeholder code.”
- “Do not generate unnecessary scaffolding or boilerplate.”
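One way to keep that list from drifting is to store it in one place and append it to every prompt automatically. The following is a minimal sketch under that assumption; the `ConstraintList` class is hypothetical and only illustrates the idea of accumulating constraints without duplicates.

```python
class ConstraintList:
    """An ordered, duplicate-free list of prompt constraints."""

    def __init__(self, initial=()):
        self._items = []
        for text in initial:
            self.add(text)

    def add(self, text):
        """Record a new constraint; blank or repeated entries are ignored."""
        text = text.strip()
        if text and text not in self._items:
            self._items.append(text)

    def render(self):
        """Format the constraints as a bullet list."""
        return "\n".join(f"- {item}" for item in self._items)

    def apply(self, direction):
        """Return the task direction followed by every constraint so far."""
        return (f"{direction.strip()}\n\n"
                f"Constraints (all mandatory):\n{self.render()}")


if __name__ == "__main__":
    constraints = ConstraintList(["Always write comprehensive unit tests."])
    constraints.add("Do not write temporary or placeholder code.")
    print(constraints.apply("Implement the repository layer."))
```

Each time a review surfaces a new bad habit, a single `add` call makes the fix part of every later prompt.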
4. The Quality Gate: Continuous & Cross-Model Reviews
Code review is where you win or lose the battle against AI drift.
The Inner Loop (Human Review)
After every single iteration, line-by-line code review is mandatory. Never commit a change without reviewing the diff to spot architectural deviations, lazy implementations, or code smells. This is also your best opportunity to catch a new bad habit the AI is forming and turn it into a new constraint for the next prompt.
The Outer Loop (Cross-Model Review)
Once an entire implementation step (like step1Implementation.md) is complete, reviewing the entire delta by yourself becomes overwhelming. This is the perfect time to leverage a multi-model strategy.
Take the completed code and pass it to a completely different model on a high-reasoning setting (for example, if you build with GPT, review with Claude). Prompt the reviewer model by feeding it:
- The masterPlan.md
- The step1Implementation.md
- Your complete list of mandatory constraints
Ask it to ruthlessly audit the first model’s work against those three documents.
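As a rough illustration, the review prompt can be assembled from those three inputs plus the diff. The sketch below only builds the text; how you send it to the reviewer model depends on your tooling. The `build_review_prompt` function, the section headings, and the instruction wording are all assumptions of mine.

```python
def build_review_prompt(master_plan, step_plan, constraints, diff):
    """Assemble a cross-model review prompt from the three references.

    master_plan and step_plan are the file contents as strings,
    constraints is a list of strings, and diff is the change under review.
    """
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    sections = [
        "You are reviewing code written by a different model.",
        "Audit the diff strictly against the three references below "
        "and list every deviation you find.",
        "## masterPlan.md\n" + master_plan.strip(),
        "## step1Implementation.md\n" + step_plan.strip(),
        "## Mandatory constraints\n" + constraint_lines,
        "## Diff under review\n" + diff.strip(),
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_review_prompt(
        master_plan="(contents of masterPlan.md)",
        step_plan="(contents of step1Implementation.md)",
        constraints=["Always write comprehensive unit tests."],
        diff="(output of your diff tool)",
    ))
```

Putting the references before the diff is a design choice: the reviewer reads the rules first and the code second.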
Conclusion
Herding AI models isn’t about writing a single clever prompt; it’s about establishing a rigorous engineering process. By separating planning from execution, maintaining a living list of constraints, and using a multi-model review system, you can transform AI from an unpredictable assistant into a highly disciplined engineering partner.
Now go set up your master plan, and have fun building!
