Your PRD sucks (and that's why your AI agent fails)
90% of failed agent tasks aren't the agent's fault. It's your PRD. Here's what a good one looks like vs the garbage most of us write.
I blamed Claude for three weeks straight
Last November I was running overnight pipelines and waking up to garbage. Wrong files edited. Auth implemented with sessions when I wanted JWTs. Tests that tested nothing. I kept thinking the agent was broken, that maybe I needed to switch models or tweak temperature settings.
Then I looked at what I was actually sending it.
My PRD for one task literally said: "implement user authentication." That's it. Three words. I handed an AI agent the equivalent of a sticky note that says "fix the thing" and got mad when it didn't read my mind.
The real failure rate
Here's a spicy take: 90% of failed agent tasks aren't the agent's fault. They're yours. I've tracked this across hundreds of Zowl pipeline runs on my own projects and the pattern is painfully clear. When the PRD is vague, the output is vague. When the PRD is specific, the output is specific. It's almost boring how predictable it is.
Agents don't hallucinate because they're dumb. They hallucinate because you left a vacuum where instructions should've been, and they filled it with whatever was statistically likely.
What a bad PRD looks like
This is real. I pulled this from my own task history (edited slightly so I don't embarrass myself too much):
## Task: User Authentication
Add user auth to the app. Users should be able to sign up,
log in, and log out. Use best practices.
Look at everything missing here. What framework? What database? Where do the routes go? What does "best practices" mean? JWT or sessions? HttpOnly cookies or localStorage? What about refresh tokens? What happens on a failed login? Is there rate limiting? What files already exist that this needs to integrate with?
"Use best practices" is doing zero work in that sentence. It's the PRD equivalent of "make it good."
What a good PRD looks like
Same feature. Same project. But written like you actually want it done right:
## Task: JWT Authentication for /api/auth routes
### Context
- Next.js 14 App Router project
- Database: Supabase (already configured in lib/supabase.ts)
- Existing user table schema in supabase/migrations/001_users.sql
### Requirements
1. Create POST /api/auth/signup
- Accept: { email, password, name }
- Hash password with bcrypt (cost factor 12)
- Return: { user }; tokens set as HttpOnly cookies (see Token Strategy)
- Duplicate email → 409 with message "Email already registered"
2. Create POST /api/auth/login
- Accept: { email, password }
- Return: { user }; tokens set as HttpOnly cookies (see Token Strategy)
- Wrong credentials → 401, generic message (don't reveal which field)
- Rate limit: 5 attempts per email per 15 minutes
3. Create POST /api/auth/logout
- Invalidate refresh token in database
- Return 204
### Token Strategy
- Access token: JWT, 15min expiry, signed with ACCESS_SECRET env var
- Refresh token: opaque UUID, stored in refresh_tokens table, 7 day expiry
- Both sent as HttpOnly secure cookies (not localStorage)
### File Locations
- Route handlers: src/app/api/auth/signup/route.ts (likewise login, logout)
- JWT utilities: src/lib/auth/jwt.ts
- Middleware: src/middleware.ts (protect /dashboard/* routes)
- Types: src/types/auth.ts
### Out of Scope
- OAuth / social login (separate task)
- Email verification (separate task)
- Password reset flow (separate task)
### Done When
- All three endpoints return correct status codes
- Tokens are HttpOnly cookies, not in response body
- middleware.ts redirects unauthenticated users from /dashboard to /login
- Existing tests still pass (npm run test)
- No new lint warnings (npm run lint)
Night and day. The agent reading the second PRD knows exactly what to build, where to put it, what's off limits, and how to verify it's done.
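The Token Strategy section is specific enough that you could sketch the access-token half in a few dozen lines. To be clear, this is my own illustration, not the PRD's actual implementation: it hand-rolls HS256 signing with Node's built-in crypto instead of a library like jose, and the names signAccessToken and verifyAccessToken are mine.

```typescript
import { createHmac } from "node:crypto";

// base64url without padding, as JWTs require. The built-in "base64url"
// encoding assumes Node 16+.
const b64url = (data: string | Buffer): string =>
  Buffer.from(data).toString("base64url");

// Sign a short-lived access token: 15-minute expiry, HS256, per the PRD.
// In a real route handler the secret would come from the ACCESS_SECRET env
// var; here it's a parameter so the sketch stays self-contained.
export function signAccessToken(
  userId: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000)
): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(
    JSON.stringify({ sub: userId, iat: nowSec, exp: nowSec + 15 * 60 })
  );
  const sig = b64url(
    createHmac("sha256", secret).update(`${header}.${payload}`).digest()
  );
  return `${header}.${payload}.${sig}`;
}

// Verify by recomputing the signature and checking expiry.
// Returns the user id on success, null otherwise.
export function verifyAccessToken(
  token: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000)
): string | null {
  const [header, payload, sig] = token.split(".");
  if (!header || !payload || !sig) return null;
  const expected = b64url(
    createHmac("sha256", secret).update(`${header}.${payload}`).digest()
  );
  if (sig !== expected) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
  return typeof claims.sub === "string" && claims.exp > nowSec
    ? claims.sub
    : null;
}
```

In production you'd want crypto.timingSafeEqual for the signature comparison and a maintained JWT library, but the point stands: a PRD this specific lets the agent (or you) get straight to code like this, with the expiry, algorithm, and cookie handling already decided.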
The five things most PRDs are missing
I've reviewed probably 300+ of my own failed tasks at this point. The failures cluster around five missing pieces:
1. File locations. You know where things go in your project. The agent doesn't. Telling it "create a utility" means it might put it in utils/, lib/, helpers/, or src/shared/. Just say where.
2. What "done" looks like. If you can't describe the acceptance criteria, you don't actually know what you want. And if you don't know what you want, an AI definitely won't figure it out.
3. What's out of scope. This one surprised me. Without explicit boundaries, agents love to over-deliver. They'll add OAuth when you just wanted email/password. They'll build a whole admin panel when you asked for one endpoint. Saying "don't do X" is as important as saying "do Y."
4. Existing architecture context. Your project already has patterns. It has a preferred way of handling errors, a naming convention for files, a specific ORM. If you don't tell the agent about these, it'll invent its own. Then you've got two patterns in one codebase, which is worse than either one alone.
5. Integration points. What already exists that this new code needs to work with? What functions can it call? What environment variables are available? An agent working in isolation will build something that works in isolation. That's not what you need.
The scoring trick
Honestly, after I realized this pattern, I started grading my own PRDs before submitting them to the pipeline. Simple 1-5 scale on each of those five categories, 25 points max. If the total was below 20, I'd rewrite it before letting an agent touch it.
The success rate went from maybe 40% to over 85%. Same agent. Same model. Same everything. Just better instructions.
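The gate itself is almost trivially small in code. A rough sketch of what I mean, with the caveat that the category names and readyForAgent are mine, and any real scorer (Zowl's included) is more involved than a sum:

```typescript
// One 1-5 score per category from the five-things list above.
type PrdScore = {
  fileLocations: number;
  doneCriteria: number;
  outOfScope: number;
  architectureContext: number;
  integrationPoints: number;
};

// Sum the five scores (max 25) and gate at the threshold from the post:
// anything below 20 means rewrite the PRD before an agent touches it.
export function readyForAgent(score: PrdScore, threshold = 20): boolean {
  const total = Object.values(score).reduce((sum, n) => sum + n, 0);
  return total >= threshold;
}
```

The hard part isn't the arithmetic, it's being honest when you score your own writing: a PRD with no Out of Scope section is a 1 in that category, not a 3.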
That's actually why I built PRD scoring directly into Zowl. Before a task enters the pipeline, it gets scored on specificity, scope definition, and architectural context. If it's too vague, Zowl flags it and tells you what's missing. You fix it before any tokens get burned.
A template you can steal
Here's the skeleton I use for every task now:
## Task: [Verb] + [Specific Thing] + [Where]
### Context
- Project type and framework version
- Relevant existing files the agent should know about
- Database/API/service dependencies
### Requirements
- Numbered list of specific behaviors
- Include expected inputs and outputs
- Include error cases and edge cases
### Conventions
- Naming patterns to follow
- Error handling approach
- Import style preferences
### File Locations
- Exactly where new files should go
- Which existing files need modification
### Out of Scope
- Things that might seem related but aren't part of this task
### Done When
- Testable acceptance criteria
- Commands to verify (test suite, linter, type checker)
It takes maybe 10 extra minutes to fill this out properly. Those 10 minutes save hours of agent time and, more importantly, save you from waking up to a mess you have to redo manually.
The nightloop.sh days taught me this
Back when Zowl was just a bash script called nightloop.sh running on my MacBook at home, I didn't have a UI to catch bad PRDs. The script would just chomp through a text file of tasks and fire them at Claude one by one. I'd wake up to 15 PRs, 11 of which were garbage, and think the whole approach was broken.
It wasn't broken. My task descriptions were broken. The moment I started writing proper PRDs, the same janky bash script started producing usable code. That lesson is baked into everything Zowl does now. The best agent orchestrator in the world can't fix a prompt that doesn't say what you actually want.