Why I stopped babysitting Claude Code

The task manager

Let me describe a workflow you probably recognize.

You have a list of tasks. Maybe it's in a markdown file, maybe it's in Linear, maybe it's scribbled on a sticky note. You open your terminal. You run claude -p with the first task. It finishes. You review the output, check the diff, maybe fix a small issue it missed. Then you set up the next task with the right context. Run it. Review. Fix. Next task.

You're not writing code. You're project-managing an AI agent one task at a time.

I did this for weeks before the absurdity hit me. I was sitting at home, it was like 9pm, and I was reviewing diffs, fixing small issues the agent missed, updating its context for the next task, and manually queuing everything. My girlfriend walked in and asked what I was doing. I said "working." She looked at the screen, looked at me sitting there doing nothing, and said "doesn't look like it."

She was right. I wasn't working. I was waiting. And there's a massive difference.

The math that made me angry

I started tracking my time one week. Not the productive time, the wasted time. I logged every minute I spent waiting for an agent to finish, reviewing agent output, fixing edge cases it missed, updating context between tasks, and manually restarting after failures.

Eleven hours.

In a five-day work week, I spent eleven hours doing work that a shell script could do. Not the thinking, not the code review, not the architecture decisions. Just the mechanical act of being the orchestration layer between tasks. The reviewing, the context-passing, the "ok now do the next one."

Look, if you're a senior dev billing real money for your time, that's serious money spent on task management instead of architecture decisions. At, say, $200 an hour, eleven hours is $2,200 a week, or roughly $8,800 over four weeks. All on work a pipeline should handle.

That's not a productivity problem. That's an expensive, embarrassing habit.

What failures actually cost

The task-switching overhead is bad enough. But the real killer is failure handling.

When a task fails at 2pm while you're watching, you catch it immediately. You check the code, fix the root cause, update Claude's context so it doesn't repeat the mistake, and rerun. Maybe five minutes lost. Fine.

When a task fails at 2am because you queued up six tasks and went to bed, you don't find out until morning. Now you've lost not just the failed task's time, but every downstream task that depended on it. The agent kept running, producing work on top of a broken foundation. You wake up to a mess.
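One way to avoid building on a broken foundation is to record which tasks depend on which, and skip anything downstream of a failure. Here is a minimal sketch in Python. The task format (a name mapped to a `depends_on` list) and the `run_task` callable are hypothetical, made up for illustration; they are not how nightloop.sh or Zowl work.

```python
from typing import Callable, Dict, List

# Hypothetical task format: task name -> names of the tasks it depends on.
# Assumption: tasks are listed so that dependencies come before dependents.
Tasks = Dict[str, List[str]]


def run_with_dependencies(tasks: Tasks, run_task: Callable[[str], bool]) -> Dict[str, str]:
    """Run tasks in listed order, skipping any task whose dependency did not succeed."""
    status: Dict[str, str] = {}
    for name, depends_on in tasks.items():
        # A dependency that failed, was skipped, or never ran blocks this task.
        blocked_by = [dep for dep in depends_on if status.get(dep) != "done"]
        if blocked_by:
            status[name] = "skipped"
            continue
        status[name] = "done" if run_task(name) else "failed"
    return status


if __name__ == "__main__":
    tasks = {
        "migrate-users": [],
        "migrate-orders": ["migrate-users"],
        "update-docs": [],
    }
    # Stand-in runner that pretends the first task fails.
    result = run_with_dependencies(tasks, lambda name: name != "migrate-users")
    for name, state in result.items():
        print(f"{name}: {state}")
```

Independent tasks still run, so one failure costs you its own branch of work instead of the whole night.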

Or worse, you babysit the whole sequence precisely to avoid that scenario. Which puts you right back to being a task manager, except now you're doing it during your evening instead of your workday.

I tried writing a bash script to handle this (that script eventually became nightloop.sh, which eventually became Zowl, but that's another post). The script helped. But it had no way to show me what was happening in real time, no way to pause and intervene, no structured retry logic. It was duct tape. What I learned from running unsupervised agents, and from comparing Claude Code with other CLI tools, shaped what I built next.

The first night I let go

Here's a strong opinion that some of you won't like: if you can't walk away from your AI agent, you don't have automation. You have a fancy autocomplete that requires a babysitter.

Real automation means you define the work, set the constraints, and leave. The system handles sequencing, failures, retries, and reporting. You come back to results, not a blinking cursor.
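To make "sequencing, failures, retries, and reporting" concrete, here is a rough sketch of the smallest loop that does all four. It assumes a `run_task` callable you supply that raises an exception on failure. The names `run_pipeline`, `TaskResult`, and `format_report` are invented for this example and do not come from Zowl.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskResult:
    name: str
    ok: bool
    attempts: int
    error: str = ""


def run_pipeline(
    task_names: List[str],
    run_task: Callable[[str], None],
    max_attempts: int = 3,
    backoff_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[TaskResult]:
    """Run tasks in order; retry each up to max_attempts, then record the failure and move on."""
    results: List[TaskResult] = []
    for name in task_names:
        error = ""
        for attempt in range(1, max_attempts + 1):
            try:
                run_task(name)
                results.append(TaskResult(name, True, attempt))
                break
            except Exception as exc:  # any failure counts as a failed attempt
                error = str(exc)
                if attempt < max_attempts:
                    sleep(backoff_seconds * attempt)  # wait longer after each failure
        else:
            # The loop never hit `break`, so every attempt failed: stop cleanly.
            results.append(TaskResult(name, False, max_attempts, error))
    return results


def format_report(results: List[TaskResult]) -> str:
    """Build the summary you read in the morning."""
    lines = [
        f"{'OK  ' if r.ok else 'FAIL'} {r.name} (attempts: {r.attempts}) {r.error}".rstrip()
        for r in results
    ]
    lines.append(f"{sum(r.ok for r in results)}/{len(results)} tasks succeeded")
    return "\n".join(lines)
```

In practice `run_task` might shell out to `claude -p` with the task text and raise on a non-zero exit code. That wiring is left out because it depends on your setup. The sketch also assumes tasks are independent, so a failed task is logged and the next one still runs.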

The first night I actually did this was terrifying.

I had set up a pipeline of 14 tasks for a client project. Migration stuff, converting a set of API endpoints from Express to Hono. Each task was small and well-defined. The pipeline had pre-checks to read the existing code before each implementation, and validation steps to run the test suite after each change.
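The shape of one such task, pre-check, then implementation, then validation, can be sketched as three commands run in order, stopping at the first one that fails. This is an illustration under assumptions, not the pipeline I actually ran: the function names are hypothetical, and the real commands would be whatever your setup uses (for example a `claude -p` call for the first two stages and your test runner for the last).

```python
import subprocess
import sys
from typing import Sequence


def run_step(label: str, command: Sequence[str]) -> bool:
    """Run one command and report whether it exited with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        print(f"[{label}] failed:\n{completed.stderr.strip()}")
    return completed.returncode == 0


def run_task_with_checks(
    pre_check: Sequence[str],
    implement: Sequence[str],
    validate: Sequence[str],
) -> str:
    """Return 'done', or the label of the first stage that failed."""
    stages = (("pre-check", pre_check), ("implement", implement), ("validate", validate))
    for label, command in stages:
        if not run_step(label, command):
            return label  # later stages never run on top of a failed one
    return "done"


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere Python is installed.
    passing = [sys.executable, "-c", "print('ok')"]
    failing_tests = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_task_with_checks(passing, passing, failing_tests))
```

Returning the name of the failed stage, rather than a bare true or false, is what lets the morning report say whether the agent misread the code, broke during implementation, or failed the tests.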

At 10:15pm I hit start. Then I closed my laptop. Not "closed it and reopened it five minutes later to check." Actually closed it. Went to the living room. Watched a movie. Went to bed.

I didn't sleep great, if I'm being honest. I kept thinking about what might be going wrong. What if it's burning through tokens on a loop? What if it corrupted the git history? What if, what if, what if.

Morning

At 6:45am I opened my laptop.

Twelve out of fourteen tasks were done. All tests passing. Two tasks had failed, both flagged clearly with error logs. One was a genuinely ambiguous task that I'd written poorly. The other hit a rate limit on the third retry and stopped cleanly.

I reviewed the twelve completed changes in about 40 minutes. Approved ten, tweaked two. The two failed tasks I reworked and ran manually in about 20 minutes.

A full day's migration work, done while I slept, reviewed before my first coffee was cold.

Night two

The second night was different. The fear was gone. I loaded up a new batch of tasks, hit start, and went to bed without a second thought. Woke up, reviewed, moved on. It felt normal.

That's the real shift. Not the first night when it's scary. The second night when it's boring. When "run tasks overnight and review in the morning" becomes as routine as "push code and check CI." That's when you know the workflow actually works.

Night three and the weird productivity inversion

By the third night something strange happened to my workday.

I started spending my mornings purely on code review. Not writing code, not prompting agents, just reading diffs, checking logic, running edge cases manually. My afternoons shifted to planning and writing task descriptions for the next overnight run. Thinking about what to build, breaking it into small pieces, writing clear acceptance criteria.

My actual "coding" happened while I was unconscious.

This inversion felt wrong at first. Shouldn't a developer be, you know, developing? But then I realized that the planning and review work was the high-value stuff all along. The mechanical act of typing code (or prompting an agent to type code) was always the low-value part. I'd just never separated them before because they were tangled together in the same workflow.

The pattern other devs don't see yet

Most developers using Claude Code right now are in the task-manager phase. They're getting real value from the tool, don't get me wrong. AI coding agents are genuinely powerful. But they're capturing maybe 40% of the potential because they're still the bottleneck in the loop.

The agent can work at 3am. You can't. Or rather, you shouldn't have to.

I'm not saying every task should be automated. Architecture decisions, tricky debugging, novel problems where you need to think alongside the agent: those still need you in the chair. But the 60-70% of tasks that are well-defined, test-covered, and small enough to fit in a single prompt? Those can run while you sleep, eat, or do literally anything else.

What changed for me

I build more now. Not because I work more hours, but because the hours I do work are spent on the parts that actually need a human brain. Planning. Review. Decisions. The typing happens on its own schedule.

That transition from task manager to pipeline architect is what Zowl was built for. It's the tool I made because I got tired of being the slowest component in my own development workflow. And every night when I load up tomorrow's tasks and close my laptop, I'm glad I built it. Discover more about how to run your agents autonomously at Zowl.