From bash script to pipeline: how a late-night hack became Zowl
Started with a 600-line bash script running claude -p in a loop. It got out of hand. Here's how nightloop.sh became a real product.
It started at 2am
I wasn't trying to build a product. I was trying to go to sleep.
I had a list of 8 small refactoring tasks I wanted done by morning. Boring stuff: rename a module, extract a utility function, add error handling to three endpoints. The kind of work that's easy to describe but tedious to do. So I opened a terminal, typed claude -p "read src/ and refactor the logger module to use structured output", waited 4 minutes, checked the result, copied the next task, pasted it in. Waited again. Checked. Copied. Pasted.
By the third task I thought: this is a for loop.
#!/bin/bash
# nightloop.sh v0.1 - the dumbest possible version
while IFS= read -r task; do
  echo "[$(date)] Running: $task"
  claude -p "$task" >> nightloop.log 2>&1
done < tasks.txt
That was the entire script. Six lines. I ran it, went to bed, and woke up to a log file full of completed work. Five out of eight tasks were done correctly. Two needed minor fixes. One was garbage.
I was hooked.
The pre-check problem
Within a week the script grew legs. The biggest issue was context. I'd write a task like "add pagination to the users endpoint" and Claude would implement it in a file that didn't exist, or use an ORM method we'd already deprecated. The agent had no idea what the codebase actually looked like before it started coding.
So I added a pre-check step.
# nightloop.sh v0.3 - now with pre-check
run_precheck() {
  local task="$1"
  claude -p "Read the codebase in src/. Then evaluate this task: $task.
Does it make sense given the current code? What files will you need to touch?
If anything is unclear or contradictory, output SKIP with a reason." 2>&1
}
This changed everything. The agent would read the code first, think about whether the task was feasible, and flag problems before writing a single line. My success rate went from ~60% to about 85% overnight.
Honestly, that one idea (read before you write) is still the core of how Zowl pipelines work.
Then came validation
Pre-check handled the "garbage in" problem. But I still had "garbage out" to deal with. The agent would write code that looked fine in the diff but broke tests, or it would silently skip the acceptance criteria I'd written in the task description.
So I added a third step: validate.
# nightloop.sh v0.5 - the 3-step flow
run_validate() {
  local task="$1"
  local diff="$2"
  claude -p "Here's the original task: $task
Here's what was implemented (git diff): $diff
Run the test suite. Check if the acceptance criteria are met.
Output PASS or FAIL with details." 2>&1
}
Pre-check, implement, validate. Three steps. That pattern worked so reliably that I stopped thinking about it. It was just how you run an AI agent on a task. Read the code, write the code, check the code.
I didn't realize I'd accidentally designed a pipeline architecture. I just wanted to stop waking up to broken builds.
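Condensed, the flow looked something like this. This is a sketch from memory, not the actual v0.5 source: the AGENT indirection and the exact prompt wording here are illustrative, but the shape is faithful.

```shell
# Sketch of the composed three-step flow. AGENT defaults to the claude CLI;
# making it overridable lets you dry-run the flow without a real agent.
AGENT=${AGENT:-claude}

run_pipeline() {
  local task="$1"
  local precheck verdict

  # Step 1: pre-check -- read before you write
  precheck=$("$AGENT" -p "Evaluate this task against the code: $task. Output SKIP if infeasible.")
  case "$precheck" in SKIP*) echo "SKIP"; return 0 ;; esac

  # Step 2: implement
  "$AGENT" -p "$task" > /dev/null

  # Step 3: validate against the acceptance criteria
  verdict=$("$AGENT" -p "Validate: $task. Output PASS or FAIL.")
  case "$verdict" in PASS*) echo "PASS" ;; *) echo "FAIL" ;; esac
}
```

Swapping AGENT for a stub function is also how you'd test the control flow itself, which is exactly the kind of thing bash made painful later on.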
Where bash started to fall apart
By version 0.8, nightloop.sh was 400 lines and growing. It had retry logic with exponential backoff. It had a lockfile system so I wouldn't accidentally run two instances. It had colored log output. It had failure routing that could skip dependent tasks when an upstream task failed.
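The backoff logic boiled down to a helper along these lines. This is a reconstruction, not the v0.8 source, and the names are illustrative:

```shell
# Retry a command with exponential backoff: wait 1s, 2s, 4s, ...
# Returns 0 on success, 1 after max_attempts failures.
retry_with_backoff() {
  local max_attempts="$1"; shift
  local attempt=1 delay=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```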
It also had bugs that made me want to throw my laptop off the balcony.
The lockfile issue was the worst. On macOS, flock doesn't exist natively. I was using shlock from the developer tools, except it behaved differently across OS versions. One morning I woke up to find two instances of the script had been running simultaneously because the lockfile check silently failed after a macOS update. Both instances modified the same files. The git history looked like a crime scene.
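For what it's worth, there is a portable trick that would have sidestepped the flock/shlock mess entirely: mkdir is atomic on POSIX filesystems on both Linux and macOS, so a directory works as a lock. A sketch (I didn't use this at the time; LOCKDIR and the function name are my own):

```shell
# mkdir is atomic: exactly one caller can create the directory,
# so it doubles as a cross-platform lock with no flock/shlock needed.
LOCKDIR="${LOCKDIR:-/tmp/nightloop.lock}"

acquire_lock() {
  if mkdir "$LOCKDIR" 2>/dev/null; then
    # release the lock when the script exits, even on failure paths
    trap 'rmdir "$LOCKDIR"' EXIT
    return 0
  fi
  echo "another instance holds $LOCKDIR" >&2
  return 1
}
```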
Log contamination was another recurring nightmare. Bash doesn't give you structured logging without serious gymnastics. My logs were a mix of stdout, stderr, Claude's output, my own echo statements, and occasionally raw ANSI escape codes from tools that didn't know they were being piped. Parsing those logs to figure out what happened at 3am was archaeology, not debugging.
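Even in bash, emitting one JSON object per line would have made the 3am archaeology grep-able. A minimal sketch (the field names are my choice, and note the naive quoting: an embedded double quote in the message breaks the JSON, which is part of why this belongs in a real language):

```shell
# Emit one JSON object per log event instead of interleaved free text.
# Quoting here is naive -- a literal '"' in any field corrupts the line.
log_event() {
  local level="$1" task="$2" msg="$3"
  printf '{"ts":"%s","level":"%s","task":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$task" "$msg"
}
```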
# This is what "error handling" looked like at v0.9
result=$(run_implement "$task" 2>&1) || {
  echo "[ERROR] Task failed: $task" >> "$LOG_FILE"
  echo "$result" >> "$LOG_FILE"
  # Did the error come from claude? from bash? from a subshell?
  # Nobody knows. Good luck.
  ((retries++))
  if [ $retries -ge $MAX_RETRIES ]; then
    echo "[FATAL] Max retries exceeded" >> "$LOG_FILE"
    # Should we continue to the next task? Skip dependents?
    # This is where the spaghetti lives.
  fi
}
And argument passing. My god, argument passing in bash when your arguments contain quotes, newlines, code snippets, and markdown. I had a task description that included a JSON example with escaped quotes. The bash script mangled it into something unrecognizable. I spent two hours debugging before I realized the task itself was being corrupted before it ever reached the agent.
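The only reliable workaround is to stop interpolating the task into anything and keep it quoted end to end, handing it to the agent on stdin. A sketch of the principle, with a hypothetical consume function standing in for claude -p:

```shell
# 'consume' stands in for the agent CLI reading the task body from stdin.
consume() { cat; }

run_task() {
  local task="$1"        # quoted assignment preserves quotes and newlines
  consume <<< "$task"    # here-string: no word splitting, no glob expansion
}
```

As long as the value only ever travels through quoted expansions, embedded JSON, markdown, and newlines come out byte-for-byte intact on the other side.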
The copy-paste moment
Here's the thing that finally pushed me to build something real.
I had nightloop.sh working well for my main project. Then I started a side project. I copied the script over. Modified some paths. Tweaked the retry count. A week later I started a client project. Copied it again. Different modifications. Now I had three copies of the script, each slightly different, each with its own bugs.
When I fixed the lockfile issue in one copy, I forgot to patch the others. When I added a new feature to the client project version, I had to manually backport it. I was maintaining three forks of my own bash script.
That felt ridiculous.
If I'm copying this pipeline everywhere and every developer using Claude Code has the same problem (run tasks, check results, handle failures), then this should be a tool. Not a script. A tool with a proper UI where you can see what's running, what failed, and what's queued up. Where the pipeline definition is visual, not buried in bash conditionals. Where logs are structured and searchable instead of a wall of text.
From nightloop.sh to Zowl
I built the first version of Zowl in about six weeks. Electron app (I know, I know, but it works and I ship fast on it). The core pipeline engine is basically the same three-step flow from nightloop.sh: pre-check, implement, validate. But wrapped in proper process isolation, structured logging, a visual pipeline editor, and a retry system that actually works.
The "NightLoop" pipeline template that ships with Zowl is a direct descendant of that original bash script. Same philosophy. Same three steps. Same idea that agents need guardrails before and after they write code.
I still have nightloop.sh on my machine. It's 637 lines now, and I haven't touched it in months.
Sometimes I open it just to remember that the best tools start as the dumbest possible solution to a real problem. A for loop, a task list, and the desire to go to bed. That's the entire origin of Zowl, and honestly I wouldn't have it any other way.