Implementing a SubAgent Orchestration System in my Dev Container
Building a Multi-Agent AI Orchestra: How I Solved the Coordination Problem
Part 1 Recap: Where We Left Off
In my previous blog post, I built a Docker container that unified Claude Code, OpenAI Codex, and OpenCode into a single, portable development environment. I could SSH in from any device and have all my AI tools ready to go.
It was great. For about two weeks.
Then I tried to build something ambitious: a full-stack SaaS application with authentication, payments, a dashboard, and an API. I typed out my detailed prompt, hit enter, and waited for Claude to work its magic.
The result? Chaos.
Claude wrote the backend API. Then it wrote the frontend. But the API endpoints it created didn't match the frontend's fetch calls. The database schema was missing fields the UI expected. The authentication flow was designed twice, differently each time. And when I asked Claude to fix the integration issues, it lost context of the original requirements and started making completely different assumptions.
I had hit the wall that every AI-assisted developer eventually hits: AI coding assistants are brilliant at focused tasks, but they struggle with complex, multi-component projects.
This post is about how I solved that problem by building a multi-agent orchestration system, where specialized AI agents work in parallel like a well-coordinated development team, with an orchestrator ensuring their work integrates seamlessly.
The Problem: One AI, Too Many Hats
Let me paint the picture of what happens when you ask a single AI to build a full-stack app:
You: "Build a SaaS for project management with auth, Kanban boards,
time tracking, invoicing, and Stripe payments."
AI (thinking): "Okay, that's... a lot. Let me start with the backend..."
[40 minutes later]
AI: "I've built the User model with email/password auth."
You: "Great, but what about Google OAuth? And the Kanban boards?"
AI: "Right! Let me add OAuth... here's the frontend login component..."
[Switches context, loses track of database schema decisions]
AI: "Done! The login button is styled nicely."
You: "The login button calls /api/auth/login but you created /api/users/authenticate"
AI: "Oh, let me fix that..."
[Fixes frontend, forgets it broke the backend test]
You: "The tests are failing now."
AI: "What tests?"
Sound familiar?
The fundamental issue is that AI models, despite their impressive capabilities, work with limited context windows and single-threaded attention. When you ask one AI to build a complex system, it has to:
- Hold the entire project architecture in context
- Remember every decision made hours ago
- Switch between backend, frontend, testing, and DevOps thinking
- Maintain consistency across hundreds of files
- Not lose sight of the original requirements
That's asking too much, even for Claude Opus with its 200K context window.
The solution became obvious: don't ask one AI to wear all the hats. Build a team.
The Insight: How Human Teams Work
Before diving into code, I thought about how real development teams tackle complex projects.
A startup building a SaaS doesn't have one developer doing everything. They have:
- A backend engineer designing APIs and database schemas
- A frontend developer building the UI
- A QA engineer writing tests
- A DevOps person setting up deployment
- A project manager coordinating everyone
Each person is a specialist. They work in parallel on their domain. They communicate through shared artifacts (design docs, API contracts, git repos). And critically, someone coordinates them to ensure the pieces fit together.
What if I could replicate this with AI agents?
┌────────────────────────────────────────────────────────────────────────┐
│                               HUMAN TEAM                               │
│                                                                        │
│  Project Manager                                                       │
│        │                                                               │
│        ├──► Backend Engineer     ──► API Code                          │
│        ├──► Frontend Developer   ──► UI Code                           │
│        ├──► QA Engineer          ──► Tests                             │
│        └──► DevOps               ──► Deployment                        │
│                                                                        │
│  PM ensures: API contracts match, features are complete, code works    │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
                            ▼ TRANSLATE TO ▼
┌────────────────────────────────────────────────────────────────────────┐
│                             AI AGENT TEAM                              │
│                                                                        │
│  Orchestrator Script                                                   │
│        │                                                               │
│        ├──► Claude Opus (Backend)     ──► API Code                     │
│        ├──► Gemini CLI (Frontend)     ──► UI Code                      │
│        ├──► Claude Sonnet (Testing)   ──► Tests                        │
│        └──► Claude Sonnet (DevOps)    ──► Deployment                   │
│                                                                        │
│  Orchestrator ensures: Integration works, requirements met, verified   │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
This insight led to the Multi-Agent Orchestration System.
Architecture: The Orchestra and Its Instruments
The Big Picture
The system has three layers:
┌────────────────────────────────────────────────────────────────────────┐
│                        LAYER 1: USER INTERFACE                         │
│                                                                        │
│  orchestrate "Build a SaaS for project management"                     │
│  route multi                                                           │
│  route backend-arch                                                    │
│                                                                        │
└───────────────────────────────────┬────────────────────────────────────┘
                                    │
                                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│                         LAYER 2: ORCHESTRATOR                          │
│                                                                        │
│  • Prompt Analysis & Requirements Gathering                            │
│  • Agent Planning & Task Distribution                                  │
│  • Parallel Execution Management                                       │
│  • Progress Monitoring                                                 │
│  • Integration Verification                                            │
│  • Fix Cycles                                                          │
│                                                                        │
└───────────────────────────────────┬────────────────────────────────────┘
                                    │
                   ┌────────────────┼────────────────┐
                   ▼                ▼                ▼
          ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
          │   LAYER 3:    │ │               │ │               │
          │ AI CLI Agents │ │               │ │               │
          │               │ │               │ │               │
          │  Claude Opus  │ │  Gemini CLI   │ │ Claude Sonnet │
          │ Claude Sonnet │ │  Copilot CLI  │ │   Codex CLI   │
          │               │ │               │ │               │
          └───────────────┘ └───────────────┘ └───────────────┘
Let's break down each component.
The Orchestrator: Bash as the Conductor
Here's a decision that might surprise you: the orchestrator is a bash script, not an AI agent.
Why bash? Because the orchestrator needs to:
- Spawn and manage multiple processes
- Track PIDs and exit codes
- Read/write state files
- Coordinate timing and dependencies
- Never "forget" what it's doing
AI models can lose context. Bash scripts don't. The orchestrator is deterministic: it follows its coordination logic exactly, every time.
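To make that concrete, here is a minimal, self-contained sketch of the bash primitives the orchestrator leans on: spawning background processes, remembering their PIDs, and collecting exit codes deterministically. This is illustrative only, not code from the repo.

```shell
#!/usr/bin/env bash
# Minimal sketch: spawn background tasks, track PIDs, collect exit codes.

pids=()

run_task() {           # run any command in the background, record its PID
    "$@" &
    pids+=($!)
}

run_task sleep 0.1     # a well-behaved "agent"
run_task true          # another success
run_task false         # a failing "agent"

failures=0
for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))  # wait returns the task's exit code
done

echo "failures=$failures"
```

Because `wait` returns each child's exit status, the script always knows exactly which "agents" failed, with no context window involved.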
The Orchestration Lifecycle
#!/bin/bash
# Multi-Agent Orchestrator - The Conductor
main() {
    show_banner

    # Phase 1: Initialize Session
    SESSION_ID=$(generate_session_id)
    SESSION_DIR="${PROJECTS_DIR}/${SESSION_ID}"
    mkdir -p "$SESSION_DIR"

    # Phase 2: Capture User Prompt (COMPLETE, UNTRUNCATED)
    capture_user_prompt

    # Phase 3: Analyze & Plan
    components=$(analyze_project_request "$ORIGINAL_PROMPT")
    agents=$(map_components_to_agents "$components")

    # Phase 4: Gather Requirements (Clarifying Questions)
    gather_requirements

    # Phase 5: Execute Parallel Agents
    execute_orchestration "$agents"

    # Phase 6: Monitor Until All Complete
    monitor_agents

    # Phase 7: Verify Integration
    verify_integration
    local verify_rc=$?

    # Phase 8: Fix Cycles if Needed
    if [ $verify_rc -ne 0 ]; then
        run_fix_cycle 3   # Up to 3 attempts
    fi

    # Phase 9: Final Report
    final_report
}
Each phase solves a specific problem I encountered in my single-agent nightmare.
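The helper functions above live elsewhere in the script. As one small example, `generate_session_id` can be as simple as the sketch below. This is a hypothetical implementation, not the repo's, but it produces the `orch-YYYYMMDD-HHMMSS-PID` shape that appears in the session logs later in this post.

```shell
# Hypothetical sketch of generate_session_id: date stamp plus the shell's
# own PID ($$) gives a unique, sortable, human-readable session ID.
generate_session_id() {
    echo "orch-$(date +%Y%m%d-%H%M%S)-$$"
}

generate_session_id    # e.g. orch-20260118-143000-12345
```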
Problem 1: The Lost Prompt
The Problem: When I gave Claude a detailed prompt, it would start working on one part and gradually forget details from other parts. By the time it got to the fifth feature, it had no memory of the specific requirements for the first feature.
The Solution: Full Prompt Preservation
The orchestrator stores the complete, unmodified prompt and passes it to every agent:
# Store the COMPLETE original prompt
echo "$initial_prompt" > "${SESSION_DIR}/original_prompt.txt"
# Later, when launching each agent:
launch_agent() {
    local full_prompt="## Project Context
You are working as part of a multi-agent team coordinated by an orchestrator.
Your role: $agent_type

## Original Project Request
$ORIGINAL_PROMPT

## Your Specific Task
$task

## Integration Notes
Other agents working on this project:
$(for a in "${ACTIVE_AGENTS[@]}"; do echo "- $a"; done)

Ensure your code is compatible with shared interfaces."

    # Launch with full context -- $ORIGINAL_PROMPT above is the FULL
    # prompt, never a summary
    claude --model opus -p "$full_prompt"
}
Now every agent (backend, frontend, testing, DevOps) sees the complete original requirements. The backend architect knows about the Kanban boards (even though they're building APIs). The frontend developer knows about Stripe (even though they're building UI).
This shared context is crucial for implicit coordination: agents naturally make compatible decisions because they understand the full picture.
Problem 2: The One-Track Mind
The Problem: A single AI works sequentially. It builds the backend, then the frontend, then the tests. Total time: 3+ hours. And by the time it gets to testing, it's forgotten details about the backend implementation.
The Solution: True Parallel Execution
The orchestrator spawns each agent as a separate background process:
launch_agent() {
    local agent_id="$1"
    local agent_type="$2"
    local cli="$3"
    local task="$4"

    # Run in background with a subshell
    (
        update_agent_state "$state_file" "status" '"running"'
        update_agent_state "$state_file" "started_at" "\"$(date -Iseconds)\""

        # Execute the AI CLI; capture the exit code FIRST in the failure
        # branch, before another command overwrites $?
        if claude --model opus -p "$full_prompt" >> "$output_file" 2>&1; then
            update_agent_state "$state_file" "status" '"completed"'
            update_agent_state "$state_file" "exit_code" "0"
        else
            local rc=$?
            update_agent_state "$state_file" "status" '"failed"'
            update_agent_state "$state_file" "exit_code" "$rc"
        fi

        touch "$marker_file"   # Signal completion
    ) &

    local pid=$!
    ACTIVE_AGENTS+=("$agent_id:$pid:$state_file")
}
Key insight: Each agent process is completely independent. They don't share context windows. They don't share memory. They're separate CLI invocations running in parallel.
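The `update_agent_state` and `get_agent_state` helpers used above aren't shown in the post. Here is one plausible, dependency-free way to implement them, assuming each agent's state file is a flat list of `key=value` lines. The JSON-style quoting in the calls above hints that the real repo may store JSON instead; this sketch simply stores values verbatim.

```shell
# Plausible sketch of the state-file helpers (flat key=value format;
# an assumption, not the repo's actual implementation).

update_agent_state() {   # update_agent_state FILE KEY VALUE
    local file="$1" key="$2" value="$3"
    # Drop any existing entry for the key, then append the new value.
    grep -v "^${key}=" "$file" 2>/dev/null > "${file}.tmp" || true
    echo "${key}=${value}" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

get_agent_state() {      # get_agent_state FILE KEY
    sed -n "s/^${2}=//p" "$1" 2>/dev/null
}

# Example round trip: later writes win
state_file=$(mktemp)
update_agent_state "$state_file" "status" "running"
update_agent_state "$state_file" "status" "completed"
get_agent_state "$state_file" "status"    # prints: completed
```

The rewrite-then-rename (`mv`) step matters: it keeps each update atomic, so the monitor loop never reads a half-written state file.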
Timeline: Single Agent (Sequential)
────────────────────────────────────────────────────────────
[ Backend (60min) ][ Frontend (50min) ][ Testing (40min) ]
Total: 2.5 hours

Timeline: Multi-Agent (Parallel)
────────────────────────────────────────────────────────────
[ Backend (60min)  ]
[ Frontend (50min) ]
[ Testing (40min)  ]
Total: 1 hour (max of all agents)
This isn't just faster; it also means each agent has 100% of its context window dedicated to its specialized task. No context is lost to remembering other domains.
Problem 3: The Context Window Confusion
The Problem: When I first designed the system, I worried: "If I run 4 agents, do I have 4x the context available, or does it all share one pool?"
The Answer: Complete Independence
This is crucial to understand:
┌──────────────────────────────────────────────────────────────┐
│                         ORCHESTRATOR                         │
│                    (bash script - no AI)                     │
└─────────────────────┬────────────────────────────────────────┘
                      │ spawns separate processes
      ┌───────────────┼───────────────┬───────────────┐
      ▼               ▼               ▼               ▼
┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐
│  Claude   │   │  Gemini   │   │  Claude   │   │   Codex   │
│   Opus    │   │    CLI    │   │  Sonnet   │   │    CLI    │
├───────────┤   ├───────────┤   ├───────────┤   ├───────────┤
│ Context:  │   │ Context:  │   │ Context:  │   │ Context:  │
│   200K    │   │   1M+     │   │   200K    │   │   128K    │
│ (SEPARATE)│   │ (SEPARATE)│   │ (SEPARATE)│   │ (SEPARATE)│
└───────────┘   └───────────┘   └───────────┘   └───────────┘
Each agent gets its FULL context window. Running 4 agents doesn't mean dividing 200K by 4; it means having 200K + 1M + 200K + 128K = 1.5M+ tokens of context working simultaneously.
But, and this is the trade-off, agents can't see each other's conversations. They can only coordinate through:
- The shared original prompt
- The filesystem (the actual code they write)
- The orchestrator's final verification step
This is actually a feature, not a bug. It mirrors how human teams work: the backend engineer doesn't need to see every Slack message the frontend developer sends. They just need to agree on the API contract and deliver compatible code.
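To illustrate the "shared artifacts" idea, the orchestrator could drop an agreed API contract into the session directory and splice it into every agent's prompt. This is illustration only; the file names and the `build_agent_prompt` helper are my own, not paths or functions from the repo.

```shell
# Illustration: coordinate independent agents through a shared artifact.
# File and function names here are assumptions, not from the repo.
SESSION_DIR=$(mktemp -d)

cat > "${SESSION_DIR}/api-contract.md" <<'EOF'
# API Contract (agreed before implementation)
POST /api/auth/login    {email, password} -> {token}
GET  /api/projects/:id  -> {id, name, tasks[]}
EOF

build_agent_prompt() {   # build_agent_prompt ROLE
    printf '## Role: %s\n\n## Shared API contract\n%s\n' \
        "$1" "$(cat "${SESSION_DIR}/api-contract.md")"
}

build_agent_prompt backend-architect | head -n 1   # prints: ## Role: backend-architect
```

Both the backend and frontend agents see the same contract text, so `/api/auth/login` means the same thing to each of them even though they never share a conversation.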
Problem 4: The Blind Orchestrator
The Problem: Once I launched parallel agents, how would I know what's happening? Were they stuck? Failed? Done?
The Solution: Continuous Monitoring Dashboard
The orchestrator polls agent state files and displays real-time status:
monitor_agents() {
    while true; do
        local all_done=true
        local status_line=""

        echo -ne "\r[$(date '+%H:%M:%S')] Agent Status: "

        for agent_entry in "${ACTIVE_AGENTS[@]}"; do
            IFS=':' read -r agent_id pid state_file <<< "$agent_entry"
            local status=$(get_agent_state "$state_file" "status")
            case $status in
                pending)   status_line="${status_line}○ "; all_done=false ;;
                running)   status_line="${status_line}◐ "; all_done=false ;;
                completed) status_line="${status_line}● " ;;
                failed)    status_line="${status_line}✗ " ;;
            esac
        done

        echo -ne "$status_line"

        if $all_done; then break; fi
        sleep 5
    done
}
What you see in your terminal:
[14:32:05] Agent Status: ◐ ◐ ◐ ○

  backend-architect    ◐ Running  [=====>        ] 60%
  frontend-developer   ◐ Running  [===>          ] 40%
  test-writer-fixer    ◐ Running  [=>            ] 15%
  security-expert      ○ Waiting  [              ]  0%

Legend: ○ Pending  ◐ Running  ● Complete  ✗ Failed
The orchestrator doesn't move to verification until all agents complete. No more partial implementations where the backend is done but the frontend is still being written.
Problem 5: The Integration Nightmare
The Problem: Even with parallel agents, there's no guarantee their outputs work together. The backend might create /api/users/:id but the frontend calls /api/user/:userId. Different names, broken integration.
The Solution: Automated Integration Verification
After all agents complete, the orchestrator runs a verification step, using Claude Opus as an integration reviewer:
verify_integration() {
    local summaries=$(get_agent_summaries)

    local verification_prompt="## Integration Verification Task
You are the project orchestrator verifying that all agent outputs integrate correctly.

## Original Request
$ORIGINAL_PROMPT

## Agent Outputs
$summaries

## Your Tasks
1. **Completeness Check**: Verify all aspects of the original request have been addressed
2. **Integration Check**: Ensure all components work together (APIs match frontend calls, etc.)
3. **Consistency Check**: Verify naming conventions, coding styles, and patterns are consistent
4. **Dependency Check**: Ensure all dependencies are properly declared
5. **Test Coverage Check**: Verify testing covers the implementation

## Output Format
Please provide:
1. A checklist of original requirements and their status (✅ Done, ⚠️ Partial, ❌ Missing)
2. List of any integration issues found
3. List of any conflicts between agent outputs
4. Recommendations for fixes needed
5. Overall project status (READY / NEEDS_FIXES / INCOMPLETE)"

    claude --model opus -p "$verification_prompt" > "$verification_output"

    if grep -q "NEEDS_FIXES\|INCOMPLETE" "$verification_output"; then
        return 1   # Integration failed
    fi
    return 0   # Integration passed
}
This is where the magic happens. The verifier:
- Reads all agent outputs together (summaries of their work)
- Compares them against the original requirements
- Identifies mismatches like API contract disagreements
- Flags incomplete features
- Produces a clear pass/fail verdict
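One piece not shown in the post is `get_agent_summaries`. A plausible sketch is to digest the tail of each agent's output log, so the verifier sees a compact summary per agent rather than megabytes of transcript. The `*.out` naming convention and directory layout here are assumptions.

```shell
# Hypothetical sketch of get_agent_summaries: one section per agent,
# built from the tail of each agent's output log in the session directory.
# The *.out naming convention is an assumption.
get_agent_summaries() {   # get_agent_summaries SESSION_DIR
    local dir="$1" f
    for f in "$dir"/*.out; do
        [ -e "$f" ] || continue
        echo "### $(basename "$f" .out)"
        tail -n 40 "$f"           # last 40 lines as a cheap "summary"
        echo
    done
}
```

Keeping summaries short is deliberate: the verifier's context window has to hold every agent's digest plus the original prompt, so brevity here directly buys verification quality.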
Problem 6: The Fix Loop of Doom
The Problem: When verification fails, you need to fix issues. But if you just re-run agents, they might introduce new issues while fixing old ones. You end up in an infinite fix loop.
The Solution: Bounded Fix Cycles
The orchestrator runs up to 3 fix cycles before requiring human intervention:
run_fix_cycle() {
    local max_cycles="${1:-3}"
    local cycle=1

    while [ $cycle -le $max_cycles ]; do
        log INFO "Running fix cycle $cycle of $max_cycles..."

        # Create a targeted fix prompt from the verification output
        local fix_prompt="## Fix Cycle $cycle
Based on the integration verification, please fix the identified issues.

## Issues to Fix
$(grep -A 20 "integration issues\|Issues Found\|NEEDS_FIXES" "$verification_output")

## Instructions
1. Address each identified issue
2. Ensure fixes don't break existing functionality
3. Run tests after fixes
4. Document what was changed"

        # Launch the fix agent
        claude --model opus -p "$fix_prompt" > "$fix_output"

        # Re-verify
        if verify_integration; then
            log OK "Fix cycle $cycle resolved all issues!"
            return 0
        fi

        ((cycle++))
    done

    log WARN "Maximum fix cycles reached. Manual intervention needed."
    return 1
}
The key improvements:
- Targeted fixes: The fix prompt includes specific issues from verification
- Limited attempts: 3 cycles max prevents infinite loops
- Re-verification: Each fix cycle is verified before continuing
- Clear failure: If 3 cycles can't fix it, the human is alerted with specific details
The Agent Specialists: Who Does What
Not all agents are created equal. I carefully matched each task type to the optimal AI CLI:
The Agent Roster
┌─────────────────────────────────────────────────────────────────────────┐
│                            AGENT SPECIALISTS                            │
├───────────────────┬───────────────┬─────────────────────────────────────┤
│ Agent Type        │ CLI           │ Why This Pairing?                   │
├───────────────────┼───────────────┼─────────────────────────────────────┤
│ backend-architect │ Claude Opus   │ Deep reasoning for complex APIs     │
│ frontend-developer│ Gemini CLI    │ Multimodal, visual understanding    │
│ test-writer-fixer │ Claude Sonnet │ Fast, methodical, good for TDD      │
│ devops-engineer   │ Claude Sonnet │ Infrastructure patterns             │
│ ui-designer       │ Gemini CLI    │ Design eye, component styling       │
│ security-expert   │ Claude Opus   │ Threat modeling, deep analysis      │
│ technical-writer  │ Claude Sonnet │ Clear documentation, fast           │
│ data-engineer     │ Claude Opus   │ Schema design, data modeling        │
└───────────────────┴───────────────┴─────────────────────────────────────┘
Each agent gets a tailored task prompt. Here's what the backend-architect receives:
generate_agent_task() {
    case $agent_type in
        backend-architect)
            echo "Design and implement the backend architecture including:
- API endpoints and routes
- Database schema and models
- Authentication and authorization
- Business logic and services
- Error handling and validation
Ensure APIs are well-documented and follow RESTful conventions."
            ;;
        frontend-developer)
            echo "Design and implement the frontend including:
- UI components and layouts
- State management
- API integration with backend
- Responsive design
- User interactions and feedback
Ensure the UI is intuitive and matches modern design standards."
            ;;
        # ... other agents
    esac
}
The Smart Router: Choosing the Right Tool
Sometimes you don't need a full orchestra; you just need one instrument. That's where the route command comes in.
Automatic Task Detection
# The route script analyzes your prompt and picks the best CLI
$ route
❯ Enter your task: "Review this authentication code for security vulnerabilities"

🔍 Analyzing your request...
Detected: Security review task
Recommended: Claude Opus (deep analysis, threat modeling)
Launching claude --model opus...
The routing logic uses keyword detection:
detect_task_category() {
    local prompt="$1"
    local prompt_lower=$(echo "$prompt" | tr '[:upper:]' '[:lower:]')

    # Security tasks -> Claude Opus
    if [[ "$prompt_lower" =~ (security|vulnerability|audit|penetration|threat) ]]; then
        echo "security"
        return
    fi

    # UI/Design tasks -> Gemini
    if [[ "$prompt_lower" =~ (ui|design|visual|css|animation|component) ]]; then
        echo "design"
        return
    fi

    # GitHub tasks -> Copilot CLI
    if [[ "$prompt_lower" =~ (github|workflow|actions|ci/cd|pull.request) ]]; then
        echo "github"
        return
    fi

    # Default to Claude Sonnet for general coding
    echo "general"
}
Manual Routing
For power users who know exactly what they want:
route backend-arch # Jump straight to Claude Opus
route frontend # Jump to Gemini CLI
route testing # Claude Sonnet for tests
route github # Copilot CLI for GitHub tasks
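Under the hood, manual routing can be a plain lookup table. Here is a hedged sketch; the function name and the exact CLI invocations are my assumptions, while the target-to-CLI mapping follows the roster table above.

```shell
# Sketch of manual route dispatch as a lookup table. The function name and
# flags are assumptions; the mapping follows the table in this post.
route_target_cli() {   # route_target_cli TARGET -> prints the CLI command
    case "$1" in
        backend-arch) echo "claude --model opus" ;;
        frontend)     echo "gemini" ;;
        testing)      echo "claude --model sonnet" ;;
        github)       echo "copilot" ;;
        *)            echo "unknown target: $1" >&2; return 1 ;;
    esac
}

route_target_cli testing    # prints: claude --model sonnet
```

Keeping the mapping in one function makes it trivial to add a new specialist later: one new `case` arm, no other changes.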
A Complete Example: Building TaskFlow SaaS
Let me walk through a real orchestration session, step by step.
Step 1: Launch the Orchestrator
$ orchestrate
┌────────────────────────────────────────────────────────────────┐
│         🎯 Multi-Agent Project Orchestrator v1.0               │
│         Coordinate AI Agents for Complex Projects              │
└────────────────────────────────────────────────────────────────┘

ℹ Starting orchestration session: orch-20260118-143000-12345

🎯 What would you like to build?
(Describe your project in detail. The more context, the better.)

❯
Step 2: Enter the Detailed Prompt
❯ Create a full-stack task management application called TaskFlow for
freelancers with:
- User authentication (email/password + Google OAuth)
- Project and task management with drag-and-drop Kanban boards
- Time tracking per task with start/stop timer
- Invoice generation from tracked time entries
- Client portal where clients can view project progress
- Stripe integration for subscription billing
Tech stack: Next.js 14, Prisma ORM, PostgreSQL, Redis for caching
The UI should be modern, clean, with a dark mode option.
Mobile-responsive is essential.
Step 3: Requirements Gathering
────────────────────────────────────────────────────────────
 📋 Requirements Gathering
────────────────────────────────────────────────────────────

ℹ Analyzing your request...

Detected project components:
  ✓ backend
  ✓ frontend
  ✓ testing
  ✓ security
  ✓ devops

Please answer a few questions to clarify requirements:
(Press Enter to skip any question)

? Project type? (MVP/prototype, production, enterprise): MVP
? Preferred tech stack?: Already specified - Next.js, Prisma, PostgreSQL
? Any timeline constraints?: 1 week
? Most important features to prioritize?: Auth and Kanban boards
? Any specific constraints or requirements?: Must work on mobile
Step 4: Review the Execution Plan
────────────────────────────────────────────────────────────
 📋 Execution Plan
────────────────────────────────────────────────────────────

Agents to be deployed:
  1. backend-architect   → claude-opus
  2. frontend-developer  → gemini
  3. test-writer-fixer   → claude-sonnet
  4. security-expert     → claude-opus
  5. devops-engineer     → claude-sonnet

Execution strategy:
  • Agents will run in parallel where possible
  • Each agent receives full project context
  • Orchestrator monitors progress continuously
  • Integration verification after completion
  • Fix cycles if issues are detected
Proceed with this plan? [Y/n/edit]: Y
Step 5: Watch the Parallel Execution
────────────────────────────────────────────────────────────
 🚀 Executing Multi-Agent Orchestration
────────────────────────────────────────────────────────────

🤖 Launching backend-architect (claude-opus)...
✓ Agent backend-architect-1 started (PID: 45231)
🤖 Launching frontend-developer (gemini)...
✓ Agent frontend-developer-2 started (PID: 45232)
🤖 Launching test-writer-fixer (claude-sonnet)...
✓ Agent test-writer-fixer-3 started (PID: 45233)
🤖 Launching security-expert (claude-opus)...
✓ Agent security-expert-4 started (PID: 45234)
🤖 Launching devops-engineer (claude-sonnet)...
✓ Agent devops-engineer-5 started (PID: 45235)

ℹ Monitoring 5 agents...

[14:31:05] Agent Status: ◐ ◐ ◐ ◐ ◐
[14:31:10] Agent Status: ◐ ◐ ◐ ◐ ◐
[14:32:15] Agent Status: ◐ ◐ ◐ ◐ ●   (devops-engineer complete)
[14:35:22] Agent Status: ◐ ◐ ◐ ● ●   (security-expert complete)
[14:42:18] Agent Status: ◐ ◐ ● ● ●   (test-writer-fixer complete)
[14:55:33] Agent Status: ◐ ● ● ● ●   (frontend-developer complete)
[15:02:47] Agent Status: ● ● ● ● ●   (backend-architect complete)

✓ All agents completed successfully!
Step 6: Integration Verification
────────────────────────────────────────────────────────────
 Integration Verification Results
────────────────────────────────────────────────────────────

## Requirements Checklist
✅ User authentication (email/password + Google OAuth)
✅ Project and task management with Kanban boards
✅ Time tracking per task
✅ Invoice generation from tracked time
✅ Client portal
✅ Stripe integration
✅ Dark mode
✅ Mobile responsive

## Integration Check
✅ API endpoints match frontend calls
✅ Database schema supports all features
✅ Auth flow works end-to-end
✅ Stripe webhooks properly configured

## Minor Issues Found
⚠️ Missing error boundary in Kanban component
⚠️ Client portal missing loading states

## Overall Status: NEEDS_FIXES (minor)
Step 7: Automated Fix Cycle
⚠ PROJECT NEEDS ATTENTION

Would you like to run fix cycles? [Y/n]: Y

ℹ Running fix cycle 1 of 3...
🤖 Dispatching fix agent for identified issues...

[Fixing: Error boundary in Kanban component]
[Fixing: Loading states in client portal]

✓ Changes applied
ℹ Re-verifying integration...

## Overall Status: READY ✅

✓ Fix cycle 1 resolved all issues!

────────────────────────────────────────────────────────────
✓ PROJECT COMPLETED SUCCESSFULLY
────────────────────────────────────────────────────────────

Session ID:  orch-20260118-143000-12345
Logs:        ~/.orchestrator/logs/orch-20260118-143000-12345/
Total time:  32 minutes
Agents used: 5
Fix cycles:  1
The Result: A Working TaskFlow
After 32 minutes (instead of 3+ hours with a single agent), I have:
taskflow/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── auth/          # OAuth, session management
│   │   │   ├── projects/      # Project CRUD
│   │   │   ├── tasks/         # Task management
│   │   │   ├── time-entries/  # Time tracking
│   │   │   ├── invoices/      # Invoice generation
│   │   │   └── stripe/        # Webhooks, subscription
│   │   ├── dashboard/         # Main dashboard
│   │   ├── projects/          # Project views
│   │   ├── portal/            # Client portal
│   │   └── settings/          # User settings
│   ├── components/
│   │   ├── KanbanBoard/       # Drag-and-drop board
│   │   ├── TimeTracker/       # Start/stop timer
│   │   ├── InvoiceBuilder/    # Invoice generation
│   │   └── ThemeToggle/       # Dark mode
│   └── lib/
│       ├── prisma.ts          # Database client
│       ├── auth.ts            # Auth utilities
│       └── stripe.ts          # Stripe client
├── prisma/
│   └── schema.prisma          # Full database schema
├── tests/
│   ├── unit/                  # Unit tests
│   ├── integration/           # API tests
│   └── e2e/                   # End-to-end tests
├── docker-compose.yml         # Dev environment
├── .github/workflows/         # CI/CD pipeline
└── README.md                  # Documentation
All components work together because they were built with shared context and verified for integration.
Phase 2: Marketing After the Build
Here's something I intentionally designed: marketing agents are NOT included in the build phase.
Why? Because:
- Marketing needs a finished product to describe
- Marketing content consumes context better spent on code
- Marketing is a separate workflow, not part of coding orchestration
After the build completes, I switch to marketing mode:
# Option 1: Direct routing for specific marketing tasks
$ route content
❯ Create landing page copy for TaskFlow, a task management SaaS for
freelancers. Focus on time savings and invoicing automation.
# Option 2: Use Claude with marketing agents
$ claude
> Use content-creator
Create a launch email sequence (5 emails) for TaskFlow targeting
freelancers who struggle with project organization.
> Use seo-specialist
Research keywords for "freelance project management" and create a
content calendar.
> Use social-media-manager
Create a Twitter/LinkedIn launch campaign with 10 posts.
This two-phase approach keeps the build focused and gives marketing agents a completed product to work with.
What I Learned: The Meta-Lessons
1. Coordination > Raw Power
Having 5 mediocre agents that coordinate well beats 1 powerful agent that tries to do everything. The orchestration layer is where the real value is created.
2. Bash is Underrated for AI Workflows
When you need deterministic coordination, state management, and process control, bash beats AI agents every time. Let AI do what it's good at (reasoning, generation) and let scripts do what they're good at (orchestration).
3. Independent Context is a Feature
At first, I worried that agents couldn't see each other's conversations. Then I realized: they don't need to. Just like human teams, they coordinate through shared artifacts (the codebase) and clear contracts (the original prompt).
4. Verification is Non-Negotiable
Without the integration verification step, you'll have beautifully written components that don't work together. The extra 2 minutes for verification saves hours of debugging.
5. Bounded Failures are Acceptable
The system doesn't pretend to be perfect. If 3 fix cycles can't resolve issues, it stops and asks for human help with specific details about what's wrong. This honesty is more valuable than false confidence.
What's Next?
The current system handles the coding phase beautifully. Here's what I'm building next:
- Phase 2 Marketing Workflow: Automated marketing launch after code completion
- Dependency Detection: Smarter sequencing when agents depend on each other's output
- Learning from History: Using past sessions to improve agent task assignments
- Cost Tracking: Monitor API spend per agent and optimize for budget
- Human Checkpoints: Pause points where humans can review before continuing
Try It Yourself
The complete system is in the repository:
# Clone the repo
git clone https://github.com/your-username/agent-container.git
cd agent-container
# Set up API keys
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY and OPENAI_API_KEY
# Deploy to Hetzner (or run locally)
HETZNER_IP=your-server-ip ./scripts/deploy.sh
# SSH in and orchestrate
ssh ai-dev
orchestrate "Build your amazing project idea here"
The orchestrate and route commands are in /scripts/. The agent definitions are in /claude-agents/. The documentation is comprehensive.
Final Thoughts
When I started this project, I was frustrated with the limitations of single AI assistants. They're brilliant at focused tasks but fall apart on complex projects.
The solution wasn't to wait for more powerful AI; it was to orchestrate existing AI into teams. Each agent is a specialist. The orchestrator is the project manager. Together, they deliver what no single agent could.
The future of AI development isn't one superintelligent agent doing everything. It's AI teamwork: specialized agents coordinated by smart orchestration. And with the tools in this repo, you can have that future today.
Happy building! 🚀