Building AI Teams That Actually Think Together
I spent a month building hierarchical AI teams and achieved 48x productivity gains. Here's why organizing AI agents like a real company changes everything.
The Context
Last week I saw Tom Bilyeu's tweet about rebuilding his company in 90 days using a "5-member AI department". This idea captured something many of us have been thinking about. The vision is compelling: AI agents working 24/7, handling everything from strategy to execution, for a fraction of traditional costs.
But having spent the last month intensively building with the help of AI—not just theorizing about it—I had to respond with what I've discovered:
"This is great, but you're missing a HUGE value lever: a centralized knowledge base where context and memory is shared and updated. This also allows for convergence thinking from multiple AI profiles. It's the difference between having one-to-one sessions and a board meeting where agents can build together (plus human input). I'm actually building this and the results are remarkable."
A lot of people asked if I could share more about this approach, so this article is my attempt to show you what I mean.
The Problem with Isolated AI "Employees"
The current AI narrative focuses on replacement: replace your designer, your coder, your analyst. This approach treats AI agents like independent contractors—each working in isolation, delivering discrete outputs.
I've been testing this model extensively the past month—generating over 200,000 lines of code through rapid experimentation. Most were dead ends, but the learnings compounded quickly. While this approach works, it's fundamentally limited. At most, you get linear improvements: 1 AI + 1 AI + 1 AI = 3x better.
Often, agents start stepping on each other, creating conflicts, and you end up spending more time coordinating than coding. In many cases, I found it's actually better to just have one agent instead of many.
But what if AI agents could actually collaborate? What if they could build on each other's insights in real-time, with humans kept in the loop for key decisions?
Hierarchy Changed Everything
My breakthrough came when I stopped thinking about AI agents as equals and started organizing them like an actual company:
C-Suite Level (High-Intelligence Models)
Strategic Advisor: Collaborates with human to set vision, identify improvement opportunities, measure ROI
CTO: Makes architectural decisions, configures and deploys Execution Level agents based on project needs, assigns tasks to appropriate models, reviews technical work
Framework Architect: Extracts patterns, designs methodologies, documents
CEO: Me, the sole human!
Execution Level (Efficient Models)
Junior Developers: Implement clearly defined tasks
Test Writers: Create comprehensive test suites
Documentation Writers: Keep everything current
I set up my own C-suite using multiple instances of Claude Code—the biggest advantage being they can act directly on code and are optimized for development support. I could have used other models through IDE integrations, but found it better to work with consistent terminal interfaces rather than juggling different UIs.
For the execution level, our AI-CTO helped set them up with a complexity-based routing system: simpler tasks go to cheaper models via APIs, and some even to local Ollama instances that cost nothing.
This hierarchy alone cut costs by 90%: expensive AI thinks and plans (with my inputs), efficient AI executes, expensive AI reviews and commits. Rinse and repeat.
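The routing logic itself can stay trivial. Here's a minimal sketch of the complexity-based routing idea (the tier names, thresholds, and Task shape are my illustrative assumptions, not our actual configuration):

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    complexity: int  # 1 (trivial) .. 10 (architectural)

def route(task: Task) -> str:
    """Pick the cheapest model tier that can plausibly handle the task."""
    if task.complexity >= 7:
        return "frontier-model"   # strategy, architecture, final review
    if task.complexity >= 4:
        return "mid-tier-api"     # well-scoped implementation work
    return "local-ollama"         # boilerplate, docs, simple tests

# Example: the C-suite scores a task once, then dispatches it downward.
print(route(Task("design auth architecture", 9)))   # frontier-model
print(route(Task("write docstrings for parser", 2)))  # local-ollama
```

The point is not the specific thresholds but that the expensive models only ever see the tasks that justify their cost.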
The Infrastructure Nobody Talks About
But hierarchy is just the foundation. The real magic happens with two critical components that address what even Anthropic's Claude Code creators acknowledged as "one of the hardest problems in AI development": maintaining context across sessions and getting AI to perform consistently.
1. Agent-to-Agent Communication: Why We Chose Files Over "Proper" Protocols
With the help of my C-suite, I built a simple asynchronous messaging system using file-based inbox/outbox folders. Before you ask: yes, we evaluated MCP protocols, N8N automation workflows, WebSocket APIs, and message queues. We chose file-based messaging intentionally.
Here's why simple beats sophisticated for AI coordination:
Reliability: APIs go down. Webhooks fail. Message brokers crash. Files just... exist. When your AI agent crashes at 2 AM, you want recovery to work without debugging connection pools or OAuth tokens.
Transparency: Every message is human-readable (and editable). No binary protocols, no authentication layers, no network debugging. Just cat inbox/message.md to see exactly what's happening between agents.
Git Integration: Messages become part of your version history. You can diff conversation threads, branch communication strategies, and roll back coordination mistakes. Try doing that with MCP or N8N workflows.
Universal Compatibility: Every programming language, every AI model, every operating system can read and write files. No SDK requirements, no API versioning, no vendor lock-in, no protocol negotiation.
Debuggable Failures: Complex systems fail in complex ways. Simple systems fail in simple ways. When our file-based system "fails," it usually means someone forgot to commit a message. When API-based systems fail, you're debugging OAuth tokens, rate limits, and network timeouts at midnight.
Zero Infrastructure: No servers to maintain, no message brokers to scale, no rate limits to manage, no automation platforms to configure. Works on localhost, works on production, works offline.
The real insight: AI agents don't need real-time messaging or complex workflow orchestration. They need reliable, transparent, debuggable communication. File-based A2A gives us exactly that.
Agents can:
Share discoveries as they happen
Request help when stuck
Build on each other's work
Maintain conversation history
It's intentionally simple—no complex APIs, no real-time coordination. Just agents leaving messages for each other.
2. Git-Based Memory and Recovery
This addresses the core problem that even the Claude Code team identified: context persistence. Instead of complex state management systems that can corrupt or fail, we use git as our single source of truth:
Every 5-10 minutes, agents commit their work ("checkpoint")
If an agent crashes, the next instance reads git history to recover
Recovery takes about 45 seconds, not hours
No complex state files or monitoring daemons
We essentially solved the "hardest problem in AI development" by using the simplest tool that could work: git, which every developer already understands. With recovery in under a minute, what the Claude Code team identified as a major challenge becomes a solved problem.
Real Results from Real Projects
To make this concrete, I'm building a real product with this framework to test and improve it, measuring everything on actual development tasks. Here are results from our specific context and optimal conditions:
OCR Feature Implementation:
Traditional approach: 2 weeks
Single AI agent estimate: 4 days (3.5x improvement)
Our hierarchical system: 7 hours (48x improvement on this well-defined task)
Blog Post Generation:
Traditional approach: 8 hours
Our system: 45 minutes (10x improvement)
Test Suite Creation:
Traditional approach: 3 days
Our system: 4 hours (15x improvement)
Important context: These metrics apply to our specific workflow, with pre-existing patterns and clear specifications. Results will vary significantly based on project complexity and team experience, and these numbers are actually conservative based on our recent iterations.
How Convergent Thinking Actually Works
Having multiple agents with different "personalities" and experience enables an interesting dynamic: convergent thinking. Here's a concrete example of what it looks like in practice:
The State Management Crisis:
CTO agent was struggling with complex JSON state files that kept corrupting
Human suggested sharing the problem with the broader agent team
Framework Architect noticed the pattern: "We're reinventing version control"
Strategic Advisor saw the business impact: "This complexity is killing our velocity"
Convergent insight: "Why not just use git for state management?"
Result: We deleted our entire state management system. Recovery became nearly instant. Sometimes the best solution is elimination, not optimization.
The Patterns We've Discovered
After intensive experimentation, we've identified several key patterns:
The Challenge Pattern: One agent hits a wall → Human asks to share the problem → Another agent suggests a different approach → Breakthrough innovation
The Build Pattern: One agent discovers a useful pattern → Human suggests sharing it → Others implement variations → Systematic improvement across all work
The Vision Pattern: Human sets ambitious goal → Multiple agents explore solution paths → Convergent validation → Solutions none of us would have conceived alone
Why This Approach Actually Works
Traditional AI usage is transactional:
Human: "Do this specific task"
AI: "Here's the result"
Human: "Do the next task"
Our approach is collaborative:
Human sets vision and constraints
Multiple specialized AIs explore the solution space
They share discoveries and build on each other's work
Human guides overall direction and makes key decisions
The team uses failures and roadblocks as opportunities to collaborate and improve. As a result, the system learns and improves continuously
The Cost Economics That Make It Viable
Traditional AI Team (or the approach proposed by Tom's tweet): 5 high-cost models working independently
High coordination overhead
Lots of wasted compute on planning
Linear scaling
Our Hierarchical Approach: 2-3 high-cost models for strategy, 5-10 low-cost models for execution
90% cost reduction in our testing
Exponential capability scaling potential
Continuous learning and improvement
Most actual work gets done by efficient models that cost pennies per task.
Who Should Try This Approach
This hierarchical model is particularly effective for:
Development Teams ready to experiment with git-based workflows and comfortable with rapid iteration cycles.
Startups and Scale-ups that need to move fast with limited engineering resources but have well-defined feature requirements.
Product Teams working on software with extractable patterns—web apps, mobile applications, internal tools—where learnings from one feature can accelerate the next.
Technical Leaders who want to amplify their architectural vision across larger codebases without hiring proportionally more developers.
Building Your Own AI Management Team
If you want to experiment with this approach:
Start Simple
2-3 specialized agents with clear roles
Basic file-based messaging between them
Git for shared memory
Measure collaboration, not just output
Focus on Communication
Agents need to know what others are working on
They need to be able to request help
They need to share discoveries immediately
Humans should guide when and what to share—asynchronous works better than real-time
Design for Recovery
Everything should be recoverable from git history
Agents should crash and restart gracefully, with full project awareness
No complex state management systems that can corrupt
Target sub-minute recovery times using git history
Remember: agents run out of memory and systems crash. The key to success is a very robust recovery plan for all your agents!
Measure What Matters
How often do agents build on each other's work?
How fast do insights propagate through the system?
Are you seeing solutions no single agent would create?
Is human time spent on vision or coordination?
When This Approach Works (And When It Doesn't)
This hierarchical model works best for:
Well-defined development projects with clear specifications
Teams comfortable with git workflows and terminal interfaces
Projects where patterns can be extracted and reused across features
Scenarios where iteration speed matters more than perfection
Codebases with established architectural patterns
It's not suitable for:
Highly creative, open-ended exploratory projects
Teams without strong technical infrastructure comfort
Projects requiring deep, specialized domain expertise
Situations where explaining AI decisions is critical for compliance
Complex debugging of legacy systems with unclear documentation
The Future We're Actually Building
We're not promising that AI will replace human developers. We're demonstrating that AI can amplify human vision in ways we're only beginning to understand.
When you create the right conditions—shared memory, diverse perspectives, structured communication—AI agents don't just work together. They start thinking together. New insights emerge that none of them (or us) would have discovered alone.
What's Next
We're building a development framework around these ideas. Current frameworks like Agile and Scrum weren't designed for AI-human collaboration—we measure development time in sessions, not sprints, because AI consistently underestimates its own speed. I will write more about this in a later post.
Our next steps involve sharing this approach with organizations ready to experiment:
Complete documentation of our patterns
Open-source components for A2A communication
Frameworks for setting up hierarchical AI teams
Measured case studies from real projects
This isn't about replacing the conversation around AI employees. It's about evolving it. The question isn't "How can I get AI to work faster?" It's "How can I build AI teams that think together?"
The difference? That's where the real leverage lives.
Want to see what hierarchical AI organizations can do for your work? We're building in public and sharing everything we learn. The future isn't about better AI—it's about better AI collaboration.
As always, if you liked this, share it and follow me here or on Twitter or LinkedIn.