The Mental Model
The pattern is simpler than you think. Modern AI agents work through function calling. You define structured interfaces with strict schemas. The model analyzes intent, selects the appropriate tool, and generates parameters that match your schema. Your system executes the function and returns results. That's it. Tool definitions become registries. Registries get exposed via MCP (Model Context Protocol) so any AI application can discover and use them. The rest of this section breaks down:
- Function Calling Flow
- Why This Architecture Works
- What You Don't Need
System Architecture:
- User Message → Agent System
- Agent queries Tool Registry for available capabilities
- LLM analyzes intent against tool catalog
- LLM generates structured parameters matching tool schema
- Validation layer verifies parameter compliance
- Handler function executes business logic
- Structured result returns to LLM
- LLM incorporates results into natural language response
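Here is a minimal sketch of the validation and execution steps of that flow in TypeScript. The names and types (ToolDefinition, executeToolCall) are illustrative, and the schema layer assumes a Zod-style validator:

```typescript
import { z } from "zod";

// A tool pairs a parameter schema with a handler that returns a structured result
interface ToolDefinition {
  description: string;
  schema: z.ZodTypeAny;
  handler: (args: unknown) => Promise<{ success: boolean; data?: unknown; error?: string }>;
}

type ToolRegistry = Record<string, ToolDefinition>;

// A tool call as generated by the LLM
interface ToolCall {
  name: string;       // tool identifier selected by the model
  arguments: unknown; // parameters generated by the model
}

// Validation, execution, and the structured result handed back to the agent loop
async function executeToolCall(call: ToolCall, registry: ToolRegistry) {
  const tool = registry[call.name];
  if (!tool) return { success: false, error: `Unknown tool: ${call.name}` };

  // Validation layer: reject parameters that do not match the schema
  const parsed = tool.schema.safeParse(call.arguments);
  if (!parsed.success) return { success: false, error: parsed.error.message };

  // Handlers return structured results instead of throwing, so the agent loop
  // can retry, degrade gracefully, or surface the error to the model
  return tool.handler(parsed.data);
}
```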
Why this architecture works:
- Registry acts as service catalog
- Schema validation enforces type safety
- Handlers encapsulate integration logic
- Results maintain structured format for programmatic use
Start With One Tool
Let's architect one tool properly before scaling to dozens. Here's the pattern that works every time. A tool has four architectural components: metadata, schema, handler, and response contract. That's it.
Email Tool Architecture Pattern
Explicit naming: send_email, not email or sendMail. Use action_noun patterns. LLMs demonstrate higher tool selection accuracy with explicit verb-object naming.
Description-driven discovery: The description field drives LLM tool selection. This is your primary signal for intent matching. Include when to use the tool, not just what it does. “Send email when user requests notification” outperforms “Sends an email.”
Structured response contracts: Return objects, never throw exceptions. Maintain consistent {success, data/error} patterns. Exceptions break the agent loop. Structured errors enable retry logic and graceful degradation.
Secret management architecture: Environment-based configuration separates concerns. Development uses local environment files. Production uses proper secret managers (cloud provider services, dedicated secret management platforms). Never embed secrets in tool definitions.
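Here is a minimal sketch of that four-part pattern in TypeScript. It assumes Zod for the schema and a hypothetical deliverEmail client wrapping your email provider; the shape, not the specific libraries, is the point:

```typescript
// send-email.ts - metadata, schema, handler, and response contract in one definition
import { z } from "zod";
import { deliverEmail } from "./email-client"; // hypothetical wrapper around your email provider

// The schema doubles as runtime validation and documentation of the tool's parameters
const sendEmailSchema = z.object({
  to: z.string().email(),
  subject: z.string().min(1),
  body: z.string().min(1),
});

export const sendEmailTool = {
  // Metadata: explicit action_noun name plus a description that tells the LLM when to use it
  name: "send_email",
  description:
    "Send an email to a specified recipient. Use this when the user asks to send an email or notify someone via email.",
  schema: sendEmailSchema,

  // Handler: encapsulates integration logic, reads secrets from the environment,
  // and always returns a structured result instead of throwing
  handler: async (args: z.infer<typeof sendEmailSchema>) => {
    try {
      const apiKey = process.env.EMAIL_API_KEY; // injected by env file locally, secret manager in prod
      if (!apiKey) return { success: false, error: "EMAIL_API_KEY is not configured" };

      const result = await deliverEmail({ ...args, apiKey });
      return { success: true, data: { messageId: result.messageId } };
    } catch (err) {
      return { success: false, error: err instanceof Error ? err.message : "Unknown error" };
    }
  },
};
```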
The Registry Pattern
After building this 50+ times, here's the simplest architectural pattern that works: a registry is a structured mapping from tool identifiers to their definitions and handlers.
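A minimal sketch of that mapping, assuming tool definitions shaped like the send_email example above (the file paths and the second tool are illustrative):

```typescript
// registry.ts - the registry is a plain object mapping tool identifiers to definitions
import { sendEmailTool } from "./tools/send-email";
import { analyzeLinkedinPostTool } from "./tools/analyze-linkedin-post"; // illustrative second tool

export const toolRegistry = {
  [sendEmailTool.name]: sendEmailTool,
  [analyzeLinkedinPostTool.name]: analyzeLinkedinPostTool,
};

// The catalog an agent exposes to the LLM for tool selection
export const toolCatalog = Object.values(toolRegistry).map((tool) => ({
  name: tool.name,
  description: tool.description,
  schema: tool.schema,
}));
```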
MCP Servers: The Modern Standard
MCP solves the integration nightmare I used to face on every project. Before MCP, every AI platform required custom integrations. Build tools for OpenAI's schema format. Rebuild for Anthropic's API. Rebuild again for Google's specification. Different schemas, different calling conventions, different protocol layers.
Model Context Protocol standardizes this integration layer. Build your tool registry once, expose it via MCP server, and any compliant AI application can discover and use your tools. Claude Desktop, Cline, Continue, Zed, and dozens more support MCP natively. Business value: Stop rebuilding integrations for each AI platform. Build tools once, use everywhere.
MCP Server Architecture Pattern
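A minimal sketch of that layer, assuming the official TypeScript SDK (@modelcontextprotocol/sdk) and the registry above. The SDK's API surface evolves between versions, so treat this as the shape rather than exact signatures:

```typescript
// mcp-server.ts - expose the existing tool registry over MCP via stdio
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { sendEmailTool } from "./tools/send-email";

const server = new McpServer({ name: "email-tools", version: "1.0.0" });

// Register each tool from the registry; shown inline here for one tool
server.tool(
  sendEmailTool.name,
  sendEmailTool.description,
  { to: z.string().email(), subject: z.string(), body: z.string() }, // mirrors the tool's schema
  async (args) => {
    const result = await sendEmailTool.handler(args);
    // MCP responses carry content blocks; serialize the structured result as text
    return { content: [{ type: "text" as const, text: JSON.stringify(result) }] };
  }
);

// stdio transport lets MCP clients (Claude Desktop, Cline, Zed, ...) spawn this server as a subprocess
const transport = new StdioServerTransport();
await server.connect(transport);
```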
Real Implementation: The 30,000+ User Marketplace
Let me show you what this looks like in production. I built this for Contra's job discovery automation. They had 30,000+ users who needed systematic lead generation across LinkedIn and X. The previous approach was scattered - individual scrapers, manual workflows, zero coordination. The architecture we built:
- Tool Library Structure
- Registry Organization
- Agent Composition
What didn't work: a single analyze_content tool tried to handle LinkedIn posts, tweets, job listings, and profile bios. It ended up with messy conditional logic and poor results. We fixed it by creating specific tools: analyze_linkedin_post, analyze_tweet, and so on.
What actually worked:
- Started with 5 tools for one use case (LinkedIn job detection)
- Deployed to 100 beta users in week 2
- Added tools based on real usage patterns, not speculation
- Reached 12 tools over 8 weeks - never needed more
- 70% of new workflows reused existing tools
The results:
- 30,000+ active users (from beta group of 100)
- 2x increase in marketplace job inventory
- 85% reduction in manual lead generation work
- Component reuse rate: 73%
- Average new workflow deployment: 2 days (down from 3 weeks)
Production Concerns
Here's what breaks in production and the architectural patterns to handle it.
- Error Handling
- Security
- Versioning
- Testing
- Monitoring
The error handling layer combines three patterns: retry with backoff, circuit breaker, and dead letter queue. A sketch of the first two follows.
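Here is a minimal sketch of retry-with-backoff and a circuit breaker as handler wrappers. The names and thresholds are illustrative, not a specific library's API:

```typescript
// Retry with exponential backoff: wrap any handler call that might fail transiently
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3, baseDelayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // 500ms, 1s, 2s, ... plus jitter to avoid thundering herds
        const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Circuit breaker: stop calling a failing dependency until a cooldown passes
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const open = this.failures >= this.threshold && Date.now() - this.openedAt < this.cooldownMs;
    if (open) throw new Error("Circuit open: dependency is failing, skipping call");

    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

At the tool boundary these failures still surface as structured {success: false, error} results; whatever still fails after retries is what a dead letter queue would capture for later inspection.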
Common Mistakes
I've watched teams waste 6 months on these mistakes. Learn from my failures.
Building elaborate platforms before shipping one tool
The biggest killer. Teams design perfect tool management UIs, sophisticated versioning systems, elaborate discovery mechanisms. Six months later, they have zero tools in production. Start with a TypeScript file that exports an object. Ship tools that solve real problems. Add infrastructure only when the pain is obvious.
Skipping error handling initially
"We'll add it later" becomes "everything breaks in production." I've seen this destroy three separate projects. Error handling isn't optional. Add retries, circuit breakers, and graceful failures from the first tool. The patterns are simple; the cost of skipping them is catastrophic.
No versioning strategy from day one
You will need to change tool schemas. If you don't plan for this, you'll break every agent that uses them. Start with semantic versioning. Use tool names like send_email_v2 for breaking changes. It feels premature until the day you need it.
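As a sketch, a breaking schema change just means registering a new name alongside the old one (the v2 tool and file paths here are illustrative):

```typescript
// registry.ts - a breaking schema change registers a new tool name alongside the old one
import { sendEmailTool } from "./tools/send-email";       // v1: { to, subject, body }
import { sendEmailV2Tool } from "./tools/send-email-v2";  // v2: adds cc, bcc, attachments

export const toolRegistry = {
  send_email: sendEmailTool,      // existing agents keep working against v1
  send_email_v2: sendEmailV2Tool, // new agents opt in to the new schema
};
```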
Not documenting tool schemas
The description field in your tool definition is critical. Models use it to decide when to call your tool. “Send email” is useless. “Send an email to a specified recipient. Use this when the user asks to send an email or notify someone via email.” works. I’ve debugged dozens of agent issues that came down to poor descriptions.
Trying to support every use case immediately
Generic tools sound good in theory. In practice, they're messy and unreliable. An analyze_content tool that handles posts, tweets, jobs, profiles, and articles ends up with nested conditionals and poor results. Build specific tools: analyze_linkedin_post, analyze_tweet. Start narrow, expand only when patterns emerge.
Not measuring tool usage
You can’t improve what you don’t measure. Track which tools get called, which succeed, which fail. This data shows you where to invest. I’ve watched teams build 50 tools where 5 handled 90% of use cases. The other 45 were waste.
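A minimal sketch of instrumentation at the dispatch layer. The metrics sink is illustrative; in production this would feed whatever monitoring you already run:

```typescript
// Wrap tool execution so every call records name, outcome, and latency
type ToolResult = { success: boolean; data?: unknown; error?: string };

async function withUsageTracking(
  toolName: string,
  run: () => Promise<ToolResult>,
  record: (event: { tool: string; success: boolean; durationMs: number }) => void
): Promise<ToolResult> {
  const start = Date.now();
  const result = await run();
  record({ tool: toolName, success: result.success, durationMs: Date.now() - start });
  return result;
}

// Usage: withUsageTracking("send_email", () => sendEmailTool.handler(args), console.log)
```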
Overcomplicating the registry
Database-backed registries, GraphQL APIs, elaborate caching layers - I’ve seen it all. You don’t need any of it. A TypeScript object that imports your tools works for hundreds of tools across dozens of agents. Add complexity only when simple patterns break.
The 4-Week Implementation Plan
Here's how I'd implement this if I joined your team tomorrow.
Week 1: Build Core Tools
Pick one high-impact use case
- Identify 3-5 tools needed for this use case
- Build tools with proper schemas and error handling
- Write unit tests for each handler
- Deploy to staging environment
Week 2: Registry + First Agent
Create simple registry
- Build registry.ts with your tools
- Deploy first agent using these tools
- Test agent with real scenarios
- Document tool schemas and usage
Week 3: MCP Server
Add MCP layer
- Implement basic MCP server
- Connect to Claude Desktop for testing
- Deploy MCP server to production
- Add monitoring and error tracking
Week 4: Measure + Plan
Document and scale
- Measure impact of first use case
- Document patterns for team
- Identify next 2-3 use cases
- Plan tool library expansion
Principles across all four weeks:
- Ship to real users by end of week 2
- Start narrow, prove the pattern, then scale
- Document as you build, not after
- Measure impact before expanding scope
Tools and Stack Recommendations
After trying everything, here's the architectural approach I recommend.
Schema Validation Library
Purpose:
- Runtime validation with compile-time type inference
- Schema reuse across tool definitions and API contracts
- Single source of truth for data structures
What to look for:
- Type inference (schema → types automatically)
- Composable schemas (reuse and extend)
- Clear validation error messages
- JSON Schema export for documentation
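Zod is one library that fits this profile. A minimal sketch of the schema-as-single-source-of-truth idea (the shapes are illustrative, and JSON Schema export typically comes from a companion package such as zod-to-json-schema):

```typescript
import { z } from "zod";

// One schema drives runtime validation, compile-time types, and documentation
const emailRecipientSchema = z.object({
  address: z.string().email(),
  name: z.string().optional(),
});

// Composition: extend and reuse instead of redefining
const sendEmailParamsSchema = z.object({
  to: emailRecipientSchema,
  subject: z.string().min(1),
  body: z.string().min(1),
});

// Type inference: the TypeScript type comes from the schema, not the other way around
type SendEmailParams = z.infer<typeof sendEmailParamsSchema>;

// Runtime validation with clear error messages
const parsed = sendEmailParamsSchema.safeParse({ to: { address: "not-an-email" } });
if (!parsed.success) console.log(parsed.error.issues);
```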
Lightweight Web Framework
Purpose:
- HTTP server for MCP transport layer
- API endpoints for tool execution
- Middleware for auth, logging, error handling
What to look for:
- Edge-compatible for global deployment
- Minimal dependencies and overhead
- Request/response validation integration
- OpenAPI documentation generation
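Hono is one framework in this category. A minimal sketch of a tool execution endpoint, reusing the registry and executeToolCall dispatch sketched earlier (paths and module names are illustrative):

```typescript
import { Hono } from "hono";
import { toolRegistry } from "./registry";
import { executeToolCall } from "./dispatch"; // the dispatch sketch from earlier

const app = new Hono();

// Execute a registered tool by name; returns the structured {success, ...} result
app.post("/tools/:name", async (c) => {
  const name = c.req.param("name");
  const args = await c.req.json();
  const result = await executeToolCall({ name, arguments: args }, toolRegistry);
  return c.json(result, result.success ? 200 : 400);
});

// List the catalog so clients can discover available tools
app.get("/tools", (c) =>
  c.json(
    Object.entries(toolRegistry).map(([name, tool]) => ({ name, description: tool.description }))
  )
);

export default app; // works with edge runtimes and Node adapters alike
```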
MCP Protocol Layer
Purpose:
- Standardized tool discovery protocol
- Cross-platform AI application compatibility
- Future-proof integration layer
What to look for:
- Standard protocol implementation
- Multiple transport support (stdio, HTTP)
- Client ecosystem compatibility
- Well-documented specification
Workflow Orchestration
Purpose:
- LLM workflow execution management
- Automatic retry and error recovery
- Observability and debugging
What to look for:
- Built-in retry mechanisms
- Workflow state persistence
- Execution visibility and tracing
- Integration with monitoring tools
Documentation Generation
Purpose:
- Auto-generate tool documentation from schemas
- Provide interactive API exploration
- Maintain schema-code synchronization
What to look for:
- OpenAPI/JSON Schema support
- Interactive documentation UI
- Code example generation
- Schema validation in documentation
Testing Framework
Purpose:
- Unit testing for tool handlers
- Integration testing for agent workflows
- Regression testing for schema changes
What to look for:
- Fast unit test execution
- Reliable async/await handling
- Easy mocking of external services
- End-to-end testing support
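Vitest is one option that covers these requirements. A minimal sketch of a handler unit test for the send_email tool from earlier, with the email client mocked (module paths are illustrative):

```typescript
import { describe, expect, it, vi } from "vitest";

// Mock the external email client so the handler can be tested in isolation
vi.mock("./email-client", () => ({
  deliverEmail: vi.fn().mockResolvedValue({ messageId: "msg_123" }),
}));

import { sendEmailTool } from "./tools/send-email";

describe("send_email handler", () => {
  it("returns a structured success result for valid input", async () => {
    process.env.EMAIL_API_KEY = "test-key";
    const result = await sendEmailTool.handler({
      to: "user@example.com",
      subject: "Hello",
      body: "Testing the tool handler",
    });
    expect(result).toEqual({ success: true, data: { messageId: "msg_123" } });
  });

  it("returns a structured error when the API key is missing", async () => {
    delete process.env.EMAIL_API_KEY;
    const result = await sendEmailTool.handler({
      to: "user@example.com",
      subject: "Hello",
      body: "Testing the tool handler",
    });
    expect(result.success).toBe(false);
  });
});
```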
Local development:
- Local MCP servers via process execution
- File-based configuration for simplicity
- Local environment variables for secrets
- Zero cloud dependencies for rapid iteration
Staging:
- Hosted MCP services with HTTP transport
- Shared tool catalog for team testing
- Production-like configuration
- Integration with staging dependencies
Production:
- Global edge deployment for low latency
- Enterprise cloud providers for compliance
- Comprehensive monitoring and alerting
- Proper secret management infrastructure
What Actually Matters
Three years and 50+ implementations later, here's what I know works.
Start with one tool, prove the pattern
Don't build infrastructure. Build one tool that solves a real problem. Deploy it. Measure impact. If it works, build more tools. If not, fix the first one. Infrastructure emerges from successful patterns, not speculation.
The registry is a pattern, not a platform
You don't need a database, UI, or elaborate system. A structured mapping from names to tool definitions scales to hundreds of tools across dozens of agents. I've never hit the limits of this simple pattern.
MCP makes tools reusable across platforms
Build once, expose via MCP, use everywhere. This is the standardization layer the industry needed. Invest in MCP-compatible tools and they'll work with Claude, Continue, Cline, Zed, and whatever comes next.
Production concerns matter from day one
Error handling, security, versioning, testing, monitoring - these aren't later problems. They're now problems. Add them as you build your first tool. The patterns are simple; the cost of skipping them is high.
Ship small, iterate fast
I've watched elaborate six-month platform builds fail. I've watched simple two-week tool libraries succeed. The difference is shipping to real users and learning from actual usage. Build less, ship faster, iterate based on reality.
This is how I build Level 2 infrastructure. Not theoretical frameworks or perfect architectures - production patterns that work when real users depend on them. Next article in this series: measuring ROI and making the business case for systematic AI adoption. The numbers that convince executives and the metrics that prove impact.
- Read Level 2 Strategy: understand why systematic integration creates competitive advantage
- See Real Implementations: case study on building the 30,000+ user tool ecosystem
Building these systems for your team? Email me at doug@withseismic.com or connect on LinkedIn. I help engineering teams implement Level 2 infrastructure in 4-6 week sprints.