By Doug Silkstone | January 13, 2025

I’ve built tool registries for 50+ companies in the past three years. The ones that succeed follow the same pattern: they treat AI tools like they treat code packages. Build once, version properly, share across teams, deprecate gracefully. The ones that fail try to build Storybook for AI, or some elaborate registry platform, or a perfect abstraction layer. They spend six months building infrastructure and never ship. Here’s the architecture that actually works.

This is the technical implementation guide for Level 2: Systematic Integration from the AI Automation Maturity framework. If you haven’t read that article, start there to understand why this matters.

The Mental Model

The pattern is simpler than you think. Modern AI agents work through function calling. You define structured interfaces with strict schemas. The model analyzes intent and selects appropriate tools. The model generates parameters that match your schema. Your system executes the function and returns results. That’s it. Tool definitions become registries. Registries get exposed via MCP (Model Context Protocol) so any AI application can discover and use them.
System Architecture:
  1. User Message → Agent System
  2. Agent queries Tool Registry for available capabilities
  3. LLM analyzes intent against tool catalog
  4. LLM generates structured parameters matching tool schema
  5. Validation layer verifies parameter compliance
  6. Handler function executes business logic
  7. Structured result returns to LLM
  8. LLM incorporates results into natural language response
Key Architectural Points:
  • Registry acts as service catalog
  • Schema validation enforces type safety
  • Handlers encapsulate integration logic
  • Results maintain structured format for programmatic use
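To make that flow concrete, here is a minimal TypeScript sketch of the three data shapes involved; the type names are illustrative rather than taken from any particular SDK.

// What the model sees: a catalog of schemas, not your implementation.
interface ToolDefinition {
  name: string;        // e.g. "send_email"
  description: string; // the primary signal for tool selection
  inputSchema: object; // JSON Schema describing the parameters
}

// What the model produces when it decides to act: a tool name plus JSON arguments.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// What your system hands back after validating and executing the call.
type ToolResult =
  | { success: true; data: unknown }
  | { success: false; error: string };

Everything that follows is plumbing around these three shapes: registries organize tool definitions, validation guards the arguments, handlers produce structured results.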
I’ve watched teams spend months building elaborate platforms. The simple architectural pattern always wins.

Start With One Tool

Let’s architect one tool properly before scaling to dozens. Here’s the pattern that works every time. A tool has four architectural components: metadata, schema, handler, and response contract. That’s it.
Tool Component Architecture
════════════════════════════════════════════════════════════

1. METADATA LAYER
   ├── Identifier: "send_email"
   │   Purpose: Unique, explicit name for LLM discovery
   │   Naming convention: action_noun format (verb_object)

   └── Description: "Send an email to a specified recipient.
       Use when user requests email delivery or notification."
       Purpose: Intent matching for LLM tool selection
       Critical: This drives tool selection accuracy

2. SCHEMA VALIDATION LAYER
   ├── Input Contract:
   │   ├── to: EmailAddress (required, validated format)
   │   ├── subject: String (required, 1-200 chars)
   │   ├── body: String (required, min 1 char)
   │   └── from: EmailAddress (optional, has default)

   ├── Validation Rules:
   │   ├── Type checking (string vs number vs object)
   │   ├── Format validation (email regex, length limits)
   │   ├── Required vs optional field enforcement
   │   └── Default value injection

   └── Runtime Behavior:
       ├── Invalid input → Return schema violation error
       ├── Valid input → Pass typed data to handler
       └── Prevents downstream errors from malformed data

3. HANDLER EXECUTION LAYER
   ├── Receives: Validated, typed parameters
   ├── Authenticates: With external email service provider
   ├── Transforms: Parameters → Provider API format
   ├── Executes: API call with proper error boundaries
   └── Returns: Structured response object

4. RESPONSE CONTRACT
   ├── Success Path:
   │   {
   │     success: true,
   │     messageId: "unique-provider-id",
   │     timestamp: "ISO-8601-datetime"
   │   }

   └── Error Path:
       {
         success: false,
         error: "human-readable-message",
         errorCode: "PROVIDER_ERROR | VALIDATION_ERROR | etc"
       }
       Critical: Never throw exceptions that break agent loop

ARCHITECTURAL DECISIONS:
════════════════════════════════════════════════════════════

Decision: Schema validation library (Zod, Joi, etc.)
Rationale: Runtime type safety + compile-time inference
          Prevents LLM hallucinated parameters from reaching handlers

Decision: Explicit naming ("send_email" not "email")
Rationale: LLMs perform better with verb_noun clarity
          Reduces tool selection ambiguity

Decision: Rich descriptions with usage context
Rationale: Description is primary signal for LLM tool selection
          More important than parameter names for accuracy

Decision: Structured error responses
Rationale: Maintains agent conversation loop continuity
          Enables error recovery and retry logic

Decision: Environment-based secrets
Rationale: Development/staging/production isolation
          Prevents credential leakage in code
This is the architectural pattern I’ve deployed dozens of times. The key architectural decisions:
  • Schema validation layer: Separate validation from business logic. The schema defines your contract with the LLM. Choose a library that provides both runtime validation and type inference. This prevents malformed LLM outputs from reaching your handler logic.
  • Explicit naming convention: send_email not email or sendMail. Use action_noun patterns. LLMs demonstrate higher tool selection accuracy with explicit verb-object naming.
  • Description-driven discovery: The description field drives LLM tool selection. This is your primary signal for intent matching. Include when to use the tool, not just what it does. “Send email when user requests notification” outperforms “Sends an email.”
  • Structured response contracts: Return objects, never throw exceptions. Maintain consistent {success, data/error} patterns. Exceptions break the agent loop. Structured errors enable retry logic and graceful degradation.
  • Secret management architecture: Environment-based configuration separates concerns. Development uses local environment files. Production uses proper secret managers (cloud provider services, dedicated secret management platforms). Never embed secrets in tool definitions.
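Here is a minimal sketch of that four-layer pattern in TypeScript, assuming Zod for the schema layer; emailProvider is a stand-in for whichever email client you actually use, not a real package.

import { z } from "zod";

// Stand-in for your real provider client (SES, Resend, an SMTP wrapper, ...).
declare const emailProvider: { send(msg: SendEmailInput): Promise<{ id: string }> };

// 2. Schema layer: the contract the LLM's output must satisfy.
const sendEmailInput = z.object({
  to: z.string().email(),
  subject: z.string().min(1).max(200),
  body: z.string().min(1),
  from: z.string().email().default("noreply@example.com"),
});
type SendEmailInput = z.infer<typeof sendEmailInput>;

// 4. Response contract: structured results, never thrown exceptions.
type SendEmailResult =
  | { success: true; messageId: string; timestamp: string }
  | { success: false; error: string; errorCode: "PROVIDER_ERROR" };

// 3. Handler layer: receives validated input, talks to the provider, returns a structured result.
async function sendEmailHandler(input: SendEmailInput): Promise<SendEmailResult> {
  try {
    const res = await emailProvider.send(input);
    return { success: true, messageId: res.id, timestamp: new Date().toISOString() };
  } catch (err) {
    return { success: false, error: String(err), errorCode: "PROVIDER_ERROR" };
  }
}

// 1. Metadata layer: the name and description drive LLM tool selection.
export const sendEmailTool = {
  name: "send_email",
  description:
    "Send an email to a specified recipient. Use when the user requests email delivery or notification.",
  schema: sendEmailInput,
  handler: sendEmailHandler,
  metadata: { version: "1.0.0", category: "communication" },
};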

The Registry Pattern

After building this 50+ times, here’s the simplest architectural pattern that works. A registry is a structured mapping. Tool identifiers map to their definitions and handlers.
Registry Architecture Pattern
════════════════════════════════════════════════════════════

CORE STRUCTURE:
├── Registry Object (in-memory data structure)
│   ├── Key: Tool identifier string
│   └── Value: Tool package {definition, handler, metadata}

├── Discovery Interface
│   ├── getToolDefinitions()
│   │   Returns: Array of tool definitions for LLM
│   │   Used by: Agent initialization, capability discovery
│   │
│   └── getToolByName(name: string)
│       Returns: Single tool package or null
│       Used by: Tool lookup during execution

└── Execution Interface
    └── executeTool(name: string, params: object)
        Flow:
        1. Lookup tool in registry by name
        2. Validate params against tool schema
        3. Execute handler with validated params
        4. Return structured response
        Error cases:
        - Tool not found → Return unknown tool error
        - Validation fails → Return schema error
        - Handler fails → Return execution error

EXAMPLE REGISTRY STRUCTURE:
════════════════════════════════════════════════════════════

Registry = {
  "send_email": {
    definition: {
      name: "send_email",
      description: "Send email via provider API",
      schema: InputSchema
    },
    handler: EmailHandlerFunction,
    metadata: {
      version: "1.0.0",
      category: "communication",
      requiredPermissions: ["email:send"]
    }
  },

  "create_task": {
    definition: {
      name: "create_task",
      description: "Create task in project management system",
      schema: TaskInputSchema
    },
    handler: TaskHandlerFunction,
    metadata: {
      version: "1.0.0",
      category: "productivity",
      requiredPermissions: ["tasks:write"]
    }
  },

  "search_docs": {
    definition: {
      name: "search_docs",
      description: "Search documentation with vector similarity",
      schema: SearchInputSchema
    },
    handler: SearchHandlerFunction,
    metadata: {
      version: "1.0.0",
      category: "knowledge",
      requiredPermissions: ["docs:read"]
    }
  }
}

ARCHITECTURAL PROPERTIES:
════════════════════════════════════════════════════════════

✓ In-Memory Performance
  - No database lookups required
  - Direct object property access (O(1) lookup)
  - Entire catalog loaded at application startup

✓ Type Safety
  - Static analysis catches registry mismatches
  - Schema validation enforced per tool
  - Handler signatures verified at compile time

✓ Simple Addition Pattern
  - Import new tool module
  - Add single registry entry
  - Zero configuration required

✓ Testability
  - Mock individual handlers for unit tests
  - Replace entire registry for integration tests
  - Isolated tool testing without dependencies

VERSIONING ARCHITECTURE:
════════════════════════════════════════════════════════════

Approach: Semantic versioning via tool naming

Registry = {
  // Version 1: Original implementation
  "send_email": {
    definition: { ... },
    handler: sendEmailV1Handler,
    metadata: { version: "1.0.0" }
  },

  // Version 2: Breaking changes (CC/BCC support)
  "send_email_v2": {
    definition: { ... enhanced schema ... },
    handler: sendEmailV2Handler,
    metadata: {
      version: "2.0.0",
      supersedes: "send_email",
      deprecation: {
        date: "2025-06-01",
        migrationGuide: "/docs/email-v2-migration"
      }
    }
  }
}

Migration Strategy:
1. Deploy v2 alongside v1
2. Mark v1 as deprecated in description
3. Monitor usage metrics
4. Migrate agents gradually (low-risk rollout)
5. Remove v1 after 90 days of zero usage

This pattern handles:
- Breaking schema changes
- Different integration providers
- Performance optimizations requiring new architecture
- Security requirement updates
That’s it. No database. No complex state management. No elaborate platform. A structured mapping that provides discovery and execution interfaces. Why this architecture works:
  • Performance: In-memory registry enables O(1) tool lookup. No network calls for discovery. No database queries. Entire catalog loaded once at startup.
  • Simplicity: Adding tools requires a single entry in the structured mapping. Import tool module, add registry entry, done. No configuration files, no separate deployment steps.
  • Type safety: Static type systems verify registry structure at compile time. Schema validation enforces runtime type safety. Prevents entire classes of production errors.
  • Testability: Mock individual handlers without affecting registry structure. Replace the registry entirely for integration tests. Isolated tool testing without cross-dependencies.
  • Versioning strategy: Tool names encode version when needed. Run multiple versions simultaneously. Gradual migration with zero downtime. Metrics-driven deprecation decisions.
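A minimal TypeScript version of this registry, assuming tool modules shaped like the send_email example above; the import paths and createTaskTool are illustrative.

import { z } from "zod";
import { sendEmailTool } from "./tools/send-email";   // hypothetical paths
import { createTaskTool } from "./tools/create-task";

type RegisteredTool = {
  name: string;
  description: string;
  schema: z.ZodTypeAny;
  handler: (input: any) => Promise<unknown>;
  metadata?: { version: string; category: string };
};

// The registry is just an object: tool identifier -> tool package.
const registry: Record<string, RegisteredTool> = {
  [sendEmailTool.name]: sendEmailTool,
  [createTaskTool.name]: createTaskTool,
};

// Discovery interface: what the agent hands to the LLM at startup.
export function getToolDefinitions() {
  return Object.values(registry).map(({ name, description, schema }) => ({ name, description, schema }));
}

// Execution interface: lookup -> validate -> execute -> structured response.
export async function executeTool(name: string, params: unknown) {
  const tool = registry[name];
  if (!tool) return { success: false, error: `Unknown tool: ${name}`, errorCode: "UNKNOWN_TOOL" };

  const parsed = tool.schema.safeParse(params);
  if (!parsed.success) {
    return { success: false, error: parsed.error.message, errorCode: "VALIDATION_ERROR" };
  }
  return tool.handler(parsed.data);
}

Adding a tool is one import and one entry in the object; versioned variants like send_email_v2 are simply additional entries.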

MCP Servers: The Modern Standard

MCP solves the integration nightmare I used to face on every project. Before MCP, every AI platform required custom integrations. Build tools for OpenAI’s schema format. Rebuild for Anthropic’s API. Rebuild again for Google’s specification. Different schemas, different calling conventions, different protocol layers. Model Context Protocol standardizes this integration layer. Build your tool registry once, expose it via MCP server, and any compliant AI application can discover and use your tools. Claude Desktop, Cline, Continue, Zed, and dozens more support MCP natively. Business value: Stop rebuilding integrations for each AI platform. Build tools once, use everywhere.
MCP Server Architecture
════════════════════════════════════════════════════════════

SYSTEM COMPONENTS:
├── MCP Server Process
│   ├── Capabilities Declaration
│   │   Advertises: "tools" capability to clients
│   │   Enables: Tool discovery and execution
│   │
│   ├── Request Handlers
│   │   ├── tools/list → Returns catalog of available tools
│   │   └── tools/call → Executes specified tool with params
│   │
│   └── Transport Layer
│       Options: stdio, HTTP, WebSocket
│       Default: stdio (process communication)
│       Production: HTTP for remote access

├── Client Applications (Claude Desktop, Cline, etc.)
│   ├── Discovery Phase
│   │   1. Connect to MCP server via transport
│   │   2. Query capabilities to find supported features
│   │   3. Request tools/list to get full catalog
│   │   4. Parse tool schemas for LLM context
│   │
│   └── Execution Phase
│       1. LLM selects tool based on intent
│       2. LLM generates parameters matching schema
│       3. Client sends tools/call request
│       4. Server executes handler, returns result
│       5. Client receives structured response
│       6. LLM incorporates result into conversation

└── Tool Registry Integration
    MCP server acts as protocol adapter:
    Registry (internal format) ↔ MCP Server ↔ Standard Protocol

PROTOCOL FLOW:
════════════════════════════════════════════════════════════

[AI Application] ←→ [MCP Server] ←→ [Tool Registry]

1. DISCOVERY REQUEST (tools/list)
   AI App → MCP Server: "What tools are available?"
   MCP Server → Registry: Query all tool definitions
   Registry → MCP Server: Return tool catalog
   MCP Server → AI App: Format as MCP tool list

   Response structure:
   {
     tools: [
       {
         name: "send_email",
         description: "Send email via provider API",
         inputSchema: {
           type: "object",
           properties: {
             to: { type: "string", format: "email" },
             subject: { type: "string" },
             body: { type: "string" }
           },
           required: ["to", "subject", "body"]
         }
       },
       { ... more tools ... }
     ]
   }

2. EXECUTION REQUEST (tools/call)
   AI App → MCP Server: {
     name: "send_email",
     arguments: {
       to: "user@example.com",
       subject: "Test",
       body: "Hello"
     }
   }

   MCP Server → Registry: executeTool("send_email", params)
   Registry → Validation Layer → Handler → External API
   Handler → Registry: Structured result
   Registry → MCP Server: Return response
   MCP Server → AI App: Format as MCP response

   Response structure:
   {
     content: [
       {
         type: "text",
         text: JSON.stringify({
           success: true,
           messageId: "msg_abc123"
         })
       }
     ]
   }

DEPLOYMENT ARCHITECTURES:
════════════════════════════════════════════════════════════

Development Environment:
├── MCP Server runs as local process
├── Transport: stdio (process stdin/stdout)
├── Client config points to executable path
└── Zero network configuration required

Production Environment:
├── MCP Server deployed as hosted service
├── Transport: HTTP/WebSocket for remote access
├── Client config uses network endpoints
└── Enables centralized tool management

CLIENT CONFIGURATION PATTERN:
════════════════════════════════════════════════════════════

MCP clients discover servers via configuration:

{
  "mcpServers": {
    "company-tools": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-server.js"],
      "env": {
        "API_KEY": "secret-value",
        "LOG_LEVEL": "info"
      }
    }
  }
}

Configuration elements:
- Server identifier: Unique name for this tool catalog
- Command: Executable to launch MCP server process
- Args: Command-line arguments passed to process
- Env: Environment variables for server context

Client behavior:
1. Read configuration file at startup
2. Launch MCP server processes via command
3. Establish stdio transport connection
4. Query tools/list to populate catalog
5. Tools available to LLM immediately

ARCHITECTURAL BENEFITS:
════════════════════════════════════════════════════════════

✓ Protocol Standardization
  Single implementation works across all MCP clients
  No platform-specific adaptations required
  Future clients support tools automatically

✓ Zero Manual Registration
  Configuration file drives discovery
  No API keys for basic usage
  No manual tool catalog management

✓ Isolation and Security
  Each MCP server runs in separate process
  Server-level environment variable isolation
  Granular permission control per server

✓ Composability
  Multiple MCP servers can coexist
  Each server provides focused tool catalog
  Clients aggregate tools from all servers
This architecture connects your tool registry to the MCP protocol layer. Claude Desktop and other MCP-compliant clients discover and use your tools automatically. Deployment patterns I use:
  • Development: Local MCP server process launched via client configuration. Stdio transport for simplicity. Environment variables from local files. Zero network configuration.
  • Production: Hosted MCP service with HTTP transport. Centralized tool catalog accessible to distributed clients. Proper secret management via cloud provider. Monitoring and observability integrated.
  • Configuration: MCP clients read configuration files specifying server locations. The client launches server processes (stdio) or connects to network endpoints (HTTP). Tools appear in the client immediately after server connection. No manual registration, no additional API configuration.
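For reference, here is a sketch of that protocol adapter using the TypeScript MCP SDK's low-level server API, wired to the registry functions sketched earlier. Exact import paths can vary by SDK version, and zodToJsonSchema (from the zod-to-json-schema package) converts Zod schemas into the JSON Schema the protocol expects.

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { ListToolsRequestSchema, CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
import { zodToJsonSchema } from "zod-to-json-schema";
import { getToolDefinitions, executeTool } from "./registry"; // the registry sketched earlier

// Declare the "tools" capability so clients know to ask for the catalog.
const server = new Server(
  { name: "company-tools", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// tools/list: translate registry definitions into MCP's wire format.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: getToolDefinitions().map((t) => ({
    name: t.name,
    description: t.description,
    inputSchema: zodToJsonSchema(t.schema),
  })),
}));

// tools/call: delegate to the registry and wrap the structured result as text content.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const result = await executeTool(request.params.name, request.params.arguments ?? {});
  return { content: [{ type: "text", text: JSON.stringify(result) }] };
});

// stdio transport: the client launches this process and speaks over stdin/stdout.
await server.connect(new StdioServerTransport());

With this in place, the client configuration shown above only needs its command to point at the compiled entry file.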

Real Implementation: The 35,000 User Marketplace

Let me show you what this looks like in production. I built this for Contra’s job discovery automation. They had 30,000+ users who needed systematic lead generation across LinkedIn and X. The previous approach was scattered - individual scrapers, manual workflows, zero coordination. The architecture we built:
tools/
├── linkedin/
│   ├── extract-profile.ts
│   ├── search-jobs.ts
│   └── analyze-post.ts
├── twitter/
│   ├── extract-profile.ts
│   └── analyze-timeline.ts
├── classification/
│   ├── detect-hiring-signal.ts
│   └── extract-job-details.ts
└── pipeline/
    ├── store-lead.ts
    └── notify-user.ts
12 core tools, each focused on one thing
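For illustration only, here is a hypothetical index that wires a tree like this into the registry pattern described earlier; the module paths and export names are invented to match the directory listing, not Contra's actual code.

import { extractProfileTool, searchJobsTool, analyzePostTool } from "./linkedin";
import { extractTwitterProfileTool, analyzeTimelineTool } from "./twitter";
import { detectHiringSignalTool, extractJobDetailsTool } from "./classification";
import { storeLeadTool, notifyUserTool } from "./pipeline";

// One flat registry; the directory structure is for humans, the registry is for agents.
export const registry = Object.fromEntries(
  [
    extractProfileTool, searchJobsTool, analyzePostTool,
    extractTwitterProfileTool, analyzeTimelineTool,
    detectHiringSignalTool, extractJobDetailsTool,
    storeLeadTool, notifyUserTool,
  ].map((tool) => [tool.name, tool])
);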
What we got wrong first:
  • Mistake #1: Built elaborate tool versioning before we had tool stability. Wasted two weeks on infrastructure we didn’t need. Fixed by shipping v1, adding versioning only when we needed breaking changes.
  • Mistake #2: No retry logic in handlers. LinkedIn rate limits broke everything. Fixed by adding exponential backoff to all external API calls.
  • Mistake #3: Tried to make tools too generic. The analyze_content tool tried to handle LinkedIn posts, tweets, job listings, and profile bios. Ended up with messy conditional logic and poor results. Fixed by creating specific tools: analyze_linkedin_post, analyze_tweet, etc.
What actually worked:
  • Started with 5 tools for one use case (LinkedIn job detection)
  • Deployed to 100 beta users in week 2
  • Added tools based on real usage patterns, not speculation
  • Reached 12 tools over 8 weeks - never needed more
  • 70% of new workflows reused existing tools
Metrics after 6 months:
  • 30,000+ active users (from beta group of 100)
  • 2x increase in marketplace job inventory
  • 85% reduction in manual lead generation work
  • Component reuse rate: 73%
  • Average new workflow deployment: 2 days (down from 3 weeks)
The infrastructure cost was minimal - $800/month for hosting, LLM API costs scaled with usage. ROI was clear within the first quarter.

Production Concerns

Here’s what breaks in production and the architectural patterns to handle it.
  • Error Handling
  • Security
  • Versioning
  • Testing
  • Monitoring
Retry Pattern Architecture:
Retry Logic Pattern
════════════════════════════════════════════════════════════

Flow:
1. Attempt operation
2. If success → Return result
3. If failure → Check attempt count
4. If attempts remaining:
   - Calculate backoff delay (exponential: delay doubles each attempt)
   - Wait for delay period
   - Retry operation
5. If max attempts reached → Return final error

Configuration parameters:
- maxAttempts: Number of retry attempts (typically 3-5)
- baseDelay: Initial delay in milliseconds (typically 1000ms)
- backoffStrategy: Linear vs exponential
- retryableErrors: Which error types trigger retry

Example timing (exponential backoff):
- Attempt 1: Execute immediately
- Attempt 2: Wait 1s, then execute
- Attempt 3: Wait 2s, then execute
- Attempt 4: Wait 4s, then execute

Apply to: External API calls, database operations, network requests
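A compact retry helper matching that flow, as a generic TypeScript sketch not tied to any particular provider:

// Retry an async operation with exponential backoff; the delay doubles each attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 1000 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      const delay = baseDelayMs * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage inside a handler: wrap only the external call, keep the structured response contract.
// const res = await withRetry(() => emailProvider.send(input), { maxAttempts: 4 });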
Circuit Breaker Pattern:
Circuit Breaker States
════════════════════════════════════════════════════════════

CLOSED (Normal Operation):
- Requests flow through to handler
- Track success/failure metrics
- If failure rate exceeds threshold → Open circuit

OPEN (Failing Fast):
- Immediately return error without calling handler
- Prevents cascading failures
- After timeout period → Transition to Half-Open

HALF-OPEN (Testing Recovery):
- Allow limited requests through
- If requests succeed → Close circuit
- If requests fail → Reopen circuit

Configuration:
- Failure threshold: 50% failure rate over last N calls
- Sample size: Minimum calls before evaluation (e.g., 10)
- Open duration: How long to stay open (e.g., 5 minutes)
- Half-open test size: Number of test requests (e.g., 3)

Fallback strategies when circuit open:
- Return cached data
- Return degraded functionality
- Return user-friendly error message
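A deliberately simplified TypeScript sketch of that state machine, tripping on consecutive failures rather than a failure-rate window:

// Minimal circuit breaker: closed -> open after repeated failures -> half-open after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,        // consecutive failures before opening
    private openDurationMs = 5 * 60_000  // how long to fail fast before testing recovery
  ) {}

  async call<T>(operation: () => Promise<T>, fallback: () => T): Promise<T> {
    const isOpen = this.failures >= this.failureThreshold;
    const coolingDown = Date.now() - this.openedAt < this.openDurationMs;
    if (isOpen && coolingDown) return fallback(); // open: fail fast with cached/degraded result

    try {
      const result = await operation(); // closed or half-open: attempt the real call
      this.failures = 0;                // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now(); // open or reopen
      throw err;
    }
  }
}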
Dead Letter Queue Pattern:
Failed Tool Call Recovery
════════════════════════════════════════════════════════════

Architecture:
Main Queue → Tool Execution → Success Path
            ↓ (on failure)
        Dead Letter Queue → Manual Review / Automated Retry

DLQ entry includes:
- Original tool name and parameters
- Failure timestamp and error details
- Execution context (user ID, agent state)
- Number of previous attempts
- Stack trace / debugging information

Recovery workflows:
1. Automated retry after fixing root cause
2. Manual parameter correction and resubmission
3. Pattern analysis for systematic issues
4. Alerting for critical business operations

DLQ analysis reveals:
- Edge cases missed in testing
- Schema validation gaps
- External API reliability issues
- Handler implementation bugs
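A sketch of what a dead-letter entry and wrapper might look like; the queue backend (a database table, SQS, whatever you already run) and the executeWithDlq helper are illustrative, with executeTool being the registry function from earlier.

import { executeTool } from "./registry";

// Shape of a dead-letter entry; capture enough context to replay or debug the call.
interface DeadLetterEntry {
  toolName: string;
  params: unknown;
  error: string;
  attempts: number;
  userId?: string;
  failedAt: string; // ISO-8601
}

// Hypothetical wrapper: on failure, park the call for review instead of losing it.
async function executeWithDlq(
  name: string,
  params: unknown,
  dlq: { push(entry: DeadLetterEntry): Promise<void> }
) {
  const result = await executeTool(name, params);
  if (result && typeof result === "object" && (result as any).success === false) {
    await dlq.push({
      toolName: name,
      params,
      error: String((result as any).error),
      attempts: 1, // increment if you retry from the DLQ
      failedAt: new Date().toISOString(),
    });
  }
  return result;
}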

Common Mistakes

I’ve watched teams waste 6 months on these mistakes. Learn from my failures.

Building elaborate platforms before shipping one tool
The biggest killer. Teams design perfect tool management UIs, sophisticated versioning systems, elaborate discovery mechanisms. Six months later, they have zero tools in production. Start with a TypeScript file that exports an object. Ship tools that solve real problems. Add infrastructure only when pain is obvious.

Skipping error handling initially
“We’ll add it later” becomes “everything breaks in production.” I’ve seen this destroy three separate projects. Error handling isn’t optional. Add retries, circuit breakers, and graceful failures from the first tool. The patterns are simple, the cost of skipping them is catastrophic.

No versioning strategy from day one
You will need to change tool schemas. If you don’t plan for this, you’ll break every agent that uses them. Start with semantic versioning. Use tool names like send_email_v2 for breaking changes. It feels premature until the day you need it.

Not documenting tool schemas
The description field in your tool definition is critical. Models use it to decide when to call your tool. “Send email” is useless. “Send an email to a specified recipient. Use this when the user asks to send an email or notify someone via email.” works. I’ve debugged dozens of agent issues that came down to poor descriptions.

Trying to support every use case immediately
Generic tools sound good in theory. In practice, they’re messy and unreliable. analyze_content that handles posts, tweets, jobs, profiles, and articles ends up with nested conditionals and poor results. Build specific tools: analyze_linkedin_post, analyze_tweet. Start narrow, expand only when patterns emerge.

Not measuring tool usage
You can’t improve what you don’t measure. Track which tools get called, which succeed, which fail. This data shows you where to invest. I’ve watched teams build 50 tools where 5 handled 90% of use cases. The other 45 were waste.

Overcomplicating the registry
Database-backed registries, GraphQL APIs, elaborate caching layers - I’ve seen it all. You don’t need any of it. A TypeScript object that imports your tools works for hundreds of tools across dozens of agents. Add complexity only when simple patterns break.

The 4-Week Implementation Plan

Here’s how I’d implement this if I joined your team tomorrow.

Week 1: Build Core Tools

Pick one high-impact use case
  • Identify 3-5 tools needed for this use case
  • Build tools with proper schemas and error handling
  • Write unit tests for each handler
  • Deploy to staging environment
Deliverable: 3-5 production-ready tools solving one real problem

Week 2: Registry + First Agent

Create simple registry
  • Build registry.ts with your tools
  • Deploy first agent using these tools
  • Test agent with real scenarios
  • Document tool schemas and usage
Deliverable: One working agent solving the target use case

Week 3: MCP Server

Add MCP layer
  • Implement basic MCP server
  • Connect to Claude Desktop for testing
  • Deploy MCP server to production
  • Add monitoring and error tracking
Deliverable: Tools accessible via MCP to any compliant client

Week 4: Measure + Plan

Document and scale
  • Measure impact of first use case
  • Document patterns for team
  • Identify next 2-3 use cases
  • Plan tool library expansion
Deliverable: ROI analysis and roadmap for scaling
Key principles:
  • Ship to real users by end of week 2
  • Start narrow, prove the pattern, then scale
  • Document as you build, not after
  • Measure impact before expanding scope
This timeline assumes a team of 2-3 engineers. Add a week if you’re solo. Remove a week if you’re moving fast with no organizational friction.

Tools and Stack Recommendations

After trying everything, here’s the architectural approach I recommend.

Schema Validation Library

Purpose:
  • Runtime validation with compile-time type inference
  • Schema reuse across tool definitions and API contracts
  • Single source of truth for data structures
Key capabilities needed:
  • Type inference (schema → types automatically)
  • Composable schemas (reuse and extend)
  • Clear validation error messages
  • JSON Schema export for documentation
Alternative approaches: Strictly-typed languages (Go, Rust) if runtime validation less critical. Python with Pydantic for data science ecosystems. Choose based on team expertise and performance requirements.
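As a concrete example of the type-inference and composability properties, here is a small Zod sketch (any comparable library works the same way):

import { z } from "zod";

// One schema, reused and extended; the TypeScript type is inferred rather than written twice.
const baseTask = z.object({
  title: z.string().min(1),
  dueDate: z.string().datetime().optional(),
});
const createTaskInput = baseTask.extend({
  projectId: z.string(),
});

type CreateTaskInput = z.infer<typeof createTaskInput>;
// { title: string; dueDate?: string; projectId: string }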

Lightweight Web Framework

Purpose:
  • HTTP server for MCP transport layer
  • API endpoints for tool execution
  • Middleware for auth, logging, error handling
Key capabilities needed:
  • Edge-compatible for global deployment
  • Minimal dependencies and overhead
  • Request/response validation integration
  • OpenAPI documentation generation
Alternative approaches: Full-featured frameworks for complex enterprise requirements. Serverless functions for sporadic usage. Choose based on deployment target and scale requirements.

MCP Protocol Layer

Purpose:
  • Standardized tool discovery protocol
  • Cross-platform AI application compatibility
  • Future-proof integration layer
Key capabilities needed:
  • Standard protocol implementation
  • Multiple transport support (stdio, HTTP)
  • Client ecosystem compatibility
  • Well-documented specification
Alternative approaches: Custom protocols only if you need capabilities beyond MCP specification. Evaluate carefully - standardization has significant network effects.

Workflow Orchestration

Purpose:
  • LLM workflow execution management
  • Automatic retry and error recovery
  • Observability and debugging
Key capabilities needed:
  • Built-in retry mechanisms
  • Workflow state persistence
  • Execution visibility and tracing
  • Integration with monitoring tools
Alternative approaches: Simple queue systems for straightforward workflows. Complex state machine engines for sophisticated orchestration. Match complexity to actual requirements.

Documentation Generation

Purpose:
  • Auto-generate tool documentation from schemas
  • Provide interactive API exploration
  • Maintain schema-code synchronization
Key capabilities needed:
  • OpenAPI/JSON Schema support
  • Interactive documentation UI
  • Code example generation
  • Schema validation in documentation
Alternative approaches: Manual documentation for small catalogs. Custom documentation systems for specialized needs. Prioritize automation to prevent drift.

Testing Framework

Purpose:
  • Unit testing for tool handlers
  • Integration testing for agent workflows
  • Regression testing for schema changes
Key capabilities needed:
  • Fast unit test execution
  • Reliable async/await handling
  • Easy mocking of external services
  • End-to-end testing support
Alternative approaches: Choose framework matching your existing stack. Consistency matters more than specific features. Ensure good TypeScript support if using typed languages.
Infrastructure Architecture:
Development Environment:
  • Local MCP servers via process execution
  • File-based configuration for simplicity
  • Local environment variables for secrets
  • Zero cloud dependencies for rapid iteration
Staging Environment:
  • Hosted MCP services with HTTP transport
  • Shared tool catalog for team testing
  • Production-like configuration
  • Integration with staging dependencies
Production Environment:
  • Global edge deployment for low latency
  • Enterprise cloud providers for compliance
  • Comprehensive monitoring and alerting
  • Proper secret management infrastructure
LLM Provider Selection: Choose based on task requirements, not platform defaults:
  • Tool use and agent orchestration: Models with strong function calling capabilities and reliable parameter generation.
  • Specific domain tasks: Models optimized for particular use cases (code generation, data analysis, etc.).
  • Cost optimization: Balance model capability against volume and budget. High-volume workflows may require cost-optimized models.
  • Multi-provider strategy: Don’t lock into a single provider. Use the appropriate model for each tool based on accuracy requirements and cost constraints.

What Actually Matters

Three years and 50+ implementations later, here’s what I know works:

Start with one tool, prove the pattern
Don’t build infrastructure. Build one tool that solves a real problem. Deploy it. Measure impact. If it works, build more tools. If not, fix the first one. Infrastructure emerges from successful patterns, not speculation.

Registry is pattern, not platform
You don’t need a database, UI, or elaborate system. A structured mapping from names to tool definitions scales to hundreds of tools across dozens of agents. I’ve never hit the limits of this simple pattern.

MCP makes tools reusable across platforms
Build once, expose via MCP, use everywhere. This is the standardization layer the industry needed. Invest in MCP-compatible tools and they’ll work with Claude, Continue, Cline, Zed, and whatever comes next.

Production concerns matter from day one
Error handling, security, versioning, testing, monitoring - these aren’t later problems. They’re now problems. Add them as you build your first tool. The patterns are simple, the cost of skipping them is high.

Ship small, iterate fast
I’ve watched elaborate six-month platform builds fail. I’ve watched simple two-week tool libraries succeed. The difference is shipping to real users and learning from actual usage. Build less, ship faster, iterate based on reality.
This is how I build Level 2 infrastructure. Not theoretical frameworks or perfect architectures - production patterns that work when real users depend on them. Next article in this series: measuring ROI and making the business case for systematic AI adoption. The numbers that convince executives and the metrics that prove impact.
Building these systems for your team? Email me at doug@withseismic.com or connect on LinkedIn. I help engineering teams implement Level 2 infrastructure in 4-6 week sprints.