Why We Built Our Own AI Agents
At Vixi, we didn't want to just sell AI automation services — we wanted to run on them. Over the past year, we've deployed 11+ AI agents that handle everything from inbound lead screening to weekly client reports, all running autonomously on our own infrastructure.
This post is the honest breakdown: what we built, what actually works, and what burned us early on.
The Problem With Off-the-Shelf AI Tools
When we started exploring AI automation in 2024, the market was flooded with "AI-powered" tools that were really just GPT wrappers with a nice UI. They worked well for demos. In production? Not so much.
The core issues we kept running into:
- No memory between sessions — every conversation started from scratch
- No reliable external API calls — tools would claim success when nothing had actually happened
- No accountability — if an agent failed silently, you'd never know
- Pricing that didn't scale — per-seat SaaS costs explode when you need to run agents constantly
We needed something different. We needed agents that could:
- Run on a schedule without human intervention
- Call real APIs (Monday.com, Hyros, Retell, Supabase)
- Report results back to us (Telegram, Slack)
- Fail loudly and recover gracefully
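The requirements above boil down to a small loop: run the task, report the outcome, and never swallow a failure. A minimal sketch of that loop (the names `run_agent`, `task`, and `notify` are illustrative, not from our actual codebase):

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO)

def run_agent(task, notify):
    """Run one scheduled agent task and report the outcome loudly.

    `task` is a zero-arg callable; `notify` delivers a message to a
    human channel (e.g. a Telegram or Slack send function).
    """
    try:
        result = task()
        notify(f"OK: {result}")
        return result
    except Exception:
        # Fail loudly: log the full traceback, alert a human, then
        # re-raise so the scheduler can retry or escalate.
        logging.error("Agent task failed:\n%s", traceback.format_exc())
        notify("ALERT: agent task failed, check logs")
        raise
```

A real deployment would plug this into a cron job or scheduler and swap `notify` for an actual messaging client, but the shape stays the same.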
Our Stack: OpenClaw + Claude API
We settled on a combination of OpenClaw (our local agent gateway) and the Claude API from Anthropic. Here's why:
OpenClaw gives us:
- A persistent gateway that routes requests to specialized agents
- Agent-specific memory (each agent has its own MEMORY.md)
- Tool execution (file reads/writes, API calls, bash commands)
- Logging that lets us audit every action
Claude (Sonnet 4.6) gives us:
- Best-in-class instruction following for agentic tasks
- 200K token context window for processing large reports
- Reliable tool use that doesn't fabricate results
- The ability to chain complex multi-step tasks
Together, they let us run agents that think, act, and report — without babysitting.
The 11 Agents We Run in Production
Here's what's actually running at Vixi right now:
1. Lead Screener (Retell AI + Claude)
When someone books a call, our voice AI agent (built on Retell AI) calls them within 5 minutes. It qualifies them using a scoring rubric, logs the call summary to Supabase, and updates Monday.com with the lead score. No human needed until the actual sales call.
Cost: ~$0.05/minute (down from $0.12 after optimization)
ROI: Sales team spends zero time on unqualified leads
2. Daily Standup Collector
Every morning at 8am, this agent pings each team member on Slack, collects their standup updates, and posts a formatted summary to our #general channel. It flags blockers and tags the relevant person.
3. Hyros Escalation Monitor
Monitors our Hyros attribution data hourly. If any campaign's ROAS drops below threshold, it immediately fires a Telegram alert with the specific ad, campaign, and recommended action. It's prevented multiple 5-figure losses.
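The core check behind this agent is simple. A sketch under assumptions (the campaign dict shape is illustrative, not Hyros's actual response format, and `alert` stands in for a Telegram sender):

```python
def check_roas(campaigns, threshold, alert):
    """Flag campaigns whose ROAS has dropped below the threshold.

    `campaigns` is a list of dicts like {"name": ..., "roas": ...}
    (an assumed shape, not the real Hyros payload); `alert` is any
    callable that delivers the message, e.g. a Telegram sender.
    """
    flagged = []
    for c in campaigns:
        if c["roas"] < threshold:
            flagged.append(c["name"])
            alert(
                f"ROAS alert: {c['name']} at {c['roas']:.2f} "
                f"(threshold {threshold:.2f}), review ad spend"
            )
    return flagged
```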
4. Client Report Generator
Every Friday, this agent pulls data from Hyros, Monday.com, and Google Analytics, formats it into a branded report, and emails it to the client automatically. What used to take 2 hours per client is now a 15-minute review.
5. Upwork Job Monitor
Scans Upwork every 30 minutes for relevant job postings, scores them with Claude (0-100 fit score), and sends the top 3 to Telegram with a pre-drafted proposal. Our Upwork response time went from hours to minutes.
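Once Claude has assigned each posting a 0-100 fit score, picking the top three to send is trivial (the `score` key is our assumed field name, not part of any Upwork API):

```python
def top_jobs(scored_jobs, k=3):
    """Return the k highest-scoring job postings by fit score.

    `scored_jobs` is a list of dicts carrying a numeric "score"
    field (0-100) already assigned by the model.
    """
    return sorted(scored_jobs, key=lambda j: j["score"], reverse=True)[:k]
```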
6-11. Specialized Agents
We also run agents for: content ideation, SEO gap analysis, invoice creation, new client onboarding, ad creative briefs, and Monday.com task management.
The Architecture That Makes It Work
The key insight that changed everything was treating agents like employees, not tools. Each agent has:
- A SOUL.md file — their role, personality, and rules
- A MEMORY.md file — persistent context that survives session resets
- A TOOLS.md file — exactly which tools they're allowed to use
- A schedule — when they run and what triggers them
This structure eliminates the two biggest problems with AI agents: hallucination and scope creep. When an agent knows exactly what it's supposed to do and remembers its past actions, it stops making things up.
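In practice, the three files get concatenated into the system prompt at the start of every session. A minimal sketch (the helper name and heading format are ours for illustration, not an OpenClaw API):

```python
from pathlib import Path

def build_system_prompt(workspace):
    """Concatenate SOUL.md, MEMORY.md, and TOOLS.md from an agent's
    workspace into one system prompt, skipping any file that doesn't
    exist yet (e.g. a brand-new agent with no memory)."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md", "TOOLS.md"):
        path = Path(workspace) / name
        if path.exists():
            parts.append(f"## {name}\n\n{path.read_text()}")
    return "\n\n".join(parts)
```

Keeping SOUL.md first means the agent's role and rules always lead the prompt, with memory and tool permissions layered after.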
Agent Architecture:
┌─────────────────────────────────────┐
│ OpenClaw Gateway (port 18789) │
│ Routes requests to right agent │
└──────────┬──────────────────────────┘
│
┌──────▼──────────────┐
│ Agent Workspace │
│ - SOUL.md │
│ - MEMORY.md │
│ - TOOLS.md │
└──────┬───────────────┘
│
┌──────▼──────────────┐
│ Claude Sonnet 4.6 │
│ (via API or MAX) │
└──────┬───────────────┘
│
┌──────▼──────────────┐
│ Tool Execution │
│ - API calls │
│ - File operations │
│ - Bash commands │
└─────────────────────┘
What We Learned the Hard Way
1. Agents Need to Fail Loudly
Silent failures are worse than no automation. Early on, we had agents claiming they'd completed tasks when they hadn't. Now every agent has a strict rule: if a tool call fails, log it immediately and send a Telegram alert. No fake success messages.
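One pattern that enforces this rule: wrap every tool call so a failure can physically never be reported as a success (the names here are illustrative, not our actual wrapper):

```python
def call_tool(name, fn, alert, **kwargs):
    """Execute one tool call and return an explicit status record.

    Success is only reported when `fn` actually returned; any
    exception triggers an immediate alert and an ok=False record,
    so the agent can never claim a task it didn't complete.
    """
    try:
        return {"tool": name, "ok": True, "result": fn(**kwargs)}
    except Exception as e:
        alert(f"TOOL FAILURE in {name}: {e!r}")
        return {"tool": name, "ok": False, "error": repr(e)}
```

Downstream logic then branches on the `ok` flag instead of trusting free-text claims from the model.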
2. Start With One Well-Defined Task
The temptation is to build a "general-purpose" agent that does everything. This always fails. Our best agents do one thing and do it extremely well. The lead screener only screens leads. The Hyros monitor only monitors Hyros.
3. Memory Is Everything
An agent without memory is Groundhog Day. It will make the same mistakes, forget context, and lose track of ongoing work. Every production agent we run has a persistent memory file that gets updated after each session.
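The mechanics can be as simple as appending a dated entry after every session. A sketch (a real agent would also prune or summarize old entries to keep the file small):

```python
from datetime import date
from pathlib import Path

def append_memory(memory_file, note):
    """Append a dated note to the agent's MEMORY.md so the next
    session starts with context instead of a blank slate."""
    entry = f"\n## {date.today().isoformat()}\n- {note}\n"
    with Path(memory_file).open("a", encoding="utf-8") as f:
        f.write(entry)
```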
4. The Model Matters Less Than the Prompt
We spent weeks A/B testing Claude vs GPT-4 vs Gemini. The biggest performance gains came from improving our prompts and agent architecture, not from switching models. Invest in your system prompt before you invest in a premium model.
How to Build Your First Agency AI Agent
If you're an agency owner reading this, here's the fastest path to your first production agent:
Step 1: Pick one painful, repetitive task Not "automate everything" — pick the one task that eats 2+ hours per week.
Step 2: Map the exact steps Write out every action a human takes to complete the task. Be specific.
Step 3: Identify the data sources What APIs does it need? What accounts? Get the credentials first.
Step 4: Build a simple version first Start with a script that just calls the API and logs the result. Add Claude after you've proven the data flow works.
Step 5: Add memory and error handling Only after the happy path works, add persistence and failure modes.
Step 6: Schedule it and monitor for 2 weeks Don't trust any agent until it's run reliably for at least 14 days in production.
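Step 4's "simple version" can be a script that just proves the data flow: call the source, log the raw result, and nothing else. A sketch (here `fetch` stands in for whatever API client you're using):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)

def run_once(fetch, log_path):
    """Call the data source once and append the raw result to a
    log file. No model, no memory: just prove the pipe works."""
    data = fetch()
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(data) + "\n")
    logging.info("fetched %d record(s)", len(data))
    return data
```

Only once this runs cleanly on a schedule do you bolt Claude on top (step 4), then memory and error handling (step 5).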
The Business Case
Here's the ROI we've measured in our own agency:
| Task | Before | After | Weekly Time Saved |
|------|--------|-------|-------------------|
| Lead qualification | 45 min/lead | 0 min | 4-6 hours |
| Client reports | 2 hours/client | 15 min review | 8+ hours |
| Upwork proposals | 30 min each | 5 min review | 3-4 hours |
| Campaign monitoring | Daily manual check | Automated alerts | 5 hours |
That's 20-25 hours per week we've reclaimed. At our billing rate, that's $3,000-5,000/week in capacity that we've redirected to client work.
Ready to Build Your Own?
We offer done-for-you AI agent implementations for marketing agencies. Whether you want us to build your lead screener, automate your reporting, or set up a complete agent stack — we've built it before.
Book a free discovery call to see what we can automate for your agency.