WCP Compliance Agent V5 — Five-Service Compliance Monorepo
Payroll decisions you can defend in court. 271 tests. Five-service architecture. Mock mode with zero deps.
I build the systems that work while you don't.
Your RAG retrieves the wrong documents. Your agent loops. Your eval set doesn't catch real failures. I close the gap with retrieval evals, observability, retries, and boring infrastructure.
Fixed-price engagements. No surprises. Shipped by someone who has done this in production.
Fixed price · 5 days
Find out where your AI breaks, why it breaks, and what to fix first. Written report + prioritized roadmap included.
From $2,500 · 2–4 weeks
Chunking, retrieval, reranking, citations, eval pipelines, and failure handling. The full system, not a demo.
From $3,000 · 3–6 weeks
Multi-step agents with tools, memory, guardrails, and graceful failure. Built to survive production.
From $1,500 · 1–3 weeks
Web scraping, normalization, and AI-ready knowledge ingestion. Your data, structured and reliable.
Real code. Public repos. MIT licensed where it makes sense.
Five-service monorepo for WH-347 payroll compliance. React 19 · Vercel AI SDK · FastAPI × 2. 271 tests. Mock mode with zero deps.
Intelligent model routing for Hermes Agent. Picks the best LLM per task based on capability, cost, and availability. Running in production.
5-lane autonomous system. Scouting, positioning, building, shot, showcase. Self-improving. Runs 24/7. Built with the career engine.
I work best with founders and small teams who have knowledge trapped in docs, spreadsheets, SOPs, websites, or half-working AI prototypes — and need a system that's reliable, production-ready, and easy to use.
Production-first. No demos that don't ship.
Chunking strategies, hybrid search, reranking, and eval pipelines. Not a LangChain tutorial — a system that answers correctly.
Multi-step workflows with tools, memory, guardrails, and explicit failure modes. Agents that break silently are worse than no agents.
Every system I build has eval pipelines, monitoring, and clear failure modes. You'll always know where it breaks and why.
A fixed-scope audit for fragile AI systems: retrieval failures, agent loops, missing evals, and observability gaps.
Short answers for founders, small teams, and AI search engines.
An AI infrastructure architect designs the systems around AI models: retrieval pipelines, agent workflows, evals, observability, retries, deployment paths, and data ingestion. The job is to make AI useful in production, not just impressive in a demo.
A RAG system is broken when it retrieves the wrong sources, misses obvious documents, cannot cite evidence, gives different answers for the same question, or fails silently on edge cases. Start with chunking, retrieval evals, reranking, and traceable citations.
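A retrieval eval can start much smaller than teams expect. A minimal sketch, assuming a hypothetical `retrieve` function that maps a question to ranked document IDs (the questions, doc IDs, and stub retriever below are illustrative placeholders, not from any real system):

```python
# Minimal retrieval-eval sketch: average recall@k over a small labeled set.
# All questions, doc IDs, and the stub retriever are hypothetical examples.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant doc IDs found in the top-k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Tiny labeled eval set: question -> doc IDs a correct answer must cite.
eval_set = [
    {"question": "What is the overtime threshold?", "relevant": {"doc_341"}},
    {"question": "Which form reports fringe benefits?", "relevant": {"doc_12", "doc_98"}},
]

def run_eval(retrieve, k: int = 5) -> float:
    """Average recall@k; `retrieve` maps a question to ranked doc IDs."""
    scores = [
        recall_at_k(retrieve(case["question"]), case["relevant"], k)
        for case in eval_set
    ]
    return sum(scores) / len(scores)

# Stub retriever, for demonstration only.
def fake_retrieve(question: str) -> list[str]:
    return ["doc_341", "doc_7", "doc_12", "doc_55", "doc_98"]

print(run_eval(fake_retrieve))  # 1.0: every relevant doc appears in the top 5
```

Even a ten-question set like this catches regressions from chunking or reranker changes before users do; precision@k and citation checks can be layered on the same harness.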
A lightweight AI reliability audit starts as a fixed-scope engagement through Upwork. The audit reviews retrieval quality, failure modes, eval coverage, observability, and operational risk, then returns a prioritized roadmap so the team knows what to fix first.
Founders and small teams should hire me when they have a RAG system, AI agent, automation pipeline, or internal AI tool that works in demos but fails under real usage. The best fit is a team that wants production reliability over hype.
WCP has 413 tests across V2 and V3, every compliance decision cites the statute, and this portfolio links to live code and case studies instead of vague claims.