Ships AI product features end-to-end.
Open to full-time roles and contract engagements.
Top Rated Plus on Upwork.
What I do
- Production LLM integrations (caching, tool use, structured outputs)
- LLM evaluation systems (rubrics, audit, regression detection)
- MCP server design and implementation
- Full-stack AI features, frontend to backend
- Claude Code workflow setup for engineering teams
Recent work
- gatekeep: Multi-agent GitHub App that screens PRs and issues for low-effort or AI-generated noise. A TypeScript gateway is the only component that touches GitHub; a Python/LangGraph brain makes the decisions and never holds a token. Every verdict comes with grounded, explainable evidence, not a bare score.
- eval-harness: Agent eval harness that audits SWE-bench Verified itself. Flagged issues in 5 of 25 tasks (20%) with a custom rubric system.
- another-me: Personal knowledge wiki with a custom Claude Code plugin: ingest, query, and lint skills over a local vault.
- more on request: Production integrations under NDA. Ask and I will walk you through the architecture and the numbers.
How I work
- Full-time roles or 3-6 month contract engagements
- Full-time or part-time
- Remote preferred, open to relocation
Rate
Flexible depending on scope, length, and whether it is full-time or contract. Get in touch.