Founding Engineer, Agentic Systems
Champ AI
San Francisco, CA, USA
Location: San Francisco
Stage: Early, high-ownership, design-partner driven
Comp: Competitive salary + meaningful equity
About Champ AI
Champ AI is building a multimodal work-agent orchestration platform that helps ops, support, and compliance teams automate end-to-end workflows, not just "chat with docs." Our agentic systems reliably take actions across tools, handle real-world edge cases, and continuously improve through evaluations and feedback loops.
The Role
We're looking for a Founding Engineer (Agentic Systems) to own core pieces of our agent runtime and developer/product surface area. You'll build the systems that let agents operate safely, deterministically, and measurably in production: memory + context management, tool integration, sandbox execution, data syncing, evals, and AI-native UX.
This is a hands-on role where you'll ship to production quickly, work directly with design partners, and help define what "good" looks like for enterprise-grade agents.
What You'll Build
You'll likely own several of these areas end-to-end:
Agent runtime + orchestration
Agent loop design (planning → tool-use → verification → recovery) with strong guardrails.
Context assembly pipelines: retrieval + compression + summarization + "state" that survives long workflows.
Memory management: short-term working memory, long-term memory, user/org/project memory, and safe write policies.
Multi-agent patterns: delegation, handoffs, coordinator/worker setups, and concurrency.
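To give a flavor of the loop shape we mean, here's a minimal plan → act → verify → recover sketch in Python. All names (StepResult, run_agent_loop) are illustrative, not our actual runtime:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    ok: bool
    output: str = ""
    error: str = ""

def run_agent_loop(plan_fn, tool_fn, verify_fn, max_retries=2):
    """One pass of a plan -> tool-use -> verification -> recovery loop."""
    steps = plan_fn()
    transcript = []
    for step in steps:
        attempts = 0
        while True:
            result = tool_fn(step)
            # Verification gates every step; a failed check triggers a retry.
            if result.ok and verify_fn(step, result):
                transcript.append((step, result))
                break
            attempts += 1
            if attempts > max_retries:
                # Guardrail: stop retrying and surface the failure explicitly.
                transcript.append((step, StepResult(ok=False, error="exceeded retries")))
                break
    return transcript
```

In production the interesting work is in the recovery branch: replanning, escalating to a human, or rolling back side effects rather than just retrying.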
Tooling + integrations
Tool definition frameworks: typed schemas, validation, retries, idempotency, rate limits, and observability.
Connectors + data syncing: SaaS APIs, webhooks, polling strategies, incremental sync, conflict resolution.
Browser automation / computer-use flows (auth, session handling, DOM variability, screenshots, network traces).
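As a toy example of what "typed schemas + validation + idempotency" can look like for tool definitions (the `tool` decorator, registry, and `create_ticket` tool are hypothetical):

```python
import functools

TOOL_REGISTRY = {}
_SEEN = {}  # idempotency cache keyed by (tool, args)

def tool(name, schema):
    """Register a tool with a typed parameter schema; validate args before every call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            for key, expected in schema.items():
                if key not in kwargs:
                    raise ValueError(f"missing argument: {key}")
                if not isinstance(kwargs[key], expected):
                    raise TypeError(f"{key} must be {expected.__name__}")
            return fn(**kwargs)
        TOOL_REGISTRY[name] = wrapper
        return wrapper
    return decorator

@tool("create_ticket", schema={"title": str, "priority": int})
def create_ticket(title, priority):
    key = ("create_ticket", title, priority)
    if key not in _SEEN:
        # Idempotency: a retried call with identical args returns the same record.
        _SEEN[key] = {"id": len(_SEEN) + 1, "title": title, "priority": priority}
    return _SEEN[key]
```

The real versions of these concerns (rate limits, observability, retries with backoff) layer onto the same wrapper pattern.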
Sandbox + execution
Secure execution environments for "agent writes code / runs scripts / transforms data."
Permissions, isolation, secret management, and audit trails.
Deterministic replays where possible; safe "dry run" modes; blast-radius controls.
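A sketch of blast-radius control via an action allow-list plus a dry-run mode, with every decision written to an audit trail (the `ExecutionContext` class is illustrative only):

```python
class ExecutionContext:
    """Gate agent actions behind permissions, dry-run mode, and an audit log."""

    def __init__(self, allowed_actions, dry_run=False):
        self.allowed_actions = set(allowed_actions)
        self.dry_run = dry_run
        self.audit_log = []

    def execute(self, action, fn):
        if action not in self.allowed_actions:
            # Blast-radius control: unlisted actions are denied, not attempted.
            self.audit_log.append(("denied", action))
            raise PermissionError(f"action not permitted: {action}")
        if self.dry_run:
            # Dry run: record intent without side effects.
            self.audit_log.append(("dry_run", action))
            return None
        self.audit_log.append(("executed", action))
        return fn()
```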
Evals + reliability
Evaluation harnesses for tool-use correctness, workflow completion, policy compliance, and regression detection.
Golden tasks + synthetic tasks + real production traces; offline + online metrics.
Experimentation frameworks (prompt/model/tool changes), versioning, and rollbacks.
Human-in-the-loop review flows: sampling, labeling, adjudication, continuous improvement loops.
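The minimal shape of an eval harness: score an agent against golden tasks and flag regressions versus a baseline (function name and thresholds here are illustrative):

```python
def run_eval(agent_fn, golden_tasks, baseline_pass_rate=None, tolerance=0.05):
    """Score agent outputs against golden tasks; flag regressions vs a baseline."""
    passed = sum(
        1 for task in golden_tasks
        if agent_fn(task["input"]) == task["expected"]
    )
    pass_rate = passed / len(golden_tasks)
    # Regression detection: fail if we drop more than `tolerance` below baseline.
    regression = (
        baseline_pass_rate is not None
        and pass_rate < baseline_pass_rate - tolerance
    )
    return {"pass_rate": pass_rate, "regression": regression}
```

Exact-match scoring is a stand-in; real harnesses use rubric graders, trace checks, and policy-compliance judges over the same skeleton.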
AI-native product + UX
Interfaces that make agents understandable and controllable: traces, state, "why it did that," and editable plans.
UX patterns for approvals, step-through execution, partial automation, and exception handling.
Customer-facing debugability: audit logs, run history, data provenance.
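A sketch of the data structure behind run history and "why it did that" views: an append-only trace where every event carries its provenance (class name and fields are hypothetical):

```python
import json

class RunTrace:
    """Append-only trace of an agent run, backing audit logs and provenance views."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.events = []

    def record(self, step, kind, detail, source=None):
        self.events.append({
            "step": step,
            "kind": kind,
            "detail": detail,
            "source": source,  # data provenance: where this value came from
        })

    def to_json(self):
        return json.dumps({"run_id": self.run_id, "events": self.events})
```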
What We're Looking For
You have real agent-building scars. We're specifically looking for engineers who have either:
Shipped AI agents into production (internal or external), or
Made meaningful open-source contributions to agent frameworks, eval tooling, RAG/memory tooling, browser automation, or similar.
You likely have experience with:
LLM tool-use, structured outputs, function calling, and multi-step workflows.
Context engineering: retrieval strategies, chunking, reranking, summarization, memory write/read policies.
Systems thinking: state machines, retries, idempotency, failure modes, and "what happens at 3am."
Integrations: OAuth, scopes, token refresh, pagination, incremental sync, webhooks, rate limiting.
Sandboxed execution or secure-by-default infra patterns (containers, ephemeral environments, secrets).
Observability: traces, metrics, logs; building "explainable runs" for humans.
Evaluation approaches for non-deterministic systems; confidence scoring; regression testing.
Bonus points
You've built AI-native UI surfaces (not just APIs): agent run views, trace explorers, approval UIs, etc.
You've worked with enterprise requirements: SOC2 posture, auditability, access controls, tenant isolation.
You can move between research-y prototyping and production-grade engineering without getting stuck in either.
How We Work
High ownership, fast iteration, direct customer feedback loops.
Strong bias toward shipping + measuring + improving.
You'll have meaningful influence on architecture, product direction, and hiring.
Interview Process (example)
30-min intro + deep dive on prior agent work (we'll ask about failure modes, evals, and production learnings)
Technical session: design an agent system for a real workflow (with tools, memory, guardrails, and evals)
Practical take-home or pair session (small scope, production-minded)
Founder chat + Q&A