Founding Engineer, Agentic Systems

ChampAI

San Francisco, CA, USA

Posted on Apr 17, 2026

Location: San Francisco
Stage: Early, high-ownership, design-partner-driven
Comp: Competitive salary + meaningful equity

About Champ AI

Champ AI is building a multimodal work-agent orchestration platform that helps ops/support/compliance teams automate end-to-end workflows—not just "chat with docs." We're building agentic systems that can reliably take actions across tools, handle real-world edge cases, and continuously improve with evaluations and feedback loops.

The Role

We're looking for a Founding Engineer (Agentic Systems) to own core pieces of our agent runtime and developer/product surface area. You'll build the systems that let agents operate safely, deterministically, and measurably in production: memory + context management, tool integration, sandbox execution, data syncing, evals, and AI-native UX.

This is a hands-on role where you'll ship to production quickly, work directly with design partners, and help define what "good" looks like for enterprise-grade agents.

What You'll Build

You'll likely own several of these areas end-to-end:

Agent runtime + orchestration

  • Agent loop design (planning → tool-use → verification → recovery) with strong guardrails.

  • Context assembly pipelines: retrieval + compression + summarization + "state" that survives long workflows.

  • Memory management: short-term working memory, long-term memory, user/org/project memory, and safe write policies.

  • Multi-agent patterns: delegation, handoffs, coordinator/worker setups, and concurrency.
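To make the runtime bullets concrete, here is a minimal sketch of a plan → tool-use → verification → recovery loop with a recorded trace. This is our illustration, not Champ AI's actual runtime; names like `Tool`, `run_agent`, and `max_retries` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[..., dict]
    verify: Callable[[dict], bool]  # guardrail: did the action actually succeed?

@dataclass
class AgentRun:
    trace: list = field(default_factory=list)  # every step recorded, for replay/debugging

def run_agent(plan: list[tuple[Tool, dict]], max_retries: int = 2) -> AgentRun:
    """Execute a planned sequence of tool calls with verification + retry-based recovery."""
    run = AgentRun()
    for tool, args in plan:
        for attempt in range(max_retries + 1):
            result = tool.run(args)
            run.trace.append({"tool": tool.name, "attempt": attempt, "result": result})
            if tool.verify(result):
                break  # verified: proceed to the next planned step
        else:
            raise RuntimeError(f"{tool.name} failed verification after retries")
    return run
```

A production loop would typically re-plan on failure rather than raise, but the shape (plan, act, verify, recover) is the same.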

Tooling + integrations

  • Tool definition frameworks: typed schemas, validation, retries, idempotency, rate limits, and observability.

  • Connectors + data syncing: SaaS APIs, webhooks, polling strategies, incremental sync, conflict resolution.

  • Browser automation / computer-use flows (auth, session handling, DOM variability, screenshots, network traces).
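As one possible reading of "typed schemas, validation, retries, idempotency," here is a toy tool registry that validates arguments against a declared schema and deduplicates identical calls with an idempotency key. All names (`ToolRegistry`, `register`, `call`) are hypothetical.

```python
import hashlib
import json

class ToolRegistry:
    """Minimal typed-tool registry: schema validation + idempotent execution."""

    def __init__(self):
        self._tools = {}    # tool name -> (schema, fn)
        self._results = {}  # idempotency key -> cached result

    def register(self, name: str, schema: dict, fn):
        self._tools[name] = (schema, fn)

    def call(self, name: str, args: dict):
        schema, fn = self._tools[name]
        # Validate: every declared field must be present with the declared type.
        for field_name, typ in schema.items():
            if field_name not in args or not isinstance(args[field_name], typ):
                raise ValueError(f"{name}: bad argument {field_name!r}")
        # Idempotency: an identical call returns the cached result instead of re-running,
        # so retries after a timeout cannot double-apply a side effect.
        key = hashlib.sha256(
            json.dumps([name, args], sort_keys=True).encode()
        ).hexdigest()
        if key not in self._results:
            self._results[key] = fn(**args)
        return self._results[key]
```

A real framework would add per-tool retry policies, rate limits, and tracing around `call`; this only shows the validation + idempotency core.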

Sandbox + execution

  • Secure execution environments for "agent writes code / runs scripts / transforms data."

  • Permissions, isolation, secret management, and audit trails.

  • Deterministic replays where possible; safe "dry run" modes; blast-radius controls.
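The "dry run" and "blast-radius" bullets can be sketched as an executor that logs every attempted write and only applies it when policy allows. `ExecutionPolicy`, `max_writes`, and `SandboxedExecutor` are illustrative names, not Champ AI's API.

```python
from dataclasses import dataclass

@dataclass
class ExecutionPolicy:
    dry_run: bool = True   # preview side effects without applying them
    max_writes: int = 5    # blast-radius control: cap mutations per run

class SandboxedExecutor:
    def __init__(self, policy: ExecutionPolicy):
        self.policy = policy
        self.writes = 0
        self.audit_log = []  # every attempted action, applied or blocked

    def write(self, target: str, payload: dict, apply_fn):
        """Attempt a mutation; record it in the audit trail either way."""
        entry = {"target": target, "payload": payload, "applied": False}
        if self.writes >= self.policy.max_writes:
            entry["blocked"] = "blast-radius limit reached"
        elif self.policy.dry_run:
            entry["blocked"] = "dry run"
        else:
            apply_fn(target, payload)
            entry["applied"] = True
            self.writes += 1
        self.audit_log.append(entry)
        return entry
```

The key property: the audit trail is identical in shape whether or not the write landed, which is what makes dry runs a faithful preview.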

Evals + reliability

  • Evaluation harnesses for tool-use correctness, workflow completion, policy compliance, and regression detection.

  • Golden tasks + synthetic tasks + real production traces; offline + online metrics.

  • Experimentation frameworks (prompt/model/tool changes), versioning, and rollbacks.

  • Human-in-the-loop review flows: sampling, labeling, adjudication, continuous improvement loops.
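A minimal version of "golden tasks + regression detection" is a harness that scores an agent function against checkable tasks and flags drops against a baseline pass rate. The function names and the `tolerance` threshold are assumptions for illustration.

```python
def evaluate(agent_fn, golden_tasks):
    """Score an agent against golden tasks; return per-task results and pass rate."""
    results = []
    for task in golden_tasks:
        output = agent_fn(task["input"])
        results.append({"task": task["name"], "passed": task["check"](output)})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return results, pass_rate

def detect_regression(baseline_rate, candidate_rate, tolerance=0.02):
    """Flag a candidate (new prompt/model/tool version) that drops below baseline."""
    return candidate_rate < baseline_rate - tolerance
```

Production harnesses add sampling from real traces, online metrics, and statistical significance checks, but the gate-on-a-baseline pattern is the core of safe prompt/model rollouts.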

AI-native product + UX

  • Interfaces that make agents understandable and controllable: traces, state, "why it did that," and editable plans.

  • UX patterns for approvals, step-through execution, partial automation, and exception handling.

  • Customer-facing debuggability: audit logs, run history, data provenance.

What We're Looking For

You have real agent-building scars. We're specifically looking for engineers who have either:

  • Shipped AI agents into production (internal or external), or

  • Built meaningful open-source contributions in agent frameworks, eval tooling, RAG/memory tooling, browser automation, or similar.

You likely have experience with:

  • LLM tool-use, structured outputs, function calling, and multi-step workflows.

  • Context engineering: retrieval strategies, chunking, reranking, summarization, memory write/read policies.

  • Systems thinking: state machines, retries, idempotency, failure modes, and "what happens at 3am."

  • Integrations: OAuth, scopes, token refresh, pagination, incremental sync, webhooks, rate limiting.

  • Sandboxed execution or secure-by-default infra patterns (containers, ephemeral environments, secrets).

  • Observability: traces, metrics, logs; building "explainable runs" for humans.

  • Evaluation approaches for non-deterministic systems; confidence scoring; regression testing.

Bonus points

  • You've built AI-native UI surfaces (not just APIs): agent run views, trace explorers, approval UIs, etc.

  • You've worked with enterprise requirements: SOC2 posture, auditability, access controls, tenant isolation.

  • You can move between research-y prototyping and production-grade engineering without getting stuck in either.

How We Work

  • High ownership, fast iteration, direct customer feedback loops.

  • Strong bias toward shipping + measuring + improving.

  • You'll have meaningful influence on architecture, product direction, and hiring.

Interview Process (example)

  1. 30-min intro + deep dive on prior agent work (we'll ask about failure modes, evals, and production learnings)

  2. Technical session: design an agent system for a real workflow (with tools, memory, guardrails, and evals)

  3. Practical take-home or pair session (small scope, production-minded)

  4. Founder chat + Q&A