Senior Platform Engineer
Nomic AI
Location: New York HQ
Employment Type: Full time
Location Type: Hybrid
Department: Technical Staff
About Nomic
Nomic builds AI agents and developer tools for document intelligence. We help physical AI enterprise teams, primarily in architecture, engineering, and construction, extract structured knowledge from decades of drawings, specs, and project files. Our platform combines embedding models, document parsing, and autonomous agents that reason over real-world data and take action in live environments.
The Role
Nomic trains and deploys its own models, pioneered on-device LLM inference with GPT4All in 2023, and is aggressively building agentic systems, both for customers and into our own processes. Our infrastructure already reflects that: multi-account AWS, Kubernetes, infrastructure as code, CI/CD, observability, and per-customer deployments. We're hiring a Senior Platform Engineer to own that stack and push it forward, particularly on the hard problem of running autonomous agents safely in deployed customer environments: sandboxed execution, secure isolation, automated health checks, and the reliability guarantees enterprises expect.
This is a senior IC role with broad ownership and real architectural influence.
Team: Platform & Infrastructure
Reports to: CTO
What You'll Work On
Agent infrastructure is the highest priority. Our agents execute in customer cloud environments. You'll own the sandboxing, isolation, monitoring, and safe operation of those workloads at scale. This includes execution environments, security boundaries, automated QA, eval harnesses, and feedback loops that make agents more reliable over time.
Core infrastructure: Kubernetes, multi-account AWS, CI/CD, deployment strategies, observability (traces, metrics, logs, alerting, SLOs), disaster recovery, and cost management.
Security posture: access controls, secrets management, network security, image scanning, dependency auditing, and compliance work (SOC2, enterprise security) as customer requirements demand it.
Infrastructure as code: defining, provisioning, and evolving all infrastructure through code.
What We're Looking For
You have:
5+ years in infrastructure, DevOps, or SRE roles running cloud infrastructure in production
Strong Kubernetes experience: deploying workloads, debugging issues, and working with operators and controllers
Solid infrastructure-as-code skills: designing modules, managing state, and thinking about blast radius
Strong software engineering fundamentals: you write and review production code in Python and/or TypeScript, not just infra configs
Linux systems and networking fundamentals
CI/CD pipeline design and maintenance
A proactive orientation and comfort owning a wide surface area
Even better if you have:
Terraform experience
Experience with observability platforms (Datadog, OpenTelemetry): dashboards and trace/metric/log pipelines
PostgreSQL operations: performance tuning and replica management
Experience with ML/AI infrastructure: inference services, GPU workloads, model serving, and eval pipelines
Background in multi-tenant deployment patterns or per-customer isolation
Experience building sandboxed execution environments or automated reliability systems