nomad
2022-03-25 · 2 min read
Site: https://www.nomadproject.io/
Docs: https://www.nomadproject.io/docs
- "Workload" scheduler and orchestrator
- Supports container, VM, raw forked binary, or custom "driver" workloads.
- Devs declaratively specify workloads in a job file, which describes the services, versions, ports, dependencies, etc.
- Nomad compares the desired set of jobs with what is actually running and works to make reality match the desired state.
- Nomad schedules workloads across available hardware. It restarts jobs that have crashed. It migrates jobs from failed machines.
- It allows you to specify version upgrade rollout strategy.
- Developers consume infrastructure via APIs. Nomad provides these "northbound" APIs.
- Ops teams manage infrastructure via APIs. Nomad provides these "southbound" APIs.
- Custom Driver Plugins (how to run and manage a task)
- Custom Device Plugins (custom resources that can be exposed). Could be useful for SGX maybe?
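
The "make reality match the desired jobs" idea above can be sketched as a small reconciliation step. This is a toy model only, not Nomad's scheduler or API: the `Job` class, the `reconcile` function, and the action strings are all hypothetical names made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for one entry in a job file."""
    name: str
    version: int
    count: int  # desired number of running instances


def reconcile(desired: dict[str, Job],
              running: dict[str, tuple[int, int]]) -> list[str]:
    """Return the actions that move `running` toward `desired`.

    `running` maps a job name to (version, healthy instance count).
    """
    actions = []
    for name, job in sorted(desired.items()):
        current = running.get(name)
        if current is None:
            # Job is desired but nothing is running yet.
            actions.append(f"start {job.count} x {name} v{job.version}")
            continue
        version, healthy = current
        if version != job.version:
            # Version changed: this is where a rollout strategy would apply.
            actions.append(f"roll {name} v{version} -> v{job.version}")
        elif healthy < job.count:
            # Instances crashed or their machine failed: replace them.
            actions.append(f"restart {job.count - healthy} x {name}")
        elif healthy > job.count:
            actions.append(f"stop {healthy - job.count} x {name}")
    # Anything running that is no longer desired gets stopped.
    for name in sorted(running.keys() - desired.keys()):
        actions.append(f"stop all {name}")
    return actions
```

Calling `reconcile` repeatedly, each time with a fresh view of what is running, is the loop: the job file only ever states the end state, and the scheduler works out the steps.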
Example
CircleCI
- CircleCI uses Nomad as a Job Queue
- New commits from customers trigger an event.
- This commit event is then submitted to Nomad as a Job (to test their code or whatever).
- Nomad can act like a Job queue and buffer Jobs until there is available capacity in the fleet. For example, CircleCI might get 2k+ jobs/min but only have 1k jobs/min hardware capacity during peak hours. Nomad will correctly buffer jobs until there is capacity.
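
To make the buffering behaviour concrete, here is a minimal sketch of a queue that absorbs a burst of 2k jobs/min against 1k jobs/min of capacity. It is a back-of-the-envelope model, not how Nomad is implemented; `simulate_backlog` and the per-minute arrival numbers outside the peak are assumptions chosen for illustration.

```python
def simulate_backlog(arrivals_per_min, capacity_per_min):
    """Return the number of jobs still waiting at the end of each minute."""
    backlog = 0
    depths = []
    for arrivals in arrivals_per_min:
        backlog += arrivals                        # new jobs join the queue
        backlog -= min(backlog, capacity_per_min)  # the fleet runs what it can
        depths.append(backlog)
    return depths


# Three peak minutes at 2k jobs/min, surrounded by quieter minutes (assumed).
arrivals = [500, 2000, 2000, 2000, 500, 500, 500]
depths = simulate_backlog(arrivals, capacity_per_min=1000)
print(depths)  # [0, 1000, 2000, 3000, 2500, 2000, 1500]
```

The backlog grows by 1k per minute during the peak and drains by 500 per minute afterwards, so no job is dropped; jobs just wait longer the later they arrive in the burst.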
Citadel
- Citadel cares about how quickly, in absolute time, they can compute some job, rather than the total cost / number of cores / etc...
- Their issue is: how many containers can we run in a short period of time?
- Need to support bursty workloads (analysts want to run big models as quickly as possible).
- Need to work across existing DCs and "burst" to cloud on-demand.
- Want to run 40M containers in a short period.
- 2017 talk mentions 3k+ containers scheduled / sec.
- Related blog post: https://www.hashicorp.com/c2m
  - HashiCorp Nomad scheduled 2,000,000 Docker containers on 6,100 hosts in 10 AWS regions in 22 minutes.
  - That works out to about 1.5k containers/sec.
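
The throughput figures above can be checked with a few lines of arithmetic. The 40M extrapolation assumes a constant scheduling rate, which the source does not claim; the function names are made up for this sketch.

```python
def containers_per_second(containers, minutes):
    """Average scheduling rate over the whole run."""
    return containers / (minutes * 60)


def minutes_to_schedule(containers, rate_per_sec):
    """Time needed to schedule `containers` at a constant rate (an assumption)."""
    return containers / rate_per_sec / 60


# Blog post figure: 2,000,000 containers in 22 minutes.
c2m_rate = containers_per_second(2_000_000, 22)
print(round(c2m_rate))  # 1515, i.e. about 1.5k containers/sec

# Hypothetical: 40M containers at the 3k/sec rate mentioned in the 2017 talk.
print(round(minutes_to_schedule(40_000_000, 3000)))  # 222 minutes, about 3.7 hours
```

At the blog post's rate the same 40M containers would take 20 times the 22-minute run, or 440 minutes.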