Site Reliability Engineer
Greater NYC Area
11 hours ago
You will manage the infrastructure for cloud services, including running internal production systems and hosting CockroachDB for our external customers.
You will design, write, and deliver software and systems to increase product reliability and organizational efficiency.
You will develop custom tools as necessary.
You will keep a complex system running and solve problems relating to mission-critical services.
You will design, implement, operate, and troubleshoot the automation and monitoring of production clusters to maximize performance and availability.
You will drive the company through disaster recovery tests, where we manually turn down pieces of CockroachDB to test its overall resilience to failures.
You will participate in a weekly on-call rotation for our production systems and hosted services.