Head of Site Reliability
Greater NYC Area
Own the delivery and execution of SRE objectives by judiciously managing time, resources, and the team members assigned to the platform. Success is measured by both adherence to timelines and the quality of the work product.
Promote and implement best practices in observability (monitoring, tracing, alerting, logging) and high-availability software engineering. Success is measured both by defining Service Level Objectives (SLOs) and error budgets for Paxos’ products and services and by adhering to them.
Proactively manage costs by continuously monitoring utilization and optimizing computing resources based on capacity planning. Success is measured by optimizing infrastructure spend amid ever-increasing demand for resources, without compromising performance or availability.
Guide the team in designing and building the tools, frameworks, systems, and processes that Paxos’ engineers use to build, integrate, deploy, scale, and manage their software. Success is measured by the time to market of product features, without compromising reliability.
Build and develop a world-class SRE team. Success is defined by consistently giving timely feedback to team members, rewarding and growing the best engineers, investing in the development of team members, and acting swiftly on poor performance.