Site Reliability Engineer
AlphaSense provides an AI-based search engine for market intelligence, used by the largest and fastest-growing firms globally. Our mission is to curate and semantically index the world’s market and company information, including the vast high-value content sets that traditional web search engines cannot reach. With 1000+ enterprise clients, AlphaSense helps knowledge professionals become dramatically more productive, and gain an information edge by discovering critical data points and trends that others miss.
We are seeking a passionate Site Reliability Engineer to help create the next big thing in data analysis and search solutions.
You will join our cloud infrastructure team, which supports the development engineers responsible for the AlphaSense platform. We will pair you up with world-class talent in cloud and software engineering and provide a position and environment for continuous learning.
The ideal candidate has strong cloud system configuration, monitoring, support, and scripting skills, is passionate about systems engineering, scalability, and stability, and never wants to stop learning. Experience with AWS is essential.
Your responsibilities will include:
- Establish tools and instrumentation to measure and monitor availability, latency and overall system health
- Provide ongoing help with the cloud-native transition
- Provide sustainable incident response and blameless postmortems
- Learn the system far and wide and know all its weak points
- Troubleshoot production and development issues
- Help with deployments and tooling support
- Help drive the team towards continuous deployment
- Improve system stability by communicating closely with developers about the system's weak points
- Solve engineers' issues in a cloud-native way
- Keep the system green and stable
- With the help of our strong development team, find ways to prevent incidents rather than just fixing them
- Create and maintain operational runbooks
Requirements:
- BS/MS degree in Computer Science or a related discipline preferred
- Experience with AWS
- At least basic experience with Kubernetes (Helm, operators)
- Strong skills in scripting languages (shell scripts, Perl, Python)
- Experience with Prometheus, Grafana and other open source monitoring/logging solutions
- Interest in designing and troubleshooting large-scale distributed systems
- Strong communication skills and a problem-solving mindset
- Ability to automate routine tasks
- Good working knowledge of relational and NoSQL databases
Nice to have:
- Understanding of continuous deployment and how to get there
- Infrastructure as code experience (Ansible, Terraform, CloudFormation)
- Experience with logging setup, configuration, and maintenance (EKS, Fluentd, Logstash)