Looking for a Service Reliability Engineer who can immediately start setting up monitoring, connecting with the right people, and automating the client’s deployment pipelines.
- Design, build, and maintain highly available and scalable services and applications, with a focus on observability.
- Develop and implement monitoring, tracing, and logging solutions to provide deep insights into service behavior and performance.
- Collaborate with cross-functional teams to define key performance indicators (KPIs) and service level objectives (SLOs) for the services.
- Establish and maintain service dashboards and visualizations to provide real-time visibility into service health and performance.
- Develop and maintain alerting systems to proactively detect and respond to service anomalies or degradation.
- Analyze and troubleshoot complex issues related to service performance, reliability, and availability.
- Conduct post-incident analysis using observability tools and implement improvements to prevent similar incidents in the future.
- Drive continuous improvement of the observability platform, including evaluating and adopting new tools and technologies.
- Participate in capacity planning exercises and provide recommendations to ensure optimal service performance.
- Collaborate with development teams to design and implement monitoring, tracing, and logging instrumentation within the services.