SRE

Europe

Remote

Role Description

Job Summary

As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, performance, and scalability of customer-facing and critical systems. You will work to maximize uptime, improve observability, automate repetitive tasks, and enhance deployment practices. This role involves close collaboration with software engineering and product teams to deliver robust and secure solutions in a high-availability environment.

Key Responsibilities

System Reliability & Performance

Ensure reliability, availability, and performance of critical systems and infrastructure.
Troubleshoot and resolve issues across distributed environments.

Automation & Infrastructure

Automate repetitive tasks, including monitoring, infrastructure management, and deployments.
Manage infrastructure as code (IaC) using tools such as Terraform, Ansible, or CloudFormation.

Incident Management

Respond to incidents and collaborate with development teams to resolve issues.
Lead post-incident reviews and contribute to long-term solutions.

Monitoring, Logging & Alerting

Design, implement, and maintain monitoring and alerting systems.
Use tools such as Prometheus, Grafana, Datadog, New Relic, or the ELK Stack to ensure observability.
Apply best practices for metrics, tracing, and logging.

Security & Compliance

Partner with development and information security management system (ISMS) teams to ensure secure deployments.
Implement security best practices across infrastructure and CI/CD pipelines.

Required Skills & Qualifications

Programming/Scripting: Proficiency in one or more languages (Go, Python, Java, TypeScript).
Cloud Platforms: Experience with AWS, GCP, or Azure.
Containerization & Orchestration: Hands-on experience with Docker and Kubernetes.
Networking Fundamentals: Strong understanding of DNS, load balancers, firewalls, and VPNs.
Monitoring & Logging: Familiarity with tools such as Prometheus, Grafana, ELK, Datadog, or New Relic.
CI/CD: Experience with GitHub Actions, Flux CD, Argo CD, or similar tools.
Databases: Experience managing SQL databases (e.g., PostgreSQL).
Problem-Solving: Ability to diagnose and resolve complex distributed systems issues.
Security: Knowledge of security best practices and of the compliance requirements of relevant standards.

Preferred Qualifications

Experience with large-scale distributed systems.
Knowledge of incident management practices and post-mortems.
Previous experience in DevOps or Software Engineering roles.

Soft Skills

Strong communication and collaboration skills across technical and non-technical teams.
Fluency in English (German is a plus).
Proactive, self-driven approach to identifying issues and suggesting improvements.

Apply for this position

Our team will review your application within the next 5 days.
