SRE
Europe
Remote
Role Description

Job Summary

As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, performance, and scalability of customer-facing and critical systems. You will work to maximize uptime, improve observability, automate repetitive tasks, and enhance deployment practices. This role involves close collaboration with software engineering and product teams to deliver robust and secure solutions in a high-availability environment.

Key Responsibilities

System Reliability & Performance

  • Ensure reliability, availability, and performance of critical systems and infrastructure.
  • Troubleshoot and resolve issues across distributed environments.

Automation & Infrastructure

  • Automate repetitive tasks, including monitoring, infrastructure management, and deployments.
  • Manage infrastructure as code (IaC) using tools such as Terraform, Ansible, or CloudFormation.

Incident Management

  • Respond to incidents and collaborate with development teams to resolve issues.
  • Lead post-incident reviews and contribute to long-term solutions.

Monitoring, Logging & Alerting

  • Design, implement, and maintain monitoring and alerting systems.
  • Use tools like Prometheus, Grafana, Datadog, New Relic, or ELK Stack to ensure observability.
  • Apply best practices for metrics, tracing, and logging.

Security & Compliance

  • Partner with development and ISMS teams to ensure secure deployments.
  • Implement security best practices across infrastructure and CI/CD pipelines.

Required Skills & Qualifications

  • Programming/Scripting: Proficiency in one or more languages (Go, Python, Java, TypeScript).
  • Cloud Platforms: Experience with AWS, GCP, or Azure.
  • Containerization & Orchestration: Hands-on experience with Docker and Kubernetes.
  • Networking Fundamentals: Strong understanding of DNS, load balancers, firewalls, and VPNs.
  • Monitoring & Logging: Familiarity with Prometheus, Grafana, ELK, Datadog, or New Relic.
  • CI/CD: Experience with GitHub Actions, Flux CD, Argo CD, or similar.
  • Databases: Experience managing SQL databases (e.g., PostgreSQL).
  • Problem-Solving: Ability to diagnose and resolve complex distributed systems issues.
  • Security: Knowledge of security best practices and compliance with relevant standards.

Preferred Qualifications

  • Experience with large-scale distributed systems.
  • Knowledge of incident management practices and post-mortems.
  • Previous experience in DevOps or Software Engineering roles.

Soft Skills

  • Strong communication and collaboration skills across technical and non-technical teams.
  • Fluency in English (German is a plus).
  • Proactive, self-driven approach to identifying issues and suggesting improvements.

Apply for this position

Our team will review your application within the next 5 days.
