SRE Platform Engineer with MLOps

Europe

Remote

Who We Are

Role Description

The project focuses on designing, building, and operating a scalable internal platform for Machine Learning and AI services.
The goal is to enable reliable training, deployment, and inference of ML/AI models on our own cloud-native infrastructure.

We Expect You to Have:

Strong hands-on experience with Machine Learning and AI workloads
Extensive practical experience running ML/AI workloads on Kubernetes (production-grade)
Solid MLOps expertise, including:
    Model lifecycle management
    CI/CD for ML models
    Monitoring, logging, and reproducibility
Experience with ML and AI inference systems, focusing on scalability and low latency
Excellent knowledge of Kubernetes-based infrastructure for ML (GPU scheduling, scaling, reliability)
Experience with NVIDIA Triton Inference Server (strong plus / highly preferred)
Ability to design, build, and operate self-managed ML/AI infrastructure

Apply for this position

Our team will review your application within the next 5 days.
