Hadoop infrastructure engineer
About the role:
We are currently seeking a Hadoop infrastructure engineer to work on the Data Lake Hadoop platform. Key technologies:
● Hadoop MapR distribution
● Automation framework: Git / Ansible / Jenkins
● Linux admin skills
What you will do:
As part of a DevOps product team, you will take end-to-end care of the Hadoop platform within the Data Lake ecosystem; this includes operating, scaling, supporting and engineering it.
● Develop: further engineer and automate the platform across technologies and
infrastructures, with a strong focus on networking, servers and monitoring.
● Scale & harden: help to scale the platform to meet rapidly growing demand and load.
● Operate: oversee daily operations, maintenance, monitoring and capacity for a
24x7 business-critical platform.
● Support: help users, troubleshoot and consult on use cases, resolve incidents and coordinate changes.
You should have:
● A master's degree in computer science or a related field.
● Hands-on experience running 24x7 critical, high-load, large-scale production platforms.
● Deep expertise in the MapR (HPE) Hadoop distribution.
● Expert knowledge of infrastructure automation using Ansible, Jenkins and Git.
● In-depth knowledge of Linux, preferably Red Hat Enterprise Linux.
● Good experience with network and infrastructure administration.
● Hands-on experience with the Prometheus and Grafana monitoring stack.
● Some working experience in Docker / Kubernetes.
● Fair knowledge of Elasticsearch / Kibana.
● Some knowledge of Microsoft Azure / GCP (nice to have).
● Ability to use English in daily communication.
● Readiness to learn fast in an agile, high-pace environment.
What we offer:
● Work in a highly skilled, highly motivated international team of unique professionals.
● Learn fast, have a high impact and enjoy a start-up feeling from day one.
● We follow Scrum principles, track work in Jira, talk via Slack and live a high-trust DevOps culture.
Our technology stack:
● Red Hat Enterprise Linux as OS layer.
● Jenkins, Ansible, Git as automation engine.
● MapR (HPE) Hadoop cluster as storage, with Hive, Spark, Drill and Hue ecosystem services.
● Data Science Workbench based on JupyterHub and Kubeflow.
● Airflow to orchestrate data ingestion.
● Rancher Kubernetes Platform as underlying container infrastructure.
● Elasticsearch (OpenDistro) for central logging and specific use cases.
● Microsoft Azure and Google Cloud Platform (GCP) for hybrid scenarios.