Senior AI QA Engineer

Europe

Remote

Who We Are

Role Description

You will work within an AI Incubator program, scouting, incubating, and validating client and internal AI concepts on a 3–5-year horizon. The focus is on building advanced AI prototypes, agentic workflows, and new AI-powered products and services while continuously exploring frontier AI capabilities.

Role Summary

The AI QA Engineer is the quality, safety, and reliability backbone of delivery. You ensure that agentic workflows, RAG systems, models, data pipelines, APIs, and UX layers behave reliably, safely, and consistently under real-world conditions.

This is not traditional QA. You design and own continuous evaluation strategies for non-deterministic AI systems and act as the final line of defense before solutions reach production environments.

Key Responsibilities

Own the end-to-end QA strategy across UI, backend, data, retrieval, and AI layers
Design and maintain LLM, RAG, and agent evaluation frameworks
Build automated test harnesses for Python services, APIs, agents, and pipelines
Integrate testing into CI/CD pipelines and prevent regressions
Validate data quality, embeddings, retrieval accuracy, and ranking performance
Identify hallucinations, reasoning failures, bias, and model drift
Design red-team and edge-case scenarios for safety and robustness
Define observability metrics for behavior, latency, cost, and failures
Run defect triage and deliver clear, actionable defect reporting
Collaborate closely with Tech Leads, engineers, and AI Ops

Required Skills & Experience

Technical

Strong Python scripting for automation and evaluation
Experience with LLM / RAG testing, ML evaluation, or model benchmarking
Familiarity with vector databases, retrieval systems, and agent workflows
Experience with CI/CD, DevOps tooling, and observability platforms
Ability to validate embeddings and to measure precision/recall and ranking metrics

QA & Risk

5–6+ years in QA, SDET, testing, or ML evaluation roles
Experience testing non-deterministic or probabilistic systems (preferred)
Strong instincts for edge cases, failure modes, and adversarial risks

Mindset

Curious, skeptical, and systematic
High ownership and strong communication skills
Comfortable defining what “quality” means for AI systems

Apply for this position

Our team will review your application within the next 5 days.
