You will work within an AI Incubator program, scouting, incubating, and validating client and internal AI concepts on a 3–5-year horizon. The focus is on building advanced AI prototypes, agentic workflows, and new AI-powered products and services while continuously exploring frontier AI capabilities.
Role Summary
The AI QA Engineer is the quality, safety, and reliability backbone of delivery. You ensure that agentic workflows, RAG systems, models, data pipelines, APIs, and UX layers behave reliably, safely, and consistently under real-world conditions.
This is not traditional QA. You design and own continuous evaluation strategies for non-deterministic AI systems and act as the final line of defense before solutions reach production environments.
Key Responsibilities
- Own the end-to-end QA strategy across UI, backend, data, retrieval, and AI layers
- Design and maintain LLM, RAG, and agent evaluation frameworks
- Build automated test harnesses for Python services, APIs, agents, and pipelines
- Integrate testing into CI/CD pipelines and prevent regressions
- Validate data quality, embeddings, retrieval accuracy, and ranking performance
- Identify hallucinations, reasoning failures, bias, and model drift
- Design red-team and edge-case scenarios for safety and robustness
- Define observability metrics for behavior, latency, cost, and failures
- Run defect triage and deliver clear, actionable defect reporting
- Collaborate closely with Tech Leads, engineers, and AI Ops
Required Skills & Experience
Technical
- Strong Python scripting for automation and evaluation
- Experience with LLM / RAG testing, ML evaluation, or model benchmarking
- Familiarity with vector databases, retrieval systems, and agent workflows
- Experience with CI/CD, DevOps tooling, and observability platforms
- Ability to validate embeddings and to evaluate retrieval using precision/recall and ranking metrics
QA & Risk
- 5–6+ years in QA, SDET, testing, or ML evaluation roles
- Experience testing non-deterministic or probabilistic systems (preferred)
- Strong instincts for edge cases, failure modes, and adversarial risks
Mindset
- Curious, skeptical, and systematic
- High ownership and strong communication skills
- Comfortable defining what “quality” means for AI systems

