Project info:
Participate in AI incubator projects to scout, incubate, and validate client and PwC-internal ideas on a 3–5 year horizon; develop technology roadmaps and prototypes that deliver advanced client solutions; champion internally generated concepts; and continually explore, test, and demonstrate cutting-edge AI to create new products, services, and capabilities.
Role Overview
The Data Scientist is the hands-on builder across IE pods — responsible for turning research ideas into production-ready data products and AI systems. You will design and implement end-to-end workflows spanning graph-centric AI (graph theory, graph data science, knowledge graphs, and graph neural networks such as GCN/GAT), modern NLP/LLM solutions (semantic engineering, search/retrieval, RAG, fine-tuning, and text-to-SQL), and core ML/deep learning (core requirement). You operate at the intersection of modeling, retrieval, data engineering, and modern application development, translating architecture and functional intent into robust, reliable, real-world intelligence.
Across multiple pods and functional domains, you will build systems that integrate models, structured and unstructured data, graphs, APIs, and enterprise services. This includes designing graph + embedding pipelines, building or integrating knowledge graph layers, and implementing retrieval and reasoning components that allow agentic workflows to retrieve, ground, evaluate, and act. Whether the pod focuses on Finance, Operations, Supply Chain, Engineering, or Investments, you ensure the underlying intelligence behaves predictably, efficiently, and safely.
This role demands strong data science and ML fundamentals (core requirement), solid engineering discipline, and a practical grasp of emerging AI patterns (knowledge graphs, GNNs, RAG, agents, evaluators). You have strong hands-on development experience, understand architecture and workflow design end-to-end, and can ship clean, high-quality increments every sprint — even as the frontier moves. You embrace ambiguity, break down complex problems, and collaborate closely with architects, tech leads, UX designers, QA, and functional leads.
You are the person who builds the thing — the data products, the systems, the behaviors, the intelligence — that defines the next generation of the IE.
Responsibilities
1. Graph AI, Knowledge Graphs & Graph Data Science
· Design and implement graph-based solutions, including graph features, graph analytics, and graph ML workflows aligned to business use cases.
· Build and operate knowledge graphs (schema/ontology, entity resolution, ingestion/update pipelines) in graph databases (e.g., Neo4j) and apply graph representation learning (embeddings; GNNs such as GCN/GAT) to support prediction, ranking, and reasoning.
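To make the graph representation learning mentioned above concrete, a single GCN-style propagation step (symmetric normalization with self-loops, as in Kipf & Welling) can be sketched in plain NumPy. The graph, feature dimensions, and weights below are illustrative toy values, not part of any specific pod's design:

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # D^-1/2
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # symmetric normalization
    return np.maximum(a_norm @ features @ weights, 0.0)  # ReLU

# Toy 4-node path graph (e.g., four linked entities in a small knowledge graph)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))   # 8-dim input node features
weights = rng.normal(size=(8, 2))    # project down to 2-dim node embeddings
embeddings = gcn_layer(adj, features, weights)
print(embeddings.shape)  # (4, 2)
```

In practice this would be a learned layer in a framework such as PyTorch Geometric, with the resulting embeddings feeding the prediction, ranking, and reasoning tasks described above.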
2. NLP, LLMs, Semantic Engineering & RAG
· Build LLM-powered capabilities end-to-end: prompt/tool design, semantic engineering, grounding strategies, and production-grade RAG pipelines (chunking, embeddings, reranking, citations/attribution where applicable).
· Deliver NLP features such as text-to-SQL/semantic parsing, classification/extraction, and domain adapters; incorporate user/context signals to enable search, retrieval, and personalization across structured and unstructured sources.
· Apply fine-tuning and adaptation approaches (lightweight tuning, instruction tuning, preference optimization where appropriate) and establish evaluation harnesses for quality, safety, and robustness.
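The retrieval core of the RAG pipelines described above reduces to: chunk, embed, score by similarity, rank. A minimal dependency-free sketch, using a toy bag-of-words embedding in place of a trained model (all document text and function names below are illustrative):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline uses a trained encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Rank chunks by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Invoices are matched to purchase orders before payment.",
    "The supply chain dashboard tracks shipment delays.",
    "Quarterly revenue is reported by the finance team.",
]
top = retrieve("how are invoices matched to purchase orders", chunks, k=1)
print(top[0])
```

A production system would add the reranking, citation/attribution, and grounding layers listed above on top of this retrieval step.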
3. Core Data Science, Machine Learning & Deep Learning (Core Requirement)
· Formulate problems, define success metrics, and design experiments; select and train ML/DL models appropriate for the task, data shape, and operational constraints.
· Perform rigorous evaluation and error analysis (offline metrics, slice analysis, ablations), and iterate on features, architectures, and data to improve performance and robustness.
· Ensure reproducibility and responsible delivery: version data/models, document assumptions, manage bias/quality risks, and contribute to model monitoring and drift detection.
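The slice analysis mentioned above amounts to grouping evaluation records by a metadata field and reporting a metric per group, so regressions hidden by an aggregate number become visible. A minimal sketch with hypothetical record fields:

```python
def slice_accuracy(records, slice_key):
    """Group eval records by a metadata field and report per-slice accuracy."""
    slices = {}
    for r in records:
        slices.setdefault(r[slice_key], []).append(r["correct"])
    return {k: sum(v) / len(v) for k, v in slices.items()}

# Illustrative eval records tagged with a functional domain
records = [
    {"domain": "finance", "correct": 1},
    {"domain": "finance", "correct": 0},
    {"domain": "ops", "correct": 1},
    {"domain": "ops", "correct": 1},
]
print(slice_accuracy(records, "domain"))  # {'finance': 0.5, 'ops': 1.0}
```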
4. Software Engineering, Architecture & Workflow Design
· Design end-to-end solution architectures and workflows that connect data sources, graph layers, retrieval, models, and application surfaces into coherent, maintainable systems.
· Implement production services and APIs (batch + real-time) that expose model, retrieval, and graph capabilities with clear contracts, security considerations, and performance targets.
· Translate R&D prototypes into production-ready components; collaborate with architects, pod leads, and UX/FE to ensure scalable designs and high-quality delivery.
5. Agile Delivery, Quality, Observability & E2E Ownership
· Own delivery in agile pods: break down problems, estimate, ship increments each sprint, and continuously improve based on feedback and measured outcomes.
· Build quality into the workflow: automated tests, evaluation suites for ML/LLM behavior, CI checks, code reviews, and clear operational runbooks.
· Instrument systems for reliability: monitoring for latency/cost, failure modes, and drift; logging for auditability and troubleshooting; and support safe rollout/rollback patterns.
6. Compute, Performance & Scaling (Preferred)
· Optimize training and inference pipelines for performance and cost, including efficient batching, caching, quantization/acceleration approaches where appropriate, and practical profiling.
· Leverage GPUs and accelerated compute for deep learning/LLM workloads; understand memory constraints, throughput bottlenecks, and practical optimization trade-offs.
· Apply distributed compute patterns when needed (e.g., data processing, training, retrieval indexing) and design systems that scale reliably across cloud resources.
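As a trivial illustration of the batching and caching patterns above: micro-batching amortizes per-request overhead across model calls, and a cache short-circuits repeated identical inputs. The `cached_infer` stand-in below is illustrative, not a real model call:

```python
from functools import lru_cache

def batched(items, batch_size):
    """Yield fixed-size micro-batches for more efficient model calls."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

@lru_cache(maxsize=1024)
def cached_infer(prompt):
    """Repeated identical inputs skip recomputation (stand-in for inference)."""
    return prompt.upper()

batches = list(batched(list(range(10)), batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```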
Required Skills & Experience
Target Profile: Graph + NLP/LLM + ML Engineering
We are not hiring only “graph engineers.” We are looking for an ensemble profile that combines strong ML/data science foundations (core requirement) with depth in one or both of: graph-centric AI (graph theory, graph data science, knowledge graphs, GNNs such as GCN/GAT) and modern NLP/LLM engineering (semantic engineering, search/retrieval, RAG, personalization, text-to-SQL, and fine-tuning). The ideal candidate can translate research ideas into scalable architectures and well-designed workflows, and ship reliably in agile pods.
- Graph, Graph Data Science & Knowledge Graphs: graph theory fundamentals; graph analytics/graph data science; knowledge graph modeling (schemas/ontologies) and semantic layers; graph databases (e.g., Neo4j); graph embeddings; GNNs and graph ML tooling (e.g., GCN, GAT).
- NLP, LLMs & Semantic Engineering: modern NLP and language systems; semantic engineering; search/retrieval (lexical + vector + hybrid) and reranking; RAG; personalization; text-to-SQL; prompt engineering; fine-tuning/adaptation patterns.
- Core ML / Deep Learning / Data Science (Core Requirement): strong grounding in machine learning and deep learning; applied data science; ability to design experiments, evaluate models, and reason about quality, robustness, and bias.
- Hands-on Software Engineering & Delivery: solid, hands-on development experience (production Python and/or related stack); agile ways of working; R&D prototyping; architecture and workflow understanding and design across services, APIs, and data pipelines.
- Compute & Scale (Preferred): exposure to GPUs and accelerated training/inference; distributed compute; and performance/cost optimization in cloud environments.
Technical Skills
· Strong Python engineering experience (3–6+ years).
· Experience with retrieval/search systems (lexical + vector + hybrid), embeddings, and ranking/reranking.
· Exposure to graph data science and/or knowledge graphs (modeling, ingestion, graph features/embeddings; experience with graph databases such as Neo4j); experience with GNNs (e.g., GCN/GAT) is a plus.
· Hands-on experience building RAG, agentic workflows, or similar AI patterns.
· Familiarity with model evaluation, testing, and observability tools.
· Solid knowledge of APIs, microservices, and data-centric integrations.
· Comfort with cloud platforms (Azure, AWS, GCP) and CI/CD workflows.
· Preferred: exposure to GPUs, accelerated training/inference, and/or distributed compute (e.g., Ray, Spark, Dask, or similar patterns).
Foundational Engineering Skills
· Excellent debugging, profiling, and performance optimization skills.
· Strong understanding of source control (Git), branching, PR reviews.
· Ability to work in structured agile environments with clear sprint increments.
Mindset
· High ownership, curiosity, and willingness to experiment.
· Loves hard problems and iterating on prototypes.
· Comfortable working in fast-changing frontier environments.
· Values clarity, structure, and quality in code.
Success Criteria (12-Week Pod)
· AI features delivered on-time, with clean code and strong reliability.
· Retrieval and agentic workflows behave consistently under load.
· Evaluations and instrumentation provide clear insight into agent performance.
· Integrations are stable, documented, and testable.
· Pod velocity improves as engineers ramp and systems stabilize.
Start date: ASAP
HackerRank Challenge: Yes
Remote vs Onsite: Fully remote, with possible occasional in-person team sessions / workshops / gatherings (roughly once per quarter), likely to take place in Prague
US Hours overlap needed: Minimum 2–6 pm CET; preferred 2–7 pm CET

