Senior Lead Data Engineer at GSK

Role Overview:

The Senior Lead Data Engineer will operate within a matrixed product team design, holding responsibility for the technical solution design, implementation, and ongoing enhancement of products and systems developed by Medical Digital and Tech. Working closely with a Product Manager, this role follows Agile Ways of Working and DevSecOps principles and meets the requirements for Compliance and Digital Certainty.

The Senior Lead Data Engineer will serve as a “T-Shaped” engineer, demonstrating both deep expertise and broad proficiency across essential engineering competencies, such as Software Development, Automated Testing, DevOps, CI/CD, Data Science/Analytics, and Lifecycle Management.

We are seeking an exceptionally skilled and strategic Senior Lead Data Engineer with DevOps skills to lead the design, development, and optimization of our data and medical products and systems. The ideal candidate will possess significant expertise in Azure Databricks, modern Lakehouse architectures, infrastructure management, data pipeline automation, and advanced security practices, including the application of Generative AI to create advanced data and analytics solutions.

The role requires effective collaboration with experts from other tech teams and subject matter domains, as well as core engineering knowledge and experience with industry technologies, practices, and frameworks such as React, Azure Cloud Ops, AI/ML, CI/CD, DevOps, Automated Testing, and API Architectures.

Key Responsibilities

Azure Data Architecture: Define, architect, and implement scalable, secure, and cost-effective data solutions on Azure, utilizing Azure Data Lake Storage (ADLS) Gen2, Azure Data Factory (ADF), and Azure Synapse.
Databricks Lakehouse Implementation: Architect and optimize the Databricks Lakehouse platform, leveraging Delta Lake for transactional support and implementing robust data ingestion and transformation architectures.
GenAI Data Strategy: Lead data engineering initiatives for Generative AI projects, including the design and construction of data pipelines for Retrieval-Augmented Generation (RAG), feature engineering for large language model (LLM) fine-tuning, and managing vector databases and embedding workflows in both Databricks and Azure.
Advanced Data Processing: Develop, manage, and optimize large-scale batch and streaming data pipelines using Databricks notebooks with PySpark and SQL. Implement Databricks Workflows for job orchestration, ensuring robust monitoring, error handling, and alerting.
Data Governance and Security: Champion data governance best practices using Databricks Unity Catalog to manage permissions, enforce data quality, track lineage, and ensure compliance with security and privacy standards for all data assets.
Collaboration and Mentorship: Work closely with AI/ML engineers, data scientists, and business teams to understand data requirements for models and translate these into technical solutions. Provide technical leadership, mentorship, and guidance to the data engineering team.
Azure Cloud Architecture: Oversee the design, provisioning, and management of Azure cloud resources, including Azure Active Directory (AAD), networking, and security protocols. Manage Azure Databricks workspaces and clusters, monitor performance, troubleshoot issues, and optimize resource utilization. Utilize advanced Azure services such as Azure Functions, Logic Apps, and Synapse Analytics to construct robust, serverless solutions.
Databricks Pipeline Automation: Implement and manage end-to-end CI/CD pipelines for data and analytics projects on Azure Databricks using Azure DevOps and Databricks Asset Bundles (DABs) with Git integration. Automate the deployment of Databricks notebooks, libraries, and jobs across multiple environments (development, staging, production), and define/manage Databricks jobs using CI/CD practices to ensure version control and reliable, repeatable executions.
Infrastructure as Code (IaC) and Automation: Develop, implement, and maintain Infrastructure as Code for the entire cloud stack using advanced Azure Resource Manager (ARM) templates. Create complex automation scripts and playbooks with Python to automate infrastructure tasks and streamline workflows.
DevSecOps and Governance: Lead the integration of security best practices throughout the CI/CD pipeline and Azure environment. Establish and enforce governance policies for Databricks and Azure, manage access controls, compliance, and data privacy, and implement observability solutions for monitoring, logging, and alerting on Azure and Databricks using tools such as Azure Monitor, Log Analytics, and Grafana.
Collaboration and Problem-Solving: Serve as a technical liaison between data engineering, data science, and security teams to align best practices for data processing and MLOps. Provide expert-level troubleshooting and root cause analysis for performance and availability issues.
Cloud Infrastructure Management: Manage, optimize, and secure cloud environments on major platforms like Azure, with a focus on scalability and cost efficiency.
Process Improvement: Continuously evaluate and optimize existing processes to enhance the speed, quality, and reliability of software delivery.

Required Skills & Qualifications:

Technical Skills

BE/B.Tech graduate with 6 to 8 years of progressive experience in data engineering, with significant expertise in building solutions on Azure using Databricks.
Azure Ecosystem: Expert-level knowledge of Azure Data Platform components, including ADLS Gen2, Azure Data Factory, Azure Synapse Analytics, and Azure Key Vault.
Databricks Mastery: Demonstrated expertise with Databricks, including Delta Lake, Unity Catalog, Databricks SQL, MLflow, and advanced Spark optimization techniques such as Photon Engine and Adaptive Query Execution (AQE).
GenAI Integration: Hands-on experience creating Generative AI-driven data solutions, such as Retrieval-Augmented Generation (RAG) pipelines, fine-tuning LLMs, and implementing vector search in production environments.
Programming Expertise: Mastery of Python (including PySpark and Pandas) and SQL.
Data Warehousing and Modeling: Strong understanding of dimensional modeling, data warehousing concepts, and implementing the Medallion architecture within a Lakehouse framework.
CI/CD Tools: In-depth, hands-on experience with CI/CD platforms such as GitLab CI and GitHub Actions, Infrastructure-as-Code (Terraform), and containerization (Docker, Kubernetes) for data and ML workloads.
Containerization: Mastery of container technologies like Docker and orchestration platforms like Kubernetes.
Monitoring and Observability: Expertise with observability tools such as Grafana.
Version Control: Strong proficiency with Git, including advanced workflow management.
Operating Systems: Deep knowledge of Linux/Unix administration.
GenAI Model Deployment: Lead the deployment of large language models (LLMs) and Generative AI applications on Azure, addressing challenges related to latency, cost, and security.
RAG System Implementation: Architect and implement Retrieval-Augmented Generation (RAG) systems on Azure, integrating vector databases (like Azure AI Search) and managing the associated data and infrastructure.
AI-Powered Automation: Utilize Generative AI tools to automate code generation, improve testing procedures, and develop intelligent automation for operational tasks.

Preferred Qualifications

Databricks certifications such as Databricks Certified Data Engineer Professional or Generative AI Engineer. Experience with Generative AI-related technologies and frameworks like Azure AI Search and LangChain.

Inclusion at GSK:

As an employer committed to Inclusion, we encourage you to reach out if you need any adjustments during the recruitment process.

Please contact our Recruitment Team at IN.recruitment-adjustments@gsk.com to discuss your needs.

Why GSK?

Uniting science, technology and talent to get ahead of disease together.

GSK is a global biopharma company with a purpose to unite science, technology and talent to get ahead of disease together. We aim to positively impact the health of 2.5 billion people by the end of the decade, as a successful, growing company where people can thrive. We get ahead of disease by preventing and treating it with innovation in specialty medicines and vaccines. We focus on four therapeutic areas: respiratory, immunology and inflammation; oncology; HIV; and infectious diseases – to impact health at scale.

People and patients around the world count on the medicines and vaccines we make, so we’re committed to creating an environment where our people can thrive and focus on what matters most. Our culture of being ambitious for patients, accountable for impact and doing the right thing is the foundation for how, together, we deliver for patients, shareholders and our people.

Important notice to Employment businesses/Agencies

GSK does not accept referrals from employment businesses and/or employment agencies in respect of the vacancies posted on this site. All employment businesses/agencies are required to contact GSK's commercial and general procurement/human resources department to obtain prior written authorization before referring any candidates to GSK. The obtaining of prior written authorization is a condition precedent to any agreement (verbal or written) between the employment business/agency and GSK. In the absence of such written authorization being obtained, any actions undertaken by the employment business/agency shall be deemed to have been performed without the consent or contractual agreement of GSK. GSK shall therefore not be liable for any fees arising from such actions or any fees arising from any referrals by employment businesses/agencies in respect of the vacancies posted on this site.

It has come to our attention that the names of GlaxoSmithKline or GSK or our group companies are being used in connection with bogus job advertisements or through unsolicited emails asking candidates to make payments for recruitment opportunities and interviews. Please be advised that such advertisements and emails are not connected with the GlaxoSmithKline group in any way.

GlaxoSmithKline does not charge any fee whatsoever for the recruitment process. Please do not make payments to any individuals or entities in connection with recruitment with any GlaxoSmithKline (or GSK) group company at any worldwide location, even if they claim that the money is refundable.

If you come across unsolicited emails from addresses not ending in gsk.com, or job advertisements which state that you should contact an email address that does not end in “gsk.com”, you should disregard them and inform us by emailing askus@gsk.com, so that we can confirm to you whether the job is genuine.