Data Engineer (Databricks/AI integration)
Ework Group - founded in 2000, listed on Nasdaq Stockholm, with around 13,000 independent professionals on assignment - we are the total talent solutions provider who partners with clients, in both the private and public sector, and professionals to create sustainable talent supply chains.
With a focus on IT/OT, R&D, Engineering and Business Development, we deliver sustainable value through a holistic and independent approach to total talent management. By providing comprehensive talent solutions, combined with vast industry experience and excellence in execution, we form successful collaborations. We bridge clients and partners & professionals throughout the talent supply chain, for the benefit of individuals, organizations and society.
For our client, a global pharmaceutical company, we are running a recruitment process for a Data Engineer role.
Role Overview:
You will combine clinical data expertise with strong data engineering and technical skills to generate well documented pipelines from source to curated data sets in common data models like CDISC SDTM. You will collaborate closely with clinical SMEs, data scientists, infrastructure, and other skilled data engineers. We are looking to expand this functionality to include Real World Data (from a broad range of registries). You will help extend our medallion Databricks pipelines (CDISC SDTM) to incorporate Real-World Data (RWD) from registries and other sources, working with clinical experts and AI teams to combine rule-based and automated mapping approaches (including OMOP interoperability).
Responsibilities:
- Design, build and maintain production ETL pipelines in Databricks/Delta Lake to ingest RWD (registries, claims, EHR extracts) and transform into standard models.
- Implement harmonisation workflows to map incoming RWD to OMOP and to the internal CDISC SDTM canonical model; handle vocabulary mapping, units normalization and provenance.
- Extend the medallion architecture (bronze/silver/gold) patterns with robust validation, lineage, partitioning and performance tuning.
- Develop configurable, input-driven transformation frameworks so clinical experts can drive mapping rules via config files and catalogs.
- Integrate AI/automation components (e.g., model-assisted mapping, NLP for free text) with human-in-the-loop review and confidence scoring.
- Establish testing, CI/CD, monitoring and alerting for ETL jobs and automations; ensure reproducibility, versioning and governance.
- Collaborate with clinical data scientists, data stewards and stakeholders to define requirements, data contracts and success metrics.
Requirements:
- Higher education in IT or a related field is preferred
- At least 5 years of experience as a Data Engineer
- Hands-on experience with Databricks, Python/SQL and Spark
- Knowledge of a cloud environment (AWS or Azure)
- Proven experience designing and implementing ETL pipelines in Databricks / Spark and Delta Lake.
- Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
- Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.
- Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
- Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
- Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
- Fluency in English, both written and spoken
- Nice to have: prior experience in the pharmaceutical sector or clinical research environments; knowledge of data governance, privacy regulations and secure handling of patient data.
We offer:
- B2B agreement
- Transparent working conditions with both Ework and the client
- Ongoing support during our cooperation
- Possibility to work in an international environment
- Collaborative environment in Swedish organizational culture
- Private medical care
- Life insurance
- Multisport
- Teambuilding events
Contact person: mateusz.jozefiak@eworkgroup.com
Client code: HNN01
Do you know someone who would fit this position? Recommend a candidate by sending her/his CV to: polecenia@eworkgroup.com.
The Whistleblowing Policy, which provides guidelines for reporting misconduct, can be found on the Ework website: https://www.eworkgroup.com/about-us/our-responsibility
- Locations: Remote
- Technologies: Amazon Web Services (AWS), Azure, Databricks, ETL, Python, SQL, Spark
- Language: English, Polish