Senior Data Engineer

We are looking for a Senior Data Engineer for a freelance assignment with a global technology company operating in the Earth observation, geospatial analytics, and risk intelligence domains. The role focuses on building and scaling a high-performance geospatial data platform that enables large-scale analytics across millions of locations.

If you are a senior data engineer with deep expertise in geospatial systems and a passion for building scalable, high-performance data platforms, we would like to hear from you.

Description

In this role, you will be responsible for evolving an existing geospatial data engine into a continent-scale analytics platform. You will work at the intersection of data engineering, geospatial processing, and performance optimization, building reliable, scalable data structures that power advanced analytics and machine learning use cases.

A key objective is to create an environment where new geospatial hypotheses can be tested quickly, reliably, and reproducibly, and where successful experiments can be scaled seamlessly into production-grade data pipelines.

You will work closely with data scientists and engineers, ensuring frictionless access to high-quality, analysis-ready datasets while maintaining strong standards for data integrity, lineage, and performance.

Start date: ASAP or as agreed
Work model: Hybrid, preferably on-site 3 days per week (Espoo)

Requirements

Master’s degree in Computer Science, Geoinformatics, or a related quantitative field (or equivalent practical experience)
5+ years of professional experience in data engineering, with a strong focus on geospatial data
Proven experience designing and building analysis-ready datasets and data architectures (e.g. feature tables, star schemas, partitioned data formats)
Strong expertise in PostgreSQL / PostGIS in production environments, including:
- Spatial indexing
- Complex spatial joins
- Performance tuning for large-scale workloads
Experience working with large-scale geospatial datasets (raster and vector) and an understanding of the performance trade-offs between database-centric and distributed processing
Strong Python skills and ability to work with complex codebases in production settings
Hands-on experience with geospatial data formats and tooling, such as:
- Cloud Optimized GeoTIFFs (COGs)
- GeoParquet
- STAC
Solid understanding of data modeling, transformations (including reprojection and tiling strategies), and data lineage
Experience with AWS (S3, RDS/Aurora, EC2) and scaling data pipelines in cloud environments
Experience building reliable, automated data pipelines and workflows (ingestion, orchestration, monitoring)
Strong focus on data quality, consistency, and observability
Ability to deliver production-grade, testable, and maintainable code
Comfortable leveraging AI-assisted development tools (e.g. Cursor, Claude, Copilot)

Nice to have

Experience with climate, flood, or natural hazard datasets (e.g. FEMA, NOAA, USGS)
Familiarity with modern geospatial tooling (e.g. GDAL/OGR, rasterio, rioxarray)
Experience with distributed data platforms (e.g. Databricks, Delta Lake, PySpark, Unity Catalog)
Experience with Parquet / Arrow for analytical data workflows
Familiarity with Docker-based development environments and Makefile workflows

Tech Stack

PostgreSQL / PostGIS
Python
GDAL / rasterio
AWS (S3, RDS/Aurora, EC2)
Docker

Locations: Remote
Technologies: Amazon Web Services (AWS), Data Pipelines, Databricks, Docker, Machine Learning, Python, Unity Catalog