Senior Data Engineer

We are looking for a Senior Data Engineer for a freelance assignment with a global technology company operating in the Earth observation, geospatial analytics, and risk intelligence domain. The role focuses on building and scaling a high-performance geospatial data platform that enables large-scale analytics across millions of locations.

If you are a senior data engineer with deep expertise in geospatial systems and a passion for building scalable, high-performance data platforms, we would like to hear from you.

Description

In this role, you will be responsible for evolving an existing geospatial data engine into a continent-scale analytics platform. You will work at the intersection of data engineering, geospatial processing, and performance optimization, building reliable, scalable data structures that power advanced analytics and machine learning use cases.

A key objective is to create an environment where testing new geospatial hypotheses is fast, reliable, and reproducible, and where successful experiments can be seamlessly scaled into production-grade data pipelines.

You will work closely with data scientists and engineers, ensuring frictionless access to high-quality, analysis-ready datasets while maintaining strong standards for data integrity, lineage, and performance.

Start date: ASAP or as agreed
Work model: Hybrid, preferably on-site 3 days per week (Espoo)

Requirements
  • Master’s degree in Computer Science, Geoinformatics, or a related quantitative field (or equivalent practical experience)
  • 5+ years of professional experience in data engineering, with a strong focus on geospatial data
  • Proven experience designing and building analysis-ready datasets and data architectures (e.g. feature tables, star schemas, partitioned data formats)
  • Strong expertise in PostgreSQL / PostGIS in production environments, including:
    • Spatial indexing
    • Complex spatial joins
    • Performance tuning for large-scale workloads
  • Experience working with large-scale geospatial datasets (raster and vector) and understanding performance trade-offs between database-centric and distributed processing
  • Strong Python skills and ability to work with complex codebases in production settings
  • Hands-on experience with geospatial data formats and tooling, such as:
    • Cloud Optimized GeoTIFFs (COGs)
    • GeoParquet
    • STAC
  • Solid understanding of data modeling, transformations, and data lineage, including reprojection and tiling strategies
  • Experience with AWS (S3, RDS/Aurora, EC2) and scaling data pipelines in cloud environments
  • Experience building reliable, automated data pipelines and workflows (ingestion, orchestration, monitoring)
  • Strong focus on data quality, consistency, and observability
  • Ability to deliver production-grade, testable, and maintainable code
  • Comfortable leveraging AI-assisted development tools (e.g. Cursor, Claude, Copilot)

Nice to have
  • Experience with climate, flood, or natural hazard datasets (e.g. FEMA, NOAA, USGS)
  • Familiarity with modern geospatial tooling (e.g. GDAL/OGR, rasterio, rioxarray)
  • Experience with distributed data platforms (e.g. Databricks, Delta Lake, PySpark, Unity Catalog)
  • Experience with Parquet / Arrow for analytical data workflows
  • Familiarity with Docker-based development environments and Makefile workflows

Tech Stack
  • PostgreSQL / PostGIS
  • Python
  • GDAL / rasterio
  • AWS (S3, RDS/Aurora, EC2)
  • Docker