Data Engineer — Big Data & Cloud Infrastructure
🌍 Remote | 🕐 Full-time
We are looking for a Data Engineer for our client, a US technology company specializing in digital solutions for roadway and infrastructure management. Their platforms provide government agencies and engineering firms with integrated tools for real-time pavement performance analytics, enabling a proactive approach to roadway maintenance.
What we expect from our candidates:
- 5+ years of experience in data engineering or big data roles
- Expert knowledge of Apache Spark (Core, SQL, DataFrame APIs)
- Hands-on experience with Apache Iceberg and Apache Parquet
- Proficiency with Databricks and Amazon EMR
- Strong experience deploying and managing Spark clusters on open-source Kubernetes (required)
- Deep understanding of Docker and containerization for Spark applications
- Experience with the Spark on Kubernetes operator, resource scaling, and job orchestration
- Solid programming skills in Java and Python
- Good problem-solving skills with a focus on performance optimization
- Experience with distributed systems and cloud-native infrastructure
- English — B2+ level
Nice to have:
- Experience with Flyte workflow orchestration
- Familiarity with AWS, Azure, or GCP
- Knowledge of CI/CD pipelines for data deployments
- Understanding of monitoring/observability tools for distributed systems
What you will be doing:
- Design and implement scalable data processing pipelines with Apache Spark
- Deploy and manage Spark clusters on open-source Kubernetes infrastructure
- Optimize data storage and access with Iceberg and Parquet formats
- Collaborate with data scientists and engineers to productionize ML workflows
- Apply best practices for building and running containerized big data workloads
- Contribute to a platform used for real-time analysis of transportation data and mobility optimization
📩 Interested?
We look forward to hearing from you!