Job Description - Data Engineer
Company Overview
We are an AdTech company that has built a cutting-edge Real-Time Bidding (RTB) exchange capable of handling high volumes of traffic at sub-second latency. We are revolutionizing the digital advertising industry by connecting advertisers and publishers in a highly efficient and automated manner. As we continue to scale our operations, we are seeking a skilled Data Engineer to join our team and manage the infrastructure and systems behind our RTB exchange.
Position Overview
We are seeking a skilled Data Engineer with a proven track record in building robust and high-performance ETL pipelines. The ideal candidate will have hands-on experience working with large-scale data processing frameworks and data lakes, and expertise in handling terabytes of data to deliver reliable and efficient data solutions.
Key Responsibilities
Core Duties
· Design, develop, and maintain scalable ETL pipelines for processing large datasets (a minimal sketch follows this list)
· Manage and optimize data workflows over terabyte-scale datasets to ensure high performance and reliability
· Work with Apache Spark (preferred) and/or Apache Flink to build resilient data processing solutions
· Integrate, clean, and transform data from multiple sources
· Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions
· Monitor, troubleshoot, and optimize ETL processes and data pipelines for operational excellence
· Ensure high performance, quality, and consistency across all workflows
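For context on the pipeline work described above, the sketch below shows a minimal PySpark batch ETL job. The bucket paths, column names, and aggregation are illustrative assumptions, not a description of our production systems.

```python
# Minimal PySpark batch ETL sketch. Paths, columns, and partitioning are
# illustrative assumptions, not a description of production pipelines.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("bid-events-etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: read raw bid events from a (hypothetical) landing zone
raw = spark.read.json("s3a://example-landing/bid-events/date=2024-01-01/")

# Transform: deduplicate, normalize the timestamp, and drop invalid bids
clean = (
    raw.dropDuplicates(["bid_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("bid_price") > 0)
)

# Aggregate per publisher for downstream reporting
daily = clean.groupBy("publisher_id").agg(
    F.count("*").alias("bids"),
    F.avg("bid_price").alias("avg_bid_price"),
)

# Load: write the aggregate to a curated zone for downstream consumers
daily.write.mode("overwrite").parquet("s3a://example-curated/publisher_daily/")
```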
Shared Responsibilities
· Participate in code reviews and improve the quality of the codebase
· Stay up to date with industry trends and emerging technologies
· Mentor junior engineers and contribute to team knowledge sharing
Required Qualifications
Experience Requirements
· 3+ years of experience developing ETL pipelines with Apache Spark (preferred) or Apache Flink, orchestrated with tools such as Apache Airflow
· Hands-on experience with distributed, low-latency data stores such as ScyllaDB or Apache Cassandra
· Hands-on experience tuning Spark jobs for performance when reading from and writing to Cassandra via the Spark Cassandra Connector (see the sketch after this list)
· Strong understanding of data lake architectures and hands-on experience with storage layers such as Delta Lake or similar technologies
· Proven ability to work with terabytes of data in distributed environments
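As a rough illustration of the Spark Cassandra Connector tuning mentioned above, the sketch below sets a few common connector options on a read-heavy job. The contact point, keyspace, table, and values are hypothetical, and exact option names can vary by connector version.

```python
# Sketch of Spark Cassandra Connector tuning for a read-heavy job.
# Assumes the spark-cassandra-connector package is on the classpath
# (e.g. via --packages). Keyspace, table, and values are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-read-tuning")
    .config("spark.cassandra.connection.host", "10.0.0.10")   # hypothetical contact point
    .config("spark.cassandra.input.split.sizeInMB", "128")    # larger splits -> fewer, bigger tasks
    .config("spark.cassandra.output.concurrent.writes", "8")  # concurrent async writes per task
    .config("spark.cassandra.output.batch.size.rows", "500")  # cap unlogged batch size
    .getOrCreate()
)

# Filter on a (hypothetical) partition key so the predicate is pushed down
# to Cassandra instead of scanning the whole table.
bids = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="adx", table="bid_events")  # hypothetical keyspace/table
    .load()
    .filter("event_date = '2024-01-01'")
)
bids.groupBy("publisher_id").count().show()
```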
Technical Skills
· Proficiency in programming languages such as Scala, Python, or Java (Scala preferred)
· Hands-on experience managing self-hosted solutions on bare-metal servers
· Hands-on experience with cloud platforms and big data services such as AWS S3, Azure Data Lake, or Google BigQuery (Azure Data Lake preferred)
· Knowledge of data modeling, data warehousing concepts, and SQL
· Familiarity with version control (e.g., Git) and CI/CD workflows for data pipelines
· Experience with data technologies like Apache Spark, Apache Flink, Apache Kafka
· Experience with data orchestration tools such as Apache Airflow (a minimal DAG sketch follows this list)
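The DAG sketch below illustrates the level of Airflow familiarity we have in mind: a daily schedule wrapping a spark-submit call. The DAG id, schedule, and command are illustrative assumptions and target the Airflow 2.x API.

```python
# Minimal Airflow DAG sketch orchestrating a daily Spark ETL job.
# DAG id, schedule, and spark-submit command are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_bid_events_etl",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ spelling; older versions use schedule_interval
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_etl",
        # {{ ds }} is templated by Airflow to the logical run date
        bash_command="spark-submit --master yarn etl_job.py {{ ds }}",  # hypothetical job script
    )
```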
Soft Skills
· Excellent problem-solving skills
· Ability to work in collaborative, fast-paced environments