Job Summary:
We are seeking a highly skilled and experienced Lead Data Engineer to join our dynamic team. The ideal candidate will bring deep expertise in designing, developing, and implementing data pipelines, cloud solutions, and ETL processes, along with a proven track record of managing large-scale data warehouses, proficiency in cloud platforms such as Azure and AWS, and hands-on experience with real-time data processing technologies.
Key Responsibilities:
- Design and implement scalable data pipelines using PySpark, Apache Kafka, AWS Glue, and Azure Data Factory to process and transform data from diverse sources.
- Develop ETL workflows to ingest and process large-scale data into Snowflake, Azure Synapse Analytics, and other data warehouses.
- Leverage cloud-based platforms (Azure Data Lake, AWS S3, Databricks) for data storage, processing, and transformations.
- Optimize database performance and implement data modeling techniques such as Star Schema, Snowflake Schema, and slowly changing dimensions (SCD).
- Build real-time data processing solutions utilizing Apache Kafka, Spark Streaming, and other streaming technologies.
- Create and maintain pipelines that synchronize data between systems such as Salesforce and Snowflake using APIs, Amazon AppFlow, and Informatica tools.
- Develop and maintain dashboards and reports using Power BI and Tableau for business insights.
- Implement and manage CI/CD for data pipelines using Azure DevOps.
- Collaborate with cross-functional teams to design and develop end-to-end data solutions, including storage, integration, and visualization.
- Perform data profiling, quality checks, and transformations using Informatica IDQ and other data quality tools.
Required Skills and Qualifications:
- 12+ years of IT experience, with expertise in data engineering, data modeling, and cloud-based solutions.
- Proficiency in tools and platforms such as Informatica (PowerCenter, IICS), Snowflake, AWS Glue, Azure Data Factory, and Databricks.
- Strong programming skills in Python, Java, and Scala, with experience in frameworks such as PySpark.
- Expertise in real-time data processing using Kafka, Spark Streaming, and AWS Lambda.
- In-depth knowledge of relational databases (Oracle, SQL Server, PostgreSQL) and NoSQL databases (MongoDB, DynamoDB).
- Hands-on experience in building and managing data pipelines and workflows in Azure and AWS environments.
- Experience with reporting tools such as Power BI, Tableau, and QlikView.
- Knowledge of data governance, data quality, and master data management (MDM) using Informatica tools.
- Familiarity with automation and scheduling tools like Apache Airflow and Control-M.
- Strong problem-solving and analytical skills with the ability to manage multiple tasks.
Preferred Qualifications:
- Experience in the Rail Transportation, Healthcare, or Banking domains.
- AWS or Azure certification is a plus.
- Familiarity with modern data warehousing techniques and open table formats such as Apache Iceberg.
Education:
- Bachelor’s degree in Computer Science, Engineering, or a related field.