Job Description

Job Title: PySpark Data Reconciliation Engineer

Summary: We're seeking a skilled
PySpark Data Reconciliation Engineer to join our team and drive the development of robust data reconciliation solutions within our financial systems. You will be responsible for designing, implementing, and maintaining PySpark-based applications to perform complex data reconciliations, identify and resolve discrepancies, and automate data matching processes. The ideal candidate possesses strong PySpark development skills, experience with data reconciliation techniques, and the ability to integrate with diverse data sources and rules engines.
Key Responsibilities:

Data Reconciliation Development:
- Design, develop, and test PySpark-based applications to automate data reconciliation processes across various financial data sources, including relational databases, NoSQL databases, batch files, and real-time data streams.
- Implement efficient data transformation and matching algorithms (deterministic and heuristic) using PySpark and relevant big data frameworks.
- Develop robust error handling and exception management mechanisms to ensure data integrity and system resilience within Spark jobs.
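To make the deterministic side of this concrete: in PySpark such a reconciliation is typically a full outer join on a reconciliation key followed by a discrepancy filter. The minimal pure-Python sketch below shows the core comparison logic; all record and field names are illustrative, not from a real system.

```python
# Minimal sketch of deterministic reconciliation: match records from two
# sources on a key and flag amount discrepancies. In a PySpark job, the same
# logic would be a full outer join on the key plus a filter on the amounts.
# All keys, fields, and the tolerance value are illustrative assumptions.

def reconcile(source_a, source_b, tolerance=0.01):
    """Return (matched_keys, discrepancies, only_in_a, only_in_b)."""
    b_by_key = {r["key"]: r for r in source_b}
    matched, discrepancies, only_in_a = [], [], []
    for rec in source_a:
        other = b_by_key.pop(rec["key"], None)
        if other is None:
            only_in_a.append(rec)                       # present only in source A
        elif abs(rec["amount"] - other["amount"]) <= tolerance:
            matched.append(rec["key"])                  # amounts agree within tolerance
        else:
            discrepancies.append((rec["key"], rec["amount"], other["amount"]))
    only_in_b = list(b_by_key.values())                 # unconsumed B records
    return matched, discrepancies, only_in_a, only_in_b

a = [{"key": "T1", "amount": 100.00}, {"key": "T2", "amount": 55.50}]
b = [{"key": "T1", "amount": 100.00}, {"key": "T2", "amount": 57.00},
     {"key": "T3", "amount": 10.00}]
matched, breaks, only_a, only_b = reconcile(a, b)
```

The single-pass dictionary lookup mirrors what the join would do at scale; the tolerance parameter is where financial rounding rules would plug in.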
Data Analysis and Matching:
- Collaborate with business analysts and data architects to understand data requirements and matching criteria.
- Analyze and interpret data structures, formats, and relationships to implement effective data matching algorithms using PySpark.
- Work with distributed datasets in Spark, ensuring optimal performance for large-scale data reconciliation.
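For the heuristic side of matching, a common building block is a string-similarity rule over normalized fields. The sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in; in PySpark, `pyspark.sql.functions.levenshtein` or a pandas UDF would play the same role on distributed DataFrames. The field names and threshold are illustrative.

```python
# Sketch of a heuristic (fuzzy) matching rule using only the standard library.
# The 0.85 threshold and the counterparty-name examples are assumptions for
# illustration; production thresholds come from business matching criteria.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def heuristic_match(name_a: str, name_b: str, threshold: float = 0.85) -> bool:
    """Treat two counterparty names as a match above the threshold."""
    return similarity(name_a, name_b) >= threshold

near = heuristic_match("ACME Corp", "Acme Corp.")   # near-identical names
far = heuristic_match("ACME Corp", "Globex Ltd")    # clearly different names
```

Normalizing (case, whitespace) before scoring is what keeps a heuristic rule stable across feeds that format the same entity differently.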
Rules Engine Integration:
- Integrate PySpark applications with rules engines (e.g., Drools) or equivalent to implement and execute complex data matching rules.
- Develop PySpark code to interact with the rules engine, manage rule execution, and handle rule-based decision-making.
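Drools is a JVM rules engine, so from PySpark it is usually reached over a service boundary (for example, a REST endpoint called from a UDF). The sketch below shows only the shape of rule-based decision-making — an ordered rule table evaluated first-match-wins; every rule name, predicate, and outcome label is an illustrative assumption.

```python
# Sketch of rule-based decision-making over record pairs. A production system
# would delegate rule evaluation to an external rules engine such as Drools;
# this ordered, first-match-wins rule table just illustrates the pattern.
# All rule names, thresholds, and outcome labels are illustrative.

RULES = [
    ("exact_key_and_amount",
     lambda a, b: a["key"] == b["key"] and a["amount"] == b["amount"],
     "AUTO_MATCH"),
    ("key_match_amount_within_1pct",
     lambda a, b: a["key"] == b["key"]
     and abs(a["amount"] - b["amount"]) <= 0.01 * abs(b["amount"]),
     "REVIEW"),
]

def decide(rec_a, rec_b, default="BREAK"):
    """Return (rule_name, outcome) for the first rule that fires."""
    for name, predicate, outcome in RULES:
        if predicate(rec_a, rec_b):
            return name, outcome
    return None, default
```

Keeping the rules in a data structure rather than hard-coded branches is what makes them externally manageable — the same property a rules engine provides at full scale.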
Problem Solving and Gap Analysis:
- Collaborate with cross-functional teams to identify and analyze data gaps and inconsistencies between systems.
- Design and develop PySpark-based solutions to address data integration challenges and ensure data quality.
- Contribute to the development of data governance and quality frameworks within the organization.
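Gap analysis between two systems often reduces to anti-joins: records present in one feed but missing from the other. In PySpark that is `df_a.join(df_b, on="key", how="left_anti")`; the set-difference sketch below shows the same idea, with illustrative keys.

```python
# Sketch of gap analysis between two systems: keys present in one feed but
# missing from the other. In PySpark this is a left anti join in each
# direction (how="left_anti"). The ledger/statement keys are illustrative.

def find_gaps(keys_a, keys_b):
    """Return (missing_from_b, missing_from_a) as sorted lists."""
    set_a, set_b = set(keys_a), set(keys_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)

ledger = ["T1", "T2", "T3"]
statement = ["T2", "T3", "T4"]
missing_from_statement, missing_from_ledger = find_gaps(ledger, statement)
```

Running the difference in both directions is the key point: a one-sided check would miss records that the other system originated.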
Qualifications and Skills:
- Bachelor's degree in Computer Science or a related field.
- 5+ years of hands-on experience in big data development, preferably with exposure to data-intensive applications.
- Strong understanding of data reconciliation principles, techniques, and best practices.
- Proficiency in PySpark, Apache Spark, and related big data technologies for data processing and integration.
- Experience with rules engine integration and development.
- Strong analytical and problem-solving skills, with the ability to translate business requirements into technical solutions.
- Excellent communication and collaboration skills to work effectively with business analysts, data architects, and other team members.
- Familiarity with data streaming platforms (e.g., Kafka, Kinesis) and big data technologies (e.g., Hadoop, Hive, HBase) is a plus.