We’re on an exciting journey with our client and we want you to join us. With our client, you will be exposed to the latest technologies and work with some of the brightest minds in the industry.
Our client is a healthcare company, and you will play a key role as a Python and PySpark Developer who can assist with the following:
Job Summary:
We are seeking a highly skilled and experienced Python and PySpark Developer to join our team. The ideal candidate will be responsible for designing, developing, and optimizing big data pipelines and solutions using Python, PySpark, and distributed computing frameworks. This role involves working closely with data engineers, data scientists, and business stakeholders to process, analyze, and derive insights from large-scale datasets.
Key Responsibilities:
Data Engineering & Development:
- Design and implement scalable data pipelines using PySpark and other big data frameworks.
- Develop reusable and efficient code for data extraction, transformation, and loading (ETL).
- Optimize data workflows for performance and cost efficiency.
Data Analysis & Processing:
- Process and analyze structured and unstructured datasets.
- Build and maintain data lakes, data warehouses, and other storage solutions.
Collaboration & Problem Solving:
- Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
- Troubleshoot and resolve performance bottlenecks in big data pipelines.
Code Quality & Documentation:
- Write clean, maintainable, and well-documented code.
- Ensure compliance with data governance and security policies.
Required Skills & Qualifications:
Programming Skills:
- Proficient in Python with experience in data processing libraries like Pandas and NumPy.
- Strong experience with PySpark and Apache Spark.
Big Data & Cloud:
- Hands-on experience with big data platforms such as Hadoop, Databricks, or similar.
- Familiarity with cloud services like AWS (EMR, S3), Azure (Data Lake, Synapse), or Google Cloud (BigQuery, Dataflow).
Database Expertise:
- Strong knowledge of SQL and NoSQL databases.
- Experience working with relational databases like PostgreSQL, MySQL, or Oracle.
Data Workflow Tools:
- Experience with workflow orchestration tools like Apache Airflow or similar.
Problem Solving & Communication:
- Ability to solve complex data engineering problems efficiently.
- Strong communication skills to work effectively in a collaborative environment.
Preferred Qualifications:
- Knowledge of data lakehouse architectures and frameworks.
- Familiarity with machine learning pipelines and integration.
- Experience with CI/CD tools and DevOps practices for data workflows.
- Certification in Spark, Python, or cloud platforms is a plus.
Education:
Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.