We are hiring for our client in the biotechnology industry.
Duration: 6 months (extendable)
100% onsite.
Responsibilities:
- Collaborate with stakeholders to understand data requirements for machine learning, data science, and analytics projects.
- Assemble large, complex data sets from disparate sources by writing code, scripts, and queries to efficiently extract, quality-check, clean, harmonize, and visualize them.
- Write pipelines for optimal extraction, transformation, and loading of data from a wide variety of sources using Python, SQL, Spark, and AWS big data technologies.
- Design and develop data schemas to support the data science team's development needs.
- Identify, design, and implement continuous process improvements, such as automating manual processes and optimizing data delivery.
- Design, develop, and maintain a dedicated machine learning inference pipeline on the AWS platform using SageMaker, EC2, and related services.
- Deploy inference pipelines on dedicated EC2 instances or Amazon SageMaker.
- Establish data pipelines to store and maintain inference outputs and to track model performance and key performance indicators (KPIs).
- Document data processes, write recommended procedures, and create training materials related to data management best practices.
Requirements:
- Bachelor’s or Master’s degree in Computer Science or Computer Engineering, or equivalent experience.
- 5-7 years of experience developing and deploying data and machine learning pipelines.
- 5 years of experience deploying machine learning models via Amazon SageMaker and Amazon Bedrock.
- Proficiency in programming and scripting with Python, SQL, and Spark.
- Deep knowledge of core AWS services, including RDS, S3, API Gateway, EC2/ECS, and Lambda.
- Hands-on experience with model monitoring, drift detection, and automated retraining processes.
- Experience implementing CI/CD pipelines using tools such as GitHub Actions (workflows), Docker, Kubernetes, and Jenkins (including Blue Ocean).
- Familiarity with Agile/Scrum software development practices.
- 5 years of experience with data visualization and/or API development for data science users.