Job Title: Sr. Data Engineer
Location: San Francisco, CA (Hybrid)
Work Type: Full Time
Job Description:
- Create and maintain optimal data pipeline architecture
- Build data pipelines that transform raw, unstructured data into formats that data analysts can use for analysis
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and delivery of data from a wide variety of data sources using SQL and AWS Big Data technologies
- Work with stakeholders, including the Executive, Product, Engineering, and Program teams, to assist with data-related technical issues and support their data infrastructure needs
- Develop and maintain scalable data pipelines, and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using scalable distributed data technologies
- Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for the key stakeholders and business processes that depend on it
- Write unit and integration tests, practice test-driven development, contribute to the engineering wiki, and document your work
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
About you:
- 6+ years of experience and a bachelor’s degree in Computer Science, Informatics, Information Systems, or a related field; or equivalent work experience
- In-depth working experience with distributed systems such as Hadoop/MapReduce, Spark, Hive, Kafka, and Oozie/Airflow
- At least 5 years of solid, production-quality coding experience implementing data pipelines in Java, Scala, and Python
- Experience with AWS cloud services: EC2, EMR, RDS
- Experience with Git, Jira, Jenkins, and shell scripting
- Familiarity with Agile methodology, test-driven development, source control management, and test automation
- Experience supporting and working with cross-functional teams in a dynamic environment
- You're passionate about data and building efficient data pipelines
- You have excellent listening skills and are empathetic to others
- You believe in simple, elegant solutions and place paramount importance on quality
- You have a track record of building fast, reliable, and high-quality data pipelines
Nice-to-have skills:
- Experience building marketing data pipelines, including direct mail, is a big plus
- Experience with Snowflake and Salesforce Marketing Cloud
- Working knowledge of open-source ML frameworks and end-to-end model development life cycle
- Previous working experience running containers (Docker/LXC) in a production environment using a container orchestration service