We are seeking a highly motivated and talented Gen AI LLM Associate. In this role, you will play a critical part in refining and enhancing a cutting-edge Large Language Model (LLM) designed specifically to generate database (DB) queries from natural language prompts. Your expertise in both natural language processing and database technologies will be instrumental in creating a powerful, user-friendly tool for data analysts and engineers.
Key Responsibilities:
- Model Tuning & Optimization:
  - Fine-tune pre-trained LLMs on extensive DB datasets to enhance their ability to generate accurate and efficient DB queries from natural language instructions.
  - Experiment with different model architectures, hyperparameters, and training techniques to optimize model performance and efficiency.
  - Implement techniques to improve model robustness, address biases, and ensure the safety and reliability of generated DB queries.
- Data Preparation & Management:
  - Collect, clean, and curate high-quality DB datasets for model training and evaluation.
  - Develop and maintain data pipelines for efficient data ingestion, transformation, and loading.
  - Explore and implement data augmentation techniques to improve model generalization and robustness.
- System Development & Integration:
  - Design and implement robust and scalable systems for model deployment and inference.
  - Integrate the LLM into existing data platforms and workflows.
  - Develop user interfaces and APIs for seamless interaction with the LLM.
- Research & Development:
  - Stay abreast of the latest advancements in natural language processing, database technologies, and LLM research.
  - Conduct research and experimentation on novel approaches to DB query generation and LLM fine-tuning.
Required Qualifications:
- Strong foundation in Natural Language Processing (NLP): Deep understanding of NLP concepts, techniques, and architectures (e.g., transformers, attention mechanisms).
- Proficiency in SQL: Good knowledge of SQL syntax, semantics, and optimization techniques.
- Programming Proficiency: Strong programming skills in Python and/or Rust.
- Database Expertise: Good understanding of databases, including concepts such as indexing, query planning, and data warehousing.
- Distributed Systems: Experience with distributed computing frameworks such as Apache Flink and Apache Spark, and with cloud-based data processing services (e.g., AWS EMR, Google Dataflow).
Preferred Qualifications:
- Experience with Large Language Models (LLMs): Prior experience with fine-tuning and deploying LLMs, particularly for tasks related to code generation or natural language to code translation.
- Cloud Computing Experience: Familiarity with AWS services (e.g., S3, EC2, Lambda, SageMaker