Job Title: Sr. Data Scientist
Location: Arlington, VA – Hybrid Remote (occasional travel to SCIF locations in Arlington, Falls Church, Alexandria, or the Pentagon)
Eligibility: Candidate must possess an active TS/SCI Clearance
Job Description:
Major Duties/Tasks:
- Designs, configures, develops, tests, and supports informatics and data science solutions for a wide array of technical use cases.
- Collaborates with cross-functional teams, including data scientists and software engineers, to integrate AI solutions developed by other elements of the DoD community into Search Portfolio products when appropriate.
- Optimizes AI models for performance, scalability, and efficiency, leveraging cloud-based resources and distributed computing frameworks, specifically Apache Spark/Databricks.
- Adapts the code base to also run on GPU-enabled Kubernetes clusters.
- Stays current on, and contributes to, the latest advancements in AI research, applying new findings to improve Search Portfolio products.
- Manages the lifecycle of AI/ML components used in Search Portfolio products, from research and development through deployment and optimization.
- Applies analytical methodologies to diagnose data-related challenges, implement solutions, and evaluate performance.
- Documents and presents requirements, design alternatives, and findings to team members and clients.
- Develops strategic, baselined data modeling processes and accurately determines cause-and-effect relationships.
- Works with integrated development environments and with data integration, data visualization, data mining, and analysis tools.
- Maintains and guides the development of common libraries and tools used by multiple teams.
- Helps formulate a strategy for rapid prototyping.
Required Education:
- Bachelor’s degree plus 7-10 years of experience, or a master’s degree plus 5 years of experience.
Clearance:
- Possess a minimum of an active Top Secret (TS) security clearance with Sensitive Compartmented Information (SCI) eligibility.
Required Skills/Experience:
- Experience with machine learning (ML), including natural language processing, computer vision, statistical learning theory, …
- Hands-on experience with natural language processing (NLP), large language models (LLMs), text embedding, semantic query, generative AI for text, and retrieval-augmented generation (RAG).
- Familiarity with data preprocessing, feature engineering, and model evaluation techniques essential for machine learning projects.
- Strong understanding of various ML algorithms, including supervised and unsupervised learning, reinforcement learning, and neural networks.
- Experience with version control systems like Git, enabling effective collaboration and code management.
- Experience in an ML engineer or data scientist role building ML models.
- Experience writing code in Python, R, Scala, Java, C++ with documentation for reproducibility.
- Experience using Apache Spark/Databricks distributed compute environments for AI/ML workloads.
- Experience handling petabyte-scale datasets, exploring data to discover hidden patterns, using data visualization tools, writing SQL, and working with GPUs to develop models.
- Experience with cloud-based data persistence products, especially RDS PostgreSQL and PostgreSQL extensions such as pgvector.
- Experience persisting vectorized data from text embedding processes using Elastic and/or OpenSearch, as well as vector-enabled RDBMSs such as pgvector-enhanced PostgreSQL.
- Experience writing and speaking about technical concepts to business, technical, and lay audiences and giving data-driven presentations.