Data Scientist - GenAI & ML

Omni Inclusive
14 days ago
Princeton, NJ, United States
Hybrid
Full-time
Junior Level (1-3 years)

Job Description

Must have:

  • Python Programming
  • PyTorch
  • NLP
  • ML modeling/development
  • GenAI
  • SQL
  • LLMs, RAG architecture, and agentic AI frameworks

Key Responsibilities:

  • Lead use case/workstream with junior data scientists
  • Contribute to the end-to-end model lifecycle, including data exploration and understanding, feature engineering, and model training and validation, while ensuring quality, security, scalability, and fairness
  • Support use case development from initial project scoping and project/sample design, through reception and processing of data, analysis, and modeling, to creation of the final report/presentation
  • Perform data wrangling, data matching, and ETL to explore a variety of data sources, gain data expertise, perform summary analyses, and prepare modeling datasets
  • Use advanced statistical and AI/ML techniques to create high-performing predictive models and creative analyses that address business objectives and partner needs
  • Identify source data and run data quality checks, both during model/solution development and in production
  • Package models/solutions and deploy them in cooperation with Data Engineers and MLOps
  • Implement new statistical or other mathematical methodologies as needed for specific models or analysis.
  • Propose innovative ways to look at problems using data mining and data visualization
  • Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.
  • Present information using data visualization techniques; communicate results and ideas to key decision makers.
  • Ensure data accuracy and consistent reporting by performing regular data quality control, preparing and maintaining reports, and troubleshooting data anomalies
  • Adhere to model governance, documentation, testing, and other best practices in partnership with key stakeholders.
  • Maintain consistent accuracy and thoroughness in performing work assignments
  • Attend industry conferences to stay current on industry trends, challenges, and potential market opportunities
  • Contribute to standardization of Data Science tools, processes, and best practices
  • Build LLM/AI-powered application prototypes with a lightweight UI (e.g., Streamlit) to validate usability and support adoption.

Required Skills:

  • PhD with 2+ years of experience, or Master's degree with 4+ years of experience, in Statistics, Computer Science, Engineering, Applied Mathematics, or a related field
  • 3+ years of hands-on ML modeling/development experience
  • Background in insurance and underwriting preferred
  • Solid understanding of data analysis and statistical modeling.
  • Knowledge of a variety of machine learning techniques (clustering, decision trees, bagging/boosting, artificial neural networks, etc.) and their real-world advantages/drawbacks.
  • Demonstrated track record in experimental design and execution
  • Hands-on experience with data wrangling (including fuzzy matching and regular expressions), distributed computing, and applying parallelism to ML solutions
  • Strong programming skills in Python
  • Solid background in algorithms and a range of ML models
  • Excellent communication skills and ability to work and collaborate cross-functionally with Product, Engineering, and other disciplines at both the leadership and hands-on level
  • Excellent analytical and problem-solving abilities with superb attention to detail
  • Proven experience providing technical leadership and mentoring to data scientists, plus strong project management skills with the ability to monitor/track performance for enterprise success
  • Experience communicating complex ideas simply, presenting impact, trade-offs, and recommendations to non-technical partners.
  • Working knowledge of core software engineering concepts (version control with Git/GitHub, testing, logging, ...).
  • Working knowledge of NLP, LLMs, RAG architecture, and agent frameworks, including safe automation design and evaluation systems.
  • Experience in insurance, financial services, or related industries is a plus

Required Skills

RAG architecture
NLP
agentic AI frameworks
Experimental design
Data visualization
GenAI
ML modeling/development
SQL
Python Programming
Algorithms
PyTorch
LLMs
ETL
Machine learning techniques
Statistical modeling
Data wrangling