Lead - Data Scientist - NLP & Gen AI

5 - 8 Years

Hyderabad/Bangalore

Posted 4 months ago

#NLP #Data Scientist #Data Science #Data Management #SQL #Python #Statistics #Artificial Intelligence

About the Lead Data Scientist (NLP & GenAI) Job Role

You will help our clients solve real-world problems by tracing the data-to-insights lifecycle:

- Understand business problems, make sense of the data landscape and footprint, and apply a combination of Gen AI, advanced NLP, and exploratory analysis

- Create, experiment with, and deliver innovative solutions built on textual data, working with client stakeholders in a consultative manner

- Guide a team of data scientists to deliver exceptional solutions to clients across domains

Work Location: Hyderabad/Bangalore (Hybrid mode with 3 days in office)

Qualification and experience for the Lead Data Scientist (NLP) Role:

- Background in Computer Science/Computer Applications or any quantitative discipline (Statistics, Mathematics, Economics/Operations Research etc.) from a reputed institute.

- 5-8 years of experience using analytical tools/languages like Python on large-scale data

- Must have experience with semantic models and NER

- Experience working with pre-trained models, and awareness of the state of the art in embeddings and their applicability to use cases

- Must have strong experience in NLP/NLG/NLU applications using popular deep learning frameworks such as PyTorch or TensorFlow and models such as BERT or GPT (or similar)

- Demonstrated ability to engage with client stakeholders at multiple levels and provide consultative solutions across different domains

- Deep knowledge of techniques such as Linear Regression, gradient descent, Logistic Regression, Forecasting, Cluster analysis, Decision trees, Linear Optimization, Text Mining

- Strong understanding of integrating NLP models into business workflows. Candidates should have end-to-end exposure, from project initiation to business impact creation, in at least one project.

Experience in productionizing & retraining models:

- Ability to guide and mentor teams of associates on solution development and approaches

- Broad knowledge of fundamentals and state-of-the-art in NLP and machine learning

- Coding skills in one or more programming languages such as Python, SQL

- Expert-level understanding of language semantic concepts and data standardization

- Proven track record of successful models and practical implementation

- Experience in training transformer-based language models and their variants (T5, BART, BERT, etc.)

- Knowledge of the transformer architecture and the impact of modifying it

- Familiar with multiple evaluation metrics for LLMs

- Experience building pipelines with Hugging Face, LangChain, etc.

- Experience with vector DBs and text embedding models

- Different prompting templates: zero-shot, few-shot, composition, etc.

- LLM in-context learning, fine-tuning, model evaluation metrics, etc.

- Text pre-/post-processing techniques

- Experience in using GPUs to train deep learning models

- Good knowledge of solving industrial problems using deep learning models with NLP-related use-cases

- Familiar with all prompting techniques

- Hands-on experience with popular ML frameworks such as PyTorch (must) and TensorFlow

- Experience with production deployment of LLM solutions

- Building scalable LLM solutions

- Familiarity with cloud services such as Azure ML Studio, AWS SageMaker, etc. is considered a plus

- Knowledge of machine learning techniques for entity resolution, common speech products, or text search