*4-5 years of hands-on experience in applied data science, machine learning, or analytics (industry or research with applied projects).
*Degree in Data Science, Computer Science, Statistics, Engineering, Biostatistics, Chemistry, or a related quantitative field (Bachelor's minimum; Master's preferred).
*Strong Python skills (pandas, scikit-learn, NumPy); working knowledge of SQL.
*Practical experience with time-series/sensor data or tabular modeling in production-like settings.
*Experience with at least one deep learning framework (PyTorch or TensorFlow) for applied tasks.
*Demonstrated ability to move from problem definition to prototype and present results to stakeholders.
*Good documentation practices, basic testing, and reproducible analysis (notebooks and scripts refactored from them).
*Clear communication skills and ability to work in cross-functional teams.
*Willingness to work in regulated environments and follow documentation/validation processes.
Technical Skills:
*Languages: Python (pandas, scikit-learn, XGBoost/LightGBM), SQL
*DL: PyTorch or TensorFlow/Keras
*Storage / Viz: S3 / object store, Postgres, Tableau / Power BI / Plotly
*Tools: Git, Jupyter / VS Code, basic Linux shell
Additional Advantages:
*Prior experience in pharma, biotech, CMO/CDMO, manufacturing, or other regulated industries.
*Familiarity with MES, LIMS, ELN, PAT, or industrial IoT data sources.
*Experience with MLOps basics (Docker, simple CI/CD, MLflow, Airflow) or production handoffs.
*Knowledge of multivariate statistical process control (MSPC), DOE, chemometrics, or Six Sigma concepts.
*Exposure to LLMs and prompt engineering for knowledge extraction, summarization, or augmentation of domain content (SOPs, batch records).
*Experience with cloud platforms (AWS/Azure/GCP) and data platforms (Snowflake, Redshift, BigQuery).
*Understanding of model explainability (SHAP, LIME) and uncertainty quantification techniques.