We are seeking a Senior Data Scientist with deep expertise in time series forecasting, Python, and Azure cloud services. The ideal candidate will have strong hands-on experience with Databricks, MLflow, and SQL, and will be responsible for designing, building, and deploying scalable machine learning solutions. This role requires proficiency in handling large datasets, implementing MLOps best practices, and exploring advanced AI/LLM-based solutions when needed.
Key Responsibilities
Lead the design and implementation of advanced time series models (SARIMAX, Prophet, XGBoost, LSTM, TFT) for forecasting and anomaly detection across large-scale datasets.
Perform deep feature engineering for time series data, including lag features, rolling statistics, seasonality decomposition, and handling irregular intervals.
Develop and optimize data pipelines using Azure Data Factory (ADF), Synapse, and ADLS Gen2 for ingestion and transformation of high-volume data.
Build and manage Databricks workflows for data preparation, model training, and deployment using PySpark and Delta Lake.
Implement MLOps practices using MLflow for experiment tracking, model registry, and lifecycle management.
Design robust backtesting strategies for time series models (rolling windows, expanding windows, cross-validation).
Write efficient and maintainable Python code leveraging core concepts (OOP, type hints, performance tuning).
Develop and optimize SQL queries for analytics and reporting on large datasets.
Ensure data governance, security, and compliance across Azure services and Databricks.
Monitor and maintain production models, including drift detection, automated retraining, and alerting.
Optimize Spark jobs and cluster configurations for cost and performance efficiency.
Collaborate with cross-functional teams to translate business problems into ML solutions.
Explore and prototype LLM-based solutions (Azure OpenAI) for advanced analytics use cases.
Implement model explainability techniques (SHAP, LIME) and ensure transparency in predictions.
Required Skills
Python (Advanced)
- Core concepts, OOP, performance optimization
Time Series Modeling
- SARIMAX, Prophet, XGBoost, LSTM, TFT
Azure Services
- ADF, Synapse, ADLS Gen2
Databricks
- Jobs, Delta Lake, Unity Catalog
MLflow
- Experiment tracking, model registry, lifecycle management