Data Analyst

KA, IN, India

Job Description

About the opportunity



We are looking for a Data Analyst to join the team that powers search, AI Assistant, and AI agents in our Dayforce product. This role is central to ensuring that the data behind our AI experiences is clean, trustworthy, well-organized, and ready for use.



You will work with large and complex datasets, own data quality and governance for key domains, and enable data-driven decisions across the product and engineering teams. The ideal candidate has strong analytical skills, deep hands-on experience with data preparation and management, and a passion for turning messy, real-world data into reliable, usable assets.



Your impact will be visible across the full data lifecycle: from ingestion and cleanup, through modeling and documentation, to reporting and insight generation. You will collaborate closely with product managers, software engineers, data scientists, and business stakeholders to ensure that the right data is available, accurate, and actionable.



What you'll get to do



• Data Annotation & Labeling: Annotate, tag, and label large volumes of data (such as text, images, or audio) according to predefined guidelines to create "ground truth" datasets for machine learning. Ensure labels are accurate, consistent, and meet quality standards. This includes reviewing and correcting labels, performing quality assurance checks, and refining the labeling process over time.
• Data Augmentation & Validation: Apply data augmentation techniques to increase the diversity and volume of training data, such as generating synthetic examples or transforming existing data, while maintaining data integrity. In collaboration with cross-functional teams across Dayforce, including Workforce Management (WFM), Payroll, Scheduling, Learning, and other product areas, perform data validation and error-checking routines to detect anomalies or inconsistencies. This ensures that datasets used for model training are accurate, representative, and free of issues that could negatively affect model performance.
• Data & Training Pipeline Automation: Design, implement, and own end-to-end data pipelines that move data from raw sources through labeling, validation, and preprocessing into training-ready datasets. This includes writing and maintaining Python-based automation in Databricks to ingest, clean, label, version, and store data in well-structured formats that ensure reproducibility and traceability. You will also own the automation of model training workflows, ensuring that newly labeled or updated datasets seamlessly trigger retraining jobs. In this role, you will monitor pipeline execution, troubleshoot data- and pipeline-related failures, and work closely with ML engineers to define clean interfaces between data pipelines and training systems, while keeping model design and evaluation out of scope.
• Workflow Improvement: Continuously evaluate and improve the data labeling workflow. Provide feedback on labeling tools and processes to increase efficiency, for example by suggesting better annotation tools or semi-automated labeling approaches. You may help develop simple utilities or scripts to assist annotators (e.g., automation for repetitive labeling tasks or active-learning integration to prioritize labeling the most informative data). You will also document guidelines, edge cases, and best practices for the labeling process, and ensure knowledge transfer and training for colleagues assisting with annotation.
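To give a concrete flavor of the validation and error-checking routines described above, here is a minimal pandas sketch. The column names (`text`, `label`) and the specific checks are hypothetical illustrations, not Dayforce schemas or actual project code:

```python
import pandas as pd

def validate_labels(df: pd.DataFrame, allowed_labels: set) -> pd.DataFrame:
    """Return the rows that fail basic quality checks on a labeled dataset."""
    issues = pd.DataFrame(index=df.index)
    # Flag missing or empty text fields.
    issues["empty_text"] = df["text"].isna() | (df["text"].str.strip() == "")
    # Flag labels outside the predefined guideline set.
    issues["bad_label"] = ~df["label"].isin(allowed_labels)
    # Flag exact duplicate examples, which can bias training.
    issues["duplicate"] = df.duplicated(subset=["text"], keep="first")
    # Keep any row that trips at least one check.
    return df[issues.any(axis=1)]

data = pd.DataFrame({
    "text": ["clock in at 9am", "", "clock in at 9am"],
    "label": ["scheduling", "payroll", "scheduling"],
})
bad_rows = validate_labels(data, allowed_labels={"scheduling", "payroll"})
```

In practice checks like these would run inside the Databricks pipeline, with failing rows routed back to annotators rather than silently dropped.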

Skills and experience we value



• Experience: 2+ years of relevant experience in data labeling/annotation, data quality, or data engineering roles. Proven track record of working with large datasets and following detailed data annotation guidelines. Mid-level understanding of machine learning data needs (e.g., the basics of supervised learning and why consistent labeling matters).
• Technical Skills: Proficiency in Python for data manipulation and scripting automation (pandas, NumPy, etc.). Experience with data processing platforms or notebooks such as Databricks (or similar tools like Jupyter or Spark) to handle big-data workflows. Familiarity with data labeling tools and the ability to quickly learn new annotation software. Comfortable using version control (Git) and, ideally, data versioning tools for datasets.
• Data Management: Solid understanding of data management best practices, including data cleaning, validation, and augmentation techniques. Ability to implement data quality checks and troubleshoot data issues in a pipeline. Familiarity with the concepts of data versioning and reproducible data pipelines.
• Attention to Detail: Excellent attention to detail and a methodical approach to tasks. You must be able to maintain high accuracy in labeling data and catch inconsistencies or errors (your meticulous work will directly affect model outcomes). An eye for consistency and patience for repetitive tasks are essential traits for this role.
• Organizational Skills & Independence: Strong organizational skills to manage and prioritize multiple datasets, versions, and pipeline tasks. Ability to work independently and take initiative in improving processes; we expect a self-starter who can manage the end-to-end data prep workflow with minimal supervision.
• Communication: Good communication and collaboration skills. Capable of documenting guidelines clearly and discussing requirements or issues with the engineering team. You should be comfortable providing feedback and raising questions when instructions are unclear, as well as mentoring junior data labelers or coordinating with any external labeling support if needed.
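The augmentation techniques mentioned above can be as simple as controlled text transformations. The sketch below shows one such approach, synonym substitution with a seeded random generator for reproducibility; the phrases and synonym table are invented examples, not product data:

```python
import random

def augment_text(example: str, synonyms: dict, seed: int = 0) -> str:
    """Create a synthetic variant by swapping words for listed synonyms.

    A fixed seed keeps the augmentation reproducible, which matters when
    the same dataset version must be regenerated for retraining.
    """
    rng = random.Random(seed)
    out = []
    for word in example.split():
        options = synonyms.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)

synonyms = {"shift": ["shift", "schedule block"], "start": ["start", "begin"]}
variant = augment_text("start my shift", synonyms, seed=1)
```

Real pipelines would layer several such transforms and validate that augmented examples still match their original labels.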

What would make you really stand out



• Education: Bachelor's degree in Computer Science, Data Science, Information Systems, or a related field (or equivalent practical experience).
• MLOps/Automation: Experience with MLOps or pipeline automation tools, for example familiarity with ML workflow orchestration (such as MLflow, Airflow, or Databricks ML pipelines) and continuous integration/continuous deployment (CI/CD) practices for data or models. Experience setting up automated training pipelines in a cloud environment is a strong plus.
• Advanced Tools: Exposure to data versioning tools, data augmentation libraries, or active learning frameworks. Experience with auto-labeling techniques or using AI to assist labeling is a bonus.
• Quality Focus: Experience in a data quality or data curation role. Past involvement in setting up labeling quality assurance processes (such as consensus labeling, review workflows, or calibration sessions) is beneficial, as it shows you know how to maintain high annotation standards at scale.
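Specific data versioning tools vary, but the underlying idea, fingerprinting a dataset so every retraining run is traceable to its exact inputs, can be sketched with the standard library alone. The record contents here are illustrative placeholders:

```python
import hashlib
import json

def dataset_fingerprint(records: list) -> str:
    """Hash a canonical JSON serialization of the dataset.

    Any change to the records (a corrected label, a new example)
    yields a different fingerprint, giving a cheap version id.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

v1 = dataset_fingerprint([{"text": "clock in", "label": "scheduling"}])
v2 = dataset_fingerprint([{"text": "clock in", "label": "payroll"}])
```

Dedicated tools add storage and lineage on top of this, but the fingerprint alone is enough to tie a trained model back to the dataset version it saw.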



Job Detail

  • Job Id
    JD5099444
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    KA, IN, India
  • Education
    Not mentioned
  • Experience
    Year