Lead Data Modeler (Hadoop/Hive Ecosystem) - Pune Location

Hyderabad, Telangana, India

Job Description

Job Title: Lead Backend Data Modeler (Hadoop/Hive Ecosystem)
Experience: 8-14 years
Location: Pune
Work Model: Hybrid, 3 days a week from office
Job Summary
We are seeking a highly experienced and hands-on Lead Backend Data Modeler with 8-10+ years of dedicated experience in designing, developing, and optimizing robust and scalable data models specifically within Hadoop/Hive ecosystems. The ideal candidate will possess deep expertise in big data technologies and architecture, with a proven track record of restructuring and managing complex backend databases and their underlying infrastructure. This role demands a laser focus on designing efficient, high-performance data structures for analytics and applications, working closely with data engineers, architects, and business stakeholders to ensure data integrity, performance, and accessibility.
Key Responsibilities

  • Strategic Data Model Design & Architecture: Lead the design and development of conceptual, logical, and physical data models for large-scale data lakes and data warehouses built on Hadoop/Hive, ensuring alignment with critical business requirements and long-term architectural vision.
  • Hadoop Ecosystem Management: Oversee and drive the restructuring, optimization, and management of backend databases and the underlying Hadoop architecture, including HDFS, YARN, and related components.
  • Hive Schema & Performance Optimization: Design, implement, and maintain highly optimized Hive schemas, including tables, partitions, and bucketing strategies, and leverage query accelerators such as LLAP and Tez. Analyze and tune Hive queries for maximum performance on petabyte-scale datasets (illustrated in the sketch after this list).
  • Big Data Integration & ETL/ELT: Collaborate with data engineering teams to design efficient data ingestion and transformation pipelines using PySpark for ETL/ELT processes, ensuring seamless integration of diverse data sources into the Hadoop/Hive environment (a minimal PySpark sketch follows this list).
  • Data Governance & Standards: Establish and enforce stringent data modeling standards, naming conventions, metadata management, and best practices tailored for big data environments to ensure data consistency, quality, and compliance.
  • Performance Monitoring & Tuning: Proactively monitor and analyze data pipeline and query performance within the Hadoop ecosystem, identifying bottlenecks and implementing solutions for continuous improvement.
  • Collaboration & Mentorship: Work closely with data scientists, analysts, and software engineers to understand complex data needs. Provide technical leadership and mentorship to junior team members on big data modeling and architectural principles.
  • Documentation & Knowledge Transfer: Create and maintain comprehensive documentation for all data models, architectural patterns, data dictionaries, and data lineage within the Hadoop/Hive landscape.
  • Technology Evaluation & Innovation: Stay abreast of the latest advancements in the Hadoop ecosystem and big data technologies, evaluating and recommending new tools or approaches to enhance our data platform capabilities.
  • Security & Access Control: Implement and manage robust security measures and access controls for data within Hadoop/Hive, leveraging tools like Ranger or Sentry.
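
For illustration, a minimal PySpark sketch of the Hive schema design and ETL/ELT work described above. It is a sketch under assumptions, not a definitive implementation: it assumes a Spark session built with Hive support, and the table name (analytics.fact_orders), columns (order_id, customer_id, amount, created_at), and path (/data/landing/orders/) are all hypothetical.

    from pyspark.sql import SparkSession, functions as F

    # Assumed: Spark built with Hive support, pointed at the cluster metastore.
    spark = (
        SparkSession.builder
        .appName("hive-modeling-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # A partitioned, ORC-backed fact table: partitioning by event_date enables
    # partition pruning at query time. In Hive DDL, CLUSTERED BY (customer_id)
    # INTO N BUCKETS could additionally speed joins on that key. All names here
    # are illustrative.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.fact_orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(18, 2)
        )
        PARTITIONED BY (event_date DATE)
        STORED AS ORC
    """)

    # A simple ELT step: read raw landing data, derive the partition column,
    # and append with dynamic partitioning.
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    raw = spark.read.parquet("/data/landing/orders/")  # hypothetical path
    (
        raw.withColumn("event_date", F.to_date("created_at"))
           .select("order_id", "customer_id", "amount", "event_date")
           .write.mode("append")
           .insertInto("analytics.fact_orders")
    )

Note that insertInto matches columns by position, which is why the partition column is selected last; running EXPLAIN on a query filtered by event_date should then show only the matching partitions being scanned.
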
Qualifications
  • Bachelor's degree in Computer Science, Information Technology, or a related quantitative field. Master's degree preferred.
  • 8-10+ years of hands-on experience in data modeling, database design, and data architecture, with a primary focus on Hadoop/Hive ecosystems and big data technologies.
  • Expert-level proficiency in HiveQL and a deep understanding of Hive architecture.
  • Strong hands-on experience with Apache Hadoop (HDFS, YARN) and related components.
  • Mandatory proficiency in PySpark for data processing, transformation, and optimization.
  • Demonstrable experience in designing and implementing dimensional models, 3NF, Data Vault, or other relevant data modeling techniques for large-scale data warehouses/lakes.
  • Experience with other big data technologies such as Kafka, Spark Streaming, HBase, Impala, or Presto is highly desirable.
  • Strong understanding of data partitioning, bucketing, and file formats (Parquet, ORC, Avro) within Hadoop for performance optimization (a short sketch follows this list).
  • Familiarity with cloud-based big data platforms (e.g., AWS EMR, Azure HDInsight, Google Cloud Dataproc) is a significant plus.
  • Excellent analytical, problem-solving, and communication skills, with the ability to articulate complex technical concepts to non-technical stakeholders.
  • Proven ability to lead technical initiatives, manage complex projects, and drive architectural decisions.
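
As a small, hypothetical illustration of the partitioning and columnar file-format points above (all paths and the event_date column are assumptions, not part of the posting), the PySpark sketch below writes the same data as Parquet and ORC and shows a partition-pruned read:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-sketch").getOrCreate()

    # Hypothetical input; the path and the event_date column are assumptions.
    events = spark.read.json("/data/raw/events/")

    # Parquet and ORC are columnar formats with built-in compression and
    # min/max statistics, so engines can skip row groups/stripes that a
    # query never needs.
    events.write.mode("overwrite").partitionBy("event_date") \
          .parquet("/data/curated/events_parquet/")
    events.write.mode("overwrite").partitionBy("event_date") \
          .orc("/data/curated/events_orc/")

    # Filtering on the partition column prunes directories at read time:
    # only the matching event_date folders are scanned.
    day = (
        spark.read.parquet("/data/curated/events_parquet/")
        .where("event_date = '2024-01-01'")
    )
    day.explain()  # physical plan lists PartitionFilters on event_date

The same layout carries over to Hive tables: declaring event_date as a partition column gives HiveQL queries the same pruning behavior.
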
Tech Mahindra represents the connected world, offering innovative and customer-centric information technology experiences, enabling Enterprises, Associates and the Society to Rise. We are a USD 4.9 billion company with 121,840+ professionals across 90 countries, helping over 935 global customers including Fortune 500 companies. Our convergent, digital, design experiences, innovation platforms and reusable assets connect across a number of technologies to deliver tangible business value and experiences to our stakeholders. Tech Mahindra is the highest ranked Non-U.S. company in the Forbes Global Digital 100 list (2018) and in the Forbes Fab 50 companies in Asia (2018).



Job Detail

  • Job Id
    JD5189565
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Hyderabad, Telangana, India
  • Education
    Not mentioned
  • Experience
    8-14 years