This job posting is for a senior-level manager who specializes in automating the movement and transformation of data (ETL) within a banking environment.
We are seeking a highly skilled and self-driven Data Testing Architect to oversee and own the design, build, and deployment of scalable ETL pipelines across hybrid environments including Cloudera Hadoop, Red Hat OpenShift, and AWS Cloud. This role focuses on developing robust PySpark-based data processing solutions, building testing frameworks for ETL jobs, and leveraging containerization and orchestration platforms such as Docker and AWS EKS for scalable workloads.
You will be responsible for automating ETL processes, integrating with data lakes and data warehouses, managing large datasets efficiently, and ensuring reliable data delivery through CI/CD-enabled workflows.
What You'll Do (Developer Focus):
Build Data Pipelines:
Create testing solutions to extract data from various sources (like databases and data lakes), clean and transform it, and load it into target systems
Testing and Validation:
Develop automated tests to ensure the data pipelines are working correctly and the data is accurate. This is like quality control, making sure everything meets the bank's standards
Containerization and Orchestration:
Package these data pipelines into containers (using Docker) and manage their execution using orchestration tools (like AWS EKS)
Cloud Integration:
Work with various cloud services (like AWS S3, Lambda, and Airflow) for data storage, processing, and scheduling
Test Data Management:
Oversee test data strategies and environment simulations for scalable, reliable automation, including synthetic data generation
Build and maintain ETL validation and testing scripts that run on Red Hat OpenShift containers (an illustrative validation sketch follows this list)
Work with Hive, HDFS, and Oracle data sources to extract, transform, and load large-scale datasets
Develop Dockerfiles and create container images for PySpark jobs
Deploy and orchestrate ETL jobs using AWS EKS (Elastic Kubernetes Service) and integrate them into workflows
Leverage AWS services such as S3, Lambda, and Airflow for data ingestion, event-driven processing, and orchestration
Design and develop PySpark-based ETL pipelines on the Cloudera Hadoop platform
Create reusable frameworks, libraries, and templates to accelerate automation and testing of ETL jobs
Participate in code reviews and CI/CD pipelines, and maintain best practices in Spark and cloud-native development
Ensure tooling can be run in CI/CD, providing real-time, on-demand test execution that shortens the feedback loop and fully supports hands-free execution
Provide solutions for regression, integration, and sanity testing, and ensure timely completion
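As a purely illustrative, non-binding sketch of the kind of ETL validation script referenced above, the following PySpark snippet compares a source table with its loaded target on row counts, duplicate keys, and null keys; the table names, key column, and SparkSession settings are hypothetical assumptions, not specifics of this role.

```python
# Illustrative sketch only: table and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("etl_validation_sketch")
    .enableHiveSupport()
    .getOrCreate()
)

def validate_load(source_table: str, target_table: str, key_col: str) -> dict:
    """Compare a source table with its loaded target: row counts, duplicate keys, null keys."""
    src = spark.table(source_table)
    tgt = spark.table(target_table)
    results = {
        "source_rows": src.count(),
        "target_rows": tgt.count(),
        "duplicate_keys": tgt.groupBy(key_col).count().filter(F.col("count") > 1).count(),
        "null_keys": tgt.filter(F.col(key_col).isNull()).count(),
    }
    results["row_count_match"] = results["source_rows"] == results["target_rows"]
    return results

if __name__ == "__main__":
    # Hypothetical table and column names for illustration.
    report = validate_load("staging.customer_txn", "warehouse.customer_txn", "txn_id")
    print(report)
    if not report["row_count_match"] or report["duplicate_keys"] or report["null_keys"]:
        raise SystemExit(f"ETL validation failed: {report}")
```

A script of this shape could be packaged into a Docker image and run as a container on AWS EKS or Red Hat OpenShift, with its exit code feeding the CI/CD quality gate described above.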
What You'll Do (Lead Focus):
Team Management:
Lead a team of automation professionals, guiding them on projects and helping them develop their skills
Own and maintain automation best practices and educate the team via meetings, demos, and Q&A sessions
Ensure new utilities are documented and transitioned to testers for execution, and provide troubleshooting support when required
Strategy and Planning:
Define the overall strategy for automating data processes and testing, ensuring it aligns with the bank's goals
Lead initiatives related to automation of Data & Analytics testing requirements for process and product rollout into production
Tooling and Innovation:
Research and implement new automation tools and techniques, including AI, machine learning, and low-code solutions, to improve efficiency
Design and develop an integrated portal to consolidate utilities and cater to user needs
Collaboration:
Work closely with other teams and partners to ensure smooth data operations and meet regulatory requirements.
Collaborate across teams to ensure automated solutions are delivered and can be run self-sufficiently
Work with business stakeholders to ensure proper test coverage, incident analysis, and prevention
Reporting and Metrics:
Track key performance indicators (KPIs) related to automation for the entire D&A team and report on progress to leadership
Automation ROI analysis: measure the impact of automation on productivity, quality, and cost, and adjust strategy based on data
Provide SMT with a forward-looking agenda, plans, improvements, and measured progress
Monitor and review code check-ins from the team and help maintain the project repository
Skillset:
12-15 years of experience in data platform testing across data lineage, especially with knowledge of regulatory compliance and risk management
Detailed knowledge of data flows in relational databases and big data platforms, including familiarity with Hadoop (a platform for processing massive datasets)
Selenium BDD with Cucumber, using Java and Python
Strong experience with Python
Broad understanding of batch and stream processing, including deploying PySpark workloads to AWS EKS (Kubernetes)
Proficiency in testing on Cloudera Hadoop ecosystem (HDFS, Hive) and AWS
Hands-on experience with ETL
Strong knowledge of Oracle SQL and HiveQL
Solid understanding of AWS services like S3, Lambda, EKS, Airflow, and IAM
Understanding of cloud architecture using S3, Lambda, and Airflow DAGs to orchestrate ETL jobs (a minimal DAG sketch follows this list)
Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI)
Scripting knowledge in Python
Version control: Git, Bitbucket, GitHub
Experience in BI report validation, e.g., validation of Tableau dashboards and views
Strong understanding of the Wealth domain and of data regulatory and governance requirements for APAC, EMEA, and NAM
Strong problem-solving and debugging skills
Excellent communication and collaboration abilities to lead and mentor a large techno-functional team across different geographical locations
Ability to manage global teams and support multiple time zones
Strong financial acumen and great presentation skills
Able to work in an Agile environment and deliver results independently
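As a minimal, hedged illustration of the S3/Lambda/Airflow orchestration pattern listed above (assuming Airflow 2.x with the Amazon provider package installed), the DAG below waits for an extract to land in S3 and then launches a containerized PySpark validation job; the DAG id, schedule, bucket, key, image, and kubectl command are assumptions for illustration only, not specifics of this role.

```python
# Illustrative sketch only: DAG id, schedule, bucket, key, and image are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="etl_validation_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Wait for the day's extract (placeholder bucket/key) to land in S3.
    wait_for_extract = S3KeySensor(
        task_id="wait_for_extract",
        bucket_name="example-data-lake",
        bucket_key="landing/customer_txn/{{ ds }}/_SUCCESS",
    )

    # Launch the containerized PySpark validation job; a KubernetesPodOperator
    # targeting EKS could be used instead of this kubectl call.
    run_validation = BashOperator(
        task_id="run_validation",
        bash_command=(
            "kubectl run etl-validate-{{ ds_nodash }} "
            "--image=example-registry/pyspark-etl-validate:latest "
            "--restart=Never --attach"
        ),
    )

    wait_for_extract >> run_validation
```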
Job Family Group: Technology
Job Family: Technology Quality
Time Type: Full time
Most Relevant Skills: Please see the requirements listed above.
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi's EEO Policy Statement and the Know Your Rights poster.
Beware of fraud agents! Do not pay money to get a job.
MNCJobsIndia.com will not be responsible for any payment made to a third party. All Terms of Use are applicable.
Job Detail
Job Id: JD3795735
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: TN, IN, India
Education: Not mentioned
Experience: Year