proprietary datasets of Shopify-based businesses globally
by mining the open web at scale.
This role is for someone who can go deep and operate independently:
Discover data where none is readily available
Scrape, crawl, parse, and infer business owner information
Build repeatable, automated data pipelines
Work without paid enrichment tools (Apollo, Hunter, etc.)
Your output will directly power our outbound growth engine.
You are not generating leads; you are creating raw, high-value business intelligence data.
What You Will Do
1. Web Discovery & Crawling
Identify Shopify-powered stores using:
HTML, JavaScript, DNS, and theme signals
Script-based detection methods
Crawl websites at scale to extract:
Business metadata
Owner / founder signals
Contact information
Social media links
Build datasets segmented by country, niche, and company size
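As an illustration of the HTML and script-based detection described above, here is a minimal sketch. The fingerprint strings (the cdn.shopify.com asset host, a *.myshopify.com domain, and the Shopify.theme JavaScript object) are assumed examples of commonly seen storefront markers, not an authoritative or complete list, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed fingerprints: strings commonly seen in Shopify storefront HTML.
# Illustrative only; a real detector would combine these with DNS and theme checks.
SHOPIFY_SIGNALS = {
    "cdn_asset": re.compile(r"cdn\.shopify\.com", re.I),
    "myshopify_domain": re.compile(r"[\w-]+\.myshopify\.com", re.I),
    "theme_object": re.compile(r"Shopify\.theme\b"),
}


def shopify_signals(html: str) -> List[str]:
    """Return the names of the signals found in the page source."""
    return [name for name, pattern in SHOPIFY_SIGNALS.items() if pattern.search(html)]


def looks_like_shopify(html: str, min_signals: int = 2) -> bool:
    """Require several independent signals to reduce false positives."""
    return len(shopify_signals(html)) >= min_signals
```

Requiring two or more signals is a design choice for this sketch: a single match (for example, a blog post that merely links to a Shopify asset) is weak evidence on its own.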
2. Advanced Web Scraping
Build and maintain scrapers using Python:
Scrapy
Playwright
Selenium
Handle:
JavaScript-heavy websites
Pagination and infinite scrolling
Rate limits and bot protection
Proxy and IP rotation
Reverse-engineer websites to extract hidden or non-obvious data
3. Data Enrichment (Logic-Based, Not Tool-Based)
Infer decision-maker identity using:
Website content analysis
Social graph signals
WHOIS and DNS records
Public mentions and references
Construct email logic through:
Pattern inference
Domain-based generation
SMTP-level validation via custom scripts
Assign confidence scores to enriched data
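Pattern inference and domain-based generation could look roughly like the sketch below. The pattern list and its weights are hypothetical placeholders rather than measured frequencies, and the function name is an assumption. SMTP-level validation is deliberately left out; in practice it would adjust these prior scores up or down.

```python
import re

# Hypothetical pattern weights: rough priors to be tuned against validated data.
PATTERNS = [
    ("{first}", 0.5),
    ("{first}.{last}", 0.4),
    ("{f}{last}", 0.3),
    ("{first}{last}", 0.2),
]


def candidate_emails(full_name, domain):
    """Generate (email, confidence) guesses, highest confidence first."""
    # Only ASCII letters are kept; names with other characters need transliteration first.
    parts = re.findall(r"[a-z]+", full_name.lower())
    if not parts:
        return []
    first, last = parts[0], parts[-1]
    domain = domain.strip().lower()
    # With a single-word name, only the first-name pattern makes sense.
    patterns = PATTERNS if len(parts) > 1 else PATTERNS[:1]
    candidates = [
        (template.format(first=first, last=last, f=first[0]) + "@" + domain, weight)
        for template, weight in patterns
    ]
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

For example, candidate_emails("Jane Doe", "Example.com") yields jane@example.com first and janedoe@example.com last, each paired with its prior score.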
4. Pipeline & Automation
CSV
Databases
Google Sheets
Ensure data deduplication and freshness
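One possible way to combine deduplication and freshness is to key each record on a normalized domain and keep only the most recently scraped copy. The sketch below assumes hypothetical record fields "url" and "scraped_at" (an ISO 8601 date string, so string comparison matches date order); neither the field names nor the normalization rules come from this posting.

```python
from urllib.parse import urlsplit


def normalize_domain(url):
    """Reduce a URL or bare host to a lowercase domain without a leading 'www.'."""
    url = url.strip().lower()
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat a bare host as the network location
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def dedupe(records):
    """Keep the freshest record per normalized domain."""
    latest = {}
    for record in records:
        key = normalize_domain(record["url"])
        if not key:
            continue  # skip records whose URL cannot be parsed into a host
        if key not in latest or record["scraped_at"] > latest[key]["scraped_at"]:
            latest[key] = record
    return list(latest.values())
```

The duplicate rate for a batch can then be estimated as 1 - len(dedupe(records)) / len(records).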
5. Data Quality & Reporting
Maintain high standards for accuracy, freshness, and consistency
Create and maintain QA checks
Provide weekly reporting on:
Records created
Accuracy rate
Enrichment success
Pipeline uptime
Continuously improve scraping and enrichment efficiency
Required Skills (Non-Negotiable)
Technical
Strong Python skills (web scraping focused)
Experience with Scrapy, Playwright, Selenium
Solid understanding of HTML, CSS, JavaScript
Regex, parsing, and data cleaning
Proxy and IP rotation handling
Linux basics
SQL or structured data handling
Data
Large-scale data mining experience
Deduplication and normalization techniques
Confidence scoring methodologies
Experience working with messy and unstructured data
Mindset
Builder mindset (not an operator)
Comfortable with failure, retries, and experimentation
System-level thinker
Obsessive about data quality
Ability to work independently with clear targets
Good to Have
Experience scraping the Shopify ecosystem
Large-scale crawling (100k+ domains)
Reverse-engineering JavaScript-heavy websites
CAPTCHA bypass experience
Exposure to growth, outbound, or sales intelligence data
KPIs (How You Will Be Measured)
5,000-20,000 validated records per week (depth dependent)
Enrichment success rate: >60%
Accuracy (QA pass rate): >90%
Duplicate rate: <3%
Pipeline uptime: >95%
Cost per record: decreasing month-over-month
Automation coverage: 70%+ within 60 days
Compensation
Competitive and based on demonstrated scraping depth and system-building ability, not years of experience.
Interview Process
Technical screening (scraping + logic)
Take-home task: build a small scraper and explain enrichment logic
Final discussion (architecture, scalability, and approach)
Important Note
If your experience is limited to running paid tools or exporting lists, this role will not be a fit.
Job Type: Full-time
Pay: ₹10,000.00 - ₹40,000.00 per month
Work Location: Hybrid remote in Janakpuri Block C 4, Delhi