Data Mining Analyst

Experience: Year
Location: DL, IN, India

Job Description

Role Overview

We are hiring a Web Data Analyst (B2B Data Mining & Scraping) to build proprietary datasets of Shopify-based businesses globally by mining the open web at scale.

This role is for someone who can go deep and operate independently:

  • Discover data where none is readily available
  • Scrape, crawl, parse, and infer business owner information
  • Build repeatable, automated data pipelines
  • Work without paid enrichment tools (Apollo, Hunter, etc.)

Your output will directly power our outbound growth engine. You are not generating leads; you are creating raw, high-value business intelligence data.

What You Will Do
1. Web Discovery & Crawling

  • Identify Shopify-powered stores using:
      • HTML, JavaScript, DNS, and theme signals
      • Script-based detection methods
  • Crawl websites at scale to extract:
      • Business metadata
      • Owner / founder signals
      • Contact information
      • Social media links
  • Build datasets segmented by country, niche, and company size
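
As an illustration of the HTML and JavaScript signals mentioned above, here is a minimal sketch of a Shopify detector that works on already-downloaded page source. The marker names and the `min_signals` threshold are assumptions made for this example; the three patterns are commonly seen Shopify fingerprints, not an exhaustive or guaranteed list, and a production detector would likely combine them with DNS checks.

```python
import re
from typing import List

# Illustrative markers only (assumed for this sketch): common Shopify
# fingerprints that tend to appear in storefront page source.
SHOPIFY_MARKERS = {
    "cdn_asset": re.compile(r"cdn\.shopify\.com", re.IGNORECASE),
    "myshopify_domain": re.compile(r"[a-z0-9-]+\.myshopify\.com", re.IGNORECASE),
    "theme_object": re.compile(r"Shopify\.theme"),
}


def detect_shopify_signals(html: str) -> List[str]:
    """Return the names of the markers found in the page source."""
    return [name for name, pattern in SHOPIFY_MARKERS.items() if pattern.search(html)]


def looks_like_shopify(html: str, min_signals: int = 1) -> bool:
    """Treat a page as Shopify-powered if enough markers are present."""
    return len(detect_shopify_signals(html)) >= min_signals
```

Returning the list of matched markers, and not only a boolean, lets a later step turn the number of signals into a confidence value.
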
2. Advanced Web Scraping

  • Build and maintain scrapers using Python:
      • Scrapy
      • Playwright
      • Selenium
  • Handle:
      • JavaScript-heavy websites
      • Pagination and infinite scrolling
      • Rate limits and bot protection
      • Proxy and IP rotation
  • Reverse-engineer websites to extract hidden or non-obvious data
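
Rate limits and proxy rotation are often handled together. The sketch below shows one possible approach: rotate through a proxy pool and back off exponentially on HTTP 429 or 5xx responses. The `fetch(url, proxy)` callable is hypothetical and stands in for whatever HTTP client or browser session is actually used; the retry count and delays are assumed defaults.

```python
import itertools
import time
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_rotation(
    url: str,
    fetch: Callable[[str, str], Tuple[int, str]],
    proxies: Iterable[str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try `url` through successive proxies, backing off on 429/5xx.

    `fetch` is a hypothetical callable returning (status_code, body).
    `proxies` must contain at least one entry.
    """
    proxy_cycle = itertools.cycle(proxies)
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)
        status, body = fetch(url, proxy)
        if status == 200:
            return body
        if status == 429 or status >= 500:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
            continue
        # Any other status (e.g. 404) is treated as a permanent failure.
        return None
    return None
```

Passing `sleep` as a parameter keeps the function testable without real waiting.
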
3. Data Enrichment (Logic-Based, Not Tool-Based)

  • Infer decision-maker identity using:
      • Website content analysis
      • Social graph signals
      • WHOIS and DNS records
      • Public mentions and references
  • Construct email logic through:
      • Pattern inference
      • Domain-based generation
      • SMTP-level validation via custom scripts
  • Assign confidence scores to enriched data
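
To make pattern inference and confidence scoring concrete, here is a small sketch that generates candidate addresses for a name and domain and attaches a score to each. The pattern list, the prior weights, and the scoring adjustments are all hypothetical values chosen for illustration, not measured rates. The SMTP result is passed in as a plain flag, so the sketch performs no network calls.

```python
from typing import List, Optional, Tuple

# Hypothetical pattern priors: rough guesses for illustration only.
PATTERN_PRIORS = {
    "{first}": 0.35,
    "{first}.{last}": 0.30,
    "{f}{last}": 0.20,
    "{first}{last}": 0.15,
}


def candidate_emails(first: str, last: str, domain: str) -> List[Tuple[str, float]]:
    """Return (email, prior) pairs, most likely pattern first."""
    first, last, domain = first.strip().lower(), last.strip().lower(), domain.strip().lower()
    fields = {"first": first, "last": last, "f": first[:1]}
    candidates = [
        (f"{pattern.format(**fields)}@{domain}", prior)
        for pattern, prior in PATTERN_PRIORS.items()
    ]
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def confidence(prior: float, smtp_accepted: Optional[bool]) -> float:
    """Combine a pattern prior with an optional SMTP check result."""
    if smtp_accepted is None:   # no validation attempted
        return prior
    if smtp_accepted:           # server accepted the recipient
        return min(1.0, prior + 0.5)
    return prior * 0.1          # server rejected the recipient
```

An accepted recipient on a catch-all domain does not prove that the mailbox exists, which is one reason the score is capped and not set to 1.0 outright.
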
4. Pipeline & Automation

  • Build end-to-end pipelines: Discover → Scrape → Parse → Enrich → Validate → Store
  • Automate workflows using:
      • Cron jobs
      • Python scripts
  • Maintain structured storage:
      • CSV
      • Databases
      • Google Sheets
  • Ensure data deduplication and freshness
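
Deduplication and freshness can be sketched as follows, assuming each record is a dict with a `url` and a `scraped_at` timestamp. The field names and the 30-day staleness window are assumptions made for this example. Records are keyed by a normalized domain, and the most recently scraped record wins.

```python
from datetime import datetime, timedelta
from typing import Dict, List
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a URL or bare host to a lowercase domain without 'www.'."""
    parsed = urlparse(url if "://" in url else f"//{url}")
    host = parsed.hostname or ""  # hostname is lowercased and has no port
    return host[4:] if host.startswith("www.") else host


def dedupe_latest(records: List[dict]) -> List[dict]:
    """Keep the most recently scraped record per normalized domain."""
    latest: Dict[str, dict] = {}
    for rec in records:
        key = normalize_domain(rec["url"])
        if key not in latest or rec["scraped_at"] > latest[key]["scraped_at"]:
            latest[key] = rec
    return list(latest.values())


def is_stale(rec: dict, now: datetime, max_age_days: int = 30) -> bool:
    """Flag records older than the (assumed) freshness window."""
    return now - rec["scraped_at"] > timedelta(days=max_age_days)
```

Stale records can then be fed back into the Scrape stage so they are refreshed instead of dropped.
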
5. Data Quality & Reporting

  • Maintain high standards for accuracy, freshness, and consistency
  • Create and maintain QA checks
  • Provide weekly reporting on:
      • Records created
      • Accuracy rate
      • Enrichment success
      • Pipeline uptime
  • Continuously improve scraping and enrichment efficiency
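
A QA check can be as simple as a function that returns the list of problems found in a record. The sketch below assumes a hypothetical three-field schema (`domain`, `country`, `email`) and a deliberately loose email pattern; a real schema and its rules would be defined by the team.

```python
import re
from typing import List

# Loose syntactic check only; it does not prove the address exists.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
REQUIRED_FIELDS = ("domain", "country", "email")  # hypothetical schema


def qa_errors(record: dict) -> List[str]:
    """Return a list of QA problems; an empty list means the record passes."""
    errors = [f"missing:{field}" for field in REQUIRED_FIELDS if not record.get(field)]
    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        errors.append("invalid:email")
    return errors


def qa_pass_rate(records: List[dict]) -> float:
    """Share of records with no QA errors (0.0 for an empty batch)."""
    if not records:
        return 0.0
    passed = sum(1 for rec in records if not qa_errors(rec))
    return passed / len(records)
```
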
Required Skills (Non-Negotiable)

Technical

  • Strong Python skills (web scraping focused)
  • Experience with Scrapy, Playwright, Selenium
  • Solid understanding of HTML, CSS, JavaScript
  • Regex, parsing, and data cleaning
  • Proxy and IP rotation handling
  • Linux basics
  • SQL or structured data handling

Data

  • Large-scale data mining experience
  • Deduplication and normalization techniques
  • Confidence scoring methodologies
  • Experience working with messy and unstructured data

Mindset

  • Builder mindset (not an operator)
  • Comfortable with failure, retries, and experimentation
  • System-level thinker
  • Obsessive about data quality
  • Ability to work independently with clear targets

Good to Have

  • Experience scraping the Shopify ecosystem
  • Large-scale crawling (100k+ domains)
  • Reverse-engineering JavaScript-heavy websites
  • CAPTCHA bypass experience
  • Exposure to growth, outbound, or sales intelligence data
KPIs (How You Will Be Measured)

  • 5,000-20,000 validated records per week (depth dependent)
  • Enrichment success rate: >60%
  • Accuracy (QA pass rate): >90%
  • Duplicate rate: <3%
  • Pipeline uptime: >95%
  • Cost per record: decreasing month-over-month
  • Automation coverage: 70%+ within 60 days
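
The numeric targets above can be checked mechanically each week. This sketch takes raw weekly counts and compares the resulting rates against the thresholds listed in this posting. The `WeeklyCounts` structure and the choice of denominators (for example, duplicates divided by validated records) are assumptions; the posting does not define how each rate is calculated.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class WeeklyCounts:  # hypothetical structure for this sketch
    validated_records: int
    enrichment_attempts: int
    enrichment_successes: int
    qa_sampled: int
    qa_passed: int
    duplicates: int
    uptime_hours: float
    scheduled_hours: float = 168.0  # one week


def kpi_report(c: WeeklyCounts) -> Dict[str, Tuple[float, bool]]:
    """Map each KPI to (value, meets_target) using the posted thresholds."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    enrichment = ratio(c.enrichment_successes, c.enrichment_attempts)
    accuracy = ratio(c.qa_passed, c.qa_sampled)
    duplicate = ratio(c.duplicates, c.validated_records)
    uptime = ratio(c.uptime_hours, c.scheduled_hours)
    return {
        "validated_records": (float(c.validated_records), c.validated_records >= 5000),
        "enrichment_success_rate": (enrichment, enrichment > 0.60),
        "accuracy": (accuracy, accuracy > 0.90),
        "duplicate_rate": (duplicate, duplicate < 0.03),
        "pipeline_uptime": (uptime, uptime > 0.95),
    }
```

Cost per record and automation coverage are left out because they depend on inputs (spend, task inventory) that the posting does not describe.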


Compensation

Competitive and based on demonstrated scraping depth and system-building ability, not years of experience.

Interview Process

  • Technical screening (scraping + logic)
  • Take-home task: build a small scraper and explain enrichment logic
  • Final discussion (architecture, scalability, and approach)
Important Note

If your experience is limited to running paid tools or exporting lists, this role will not be a fit.

Job Type: Full-time

Pay: ₹10,000.00 - ₹40,000.00 per month

Work Location: Hybrid remote in Janakpuri Block C 4, Delhi



Job Detail

  • Job Id
    JD5148381
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    DL, IN, India
  • Education
    Not mentioned
  • Experience
    Year