Biz-Tech Analytics, an AI services company collaborating with leading ML organizations building frontier models, is developing a high-quality mathematics benchmark to evaluate the reasoning capabilities of next-generation AI models such as GPT-5 and Gemini-3.
We are seeking skilled problem creators capable of designing mathematically rigorous, original, and adversarial questions that challenge these state-of-the-art AI systems.
Role Overview
As a Mathematics Benchmark Creator, you will design advanced-level math problems that:
Have a single correct integer answer.
Are difficult enough that at least one of the latest AI models answers incorrectly.
Are novel, clearly stated, and solvable without ambiguity.
Span a range of mathematical domains such as number theory, algebra, combinatorics, geometry, and logical reasoning.
You will submit your question-answer pairs through a dedicated platform, where automated evaluation and reviewer feedback will also be provided.
Key Responsibilities
Create original mathematics questions intended to challenge advanced AI reasoning.
Ensure each question has a clear, integer final answer.
Verify the correctness of your own solutions through rigorous reasoning.
Iterate on your questions based on reviewer feedback if needed.
Follow project guidelines strictly to ensure high acceptance rates.
Maintain consistency, clarity, and difficulty aligned with benchmark requirements.
Requirements
Strong background in mathematics (students, graduates, researchers, Olympiad participants welcome).
Ability to craft tricky or non-standard problems that remain rigorously solvable.
Excellent logical reasoning and problem-solving skills.
Attention to detail and ability to detect potential ambiguities.
Reliability in delivering high-quality work.
Compensation
₹800 - ₹1,000 per approved question, depending on how many cutting-edge AI models the problem successfully "stumps" on the evaluation platform.
Payments are made only for questions that pass validation and are accepted by the customer.
Tools & Workflow
Work will be done entirely through the provided platform.
You'll receive:
Submission interface
Automated feedback
Rework requests (if any)
Problem evaluation results
No prior experience with AI models is required; only strong math and problem-design ability.
Ideal Candidate
Someone who enjoys designing clever puzzles, Olympiad-style problems, or trick questions that exploit subtle reasoning gaps, and is excited to contribute to the next generation of AI evaluation.
Job Type: Freelance
Contract length: 3 months
Pay: ₹1,500.00 - ₹5,000.00 per day
Experience:
Mathematics Benchmark Creator: 4 years (Preferred)
Work Location: Remote