Red Team Harmful Manipulation Evaluation AI Trainer, $100–$120/hour

LinkedIn
3 hours ago
Contract
Remote
United States
$100–$120 USD hourly
Trainer

Project Overview:

Join a growing community of professionals advancing the next wave of AI. As an AI Trainer, you’ll play a hands-on role by analyzing and providing feedback on data to improve LLM performance, helping ensure that the next generation of AI technology is accurate and trustworthy.


We are seeking a skilled Behavioral Science, Trust & Safety, or Human-Computer Interaction expert to work as a project consultant in our AI Labor Marketplace. This is not a full-time employment position — you will be engaged as an expert project consultant on a contract basis.


Location: U.S.-based experts only

Engagement: Part-time, project-based expert evaluation work

Work Type: Remote


Project Summary:

Contributors will design adversarial prompts targeting harmful manipulation scenarios, evaluate model responses, and apply structured annotations to assess risk. The work combines behavioral insight, analytical judgment, and structured evaluation, along with peer review responsibilities to support quality and consistency.


Consultant Engagement Terms:

This is a project-based consultant role. Consultants are paid on a per-project basis; the hourly rates listed are estimates based on anticipated completion time. Consultants control their own schedules, provide their own tools, and may simultaneously provide services to other vendors or employers (subject to those parties' policies).


Responsibilities:

  • Design realistic adversarial prompts reflecting manipulation and influence risks
  • Execute prompts against AI systems and capture outputs
  • Apply structured annotation rubrics to evaluate model behavior
  • Provide clear written justifications for evaluations
  • Review peer submissions for quality and consistency
  • Identify edge cases and nuanced failure modes
  • Incorporate feedback and maintain calibration over time


Expected Outcomes:

  • High-quality adversarial prompt sets
  • Consistent, well-reasoned annotations aligned with rubric standards
  • Constructive peer review feedback
  • Reliable contribution to overall dataset quality and evaluation goals



Qualifications:

  • Background in behavioral science, social psychology, trust & safety, HCI, disinformation research, or related field
  • 3–10+ years of relevant professional or research experience
  • Strong analytical writing and sound decision-making under ambiguity
  • Experience with AI evaluation, red teaming, or content policy preferred
  • Ability to apply structured guidelines consistently across tasks