
Red Teaming Expert (AI Safety) — Execution + QA Tooling Support

mpathic
Full-time
Remote
Worldwide
Specialist

About mpathic

mpathic is building the future of empathetic, trustworthy AI. Grounded in behavioral science and human-centered design, our technology delivers AI systems that are safe, aligned, and emotionally intelligent. As enterprises race to adopt AI, we believe the companies that win will be those that build trust first.


We are seeking a Red Teaming Expert to execute advanced adversarial testing and help build the QA tooling that powers red teaming at scale. This is a hands-on, high-impact role for a practitioner who combines adversarial creativity with disciplined, reproducible work.


About the Role

We’re building a high-signal, scalable AI Red Team that helps frontier models become safer, more reliable, and more resilient to abuse. As a Red Teaming Expert, you’ll execute advanced adversarial testing and produce high-quality, reproducible findings—while also partnering with Product/Engineering to improve the QA tooling and delivery pipeline that powers red teaming at scale.


This is not a training role. You’ll be an expert operator who can both find real model weaknesses and help us industrialize the process (review flows, gold sets, tagging systems, dashboards, and automation).


What You’ll Do

Red Team Execution (Core)

  • Design and run adversarial tests across multiple failure modes:
    • jailbreaks and policy boundary probing
    • prompt injection / tool & agent manipulation
    • data leakage + privacy failures
    • unsafe instruction-following and harmful compliance
    • hallucination, overconfidence, and reliability weaknesses
  • Create realistic multi-turn scenarios (including ambiguous and edge-case prompts).
  • Produce clear, reproducible findings with:
    • exact steps to reproduce
    • expected vs actual model behavior
    • severity rating + risk rationale
    • recommended mitigations or safety improvements
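To make the "clear, reproducible findings" bullet concrete, a finding could be captured as a structured record along these lines. This is a hedged sketch only; the class and field names are hypothetical, not mpathic's actual reporting schema:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """Hypothetical structured record for one red-team finding."""
    title: str
    steps_to_reproduce: list[str]  # exact prompts/turns, in order
    expected_behavior: str         # what a safe model should do
    actual_behavior: str           # what the model actually did
    severity: str                  # e.g. "low" | "medium" | "high" | "critical"
    risk_rationale: str            # why this severity, in a sentence or two
    mitigations: list[str] = field(default_factory=list)

finding = Finding(
    title="Multi-turn jailbreak via role-play framing",
    steps_to_reproduce=[
        "Turn 1: ask the model to adopt a fictional persona",
        "Turn 2: request disallowed instructions 'in character'",
    ],
    expected_behavior="Model declines and redirects.",
    actual_behavior="Model complies with the disallowed request.",
    severity="high",
    risk_rationale="Reliable policy bypass with minimal attacker effort.",
    mitigations=["Add persona-framing cases to the gold set"],
)
```

Structured records like this are what make the later QA steps (severity calibration, tagging, deduplication) tractable at scale.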

Quality & Review Support (Day-to-day)

  • Participate in QA review cycles to improve consistency and signal:
    • spot-checks, peer review, and escalation flows
    • severity calibration and disagreement resolution
    • identifying low-signal patterns or “false-positive” findings
  • Help refine rubrics and evaluation standards over time.

QA Tooling & Pipeline Build (Blended Scope)

  • Collaborate with the TPM + Engineering team to build scalable quality systems:
    • task templates and structured output formats
    • labeling / review UI requirements (queues, sampling, audit trail)
    • tagging taxonomy for attacks, behaviors, and failure modes
    • “gold set” creation + versioning + evaluation harness integration
  • Identify bottlenecks and propose automation:
    • auto-checks for formatting, completeness, and duplication
    • report generation helpers
    • test case tracking (prompt/version/model metadata)
  • Contribute lightweight implementation support (based on your background):
    • writing specs and acceptance criteria
    • SQL queries / dashboards / metrics definitions
    • basic scripts or YAML configs for pipelines (if relevant)
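The auto-check bullets above (formatting, completeness, duplication) could start as small as a script like this. Purely illustrative: the required fields and duplicate rule are assumptions for the sketch, not an existing mpathic tool:

```python
import hashlib

# Hypothetical minimum fields a submitted finding must carry
REQUIRED_FIELDS = ["steps_to_reproduce", "expected", "actual", "severity"]

def check_finding(finding: dict, seen_hashes: set) -> list:
    """Return a list of problems: missing fields or likely duplicates."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS
                if not finding.get(f)]
    # Duplicate detection: hash the reproduction steps
    digest = hashlib.sha256(
        str(finding.get("steps_to_reproduce")).encode()).hexdigest()
    if digest in seen_hashes:
        problems.append("possible duplicate of an earlier finding")
    seen_hashes.add(digest)
    return problems

seen = set()
ok = {"steps_to_reproduce": ["turn 1", "turn 2"], "expected": "refusal",
      "actual": "compliance", "severity": "high"}
first_pass = check_finding(ok, seen)    # complete, first occurrence: no problems
second_pass = check_finding(ok, seen)   # same steps resubmitted: flagged
```

Checks like these run before human review, so reviewers spend their time on signal rather than formatting triage.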


What Success Looks Like (First 60–90 Days)

  • You consistently deliver high-signal red team findings that are reproducible and actionable.
  • Your work improves the team’s throughput without sacrificing quality.
  • You help implement at least 1–2 pipeline improvements, such as:
    • better rubric structure or severity alignment
    • improved review flow or sampling plan
    • cleaner reporting templates
    • tagging standards that unlock searchable reuse
  • You strengthen QA consistency by reducing disagreements and increasing reviewer alignment.

Required Qualifications

  • 3+ years in one or more of the following:
    • AI red teaming / model evaluation
    • trust & safety / abuse testing
    • security testing / adversarial testing
    • content integrity / policy enforcement QA
  • Strong ability to generate adversarial prompts and identify realistic failure modes.
  • Excellent writing skills: crisp, structured, and audit-friendly documentation.
  • Sound judgment around safety severity (high signal, low drama).
  • Comfortable working cross-functionally with Product/Engineering.

Preferred Qualifications

  • Experience red teaming LLMs, tool-using models, or agentic systems.
  • Familiarity with:
    • prompt injection and indirect injection attacks
    • data exfiltration and privacy testing
    • evaluation datasets, gold sets, or benchmark design
  • Experience supporting QA systems at scale (review flows, IRR, sampling plans).
  • Ability to contribute technically to pipeline work:
    • writing scripts, working with APIs, SQL, dashboards, or data tooling
  • Prior work in one or more harm domains:
    • self-harm / crisis safety
    • violence / extremism
    • fraud / impersonation
    • medical / legal unsafe guidance
    • harassment / hate content and edge cases

Key Skills

  • Adversarial creativity + disciplined reproducibility
  • Quality mindset: consistency, rubrics, calibration
  • Systems thinking: process improvements + scalable workflows
  • Strong written communication + issue reporting
  • Comfort with ambiguity and fast iteration

Collaboration & Interfaces

You’ll work closely with:

  • TPM / Ops Lead (delivery planning, queue design, metrics)
  • QA Lead (review standards, sampling, dispute resolution)
  • Engineering (tooling requirements, automation, audit trails)
  • Customers / Solutions, optional (clarifying objectives and presenting findings)

Example Responsibilities by Week

  • Week 1–2: ramp on playbook, produce first findings, join calibration
  • Week 3–6: own a full test suite for a customer/model + propose pipeline improvements
  • Week 7–12: lead a mini “attack library” area + help implement QA tooling upgrades