
AI Safety & Policy Analyst - 64521

Turing
1 day ago
Contract
Remote
Philippines
Analyst

About Turing:

Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, and top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.


Role Overview:

As an AI Safety & Policy Analyst, you will be on the front lines of developing safe and responsible AI. You will be responsible for challenging our models' safeguards, identifying new vulnerabilities, and creating the detailed evaluation rubrics used to train and test our next generation of large language models.

This role requires a unique blend of creativity, analytical rigor, and a deep understanding of policy. You will not just follow instructions; you will actively design the tests, using an adversarial mindset to discover how models fail. You will then use your analytical skills to articulate why they failed, creating the precise rubrics and rationales that teach our models to be safer and more helpful.

*NOTE: This role may involve reviewing or encountering disturbing, sensitive, or otherwise potentially distressing content as part of AI safety evaluations. Candidates selected for this position may be required to sign an acknowledgment form confirming their understanding and consent.


What the day-to-day looks like:

In this role, you will be part of a dynamic team focused on LLM safety and alignment. Your day-to-day work will involve:

  • Designing and executing creative, multi-turn conversational prompts that test model compliance with complex safety policies (e.g., Discriminatory, Abetting, Copyrighted Content, Harmful Advice).
  • Identifying, analyzing, and documenting model failures, including successful jailbreaks and subtle policy violations.
  • Developing detailed, objective, and independent rubrics for new safety prompts, assigning priority scores (e.g., Crucial, Important, Less Important) to define and weight desired model behavior.
  • Rigorously evaluating and stack-ranking multiple model responses to a single prompt, using the rubrics you created to ensure clear discrimination between good, bad, and nuanced failures.
  • Writing clear, defensible "Single Rationales" for your rankings that explain the "why" behind your evaluation, focusing on both safety and quality.
  • Collaborating with researchers and policy-makers to understand new risks and refine the safety taxonomy.
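To make the rubric workflow above concrete, here is a minimal, hypothetical sketch of how priority-weighted rubric scoring and stack-ranking might work. The weights, function names, and rubric items are illustrative assumptions, not Turing's actual schema; the priority tiers (Crucial, Important, Less Important) come from the posting itself.

```python
# Hypothetical rubric-weighted stack-ranking sketch.
# PRIORITY_WEIGHTS values are illustrative, not an actual scoring scheme.
PRIORITY_WEIGHTS = {"Crucial": 3.0, "Important": 2.0, "Less Important": 1.0}

def score_response(rubric_results):
    """Sum the weights of every rubric criterion the response satisfies.

    rubric_results: list of (priority, passed) pairs for one model response.
    """
    return sum(PRIORITY_WEIGHTS[priority]
               for priority, passed in rubric_results if passed)

def stack_rank(responses):
    """Order (name, rubric_results) pairs best-first by rubric score."""
    return sorted(responses, key=lambda item: score_response(item[1]),
                  reverse=True)

# Example: three responses to one prompt, each graded against a 3-item rubric.
responses = [
    ("response_a", [("Crucial", True), ("Important", False), ("Less Important", True)]),
    ("response_b", [("Crucial", True), ("Important", True), ("Less Important", True)]),
    ("response_c", [("Crucial", False), ("Important", True), ("Less Important", True)]),
]

ranking = [name for name, _ in stack_rank(responses)]
print(ranking)  # best-first: response_b, then response_a, then response_c
```

The point of weighting by priority tier is that a response failing a single Crucial criterion can rank below one that only misses Less Important items, giving the "clear discrimination between good, bad, and nuanced failures" the role calls for.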


Education & Experience:

  • BS/BA degree or equivalent experience in a relevant field (e.g., Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field).
  • Experience in content moderation, policy analysis, AI safety evaluation, or a related role is strongly preferred.


Requirements:

  • English Proficiency: Ability to read and write in English with a high degree of comprehension.
  • Exceptional Analytical Thinking: A proven ability to research and evaluate nuanced, complex, and ambiguous information against a defined set of policy criteria.
  • Creative & Adversarial Mindset: Experience in "red teaming," prompt engineering, or designing creative challenge prompts intended to test and bypass AI safety filters.
  • Strong Policy & Taxonomy Acumen: A strong understanding of Trust & Safety principles, particularly in relation to LLMs (e.g., categories like misinformation, abetting, bias/stereotypes, jailbreaks, and dual-use). We welcome candidates with expertise in at least one of the following domains:
      • Cyberharm
      • Violence and terrorism
      • Bias and stereotypes
      • Mental health and self-harm
      • Child safety
      • Nudity and sexually explicit content
      • Misinformation
      • Fraud
      • Sycophancy
      • Regulated goods
      • Privacy and identity rights
      • Copyright
      • Legal, medical, and financial information
  • Meticulous Attention to Detail: The ability to design and author precise, self-contained, and independent evaluation rubrics that can clearly discriminate between models.
  • Excellent Written Communication: Superior ability to articulate complex rationale for model rankings clearly and concisely, providing a strong training signal for engineers.
  • Familiarity with RLHF (Reinforcement Learning from Human Feedback) workflows and data annotation is a significant plus.
  • Feedback: Ability to provide constructive feedback and detailed annotations.
  • Communication: Excellent communication and collaboration skills.
  • Independence: Self-motivated and able to work independently in a remote setting.
  • Technical Setup: Desktop/laptop setup with a reliable internet connection.


Benefits:

  • Flexible working hours and remote work environment.
  • Opportunity to work on cutting-edge AI projects with leading LLM companies.
  • Potential for contract extension based on performance and project needs.


Offer Details:

  • Commitment required: at least 4 hours per day and a total of 40 hours per week, including 2-4 hours of daily overlap with Pacific Time (UTC-8:00, America/Los_Angeles).
  • Engagement type: contractor assignment/freelancer (no medical or paid leave).
  • Contract duration: 1 month.


Application Process:

  • Shortlisted candidates will be sent automated analytical challenges.
  • Once you clear them, you are ready to go!



After applying, you will receive an email with a login link. Please use that link to access the portal and complete your profile.


Know amazing talent? Refer them at turing.com/referrals, and earn money from your network.