Six months in: Trust and Safety Jobs recap.

AI Engineer – LLM-Based Content Moderation

TrustLab
Full-time
On-site
Palo Alto, California, United States
Engineer
Job Summary:
TrustLab is an AI software company focused on combating misinformation and harmful content. The company is seeking an AI Engineer with expertise in Large Language Models to enhance its content moderation systems, improve detection accuracy, and collaborate with teams across the company to align AI models with user needs.

Responsibilities:
• Design, develop, and optimize AI models for content moderation, focusing on precision and recall improvements.
• Fine-tune LLMs for classification tasks related to abuse detection, leveraging supervised and reinforcement learning techniques (see the sketch after this list).
• Develop scalable pipelines for dataset collection, annotation, and training with diverse and representative content samples.
• Implement adversarial testing and red-teaming approaches to identify model vulnerabilities and biases.
• Optimize model performance through advanced techniques such as active learning, self-supervision, and domain adaptation.
• Deploy and monitor content moderation models in production, iterating based on real-world performance metrics and feedback loops.
• Stay up-to-date with advancements in NLP, LLM architectures, and AI safety to ensure best-in-class content moderation capabilities.
• Collaborate with policy, trust & safety, and engineering teams to align AI models with customer needs.
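
For readers less familiar with this kind of work, the fine-tuning responsibility above roughly corresponds to supervised fine-tuning of a transformer classifier. The sketch below is a minimal illustration using the Hugging Face transformers and datasets libraries; the base model, label set, and CSV layout are assumptions for the example, not TrustLab's actual pipeline.

```python
# Minimal sketch: supervised fine-tuning of a transformer for abuse classification.
# Base model, labels, and data files are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"   # assumed base model
LABELS = ["benign", "abusive"]     # assumed binary label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

# Assumes CSV files with a "text" column and an integer "label" column.
dataset = load_dataset("csv", data_files={"train": "train.csv", "eval": "eval.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="abuse-classifier",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["eval"])
trainer.train()
```

In practice this is where the other responsibilities come in: the training data would be drawn from annotated, representative content samples, and the resulting model would be stress-tested with adversarial examples before deployment.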

Qualifications:

Required:
• Bachelor's or Master’s degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
• 1+ years of experience in AI/ML, with a focus on NLP, deep learning, and LLMs.
• Proficiency in Python and deep learning frameworks such as TensorFlow, PyTorch, or JAX.
• Experience in fine-tuning and deploying transformer-based models like GPT, BERT, T5, or similar.
• Familiarity with evaluation metrics for classification tasks (e.g., F1-score, precision-recall curves) and best practices for handling imbalanced datasets.
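
The last requirement is worth unpacking: abuse is usually a small fraction of all content, so precision-recall curves and threshold tuning matter more than plain accuracy. Below is a small sketch with toy scores and labels (assumed for illustration, not real data) showing how one might pick an operating threshold on an imbalanced test set.

```python
# Minimal sketch: evaluating an abuse classifier on an imbalanced test set.
# y_true / y_score are toy values standing in for held-out model outputs.
import numpy as np
from sklearn.metrics import (precision_recall_curve,
                             average_precision_score, f1_score)

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])                    # positives are rare
y_score = np.array([0.05, 0.1, 0.2, 0.15, 0.3, 0.4, 0.55, 0.35, 0.8, 0.6])

# Precision-recall is preferred over ROC when the positive class is rare.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print("Average precision:", average_precision_score(y_true, y_score))

# Choose the threshold that maximizes F1 instead of defaulting to 0.5.
f1s = [f1_score(y_true, y_score >= t) for t in thresholds]
best = thresholds[int(np.argmax(f1s))]
print("Best threshold:", best, "F1:", max(f1s))
```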

Preferred:
• Experience working with large-scale, real-world content moderation datasets.
• Knowledge of regulatory frameworks related to content moderation (e.g., GDPR, DSA, Section 230).
• Familiarity with knowledge distillation and model compression techniques for efficient deployment.
• Experience with reinforcement learning (e.g., RLHF) for AI safety applications.

Company:
TrustLab is an AI software company whose products help protect against misinformation, hate speech, identity fraud, and other harmful content. Founded in 2019, the company is headquartered in Palo Alto, California, USA, with a team of 11-50 employees and is currently at an early stage. TrustLab has a track record of offering H-1B sponsorship.