ASR Developer

pi-labs.ai
Full-time
On-site
Pune, Maharashtra, India
Engineer
Company Description
pi-labs provides cutting-edge cybersecurity and intelligence solutions to governments and enterprises, helping them stay ahead of emerging cyber threats driven by the rapid adoption of AI. We specialize in developing advanced tools that safeguard digital ecosystems and ensure trust, safety, and authenticity.

At pi-labs, you’ll work with a team of passionate experts at the forefront of AI-powered security technologies. Join us to build impactful solutions, solve complex challenges, and contribute to protecting the digital world as it evolves.

Role Description
We are seeking a Speech-to-Text Research Engineer with expertise in speech signal processing and modern ASR architectures. The role involves advancing speech enhancement, denoising, and recognition systems through applied research and prototyping for deployment in production environments.

Key Responsibilities
Research and develop robust ASR models combining traditional and deep learning approaches (Kaldi, Whisper, Conformer, wav2vec, HuBERT).
Apply speech signal processing methods (feature extraction, VAD, filtering, spectral analysis) for preprocessing and model input optimization.
Design and evaluate speech enhancement and denoising models using classical DSP and neural methods.
Conduct data curation, augmentation, and evaluation (WER, CER, SDR) across diverse acoustic and linguistic conditions.
Prototype and benchmark efficient ASR pipelines suitable for deployment.
Track research trends and contribute to innovation and publications in speech technologies.

Required Skills & Qualifications
3 to 4 years of experience
Strong foundation in speech signal processing and ASR modeling.
Hands-on experience with PyTorch, librosa, torchaudio, and traditional ASR toolkits (Kaldi/HTK).
Experience with transformer-based/self-supervised models for speech.
Proficiency in speech enhancement, noise reduction, and data pipeline optimization.
Solid understanding of evaluation metrics and model performance analysis.
Desirable
Exposure to speaker adaptation, diarization, multilingual ASR, or on-device inference.
Contributions to open-source ASR research or industry collaborations.