Yiming Wang | Speech Recognition Researcher at NVIDIA

Yiming Wang (王一鸣)

Research Scientist at NVIDIA
NeMo Speech AI team
NVIDIA Corporation, Santa Clara, CA, USA
E-mail: freewym AT gmail DOT com

Google Scholar
LinkedIn
GitHub

Biography

I am a Research Scientist at NVIDIA, working on multimodal LLMs. Before joining NVIDIA, I worked at Microsoft CoreAI under Jinyu Li, after receiving my Ph.D. degree in Computer Science from Johns Hopkins University. At JHU, I was also affiliated with the Center for Language and Speech Processing (CLSP), advised by Prof. Sanjeev Khudanpur and former JHU Prof. Daniel Povey. I mostly work on automatic speech recognition (ASR) problems, and I also have broad interests in machine learning and natural language processing. I am one of the major contributors to the Kaldi project and the owner of the open-source end-to-end ASR toolkit Espresso. I interned at Google’s speech team and Amazon’s Alexa ASR team in 2017 and 2018, respectively, working on end-to-end ASR.
I received my B.S. and M.S. degrees in Computer Science from Nanjing University in 2009 and 2012, respectively. My master’s advisor was Prof. Tong Lu.

Education

Ph.D. in Computer Science (Sep 2012 - Jul 2021)
Department of Computer Science
Johns Hopkins University, Baltimore, MD, USA
Advisors: Prof. Sanjeev Khudanpur and Dr. Daniel Povey
Thesis: Wake Word Detection and its Applications

M.S. in Computer Science (Sep 2009 - Jun 2012)
Department of Computer Science and Technology
Nanjing University, Nanjing, China
Advisor: Prof. Tong Lu
Thesis: Scene Image Understanding Based on Topic Modeling (in Chinese)

B.S. in Computer Science (Sep 2005 - Jun 2009)
Department of Computer Science and Technology
Nanjing University, Nanjing, China

Work Experience

Staff Research Scientist
NeMo Speech AI team, NVIDIA Corporation, Santa Clara, CA, USA (Apr 2026 - present)
Supervisor: Dr. Boris Ginsburg

Principal Applied Scientist
CoreAI, Microsoft Corporation, Redmond, WA, USA (Sep 2020 - Apr 2026)
Supervisor: Dr. Jinyu Li

Applied Scientist Intern
Amazon.com, Inc., Seattle, WA, USA (May 2018 - Aug 2018)
I worked with Dr. Xing Fan, Dr. I-Fan Chen and Dr. Yuzong Liu on improving a Seq2Seq ASR model with information extracted from anchored words for Amazon Alexa.

Research Intern
Google LLC, Mountain View, CA, USA (May 2017 - Aug 2017)
I worked with Dr. Arun Narayanan, Dr. Rohit Prabhavalkar and Dr. Izhak Shafran on improving the LAS model with time-frequency attention for robust ASR.

Research Assistant
Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA (Sep 2015 - Aug 2020)
I worked with Dr. Daniel Povey and Prof. Sanjeev Khudanpur on speech recognition, and contributed to the Kaldi project.

Research Assistant
The Lieber Institute for Brain Development, Baltimore, MD, USA (Sep 2014 - Aug 2015)
I worked on multi-view learning for genomic and brain imaging data.

Teaching Experience

Teaching Assistant
Johns Hopkins University (Fall 2016, Fall 2014, Spring 2014)
Course: Machine Learning
Instructor: Mark Dredze

Teaching Assistant
Johns Hopkins University (Spring 2015)
Course: Information Retrieval and Web Agents
Instructor: David Yarowsky

Teaching Assistant
Johns Hopkins University (Fall 2013)
Course: Machine Learning in Complex Domains
Instructor: Suchi Saria

Teaching Assistant
Johns Hopkins University (Spring 2013)
Course: Algorithms for Sensor-based Robotics
Instructor: Gregory Hager

Teaching Assistant
Nanjing University (Spring 2010)
Course: Programming in Java
Instructor: Ning Li

Talks

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
NVIDIA GPU Technology Conference (GTC) 2020

Publications

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, et al.
Technical Report (2025)

LET-NLM-Decoder: A WFST-Based Asynchronous Lazy-Evaluation Token-Group Decoder for First-Pass Neural Language Model Decoding
Fangyi Li, Hang Lv, Yiming Wang, Lei Xie
Electronics Letters 2025

ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
Yiming Wang, Jinyu Li
ICASSP 2024

Data2vec-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang, Yiming Wang, Shujie Liu, Zhuo Chen, DeLiang Wang, Michael Zeng
ICASSP 2023

Self-Supervised Learning with Bi-label Masked Speech Prediction for Streaming Multi-talker Speech Recognition
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
ICASSP 2023

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li
ICASSP 2023

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Yiming Wang*, Chengyi Wang*, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei
Interspeech 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction
Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang
ICASSP 2022

Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition
Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu
ICASSP 2022

LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder with Exact Lattice Generation
Hang Lv, Daniel Povey, Mahsa Yarmohammadi, Ke Li, Yiming Wang, Lei Xie, Sanjeev Khudanpur
IEEE Signal Processing Letters 2021

Wake Word Detection with Streaming Transformers
Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
ICASSP 2021

Wake Word Detection with Alignment-Free Lattice-Free MMI
Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
Interspeech 2020

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR
Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2020

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
ASRU 2019

The JHU ASR System for VOiCES from a Distance Challenge 2019
Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur
Interspeech 2019

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings
Mahsa Yarmohammadi, Xutai Ma, Sorami Hisamoto, Muhammad Rahman, Yiming Wang, Hainan Xu, Daniel Povey, Philipp Koehn, Kevin Duh
Machine Translation Summit 2019

End-to-end Anchored Speech Recognition
Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister
ICASSP 2019

A Pruned RNNLM Lattice-rescoring Algorithm for Automatic Speech Recognition
Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey, Sanjeev Khudanpur
ICASSP 2018

Neural Network Language Modeling with Letter-based Features and Importance Sampling
Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur
ICASSP 2018

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2018

A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2018

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur
Interspeech 2018

Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs
Vijayaditya Peddinti, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
IEEE Signal Processing Letters 2018

Backstitch: Counteracting Finite-sample Bias via Negative Steps
Yiming Wang, Vijayaditya Peddinti, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2017

The Kaldi OpenKWS System: Improving Low Resource Keyword Search
Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Yiming Wang, Vimal Manohar, Hainan Xu, Daniel Povey, Sanjeev Khudanpur
Interspeech 2017

Far-Field ASR Without Parallel Data
Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2016

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
Interspeech 2016

Weakly-supervised Region Annotation for Understanding Scene Images
Hao Wang, Tong Lu, Yiming Wang, Palaiahnakote Shivakumara, Chew Lim Tan
Multimedia Tools and Applications 2016

Accelerated Mini-batch Randomized Block Coordinate Descent Method
Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu
NeurIPS 2014

Learning Polylingual Topic Models from Code-Switched Social Media Documents
Nanyun Peng, Yiming Wang, Mark Dredze
ACL 2014

3D Model Comparison through Kernel Density Matching
Yiming Wang, Tong Lu, Rongjun Gao, Wenyin Liu
ICPR 2010

QuickDiagram: A System for Online Sketching and Understanding of Diagrams
Wenyin Liu, Xiangfei Kong, Yiming Wang, Chester Wan, Cheuk-Yin Ho, Tong Lu, Zhengxing Sun
International Workshop on Graphics Recognition 2009

Patents

Speech Detection and Speech Recognition
Xing Fan, I-Fan Chen, Yuzong Liu, Björn Hoffmeister, Yiming Wang, Tongfei Chen
US Patent 10,923,111 (2021)

3D Model Comparison and Retrieval Method based on Kernel Density Estimation
Tong Lu, Yiming Wang
CN Patent 101,882,150 (2012)
