-
Evaluating ASR-LLM Setups for Japanese Speech Recognition with Multipass Augmented Generative Error Correction
Reviewed
Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara
Proc. IEEE-ICASSP
2026.5
-
Expressive Voice Conversion with Controllable Emotional Intensity
Reviewed
Nannan Teng, Ying Hu, Zhijian Ou, Sheng Li
Proc. IEEE-ICASSP
2026.5
-
What Should Automated Vehicles Communicate to Human Drivers? Prioritizing External Human-Machine Interface Information Based on the Four-Sides Model
Di Zhou, Guanghui Zhang, Tianqi Peng, Sheng Li
International Journal of Human-Computer Interaction
2026.3
-
Unified multi-prototype network with pretrained Swin Transformer for visual and audio open set recognition
Reviewed
Haiyan Yang, Sheng Li, Juncheng Li, Jun Shi, Jun Wang
Signal, Image and Video Processing
20(1)
2026.1
-
Emotion-aware Speech Translation Correction with Large Language Models
Reviewed
Zhengdong Yang, Sheng Li, Chenhui Chu
Journal of Natural Language Processing
2026
-
Speech Foundation Bench for Robotic and EdgeAI systems
Reviewed
Sheng Li, Takahiro Shinozaki
Proc. IEEE-ICASSP demo 2026.
2026
-
Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari
APSIPA ASC 2025
2025.12
-
LatentSpeech: Latent Diffusion for Text-To-Speech Generation
Invited
Reviewed
Haowei Lou, Hye young Paik, Pari Delir Haghighi, Sheng Li, Wen Hu, Lina Yao
Proc. RO-MAN
2025.12
-
Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads
Reviewed
Jing Li, Felix Schijve, Sheng Li, Yuye Yang, Jun Hu, Emilia Barakova
Proc. IROS
2025.12
-
Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR
Reviewed
Hongli Yang, S. Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng
Proc. Interspeech
2025.12
-
End-to-end Acoustic-linguistic Emotion and Intent Recognition Enhanced by Semi-supervised Learning
Reviewed
Zhao Ren, Rathi Adarshi Rammohan, Kevin Scheck, Sheng Li, Tanja Schultz
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
2025.12
-
Bandwidth Extension System for Throat Microphone Speech Reconstruction
Reviewed
Yu Xu, Xiaokai Qin, Tianyu Fan, Eng Siong Chng, Sheng Li, Nobuaki Minematsu, Daisuke Saito
2025.12
-
Generative Error Correction for Emotion-aware Speech-to-text Translation
Reviewed
Zhengdong Yang, Sheng Li, Chenhui Chu
Proc. ACL (findings)
2025.12
-
SIQ: Exterminating Speech Intelligence Quotient Cross Cognitive Levels in Voice Understanding Large Language Models
Reviewed
Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, Sadao Kurohashi
Proc. ACL (long main)
2025.12
-
Simple and Effective Content Encoder for Singing Voice Conversion via Dimension Reduction
Reviewed
Wangjin Zhou, Tianjiao Du, Chenglin Xu, S. Li, Yi Zhao, Tatsuya Kawahara
Proc. Interspeech
2025.12
-
Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning
Reviewed
Hongli Yang, Yizhou Peng, Hao Huang, S. Li
Proc. Interspeech
2025.12
-
Designing an LLM-powered Social Robot for Supporting Emotion Regulation In Parent-Child Dyads
Jing Li, Sheng Li, Emilia I. Barakova, Felix Schijve, Jun Hu
Proc. RO-MAN (late breaking)
2025.12
-
Designing an LLM-powered Social Robot for Supporting Emotion Regulation In Parent-Child Dyads
Reviewed
Jing Li, Felix Schijve, Sheng Li, Emilia Barakova, Jun Hu
Interactive AI for Preventive Health (IAI4PH) 2025
2025.12
-
Empowering Māori Automatic Speech Recognition through EMD-Based Augmentation
Chengxi Lei, Sheng Li, Satwinder Singh, Feng Hou, Huia Jahnke, Ruili Wang
22nd Pacific Rim International Conference on Artificial Intelligence (PRICAI 2025)
2025.11
-
Extending Whisper for Emotion Prediction Using Word-level Pseudo Labels
Reviewed
Chin Yuen Kwok, Sheng Li, Jia Qi Yip, Chenhui Chu, Tatsuya Kawahara, Eng Siong Chng
IEEE-ICASSP
2025.3
-
Similarity-based accent recognition with continuous and discrete self-supervised speech representations
Reviewed
Jun-You Wang, Sheng Li, Li-An Lu, Sydney Chia-Chun Kao, Jyh-Shing Roger Jang
IEEE-ICASSP
2025.3
-
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Reviewed
Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang
IEEE-ICASSP
2025.3
-
RAG-Boost: Retrieval-Augmented Generation Enhanced LLM-based Speech Recognition
Reviewed
Pengcheng Wang, Sheng Li, Takahiro Shinozaki
Interspeech 2025 MLC-SLM Challenge Workshop
2025
-
CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language Models
Reviewed
Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu
Proc. EMNLP (long main)
2025
-
Collaborative Transformer Prototype Network With Pretrained Contrastive Language-Audio Encoder for Open Set Audio Recognition
Reviewed
Haiyan Yang, Jun Wang, Sheng Li, Di Zhou, Xingwei Chen, Juncheng Li, Yufeng Hua, Jun Shi
IEEE Transactions on Signal Processing
73
4748-4763
2025
-
Cross-Lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
Reviewed
Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu
IEEE Transactions on Audio, Speech and Language Processing
1-13
2025
-
SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, Sadao Kurohashi
ACL (1)
30381-30398
2025
-
Multi-Domain Dialogue State Tracking with Large Language Model Rationale and Disentangled Domain-Slot Attention
Reviewed
Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
IEEE Transactions on Audio, Speech and Language Processing
1-14
2025
-
Neural TTS-Based Dynamic Data Augmentation for Improved Speech Separation
Reviewed
Kai Wang, Cuicui Zhu, Lili Yin, Sheng Li, Madina Mansurova, Hao Huang
IEEE Transactions on Audio, Speech and Language Processing
33
2457-2470
2025
-
A Two-Stage LoRA Strategy for Expanding Language Capabilities in Multilingual ASR Models
Reviewed
Chin Yuen Kwok, Hexin Liu, Jia Qi Yip, Sheng Li, Eng Siong Chng
IEEE Transactions on Audio, Speech and Language Processing
33
2576-2590
2025
-
Parallel and Limited Data Voice Conversions on Myanmar Language Speech for Spoofed Detection
Reviewed
Hay Mar Soe Naing, Win Pa Pa, Sheng Li
Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops
1-5
2024.12
-
LaMuCo: Large-Scale Multilingual Conversation Speech Recognition Challenge
Reviewed
Qingqing Zhang, Lei Luo, Simin Xu, Yongjing Chen, Chuang Li, Sheng Li, Ruili Wang
Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops
1-3
2024.12
-
Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition
Reviewed
Jianan Chen, Chenhui Chu, Sheng Li, Tatsuya Kawahara
2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
1-6
2024.12
-
LLM as decoder: Investigating Lattice-based Speech Recognition Hypotheses Rescoring Using LLM
Reviewed
Sheng Li, Yuka Ko, Akinori Ito
2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
1-5
2024.12
-
Low-resource Language Adaptation with Ensemble of PEFT Approaches
Reviewed
Chin Yuen Kwok, Sheng Li, Jia Qi Yip, Eng Siong Chng
2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
1-6
2024.12
-
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
Reviewed
Chao Tan, Sheng Li, Yang Cao, Zhao Ren, Tanja Schultz
ACM Multimedia Asia 2024
2024.12
-
Enhancing Privacy of Spatiotemporal Federated Learning Against Gradient Inversion Attacks
Reviewed
Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa
Lecture Notes in Computer Science
457-473
2024.10
-
Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses
Reviewed
Sheng Li, Chen Chen, Chin Yuen Kwok, Chenhui Chu, Eng Siong Chng, Hisashi Kawai
Interspeech 2024
1315-1319
2024.9
-
Automatic Post-Editing of Speech Recognition System Output Using Large Language Models
Reviewed
Sheng Li, Jiyi Li, Yang Cao
The DASFAA 2024 Workshop
2024.7
-
Revisiting Generative Adversarial Network for Downstream Task of Speech Recognition
Reviewed
Sheng Li, Bei Liu, Jianlong Fu
2024.6
-
Voices of the Himalayas: Benchmarking Speech Recognition Systems for the Tibetan Language
Reviewed
Sheng Li, Jiyi Li, Chenhui Chu
International Journal of Asian Language Processing
2024.5
-
Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis
Reviewed
Yankun Wu, Yuta Nakashima, Noa Garcia, Sheng Li, Zhaoyang Zeng
Proceedings of the 2024 International Conference on Multimedia Retrieval
2024.5
-
Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing
Reviewed
Yi Zhao, Chunyu Qiang, Hao Li, Yulan Hu, Wangjin Zhou, Sheng Li
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2024.4
-
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction
Reviewed
Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2024.4
-
Phantom in the opera: adversarial music attack for robot dialogue system
Invited
Reviewed
Sheng Li, Jiyi Li, Yang Cao
Frontiers in Computer Science
6
2024.2
-
End-to-end Japanese-English Speech-to-text Translation with Spoken-to-Written Style Conversion
Reviewed
Zhengdong Yang, Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
Journal of Natural Language Processing
31(3)
935-957
2024
-
LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement
Reviewed
Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
2023.12
-
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection
Reviewed
Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
2023.12
-
KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis
Invited
Reviewed
Wangjin Zhou, Zhengdong Yang, Sheng Li, Chenhui Chu
ACM Multimedia Asia Workshops
2023.12
-
Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization
Reviewed
Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He
ACM Multimedia Asia 2023
2023.12
-
GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System
Reviewed
Xiaojiao Chen, Sheng Li, Jiyi Li, Yang Cao, Hao Huang, Liang He
ACM Multimedia Asia 2023
2023.12
-
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network
Reviewed
Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li, Jianwu Dang
Speech Communication
103024
2023.12
-
Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings
Reviewed
Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara
International Journal of Asian Language Processing (IJALP)
2023.11
-
Disordered speech recognition considering low resources and abnormal articulation
Yuqin Lin, Longbiao Wang, Jianwu Dang, Sheng Li, Chenchen Ding
Speech Communication
155
103002
2023.11
-
Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition
Reviewed
Sheng Li, Jiyi Li
Artificial Neural Networks and Machine Learning – ICANN 2023
389-400
2023.9
-
The Kyoto Speech-to-Speech Translation System for IWSLT 2023
Reviewed
Zhengdong Yang, Shuichiro Shimizu, Wangjin Zhou, Sheng Li, Chenhui Chu
International Conference on Spoken Language Translation (IWSLT)
2023.7
-
Towards Speech Dialogue Translation Mediating Speakers of Different Languages
Reviewed
Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Findings Volume
2023.7
-
Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention
Reviewed
Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Findings Volume
2023.7
-
Dialogue State Tracking with Sparse Local Slot Attention
Reviewed
Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
ACL 2023 Workshop on NLP for Conversational AI
2023.7
-
Tendency-and-attention-informed deep learning for ENSO forecasts
Shen Qiao, Cuicui Zhang, Xuefeng Zhang, Kai Zhang, Hao Shi, Sheng Li, Hao Wei
Climate Dynamics
2023.6
-
Development of a Pain Signaling System Using Machine Learning
Reviewed
Helen Korving, Sheng Li, Di Zhou, Paula Sterkenburg, Panos Markopoulos, Emilia Barakova
2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
2023.6
-
General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition
Reviewed
Chao Tan, Yang Cao, Sheng Li, Masatoshi Yoshikawa
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2023.6
-
Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition
Reviewed
Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Chenhui Chu, Sadao Kurohashi
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2023.6
-
Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language
Reviewed
Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2023.6
-
Speakeraugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation
Reviewed
Kai Wang, Yuhang Yang, Hao Huang, Ying Hu, Sheng Li
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2023.6
-
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition
Reviewed
Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2023.6
-
An End-to-End Chinese and Japanese Bilingual Speech Recognition Systems with Shared Character Decomposition
Reviewed
Sheng Li, Jiyi Li, Qianying Liu, Zhuo Gong
Communications in Computer and Information Science
493-503
2023.4
-
Investigating Effective Domain Adaptation Method for Speaker Verification Task
Reviewed
Guangxing Li, Wangjin Zhou, Sheng Li, Yi Zhao, Jichen Yang, Hao Huang
Communications in Computer and Information Science
517-527
2023.4
-
GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples
Reviewed
Xiaojiao Chen, Sheng Li, Hao Huang
Communications in Computer and Information Science
482-492
2023.4
-
SpecMNet: Spectrum Mend Network for Monaural Speech Enhancement
Reviewed
Cunhang Fan, Hongmei Zhang, Jiangyan Yi, Zhao Lv, Jianhua Tao, Taihao Li, Guanxiong Pei, Xiaopei Wu, Sheng Li
Applied Acoustics
194(108792)
2022.12
-
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling
Reviewed
Siqing Qin, Longbiao Wang, Sheng Li, Jianwu Dang, Lixin Pan
EURASIP Journal on Audio, Speech, and Music Processing
2022(1)
1-10
2022.12
-
Can We Train a Language Model Inside an End-to-End ASR Model? - Investigating Effective Implicit Language Modeling
Reviewed
Zhuo Gong, Daisuke Saito, Sheng Li, Hisashi Kawai, Nobuaki Minematsu
Proceedings of the Second Workshop on When Creative AI Meets Conversational AI
42-47
2022.12
-
Subband-based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches
Reviewed
Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, Tatsuya Kawahara
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
2022.11
-
NICT-Tib1: A Public Speech Corpus of Lhasa Dialect for Benchmarking Tibetan Language Speech Recognition Systems
Reviewed
Kak Soky, Zhuo Gong, Sheng Li
2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
2022.11
-
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction
Reviewed
Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, Tatsuya Kawahara
Interspeech 2022
2022.9
-
Multi-Domain Dialogue State Tracking with Top-k Slot Self Attention
Reviewed
Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
In Proc. SIGdial Meeting Discourse & Dialogue
2022.9
-
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism
Reviewed
Kak Soky, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara
in Proc. INTERSPEECH
2022.9
-
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection
Reviewed
Longfei Yang, Wenqing Wei, Sheng Li, Jiyi Li, Takahiro Shinozaki
in Proc. INTERSPEECH
2022.9
-
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection
Reviewed
Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki
in Proc. INTERSPEECH
2022.9
-
Fusion of Self-supervised Learned Models for MOS Prediction
Reviewed
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
in Proc. INTERSPEECH
2022.9
-
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition
Reviewed
Siqing Qin, Longbiao Wang, Sheng Li, Yuqin Lin, Jianwu Dang
in Proc. INTERSPEECH
2022.9
-
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network
Reviewed
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li, Jianwu Dang
in Proc. INTERSPEECH
2022.9
-
Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network
Reviewed
Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, Sheng Li, Masashi Unoki
2022 30th European Signal Processing Conference (EUSIPCO)
379-383
2022.8
-
TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies
Invited
Reviewed
Kak Soky, Masato Mimura, Tatsuya Kawahara, Chenhui Chu, Sheng Li, Chenchen Ding, Sethserey Sam
International Journal of Asian Language Processing
31(03n04)
2022.7
-
Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model
Reviewed
Zhuo Gong, Daisuke Saito, Longfei Yang, Takahiro Shinozaki, Sheng Li, Hisashi Kawai, Nobuaki Minematsu
The Speaker and Language Recognition Workshop (Odyssey 2022)
2022.6
-
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection
Reviewed
Sheng Li, Jiyi Li, Qianying Liu, Zhuo Gong
In Proc. LREC (Language Resources and Evaluation Conference)
7291-7297
2022.6
-
Mining Hard Samples Locally And Globally For Improved Speech Separation
Reviewed
Kai Wang, Yizhou Peng, Hao Huang, Ying Hu, Sheng Li
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2022.5
-
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation
Reviewed
Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang, Kiyoshi Honda
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2022.5
-
Cross-Lingual Transfer Learning for End-to-End Speech Translation
Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
Journal of Natural Language Processing
29(2)
611-637
2022
-
Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview
Reviewed
Xiaojiao Chen, Sheng Li, Hao Huang
Applied Sciences, Special Issues of Machine Speech Communication, 2021.
2021.12
-
Spectrograms Fusion-based End-to-End Robust Automatic Speech Recognition
Reviewed
Hao Shi, Longbiao Wang, Sheng Li, Cunhang Fan, Jianwu Dang, Tatsuya Kawahara
In Proc. APSIPA ASC
2021.12
-
Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework
Reviewed
Yizhou Peng, Jicheng Zhang, Haobo Zhang, Haihua Xu, Hao Huang, Sheng Li, Eng Siong Chng
In Proc. APSIPA ASC
2021.12
-
On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora
Reviewed
Kak Soky, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara
In Proc. APSIPA ASC
2021.12
-
Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC)
Reviewed
Kak Soky, Masato Mimura, Tatsuya Kawahara, Sheng Li, Chenchen Ding, Chenhui Chu, Sethserey Sam
in Proc. O-COCOSDA
2021.12
-
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model
Reviewed
Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li, Xinkang Xu
Interspeech 2021
3266-3270
2021.8
-
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain
Reviewed
Kai Wang, Hao Huang, Ying Hu, Zhihua Huang, Sheng Li
Interspeech 2021
3046-3050
2021.8
-
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network
Nan Li, Longbiao Wang, Masashi Unoki, Sheng Li, Rui Wang, Meng Ge, Jianwu Dang
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2021.6
-
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System
Shunfei Chen, Xinhui Hu, Sheng Li, Xinkang Xu
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2021.6
-
Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification
Reviewed
Hao Huang, Kai Wang, Ying Hu, Sheng Li
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2021.6
-
Phantom in the Opera: Effective Adversarial Music Attack on Keyword Spotting Systems
Heran Zhang, Sheng Li, Xingjun Ma, Yi Zhao, Yang Cao, Tatsuya Kawahara
IEEE-SLT2021
2021
-
Simultaneous Progressive Filtering-Based Monaural Speech Enhancement
Reviewed
Haoran Yin, Hao Shi, Longbiao Wang, Luya Qiang, Sheng Li, Meng Ge, Gaoyan Zhang, Jianwu Dang
Communications in Computer and Information Science
213-221
2021
-
Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS
Reviewed
Dawei Liu, Longbiao Wang, Sheng Li, Haoyu Li, Chenchen Ding, Ju Zhang, Jianwu Dang
Communications in Computer and Information Science
110-118
2021
-
Speech Dereverberation Based on Scale-Aware Mean Square Error Loss
Reviewed
Luya Qiang, Hao Shi, Meng Ge, Haoran Yin, Nan Li, Longbiao Wang, Sheng Li, Jianwu Dang
Communications in Computer and Information Science
55-63
2021
-
VOIS: The First Speech Therapy App Specifically Designed for Myanmar Hearing-Impaired Children
Aye Thida, Nway Nway Han, Sheinn Thawtar Oo, Sheng Li, Chenchen Ding
2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
2020.11
-
Compensation on x-vector for Short Utterance Spoken Language Identification
Peng Shen, Xugang Lu, Komei Sugiura, Sheng Li, Hisashi Kawai
Odyssey 2020 The Speaker and Language Recognition Workshop
47-52
2020.11
-
Joint Training End-to-End Speech Recognition Systems with Speaker Attributes
Sheng Li, Xugang Lu, Raj Dabre, Peng Shen, Hisashi Kawai
Odyssey 2020 The Speaker and Language Recognition Workshop
2020.11
-
Voice-Indistinguishability -- Protecting Voiceprint with Differential Privacy under an Untrusted Server
Yaowei Han, Yang Cao, Sheng Li, Qiang Ma, Masatoshi Yoshikawa
Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
2125-2127
2020.10
-
Singing Voice Extraction with Attention-Based Spectrograms Fusion
Hao Shi, Longbiao Wang, Sheng Li, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang, Hiroshi Seki
Interspeech 2020
2020.10
-
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription
Yuqin Lin, Longbiao Wang, Sheng Li, Jianwu Dang, Chenchen Ding
Interspeech 2020
2020.10
-
Voice-Indistinguishability: Protecting Voiceprint In Privacy-Preserving Speech Data Release
Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa
2020 IEEE International Conference on Multimedia and Expo (ICME)
2020.7
-
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation
Hao Shi, Longbiao Wang, Meng Ge, Sheng Li, Jianwu Dang
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2020.5
-
End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection
Yuqin Lin, Longbiao Wang, Jianwu Dang, Sheng Li, Chenchen Ding
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2020.5
-
Automatic speech recognition
Xugang Lu, Sheng Li, Masakiyo Fujimoto
SpringerBriefs in Computer Science
21-38
2020
-
Investigation of Effectively Synthesizing Code-Switched Speech Using Highly Imbalanced Mix-Lingual Data
Shaotong Guo, Longbiao Wang, Sheng Li, Ju Zhang, Cheng Gong, Yuguang Wang, Jianwu Dang, Kiyoshi Honda
Neural Information Processing
36-47
2020
-
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
IEEE/ACM Transactions on Audio, Speech, and Language Processing
28
2674-2683
2020
-
Multi-lingual transformer training for Khmer automatic speech recognition
Reviewed
Kak Soky, Sheng Li, Tatsuya Kawahara, Sopheap Seng
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
1893-1896
2019.11
-
Effective Training End-to-End ASR systems for Low-resource Lhasa Dialect of Tibetan Language
Lixin Pan, Sheng Li, Longbiao Wang, Jianwu Dang
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
2019.11
-
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
Interspeech 2019
2019.9
-
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation
Sheng Li, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Interspeech 2019
2019.9
-
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition
Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Interspeech 2019
2019.9
-
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese
Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Interspeech 2019
2019.9
-
Interactive learning of teacher-student model for short utterance spoken language identification
Reviewed
P. Shen, X. Lu, S. Li, H. Kawai
Proc. IEEE-ICASSP
2019
-
Investigation of Sequence-Level Knowledge Distillation Methods for CTC Acoustic Models
Reviewed
Ryoichi Takashima, Sheng Li, Hisashi Kawai
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
6156-6160
2019
-
Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification
Reviewed
P. Shen, X. Lu, S. Li, H. Kawai
Proc. INTERSPEECH
2018
-
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Reviewed
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6
3708-3712
2018
-
CTC Loss Function with a Unit-Level Ambiguity Penalty
Reviewed
Ryoichi Takashima, Sheng Li, Hisashi Kawai
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
5909-5913
2018
-
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models
Reviewed
Ryoichi Takashima, Sheng Li, Hisashi Kawai
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
5809-5813
2018
-
Temporal Attentive Pooling for Acoustic Event Detection
Reviewed
X. Lu, P. Shen, S. Li, Y. Tsao, H. Kawai
Proc. INTERSPEECH
2018
-
Improving Very Deep Time-Delay Neural Network with Vertical-Attention for Effectively Training CTC-Based ASR Systems
Reviewed
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
2018 IEEE Workshop on Spoken Language Technology (SLT 2018)
77-83
2018
-
Incremental Training and Constructing the Very Deep Convolutional Residual Network Acoustic Models
Reviewed
Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara, Hisashi Kawai
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
222-227
2017
-
Semi-Supervised Ensemble DNN Acoustic Model Training
Reviewed
Sheng Li, Xugang Lu, Shinsuke Sakai, Masato Mimura, Tatsuya Kawahara
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
5270-5274
2017
-
Conditional generative adversarial nets classifier for spoken language identification
Reviewed
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
2017-
2814-2818
2017
-
Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses
Reviewed
Sheng Li, Yuya Akita, Tatsuya Kawahara
IEEE/ACM Transactions on Audio, Speech, and Language Processing
24(9)
1524-1534
2016.9
-
Confidence Estimation for Speech Recognition Systems using Conditional Random Fields Trained with Partially Annotated Data
Reviewed
Sheng Li, Xugang Lu, Shinsuke Mori, Yuya Akita, Tatsuya Kawahara
2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)
2016
-
Data Selection from Multiple ASR Systems' Hypotheses for Unsupervised Acoustic Model Training
Reviewed
Sheng Li, Yuya Akita, Tatsuya Kawahara
2016 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
5875-5879
2016
-
Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training
Reviewed
Sheng Li, Yuya Akita, Tatsuya Kawahara
IEICE Transactions on Information and Systems
E98D(8)
1545-1552
2015.8
-
Discriminative Data Selection for Lightly Supervised Training of Acoustic Model using Closed Caption Texts
Reviewed
Sheng Li, Yuya Akita, Tatsuya Kawahara
16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5
3526-3530
2015
-
Ensemble Speaker Modeling using Speaker Adaptive Training Deep Neural Network for Speaker Adaptation
Reviewed
Sheng Li, Xugang Lu, Yuya Akita, Tatsuya Kawahara
16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5
2892-2896
2015
-
Corpus and Transcription System of Chinese Lecture Room
Reviewed
Sheng Li, Yuya Akita, Tatsuya Kawahara
2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP)
442-445
2014
-
Phoneme-level articulatory animation in pronunciation training
Reviewed
Lan Wang, Hui Chen, Sheng Li, Helen M. Meng
Speech Communication
54(7)
845-856
2012.9
-
Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data
Reviewed
Sheng Li, Lan Wang
13th Annual Conference of the International Speech Communication Association 2012 (INTERSPEECH 2012), Vols 1-3
902-905
2012
-
The Phoneme-level Articulator Dynamics for Pronunciation Animation
Reviewed
Sheng Li, Lan Wang, En Qi
Proc. IALP
2011
-
IELS: A Computer-aided Pronunciation Training System for Undergraduate Students
Reviewed
Jinyu Chen, Lan Wang, Chongguo Li, Jin Hu, Sheng Li
ICETC
2010