Updated on 2026/03/27

LI SHENG

Organization
School of Engineering

Title
Assistant Professor

Contact information
Email address

Other name(s)
Sheng Li
Profile

Sheng LI received his B.S. and M.E. degrees from Nanjing University, Nanjing, China, in 2006 and 2009, and his Ph.D. from Kyoto University, Kyoto, Japan, in 2016. From 2009 to 2012, he worked at the joint lab of the Chinese University of Hong Kong and Shenzhen City, researching speech-technology-assisted language learning. From 2016 to 2017, he was a researcher at Kyoto University, studying speech recognition systems for humanoid robots. From 2017 to February 2025, he worked on speech recognition as a researcher at the National Institute of Information and Communications Technology (NICT), Kyoto, Japan. In March 2025, he joined the Institute of Science Tokyo as an assistant professor working on speech recognition. He has served as a workshop/special session co-organizer and session chair at INTERSPEECH 2020, COLING 2022, Odyssey 2022, ACM Multimedia Asia 2023/2024, RO-MAN 2025, and ICASSP 2024. He is a member of the Acoustical Society of Japan (ASJ) and the International Speech Communication Association (ISCA), and a senior member of IEEE. He is currently a member of the Speech, Language, and Audio (SLA) Technical Committee of APSIPA, and a member of the Applied Signal Processing Systems Technical Committee (ASPS TC) of the IEEE Signal Processing Society (SPS).

Focusing on:

  • next-generation multilingual speech recognition, translation, and synthesis technology

  • security-aware speech processing technology

  • robot audition and embodied voices

External link

  • https://search.star.titech.ac.jp/titech-ss/pursuer.act?event=outside&key_t2r2Rid=CTT100930321&lang=jp

  • https://educ.titech.ac.jp/ict/eng/faculty/

  • https://youtu.be/pP6YtlSVqlM

Degree

  • Ph.D. (Informatics) (2016.3, Kyoto University)

Research Interests

  • Speech recognition and translation

  • Computer-assisted language learning

  • Multimodal speech processing

  • Security-aware speech processing

  • Large language models (speech, text)

Research Areas

  • Informatics / Perceptual information processing

Education

  • Kyoto University   Graduate School   Ph.D., Informatics

    2012.10 - 2016.3

  • Nanjing University   Joint Program of the Chinese Academy of Sciences, the Chinese University of Hong Kong, and Nanjing University   M.E.

    2007.9 - 2009.7

  • Nanjing University (formerly National Central University)   B.S., Computer Science

    2002.7 - 2006.7

Research History

  • RIKEN   Visiting Scientist

    2025.10 -

  • Institute of Science Tokyo   Assistant Professor

    2025.3 -

    Country: Japan

  • Eindhoven University of Technology (TU/e)   Visiting Assistant Professor

    2024.11 -

    Country: Netherlands

  • Nanyang Technological University   Visiting Researcher

    2024.2 - 2024.3

    Country: Singapore

  • Kyoto University   Master's Course Advisor

    2021.12 - 2023.3

  • National Institute of Information and Communications Technology (NICT)   Advanced Speech Technology Laboratory (ASTL)   Tenure-track Researcher

    2020 - 2025.2

  • University of Oxford   Department of Computer Science   Visiting Researcher

    2019.4 - 2019.5

  • National Institute of Information and Communications Technology (NICT)   Advanced Speech Technology Laboratory (ASTL)   Researcher (hired for the Tokyo 2020 Olympics project)

    2017 - 2019

  • Kyoto University   Speech and Audio Processing Lab.   Researcher (hired for the Erica humanoid robot project)

    2016.4 - 2016.12

  • Sogou/Sohu Pinyin IME (Beijing, China)   Researcher (speech input)

    2012.4 - 2012.9

  • Shenzhen Institute of Advanced Technology (Shenzhen, Guangdong, China)   Researcher (computer-assisted language learning)

    2009.7 - 2012.4

Professional Memberships

  • APNNS (Asia Pacific Neural Network Society)

    2023.12 -

  • ACM (Association for Computing Machinery)

  • IEEE / IEEE-SPS / IEEE-RAS

  • ISCA (International Speech Communication Association)

  • ASJ (Acoustical Society of Japan)

  • SIG-CSLP (Chinese Spoken Language Processing)

  • APSIPA (Asia Pacific Signal and Information Processing Association)

Committee Memberships

  • JSAI   Co-organizer of organized session (OS)

    2026.6

  • IEEE ICASSP 2026   Meta-reviewer

    2026.1

  • APSIPA   Speech, Language, and Audio (SLA) Technical Committee member (until 2026)

    2026

  • IEEE IROS 2025   Session chair

    2025.10

  • IEEE RO-MAN 2025   Co-organizer of special session

    2025.9

  • IEEE   Senior member

    2025.4

  • IEEE Signal Processing Society (SPS)   Applied Signal Processing Systems Technical Committee (ASPS TC) member

    2025.1 - 2027.1

  • ACM Multimedia Asia 2024   Co-organizer of the workshop "Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental)"

    2024.12

  • DASFAA 2024   Session chair

    2024.7

  • ACM Multimedia Asia 2024   Publicity chair

    2024.6 - 2024.12

  • IEEE ICASSP 2024   Session chair

    2024.4

  • ACM Multimedia Asia 2023   Co-organizer of the workshop "Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental)"

    2023.12

  • ICANN 2023   Session chair

    2023.9

  • APSIPA ASC 2023   Area chair

    2023.7

  • EMNLP 2023   Area chair

    2023.7

  • COLING 2022   Co-organizer of the workshop "When Creative AI Meets Conversational AI (CAI + CAI = CAI^2)"

    2022.10

  • Speaker Odyssey 2022   Session chair (Evaluation and Benchmarking session)

    2022.6

  • INTERSPEECH 2020   Session chair (Topics of ASR I)

    2020.10

  • INTERSPEECH 2020   Co-organizer of the SLIMTS (Spoken Language Interaction for Mobile Transportation System) workshop

    2020.10

Papers


Books

  • Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language

    Sheng Li (Role: Sole author)

    2023.2   (ISBN: 9784904020289)

  • Bridging Eurasia: Multilingual Speech Recognition for Silkroad

    Sheng Li (Role: Sole author)

    2023.1   (ISBN: 9784904020296)

  • Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems

    Sheng Li (Role: Sole author)

    2022.11   (ISBN: 9784904020265)

  • Automatic Speech Recognition: Speech-to-Speech Translation

    X. Lu, S. Li, M. Fujimoto (Role: Joint author; Chapter 3.3.2: From Shallow to Deep and Very Deep; Chapter 3.3.3: End-to-End and CTC Models)

    Springer Singapore   2020

MISC

  • Evaluating Tibetan ASR with Segmented Word Error Rate: Beyond Character-Level Metrics

    Jacob Moore, Sheng Li, Paula Lauren

    TechRxiv   2026.2

    Language: English   Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    DOI: 10.36227/techrxiv.177102186.63648582/v1

  • End-to-end Acoustic-linguistic Emotion and Intent Recognition Enhanced by Semi-supervised Learning

    Zhao Ren, Rathi Adarshi Rammohan, Kevin Scheck, Sheng Li, Tanja Schultz

    2025.12

  • Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement

    Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari

    arXiv   2025.10

    Language: English

    DOI: 10.48550/arXiv.2510.01722

  • Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR

    Hongli Yang, Sheng Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng

    arXiv   2025.7

    Language: English

    DOI: 10.48550/arXiv.2506.21577

  • Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning

    Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li

    2025.7

    Language: English

    DOI: 10.48550/arXiv.2506.21576

  • Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation

    Haowei Lou, Hye-young Paik, Sheng Li, Wen Hu, Lina Yao

    arXiv preprint arXiv:2504.08274   2025.4

    Language: English

  • Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

    Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu

    arXiv   2025.1

    Language: English

    DOI: 10.48550/arXiv.2501.17615

  • Joint Automatic Speech Recognition and Structure Learning for Better Speech Understanding

    Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang

    arXiv   2025.1

    Language: English   Publishing type: Research paper, summary (international conference)

    DOI: 10.48550/arXiv.2501.07329

  • Multi-Prototype Network with Swin Transformer for Open Set Recognition

    Jun Wang, Haiyan Yang, Sheng Li, Di Zhou, Xingwei Chen, Juncheng Li, Yufeng Hua, Jun Shi

    SSRN   2025

    Language: English   Publishing type: Article, review, commentary, editorial, etc. (scientific journal)   Publisher: Elsevier BV

    DOI: 10.2139/ssrn.5134636

  • A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations

    Phurich Saengthong, Boonnithi Jiaramaneepini, Sheng Li, Manabu Okumura, Takahiro Shinozaki

    2025

  • Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads

    Jing Li, Felix Schijve, Sheng Li, Yuye Yang, Jun Hu, Emilia Barakova

    2025

    Language: English

    DOI: 10.48550/arXiv.2507.10427

  • Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

    Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara

    arXiv   2024.12

    Language: English

    DOI: 10.48550/arXiv.2408.16180

  • Extracting Spatiotemporal Data from Gradients with Large Language Models

    Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    arXiv   2024.10

    Language: English

    DOI: 10.48550/arXiv.2410.16121

  • Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition

    Chao Tan, Sheng Li, Yang Cao, Zhao Ren, Tanja Schultz

    arXiv   2024.10

    Language: English

    DOI: 10.48550/arXiv.2410.13221

  • Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

    Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    arXiv   2024.7

    Language: English

    DOI: 10.48550/arXiv.2407.08529

  • MOS-FAD: Improving Fake Audio Detection via Automatic Mean Opinion Score Prediction

    Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

    2024.1

    Language: English   Publishing type: Research paper, summary (international conference)

    DOI: 10.48550/arXiv.2401.13249

  • End-to-End Speech-to-Speech Translation Toolkit

    Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li

    Toolkit released at the ACM Multimedia Asia 2023 workshop   2023.12

  • FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection

    Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li

    2023.11

    Authorship: Last author, Corresponding author   Language: English

    DOI: 10.48550/arXiv.2311.13043

  • LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

    Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu

    2023.11

  • Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization

    Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He

    2023.11

  • GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System

    Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He

    2023.11

  • Towards Speech Dialogue Translation Mediating Speakers of Different Languages

    Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

    arXiv:2305.09210   2023.5

    Language: English

  • Robust Voice Activity Detection Using an Auditory-Inspired Masked Modulation Encoder Based Convolutional Attention Network

    Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li, Jianwu Dang

    2023

    Language: English   Publisher: Elsevier BV

    DOI: 10.2139/ssrn.4557926

  • Speech-text Based Multi-modal Training with Bidirectional Attention for Improved Speech Recognition

    Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li

    2022.10

  • Tendency-and-Attention-Informed Deep Learning for ENSO Forecasts

    Shen Qiao, Cuicui Zhang, Xuefeng Zhang, Kai Zhang, Hao Shi, Sheng Li, Hao Wei

    2022.6

    Publisher: Research Square Platform LLC

    Abstract: Deep learning has been acknowledged as an increasingly important technology for ENSO forecasts. The most cutting-edge deep learning algorithms are based on the Convolutional Neural Network (CNN), which can achieve a multi-year (about 17-month-lead) forecast and has conquered the "spring forecast barrier" problem. However, this group of methods is still challenged by several critical issues. First, they usually use the global sea surface temperature (SST) fields as inputs without considering the specific contributions of different oceanic regions to ENSO forecasts. Consequently, they cannot effectively investigate the role of the "teleconnection" mechanism among different oceans (the Indian, Pacific, and Atlantic Oceans) and different ocean parts (the tropical and non-tropical regions), especially in the forecast of extreme ENSO events. Second, existing methods mainly use the discrete monthly SST fields for ENSO forecasts without investigating the rate of change between adjacent months, which also provides important information for predicting the variation tendency. To solve these problems, this paper develops a Tendency-and-Attention-Informed Deep Residual Network (TA-DRN) for multi-year ENSO forecasts. The contributions of different oceanic regions are learned by a spatial attention module, while the variation tendency of adjacent previous and current months is interpreted by the first- and second-order differences of the SST fields. Informed by these two modules, the performance of TA-DRN improves significantly, especially in predicting extreme El Niño and La Niña events.

    DOI: 10.21203/rs.3.rs-1733575/v1

    Other link: https://www.researchsquare.com/article/rs-1733575/v1.html

  • Fusion of Self-supervised Learned Models for MOS Prediction

    Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao

    CoRR abs/2204.04855   2022.4

    Authorship: Corresponding author

  • Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition

    Qianying Liu, Yuhang Yang, Zhuo Gong, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Sadao Kurohashi

    abs/2204.03855   2022.4

    Authorship: Corresponding author

  • Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

    Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa

    CoRR abs/2004.07442   2020.6

    Authorship: Lead author

  • Deep Progressive Multi-scale Attention for Acoustic Event Classification

    Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

    CoRR abs/1912.12011   2019.4

    Authorship: Lead author

Presentations

  • Improving speech recognition systems by integrating large language models Invited

    Sheng Li

    NICT Open House 2024   2024.6

    Language: Japanese

  • Diversity-driven Semi-supervised Ensemble DNN Acoustic Model Training

    Sheng Li, Xugang Lu, Shinsuke Sakai, Tatsuya Kawahara

    IEICE Technical Report (信学技報)   2016.8   IEICE

    Language: English

  • Discriminative Data Selection from Multiple ASR Systems' Hypotheses for Unsupervised Acoustic Model Training (17th Spoken Language Symposium)

    Sheng Li, Yuya Akita, Tatsuya Kawahara

    IEICE Technical Report (信学技報)   2015.12   IEICE

    Language: English

  • Emotional speech synthesis based on emotion-timbre disentanglement via mutual information minimization

    Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari

    Proceedings of the Acoustical Society of Japan research meeting, autumn   2025.10

    Language: English   Presentation type: Oral presentation (general)

  • RAG-Boost: Retrieval-Augmented Generation Enhanced Speech Recognition in LLM-based Spoken Dialogue Systems

    Pengcheng Wang, Sheng Li, Takahiro Shinozaki

    Proceedings of the Acoustical Society of Japan research meeting, autumn   2025.10

  • System Description for the CN-Celeb Speaker Recognition Challenge 2022

    Guangxing Li, Wangjin Zhou, Sheng Li, Yi Zhao, Hao Huang, Jichen Yang

    CNSRC (the CN-Celeb Speaker Recognition Challenge), Speaker Odyssey 2022   2022.6

    Language: English   Presentation type: Oral presentation (general)

  • Study on Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network

    Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, Sheng Li, Masashi Unoki

    IEICE Tech. Rep.   2022.8

    Language: English   Presentation type: Oral presentation (general)

  • Speech dialogue translation mediating speakers of different languages

    Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

    The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023)   2023.3

    Language: Japanese   Presentation type: Oral presentation (general)

  • Towards Security-aware Speech Recognition System Invited

    Sheng Li

    NECTEC-NICT joint seminar   2023.8

    Language: English   Presentation type: Oral presentation (invited, special)

  • Cross-lingual Mapping for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

    Zhengdong Yang, Qianying Liu, Sheng Li, Chenhui Chu, Fei Cheng, Sadao Kurohashi

    ASJ 2023 autumn meeting   2023.9

    Language: English   Presentation type: Poster presentation

  • Correction while Recognition: Combining Pretrained Language Model for Taiwan-accented Speech Recognition Invited

    Sheng Li

    Joint Seminar with NECTEC Language Understanding Group   2023.11

    Language: English   Presentation type: Oral presentation (invited, special)

  • System Description for the VoicePrivacy Challenge 2022

    Xiaojiao Chen, Guangxing Li, Wangjin Zhou, Sheng Li, Yang Cao, Hao Huang, Yi Zhao

    VoicePrivacy Challenge 2022   2022.9

    Language: English   Presentation type: Oral presentation (general)

  • VoicePrivacy Challenge: System Description

    X. Chen, G. Li, H. Huang, W. Zhou, Y. Cao, S. Li, Y. Zhao

    VoicePrivacy 2022 Challenge Workshop (INTERSPEECH 2022)   2022.9

    Language: English   Presentation type: Oral presentation (general)

  • Domain and Language Adaptation of Large-scale Pretrained Model for Speech Recognition of Low-resource Language

    Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara

    IEICE Tech. Rep. (信学技報)   2022.12

  • Self-Supervised Learning MOS Prediction with Listener Enhancement Invited

    Sheng Li

    VoiceMOS mini workshop   2023.11

    Language: English   Presentation type: Oral presentation (invited, special)

  • Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition Invited

    Zhengdong Yang

    ICT-innovation 2023 (Kyoto Univ.)   2024.2

    Language: English   Presentation type: Public lecture, seminar, tutorial, course, or other speech

  • Investigating effective methods for combining large language models with speech recognition systems

    Sheng Li, Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Hisashi Kawai

    Acoustical Society of Japan 151st (Spring 2024) meeting   2024.3

    Language: English   Presentation type: Poster presentation

  • Combining Large Language Model with Speech Recognition System in Low-resource Settings

    Sheng Li, Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Chen Chen, Eng Siong Chng, Hisashi Kawai

    NLP2024   2024.3

    Presentation type: Poster presentation

  • Enhancing Multi-Step Reasoning in Language Models with Synthetic Math Data Augmentation (HP_Fighters team)

    Jieqing Mei, Jiyi Li, Qianying Liu, Sheng Li

    The First Workshop on Fine-tuning and Evaluation of Large Language Models (FT-LLM2025) @ NLP2025   2025.3

    Language: Japanese   Presentation type: Oral presentation (general)

  • CEFR-J Level Estimation of English Learners' Speech Using Large Language Models

    Takahiro Shinozaki, Syutaro Sato, Sheng Li

    CEFR-J 2025 International Symposium   2025.3

    Language: Japanese   Presentation type: Oral presentation (general)

  • The System Description for VoiceMOS Challenge 2022 (main/OOD tasks)

    2022

  • Application of the RFID-based audio service in a regional navigation system

    S. Li, C. Li

    Bulletin of Advanced Technology Research   2009

  • The Phoneme-level Articulator Dynamics for 3D Pronunciation Animation for Chinese

    S. Li, K. Luo, L. Wang

    Bulletin of Advanced Technology Research   2011

  • Phoneme-level articulatory animation in pronunciation training using EMA data

    Sheng Li

    Speech Synthesis Lab., Tsinghua University (host: Prof. Zhiyong Wu)   2012

  • Vocal Tract Length Normalization for Chinese Spontaneous Speech Recognition

    Sheng Li

    Technical report (Kyoto University)   2013

  • Multi-lingual transformer training for Khmer automatic speech recognition

    K. Soky, S. Li, T. Kawahara, S. Seng

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (abstract paper)

  • Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

    Y. Han, S. Li, Y. Cao, Q. Ma, M. Yoshikawa

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (abstract paper, invited report)

  • Automatic Transcription of Chinese Spoken Lectures

    S. Li, M. Mimura, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2013

  • DNN-based Acoustic Modeling and Decoding for Chinese Spontaneous Speech Recognition with HTK

    Sheng Li

    Technical report (Kyoto University)   2014

  • Lightly-supervised training and confidence estimation by using CRF classifiers

    Sheng Li

    Speech and Cognition Lab., Tianjin University (hosts: Prof. Jianwu Dang and Prof. Kiyoshi Honda)   2014

  • Effective combination of multiple ASR hypotheses with CRF-based classifiers

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2015

  • Discriminative data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training

    S. Li, Y. Akita, T. Kawahara

    IPSJ SIG-SLP-109-8   2015

  • Data Selection Assisted by Caption to Improve Acoustic Modeling for Lecture Transcription

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan, spring meeting   2014

  • Classifier-based data selection for lightly-supervised training of acoustic model for lecture transcription

    S. Li, Y. Akita, T. Kawahara

    IPSJ SIG-SLP-102-4   2014

  • Unsupervised Training of Deep Neural Network Acoustic Models for Lecture Transcriptions

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2014

  • Incorporating divergences from hypotheses of multiple ASR systems to improve unsupervised acoustic model training

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan   2015

  • Diversity-driven Semi-supervised Ensemble DNN Acoustic Model Training

    S. Li, X. Lu, S. Sakai, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2016

  • Very deep convolutional residual network acoustic models for Japanese lecture transcription

    S. Li, X. Lu, P. Shen, H. Kawai

    Acoustical Society of Japan, autumn meeting   2017

  • cGAN-classifier: Conditional Generative Adversarial Nets for Classification

    P. Shen, X. Lu, S. Li, H. Kawai

    Acoustical Society of Japan, autumn meeting   2017

  • Investigation of knowledge distillation methods for CTC acoustic models

    R. Takashima, S. Li, H. Kawai

    Acoustical Society of Japan, spring meeting   2018

  • Short utterance-based spoken language identification

    P. Shen, X. Lu, S. Li, H. Kawai

    Acoustical Society of Japan, autumn meeting   2018

  • Training CTC and LFMMI-based TDNN with CNTK

    Sheng Li

    NICT internal report   2018

  • Investigation of sequence-level knowledge distillation for CTC acoustic models

    Ryoichi Takashima, Sheng Li, Hisashi Kawai

    IPSJ SIG-SLP   2018

  • An Empirical Comparison of Sequence Training Methods for the Very Deep Time-delay Neural Network

    S. Li, X. Lu, R. Takashima, P. Shen, H. Kawai

    Acoustical Society of Japan, autumn meeting   2018

  • Improving CTC-based acoustic model with very deep residual neural network

    S. Li, X. Lu, R. Takashima, P. Shen, H. Kawai

    Acoustical Society of Japan, spring meeting   2018

  • Research on end-to-end speech recognition technology

    Sheng Li

    Information and Communications Fair 2019   2019.9

  • End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition

    S. Li, C. Ding, X. Lu, P. Shen, H. Kawai

    Acoustical Society of Japan, spring meeting   2020

    Presentation type: Oral presentation (general)

  • Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes

    S. Li, X. Lu, R. Dabre, P. Shen, H. Kawai

    Acoustical Society of Japan, spring meeting   2020

    Presentation type: Oral presentation (general)

  • Improvement of x-vector for short utterance spoken language identification

    P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai

    Acoustical Society of Japan, spring meeting   2020

    Presentation type: Oral presentation (general)

  • Investigation of multi-domain training for speech recognition

    P. Shen, X. Lu, S. Li, H. Kawai

    Acoustical Society of Japan, spring meeting   2019.3

  • Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release Invited

    Y. Han, S. Li, Y. Cao, Q. Ma, M. Yoshikawa

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (invited report)   2020.10

    Language: English   Presentation type: Oral presentation (invited, special)

  • A Mixture of Character and Word End-to-End System for Keyword Spotting Invited

    H. Zhang, S. Ueno, M. Mimura, S. Li, W. Zhang, T. Kawahara

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (full paper)   2020.9

    Language: English   Presentation type: Oral presentation (invited, special)

  • Investigation of Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data and Mask Embedding

    S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020)   2020.9

    Language: English   Presentation type: Oral presentation (general)

  • Multi-lingual transformer training for Khmer automatic speech recognition Invited

    K. Soky, S. Li, T. Kawahara, S. Seng

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020)   2020.9

    Language: English   Presentation type: Oral presentation (invited, special)

  • System Description for Voice Privacy Challenge (Kyoto Team)

    Y. Han, S. Li, Y. Cao, M. Yoshikawa

    Special session of INTERSPEECH 2020 (VoicePrivacy Challenge 2020)   2020.9

    Language: English   Presentation type: Oral presentation (general)

  • Description of End-to-End Dialect Identification System (accepted in INTERSPEECH2021)

    Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li, Xinkang Xu

    In special session of INTERSPEECH2021 (OLR2020 challenge)  2021.9 

     More details

    Language:English   Presentation type:Poster presentation  

    researchmap

  • Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview

    Xiaojiao Chen, Sheng Li, Hao Huang

    NCMMSC2021  2021.10 

     More details

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • System description of Alzheimer's disease early detection (Silk-road team, short speech track)

    Wenqing Wei, Rui Wong, Sheng Li, Yachao Guo, Hao Huang

    Alzheimer's disease detection challenge (NCMMSC2021)  2021.10 

    Language:English   Presentation type:Oral presentation (general)  

  • System description of joint speech and accent recognition (published in APSIPA ASC, 2021)

    Y. Peng, J. Zhang, H. Zhang, H. Xu, H. Huang, S. Li, E.S. Chng

    Challenge of Interspeech2020 Accented English Speech Recognition (AESR)  2021.12 

    Language:English   Presentation type:Poster presentation  

  • End-to-End Speech Translation with Cross-lingual Transfer Learning

    S Shimizu, C Chu, S Li, S Kurohashi

    NLP2021  2021 


  • Comparison of End-to-End Models for Joint Speaker and Speech Recognition

    K Soky, S Li, M Mimura, C Chu, T Kawahara

    IEICE-SP  2021 


  • The RoyalFlush (NICT) System Description for AP21-OLR Challenge, Invited

    Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li

    AP21-OLR Challenge  2022.1 

    Language:English   Presentation type:Oral presentation (invited, special)  

  • Multilingual Retrieval-Augmented Generation Enhanced LLM-based Speech Recognition

    Pengcheng Wang, Sheng Li, Takahiro Shinozaki

    The 155th Meeting of the Acoustical Society of Japan (Spring 2026)  2026.3 

    Presentation type:Oral presentation (general)  

  • Style Control of Language-Model-Based Speech Synthesis Using an Instruction Reconstruction Method

    Zhu Shiao, Li Sheng, Takahiro Shinozaki

    The 155th Meeting of the Acoustical Society of Japan (Spring 2026)  2026.3 

    Presentation type:Oral presentation (general)  

  • A Multilingual Benchmark for Generative Error Correction in Speech Recognition and Speech Translation

    Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu

    The 32nd Annual Meeting of the Association for Natural Language Processing  2026.3 


Industrial property rights

  • Training method

    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai

    Applicant:National Institute of Information and Communications Technology (NICT)

    Application no:特願2017-236626  Date applied:2017.12

    Announcement no:特開2019-105899  Date announced:2019.6

    Patent/Registration no:特許6979203  Date registered:2021.11 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Inference device and method for training the inference device

    Sheng Li, Xugang Lu, Hisashi Kawai

    Application no:特願2020-059962 

    Patent/Registration no:特許7423056  Date registered:2024.1 

  • Learning system, method, and neural network model for time-series information

    Ryoichi Takashima, Sheng Li, Hisashi Kawai

    Application no:特願2018-044134 

    Patent/Registration no:特許7070894  Date registered:2022.5 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Speech recognition system, speech recognition method, and trained model

    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai

    Application no:特願2018-044491 

    Patent/Registration no:特許7109771  Date registered:2022.7 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Classifier, trained model, and training method

    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai

    Application no:特願2018-142418 

    Patent/Registration no:特許7209330  Date registered:2023.1 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Method and apparatus for training a language identification model, and computer program therefor

    Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai

    Application no:特願2019-086005 

    Patent/Registration no:特許7282363  Date registered:2023.5 

  • Inference device, inference program, and training method

    Sheng Li, Xugang Lu, Chenchen Ding, Tatsuya Kawahara, Hisashi Kawai

    Application no:特願2019-163555 

    Patent/Registration no:特許7385900  Date registered:2023.11 


Works

  • HSoftmax: Hierarchical Softmax (https://github.com/Derek-Gong/hsoftmax/)

    Zhuo Gong, Qianying Liu, Sheng Li, Zhengdong Yang, Yuhang Yang

    2020

    Work type:Software  

  • https://openslr.org/158/

  • very deep residual time-delay neural network (TDNN) with LFMMI objective implemented with MS-CNTK

  • Julius decoder with EESEN CTC acoustic model

  • Julius decoder with Kaldi acoustic model

  • Julius decoder with Kaldi feature extractor

  • VTLN for Julius/HTK acoustic model

  • Julius for speech foundation models

    https://github.com/halspeech/julius-speech-foundation-model

  • foundation models for the Tibetan language

  • online speech recognition module for Erica the humanoid robot

Awards

  • FY2025 research funding

    2026.3   The Telecommunications Advancement Foundation  

  • Awards and Research Grants from the School of Engineering Common Fund

    2025.11   Institute of Science Tokyo  


  • Next Generation Star

    2025.10   IEEE IROS2025   https://youtu.be/pP6YtlSVqlM


  • IES SYPA Award

    2025.10   IEEE IROS2025  

    Sheng Li


  • Best Reviewer

    2025.8   IEEE RO-MAN2025  

    Sheng Li

  • Task 1: speech recognition error correction using LLMs

    2024.12   SLT2024 grand challenge LLM GER  

  • Top 2 in one track

    2023.12   ICASSP2024 ICMC-ASR (In-Car Multi-Channel Automatic Speech Recognition) Challenge  

  • 1st place in one track in ASRU2023 special session: VoiceMOS challenge

    2023.12  


  • IEEE-SPS grant for IEEE-ICASSP2023 oral presentation (Co-supervised PhD student Qianying Liu)

    2023.5   IEEE signal processing society  


  • 1st place in 6 of 16 metrics in the Main/OOD tracks of the INTERSPEECH2022 special session: VoiceMOS Challenge

    2022  

  • 3rd/4th place in constrained/unconstrained resource multilingual ASR tracks of OLR2021 challenge

    2021.12   Oriental language recognition challenge 2021  


  • Supervised student (Soky Kak) received a Best Student Paper nomination

    2021.11   O-COCOSDA2021  

  • Outstanding Performance Award: Excellence Award (Group)

    2021.6   National Institute of Information and Communications Technology (NICT)  

  • Travel Grant

    2020.9   ISCA   Singing Voice Extraction with Attention based Spectrograms Fusion

    Supervised student Hao Shi


  • Travel Grant

    2020.9   ISCA   Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription

    Supervised student Yuqin Lin

  • ICME 2020 Best Student Paper nomination; selected as a journal paper in IEEE Trans. Multimedia (TMM)

    2020.7  

    Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release, Y. Han, S. Li, Y. Cao, Q. Ma, M. Yoshikawa, IEEE-ICME

  • FY2020 International Development Fund (top-scoring new proposal)

    2020.5   National Institute of Information and Communications Technology (NICT)  

  • Selected as a tenure-track researcher with grants (one of only three in FY2019)

    2019   National Institute of Information and Communications Technology  

  • Japan Student Journal Paper Award

    2018   IEEE Signal Processing Society  

    Sheng LI


  • Full exemption from admission and tuition fees, 2012-2016

    2016.3   Kyoto University  

  • Paper nominated for the cover of IEEE/ACM Trans. Audio, Speech & Language Process.

    2016  

  • IBM travel grant for the Interspeech conference in Portland

    2012  

  • Admission to Kyoto University under special placement as a university-recommended Japanese government scholarship student

    2012  

  • Outstanding Staff Award

    2011   Chinese Academy of Sciences  

  • Creative Planning Award, Hong Kong Young Entrepreneurs Program

    2011  

  • Encouragement Scholarship

    2004   Nanjing University  

    Sheng Li


  • Chen Yinchuan Scholarship (Hong Kong) for Excellent University New Students

    2002  

  • Second Prize, Chemistry Olympiad; Third Prize, Biology Olympiad

    2002   Jiangsu Province, China  


Research Projects

  • Enhancing large language models

    2024.4

    Tohoku University - NICT collaborative research 

    Authorship:Principal investigator 

  • Creation of fundamental technologies for speech dialogue translation that accurately conveys intent

    2023.4 - 2028.4

    JSPS  KAKEN  Grant-in-Aid for Scientific Research (B)

    Authorship:Coinvestigator(s) 

  • M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition

    2023.4 - 2026.4

    JSPS  KAKEN  Grant-in-Aid for Scientific Research (C)

    Authorship:Principal investigator 

  • Spoof Detection for Automatic Speaker Verification

    2023.4 - 2024.4

    ICT Virtual Organization of ASEAN Institutes and NICT (ASEAN IVO) 

    Authorship:Coinvestigator(s) 

  • Bridging Eurasia from Sea -- Multilingual Speech Recognition for Maritime Silkroad

    2022 - 2024

    NICT international funding 

    Authorship:Principal investigator 

  • Phantom in the Opera -- the Vulnerabilities of Speech Interface for Robotic Dialogue System

    2021.4 - 2023.4

    JSPS  Grant-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists

    Sheng Li

    Authorship:Principal investigator 

  • Advanced Multilingual End-to-End Speech Recognition

    2020.4 - 2022.4

    National Institute of Information & Communications Technology (NICT)  NICT tenure-track start-up funding 

    Sheng Li

    Authorship:Principal investigator 

  • Bridging Eurasia -- Multilingual Speech Recognition for Silkroad

    2020.4 - 2022.4

    National Institute of Information & Communications Technology (NICT)  NICT international funding 

    Sheng Li

    Authorship:Principal investigator 

  • Speaker De-identification with Provable Privacy in Speech Data Release

    2020.4 - 2021.4

    NII  Open Collaborative Research 

    Authorship:Collaborating Investigator(s) (not designated on Grant-in-Aid) 

  • Next generation multilingual End-to-End speech recognition (from G30 to G200)

    2019.10 - 2021.3

    JSPS  Grant-in-Aid for Scientific Research  Grant-in-Aid for Research Activity Start-up

    Sheng LI

    Authorship:Principal investigator  Grant type:Competitive


Other

  • Reviewer / program committee

    [1] IEEE/ACM Trans. Audio, Speech & Language Process.
    [2] Computer Speech and Language
    [3] Speech Communication
    [4] IEICE transactions, letters
    [5] APSIPA transactions
    [6] Applied Acoustics
    [7] Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
    [8] Digital Signal Processing
    [9] Behaviour & Information Technology
    [10] EURASIP Journal on Audio, Speech, and Music Processing


  • International conference reviewing

    [1] ICASSP-2021/2022/2023/2024/2025/2026 (meta reviewer), INTERSPEECH-2015/2018/2019/2020/2021/2022/2023/2024/2025, SLT-2022/2024, ASRU-2023/2025
    [2] APSIPA-2019/2020/2021/2022/2023/2024/2025, IJCNN-2023/2024/2026, ICONIP2023
    [3] BC_VCC-2020 (Blizzard Challenge and Voice Conversion Challenge 2020)
    [4] ACL-2017/2018/2020/2021/2022/2023/2024/2025/2026, EACL-2020/2022/2026(loresmt), NAACL-HLT-2016/2018/2019/2021
    [5] IJCNLP-2017, EMNLP-IJCNLP-2019, EMNLP-2020/2021/2022, AACL-IJCNLP-2020/2022/2023/2025, COLING-2018/2022, SIGDIAL-2024
    [6] NLP-2022/2023/2024, IALP-2023/2024
    [7] AAAI-2019, ICLR-2021/2024, NeurIPS-2022/2023, ICML-2023/2024
    [8] IROS-2019/2025, Ubiquitous Robots (UR)-2020, IEEE-ROMAN 2023/2025
    [9] ICME-2020/2021/2022/2023(main+workshop)/2024, ACM Multimedia 2021/2022/2023, ACM Multimedia Asia 2023, MMM 2023
    [10] PAKDD-2023, DASFAA-2024, ACM ICMR 2024
