Updated on 2026/03/27

LI SHENG

Organization
School of Engineering

Title
Assistant Professor

Contact information
Email address

Other name(s)
Sheng Li
Profile

Sheng LI received his B.S. and M.E. degrees from Nanjing University, Nanjing, China, in 2006 and 2009, and his Ph.D. from Kyoto University, Kyoto, Japan, in 2016. From 2009 to 2012, he worked at the joint lab of the Chinese University of Hong Kong and Shenzhen City, researching speech-technology-assisted language learning. From 2016 to 2017, he was a researcher at Kyoto University, studying speech recognition systems for humanoid robots. From 2017 to February 2025, he worked on speech recognition as a researcher at the National Institute of Information and Communications Technology (NICT), Kyoto, Japan. In March 2025, he joined the Institute of Science Tokyo as an assistant professor working on speech recognition. He has served as a workshop/special session co-organizer and session chair at INTERSPEECH 2020, COLING 2022, Odyssey 2022, ACM Multimedia Asia 2023/2024, RO-MAN 2025, and ICASSP 2024. He is a member of the Acoustical Society of Japan (ASJ) and the International Speech Communication Association (ISCA), and a senior member of IEEE. He is currently a member of the Speech, Language, and Audio (SLA) Technical Committee of APSIPA, and a member of the Applied Signal Processing Systems Technical Committee (ASPS TC) of the IEEE Signal Processing Society (SPS).

Focusing on:

  • next-generation multilingual speech recognition, translation, and synthesis technology

  • security-aware speech processing technology

  • robot audition and embodied voices

External link

  • https://search.star.titech.ac.jp/titech-ss/pursuer.act?event=outside&key_t2r2Rid=CTT100930321&lang=jp

  • https://educ.titech.ac.jp/ict/eng/faculty/

  • https://youtu.be/pP6YtlSVqlM

Degree

  • Ph.D. (Informatics) (2016.3, Kyoto University)

Research Interests

  • Speech recognition and translation

  • Computer-assisted language learning

  • Multimodal speech processing

  • Security-aware speech processing

  • Large language models (speech, text)

Research Areas

  • Informatics / Perceptual information processing

Education

  • Kyoto University   Graduate School   Ph.D., Informatics

    2012.10 - 2016.3

  • Nanjing University   Joint Program of the Chinese Academy of Sciences, the Chinese University of Hong Kong, and Nanjing University   M.E.

    2007.9 - 2009.7

  • Nanjing University (formerly National Central University)   B.S., Computer Science

    2002.7 - 2006.7

Research History

  • RIKEN   Visiting Scientist

    2025.10 -

  • Institute of Science Tokyo   Assistant Professor

    2025.3 -

    Country: Japan

  • Eindhoven University of Technology (TU/e)   Visiting Assistant Professor

    2024.11 -

    Country: Netherlands

  • Nanyang Technological University   Visiting Researcher

    2024.2 - 2024.3

    Country: Singapore

  • Kyoto University   Master's Course Advisor

    2021.12 - 2023.3

  • National Institute of Information and Communications Technology (NICT)   Advanced Speech Technology Laboratory (ASTL)   Tenure-track Researcher

    2020 - 2025.2

  • University of Oxford   Department of Computer Science   Visiting Researcher

    2019.4 - 2019.5

  • National Institute of Information and Communications Technology (NICT)   Advanced Speech Technology Laboratory (ASTL)   Researcher (hired for the Tokyo 2020 Olympics project)

    2017 - 2019

  • Kyoto University   Speech and Audio Processing Lab.   Researcher (hired for the Erica humanoid robot project)

    2016.4 - 2016.12

  • Sogou/Sohu Pinyin IME (Beijing, China)   Researcher (speech input)

    2012.4 - 2012.9

  • Shenzhen Institute of Advanced Technology (Shenzhen, Guangdong, China)   Researcher (computer-assisted language learning)

    2009.7 - 2012.4

Professional Memberships

  • APNNS (Asia Pacific Neural Network Society)

    2023.12 -

  • ACM (Association for Computing Machinery)

  • IEEE / IEEE-SPS / IEEE-RAS

  • ISCA (International Speech Communication Association)

  • ASJ (Acoustical Society of Japan)

  • SIG-CSLP (Chinese Spoken Language Processing)

  • APSIPA (Asia Pacific Signal and Information Processing Association)

Committee Memberships

  • JSAI   Co-organizer of organized session (OS)

    2026.6

  • IEEE ICASSP 2026   Meta-reviewer

    2026.1

  • APSIPA   Speech, Language, and Audio (SLA) Technical Committee member (until 2026)

    2026

  • IEEE IROS 2025   Session chair

    2025.10

  • IEEE RO-MAN 2025   Co-organizer of special session

    2025.9

  • IEEE   Senior member

    2025.4

  • IEEE Signal Processing Society (SPS)   Applied Signal Processing Systems Technical Committee (ASPS TC) member

    2025.1 - 2027.1

  • ACM Multimedia Asia 2024   Co-organizer of the workshop "Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental)"

    2024.12

  • DASFAA 2024   Session chair

    2024.7

  • ACM Multimedia Asia 2024   Publicity chair

    2024.6 - 2024.12

  • IEEE ICASSP 2024   Session chair

    2024.4

  • ACM Multimedia Asia 2023   Co-organizer of the workshop "Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental)"

    2023.12

  • ICANN 2023   Session chair

    2023.9

  • APSIPA ASC 2023   Area chair

    2023.7

  • EMNLP 2023   Area chair

    2023.7

  • COLING 2022   Co-organizer of the workshop "When Creative AI Meets Conversational AI (CAI + CAI = CAI^2)"

    2022.10

  • Speaker Odyssey 2022   Session chair (Evaluation and Benchmarking session)

    2022.6

  • INTERSPEECH 2020   Session chair (Topics of ASR I)

    2020.10

  • INTERSPEECH 2020   Co-organizer of the SLIMTS (Spoken Language Interaction for Mobile Transportation System) workshop

    2020.10

Papers


Books

  • Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language

    Sheng Li (Role: Sole author)

    2023.2   (ISBN: 9784904020289)

  • Bridging Eurasia: Multilingual Speech Recognition for Silkroad

    Sheng Li (Role: Sole author)

    2023.1   (ISBN: 9784904020296)

  • Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems

    Sheng Li (Role: Sole author)

    2022.11   (ISBN: 9784904020265)

  • Automatic Speech Recognition: Speech-to-Speech Translation

    X. Lu, S. Li, M. Fujimoto (Role: Joint author; Chapter 3.3.2: From Shallow to Deep and Very Deep; Chapter 3.3.3: End-to-End and CTC Models)

    Springer Singapore   2020

MISC

  • Evaluating Tibetan ASR with Segmented Word Error Rate: Beyond Character-Level Metrics

    Jacob Moore, Sheng Li, Paula Lauren

    TechRxiv   2026.2

    Language: English   Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    DOI: 10.36227/techrxiv.177102186.63648582/v1

  • End-to-end Acoustic-linguistic Emotion and Intent Recognition Enhanced by Semi-supervised Learning

    Zhao Ren, Rathi Adarshi Rammohan, Kevin Scheck, Sheng Li, Tanja Schultz

    2025.12

  • Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement

    Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari

    arXiv   2025.10

    Language: English

    DOI: 10.48550/arXiv.2510.01722

  • Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR

    Hongli Yang, Sheng Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng

    arXiv   2025.7

    Language: English

    DOI: 10.48550/arXiv.2506.21577

  • Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning

    Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li

    2025.7

    Language: English

    DOI: 10.48550/arXiv.2506.21576

  • Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation

    Haowei Lou, Hye-young Paik, Sheng Li, Wen Hu, Lina Yao

    arXiv preprint arXiv:2504.08274   2025.4

    Language: English

  • Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

    Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu

    arXiv   2025.1

    Language: English

    DOI: 10.48550/arXiv.2501.17615

  • Joint Automatic Speech Recognition and Structure Learning for Better Speech Understanding

    Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang

    arXiv   2025.1

    Language: English   Publishing type: Research paper, summary (international conference)

    DOI: 10.48550/arXiv.2501.07329

  • Multi-Prototype Network with Swin Transformer for Open Set Recognition

    Jun Wang, Haiyan Yang, Sheng Li, Di Zhou, Xingwei Chen, Juncheng Li, Yufeng Hua, Jun Shi

    SSRN   2025

    Language: English   Publishing type: Article, review, commentary, editorial, etc. (scientific journal)   Publisher: Elsevier BV

    DOI: 10.2139/ssrn.5134636

  • A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations

    Phurich Saengthong, Boonnithi Jiaramaneepini, Sheng Li, Manabu Okumura, Takahiro Shinozaki

    2025

  • Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads

    Jing Li, Felix Schijve, Sheng Li, Yuye Yang, Jun Hu, Emilia Barakova

    2025

    Language: English

    DOI: 10.48550/arXiv.2507.10427

  • Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

    Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara

    arXiv   2024.12

    Language: English

    DOI: 10.48550/arXiv.2408.16180

  • Extracting Spatiotemporal Data from Gradients with Large Language Models

    Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    arXiv   2024.10

    Language: English

    DOI: 10.48550/arXiv.2410.16121

  • Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition

    Chao Tan, Sheng Li, Yang Cao, Zhao Ren, Tanja Schultz

    arXiv   2024.10

    Language: English

    DOI: 10.48550/arXiv.2410.13221

  • Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

    Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    arXiv   2024.7

    Language: English

    DOI: 10.48550/arXiv.2407.08529

  • MOS-FAD: Improving Fake Audio Detection via Automatic Mean Opinion Score Prediction

    Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

    2024.1

    Language: English   Publishing type: Research paper, summary (international conference)

    DOI: 10.48550/arXiv.2401.13249

  • End-to-End Speech-to-Speech Translation Toolkit

    Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li

    Toolkit released at the ACM Multimedia Asia 2023 workshop   2023.12

  • FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection

    Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li

    2023.11

    Authorship: Last author, Corresponding author   Language: English

    DOI: 10.48550/arXiv.2311.13043

  • LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

    Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu

    2023.11

  • Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization

    Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He

    2023.11

  • GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System

    Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He

    2023.11

  • Towards Speech Dialogue Translation Mediating Speakers of Different Languages

    Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

    arXiv:2305.09210   2023.5

    Language: English

  • Robust Voice Activity Detection Using an Auditory-Inspired Masked Modulation Encoder Based Convolutional Attention Network

    Nan Li, Longbiao Wang, Meng Ge, Masashi Unoki, Sheng Li, Jianwu Dang

    2023

    Language: English   Publisher: Elsevier BV

    DOI: 10.2139/ssrn.4557926

  • Speech-text Based Multi-modal Training with Bidirectional Attention for Improved Speech Recognition

    Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li

    2022.10

  • Tendency-and-Attention-Informed Deep Learning for ENSO Forecasts

    Shen Qiao, Cuicui Zhang, Xuefeng Zhang, Kai Zhang, Hao Shi, Sheng Li, Hao Wei

    2022.6

    Publisher: Research Square Platform LLC

    Abstract: Deep learning has been acknowledged as an increasingly important technology for ENSO forecasts. The most cutting-edge deep learning algorithms are based on the Convolutional Neural Network (CNN), which can achieve a multi-year (about 17-month-lead) forecast and has conquered the "spring forecast barrier" problem. However, this group of methods is still challenged by several critical issues. First, they usually use the global sea surface temperature (SST) fields as inputs without considering the specific contributions of different oceanic regions to ENSO forecasts. Consequently, they cannot effectively investigate the role of the "teleconnection" mechanism among different oceans (the Indian, Pacific, and Atlantic Oceans) and different ocean parts (the tropical and non-tropical regions), especially in the forecast of extreme ENSO events. Second, existing methods mainly use the discrete monthly SST fields for ENSO forecasts without investigating the rate of change between adjacent months, which also provides important information for predicting the variation tendency. To solve these problems, this paper develops a Tendency-and-Attention-Informed Deep Residual Network (TA-DRN) for multi-year ENSO forecasts. The contributions of different oceanic regions are learned by a spatial attention module, while the variation tendency of adjacent previous and current months is interpreted by the first- and second-order differences of the SST fields. Informed by these two modules, the performance of TA-DRN improves significantly, especially in predicting extreme El Niño and La Niña events.

    DOI: 10.21203/rs.3.rs-1733575/v1

    Other link: https://www.researchsquare.com/article/rs-1733575/v1.html

  • Fusion of Self-supervised Learned Models for MOS Prediction

    Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao

    CoRR abs/2204.04855   2022.4

    Authorship: Corresponding author

  • Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition

    Qianying Liu, Yuhang Yang, Zhuo Gong, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Sadao Kurohashi

    abs/2204.03855   2022.4

    Authorship: Corresponding author

  • Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

    Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa

    CoRR abs/2004.07442   2020.6

    Authorship: Lead author

  • Deep Progressive Multi-scale Attention for Acoustic Event Classification

    Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

    CoRR abs/1912.12011   2019.4

    Authorship: Lead author

Presentations

  • Improving speech recognition systems by integrating large language models Invited

    Sheng Li

    NICT Open House 2024   2024.6

    Language: Japanese

  • Diversity-driven Semi-supervised Ensemble DNN Acoustic Model Training

    Sheng Li, Xugang Lu, Shinsuke Sakai, Tatsuya Kawahara

    IEICE Technical Report (信学技報)   2016.8   IEICE

    Language: English

  • Discriminative Data Selection from Multiple ASR Systems' Hypotheses for Unsupervised Acoustic Model Training (17th Spoken Language Symposium)

    Sheng Li, Yuya Akita, Tatsuya Kawahara

    IEICE Technical Report (信学技報)   2015.12   IEICE

    Language: English

  • Emotional speech synthesis based on emotion-timbre disentanglement via mutual information minimization

    Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari

    Proceedings of the Acoustical Society of Japan research meeting, autumn   2025.10

    Language: English   Presentation type: Oral presentation (general)

  • RAG-Boost: Retrieval-Augmented Generation Enhanced Speech Recognition in LLM-based Spoken Dialogue Systems

    Pengcheng Wang, Sheng Li, Takahiro Shinozaki

    Proceedings of the Acoustical Society of Japan research meeting, autumn   2025.10

  • System Description for the CN-Celeb Speaker Recognition Challenge 2022

    Guangxing Li, Wangjin Zhou, Sheng Li, Yi Zhao, Hao Huang, Jichen Yang

    CNSRC (the CN-Celeb Speaker Recognition Challenge), Speaker Odyssey 2022   2022.6

    Language: English   Presentation type: Oral presentation (general)

  • Study on Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network

    Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, Sheng Li, Masashi Unoki

    IEICE Tech. Rep.   2022.8

    Language: English   Presentation type: Oral presentation (general)

  • Speech dialogue translation mediating speakers of different languages

    Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

    The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023)   2023.3

    Language: Japanese   Presentation type: Oral presentation (general)

  • Towards Security-aware Speech Recognition System Invited

    Sheng Li

    NECTEC-NICT joint seminar   2023.8

    Language: English   Presentation type: Oral presentation (invited, special)

  • Cross-lingual Mapping for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

    Zhengdong Yang, Qianying Liu, Sheng Li, Chenhui Chu, Fei Cheng, Sadao Kurohashi

    ASJ 2023 autumn meeting   2023.9

    Language: English   Presentation type: Poster presentation

  • Correction while Recognition: Combining Pretrained Language Model for Taiwan-accented Speech Recognition Invited

    Sheng Li

    Joint Seminar with NECTEC Language Understanding Group   2023.11

    Language: English   Presentation type: Oral presentation (invited, special)

  • System Description for the VoicePrivacy Challenge 2022

    Xiaojiao Chen, Guangxing Li, Wangjin Zhou, Sheng Li, Yang Cao, Hao Huang, Yi Zhao

    VoicePrivacy Challenge 2022   2022.9

    Language: English   Presentation type: Oral presentation (general)

  • VoicePrivacy Challenge: System Description

    X. Chen, G. Li, H. Huang, W. Zhou, Y. Cao, S. Li, Y. Zhao

    VoicePrivacy 2022 Challenge Workshop (INTERSPEECH 2022)   2022.9

    Language: English   Presentation type: Oral presentation (general)

  • Domain and Language Adaptation of Large-scale Pretrained Model for Speech Recognition of Low-resource Language

    Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara

    IEICE Tech. Rep. (信学技報)   2022.12

  • Self-Supervised Learning MOS Prediction with Listener Enhancement Invited

    Sheng Li

    VoiceMOS mini workshop   2023.11

    Language: English   Presentation type: Oral presentation (invited, special)

  • Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition Invited

    Zhengdong Yang

    ICT-innovation 2023 (Kyoto Univ.)   2024.2

    Language: English   Presentation type: Public lecture, seminar, tutorial, course, or other speech

  • Investigating effective methods for combining large language models with speech recognition systems

    Sheng Li, Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Hisashi Kawai

    Acoustical Society of Japan 151st (Spring 2024) meeting   2024.3

    Language: English   Presentation type: Poster presentation

  • Combining Large Language Model with Speech Recognition System in Low-resource Settings

    Sheng Li, Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Chen Chen, Eng Siong Chng, Hisashi Kawai

    NLP2024   2024.3

    Presentation type: Poster presentation

  • Enhancing Multi-Step Reasoning in Language Models with Synthetic Math Data Augmentation (HP_Fighters team)

    Jieqing Mei, Jiyi Li, Qianying Liu, Sheng Li

    The First Workshop on Fine-tuning and Evaluation of Large Language Models (FT-LLM2025) @ NLP2025   2025.3

    Language: Japanese   Presentation type: Oral presentation (general)

  • CEFR-J Level Estimation of English Learners' Speech Using Large Language Models

    Takahiro Shinozaki, Syutaro Sato, Sheng Li

    CEFR-J 2025 International Symposium   2025.3

    Language: Japanese   Presentation type: Oral presentation (general)

  • The System Description for VoiceMOS Challenge 2022 (main/OOD tasks)

    2022

  • Application of the RFID-based audio service in a regional navigation system

    S. Li, C. Li

    Bulletin of Advanced Technology Research   2009

  • The Phoneme-level Articulator Dynamics for 3D Pronunciation Animation for Chinese

    S. Li, K. Luo, L. Wang

    Bulletin of Advanced Technology Research   2011

  • Phoneme-level articulatory animation in pronunciation training using EMA data

    Sheng Li

    Speech Synthesis Lab., Tsinghua University (host: Prof. Zhiyong Wu)   2012

  • Vocal Tract Length Normalization for Chinese Spontaneous Speech Recognition

    Sheng Li

    Technical report (Kyoto University)   2013

  • Multi-lingual transformer training for Khmer automatic speech recognition

    K. Soky, S. Li, T. Kawahara, S. Seng

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (abstract paper)

  • Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

    Y. Han, S. Li, Y. Cao, Q. Ma, M. Yoshikawa

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (abstract paper, invited report)

  • Automatic Transcription of Chinese Spoken Lectures

    S. Li, M. Mimura, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2013

  • DNN-based Acoustic Modeling and Decoding for Chinese Spontaneous Speech Recognition with HTK

    Sheng Li

    Technical report (Kyoto University)   2014

  • Lightly-supervised training and confidence estimation by using CRF classifiers

    Sheng Li

    Speech and Cognition Lab., Tianjin University (hosts: Prof. Jianwu Dang and Prof. Kiyoshi Honda)   2014

  • Effective combination of multiple ASR hypotheses with CRF-based classifiers

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2015

  • Discriminative data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training

    S. Li, Y. Akita, T. Kawahara

    IPSJ SIG-SLP-109-8   2015

  • Data Selection Assisted by Caption to Improve Acoustic Modeling for Lecture Transcription

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan, spring meeting   2014

  • Classifier-based data selection for lightly-supervised training of acoustic model for lecture transcription

    S. Li, Y. Akita, T. Kawahara

    IPSJ SIG-SLP-102-4   2014

  • Unsupervised Training of Deep Neural Network Acoustic Models for Lecture Transcriptions

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2014

  • Incorporating divergences from hypotheses of multiple ASR systems to improve unsupervised acoustic model training

    S. Li, Y. Akita, T. Kawahara

    Acoustical Society of Japan   2015

  • Diversity-driven Semi-supervised Ensemble DNN Acoustic Model Training

    S. Li, X. Lu, S. Sakai, T. Kawahara

    Acoustical Society of Japan, autumn meeting   2016

  • Very deep convolutional residual network acoustic models for Japanese lecture transcription

    S. Li, X. Lu, P. Shen, H. Kawai

    Acoustical Society of Japan, autumn meeting   2017

  • cGAN-classifier: Conditional Generative Adversarial Nets for Classification

    P. Shen, X. Lu, S. Li, H. Kawai

    Acoustical Society of Japan, autumn meeting   2017

  • Investigation of knowledge distillation methods for CTC acoustic models

    R. Takashima, S. Li, H. Kawai

    Acoustical Society of Japan, spring meeting   2018

  • Short utterance-based spoken language identification

    P. Shen, X. Lu, S. Li, H. Kawai

    Acoustical Society of Japan, autumn meeting   2018

  • Training CTC and LFMMI-based TDNN with CNTK

    Sheng Li

    NICT internal report   2018

  • Investigation of sequence-level knowledge distillation for CTC acoustic models

    Ryoichi Takashima, Sheng Li, Hisashi Kawai

    IPSJ SIG-SLP   2018

  • An Empirical Comparison of Sequence Training Methods for the Very Deep Time-delay Neural Network

    S. Li, X. Lu, R. Takashima, P. Shen, H. Kawai

    Acoustical Society of Japan, autumn meeting   2018

  • Improving CTC-based acoustic model with very deep residual neural network

    S. Li, X. Lu, R. Takashima, P. Shen, H. Kawai

    Acoustical Society of Japan, spring meeting   2018

  • Research on end-to-end speech recognition technology

    Sheng Li

    Information and Communications Fair 2019   2019.9

  • End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition

    S. Li, C. Ding, X. Lu, P. Shen, H. Kawai

    Acoustical Society of Japan, spring meeting   2020

    Presentation type: Oral presentation (general)

  • Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes

    S. Li, X. Lu, R. Dabre, P. Shen, H. Kawai

    Acoustical Society of Japan, spring meeting   2020

    Presentation type: Oral presentation (general)

  • Improvement of x-vector for short utterance spoken language identification

    P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai

    Acoustical Society of Japan, spring meeting   2020

    Presentation type: Oral presentation (general)

  • Investigation of multi-domain training for speech recognition

    P. Shen, X. Lu, S. Li, H. Kawai

    Acoustical Society of Japan, spring meeting   2019.3

  • Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release Invited

    Y. Han, S. Li, Y. Cao, Q. Ma, M. Yoshikawa

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (invited report)   2020.10

    Language: English   Presentation type: Oral presentation (invited, special)

  • A Mixture of Character and Word End-to-End System for Keyword Spotting Invited

    H. Zhang, S. Ueno, M. Mimura, S. Li, W. Zhang, T. Kawahara

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (full paper)   2020.9

    Language: English   Presentation type: Oral presentation (invited, special)

  • Investigation of Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data and Mask Embedding

    S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020)   2020.9

    Language: English   Presentation type: Oral presentation (general)

  • Multi-lingual transformer training for Khmer automatic speech recognition Invited

    K. Soky, S. Li, T. Kawahara, S. Seng

    INTERSPEECH 2020 Satellite Workshop (SLIMTS2020)   2020.9

    Language: English   Presentation type: Oral presentation (invited, special)

  • System Description for Voice Privacy Challenge (Kyoto Team)

    Y. Han, S. Li, Y. Cao, M. Yoshikawa

    Special session of INTERSPEECH 2020 (VoicePrivacy Challenge 2020)   2020.9

    Language: English   Presentation type: Oral presentation (general)

  • Description of End-to-End Dialect Identification System (accepted in INTERSPEECH2021)

    Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li, Xinkang Xu

    In special session of INTERSPEECH2021 (OLR2020 challenge)  2021.9 

     More details

    Language:English   Presentation type:Poster presentation  

    researchmap

  • Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview

    Xiaojiao Chen, Sheng Li, Hao Huang

    NCMMSC2021  2021.10 

     More details

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • System description of Alzheimer's disease early detection (Silk-road team, short speech track)

    Wenqing Wei, Rui Wong, Sheng Li, Yachao Guo, Hao Huang

    Alzheimer's disease detection challenge (NCMMSC2021)  2021.10 

    Language:English   Presentation type:Oral presentation (general)  

  • System description of joint speech and accent recognition (published in APSIPA ASC, 2021)

    Y. Peng, J. Zhang, H. Zhang, H. Xu, H. Huang, S. Li, E.S. Chng

    Challenge of Interspeech2020 Accented English Speech Recognition (AESR)  2021.12 

    Language:English   Presentation type:Poster presentation  

  • End-to-End Speech Translation with Cross-lingual Transfer Learning

    S Shimizu, C Chu, S Li, S Kurohashi

    NLP2021  2021 


  • Comparison of End-to-End Models for Joint Speaker and Speech Recognition

    K Soky, S Li, M Mimura, C Chu, T Kawahara

    IEICE-SP  2021 


  • The RoyalFlush (NICT) System Description for AP21-OLR Challenge, Invited

    Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li

    AP21-OLR Challenge  2022.1 

    Language:English   Presentation type:Oral presentation (invited, special)  

  • Multilingual Retrieval-Augmented Generation Enhanced LLM-based Speech Recognition

    Pengcheng Wang, Sheng Li, Takahiro Shinozaki

    The 155th Meeting of the Acoustical Society of Japan (Spring 2026)  2026.3 

    Presentation type:Oral presentation (general)  

  • Style Control of Language-Model-Based Speech Synthesis Using an Instruction Reconstruction Method

    Zhu Shiao, Li Sheng, Takahiro Shinozaki

    The 155th Meeting of the Acoustical Society of Japan (Spring 2026)  2026.3 

    Presentation type:Oral presentation (general)  

  • A Multilingual Benchmark for Generative Error Correction in Speech Recognition and Speech Translation

    Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu

    The 32nd Annual Meeting of the Association for Natural Language Processing  2026.3 


Industrial property rights

  • Training method

    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai

    Applicant:National Institute of Information and Communications Technology (NICT)

    Application no:特願2017-236626  Date applied:2017.12

    Announcement no:特開2019-105899  Date announced:2019.6

    Patent/Registration no:特許6979203  Date registered:2021.11 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Inference device and method for training the inference device

    Sheng Li, Xugang Lu, Hisashi Kawai

    Application no:特願2020-059962 

    Patent/Registration no:特許7423056  Date registered:2024.1 

  • Learning system, method, and neural network model for time-series information

    Ryoichi Takashima, Sheng Li, Hisashi Kawai

    Application no:特願2018-044134 

    Patent/Registration no:特許7070894  Date registered:2022.5 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Speech recognition system, speech recognition method, and trained model

    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai

    Application no:特願2018-044491 

    Patent/Registration no:特許7109771  Date registered:2022.7 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Classifier, trained model, and training method

    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai

    Application no:特願2018-142418 

    Patent/Registration no:特許7209330  Date registered:2023.1 

    Rights holder:National Institute of Information and Communications Technology (NICT)

  • Method and apparatus for training a language identification model, and computer program therefor

    Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai

    Application no:特願2019-086005 

    Patent/Registration no:特許7282363  Date registered:2023.5 

  • Inference device, inference program, and training method

    Sheng Li, Xugang Lu, Chenchen Ding, Tatsuya Kawahara, Hisashi Kawai

    Application no:特願2019-163555 

    Patent/Registration no:特許7385900  Date registered:2023.11 


Works

  • HSoftmax: Hierarchical Softmax (https://github.com/Derek-Gong/hsoftmax/)

    Zhuo Gong, Qianying Liu, Sheng Li, Zhengdong Yang, Yuhang Yang

    2020

    Work type:Software  

  • https://openslr.org/158/

  • very deep residual time-delay neural network (TDNN) with LFMMI objective implemented with MS-CNTK

  • Julius decoder with EESEN CTC acoustic model

  • Julius decoder with Kaldi acoustic model

  • Julius decoder with Kaldi feature extractor

  • VTLN for Julius/HTK acoustic model

  • Julius for speech foundation models

    https://github.com/halspeech/julius-speech-foundation-model

  • foundation models for the Tibetan language

  • online speech recognition module for Erica the humanoid robot

Awards

  • FY2025 research funding

    2026.3   The Telecommunications Advancement Foundation  

  • Awards and Research Grants from the School of Engineering Common Fund

    2025.11   Institute of Science Tokyo  


  • Next Generation Star

    2025.10   IEEE IROS2025   https://youtu.be/pP6YtlSVqlM


  • IES SYPA Award

    2025.10   IEEE IROS2025  

    Sheng Li


  • Best Reviewer

    2025.8   IEEE RO-MAN2025  

    Sheng Li

  • Task 1: speech recognition error correction using LLMs

    2024.12   SLT2024 grand challenge LLM GER  

  • Top 2 in one track

    2023.12   ICASSP2024 ICMC-ASR (In-Car Multi-Channel Automatic Speech Recognition) Challenge  

  • 1st place in one track in ASRU2023 special session: VoiceMOS challenge

    2023.12  


  • IEEE-SPS grant for IEEE-ICASSP2023 oral presentation (Co-supervised PhD student Qianying Liu)

    2023.5   IEEE signal processing society  


  • 1st place in 6 of 16 metrics in the Main/OOD tracks of the INTERSPEECH2022 special session: VoiceMOS Challenge

    2022  

  • 3rd/4th place in constrained/unconstrained resource multilingual ASR tracks of OLR2021 challenge

    2021.12   Oriental language recognition challenge 2021  


  • Supervised student (Soky Kak) received a Best Student Paper nomination

    2021.11   O-COCOSDA2021  

  • Outstanding Performance Award: Excellence Award (Group)

    2021.6   National Institute of Information and Communications Technology (NICT)  

  • Travel Grant

    2020.9   ISCA   Singing Voice Extraction with Attention based Spectrograms Fusion

    Supervised student Hao Shi


  • Travel Grant

    2020.9   ISCA   Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription

    Supervised student Yuqin Lin

  • ICME 2020 Best Student Paper nomination; selected as a journal paper in IEEE Trans. Multimedia (TMM)

    2020.7  

    Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release, Y. Han, S. Li, Y. Cao, Q. Ma, M. Yoshikawa, IEEE-ICME

  • FY2020 International Development Fund (top-scoring new proposal)

    2020.5   National Institute of Information and Communications Technology (NICT)  

  • Selected as a tenure-track researcher with grants (one of only three in FY2019)

    2019   National Institute of Information and Communications Technology  

  • Japan Student Journal Paper Award

    2018   IEEE Signal Processing Society  

    Sheng LI


  • Full exemption from admission and tuition fees, 2012-2016

    2016.3   Kyoto University  

  • Paper nominated for the cover of IEEE/ACM Trans. Audio, Speech & Language Process.

    2016  

  • IBM travel grant for the Interspeech conference in Portland

    2012  

  • Admission to Kyoto University under special placement as a university-recommended Japanese government scholarship student

    2012  

  • Outstanding Staff Award

    2011   Chinese Academy of Sciences  

  • Creative Planning Award, Hong Kong Young Entrepreneurs Program

    2011  

  • Encouragement Scholarship

    2004   Nanjing University  

    Sheng Li


  • Chen Yinchuan Scholarship (Hong Kong) for Excellent University New Students

    2002  

  • Second Prize, Chemistry Olympiad; Third Prize, Biology Olympiad

    2002   Jiangsu Province, China  


Research Projects

  • Enhancing large language models

    2024.4

    Tohoku University - NICT collaborative research 

    Authorship:Principal investigator 

  • Creation of fundamental technologies for speech dialogue translation that accurately conveys intent

    2023.4 - 2028.4

    JSPS  KAKEN  Grant-in-Aid for Scientific Research (B)

    Authorship:Coinvestigator(s) 

  • M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition

    2023.4 - 2026.4

    JSPS  KAKEN  Grant-in-Aid for Scientific Research (C)

    Authorship:Principal investigator 

  • Spoof Detection for Automatic Speaker Verification

    2023.4 - 2024.4

    ICT Virtual Organization of ASEAN Institutes and NICT (ASEAN IVO) 

    Authorship:Coinvestigator(s) 

  • Bridging Eurasia from Sea -- Multilingual Speech Recognition for Maritime Silkroad

    2022 - 2024

    NICT international funding 

    Authorship:Principal investigator 

  • Phantom in the Opera -- the Vulnerabilities of Speech Interface for Robotic Dialogue System

    2021.4 - 2023.4

    JSPS  Grant-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists

    Sheng Li

    Authorship:Principal investigator 

  • Advanced Multilingual End-to-End Speech Recognition

    2020.4 - 2022.4

    National Institute of Information & Communications Technology (NICT)  NICT tenure-track start-up funding 

    Sheng Li

    Authorship:Principal investigator 

  • Bridging Eurasia -- Multilingual Speech Recognition for Silkroad

    2020.4 - 2022.4

    National Institute of Information & Communications Technology (NICT)  NICT international funding 

    Sheng Li

    Authorship:Principal investigator 

  • Speaker De-identification with Provable Privacy in Speech Data Release

    2020.4 - 2021.4

    NII  Open Collaborative Research 

    Authorship:Collaborating Investigator(s) (not designated on Grant-in-Aid) 

  • Next generation multilingual End-to-End speech recognition (from G30 to G200)

    2019.10 - 2021.3

    JSPS  Grant-in-Aid for Scientific Research  Grant-in-Aid for Research Activity Start-up

    Sheng LI

    Authorship:Principal investigator  Grant type:Competitive


Other

  • Reviewer / program committee

    [1] IEEE/ACM Trans. Audio, Speech & Language Process.
    [2] Computer Speech and Language
    [3] Speech Communication
    [4] IEICE transactions, letters
    [5] APSIPA transactions
    [6] Applied Acoustics
    [7] Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
    [8] Digital Signal Processing
    [9] Behaviour & Information Technology
    [10] EURASIP Journal on Audio, Speech, and Music Processing


  • International conference reviewing

    [1] ICASSP-2021/2022/2023/2024/2025/2026 (meta reviewer), INTERSPEECH-2015/2018/2019/2020/2021/2022/2023/2024/2025, SLT-2022/2024, ASRU-2023/2025
    [2] APSIPA-2019/2020/2021/2022/2023/2024/2025, IJCNN-2023/2024/2026, ICONIP2023
    [3] BC_VCC-2020 (Blizzard Challenge and Voice Conversion Challenge 2020)
    [4] ACL-2017/2018/2020/2021/2022/2023/2024/2025/2026, EACL-2020/2022/2026(loresmt), NAACL-HLT-2016/2018/2019/2021
    [5] IJCNLP-2017, EMNLP-IJCNLP-2019, EMNLP-2020/2021/2022, AACL-IJCNLP-2020/2022/2023/2025, COLING-2018/2022, SIGDIAL-2024
    [6] NLP-2022/2023/2024, IALP-2023/2024
    [7] AAAI-2019, ICLR-2021/2024, NeurIPS-2022/2023, ICML-2023/2024
    [8] IROS-2019/2025, Ubiquitous Robots (UR)-2020, IEEE-ROMAN 2023/2025
    [9] ICME-2020/2021/2022/2023(main+workshop)/2024, ACM Multimedia 2021/2022/2023, ACM Multimedia Asia 2023, MMM 2023
    [10] PAKDD-2023, DASFAA-2024, ACM ICMR 2024
