Updated on 2025/09/30

SHINODA KOICHI
 
Organization
School of Computing Professor
Title
Professor

News & Topics
  • Development of semantic indexing for video search based on a data-driven approach

    2014/02/14

    Languages: Japanese

      More details

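    Graduate student Nakamasa Inoue, Professor Koichi Shinoda, and colleagues at the Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, have developed, in cooperation with Canon, a "video semantic indexing system": a new method for retrieving desired videos from video data on the Internet without using textual summary information such as tags or metadata. The system can detect "concepts" that are meaningful to humans, including "objects" such as cars and chairs, "scenes" such as sunsets and family gatherings, and "events" such as weddings and fireworks. Research background: Video data on the Internet is increasing rapidly. Most of it is user-generated, extremely diverse, of poor quality, and lacking sufficient text tags. There has therefore been strong demand for video search methods that use the image and audio features of videos.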

  • Semantic indexing system for video search using a data-driven ap

    2014/02/14

    Languages: English

      More details

    The volume of video data on the Internet increases rapidly each year, and most of it consists of diverse, low-quality consumer videos without text tags. There is therefore strong demand for video search techniques based on image and video features, so-called “content-based video retrieval” (CBVR). Video semantic indexing systems retrieve, from Internet video data, videos containing “concepts” that are meaningful to users, without using any text information such as tags or metadata. The concepts include objects such as cars and chairs, scenes such as sunsets and families having an enjoyable time, and events such as wedding ceremonies and fireworks.


News & Media

Degree

  • Doctor of Engineering ( Tokyo Institute of Technology )

Research Interests

  • Smart agriculture

  • Astronomical image processing

  • Medical information processing

  • Emotion recognition

  • Gesture recognition

  • Gait recognition

  • Image recognition

  • Speech recognition

  • Video recognition

  • Deep learning

  • Pattern recognition

  • Image processing

  • Speech processing

  • Machine learning

  • Speaker recognition

  • Multimodal recognition

Research Areas

  • Informatics / Perceptual information processing

Education

  • The University of Tokyo

    - 1989

      More details

    Country: Japan

    researchmap

  • The University of Tokyo   Graduate School, Division of Science

    - 1989

      More details

  • The University of Tokyo   Faculty of Science   Department of Physics

    - 1987

      More details

    Country: Japan

    researchmap

Research History

  • Institute of Science Tokyo   School of Computing   Professor

    2024.10

      More details

    Country:Japan

    researchmap

  • Tokyo Institute of Technology   School of Computing   Professor

    2016.4 - 2024.9

      More details

    Country:Japan

    researchmap

  • Tokyo Institute of Technology   Graduate School of Information Science and Engineering   Professor

    2013.4 - 2016.3

      More details

    Country:Japan

    researchmap

  • Tokyo Institute of Technology   Graduate School of Information Science and Engineering   Associate Professor

    2007.4 - 2013.3

      More details

    Country:Japan

    researchmap

  • Tokyo Institute of Technology   Graduate School of Information Science and Engineering   Associate Professor

    2003.4 - 2007.3

      More details

    Country:Japan

    researchmap

  • The Institute of Statistical Mathematics   Visiting Associate Professor

    2003.4 - 2005.3

      More details

  • The University of Tokyo   The Graduate School of Information Science and Technology   Associate Professor

    2001.10 - 2003.3

      More details

    Country:Japan

    researchmap

  • Lucent Technologies   Bell Laboratories   Visiting scholar

    1997.1 - 1998.2

      More details

    Country:United States

    researchmap

  • NEC Corporation   Central Research Laboratories

    1989.4 - 2001.9

      More details

    Country:Japan

    researchmap

Professional Memberships

Committee Memberships

  • Institute of Electronics, Information, and Communication Engineers   Editor of Transactions on Information and Systems  

    2006 - 2009   

      More details

    Committee type:Academic society

    Institute of Electronics, Information, and Communication Engineers

    researchmap

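  • Institute of Electronics, Information, and Communication Engineers   Member of the Science Classroom Committee for Elementary, Junior High, and High School Students; Tokyo Section Councilor; Editorial Secretary (Planning), English Transactions ED; Secretary, Technical Committee on Speech; Editorial Committee Member, English Transactions ED  

    2006 - 2009   

      More details

    Committee type:Academic society

    Institute of Electronics, Information, and Communication Engineers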

    researchmap

  • Information Processing Society of Japan   Journal Reviewer  

    2006 - 2009   

      More details

    Committee type:Academic society

    Information Processing Society of Japan

    researchmap

  • Acoustical Society of Japan   Reviewer, Editorial Committee; Secretary, Technical Committee on Speech  

    2005 - 2007   

      More details

    Committee type:Academic society

    Acoustical Society of Japan

    researchmap

Papers

MISC

  • MITSuME望遠鏡画像に対する深層学習を用いた突発天体検知システムの構築

    伊藤尚泰 (Naohiro Ito), 村田勝寛 (Katsuhiro Murata), 細川稜平 (Ryohei Hosokawa), 笹田真人 (Mahito Sasada), 庭野聖史 (Masafumi Niwano), 谷津陽一 (Yoichi Yatsu), 河合誠之 (Nobuyuki Kawai), 篠田浩一 (Koichi Shinoda), 井上中順 (Nakamasa Inoue), 伊藤亮介 (Ryosuke Itoh), 下川辺隆史 (Takashi Shimokawabe)

    日本天文学会2022年秋季年会講演予稿集   2022.9

     More details

    Language:Japanese   Publisher:公益社団法人 日本天文学会  

    identifier:oai:t2r2.star.titech.ac.jp:50636760

    CiNii Research

    researchmap

  • MITSuME望遠鏡画像に対する深層学習を用いた突発天体検知システムの構築

    伊藤尚泰, 村田勝寛, 細川稜平, 笹田真人, 庭野聖史, 谷津陽一, 河合誠之, 篠田浩一, 井上中順, 伊藤亮介, 下川辺隆史

    日本天文学会年会講演予稿集   2022   2022

  • 深層学習を用いたMITSuME望遠鏡画像からの突発天体検知(2)

    飯田康太, 谷津陽一, 村田勝寛, 橘優太朗, 河合誠之, LONG Yan, 篠田浩一, 井上中順, 下川辺隆史

    日本天文学会年会講演予稿集   2020   2020

  • q-Gaussian Mixture Models for Video Semantic Indexing

    Nakamasa Inoue, Koichi Shinoda

    IPSJ SIG Notes. CVIM   2012 ( 5 )   1 - 6   2012.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Gaussian mixture models (GMMs), which extend the bag-of-visual-words (BoW) approach to a probabilistic framework, have proved effective for image and video semantic indexing. Recently, the q-Gaussian distribution, which is derived in non-extensive statistics, has been shown to be useful for representing patterns in many complex systems in physics, such as fractals and cosmology. We propose q-Gaussian mixture models (q-GMMs), which are mixture models of q-Gaussian distributions, for image and video semantic indexing. The q-Gaussian distribution has a parameter q that controls its tail-heaviness. The long-tailed distributions obtained for q > 1 are expected to represent complexly correlated data effectively and hence to improve robustness against outliers. In our experiments, the proposed method outperformed the BoW method and achieved 49.4% and 10.9% in Mean Average Precision on the PASCAL VOC 2010 dataset and the TRECVID 2010 Semantic Indexing dataset, respectively.

    CiNii Books

    researchmap

  • Speech modeling using committee-based active and semi-supervised learning

    Takuya Tsutaoka, Koichi Shinoda

    IPSJ SIG Notes   2012 ( 22 )   1 - 6   2012.1

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a combination of active learning and semi-supervised learning using a committee-based approach for large-vocabulary continuous speech recognition. In this method, utterances for manual or automatic transcription are selected according to disagreements among the recognition results obtained from multiple recognizers. Our method was evaluated using simulated speech data in the Corpus of Spontaneous Japanese. The proposed method achieved higher recognition accuracy with lower transcription costs than random sampling. We also investigate the data selection criterion in semi-supervised learning.

    CiNii Books

    researchmap

  • Analysis of spectral reduction in noisy speech and its application to noise robust speech recognition

    41 ( 2 )   117 - 122   2011.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Active learning using multiple recognizers for speech recognition

    Hamanaka Yuzo, Emori Tadashi, Koshinaka Takafumi, Shinoda Koichi, Furui Sadaoki

    IEICE technical report   109 ( 355 )   19 - 23   2009.12

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    researchmap

  • Active learning using multiple recognizers for speech recognition

    HAMANAKA YUZO, EMORI TADASHI, KOSHINAKA TAKAFUMI, SHINODA KOICHI, FURUI SADAOKI

    2009 ( 4 )   1 - 5   2009.12

     More details

  • Automatic recognition of Indonesian declarative questions and statements using polynomial coefficients of the pitch contours

    Nazrul Effendy, Koichi Shinoda, Sadaoki Furui, Somchai Jitapunkul

    Acoustical Science and Technology   30 ( 4 )   249 - 256   2009

     More details

  • 確率モデルによる音声認識のための話者適応化技術(サーベイ論文)

    篠田 浩一

    電子情報通信学会論文誌   J87-D-Ⅱ ( 2 )   371 - 386   2009

     More details

  • Speaker Selection for Unsupervised Speaker Adaptation based on HMM Sufficient Statistics

    TANI Masahiro, EMORI Tadashi, OHNISHI Yoshifumi, KOSHINAKA Takafumi, SHINODA Koichi

    IPSJ SIG Notes   2007 ( 129 )   85 - 89   2007.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a new speaker selection method for unsupervised speaker adaptation based on HMM sufficient statistics. Adaptation using HMM sufficient statistics has been proposed as a rapid unsupervised speaker adaptation technique for speech recognition. The procedure is as follows: first, the training speakers acoustically close to the test speaker are selected; then, the acoustic model is trained using the HMM sufficient statistics of these selected training speakers. In this technique, the number of selected training speakers is always constant. In our proposed speaker selection method, the number of speakers is determined by the distances between the test speaker and each training speaker. In our recognition experiments using spoken dialogue data, the proposed method improved word accuracy by 0.74 points. The proposed method was confirmed to be particularly effective when there are not many training speakers near the test speaker in acoustic space.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00056768/

  • Efficient estimation method of scaling factors among probabilistic models in speech recognition

    EMORI Tadashi, ONISHI Yoshifumi, SHINODA Koichi

    IPSJ SIG Notes   2007 ( 129 )   49 - 53   2007.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a new efficient method for estimating scaling factors among probabilistic models in speech recognition. Most speech recognition systems consist of more than one model, including an acoustic model and a language model, and require scaling factors to balance the probabilities among them. The scaling factors are conventionally optimized in preliminary recognition tests using development data. In our proposed method, the scaling factors are regarded as parameters of a log-linear model and are estimated by a gradient-ascent method based on the maximum a posteriori probability criterion. Posterior probabilities are computed using word lattices generated by a speech recognizer. We employ an iterative technique that repeats a word-lattice generation and scaling-factor estimation process, so that the resulting scaling factor estimates are robust to changes in the initial values. In an experimental evaluation of our method on LVCSR using Japanese dialogue speech data, the estimated scaling factors were nearly identical to the optimal values obtained by a greedy grid search. We also confirmed that the estimated scaling factors changed little with variations in the initial values.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00056762/

  • Robust speech recognition using factorial HMMs for home environments

    Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

    EURASIP Journal on Advances in Signal Processing   2007 ( 20593 )   2007

     More details

  • Using presentation slide information for lecture speech recognition

    YAMAZAKI Hiroki, IWANO Koji, SHINODA Koichi, FURUI Sadaoki, YOKOTA Haruo

    IPSJ SIG Notes   2006 ( 136 )   221 - 226   2006.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a dynamic language model adaptation method for lecture speech recognition that uses the text on lecture slides. The speech data corresponding to each slide are recognized with a language model adapted using the slide text as adaptation data. We evaluated the proposed method on speech data from three classroom courses in Japanese and confirmed its effectiveness. Global adaptation using all the slides in a course reduced the average speech recognition error by 3.1%. The error rates of recall and precision for keywords were also reduced by 21.5% and 13.8%, respectively. Furthermore, local adaptation using each slide improved keyword detection performance: the error rates of recall and precision for keywords were reduced by a further 3.1% and 1.4%, respectively, relative to global adaptation.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00056867/

  • Robust scene extraction using multi-stream HMMs for baseball broadcast

    Nguyen Huu Bach, Koichi Shinoda, Sadaoki Furui

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E89D ( 9 )   2553 - 2561   2006.9

     More details

  • Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast

    Nguyen Huu Bach, Koichi Shinoda, Sadaoki Furui

    IEICE Transactions on Information and Systems   E89-D ( 9 )   2553 - 2561   2006

  • 音声情報処理技術の最先端: 2.統計的手法を用いた音声モデリングの高度化とその音声認識への応用

    篠田 浩一, 篠崎 隆宏

    情報処理学会学会誌   45 ( 10 )   1012 - 1019   2004.10

     More details

    Language:Japanese   Publisher:情報処理学会  

    researchmap

  • 音声情報処理技術の最先端: 2.統計的手法を用いた音声モデリングの高度化とその音声認識への応用

    篠田浩一, 篠崎隆宏

    情報処理   45 ( 10 )   1012 - 1019   2004

     More details

  • 確率モデルによる多声音楽演奏のMIDI信号のリズム認識

    武田晴登, 篠田浩一, 嵯峨山茂樹

    情報処理学会論文誌   45 ( 3 )   670 - 679   2004

     More details

  • 確率モデルによる音声認識のための話者適応化技術

    篠田浩一

    電子情報通信学会論文誌D-II   J87 ( 2 )   371 - 386   2004

     More details

  • Acoustic modeling using word contexts for spontaneous speech recognition

    ISOGAWA Kenzo, SHINODA Koichi, SAGAYAMA Shigeki

    IPSJ SIG Notes   2002 ( 121 )   111 - 116   2002.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we study state clustering using word contexts for speech recognition. In spontaneous speech, poorly articulated words often cause recognition errors. To improve recognition performance, we add two kinds of questions to phonetic decision-tree-based state clustering. One is a question about parts of speech, and the other is a question about the position of phones within a word. To apply the question about parts of speech, we classify parts of speech into two classes based on word durations estimated from a corpus of spontaneous speech. After making HMMs for each class, we carry out state clustering using a context decision tree with questions about the classes. To apply the question about the position of phones within a word, we separately make HMMs for phones at the beginning of a word, for phones at the end of a word, and for phones at other positions. Then we carry out state clustering using a context decision tree with questions about phone positions. We carried out speech recognition experiments using the CSJ (Corpus of Spontaneous Japanese). In the best case, word accuracy improved by 2.4 points with the former method and by 6.1 points with the latter.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00057285/

  • Acoustic modeling using word contexts for spontaneous speech recognition

    ISOGAWA Kenzo, SHINODA Koichi, SAGAYAMA Shigeki

    IEICE technical report. Speech   102 ( 529 )   111 - 116   2002.12

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    In this paper, we study state clustering using word contexts for speech recognition. In spontaneous speech, poorly articulated words often cause recognition errors. To improve recognition performance, we add two kinds of questions to phonetic decision-tree-based state clustering. One is a question about parts of speech, and the other is a question about the position of phones within a word. To apply the question about parts of speech, we classify parts of speech into two classes based on word durations estimated from a corpus of spontaneous speech. After making HMMs for each...

    CiNii Books

    researchmap

  • A Rhythm Recognition Method using Rhythm Vectors

    TAKEDA Haruto, SHINODA Koichi, SAGAYAMA Shigeki

    IPSJ SIG Notes   2002 ( 63 )   23 - 28   2002.7

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    This paper proposes a rhythm recognition method for MIDI signals performed on a MIDI keyboard. A usual way of automatically transcribing MIDI signals is to play the MIDI keyboard with a metronome so as to perform at a constant tempo, and to quantize the note durations at a resolution given by the user. The new method proposed in this paper, however, does not require the performer to follow the beats of a metronome and can recognize rhythm patterns for automatic transcription. We define the ratio of note durations as a new feature, the "Rhythm vector". Rhythm Vector and tempo variation are integrated in Hidden Mar...

    J-GLOBAL

    researchmap

  • A structural Bayes approach to speaker adaptation

    K Shinoda, CH Lee

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   9 ( 3 )   276 - 287   2001.3

     More details

  • 音声認識のための高速最ゆう推定を用いた声道長正規化

    江森 正, 篠田浩一

    電子情報通信学会論文誌   Vol. J83-D-II ( No. 11 )   2108 - 2117   2000.11

     More details

    Language:Japanese  

    researchmap

  • MDL-based context-dependent subword modeling for speech recognition

    SHINODA K.

    J. Acoust. Soc. Jpn. (E)   21 ( 2 )   79 - 86   2000

     More details

  • MDL-based context-dependent subword modeling for speech recognition

    K. Shinoda, T. Watanabe

    Journal of the Acoustical Society of Japan (E)   21 ( 2 )   79 - 86   2000

     More details

  • 音声認識における自律的なモデル複雑度制御を用いた話者適応化

    篠田浩一, 渡辺隆夫

    電子情報通信学会和文論文誌D-II   Vol. J79-D-II ( No. 12 )   2054   1996.12

     More details

    Language:Japanese  

    researchmap

  • 音声認識のためのスペクトル内挿を用いた話者適応化

    篠田浩一, 磯健一, 渡辺隆夫

    電子情報通信学会論文誌A   Vol. J77-A ( No. 2 )   120 - 127   1994.2

     More details

    Language:Japanese  

    researchmap

Presentations

  • Speech recognition using tree-structured probability density function

    ICSLP-94  1994 

     More details

  • Unsupervised and incremental speaker adaptation under adverse environmental conditions

    ICSLP-96  1996 

     More details

  • Acoustic modeling based on the MDL criterion for speech recognition

    EuroSpeech-97  1997 

     More details

  • Structural MAP speaker adaptation using hierarchical priors

    IEEE Workshop on Speech Recognition and Understanding  1997 

     More details

  • Unsupervised adaptation using structural Bayes approach

    ICASSP-98  1998 

     More details

  • Rapid Vocal Tract Length Normalization using Maximum Likelihood Estimation

    EuroSpeech2001  2001 

     More details

  • Analytic Methods for Acoustic Model Adaptation: A Review

    ISCA ITR-Workshop2001  2001 

     More details

  • Robust Acoustic Modeling for Speech Recognition

    電子情報通信学会 音声研究会  2004 

     More details

  • 隠れマルコフモデルを用いた野球放送の自動的インデクシング

    電子情報通信学会 技術研究報告  2004 

     More details

  • 隠れマルコフモデルを用いた野球放送の自動インデキシング

    電子情報通信学会 パターン認識・メディア理解研究会  2004 

     More details

  • A study of noise discrimination for personal robots

    日本音響学会 2004年秋季講演  2004 

     More details

  • A study of noise discrimination for personal robots

    2004 

     More details

  • 手書き文字の準同期入力を併用した音声認識手法の予備検討

    電子情報通信学会 2004年総合大会  2004 

     More details

  • 動的特徴量を用いたHMMによる連続動作認識

    電子情報通信学会 2004年総合大会  2004 

     More details

  • パーソナルロボット向けの家庭内雑音に頑健な音声認識の検討

    日本音響学会 2003年秋季講演  2003 

     More details

  • パーソナルロボット向けの家庭内雑音に頑健な音声認識の検討

    日本音響学会 2003年秋季研究発表会  2003 

     More details

  • Robust Acoustic Modeling for Speech Recognition

    2004 

     More details

  • Robust Highlight Extraction Using Multi-Stream Hidden Markov Models for Baseball Video

    International Conference on Image Processing 2005(ICIP 2005)  2005 

     More details

  • Recognition of speech in non-stationary noise using Factorial HMMs

    日本音響学会 2005年秋季講演  2005 

     More details

  • 音声と手書き文字の同時入力によるインタフェースの検討

    日本音響学会 2005年秋季講演  2005 

     More details

  • Robust highlight extraction using multi-stream Hidden Markov Models for baseball video

    International Conference on Image Processing 2005 (ICIP2005),  2005 

     More details

  • Recognition of speech in non-stationary noise using factorial HMMs

    2005 

     More details

  • 音声と手書き文字の同時入力によるインターフェースの検討

    日本音響学会2005年秋季研究発表会  2005 

     More details

  • 音声と手書き文字の同時入力インターフェース

    情報処理学会 音声言語情報処理研究会  2005 

     More details

  • Noise discrimination using models with different structures

    日本音響学会 2005年春季講演  2005 

     More details

  • 弁別素性のグラフィカルモデリングによる音声認識

    日本音響学会 2005年春季講演  2005 

     More details

  • Model optimization for noise discrimination in home environment

    Symposium on Large-Scale Knowledge Resources(LKR2005)  2005 

     More details

  • Scene recognition using Hidden Markov Models for video database

    Symposium on Large-Scale Knowledge Resources(LKR2005)  2005 

     More details

  • 隠れマルコフモデルを用いた野球放送の自動的インデクシング

    画像の認識・理解シンポジウム(MIRU2005)  2005 

     More details

  • 隠れマルコフモデルとMLLRによるゲーム適応を用いた野球放送の自動的インデクシング

    第11回画像センシングシンポジウム 講演論文集  2005 

     More details

  • 隠れマルコフモデルとMLLRによるゲーム適応を用いた野球放送の自動インデクシング

    第11回 画像センシングシンポジウム  2005 

     More details

  • 音声と手書き文字の同時入力インターフェース

    情報処理学会 研究報告  2005 

     More details

  • Speaker adaptation with autonomous model complexity control by MDL principle

    ICASSP-96  1996 

     More details

  • Automatically Estimating Number of Scenes for Rushes Summarization

    TRECVID BBC Rushes Summarization Workshop (TVS 2008) at ACM Multimedia  2008 

     More details

  • Improvement of eigenvoice-based speaker adaptation by parameter space clustering

    INTERSPEECH2008  2008 

     More details

  • Robust spoken term detection using combination of phone-based and word-based recognition

    INTERSPEECH2008  2008 

     More details

  • Time-lag Adaptation for Semi-synchronous Speech and Pen Input

    INTERSPEECH2008  2008 

     More details

  • Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transformation

    2008 

     More details

  • Automatic Score Scene Detection for Baseball Video

    Symposium on Large-Scale Knowledge Resources(LKR2008)  2008 

     More details

  • Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment

    2008 

     More details

  • 音声認識のための複数の認識器を利用した能動学習

    情報処理学会研究報告  2009 

     More details

  • SIFT混合ガウス分布と音響特徴を用いた映像からの高次特徴検出

    電子情報通信学会 技術研究報告  2009 

     More details

  • TITGT at TRECVID 2009 Workshop

    TRECVID Workshop (TRECVID 2009)  2009 

     More details

  • Robust Speech Recognition In The Car Environment

    the 4th Language and Technology Conference (LTC'09)  2009 

     More details

  • Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transform

    Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference  2009 

     More details

  • 音声認識のためのコミッティを用いた能動学習

    日本音響学会秋季研究発表会  2009 

     More details

  • Speaker Adaptation Based on Two-Step Active Learning

    INTERSPEECH 2009 BRIGHTON  2009 

     More details

  • Online speaker clustering using incremental learning of an ergodic hidden markov model

    IEEE ICASSP 2009  2009 

     More details

  • Independent component analysis for noisy speech recognition

    ICASSP 2009  2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • 能動的な適応文選択に基づく話者適応化

    日本音響学会 2009年 春季研究発表会  2009 

     More details

  • ハフ変換による基本周波数情報を用いた耐雑音音声認識の高性能化の検討

    日本音響学会 2009年 春季研究発表会  2009 

     More details

  • 統計的モデル選択によるシーン数の自動推定を用いた動画要約

    電子情報通信学会 技術研究報告  2009 

     More details

  • CHLAC特徴と隠れマルコフモデルを用いたGait認識

    電子情報通信学会 技術研究報告  2009 

     More details

  • 耐雑音音声認識のためのハフ変換による基本周波数情報抽出の高速化

    電子情報通信学会 技術研究報告  2009 

     More details

  • ToFカメラによる3D手話認識

    画像の認識・理解シンポジウム  2010 

     More details

  • NIST SRE 2010:Tokyo Tech Speaker Recognition

    NIST 2010 Speaker recognition evaluation workshop  2010 

     More details

  • Gait Recognition Using CHLAC Features and Hidden Markov Model

    IEICE Technical Report  2009 

     More details

  • Family Adaptation of Factorial HMMs for Personal Robots

    2006 

     More details

  • Home-Environment Adaptation of Phoneme Factorial Hidden Markov Models

    Poznan, Poland  2007 

     More details

  • Family adaptation of Factorial HMMs for personal robots

    日本音響学会 2006年春季講演  2006 

     More details

  • Robust scene recognition for baseball broadcast

    International Symposium on Large-Scale Knowledge Resources(LKR2006)  2006 

     More details

    Presentation type:Poster presentation  

    researchmap

  • FHMM for Robust Speech Recognition in Home Environment

    International Symposium on Large-Scale Knowledge Resources(LKR2006)  2006 

     More details

    Presentation type:Poster presentation  

    researchmap

  • 十分統計量を用いた教師なし話者適応における話者選択法

    電子情報通信学会 技術研究報告  2007 

     More details

  • 音声認識における確率モデルの重み係数の自動推定

    電子情報通信学会 技術研究報告  2007 

     More details

  • 数値列化したイベントシーンの学習と試合進行状況情報による制約条件を用いた野球映像イベント識別

    電子情報通信学会 技術研究報告  2007 

     More details

  • An Interface Using Semi-synchronous Speech and Pen Input

    IJARC(Microsoft)-Tokyo Institute of Technology Joint Symposium on "The forefront of the Speech Recognition Research"  2007 

     More details

  • TokyoTech's TRECVID2007 Notebook

    TRECVID 2007 Workshop  2007 

     More details

  • ハイブリッドモデルに基づく単視点ビデオデータにおける人間の歩行動作のトラッキング

    電子情報通信学会 技術研究報告  2007 

     More details

  • Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition

    INTERSPEECH 2007  2007 

     More details

  • Automatic Estimation of Scaling Factors Among Probabilistic Models in Speech Recognition

    INTERSPEECH2007  2007 

     More details

  • Predictive Minimum Bayes Risk Classification for Robust Speech Recognition

    INTERSPEECH2007  2007 

     More details

  • 投球の次ショットに重きを置いたシーンのパターン化と離散隠れマルコフモデルを用いた野球放送映像の自動イベント分類

    映像情報メディア学会誌  2007 

     More details

  • 多段SVMを用いた頑健な動画ショット境界検出

    画像の認識・理解シンポジウム(MIRU 2007)IS-2-19  2007 

     More details

  • A Robust Scene Recognition System for Baseball Broadcast Using Data-Driven Approach

    CIVR2007, Amsterdam, The Netherlands  2007 

     More details

  • 時系列データに対するデータ駆動型アプローチに基づく野球放送の頑健なシーン認識

    画像の認識・理解シンポジウム(MIRU 2007)IS-1-17  2007 

     More details

  • 「野球放送のためのデータ駆動型アプローチを用いた得点シーン検出」

    第13回 画像センシングシンポジウム 予稿集  2007 

     More details

  • Speech Recognition Using FHMMs Robust against Nonstationary Noise

    ICASSP 2007  2007 

     More details

  • Speech Recognition Using FHMMs Robust against Nonstationary Noise

    IEEE ICASSP 2007  2007 

     More details

  • Semi-Synchronous Speech and Pen Input

    ICASSP 2007  2007 

     More details

  • スライド資料を用いた講義音声認識のための言語モデル適応

    2007年春季講演論文集  2007 

     More details

  • Noise discrimination using models with different structures

    2005 

     More details

  • 弁別素性のグラフィカルモデリングによる音声認識

    音声音響学会2005年春季研究発表会  2005 

     More details

  • Model optimization for noise discrimination in home environment

    Symposium on Large-Scale Knowledge Resources (LKR2005)  2005 

     More details

  • 動画像インデクシングのためのシーン時系列の確率的言語モデル

    第12回 画像センシングシンポジウム  2006 

     More details

    Presentation type:Poster presentation  

    researchmap

  • ビデオ画像における人間の歩行動作の3次元トラッキング

    電子情報通信学会 パターン認識・メディア理解研究会  2006 

     More details

  • Towards optimal bayes decision for speech recognition

    ICASSP2006  2006 

     More details

  • Use of lecture slide information in lecture speech recognition

    IEICE  2006 

  • Multimedia Information Retrieval Using Pattern Recognition Techniques

    IJARC 2nd Symposium  2006 

  • Robust Scene Recognition Using Language Models

    MIR 2006, ACM Workshop 2006  2006 

  • Multimodal recognition for semi-synchronous speech and pen input

    Acoustical Society of Japan, 2006 Autumn Meeting  2006 

  • A study of recognition methods for simultaneous speech and pen input

    IEICE Technical Committee on Speech  2006 

  • Family Adaptation of Factorial HMMs for Personal Robots

    2006 

  • Family adaptation of Factorial HMMs for personal robots

    Acoustical Society of Japan, 2006 Spring Meeting  2006 

  • Speech recognition by dynamic Bayesian networks using fundamental frequency information

    IEICE Technical Committee on Speech  2006 

  • Speech recognition by graphical modeling of fundamental frequency information

    Acoustical Society of Japan, 2006 Spring Meeting  2006 

  • Robust scene recognition for baseball broadcast

    International Symposium on Large-Scale Knowledge Resources(LKR2006)  2006 

    Presentation type: Poster presentation

  • FHMM for Robust Speech Recognition in Home Environment

    International Symposium on Large-Scale Knowledge Resources(LKR2006)  2006 

    Presentation type: Poster presentation

  • Scene recognition using acoustic information for baseball broadcasts

    Proceedings of the Acoustical Society of Japan 2006 Spring Meeting  2006 

  • Speaker adaptation for demi-syllable based speech recognition using continuous HMM

    ICSLP-90  1990 

  • Speaker Adaptation for Demi-Syllable-Based Continuous-Density HMM

    ICASSP-91  1991 

  • Speech recognition using tree-structured probability density function

    ICSLP-94  1994 

  • Unsupervised speaker adaptation for speech recognition using demi-syllable HMM

    ICSLP-94  1994 

  • High speed speech recognition using tree-structured probability density function

    ICASSP-95  1995 

  • Speaker adaptation with autonomous control using tree structure

    EuroSpeech-95  1995 

  • Speaker adaptation with autonomous model complexity control by MDL principle

    ICASSP-96  1996 

  • Unsupervised and incremental speaker adaptation under adverse environmental conditions

    ICSLP-96  1996 

  • Home-Environment Adaptation of Phoneme Factorial Hidden Markov Models

    Poznan, Poland  2007 

  • Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition

    INTERSPEECH 2007  2007 

  • Automatic Estimation of Scaling Factors Among Probabilistic Models in Speech Recognition

    INTERSPEECH2007  2007 

  • Predictive Minimum Bayes Risk Classification for Robust Speech Recognition

    INTERSPEECH2007  2007 

  • A Robust Scene Recognition System for Baseball Broadcast Using Data-Driven Approach

    CIVR2007, Amsterdam, The Netherlands  2007 

  • Speech Recognition Using FHMMs Robust against Nonstationary Noise

    ICASSP 2007  2007 

  • Robust Scene Recognition Using Scene Context Information for Video Contents

    International Symposium on Large-Scale Knowledge Resources(LKR2007)  2007 

  • Comparative Study on Robust Speech Recognition against Nonstationary Noise in the Home Environment

    Symposium on Large-Scale Knowledge Resources(LKR2007)  2007 

  • Presentation Scene Retrieval Exploiting Features in Videos Including Pointing and Speech Information

    Symposium on Large-Scale Knowledge Resources(LKR2007)  2007 

  • Tokyo Tech at TRECVID 2008

    TRECVID Workshop (TRECVID 2008)  2008 

  • Automatically Estimating Number of Scenes for Rushes Summarization

    TRECVID BBC Rushes Summarization Workshop (TVS 2008) at ACM Multimedia  2008 

  • Improvement of eigenvoice-based speaker adaptation by parameter space clustering

    INTERSPEECH2008  2008 

  • Robust spoken term detection using combination of phone-based and word-based recognition

    INTERSPEECH2008  2008 

  • Time-lag Adaptation for Semi-synchronous Speech and Pen Input

    INTERSPEECH2008  2008 

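  • Noise-robust speech recognition using fundamental frequency information obtained by spectral subtraction and the Hough transform

    Acoustical Society of Japan Autumn Meeting  2008 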

  • Automatic Score Scene Detection for Baseball Video

    Symposium on Large-Scale Knowledge Resources(LKR2008)  2008 

  • Lecture speech retrieval robust to word recognition errors using continuous phoneme recognition

    Acoustical Society of Japan Spring Meeting  2008 

  • Improvement of eigenvoice speaker adaptation by parameter space clustering

    Acoustical Society of Japan Spring Meeting  2008 

  • Adaptation to individual differences in simultaneous speech and pen input

    Acoustical Society of Japan Spring Meeting  2008 

  • High-level feature extraction from video using tree-structured clustering

    IEICE Technical Report, no. 491  2008 

  • High-level feature extraction from video using tree-structured clustering

    IEICE Technical Report  2008 

  • Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment

    2008 

  • An Interface Using Semi-synchronous Speech and Pen Input

    IJARC(Microsoft)-Tokyo Institute of Technology Joint Symposium on "The forefront of the Speech Recognition Research"  2007 

  • TokyoTech's TRECVID 2007 Notebook

    TRECVID 2007 Workshop  2007 

Works

  • Study of speech recognition

    2002

    Work type: Artistic work

Awards

  • Excellent Paper Award from the Institute of Electronics, Information, and Communication Engineers

    1998  

    Country: Japan

  • Awaya Prize from the Acoustical Society of Japan

    1997  

    Country: Japan

  • Technical Development Award from the Acoustical Society of Japan

    1995  

    Country: Japan

Research Projects

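  • Compositional pattern recognition and understanding using deep generative models

    Grant number:23H00490  2023.4 - 2026.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (A)

    SHINODA Koichi, INOUE Nakamasa, YOKOTA Rio, KAWAKAMI Rei, SATO Ikuro

    Grant amount: ¥47,190,000 (Direct Cost: ¥36,300,000, Indirect Cost: ¥10,890,000)

    This project treats the object to be classified (an instance) as a set (bundle) of attributes and decomposes its features attribute by attribute in the feature space. By optimizing these attribute features in the course of resynthesizing the instance from them, we aim to realize a classification method that identifies each attribute with high accuracy and is robust to outliers. To this end, we develop learning methods based on deep generative models and dense attribute annotation. Whereas most previous studies assumed a flat semantic structure in which an object and its attribute correspond one-to-one, this project makes it possible to identify multiple attributes simultaneously in objects where many attributes are intricately intertwined. The emergence of new attributes and classes is also within scope. More specifically, we establish a methodology for compositional pattern recognition and understanding through a deep-learning-based "classification by synthesis" approach. We evaluate it across three tasks (human action recognition, speaker and emotion recognition, and multimodal recognition), aiming at higher classification performance than conventional methods. In this first year, we constructed evaluation databases and developed baseline methods for each of human action recognition, speaker and emotion recognition, and multimodal recognition. In parallel, on relatively small-scale tasks, we developed methods that perform classification using generative models such as diffusion models. We also developed methods for training generative models efficiently using techniques such as neural architecture search. In particular, we built a basic method and a database for multimodal recognition of sensor and video data, developed a basic method for human gait recognition, and developed a basic method for multimodal emotion recognition.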

  • Decoding of Imagined Speech from Minimally Invasive EEG for Intentional BMI

    Grant number:23H00548  2023.4 - 2026.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (A)

    Grant amount: ¥46,930,000 (Direct Cost: ¥36,100,000, Indirect Cost: ¥10,830,000)

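  • Functionally differentiated deep learning that clarifies the limits of knowledge

    Grant number:22H03642  2022.4 - 2025.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SATO Ikuro, KAWAKAMI Rei, INOUE Nakamasa, SHINODA Koichi

    Grant amount: ¥17,420,000 (Direct Cost: ¥13,400,000, Indirect Cost: ¥4,020,000)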

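  • Functionally differentiated deep learning that clarifies the limits of knowledge

    Grant number:23K24898  2022.4 - 2025.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SATO Ikuro, SHINODA Koichi, INOUE Nakamasa, KAWAKAMI Rei

    Grant amount: ¥17,420,000 (Direct Cost: ¥13,400,000, Indirect Cost: ¥4,020,000)

    On the synaptic memory theme, we constructed a theoretical framework that can resolve the problems of conventional methods. A recurrent modern Hopfield network can associate an input query with the memory data held inside the model, but it makes wrong associations when the query is out-of-distribution data. To address this, we formulated a method that solves the problem in principle by adding the ability to judge whether a query is in or out of distribution. A paper is currently being written.
    On the generative prediction of human behavior theme, we developed an algorithm that can use data based on different human body models jointly for training, and submitted a paper to the international conference ECCV (under review). We confirmed that this method can learn from datasets with different definitions of the human body model and generates more natural behavior.
    On the 3D understanding under viewpoint change theme, we made a novel proposal, cooperative inference by a generator and a regressor, and confirmed a performance improvement over conventional methods that use only a regressor. We submitted a paper to the international conference ICIP (under review). We were able to show that functionally differentiated models (multiple networks optimized with different objective functions) improve downstream task performance by working cooperatively.
    On the time-series consistency judgment theme, we identified the problem that the features of existing recognizers for autonomous driving contain components that break consistency.
    On the target propagation theme, as an improvement over conventional methods, we proposed a method that enforces consistency between the Jacobians of the forward and backward networks (Y. Bao et al., AAAI 2024).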

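  • Construction of a survey robot for transient detection using machine learning

    Grant number:20K04011  2020.4 - 2023.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    MURATA Katsuhiro, YATSU Yoichi, SHINODA Koichi, INOUE Nakamasa, SHIMOKAWABE Takashi

    Grant amount: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)

    This year's results fall broadly into the following two points.
    (a) Installation of a wide-field telescope
    In the first half of the year we evaluated the performance of two CMOS cameras for amateur astronomy that were under consideration for the wide-field telescope. Laboratory tests and test observations on the university rooftop confirmed that one of them has the performance required for research-grade astronomical observation. Based on this, in October we installed a 20 cm aperture wide-field telescope in the MITSuME telescope dome in Asakuchi, Okayama Prefecture, and began observations. We first carried out test observations with an SDSS g-band filter for gravitational-wave follow-up and confirmed that a limiting magnitude of 17.5 mag is reached with a total integration of 1000 seconds. We also developed software to control the telescope's equatorial mount and the CMOS camera imaging, adding the ability to observe automatically from a target list prepared in advance. Because the gravitational-wave telescopes were not observing this year, we mainly observed objects within the Galaxy.
    (b) Development of the transient detection system
    The transient detection system consists of primary processing of the observed images and a transient classification system using deep learning. By porting the pipeline of the MITSuME optical telescope that we operate, we automated image processing and photometry for the wide-field telescope installed in Okayama this year. For the transient classification system, we attempted to improve the deep-learning-based classification method developed so far in order to raise classification accuracy. We also set up a server and developed a set of scripts toward actual operation.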

  • Implementation of Intentional BMI through Large-Scale EEG Data and Calibration-Free Model Construction

    Grant number:20H00235  2020.4 - 2023.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (A)

    Grant amount: ¥46,020,000 (Direct Cost: ¥35,400,000, Indirect Cost: ¥10,620,000)

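  • Research on improving the robustness of speaker verification against spoofing attacks by voice imitation

    Grant number:19K12051  2019.4 - 2023.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    IWANO Koji, SHINODA Koichi

    Grant amount: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)

    This research examines countermeasures against "spoofing attacks by voice imitation," with a view to the practical use of voice-based personal authentication (speaker verification). Past studies suggest that the characteristics of imitated voices, and the reasons spoofing succeeds, differ with the impersonator's skill. We therefore seek to clarify that mechanism and, based on the findings, to propose effective countermeasures against voice imitation attacks.
    In fiscal 2021, we newly built and introduced a deep-learning-based speaker verification system and analyzed how well it defends against imitation attacks by highly skilled impersonators. Even with this deep-learning-based system, imitation by highly skilled impersonators raised the spoofing success rate substantially more than imitation by less skilled impersonators, confirming its strong attack power. This shows that simply introducing deep learning is insufficient as a countermeasure against spoofing by voice imitation.
    One possible countermeasure is to collect imitated voices from highly skilled impersonators and use them in training as utterances that are not those of the (imitated) target speaker, thereby improving the discrimination performance of the speaker verification system. However, it is not realistic to ask highly skilled impersonators (such as professional impersonation entertainers) for large amounts of imitated speech. We therefore consider artificially generating speech equivalent to voice imitation with voice conversion technology, whose performance has improved in recent years, and using it for training. In fiscal 2021, we created spoofing speech with two types of voice conversion technology and examined its attack power against the system, to determine whether its characteristics resemble those of imitated speech by highly skilled impersonators. We confirmed that one of the two voice converters can generate speech with characteristics similar to imitation attacks by highly skilled impersonators.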

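  • Speech factorization by multi-agent deep learning

    Grant number:19H04133  2019.4 - 2022.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SHINODA Koichi, INOUE Nakamasa, IWANO Koji, UTO Kuniaki

    Grant amount: ¥17,420,000 (Direct Cost: ¥13,400,000, Indirect Cost: ¥4,020,000)

    In the previous year we achieved a certain degree of performance improvement on "(A) separation of phonetic information and noise" and "(B) separation of phonetic information and speaker information," so this year we used these results to begin work on the application "(D) source separation." More specifically, building on the waveform-based speech signal processing system we had developed, we first constructed a source separation framework that separates the speech of multiple speakers. Then, based on the results of (A), we added a mechanism for removing noise from the sources, yielding a speech separation system that is robust to noise. We constructed data by superimposing noise on an existing database and used it for evaluation, obtaining significantly higher performance than conventional methods. This work was accepted and presented at IEEE APSIPA 2021, an international conference on signal processing. Furthermore, within "(E) language recognition and emotion recognition," we began with emotion recognition. Here, based on the results of (B), we construct a disentanglement framework that separates phonetic and speaker information from speech, and develop a system that recognizes emotion from the remaining components. This year we built a baseline using an existing database. For disentanglement, we adopted and implemented a method that uses autoencoder-based voice conversion. Evaluation is planned for the final year, using the IEMOCAP database, which is widely used in emotion recognition.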

  • Development of high-precision high-speed transient detection method in wide-field survey utilizing GPU and Deep Learning

    Grant number:16K13783  2016.4 - 2019.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Challenging Exploratory Research

    YATSU YOICHI, KAWAI NOBUYUKI

    Grant amount: ¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000)

    Generic technologies for automating optical/IR telescopes for the upcoming era of gravitational-wave astronomy were developed. Robotic telescopes currently still require human help for assigning observation schedules and for the final confirmation of transient detections. To reduce the delay of follow-up observations, we developed a weather recognition algorithm and a transient detection algorithm using machine learning and image recognition technologies. In addition, we reduced the processing time of the data reduction pipeline to 1/30 by using GPU parallel computing.

  • Multimodal time-sequence data recognition platform based on deep learning

    Grant number:16H02845  2016.4 - 2019.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Shinoda Koichi

    Grant amount: ¥15,990,000 (Direct Cost: ¥12,300,000, Indirect Cost: ¥3,690,000)

    This research aims to accurately recognize multimodal time-sequence signals using deep learning. We applied various deep learning techniques, such as end-to-end training, deep networks that can be trained with a small amount of data, multi-task learning, and noise-robust recognition. In particular, we improved recognition and detection performance in simultaneous training for source separation and speech recognition, dementia detection from speech, multimodal speech recognition using lip reading, and noise-robust speech recognition.

  • Pattern recognition using graph signal processing for large-scale time-sequence data

    Grant number:15K12061  2015.4 - 2018.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Challenging Exploratory Research

    Koichi Shinoda

    Grant amount: ¥3,510,000 (Direct Cost: ¥2,700,000, Indirect Cost: ¥810,000)

    We have developed an action recognition method for RGB-D camera input. The method takes a time sequence of human skeletons as input. For every frame, it extracts features using the spectral graph wavelet transform. The features are then pooled hierarchically along the time axis. This method achieved state-of-the-art performance in multi-view action recognition.

  • Research on robust speaker verification against spoofing attacks by voice imitation

    Grant number:25330206  2013.4 - 2017.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Iwano Koji

    Grant amount: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)

    In this research, we experimentally analyzed the effect of spoofing attacks by voice imitation on speaker verification systems, as well as the acoustic characteristics of imitated voices. The analyses used our original speech data, consisting of voices imitated by professional and non-professional impersonators. The results show that voice imitation by non-professional impersonators significantly increases the success rate of spoofing attacks, and that imitation by the professional impersonator yields an even higher success rate. We also proposed a method for quantitatively evaluating the quality of voice imitation. The method reveals that the professional impersonator efficiently brings his voice characteristics closer to those of the target speaker.

  • Microphone Array Signal Processing with Asynchronous Recording Devices

    Grant number:25280069  2013.4 - 2016.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    ONO Nobutaka, MAKINO Shoji, MIYABE Shigeki, SHINODA Koichi

    Grant amount: ¥17,030,000 (Direct Cost: ¥13,100,000, Indirect Cost: ¥3,930,000)

    Microphone array signal processing is an important technique for estimating the direction of arrival of a sound or enhancing a target sound in a noisy environment by processing multi-channel signals. Because tiny time differences between channels carry important information, the conventional framework requires multi-channel signals to be recorded synchronously. In this study, by contrast, we developed techniques to synchronize recorded signals and to estimate microphone positions without any a priori knowledge, so that asynchronous individual recording devices such as smartphones, laptop PCs, and IC recorders can be used.

  • Speech information processing using deep generative models and their factorization

    Grant number:25280058  2013.4 - 2016.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Shinoda Koichi, IWANO Koji, SHINOZAKI Takahiro

    Grant amount: ¥16,900,000 (Direct Cost: ¥13,000,000, Indirect Cost: ¥3,900,000)

    In speech recognition, it is important to train an accurate deep neural network (DNN) acoustic model from a large amount of speech data from many speakers. In this study, we developed a framework that improves the accuracy of the DNN acoustic model by factorizing speech data into phoneme and speaker elements. First, we developed a speaker recognition method using a deep Siamese network, in which two DNNs share part of their structure. Second, we applied a DNN with a hierarchical phonetic structure to speaker adaptation. Third, we developed a speaker-adaptive training method based on a student-teacher learning framework with soft targets. These methods improved speaker verification and speech recognition performance. We also studied DNN implementation and DNN structure design.

  • Spoken Language Processing Based on Non-Extensive Information Theory

    Grant number:24650079  2012.4 - 2015.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Challenging Exploratory Research

    SHINODA KOICHI

    Grant amount: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)

    We have developed a methodology for spoken language processing based on non-extensive statistical theory, an extension of conventional extensive statistical theory. We first developed q-log spectral subtraction (q-LMSN) to achieve robustness against differences in environmental noise and channels, and showed that it was significantly better than conventional CMN. Next, we developed a recognition method that uses q-Gaussian mixtures for the output probabilities in GMMs and HMMs. We applied it to speech recognition and video semantic indexing and demonstrated its effectiveness.
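    As a rough illustration of the non-extensive building blocks named in this abstract, the sketch below implements the standard Tsallis q-logarithm, its inverse (the q-exponential), and an unnormalized q-Gaussian. The function names, the `beta` scale parameter, and the cutoff convention are assumptions made for this example; the abstract does not give the exact formulation used in the project.

```python
import math


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm; reduces to math.log(x) when q == 1."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_exp(x: float, q: float) -> float:
    """Inverse of q_log; reduces to math.exp(x) when q == 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q < 1.0:
            return 0.0  # usual cutoff convention: compact support for q < 1
        raise ValueError("q_exp is undefined for this argument when q > 1")
    return base ** (1.0 / (1.0 - q))


def q_gaussian_unnormalized(x: float, q: float, beta: float = 1.0) -> float:
    """Unnormalized q-Gaussian exp_q(-beta * x^2); heavy-tailed for q > 1."""
    return q_exp(-beta * x * x, q)
```

    With q = 1 these functions coincide with the ordinary logarithm, exponential, and Gaussian; values of q above 1 give heavier tails than a Gaussian, which is one reason such mixtures can be attractive for noisy data.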

  • Macromolecular Potential Energy Decoder Based on Graphical Model

    Grant number:23650068  2011 - 2013

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Challenging Exploratory Research

    SHINOZAKI Takahiro, SHINODA Koichi, SEKIJIMA Masakazu

    Grant amount: ¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)

    Knowing the tertiary structure of a protein is important for understanding and predicting its function. However, how to predict the tertiary structure from a sequence of amino acids remains an open question. In this project, the Slice Chain Max-Sum (SCMS) algorithm was proposed. The method represents the potential function of a protein molecule as a factor graph, which is a kind of graphical model. The factor graph is converted into a linearly structured one according to a slicing of the molecule in 3D space. On the converted graph, max-sum search is performed in combination with node-wise local MCMC sampling, which approximates continuous variables by discrete ones. Experimental results show that SCMS is more efficient than the conventional MCMC method. An improved version, SCMS2.0, is also shown to outperform an MCMC method reinforced by the quasi-Newton method.

  • Advancement of speech recognition technology using WFST

    Grant number:21300062  2009 - 2011

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    FURUI Sadaoki, SHINODA Koichi, SHINOZAKI Takahiro

    Grant amount: ¥18,070,000 (Direct Cost: ¥13,900,000, Indirect Cost: ¥4,170,000)

    With the aim of improving the performance of automatic speech recognition using a Weighted Finite State Transducer (WFST)-based decoder and developing new applications of the decoder, a wide range of research has been conducted and various results have been obtained. The "T^3 decoder," a speech recognition decoder with the world's highest performance, has been developed by improving the on-the-fly algorithm for the WFST decoder. Recognition performance in noisy environments has been improved by incorporating speech/non-speech information into the decoder. Various new techniques have been developed to apply the decoder to the recognition of resource-deficient languages and code-switching speech, and to transliteration. Innovative ideas have been proposed toward new directions in decoder technology. The T^3 decoder has been released to domestic and overseas research laboratories.

  • A study of multimodal recognition for human communication search

    Grant number:20300063  2008 - 2010

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SHINODA Koichi, FURUI Sadaoki

    Grant amount: ¥17,940,000 (Direct Cost: ¥13,800,000, Indirect Cost: ¥4,140,000)

    We developed multimodal pattern recognition techniques for human communication using speech and video. We proposed a statistical technique using Gaussian mixture models and support vector machines for event extraction. We participated in the TRECVID 2010 workshop, where our method ranked fourth among 40 participants from around the world. We also developed new methods for active learning for speech modeling and adaptation, noise-robust speech recognition, signal processing for meeting speech recognition, multimodal pattern recognition, speaker/gesture recognition, speech style analysis, and video summarization.

  • Systemization of audio-visual knowledge resources using graphical models

    Grant number:17300059  2005 - 2007

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SHINODA Koichi, FURUI Sadaoki

    Grant amount: ¥14,780,000 (Direct Cost: ¥13,700,000, Indirect Cost: ¥1,080,000)

    Recent advances in computer technology, particularly in storage technology, have significantly increased the number and quality of audio-visual knowledge resources. Most of these resources have no index information, so it has become difficult for ordinary people to browse the entire content of each database. Techniques for systemizing and utilizing audio-visual knowledge resources are in strong demand. However, statistical pattern recognition techniques have not yet achieved sufficient performance for this purpose, and it is not always clear what kinds of indexing are useful. In this study, we index these databases in different ways in an unsupervised manner and extract dependencies among the resulting labels. First, we carried out scene recognition for baseball video. We constructed an annotated database of 43 Major League Baseball games with NHK Science & Technical Research Labs and used it for our evaluation. We used various relationships between scene labels, such as scene contexts, and unified audio and visual information. We achieved 60% accuracy for recognition of 16 scenes and a 90% recall rate for score scene detection. Our techniques are expected to contribute to automatic highlight extraction systems for broadcast companies. Second, we participated in the TRECVID workshop organized by NIST, USA, to study the high-level feature extraction task. We constructed tree-structured dictionaries of "visual words" by unsupervised clustering of video features, and selected a tree-cut as the dictionary for each word. Using a bag-of-words approach, we constructed an extraction system that is robust to differences in the amount of data for each feature. We also extracted effective "motion words" for dynamic features. Our method achieved significant improvements in the task of extracting 39 features.
    Other research topics include robust speech recognition using graphical models, a multimodal interface for asynchronous multimodal inputs, and human-gait modeling.
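    The bag-of-words step mentioned in this abstract can be pictured with a minimal sketch: each local feature is assigned to its nearest "visual word" and the video segment is described by the normalized word histogram. This example assumes a small flat codebook with made-up 2-D vectors; the tree-structured dictionaries and tree-cut selection described above are not reproduced, and all names here are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(feature: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry (visual word) closest to the feature."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(feature, codebook[i]))


def bag_of_words(features: List[Vector], codebook: List[Vector]) -> List[float]:
    """Normalized histogram of visual-word assignments."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_word(feature, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data (assumed for illustration): three visual words, four local features.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]
histogram = bag_of_words(features, codebook)
```

    The resulting fixed-length histogram is what a classifier would consume, regardless of how many local features a given segment produced.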

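  • Research on spontaneous speech recognition using graphical modeling

    Grant number:15650028  2003 - 2005

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Exploratory Research

    SHINODA Koichi, FURUI Sadaoki

    Grant amount: ¥2,900,000 (Direct Cost: ¥2,900,000)

    This research aims to extract the intrinsic structure of speech using graphical modeling. The goal is to handle diverse speech phenomena by using models with more degrees of freedom than conventional models based on uniform recognition units, as typified by HMMs. Among graphical models, we are studying the application to speech of dynamic Bayesian networks, which are suited to time-series data. The modeling target is spontaneous speech, and the goal is to improve its recognition performance. Unlike read speech, spontaneous speech generally has a high speaking rate and strong coarticulation (lazy articulation). As a result, speech recognition with conventional hidden Markov models (HMMs) reaches a recognition performance of only 70%, lower than for read speech. Improving this is the challenge.
    In the first year, we characterized speech features in spontaneous speech and prepared graphical modeling tools. In the second year, we used the graphical modeling tools to model coarticulation and evaluated the performance. Focusing on distinctive features, which mainly represent the movements of the articulators, we used graphical modeling to represent each phoneme as a bundle of distinctive features and attempted to improve recognition performance with this model, obtaining a slight improvement. This year, in addition, we sought to improve recognition performance using fundamental frequency (pitch) information. Fundamental frequency information is quantized and correlations between different frames are taken into account, so that the model reflects the voiced/unvoiced distinction and intonation information. The Corpus of Spontaneous Japanese was used in the evaluation experiments to assess performance on spontaneous speech. The experiments showed that the proposed method achieves higher recognition performance with fewer model parameters than conventional methods, confirming its effectiveness.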

  • Pattern recognition and its applications

    2003 - 2005

    Grant type: Competitive

  • SPEECH RECOGNITION WITH SYNCHRONOUS INPUT OF HAND-WRITTEN GESTURES FOR MOBILE DEVICES

    Grant number:15300054  2003 - 2004

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SHINODA Koichi, FURUI Sadaoki

    Grant amount: ¥5,900,000 (Direct Cost: ¥5,900,000)

    Mobile devices are now widely used in daily life, and there is strong demand for user-friendly interfaces with high accuracy. For this purpose, we propose an interface that uses simultaneous input of speech and hand-written gestures. This interface is more robust against environmental noise than a speech-only interface, and its input speed is faster than that of an interface with only hand-written gestures. Our target application is e-mail composition with sentence input.
    In the first year, we proposed an interface in which a sentence is input by speech while the "hiragana" character at the head of each phrase in the sentence is input by hand-written gestures. We implemented a recognition algorithm for hand-written gestures and designed a method for recognizing the simultaneous inputs of the two modes. The proposed method was evaluated in simulation experiments using speech data and hand-written gesture data that were recorded independently, and was shown to be effective.
    In the second year, we constructed a recording system for input in the two modes and recorded 530 sentences from ten subjects. To integrate the two modes, we employed a two-pass process in which a word graph generated by speech recognition in the first pass is used for the integration of the two modes in the second pass. The proposed method improved recognition accuracy by 2.6 points over speech recognition alone.
    For future work, a method for optimizing the weights between the two modes should be developed. We plan to develop a demonstration system that works in real time and to evaluate it in noisy environments.

  • Music Information Processing Using Continuous Speech Recognition Methods

    Grant number:14380156  2002 - 2004

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    SAGAYAMA Shigeki, SHINODA Koichi, TABARU Tetsuya, NISHIMOTO Takuya

    Grant amount: ¥16,900,000 (Direct Cost: ¥16,900,000)

    We formulated music rhythm recognition for transcribing MIDI data into a music score as a Viterbi path search problem in an HMM, where the hidden states and output probabilities represent the intended note values and the actually played note lengths, respectively. We also solved rhythm recognition of polyphonic music by reducing polyphony into monophony. Tempo modeling and tempo change detection were enabled with the segmental k-means algorithm used in speech recognition.
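    The rhythm-recognition formulation above can be sketched as a small Viterbi search. This is a minimal sketch under stated assumptions, not the project's implementation: the tempo is fixed and known, the emission model is a Gaussian on the log-ratio between played and nominal duration, and the transition model simply favors repeating a note value. The state set, `sigma`, and `stay` are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence

# Hidden states: intended note values in beats (assumed set for this sketch).
NOTE_VALUES = [0.25, 0.5, 1.0, 2.0]


def log_emission(duration: float, note_value: float,
                 beat_seconds: float, sigma: float) -> float:
    """Log-density (up to a constant) of a played duration given the intended value."""
    deviation = math.log(duration / (note_value * beat_seconds))
    return -0.5 * (deviation / sigma) ** 2


def log_transition(prev: int, cur: int, stay: float) -> float:
    """Hypothetical rhythm prior: repeating the same note value is more likely."""
    other = (1.0 - stay) / (len(NOTE_VALUES) - 1)
    return math.log(stay if prev == cur else other)


def viterbi_rhythm(durations: Sequence[float], beat_seconds: float = 0.5,
                   sigma: float = 0.2, stay: float = 0.4) -> List[float]:
    """Most likely sequence of intended note values for the played durations."""
    if not durations:
        return []
    n = len(NOTE_VALUES)
    # Uniform initial distribution, so only the emission term matters at t = 0.
    score = [log_emission(durations[0], v, beat_seconds, sigma) for v in NOTE_VALUES]
    backpointers = []
    for d in durations[1:]:
        new_score, pointers = [], []
        for cur in range(n):
            best_prev = max(range(n),
                            key=lambda p: score[p] + log_transition(p, cur, stay))
            pointers.append(best_prev)
            new_score.append(score[best_prev]
                             + log_transition(best_prev, cur, stay)
                             + log_emission(d, NOTE_VALUES[cur], beat_seconds, sigma))
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [NOTE_VALUES[s] for s in path]


# Played note lengths in seconds at 120 bpm (one beat = 0.5 s).
played = [0.52, 0.24, 0.27, 0.36, 1.02]
notes = viterbi_rhythm(played)
```

    In this toy run the fourth note (0.36 s) lies between a half beat and a full beat. Its duration alone is slightly closer to a full beat, but the transition prior pulls the decoded path toward repeating the preceding half-beat value, which is the kind of contextual correction the HMM formulation provides.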
    Harmonization (chord finding) of given melodies was formulated as a problem isomorphic to continuous speech recognition, with the given melody as the output, the chords behind the melody as the hidden states, and chord sequences as the stochastic language model. Automatic counterpoint was developed with a two-step maximum likelihood approach consisting of rhythm design and pitch allocation, solved by dynamic programming.
    In polyphonic signal analysis, an algorithm named Harmonic-structured Clustering was developed. It is based on the k-means clustering algorithm under a harmonic constraint, modeling the framewise observed spectrum as overlapped harmonic structures and treating the energy distributed in one harmonic structure as belonging to a single cluster. Furthermore, by introducing probabilistic assignment to clusters, k-means was generalized into the EM algorithm, which attained higher multi-pitch estimation performance. Using an information criterion such as AIC, estimation of the number of sources and of octave locations was also enabled.
    "Specmurt analysis" was proposed for polyphonic signal analysis. The inverse Fourier transform of the linear spectrum on a log-frequency axis is called "specmurt." Along the log-scaled frequency axis, the observed linear spectrum is regarded as the convolution of the distribution density of fundamental frequencies with the harmonic structure of the tones, which is assumed to be identical across tones. This idea opened up new signal processing capabilities.
