Faculty Profiles - SHINODA KOICHI


SHINODA KOICHI

Organization

School of Computing Professor

Homepage

http://www.ks.cs.titech.ac.jp


News & Topics

Development of semantic indexing for video search based on a data-driven approach

2014/02/14

Languages： Japanese


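Graduate student Nakamasa Inoue, Professor Koichi Shinoda, and colleagues of the Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, in cooperation with Canon, have developed a "video semantic indexing system", a new method for retrieving desired videos from Internet video data without using textual summary information such as tags or metadata. The system can detect "concepts" that are meaningful to humans, including "objects" such as cars and chairs, "scenes" such as sunsets and family gatherings, and "events" such as weddings and fireworks.

Research background: Video data on the Internet is increasing rapidly. Most of it is user-created, extremely diverse, of poor quality, and lacking sufficient text tags. For this reason, there has been strong demand for video search methods that use the image and audio features of videos.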
Semantic indexing system for video search using a data-driven ap

2014/02/14

Languages： English


The volume of video data on the Internet increases rapidly each year, and most of it consists of low-quality consumer videos of various kinds without text tags. There is therefore strong demand for video search techniques based on image and video features, so-called “content-based video retrieval” (CBVR). Video semantic indexing systems retrieve videos containing “concepts” that are meaningful to users from Internet video data, without using any text information such as tags or metadata. The concepts include objects such as cars and chairs, scenes such as sunsets and families having an enjoyable time, and events such as wedding ceremonies and fireworks.

News & Media

Professor Koichi Shinoda of the School of Computing appears on the BS Japan program "田村淳のBUSINESS BASIC特別編 with 未来EYES"

2017/04/22 21:00

Languages： Japanese

TV： BS Japan, "田村淳のBUSINESS BASIC特別編 with 未来EYES"

Degree

Doctor of Engineering ( Tokyo Institute of Technology )

Research Interests

Smart agriculture
Astronomical image processing
Medical information processing
Emotion recognition
Gesture recognition
Gait recognition
Image recognition
Speech recognition
Video recognition
Deep learning
Pattern recognition
Image processing
Speech processing
Machine learning
Speaker recognition
Multimodal recognition

Research Areas

Informatics / Perceptual information processing

Education

The University of Tokyo

- 1989

Country： Japan

The University of Tokyo Graduate School, Division of Science

- 1989

Notes： Master course Completed

The University of Tokyo Faculty of Science Department of Physics

- 1987

Country： Japan

Notes： Graduated

Research History

Institute of Science Tokyo School of Computing Professor

2024.10

Country：Japan

Tokyo Institute of Technology School of Computing Professor

2016.4 - 2024.9

Country：Japan

Tokyo Institute of Technology Graduate School of Information Science and Engineering Professor

2013.4 - 2016.3

Country：Japan

Tokyo Institute of Technology Graduate School of Information Science and Engineering Associate Professor

2007.4 - 2013.3

Country：Japan

Tokyo Institute of Technology Graduate School of Information Science and Engineering Associate Professor

2003.4 - 2007.3

Country：Japan

The Institute of Statistical Mathematics Visiting associate professor

2003.4 - 2005.3

The University of Tokyo The Graduate School of Information Science and Technology Associate Professor

2001.10 - 2003.3

Country：Japan

Lucent Technologies Bell Laboratories Visiting scholar

1997.1 - 1998.2

Country：United States

NEC Corporation Central Research Laboratories

1989.4 - 2001.9

Country：Japan

Professional Memberships

ACM (Association for Computing Machinery)

2006

Japanese Society for Artificial Intelligence

2004

Information Processing Society of Japan

2004

Institute of Electronics, Information, and Communication Engineers

Acoustic Society of Japan

IEEE (Institute of Electrical and Electronics Engineers)

ISCA (International Speech Communication Association)

Committee Memberships

Institute of Electronics, Information, and Communication Engineers, Information and Systems Society, President

2024.6 - 2025.6

　 More details

researchmap
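Institute of Electronics, Information, and Communication Engineers, Distinguished Achievement and Contributions Award / Achievement Award Committee, Member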

2023.9 - 2024.2

　 More details

researchmap
Acoustic Society of Japan

2023.2 - 2025.2

　 More details

researchmap
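Institute of Electronics, Information, and Communication Engineers, Distinguished Achievement and Contributions Award / Achievement Award Committee, Member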

2022.9 - 2023.2

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers, Information and Systems Society, Vice President

2022.6 - 2024.6

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers, Technical Committee Liaison Council, Member

2022.6 - 2024.6

　 More details

researchmap
New Energy and Industrial Technology Development Organization (NEDO), Technical Committee Member

2021.7 - 2023.3

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers, Society Transactions Editorial Committee, Reviewer

2021.6 - 2022.6

　 More details

researchmap
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Committee, Expert Member

2020.12 - 2021.11

　 More details

researchmap
Institute of Image Information and Television Engineers, Delegate

2020.4 - 2021.3

　 More details

researchmap
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Committee, Expert Member

2019.12 - 2020.11

　 More details

researchmap
New Energy and Industrial Technology Development Organization (NEDO), Technical Committee Member

2019.6 - 2021.3

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2019.6 - 2020.6

　 More details

researchmap
Institute of Image Information and Television Engineers, Paper Committee, Chair

2019.5 - 2021

　 More details

researchmap
National Institute of Informatics, Doctoral Dissertation Examiner

2019.5 - 2020.6

　 More details

researchmap
Acoustic Society of Japan

2019.2 - 2021.2

　 More details

researchmap
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Committee, Expert Member

2018.12 - 2019.11

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2018.6 - 2019.6

　 More details

researchmap
Acoustic Society of Japan, 26th Technical Development Award Selection Committee, Member (Evaluation)

2018.4 - 2019.3

　 More details

researchmap
New Energy and Industrial Technology Development Organization (NEDO), Technical Committee Member

2017.12 - 2019.3

　 More details

researchmap
2017 IEEE Automatic Speech Recognition and Understanding Workshop TPC Chair (program committee chair)

2017.12

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2017.6 - 2018.6

　 More details

researchmap
Acoustic Society of Japan, Councilor

2017.5 - 2019.5

　 More details

researchmap
Acoustic Society of Japan

2017.2 - 2019.2

　 More details

researchmap
The Graduate University for Advanced Studies (SOKENDAI), Doctoral Dissertation Examination Committee Member

2017.1

　 More details

researchmap
SOKENDAI, External Preliminary Examiner for a Doctoral Dissertation

2016.11 - 2016.12

　 More details

researchmap
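Cabinet Office, Personal Information Protection Commission, Opinion Exchange Meeting on the Personal Identifiability of Biometric Information, External Expert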

2016.7 - 2016.9

　 More details

researchmap
Information Processing Society of Japan, Special Interest Group on Spoken Language Processing, Steering Committee Member

2016.4 - 2020.3

　 More details

researchmap
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Committee, Expert Member

2013.12 - 2017.11

　 More details

researchmap
Acoustic Society of Japan, Councilor

2013.5 - 2015.5

　 More details

researchmap
Acoustic Society of Japan

2011.5 - 2013.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2011.4 - 2019.5

　 More details

researchmap
Acoustic Society of Japan

2011 - 2013

　 More details

researchmap
ISCA(International Speech Communication Association)

2010.9

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2010.5 - 2013.5

　 More details

researchmap
Information Processing Society of Japan

2010.4

　 More details

researchmap
Japan Society for the Promotion of Science, Research Fellowship Screening Committee, Expert Member

2009.8 - 2010.7

　 More details

researchmap
Japan Society for the Promotion of Science, International Program Committee, Document Reviewer

2009.8 - 2010.7

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers, Senior Member

2009.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2008.6

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2008.5 - 2014.5

　 More details

researchmap
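Institute of Electronics, Information, and Communication Engineers / Acoustic Society of Japan Editorial Liaison Committee, Editorial Secretary for the English Transactions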

2007.5 - 2010.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2006.12 - 2015.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2006.6 - 2008.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2006.5 - 2010.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers Editor of Transactions on Information and Systems

2006 - 2009

　 More details

Committee type：Academic society

Institute of Electronics, Information, and Communication Engineers

researchmap
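Institute of Electronics, Information, and Communication Engineers: Committee Member for Science Classes for Elementary, Junior High, and High School Students; Tokyo Section Councilor; Editorial Secretary (Planning), English Transactions ED; Secretary, Speech Technical Committee; Editorial Committee Member, English Transactions ED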

2006 - 2009

　 More details

Committee type：Academic society

Institute of Electronics, Information, and Communication Engineers

researchmap
Information Processing Society of Japan, Journal Reviewer

2006 - 2009

　 More details

Committee type：Academic society

Information Processing Society of Japan

researchmap
Information Processing Society of Japan

2006 - 2009

　 More details

researchmap
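Institute of Electronics, Information, and Communication Engineers / Acoustic Society of Japan Speech Technical Committee, Expert Member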

2005.5 - 2011.5

　 More details

researchmap
Acoustic Society of Japan

2005 - 2011

　 More details

researchmap
Acoustic Society of Japan, Editorial Committee Reviewer; Speech Technical Committee, Secretary

2005 - 2007

　 More details

Committee type：Academic society

Acoustic Society of Japan

researchmap
Acoustic Society of Japan

2005 - 2007

　 More details

researchmap
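Institute of Electronics, Information, and Communication Engineers / Acoustic Society of Japan Speech Technical Committee, Secretary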

2003.6 - 2005.5

　 More details

researchmap
21st Century COE Program "Framework for Systematization and Application of Large-scale Knowledge Resources", Project Promotion Member

2003 - 2008

　 More details

researchmap
Acoustic Society of Japan

2003 - 2005

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers Editor of Transactions on Information and Systems

2002.5 - 2006.5

　 More details

researchmap
Acoustic Society of Japan, Editorial Committee Reviewer

2001.6 - 2003.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers / Acoustic Society of Japan, Assistant Secretary

2001.5 - 2003.5

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

2001.5 - 2003.5

　 More details

researchmap
Acoustic Society of Japan

2001 - 2003

　 More details

researchmap
Institute of Electronics, Information, and Communication Engineers

1999.6 - 2012

　 More details

researchmap
Acoustic Society of Japan, Editorial Committee Reviewer

1997.9 - 2001.9

　 More details

researchmap


Papers

Integrating Generative and Contrastive Approaches for Human Action Recognition

Pablo Cervantes, Yusuke Sekikawa, Ikuro Sato, Koichi Shinoda

IEEE Access 2025

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.1109/ACCESS.2025.3575707

researchmap
ContextualCoder: Adaptive In-Context Prompting for Programmatic Visual Question Answering

Ruoyue Shen, Nakamasa Inoue, Dayan Guan, Rizhao Cai, Alex C. Kot, Koichi Shinoda

IEEE Transactions on Multimedia 2025

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.1109/TMM.2025.3543043

researchmap
Diffusion-Based Generative Regularization for Supervised Discriminative Learning.

Takuya Asakura, Nakamasa Inoue, Koichi Shinoda

WACV 8915 - 8926 2025

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/WACV61041.2025.00864

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/wacv/wacv2025.html#AsakuraIS25
LDMSE: Low Computational Cost Generative Diffusion Model for Speech Enhancement

Yuki Nishi, Koichi Shinoda, Koji Iwano

2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 1 - 6 2024.12

　More details

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/apsipaasc63619.2025.10849051

researchmap
Egocentric Human Activities Recognition With Multimodal Interaction Sensing

Yuzhe Hao, Asako Kanezaki, Ikuro Sato, Rei Kawakami, Koichi Shinoda

IEEE Sensors Journal 2024.3

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.1109/JSEN.2023.3349191

researchmap
Co-speech Gesture Generation with Variational Auto Encoder

Shinichi Ka, Koichi Shinoda

2024

　More details

DOI： 10.1007/978-3-031-53311-2_12

researchmap
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering.

Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda

ICIP 430 - 436 2024

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICIP51287.2024.10648180

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/icip/icip2024.html#ShenIS24
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering.

Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda

CoRR abs/2407.20563 2024

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2407.20563

researchmap
Text-Guided Object Detector for Multi-modal Video Question Answering.

Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda

IEEE/CVF Winter Conference on Applications of Computer Vision(WACV) 1032 - 1042 2023

　More details

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/WACV56688.2023.00109

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/wacv/wacv2023.html#ShenIS23
Implicit Neural Representations for Variable Length Human Motion Generation

Pablo Cervantes, Yusuke Sekikawa, Ikuro Sato, Koichi Shinoda

2022

　More details

DOI： 10.1007/978-3-031-19790-1_22

researchmap
Activity detection in extended video using action tubelets

Chihiro Shiraishi, Nakamasa Inoue, Aleksandr Drozd, Koichi Shinoda, Shi-Wook Lee, Alex Chichung Kot

2018 TREC Video Retrieval Evaluation, TRECVID 2018 2020

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

Scopus

researchmap
NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition.

Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

Comput. Speech Lang. 61 101033 - 101033 2020

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.csl.2019.101033

researchmap
NEC-TT Speaker Verification System for SRE'19 CTS Challenge.

Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda

Interspeech 2020(INTERSPEECH) 2227 - 2231 2020

　More details

Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/Interspeech.2020-1132

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/interspeech/interspeech2020.html#LeeOYWGKZIS20
Semantic indexing using deep CNNs and GMM supervectors

Nakamasa Inoue, Koichi Shinoda, Zhang Xuefeng, Kazuya Ueki

2014 TREC Video Retrieval Evaluation, TRECVID 2014 2020

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

Scopus

researchmap
TokyoTech-Waseda at TRECVID 2014

Nakamasa Inoue, Zhuolin Liang, Mengxi Lin, Tran Hai Dang, Koichi Shinoda, Zhang Xuefeng, Kazuya Ueki

2014 TREC Video Retrieval Evaluation, TRECVID 2014 2020

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

Scopus

researchmap
Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection.

Ronaldo Prata Amorim, Nakamasa Inoue, Koichi Shinoda

2020 TREC Video Retrieval Evaluation(TRECVID) 2020

　More details

Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

researchmap

Other Link： https://dblp.uni-trier.de/rec/conf/trecvid/2020
Deep Video Understanding of Character Relationships in Movies.

Yang Lu, Asri Rizki Yuliani, Keisuke Ishikawa, Ronaldo Prata Amorim, Roland Hartanto, Nakamasa Inoue, Kuniaki Uto, Koichi Shinoda

Companion Publication of the 2020 International Conference on Multimodal Interaction 120 - 129 2020

　More details

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3395035.3425639

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/icmi/icmi2020c.html#LuYIAHIUS20
Recurrent out-of-vocabulary word detection based on distribution of features

Taichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda

Computer Speech and Language 58 247 - 259 2019.11

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Academic Press

DOI： 10.1016/j.csl.2019.04.007

Scopus

researchmap
The NEC-TT 2018 Speaker Verification System

Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

Interspeech 2019 2019.9

　More details

Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2019-1517

researchmap
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.

Kong Aik Lee, Ville Hautamäki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Jose Patino 0001, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md. Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Tran Huy Dat, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-François Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas W. D. Evans

CoRR abs/1904.07386 2019

　More details

Publishing type：Research paper (scientific journal)

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr1904.html#abs-1904-07386
The NEC-TT 2018 Speaker Verification System.

Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

Interspeech 2019(INTERSPEECH) 4355 - 4359 2019

　More details

Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/Interspeech.2019-1517

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/interspeech/interspeech2019.html#LeeYOWGKZS19
Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition.

Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda

IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP) 6151 - 6155 2019

　More details

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2019.8683171

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/icassp/icassp2019.html#MunimIS19
Transient Object Detection from MITSuME Telescope Images Using Deep Learning Reviewed

飯田康太, 谷津陽一, 伊藤亮介, 村田勝寛, 橘優太朗, 河合誠之, Yan Long, 篠田浩一, 井上中順, 下川辺隆史

2019 2018.12

　More details

J-GLOBAL

researchmap
Skeleton-based Human Action Recognition with Fine-to-Coarse Convolutional Neural Network Reviewed

Thao Minh Le, Nakamasa Inoue, Koichi Shinoda

vol. 118 ( no. 362 ) pp. 61 - 64 2018.12

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
International Conference on Multimedia Retrieval (ICMR2018) : With the organizers' viewpoint

相澤清晴, 佐藤真一, 柳井啓司, 井出一郎, 山崎俊彦, 入江豪, 小川貴弘, 望月貴裕, 新田直子, 篠田浩一, 呉志鵬, 松井勇佑, 牛久祥孝, 内田祐介

The Journal of the Institute of Image Information and Television Engineers Vol. 72 ( No. 6 ) 888 - 895 2018.11

　More details

Language：Japanese

CiNii Books

researchmap
VANT at TRECVID 2018 Reviewed

Nakamasa Inoue, Chihiro Shiraishi, Aleksandr Drozd, Koichi Shinoda, Shi-wook Lee, Alex Chichung Kot

Proc. TRECVID workshop 2018.11

　More details

We propose a system for activity detection that uses the Action Tubelet (ACT) Detector to localize activities in video data. Our network is trained for all activities in the ActEV dataset, with a backbone convolutional neural network pre-trained on the ImageNet dataset. We inserted a thresholding module into the original ACT framework to adapt the detector to the ActEV task, since activities in this task are more sparsely distributed than those in the action detection task. Our result was 0.882 in mean-p miss@0.15rfa at the AD Leaderboard Evaluation.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Few-Shot Adaptation for Multimedia Semantic Indexing Reviewed

Nakamasa Inoue, Koichi Shinoda

Proc. ACM Multimedia pp. 1110 - 1118 2018.10

　More details

researchmap
A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition Reviewed

Thao Minh Le, Nakamasa Inoue, Koichi Shinoda

Proc. British Machine Vision Conference (BMVC) 2018.9

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Attentive Statistics Pooling for Deep Speaker Embedding Reviewed

Koji Okabe, Takafumi Koshinaka, Koichi Shinoda

Proc. Interspeech 2018 pp. 2252 - 2256 2018.9

　More details

Publishing type：Research paper (international conference proceedings)

This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively. An evaluation on the NIST SRE 2012 and the VoxCeleb data sets shows that it reduces equal error rates (EERs) from the conventional method by 7.5% and 8.1%, respectively.

DOI： 10.21437/Interspeech.2018-993

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
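
The pooling step described in the abstract of "Attentive Statistics Pooling for Deep Speaker Embedding" above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name is hypothetical, and the attention scores, which would normally be produced by a small trainable network, are simply passed in as an argument. The sketch turns the scores into weights with a softmax over frames, then concatenates the weighted mean and the weighted standard deviation of the frame-level features.

```python
import math

def attentive_statistics_pooling(frames, scores):
    """Pool T frame-level vectors into one utterance-level vector.

    frames: list of T feature vectors, each of length D
    scores: list of T unnormalized attention scores
            (assumed here to be given; hypothetical input)
    Returns a list of length 2*D: weighted means, then weighted stds.
    """
    # Softmax over frames, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    # Weighted mean of each feature dimension.
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Weighted second moment of each feature dimension.
    second = [sum(w * f[d] ** 2 for w, f in zip(weights, frames)) for d in range(dim)]
    # Weighted std: sqrt(E[x^2] - E[x]^2), clamped at zero against rounding error.
    std = [math.sqrt(max(m2 - m * m, 0.0)) for m2, m in zip(second, mean)]
    return mean + std

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Equal scores give uniform weights, so this reduces to plain mean/std pooling.
pooled = attentive_statistics_pooling(frames, [0.0, 0.0, 0.0])
```

With equal scores the result matches ordinary statistics pooling; unequal scores shift both the mean and the standard deviation toward the frames that receive more attention.
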
Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data Reviewed

Tifani Warnita, Nakamasa Inoue, Koichi Shinoda

Proc. Interspeech 2018 pp. 1706 - 1710 2018.9

　More details

We propose a method for automatically detecting Alzheimer's disease from speech data using a gated convolutional neural network (GCNN). This GCNN can be trained with a relatively small amount of data and can capture the temporal information in audio paralinguistic features. Since it does not use any linguistic features, it can be easily applied to any language. We evaluated our method using the Pitt Corpus. The proposed method achieved an accuracy of 73.6%, which is 7.6 points better than the conventional sequential minimal optimization (SMO).

DOI： 10.21437/Interspeech.2018-1713

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification Reviewed

Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

Proc. Interspeech 2018 pp. 3613 - 3617 2018.9

　More details

I-vector based text-independent speaker verification (SV) systems often perform poorly with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where the generator network is trained to generate a compensated i-vector from a short-utterance i-vector and the discriminator network is trained to determine whether an i-vector was generated by the generator or extracted from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments on the NIST SRE 2008 “10sec-10sec” condition show that, after applying our method, the equal error rate was reduced by 11.3% compared with the conventional i-vector and PLDA system.

DOI： 10.21437/Interspeech.2018-1680

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Event Detection from Videos Using Distributed Word Representations Reviewed

金井怜, 井上中順, 李時旭, 篠田浩一

2018.8

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Alzheimer's Disease Prediction Using Audio Gated Convolutional Neural Network Reviewed

Tifani Warnita, Nakamasa Inoue, Koichi Shinoda

ASJ 2018 Autumn Meeting pp. 1223 - 1224 2018.8

　More details

As a result of the aging of society, an increasing number of people are affected by Alzheimer's disease (AD). Early prediction of AD is important not only for preventing the disease from worsening but also for helping patients recover fully. We propose a language-independent approach to detecting AD patients that leverages paralinguistic features from audio data. We achieved the best result of 71.35% by aggregating the utterance-level predictions from a gated convolutional neural network (GCNN).

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Generative Adversarial Network Based i-Vector Transformation for Short Utterance Speaker Verification Reviewed

Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

ASJ 2018 Autumn Meeting pp. 1345 - 1346 2018.8

　More details

I-vector based text-independent speaker verification (SV) systems often perform poorly with short utterances, because the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where the generator network is trained to transform an unreliable i-vector from a short utterance into a reliable one of the kind that can only be extracted from a long utterance, and the discriminator network is trained to determine whether an i-vector comes from the generator or from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments conducted on the NIST SRE 2008 “short2-10sec” and “10sec-10sec” conditions show that our method can help reduce the average equal error rate of the conventional i-vector and PLDA system.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Astronomical Image Subtraction for Transient Detection Using CNN Reviewed

Yan Long, Nakamasa Inoue, Koichi Shinoda, Yoichi Yatsu, Ryosuke Itoh, Nobuyuki Kawai

2018.8

　More details

Language：English

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances Reviewed

Thao Le Minh, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda

Proc. International Joint Conference on Artificial Intelligence (IJCAI) pp. 1546 - 1553 2018.7

　More details

researchmap
Co-Design for Deep Learning Reviewed

篠田浩一

Technical Reports of IEICE SP/PRMU vol. 118 ( no. 112 ) pp. 65 2018.6

　More details

researchmap
A Fast and Resource-Efficient Deep Learning Algorithm Platform for Social Infrastructure Video Processing Reviewed

篠田浩一

2018.4

　More details

researchmap
Multi-Task Autoencoder for Noise-Robust Speech Recognition Reviewed

Haoyi Zhang, Conggui Liu, Nakamasa Inoue, Koichi Shinoda

Proc. ICASSP pp. 5599 - 5603 2018.4

　More details

DOI： 10.1109/ICASSP.2018.8461446

researchmap
Pitch Recognition of Polyphonic Signals Using a Fully Gated 2D Convolutional Network Reviewed

生田目敬弘, 亀岡弘和, 篠田浩一

IPSJ SIG Technical Reports: Spoken Language Processing (SLP) vol. 120 ( no. 12 ) pp. 1 - 7 2018.2

　More details

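Music has a two-dimensional structure along the pitch axis (chord composition, harmonic structure) and the time axis (melody, rhythm). We treat pitch recognition of music audio signals as a two-dimensional placement problem of pitch labels on an audio spectrogram, and propose a fully gated 2D convolutional network that recognizes pitch directly from the log-frequency spectrogram of a polyphonic signal. Every layer consists of a gated 2D convolutional network, so that each layer represents the audio spectrogram of musical tones and the two-dimensional structure of music. Compared with a conventional probabilistic latent component analysis method, on the Bach10 dataset our method achieved a note-level F1 score of 73.3%, which is 8.3 percentage points above the conventional method's 65.0%. We also constructed a new chamber music dataset and used it to train the model.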

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition.

Raden Mu'az Mun'im, Nakamasa Inoue, Koichi Shinoda

CoRR abs/1811.04531 2018

　More details

Publishing type：Research paper (scientific journal)

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr1811.html#abs-1811-04531
A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings Reviewed

Conggui Liu, Nakamasa Inoue, Koichi Shinoda

Proc. APSIPA pp. 1304 - 1307 2017.12

　More details

Despite recent progress in speech recognition, meeting speech recognition is still a challenging task, since it is often difficult to separate one speaker’s voice from the others in meetings. In this paper, we propose a joint training framework of speaker separation and speech recognition with multi-channel recordings for this purpose. The location of each speaker is first estimated and then used to recover her/his original speech with a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by one deep net, which is optimized as a whole using training data. We evaluated our method using simulated data generated from the WSJCAM0 database. Compared with independent training of the two components, our proposed method improved word accuracy by 15.2% when the locations of speakers are known, and by 53.6% when the locations of speakers are unknown.

DOI： 10.1109/APSIPA.2017.8282233

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Multimodal Speech Recognition Using Mouth Images from Depth Camera Reviewed

Yuki Yasui, Nakamasa Inoue, Koji Iwano, Koichi Shinoda

Proc. APSIPA pp. 1233 - 1236 2017.12

　More details

Deep learning has been proven effective in multimodal speech recognition using frontal face images. In this paper, we propose a new deep learning method, a trimodal deep autoencoder, which uses not only audio signals and face images but also depth images of faces as inputs. We collected continuous speech data from 20 speakers with Kinect 2.0 and used them for our evaluation. The experimental results showed that, at an SNR of 10 dB, our method reduced errors by 30% compared with audio-only speech recognition, from 34.6% to 24.2%. In particular, it is effective for recognizing some consonants, including /k/ and /t/.

DOI： 10.1109/APSIPA.2017.8282227

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Toward Fast and Resource-Efficient Deep Learning Reviewed

篠田浩一

2017.12

　More details

researchmap
Action Sequence Recognition in Videos by Combining a CTC Network with a Statistical Language Model Reviewed

Mengxi Lin, Nakamasa Inoue, Koichi Shinoda

Technical Reports of IEICE PRMU vol. 117 ( no. 362 ) pp. 1 - 6 2017.12

　More details

Action sequence recognition aims to recognize what actions occur in a video and their temporal order. In this paper, we propose to combine an LSTM network trained with Connectionist Temporal Classification (CTC) with a statistical language model for action sequence recognition. The statistical language model captures the relations between action instances, which are hardly learned by the CTC network. Our experiments on the Breakfast dataset show that the statistical language model can significantly boost the recognition accuracy of the CTC network, from 37.0% to 43.4%.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers Reviewed

Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Satoshi Kanai, Junsuke Masada, Chihiro Shiraishi, Shi-wook Lee, Koichi Shinoda

Proc. TRECVID workshop pp. 1 - 6 2017.11

　More details

researchmap
CTC network with statistical language modeling for action sequence recognition in videos Reviewed

Mengxi Lin, Nakamasa Inoue, Koichi Shinoda

Thematic Workshops 2017 - Proceedings of the Thematic Workshops of ACM Multimedia 2017, co-located with MM 2017 393 - 401 2017.10

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Association for Computing Machinery, Inc

DOI： 10.1145/3126686.3126755

Scopus

researchmap
User adaptation of convolutional neural network for human activity recognition Reviewed

Shinya Matsui, Nakamasa Inoue, Yuko Akagi, Goshu Nagino, Koichi Shinoda

25th European Signal Processing Conference, EUSIPCO 2017 2017- 753 - 757 2017.10

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Institute of Electrical and Electronics Engineers Inc.

DOI： 10.23919/EUSIPCO.2017.8081308

Scopus

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Development of a cloud detection system utilizing image recognition technology Reviewed

Y. Yatsu, T. Yoshii, N. Kawai, J. Sakuma, N. Inoue, K. Shinoda, T. Shimokawabe

2017.10

　More details

researchmap
深層学習の音声認識への応用 Reviewed

篠田浩一

2017.10

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
口唇深度画像を利用したディープオートエンコーダに基づくマルチモーダル音声認識 Reviewed

安井勇樹, 岩野公司, 井上中順, 篠田浩一

日本音響学会2017年秋季研究発表会講演論文集 pp. 117 - 118 2017.9

　More details

researchmap
Joint training of speaker separation and speech recognition based on deep learning Reviewed

Conggui Liu, Nakamasa Inoue, Koichi Shinoda

ASJ 2017 Autumn Meeting pp. 63 - 64 2017.9

　More details

researchmap
口唇の深度画像を用いたディープオートエンコーダによるマルチモーダル音声認識 Reviewed

安井勇樹, 岩野公司, 井上中順, 篠田浩一

情報処理学会研究報告 SLP 2017.7

　More details

researchmap
Video Information Retrieval Reviewed

Koichi Shinoda

2017.7

　More details

researchmap
Deep Learningを応用した全天画像からの気象識別 Reviewed

谷津陽一, 白石一輝, 吉井健敏, 河合誠之, 佐久間淳一, 井上中順, 篠田浩一, 下川辺隆史

2017.5

　More details

researchmap
話者認識と顔画像認識を用いた映像におけるマルチモーダル人物同定 Reviewed

西史人, 井上中順, 岩野公司, 篠田浩一

日本音響学会2017年春季研究発表会講演論文集 pp. 129 - 130 2017.3

　More details

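In this study, we proposed a multimodal person identification method that uses speaker, face, and background information. Future work includes improving performance by combining multiple OCR results.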

researchmap
畳み込みニューラルネットワークを用いた夜間全天画像からの雲領域検出 Reviewed

佐久間惇一, 篠田浩一, 井上中順, 谷津陽一, 吉井健敏, 河合誠之, 下川辺隆史

情報処理学会第79回全国大会論文集 2017 ( 1 ) pp. 283 - 284 2017.3

　More details

Language：Japanese

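In this study, we proposed a convolutional neural network that segments all-sky images into cloud regions and clear-sky regions, and conducted experiments using 1,000 all-sky images. The proposed network achieved accuracy comparable to that of existing object recognition networks while reducing memory size and file size.
Future work includes identifying non-cloud objects such as obstacles and water droplets, and graded classification based on cloud thickness.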

CiNii Books

researchmap
Speaker Separation in Multi-Channel Environment Using Deep Learning Reviewed

Conggui Liu, Nakamasa Inoue, Koichi Shinoda

Technical Reports of IPSJ SLP vol. 115 ( no. 11 ) pp. 1 - 6 2017.2

　More details

This paper addresses multi-channel speaker separation based on a deep delay-and-subtraction beamformer. A deep neural network (DNN) is first applied to estimate the delay time between speakers and microphones, and then the speakers' speech is recovered from mixed signals by using a delay-and-subtraction algorithm. We evaluated our method using simulated data made from the WSJCAM0 database. The proposed method achieved high-precision source localization and about 62% relative improvement in word error rate (WER) over a delay-and-sum (DS) beamformer.
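
For readers unfamiliar with the delay-and-sum baseline mentioned above, the following minimal sketch shows the idea with integer sample delays. It is an illustration, not the paper's implementation: the function names are hypothetical, the delays are assumed to be already known (in the paper they are estimated by a DNN), and `delay_and_subtract` is just one simple two-microphone reading of "delay-and-subtraction" that may differ from the paper's formulation.

```python
def shifted(signal, delay, t):
    """Read signal[t + delay], treating samples outside the recording as silence."""
    index = t + delay
    return signal[index] if 0 <= index < len(signal) else 0.0

def delay_and_sum(channels, delays):
    """Align each channel by its delay (in samples) and average them.

    Speech from the direction matching `delays` adds up coherently,
    while sound from other directions is attenuated by the averaging.
    """
    length = len(channels[0])
    count = len(channels)
    return [
        sum(shifted(ch, d, t) for ch, d in zip(channels, delays)) / count
        for t in range(length)
    ]

def delay_and_subtract(channel_a, channel_b, delay_a, delay_b):
    """Align two channels on one speaker and subtract, cancelling that speaker."""
    return [
        shifted(channel_a, delay_a, t) - shifted(channel_b, delay_b, t)
        for t in range(len(channel_a))
    ]
```

In this toy setting, summing reinforces the speaker whose delays are supplied, whereas subtracting removes that speaker and leaves whatever else was in the mixture.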

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Cross-view human action recognition from depth maps using spectral graph sequences Reviewed

Tommi Kerola, Nakamasa Inoue, Koichi Shinoda

COMPUTER VISION AND IMAGE UNDERSTANDING 154 108 - 126 2017.1

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.cviu.2016.10.004

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Boredom Recognition Based on Users' Spontaneous Behaviors in Multiparty Human-Robot Interactions Reviewed

Yasuhiro Shibasaki, Kotaro Funakoshi, Koichi Shinoda

MULTIMEDIA MODELING (MMM 2017), PT I 10132 677 - 689 2017

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-51811-4_55

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Deep Learning for spoken language processing: Overview

Koichi Shinoda

日本音響学会誌 vol. 73 ( no. 1 ) 25 - 30 2016.12

　More details

Language：Japanese Publisher：The Acoustical Society of Japan

researchmap
The NEC-TT Speaker Recognition System for NIST SRE16 Reviewed

Hitoshi Yamamoto, Koichi Shinoda

Proc. NIST SRE workshop 2016.12

　More details

This paper describes the speaker recognition system of the NEC-TT team for the fixed training condition of the NIST 2016 speaker recognition evaluation (SRE16). Our system is based on standard i-vector features and a Probabilistic LDA back-end. The feature extractor employs multi-condition training and LSTM-based voice activity detection to be robust against acoustic variability in the Call My Net Speech Collection. The back-end includes feature normalization and unsupervised adaptation methods to compensate for the mismatch between the fixed training set (the past SREs) and the evaluation set. It also utilizes DNN-based gender and language estimation to control the parameters of score calibration and score fusion for each trial pair. Accordingly, our system achieved 0.6192 minimum Cprimary and 0.6934 actual Cprimary for the SRE16 development set.

researchmap
TokyoTech at TRECVID 2016 Reviewed

Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Koichi Shinoda

Proc. TRECVID workshop pp. 1 - 6 2016.11

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Video Semantic Indexing and Localization Reviewed

Koichi Shinoda

5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan vol. 140 ( no. 4 ) p. 3009 2016.11

　More details

Internet traffic is now largely occupied by consumer video, most of which is not accompanied by text tags for search. Hence, video semantic indexing, which extracts visual concepts such as objects, scenes, and actions directly from video content, has been intensively studied. Fundamentally, this task consists of two problems: localization and recognition. While until recently these two problems were studied independently, emerging end-to-end deep learning techniques using convolutional neural networks (CNNs) and recurrent neural networks (RNNs) offer effective ways to solve them simultaneously. These techniques are closely related to spoken word detection techniques in the speech field. In this talk, we give an overview of recent progress in this area and discuss potential directions for future research.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Experiments with Optical Properties of Skin on Fingers Reviewed

Martin Drahansky, Ondrej Kanich, Eva Brezinova, Koichi Shinoda

International Journal of Optics and Applications vol. 6 ( no. 2 ) pp. 37 - 46 2016.10

　More details

Language：English Publishing type：Research paper (scientific journal)

This article describes our experiments with optical properties of skin on fingers. We first introduce the medical skin structure and the measurement of blood oxidation. This information is needed for the second part: the preparation of our acquisition equipment for multispectral skin illumination using various wavelengths and for finger vein acquisition using infrared light. The results of our experiments follow, summarized in graphs and tables. The aim of this research is to find a suitable solution for liveness detection (anti-spoofing) based on optical properties of skin on fingers.

DOI： 10.5923/j.optics.20160602.03v

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Adaptation of word vectors using tree structure for visual semantics Reviewed

Nakamasa Inoue, Koichi Shinoda

MM 2016 - Proceedings of the 2016 ACM Multimedia Conference 277 - 281 2016.10

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Association for Computing Machinery, Inc

DOI： 10.1145/2964284.2967226

Scopus

researchmap
東工大TSUBAMEの活用事例：マルチメディア認識のための深層学習 Reviewed

篠田浩一

2016.10

　More details

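In multimedia recognition fields such as speech, music, image, and video, deep learning has recently brought large performance gains. At the same time, securing the computing resources needed for this research has become a challenge. Here we introduce examples of our research that make use of TSUBAME, the supercomputer of Tokyo Institute of Technology.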

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Deep Learning for Speech, Image, and Video Reviewed

Koichi Shinoda

2016.10

　More details

Deep learning for large-scale artificial neural networks has become a new trend in statistical pattern recognition and is changing its research paradigm. In this talk, we will review the recent progress of deep learning techniques in the areas of speech recognition, image recognition, and video information retrieval, and discuss their future directions.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
突発天体観測用天文台全球リレーのための気象モニターの開発 Reviewed

谷津陽一, 吉井健敏, 針田聖平, 村木雄太郎, 河合誠之, 佐久間惇一, HyunJin Jung, 井上中順, 篠田浩一, 下川辺隆史, 太田佳

天文学会予稿集 2016 p. 210 2016.9

　More details

J-GLOBAL

researchmap
複数スマートフォンで収録された会話音声の対話グループ検出と話者決定の性能改善 Reviewed

岩野公司, 荒毛祐紀, 小平優希, 篠田浩一

電子情報通信学会技術研究報告 SP vol. 116 ( no. 189 ) pp. 53 - 58 2016.8

　More details

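This paper discusses "conversation group detection" and "speaker determination" as techniques for processing multiple multi-party conversation recordings collected by having the participants of conversations and meetings record speech on their own smartphones. Because the conversation group detection method proposed in our earlier work assumed that the number of groups was known, in this study we propose an improved method that can estimate the number of groups based on the Bayesian information criterion (BIC). We also attempt to improve performance by refining the clustering method. For speaker determination, the performance of our previously proposed method was below 60%, so we propose a speaker determination method that suppresses the effect of overlapping speech by mutual spectral subtraction, aiming to improve performance. For conversation group detection, experiments on 60 conversation recordings from 15 groups confirmed that the number of groups was estimated correctly and that group detection performance reached 100%. For speaker determination, an evaluation on 5 sessions of casual conversation data confirmed an improvement of about 6% over the conventional method, showing the effectiveness of the proposed method.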

researchmap
Concept Elimination for Zero-Shot Event Detection Reviewed

Tran Hai Dang, Nakamasa Inoue, Koichi Shinoda

IS2 - 19 2016.6

　More details

researchmap
Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search Reviewed

Nakamasa Inoue, Koichi Shinoda

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 38 ( 6 ) 1170 - 1184 2016.6

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TPAMI.2015.2481390

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
楽器と音高の同時認識のためのRNN音響モデル Reviewed

生田目敬弘, 亀岡弘和, 篠田浩一

第111回音楽情報科学研究会音学シンポジウム vol. 111 ( no. 46 ) pp. 1 - 5 2016.5

　More details

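Abstract: We propose an RNN acoustic model that performs instrument identification and pitch recognition simultaneously for music containing multiple instruments. To estimate pitch per instrument, the output layer of the RNN is enlarged according to the number of instruments to be identified, and the network is trained with teacher signals that distinguish the instruments. Mainstream pitch estimation methods attempt source separation in order to obtain pitch, but the proposed RNN can estimate pitch directly, avoiding the difficult source separation problem. In addition, giving the instrument type as well as the pitch as the teacher signal is expected to make it easier for the RNN to recognize spectral shape, and because pitch is obtained per instrument, the model can be integrated with music language models trained per instrument. In experiments on instrument and pitch estimation for violin and clarinet duets, the clarinet was essentially undetected at 0.9%, whereas the violin reached 66.0%, comparable to the 68.3% of the conventional method. The good performance for the violin and the poor performance for the clarinet are attributed to the insufficiency and poor quality of the dataset, and we expect improvement from increasing the training data.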

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Wise teachers train better DNN acoustic models Reviewed

Ryan Price, Ken-ichi Iso, Koichi Shinoda

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING ( 10 ) 1 - 19 2016.4

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1186/s13636-016-0088-7

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
複数スマートフォンで収録された会話音声の相互スペクトル減算を用いた話者決定 Reviewed

小平優希, 篠田浩一, 岩野公司

情報処理学会第78回全国大会講演論文集 pp. 533 - 534 2016.3

　More details

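In this study, we proposed a speaker determination method using mutual spectral subtraction for conversational speech recorded with multiple smartphones, and evaluated its performance. The proposed method improved detection performance over the conventional method, confirming its effectiveness. Further improvement in speaker determination performance is expected from more accurate speaker models using deep learning and from methods for handling speakers with extremely few utterances.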

researchmap
(チュートリアル) 音声・画像・映像処理における深層学習 Reviewed

篠田浩一, 井上中順

言語処理学会第22回年次大会(NLP2016) 講演論文集 2016.3

　More details

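In recent years, deep learning has been actively studied in multimedia information processing and is forcing a change in the conventional paradigm. Because deep learning methods differ little across media, research on multimedia processing that spans multiple media is expected to advance rapidly. This tutorial, aimed mainly at text processing researchers, explains the latest research trends in deep learning for speech, image, and video processing. We first outline the latest methodologies in these media, and then explain the tools and databases used there and how to use them. The aim is to present basic knowledge of deep learning in these fields and pointers to information sources, as an aid to extending into cross-disciplinary research.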

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
多人数環境下でのロボットとの対話における人間の退屈状態の推定 Reviewed

芝崎泰弘, 船越孝太郎, 篠田浩一

電子情報通信学会技術研究報告 PRMU vol. 115 ( no. 517 ) pp. 119 - 124 2016.3

　More details

Language：Japanese

researchmap
Robust discriminative training against data insufficiency in PLDA-based speaker verification Reviewed

Johan Rohdin, Sangeeta Biswas, Koichi Shinoda

COMPUTER SPEECH AND LANGUAGE 35 32 - 57 2016.1

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.csl.2015.06.003

Web of Science

researchmap
Localization using Faster R-CNN and Multi-Frame Fusion

Ryosuke Yamamoto, Nakamasa Inoue, Koichi Shinoda

2016 TREC Video Retrieval Evaluation, TRECVID 2016 2016

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

Scopus

researchmap
Graph Regularized Implicit Pose for 3D Human Action Recognition Reviewed

Tommi Kerola, Nakamasa Inoue, Koichi Shinoda

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) pp. 155 - 159 2016

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/APSIPA.2016.7820717

Web of Science

researchmap
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features Reviewed

Taichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5 1320 - 1324 2016

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.21437/Interspeech.2016-562

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Semantic indexing for large-scale video retrieval Reviewed

Nakamasa Inoue, Koichi Shinoda

ITE Transactions on Media Technology and Applications 4 ( 3 ) 209 - 217 2016

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Institute of Image Information and Television Engineers

DOI： 10.3169/mta.4.209

Scopus

researchmap
TokyoTech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task

Fumito Nishi, Nakamasa Inoue, Koji Iwano, Koichi Shinoda

CEUR Workshop Proceedings 1739 2016

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：CEUR-WS

Scopus

researchmap
単語ベクトルによる語彙拡張を用いた映像のセマンティックインデクシング Reviewed

井上中順, 篠田浩一

電子情報通信学会技術研究報告 PRMU vol. 115 ( no. 388 ) pp. 75 - 80 2015.12

　More details

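We propose a vocabulary expansion method using word vectors for semantic indexing, which aims to detect objects, actions, and scenes in video. The proposed method is a form of zero-shot learning: among words denoting objects, actions, and scenes, a detector for a word without training data is constructed as a weighted sum of the detectors for words that have video or image training data. Specifically, the word vector of a word without training data is obtained by interpolating the word vectors of words with training data, and the interpolation coefficients are used to weight the detectors. Here, word vectors represent words as vectors, for example via word2vec by Mikolov et al., so that the similarity between words can be computed from the distance between vectors. In the evaluation experiments, we used images of 1,000 object categories from ImageNet as training data and detected 346 objects, actions, and scenes not included in the training data on the TRECVID dataset. The result was a Mean Average Precision of 0.153. This is comparable to the performance of a support vector machine trained with Fisher vectors and 100 TRECVID training samples.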

researchmap
TokyoTech at TRECVID 2015 Reviewed

Nakamasa Inoue, Tran Hai Dang, Ryosuke Yamamoto, Koichi Shinoda

Proc. TRECVID workshop pp. 1 - 10 2015.11

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
音声・画像・映像におけるDeep Learningを用いたパターン認識 Reviewed

篠田浩一

人工知能学会AIチャレンジ研究会予稿集 pp. 1 - 6 2015.11

　More details

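In recent years, deep learning has been actively studied in the multimedia field. In particular, in speech recognition and generic object recognition in images, it has greatly reduced errors compared with conventional methods and is already used commercially as a standard technology. In this paper, we first survey previous research on deep learning in the multimedia field, and then explain the current challenges and the approaches to them. Research is advancing rapidly, and what can and cannot be done is becoming clear. Finally, we discuss the directions in which research on pattern recognition using deep learning will go.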

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Error Correction Using Long Context Match for Smartphone Speech Recognition Reviewed

Yuan Liang, Koji Iwano, Koichi Shinoda

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E98D ( 11 ) 1932 - 1942 2015.11

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1587/transinf.2015EDP7179

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
活性化関数のパラメータ制御を用いた LSTM による音声認識 Reviewed

松山祐輔, Ryan Price, 篠田浩一

日本音響学会2015年秋季研究発表会講演論文集 pp. 1 - 2 2015.9

　More details

researchmap
音声・動画像の因子分析を用いる話者ダイアライゼーション Reviewed

西史人, 井上中順, 篠田浩一

日本音響学会2015年秋季研究発表会講演論文集 pp. 175 - 176 2015.9

　More details

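Speaker diarization is the task of estimating "who spoke when" from audio and image information without prior information. Speaker diarization for talk shows and movies is affected more strongly by background music and environmental sounds than diarization for telephone calls and meetings. Multimodal speaker diarization using audio and video is therefore effective.
Felicien et al. [1] used audio information and the color of the speakers' clothing as features in experiments on talk shows, but this is difficult to apply to videos with drastic changes in brightness, such as the movies targeted in this study.
We therefore propose speaker diarization using factor analysis of audio and video.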

researchmap
A DNN-Based ASR System for the Indonesian Language Reviewed

Devin Hoesen, Ryan Price, Puji Lestari Dessi, Koichi Shinoda

Proc. ASJ 2015 Autumn Meeting pp. 5 - 6 2015.9

　More details

This paper presents our work to develop a DNN-based ASR system for the Indonesian language, a language that has limited training data. We evaluate the effectiveness of some common improvements to DNN acoustic models, such as speaker adaptive features and sequence-discriminative training. We also explore incorporating English in the training process as a potential way of reducing WER when the Indonesian training data is very limited.

researchmap
音声認識のためのDeep Learning 企画シンポジウムディープラーニングの切り口：神経回路学会の視点から Reviewed

篠田浩一

2015.9

　More details

researchmap
Autonomous selection of i-vectors for PLDA modelling in speaker verification Reviewed

Sangeeta Biswas, Johan Rohdin, Koichi Shinoda

SPEECH COMMUNICATION 72 32 - 46 2015.9

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.specom.2015.05.001

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Combining Audio Features and Visual i-vector at MediaEval 2015 Multimodal Person Discovery in Broadcast TV Reviewed

Fumito Nishi, Nakamasa Inoue, Koichi Shinoda

Proc. MediaEval Workshop 2015.9

　More details

This paper describes our diarization system for the Multimodal Person Discovery in Broadcast TV task of the MediaEval 2015 Benchmark evaluation campaign [1]. The goal of this task is to name speakers who are appearing and speaking simultaneously in the video, without prior knowledge. Our diarization system is based on a multimodal approach that combines audio and visual information. We extract features from a face in each shot to make visual i-vectors [2], and introduce them to the provided baseline system. When faces are extracted correctly, the performance improves, but no clear improvement could be observed in the test run.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
ロボットとの対話における人間の「退屈」状態の解析 Reviewed

芝崎泰弘, 船越孝太郎, 篠田浩一

第14回情報科学技術フォーラム講演論文集 (FIT) vol. 14 ( no. 3 ) pp. 163 - 166 2015.8

　More details

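Current information systems have difficulty grasping users' internal states, such as their level of proficiency. For example, if a user's degree of boredom could be estimated from their behavior, systems better optimized for each individual could be built in fields such as education. Toward this goal, we first used Microsoft's Kinect to record scenes in which Aldebaran's humanoid robot NAO and three adult male and female participants progress through a game via interaction such as conversation and gestures. We then classified the participants' spontaneous behaviors in the data by degree of boredom and analyzed their behavior when bored.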

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
マルチモーダルi-vectorを用いた話者ダイアライゼーション Reviewed

西史人, 井上中順, 篠田浩一

情報処理学会研究報告 SLP vol. 107 ( no. 4 ) pp. 1 - 6 2015.7

　More details

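We propose a method using multimodal i-vectors for multimodal speaker diarization of movies. An i-vector is a feature used in speaker recognition: a low-dimensional vector that represents speaker information. Unsupervised clustering is applied to multimodal i-vectors, which are formed by concatenating the audio i-vector with an i-vector extracted from face images of the speaker in the video. In evaluation experiments on a dataset from the movie "Hannah and Her Sisters", the Diarization Error Rate (DER) improved from 68.3% to 65.5% compared with using audio only.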

researchmap
Speaker Diarization Using Multi-Modal i-vectors Reviewed

Fumito Nishi, Nakamasa Inoue, Koichi Shinoda

Proc. International Technical Conference on Circuits/Systems Computers and Communications (ITC-CSCC) pp. 27 - 30 2015.6

　More details

We propose multi-modal i-vectors, which extend the audio i-vector framework for speaker verification to multi-modal speaker diarization in movies. In addition to the audio i-vector, which represents a speech utterance in an audio stream by a low-dimensional vector, we extract a visual i-vector from faces in a video segment. The audio and visual i-vectors are concatenated into a multi-modal i-vector, and these vectors are clustered in an unsupervised way. We evaluate our method on the Hannah movie dataset. Our experiments show that the diarization error rate improves from 68.3% to 65.5% compared with using the audio stream only.

researchmap
Robust Video Information Retrieval using Speech Technologies Reviewed

Koichi Shinoda

2015.6

　More details

The amount of video data on the Internet has been rapidly increasing. These videos vary widely and are in most cases of low quality, so robust techniques for video indexing are in strong demand. In automatic video semantic indexing, a user submits a textual input query for a desired object or a scene to a search system, which returns video shots that include the object or scene. In this application, many techniques developed in speech research have been successfully employed. For example, a new method using Gaussian-mixture model (GMM) supervectors and support vector machines (SVMs) was recently proven to be very effective. In this method, speech technologies such as speaker verification and speaker adaptation techniques play very important roles. In this lecture, we first introduce the activities of the NIST TRECVID workshop, which is a showcase of state-of-the-art video search technologies, and then discuss several techniques, such as SIFT and HOG features, Bag of Visual Words, the Fisher kernel, a multi-modal framework, and fast tree search, for achieving robustness against the variety of Internet video.

researchmap
CNNから抽出した複数特徴量の統合に基づいた映像の意味インデクシング Reviewed

福田竣, 井上中順, 篠田浩一

第21回画像センシングシンポジウム (SSII) 講演論文集 IS2 - 16 2015.6

　More details

researchmap
統計的パターン認識のための中間表現 Reviewed

篠田浩一

電子情報通信学会技術研究報告 SP vol. 114 ( no. 474 ) pp. 73 2015.3

　More details

researchmap
Spectral Graph Wavelets for Skeleton-based 3D Action Recognition Reviewed

Tommi Kerola, Nakamasa Inoue, Koichi Shinoda

Technical Reports of IEICE PRMU vol. 114 ( no. 454 ) pp. 131 - 136 2015.2

　More details

We present spectral graph skeletons (SGS), a novel graph-based method for action recognition from depth cameras. The contribution of this paper is to leverage a spectral graph wavelet transform (SGWT) for creating an overcomplete representation of an action signal lying on a 3D skeleton graph. The resulting SGS descriptor is efficiently computable in time linear in the action sequence length. We investigate the suitability of our method by experiments on three publicly available datasets, resulting in performance comparable to state-of-the-art action recognition approaches. Namely, our method achieves 91.4% accuracy on the challenging MSRAction3D dataset in the cross-subject setting. SGS also achieves 96.0% and 98.8% accuracy on the MSRActionPairs3D and UCF-Kinect datasets, respectively.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Human Action Retrieval Based on Temporal Matching Reviewed

Mengxi Lin, Nakamasa Inoue, Koichi Shinoda

Technical Reports of IEICE PRMU vol. 114 ( no. 454 ) pp. 125 - 130 2015.2

　More details

This paper focuses on human action retrieval, which aims to retrieve video segments from a video database based on an action of interest specified on-the-fly by a user. In this work, we focus on capturing temporal information of actions and propose to utilize Dynamic Time Warping (DTW) to measure the temporal distortion and difference between a pair of actions. Temporal motion saliency of a query video is introduced to re-rank the retrieval results. We evaluate our method on the Breakfast dataset and show that it is more effective than the baselines, which do not consider the temporal order of actions.
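
As a rough illustration of how Dynamic Time Warping compares two actions of different lengths, the sketch below implements the textbook DTW recurrence and uses it to rank database entries. It is a minimal example under assumptions: the per-frame feature vectors, the Euclidean local distance, and the function names are hypothetical, and the motion-saliency re-ranking step described in the abstract is not included.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Return the DTW alignment cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning
    # the first i frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of seq_a repeated against seq_b[j-1]
                cost[i][j - 1],      # frame of seq_b repeated against seq_a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]

def rank_by_dtw(query, database):
    """Sort (name, sequence) pairs by DTW distance to the query, closest first."""
    return sorted(database, key=lambda item: dtw_distance(query, item[1]))
```

Because the warping path may repeat frames, an action performed more slowly can still match the query at low cost, which a frame-by-frame comparison would miss.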

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
A New Speech Recognition Paradigm Based on Deep Learning Reviewed

Koichi Shinoda

University of Science, VNU-HCM 2015.1

　More details

researchmap
Robust video information retrieval using speech technologies Reviewed

Koichi Shinoda

University of Information Technology, VNU-HCM 2015.1

　More details

researchmap
Vocabulary Expansion Using Word Vectors for Video Semantic Indexing Reviewed

Nakamasa Inoue, Koichi Shinoda

MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE 851 - 854 2015

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2733373.2806347

Web of Science

researchmap
New materials for spoofing touch-based fingerprint scanners Reviewed

Jan Spurny, Michal Dolezel, Ondrej Kanich, Martin Drahansky, Koichi Shinoda

2015 International Conference on Computer Application Technologies (CCATS) 207 - 211 2015

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCATS.2015.57

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Localization with spatio-temporal selective search and SPPNET

Ryosuke Yamamoto, Nakamasa Inoue, Koichi Shinoda

2015 TREC Video Retrieval Evaluation, TRECVID 2015 2015

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

Scopus

researchmap
Spectral Graph Skeletons for 3D Action Recognition Reviewed

Tommi Kerola, Nakamasa Inoue, Koichi Shinoda

COMPUTER VISION - ACCV 2014, PT IV 9006 417 - 432 2015

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-16817-3_27

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Semantic indexing using deep CNN and GMM supervectors

Nakamasa Inoue, Koichi Shinoda

2015 TREC Video Retrieval Evaluation, TRECVID 2015 2015

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology (NIST)

Scopus

researchmap
Error Correction Using Long Context Match for Smartphone Speech Recognition Reviewed

Yuan Liang, Koji Iwano, Koichi Shinoda

Technical Reports of IPSJ SLP vol. 104 ( no. 22 ) pp. 1 - 6 2014.12

　More details

Correcting speech recognition errors on a smartphone is a challenging task that requires a lot of user effort. To reduce this user effort, we previously proposed an error correction method based on Long Context Match (LCM) with higher-order N-grams, which we combined with a simple gesture-based user interface. However, LCM was used only when a single substitution error existed in an utterance. In this paper, we extend LCM to cases where the contexts of the error region contain errors. We examined error regions that contain only one error and confirmed the effectiveness of the extended LCM-based method.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
TokyoTech-Waseda at TRECVID 2014 Reviewed

Nakamasa Inoue, Zhuolin Liang, Mengxi Lin, Tran Hai Dang, Koichi Shinoda, Zhang Xuefeng, Kazuya Ueki

Proc. TRECVID workshop pp. 1 - 13 2014.11

　More details

researchmap
n-Gram Models for Video Semantic Indexing Reviewed

Nakamasa Inoue, Koichi Shinoda

Proc. ACM Multimedia (MM) pp. 777 - 780 2014.11

　More details

We propose n-gram modeling of shot sequences for video semantic indexing, in which semantic concepts are extracted from a video shot. Most previous studies for this task have assumed that video shots in a video clip are independent from each other. We model the time-dependency between them assuming that n-consecutive video shots are dependent. Our models improve the robustness against occlusion and camera-angle changes by effectively using information from the previous video shots. In our experiments on the TRECVID 2012 Semantic Indexing Benchmark, we applied the proposed models to a system using Gaussian mixture models and support vector machines. Mean average precision was improved from 30.62% to 32.14%, which is the best performance on the TRECVID 2012 Semantic Indexing to the best of our knowledge.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
An Efficient Error Correction Method for Smartphone Speech Recognition Reviewed

Yuan Liang, Koji Iwano, Koichi Shinoda

pp. 29 - 30 2014.9

　More details

We propose an efficient error correction method and a simple gesture-based interface for speech recognition, where users mark the error region once, and then the word will be replaced by another candidate. Assuming that the words preceding/succeeding the error region are validated by the user, we search the Web n-grams for long word sequences matched to such a context. The acoustic features of the error region are also utilized to rerank the candidate words. The experimental result using CSJ proved the effectiveness of our method. 30.2% of the error words were corrected by a single operation.

researchmap
Deep Learningによる新しい音声認識パラダイム Reviewed

篠田浩一

2014.8

　More details

researchmap
映像意味検索技術の最新動向 Reviewed

篠田浩一

2014.7

　More details

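Today, with huge volumes of video circulating on the Internet, there is demand for technology that accurately extracts information meaningful to users, such as dogs, chairs, and night views, from large-scale video data. The speaker, who has achieved outstanding results at TRECVID, the competitive video retrieval workshop organized by the U.S. National Institute of Standards and Technology (NIST), introduces the latest trends in this field.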

researchmap
口唇の深度画像を用いたマルチモーダル音声認識 Reviewed

押尾翔平, 岩野公司, 篠田浩一

情報処理学会研究報告 SLP vol. 102 ( no. 2 ) pp. 1 - 6 2014.7

　More details

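As one means of improving the noise robustness of speech recognition, many studies have addressed multimodal speech recognition, which uses lip video information together with audio information. In this study, we propose a method that uses, as image features for speech recognition, depth information obtained from Microsoft Kinect in addition to conventional frontal images. Depth information is added as input to an HMM-based method for extracting lip and oral cavity contours, and we introduced a method that extracts, as image features, the unevenness caused by lip protrusion and similar movements. In continuous speech recognition experiments on Japanese sentence speech, word accuracy increased from 66.0% to 67.0% when data from multiple speakers was used, and the proposed method was confirmed to be particularly effective for phonemes uttered with pursed lips and phonemes in which the tongue moves so as to block the oral cavity.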

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
映像意味検索の未来 Reviewed

篠田浩一

2014.7

　More details

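Today, the Internet is flooded with video. These videos are diverse and not necessarily of high quality. Video semantic search technology, which extracts information meaningful to us from such collections of video, is now advancing rapidly. This talk introduces the currently mainstream techniques at TRECVID, the international competitive workshop on video semantic search, and then boldly predicts future trends. It focuses in particular on machine-learning-based approaches for handling diversity and low quality, speed-up techniques essential for practical use, and multimodal recognition techniques that integrate varied information such as images, audio, and text, and discusses the challenges in these areas and their expected development.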

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Semantics for Large-Scale Multimedia: New Challenges for NLP Reviewed

Florian Metze, Koichi Shinoda

2014.6

　More details

Thousands of videos are constantly being uploaded to the web, creating a vast resource and an ever-growing demand for methods to make them easier to retrieve, search, and index. As it becomes feasible to extract both low-level and high-level (symbolic) audio, speech, and video features from this data, these features need to be processed further in order to learn and extract meaningful relations among them. The language processing community has made huge progress in analyzing the vast amounts of very noisy text data that is available on the Internet. While it is very difficult to create semantic units of low-level image descriptors or non-speech sounds by themselves, it is comparatively easy to ground semantics in the word output of a speech recognizer, or in text data that is loosely associated with a video. This creates an opportunity for NLP researchers to use their unique skills and make significant contributions to solving tasks on data that is even noisier than web text, but (we argue) even more interesting and challenging.

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Robust Video Information Retrieval using Speech Technologies Reviewed

Koichi Shinoda

2014.6

　More details

Lecture 1: Speaker adaptation techniques for speech recognition 
Speaker adaptation techniques were extensively studied in early 90's and has been still one of the essential techniques in automatic speech recognition. They belong to one type of transfer learning, in which the parameters of a speaker-independent model are modified so that they fit the acoustic characteristics of an individual, with a small amount of his/her utterances. These techniques are successfully applied not only to the difference of speakers, but also to those of channels, noise environments, and so on. In this lecture, we first explain fundamental speaker adaptation techniques, Maximum A Posteriori (MAP) estimation, Maximum Likelihood Linear Regression (MLLR) , Eigenvoice, and then, how they are combined with each other and with the other training techniques such as discriminative learning and with structure learning, such as Structural MAP (SMAP) or SMAPLR. We also discuss how those techniques are applied to robust speech recognition under noisy environment. 
Lecture 2: Robust video information retrieval using speech technologies 
The amount of video data on the Internet has been increasing rapidly. These videos vary widely and are in most cases of low quality, so robust techniques for video indexing are in strong demand. In automatic video semantic indexing, a user submits a textual query for a desired object or scene to a search system, which returns video shots that include that object or scene. Many techniques developed in speech research have been employed successfully in this application. For example, a method using Gaussian mixture model (GMM) supervectors and support vector machines (SVMs) was recently shown to be very effective. In this method, speech technologies such as speaker verification and speaker adaptation play very important roles. In this lecture, we first introduce the activities of the NIST TRECVID workshop, a showcase of state-of-the-art video search technologies, and then discuss several techniques, such as SIFT and HOG features, Bag of Visual Words, the Fisher kernel, multi-modal frameworks, and fast tree search, that achieve robustness against the variety of Internet video.
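As a rough illustration of how MAP adaptation and GMM supervectors fit together, here is a minimal sketch. It assumes the common relevance-MAP update of the means only, with diagonal covariances; the function names, the toy model, and the relevance values are hypothetical choices made for this example and do not come from the lectures.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP update of the GMM means (weights and variances are kept)."""
    n_comp, dim = len(means), len(means[0])
    occupancy = [0.0] * n_comp                      # soft counts per component
    first_order = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        log_p = [math.log(w) + log_gaussian(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(log_p)                           # subtract the max for stability
        post = [math.exp(lp - peak) for lp in log_p]
        norm = sum(post)
        for k in range(n_comp):
            gamma = post[k] / norm                  # component posterior
            occupancy[k] += gamma
            for d in range(dim):
                first_order[k][d] += gamma * x[d]
    # Interpolate between the data statistics and the prior (background) mean.
    return [[(first_order[k][d] + relevance * means[k][d])
             / (occupancy[k] + relevance)
             for d in range(dim)] for k in range(n_comp)]

def supervector(adapted_means):
    """Concatenate the adapted means into a single vector."""
    return [value for mean in adapted_means for value in mean]

# Toy two-component background model and three feature vectors (made up).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.5, 0.2], [3.8, 4.1], [4.2, 3.9]]
adapted = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars, relevance=2.0)
print(supervector(adapted))
```

With few frames the adapted means stay close to the background model, and with more frames they move toward the data. The resulting fixed-length vector is the kind of input that can be passed to an SVM.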

researchmap
Discriminative PLDA training with application-specific loss functions for speaker verification Reviewed

Johan Rohdin, Sangeeta Biswas, Koichi Shinoda

Proc. Odyssey Workshop pp. 26 - 32 2014.6

　More details

Speaker verification systems are usually evaluated by a weighted average of their false acceptance (FA) rate and false rejection (FR) rate. The weights are known as the operating point (OP) and depend on the application. Recent research suggests that, for score calibration of speaker verification systems, it is beneficial to let discriminative training emphasize the operating points of interest, i.e., to use application-specific loss functions. In score calibration, a transformation is applied to the scores so that they better represent likelihood ratios. The same application-specific training objective can be used in discriminative training of all parameters of a speaker verification system. In this study, we apply application-specific loss functions in discriminative PLDA training. For the male telephone trials of NIST SRE10, we observe an improvement in the minimum detection cost function (minDCF) at the targeted operating point compared to the baseline, discriminative PLDA training with logistic regression loss.
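The weighted FA/FR average described in this abstract can be sketched as follows. This is a minimal illustration that assumes the usual unnormalized detection cost form; the function names, the default operating-point values, and the toy scores are hypothetical and are not taken from the paper.

```python
def detection_cost(target_scores, nontarget_scores, threshold,
                   p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Weighted sum of the FR and FA rates at one decision threshold."""
    # False rejection: a target trial scored below the threshold.
    fr_rate = sum(s < threshold for s in target_scores) / len(target_scores)
    # False acceptance: a non-target trial scored at or above the threshold.
    fa_rate = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return c_miss * p_target * fr_rate + c_fa * (1.0 - p_target) * fa_rate

def min_detection_cost(target_scores, nontarget_scores, **operating_point):
    """Smallest detection cost over all thresholds that change a decision."""
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    candidates.append(candidates[-1] + 1.0)  # threshold that rejects every trial
    return min(detection_cost(target_scores, nontarget_scores, t, **operating_point)
               for t in candidates)

# Toy scores (made up): higher means "more likely the same speaker".
targets = [2.0, 3.0]
nontargets = [0.0, 1.0]
print(detection_cost(targets, nontargets, threshold=2.5, p_target=0.5))
print(min_detection_cost(targets, nontargets, p_target=0.5))
```

Changing `p_target`, `c_miss`, and `c_fa` moves the operating point, which is why a system tuned for one application can rank differently under another.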

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
i-Vector Selection for Effective PLDA Modeling in Speaker Recognition Reviewed

Sangeeta Biswas, Johan Rohdin, Koichi Shinoda

Proc. Odyssey Workshop pp. 100 - 105 2014.6

　More details

Data selection is an important issue in speaker recognition. Previous studies have addressed data selection for universal background model (UBM) training and for the background dataset of support vector machines (SVMs). In this paper, we address data selection for a probabilistic linear discriminant analysis (PLDA) model, which is one of the state-of-the-art methods for i-vector scoring. We first show that data selection using the conventional k-NN method indeed improves speaker verification performance. We then propose a robust way of selecting k by using a local distance-based outlier factor (LDOF). We call our method flexible k-NN, or fk-NN. It obtained significant performance improvements on both male and female trials of the NIST speaker recognition evaluation (SRE) 2006 core task, the NIST SRE 2008 core task (condition-6), and the NIST SRE 2010 coreext-coreext task (condition-5).
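The conventional k-NN selection step mentioned in this abstract might look like the sketch below. It assumes cosine distance between vectors and a fixed k; the function names and toy vectors are hypothetical, and the LDOF-based choice of k that defines fk-NN is not reproduced here.

```python
import math

def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_select(queries, pool, k):
    """Indices of pool vectors that are among the k nearest to any query."""
    selected = set()
    for q in queries:
        ranked = sorted(range(len(pool)),
                        key=lambda i: cosine_distance(q, pool[i]))
        selected.update(ranked[:k])
    return sorted(selected)

# Toy 2-D stand-ins for i-vectors (made up).
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
queries = [[1.0, 0.1]]
print(knn_select(queries, pool, k=2))
```

Only the selected subset of the pool would then be used to train the PLDA model.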

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
A Speech Engineering Approach to Video Retrieval Reviewed

篠田浩一

2014.5

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Collection and analysis of multi-party interaction data for automatic boredom recognition Reviewed

Nataliia Biriukova, Kotaro Funakoshi, Koichi Shinoda

Proc. The 28th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI) 2014 28 pp. 1 - 4 2014.5

　More details

Language：English

In interactive systems, it is important to keep users' attention and not to bore them. This paper describes our dialog corpus annotated with boredom information and reports the results of our analysis.

CiNii Books

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Neighbor-to-Neighbor Search for Fast Image Classification Reviewed

井上中順, 篠田浩一

電子情報通信学会技術研究報告 PRMU vol. 113 ( no. 493 ) pp. 97 - 102 2014.3

　More details

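We propose Neighbor-to-Neighbor (NTN) search for fast image classification. NTN search is an algorithm that reduces the computational cost of vector quantization and of probability computation for Gaussian mixture models, and it can speed up image classification based on Bag-of-visual-words and Fisher vectors, which have attracted attention in recent years. NTN search assumes that a neighborhood is defined for each input feature; it searches similar features from neighbor to neighbor and speeds up processing by omitting computation according to the similarity of the features. For example, when NTN search is applied to SIFT features densely extracted on grid points by dense sampling, the set of SIFT features at the points adjacent above, below, left, and right of each SIFT feature is taken as its neighborhood, so that the computations for vector quantization and Gaussian mixture models can be carried out quickly while similar SIFT features are searched from one neighbor to the next. In evaluation experiments on the PASCAL VOC 2007 Classification Challenge, introducing NTN search reduced the computational cost of vector quantization and of Gaussian mixture models by 77.4% and 89.3%, respectively, while maintaining detection accuracy.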

researchmap
Feature Dimensionality Reduction for Video Semantic Indexing Using Deep Learning Reviewed

森宏太郎, 井上中順, 篠田浩一

2014年電子情報通信学会総合大会講演論文集 pp. 85 2014.3

　More details

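In recent years, the amount of video on the Internet has been increasing daily. To search large volumes of video efficiently, more accurate video semantic indexing systems are needed. In this study, we propose a feature dimensionality reduction method for obtaining features useful for semantic indexing. We applied dimensionality reduction by PCA and by an autoencoder to SIFT feature vectors, which are widely used in generic object recognition. We performed image classification on the Caltech256 dataset and evaluated the classification accuracy obtained with the proposed method. We clarified the trade-off between image classification accuracy and the number of dimensions, and obtained insights for choosing parameters such as the number of dimensions in feature dimensionality reduction.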

researchmap
Velocity Pyramid for Event Detection Reviewed

Zhuolin Liang, Nakamasa Inoue, Koichi Shinoda

Technical Reports of IEICE PRMU vol. 113 ( no. 493 ) pp. 13 - 18 2014.3

　More details

In this paper, we propose a new motion feature, the velocity pyramid, for multimedia event detection. In an event, which is a complex human activity, motion information is an important cue. However, most conventional motion features are too computationally expensive to apply to event detection. Spatial pyramid matching introduces coarse geometric information into the Bag of Features framework. A velocity pyramid, which is motivated by the spatial pyramid, can represent rough dynamic information. The idea is to categorize densely sampled features according to their velocity direction. It is effective for detecting events characterized by their temporal patterns. Experiments on the MED (Multimedia Event Detection) task of the TRECVID workshop showed a 20% improvement in performance with the velocity pyramid. Further, when combined with the spatial pyramid, the velocity pyramid provided an extra 5% gain in detection performance. Compared with other motion features, it also reduces the computation cost while maintaining performance.
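One possible reading of "categorize densely sampled features according to their velocity direction" is sketched below. It assumes each feature has already been quantized to a visual-word index and carries a 2-D velocity estimate. The number of direction bins, the static-speed threshold, and all names are hypothetical choices for illustration; the paper's actual design may differ.

```python
import math

def velocity_bin(vx, vy, n_directions=4, static_threshold=0.5):
    """Return 0 for near-static features, else 1..n_directions by direction."""
    if math.hypot(vx, vy) < static_threshold:
        return 0
    width = 2.0 * math.pi / n_directions
    angle = math.atan2(vy, vx) % (2.0 * math.pi)
    # Shift by half a bin so that bin 1 is centred on the +x direction.
    index = int(((angle + width / 2.0) % (2.0 * math.pi)) // width)
    return 1 + min(index, n_directions - 1)

def velocity_pyramid_histogram(features, vocab_size,
                               n_directions=4, static_threshold=0.5):
    """Concatenate one bag-of-features histogram per velocity bin."""
    n_bins = n_directions + 1
    hist = [0.0] * (n_bins * vocab_size)
    for word, vx, vy in features:
        b = velocity_bin(vx, vy, n_directions, static_threshold)
        hist[b * vocab_size + word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy features (made up): (visual word index, vx, vy).
features = [(0, 1.0, 0.0), (1, 0.0, 0.0), (0, 0.0, -2.0)]
print(velocity_pyramid_histogram(features, vocab_size=2))
```

The same visual word then contributes to different parts of the descriptor depending on how it moves, in the same way that a spatial pyramid separates features by where they occur.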

researchmap
Discriminatively Trained PLDA with Partially Preserved Model Assumptions in Speaker Verification Reviewed

Johan Rohdin, Sangeeta Biswas, Koichi Shinoda

Proc. ASJ 2014 Spring Meeting pp. 99 - 100 2014.3

　More details

researchmap
Robust 0-1 Loss Training for PLDA in Speaker Verification Reviewed

Johan Rohdin, Sangeeta Biswas, Koichi Shinoda

Proc. ASJ 2014 Spring Meeting pp. 101 - 102 2014.3

　More details

researchmap
Training Multiple PLDA Models by Clustered I-Vectors for Speaker Verification Reviewed

Sangeeta Biswas, Johan Rohdin, Koichi Shinoda

Proc. ASJ 2014 Spring Meeting pp. 97 - 98 2014.3

　More details

researchmap
Speech and Acoustic Technologies for TRECVID Video Semantic Indexing Reviewed

井上中順, 森宏太郎, Liang Zhuolin, 篠田浩一

日本音響学会2014年春季研究発表会講演論文集 pp. 129 - 130 2014.3

　More details

researchmap
Latest Trends in Semantic Video Retrieval Technology Reviewed

篠田浩一

日本音響学会2014年春季研究発表会講演論文集 pp. 531 - 532 2014.3

　More details

researchmap
Current Status and Challenges of Semantic Video Retrieval Reviewed

篠田浩一

電子情報通信学会技術研究報告 EMM vol. 113 ( no. 480 ) pp. 1 - 2 2014.2

　More details

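Semantic video retrieval technology, which analyzes the content of the large volume of consumer video on the Internet and extracts information from it, is being actively studied. We give an overview of TRECVID, an international competitive workshop on video retrieval, and of the research activities conducted there. In particular, we focus on two tasks, semantic indexing and multimedia event detection, and explain the main techniques used in them. Finally, we forecast future developments in semantic video retrieval technology.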

researchmap
Video Semantic Indexing Using Speech Technologies Reviewed

Koichi Shinoda

Dublin City University 2014.1

　More details

researchmap
This Is Where "Speech Recognition" Is Headed! Reviewed

河原達也, 篠田浩一, 堀貴明, 堀智織, 篠崎隆宏

page 6 2014.1

　More details

researchmap
Event detection by velocity pyramid Reviewed

Zhuolin Liang, Nakamasa Inoue, Koichi Shinoda

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8325 ( 1 ) 353 - 364 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-04114-8_30

Scopus

researchmap
SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS USING A HIERARCHY OF OUTPUT LAYERS Reviewed

Ryan Price, Ken-ichi Iso, Koichi Shinoda

2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 153 - 158 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
AN EFFICIENT ERROR CORRECTION INTERFACE FOR SPEECH RECOGNITION ON MOBILE TOUCHSCREEN DEVICES Reviewed

Yuan Liang, Koji Iwano, Koichi Shinoda

2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 454 - 459 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Simple Gesture-based Error Correction Interface for Smartphone Speech Recognition Reviewed

Yuan Liang, Koji Iwano, Koichi Shinoda

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 1194 - 1198 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
CONSTRAINED DISCRIMINATIVE PLDA TRAINING FOR SPEAKER VERIFICATION Reviewed

Johan Rohdin, Sangeeta Biswas, Koichi Shinoda

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) pp. 1689 - 1693 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
TRECVideo Semantic Indexing Reviewed

Koichi Shinoda

Yahoo! Japan Research 2013.11

　More details

researchmap
Machine Learning for Multimedia Sequential Pattern Recognition International journal

Koichi Shinoda, Jen-Tzung Chien

2013 APSIPA Tutorial #5 2013.10

　More details

Language：English

researchmap
Statistical Video Semantic Indexing Reviewed

Koichi Shinoda

National Chiao Tung University (國立交通大学) 2013.10

　More details

researchmap
A Regression Approach to Emotion Estimation in Spontaneous Speech Reviewed

Qiongqiong Wang, Koichi Shinoda

pp. 87 - 88 2013.9

　More details

researchmap
What speech researchers should know about video technology! International journal

Koichi Shinoda, Florian Metze

Tutorial at INTERSPEECH2013 2013.8

　More details

Language：English

Thousands of videos are constantly being uploaded to the web, creating a vast resource and an ever-growing demand for methods to make them easier to index, search, and retrieve. While visual information is a very important part of a video, acoustic and speech information often complements it. State-of-the-art "content-based video retrieval" (CBVR) research faces several challenges: how to robustly and efficiently process large amounts of data, how to train classifiers and segmenters on unlabeled data, how to represent and then fuse information across modalities, how to include human feedback, etc. Thanks to advances in computation technology, many of the statistical approaches we originally developed for speech processing can now be readily used for CBVR. This tutorial aims to present the state of the art in video processing to the speech community by discussing the most relevant tasks in NIST's TREC Video Retrieval Evaluation (TRECVID) and workshop series (http://trecvid.nist.gov/). We liken TRECVID's "Semantic Indexing" (SIN) task, in which a system must identify occurrences of concepts such as "desk" or "dancing" in a video, to the word spotting approach. We then explain more recent and challenging tasks, such as "Multimedia Event Detection" (MED) and "Multimedia Event Recounting" (MER), which can be compared to meeting transcription and summarization tasks in the speech area. Finally, we lay out how the speech and language community can contribute to this work, given its own vast body of experience, and identify opportunities for advancing speech-centric research on these datasets, whose large scale and multi-modal nature pose unique challenges and opportunities for future research.

researchmap
A statistical approach for person verification using human behavioral patterns Reviewed

Felipe Gomez-Caballero, Takahiro Shinozaki, Sadaoki Furui, Koichi Shinoda

EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING 2013:44 pp. 1 - 11 2013.8

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1186/1687-5281-2013-44

Web of Science

researchmap
International Trends in Speaker Recognition

越仲孝文, 篠田浩一

日本音響学会誌 vol. 69 ( no. 7 ) 342 - 348 2013.7

　More details

Language：Japanese Publisher：The Acoustical Society of Japan (ASJ)

CiNii Books

researchmap
Fusing deep speaker specific features and MFCC for robust speaker verification Reviewed

Ryan Price, Koichi Shinoda, Sangeeta Biswas

IPSJ SIG technical reports Vol. 2013-SLP-97 ( No. 3 ) pp. 1 - 7 2013.7

　More details

researchmap
Feature normalization based on non-extensive statistics for speech recognition Reviewed

Hilman F. Pardede, Koji Iwano, Koichi Shinoda

Speech Communication 55 ( 5 ) 587 - 599 2013.6

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.specom.2013.02.004

Scopus

researchmap
Decision Tree Clustering of Movement Units for Large-Vocabulary Sign Language Recognition Reviewed

安中哲也, 篠田浩一

第19回画像センシングシンポジウム pp. IS3 - 18 2013.6

　More details

researchmap
Collection and Analysis of Multimodal Dialogue Data Toward Multi-Party Conversational Robots Reviewed

石川真也, 船越孝太郎, 篠田浩一, 中野幹生

2013年度人工知能学会全国大会(JSAI)論文集 27 pp. 224 - 227 2013.6

　More details

Language：Japanese

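Aiming to build a multimodal dialogue system that can converse with multiple users simultaneously, we collected dialogue data in which groups of three people played a simple game with a small robot. In this presentation, we give an overview of the collected data and report analysis results from several perspectives. The dialogue data were collected in a Wizard-of-Oz setting; following simple instructions from a supervisor, the three participants repeatedly entered and left the dialogue space and played the game with the robot.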

CiNii Books

researchmap
Speaker Recognition Using Speaking-Style-Dependent Models Reviewed

小塚俊来, 岩野公司, 篠田浩一

日本音響学会講演論文集 pp. 185 - 188 2013.3

　More details

researchmap
Speaker verification using deep speaker-discriminative representations Reviewed

Ryan Price, Koichi Shinoda

2013 Spring Meeting ASJ pp. 81 - 82 2013.3

　More details

researchmap
Acoustic Models Using q-Gaussian Distributions for Speech Recognition Reviewed

周澤西, 岩野公司, 篠田浩一

日本音響学会講演論文集 pp. 175 - 178 2013.3

　More details

researchmap
Reusing Speech Techniques for Video Semantic Indexing Reviewed International journal

Koichi Shinoda, Nakamasa Inoue

IEEE signal processing magazine Vol. 30 ( No. 2 ) 118 - 122 2013.3

　More details

Language：English

DOI： 10.1109/MSP.2012.2230520

Web of Science

researchmap
Event Detection in Video Using Spatio-Temporal Features Based on Camera Motion Compensation and GMM Supervectors Reviewed

上嶋勇祐, 井上中順, 篠田浩一

電子情報通信学会技術研究報告 vol. 112 ( no. 441 ) pp. 185 - 190 2013.2

　More details

researchmap
Current Status and Challenges of Speaker Recognition Technology Reviewed

網野加苗, 石原俊一, 小川哲司, 長内隆, 黒岩眞吾, 越仲孝文, 篠田浩一, 柘植覚, 西田昌史, 松井知子, 王龍標

電子情報通信学会技術研究報告 Vol. 112 ( No. 450 ) pp. 63 - 70 2013.2

　More details

researchmap
Video Semantic Indexing Using GMM Supervectors and Video Clip Scores Reviewed

井上中順, 篠田浩一

電子情報通信学会技術研究報告 vol. 112 ( no. 441 ) pp. 173 - 178 2013.2

　More details

researchmap
Spectral subtraction based on non-extensive statistics for speech recognition Reviewed

Hilman Pardede, Koji Iwano, Koichi Shinoda

IEICE Transactions on Information and Systems E96-D ( 8 ) 1774 - 1782 2013

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electronics, Information and Communication Engineers (IEICE)

DOI： 10.1587/transinf.E96.D.1774

Scopus

researchmap
Q-Gaussian mixture models for image and video semantic indexing Reviewed

Nakamasa Inoue, Koichi Shinoda

Journal of Visual Communication and Image Representation 24 ( 8 ) 1450 - 1457 2013

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.jvcir.2013.10.005

Scopus

researchmap
Statistical Person Verification Using Behavioral Patterns from Complex Human Motion Reviewed

Felipe Gomez-Caballero, Takahiro Shinozaki, Sadaoki Furui, Koichi Shinoda

NEW TRENDS IN IMAGE ANALYSIS AND PROCESSING - ICIAP 2013 8158 550 - 558 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-642-41190-8_60

Web of Science

researchmap
Neighbor-To-Neighbor Search for Fast Coding of Feature Vectors Reviewed

Nakamasa Inoue, Koichi Shinoda

2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) 1233 - 1240 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICCV.2013.156

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Event detection in consumer videos using GMM supervectors and SVMs Reviewed

Yusuke Kamishima, Nakamasa Inoue, Koichi Shinoda

Eurasip Journal on Image and Video Processing 2013 pp. 1 - 13 2013

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1186/1687-5281-2013-51

Scopus

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Combining Deep Speaker Specific Representations with GMM-SVM for Speaker Verification Reviewed

Ryan Price, Sangeeta Biswas, Koichi Shinoda

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 2787 - 2791 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Detection of overlapped speech using lapel microphones in meeting Reviewed

Ryo Yokoyama, Yu Nasu, Koji Iwano, Koichi Shinoda

Speech Communication 55 ( 10 ) 941 - 949 2013

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.specom.2013.06.013

Scopus

researchmap
q-Gaussian mixture models based on non-extensive statistics for image and video semantic indexing Reviewed

Nakamasa Inoue, Koichi Shinoda

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7725 ( 2 ) 499 - 510 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-642-37444-9_39

Scopus

researchmap
Tokyo Tech Speaker Recognition Reviewed

Sangeeta Biswas, Johan Rohdin, Koichi Shinoda

NIST SRE 2012 2012.12

　More details

researchmap
Video as Communication and Its Retrieval Reviewed

篠田浩一

第15回情報理論的学習理論ワークショップ(IBIS2012) 2012.11

　More details

researchmap
TokyoTechCanon at TRECVID 2012 Reviewed

Nakamasa Inoue, Yusuke Kamishima, Kotaro Mori, Koichi Shinoda

TRECVID 2012 2012.11

　More details

researchmap
Emerging Trends in Video Search Technology Reviewed

Koichi Shinoda

電子情報通信学会誌 Vol. 95 ( No. 10 ) 932 - 938 2012.10

　More details

Language：Japanese Publisher：電子情報通信学会

CiNii Books

researchmap
Latest Trends in Video Retrieval Technology Reviewed

篠田浩一

2012.10

　More details

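Video data on the Internet is currently increasing dramatically, and efficient techniques for searching it are needed. This talk introduces the latest research trends at TRECVID, the international competitive video retrieval and evaluation workshop organized by NIST in the United States.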

researchmap
Active Learning Using Phone-Error Distribution for Speech Modeling Reviewed

Hiroko Murakami, Koichi Shinoda, Sadaoki Furui

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E95D ( 10 ) 2486 - 2494 2012.10

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1587/transinf.E95.D.2486

Web of Science

researchmap
Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model Reviewed

Takafumi Koshinaka, Kentaro Nagatomo, Koichi Shinoda

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E95D ( 10 ) 2469 - 2478 2012.10

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1587/transinf.E95.D.2469

Web of Science

researchmap
q-Gaussian Mixture Models for Video Semantic Indexing Reviewed

井上中順, 篠田浩一

信学技報 Vol. 112 ( No. 197 ) pp. 31 - 36 2012.9

　More details

researchmap
Japanese Large-Vocabulary Spontaneous Speech Recognition Using Deep Learning Reviewed

西野大輔, 篠田浩一, 古井貞熙

日本音響学会2012年秋季研究発表会講演論文集 2012.9

　More details

researchmap
A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors Reviewed

Nakamasa Inoue, Koichi Shinoda

IEEE TRANSACTIONS ON MULTIMEDIA 14 ( 4 ) 1196 - 1205 2012.8

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TMM.2012.2191395

Web of Science

researchmap
Mobile or Cloud-based Photo/Video Analytics? Reviewed

Winston Hsu, Kunio Kashino, Keiichiro Hoashi, Koichi Shinoda, Duy-Dinh Le, Masanori Sugimoto

2012.8

　More details

researchmap
A video watermarking method to objects robust against various attacks Reviewed

Ta Minh THANH, Koichi SHINODA

IEICE Technical Report Vol. 112 ( No. 190 ) pp. 43 - 48 2012.8

　More details

Language：English

CiNii Books

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Transfer Learning in Speech Recognition: Speaker Adaptation Reviewed

篠田浩一

人工知能学会誌 vol. 27 ( no. 4 ) 359 - 364 2012.7

　More details

Language：Japanese Publisher：社団法人人工知能学会

CiNii Books

researchmap
Detection of Overlapped Segments in Meeting Speech Recorded with Multiple Lapel Microphones Reviewed

横山諒, 那須悠, 岩野公司, 篠田浩一

情報処理学会研究報告 Vol. 2012-SLP-92 ( No. 6 ) 2012.7

　More details

researchmap
A New Speech Research Paradigm for the Cloud Era Reviewed

秋葉友良, 岩野公司, 緒方淳, 小川哲司, 小野順貴, 篠崎隆宏, 篠田浩一, 南條浩輝, 西崎博光, 西田昌史, 西村竜一, 原直, 堀貴明

情報処理学会研究報告 Vol. 2012-SLP-92 ( No. 4 ) 1 - 7 2012.7

　More details

Language：Japanese

CiNii Books

researchmap
The Frontier of Video Retrieval Technology Reviewed

篠田浩一

第18回画像センシングシンポジウム講演論文集 Vol. ( No. ) OS3 - 02 2012.6

　More details

researchmap
Multimodal Interface for Error Correction in Speech Recognition Reviewed

Koichi Shinoda

Microsoft Research Asia IJARC CORE7 Project Summary Booklet Vol. ( No. ) pp. 15 - 16 2012.6

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
A Fast Semantic Video Retrieval System Using GMM Supervectors Reviewed

井上中順, 篠田浩一

第18回画像センシングシンポジウム講演論文集 Vol. ( No. ) DS2 - 08 2012.6

　More details

researchmap
Automatic Estimation of Inter-Model Scaling Factors in Speech Recognition Reviewed

大西祥史, 江森正, 越仲孝文, 篠田浩一

電子情報通信学会論文誌 Vol. J95-D ( No. 5 ) pp. 1276 - 1285 2012.5

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
Recognition of Indonesian Code-Switching Speech Reviewed

Yonatan Andy, Fajar Nugraha, Koichi Shinoda, Sadaoki Furui, Koji Iwano

2012 Spring Meeting ASJ Vol. ( No. ) pp. 75 - 76 2012.3

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Overlapped Segment Detection in Meeting Speech Using Cross-Channel Spectral Subtraction and Amplitude Spectrum Correlation Reviewed

横山諒, 那須悠, 篠田浩一, 岩野公司

日本音響学会2012年春季研究発表会講演論文集 Vol. ( No. ) pp. 13 - 14 2012.3

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
A Compensation Technique Using q-Logarithm for Noisy Speech Recognition Reviewed

Hilman F. Pardede, Koichi Shinoda, Koji Iwano

2012 Spring Meeting ASJ Vol. ( No. ) pp. 19 - 20 2012.3

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
MAP Adaptation Using Multiple Priors for Speaker Verification Reviewed

Sangeeta Biswas, Johan Rohdin, Koichi Shinoda, Sadaoki Furui

2012 Spring Meeting ASJ Vol. ( No. ) pp. 79 - 82 2012.3

　More details

researchmap
Spectral Subtraction Based on q-Gaussian Assumption for Noise Robust Speech Recognition Reviewed

Hilman F. Pardede, Koichi Shinoda, Koji Iwano

2012 Spring Meeting ASJ Vol. ( No. ) pp. 21 - 22 2012.3

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Language Model for Efficient Error Correction in Speech Recognition Reviewed

Yuan Liang, Koichi Shinoda, Sadaoki Furui

2012 Spring Meeting ASJ Vol. ( No. ) pp. 89 - 90 2012.3

　More details

researchmap
Non-extensive Statistics for Feature Normalization in Speech Recognition Reviewed

Hilman F. Pardede, Koichi Shinoda

Proc. International Workshop on Statistical Machine Learning for Speech Processing (IWSML) 2012 Vol. ( No. ) pp. 2012.3

　More details

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Speech Models Using Committee-Based Active Learning and Semi-Supervised Learning Reviewed

蔦岡拓也, 篠田浩一

日本音響学会2012年春季研究発表会講演論文集 Vol. ( No. ) pp. 55 - 56 2012.3

　More details

researchmap
Speaker Adaptation for Dialog Act Recognition Reviewed

Johan Rohdin, Koichi Shinoda

2012 Spring Meeting ASJ Vol. ( No. ) p. 111 2012.3

　More details

researchmap
Two-pass approach for recognizing code-switching speech Reviewed

Yonatan Andy, Fajar Nugraha, Koichi Shinoda, Sadaoki Furui

IEICE Technical Report Vol. ( No. SP2011-150 ) pp. 225 - 229 2012.2

　More details

researchmap
Large-Vocabulary Sign Language Recognition Using Sign Sub-Units Reviewed

佐藤新, 篠田浩一

電子情報通信学会技術研究報告 Vol. ( No. PRMU2011-222 ) pp. 155 - 160 2012.2

　More details

researchmap
Event Detection in Video Using GMM Supervectors and SVMs Reviewed

上嶋勇祐, 井上中順, 篠田浩一, 佐藤俊介

電子情報通信学会技術研究報告 Vol. ( No. PRMU2011-230 ) pp. 195 - 200 2012.2

　More details

researchmap
Robust Gait-Based Person Identification against Walking Speed Variations Reviewed

Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E95D ( 2 ) 668 - 676 2012.2

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1587/transinf.E95.D.668

Web of Science

researchmap
Subject adaptation and adaptive training for gait-based person identification Reviewed

Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui

IEICE Technical Report Vol. ( No. PRMU2011-199 ) pp. 77 - 82 2012.2

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

The performance of gait-based person identification using statistical methods such as hidden Markov models (HMMs) often degrades when the amount of training data for each subject is insufficient. In our previous study, we proposed a gait-based person identification method using HMMs in which we trained subject-dependent models from scratch using each subject's training data. In this paper, we propose a model adaptation scheme in which the data from the other subjects are effectively utilized in model training. We further improve the adaptation performance using subject adaptive training, which effectively excludes inter-subject variability from the subject-independent model. The proposed method improved identification performance even when the amount of data was extremely small.

CiNii Books

researchmap
Detection of Behavioral Events in Crowds from Fixed Surveillance Cameras Reviewed

和田俊也, 篠田浩一

電子情報通信学会技術研究報告 Vol. ( No. PRMU2011-173 ) pp. 257 - 262 2012.1

　More details

researchmap
A fast and accurate video semantic-indexing system using fast MAP adaptation and GMM supervectors Reviewed

Nakamasa Inoue, Koichi Shinoda

IEEE Transactions on Multimedia 14 ( 4 ) 1196 - 1205 2012

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TMM.2012.2191395

Scopus

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Acoustic model training using committee-based active and semi-supervised learning for speech recognition Reviewed

Takuya Tsutaoka, Koichi Shinoda

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Efficient Model Training for HMM-based Person Identification by Gait Reviewed

Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Speech technology plays a key role in video semantic indexing Reviewed

Koichi Shinoda

AVMA'12 - Proceedings of the 2012 ACM Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis, Co-located with ACM Multimedia 2012 1 - 2 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2390214.2390216

Scopus

researchmap
MULTIMEDIA EVENT DETECTION USING GMM SUPERVECTORS AND SVMS Reviewed

Yusuke Kamishima, Nakamasa Inoue, Koichi Shinoda, Shunsuke Sato

2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012) 3089 - 3092 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICIP.2012.6467553

Web of Science

researchmap
Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity Reviewed

Ryo Yokoyama, Yu Nasu, Koichi Shinoda, Koji Iwano

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 1498 - 1501 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap

Other Link： http://orcid.org/0000-0003-1095-3203
Q-Gaussian based spectral subtraction for robust speech recognition Reviewed

Hilman F. Pardede, Koichi Shinoda, Koji Iwano

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 1254 - 1257 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Speaker verification using MMAP adaptation Reviewed

Sangeeta Biswas, Johan Rohdin, Koichi Shinoda, Sadaoki Furui

IEICE Technical Report Vol. ( No. SP2011-93 ) pp. 133 - 137 2011.12

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

This paper proposes maximum a posteriori (MAP) adaptation of Gaussian mixture models (GMMs) using multiple priors for text-independent speaker verification. Although the hierarchical prior used in structural MAP (SMAP) adaptation has been shown to outperform the prior used in relevance MAP adaptation, the relevance prior may still contain complementary information that is useful. Here we introduce the idea of combining these two priors in the MAP framework. We call this method multiprior MAP (MMAP). We evaluated the proposed method on the NIST SRE 2006 10sec4w-10sec4w task. We compared MMAP with classical maximum likelihood (ML) estimation, relevance MAP, and SMAP adaptation, and demonstrated its effectiveness. We also investigated the effect of Z-Norm and T-Norm on the likelihood ratio scores of the different systems.

CiNii Books

researchmap
[Invited Talk] Toward High-Performance Semantic Indexing of Video Reviewed

井上中順, 篠田浩一

電子情報通信学会技術研究報告 Vol. ( No. PRMU2011-140 ) pp. 89 - 94 2011.12

　More details

researchmap
TokyoTech+Canon at TRECVID 2011 Reviewed

Nakamasa Inoue, Yusuke Kamishima, Toshiya Wada, Koichi Shinoda, Shunsuke Sato

Proc.TRECVID Workshop 2011 Vol. ( No. ) pp. 2011.12

　More details

researchmap
Speaker Adaptation Techniques for Automatic Speech Recognition Reviewed

Koichi Shinoda

Proc. APSIPA ASC 2011 Vol. ( No. ) pp. 2011.10

　More details

researchmap
TRECVID: A Video Retrieval Evaluation Workshop Reviewed

篠田浩一

キヤノン・イノベイティブ技術フォーラム映像認識技術ワークショップ Vol. ( No. ) pp. 2011.10

　More details

researchmap
Noise Robust Speech Recognition based on Spectral Reduction Measure Reviewed

Mayumi Beppu, Koichi Shinoda, Sadaoki Furui

Proc. APSIPA ASC 2011 Vol. ( No. PM.PS2 ) pp. 2011.10

　More details

researchmap
Committee-Based Active Learning for Speech Recognition Reviewed

Yuzo Hamanaka, Koichi Shinoda, Takuya Tsutaoka, Sadaoki Furui, Tadashi Emori, Takafumi Koshinaka

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E94D ( 10 ) 2015 - 2023 2011.10

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1587/transinf.E94.D.2015

Web of Science

researchmap
GMM尤度補正を用いた耐雑音音声認識 Reviewed

古井貞熙, 那須悠, 篠田浩一

日本音響学会2011年秋季研究発表会講演論文集 Vol. ( No. ) pp. 29 - 32 2011.9

　More details

researchmap
複数マイクロフォンを用いた音声区間検出 Reviewed

大西祥史, 篠田浩一, 越仲孝文

日本音響学会 2011年秋季研究発表会講演論文集 Vol. ( No. ) pp. 37 - 38 2011.9

　More details

researchmap
Speaker Adaptation for Dialogue Act Classification Reviewed

Johan Rohdin, Koichi Shinoda

IPSJ SIG Technical Report Vol. 2011-SLP-87 ( No. 8 ) pp. 2011.7

　More details

researchmap
Nonlinear Normalization Using q-Logarithm for Robust Speech Recognition Reviewed

Hilman, Koichi Shinoda, Koji Iwano

IEICE Technical Report Vol. 111 ( No. 153 ) pp. 45 - 50 2011.7

　More details

researchmap
木構造GMMを用いたセマンティックインデクシングの高速化 Reviewed

井上中順, 篠田浩一

電子情報通信学会技術研究報告 PRMU vol. 111 ( no. 77 ) pp. 105 - 110 2011.6

　More details

researchmap
Multimodal Interface for Error Correction in Speech Recognition Reviewed

Koichi Shinoda

Microsoft Research Asia IJARC CORE6 Project Summary Booklet Vol. ( No. ) pp. 2011.6

　More details

researchmap
音響モデル学習のための相対エントロピーを用いた学習文選択手法 Reviewed

村上博子, 篠田浩一, 古井貞熙

日本音響学会2011年春季講演論文集 Vol. ( No. 1-5-7 ) pp. 17 - 20 2011.3

　More details

researchmap
Semi-synchronous speech and pen input for mobile user interfaces Reviewed

Koichi Shinoda, Yasushi Watanabe, Kenji Iwata, Yuan Liang, Ryuta Nakagawa, Sadaoki Furui

SPEECH COMMUNICATION 53 ( 3 ) 283 - 291 2011.3

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.specom.2010.10.001

Web of Science

researchmap
Voting Approach in SMAP Adaptation for Speaker Verification Reviewed

Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui

Vol. ( No. 2-5-2 ) pp. 45 - 48 2011.3

　More details

researchmap
雑音下音声におけるスペクトル縮小の分析とその耐雑音音声認識への利用 Reviewed

別府真由美, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 SP-2010-122 ( No. ) pp. 43 - 48 2011.3

　More details

researchmap
音響モデル学習のための相対エントロピーを用いた学習文選択 Reviewed

村上博子, 篠田浩一, 古井貞熙

情報処理学会研究報告 Vol. 2011-SLP-85 ( No. 3 ) pp. 1 - 6 2011.2

　More details

researchmap
マルチモーダル・マルチフレームな手法を用いたTRECVIDセマンティックインデクシング Reviewed

井上中順, 上嶋勇祐, 篠田浩一

電子情報通信学会技術研究報告 PRMU vol. 110 ( no. 414 ) pp. 25 - 30 2011.2

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

We propose a multi-modal, multi-frame approach for semantic indexing in the TRECVID 2010 workshop. The goal of the semantic indexing is to develop a method for indexing many concepts that will be useful for video search. In this study, we implement a simple and accurate system by using Gaussian Mixture Model (GMM) supervectors with SIFT and MFCC features. The SIFT features are extracted not only from key-frames but also from many image frames in a shot in order to get the most out of multi-frame information. Our best result on the TRECVID 2010 dataset was 7.36% in terms of Mean Inferred Average Precision.

CiNii Books

researchmap
映像解析・検索評価ワークショップTRECVID2010の概要 Reviewed

篠田浩一, 佐藤真一

電子情報通信学会技術研究報告 PRMU vol. 110 ( no. 414 ) pp. 19 - 24 2011.2

　More details

researchmap
Multimedia Event Detection using GS-SVMs and Audio-HMMs

Nakamasa Inoue, Yusuke Kamishima, Koichi Shinoda, Shunsuke Sato

2011 TREC Video Retrieval Evaluation Notebook Papers 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology

Scopus

researchmap
Designing text corpus using phone-error distribution for acoustic modeling Reviewed

Hiroko Murakami, Koichi Shinoda, Sadaoki Furui

2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings Vol. ( No. ) 191 - 195 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ASRU.2011.6163929

Scopus

researchmap
A fast MAP adaptation technique for GMM-supervector-based video semantic indexing systems Reviewed

Nakamasa Inoue, Koichi Shinoda

MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops Vol. ( No. ) 1357 - 1360 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2072298.2072014

Scopus

researchmap
Person authentication using 3D human motion Reviewed

Felipe Gomez-Caballero, Takahiro Shinozaki, Sadaoki Furui, Koichi Shinoda

MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - JHGBU 2011 Workshop, J-HGBU'11 Vol. ( No. ) 35 - 40 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2072572.2072586

Scopus

researchmap
Acoustic Forest for SMAP-based Speaker Verification Reviewed

Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 Vol. ( No. ) 2388 - + 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Generalized-Log Spectral Mean Normalization for Speech Recognition Reviewed

Hilman F. Pardede, Koichi Shinoda

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 Vol. ( No. ) 1656 - 1659 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Structural Joint Factor Analysis for Speaker Recognition Reviewed

Marc Ferras, Koichi Shinoda, Sadaoki Furui

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 Vol. ( No. ) 2384 - + 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Robust speech recognition in the car environment Reviewed

Agnieszka Betkowska Cavalcante, Koichi Shinoda, Sadaoki Furui

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6562 24 - 34 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-642-20095-3_3

Scopus

researchmap
CROSS-CHANNEL SPECTRAL SUBTRACTION FOR MEETING SPEECH RECOGNITION Reviewed

Yu Nasu, Koichi Shinoda, Sadaoki Furui

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING Vol. ( No. ) 4812 - 4815 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Structural MAP Adaptation in GMM-Supervector based Speaker Recognition Reviewed

Marc Ferras, Koichi Shinoda, Sadaoki Furui

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING Vol. ( No. ) 5432 - 5435 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Semantic indexing using GMM supervectors and tree-structured GMMs

Nakamasa Inoue, Koichi Shinoda

2011 TREC Video Retrieval Evaluation Notebook Papers 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology

Scopus

researchmap
大規模映像資源のためのマルチモーダル高次特徴検出 Reviewed

井上中順, 斉藤辰彦, 篠田浩一, 古井貞熙

電子情報通信学会論文誌 Vol. J93-D ( No. 12 ) pp. 2633 - 2644 2010.12

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
カテゴリ推定に基づく動的な言語モデル適応 Reviewed

山本仁, 花沢健, 三木清一, 篠田浩一

Vol. 2010-SLP-84 ( No. 1 ) pp. 2010.12

　More details

researchmap
Inter-speaker weighted MAP adaptation for GMM-supervector speaker recognition Reviewed

Marc Ferras, Koichi Shinoda, Sadaoki Furui

IPSJ Technical Report Vol. 2010-SLP-84 ( No. 12 ) pp. 1 - 4 2010.12

　More details

researchmap
Optimal use of trees in structural MAP adaptation for speaker verification Reviewed

Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui

IPSJ Technical Report Vol. 2010-SLP-84 ( No. 26 ) pp. 1 - 5 2010.12

　More details

researchmap
TT+GT at TRECVID 2010 Workshop Reviewed

Nakamasa Inoue, Toshiya Wada, Yusuke Kamishima, Koichi Shinoda, Ilseo Kim, Byungki Byun, Chin-Hui Lee

Proc. TRECVID Workshop 2010 Vol. ( No. ) pp. 2010.11

　More details

researchmap
Gait-based Person Identification Robust against Speed Variation using CHLAC features and HMMs Reviewed

Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui

IEICE technical report Vol. PRMU2010-92 ( SP2010-48, WIT2010-36 ) pp. 23 - 28 2010.10

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

The performance of gait-based person identification is strongly affected by variations in walking speed. In our previous study, we proposed a framework robust against speed variation across gait sequences, which combines Fisher discriminant analysis (FDA)-based cubic higher-order local auto-correlation (CHLAC) features with the statistical framework provided by hidden Markov models (HMMs). The CHLAC features capture the within-phase spatio-temporal characteristics of each walker, while the HMMs identify the person and the phase of each gait even when the walking speed changes nonlinearly. However, because CHLAC features carry little shape information about a gait phase, it is difficult to identify and segment the walking phase accurately. In this paper, we therefore train the HMMs not only with CHLAC features but also with principal component analysis (PCA) features, which carry more shape information about a gait phase, to improve gait cycle segmentation and alignment. We also evaluate our method when the walking speed varies within a gait sequence, using mixed-speed data created manually from the TokyoTech database. We compared our method with conventional methods on three other public databases. The proposed method outperformed the others when the speed varied both across and within gait sequences.

CiNii Books

researchmap
会議音声認識のためのスペクトル減算に基づく音源分離 Reviewed

那須悠, 篠田浩一, 古井貞熙

日本音響学会2010年秋季講演論文集 Vol. ( No. 3-10-13 ) pp. 627 - 630 2010.9

　More details

researchmap
Acoustic Model Adaptation for Speech Recognition Reviewed

Koichi Shinoda

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D ( 9 ) 2348 - 2362 2010.9

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1587/transinf.E93.D.2348

Web of Science

researchmap
SIFT混合ガウス分布を用いた一般物体認識のためのマルチカーネル学習 Reviewed

井上中順, 上嶋勇祐, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. PRMU2010-58 ( No. ) pp. 7 - 12 2010.9

　More details

researchmap
フランス語における発声スタイルの違いがスペクトル特徴に与える影響の分析 Reviewed

別府真由美, Jean-Luc Rouas, Martine Adda-Decker, 篠田浩一, 古井貞熙

日本音響学会2010年秋季講演論文集 Vol. ( No. 3-1-2 ) pp. 257 - 260 2010.9

　More details

researchmap
ToFカメラによる3D手話認識 Reviewed

佐藤新, 篠田浩一, 古井貞熙

画像の認識・理解シンポジウム（MIRU2010) IS3-44 ( No. ) pp. 1861 - 1868 2010.7

　More details

researchmap
シンボル列化したシーンの学習と2種のプレイ種相関度による野球放送映像プレイ種識別 Reviewed

望月貴裕, 藤井真人, 篠田浩一, 酒井善則

電子情報通信学会論文誌 Vol. J93-D ( No. 6 ) pp. 1009 - 1023 2010.6

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
NIST SRE 2010: Tokyo Tech Speaker Recognition Reviewed

Marc Ferras, Sangeeta Biswas, Koichi Shinoda, Sadaoki Furui

Proc. NIST 2010 Speaker Recognition Evaluation Workshop Vol. ( No. ) pp. 2010.6

　More details

researchmap
局所的な特徴と大局的な特徴を用いた監視カメラ映像からの行動イベント検出 Reviewed

吉澤悠介, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. 2010-43, PRMU2010-31, MI2010-31 ( No. ) pp. 163 - 168 2010.5

　More details

researchmap
会議音声認識のためのスペクトル減算に基づくオンライン音源分離 Reviewed

那須悠, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. EA2010-2, SIP2010-2, SP2010-2 pp. 7 - 12 2010.5

　More details

researchmap
音響特徴を用いた映像からのイベント検出の研究 Reviewed

斉藤辰彦, 井上中順, 篠田浩一, 古井貞熙

日本音響学会2010年春季講演論文集 Vol. ( No. ) pp. 201 - 202 2010.3

　More details

researchmap
Semantic indexing using GMM supervectors with MFCCs and SIFT features

Nakamasa Inoue, Toshiya Wada, Yusuke Kamishima, Koichi Shinoda, Ilseo Kim, Byungki Byun, Chin-Hui Lee

2010 TREC Video Retrieval Evaluation Notebook Papers 2010

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：National Institute of Standards and Technology

Scopus

researchmap
SPEECH MODELING BASED ON COMMITTEE-BASED ACTIVE LEARNING Reviewed

Yuzo Hamanaka, Koichi Shinoda, Sadaoki Furui, Tadashi Emori, Takafumi Koshinaka

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING Vol. ( No. ) 4350 - 4353 2010

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Robust gait recognition against speed variation Reviewed

Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui

Proceedings - International Conference on Pattern Recognition Vol. ( No. ) 2190 - 2193 2010

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPR.2010.536

Scopus

researchmap
Dynamic Language Model Adaptation Using Keyword Category Classification Reviewed

Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 Vol. ( No. ) 2426 - + 2010

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
High-level feature extraction using SIFT GMMs and audio models Reviewed

Nakamasa Inoue, Tatsuhiko Saito, Koichi Shinoda, Sadaoki Furui

Proceedings - International Conference on Pattern Recognition Vol. ( No. ) 3220 - 3223 2010

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPR.2010.787

Scopus

researchmap
音声認識のための複数の認識器を利用した能動学習 Reviewed

濱中悠三, 江森正, 越仲孝文, 篠田浩一, 古井貞熙

情報処理学会研究報告 Vol. 2009-SLP-79 ( No. 4 ) pp. 1 - 8 2009.12

　More details

researchmap
TITGT at TRECVID 2009 Workshop Reviewed

Nakamasa Inoue, Shanshan Hao, Tatsuhiko Saito, Koichi Shinoda, Ilseo Kim, Chin-Hui Lee

Proc. TRECVID Workshop (TRECVID 2009) Vol. ( No. ) pp. 2009.11

　More details

researchmap
Robust Speech Recognition In The Car Environment Reviewed

Agnieszka Betkowska Cavalcante, Koichi Shinoda, Sadaoki Furui

the 4th Language and Technology Conference (LTC'09) Vol. ( No. ) pp. 39 - 43 2009.11

　More details

researchmap
SIFT混合ガウス分布と音響特徴を用いた映像からの高次特徴検出 Reviewed

井上中順, 斉藤辰彦, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. PRMU2009-106 ( No. ) pp. 97 - 102 2009.11

　More details

researchmap
Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transform Reviewed

Hideki Yasui, Koichi Shinoda, Sadaoki Furui, Koji Iwano

Proc. Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference (APSIPA-ASC '09) Vol. ( No. ) pp. 631 - 634 2009.10

　More details

researchmap
音声認識のためのコミッティを用いた能動学習 Reviewed

濱中悠三, 江森正, 越仲孝文, 篠田浩一, 古井貞熙

日本音響学会2009年秋季講演論文集 Vol. ( No. 1-1-5 ) pp. 15 - 18 2009.9

　More details

researchmap
Multimedia Information Retrieval Using Statistical Approach Reviewed

Koichi Shinoda

Microsoft Research Asia 2009 Annual Workshop of IJARC pp. 13 2009.7

　More details

researchmap
能動的な適応文選択に基づく話者適応化 Reviewed

村上博子, 篠田浩一, 古井貞熙

日本音響学会2009年春季講演論文集 Vol. ( No. ) pp. 191 - 194 2009.3

　More details

researchmap
ハフ変換による基本周波数情報を用いた耐雑音音声認識の高性能化の検討 Reviewed

安井英己, 篠田浩一, 古井貞熙, 岩野公司

日本音響学会2009年春季講演論文集 Vol. ( No. ) pp. 35 - 38 2009.3

　More details

researchmap
統計的モデル選択によるシーン数の自動推定を用いた動画要約 Reviewed

山崎航史, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. PRMU2008-231 ( No. ) pp. 139 - 144 2009.2

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

This paper describes a video summarization system that uses model selection techniques to estimate the optimal number of scenes for a summary. We model a set of scenes with a Gaussian mixture model, in which each mixture component is assumed to represent one scene. We use the minimum description length (MDL) criterion and the variational Bayesian method as model selection criteria. We evaluate the proposed system on the TRECVID 2007 database, which consists of unedited materials provided by the BBC. The recall was 0.74 with the MDL criterion and 0.71 with the variational Bayesian method. The variational Bayesian method estimated the number of scenes more accurately than the MDL criterion.

CiNii Books

researchmap
Gait Recognition Using CHLAC Features and Hidden Markov Model Reviewed

Muhammad Rasyid, Koichi Shinoda, Sadaoki Furui

IEICE Technical Report Vol. PRMU2008-224 ( No. ) pp. 99 - 103 2009.2

　More details

researchmap
耐雑音音声認識のためのハフ変換による基本周波数情報抽出の高速化 Reviewed

安井英己, 篠田浩一, 古井貞熙, 岩野公司

電子情報通信学会技術研究報告 Vol. SP2008-129 ( No. 2009-1 ) pp. 19 - 24 2009.1

　More details

researchmap
フレッシュアイズ映像研究現場紹介東京工業大学篠田研究室

井上中順, 篠田浩一

映像情報メディア学会誌 Vol. 63 ( No. 8 ) 1116 - 1119 2009

　More details

Language：Japanese

researchmap
INDEPENDENT COMPONENT ANALYSIS FOR NOISY SPEECH RECOGNITION Reviewed

Hsin-Lung Hsieh, Jen-Tzung Chien, Koichi Shinoda, Sadaoki Furui

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS Vol. ( No. ) 4369 - + 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
ONLINE SPEAKER CLUSTERING USING INCREMENTAL LEARNING OF AN ERGODIC HIDDEN MARKOV MODEL Reviewed

Takafumi Koshinaka, Kentaro Nagatomo, Koichi Shinoda

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS Vol. ( No. ) 4093 - + 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Speaker Adaptation Based on Two-Step Active Learning Reviewed

Koichi Shinoda, Hiroko Murakami, Sadaoki Furui

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 Vol. ( No. ) 580 - 583 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Tokyo Tech at TRECVID 2008 Reviewed

Shanshan Hao, Yusuke Yoshizawa, Koji Yamasaki, Koichi Shinoda, Sadaoki Furui

Proc. TRECVID Workshop (TRECVID 2008) Vol. ( No. ) pp. 2008.11

　More details

researchmap
スペクトルサブトラクションとハフ変換による基本周波数情報を用いた耐雑音音声認識 Reviewed

安井英己, 岩野公司, 篠田浩一, 古井貞熙

日本音響学会2008年秋季講演論文集 Vol. ( No. 1-1-2 ) pp. 3 - 6 2008.9

　More details

researchmap
パターン認識と機械学習（下）ベイズ理論による統計的予測 Reviewed

元田浩, 栗田多喜夫, 樋口知之, 松本裕治, 村田昇, 赤穂昭太郎, 神嶌敏弘, 杉山将, 小野田崇, 池田和司, 鹿島久嗣, 賀沢秀人, 中島伸一, 竹内純一, 持橋大地, 小山聡, 井手剛, 篠田浩一, 山川宏

パターン認識と機械学習・ベイズ理論による統計的予測 Vol. 2008.7

　More details

Publisher：丸善出版

researchmap
音声とペンの同時入力における個人差への適応化 Reviewed

渡邉康司, 篠田浩一, 古井貞熙

日本音響学会2008年春季講演論文集 Vol. ( No. 2-4-11 ) pp. 55 - 58 2008.3

　More details

researchmap
Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment Reviewed

Kousuke Hiraki, Takahiro Shinozaki, Koji Iwano, Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Vol. SP2007-202 ( No. ) pp. 93 - 98 2008.3

　More details

researchmap
パラメータ空間のクラスタ化による固有声話者適応化の改良 Reviewed

丹治秀太朗, 篠田浩一, 古井貞熙, オルテガアントニオ

日本音響学会2008年春季講演論文集 Vol. ( No. 2-10-11 ) pp. 91 - 94 2008.3

　More details

researchmap
木構造クラスタリングを用いた動画像からの高次特徴抽出 Reviewed

中村太一, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 PRMU2007-220 ( No. 491 ) pp. 37 - 42 2008.3

　More details

researchmap
連続音素認識を用いた単語認識誤りに頑健な講演音声検索 Reviewed

岩田憲治, 篠田浩一, 古井貞熙

日本音響学会2008年春季講演論文集 Vol. ( No. 2-10-20 ) pp. 113 - 116 2008.3

　More details

researchmap
Robust Spoken Term Detection Using Combination of Phone-Based and Word-Based Recognition Reviewed

Kenji Iwata, Koichi Shinoda, Sadaoki Furui

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 Vol. ( No. ) 2195 - 2198 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Time-lag Adaptation for Semi-synchronous Speech and Pen Input Reviewed

Yasushi Watanabe, Koichi Shinoda, Sadaoki Furui

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 Vol. ( No. ) 2675 - 2678 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Automatic score scene detection for baseball video Reviewed

Koichi Shinoda, Kazuki Ishihara, Sadaoki Furui, Takahiro Mochizuki

LARGE-SCALE KNOWLEDGE RESOURCES 4938 ( No. ) 226 - + 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Automatically estimating number of scenes for rushes summarization Reviewed

Koji Yamasaki, Koichi Shinoda, Sadaoki Furui

MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops Vol. ( No. ) 129 - 133 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/1463563.1463587

Scopus

researchmap
Improvement of Eigenvoice-Based Speaker Adaptation by Parameter Space Clustering Reviewed

Shutaro Tanji, Koichi Shinoda, Sadaoki Furui, Antonio Ortega

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 Vol. ( No. ) 1229 - + 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
An Interface Using Semi-synchronous Speech and Pen Input Reviewed

Koichi Shinoda

Proc. IJARC(Microsoft)-Tokyo Institute of Technology Joint Symposium on "The forefront of the Speech Recognition Research" Vol. ( No. ) pp. 2007.12

　More details

researchmap
音声認識における確率モデルの重み係数の自動推定 Reviewed

江森正, 大西祥史, 篠田浩一

電子情報通信学会技術研究報告 PRMU2007-104 ( No. ) pp. 49 - 54 2007.12

　More details

researchmap
パターン認識と機械学習（上）：ベイズ理論による統計的予測 Reviewed

元田浩, 栗田多喜夫, 樋口知之, 松本裕治, 村田昇, 赤穂昭太郎, 神嶌敏弘, 杉山将, 小野田崇, 池田和司, 鹿島久嗣, 賀沢秀人, 中島伸一, 竹内純一, 持橋大地, 小山聡, 井手剛, 山川宏, 篠田浩一

パターン認識と機械学習・ベイズ理論による統計的予測 Vol. 2007.12

　More details

Publisher：シュプリンガー・ジャパン

researchmap
十分統計量を用いた教師なし話者適応における話者選択法 Reviewed

谷真宏, 江森正, 大西祥史, 越仲孝文, 篠田浩一

電子情報通信学会技術研究報告 PRMU2007-110 ( No. ) pp. 85 - 90 2007.12

　More details

researchmap
数値列化したイベントシーンの学習と試合進行状況情報による制約条件を用いた野球映像イベント識別 Reviewed

望月貴裕, 藤井真人, 八木伸行, 篠田浩一

電子情報通信学会技術研究報告 PRMU2007-149 ( No. ) pp. 77 - 82 2007.12

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

This paper proposes a framework for applying to full-game videos a conventional method that classifies baseball scenes into event classes (home run, single hit, walk, etc.) by learning symbol-sequenced scenes produced by a video patternization process and that has so far targeted only digest videos. It also proposes a method for correcting the classification results using information about the game status in the video intervals between events. "Event candidate scenes" (ECSs), which are the input to the classification process, are segmented based on the positions at which pitch-view shots, replay scenes, inning changes, and similar cues appear. Each ECS is classified into one event class by the event classification method based on learning patternized, symbol-sequenced scenes. ECSs that contain no event (non-event scenes) are identified from the relation between the classification results for the ECSs and their durations. We extract information about the game status from the non-event scenes and from the intervals removed during ECS segmentation, using several techniques including image-analysis-based detection of the base regions, and the event classification results can then be corrected under constraints derived from the game status. The proposed method is evaluated in an experiment using videos of several Major League Baseball games.

CiNii Books

researchmap
TokyoTech's TRECVID2007 Notebook Reviewed

Taichi Nakamura, Koichi Shinoda, Sadaoki Furui

Proc. TRECVID 2007 Workshop Vol. ( No. ) pp. 2007.11

　More details

researchmap
Home-Environment Adaptation of Phoneme Factorial Hidden Markov Models Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Proc. EUSIPCO 2007 Vol. ( No. ) pp. 2380 - 2384 2007.9

　More details

researchmap
投球の次ショットに重きを置いたシーンのパターン化と離散隠れマルコフモデルを用いた野球放送映像の自動イベント分類 Reviewed

望月貴裕, 藤井真人, 八木伸行, 篠田浩一

映像情報メディア学会誌 Vol. 61 ( No. 8 ) pp. 1139 - 1149 2007.8

　More details

Language：Japanese Publisher：The Institute of Image Information and Television Engineers

A method has been developed for automatically classifying baseball video scenes into events that describe their content. The baseball scenes are patternized using a set of rectangles with image features and motion vectors. The basic unit for patternization is a shot. For the second shot of each scene, which includes significant information for event classification, a partial shot generated by dividing the shot is used as the processing unit. The scenes used for training are expressed as symbol sequences based on the patternized data for shots and partial shots. "Event-unknown" baseball scenes are assigned "event indexes" (i.e., home run, single, walk, etc.) using discrete hidden Markov models that have been trained with the training symbol sequences for each kind of event. An experiment using videos of seven Major League Baseball games produced good results, demonstrating that this method can automatically classify events with high accuracy.

DOI： 10.3169/itej.61.1139

CiNii Books

researchmap
ハイブリッドモデルに基づく単視点ビデオデータにおける人間の歩行動作のトラッキング Reviewed

閔庚甫, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 WIT2007-24 ( No. ) pp. 47 - 52 2007.8

　More details

Language：Japanese

CiNii Books

researchmap
時系列データに対するデータ駆動型アプローチに基づく野球放送の頑健なシーン認識 Reviewed

安藤亮一, 篠田浩一, 古井貞熙, 望月貴裕

画像の認識・理解シンポジウム（MIRU 2007）IS-1-17 Vol. ( No. ) pp. 570 - 575 2007.7

　More details

researchmap
多段SVMを用いた頑健な動画ショット境界検出 Reviewed

宮村祐一, 中村太一, 篠田浩一, 古井貞熙

画像の認識・理解シンポジウム（MIRU 2007）IS-2-19 Vol. ( No. ) pp. 815 - 820 2007.7

　More details

researchmap
野球放送のためのデータ駆動型アプローチを用いた得点シーン検出 Reviewed

石原一樹, 安藤亮一, 篠田浩一, 古井貞熙, 望月貴裕

第13回画像センシングシンポジウム予稿集 Vol. ( No. ) pp. 513 - 518 2007.6

　More details

researchmap
Robust Scene Recognition Using Scene Context Information for Video Contents Reviewed

Koichi Shinoda, Ryoichi Ando, Sadaoki Furui, Takahiro Mochizuki

Proc. International Symposium on Large-Scale Knowledge Resources(LKR2007) Vol. ( No. ) pp. 107 - 112 2007.3

　More details

researchmap
Comparative Study on Robust Speech Recognition against Nonstationary Noise in the Home Environment Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Proc. Symposium on Large-Scale Knowledge Resources(LKR2007) Vol. ( No. ) pp. 175 - 178 2007.3

　More details

researchmap
Presentation Scene Retrieval Exploiting Features in Videos Including Pointing and Speech Information Reviewed

Takashi Kobayashi, Wataru Nakano, Haruo Yokota, Koichi Shinoda, Sadaoki Furui

Proc. Symposium on Large-Scale Knowledge Resources(LKR2007). Vol. ( No. ) pp. 95 - 100 2007.3

　More details

researchmap
スライド資料を用いた講義音声認識のための言語モデル適応 Reviewed

山崎裕紀, 岩野公司, 篠田浩一, 古井貞熙, 横田治夫

日本音響学会2007年春季講演論文集 Vol. ( No. 3-9-8 ) pp. 79 - 80 2007.3

　More details

researchmap
Speech recognition using FHMMS robust against nonstationary noise Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 ( No. ) 1029 - + 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Automatic Estimation of Scaling Factors Among Probabilistic Models in Speech Recognition Reviewed

Tadashi Emori, Yoshifumi Onishi, Koichi Shinoda

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 Vol. ( No. ) 1229 - + 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition Reviewed

Hiroki Yamazaki, Koji Iwano, Koichi Shinoda, Sadaoki Furui, Haruo Yokota

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 Vol. ( No. ) 89 - 92 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Predictive Minimum Bayes Risk Classification for Robust Speech Recognition Reviewed

Jen-Tzung Chien, Koichi Shinoda, Sadaoki Furui

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 Vol. ( No. ) 437 - + 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
A robust scene recognition system for baseball broadcast using data-driven approach Reviewed

Ryoichi Ando, Koichi Shinoda, Sadaoki Furui, Takahiro Mochizuki

Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR 2007 Vol. ( No. ) 186 - 193 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/1282280.1282312

Scopus

researchmap
Semi-synchronous speech and pen input Reviewed

Yasushi Watanabe, Kenji Iwata, Ryuta Nakagawa, Koichi Shinoda, Sadaoki Furui

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 Vol. ( No. ) 409 - + 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
講義音声認識における講義スライド情報の利用 Reviewed

山崎裕紀, 岩野公司, 篠田浩一, 古井貞熙, 横田治夫

電子情報通信学会技術報告 Vol. SP2006-122 ( No. ) pp. 43 - 48 2006.12

　More details

researchmap
Tokyo Tech's TRECVID2006 Notebook Reviewed

Taichi Nakamura, Yuichi Miyamura, Koichi Shinoda, Sadaoki Furui

Proc. TRECVID Workshops Vol. ( No. ) pp. 2006.11

　More details

researchmap
Multimedia Information Retrieval Using Pattern Recognition Techniques Reviewed

Koichi Shinoda

Proc. Microsoft Research Asia IJARC 2nd Symposium Vol. ( No. ) pp. 2006.11

　More details

researchmap
音声とペンの準同期入力に対するマルチモーダル認識 Reviewed

岩田憲治, 渡邉康司, 中川竜太, 篠田浩一, 古井貞熙

日本音響学会 2006年秋季講演論文集 Vol. ( No. ) pp. 45 - 46 2006.9

　More details

researchmap
音声とペン入力の同時入力に対する認識方式の検討 Reviewed

渡邉康司, 岩田憲治, 中川竜太, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. SP2006-19 ( No. ) pp. 49 - 54 2006.6

　More details

researchmap
動画像インデクシングのためのシーン時系列の確率的言語モデル Reviewed

安藤亮一, 篠田浩一, 古井貞熙, 望月貴裕

第12回画像センシングシンポジウム予稿集 Vol. ( No. ) pp. 513 - 518 2006.6

　More details

researchmap
ビデオ画像における人間の歩行動作の3次元トラッキング Reviewed

閔庚甫, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. PRMU2006-2 ( No. ) pp. 7 - 12 2006.5

　More details

researchmap
野球中継番組を対象とした音響情報を用いたシーン認識 Reviewed

宮崎太郎, 中川弘充, 中川竜太, 岩野公司, 篠田浩一, 古井貞熙

日本音響学会2006年春季講演論文集 Vol. ( No. ) pp. 19 - 20 2006.3

　More details

researchmap
FHMM for Robust Speech Recognition in Home Environment Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Proc. International Symposium on Large-Scale Knowledge Resources (LKR) Vol. ( No. ) pp. 129 - 132 2006.3

　More details

researchmap
Family Adaptation of Factorial HMMs for Personal Robots Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Vol. ( No. ) pp. 135 - 136 2006.3

　More details

researchmap
Robust Scene Recognition for Baseball Broadcast Reviewed

Koichi Shinoda, Sadaoki Furui

Proc. International Symposium on Large-Scale Knowledge Resources (LKR) Vol. ( No. ) pp. 91 - 94 2006.3

　More details

researchmap
基本周波数情報のグラフィカルモデリングによる音声認識 Reviewed

小林隆二, 篠田浩一, 古井貞熙

2006年日本音響学会春季講演論文集 Vol. ( No. ) pp. 39 - 40 2006.3

　More details

researchmap
基本周波数情報を用いたダイナミックベイジアンネットワークによる音声認識 Reviewed

小林隆二, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. SP2005-197 ( No. ) pp. 77 - 82 2006.3

　More details

researchmap
Towards optimal Bayes decision for speech recognition Reviewed

Jen-Tzung Chien, Chih-Hsien Huang, Koichi Shinoda, Sadaoki Furui

2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13 Vol. ( No. ) 45 - 48 2006

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Robust scene recognition using language models for scene contexts Reviewed

Ryoichi Ando, Koichi Shinoda, Sadaoki Furui, Takahiro Mochizuki

Proceedings of the ACM International Multimedia Conference and Exhibition Vol. ( No. ) 99 - 106 2006

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/1178677.1178693

Scopus

researchmap
音声と手書き文字の同時入力によるインターフェースの検討 Reviewed

中川竜太, 小林唯, 小林隆二, 篠田浩一, 古井貞熙

日本音響学会2005年秋季講演論文集 Vol. ( No. 1-7-11 ) pp. 13 - 14 2005.9

　More details

researchmap
Recognition of speech in non-stationary noise using Factorial HMMs Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Vol. ( No. 3-7-25 ) pp. 151 - 152 2005.9

　More details

researchmap
Speech Recognition System in NEC Reviewed

Takao Watanabe, Kaichiro Hatazaki, Ken-ichi Iso, Ryosuke Isotani, Koichi Shinoda, Keizaburo Takagi

Spoken Language Systems Vol. 2005.9

　More details

Language：English Publisher：Spoken Language Systems

researchmap
隠れマルコフモデルを用いた野球放送の自動的インデクシング Reviewed

Nguyen Huu Bach, 篠田浩一, 古井貞煕

画像の認識・理解シンポジウム(MIRU2005) Vol. ( No. ) pp. 1113 - 1120 2005.7

　More details

researchmap
隠れマルコフモデルとMLLRによるゲーム適応を用いた野球放送の自動インデクシング Reviewed

Nguyen Huu Bach, 篠田浩一, 古井貞煕

第11回画像センシングシンポジウム講演論文集 Vol. ( No. ) pp. 7 - 10 2005.6

　More details

researchmap
音声と手書き文字の同時入力インターフェース Reviewed

中川竜太, 小林唯, 小林隆二, 篠田浩一, 古井貞熙

情報処理学会研究報告 Vol. 2005 ( No. SLP-56 ) pp. 29 - 34 2005.5

　More details

researchmap
弁別素性のグラフィカルモデリングによる音声認識 Reviewed

小林隆二, 篠田浩一, 古井貞熙

日本音響学会2005年春季講演論文集 Vol. ( No. 1-5-21 ) pp. 41 - 42 2005.3

　More details

researchmap
Model optimization for noise discrimination in home environment Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Proc. Symposium on Large-Scale Knowledge Resources (LKR2005) Vol. ( No. ) pp. 167 - 170 2005.3

　More details

researchmap
Noise discrimination using models with different structures Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Vol. ( No. 2-Q-7 ) pp. 111 - 112 2005.3

　More details

researchmap
Scene recognition using Hidden Markov Models for video database Reviewed

Koichi Shinoda, Nguyen Huu Bach, Sadaoki Furui, Naoki Kawai

Proc. Symposium on Large-Scale Knowledge Resources(LKR2005) Vol. ( No. ) pp. 107 - 110 2005.3

　More details

researchmap
Speech Recognition System in NEC Reviewed

Koichi Shinoda

Spoken Language Systems 2005

　More details

Language：English Publisher：Spoken Language Systems

researchmap
Robust highlight extraction using multi-stream Hidden Markov Models for baseball video Reviewed

NH Bach, K Shinoda, S Furui

2005 International Conference on Image Processing (ICIP), Vols 1-5 Vol. ( No. ) 2857 - 2860 2005

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Robust Acoustic Modeling for Speech Recognition Reviewed

Koichi Shinoda

Proc. International Workshop "Beyond HMM" Vol. SP2004-82 ( No. ) pp. 7 - 12 2004.12

　More details

researchmap
隠れマルコフモデルを用いた野球放送の自動的インデクシング Reviewed

Nguyen Huu Bach, 篠田浩一, 古井貞熙

電子情報通信学会技術研究報告 Vol. PRMU2004 ( No. 107 ) pp. 13 - 19 2004.11

　More details

researchmap
隠れマルコフモデルを用いた野球放送の自動インデキシング Reviewed

Nguyen Huu Bach, 篠田浩一, 古井貞煕

11 - 12 2004.9

　More details

researchmap
A study of noise discrimination for personal robots Reviewed

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

ASJ autumn meeting 2004 Vol. ( No. 1-1-6 ) pp. 11 - 12 2004.9

　More details

researchmap
確率モデルによる音声認識のための話者適応化技術（サーベイ論文） Reviewed

篠田浩一

電子情報通信学会論文誌 vol. J87-D ( no. 2 ) pp. 371 - 386 2004.4

　More details

researchmap
確率モデルによる多声音楽演奏のMIDI信号のリズム認識 Reviewed

武田晴登, 篠田浩一, 嵯峨山茂樹

情報処理学会論文誌 Vol. 45 ( No. 3 ) pp. 670 - 679 2004.3

　More details

Language：Japanese Publishing type：Research paper (scientific journal)

researchmap
動的特徴量を用いたHMMによる連続動作認識 Reviewed

Nguyen Huu Bach, 篠田浩一, 古井貞熙

電子情報通信学会 2004年総合大会 Vol. ( No. D-12-120 ) pp. 286 - 286 2004.3

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
手書き文字の準同期入力を併用した音声認識手法の予備検討 Reviewed

市屋剛, 中川竜太, 篠田浩一, 古井貞熙

電子情報通信学会 2004年総合大会 Vol. ( No. D-14-007 ) pp. 148 - 148 2004.3

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
パーソナルロボット向けの家庭内雑音に頑健な音声認識の検討 Reviewed

藤崎宣彦, 篠田浩一, 岩野公司, 古井貞熙

日本音響学会2003年秋季研究発表会講演論文集 Vol. 1 ( No. 1-6-11 ) pp. 21 - 22 2003.9

　More details

researchmap
確率モデルによる多声楽曲MIDI演奏からの楽譜推定 Reviewed

篠田浩一

情報処理学会研究報告 Vol. 2003-MUS-50 ( No. ) pp. 27 - 32 2003.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper proposes an automatic transcription method for polyphonic musical performances in MIDI signals. Pitches and rhythms are the basic information needed to write scores. Because pitches are already known in MIDI signals of human performances, we only need to recognize rhythms from the time information. We propose a rhythm recognition method that specifically targets polyphonic music performances. In the proposed probabilistic models for rhythm recognition, we use rhythm vectors, which are obtained from the IOI (Inter-Onset Interval) sequence across multiple voices, as features. The parameter values of our model can be optimized by learning from scores and human performances. In experiments on performances by 5 piano players on an electronic piano, we obtained score restoration rates of 92.2% for "Fuga" and 52.1% for "Traumerai".

CiNii Books

researchmap
モデル適応と音響尤度補正を併用した雑音に頑健な音声認識 Reviewed

山本仁, 西本卓也, 篠田浩一, 嵯峨山茂樹

日本音響学会平成15年春季研究発表会講演論文集 Vol. 1-4-18 ( No. ) pp. 41 - 42 2003.3

　More details

researchmap
ハーモニッククラスタリングによる多重音基本周波数推定 Reviewed

亀岡弘和, 西本卓也, 篠田浩一, 嵯峨山茂樹

日本音響学会平成15年春季研究発表会講演論文集 Vol. 3-7-3 ( No. ) pp. 837 - 838 2003.3

　More details

researchmap
ハーモニッククラスタリングによる多重音基本周波数推定アルゴリズム Reviewed

篠田浩一

情報処理学会研究報告 Vol. ( No. ) pp. 2003.3

　More details

researchmap
HMMを用いた多声部MIDI信号からの楽譜復元 Reviewed

武田晴登, 西本卓也, 篠田浩一, 嵯峨山茂樹

日本音響学会平成15年春季研究発表会講演論文集 Vol. 3-7-4 ( No. ) pp. 839 - 840 2003.3

　More details

Language：Japanese

CiNii Books

researchmap
品詞情報と単語内位置情報を用いた話し言葉音声認識のための状態クラスタリング Reviewed

五十川賢造, 西本卓也, 篠田浩一, 嵯峨山茂樹

日本音響学会平成15年春季研究発表会講演論文集 Vol. 1-4-4 ( No. ) pp. 7 - 8 2003.3

　More details

researchmap
正規分布の尤度補正による突発性雑音に頑健な音声認識 Reviewed

山本仁, 篠田浩一, 嵯峨山茂樹

日本音響学会平成14年秋季研究発表会講演論文集 Vol. 1-9-10 ( No. ) pp. 2002.9

　More details

researchmap
リズムベクトルを用いたHMMによる単旋律MIDI演奏の楽譜化 Reviewed

武田晴登, 篠田浩一, 嵯峨山茂樹

日本音響学会平成14年秋季研究発表会講演論文集 Vol. 1-1-5 ( No. ) pp. 2002.9

　More details

researchmap
スペクトル領域のDPマッチングによる自然楽器演奏の多重音解析 Reviewed

亀岡弘和, 篠田浩一, 嵯峨山茂樹

日本音響学会平成14年秋季研究発表会講演論文集 Vol. 1-1-2 ( No. ) pp. 2002.9

　More details

researchmap
周波数領域のDPマッチングによる自然楽器演奏の和音ピッチ推定 Reviewed

亀岡弘和, 篠田浩一, 嵯峨山茂樹

情報処理学会研究報告 Vol. 2002-MUS-46 ( No. ) pp. 17 - 22 2002.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This report describes a pitch estimation method for multi-pitch performances on natural instruments. First, the musical audio signal stream is divided into segments, each as long as the corresponding note. Second, in each segment, template matching using dynamic programming is applied between the input spectrogram and every spectrum template synthesized from tone templates. Each tone template is a representative spectrum of a scale played by an instrument. In the experiments, the accuracies of pitch estimation and chord estimation were 92% and 82% for piano performances, and 87% and 71% for a violin ensemble. The accuracy of the proposed method was higher than that of a simple template matching method, especially at lower pitches or with string instruments.

CiNii Books

researchmap
リズムベクトルの概念に基づくMIDI演奏の音価認識 Reviewed

武田晴登, 篠田浩一, 嵯峨山茂樹

情報処理学会研究報告 Vol. 2002-MUS-46 ( No. ) pp. 23 - 28 2002.7

　More details

CiNii Books

researchmap
ガウス分布の補正による突発性雑音に頑健な音声認識 Reviewed

山本仁, 篠田浩一, 嵯峨山茂樹

電子情報通信学会技術研究報告 Vol. SP2002-45 ( No. ) pp. 2002.6

　More details

researchmap
LSP周波数間隔とCSM強度対を用いた音声認識の検討 Reviewed

五十川賢造, 篠田浩一, 嵯峨山茂樹

電子情報通信学会技術研究報告 Vol. SP2002-42 ( No. ) pp. 2002.6

　More details

researchmap
Vocal tract length normalization using rapid maximum-likelihood estimation for speech recognition Reviewed

Tadashi Emori, Koichi Shinoda

Systems and Computers in Japan 33 ( 5 ) 30 - 40 2002.5

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1002/scj.1125

Scopus

researchmap
MDL基準を用いたHMMサイズの削減 Reviewed

篠田浩一, 磯健一

日本音響学会2002年春季研究発表会講演論文集 2-5-3 ( No. ) pp. 79 - 80 2002.3

　More details

researchmap
話し言葉音声の認識における間投詞の話者性を考慮した言語モデル Reviewed

板垣貴裕, 篠田浩一, 嵯峨山茂樹

第2回話し言葉の科学と工学ワークショップ講演予稿集 Vol. ( No. ) pp. 79 - 84 2002.2

　More details

researchmap
Efficient reduction of Gaussian components using MDL criterion for HMM-based speech recognition Reviewed

K Shinoda, K Iso

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS vol. 101 ( no. 352 ) 869 - 872 2002

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
CSM強度対を用いた音声認識 Reviewed

五十川賢造, 篠田浩一, 嵯峨山茂樹

日本音響学会平成14年春季研究発表会講演論文集 Vol. 1-5-4 ( No. ) pp. 7 - 8 2002

　More details

researchmap
Efficient reduction of Gaussian components using MDL criterion for HMM-based speech recognition Reviewed

K Shinoda, K Iso

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS Vol. ( No. ) 869 - 872 2002

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
話者適応(サーベイ) Reviewed

篠田浩一

第3回音声言語シンポジウム講演論文集 Vol. ( No. ) pp. 2001.12

　More details

researchmap
A Study on robustness against data insufficiency for speech recognition Reviewed

Koichi Shinoda

Ph.D. Thesis, Tokyo Institute of Technology 2001.3

　More details

researchmap
A study on robustness against data insufficiency for speech recognition Reviewed

Koichi Shinoda

2001.2

　More details

Language：English Publishing type：Doctoral thesis

researchmap
音声認識のためのMDL基準を用いた効果的なガウス数削減 Reviewed

篠田浩一, Dieu Tran, 磯健一

信学技報 101 ( 352 ) pp. 69 - 76 2001

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

A method is proposed to reduce the number of Gaussian components in continuous density hidden Markov models (HMMs). As its initial model, the method employs a well-trained, large-sized HMM in which the components of each state's Gaussian mixture probability density function are clustered into a binary tree. For each state, a subset of Gaussian components is chosen from the Gaussian tree on the basis of the minimum description length (MDL) criterion. By varying the penalty coefficient for large size models in the MDL criterion, it is possible to obtain the total number of Gaussian components desired for smaller models. In our experimental evaluations, the proposed method successfully reduced the number of Gaussian components by 75%, with only 1% degradation in recognition accuracy.

CiNii Books

researchmap
Analytic Methods for Acoustic Model Adaptation: A Review Reviewed

S. Sagayama, K. Shinoda, M. Nakai, H. Shimodaira

Proc. ISCA ITR-Workshop2001 Vol. ( No. ) pp. 67 - 76 2001

　More details

researchmap
Rapid Vocal Tract Length Normalization using Maximum Likelihood Estimation Reviewed

T. Emori, K. Shinoda

Proc. EuroSpeech2001 Vol. ( No. ) pp. 1649 - 1652 2001

　More details

researchmap
音声認識のための高速最ゆう推定を用いた声道長正規化 Reviewed

江森正, 篠田浩一

電子情報通信学会技術研究報告(第1回音声言語シンポジウム) Vol. SP99-101 ( No. 108 ) pp. 49 - 54 1999.12

　More details

researchmap
MDL基準を用いた音声認識単位の自動生成 Reviewed

篠田浩一

視聴覚情報研究会(AVIRG)例会予稿 Vol. ( No. ) pp. 1999.9

　More details

researchmap
Acoustic Model Adaptation using Structural Bayes Approach Reviewed

Koichi Shinoda, Chin-Hui Lee

1998 ( 2 ) pp. 47 - 48 1998.9

　More details

Language：English

CiNii Books

researchmap
Unsupervised adaptation using structural Bayes approach Reviewed

K Shinoda, CH Lee

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 Vol. ( No. ) 793 - 796 1998

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
事後確率最大化手法を用いた言語モデルの学習 Reviewed

花沢健, 篠田浩一

日本音響学会平成10年度秋季研究発表会講演論文集 Vol. 2-1-21 ( No. ) pp. 1998

　More details

researchmap
Acoustic modeling based on the MDL criterion for speech recognition Reviewed

K. Shinoda, T. Watanabe

Proc. EuroSpeech-97 Vol. ( No. 1 ) pp. 99 - 102 1997

　More details

researchmap
Structural MAP speaker adaptation using hierarchical priors Reviewed

K Shinoda, CH Lee

1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS Vol. ( No. ) 381 - 388 1997

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Unsupervised and incremental speaker adaptation under adverse environmental conditions Reviewed

K Takagi, K Shinoda, H Hattori, T Watanabe

ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 Vol. ( No. ) 2079 - 2082 1996

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
情報量基準を用いた状態クラスタリングによる音響モデルの生成 Reviewed

篠田浩一, 渡辺隆夫

電子情報通信学会技術研究報告 Vol. SP96-79 ( No. ) pp. - 15 1996

　More details

researchmap
情報量基準を用いた音声認識単位の自動生成 Reviewed

篠田浩一, 渡辺隆夫

日本音響学会平成8年度秋季研究発表会講演論文集 Vol. 2-3-11 ( No. ) pp. - 70 1996

　More details

CiNii Books

researchmap
雑音環境を考慮した自律型話者適応化 Reviewed

高木啓三郎, 篠田浩一, 服部浩明, 渡辺隆夫

日本音響学会平成8年度春季研究発表会講演論文集 Vol. 1-5-24 ( No. ) pp. 1996

　More details

researchmap
Speaker adaptation with autonomous model complexity control by MDL principle Reviewed

K Shinoda, T Watanabe

1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6 Vol. ( No. ) 717 - 720 1996

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
パソコンソフト連続音声認識 Reviewed

篠田浩一, 坂井信輔, 磯健一, 畑崎香一郎, 渡辺隆夫, 水野正典

情報処理学会第50回(平成7年度前期)全国大会講演論文集 Vol. 2-465 ( No. ) pp. - 5 1995.3

　More details

CiNii Books

researchmap
HIGH SPEED SPEECH RECOGNITION USING TREE-STRUCTURED PROBABILITY DENSITY FUNCTION Reviewed

T WATANABE, K SHINODA, K TAKAGI, KI ISO

1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5 Vol. ( No. ) 556 - 559 1995

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
雑音環境の変動を考慮した話者適応化 Reviewed

高木啓三郎, 篠田浩一, 服部浩明, 渡辺隆夫

電子情報通信学会技術研究報告(第1回音声言語シンポジウム) Vol. SP95-100 ( No. ) pp. - 52 1995

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

This paper describes an unsupervised and incremental speaker adaptation method. In practical applications, environmental changes between adaptation utterances often degrade speaker adaptation performance. To cope with this problem, the proposed method first cancels the effect of environmental differences for each utterance by using REALISE. Then we apply speaker adaptation with autonomous control using a tree structure. The evaluation experiment used utterances recorded from six speakers under three conditions in a vehicle environment. To simulate a rapidly changing environment, any two adjacent utterances are presented to the adaptation process under different conditions. After adaptation with 100 words, the evaluation results were 79% without processing, 90% with REALISE only, and 95% with the proposed method.

CiNii Books

researchmap
Speaker adaptation with autonomous control using tree structure Reviewed

K. Shinoda, T. Watanabe

Proc. EuroSpeech-95 Vol. ( No. ) pp. 1143 - 1146 1995

　More details

researchmap
パソコン向けソフトウェア連続音声認識 Reviewed

磯健一, 篠田浩一, 坂井信輔, 畑崎香一郎, 渡辺隆夫, 水野正典

電子情報通信学会総合大会講演論文集 Vol. SD-9-4 ( No. ) pp. 1995

　More details

researchmap
木構造化された確率分布を用いた話者適応化 Reviewed

篠田浩一, 渡辺隆夫

日本音響学会平成7年度春季研究発表会講演論文集 Vol. 2-5-10 ( No. ) pp. - 50 1995

　More details

CiNii Books

researchmap
記述長最小原理を用いた話者適応化 Reviewed

篠田浩一, 渡辺隆夫

日本音響学会平成7年度秋季研究発表会講演論文集 Vol. 3-2-12 ( No. ) pp. 1995

　More details

researchmap
SPEAKER ADAPTATION USING SPECTRAL INTERPOLATION FOR SPEECH RECOGNITION Reviewed

K SHINODA, KI ISO, T WATANABE

ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE 77 ( 10 ) 1 - 11 1994.10

　More details

Language：English Publishing type：Research paper (scientific journal)

Web of Science

researchmap
大語彙音声入力装置の開発 Reviewed

古賀真二, 塚田聡, 篠田浩一, 野口淳, 畑崎香一郎, 渡辺隆夫, 友岡靖夫, 赤井善裕, 幅崎直行, 羽金広

電子情報通信学会秋季大会講演論文集 Vol. D-392 ( No. ) pp. 1994

　More details

researchmap
Unsupervised speaker adaptation for speech recognition using demi-syllable HMM Reviewed

K. Shinoda, T. Watanabe

Proc. ICSLP-94 Vol. ( No. ) pp. 435 - 438 1994

　More details

researchmap
パソコン向けソフトウェア連続音声認識 Reviewed

篠田浩一, 坂井信輔, 磯健一, 畑崎香一郎, 渡辺隆夫, 水野正典

日本音響学会平成6年度秋季研究発表会講演論文集 Vol. 2-8-3 ( No. ) pp. - 44 1994

　More details

CiNii Books

researchmap
半音節HMMを用いた音声認識のための教師なし適応化 Reviewed

篠田浩一, 渡辺隆夫

日本音響学会平成6年度春季研究発表会講演論文集 Vol. 3-7-8 ( No. ) pp. 1994

　More details

researchmap
半音節単位認識による大語彙音声入力装置の開発 Reviewed

古賀真二, 篠田浩一, 高木啓三郎, 渡辺隆夫, 吉田和永, 塚田聡

日本音響学会平成6年度秋季研究発表会講演論文集 Vol. 2-8-7 ( No. ) pp. - 400 1994

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
パソコン向けソフトウェア音声認識 Reviewed

磯健一, 高木啓三郎, 篠田浩一, 山田栄子, 服部浩明, F. Ehsani, 野口淳, 古賀真二, 畑崎香一郎, 渡辺隆夫

日本音響学会平成5年度秋季研究発表会講演論文集 Vol. 2-Q-21 ( No. ) pp. - 190 1993

　More details

CiNii Books

researchmap
パソコン向け音声認識ソフトウエア Reviewed

畑崎香一郎, 磯健一, 高木啓三郎, 篠田浩一, F. Ehsani, 野口淳, 坂井信輔, 山田栄子, 服部浩明, 渡辺隆夫

情報処理学会平成5年度後期全国大会講演論文集 Vol. 2-375 ( No. ) pp. 1993

　More details

researchmap
木構造確率分布を用いた音声認識 Reviewed

渡辺隆夫, 篠田浩一, 高木啓三郎, 山田栄子, 服部浩明

日本音響学会平成5年度秋季研究発表会講演論文集 Vol. 1-8-7 ( No. ) pp. 1993

　More details

CiNii Books

researchmap
音声認識のための入力環境の適応化 Reviewed

高木啓三郎, 篠田浩一, 渡辺隆夫

日本音響学会平成5年度春季研究発表会講演論文集 Vol. 1-4-22 ( No. ) pp. 1993

　More details

researchmap
音声認識のためのタスク適応化 Reviewed

篠田浩一, 渡辺隆夫

日本音響学会平成4年度春季研究発表会講演論文集 Vol. 1-P-15 ( No. ) pp. 1992

　More details

researchmap
英語不特定話者連続音声認識の試作 Reviewed

磯谷亮輔, 渡辺隆夫, 畑崎香一郎, 永野敬子, 篠田浩一, 田海真一, M. Chong

日本音響学会平成4年度春季研究発表会講演論文集 Vol. 1-P-10 ( No. ) pp. 1992

　More details

researchmap
話者適応化における学習語彙依存性の改善 Reviewed

篠田浩一, 渡辺隆夫

日本音響学会平成4年度秋季研究発表会講演論文集 Vol. 2-5-7 ( No. ) pp. 1992

　More details

researchmap
SPEAKER ADAPTATION FOR DEMI-SYLLABLE BASED CONTINUOUS DENSITY HMM Reviewed

K SHINODA, K ISO, T WATANABE

ICASSP 91, VOLS 1-5 Vol. ( No. ) 857 - 860 1991

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
半音節HMMによる音声認識のための話者適応 Reviewed

篠田浩一, 磯健一, 渡辺隆夫

日本音響学会平成2年度秋季研究発表会講演論文集 Vol. 1-8-12 ( No. ) pp. 1990

　More details

researchmap
DISCOVERY OF THE QUASI-PERIODIC OSCILLATIONS FROM THE X-RAY PULSAR X1627-673 Reviewed

K SHINODA, T KII, K MITSUDA, F NAGASE, Y TANAKA, K MAKISHIMA, N SHIBAZAKI

PUBLICATIONS OF THE ASTRONOMICAL SOCIETY OF JAPAN 42 ( 2 ) L27 - L32 1990

　More details

Language：English

Web of Science

researchmap
Speaker adaptation for demi-syllable based speech recognition using continuous HMM Reviewed

K. Shinoda, K. Iso, T. Watanabe

Proc. of ICSLP-90 Vol. ( No. ) pp. 261 - 264 1990

　More details

researchmap

Books

音響学講座 7 音声(下)

岩野公司, 河原達也, 篠田浩一, 伊藤彰則, 増村亮, 小川哲司, 駒谷和範

コロナ社 2023.1

　More details

Language：Japanese Book type：Scholarly book

researchmap
「音声認識」(機械学習プロフェッショナルシリーズ)

篠田浩一

株式会社講談社 2017.12

　More details

Language：Japanese Book type：Textbook, survey, introduction

researchmap
LTC 2009, LNAI 6562

Agnieszka Betkowska Cavalcante, Koichi Shinoda, Sadaoki Furui（Robust speech recognition in the car environment）

Springer 2011.7

　More details

Responsible for pages：24 - 34 Language：English Book type：Scholarly book

researchmap
パターン認識と機械学習・ベイズ理論による統計的予測

元田浩, 栗田多喜夫, 樋口知之, 松本裕治, 村田昇, 赤穂昭太郎, 神嶌敏弘, 杉山将, 小野田崇, 池田和司, 鹿島久嗣, 賀沢秀人, 中島伸一, 竹内純一, 持橋大地, 小山聡, 井手剛, 篠田浩一, 山川宏（パターン認識と機械学習（下）ベイズ理論による統計的予測）

丸善出版 2008.7 （ ISBN:9784431100133 ）

　More details

Responsible for pages：433 Language：Japanese Book type：Scholarly book

researchmap
パターン認識と機械学習・ベイズ理論による統計的予測

元田浩, 栗田多喜夫, 樋口知之, 松本裕治, 村田昇, 赤穂昭太郎, 神嶌敏弘, 杉山将, 小野田崇, 池田和司, 鹿島久嗣, 賀沢秀人, 中島伸一, 竹内純一, 持橋大地, 小山聡, 井手剛, 山川宏, 篠田浩一（パターン認識と機械学習（上）：ベイズ理論による統計的予測）

シュプリンガー・ジャパン 2007.12 （ ISBN:9784431100317 ）

　More details

Language：Japanese Book type：Scholarly book

researchmap
Spoken Language Systems Reviewed

Takao Watanabe, Kaichiro Hatazaki, Ken-ichi Iso, Ryosuke Isotani, Koichi Shinoda, Keizaburo Takagi（Speech Recognition System in NEC）

Ohmsha 2005.9 （ ISBN:427490637X ）

　More details

Responsible for pages：34 - 46 Language：English Book type：Scholarly book

researchmap
Spoken Language Systems

Koichi Shinoda（Speech Recognition System in NEC）

2005

　More details

Language：English Book type：Scholarly book

researchmap

MISC

MITSuME望遠鏡画像に対する深層学習を用いた突発天体検知システムの構築

伊藤, 尚泰, Ito, Naohiro, 村田, 勝寛, Murata, Katsuhiro, 細川, 稜平, Hosokawa, Ryohei, 笹田, 真人, Sasada, Mahito, 庭野, 聖史, Niwano, Masafumi, 谷津, 陽一, Yatsu, Yoichi, 河合, 誠之, Kawai, Nobuyuki, 篠田, 浩一, Shinoda, Koichi, 井上, 中順, Inoue, Nakamasa, 伊藤, 亮介, Itoh, Ryosuke, 下川辺, 隆史, Shimokawabe, Takashi

日本天文学会2022年秋季年会講演予稿集 2022.9

　More details

Language：Japanese Publisher：公益社団法人日本天文学会

identifier:oai:t2r2.star.titech.ac.jp:50636760

CiNii Research

researchmap
MITSuME望遠鏡画像に対する深層学習を用いた突発天体検知システムの構築

伊藤尚泰, 村田勝寛, 細川稜平, 笹田真人, 庭野聖史, 谷津陽一, 河合誠之, 篠田浩一, 井上中順, 伊藤亮介, 下川辺隆史

日本天文学会年会講演予稿集 2022 2022

　More details

J-GLOBAL

researchmap
論文誌10年の記録 2010年～2019年

篠田浩一, 瀧口吉郎, 小川一人, 佐野雅規, 内藤整, 藤井俊彰

映像情報メディア学会誌 2020年9月号 Vol. 74 ( No. 5 ) 813~817 2020.9

　More details

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)

researchmap
深層学習を用いたMITSuME望遠鏡画像からの突発天体検知(2)

飯田康太, 谷津陽一, 村田勝寛, 橘優太朗, 河合誠之, LONG Yan, 篠田浩一, 井上中順, 下川辺隆史

日本天文学会年会講演予稿集 2020 2020

　More details

J-GLOBAL

researchmap
q-Gaussian Mixture Models for Video Semantic Indexing

Nakamasa Inoue, Koichi Shinoda

IPSJ SIG Notes. CVIM 2012 ( 5 ) 1 - 6 2012.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Gaussian mixture models (GMMs), which extend the bag-of-visual-words (BoW) approach to a probabilistic framework, have proved effective for image and video semantic indexing. Recently, the q-Gaussian distribution, which is derived in non-extensive statistics, has been shown to be useful for representing patterns in many complex systems in physics, such as fractals and cosmology. We propose q-Gaussian mixture models (q-GMMs), which are mixture models of q-Gaussian distributions, for image and video semantic indexing. The q-Gaussian distribution has a parameter q that controls its tail-heaviness. The long-tailed distributions obtained for q > 1 are expected to represent complexly correlated data effectively and hence to improve robustness against outliers. In our experiments, the proposed method outperformed the BoW method and achieved 49.4% and 10.9% in Mean Average Precision on the PASCAL VOC 2010 dataset and the TRECVID 2010 Semantic Indexing dataset, respectively.

CiNii Books

researchmap
Speech modeling using committee-based active and semi-supervised learning

Takuya Tsutaoka, Koichi Shinoda

IPSJ SIG Notes 2012 ( 22 ) 1 - 6 2012.1

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a combination of active learning and semi-supervised learning using a committee-based approach for large vocabulary continuous speech recognition. In this method, utterances for manual or automatic transcription are selected according to disagreements among the recognition results obtained from the recognizers. Our method was evaluated using simulated speech data in the Corpus of Spontaneous Japanese. The proposed method achieved higher recognition accuracy with lower transcription costs than random sampling. We also investigate the data selection criterion in semi-supervised learning.

CiNii Books

researchmap
Analysis of spectral reduction in noisy speech and its application to noise robust speech recognition

41 ( 2 ) 117 - 122 2011.3

　More details

Language：Japanese

CiNii Books

researchmap
Active learning using multiple recognizers for speech recognition

Hamanaka Yuzo, Emori Tadashi, Koshinaka Takafumi, Shinoda Koichi, Furui Sadaoki

IEICE technical report 109 ( 355 ) 19 - 23 2009.12

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

researchmap
Active learning using multiple recognizers for speech recognition

HAMANAKA YUZO, EMORI TADASHI, KOSHINAKA TAKAFUMI, SHINODA KOICHI, FURUI SADAOKI

2009 ( 4 ) 1 - 5 2009.12

　More details

Language：Japanese

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00067046/
Automatic recognition of Indonesian declarative questions and statements using polynomial coefficients of the pitch contours

Nazrul Effendy, Koichi Shinoda, Sadaoki Furui, Somchai Jitapunkul

Acoustical Science and Technology 30 ( 4 ) 249 - 256 2009

　More details

Language：English

DOI： 10.1250/ast.30.249

Scopus

researchmap
確率モデルによる音声認識のための話者適応化技術（サーベイ論文）

篠田浩一

電子情報通信学会論文誌 J87-D-Ⅱ ( 2 ) 371 - 386 2009

　More details

researchmap
Efficient estimation method of scaling factors among probabilistic models in speech recognition

EMORI Tadashi, ONISHI Yoshifumi, SHINODA Koichi

IPSJ SIG Notes 2007 ( 129 ) 49 - 53 2007.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a new efficient method for estimating scaling factors among probabilistic models in speech recognition. Most speech recognition systems consist of more than one model, including an acoustic model and a language model, and require scaling factors to balance the probabilities among them. The scaling factors are conventionally optimized in preliminary recognition tests using development data. In our proposed method, the scaling factors are regarded as parameters of a log-linear model, and they are estimated using a gradient-ascent method based on the maximum a posteriori probability criterion. Posterior probability is computed using word lattices generated by a speech recognizer. We employ an iterative technique that repeats a word-lattice-generation/scaling-factor-estimation process, and the resulting scaling factor estimation is robust to changes in initial values. In an experimental evaluation of our method by LVCSR using Japanese dialogue speech data, the estimated scaling factors were nearly identical to the optimal values obtained by a greedy grid search. We also confirmed that the estimated scaling factors changed little with variations in initial values.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00056762/
Speaker Selection for Unsupervised Speaker Adaptation based on HMM Sufficient Statistics

TANI Masahiro, EMORI Tadashi, OHNISHI Yoshifumi, KOSHINAKA Takafumi, SHINODA Koichi

IPSJ SIG Notes 2007 ( 129 ) 85 - 89 2007.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a new speaker selection method for unsupervised speaker adaptation based on HMM sufficient statistics. Adaptation using HMM sufficient statistics has been proposed as one of the rapid unsupervised speaker adaptation techniques in speech recognition. The procedure is as follows: First, the training speakers acoustically close to the test speaker are selected. Then, the acoustic model is trained using the HMM sufficient statistics of these selected training speakers. In this technique, the number of selected training speakers is always constant. In our proposed speaker selection method, the number of speakers is determined by the distances between the test speaker and each training speaker. In our recognition experiments using spoken dialogue data, the proposed method improved word accuracy by 0.74 points. The proposed method was confirmed to be particularly effective when there are not many training speakers around the test speaker in acoustic space.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00056768/
Robust speech recognition using factorial HMMs for home environments

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING 2007 ( 20593 ) 2007

　More details

Language：English

DOI： 10.1155/2007/20593

Web of Science

researchmap
Robust speech recognition using factorial HMMs for home environments

Agnieszka Betkowska, Koichi Shinoda, Sadaoki Furui

Eurasip Journal on Advances in Signal Processing 2007 ( 20593 ) 2007

　More details

Language：English

DOI： 10.1155/2007/20593

Scopus

researchmap
Using presentation slide information for lecture speech recognition

YAMAZAKI Hiroki, IWANO Koji, SHINODA Koichi, FURUI Sadaoki, YOKOTA Haruo

IPSJ SIG Notes 2006 ( 136 ) 221 - 226 2006.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a dynamic language model adaptation method for lecture speech recognition that uses the text on lecture slides. The speech data corresponding to each slide are recognized with a language model adapted to them by using the slide texts as adaptation data. We evaluated the proposed method using the speech data of three classroom courses in Japanese and confirmed its effectiveness. The average speech recognition error was reduced by 3.1% by global adaptation using all slides used in a course. The error rates of recall and precision for keywords were also reduced by 21.5% and 13.8%, respectively. Furthermore, we improved keyword detection performance by adapting locally with each slide. The error rates of recall and precision for keywords were reduced by 3.1% and 1.4%, respectively, relative to global adaptation.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00056867/
Robust scene extraction using multi-stream HMMs for baseball broadcast

Nguyen Huu Bach, Koichi Shinoda, Sadaoki Furui

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D ( 9 ) 2553 - 2561 2006.9

　More details

Language：English

DOI： 10.1093/ietisy/e89-d.9.2553

Web of Science

researchmap
Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast

Nguyen Huu Bach, Koichi Shinoda, Sadaoki Furui

IEICE Transactions on Information and Systems E89-D ( 9 ) 2553 - 2561 2006

　More details

DOI： 10.1093/ietisy/e89-d.9.2553

researchmap
音声情報処理技術の最先端: 2.統計的手法を用いた音声モデリングの高度化とその音声認識への応用 Reviewed

篠田浩一, 篠崎隆宏

情報処理学会学会誌 Vol. 45 ( No. 10 ) 1012 - 1019 2004.10

　More details

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal) Publisher：情報処理学会

researchmap
確率モデルによる音声認識のための話者適応化技術

篠田浩一

電子情報通信学会論文誌D-II J87 ( 2 ) 371 - 386 2004

　More details

researchmap
音声情報処理技術の最先端: 2.統計的手法を用いた音声モデリングの高度化とその音声認識への応用

篠田浩一, 篠崎隆宏

情報処理 45 ( 10 ) 1012 - 1019 2004

　More details

researchmap
確率モデルによる多声音楽演奏のMIDI信号のリズム認識

武田晴登, 篠田浩一, 嵯峨山茂樹

情報処理学会論文誌 45 ( 3 ) 670 - 679 2004

　More details

researchmap
Acoustic modeling using word contexts for spontaneous speech recognition

ISOGAWA Kenzo, SHINODA Koichi, SAGAYAMA Shigeki

IPSJ SIG Notes 2002 ( 121 ) 111 - 116 2002.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we study state clustering using word contexts for speech recognition. In spontaneous speech, poorly articulated words often cause recognition errors. To improve recognition performance, we add two questions to those used in phonetic decision-tree-based state clustering. One is a question about parts of speech, and the other is a question about the position of phones within a word. To apply the question about parts of speech, we classify parts of speech into two classes based on word duration estimated from a corpus of spontaneous speech. After making HMMs for each class, we carry out state clustering using a context decision tree with the questions about the classes. To apply the question about the position of phones within a word, we make separate HMMs for phones at the beginning of the word, phones at the end of the word, and phones at other positions. We then carry out state clustering using a context decision tree with questions about phone positions. We carried out speech recognition experiments using CSJ (Corpus of Spontaneous Japanese). In the best case, word accuracy improved by 2.4 points with the former method and by 6.1 points with the latter.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00057285/
Acoustic modeling using word contexts for spontaneous speech recognition

ISOGAWA Kenzo, SHINODA Koichi, SAGAYAMA Shigeki

IEICE technical report. Speech 102 ( 529 ) 111 - 116 2002.12

　More details

Language：Japanese Publisher：一般社団法人電子情報通信学会

In this paper, we study state clustering using word contexts for speech recognition. In spontaneous speech, poorly articulated words often cause recognition errors. To improve recognition performance, we add two questions to those used in phonetic decision-tree-based state clustering. One is a question about parts of speech, and the other is a question about the position of phones within a word. To apply the question about parts of speech, we classify parts of speech into two classes based on word duration estimated from a corpus of spontaneous speech. After making HMMs for each...

CiNii Books

researchmap
A Rhythm Recognition Method using Rhythm Vectors

TAKEDA Haruto, SHINODA Koichi, SAGAYAMA Shigeki

IPSJ SIG Notes 2002 ( 63 ) 23 - 28 2002.7

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

This paper proposes a rhythm recognition method for MIDI signals performed on a MIDI keyboard. The usual approach to automatic transcription from MIDI signals is to play the MIDI keyboard along with a metronome at a constant tempo and to quantize the note durations at a resolution specified by the user. The method proposed in this paper, however, does not require the performer to follow the beats of a metronome and can recognize rhythm patterns for automatic transcription. We define the ratio of note durations as a new feature, the "Rhythm Vector". Rhythm Vector and tempo variation are integrated in Hidden Mar...

J-GLOBAL

researchmap
A structural Bayes approach to speaker adaptation

K Shinoda, CH Lee

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 9 ( 3 ) 276 - 287 2001.3

　More details

Language：English

DOI： 10.1109/89.906001

Web of Science

researchmap
音声認識のための高速最ゆう推定を用いた声道長正規化

江森正, 篠田浩一

電子情報通信学会論文誌 Vol. J83-D-II ( No. 11 ) 2108 - 2117 2000.11

　More details

Language：Japanese

researchmap
MDL-based context-dependent subword modeling for speech recognition

K. Shinoda, T. Watanabe

Journal of the Acoustical Society of Japan (E) 21 ( 2 ) 79 - 86 2000

　More details

DOI： 10.1250/ast.21.79

researchmap
MDL-based context-dependent subword modeling for speech recognition

SHINODA K.

J. Acoust. Soc. Jpn. (E) 21 ( 2 ) 79 - 86 2000

　More details

DOI： 10.1250/ast.21.79

researchmap
音声認識における自律的なモデル複雑度制御を用いた話者適応化

篠田浩一, 渡辺隆夫

電子情報通信学会和文論文誌D-II Vol. J79-D-II ( No. 12 ) 2054 1996.12

　More details

Language：Japanese

researchmap
音声認識のためのスペクトル内挿を用いた話者適応化

篠田浩一, 磯健一, 渡辺隆夫

電子情報通信学会論文誌A Vol. J77-A ( No. 2 ) 120 - 127 1994.2

　More details

Language：Japanese

researchmap

Presentations

深層学習の音声認識への応用

篠田浩一

情報処理学会連続セミナー2017 第4回ディープラーニングの活用と基盤 2017.10

　More details

Event date： 2017.10

Language：Japanese Presentation type：Oral presentation (invited, special)

Venue：東京都千代田区

researchmap
Semantics for Large-Scale Multimedia: New Challenges for NLP International conference

Florian Metze, Koichi Shinoda

ACL2014 2014.6

　More details

Event date： 2014.6

Language：English Presentation type：Oral presentation (invited, special)

Venue：Baltimore

Thousands of videos are constantly being uploaded to the web, creating a vast resource and an ever-growing demand for methods to make them easier to retrieve, search, and index. As it becomes feasible to extract both low-level and high-level (symbolic) audio, speech, and video features from this data, these need to be processed further in order to learn and extract meaningful relations between them. The language processing community has made huge progress in analyzing the vast amounts of very noisy text data available on the Internet. While it is very difficult to create semantic units from low-level image descriptors or non-speech sounds by themselves, it is comparatively easy to ground semantics in the word output of a speech recognizer, or in text data that is loosely associated with a video. This creates an opportunity for NLP researchers to use their unique skills and make significant contributions to solving tasks on data that is even noisier than web text, but (we argue) even more interesting and challenging.

researchmap
TRECVideo Semantic Indexing

Koichi Shinoda

Yahoo! Japan Research 2013.11

　More details

Event date： 2013.11

Language：English Presentation type：Oral presentation (invited, special)

researchmap
Statistical Video Semantic Indexing International conference

Koichi Shinoda

National Chiao Tung University (國立交通大学) 2013.10

　More details

Event date： 2013.10

Language：English Presentation type：Oral presentation (invited, special)

Venue：Hsinchu(新竹)

researchmap
Mobile or Cloud-based Photo/Video Analytics? International conference

Winston Hsu, Kunio Kashino, Keiichiro Hoashi, Koichi Shinoda, Duy-Dinh Le, Masanori Sugimoto

Greater Tokyo Area Multimedia/Vision Workshop 2012.8

　More details

Event date： 2012.8

Language：English Presentation type：Oral presentation (invited, special)

Venue：Tokyo

researchmap
映像検索評価ワークショップTRECVID

篠田浩一

キャノン（株）イノベイティブ技術フォーラム 2011.10

　More details

Event date： 2011.10

Language：Japanese Presentation type：Oral presentation (invited, special)

Venue：東京

researchmap
Multimedia Information Retrieval Using Statistical Approach International conference

Koichi Shinoda

Microsoft Research Asia 2009 Annual Workshop of IJARC 2009.7

　More details

Event date： 2009.7

Language：English Presentation type：Oral presentation (general)

researchmap
音声と手書き文字の同時入力インターフェース

情報処理学会研究報告 2005

　More details

researchmap
Automatically Estimating Number of Scenes for Rushes Summarization

TRECVID BBC Rushes Summarization Workshop (TVS 2008) at ACM Multimedia 2008

　More details

researchmap
Improvement of eigenvoice-based speaker adaptation by parameter space clustering

INTERSPEECH2008 2008

　More details

researchmap
Robust spoken term detection using combination of phone-based and word-based recognition

INTERSPEECH2008 2008

　More details

researchmap
Time-lag Adaptation for Semi-synchronous Speech and Pen Input

INTERSPEECH2008 2008

　More details

researchmap
Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transformation

2008

　More details

researchmap
Automatic score Scene Detection for Baseball Video

Symposium on Large-Scale Knowledge Resources(LKR2008) 2008

　More details

researchmap
Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment

2008

　More details

researchmap
音声認識のための複数の認識器を利用した能動学習

情報処理学会研究報告 2009

　More details

researchmap
SIFT混合ガウス分布と音響特徴を用いた映像からの高次特徴検出

電子情報通信学会技術研究報告 2009

　More details

researchmap
TITGT at TRECVID 2009 Workshop

TRECVID Workshop (TRECVID 2009) 2009

　More details

researchmap
Robust Speech Recognition In The Car Environment

the 4th Language and Technology Conference (LTC'09) 2009

　More details

researchmap
Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transform

Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference 2009

　More details

researchmap
音声認識のためのコミッティを用いた能動学習

日本音響学会秋季研究発表会 2009

　More details

researchmap
Speaker Adaptation Based on Two-Step Active Learning

INTERSPEECH 2009 BRIGHTON 2009

　More details

researchmap
Online speaker clustering using incremental learning of an ergodic hidden markov model

IEEE ICASSP 2009 2009

　More details

researchmap
Independent component analysis for noisy speech recognition

ICASSP 2009 2009

　More details

Presentation type：Poster presentation

researchmap
能動的な適応文選択に基づく話者適応化

日本音響学会 2009年春季研究発表会 2009

　More details

researchmap
ハフ変換による基本周波数情報を用いた耐雑音音声認識の高性能化の検討

日本音響学会 2009年春季研究発表会 2009

　More details

researchmap
統計的モデル選択によるシーン数の自動推定を用いた動画要約

電子情報通信学会技術研究報告 2009

　More details

researchmap
CHLAC特徴と隠れマルコフモデルを用いたGait認識

電子情報通信学会技術研究報告 2009

　More details

researchmap
耐雑音音声認識のためのハフ変換による基本周波数情報抽出の高速化

電子情報通信学会技術研究報告 2009

　More details

researchmap
ToFカメラによる3D手話認識

画像の認識・理解シンポジウム 2010

　More details

researchmap
NIST SRE 2010:Tokyo Tech Speaker Recognition

NIST 2010 Speaker recognition evaluation workshop 2010

　More details

researchmap
Gait Recognition Using CHLAC Features and Hidden Markov Model

IEICE Technical Report 2009

　More details

researchmap
Family Adaptation of Factorial HMMs for Personal Robots

2006

　More details

researchmap
Home-Environment Adaptation of Phoneme Factorial Hidden Markov Models

Poznan, Poland 2007

　More details

researchmap
Family adaptation of Factorial HMMs for personal robots

日本音響学会 2006年春季講演 2006

　More details

researchmap
Robust scene recognition for baseball broadcast

International Symposium on Large-Scale Knowledge Resources(LKR2006) 2006

　More details

Presentation type：Poster presentation

researchmap
FHMM for Robust Speech Recognition in Home Environment

International Symposium on Large-Scale Knowledge Resources(LKR2006) 2006

　More details

Presentation type：Poster presentation

researchmap
十分統計量を用いた教師なし話者適応における話者選択法

電子情報通信学会技術研究報告 2007

　More details

researchmap
音声認識における確率モデルの重み係数の自動推定

電子情報通信学会技術研究報告 2007

　More details

researchmap
数値列化したイベントシーンの学習と試合進行状況情報による制約条件を用いた野球映像イベント識別

電子情報通信学会技術研究報告 2007

　More details

researchmap
An Interface Using Semi-synchronous Speech and Pen Input

IJARC(Microsoft)-Tokyo Institute of Technology Joint Symposium on "The forefront of the Speech Recognition Research" 2007

　More details

researchmap
TokyoTech's TRECVID2007 Notebook

TRECVID 2007 Workshop 2007

　More details

researchmap
ハイブリッドモデルに基づく単視点ビデオデータにおける人間の歩行動作のトラッキング

電子情報通信学会技術研究報告 2007

　More details

researchmap
Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition

INTERSPEECH 2007 2007

　More details

researchmap
Automatic Estimation of Scaling Factors Among Probabilistic Models in Speech Recognition

INTERSPEECH2007 2007

　More details

researchmap
Predictive Minimum Bayes Risk Classification for Robust Speech Recognition

INTERSPEECH2007 2007

　More details

researchmap
投球の次ショットに重きを置いたシーンのパターン化と離散隠れマルコフモデルを用いた野球放送映像の自動イベント分類

映像情報メディア学会誌 2007

　More details

researchmap
多段SVMを用いた頑健な動画ショット境界検出

画像の認識・理解シンポジウム（MIRU 2007）IS-2-19 2007

　More details

researchmap
A Robust Scene Recognition System for Baseball Broadcast Using Data-Driven Approach

CIVR2007, Amsterdam, The Netherlands 2007

　More details

researchmap
時系列データに対するデータ駆動型アプローチに基づく野球放送の頑健なシーン認識

画像の認識・理解シンポジウム（MIRU 2007）IS-1-17 2007

　More details

researchmap
「野球放送のためのデータ駆動型アプローチを用いた得点シーン検出」

第13回画像センシングシンポジウム予稿集 2007

　More details

researchmap
Speech Recognition Using FHMMs Robust against Nonstationary Noise

ICASSP 2007 2007

　More details

researchmap
Speech Recognition Using FHMMs Robust against Nonstationary Noise

IEEE ICASSP 2007 2007

　More details

researchmap
Semi-Synchronous Speech and Pen Input

ICASSP 2007 2007

　More details

researchmap
スライド資料を用いた講義音声認識のための言語モデル適応

2007年春季講演論文集 2007

　More details

researchmap
Recognition of speech in non-stationary noise using factorial HMMs

2005

　More details

researchmap
Noise discrimination using models with different structures

日本音響学会 2005年春季講演 2005

　More details

researchmap
Model optimization for noise discrimination in home environment

Symposium on Large-Scale Knowledge Resources(LKR2005) 2005

　More details

researchmap
Scene recognition using Hidden Markov Models for video database

Symposium on Large-Scale Knowledge Resources(LKR2005) 2005

　More details

researchmap
Noise discrimination using models with different structures

2005

　More details

researchmap
弁別素性のグラフィカルモデリングによる音声認識

音声音響学会2005年春季研究発表会 2005

　More details

researchmap
Model optimization for noise discrimination in home environment

Symposium on Large-Scale Knowledge Resources (LKR2005) 2005

　More details

researchmap
Robust Highlight Extraction Using Multi-Stream Hidden Markov Models for Baseball Video

International Conference on Image Processing 2005(ICIP 2005) 2005

　More details

researchmap
Recognition of speech in non-stationary noise using Factorial HMMs

日本音響学会 2005年秋季講演 2005

　More details

researchmap
Robust highlight extraction using multi-stream Hidden Markov Models for baseball video

International Conference on Image Processing 2005 (ICIP2005) 2005

　More details

researchmap
動画像インデクシングのためのシーン時系列の確率的言語モデル

第12回画像センシングシンポジウム 2006

　More details

Presentation type：Poster presentation

researchmap
ビデオ画像における人間の歩行動作の3次元トラッキング

電子情報通信学会パターン認識・メディア理解研究会 2006

　More details

researchmap
Towards optimal bayes decision for speech recognition

ICASSP2006 2006

　More details

researchmap
講義音声認識における講義スライド情報の利用

電子情報通信学会 2006

　More details

researchmap
Multimedia Information Retrieval Using Pattern Recognition Techniques

IJARC 2nd Symposium 2006

　More details

researchmap
Robust Scene Recognition Using Language Models

MIR 2006, ACM Workshop 2006 2006

　More details

researchmap
音声とペンの準同期入力に対するマルチモーダル認識

日本音響学会 2006年秋季講演 2006

　More details

researchmap
音声とペン入力の同時入力に対する認識方式の検討

電子情報通信学会音声研究会 2006

　More details

researchmap
基本周波数情報を用いたダイナミックベイジアンネットワークによる音声認識

電子情報通信学会音声研究会 2006

　More details

researchmap
基本周波数情報のグラフィカルモデリングによる音声認識

日本音響学会 2006年春季講演 2006

　More details

researchmap
野球中継番組を対象とした音響情報を用いたシーン認識

日本音響学会2006年春季講演論文集 2006

　More details

researchmap
Speaker adaptation for demi-syllable based speech recognition using continuous HMM

ICSLP-90 1990

　More details

researchmap
Speaker Adaptation for Demi-Syllable-Based Continuous-Density HMM

ICASSP-91 1991

　More details

researchmap
Speech recognition using tree-structured probability density function

ICSLP-94 1994

　More details

researchmap
Unsupervised speaker adaptation for speech recognition using demi-syllable HMM

ICSLP-94 1994

　More details

researchmap
High speed speech recognition using tree-structured probability density function

ICASSP-95 1995

　More details

researchmap
Speaker adaptation with autonomous control using tree structure

EuroSpeech-95 1995

　More details

researchmap
Speaker adaptation with autonomous model complexity control by MDL principle

ICASSP-96 1996

　More details

researchmap
Unsupervised and incremental speaker adaptation under adverse environmental conditions

ICSLP-96 1996

　More details

researchmap
Acoustic modeling based on the MDL criterion for speech recognition

EuroSpeech-97 1997

　More details

researchmap
Structural MAP speaker adaptation using hierarchical priors

IEEE Workshop on Speech Recognition and Understanding 1997

　More details

researchmap
Unsupervised adaptation using structural Bayes approach

ICASSP-98 1998

　More details

researchmap
Rapid Vocal Tract Length Normalization using Maximum Likelihood Estimation

EuroSpeech2001 2001

　More details

researchmap
Analytic Methods for Acoustic Model Adaptation: A Review

ISCA ITR-Workshop2001 2001

　More details

researchmap
Robust Acoustic Modeling for Speech Recognition

電子情報通信学会音声研究会 2004

　More details

researchmap
隠れマルコフモデルを用いた野球放送の自動的インデクシング

電子情報通信学会技術研究報告 2004

　More details

researchmap
隠れマルコフモデルを用いた野球放送の自動インデキシング

電子情報通信学会パターン認識・メディア理解研究会 2004

　More details

researchmap
A study of noise discrimination for personal robots

日本音響学会 2004年秋季講演 2004

　More details

researchmap
A study of noise discrimination for personal robots

2004

　More details

researchmap
手書き文字の準同期入力を併用した音声認識手法の予備検討

電子情報通信学会 2004年総合大会 2004

　More details

researchmap
動的特徴量を用いたHMMによる連続動作認識

電子情報通信学会 2004年総合大会 2004

　More details

researchmap
パーソナルロボット向けの家庭内雑音に頑健な音声認識の検討

日本音響学会 2003年秋季講演 2003

　More details

researchmap
パーソナルロボット向けの家庭内雑音に頑健な音声認識の検討

日本音響学会 2003年秋季研究発表会 2003

　More details

researchmap
Robust Acoustic Modeling for Speech Recognition

2004

　More details

researchmap
音声と手書き文字の同時入力によるインタフェースの検討

日本音響学会 2005年秋季講演 2005

　More details

researchmap
音声と手書き文字の同時入力によるインターフェースの検討

日本音響学会2005年秋季研究発表会 2005

　More details

researchmap
音声と手書き文字の同時入力インターフェース

情報処理学会音声言語情報処理研究会 2005

　More details

researchmap
弁別素性のグラフィカルモデリングによる音声認識

日本音響学会 2005年春季講演 2005

　More details

researchmap
隠れマルコフモデルを用いた野球放送の自動的インデクシング

画像の認識・理解シンポジウム(MIRU2005) 2005

　More details

researchmap
隠れマルコフモデルを用いた野球放送の自動的インデクシング

画像の認識・理解シンポジウム（MIRU2005） 2005

　More details

researchmap
隠れマルコフモデルとMLLRによるゲーム適応を用いた野球放送の自動的インデクシング

第11回画像センシングシンポジウム講演論文集 2005

　More details

researchmap
隠れマルコフモデルとMLLRによるゲーム適応を用いた野球放送の自動インデクシング

第11回画像センシングシンポジウム 2005

　More details

researchmap
Robust Scene Recognition Using Scene Context Information for Video Contents

International Symposium on Large-Scale Knowledge Resources(LKR2007) 2007

　More details

researchmap
Comparative Study on Robust Speech Recognition against Nonstationary Noise in the Home Environment

Symposium on Large-Scale Knowledge Resources(LKR2007) 2007

　More details

researchmap
Presentation Scene Retrieval Exploiting Features in Videos Including Pointing and Speech Information

Symposium on Large-Scale Knowledge Resources(LKR2007) 2007

　More details

researchmap
Tokyo Tech at TRECVID 2008

TRECVID Workshop (TRECVID 2008) 2008

　More details

researchmap
スペクトルサブトラクションとハフ変換による基本周波数情報を用いた耐雑音音声認識

日本音響学会秋季研究発表会 2008

　More details

researchmap
連続音素認識を用いた単語認識誤りに頑健な講演音声検索

日本音響学会春季研究発表会 2008

　More details

researchmap
パラメータ空間のクラスタ化による固有声話者適応化の改良

日本音響学会春季研究発表会 2008

　More details

researchmap
音声とペンの同時入力における個人差への適応化

日本音響学会春季研究発表会 2008

　More details

researchmap
木構造クラスタリングを用いた動画像からの高次特徴抽出

電子情報通信学会技術研究報告no.491 2008

　More details

researchmap
木構造クラスタリングを用いた動画像からの高次特徴抽出

電子情報通信学会技術研究報告 2008

　More details

researchmap
Estimation of skylight conditions based on leaf-scale wheat images International conference

Kuniaki Uto, Mauro Dalla Mura, Jocelyn Chanussot, Koichi Shinoda

Images et data : méthodes d'analyse et modélisation pour l'agriculture numérique 2019.3

　More details

Language：English Presentation type：Oral presentation (general)

Venue：Ivry-sur-Seine

researchmap
Structural MAP for LR & HMMs International conference

Koichi Shinoda

IEEE ASRU 2023 2023.12

　More details

Language：English Presentation type：Oral presentation (invited, special)

Venue：Taipei

researchmap

Other Link： https://bayesian40.github.io/
突発天体観測用天文台全球リレーのための気象モニターの開発

谷津陽一, 吉井健敏, 針田聖平, 村木雄太郎, 河合誠之, 佐久間惇一, HyunJin Jung, 井上中順, 篠田浩一, 下川辺隆史, 太田佳

天文学会 2016.9

　More details

Language：Japanese Presentation type：Oral presentation (general)

Venue：愛媛

researchmap
Deep Learningを応用した全天画像からの気象識別

谷津陽一, 白石一輝, 吉井健敏, 河合誠之, 佐久間淳一, 井上中順, 篠田浩一, 下川辺隆史

天文学におけるデータ科学的方法 2017.5

　More details

Language：Japanese Presentation type：Oral presentation (general)

Venue：立川

researchmap
Video Information Retrieval International conference

Koichi Shinoda

The 2017 IEEE SPS Summer School on Visual Image Search and Visual Analytics (VISVA2017) 2017.7

　More details

Language：English Presentation type：Oral presentation (invited, special)

researchmap

Industrial property rights

情報処理装置、情報処理方法およびプログラム

佐藤育郎, CERVANTESBAQUE Pablo Alberto, 篠田浩一, 関川雄介

　More details

Applicant：国立大学法人東京工業大学, 株式会社デンソーアイティーラボラトリ

Application no：特願2021-082514 Date applied：2021.5

Announcement no：特開2022-175810 Date announced：2022.11

researchmap
音声認識誤り単語検出装置、方法及びプログラム

篠田浩一, 浅見太一

　More details

Applicant：国立大学法人東京工業大学, 日本電信電話株式会社

Application no：特願2015-135868 Date applied：2015.7

Announcement no：特開2017-021062 Date announced：2017.1

researchmap

Works

Study of speech recognition

2002

　More details

Work type：Artistic work

researchmap
音声認識の研究

2002

　More details

Work type：Artistic work

researchmap

Awards

情報・システムソサイエティ活動功労賞

2011.6 社団法人電子情報通信学会

　More details

researchmap
電子情報通信学会論文賞

1998

　More details

Country：Japan

researchmap
Excellent Paper Award from the Institute of Electronics, Information, and Communication Engineers

1998 Institute of Electronics, Information, and Communication Engineers

　More details

researchmap
日本音響学会粟谷学術奨励賞

1997

　More details

Country：Japan

researchmap
the Awaya Prize from the Acoustical Society of Japan

1997 the Acoustical Society of Japan

　More details

researchmap
日本音響学会技術開発賞

1995 日本音響学会

　More details

Country：Japan

researchmap

Research Projects

深層生成モデルを活用した構成的なパターン認識・理解

Grant number：23H00490 2023.4 - 2026.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

篠田浩一, 井上中順, 横田理央, 川上玲, 佐藤育郎

　 More details

Grant amount：¥47,190,000 （ Direct Cost: ¥36,300,000 、 Indirect Cost：¥10,890,000 ）

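This project treats each target of classification (an instance) as a set (bundle) of attributes and decomposes its features, in feature space, attribute by attribute. By optimizing these attribute features in the process of resynthesizing the instance from them, we aim to realize a classification method that identifies each attribute with high accuracy and is robust to outliers. To this end, we develop learning methods based on deep generative models and dense attribute annotation. Whereas much previous work has assumed a flat semantic structure in which a target and its attribute correspond one to one, this research makes it possible to identify multiple attributes simultaneously in targets where many attributes are intricately intertwined. The emergence of new attributes and classes is also within scope. More specifically, we establish a methodology for compositional pattern recognition and understanding through a deep-learning-based "classification by synthesis" approach. We evaluate it across three tasks (human action recognition, speaker and emotion recognition, and multimodal recognition), aiming for higher classification performance than previous methods. In this first year, we built evaluation databases and developed baseline methods for each of the three tasks. In parallel, on relatively small-scale tasks, we developed methods that perform classification using generative models such as diffusion models. We also developed methods for training generative models efficiently using techniques such as neural architecture search. In particular, we built a basic method and a database for multimodal recognition of sensor and video data, and developed basic methods for human gait recognition and for multimodal emotion recognition.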

researchmap
Decoding of Imagined Speech from Minimally Invasive EEG for Intentional BMI

Grant number：23H00548 2023.4 - 2026.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

　 More details

Grant amount：¥46,930,000 （ Direct Cost: ¥36,100,000 、 Indirect Cost：¥10,830,000 ）

researchmap
知識限界を明確化する機能分化された深層学習

Grant number：22H03642 2022.4 - 2025.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

佐藤育郎, 川上玲, 井上中順, 篠田浩一

　 More details

Grant amount：¥17,420,000 （ Direct Cost: ¥13,400,000 、 Indirect Cost：¥4,020,000 ）

researchmap
知識限界を明確化する機能分化された深層学習

Grant number：23K24898 2022.4 - 2025.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

佐藤育郎, 篠田浩一, 井上中順, 川上玲

　 More details

Grant amount：¥17,420,000 （ Direct Cost: ¥13,400,000 、 Indirect Cost：¥4,020,000 ）

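For the synaptic memory theme, we built a theoretical framework that can resolve the problems of conventional methods. A recurrent modern Hopfield network can associate an input query with the memory data held inside the model, but it makes incorrect associations when the query corresponds to out-of-distribution data. To address this, we formulated a method that resolves the problem in principle by adding a function that judges whether a query is in or out of distribution. A paper is currently being written.
For the generative prediction of human behavior theme, we developed an algorithm that can use data based on different human body models in a unified way for training, and submitted a paper to the international conference ECCV (under review). We confirmed that this method can learn from datasets whose body-model definitions differ and can generate more natural behavior.
For the 3D understanding under viewpoint change theme, we made a novel proposal, cooperative inference between a generator and a regressor, and confirmed a performance improvement over conventional methods that use only a regressor. We submitted a paper to the international conference ICIP (under review). We were able to show that a group of functionally differentiated models (multiple networks optimized with different objective functions) working cooperatively can improve downstream task performance.
For the temporal consistency judgment theme, we confirmed the problem that the features of existing recognizers for autonomous driving contain components that break consistency.
For the target propagation theme, as an improvement on conventional methods, we proposed a method that enforces consistency between the Jacobians of the forward and backward networks (Y. Bao et al., AAAI 2024).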

researchmap
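Construction of a Survey Robot for Transient Detection Using Machine Learning

Grant number：20K04011 2020.4 - 2023.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

村田勝寛, 谷津陽一, 篠田浩一, 井上中順, 下川辺隆史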

Grant amount：\4290000 （ Direct Cost: \3300000 、 Indirect Cost：\990000 ）

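The results of this fiscal year fall broadly into the following two areas.
(a) Installation of the wide-field telescope
In the first half of this fiscal year, we evaluated the performance of two CMOS cameras for amateur astronomy that we had been considering for the wide-field telescope. Laboratory tests and test observations on the university rooftop confirmed that one of them has the performance required for astronomical research observations. Based on this, in October we installed a 20 cm aperture wide-field telescope inside the MITSuME telescope dome in Asakuchi, Okayama Prefecture, and began observations. We first carried out test observations with the SDSS g-band filter intended for gravitational-wave follow-up observations, and confirmed that a limiting magnitude of 17.5 mag is reached with a total integration of 1000 seconds. We also developed software for controlling the telescope's equatorial mount and the CMOS camera imaging, and introduced a function for carrying out automatic observations based on a target list prepared in advance. Because the gravitational-wave telescopes were not observing this fiscal year, we conducted observations mainly of objects within our Galaxy.
(b) Development of the transient detection system
The transient detection system consists of primary processing of observed images and a deep-learning-based transient classification system. By porting the pipeline of the MITSuME optical telescope that we operate, we automated image processing and photometry for the wide-field telescope installed in Okayama this fiscal year. For the transient classification system, we attempted to improve the deep-learning-based classification method we have been developing in order to raise classification accuracy. We also set up a server and developed a set of scripts in preparation for actual operation.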

researchmap
Implementation of Intentional BMI through Large-Scale EEG Data and Calibration-Free Model Construction

Grant number：20H00235 2020.4 - 2023.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

Grant amount：\46020000 （ Direct Cost: \35400000 、 Indirect Cost：\10620000 ）

researchmap
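Research on Improving the Robustness of Speaker Verification against Spoofing Attacks by Voice Imitation

Grant number：19K12051 2019.4 - 2023.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

岩野公司, 篠田浩一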

Grant amount：\4030000 （ Direct Cost: \3100000 、 Indirect Cost：\930000 ）

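This research examines countermeasures against "spoofing attacks by voice imitation" toward the practical use of voice-based personal authentication (speaker verification). Previous research has suggested that the characteristics of imitated voices and the reasons spoofing succeeds differ depending on the imitator's skill, so we seek to elucidate the mechanism and, based on those findings, to propose effective countermeasures against voice imitation attacks.
In fiscal year 2021, we newly built and introduced a speaker verification system based on deep learning, and investigated and analyzed how well this system can defend against voice imitation attacks by highly skilled imitators. As a result, even with this deep-learning-based speaker verification system, voice imitation by highly skilled imitators showed a large increase in the spoofing success rate compared with imitation by low-skilled imitators, confirming its high attack strength. This shows that simply introducing deep learning is insufficient as a countermeasure against spoofing attacks by voice imitation.
One possible countermeasure is to collect imitated voices from highly skilled imitators, use the data in training as utterances not produced by the (imitated) target speaker, and thereby improve the discrimination performance of the speaker verification system. However, it is not realistic to ask highly skilled imitators (such as professional impersonators) to provide large amounts of imitated speech. We therefore consider artificially generating speech equivalent to voice imitation using voice conversion technology, which has become increasingly capable in recent years, and using it for training. In fiscal year 2021, we created spoofing speech using two types of voice conversion technology and investigated its attack strength against the system in order to examine whether its characteristics are similar to those of imitated speech by highly skilled imitators. As a result, we confirmed that one of the two voice converters can generate speech with characteristics similar to those of voice imitation attacks by highly skilled imitators.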

researchmap
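Speech Factorization by Multi-Agent Deep Learning

Grant number：19H04133 2019.4 - 2022.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

篠田浩一, 井上中順, 岩野公司, 宇都有昭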

Grant amount：\17420000 （ Direct Cost: \13400000 、 Indirect Cost：\4020000 ）

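In the previous fiscal year, a certain degree of performance improvement was achieved for "(A) separation of phonetic information and noise" and "(B) separation of phonetic information and speaker characteristics", so this fiscal year we used these results to begin work on the application "(D) source separation". More specifically, building on the waveform-based speech signal processing system we have been developing, we first constructed a source separation framework that separates the speech of multiple speakers. Then, based on the results of "(A) separation of phonetic information and noise", we added a mechanism for removing noise from the sources to this system, yielding a speech separation system that is robust to noise. We built a dataset by superimposing noise on an existing database and used it for evaluation, obtaining significantly higher performance than conventional methods. This work was accepted and presented at IEEE APSIPA 2021, an international conference on signal processing. Furthermore, within "(E) language recognition and emotion recognition", we first started on emotion recognition. Here, based on the results of "(B) separation of phonetic information and speaker characteristics", we first construct a disentanglement framework that separates phonetic information and speaker characteristics from speech, and then develop a system that recognizes emotion using the remaining components as input. This fiscal year, we built a baseline using an existing database. For disentanglement, we adopted and implemented a method that uses autoencoder-based voice conversion. We plan to carry out the evaluation in the final fiscal year, using the IEMOCAP database, which is widely used in the field of emotion recognition.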

researchmap
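Fast and Resource-Efficient Deep Learning Algorithm Platform for Social Infrastructure Video Processing

Grant number：19189017 2019 - 2021

Japan Science and Technology Agency Promotion of Strategic Research and Development / Strategic Basic Research Programs / CREST

篠田浩一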

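We will build an algorithmic platform that performs high-performance, real-time deep learning and analysis for detecting objects and anomalies in large volumes of high-definition video from dashboard cameras and surveillance cameras. Researchers working at various layers from system architecture to applications, covering intra-node processing, inter-node parallel processing, learning algorithms, FPGA implementation, social implementation, and information theory, will collaborate closely.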

researchmap
Development of high-precision high-speed transient detection method in wide-field survey utilizing GPU and Deep Learning

Grant number：16K13783 2016.4 - 2019.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

YATSU YOICHI, KAWAI NOBUYUKI

Grant amount：\3770000 （ Direct Cost: \2900000 、 Indirect Cost：\870000 ）

Generic technologies for automating optical/IR telescopes for the upcoming era of gravitational-wave astronomy were developed. Currently, robotic telescopes still require human help for assigning observation schedules and for the final confirmation of transient detections. To reduce the delay of follow-up observations, we developed a weather recognition algorithm and a transient detection algorithm using machine learning and image recognition technologies. In addition, we sped up the data reduction pipeline by a factor of 30 by utilizing GPU parallel computing technology.

researchmap
Multimodal time-sequence data recognition platform based on deep learning

Grant number：16H02845 2016.4 - 2019.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Shinoda Koichi

Grant amount：\15990000 （ Direct Cost: \12300000 、 Indirect Cost：\3690000 ）

This research aims to accurately recognize multi-modal time-sequence signals using deep learning. We applied various deep learning techniques such as end-to-end training, deep networks that are trainable with a small amount of data, multi-task learning, and noise-robust recognition. In particular, we improved recognition and detection performance in simultaneous training for source separation and speech recognition, dementia detection from speech, multi-modal speech recognition using lip reading, and noise-robust speech recognition.

researchmap
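Fast and Resource-Efficient Deep Learning Algorithm Platform for Social Infrastructure Video Processing

Grant number：16817271 2016 - 2018

Japan Science and Technology Agency Promotion of Strategic Research and Development / Strategic Basic Research Programs / CREST

篠田浩一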

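We will build an algorithmic platform that performs high-performance, real-time deep learning and analysis for detecting objects and anomalies in large volumes of high-definition video from dashboard cameras and surveillance cameras. Researchers from four different layers of high-performance computing, spanning hardware to applications (intra-node processing, inter-node parallel processing, learning processing, and network processing), will collaborate closely, aiming for processing that is 1000 times faster with one-thousandth of the memory compared with conventional approaches.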

researchmap
Pattern recognition using graph signal processing for large-scale time-sequence data

Grant number：15K12061 2015.4 - 2018.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

Koichi Shinoda

Grant amount：\3510000 （ Direct Cost: \2700000 、 Indirect Cost：\810000 ）

We have developed an action recognition method for RGB-D camera inputs. This method uses a time sequence of human skeletons as input. For every frame, it extracts features using the spectral graph wavelet transform. The features are then pooled hierarchically along the time axis. This method achieved state-of-the-art performance in multi-view action recognition.
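
As an illustration of the hierarchical pooling step only, the following minimal sketch pools per-frame feature vectors over the whole sequence, then over halves, then over quarters, and concatenates the results. The function name `temporal_pyramid_pool`, the use of mean pooling, and the binary split are assumptions made for this example; the abstract does not specify the pooling operator, and the spectral graph wavelet features themselves are not computed here.

```python
def temporal_pyramid_pool(frames, levels=3):
    """Mean-pool per-frame features over 1, 2, 4, ... equal time segments.

    frames: list of equal-length feature lists, one per video frame
            (hypothetical stand-ins for per-frame skeleton features).
    Returns one flat list: the pooled vectors of all segments, coarse to fine.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    pooled = []
    for level in range(levels):
        n_seg = 2 ** level
        for k in range(n_seg):
            start = k * len(frames) // n_seg
            end = (k + 1) * len(frames) // n_seg
            # Very short sequences can yield an empty segment;
            # fall back to the nearest single frame in that case.
            seg = frames[start:end] or [frames[min(start, len(frames) - 1)]]
            for i in range(dim):
                pooled.append(sum(f[i] for f in seg) / len(seg))
    return pooled


frames = [[1.0, 0.0], [2.0, 0.0], [3.0, 4.0], [4.0, 4.0]]
print(temporal_pyramid_pool(frames, levels=2))
# → [2.5, 2.0, 1.5, 0.0, 3.5, 4.0]
```

The output length is fixed by the feature dimension and the number of levels, so sequences of different lengths map to vectors of the same size.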

researchmap
Research on robust speaker verification against spoofing attacks by voice imitation

Grant number：25330206 2013.4 - 2017.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Iwano Koji

Grant amount：\4420000 （ Direct Cost: \3400000 、 Indirect Cost：\1020000 ）

In this research, we experimentally analyzed the effect of spoofing attacks by voice imitation on speaker verification systems and the acoustical characteristics of the imitated voice. These analyses were conducted using our original speech data, which consist of voices imitated by professional and non-professional impersonators. The results show that voice imitation by non-professional impersonators significantly increases the success rate of spoofing attacks, and that imitation by a professional impersonator yields an even higher success rate. We also proposed a method for quantitatively evaluating the quality of voice imitation. The method reveals that the professional imitator efficiently brings his voice characteristics closer to those of the target speaker.

researchmap
Microphone Array Signal Processing with Asynchronous Recording Devices

Grant number：25280069 2013.4 - 2016.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

ONO Nobutaka, MAKINO Shoji, MIYABE Shigeki, SHINODA Koichi

Grant amount：\17030000 （ Direct Cost: \13100000 、 Indirect Cost：\3930000 ）

Microphone array signal processing is an important technique for estimating the direction of arrival of a sound or for enhancing a target sound in a noisy environment by processing multi-channel signals. In microphone array signal processing, the tiny time difference between channels is important information. In the conventional framework, multi-channel signals therefore have to be recorded in a synchronized way. In this study, by contrast, we have developed a technique to synchronize recorded signals and to estimate microphone positions without any a priori knowledge, so that asynchronous individual recording devices such as smartphones, laptop PCs, and IC recorders can be used.
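
To make concrete why the time difference between channels can be recovered from the signals themselves, here is a minimal sketch that estimates the offset between two recordings by brute-force cross-correlation. It is a textbook illustration, not the technique developed in this project: the function name `estimate_lag` and the toy signals are assumptions, and the sketch ignores sampling-rate mismatch between devices and microphone position estimation.

```python
def estimate_lag(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive result means sig is delayed relative to ref.
    The search is a brute-force cross-correlation over [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the second device started recording 3 samples late.
ref = [0.0, 0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0]
delay = 3
sig = [0.0] * delay + ref[:-delay]
print(estimate_lag(ref, sig, max_lag=5))  # → 3
```

Once the lag is known, one recording can be shifted by that many samples before the channels are combined.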

researchmap
Speech information processing using deep generative models and their factorization

Grant number：25280058 2013.4 - 2016.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Shinoda Koichi, IWANO Koji, SHINOZAKI Takahiro

Grant amount：\16900000 （ Direct Cost: \13000000 、 Indirect Cost：\3900000 ）

In speech recognition, it is important to train an accurate deep neural network (DNN) acoustic model from a large amount of speech data from many speakers. In this study, we developed a framework to improve the accuracy of the DNN acoustic model by factorizing speech data into phoneme and speaker elements. First, we developed a speaker recognition method using a deep Siamese network, in which two DNNs share part of the network. Second, we applied a DNN with a hierarchical phonetic structure to speaker adaptation. Third, we developed a speaker-adaptive training method that uses a student-teacher learning framework with soft targets. We improved speaker verification and speech recognition performance. We also studied DNN implementation and DNN structure design.

researchmap
Spoken Language Proceeding Based on Non-Extensive Information Theory

Grant number：24650079 2012.4 - 2015.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

SHINODA KOICHI

Grant amount：\3900000 （ Direct Cost: \3000000 、 Indirect Cost：\900000 ）

We have developed a methodology for spoken language processing based on non-extensive statistical theory, which is an extension of the conventional extensive statistical theory. We first developed q-log spectral subtraction (q-LMSN) to achieve robustness against differences in environmental noise and in channels. We showed that it was significantly better than the conventional CMN. Next, we developed a recognition method using q-Gaussian mixtures for output probabilities in GMMs and in HMMs. We applied it to speech recognition and to video semantic indexing and demonstrated its effectiveness.
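
For readers unfamiliar with non-extensive (Tsallis) statistics, the sketch below gives the standard q-logarithm and q-exponential, which reduce to the ordinary logarithm and exponential as q approaches 1, together with an unnormalized q-Gaussian kernel. These are the textbook definitions only; how the project built q-LMSN and q-Gaussian mixtures on top of them is not described in the abstract, and the function names and the `beta` parameter are assumptions for illustration.

```python
import math


def q_log(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q); equals ln(x) in the limit q -> 1."""
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_exp(x, q):
    """q-exponential, the inverse of q_log: [1 + (1-q)x]_+ ** (1/(1-q))."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Outside the support: cut off to 0 for q < 1, diverges for q > 1.
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def q_gaussian_unnormalized(x, q, beta=0.5):
    """Unnormalized q-Gaussian kernel exp_q(-beta * x^2)."""
    return q_exp(-beta * x * x, q)


# For q > 1 the q-Gaussian has heavier tails than the ordinary Gaussian.
print(q_gaussian_unnormalized(3.0, q=1.5) > math.exp(-0.5 * 9.0))  # → True
```

The heavier tails for q > 1 are one common motivation for using q-Gaussians where outliers are expected.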

researchmap
Macromolecular Potential Energy Decoder Based on Graphical Model

Grant number：23650068 2011 - 2013

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

SHINOZAKI Takahiro, SHINODA Koichi, SEKIJIMA Masakazu

Grant amount：\3250000 （ Direct Cost: \2500000 、 Indirect Cost：\750000 ）

Knowing the tertiary structure is important for understanding and predicting protein function. However, how to predict the tertiary structure of a protein from a sequence of amino acids remains an open question. In this project, the Slice Chain Max-Sum (SCMS) algorithm has been proposed. This method represents the potential function of a protein molecule as a factor graph, which is a kind of graphical model. The factor graph is converted into a linearly structured one according to a slicing of the molecule in 3D space. Based on the converted graph, max-sum search is performed in combination with node-wise local MCMC sampling that approximates continuous variables by discrete ones. Experimental results show that SCMS is more efficient than the conventional MCMC method. It is also shown that an improved version of SCMS (i.e. SCMS2.0) outperforms an MCMC method that is reinforced by the quasi-Newton method.

researchmap
Advancement of speech recognition technology using WFST

Grant number：21300062 2009 - 2011

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

FURUI Sadaoki, SHINODA Koichi, SHINOZAKI Takahiro

Grant amount：\18070000 （ Direct Cost: \13900000 、 Indirect Cost：\4170000 ）

With the aim of improving the performance of automatic speech recognition using the Weighted Finite State Transducer (WFST)-based decoder and developing new applications of the decoder, a wide range of research has been conducted and various achievements have been obtained. A speech recognition decoder with the world's highest performance, the "T^3 decoder", has been developed by improving the on-the-fly algorithm for the WFST decoder. Recognition performance in noisy environments has been improved by incorporating speech/non-speech information into the decoder. Various new techniques have been developed to apply the decoder to the recognition of resource-deficient languages and code-switching speech, and to transliteration. Innovative ideas have been proposed toward new directions of the decoder technology. The T^3 decoder has been released to domestic as well as overseas research laboratories.

researchmap
A study of multimodal recognition for human communication search

Grant number：20300063 2008 - 2010

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

SHINODA Koichi, FURUI Sadaoki

Grant amount：\17940000 （ Direct Cost: \13800000 、 Indirect Cost：\4140000 ）

We developed multimodal pattern recognition techniques for human communication using speech and video. We proposed a statistical technique using Gaussian mixture models and support vector machines for event extraction. We participated in the TRECVID 2010 workshop, where our method ranked fourth among 40 participants from all over the world. We also developed new methods for active learning for speech modeling and adaptation, noise-robust speech recognition, signal processing for meeting speech recognition, multimodal pattern recognition, speaker/gesture recognition, speech style analysis, and video summarization.

researchmap
Systemization of audio-visual knowledge resources using graphical models

Grant number：17300059 2005 - 2007

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

SHINODA Koichi, FURUI Sadaoki

Grant amount：\14780000 （ Direct Cost: \13700000 、 Indirect Cost：\1080000 ）

Recent advances in computer technology, particularly in storage technology, have resulted in significant increases in the number and quality of audio-visual knowledge resources. Most of those resources are not equipped with index information, and thus it has become difficult for ordinary people to browse the entire content of each database. Techniques for systemizing audio-visual knowledge resources and utilizing them have been strongly demanded. However, statistical pattern recognition techniques have not yet achieved sufficient performance for this purpose. In addition, it is not always clear what kinds of indexing are useful. In this study, we take the approach of indexing those databases in different ways in an unsupervised manner and extracting dependencies among the resulting labels. First, we carried out scene recognition for baseball video. We constructed an annotated database of 43 Major League Baseball games with NHK Science & Technical Research Labs and used it for our evaluation. We used various relationships between scene labels, such as scene contexts, and unified audio and visual information. We achieved 60% accuracy in recognizing 16 scene types and a 90% recall rate for score scene detection. Our techniques are expected to contribute greatly to building automatic highlight extraction systems for broadcast companies. Second, we participated in the TRECVID workshop organized by NIST, USA, to study the high-level feature extraction task. We constructed tree-structured dictionaries of "visual words" by unsupervised clustering of video features, and selected a tree-cut as a dictionary for each word. Using a bag-of-words approach, we constructed an extraction system that is robust to differences in the amount of data for each feature. We also extracted effective "motion words" for dynamic features. Our method achieved significant improvements in the task of extracting 39 features.
Other research topics include robust speech recognition using graphical models, a multi-modal interface for asynchronous multi-modal inputs, and human-gait modeling.

researchmap
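Research on Spontaneous Speech Recognition Using Graphical Modeling

Grant number：15650028 2003 - 2005

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Exploratory Research

篠田浩一, 古井貞煕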

Grant amount：\2900000 （ Direct Cost: \2900000 ）

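We are conducting research aimed at extracting the intrinsic structure of speech using graphical modeling techniques. The goal is to handle the various phenomena of speech by using models with more degrees of freedom than models based on uniform recognition units, as typified by the conventional HMM. Among graphical models, we are studying the application to speech of dynamic Bayesian networks, which are suited to time-series data. The modeling target is spontaneous speech, and the goal is to improve its recognition performance. Unlike read speech, spontaneous speech generally has a high speaking rate and strong coarticulation (sloppy articulation). As a result, speech recognition with conventional hidden Markov models (HMMs) achieves a recognition performance of 70%, which is inferior to that for read speech. Improving this is the challenge.
In the first year, we characterized the features of spontaneous speech and prepared graphical modeling tools. In the second year, we modeled coarticulation using a graphical modeling tool and evaluated its performance. Focusing on distinctive features, which mainly represent the movements of the articulators, we used graphical modeling to represent each phoneme as a bundle of distinctive features, attempted to improve recognition performance with this model, and obtained a slight improvement. This fiscal year, we additionally sought to improve recognition performance using fundamental frequency (pitch) information. We quantize the fundamental frequency information and also take into account correlations between different frames, thereby building a model that reflects the voiced/unvoiced distinction and intonation information. For the evaluation experiments, we used the Corpus of Spontaneous Japanese and evaluated performance on spontaneous speech. The experiments showed that the proposed method achieves higher recognition performance with fewer model parameters than the conventional method, confirming its effectiveness.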

researchmap
Pattern Recognition Methods and Their Real-World Applications

2003 - 2005

Grant type：Competitive

researchmap
Pattern recognition and its applications

2003 - 2005

Grant type：Competitive

researchmap
SPEECH RECOGNITION WITH SYNCHRONOUS INPUT OF HAND-WRITTEN GESTURES FOR MOBILE DEVICES

Grant number：15300054 2003 - 2004

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

SHINODA Koichi, FURUI Sadaoki

Grant amount：\5900000 （ Direct Cost: \5900000 ）

Mobile devices have recently come into frequent use in daily life, and a user-friendly interface with high accuracy has been strongly demanded. For this purpose, we propose an interface using simultaneous inputs of speech and hand-written gestures. This interface is more robust against environmental noise than a speech-only interface, and its input speed is faster than that of an interface with only hand-written gestures. Our target application is composing e-mail by inputting sentences.
In the first year, we proposed an interface in which a sentence is input by speech while the "hiragana" character at the head of each phrase in the sentence is input by hand-written gestures. We implemented a recognition algorithm for hand-written gestures and designed a method for recognizing the simultaneous inputs of the two modes. The proposed method was evaluated by simulation experiments using speech data and hand-written gesture data that were recorded independently, and was shown to be effective.
In the second year, we constructed a recording system for the input of the two modes and recorded 530 sentences from ten subjects. For integrating the two modes, we employed a two-pass process in which a word graph generated by speech recognition in the first pass is utilized for the integration of the two modes in the second pass. The proposed method improved the recognition accuracy by 2.6 points over the method using speech recognition only.
For future work, a method for optimizing the weights between the two modes should be developed. We are going to develop a demonstration system that works in real time and evaluate it in noisy environments.

researchmap
Music Information Processing Using Continuous Speech Recognition Methods

Grant number：14380156 2002 - 2004

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

SAGAYAMA Shigeki, SHINODA Koichi, TABARU Tetsuya, NISHIMOTO Takuya

Grant amount：\16900000 （ Direct Cost: \16900000 ）

We formulated music rhythm recognition for transcribing MIDI data into a music score as a Viterbi path search problem in an HMM where hidden states and output probabilities represent the intended note values and the actually played note lengths, respectively. We also solved rhythm recognition of polyphonic music by reducing polyphony into monophony. Tempo modeling and tempo change detection were enabled with the segmental k-means algorithm used in speech recognition.
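
The Viterbi formulation of rhythm recognition can be sketched as follows. Hidden states are intended note values (in beats) and observations are played durations (in seconds). The fixed tempo, the Gaussian emission model with a relative standard deviation, the uniform transition matrix, and all names (`NOTE_VALUES`, `viterbi_rhythm`, `rel_std`) are assumptions for this toy example, not details taken from the project, which also modeled tempo and tempo changes.

```python
import math

NOTE_VALUES = [0.25, 0.5, 1.0, 2.0]  # hidden states: note values in beats


def log_gauss(x, mean, std):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def viterbi_rhythm(durations, beat_sec, log_trans, rel_std=0.15):
    """Most likely sequence of note values for the played durations.

    durations: played note lengths in seconds (non-empty list).
    beat_sec:  assumed fixed duration of one beat in seconds.
    log_trans: log_trans[p][s] = log P(next state s | previous state p).
    """
    n = len(NOTE_VALUES)

    def emit(s, d):
        mean = NOTE_VALUES[s] * beat_sec
        return log_gauss(d, mean, rel_std * mean)

    # Uniform initial distribution over note values.
    score = [emit(s, durations[0]) - math.log(n) for s in range(n)]
    back = []
    for d in durations[1:]:
        new_score, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s] + emit(s, d))
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [NOTE_VALUES[s] for s in path]


uniform = [[math.log(0.25)] * 4 for _ in range(4)]
played = [0.52, 0.27, 0.23, 0.48, 1.05]  # seconds, at 120 bpm (0.5 s per beat)
print(viterbi_rhythm(played, beat_sec=0.5, log_trans=uniform))
# → [1.0, 0.5, 0.5, 1.0, 2.0]
```

With a uniform transition matrix the result equals the per-note best match; a transition matrix estimated from scores would let the rhythmic context correct ambiguous durations.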
Harmonization (chord finding) of given melodies was formulated as a problem isomorphic to continuous speech recognition, by defining the output as the given melody, the hidden states as the chords behind the melody, and the stochastic language model as a model of chord sequences. Automatic counterpoint was developed with a two-step maximum likelihood approach consisting of rhythm design and pitch allocation solved by dynamic programming.
In polyphonic signal analysis, an algorithm named Harmonic-structured Clustering was developed based on the k-means clustering algorithm under a harmonic constraint, by modeling the framewise observed spectrum as overlapped harmonic structures and considering that the distributed energy in a harmonic structure belongs to a single cluster. Furthermore, by introducing probabilistic assignment to clusters, k-means was generalized into the EM algorithm and attained higher performance in multi-pitch estimation. Utilizing an information criterion such as AIC, estimation of the number of sources and of the octave location was also enabled.
"Specmurt analysis" was proposed for polyphonic signal analysis. The inverse Fourier transform of a linear spectrum with log-frequency was called "specmurt". Along the log-scaled frequency axis, the observed linear spectrum is regarded as the convolution of the distribution density of fundamental frequencies and the harmonic structures of multiple tones, which are assumed identical. This idea opened up new signal processing capabilities.

researchmap
