Faculty Profiles - ITOYAMA KATSUTOSHI

ITOYAMA KATSUTOSHI

Organization

School of Engineering, Associate Professor

Degree

Ph.D. (Informatics) (Kyoto University)

Research Interests

Animal vocalization analysis
Robot audition
Music information processing
Statistical acoustic signal processing

Research Areas

Informatics / Intelligent informatics / Statistical acoustic signal processing, music information processing, robot audition, animal vocalization analysis

Education

Kyoto University Graduate School of Informatics Department of Intelligence Science and Technology

2008.4 - 2011.3

Country： Japan

Kyoto University Graduate School of Informatics Department of Intelligence Science and Technology

2006.4 - 2008.3

Country： Japan

Kyoto University Faculty of Engineering School of Informatics & Mathematical Science

2002.4 - 2006.3

Country： Japan

Research History

Tokyo Institute of Technology Department of Systems and Control Engineering, School of Engineering Specially Appointed Associate Professor

2021.4

Country：Japan

Tokyo Institute of Technology Department of Systems and Control Engineering, School of Engineering Specially Appointed Associate Professor (Lecturer)

2018.4 - 2021.3

Country：Japan

Kyoto University Graduate School of Informatics Department of Intelligence Science and Technology Assistant Professor

2011.6 - 2018.3

Country：Japan

Kyoto University Graduate School of Informatics Department of Intelligence Science and Technology Visiting Junior Associate Professor

2011.4 - 2011.5

Country：Japan

Professional Memberships

INFORMATION PROCESSING SOCIETY OF JAPAN

The Robotics Society of Japan

IEEE

ACOUSTICAL SOCIETY OF JAPAN

Committee Memberships

IEICE Society Transactions Editorial Committee, Reviewer

2020.6

Committee type：Academic society

IPSJ Special Interest Group on Music and Computer (SIGMUS), Secretary

2020.4

Committee type：Academic society

IPSJ Special Interest Group on Music and Computer (SIGMUS), Steering Committee Member

2016.4 - 2020.3

Committee type：Academic society

IPSJ Special Interest Group on Music and Computer (SIGMUS), Steering Committee Member

2011.4 - 2015.3

Committee type：Academic society

10th International Society for Music Information Retrieval Conference Local Organizing Committee

Committee type：Academic society

Papers

An Efficient GPU-based Implementation for Noise Robust Sound Source Localization.

Zirui Lin, Masayuki Takigahira, Naoya Terakado, Haris Gulzar, Monikka Roslianna Busto, Takeharu Eda, Katsutoshi Itoyama, Kazuhiro Nakadai, Hideharu Amano

CoRR abs/2504.03373 2025.4

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2504.03373

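Offline Speaker Diarization Using Semi-Supervised Learning of Speaker Information

Shuhei Asaka, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

JSAI Technical Report, Type 2 SIG 2024 ( Challenge-066 ) 04 2024.12

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence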

DOI： 10.11517/jsaisigtwo.2024.challenge-066_04

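A Method for Discriminating Background Flow in Small-Region Moving Object Detection

Kenji Nishida, Kazuhiro Nakadai, Katsutoshi Itoyama

JSAI Technical Report, Type 2 SIG 2024 ( Challenge-066 ) 10 2024.12

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence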

DOI： 10.11517/jsaisigtwo.2024.challenge-066_10

Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?

Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

EURASIP Journal on Audio, Speech, and Music Processing 2024 ( 1 ) 66 - 66 2024.12

Publishing type：Research paper (scientific journal)

DOI： 10.1186/s13636-024-00387-x

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning.

Runwu Shi, Katsutoshi Itoyama, Kazuhiro Nakadai

CoRR abs/2412.20146 2024

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2412.20146

UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios.

Ragib Amin Nihal, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

ICPR (14) 145 - 162 2024

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-78341-8_10

Other Link： https://dblp.uni-trier.de/db/conf/icpr/icpr2024-14.html#NihalYIN24
Improving Impressions of Response Delay in AI-based Spoken Dialogue Systems.

Shuhei Asaka, Katsutoshi Itoyama, Kazuhiro Nakadai

33rd IEEE International Conference on Robot and Human Interactive Communication(RO-MAN) 1416 - 1421 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/RO-MAN60168.2024.10731216

Other Link： https://dblp.uni-trier.de/db/conf/ro-man/ro-man2024.html#AsakaIN24
LCMV-based Scan-and-Sum Beamforming for Region Source Extraction.

Aoto Yasue, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

APSIPA 1 - 6 2024

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/APSIPAASC63619.2025.10848984

Other Link： https://dblp.uni-trier.de/db/conf/apsipa/apsipa2024.html#Yasue0IN24
A Video Vision Transformer for Sound Source Localization.

Haruto Yokota, Mert Bozkurtlar, Benjamin Yen, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

32nd European Signal Processing Conference(EUSIPCO) 106 - 110 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

Other Link： https://dblp.uni-trier.de/rec/conf/eusipco/2024
UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios.

Ragib Amin Nihal, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

CoRR abs/2408.04922 2024

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2408.04922

Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?

Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

CoRR abs/2407.15310 2024

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2407.15310

SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization.

Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

CoRR abs/2405.19813 2024

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2405.19813

From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution.

Ragib Amin Nihal, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

CoRR abs/2401.14661 2024

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2401.14661

Real Time Sound Source Localization Using von-Mises ResNet.

Mert Bozkurtlar, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

SII 466 - 471 2024

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/SII58957.2024.10417224

Other Link： https://dblp.uni-trier.de/db/conf/sii/sii2024.html#BozkurtlarYIN24
SLAM-Based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization.

Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

IEEE Trans. Robotics 40 4024 - 4044 2024

Publishing type：Research paper (scientific journal)

DOI： 10.1109/TRO.2024.3410456

Improving Noise Robustness of Automatic Speech Recognition Based on a Parallel Adapter Model with Near-Identity Initialization.

Takahiro Osaki, Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

IEA/AIE 454 - 466 2024

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-981-97-4677-4_37

Other Link： https://dblp.uni-trier.de/db/conf/ieaaie/ieaaie2024.html#OsakiSINN24
FPGA-based Low Power Acceleration of HARK Sound Source Localization.

Zirui Lin, Katsutoshi Itoyama, Kazuhiro Nakadai, Hideharu Amano

COOL CHIPS 1 - 6 2024

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/COOLCHIPS61292.2024.10531180

Other Link： https://dblp.uni-trier.de/db/conf/coolchips/coolchips2024.html#LinINA24
FPGA based Power-Efficient Edge Server to Accelerate Speech Interface for Socially Assistive Robotics Reviewed

Haris Gulzar, Muhammad Shakeel, Katsutoshi Itoyama, Kazuhiro Nakadai, Kenji Nishida, Hideharu Amano, Takeharu Eda

2023 IEEE/SICE International Symposium on System Integration (SII) 2023.1

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/sii55687.2023.10039093

An Ensemble Method for Multiple Speech Enhancement Using Deep Learning Reviewed

Masahiko Fujita, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2023 IEEE/SICE International Symposium on System Integration (SII) 2023.1

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/sii55687.2023.10039167

Metric-Based Multimodal Meta-Learning for Human Movement Identification Via Footstep Recognition Reviewed

Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2023 IEEE/SICE International Symposium on System Integration (SII) 2023.1

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/sii55687.2023.10039089

Audio-Visual Class Association Based on Two-stage Self-supervised Contrastive Learning towards Robust Scene Analysis Reviewed

Kei Suzuki, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2023 IEEE/SICE International Symposium on System Integration (SII) 2023.1

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/sii55687.2023.10039379

Assessment of Simultaneous Calibration for Positions, Orientations, and Time Offsets in Multiple Microphone Arrays Systems Reviewed

Chishio Sugiyama, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2023 IEEE/SICE International Symposium on System Integration (SII) 2023.1

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/sii55687.2023.10039440

Reconstruction of Depth Scenes Based on Echolocation Reviewed

Hidehiko Kishinami, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2023 IEEE/SICE International Symposium on System Integration (SII) 2023.1

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/sii55687.2023.10039271

Classification of Ball Rotation Direction Using Hitting Sound in Tennis and Investigation of Generalization Performance Improvement

Naoki Yamamoto, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

Proceedings of IEEE/SICE International Symposium on System Integration (SII 2023) 2023.1

Publishing type：Research paper (international conference proceedings)

Is the Ideal Ratio Mask Really the Best? - Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers.

Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

CoRR abs/2309.12065 2023

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2309.12065

Unsupervised Domain Adaptation of Universal Source Separation Based on Neural Full-Rank Spatial Covariance Analysis.

Takahiro Aizawa, Yoshiaki Bando, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai, Masaki Onishi

MLSP 1 - 6 2023

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/MLSP55844.2023.10285999

Other Link： https://dblp.uni-trier.de/db/conf/mlsp/mlsp2023.html#AizawaBINNO23
Improving Sign Language Understanding Introducing Label Smoothing.

Tan Sihan, Khan Nabeela Khanum, Katsutoshi Itoyama, Kazuhiro Nakadai

RO-MAN 113 - 118 2023

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/RO-MAN57019.2023.10309531

Other Link： https://dblp.uni-trier.de/db/conf/ro-man/ro-man2023.html#SihanKIN23
miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge.

Haris Gulzar, Monikka Roslianna Busto, Takeharu Eda, Katsutoshi Itoyama, Kazuhiro Nakadai

INTERSPEECH 3277 - 3281 2023

Publishing type：Research paper (international conference proceedings)

DOI： 10.21437/Interspeech.2023-1162

Other Link： https://dblp.uni-trier.de/db/conf/interspeech/interspeech2023.html#GulzarBEIN23
Is the Ideal Ratio Mask Really the Best? - Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers.

Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

APSIPA ASC 1843 - 1850 2023

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/APSIPAASC58517.2023.10317440

Other Link： https://dblp.uni-trier.de/db/conf/apsipa/apsipa2023.html#HiroeIN23
Development of a continuum robot enhanced with distributed sensors for search and rescue

Yu Yamauchi, Yuichi Ambe, Hikaru Nagano, Masashi Konyo, Yoshiaki Bando, Eisuke Ito, Solvi Arnold, Kimitoshi Yamazaki, Katsutoshi Itoyama, Takayuki Okatani, Hiroshi G. Okuno, Satoshi Tadokoro

ROBOMECH Journal 9 ( 1 ) 2022.12

Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

Abstract

Continuum robots can enter narrow spaces and are useful for search and rescue missions in disaster sites. The exploration efficiency at disaster sites improves if the robots can simultaneously acquire several pieces of information. However, a continuum robot that can simultaneously acquire information to such an extent has not yet been designed. This is because attaching multiple sensors to the robot without compromising its body flexibility is challenging. In this study, we installed multiple small sensors in a distributed manner to develop a continuum-robot system with multiple information-gathering functions. In addition, a field experiment with the robot demonstrated that the gathered multiple information has a potential to improve the searching efficiency. Concretely, we developed an active scope camera with sensory functions, which was equipped with a total of 80 distributed sensors, such as inertial measurement units, microphones, speakers, and vibration sensors. Herein, we consider space-saving, noise reduction, and the ease of maintenance for designing the robot. The developed robot can communicate with all the attached sensors even if it is bent with a minimum bending radius of 250 mm. We also developed an operation interface that integrates search-support technologies using the information gathered via sensors. We demonstrated the survivor search procedure in a simulated rubble environment of the Fukushima Robot Test Field. We confirmed that the information provided through the operation interface is useful for searching and finding survivors. The limitations of the designed system are also discussed. The development of such a continuum robot system, with a great potential for several applications, extends the application of continuum robots to disaster management and will benefit the community at large.

DOI： 10.1186/s40648-022-00223-x

Other Link： https://link.springer.com/article/10.1186/s40648-022-00223-x/fulltext.html
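A Study of Methods for Detecting Small-Region Objects in Low-Resolution Images

Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

JSAI Technical Report, Type 2 SIG 2022 ( Challenge-061 ) 03 2022.11

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence

We studied a method for extracting, from all-sky video recordings made for ecological observation of wild birds, the time segments in which birds actually appear. Because the image resolution of the birds is low and occlusion by tree branches and similar objects occurs repeatedly, conventional object detection and tracking methods struggle. We examined a method that performs detection and tracking using the movement of the birds themselves and the movement of tree branches caused by the birds as cues.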

DOI： 10.11517/jsaisigtwo.2022.challenge-061_03

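PyHARK: A Python Package for Online and Offline Processing with HARK

Kazuhiro Nakadai, Katsutoshi Itoyama, Masayuki Takigahira

JSAI Technical Report, Type 2 SIG 2022 ( Challenge-061 ) 04 2022.11

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence

This paper introduces PyHARK, newly added in version 3.4 of the open-source robot audition software HARK, ahead of the HARK tutorial. PyHARK is a package that provides a Python interface to HARK, allowing HARK functions to be called from Python for both online and offline processing. The paper focuses on its architecture, its differences from the existing HARK, and how to use it.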

DOI： 10.11517/jsaisigtwo.2022.challenge-061_04

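A Study of Action Planning for Drone Groups in Multiple Sound Source Tracking

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

JSAI Technical Report, Type 2 SIG 2022 ( Challenge-061 ) 07 2022.11

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence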

DOI： 10.11517/jsaisigtwo.2022.challenge-061_07

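Calibration of Microphone Array Geometry Using Arbitrary Mixed Sounds as Input

Katsutoshi Itoyama, Kazuhiro Nakadai

JSAI Technical Report, Type 2 SIG 2022 ( Challenge-061 ) 11 2022.11

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence

This paper describes a method for calibrating the geometry of a microphone array, that is, the position of each microphone. Because the input is an arbitrary sound mixture rather than a special test signal, estimation is performed by an iterative algorithm based on a probabilistic generative model of the spectrum, defined as the product of three probabilities: (1) the prior probability of the microphone positions given reference positions, (2) the prior probability of the source spectra, and (3) the conditional probability of the recorded spectra.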

DOI： 10.11517/jsaisigtwo.2022.challenge-061_11

Outdoor evaluation of sound source localization for drone groups using microphone arrays Reviewed

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022.10

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/iros47612.2022.9982039

Spotforming by NMF Using Multiple Microphone Arrays Reviewed

Yasuhiro Kagimoto, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022.10

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/iros47612.2022.9981808

Weakly-Supervised Neural Full-Rank Spatial Covariance Analysis for a Front-End System of Distant Speech Recognition Reviewed

Yoshiaki Bando, Takahiro Aizawa, Katsutoshi Itoyama, Kazuhiro Nakadai

Interspeech 2022 3824 - 3828 2022.9

Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2022-11077

Optimization of Microphone Array Placement for Sound Source Localization Using Drones with Microphone Arrays

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Second International Symposium on Noise from UAVs UASs and eVTOLs (Quiet Drones 2022) 39 2022.7

Publishing type：Research paper (international conference proceedings)

Evaluation of a Speech Enhancement Method Combining Ensemble Time-Frequency Masking and Beamforming Reviewed

Masahiko Fujita, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Journal of the Robotics Society of Japan 40 ( 7 ) 631 - 634 2022.7

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Robotics Society of Japan

DOI： 10.7210/jrsj.40.631

Visual Scene Reconstruction based on Echolocation with a Generative Adversarial Network Reviewed

Hidehiko Kishinami, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Journal of the Robotics Society of Japan 40 ( 4 ) 351 - 354 2022.4

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Robotics Society of Japan

DOI： 10.7210/jrsj.40.351

System-on-Chip based Edge Device for Speech Commands Recognition Reviewed

Haris Gulzar, Muhammad Shakeel, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai, Hideharu Amano

2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) 2022.4

Publishing type：Research paper (international conference proceedings)

Introduction to robot audition technology for bird localization and classification in the wild

Kazuhiro Nakadai, Ryo Yamamoto, Katsutoshi Itoyama, Kenji Nishida, Shiho Matsubayashi, Reiji Suzuki, Hiroshi G. Okuno

2022

Publishing type：Research paper (international conference proceedings)

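Evaluation under Reverberation of an NMF-Based Spatial Sound Source Separation Method Using Multiple Microphone Arrays

Yasuhiro Kagimoto, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

JSAI Technical Report, Type 2 SIG 2021 ( Challenge-058 ) 05 2021.11

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence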

DOI： 10.11517/jsaisigtwo.2021.challenge-058_05

Multichannel environmental sound segmentation: with separately trained spectral and spatial features

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Applied Intelligence 51 ( 11 ) 8245 - 8259 2021.11

Language：English Publishing type：Research paper (scientific journal) Publisher：Springer

DOI： 10.1007/s10489-021-02314-5

CASE: CNN Acceleration for Speech-Classification in Edge-Computing Reviewed

Haris Gulzar, Muhammad Shakeel, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai, Hideharu Amano

2021 IEEE Cloud Summit 63 - 68 2021.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ieeecloudsummit52029.2021.00018

Assessment of sound source tracking using multiple drones equipped with multiple microphone arrays

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

International Journal of Environmental Research and Public Health 18 ( 17 ) 2021.9

Language：English Publishing type：Research paper (scientific journal) Publisher：MDPI

DOI： 10.3390/ijerph18179039

Assessment of von Mises-Bernoulli Deep Neural Network in Sound Source Localization Reviewed

Katsutoshi Itoyama, Yoshiya Morimoto, Shungo Masaki, Ryosuke Kojima, Kenji Nishida, Kazuhiro Nakadai

Interspeech 2021 2152 - 2156 2021.8

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2021-1050

Simultaneous Calibration of Positions, Orientations, and Time Offsets, Among Multiple Microphone Arrays Reviewed

Chishio Sugiyama, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2021 IEEE International Conference on Autonomous Systems (ICAS) 2021.8

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icas49788.2021.9551166

Detecting earthquakes: a novel deep learning-based approach for effective disaster response Reviewed

Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Applied Intelligence 2021.4

Language：English Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

DOI： 10.1007/s10489-021-02285-7

Other Link： http://link.springer.com/article/10.1007/s10489-021-02285-7/fulltext.html
EMC: Earthquake Magnitudes Classification on Seismic Signals via Convolutional Recurrent Networks Reviewed

Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2021 IEEE/SICE International Symposium on System Integration (SII) 388 - 393 2021.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ieeeconf49454.2021.9382696

Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2021 IEEE/SICE International Symposium on System Integration (SII) 382 - 387 2021.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ieeeconf49454.2021.9382730

Assessment of a Beamforming Implementation Developed for Surface Sound Source Separation Reviewed

Zhi Zhong, Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2021 IEEE/SICE International Symposium on System Integration (SII 2021) 369 - 374 2021.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ieeeconf49454.2021.9382648

Sound Source Tracking Using Integrated Direction Likelihood for Drones with Microphone Arrays Reviewed

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2021 IEEE/SICE International Symposium on System Integration (SII) 394 - 399 2021.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ieeeconf49454.2021.9382619

Two-Dimensional Environment Recognition by Audible Sound with Weighted Likelihood Function and Standing Wave Reviewed

Hidehiko Kishinami, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Journal of the Robotics Society of Japan 39 ( 3 ) 271 - 274 2021

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Robotics Society of Japan

DOI： 10.7210/jrsj.39.271

Detection of ball spin direction using hitting sound in tennis Reviewed

Naoki Yamamoto, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

8th International Conference on Sport Sciences Research and Technology Support (icSPORTS 2020) 30 - 37 2020.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：SCITEPRESS - Science and Technology Publications

DOI： 10.5220/0010107600300037

Sound event aware environmental sound segmentation with Mask U-Net

Y. Sudo, K. Itoyama, K. Nishida, K. Nakadai

Advanced Robotics 34 ( 20 ) 1280 - 1290 2020.10

Language：English Publishing type：Research paper (scientific journal) Publisher：Robotics Society of Japan

DOI： 10.1080/01691864.2020.1829040

Synchronization of microphones based on rank minimization of warped spectrum for asynchronous distributed recording Reviewed

Katsutoshi Itoyama, Kazuhiro Nakadai

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2020) 4842 - 4847 2020.10

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings)

Onset-informed source separation using non-negative matrix factorization with binary masks Reviewed

Yuta Kusaka, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

23rd International Conference on Digital Audio Effects (DAFx2020) 289 - 296 2020.9

Language：English Publishing type：Research paper (international conference proceedings)

Calibration of a microphone array based on a probabilistic model of microphone positions Reviewed

Katsuhiro Dan, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices (IEA/AIE 2020) 614 - 625 2020.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer International Publishing

DOI： 10.1007/978-3-030-55789-8_53

Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories Reviewed

Ryo Nishikimi, Eita Nakamura, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii

IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 1678 - 1691 2020.5

Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electrical and Electronics Engineers (IEEE)

DOI： 10.1109/TASLP.2020.2996095

Design and assessment of a scan-and-sum beamformer for surface sound source separation Reviewed

Zhi Zhong, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2020 IEEE/SICE International Symposium on System Integration (SII2020) 808 - 813 2020.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SII46433.2020.9025981

Audio-visual 3D reconstruction framework for dynamic scenes Reviewed

Takashi Konno, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

2020 IEEE/SICE International Symposium on System Integration (SII2020) 802 - 807 2020.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SII46433.2020.9025812

Sound source tracking by drones with microphone arrays Reviewed

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2020 IEEE/SICE International Symposium on System Integration (SII2020) 796 - 801 2020.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SII46433.2020.9026185

Sound source localization based on von-Mises-Bernoulli deep neural network Reviewed

Kazuhiro Nakadai, Shungo Masaki, Ryosuke Kojima, Osamu Sugiyama, Katsutoshi Itoyama, Kenji Nishida

2020 IEEE/SICE International Symposium on System Integration (SII2020) 658 - 663 2020.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SII46433.2020.9025880

Multi-channel environmental sound segmentation Reviewed

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2020 IEEE/SICE International Symposium on System Integration (SII2020) 820 - 825 2020.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SII46433.2020.9025963

Environmental sound segmentation utilizing mask U-Net Reviewed

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019) 5340 - 5345 2019.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/IROS40897.2019.8967954

Improvement of DOA estimation by using quaternion output in sound event localization and detection Reviewed

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2019 DCASE Workshop 244 - 247 2019.10

Language：English Publishing type：Research paper (international conference proceedings)

Joint transcription of lead, bass, and rhythm guitars based on a factorial hidden semi-Markov model Reviewed

Kentaro Shibata, Ryo Nishikimi, Satoru Fukayama, Masataka Goto, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019) 236 - 240 2019.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2019.8682817

2D sound source position estimation using microphone arrays and its application to a VR-based bird song analysis system Reviewed

Daniel Gabriel, Ryosuke Kojima, Kotaro Hoshiba, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Advanced Robotics 33 ( 7-8 ) 403 - 414 2019.3

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1080/01691864.2019.1598491

Unsupervised speech enhancement based on multichannel NMF-informed beamforming for noise-robust automatic speech recognition Reviewed

Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 ( 5 ) 960 - 971 2019.3

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TASLP.2019.2907015

ImPACT-TRC thin serpentine robot platform for urban search and rescue

Masashi Konyo, Yuichi Ambe, Hikaru Nagano, Yu Yamauchi, Satoshi Tadokoro, Yoshiaki Bando, Katsutoshi Itoyama, Hiroshi G. Okuno, Takayuki Okatani, Kanta Shimizu, Eisuke Ito

Disaster Robotics 25 - 76 2019.1

　More details

Language：English Publisher：Springer

DOI： 10.1007/978-3-030-05321-5_2

researchmap
Design and assessment of multiple-sound source localization using microphone arrays Reviewed

Daniel Gabriel, Ryosuke Kojima, Kotaro Hoshiba, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2019 IEEE/SICE International Symposium on System Integration (SII 2019) 199 - 204 2019.1

　More details

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SII.2019.8700368

researchmap
Extreme Audition for Active Scope Camera

Bando Yoshiaki, Ambe Yuichi, Itoyama Katsutoshi, Okuno Hiroshi G.

Journal of the Robotics Society of Japan 37 ( 9 ) 808 - 813 2019

　More details

Language：Japanese Publisher：The Robotics Society of Japan



DOI： 10.7210/jrsj.37.808

CiNii Books

researchmap
Development of Tough Snake Robot Systems

Fumitoshi Matsuno, Tetsushi Kamegawa, Wei Qi, Tatsuya Takemori, Motoyasu Tanaka, Mizuki Nakajima, Kenjiro Tadakuma, Masahiro Fujita, Yosuke Suzuki, Katsutoshi Itoyama, Hiroshi G. Okuno, Yoshiaki Bando, Tomofumi Fujiwara, Satoshi Tadokoro

Disaster Robotics 267 - 326 2019

　More details

Language：English Publisher：Springer

DOI： 10.1007/978-3-030-05321-5_6

researchmap
Sequential generation of singing F0 contours from musical note sequences based on WaveNet Reviewed

Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018) 983 - 989 2018.12

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.23919/APSIPA.2018.8659502

researchmap
Chord-Aware Automatic Music Transcription Based on Hierarchical Bayesian Integration of Acoustic and Language Models Reviewed

Yuta Ojima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

APSIPA Transactions on Signal and Information Processing 7 ( e14 ) 1 - 14 2018.11

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Now Publishers

DOI： 10.1017/ATSIP.2018.17

researchmap
Interactive arrangement of chords and melodies based on a tree-structured generative model Reviewed

Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

19th International Society for Music Information Retrieval Conference (ISMIR 2018) 145 - 151 2018.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Signal restoration based on bi-directional LSTM with spectral filtering for robot audition Reviewed

Ryosuke Taniguchi, Kotaro Hoshiba, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2018) 955 - 960 2018.8

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ROMAN.2018.8525793

researchmap
Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization Reviewed

Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) 716 - 720 2018.4

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2018.8461530

researchmap
Unsupervised beamforming based on multichannel nonnegative matrix factorization for noisy speech recognition Reviewed

Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) 5734 - 5738 2018.4

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2018.8462642

researchmap
Generative statistical models with self-emergent grammar of chord sequences Reviewed

Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

Journal of New Music Research 47 ( 3 ) 226 - 248 2018.3

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Informa UK Limited

DOI： 10.1080/09298215.2018.1447584

researchmap
Speech enhancement based on Bayesian low-rank and sparse decomposition of multichannel magnitude spectrograms Reviewed

Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Tatsuya Kawahara, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 ( 2 ) 215 - 230 2018.2

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TASLP.2017.2772340

researchmap
Bayesian multichannel audio source separation based on integrated source and spatial models Reviewed

Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 ( 4 ) 831 - 846 2018.1

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TASLP.2017.2789320

researchmap
Synchronization of multiple A/D converters based on spectral stretch

ITOYAMA Katsutoshi, NAKADAI Kazuhiro

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2018 2P1-K05 2018

　More details

Language：Japanese Publisher：The Japan Society of Mechanical Engineers

We present a new method for synchronizing multiple independent A/D converters for microphone array processing. Since the spectrum obtained in each channel becomes a stretched version of the source spectrum, we construct a probabilistic generative model of spectrum stretch. Synchronization is realized by solving the inverse problem of the probabilistic generative model.

DOI： 10.1299/jsmermd.2018.2p1-k05

CiNii Research

researchmap
Development and Future Extension of Snake-like Robots on ImPACT TRC Project Reviewed

Matsuno Fumitoshi, Fujiwara Tomofumi, Kamegawa Tetsushi, Takemori Tatsuya, Tanaka Motoyasu, Tadakuma Kenjiro, Suzuki Yosuke, Bando Yoshiaki, Itoyama Katsutoshi, Okuno Hiroshi G

Journal of the Robotics Society of Japan 35 ( 10 ) 720 - 726 2017.12

　More details

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Robotics Society of Japan

DOI： 10.7210/jrsj.35.720

CiNii Books

researchmap
Multi-party Interactions by Quizmaster Robot in Speech-based Jeopardy! like Games Reviewed

Izaya Nishimuta, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

2017 International Conference on Computational Science and Computational Intelligence (CSCI2017) 1787 - 1792 2017.12

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CSCI.2017.313

Web of Science

Scopus

researchmap
Scale- and rhythm-aware musical note estimation for vocal F0 trajectories based on a semi-tatum-synchronous hierarchical hidden semi-Markov model Reviewed

Ryo Nishikimi, Eita Nakamura, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii

18th International Society for Music Information Retrieval Conference (ISMIR 2017) 376 - 382 2017.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/NishikimiNGIY17
Function- and rhythm-aware melody harmonization based on tree-structured parsing and split-merge sampling of chord sequences Reviewed

Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

18th International Society for Music Information Retrieval Conference (ISMIR 2017) 502 - 508 2017.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/TsushimaNIY17
Infinite probabilistic latent component analysis for audio source separation Reviewed

Kazuyoshi Yoshii, Eita Nakamura, Katsutoshi Itoyama, Masataka Goto

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP 2017) 1 - 6 2017.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/MLSP.2017.8168189

Web of Science

researchmap
Semi-blind speech enhancement based on recurrent neural network for source separation and dereverberation Reviewed

Masaya Wake, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP 2017) 1 - 6 2017.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/MLSP.2017.8168191

researchmap
A Singing Instrument for Real-Time Vocal-Part Arrangement of Music Audio Signals Reviewed

Yuta Ojima, Tomoyasu Nakano, Satoru Fukayama, Jun Kato, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii

Sound and Music Computing Conference (SMC) 443 - 449 2017.7

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
An Adaptive Karaoke System that Plays Accompaniment Parts of Music Audio Signals Synchronously with Users' Singing Voices Reviewed

Yusuke Wada, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

Sound and Music Computing Conference (SMC) 110 - 116 2017.7

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Bayesian multichannel nonnegative matrix factorization for audio source separation and localization Reviewed

Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017) 551 - 555 2017.3

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2017.7952216

researchmap
Audio-visual beat tracking based on a state-space model for a robot dancer performing with a human dancer Reviewed

Misato Ohkita, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

Journal of Robotics and Mechatronics 29 ( 1 ) 125 - 136 2017.2

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Fuji Technology Press

DOI： 10.20965/jrm.2017.p0125

Scopus

researchmap
Simultaneous identification and localization of still and mobile speakers based on binaural robot audition Reviewed

Karim Youssef, Katsutoshi Itoyama, Kazuyoshi Yoshii

Journal of Robotics and Mechatronics 29 ( 1 ) 59 - 71 2017.2

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Fuji Technology Press

DOI： 10.20965/jrm.2017.p0059

Scopus

researchmap
Layout optimization of cooperative distributed microphone arrays based on estimation of source separation performance Reviewed

Kouhei Sekiguchi, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii

Journal of Robotics and Mechatronics 29 ( 1 ) 83 - 93 2017.2

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Fuji Technology Press

DOI： 10.20965/jrm.2017.p0083

Scopus

researchmap
Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot Reviewed

Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno

Journal of Robotics and Mechatronics 29 ( 1 ) 198 - 212 2017.2

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Fuji Technology Press Ltd.

This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphone-equipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent low-rank matrix analysis. The two enhancement methods are integrated with Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

DOI： 10.20965/jrm.2017.p0198

CiNii Books

researchmap
Singing voice separation and vocal F0 estimation based on mutual combination of robust principal component analysis and subharmonic summation Reviewed

Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 ( 11 ) 2084 - 2095 2016.11

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electrical and Electronics Engineers (IEEE)

DOI： 10.1109/TASLP.2016.2577879

researchmap
Online simultaneous localization and mapping of multiple sound sources and asynchronous microphone arrays Reviewed

Kouhei Sekiguchi, Yoshiaki Bando, Keisuke Nakamura, Kazuhiro Nakadai, Katsutoshi Itoyama, Kazuyoshi Yoshii

2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016) 1973 - 1979 2016.10

　More details

Authorship：Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/IROS.2016.7759311

researchmap
Sound-based online localization for an in-pipe snake robot Reviewed

Yoshiaki Bando, Hiroki Suhara, Motoyasu Tanaka, Tetsushi Kamegawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Fumitoshi Matsuno, Hiroshi G. Okuno

2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR 2016) 207 - 213 2016.10

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SSRR.2016.7784300

researchmap
Student's t multichannel nonnegative matrix factorization for blind source separation Reviewed

Koichi Kitamura, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii

2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC2016) 1 - 5 2016.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IWAENC.2016.7602889

Web of Science

researchmap
A hierarchical Bayesian model of chords, pitches, and spectrograms for multipitch analysis Reviewed

Yuta Ojima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

17th International Society for Music Information Retrieval Conference (ISMIR 2016) 309 - 315 2016.8

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/OjimaNIY16
A unified Bayesian model of time-frequency clustering and low-rank approximation for multi-channel source separation Reviewed

Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

2016 24th European Signal Processing Conference (EUSIPCO 2016) 2280 - 2284 2016.8

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/EUSIPCO.2016.7760655

researchmap
Rhythm transcription of MIDI performances based on hierarchical Bayesian modelling of repetition and modification of musical note patterns Reviewed

Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

2016 24th European Signal Processing Conference (EUSIPCO 2016) 1946 - 1950 2016.8

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/EUSIPCO.2016.7760588

researchmap
Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array Reviewed

Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

2016 24th European Signal Processing Conference (EUSIPCO 2016) 1018 - 1022 2016.8

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/EUSIPCO.2016.7760402

researchmap
Musical note estimation for F0 trajectories of singing voices based on a Bayesian semi-beat-synchronous HMM Reviewed

Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

17th International Society for Music Information Retrieval Conference (ISMIR 2016) 461 - 467 2016.8

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/NishikimiNIY16
Sparse learning for music signal analysis

Kazuyoshi Yoshii, Katsutoshi Itoyama

Journal of the Institute of Electronics, Information and Communication Engineers 99 ( 5 ) 456 - 460 2016.5

　More details

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Institute of Electronics, Information and Communication Engineers

Scopus

researchmap
Parallel speech corpora of Japanese dialects Reviewed

Koichiro Yoshino, Naoki Hirayama, Shinsuke Mori, Fumihiko Takahashi, Katsutoshi Itoyama, Hiroshi G. Okuno

Tenth International Conference on Language Resources and Evaluation (LREC 2016) 4652 - 4657 2016.5

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap

Other Link： https://dblp.org/rec/conf/lrec/YoshinoHMTIO16
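Sparse learning for music audio signal analysis (Special section: "Advances in sparse modeling: from principles to applications")

Kazuyoshi Yoshii, Katsutoshi Itoyama

Journal of the Institute of Electronics, Information and Communication Engineers 99 ( 5 ) 456 - 460 2016.5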

　More details

Language：Japanese Publishing type：Research paper (scientific journal)

researchmap
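A music performance practice system based on separation of singing voice, harmonic instrument sounds, and percussive sounds and real-time visualization of the user's performance Reviewed

Ayaka Dobashi, Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

IPSJ Interaction 2016 97 - 105 2016.3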

　More details

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

researchmap
Student's t nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation Reviewed

Kazuyoshi Yoshii, Katsutoshi Itoyama, Masataka Goto

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016) 51 - 55 2016.3

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2016.7471635

researchmap
HMM-based attacks on Google's ReCAPTCHA with continuous visual and audio symbols Reviewed

Shotaro Sano, Takuma Otsuka, Katsutoshi Itoyama, Hiroshi G. Okuno

Journal of Information Processing 23 ( 6 ) 814 - 826 2015.11

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Information Processing Society of Japan

CAPTCHAs distinguish humans from automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual characters or audio utterances. State-of-the-art research suggests that the security of visual and audio CAPTCHAs lies mainly in anti-segmentation techniques, because individual symbols can be recognized with a high success rate by certain machine learning algorithms once they have been segmented. Thus, most recent commercial CAPTCHAs present continuous symbols to prevent automated segmentation. We propose a novel framework that can automatically decode continuous CAPTCHAs and assess its effectiveness with actual CAPTCHA questions from Google's reCAPTCHA. Our framework is built on a sequence recognition method based on hidden Markov models (HMMs), which can be concisely implemented by using an off-the-shelf HMM toolkit. This method concatenates several HMMs, each of which recognizes a symbol, to build a larger HMM that recognizes a question. Our experimental results reveal vulnerabilities in continuous CAPTCHAs: the solver cracks the visual and audio reCAPTCHA systems with 31.75% and 58.75% accuracy, respectively. We further propose guidelines to prevent possible attacks from HMM-based CAPTCHA solvers on the basis of synthetic experiments with simulated continuous CAPTCHAs.

DOI： 10.2197/ipsjjip.23.814

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00146090/
Music signal decomposition based on sparseness

Kazuyoshi Yoshii, Katsutoshi Itoyama

Journal of the Acoustical Society of Japan 71 ( 11 ) 607 - 614 2015.11

　More details

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Acoustical Society of Japan

DOI： 10.20697/jasj.71.11_607

CiNii Books

researchmap
Infinite Superimposed Discrete All-pole Modeling for Source-Filter Decomposition of Wavelet Spectrograms Reviewed

Kazuyoshi Yoshii, Katsutoshi Itoyama, Masataka Goto

16th International Society for Music Information Retrieval Conference (ISMIR 2015) 86 - 92 2015.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Unified inter- and intra-recording duration model for multiple music audio alignment Reviewed

Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2015) 1 - 5 2015.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/WASPAA.2015.7336929

Web of Science

researchmap
Human-voice enhancement based on online RPCA for a hose-shaped rescue robot with a microphone array Reviewed

Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

2015 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR 2015) 1 - 6 2015.10

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SSRR.2015.7442949

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ssrr/ssrr2015.html#BandoIKTNYO15
Identification and localization of one or two concurrent speakers in a binaural robotic context Reviewed

Karim Youssef, Katsutoshi Itoyama, Kazuyoshi Yoshii

2015 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC 2015) 407 - 412 2015.10

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SMC.2015.82

researchmap
Infinite superimposed discrete all-pole modeling for multipitch analysis of wavelet spectrograms Reviewed

Kazuyoshi Yoshii, Katsutoshi Itoyama, Masataka Goto

16th International Society for Music Information Retrieval Conference (ISMIR 2015) 86 - 92 2015.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/YoshiiIG15
Audio-visual beat tracking based on a state-space model for a music robot dancing with humans Reviewed

Misato Ohkita, Yoshiaki Bando, Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015) 5555 - 5560 2015.9

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/IROS.2015.7354164

researchmap
Bayesian integration of sound source separation and speech recognition: A new approach to simultaneous speech recognition Reviewed

Kousuke Itakura, Izaya Nishimuta, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii

16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015) 736 - 740 2015.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Toward a quizmaster robot for speech-based multiparty interaction Reviewed

Izaya Nishimuta, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

Advanced Robotics 29 ( 18 ) 1205 - 1219 2015.9

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Informa UK Limited

DOI： 10.1080/01691864.2015.1079504

researchmap
Optimizing the layout of multiple mobile robots for cooperative sound source separation Reviewed

Kouhei Sekiguchi, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015) 5548 - 5554 2015.9

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/IROS.2015.7354163

researchmap
Microphone-accelerometer based 3D posture estimation for a hose-shaped rescue robot Reviewed

Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015) 5580 - 5586 2015.9

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/IROS.2015.7354168

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/iros/iros2015.html#BandoIKTNYO15
A Music Performance Assistance System based on Vocal, Harmonic, and Percussive Source Separation and Content Visualization for Music Audio Signals Reviewed

Ayaka Dobashi, Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

12th Sound and Music Computing Conference (SMC15) 99 - 104 2015.7

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
A Score-Informed Piano Tutoring System with Mistake Detection and Score Simplification Reviewed

Tsubasa Fukuda, Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

12th Sound and Music Computing Conference (SMC15) 105 - 110 2015.7

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
A feedback framework for improved chord recognition based on NMF-based approximate note transcription Reviewed

Satoshi Maruo, Kazuyoshi Yoshii, Katsutoshi Itoyama, Matthias Mauch, Masataka Goto

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015) 196 - 200 2015.4

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2015.7177959

researchmap
Challenges in deploying a microphone array to localize and separate sound sources in real auditory scenes Reviewed

Yoshiaki Bando, Takuma Otsuka, Katsutoshi Itoyama, Kazuyoshi Yoshii, Yoko Sasaki, Satoshi Kagami, Hiroshi G. Okuno

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015) 723 - 727 2015.4

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2015.7178064

researchmap
Singing voice analysis and editing based on mutually dependent F0 estimation and source separation Reviewed

Yukara Ikemiya, Kazuyoshi Yoshii, Katsutoshi Itoyama

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015) 574 - 578 2015.4

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ICASSP.2015.7178034

researchmap
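A singing expression editing system based on separation and pitch estimation of singing voices in commercial music recordings Reviewed

Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

IPSJ Interaction 2015 128 - 135 2015.3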

　More details

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

researchmap
Recent progress of statistical audio signal processing

Kazuyoshi Yoshii, Katsutoshi Itoyama

Journal of the Institute of Image Information and Television Engineers 69 ( 2 ) 111 - 116 2015.2

　More details

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Institute of Image Information and Television Engineers

DOI： 10.3169/itej.69.111

CiNii Books

researchmap
Automatic speech recognition for mixed dialect utterances by mixing dialect language models Reviewed

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 ( 2 ) 373 - 382 2015.2

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/TASLP.2014.2387414

researchmap
Posture estimation of hose-shaped robot by using active microphone array Reviewed

Yoshiaki Bando, Takuma Otsuka, Takeshi Mizumoto, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Hiroshi G. Okuno

Advanced Robotics 29 ( 1 ) 35 - 49 2015.1

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1080/01691864.2014.981291

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/ar/ar29.html#BandoOMIKTNO15
Recognition of in-field frog chorusing using Bayesian nonparametric microphone array processing Reviewed

Yoshiaki Bando, Takuma Otsuka, Ikkyu Aihara, Hiromitsu Awano, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

Computational Sustainability, Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence 2015.1

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Picture Language for constructive 3D modeling in Scheme Reviewed

Koutarou Furukawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

Computer Software 32 ( 4 ) 4_31 - 4_49 2015

　More details

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：Japan Society for Software Science and Technology

We have developed a picture language system for constructing 3D models in Scheme, a dialect of Lisp. Understanding both procedural abstraction and data abstraction is important for anyone learning to write efficient and compact source code. A picture language, originally introduced in the classic computer science textbook SICP (Structure and Interpretation of Computer Programs), has been the most effective way to understand those concepts, because it presents the structure of programs as intuitive, visually understandable 2D figures. We designed and implemented an extended picture language for 3D models using JAKLD and enabled the system to export models in a 3D-printable format, in order to deepen learners' understanding of abstraction by analogy with the spatial fractal structures of physical models. We report on the use of our system as teaching material in a lecture for first-year undergraduate students.

DOI： 10.11309/jssst.32.4_31

CiNii Books

researchmap
Nonparametric Bayesian dereverberation of power spectrograms based on infinite-order autoregressive processes Reviewed

Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 ( 12 ) 1918 - 1930 2014.12

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：IEEE

DOI： 10.1109/TASLP.2014.2355772

CiNii Books

researchmap
Development of a robot quizmaster with auditory functions for speech-based multiparty interaction Reviewed

Izaya Nishimuta, Kazuyoshi Yoshii, Katsutoshi Itoyama, Hiroshi G. Okuno

2014 IEEE/SICE International Symposium on System Integration (SII2014) 328 - 333 2014.12

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/sii.2014.7028059

Web of Science

researchmap
A robot quizmaster that can localize, separate, and recognize simultaneous utterances for a fastest-voice-first quiz game Reviewed

Izaya Nishimuta, Naoki Hirayama, Kazuyoshi Yoshii, Katsutoshi Itoyama, Hiroshi G. Okuno

2014 IEEE-RAS International Conference on Humanoid Robots (Humanoids 2014) 967 - 972 2014.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/HUMANOIDS.2014.7041480

researchmap
Visualization of auditory awareness based on sound source positions estimated by depth sensor and microphone array Reviewed

Takahiro Iyama, Osamu Sugiyama, Takuma Otsuka, Katsutoshi Itoyama, Hiroshi G. Okuno

2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014) 1908 - 1913 2014.11

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IROS.2014.6942814

researchmap
A sound-based online method for estimating the time-varying posture of a hose-shaped robot Reviewed

Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

2014 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR 2014) 1 - 6 2014.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/SSRR.2014.7017665

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ssrr/ssrr2014.html#BandoIKTNYO14
Transferring Vocal Expressions of a Professional Singer to Unaccompanied Singing Signals

Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

Late Breaking Demo (LBD), International Society for Music Information Retrieval (ISMIR) 1 - 2 2014.10

Language：English Publishing type：Research paper (other academic)

researchmap
Bayesian audio alignment based on a unified model of music composition and performance Reviewed

Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

15th International Society for Music Information Retrieval Conference (ISMIR 2014) 233 - 238 2014.10

Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/MaezawaIYO14
Sound annotation tool for multidirectional sounds based on spatial information extracted by HARK robot audition software Reviewed

Osamu Sugiyama, Katsutoshi Itoyama, Kazuhiro Nakadai, Hiroshi G. Okuno

2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC2014) 2335 - 2340 2014.10

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/SMC.2014.6974275

researchmap
Parameter estimation of virtual musical instrument synthesizers Reviewed

Katsutoshi Itoyama, Hiroshi G. Okuno

40th International Computer Music Conference and 11th Sound and Music Computing Conference (ICMC SMC 2014) 1426 - 1431 2014.9

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Dialect-mixed speech recognition by mixing simulated multiple dialect language models Reviewed

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

IPSJ Journal 55 ( 7 ) 1681 - 1694 2014.7

Language：Japanese Publishing type：Research paper (scientific journal)

This paper designs and implements an automatic speech recognition (ASR) system that accepts a mixture of various kinds of dialects. The language model for a particular dialect is trained on a dialect language corpus simulated from a large common language corpus. The simulation is carried out with a weighted finite-state transducer (WFST) trained on a parallel corpus of a dialect and the common language. The resulting system recognizes dialect utterances with a mixture of dialect language models by estimating the optimal dialect mixing proportion for each utterance. Since actually spoken dialect is not a single pure dialect but a mixture of various dialects, influenced by daily communication and by broadcasting such as television and radio, estimating the optimal dialect mixing proportion, that is, the one that maximizes the log-likelihood for the input utterance, is critical in ASR. Experiments showed that recognition accuracy improves by using the dialect language model, that log-likelihood and recognition accuracy are highly correlated, and that recognition accuracy improves by choosing the dialect mixing proportion that maximizes log-likelihood for each utterance, compared to a fixed dialect mixing proportion.

CiNii Books

researchmap
Automatic transcription of guitar tablature from audio signals in accordance with player's proficiency Reviewed

Kazuki Yazawa, Katsutoshi Itoyama, Hiroshi G. Okuno

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2014) 3122 - 3126 2014.5

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2014.6854175

Web of Science

researchmap
Transcribing vocal expression from polyphonic music Reviewed

Yukara Ikemiya, Katsutoshi Itoyama, Hiroshi G. Okuno

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2014) 3127 - 3131 2014.5

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2014.6854176

Web of Science

researchmap
Selective index combination method based on out-of-vocabulary region estimator for open-vocabulary spoken term detection Reviewed

Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi G. Okuno

IPSJ Journal 55 ( 3 ) 1201 - 1211 2014.3

Language：Japanese Publishing type：Research paper (scientific journal)

In this paper, a novel index combination method for spoken term detection is proposed. In our method, outputs from four different recognizers are combined into one confusion network. A novel index-selection method is then applied to the combined index to suppress the increase in index size. Two methods are proposed to reduce index size: (1) arc selection and (2) unit selection, both of which are based on an out-of-vocabulary (OOV)-region estimator score. Experimental results on Japanese lecture recordings from the Corpus of Spontaneous Japanese showed that the index-selection method reduced the index size of the best confusion network by 22.7% with only a 1.4-point loss in accuracy. Compared with the best phoneme-based index from a single recognizer, the proposed method achieved a smaller index size while keeping the high accuracy of the index combination method (16.3% and 16.0% relative error reductions for IV and OOV queries).

CiNii Books

researchmap
Transferring vocal expression of F0 contour using singing voice synthesizer Reviewed

Yukara Ikemiya, Katsutoshi Itoyama, Hiroshi G. Okuno

Modern Advances in Applied Intelligence (IEA/AIE 2014) ( 2 ) 250 - 259 2014

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer Verlag

DOI： 10.1007/978-3-319-07467-2_27

Scopus

researchmap
Posture estimation of hose-shaped robot using microphone array localization Reviewed

Yoshiaki Bando, Takeshi Mizumoto, Katsutoshi Itoyama, Kazuhiro Nakadai, Hiroshi G. Okuno

2013 IEEE International Conference on Intelligent Robots and Systems (IROS2013) 3446 - 3451 2013.11

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IROS.2013.6696847

Scopus

researchmap
Noise correlation matrix estimation for improving sound source localization by multirotor UAV Reviewed

Koutarou Furukawa, Keita Okutani, Kohei Nagira, Takuma Otsuka, Katsutoshi Itoyama, Kazuhiro Nakadai, Hiroshi G. Okuno

2013 IEEE International Conference on Intelligent Robots and Systems (IROS2013) 3943 - 3948 2013.11

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IROS.2013.6696920

Web of Science

researchmap
Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier Reviewed

Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi G. Okuno

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2013) 8540 - 8544 2013.5

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2013.6639332

Web of Science

researchmap
Initialization-robust Bayesian multipitch analyzer based on psychoacoustical and musical criteria Reviewed

Daichi Sakaue, Takuma Otsuka, Katsutoshi Itoyama, Hiroshi G. Okuno

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2013) 226 - 230 2013.5

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2013.6637642

Scopus

researchmap
Audio-based guitar tablature transcription using multipitch analysis and playability constraints Reviewed

Kazuki Yazawa, Daichi Sakaue, Kohei Nagira, Katsutoshi Itoyama, Hiroshi G. Okuno

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2013) 196 - 200 2013.5

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2013.6637636

Web of Science

researchmap
Robust multipitch analyzer against initialization based on latent harmonic allocation using overtone corpus Reviewed

Daichi Sakaue, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

Journal of Information Processing 21 ( 2 ) 246 - 255 2013.4

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.2197/ipsjjip.21.246

Scopus

researchmap
Automatic Estimation of Dialect Mixing Ratio for Dialect Speech Recognition Reviewed

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013) 1492 - 1496 2013

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Robust Multipitch Analyzer against Initialization based on Latent Harmonic Allocation using Overtone Corpus

Sakaue Daichi, Itoyama Katsutoshi, Ogata Tetsuya, Okuno Hiroshi G.

IMT 8 ( 2 ) 467 - 476 2013

Language：English Publisher：Information and Media Technologies Editorial Board

We present a Bayesian analysis method that estimates the harmonic structure of musical instruments in music signals on the basis of psychoacoustic evidence. Since the main objective of multipitch analysis is joint estimation of the fundamental frequencies and their harmonic structures, the performance of harmonic structure estimation significantly affects fundamental frequency estimation accuracy. Many methods have been proposed for estimating the harmonic structure accurately, but no method has been proposed that satisfies all these requirements: robust against initialization, optimization-free, and psychoacoustically appropriate and thus easy to develop further. Our method satisfies these requirements by explicitly incorporating Terhardt's virtual pitch theory within a Bayesian framework. It does this by automatically learning the valid weight range of the harmonic components using a MIDI synthesizer. The bounds are termed "overtone corpus." Modeling demonstrated that the proposed overtone corpus method can stably estimate the harmonic structure of 40 musical pieces for a wide variety of initial settings.

DOI： 10.11185/imt.8.467

researchmap
Bayesian nonnegative harmonic-temporal factorization and its application to multipitch analysis Reviewed

Daichi Sakaue, Takuma Otsuka, Katsutoshi Itoyama, Hiroshi G. Okuno

13th International Society for Music Information Retrieval Conference (ISMIR 2012) 91 - 96 2012.10

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.5281/zenodo.1418163

researchmap

Other Link： https://dblp.org/rec/conf/ismir/SakaueOIO12
Initialization-robust multipitch estimation based on latent harmonic allocation using overtone corpus Reviewed

Daichi Sakaue, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2012) 425 - 428 2012.3

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2012.6287907

Web of Science

researchmap
Automated violin fingering transcription through analysis of an audio recording Reviewed

Akira Maezawa, Katsutoshi Itoyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Computer Music Journal 36 ( 3 ) 57 - 72 2012

Language：English Publishing type：Research paper (scientific journal) Publisher：MIT Press Journals

DOI： 10.1162/COMJ_a_00129

Scopus

researchmap
Automatic chord recognition based on probabilistic integration of acoustic features, bass sounds, and chord transition Reviewed

Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Research in Applied Artificial Intelligence (IEA/AIE 2012) 7345 LNAI 58 - 67 2012

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-642-31087-4_7

Scopus

researchmap
Musical sound separation and synthesis using harmonic/inharmonic GMM and NMF for phrase replacing system Reviewed

Naoki Yasuraoka, Takuya Yoshioka, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ Journal 52 ( 12 ) 3839 - 3852 2011.12

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：Information Processing Society Japan

This paper presents a sound separation and synthesis method for a new music manipulation system that lets a user replace an instrument performance phrase in a polyphonic audio mixture. The system first separates the instrument part from the polyphony using the original performance score, and then synthesizes a new instrument performance of the user-specified target score while keeping the sound characteristics of the original. Two technical problems must be solved to realize this system: 1) separating one instrument part without knowledge of the other parts, and 2) synthesizing a new performance from the separated sound with high sound quality. We introduce a new sound separation and synthesis method for the phrase replacing system: 1) sound separation by a harmonic/inharmonic Gaussian mixture and nonnegative matrix factorization, and 2) sound synthesis by modifying MIDI-synthesizer-generated sound to follow the estimated timbre and expression of the original performance. Two evaluations confirm the effectiveness of our method. Compared with conventional methods, the method separates the target part 28.2% more accurately in log spectral distance and synthesizes the instrument performance 11.5% more accurately.

CiNii Books

researchmap
Simultaneous processing of sound source separation and musical instrument identification using Bayesian spectral modeling Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2011) 3816 - 3819 2011.5

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2011.5947183

Web of Science

researchmap
Automatic chord sequence recognition based on integration of chord and bass pitch features Reviewed

Kouhei Sumi, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ Journal 52 ( 4 ) 1803 - 1812 2011.4

Authorship：Corresponding author Language：Japanese Publishing type：Research paper (scientific journal)

This paper presents a method that identifies musical chords in polyphonic musical signals. As musical chords mainly represent the harmony of music and are related to other musical elements such as melody and rhythm, the performance of chord recognition should improve if this interrelationship is taken into consideration. In this paper, bass lines are utilized as clues for improving chord recognition. Our chord recognition system is based on maximum a posteriori estimation using the Viterbi algorithm, with a posterior probability computed from chord features, chord transition patterns, and bass pitch distributions. Experimental results on 150 songs from twelve Beatles albums, all of which have scales and no modulation, showed an average recognition rate of 73.7%.

CiNii Books

researchmap
Query-by-example music information retrieval by score-informed source separation and remixing technologies Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

EURASIP Journal on Advances in Signal Processing 2010 ( 172961 ) 1 - 14 2011.1

Authorship：Lead author Language：English Publishing type：Research paper (scientific journal) Publisher：Hindawi Publishing Corporation

DOI： 10.1155/2010/172961

CiNii Books

researchmap
A musical mood trajectory estimation method using lyrics and acoustic features Reviewed

Naoki Nishikawa, Katsutoshi Itoyama, Hiromasa Fujihara, Masataka Goto, Tetsuya Ogata, Hiroshi G. Okuno

2011 ACM Multimedia Conference and Co-Located Workshops (MIRUM '11) 51 - 56 2011

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2072529.2072543

Scopus

researchmap
SpeakBySinging: Converting singing voices to speaking voices while retaining voice timbre Reviewed

Shimpei Aso, Takeshi Saitou, Masataka Goto, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

13th International Conference on Digital Audio Effects (DAFx-10) 114 - 121 2010.9

Language：English Publishing type：Research paper (international conference proceedings)

Scopus

researchmap
Violin fingering estimation based on violin pedagogical fingering model constrained by bowed sequence estimation from audio input Reviewed

Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Trends in Applied Intelligent Systems (IEA/AIE 2010) 249 - 259 2010.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-642-13033-5_26

researchmap
Query-by-Example music retrieval approach based on musical genre shift by changing instrument volume Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

12th International Conference on Digital Audio Effects (DAFx-09) 205 - 212 2009.9

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings)

Scopus

researchmap

Other Link： http://dafx09.como.polimi.it/proceedings/data/DAFx09_Proceedings.pdf
Parameter estimation for harmonic and inharmonic models by using timbre feature distributions Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Journal of Information Processing 17 191 - 201 2009.7

Authorship：Lead author Language：English Publishing type：Research paper (scientific journal) Publisher：Information Processing Society of Japan

DOI： 10.2197/ipsjjip.17.191

Scopus

researchmap
An analysis-and-synthesis approach for manipulating pitch of a musical instrument sound considering pitch-dependency of timbral characteristics Reviewed

Takehiro Abe, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ Journal 50 ( 3 ) 1054 - 1066 2009.3

Language：Japanese Publishing type：Research paper (scientific journal)

This paper presents a synthesis method that can generate musical instrument sounds with arbitrary pitches from a given musical instrument sound while limiting distortion of its timbral characteristics. Based on psychoacoustical knowledge of the auditory effects of timbre, we define timbral features on the spectrogram of a musical instrument sound as (i) relative amplitudes of harmonic components, (ii) distribution of inharmonic components, and (iii) temporal envelopes of harmonic components. First, to analyze the timbral features of a seed sound, it is separated into harmonic and inharmonic components by using Itoyama's integrated model. In pitch manipulation, it is necessary to take into account the relation between pitch and features (i) and (ii). Therefore, we predict the values of each feature by using a cubic polynomial that approximates the feature distribution over pitches. Experimental results showed the effectiveness of our method; the spectral and MFCC distances between synthesized sounds and real sounds of 32 instruments were reduced by 64.70% and 32.31%, respectively.

CiNii Books

researchmap
Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions

Itoyama Katsutoshi, Goto Masataka, Komatani Kazunori, Ogata Tetsuya, Okuno Hiroshi G.

Information and Media Technologies 4 ( 3 ) 672 - 682 2009

Language：English Publisher：Information and Media Technologies Editorial Board

We describe an improved way of estimating parameters for an integrated weighted-mixture model consisting of both harmonic and inharmonic tone models. Our final goal is to build an instrument equalizer (music remixer) that enables a user to change the volume of individual parts in polyphonic sound mixtures. To realize the instrument equalizer, musical signals must be separated into individual instrument parts. We have developed a score-informed sound source separation method using the integrated model. A remaining but critical problem is how to deal with the timbre variations caused by different performance styles and instrument bodies, because our method used template sounds to represent timbre. Template sounds are generated from a MIDI tone generator based on an aligned score. Differences in instrument bodies between the mixed signals and the template sounds cause timbre differences and decrease separation performance. To solve this problem, we train probabilistic distributions of timbre features using various sounds to reduce template dependency. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we can estimate model parameters that express the timbre more accurately. Experimental results show that separation performance improved from 4.89 to 8.48 dB.

DOI： 10.11185/imt.4.672

researchmap
Changing timbre and phrase in existing musical performances as you like - Manipulations of single part using harmonic and inharmonic models Reviewed

Naoki Yasuraoka, Takehiro Abe, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

17th ACM international conference on Multimedia (MM '09) 203 - 212 2009

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/1631272.1631302

Scopus

researchmap
Bowed String Sequence Estimation of a Violin Based on Adaptive Audio Signal Classification and Context-Dependent Error Correction Reviewed

Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

2009 11th IEEE International Symposium on Multimedia (ISM '09) 9 - 16 2009

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ISM.2009.30

Web of Science

researchmap
Automatic chord recognition based on probabilistic integration of chord transition and bass pitch estimation Reviewed

Kouhei Sumi, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

9th International Conference on Music Information Retrieval (ISMIR 2008) 39 - 44 2008.9

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Instrument equalizer for query-by-example retrieval: Improving sound source separation based on integrated harmonic and inharmonic models Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

9th International Conference on Music Information Retrieval (ISMIR 2008) 133 - 138 2008.9

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Simultaneous realization of score-informed sound source separation of polyphonic musical signals and constrained parameter estimation for integrated model of harmonic and inharmonic structure Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ Journal 49 ( 3 ) 1465 - 1479 2008.3

Authorship：Lead author Language：Japanese Publishing type：Research paper (scientific journal) Publisher：Information Processing Society Japan

This paper describes a sound source separation method for polyphonic sound mixtures of musical signals that include both harmonic and inharmonic instrument sounds, and a constrained parameter estimation method that uses a score, which includes the pitch, duration, volume, onset time, and instrument of each note, as prior information. We separate the power spectrum of a sound mixture into individual musical notes by using an integrated weighted-mixture model consisting of both harmonic-structure and inharmonic-structure tone models (generative models for the power spectrogram). The integrated model realizes a parameter estimation method under a constraint of parameter similarity within the same musical instrument. We initialize the model parameters using template sounds recorded from a MIDI tone generator. On the basis of maximum a posteriori estimation using the EM algorithm, we estimate all parameters of this integrated model under several original constraints for preventing over-training and maintaining intra-instrument consistency. Using standard MIDI files as prior information for the model parameters, we confirmed that the integrated model increased the SNR by 0.4-0.9 dB.

CiNii Books

researchmap
Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics Reviewed

Takehiro Abe, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

11th International Conference on Digital Audio Effects (DAFx-08) 249 - 256 2008

Language：English Publishing type：Research paper (international conference proceedings)

Scopus

researchmap
Integration and adaptation of harmonic and inharmonic models for separating polyphonic musical signals Reviewed

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2007) I 57 - 60 2007

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2007.366615

Web of Science

researchmap
Automatic feature weighting in automatic transcription of specified part in polyphonic music Reviewed

Katsutoshi Itoyama, Tetsuro Kitahara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

7th International Conference on Music Information Retrieval (ISMIR 2006) 172 - 175 2006.10

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings)

researchmap

Other Link： https://dblp.org/rec/conf/ismir/ItoyamaKKOO06

Books

Robotics Handbook (ロボット工学ハンドブック)

The Robotics Society of Japan ( Role：Contributor, Acoustic Information Processing )

Corona Publishing 2023.3 ( ISBN:9784339046793 )

Total pages：ix, 1072p Language：Japanese

CiNii Books

researchmap

MISC

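A Study of a Speech Enhancement Method Based on Distance-Based Time-Frequency Mask Estimation

石井遼平, 中臺一博, 糸山克寿

Proceedings of the 86th National Convention 2024 ( 1 ) 361 - 362 2024.3

Language：Japanese Publisher：Information Processing Society of Japan

In meetings, several people gather and talk, so even when each speaker is recorded with a microphone placed near the mouth, the recording contains other speakers' voices in addition to the target speaker's voice. This makes the target speaker's voice hard to hear in the recording and hinders uses such as preparing minutes. To solve this problem, this paper proposes a speech enhancement method that extracts only the speech of the nearby speaker from a monaural recording by using a time-frequency mask estimated with deep learning. An evaluation using PESQ and STOI, which correlate with human auditory perception, demonstrated the effectiveness of the proposed method.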

CiNii Books

CiNii Research

researchmap
Detection of small moving objects as rare events in videos

西田健次, 糸山克寿, 中臺一博

JSAI Type II SIG Technical Reports (Web) 2024 ( Challenge-064 ) 2024

J-GLOBAL

researchmap
Extraction of Sound Sources within a Planar Region Using an LCMV-Based Scan-and-Sum Beamformer

安江蒼人, YEN Benjamin, 糸山克寿, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 42nd 2024

J-GLOBAL

researchmap
Improving the Noise Robustness of Speech Recognition Using a Biasing Network

大崎崇博, 周藤唯, 糸山克寿, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 42nd 2024

J-GLOBAL

researchmap
A Study of Speaker Diarization Based on Speaker Identification Using Metric Learning

阿坂脩平, 西田健次, 糸山克寿, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 42nd 2024

J-GLOBAL

researchmap
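Adapting Acoustic Transfer Functions to Environmental Changes Using Gaussian Process Regression

藤田侑樹, 糸山克寿, 西田健次, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 42nd 2024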

J-GLOBAL

researchmap
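A Study of a Ground Surface Material Estimation Method Using Drone Rotor Noise in Outdoor Environments

矢野翼, YEN Benjamin, 糸山克寿, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 42nd 2024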

J-GLOBAL

researchmap
Proposal of Sound Source Localization Based on Video Vision Transformer

横田遥大, BOZKURTLAR Mert, YEN Benjamin, 糸山克寿, 西田健次, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 42nd 2024

J-GLOBAL

researchmap
Improvement in Target Speech Extraction Using Distance- and Speaker-Based Time-Frequency Masks

田口鐵人, 石井遼平, 大崎崇博, 阿坂脩平, YEN Benjamin, 糸山克寿, 中臺一博

SICE System Integration Division Annual Conference (CD-ROM) 25th 2024

J-GLOBAL

researchmap
HARK3.6 and Its Application to Active Drone Audition

中臺一博, 公文誠, 佐々木洋子, 干場功太郎, YEN Benjamin, 糸山克寿, 瀧ヶ平将行, 寺門直哉, LIN Zirui, GULZAR Haris, BUSTO Monikka Rosalianna, 江田毅晴, 天野英晴

SICE System Integration Division Annual Conference (CD-ROM) 25th 2024

J-GLOBAL

researchmap
Improving Noise Robustness of Automatic Speech Recognition with Speech Enhancement and Adapters

大崎崇博, 周藤唯, 糸山克寿, 西田健次, 中臺一博

Journal of the Robotics Society of Japan 42 ( 9 ) 2024

J-GLOBAL

researchmap
Performance Improvement and Acceleration of Surface Source Extraction based on Multiple Constraint MVDR Beamforming and Woodbury Matrix Identity

安江蒼人, 糸山克寿, 西田健次, 中臺一博

Journal of the Robotics Society of Japan 42 ( 6 ) 2024

J-GLOBAL

researchmap
Environmental Sound Separation by Transfer Learning Using Deep Blind Source Separation

合澤隆拓, 坂東宜昭, 糸山克寿, 西田健次, 中臺一博

85th National Convention of IPSJ ( 5S-02 ) 2023.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
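Improving the Noise Robustness of Speech Recognition Using a Speech Enhancement Network and Adapters

大崎崇博, 周藤唯, 糸山克寿, 西田健次, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 41st 2023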

J-GLOBAL

researchmap
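Improving Online Adaptation of Acoustic Transfer Functions Based on the von Mises Distribution

藤田侑樹, 糸山克寿, 西田健次, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 41st 2023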

J-GLOBAL

researchmap
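Accelerating a Multiple-Constraint MVDR Beamformer for Surface Sound Source Extraction through Sequential Computation

安江蒼人, 糸山克寿, 西田健次, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 41st 2023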

J-GLOBAL

researchmap
Integrating Sound Source Localization and Deep Blind Source Separation for Robot Audition

合澤隆拓, 坂東宜昭, 糸山克寿, 西田健次, 中臺一博, 大西正輝

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 41st 2023

J-GLOBAL

researchmap
Towards Natural Spoken Dialogue Systems Based on AI Services

阿坂脩平, 西田健次, 糸山克寿, 中臺一博

SICE System Integration Division Annual Conference (CD-ROM) 24th 2023

J-GLOBAL

researchmap
Motion Detection Using Magnetic Flux Density and Wind Speed Sensors toward Presence Sensing

川口洋慶, SHAKEEL Muhammad, 糸山克寿, 西田健次, 中臺一博

Proceedings of the Annual Conference of the Robotics Society of Japan (CD-ROM) 41st 2023

J-GLOBAL

researchmap
Improved 3D spatial recognition based on audible sound-based echolocation with a 5-channel microphone array

小林宙輝, 糸山克寿, 西田健次, 中臺一博

SICE System Integration Division Annual Conference (CD-ROM) 24th 2023

J-GLOBAL

researchmap
Ground Surface Material Estimation Using Drone Rotor Noise

矢野翼, 糸山克寿, 西田健次, 中臺一博

SICE System Integration Division Annual Conference (CD-ROM) 24th 2023

J-GLOBAL

researchmap
A Study of Two-Dimensional Region Extraction by Extending the Scan-and-Sum Beamformer

安江蒼人, 糸山克寿, 西田健次, 中臺一博

23rd SICE System Integration Division Annual Conference (SI 2022) ( 3A2-B04 ) 2022.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
最頻値フィルタを用いたマイクロホンアレイ音響伝達関数の環境適応手法の検討

藤田侑樹, 糸山克寿, 西田健次, 中臺一博

第23回計測自動制御学会システムインテグレーション部門講演会 (SI 2022) ( 3A2-B01 ) 2022.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
HARK 3.4 ～PyHARKの紹介～

中臺一博, 糸山克寿

第23回計測自動制御学会システムインテグレーション部門講演会 (SI 2022) ( 3P2-H12 ) 2022.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音響伝達関数の二次元補間手法の提案とその音源定位への適用

大﨑崇博, 糸山克寿, 西田健次, 中臺一博

第23回計測自動制御学会システムインテグレーション部門講演会 (SI 2022) ( 3A2-B14 ) 2022.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数音源追跡におけるドローン群の行動計画の検討

山田泰基, 糸山克寿, 西田健次, 中臺一博

第61回AIチャレンジ研究会 ( SIG-Challenge-061-07 ) 33 - 39 2022.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
ロボット聴覚用音響処理ソフトウェアHARKを用いたサウンドスケープの解析

山本遼, 西田健次, 糸山克寿, 松林志穂, 鈴木麗璽, 中臺一博

日本鳥学会2022年度大会 ( P048 ) 2022.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
PyHARK: HARKのオンライン・オフライン処理用Pythonパッケージ

中臺一博, 糸山克寿, 瀧ヶ平将行

第61回AIチャレンジ研究会 ( SIG-Challenge-061-04 ) 14 - 19 2022.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
低解像度画像からの小領域物体の検出手法の検討

西田健次, 糸山克寿, 中臺一博

第61回AIチャレンジ研究会 ( SIG-Challenge-061-03 ) 9 - 13 2022.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
任意の混合音を入力としたマイクロホンアレイ形状のキャリブレーション

糸山克寿, 中臺一博

第61回AIチャレンジ研究会 ( SIG-Challenge-061-11 ) 57 - 62 2022.11

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
深層ブラインド音源分離と転移学習に基づく遠隔音声認識の評価

合澤隆拓, 坂東宜昭, 糸山克寿, 西田健次, 中臺一博

第61回AIチャレンジ研究会 2022 ( SIG-Challenge-061-09 ) 09 2022.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人人工知能学会

DOI： 10.11517/jsaisigtwo.2022.challenge-061_09

CiNii Research

researchmap
音源定位結果の3D可視化とmAPベースの評価指標の提案

山本遼, 糸山克寿, 西田健次, 中臺一博

第40回日本ロボット学会学術講演会 ( 4J3-07 ) 2022.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
環境イベント識別学習フレームワークの提案とその日本語テキスト入力からの音響シーン生成部の実装

露口弘毅, シャキールムハマド, 糸山克寿, 西田健次, 中臺一博

第40回日本ロボット学会学術講演会 ( 4J3-07 ) 2022.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
アンサンブル時間周波数マスクを用いた複数の音声強調手法の統合

藤田雅彦, 糸山克寿, 西田健次, 中臺一博

第40回日本ロボット学会学術講演会 ( 4J3-04 ) 2022.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Speech-Recognition on Low-Power GPU Device for Robotic Applications on the Edge

Haris Gulzar, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

40th Annual Conference of the Robotics Society of Japan ( 2J3-06 ) 2022.9

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイのパラメータ同時最適化

杉山地塩, 糸山克寿, 西田健次, 中臺一博

第40回日本ロボット学会学術講演会 ( 4J3-09 ) 2022.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数のマイクロホンアレイ搭載ドローンの配置最適化による音源追跡性能の向上

山田泰基, 糸山克寿, 西田健次, 中臺一博

第40回日本ロボット学会学術講演会 ( 4J3-08 ) 2022.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
深層フルランク空間相関分析に基づく遠隔音声認識のフロントエンド

合澤隆拓, 坂東宜昭, 糸山克寿, 西田健次, 中臺一博

情報処理学会第84回全国大会 ( 1R-02 ) 2022.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
深層学習を用いた複数音声強調処理のアンサンブル手法の検討

藤田雅彦, 糸山克寿, 西田健次, 中臺一博

情報処理学会第84回全国大会 ( 5R-05 ) 2022.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Soundscape analysis using robot audition open source software HARK

山本遼, 西田健次, 糸山克寿, 中臺一博

日本生態学会大会講演要旨(Web) 69th 2022

J-GLOBAL

researchmap
転移学習を用いた音響クラス分類の検討

露口弘毅, 西田健次, 糸山克寿, 中臺一博

第22回計測自動制御学会システムインテグレーション部門講演会 (SI2021) ( 3B4-03 ) 2021.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
スポットフォーミングによる音声認識性能向上の評価

合澤隆拓, 鍵本泰宏, 西田健次, 糸山克寿, 中臺一博

第22回計測自動制御学会システムインテグレーション部門講演会 (SI2021) ( 2G4-03 ) 2021.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Detecting earthquakes: a novel deep learning-based approach for effective disaster response

Shakeel Muhammad, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

第58回人工知能学会AIチャレンジ研究会 47 - 52 2021.11

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
Haris Gulzar, Shakeel Muhammad, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

Haris Gulzar, Shakeel Muhammad, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

第58回人工知能学会AIチャレンジ研究会 29 - 34 2021.11

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイを用いたNMFによる空間音源分離法の残響下での評価

鍵本泰宏, 糸山克寿, 西田健次, 中臺一博

第58回人工知能学会AIチャレンジ研究会 22 - 28 2021.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Numerical Evaluation of 3D Sound Source Tracking Methods for Drones with Microphone Arrays

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

第39回日本ロボット学会学術講演会 (RSJ2021) ( 2D4-02 ) 2021.9

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
類似度行列を考慮した野鳥の歌自動識別の検討

山本遼, 中臺一博, 西田健次, 糸山克寿

第39回日本ロボット学会学術講演会 (RSJ2021) ( 2D4-04 ) 2021.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイの同期および3次元位置・姿勢推定の同時最適化の検討

杉山地塩, 糸山克寿, 西田健次, 中臺一博

第39回日本ロボット学会学術講演会 (RSJ2021) ( 2D4-01 ) 2021.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
アンサンブル時間周波数マスクによる音声強調手法の評価

藤田雅彦, 糸山克寿, 西田健次, 中臺一博

第39回日本ロボット学会学術講演会 (RSJ2021) ( 2D3-03 ) 2021.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
エコロケーションに基づく視覚シーンの再構成手法の提案と入力特徴量の検討

岸波華彦, 糸山克寿, 西田健次, 中臺一博

第39回日本ロボット学会学術講演会 (RSJ2021) ( 2D3-02 ) 2021.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
類似度行列による野鳥の歌識別器の検討

山本遼, 中臺一博, 糸山克寿, 西田健次, 鈴木麗璽, 松林志保

日本鳥学会2021年度大会 2021.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
アンサンブル時間周波数マスクによる音声強調手法の検討

藤田雅彦, 糸山克寿, 西田健次, 中臺一博

情報処理学会第83回全国大会 ( 7N-6 ) 2021.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイの同期および位置・姿勢推定の同時最適化の検討

杉山地塩, 糸山克寿, 西田健次, 中臺一博

情報処理学会第83回全国大会 ( 5W-7 ) 2021.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
A Multi-Access Edge Computing Solution with Distributed Sound Source Localization for IoT Networks

Haris Gulzar, Muhammad Shakeel, Kenji Nishida, Katsutoshi Itoyama, Kazuhiro Nakadai

( 1E3-04 ) 2020.12

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
バイナリマスク付き非負値行列因子分解に基づく発音時刻を用いた音源分離

日下湧太, 糸山克寿, 西田健次, 中臺一博

第57回人工知能学会 AIチャレンジ研究会 2020.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
マイクロホンアレイ搭載ドローンによる音源方向尤度統合に基づく音源追跡

山田泰基, 糸山克寿, 西田健次, 中臺一博

第57回人工知能学会 AIチャレンジ研究会 2020.11

Publishing type：Research paper, summary (national, other academic conference)

researchmap
表情による感情推定と音声による感情推定手法の検討

西田健次, 山田亨, 糸山克寿, 中臺一博

第57回人工知能学会 AIチャレンジ研究会 2020.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
テニスにおける打球音を用いた打球回転方向の識別

山本修己, 糸山克寿, 西田健次, 中臺一博

第57回人工知能学会 AIチャレンジ研究会 2020.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
マイクロホン位置と音源スペクトルの確率モデルに基づくマイクロホンアレイのキャリブレーション

段雄啓, 糸山克寿, 西田健次, 中臺一博

第57回人工知能学会 AIチャレンジ研究会 2020.11

Authorship：Corresponding author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
重み付け尤度関数と定在波を用いた可聴音による二次元環境認識

岸波華彦, 糸山克寿, 西田健次, 中臺一博

第38回日本ロボット学会学術講演会 (RSJ2020) ( 1D3-04 ) 2020.10

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
環境音情報と画像情報を用いた物体検出による音ラベル付きセグメントの生成

鈴木啓, 糸山克寿, 西田健次, 中臺一博

第38回日本ロボット学会学術講演会 (RSJ2020) ( 1D3-02 ) 2020.10

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイを用いたNMFによる空間音源分離法の提案と評価

鍵本泰宏, 糸山克寿, 西田健次, 中臺一博

第38回日本ロボット学会学術講演会 (RSJ2020) ( 1D2-04 ) 2020.10

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
伸縮スペクトルのランク最小化の緩和に基づくチャネル間同期

糸山克寿, 中臺一博

第38回日本ロボット学会学術講演会 (RSJ2020) ( 1D2-03 ) 2020.10

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
テニスの打球音による球種識別の検討

山本修己, 糸山克寿, 西田健次, 中臺一博

第38回日本ロボット学会学術講演会 (RSJ2020) ( 1D3-05 ) 2020.10

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
バイナリマスク付き非負値行列因子分解に基づく音源分離手法の発音時刻ずれに対する性能評価

日下湧太, 糸山克寿, 西田健次, 中臺一博

情報処理学会第82回全国大会 ( 5S-1 ) 361 - 362 2020.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
クラウドソーシングを用いた作成した環境音キャプションコーパスの評価

岩月道生, 糸山克寿, 西田健次, 中臺一博

情報処理学会第82回全国大会 ( 5Q-7 ) 201 - 202 2020.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイを用いた尤度分布統合による移動音源追跡

山田泰基, 糸山克寿, 西田健次, 中臺一博

情報処理学会第82回全国大会 ( 5Q-2 ) 191 - 192 2020.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
A Spatial Filter Design for Surface Sound Source Separation

Zhi Zhong, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

( 5Q-1 ) 189 - 190 2020.2

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイを用いたLDAによるスポットフォーミングの検討

鍵本泰宏, 糸山克寿, 西田健次, 中臺一博

第20回計測自動制御学会システムインテグレーション部門講演会 (SI2019) ( 2C2-16 ) 2019.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
重み付け尤度関数と定在波を用いた可聴音による距離測定

岸波華彦, 糸山克寿, 西田健次, 中臺一博

第20回計測自動制御学会システムインテグレーション部門講演会 (SI2019) ( 2C2-14 ) 2019.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音響距離計測情報を用いた透明物体の三次元構造復元法の検討

岡本悠太朗, 糸山克寿, 西田健次, 中臺一博

第20回計測自動制御学会システムインテグレーション部門講演会 (SI2019) ( 1C5-08 ) 2019.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
リハビリテーション効果推定のための感情識別器の構成と評価

西田健次, 山田亨, 藤村友美, 糸山克寿, 中臺一博

第55回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-055-8 ) 41 - 47 2019.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
視聴覚統合による動的環境下における三次元再構成の提案

紺野隆志, 西田健次, 糸山克寿, 中臺一博

第55回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-055-7 ) 33 - 40 2019.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
スペクトル伸縮モデルと複素正規分布音源モデルに基づく複数マイクロホンの同期

糸山克寿, 中臺一博

第55回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-055-5 ) 24 - 29 2019.11

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイにおける音源方向尤度に基づく三次元音源追跡

山田泰基, 糸山克寿, 西田健次, 中臺一博

第55回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-055-3 ) 12 - 17 2019.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Design of a Scan-and-sum Beamformer for Surface Sound Source Separation

Zhi Zhong, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

37th Annual Conference of the Robotics Society of Japan (RSJ2019) ( 1F3-04 ) 2019.9

Language：English Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数マイクロホンアレイを搭載した複数のUAVによる移動音源の三次元追跡手法の実収録音評価

山田泰基, 糸山克寿, 西田健次, 中臺一博

第37回日本ロボット学会学術講演会 (RSJ2019) ( 2I1-02 ) 2019.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音環境説明ロボットの実現に向けた環境音キャプションコーパスの構築

岩月道生, 周藤唯, 糸山克寿, 西田健次, 中臺一博

第37回日本ロボット学会学術講演会 (RSJ2019) ( 2I1-05 ) 2019.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数同時音源を用いたマイクロホンアレイのキャリブレーション

段雄啓, 糸山克寿, 西田健次, 中臺一博

第37回日本ロボット学会学術講演会 (RSJ2019) ( 2I2-04 ) 2019.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
バイナリマスク付き非負値行列因子分解による発音時刻を用いた音源分離手法の評価

日下湧太, 糸山克寿, 西田健次, 中臺一博

第37回日本ロボット学会学術講演会 (RSJ2019) ( 2I2-05 ) 2019.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
バイナリマスク付き非負値行列因子分解による発音時刻を用いた音源分離手法とその評価

日下湧太, 糸山克寿, 西田健次, 中臺一博

情報処理学会第124回音楽情報科学研究会 2019-MUS-124 ( 14 ) 1 - 7 2019.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Listen and Tell: Acoustic Scene Caption Generation using Deep Learning

Michio Iwatsuki, Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Third International Workshop on Symbolic-Neural Learning (SNL-2019) P-17 2019.7

Language：English Publishing type：Research paper, summary (international conference)

researchmap
Sound Source Tracking Using Multiple Microphone Arrays Mounted to an Unmanned Aerial Vehicle

Taiki Yamada, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

ICRA 2019 Workshop on Sound Source Localization and Its Applications for Robots 2019.5

Language：English Publishing type：Research paper, summary (international conference)

researchmap
マルコフ連鎖に基づくマスク付きNMFを用いた特定音源の分離

日下湧太, 糸山克寿, 西田健次, 中臺一博

情報処理学会第81回全国大会 ( 7T-1 ) 419 - 420 2019.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
マイクロホンと音源位置に関する確率モデルに基づくマイクロホンアレイのキャリブレーションの検討

段雄啓, 糸山克寿, 西田健次, 中臺一博

情報処理学会第81回全国大会 ( 4V-2 ) 553 - 554 2019.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
von Mises - Bernoulli RBMを用いた音源定位の検討

正木俊伍, 杉山治, 小島諒介, 中臺一博, 糸山克寿, 西田健次

情報処理学会第81回全国大会 ( 4V-3 ) 555 - 556 2019.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数のマイクロホンアレイを搭載した複数のUAVによる移動音源の三次元追跡

山田泰基, Daniel Gabriel, 糸山克寿, 西田健次, 中臺一博

情報処理学会第81回全国大会 ( 2M-3 ) 115 - 116 2019.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
視聴覚統合による三次元構造復元に関する検討

紺野隆志, 西田健次, 糸山克寿, 中臺一博

情報処理学会第81回全国大会 ( 5R-9 ) 207 - 208 2019.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Listen and Tell: 深層学習を用いた音響シーンのキャプション生成

岩月道生, 周藤唯, 糸山克寿, 西田健次, 中臺一博

情報処理学会第81回全国大会 ( 6T-3 ) 407 - 408 2019.2

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
柔軟索状レスキューロボットのための空気噴射音下での単チャネル音声強調

坂東宜昭, 安部祐一, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 奥乃博

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2019 2019

J-GLOBAL

researchmap
Mask U-Netを用いた環境音セグメンテーションの提案

周藤唯, 西田健次, 糸山克寿, 中臺一博

第52回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-052-5 ) 21 - 26 2018.12

researchmap
階乗隠れセミマルコフモデルに基づく音楽音響信号に対するカバー譜生成

柴田健太郎, 錦見亮, 中村栄太, 深山覚, 後藤真孝, 糸山克寿, 吉井和佳

情報処理学会第121回音楽情報科学研究会 2018-MUS-121 ( 16 ) 1 - 8 2018.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
WaveNetを用いた楽譜情報に基づく歌唱F0軌跡の生成

和田雄介, 錦見亮, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第120回音楽情報科学研究会 2018-MUS-120 ( 8 ) 1 - 7 2018.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
デモンストレーション：音楽情報処理の研究紹介XVII

糸山克寿, 飯島祥, 梅村祥之, 尾形正泰, 加藤淳, 柴田健太郎, 津島啓晃, 佃洸摂, 出口幸子, 錦見亮, 橋田光代, 濱崎雅弘, 廣瀬均, Junichi Yamagishi, 吉久怜子, 和田雄介

情報処理学会第120回音楽情報科学研究会 2018-MUS-120 ( 5 ) 1 - 5 2018.8

Authorship：Corresponding author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Development of Tough Snake Robot in ImPACT Tough Robotics Challenge

KAMEGAWA Tetsushi, MATSUNO Fumitoshi, SUZUKI Yosuke, BANDO Yoshiaki, ITOYAMA Katsutoshi, OKUNO Hiroshi G., QI Wei, SUHARA Hiroki, MATSUDA Eriko, AKIYAMA Taichi, SAKAI Satoshi, UNE Kazushi, TAKEMORI Tatsuya, FUJIWARA Tomofumi

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2018 ( 2A2-K05 ) 2A2 - K05 2018.6

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：Japan Society of Mechanical Engineers

We are developing snake robots as a solution for plant inspection. The snake robots are constructed by connecting a pitch axis and a yaw axis alternately, and they realize various locomotion modes. In particular, a helical rolling motion is used to move inside and outside of a pipe. This paper describes the design and system of the snake robots, together with the results of experiments conducted in the test field of the Tough Robotics Challenge.

DOI： 10.1299/jsmermd.2018.2a2-k05

researchmap
Inertial-Sound Based Posture Estimation for a Hose-Shaped Rescue Robot

2018 ( 2A1-M01 ) 2018.6

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人日本機械学会

Posture estimation of a hose-shaped rescue robot is crucial for handling the flexible robot body. Conventional posture estimation based on inertial sensors gradually accumulates errors due to unexpected posture changes and temperature changes. The accumulative error problem can be avoided by using a sound-based method that localizes microphones and loudspeakers on the robot by measuring time differences of arrival (TDOAs) of a reference sound. The sound-based method, however, cannot distinguish mirror-symmetric postures because the sensors are placed serially on the robot. To solve these problems, we integrate the inertial and sound measurements into a unified state-space model. The time-varying posture is estimated by using the inertial sensors, while the accumulative error is estimated and corrected by using the sound sensors. Experimental results show that our method suppresses the accumulative errors for more than 10 minutes, whereas the errors of the inertial-based method increased monotonically.

DOI： 10.1299/jsmermd.2018.2A1-M01

J-GLOBAL

researchmap
Development of Robot Audition to Extreme Environments

Hiroshi G. Okuno, Katsutoshi Itoyama, Kazuhiro Nakadai, Makoto Kumon, Yoshiaki Bando, Kotaro Hoshiba

The 62nd Annual Conference of the Institute of Systems, Control and Information Engineers (SCI’18) 62 ( 221‐1 ) 5p 2018.5

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：システム制御情報学会

CiNii Books

J-GLOBAL

researchmap
Development and Future Extension of Snake-like Robots on ImPACT TRC Project

The 62nd Annual Conference of the Institute of Systems, Control and Information Engineers (SCI’18) 62 ( 141-8 ) 4p 2018.5

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

CiNii Books

researchmap
ImPACT TRC Thin Serpentine Robot Platform: Air-floating-type Active Scope Camera Integrated with Multiple Sensory Functions

The 62nd Annual Conference of the Institute of Systems, Control and Information Engineers (SCI’18) 62 ( 141-5 ) 5p 2018.5

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

CiNii Books

CiNii Research

researchmap
WaveNetを用いた音符系列に対する歌唱F0軌跡の生成

和田雄介, 糸山克寿, 吉井和佳

情報処理学会第80回全国大会 80th ( 3N-5 ) 169 - 170 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
VAEを事前分布とするNMFを用いた音楽音響信号に対するドラム譜推定

上田舜, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会第80回全国大会 80th ( 3N-1 ) 161 - 162 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
木構造モデルに基づくコードとメロディの対話的生成システム

津島啓晃, 糸山克寿, 吉井和佳

情報処理学会第80回全国大会 80th ( 2N-2 ) 145 - 146 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
Factorial HSMMに基づく音楽音響信号に対するリード・リズムギター譜推定

柴田健太郎, 坂東宜昭, 尾島優太, 錦見亮, 糸山克寿, 吉井和佳

情報処理学会第80回全国大会 80th ( 3N-2 ) 163 - 164 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
VAEを用いたメロディのモーフィング

村田叡, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会第80回全国大会 2018 ( 3N-8 ) 175 - 176 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

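This paper describes a method for morphing between different melodies while taking musical validity into account. Conventional morphing methods based on the Generative Theory of Tonal Music (GTTM) could morph only between melodies whose tree structures, which are used to reduce a melody hierarchically, are similar. We propose a method that uses a variational autoencoder (VAE), a latent variable model, to learn a generative model of melodies from a large number of melodies in advance and then performs linear interpolation between arbitrary melodies in the latent space.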

CiNii Books

CiNii Research

researchmap
ブラインド音源分離のための高速相関テンソル分解

北村昂一, 坂東宜昭, 糸山克寿, 吉井和佳, 河原達也

電子情報通信学会音声研究会 117 ( 517 ) 235 - 240 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：電子情報通信学会

CiNii Books

CiNii Research

researchmap

Other Link： http://id.ndl.go.jp/bib/028943418
マルチチャネル非負値行列因子分解に基づくビームフォーミングを用いた雑音環境下音声認識

島田一希, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也

電子情報通信学会音声研究会 117 ( 517 ) 33 - 38 2018.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：電子情報通信学会

CiNii Books

CiNii Research

researchmap
Variational Auto Encoderを用いたメロディとコードのモーフィング

村田叡, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会全国大会講演論文集 80th ( 2 ) 2018

J-GLOBAL

researchmap
感覚機能統合型能動スコープカメラの改良と瓦礫フィールドへの適用

山内悠, 安部祐一, 永野光, 昆陽雅司, 坂東宜昭, 山崎公俊, 糸山克寿, 猿渡洋, 岡谷貴之, 奥乃博, 田所諭

第18回計測自動制御学会システムインテグレーション部門講演会 (SI2017) 18th ( 1C6‐09 ) 2017.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
Development of Cord-like Robots in ImPACT TRC Project

Matsuno F., Ashizawa R., Suzuki Y., Itoyama K., Fujuwara M., Bando Y., Takemori T., Fujita M., Kamegawa T., Tanaka M., Okuno H., Tadakuma K., Date H., Ariizumi R., Ito K., Oomichi T.

Proceedings of the Conference of Transdisciplinary Federation of Science and Technology 2017 ( C‐3‐1 ) C - 3-1 2017.12

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：横断型基幹科学技術研究団体連合（横幹連合）

In the ImPACT TRC (Tough Robotics Challenge) project, we have developed cord-like robots not only for daily maintenance and inspection of plants but also for disaster response. This paper gives an overview of the development of the cord-like robots and discusses future progress.

DOI： 10.11487/oukan.2017.0_C-3-1

J-GLOBAL

researchmap
配管内探査ロボットのための音響センサを用いた自己位置推定

坂東宜昭, 須原大貴, 亀川哲志, 糸山克寿, 吉井和佳, 松野文俊, 奥乃博

第8回横幹連合コンファレンス ( C-4-2 ) 2017.12

Authorship：Corresponding author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
[ポスター講演] 調とリズムを考慮した階層隠れセミマルコフモデルに基づく歌声の自動採譜 (情報論的学習理論と機械学習)

錦見亮, 中村栄太, 後藤真孝, 糸山克寿, 吉井和佳

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 117 ( 293 ) 147 - 153 2017.11

Language：Japanese Publisher：電子情報通信学会

researchmap
[ポスター講演] 和音系列に対するPCFGのベイズ学習とSplit-Mergeサンプリングを用いたメロディへの和声付け (情報論的学習理論と機械学習)

津島啓晃, 中村栄太, 糸山克寿, 吉井和佳

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 117 ( 293 ) 101 - 107 2017.11

Language：Japanese Publisher：電子情報通信学会

researchmap
和音系列に対するPCFGのベイズ学習とSplit-Mergeサンプリングを用いたメロディへの和声付け

津島啓晃, 中村栄太, 糸山克寿, 吉井和佳

第20回情報論的学習理論ワークショップ (IBIS2017) 117 ( 293 ) 101 - 107 2017.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
調とリズムを考慮した階層隠れセミマルコフモデルに基づく歌声の自動採譜

錦見亮, 中村栄太, 後藤真孝, 糸山克寿, 吉井和佳

第20回情報論的学習理論ワークショップ (IBIS2017) 117 ( 293 ) 147 - 153 2017.11

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音響センサを用いた配管内探査ヘビ型ロボットの3次元位置推定

坂東宜昭, 須原大貴, 亀川哲志, 糸山克寿, 吉井和佳, 松野文俊, 奥乃博

第35回日本ロボット学会学術講演会 (RSJ2017) ( 3A2-01 ) 2017.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
雑音環境下音声認識のための多チャネル非負値行列因子分解に基づく教師なしビームフォーマ

島田一希, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也

電子情報通信学会音声研究会 117 ( 189 ) 19 - 24 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：電子情報通信学会

CiNii Books

researchmap
調とリズムを考慮した階層隠れセミマルコフモデルに基づく歌声F0軌跡に対する音符推定

錦見亮, 中村栄太, 後藤真孝, 糸山克寿, 吉井和佳

情報処理学会第116回音楽情報科学研究会 2017-MUS-116 ( 17 ) 1 - 8 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
和音系列の統計的木構造解析とSplit-Mergeサンプリングに基づくメロディへの和声付け

津島啓晃, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第116回音楽情報科学研究会 2017-MUS-116 ( 14 ) 1 - 7 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
既存歌唱曲アレンジのための歌声キーボード

尾島優太, 中野倫靖, 深山覚, 加藤淳, 後藤真孝, 糸山克寿, 吉井和佳

情報処理学会第116回音楽情報科学研究会 2017-MUS-116 ( 4 ) 1 - 7 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
楽曲中の歌声とユーザ歌唱のリアルタイムアラインメントに基づく伴奏追従型カラオケシステム

和田雄介, 坂東宜昭, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第116回音楽情報科学研究会 2017-MUS-116 ( 3 ) 1 - 7 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
深層生成モデルを事前分布に用いた教師なし音声強調

坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也

電子情報通信学会音声研究会 117 ( 189 ) 1 - 6 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：電子情報通信学会

CiNii Books

researchmap
ニューラルネットワークを用いたセミブラインド音声分離・強調

和気雅弥, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也

電子情報通信学会音声研究会 117 ( 189 ) 13 - 18 2017.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
Real-Time Human-Voice Enhancement for a Hose-Shaped Rescue Robot Based on Multi-Channel Low-Rank Sparse Decomposition

Yoshiaki Bando, Yuichi Ambe, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2017 ( 1P2-P05 ) 1P2 - P05 2017.5

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：Japan Society of Mechanical Engineers

DOI： 10.1299/jsmermd.2017.1p2-p05

researchmap
Sound Source Localization and Separation and Self-Localization Using Asynchronous Distributed Microphone Arrays

Sekiguchi Kouhei, Bando Yoshiaki, Itoyama Katsutoshi, Yoshii Kazuyoshi

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2017 ( 1P2-P06 ) 1P2 - P06 2017.5

Authorship：Corresponding author Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：The Japan Society of Mechanical Engineers

This paper presents a method of simultaneous localization and mapping (SLAM) for estimating the positions of multiple sound sources and of stationary robots while synchronizing the microphone arrays attached to those robots. Since each robot with a microphone array can estimate only the directions of sound sources, the two-dimensional source positions are estimated from the source directions obtained by multiple robots. In addition, sound mixtures can be separated accurately by regarding the distributed microphone arrays as one large array. These tasks require the robot positions and the synchronization between microphone arrays. Given the observed signals, the proposed method estimates the posterior distribution of the positions and time offsets and conducts source separation simultaneously in a Bayesian manner. We conducted experiments using three robots and four sound sources. When two of the three sets of model parameters (robot positions, sound source positions, and time offsets) were fixed to their correct values, the remaining one was estimated correctly and the observed signals were separated precisely. However, when all of the parameters were estimated simultaneously, they could not be estimated correctly because the posterior distribution has many local optima.

DOI： 10.1299/jsmermd.2017.1P2-P06

researchmap
Development of active scope camera with sensory functions

Ambe Yuichi, Bando Yoshiaki, Nagano Hikaru, Konyo Masashi, Yamazaki Kimitoshi, Itoyama Katsutoshi, Saruwatari Hiroshi, Okatani Takayuki, Okuno Hiroshi G., Tadokoro Satoshi

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2017 ( 1P2-P01 ) 1P2-P01 2017.5

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：The Japan Society of Mechanical Engineers

We had developed an active scope camera: a robotic video scope that can move by itself to probe narrow gaps in rescue missions. However, to investigate the interiors of collapsed houses effectively, the robot needs not only high mobility but also sensors for vision, audition, and haptics. We have therefore developed an active scope camera with these sensory functions and demonstrated them in a test field that imitates collapsed houses. The demonstration mainly showed four functions: (1) detection of collisions and their presentation to the operator, (2) search operation assistance using processed vision data, (3) shape estimation of the robot from sound and IMU data, and (4) voice enhancement to detect victims' voices.

DOI： 10.1299/jsmermd.2017.1p2-p01

CiNii Research

researchmap
既存歌唱曲のリアルタイム歌声アレンジシステム

尾島優太, 中野倫靖, 深山覚, 加藤淳, 後藤真孝, 糸山克寿, 吉井和佳

情報処理学会第79回全国大会 ( 7L-3 ) 127 - 128 2017.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

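This paper describes a singing-voice editing system that separates the singing voice from a song, changes its pitch and timing in real time using a MIDI keyboard, and resynthesizes it. A system that can arrange the drum part of an existing song in real time already existed, but arranging a singing voice in the same way required lyric and pitch information to be prepared, which made it difficult to realize. We propose a system that, by using a singing voice separated from the music audio signal, can edit the singing voice without preparing such information in advance. The system makes it possible to add parts such as harmony parts and interjected calls to a song. Because pitch and timing are edited in real time, DJ-like performances are also possible. We conducted a subject experiment to evaluate the proposed system.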

researchmap
ロボット対話における深層学習を用いたセミブラインド音声強調

和気雅弥, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也

情報処理学会第79回全国大会 2017 ( 6M-1 ) 219 - 220 2017.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

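This paper describes a semi-blind speech enhancement method that enhances human speech in a mixture of human and robot utterances. For robot dialogue to cope with situations in which the two parties' utterances overlap, as occurs in human-to-human dialogue, the robot's own utterance must be removed and the human's utterance enhanced. The conventional method, semi-blind independent component analysis, does not take into account the nonlinear mixing processes that can occur in real environments. In this study, we therefore realize semi-blind speech enhancement using deep learning, which can represent nonlinear mixing processes. Using the proposed method, we enhanced one of the two utterances in a mixture and evaluated its effectiveness by the resulting speech recognition rate.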

CiNii Books

CiNii Research

researchmap
遠隔音声認識のためのブラインド音源分離に基づくビームフォーマ

島田一希, 坂東宜昭, 板倉光佑, 三村正人, 糸山克寿, 吉井和佳, 河原達也

情報処理学会第79回全国大会 2017 ( 6M-2 ) 221 - 222 2017.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

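This paper describes speech enhancement for distant speech recognition. Because noise in the recorded signal greatly degrades recognition performance, removing the noise and enhancing the target speech as preprocessing is indispensable for distant speech recognition. Speech enhancement methods using microphone arrays are being actively studied, and one challenge is to operate robustly without prior information such as the array layout or the number and timbre of the noise sources. Another challenge is to suppress the effect of separation distortion, which degrades recognition performance. In this study, we propose a beamformer based on spatial correlation matrices estimated by blind source separation, which addresses these challenges comprehensively. We evaluated the effectiveness of the proposed method using CHiME-4 data.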

CiNii Books

CiNii Research

researchmap
ベイズ文脈自由文法に基づく和音系列の教師なし構文解析と自動生成

津島啓晃, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第79回全国大会 ( 4L-2 ) 87 - 88 2017.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
スケールと音高の過渡的変化を考慮したHSMMに基づく歌声F0軌跡に対する音符推定

錦見亮, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第79回全国大会 ( 7L-1 ) 123 - 124 2017.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
市販音楽CDを用いたユーザ歌唱に伴奏音が自動追従するスマートカラオケシステム

和田雄介, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第79回全国大会 ( 5L-1 ) 97 - 98 2017.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
熱帯の長期環境録音データから鳥類のさえずりを検出する

藤田素子, 丸山晃央, 糸山克寿, 奥乃博, 神崎護

第64回日本生態学会大会 ( P2‐B‐094 ) 2017.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
楽譜簡略化と自動補完伴奏によるピアノ演奏練習支援システム

福田翼, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第114回音楽情報科学研究会 2017-MUS-114 ( 21 ) 1 - 4 2017.2

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
HARK2.3の紹介とタフロボティクスチャレンジへの展開

中臺一博, 坂東宜昭, 水本武志, 干場功太郎, 小島諒介, 糸山克寿, 杉山治, 公文誠, 奥乃博

第17回計測自動制御学会システムインテグレーション部門講演会 (SI2016) ( 3A3‐3 ) 2016.12

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
マルチチャネル音源分離のための低ランク音源モデルとスパース重畳過程に基づくネスト型ベイズ混合・因子モデル

板倉光佑, 坂東宜昭, 中村栄太, 糸山克寿, 吉井和佳, 河原達也

第19回情報論的学習理論ワークショップ (IBIS2016) 116 ( 300 ) 353 - 359 2016.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：電子情報通信学会

CiNii Books

CiNii Research

researchmap
音楽音響信号解析のためのディリクレ過程に基づくベイズ潜在成分分析

吉井和佳, 中村栄太, 糸山克寿, 後藤真孝

第19回情報論的学習理論ワークショップ (IBIS2016) 116 ( 300 ) 155 - 162 2016.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音楽音響信号に対する多重音高推定と和音構造学習のための階層ベイズ音響・言語統合モデル

尾島優太, 中村栄太, 糸山克寿, 吉井和佳

第19回情報論的学習理論ワークショップ (IBIS2016) 116 ( 300 ) 329 - 335 2016.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
歌声F0軌跡に対する自動採譜のための準ビート同期セグメンタルHMM

錦見亮, 中村栄太, 糸山克寿, 吉井和佳

第19回情報論的学習理論ワークショップ (IBIS2016) 116 ( 300 ) 337 - 343 2016.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
変分ベイズ多チャネルRNMFに基づく柔軟索状レスキューロボットのための音声強調

坂東宜昭, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 吉井和佳, 奥乃博

第34回日本ロボット学会学術講演会 (RSJ2016) 34th ( 1C2‐04 ) 2016.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
音声 ; オーガナイズドセッション「あらゆる音を対象とした情報処理の実現に向けて」

116 ( 189 ) 47 - 52 2016.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

CiNii Books

CiNii Research

researchmap
マルチチャネル音源分離のためのネスト型基底・音源混合モデルに基づく時間周波数クラスタリング

板倉光佑, 坂東宜昭, 中村栄太, 糸山克寿, 吉井和佳, 河原達也

電子情報通信学会音声研究会 116 ( 189 ) 25 - 28 2016.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：電子情報通信学会

CiNii Books

researchmap
音型の反復と変形に基づく階層ベイズ音楽言語モデルとMIDI演奏のリズム採譜への応用

中村栄太, 糸山克寿, 吉井和佳

情報処理学会第112回音楽情報科学研究会 2016‐MUS‐112 ( 22 ) 1 - 6 2016.7

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
調・コード・音高・スペクトログラムの階層ベイズモデルに基づく多重音解析

尾島優太, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第112回音楽情報科学研究会 2016‐MUS‐112 ( 6 ) 1 - 8 2016.7

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
NMF vs PLCA: 多重音生成過程のための無限因子モデルと無限混合モデル

吉井和佳, 中村栄太, 糸山克寿, 後藤真孝

情報処理学会第112回音楽情報科学研究会 2016‐MUS‐112 ( 21 ) 1 - 10 2016.7

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
視聴覚統合ビートトラッキングとリアルタイムコード認識を用いたダンス共演ロボット

大喜多美里, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会第112回音楽情報科学研究会 2016‐MUS‐112 ( 15 ) 1 - 6 2016.7

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
歌声F0軌跡に対する音符推定のためのベイジアン準ビート同期HMM

錦見亮, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第112回音楽情報科学研究会 2016-MUS-112 ( 7 ) 1 - 7 2016.7

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
3D Posture Estimation for a Hose-shaped Rescue Robot using a Microphone and Accelerometer Array

Bando Yoshiaki, Itoyama Katsutoshi, Konyo Masashi, Tadokoro Satoshi, Nakadai Kazuhiro, Yoshii Kazuyoshi, G. Okuno Hiroshi

2016 ( 1A2-10a6 ) 2016.6

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人日本機械学会

This paper presents an online method that estimates the 3D posture of a hose-shaped rescue robot using a microphone and accelerometer array. Posture (shape) estimation of a self-driving hose-shaped rescue robot is crucial for handling the robot body because the robot's posture, which the operator cannot see, deforms in narrow spaces under collapsed buildings. The conventional sound-based method, which uses time differences of arrival (TDOAs), works only on a two-dimensional surface and is often hampered by rubble around the robot. Our method eliminates outliers in the sound-based TDOA measurements and compensates for the lack of posture information with tilt information measured by accelerometers. Experimental results using a 3-m hose-shaped robot deployed in a simple 3D structure demonstrate that our method reduces the errors of the initial states to about 20 cm in 3D space.

DOI： 10.1299/jsmermd.2016.1A2-10a6

J-GLOBAL

researchmap
Development of Robot Audition under Severe Conditions

Hiroshi G. Okuno, Kazuhiro Nakadai, Makoto Kumon, Katsutoshi Itoyama, Kazuyoshi Yoshii, Yoshiaki Bando, Yoko Sasaki

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2016 ( 1A2-09b3 ) 1A2 - 09b3 2016.6

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：The Japan Society of Mechanical Engineers

The ability of robots to listen to several things at once with their own "ears", i.e., robot audition, is critical for improving the performance of search and rescue activities under severe conditions. This paper introduces the open-source robot audition software "HARK" and its ability to suppress ego-noise caused by the robot's own movements, such as motor, propeller, and/or flying noise. It then describes three main applications of robot audition: 1) An unmanned aerial vehicle (UAV) that captures sounds with a microphone array can localize a sound source by suppressing ego-noise while hovering, gliding slowly, or gliding fast. It can also recognize a sound source using a CNN. 2) A serpentine robot with a microphone array can estimate its posture by sound. It can also enhance a voice using Online Robust PCA. 3) A robot with a LiDAR and a 32-channel microphone can visualize a sound map by superimposing sound source directions on point clouds.

DOI： 10.1299/jsmermd.2016.1A2-09b3

researchmap
Online Localization of Multiple Sound Sources and Multiple Robots with Asynchronous Microphone Arrays

Sekiguchi Kouhei, Bando Yoshiaki, Nakamura Keisuke, Nakadai Kazuhiro, Itoyama Katsutoshi, Yoshii Kazuyoshi

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2016 ( 1A2-09b5 ) 1A2-09b5 2016.6

　More details

Authorship：Corresponding author Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：The Japan Society of Mechanical Engineers

This paper presents an online method for localizing multiple sound sources and stationary robots and for synchronizing the microphone arrays attached to those robots. Since each robot can estimate only the directions of sound sources, two-dimensional source positions can be estimated by triangulation from the source directions estimated by each robot. In addition, mixture signals can be separated accurately by regarding the multiple microphone arrays as one large array. Several methods have been proposed for localizing and synchronizing microphone arrays in order to perform these tasks. These methods, however, assume that only a single sound source exists. To overcome this limitation, we estimate the directions of arrival (DOAs) and separate the observed signals to estimate the time differences of arrival (TDOAs) using microphone array techniques, and integrate the DOAs and TDOAs using a state-space model. The latent variables are estimated online with the FastSLAM 2.0 algorithm.

DOI： 10.1299/jsmermd.2016.1A2-09b5

CiNii Research

researchmap
「音学シンポジウム2016」開催にあたって

北原鉄朗, 齋藤大輔, 森勢将雅, 深山覚, 糸山克寿, 滝口哲也, 饗庭絵里子, 堀内俊治, 寺島裕貴, 亀岡弘和, 大石康智, 程島奈緒, 向井智彦, 小幡哲史

情報処理学会第111回音楽情報科学研究会 (音学シンポジウム2016) 2016-MUS-111 ( 1 ) 1 - 2 2016.5

　More details

Language：Japanese Publishing type：Lecture material (seminar, tutorial, course, lecture, etc.)

researchmap
音源スペクトログラムの低ランク性とスパース性を考慮したNMF-LDAに基づくマルチチャネル音源定位と音源分離

板倉光佑, 坂東宜昭, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第78回全国大会 2016 ( 4Q-3 ) 485 - 486 2016.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

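This paper describes a method that simultaneously considers the low-rankness and sparsity of mixture spectrograms in multichannel sound source localization and separation using a microphone array. Conventional LDA-based methods focus on the sparsity of source spectrograms and cluster the spatial correlation matrix at each time-frequency point of the observed spectrogram into one of the sources and directions. In this study, we additionally focus on the low-rankness of source spectrograms and propose a unified Bayesian model that approximates the observed spectrogram with a low-rank model using NMF while simultaneously clustering each time-frequency component into a source and direction.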

CiNii Books

CiNii Research

researchmap
コード進行と多重音スペクトルの階層ベイズモデルに基づく音楽音響信号の音高推定

尾島優太, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第78回全国大会 ( 3Q-6 ) 475 - 476 2016.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
ビート準同期隠れマルコフモデルに基づく歌声音高軌跡に対する音符推定

錦見亮, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第78回全国大会 ( 3Q-5 ) 473 - 474 2016.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音源到来方向・時間差を用いた非同期複数マイクロホンアレイ位置のオンライン推定

関口航平, 中村圭佑, 坂東宜昭, 糸山克寿, 吉井和佳, 中臺一博

情報処理学会第78回全国大会 2016 ( 4Q-2 ) 483 - 484 2016.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

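This paper describes a method for estimating the synchronization offsets and positions of multiple asynchronous microphone arrays. Auditory scene recognition techniques such as sound source localization and separation that use multiple robots equipped with microphone arrays can achieve higher accuracy than when a single robot is used. However, microphone array signal processing with multiple robots requires estimating each robot's position and the synchronization offsets between the microphone arrays. In this paper, we estimate the synchronization offsets and positions of multiple asynchronous microphone arrays from the sound source localization results and phase information estimated separately for each array. We design a state-space model whose latent variables are the positions of the robots and sound sources and the synchronization offsets, and estimate its posterior distribution online.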

CiNii Books

CiNii Research

researchmap
マイクロホンアレイ音源分離のための複素t分布に基づくマルチチャネル非負値行列因子分解

北村昂一, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会第78回全国大会 2016 ( 4Q-1 ) 481 - 482 2016.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

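This paper describes multichannel nonnegative matrix factorization (NMF) that uses the complex t distribution as its likelihood function, for separating the sources in a mixture recorded with a microphone array. Multichannel NMF is a blind source separation method that, by assuming the sources are low-rank, removes the need to measure transfer functions. It has recently been reported that, in single-channel NMF, using a complex t distribution instead of a complex normal distribution as the likelihood function yields source separation that is less dependent on initial values and robust to outliers. In this study, we propose a method that extends single-channel NMF based on the complex t distribution to multichannel NMF.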

CiNii Books

CiNii Research

researchmap
ビート位置依存隠れセミマルコフモデルに基づく音楽音響信号に対するコード認識

丸尾智志, 前澤陽, 中村栄太, 糸山克寿, 吉井和佳

情報処理学会第78回全国大会 ( 3Q-2 ) 467 - 468 2016.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
A HUMANOID ROBOT THAT CAN SING AND DANCE TO MUSIC BY RECOGNIZING BEATS AND CHORDS IN REAL TIME

ELCAS Journal 1 97 - 100 2016.3

　More details

Language：Japanese

researchmap
日本語方言における音声対訳コーパスの構築

吉野幸一郎, 平山直樹, 森信介, 高橋文彦, 糸山克寿, 奥乃博

言語処理学会第22回年次大会 (NLP2016) ( B5‐2 ) 2016.2

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
複数移動ロボットによる協調音源分離のための分離精度予測を用いた配置最適化

関口航平, 坂東宜昭, 糸山克寿, 吉井和佳

第43回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-043-08 ) 41 - 46 2015.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音楽音響信号解析のためのステューデントt分布に基づく非負値行列分解と半正定値テンソル分解

吉井和佳, 糸山克寿, 後藤真孝

第18回情報論的学習理論ワークショップ (IBIS2015) 115 ( 323 ) 131 - 138 2015.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
複数移動ロボットを用いた音源分離における音源配置に応じたロボットの最適配置探索

関口航平, 坂東宜昭, 糸山克寿, 吉井和佳

第33回日本ロボット学会学術講演会 (RSJ2015) ( 3D1-06 ) 2015.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
ロバスト主成分分析を用いた動作雑音抑圧に基づく柔軟索状ロボットのための音声強調

坂東宜昭, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 吉井和佳, 奥乃博

第33回日本ロボット学会学術講演会 (RSJ2015) 33rd ( 2D2-05 ) 2015.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
音源分離のためのベイズモデルに基づく音源信号の不確実性を考慮した音声認識

板倉光佑, 坂東宜昭, 糸山克寿, 吉井和佳

日本音響学会 2015年秋季研究発表会 ( 3-2-3 ) 2015.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音楽音響信号に対する歌声・伴奏音・打楽器音分離に基づくコード認識

丸尾智志, 池宮由楽, 糸山克寿, 吉井和佳

情報処理学会第108回音楽情報科学研究会 2015-MUS-108 ( 1 ) 1 - 6 2015.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
非ガウス性モノラル音響信号に対する音源分離のための非負値行列分解と半正定値テンソル分解

吉井和佳, 糸山克寿, 後藤真孝

情報処理学会第108回音楽情報科学研究会 2015-MUS-108 ( 2 ) 1 - 9 2015.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
両耳聴ロボット聴覚ソフトウェアHARK‐BinauralとRaspberry Pi2を用いたヒューマノイドロボットへの適用

坂東宜昭, 金宜鉉, 糸山克寿, 吉井和佳, 中臺一博, 奥乃博

情報処理学会第107回音楽情報科学研究会 (音学シンポジウム2015) 2015-MUS-107 ( 33 ) 1 - 2 2015.5

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
モノラル音楽音響信号を対象としたRPCAと音高推定に基づく歌声・伴奏分離

池宮由楽, 糸山克寿, 吉井和佳

情報処理学会第107回音楽情報科学研究会 (音学シンポジウム2015) 2015-MUS-107 ( 57 ) 1 - 3 2015.5

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
視聴覚統合NMFによるカエル合唱音声の分析

糸山克寿, 坂東宜昭, 粟野皓光, 合原一究, 吉井和佳

情報処理学会第107回音楽情報科学研究会 (音学シンポジウム2015) 2015-MUS-107 ( 55 ) 1 - 6 2015.5

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
音楽音響信号に対する相補的な歌声分離と音高推定

池宮由楽, 糸山克寿, 吉井和佳

情報処理学会第77回全国大会 ( 5S-1 ) 417 - 418 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
早言いクイズ司会者ロボットの開発と評価

西牟田勇哉, 糸山克寿, 吉井和佳, 奥乃博

情報処理学会第77回全国大会 2015 ( 5T-6 ) 509 - 510 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

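This paper describes the development and evaluation of a robot that can host a "fast-answer" quiz in which multiple players compete. In a fast-answer quiz, players do not need to give a prior signal such as pressing a button; they are allowed natural spoken interaction in which they directly utter an answer as soon as they think of it. The robot therefore has to handle barge-in answers while it is reading a question and simultaneous answers from multiple players. In this study, we developed a robot that manages the progress of the quiz interaction based on sound source localization and separation using a microphone array. Through subject experiments, we compared human and robot auditory abilities and evaluated impressions, and confirmed the usefulness of the proposed robot.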

CiNii Books

J-GLOBAL

researchmap
聴覚アウェアネスの可視化のための深度センサとマイクロフォンアレイを用いた物体認識と音イベント検出

井山貴裕, 杉山治, 坂東宜昭, 糸山克寿, 吉井和佳, 奥乃博

情報処理学会第77回全国大会 2015 ( 2ZB-7 ) 379 - 380 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

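This paper describes object recognition and sound event detection using a depth sensor and a microphone array for visualizing auditory awareness. Conventional visualization methods present all acoustic information to the user superimposed on the camera image, which makes detailed observation of the sound environment within the screen difficult. To solve this problem, we propose a method that applies object recognition to the sound source shape data obtained by the depth sensor and sound event detection to the sound pressure data obtained by the microphone array, so that the user can focus only on objects of interest on the screen and observe how the sounds they produce change over time (visualization of auditory awareness). Experimental results confirmed the effectiveness of the proposed method.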

CiNii Books

J-GLOBAL

researchmap
プログラミング基礎教育のための図形言語の3D拡張

古川孝太郎, 糸山克寿, 吉井和佳, 奥乃博

情報処理学会第77回全国大会 2015 ( 3ZF-5 ) 947 - 948 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

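This paper describes a system that helps beginners understand the basics of programming by constructing 3D shapes in Scheme. Learning abstraction by analogy between the structure of a figure and that of the program that draws it has traditionally been done with the picture language introduced in SICP, a standard programming textbook, but the objects that could be constructed were limited to images of 2D figures. Following the framework of the picture language and incorporating the ideas of CSG, we propose a 3D picture language system that draws 3D shapes constructively and outputs models in a format that can be fabricated on a 3D printer. We used the system as supplementary teaching material in a lecture, had the students create shapes, and gave them feedback together with the fabricated models; the response was favorable, confirming the system's effectiveness.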

CiNii Books

J-GLOBAL

researchmap
ユーザの技術に合わせた自動編曲機能をもつピアノ演奏練習システム

福田翼, 池宮由楽, 糸山克寿, 吉井和佳

情報処理学会第77回全国大会 ( 4S-2 ) 403 - 404 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
歌声・伴奏音・打楽器音分離に基づく音楽演奏支援システム

土橋彩香, 池宮由楽, 糸山克寿, 吉井和佳

情報処理学会第77回全国大会 ( 4S-1 ) 401 - 402 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
混合音に対する音源分離の不確実性を考慮した同時発話音声認識

板倉光佑, 西牟田勇哉, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会第77回全国大会 2015 ( 5P-2 ) 117 - 118 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

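This paper describes a method for recognizing simultaneous speech without committing to a single source separation result for a mixture containing multiple utterances. When several people speak to them at once, humans do not reconstruct each single-speaker speech signal in the brain, yet they can pick out words directly. Conventional simultaneous speech recognition systems perform source separation and then speech recognition as an independent step, which limits recognition accuracy. To solve this problem, we propose a method that treats the uncertainty of the separated speech probabilistically and marginalizes out the separated speech in a Bayesian manner, so that the mixture can be recognized directly. Experimental results confirmed that the proposed method improves the recognition rate.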

CiNii Books

CiNii Research

researchmap
柔軟索状レスキューロボットのためのロバスト主成分分析を用いた走行雑音抑圧

坂東宜昭, 池宮由楽, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 吉井和佳, 奥乃博

情報処理学会第77回全国大会 77th ( 5T-4 ) 505 - 506 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
ダンス共演ロボットのためのマルチモーダルビートトラッキング

大喜多美里, 坂東宜昭, 池宮由楽, 糸山克寿, 吉井和佳

情報処理学会第77回全国大会 2015 ( 5S-5 ) 425 - 426 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

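This paper describes a multimodal beat tracking method for a dance partner robot. A dance partner robot is an entertainment robot that dances with a human while recognizing the music and the human's movements. For this kind of robot, it is important to estimate the tempo and beat times of the music in real time, but conventional methods that use only the audio signal fail to follow diverse rhythms, including tempo fluctuations and off-beats. To solve this problem, we propose a method that performs beat tracking while simultaneously considering the dancer's skeleton time-series data in addition to the audio signal. Experiments using real sensor data confirmed the effectiveness of the proposed method.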

CiNii Books

CiNii Research

researchmap
コード制約付きNMFを用いた音高推定に基づくコード認識

丸尾智志, 吉井和佳, 糸山克寿, Matthias Mauch, 後藤真孝

情報処理学会第77回全国大会 ( 5S-3 ) 421 - 422 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
分散型マイクロホンアレイを用いた音源分離のための複数移動ロボットの配置最適化

関口航平, 坂東宜昭, 糸山克寿, 吉井和佳

情報処理学会第77回全国大会 ( 4T-7 ) 497 - 498 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap
タイ熱帯林における鳥類の自動音声認識による多様性調査法の開発

丸山晃央, 藤田素子, 奥乃博, 糸山克寿, PRATUMTHONG Dome, ARTCHAWACOM Taksin, 神崎護

第62回日本生態学会大会 ( D1-18 ) 2015.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
マイクロホンアレイとスピーカをもつ柔軟索状ロボットのための動的スピーカ選択による姿勢推定の高速化

坂東宜昭, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 吉井和佳, 奥乃博

第41回人工知能学会 AIチャレンジ研究会 41st ( SIG-ChallengeB402-08 ) 45 - 50 2014.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
深度センサとマイクロフォンアレイを用いた聴覚アウェアネスの提示

井山貴裕, 杉山治, 坂東宜昭, 糸山克寿, 吉井和佳, 奥乃博

第41回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-B402-04 ) 20 - 25 2014.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
音楽音響信号解析のためのガンマ過程に基づく無限重畳離散全極モデル

吉井和佳, 糸山克寿, 後藤真孝

第17回情報論的学習理論ワークショップ (IBIS2014) 114 ( 360 ) 191 - 198 2014.11

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

researchmap

Other Link： http://sap.ist.i.kyoto-u.ac.jp/members/yoshii/papers/ibis-2014-yoshii.pdf
Schemeによる3D図形の構成的制作

古川孝太郎, 坂東宜昭, 糸山克寿

日本ソフトウェア科学会大会論文集 31 101 - 108 2014.9

　More details

Language：Japanese Publisher：[日本ソフトウェア科学会]

CiNii Books

researchmap
「早言い」合図を識別しインタラクションに活用するロボットクイズ司会者

西牟田勇哉, 吉井和佳, 西出俊, 糸山克寿, 奥乃博

第32回日本ロボット学会学術講演会 (RSJ2014) ( 1I2-05 ) 2014.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
聴覚アウェアネス可視化モデルに基づくジェスチャ操作インタフェースの開発

井山貴裕, 杉山治, 坂東宜昭, 糸山克寿, 吉井和佳, 奥乃博

第32回日本ロボット学会学術講演会 (RSJ2014) ( 1I2-04 ) 2014.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
マイクロホンアレイを用いた駆動機構付ホース型ロボットの姿勢推定

坂東宜昭, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 吉井和佳, 奥乃博

第32回日本ロボット学会学術講演会 (RSJ2014) 32nd ( 1I2-02 ) 2014.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
Schemeによる3D図形の構成的制作

古川孝太郎, 坂東宜昭, 糸山克寿, 吉井和佳, 奥乃博

日本ソフトウエア科学会第31回大会 ( 一般2-3 ) 2014.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
結合動的モデルに基づく音響信号アライメント

前澤陽, 糸山克寿, 吉井和佳, 奥乃博, 河原達也

情報処理学会第104回音楽情報科学研究会 2014-MUS-104 ( 13 ) 1 - 7 2014.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

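This paper describes a method (audio signal alignment) that maps the time axis of each audio signal to positions within the piece, in order to support comparison of multiple audio recordings of the same piece performed by different performers. The usefulness of models of tempo dynamics has been pointed out in performance analysis, but typical audio alignment methods have no mechanism for estimating tempo and could not make use of tempo information. In this study, to model tempo dynamics indirectly, we model the ratios between the instantaneous tempi at which the audio signals are played at each position in the piece. Specifically, by assuming that the instantaneous tempo ratio is continuous and that its changes are correlated across the audio signals, we simultaneously model the continuity of the tempo trajectories and the similarity between performers. We formulate this probabilistically by treating the covariance matrix underlying the changes as a Markov sequence composed of a small number of representative covariance matrices. This makes it possible to learn simultaneously where characteristic tempo ratios that recur throughout the piece occur and how they vary, which also yields information useful for performance analysis. Evaluation experiments showed that alignment accuracy improves and suggested the method's usefulness for analyzing differences in interpretation.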

CiNii Books

J-GLOBAL

researchmap
多重音基本周波数解析のための無限重畳離散全極型モデル

吉井和佳, 糸山克寿, 後藤真孝

情報処理学会第104回音楽情報科学研究会 2014-MUS-104 ( 9 ) 1 - 8 2014.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

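This paper describes a probabilistic source-filter model for decomposing music audio signals, in which many instrument sounds overlap, into the three elements of sound: pitch (fundamental frequency), timbre (spectral envelope), and volume. Source-filter theory is widely used in instrument sound analysis: the Fourier spectrum of an instrument sound is decomposed into the product of the spectral fine structure arising from the fundamental frequency of the source signal and the spectral envelope representing the instrument's timbre. If the spectral envelope is assumed to be representable by an all-pole model, it can in theory be estimated in the linear frequency domain using linear prediction analysis (LPC). In practice, however, only the peaks of the harmonic structure can be regarded as reliable samples from the spectral envelope, so using the entire frequency band for spectral envelope estimation is not appropriate. The discrete all-pole model is known as one solution to this problem, but it could not be applied to polyphonic sounds. In this study, by incorporating the discrete all-pole model into the framework of the composite autoregressive model, a polyphonic extension of LPC, we propose an infinite superimposed discrete all-pole model that can handle audio signals in which multiple harmonic structures overlap. The model is a nonparametric Bayesian model formulated in the log-frequency domain, which matches human auditory characteristics, and it can estimate an appropriate number of spectral envelopes and an appropriate number of harmonic structures sampled from them. Experimental results confirmed the effectiveness of the proposed method.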

CiNii Books

researchmap
混合音中の歌声F0軌跡に対する歌唱表現転写システム

池宮由楽, 糸山克寿, 吉井和佳, 奥乃博

情報処理学会第104回音楽情報科学研究会 2014-MUS-104 ( 23 ) 1 - 6 2014.8

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

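This paper proposes a system that can transfer singing expressions (vibrato, glissando, and kobushi) onto the fundamental frequency (F0) trajectory of the singing voice contained in a music audio signal. Active music listening interfaces are a research approach that aims to realize interactive music listening for end users. This includes support for modifying existing songs; for singing voices, research has been done on voice quality conversion, singing voice separation, and related topics. In this study, we address the modification of singing style, and in particular realize an interface for freely editing the F0 trajectory of the singing voice in a polyphonic mixture. The user specifies any part of the singing voice and transfers a preferred singing expression onto it, and can thereby modify the singing style freely. In addition, a database of professional singers' singing expressions is built in advance from commercial songs, and by referring to expressions in this database the user can perform the transfer intuitively. The transfer is carried out by selectively shifting only the spectrum of the singing voice along the log-frequency axis, manipulating the pitch of the singing voice while suppressing the effect on the accompaniment. The timbre is corrected using the spectral envelope in order to preserve the phonemic content. We are building a graphical user interface (GUI) through which the user can specify where to transfer an expression and indicate the range in which the F0 lies. In experiments, we confirmed the effectiveness of the timbre correction and the robustness of F0 estimation using user input.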

CiNii Books

J-GLOBAL

researchmap
新博士によるパネルディスカッションIV「新博士さんいらっしゃい！」

竹川佳成, 平田圭二, 糸山克寿, 大石康智, 橘秀幸, 寺澤洋子, 土井啓成, 平野砂峰旅, 深山覚, 松原正樹

情報処理学会第104回音楽情報科学研究会 2014-MUS-104 ( 12 ) 1 - 5 2014.8

　More details

Language：Japanese Publishing type：Lecture material (seminar, tutorial, course, lecture, etc.) Publisher：一般社団法人情報処理学会

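The "Panel Discussion by New Ph.D.s" brings together people who have worked on music information science research and have just received their doctorates, and discusses in panel format their research, their motivation for entering a doctoral program, the drama they experienced during the program, their future aspirations, and so on. This paper introduces the eight new Ph.D.s taking part as panelists this time.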

CiNii Books

researchmap
HARKによって定位・分離された多方向音声のアノテーションツールの開発

杉山治, 糸山克寿, 中臺一博, 奥乃博

電子情報通信学会クラウドネットワークロボット研究会 114 ( 85 ) 23 - 26 2014.6

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
市販楽曲からの歌い方ライブラリの作成(ポスターセッション,音学シンポジウム2014)

池宮由楽, 糸山克寿, 奥乃博

電子情報通信学会技術研究報告. SP, 音声 114 ( 52 ) 243 - 244 2014.5

　More details

Language：Japanese Publisher：一般社団法人電子情報通信学会

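This paper describes a method for building a library of singers' singing styles by extracting, as singing expressions, features related to singing style such as vibrato, kobushi, and glissando from commercial songs. These features appear as characteristic fluctuations in the singing F0 trajectory. The method first estimates the singing F0 with high frequency resolution and high accuracy by formulating an optimal path search problem in the time-frequency domain. Each singing expression is then identified in the estimated F0 trajectory by pattern matching and represented by parameters. Experiments confirmed that professional singers' singing expressions can in fact be extracted from commercial songs.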

CiNii Books

researchmap
市販楽曲からの歌い方ライブラリの作成

池宮由楽, 糸山克寿, 吉井和佳, 奥乃博

情報処理学会第103回音楽情報科学研究会 (音学シンポジウム2014) 2014-MUS-103 ( 48 ) 1 - 2 2014.5

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：http://id.nii.ac.jp/1001/00101343/

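This paper describes a method for building a library of singers' singing styles by extracting, as singing expressions, features related to singing style such as vibrato, kobushi, and glissando from commercial songs. These features appear as characteristic fluctuations in the singing F0 trajectory. The method first estimates the singing F0 with high frequency resolution and high accuracy by formulating an optimal path search problem in the time-frequency domain. Each singing expression is then identified in the estimated F0 trajectory by pattern matching and represented by parameters. Experiments confirmed that professional singers' singing expressions can in fact be extracted from commercial songs.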

CiNii Books

researchmap
潜在共通構造モデルに基づく音響信号間アライメント

前澤陽, 糸山克寿, 吉井和佳, 奥乃博

情報処理学会第103回音楽情報科学研究会 (音学シンポジウム2014) 2014-MUS-103 ( 23 ) 1 - 6 2014.5

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

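This paper proposes a probabilistic model for time-axis alignment of multiple audio signals of the same piece (audio-to-audio alignment). Considering applications in which performance analysis is based on the alignment result, we believe it is important to distinguish between the latent common structure underlying the performances and the temporal fluctuations specific to each performance. Conventionally, the mainstream approach has been to find corresponding points based on surface-level acoustic similarity using dynamic time warping (DTW) or left-to-right hidden Markov models (LRHMMs). In contrast, we consider an upper-level HMM that generates a state sequence common to the performances and, for each performance, an independent lower-level LRHMM that transitions through states in the order determined by the upper-level HMM, and integrate them probabilistically as a hierarchical HMM. In the upper-level HMM, acoustic features that recur in the piece are assigned to the same state, so the musical structure of the piece itself can be analyzed easily. In the lower-level LRHMMs, the temporal fluctuations specific to each performance can be examined by looking at the dwell time in each state. Experimental results showed that the proposed method outperforms conventional methods in audio-to-audio alignment accuracy.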

CiNii Books

researchmap
深度センサとマイクロホンアレイを用いた音源位置可視化による聴覚アウェアネスの提示

井山貴裕, 杉山治, 大塚琢馬, 糸山克寿, 奥乃博

情報処理学会第76回全国大会 2014 ( 6S-5 ) 489 - 490 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

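This paper describes a method for presenting auditory awareness through sound source visualization in environments where multiple sound sources can be present simultaneously. Conventional sound source visualization displays all sources detected from the observed mixture without distinction, so the result becomes cluttered. To reduce this clutter, the sources need to be filtered so that those the user needs are presented selectively. We describe an auditory awareness visualization method for multi-source environments that changes the visualization according to source position. Source positions are estimated by combining direction-of-arrival estimation with a microphone array and estimation of the distance to objects with a depth sensor. We implemented the method with a depth sensor and a microphone array and confirmed its effectiveness.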

CiNii Books

J-GLOBAL

researchmap
環境音に頑健な同時合図を識別するクイズ司会者の構築

西牟田勇哉, 平山直樹, 大塚琢馬, 杉山治, 糸山克寿, 奥乃博

情報処理学会第76回全国大会 2014 ( 5S-1 ) 461 - 462 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

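In recent years, robots that coexist and communicate with people in real environments have been anticipated, but conventional spoken dialogue systems have been limited to use by one person in an ideal environment. For a robot to coexist and communicate with people in a real environment, it needs auditory scene understanding, such as locating multiple speakers and separating simultaneous utterances using microphones mounted on the robot itself, as well as speech recognition that is robust to environmental sounds in real environments. In this study, we built a dialogue system that achieves auditory scene understanding with the robot audition software HARK and achieves speech recognition robust to environmental sounds by suppressing misrecognition through language model switching and rejecting noise with a syllable typewriter.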

CiNii Books

J-GLOBAL

researchmap
音響特徴量を用いた楽曲印象分布の推定

絵本詩織, 糸山克寿, 奥乃博

情報処理学会第76回全国大会 2014 ( 6R-8 ) 391 - 392 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

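This paper describes a method for estimating the distribution of a song's impression from its audio signal. We learn the relationship between acoustic features extracted from audio signals and song impressions obtained through subject experiments. A song's impression is represented by coordinates on the V-A plane based on Russell's circumplex model. Because impressions of a song vary across individuals and as phrases change, we estimate the impression of an unknown song as a distribution with variance rather than as a single coordinate.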

CiNii Books

J-GLOBAL

researchmap
マイクロホンアレイの位置推定によるホース型ロボットの姿勢推定

坂東宜昭, 大塚琢馬, 糸山克寿, 昆陽雅司, 田所諭, 中臺一博, 奥乃博

情報処理学会第76回全国大会 76th ( 4V-1 ) 189 - 190 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

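A hose-shaped robot is a rescue robot characterized by its long, thin shape, which can explore places such as gaps in collapsed buildings. Posture estimation methods for this robot using accelerometers, camera images, and the like have been proposed to make operation more efficient, but they suffer from problems such as accumulated error. This paper describes a method that attaches a microphone array and small loudspeakers to the robot and estimates its posture by estimating their positions with sound. The method estimates the posture using the time differences of arrival at each microphone of test sounds emitted from the loudspeakers; because the time differences of arrival reflect the current positional relationship between the microphones and loudspeakers, past errors can be corrected. We evaluated the effectiveness of the method using real recorded data.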

J-GLOBAL

researchmap
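
The abstract of the entry above (マイクロホンアレイの位置推定によるホース型ロボットの姿勢推定) relies on time differences of arrival (TDOAs) of a test sound between microphones. The following is only a minimal sketch of that geometric relationship, not the authors' estimator: the speed of sound, the function names `tdoa` and `tdoa_error`, and the use of microphone 0 as the reference are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def tdoa(speaker, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Arrival time at mic_a minus arrival time at mic_b, in seconds."""
    return (math.dist(speaker, mic_a) - math.dist(speaker, mic_b)) / c


def tdoa_error(candidate_mics, speaker, observed, c=SPEED_OF_SOUND):
    """Sum of squared differences between predicted and observed TDOAs.

    TDOAs are taken relative to microphone 0 (an illustrative choice).
    A posture estimator could search for the microphone positions that
    make this error small; that search is not shown here.
    """
    ref = candidate_mics[0]
    predicted = [tdoa(speaker, m, ref, c) for m in candidate_mics[1:]]
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical layout: one loudspeaker and three microphones (metres).
speaker = (0.0, 0.0, 0.0)
mics = [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
observed = [tdoa(speaker, m, mics[0]) for m in mics[1:]]
print(tdoa_error(mics, speaker, observed))
```

Because each TDOA depends only on the current microphone and loudspeaker positions, an error of this kind does not accumulate over time, which is the property the abstract contrasts with accelerometer-based estimation.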
混合方言言語モデルと混合比推定による方言音声認識システム

平山直樹, 吉野幸一郎, 糸山克寿, 森信介, 奥乃博

情報処理学会第76回全国大会 2014 ( 4S-6 ) 451 - 452 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

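This paper develops a speech recognition system for everyday speech in which multiple dialects are mixed. In everyday speech, not only the dialect of the place of residence but also dialects of various regions are mixed, owing to the backgrounds of the speaker and the speaker's parents and the influence of transportation and media. Previous dialect speech recognition has targeted single dialects and has not captured the notion of dialect mixing. In this paper, we treat the dialect of an input utterance as a mixture of several dialects and perform speech recognition with a mixed-dialect language model built as a weighted average of single-dialect language models. The mixing ratio of the language models is varied, and the recognition result with the maximum likelihood is output. In experiments with a mixed-dialect language model covering five dialects, we showed that recognition accuracy improves compared with using a language model of the speaker's dialect alone.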

CiNii Books

J-GLOBAL

researchmap
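
The abstract of the entry above (混合方言言語モデルと混合比推定による方言音声認識システム) builds a mixed-dialect language model as a weighted average of single-dialect models and varies the mixing ratio to maximize likelihood. The sketch below illustrates only that idea under strong simplifying assumptions: it uses toy unigram models, two dialects instead of five, a fixed word sequence instead of a speech recognizer, and a plain grid search. All names and probabilities are hypothetical.

```python
import math


def mix_models(models, weights):
    """Weighted average of unigram models given as dicts word -> probability."""
    vocab = set().union(*models)
    return {
        w: sum(wt * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }


def log_likelihood(model, words, floor=1e-12):
    """Log-likelihood of a word sequence; unseen words get a small floor."""
    return sum(math.log(max(model.get(w, 0.0), floor)) for w in words)


def best_mixing_ratio(model_a, model_b, words, steps=10):
    """Grid-search the weight of model_a that maximizes the log-likelihood."""
    best = None
    for i in range(steps + 1):
        lam = i / steps
        mixed = mix_models([model_a, model_b], [lam, 1.0 - lam])
        ll = log_likelihood(mixed, words)
        if best is None or ll > best[1]:
            best = (lam, ll)
    return best


# Toy dialect models over two placeholder words "a" and "b".
dialect_a = {"a": 0.8, "b": 0.2}
dialect_b = {"a": 0.2, "b": 0.8}
lam, ll = best_mixing_ratio(dialect_a, dialect_b, ["a", "b"])
print(lam, ll)
```

In the system described in the abstract, the likelihood would come from recognizing the utterance with each candidate mixed model; the grid search here only stands in for that step.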
歌声-話声変換における動的音響特徴量が話声らしさに及ぼす影響

山崎健史, 池宮由楽, 糸山克寿, 奥乃博

情報処理学会第76回全国大会 2014 ( 5R-8 ) 373 - 374 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

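In recent years, with the spread of CGM and similar media, a wide variety of speech synthesis techniques have come to be in demand. Among them, SpeakBySinging aims to convert singing voices into speaking voices. The conventional method evaluated how speech-like the converted voice is and how well the voice quality of the original singing voice is preserved, but did not closely examine which processing produces the naturalness of the speaking voice. Acoustic features for discriminating singing from speaking voices include phoneme duration, pitch, jitter, and dynamic components typified by vibrato; in this paper we focus on dynamic acoustic features as the main factors affecting how speech-like singing-to-speaking conversion sounds. In experiments, we compared real recorded speech with the conversion results of each method through a listening experiment, and evaluated which acoustic features affect the naturalness of the converted voice as speech.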

CiNii Books

J-GLOBAL

researchmap
伴奏付き歌唱からの歌唱表現のパラメータ化と転写

池宮由楽, 糸山克寿, 奥乃博

情報処理学会第76回全国大会 2014 ( 5R-7 ) 371 - 372 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

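This paper describes the parameterization of singing expressions such as vibrato and kobushi contained in singing with accompaniment, and the transfer of singing style using those parameters. Singing expressions strongly reflect the singer's individuality, and parameterizing and storing them enables applications to CGM and MIR. The method first estimates the singing F0 by Viterbi search with a restricted search range. Singing expressions appear as characteristic fluctuations in the F0 trajectory and are identified and parameterized based on templates determined through observation. Singing expressions are then resynthesized from the accumulated parameters and transferred onto monotonous singing. In experiments, we learned professional singers' singing expressions from commercial songs and transferred them to a singing voice synthesis system.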

CiNii Books

J-GLOBAL

researchmap
ギター演奏音からの難易度調整可能なタブ譜自動生成システム

矢澤一樹, 糸山克寿, 奥乃博

情報処理学会第76回全国大会 2014 ( 5R-5 ) 367 - 368 2014.3

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

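This paper proposes a method for automatically generating tablature with adjustable difficulty from audio signals, with the aim of supporting guitar players. Most conventional tablature generation methods estimate a single set of pitches and fingerings for a given audio signal, so the difficulty of the output tablature may not match the user's playing level. The proposed method therefore newly models fingering estimation from audio signals as an optimal path search problem on a weighted directed graph, and makes the difficulty of the output tablature adjustable by tuning the parameters of this graph. In evaluation experiments, we evaluated the tablature output by the system in terms of both pitch estimation accuracy and fingering difficulty.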

CiNii Books

J-GLOBAL

researchmap
ロボット聴覚ソフトウェアHARKを用いたクイズの同時回答を識別するロボット司会者の設計と実装

西牟田勇哉, 平山直樹, 大塚琢馬, 杉山治, 糸山克寿, 奥乃博

第38回人工知能学会 AIチャレンジ研究会 ( SIG-Challenge-B302-09 ) 45 - 50 2013.12

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
ホースの伸び縮みによるマイク位置の変化を許容するマイクロホンアレイを用いたホース型ロボットの姿勢推定

坂東宜昭, 大塚琢馬, 糸山克寿, 中村圭佑, 昆陽雅司, 田所諭, 中臺一博, 奥乃博

第38回人工知能学会 AIチャレンジ研究会 38th ( SIG-Challenge-B302-10 ) 51 - 56 2013.12

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
HARKを用いたロボットクイズ司会者HATTACK25の開発

西牟田勇哉, 平山直樹, 大塚琢馬, 杉山治, 糸山克寿, 奥乃博

第31回日本ロボット学会学術講演会 (RSJ2013) ( 3D3-08 ) 2013.9

　More details

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap
Multirotor UAVを用いた音源定位のための雑音相関行列推定

古川孝太郎, 大塚琢馬, 糸山克寿, 中臺一博, 奥乃博

第31回日本ロボット学会学術講演会 (RSJ2013) ( 3D3-02 ) 2013.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

ホース型ロボットのマイクロホンアレイを用いた姿勢推定

坂東宜昭, 大塚琢馬, 水本武志, 糸山克寿, 中臺一博, 奥乃博

第31回日本ロボット学会学術講演会 (RSJ2013) ( 3D3-01 ) 2013.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

楽器音に対する仮想音源のパラメータ推定

糸山克寿, 奥乃博

情報処理学会第100回音楽情報科学研究会 (MUS) 2013-MUS-100 ( 5 ) 1 - 6 2013.8

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

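This paper describes a method for estimating the parameters of a virtual instrument sound source so that a clean instrument sound can be obtained for an instrument sound containing noise and distortion caused by, for example, source separation. A large number of instrument sounds are generated at random, and frame-based acoustic features and their statistics are computed from them. Multiple regression analysis is used to learn the relationship between the sound source parameters and the acoustic features, and the parameters of an unknown instrument sound are estimated using that relationship. The evaluation experiment showed that, when the number of parameters to be estimated is small, preparing sufficient training data yields parameter estimates accurate enough for practical use.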

伴奏付き歌唱に含まれる歌い方要素の個別抽出

池宮由楽, 糸山克寿, 奥乃博

情報処理学会第100回音楽情報科学研究会 (MUS) 2013-MUS-100 ( 20 ) 1 - 6 2013.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

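This paper describes a method for individually extracting singing-style elements such as vibrato and kobushi contained in singing with accompaniment. Singing-style elements strongly reflect a singer's individuality, and detecting and parameterizing them individually enables a wide range of applications to CGM and MIR. The method uses, as prior knowledge, the note pitch sequence of the song, which users can easily obtain. F0 is estimated with high accuracy by Viterbi search whose search range is restricted by the pitch sequence. Each element appears as a characteristic F0 fluctuation intended by the singer; these are detected individually and extracted as parameters according to the designed models. The evaluation experiment confirmed that the singing-style elements of professional singers can be extracted individually from commercial recordings.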

ギター演奏者の習熟度に合わせた音響信号からのタブ譜自動生成

矢澤一樹, 糸山克寿, 奥乃博

情報処理学会第100回音楽情報科学研究会 (MUS) 2013-MUS-100 ( 17 ) 1 - 6 2013.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

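This paper describes a method for automatically generating tablature suited to a guitarist's skill level from actual guitar performance audio, in order to support guitar players. Specifically, it generates easy-to-play tablature for beginners, even at the cost of omitting some notes, and tablature that reproduces the pitches accurately for advanced players. The difficulty of the estimated fingering can be changed by letting the user adjust the relative weights of acoustic reproducibility and fingering ease. Evaluating the resulting tablature in terms of both acoustic reproducibility and fingering ease confirmed that changing the parameters simplifies the fingering while maintaining the precision of pitch estimation.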

楽器音分析合成に基づく音量・音色・旋律の置換

糸山克寿, 奥乃博

情報処理学会第99回音楽情報科学研究会 (音学シンポジウム2013) 2013-MUS-099 ( 25 ) 1 - 2 2013.5

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

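This paper describes a method for manipulating and replacing the volume, timbre, and melody of the constituent instrument sounds of a polyphonic music audio signal using an instrument sound analysis-synthesis technique. A single instrument sound is modeled by a Gaussian mixture on the spectrogram, called the integrated harmonic and inharmonic model, and the mixture is represented by placing weighted copies of the model at positions on the time-frequency plane that correspond to the onset time and pitch of each note. Instrument sounds are analyzed, with source separation and extraction of acoustic features such as timbre, by estimating the optimal model parameters for the mixture using the musical score as prior information. Volume manipulation is achieved by changing the volume of the separated instrument sounds and summing them again. Timbre and melody manipulation is achieved by modifying the corresponding components of the model parameters and resynthesizing the instrument sounds.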

ギター演奏からの押弦パターン・発音時刻・フォーム変化時間制約を用いたタブ譜自動生成システム

矢澤一樹, 阪上大地, 柳楽浩平, 糸山克寿, 奥乃博

情報処理学会第75回全国大会 2013 ( 4R-3 ) 269 - 270 2013.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

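The goal of this study is to automatically generate tablature from audio signals, to support beginner guitarists and to enable secondary use of privately composed pieces. Generating tablature automatically with conventional multiple fundamental frequency (multi-F0) estimation has three main problems: (1) pitch combinations that cannot be played on a guitar are estimated, (2) the hand form changes at times other than note onsets, and (3) form changes occur too frequently within a short time. We therefore added constraints on (1) fretting patterns, (2) onset times, and (3) the time required for a form change to LHA, an existing multi-F0 estimation method, and succeeded in automatically generating tablature suited to guitar performance.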

Score following of human accompaniment using a lead-sheet for an artificial lead singer

JooYoung Ahn, Katsutoshi Itoyama, Louis-Kenzo Cahier, Hiroshi G. Okuno

2013 ( 4R-8 ) 279 - 280 2013.3

Language：English Publishing type：Research paper, summary (national, other academic conference)

The goal of this work is a system capable of following human accompaniment. The fundamental difficulty of score following for human accompaniment is that the score (a lead sheet, common in popular music) only provides chord names. Thus, actual accompaniments have uncertain octave and timbre, and fluctuating rhythm and tempo. This can degrade performance in conventional score following systems. Our solution is to use chroma vectors as features for a particle filter. We measure robustness to timbre and tempo changes by testing our system on 9 sets of 20 songs, covering all combinations of 3 levels of accompaniment complexity and 3 types of instruments.

非負値調波時間構造因子分解法に基づく音楽音響信号の多重基本周波数解析

阪上大地, 大塚琢馬, 糸山克寿, 奥乃博

情報処理学会第75回全国大会 2013 ( 4T-8 ) 491 - 492 2013.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

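Music audio signals are shaped by various properties of their constituent sounds, such as volume envelope, pitch, and timbre, and estimating these simultaneously enables highly accurate multiple fundamental frequency analysis. To analyze the wavelet spectrogram of the input sound accurately, we developed a new method that integrates latent harmonic allocation and nonnegative matrix factorization in a Bayesian manner and takes the temporal envelope of the volume into account. In the proposed method, the spectral envelope and the temporal envelope of each instrument sound are each modeled by a Gaussian mixture, and the probability density of the observed spectrogram is expressed as the product of the two distributions. The experiment confirmed that the proposed method outperforms the conventional method in terms of F-measure.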

歌声F0生成過程とメロディ分離手法に基づく楽譜逸脱成分推定

池宮由楽, 阪上大地, 糸山克寿, 奥乃博

情報処理学会第75回全国大会 2013 ( 3R-9 ) 261 - 262 2013.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

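The goal of this study is to extract the score-deviation components of the singing voice in a musical piece separately from the score information. Score-deviation components are dynamics of the singing voice that are not written in the score, such as vibrato and overshoot; because they reflect the characteristics of the singer, they are widely used in singing voice synthesis and music information retrieval. Previous studies dealt only with clean singing voices and could not handle diverse data. In this study, the singing voice is separated from the piece by a melody separation method, and the score-deviation components are estimated on the basis of an F0 generation process. The evaluation experiment confirmed that score-deviation components of the singing voice can be extracted from musical pieces.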

単音の音量ダイナミクスを共有化したNMFによる楽器パート分離

田島照久, 阪上大地, 糸山克寿, 奥乃博

情報処理学会第75回全国大会 2013 ( 3R-10 ) 263 - 264 2013.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

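The goal of this study is to accurately separate the audio signal of each instrument part from a music audio signal performed by multiple instruments, using nonnegative matrix factorization (NMF), a technique for dimensionality reduction and feature extraction. Conventional methods, which model the frequency structure of each instrument with harmonic and inharmonic components and constrain the bases, impose constraints only in the frequency direction and none in the time direction. We therefore assumed that the volume of a note evolves independently of its onset time, and added to the activation update a constraint that ties together the volume changes of multiple notes. In the experiment, the onset time of each note was obtained from the score, and a comparative experiment confirmed an improvement in separation performance.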
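
For readers unfamiliar with NMF, the sketch below shows plain NMF with the standard multiplicative updates for squared error, in pure Python. It does not include the shared volume-dynamics constraint on the activations that this abstract describes; the function names and the `rank`, `iters`, and `eps` parameters are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius distance between V and W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H): activations
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T): bases
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h
```

In a source separation setting, the columns of `W` would play the role of spectral bases and the rows of `H` the role of time-varying activations; a constraint such as the one described above would modify the update of `H`.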

Automatic Guitar Tablature Production System based on Configuration and Fingering Constraints

2012-MUS-96 ( 11 ) 1 - 7 2012.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

ベイジアン非負値調波因子分解と多重基本周波数推定への応用

阪上大地, 大塚琢馬, 糸山克寿, 奥乃博

情報処理学会第96回音楽情報科学研究会 2012-MUS-96 ( 9 ) 1 - 6 2012.8

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

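This paper reports on Bayesian nonnegative harmonic factorization, a method that simultaneously models the three elements of sound (volume, pitch, and timbre) for polyphonic music analysis. Following nonnegative matrix factorization (NMF), the method decomposes the wavelet spectrogram of the observed sound into the product of bases and activations. In addition, each basis is a Gaussian mixture that imitates harmonic structure, which represents the pitch and timbre of each note. This is realized by a technique called nonnegative harmonic factorization (NHF), which integrates NMF and harmonic clustering. During parameter estimation, a new family of probability distributions, which we call characteristic prior distributions, is used to search for solutions that give more accurate polyphonic analysis. Evaluating multiple fundamental frequency analysis with uniform random initial values for both the conventional and the proposed methods confirmed that performance improves by 5.2% on average in terms of F-measure.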

倍音コーパスを用いた初期値依存性の低い多重基本周波数推定法

阪上大地, 糸山克寿, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 2012 ( 4S-7 ) 393 - 394 2012.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

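This paper describes a multiple fundamental frequency (multi-F0) estimation method with low dependence on initial values that uses an overtone corpus. In conventional multi-F0 estimation methods such as Harmonic Temporal Clustering (HTC), the model can take an arbitrary overtone structure, so the prior distribution had to be set precisely. However, its values cannot be determined on statistical grounds, so manual tuning was required. In this paper, we created a list of overtone structures of instrument sounds (an overtone corpus) from MIDI audio, determined the range of overtone structures appropriate for instrument sounds, and performed inference. The experiment confirmed that the method eliminates non-musical local optima and yields a statistically reasonable model with low dependence on initial values.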

楽曲印象軌跡に基づく楽曲検索システムの実装と評価

西川直毅, 糸山克寿, 藤原弘将, 後藤真孝, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 2012 ( 1S-7 ) 337 - 338 2012.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

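This paper reports on the implementation of a music retrieval system based on musical mood trajectories and its evaluation through a subject experiment. In actual music, the impression changes from moment to moment and is characterized by both the lyrics and the audio signal. To reflect these two properties, we represent the mood of a piece as a combination of a lyric mood trajectory and an acoustic mood trajectory. The lyric mood trajectory is estimated with probabilistic latent semantic analysis, and the acoustic mood trajectory with multiple linear regression. The user enters lyric and acoustic mood trajectories into the retrieval system, which retrieves pieces whose trajectories are similar to the input. The similarity between mood trajectories is computed by DP matching. In the experiment, subjects used the system to search for music and then rated how well the retrieved pieces matched the mood trajectories they had entered.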
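
DP matching is commonly realized as dynamic time warping (DTW). The sketch below is a minimal DTW distance, assuming each trajectory is a one-dimensional list of numbers and using absolute difference as the local cost. The actual trajectory representation and cost used in the system are not stated in this abstract, and the function name is hypothetical.

```python
from math import inf


def dp_matching_distance(a, b):
    """Dynamic-time-warping cost between two 1-D numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# A trajectory and a time-stretched copy of it align at zero cost.
print(dp_matching_distance([1, 2, 3], [1, 2, 2, 3]))
```

A smaller distance means a closer match, so a retrieval system of this kind could rank pieces by ascending distance to the query trajectory.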

押弦制約付きギター演奏自動採譜システム

矢澤一樹, 阪上大地, 糸山克寿, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 2012 ( 4S-6 ) 391 - 392 2012.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

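This paper describes an automatic transcription system for guitar performance that uses fretting constraints. Conventional multiple fundamental frequency estimation methods such as Latent Harmonic Allocation (LHA) allow any combination of notes, so the estimates sometimes include combinations that cannot be played simultaneously given the structure of the human body. We built an automatic transcription system that can exclude such combinations by combining the LHA output with fretting constraints. The fretting constraints are a list of fretting patterns whose fretting positions lie within 3 to 4 frets. The experiment confirmed that the fretting constraints improve estimation accuracy and robustness to the LHA threshold.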

アクセント特徴量を用いた歌声と朗読音声の識別システム

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 2012 ( 6U-9 ) 625 - 626 2012.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

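We developed a system that discriminates between singing voices and read-aloud lyrics using accent features. The input to the system is a single noise-free voice, and the output is a binary decision: singing or read speech. Noting that rhythmic structure differs perceptually between singing and read speech, we used accent features, which are considered to be strongly related to rhythm, for discrimination. This feature has local maxima (peaks) where the acoustic change over time is large, for example at phoneme boundaries and utterance onsets. The time intervals between adjacent accent peaks and the distribution of accent strength were each modeled by a Gaussian mixture, and two classifiers were designed. In the experiment, for speech of about 10 seconds, the classifier using peak intervals achieved 89.2% accuracy and the classifier using accent strength 59.7%.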

A System for Automatic Discrimination between Singing and Speaking Voices on the Basis of Peak Interval of Spectral Change, F0, and MFCC

Shimpei Aso, Takeshi Saitou, Masataka Goto, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ SIG Notes 2012-MUS-94 ( 13 ) 1 - 8 2012.1

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：Information Processing Society of Japan (IPSJ)

In this paper we describe a system that discriminates between singing and speaking voices. Given a clean speech signal, it outputs the likelihood of each of the singing and speaking voices. Previous systems use temporal transition of spectral envelope (MFCC) and fundamental frequency (F0) as discrimination features. Our system adds peak interval of spectral change as a phoneme duration feature and weights these features according to the duration of the input speech signal. Experimental results with one-second speech signals show that our system achieves 90.2% accuracy compared to 86.7% with previous systems. We also describe a real-time application demonstrating our system.

音響特徴・ベース音・和音遷移を用いた自動和音認識

糸山克寿, 尾形哲也, 奥乃博

情報処理学会第94回音楽情報科学研究会 2012-MUS-94 ( 29 ) 1 - 7 2012.1

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Design and evaluation of a musical mood trajectory estimation method using lyrics and acoustic features

Naoki Nishikawa, Katsutoshi Itoyama, Hiromasa Fujihara, Masataka Goto, Tetsuya Ogata, Hiroshi G. Okuno

2011-MUS-91 ( 7 ) 1 - 8 2011.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

This paper describes a method that represents the overall musical mood trajectory (time-varying impressions) of a song by using two mood trajectories, one estimated from the lyrics and one from the audio signal. The mood trajectory of the lyrics is obtained by using Probabilistic Latent Semantic Analysis (PLSA) to estimate topics (representing impressions) from words in the lyrics. The mood trajectory of the audio signal is estimated from acoustic features by using multiple linear regression analysis. In our experiments, mood trajectories of 175 songs by The Beatles were estimated and clustered into several classes. Comparison of acoustic features within each class and comparison of social tags and mood trajectories showed that the estimated mood trajectories are suitable and can represent time-varying impressions of songs.

MAHL: Score alignment method for analyzing inter-performer interaction

Akira Maezawa, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

2011-MUS-91 ( 19 ) 1 - 6 2011.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

This paper presents a method to align an audio signal with the individual instrument parts that make up a music score. Such a method allows a machine to analyze the temporal interaction of music performers. The proposed method is based on fitting multiple Hidden Semi-Markov Models (HSMMs) to the observed audio signal, each of which emits Latent Harmonic Allocation parameters. Each HSMM corresponds to an instrument part, and the state duration probability is conditioned on an auto-regressive tempo model. Evaluation suggests that the method is useful for score alignment and hints at its usefulness for multiple-part alignment.

歌詞と音響特徴量を用いた楽曲の印象軌跡推定

西川直毅, 糸山克寿, 藤原弘将, 後藤真孝, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 5R-3 ) 297 - 298 2011.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

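This study aims to estimate the trajectory of the mood of a musical piece as it changes over the course of the piece. Conventional music mood databases contain no information on how the mood changes as the piece progresses, so no training data exist. To address this problem, we take the following approach: (a) divide the words in the lyrics into latent mood classes using probabilistic latent semantic analysis (pLSA), and (b) learn the correlation between the mood of the set of words in a segment of the piece and the acoustic features. This method makes it possible to estimate the mood trajectory of a piece from acoustic features and lyrics.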

F0・音韻長・パワー制御による歌声らしさ・話声らしさの変化の評価

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 2R-6 ) 255 - 256 2011.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

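We evaluate how singing-like and how speech-like each of the following sounds: singing voices, speaking voices, voices intermediate between singing and speaking, and the voices of kabuki and noh. Human perception of singing-likeness and speech-likeness is thought to vary continuously, so voices that are perceived as intermediate exist. Previous studies considered only whether a voice is singing or speaking, and did not consider intermediate voices or voices other than singing and speaking. If intermediate voices can be evaluated computationally, this should contribute to clarifying the mechanism by which humans perceive singing-likeness and speech-likeness. In this report, intermediate voices are created by controlling F0, phoneme duration, and power. A listening test on the synthesized voices evaluates which kinds of control affect the perception of singing-likeness and speech-likeness.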

Classification of Harmonic and Textural Keyboard Playing Style Using Acoustic Features

JooYoung Ahn, Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

2011 ( 4C-2 ) 17 - 18 2011.3

Language：English Publishing type：Research paper, summary (national, other academic conference)

Keyboard playing is a widely used way to represent musical ideas, and it is done in either a harmonic or a textural style. The goal of this paper is to classify the style of a user's keyboard playing from its audio signal. Because acoustic features for such classification are poorly studied, we defined acoustic features that represent harmonic and textural playing styles, and classified actual keyboard performances.

調波パラメトリックNMFによる楽器演奏音響信号の分析合成

安良岡直希, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 5R-1 ) 293 - 294 2011.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

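This paper describes source separation and performance synthesis using a new amplitude spectrogram modeling technique called harmonic parametric Nonnegative Matrix Factorization (HPNMF). Rather than factorizing the amplitude spectrogram directly, HPNMF models the spectrum at each time frame with a harmonic Gaussian mixture and then factorizes the overtone strength parameters over the whole piece. This allows the fundamental frequency parameters to be adapted outside the NMF framework, so that signals such as vibrato, which ordinary NMF handles poorly, can be estimated efficiently. We show that HPNMF achieves highly accurate separation of a specific instrument part from an ensemble recording and resynthesis of the performance audio signal.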

多重奏音響信号中の歌唱音声の歌詞を自由に差し替える歌詞置換システム

安良岡直希, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

日本音響学会 2010年秋期研究発表会 ( 2-7-7 ) 2010.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

SpeakBySinging: A Speaking Voice Synthesis System Converting Singing Voices to Speaking Voices

Shimpei Aso, Takeshi Saitou, Masataka Goto, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

13th International Conference on Digital Audio Effects, DAFx 2010 Proceedings 2010-MUS-86 ( 8 ) 1 - 7 2010.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Phrase Replacing System for Polyphonic Music Waveforms

Naoki Yasuraoka, Katsutoshi Itoyama, Takuya Yoshioka, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2010-MUS-86 ( 20 ) 1 - 8 2010.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：情報処理学会

This paper presents a music manipulation system that enables a user to replace an instrument performance phrase in a polyphonic audio mixture. Two technical problems must be solved to realize this system: (1) separating the melody part from the accompaniment, and (2) synthesizing a new instrument performance that has the timbre and expression of the original one. Our method first performs the separation using a statistical model that integrates harmonic and inharmonic Gaussian mixtures and nonnegative matrix factorization. It then synthesizes a new instrument performance by adding the acoustic characteristics given by the Gaussian mixture parameters to a sound generated by a MIDI synthesizer. Two evaluations confirm the effectiveness of the proposed method.

Acoustic Feature Variation and Application to Similarity-based Music Retrieval using Instrument Equalizer

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

72 ( 6J-6 ) 25 - 26 2010.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Performance and Timbre Rendering for MIDI-Synthesized Audio Signal by using Harmonic Inharmonic GMM

Naoki Yasuraoka, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

72 ( 5T-5 ) 183 - 184 2010.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

SpeakBySinging : A Speaking Voice Synthesis System Converting Singing Voices to Speaking Voices By Controlling F0, Amplitude, and Duration

Shimpei Aso, Takeshi Saitou, Masataka Goto, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

72 ( 6U-1 ) 295 - 296 2010.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

音響信号とコンテキスト制約を併用したバイオリンの演奏弦系列の推定

前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

日本音響学会 2009年秋期研究発表会 ( 2-5-15 ) 2009.9

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Estimation of Bowed String Sequence of a Violin Performance Using Audio and Score-Based Anomaly Detection

Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

2009-MUS-81 ( 5 ) 1 - 6 2009.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

We present a method for identifying the bowed string sequence of a violin performance that combines context-based rules with an audio-based bowed string estimator. Applying the audio-based estimator and then correcting errors with context-based rules increases the accuracy of the estimator. Using six musical phrases, we confirm that the accuracy increases on average by 5% (max. 8%) when using the set of strings used for training, and by 7% on average (max. 15%) when using a different brand of strings from that used for training.

Improvement of Performance Analysis-and-Synthesis Method by using Residual Spectrum Model for Reduction of Accompaniment or Sound Reverberation

Naoki Yasuraoka, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

2009-MUS-81 ( 10 ) 1 - 6 2009.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：情報処理学会

This paper presents a musical performance analysis-and-synthesis method that uses a residual model to reduce accompaniment or reverberation. The residual model is designed to represent the part of the spectrum that the score does not convey about the performance. This leads to efficient extraction of a performed part from an accompanied and/or reverberant audio source. The extraction is performed simultaneously with estimation of musical tone models that represent both the harmonic and the inharmonic sound of the performance. Using the estimated tone models, a new performance sound corresponding to a newly given score is synthesized. An experiment showed that the spectral distance of one instrument part extracted from a polyphonic audio source improved by 35.0 points when the residual model was incorporated. Another result showed the effectiveness of our method for reverberant sources.

Parameter Estimation of Mixture Model of Multiple Instruments and Application to Musical Instrument Identification

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2009-MUS-81 ( 13 ) 1 - 6 2009.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

This report presents parameter estimation for a mixture model of multiple instruments based on the integrated harmonic and inharmonic model, and its application to musical instrument identification. Parameters of the integrated model, which fit an observed power spectrogram, are estimated by a Bayesian inference method based on the calculus of variations. Since the parameter distributions of the integrated model depend on the instrument, the instrument name is identified by selecting the instrument with the maximum relative instrument weight. Experimental results showed 81.6% accuracy for 10 instruments.

Musical Genre Shift of Polyphonic Musical Pieces by Changing Instrument Volume

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2009-MUS-81 ( 3 ) 1 - 6 2009.7

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

This report presents a novel Query-by-Example (QBE) approach in Music Information Retrieval, which allows a user to customize query examples by directly modifying the volume of different instrument parts. The underlying hypothesis is that the musical genre shifts (changes) in relation to the volume balance of different instruments. Our QBE system first separates the musical audio signal into all instrument parts with the help of its musical score, and then lets a user remix those parts to change the acoustic features that represent the musical mood of the piece. The distribution of those features is modeled by a Gaussian Mixture Model for each musical piece, and the Earth Mover's Distance between the mixtures of different pieces is used as the degree of their mood similarity. Experimental results showed that the shift was actually caused by volume changes of the vocal, guitar, and drum parts.

Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ Journal 50 ( 7 ) 1757 - 1767 2009.7

Language：English

Performance Rendering and Sound Synthesis considering the Timbral Deviation within Note Sequence

YASURAOKA Naoki, ABE Takehiro, ITOYAMA Katsutoshi, TAKAHASHI Toru, OGATA Tetsuya, OKUNO Hiroshi G.

71 ( 4R-1 ) 217 - 218 2009.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Musical instrument sound morphing based on psychoacoustic timbre characteristics using harmonic and inharmonic models

ABE Takehiro, ITOYAMA Katsutoshi, TAKAHASHI Toru, KOMATANI Kazunori, OGATA Tetsuya, OKUNO Hiroshi G.

71 ( 4R-2 ) 219 - 220 2009.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Probabilistic Classification of Monophonic Instrument Playing Techniques

Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

71 ( 4R-3 ) 221 - 222 2009.3

Language：English Publishing type：Research paper, summary (national, other academic conference)

A Music Retrieval Approach from Alternative Genres of Query by Adjusting Instrument Volume

Kaiping Wang, Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

71 ( 5R-5 ) 239 - 240 2009.3

Language：English Publishing type：Research paper, summary (national, other academic conference)

Automatic Chord Recognition Considering the Relation between Bass Pitch Probability and Chroma Vector

Hideki Takano, Kouhei Sumi, Katsutoshi Itoyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

71 ( 5R-6 ) 241 - 242 2009.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

西洋古典歌唱における発声時の頭部、頸部、胸部の姿勢変化

鈴木茉莉緒, 進矢正宏, 高橋徹, 糸山克寿, 奥乃博, 小田伸午

京都体育学会 2009.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

An Analysis-and-synthesis Approach for Manipulating Pitch of a Musical Instrument Sound Considering Pitch-dependency of Timbral Characteristics

50 ( 3 ) 1054 - 1066 2009.3

Language：Japanese

Estimation of Bowed String Sequence of a Violin Performance Using Audio and Score-Based Anomaly Detection

前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会研究報告(CD-ROM) 2009 ( 2 ) 2009

A music information retrieval system based on timbre similarity using the instrument equalizer

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ SIG Notes 2008 ( 78 (2008-MUS-076) ) 143 - 148 2008.7

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

This paper describes a music remixing interface, called Instrument Equalizer, that allows users to control the volume of each instrument part within existing audio recordings in real time. Although query-by-example retrieval systems generally require a user to prepare favorite examples (songs), our interface lets a user generate examples from existing ones by cutting or boosting some instrument/vocal parts, resulting in a variety of retrieved results. To change the volume, all instrument parts are separated from the input sound mixture using the corresponding standard MIDI file. For the separation, we used an integrated tone (timbre) model consisting of harmonic and inharmonic models. To improve the accuracy of parameter estimation, we train probabilistic distributions of model parameters by using various sounds.

A Method for Manipulating Pitch and Duration of Musical Instrument Sounds Dealing with Pitch-dependency of Timbre

Takehiro Abe, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IPSJ SIG Notes 2008 ( 78 (2008-MUS-76) ) 155 - 160 2008.7

Language：Japanese Publishing type：Research paper, summary (national, other academic conference) Publisher：一般社団法人情報処理学会

This paper presents a manipulation method that can generate musical instrument sounds with arbitrary pitches and durations from a given musical instrument sound without distorting timbral characteristics. Based on the psychoacoustical knowledge on auditory effects of timbre, we define timbral features on the spectrogram of a musical instrument sound as (i) relative amplitudes of harmonic components, (ii) distribution of inharmonic components, and (iii) temporal envelopes of harmonic components. We use Itoyama's integrated model to analyze timbral features. For pitch manipulation, we take into account the pitch-dependency of timbre by using a cubic polynomial that approximates the distribution of features (i) and (ii) over pitches and predicting the values of each feature. To manipulate duration, we preserve feature (iii) in the attack and decay durations of a seed by expanding or shrinking only the steady duration. Experimental results showed the effectiveness of our method; the MFCC distance between synthesized sounds and real sounds of 32 instruments was reduced by 32.31%.

Synthesis approach for manipulating pitch of a musical instrument sound with considering timbral characteristics

Takehiro Abe, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

70 ( 2X-7 ) 437 - 438 2008.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Parameter Estimation for Harmonic and Inharmonic Models Using Prior Distributions from Multiple Instrument Bodies

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

70 ( 2X-6 ) 435 - 436 2008.3

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Cross-media Retrieval Using a Congruency Model between Music and Video in Multimedia Content

Hiroki Saito, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

70 ( 4X-4 ) 465 - 466 2008.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Automatic Chord Recognition Based on the Pitch of Bass Sound for Popular Music

Kouhei Sumi, Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

70 ( 2X-5 ) 433 - 434 2008.3

Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Demonstrations: Introduction of Research by Young Researchers V

Masatoshi Hamanaka, Akira Nishimura, Hiroshi Takaesu, Shigeyuki Hirai, Katsutoshi Itoyama, Akiyuki Yoshino, Shohei Kajiwara, Nozomi Kigimoto, Michiaki Katsumoto, Tomoyasu Nakano, Naoki Itou, Shunsuke Nakamura, Makiko Nagasawa, Kotaro Shibata

MUS 2007 ( 81 (2007-MUS-071) ) 127 - 136 2007.8

Language：Japanese Publishing type：Lecture material (seminar, tutorial, course, lecture, etc.) Publisher：Information Processing Society of Japan (IPSJ)

To encourage further progress by young researchers in the field of music information processing, we introduce case studies of demonstrations.

Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2007 ( 81 (2007-MUS-071) ) 161 - 166 2007.8

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Constrained Parameter Estimation of Harmonic and Inharmonic Models for Separating Polyphonic Musical Audio Signals

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2007 ( 37 (2007-MUS-070) ) 81 - 88 2007.5

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

楽譜情報を用いたNMFによる音楽音響信号の音源分離

糸山克寿, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 ( 2N-1 ) 159 - 160 2007.3

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

Demonstrations: Introduction of Research by Young Researchers IV

HAMANAKA Masatoshi, TAKEGAWA Yoshinari, IWAI Kenichi, TAKAHASHI Naoya, NAKANO Tomoyasu, OHISHI Yasunori, ITOYAMA Katsutoshi, KITAHARA Tetsuro, YOSHII Kazuyoshi

MUS 2006 ( 113 (2007-MUS-067) ) 9 - 14 2006.10

Language：Japanese Publishing type：Lecture material (seminar, tutorial, course, lecture, etc.) Publisher：Information Processing Society of Japan (IPSJ)

To encourage further progress by young researchers in the field of music information processing, we present case studies of demonstrations.

CiNii Books

researchmap
Automatic Weighting of Multiple Features in Automatic Transcription of a Specific Part in Ensemble Music

糸山克寿, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

The 68th National Convention of the Information Processing Society of Japan ( 2L-4 ) 169 - 170 2006.3

Authorship：Lead author Language：Japanese Publishing type：Research paper, summary (national, other academic conference)

J-GLOBAL

researchmap

Industrial property rights

Microphone array position estimation device, microphone array position estimation method, and program

中臺一博, 段雄啓, 糸山克寿, 西田健次

Applicant：Honda Motor Co., Ltd.

Application no：特願2019-034898 Date applied：2019.2

Announcement no：特開2020-141232 Date announced：2020.9

researchmap
Sound source localization device, sound source localization method, and program

中臺一博, 正木俊伍, 小島諒介, 杉山治, 糸山克寿, 西田健次

Applicant：Honda Motor Co., Ltd.

Application no：特願2019-034717 Date applied：2019.2

Announcement no：特開2020-141222 Date announced：2020.9

researchmap
Caption generation device, caption generation method, and program

中臺一博, 岩月道生, 糸山克寿, 西田健次

Applicant：Honda Motor Co., Ltd.

Application no：特願2019-034979 Date applied：2019.2

Announcement no：特開2020-140050 Date announced：2020.9

researchmap
Sound source separation device, sound source separation method, and program

中臺一博, 日下湧太, 糸山克寿, 西田健次

Applicant：Honda Motor Co., Ltd.

Application no：特願2019-034713 Date applied：2019.2

Announcement no：特開2020-140041 Date announced：2020.9

researchmap
Acoustic signal processing device, acoustic signal processing method, and program

糸山克寿, 中臺一博

Applicant：Honda Motor Co., Ltd.

Application no：特願2018-165504 Date applied：2018.9

Announcement no：特開2020-039057 Date announced：2020.3

researchmap
Target acoustic signal restoration system and method

坂東宜昭, 吉井和佳, 糸山克寿, 奥乃博

Applicant：National University Corporation Kyoto University

Application no：特願2018-519566 Date applied：2017.5

researchmap
Singing voice signal separation method and system

池宮由楽, 吉井和佳, 糸山克寿

Applicant：National University Corporation Kyoto University

Application no：特願2015-034339 Date applied：2015.2

Announcement no：特開2016-156938 Date announced：2016.9

researchmap

Research Projects

Spatio-temporal calibration of asynchronous distributed microphone arrays in dynamic environments

Grant number：23K11160 2023.4 - 2026.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Grant amount：¥4550000 （ Direct Cost: ¥3500000 、 Indirect Cost：¥1050000 ）

researchmap
Research on a calibration-free model for asynchronous distributed microphone arrays

Grant number：19K12017 2019.4 - 2022.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Authorship：Principal investigator

Grant amount：¥3900000 （ Direct Cost: ¥3000000 、 Indirect Cost：¥900000 ）

researchmap
Development of an avian monitoring system using species identification by bird songs

Grant number：16K16222 2016.4 - 2019.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

FUJITA Motoko, OKUNO Hiroshi, ITOYAMA Katsutoshi, SUZUKI Reiji, MARUYAMA Akio

Authorship：Collaborating Investigator(s) (not designated on Grant-in-Aid)

Grant amount：¥3380000 （ Direct Cost: ¥2600000 、 Indirect Cost：¥780000 ）

An automatic system for identifying species from bird songs consists of two stages: (1) detection of bird songs and (2) identification of the detected songs. In stage (1), analysis showed that the detection rate remained quite low with 2-channel recordings, which were the original target of this study. The primary reason was the difficulty of distinguishing the songs of more than two individuals singing at the same time. Therefore, I decided to increase the number of recording channels to 8 and to introduce the open-source robot audition software HARK. Analysis of new 8-channel recordings from Fukui Prefecture and Indonesian forests showed that individuals singing simultaneously could be separated, which should lead to better identification results.

researchmap
A Unified Bayesian Approach to Simultaneous Speech Recognition for Mixture Signals

Grant number：15K12063 2015.4 - 2017.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

YOSHII Kazuyoshi, KAWAHARA Tatsuya, MOCHIHASHI Daichi

Authorship：Coinvestigator(s)

Grant amount：¥3640000 （ Direct Cost: ¥2800000 、 Indirect Cost：¥840000 ）

We proposed a method that can simultaneously recognize multiple utterances by using a probabilistic model of source separation. Because the source signals are uncertain, we combined speech recognition with source separation by considering the posterior distribution of the source signals. This enabled us to obtain recognition results directly from mixture signals without uniquely determining the source signals. In addition, we proposed a source separation method based on an integrated model comprising a source model and a superimposition model. Each model is represented as a mixture model (LDA) or a factor model (NMF), and the performance of each combination was evaluated.

researchmap
Deployment of Robot Audition Toward Understanding Real World

Grant number：24220006 2012.5 - 2017.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

OKUNO Hiroshi G.

Grant amount：¥218140000 （ Direct Cost: ¥167800000 、 Indirect Cost：¥50340000 ）

This research project aims to deploy robot audition even in natural and disaster environments by enhancing the robot audition software HARK. Since HARK for Windows was released, it has been downloaded about 90K times. Applications to multi-party interaction and music co-player robots demonstrate its feasibility. The robustness of UAV sound source localization provided by iGSVD-MUSIC, together with sound-based shape estimation and speech enhancement for hose-shaped robots, demonstrates the feasibility of using sound for search and rescue robots. Acoustic analysis of frog choruses, and the development of HARKBird based on HARK and its evaluation in observing and analyzing bird song communication in actual fields, demonstrate the feasibility of acoustic analysis in ecology. Finally, we have established fundamental robot audition technologies for acoustic understanding of the real world.

researchmap
A Study of Musical Scene Analysis and Direction of Musical Elements based on Statistical Machine Learning

Grant number：24700168 2012.4 - 2015.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

ITOYAMA Katsutoshi

Authorship：Principal investigator

Grant amount：¥4420000 （ Direct Cost: ¥3400000 、 Indirect Cost：¥1020000 ）

(1) Development of a musical audio signal analysis method based on a nonparametric Bayesian approach, (2) development of an automatic chord recognition method based on Bayesian estimation, (3) development of a method for estimating violin fingering from musical audio signals, (4) development of a method that estimates the parameters of virtual instrument sound synthesizers, (5) development of an automatic guitar tablature transcription method that takes the guitar player's proficiency into account, (6) construction of a singing style library from professional singers' singing voices, (7) development of a method that separates musical audio signals into singing voices and accompaniment signals and edits the singing style of the separated singing voice, (8) development of an automatic dereverberation method, (9) development of a repetitive chord and pitch estimation method.

researchmap
Multifaceted Deployment of Robot Audition Toward Understanding Real-World Environments

Grant number：24240035 2012

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

奥乃博, 加賀美聡, 糸山克寿, 公文誠, 中臺一博

Authorship：Coinvestigator(s)

Grant amount：¥21060000 （ Direct Cost: ¥16200000 、 Indirect Cost：¥4860000 ）

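Because sound diffuses more widely than images, understanding the acoustic environment through robot audition works even in environments that images alone cannot capture; on the other hand, how to exploit information gathered from a wide area becomes a challenge. Building on previously developed robot audition technology, this project aims at the multifaceted deployment of robot audition technology for safety and security that can understand real-world acoustic environments.
Specifically, the project was to address the following:
WP1: Extension to diverse microphone configurations: improving the performance of HARK-16 and methods for synchronizing multiple distributed microphone arrays.
WP2: Extension from indoors to outdoors: from indoor acoustic map creation to sound acquisition from the air and sound source localization using unmanned aerial vehicles.
WP3: Extension from speech to sound in general, including musical and environmental sounds: in particular, nonparametric Bayesian signal processing, animal acoustics based on sound-to-light conversion, real-time separation of instrument sounds from musical performances, and onomatopoeia recognition of environmental sounds.
In the two months between the start of the research and its withdrawal, we prepared the experimental equipment, worked out the details of how the unmanned helicopter would be used, and designed a multichannel A/D device to be mounted on the unmanned helicopter, in particular an acoustic data transfer scheme with time information to improve the processing of asynchronous distributed microphones.
In addition, we worked on refining HARK-Binaural, developing a Bayesian method for localizing moving sound sources, developing a Bayesian MUSIC (Multiple Signal Classification) method that suppresses impulsive and reflected sounds, developing a nonparametric Bayesian IVA method that simultaneously estimates source activity and source separation, developing an instrument sound separation method for polyphonic music that tolerates fluctuations in the instrument sound models, and enhancing the functionality of Kaeru-Hotaru (Frog-Firefly) using band-pass filters.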

researchmap
Construction of an Integrated Theory for Sound Source Separation of Musical Audio Signals and Its Applications

Grant number：08J02757 2008 - 2010

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows

糸山克寿

Authorship：Principal investigator

Grant amount：¥1800000 （ Direct Cost: ¥1800000 ）

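This fiscal year, we worked on the simultaneous processing of sound source separation and instrument identification, and on a similar-music retrieval system as an application of sound source separation, and published papers.
1. Parameter estimation for a multi-instrument mixture model and its application to instrument identification. Recognizing instrument sounds in complex musical audio signals, and separating the constituent instrument sounds and singing voices from those signals, are important component technologies in recent music information processing. We addressed the problem of taking as input a polyphonic musical audio signal together with the pitch and sounding interval of each note in the piece, and outputting the separated audio signal and the instrument identification result for each note. In experiments that separated instrument sounds from mixtures of two or three instrument sounds and identified each instrument, the average identification accuracy was 72.1% for two-sound mixtures and 54.8% for three-sound mixtures. Separation performance, measured as the average log-spectral distance, was 3.12 for two sounds and 3.65 for three sounds. Furthermore, we confirmed that correct instrument identification improves separation performance.
2. A music retrieval system based on timbral similarity using an instrument equalizer. Similar-music retrieval is a retrieval method in which a user-specified piece is given as a query and pieces are ranked by similarity. We performed similar-music retrieval with query pieces whose instrument volume balance had been manipulated, and examined the genre shift of the query pieces from the genres of the retrieved results. We confirmed that there is a reasonable relationship between instrument volume balance and genre shift, consistent with the typical image of each musical genre. For the vocal and drum parts, the separated and original sounds showed the same tendency in genre shift, but for the guitar part they differed.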

researchmap
Development of Robot Audition based on Computational Auditory Scene Analysis

Grant number：19100003 2007 - 2011

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

OKUNO Hiroshi, OGATA Tetsuya, KOMATANI Kazunori, TAKAHASHI Toru, SHIRAMATSU Shun, NAKADAI Kazuhiro, KITAHARA Tetsuro, ITOYAMA Katsutoshi, ASANO Futoshi

Authorship：Coinvestigator(s)

Grant amount：¥119340000 （ Direct Cost: ¥91800000 、 Indirect Cost：¥27540000 ）

The three main functions of computational auditory scene analysis (sound source localization, sound source separation, and recognition of separated sounds) have been developed, and the collection is available as open-source robot audition software called "HARK". As a proof of concept for this robot audition, we developed "Prince Shotoku" robots that can listen to simultaneous talkers, and a spoken dialogue system that accepts barge-in utterances from the user. We also developed various technologies for separating musical instrument parts in polyphonic performances, as well as real-time score-following systems. These music-related technologies are applied to build musical robots that play in ensemble with human players.

researchmap
