Faculty Profiles - MATSUOKA SATOSHI

MATSUOKA SATOSHI

Organization

School of Computing Visiting Professor

News & Topics

K computer takes first place in Graph500 for the ninth consecutive time, earning top-level recognition in graph analysis, which is key to big data processing

2019/06/21

Languages： Japanese

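An international joint research group comprising RIKEN, Kyushu University, Tokyo Institute of Technology, the Barcelona Supercomputing Center, Fujitsu Limited, and Fixstars Corporation took first place in Graph500, the international performance ranking of supercomputers for big data processing (large-scale graph analysis), with analysis results from the K computer. This is its ninth consecutive win (tenth overall), following the November 2018 ranking.
K computer takes first place in Graph500 for the eighth consecutive time, earning top-level recognition in graph analysis, which is key to big data processing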

2018/11/14

Languages： Japanese

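An international joint research group comprising RIKEN, Kyushu University, Tokyo Institute of Technology, the Barcelona Supercomputing Center, Fujitsu Limited, and Fixstars Corporation took first place in Graph500, the international performance ranking of supercomputers for big data processing (large-scale graph analysis), with analysis results from the K computer. This is its eighth consecutive win (ninth overall), following the June 2018 ranking.
K computer takes first place in Graph500 for the seventh consecutive time, earning top-level recognition in graph analysis, which is key to big data processing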

2018/06/29

Languages： Japanese

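An international joint research group comprising RIKEN, Kyushu University, Tokyo Institute of Technology, the Barcelona Supercomputing Center, Fujitsu Limited, and Fixstars Corporation took first place in Graph500, the international performance ranking of supercomputers for big data processing (large-scale graph analysis), with analysis results from the K computer. This is its seventh consecutive win (eighth overall), following the November 2017 ranking.
K computer takes first place in Graph500 for the fifth consecutive time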

2017/06/26

Languages： Japanese

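An international joint research group comprising Kyushu University, Tokyo Institute of Technology, RIKEN, the Barcelona Supercomputing Center in Spain, and Fujitsu Limited took first place in the latest Graph500 list, the international performance ranking of supercomputers for big data processing (large-scale graph analysis), released on Wednesday, June 21, 2017 (local time in Salt Lake City, USA), with analysis results from the K computer. This is its fifth consecutive win (sixth overall), following the November 2016 ranking.
K computer takes first place in Graph500 for the fourth consecutive time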

2016/11/24

Languages： Japanese

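An international joint research group comprising Kyushu University, Tokyo Institute of Technology, RIKEN, the Barcelona Supercomputing Center in Spain, and Fujitsu Limited took first place in the latest Graph500 list, the international performance ranking of supercomputers for big data processing (large-scale graph analysis), released on Tuesday, November 15, 2016 (local time in Salt Lake City, USA), with analysis results from the K computer. This is its fourth consecutive win (fifth overall), following the June 2016 ranking.
K computer takes first place in Graph500, earning the highest rating in graph analysis, which is key to big data processing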

2016/07/13

Languages： Japanese

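An international joint research group comprising Kyushu University, Tokyo Institute of Technology, RIKEN, the Barcelona Supercomputing Center in Spain, and Fujitsu Limited took first place in the latest Graph500 list, the international performance ranking of supercomputers for big data processing (large-scale graph analysis), released in June 2016, with analysis results from the K computer. This is its third consecutive win (fourth overall), following the November 2015 ranking.
K computer takes first place in Graph500 for the second consecutive time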

2015/11/24

Languages： Japanese
K computer takes first place in Graph 500 supercomputer ranking for second consecutive time

2015/11/24

Languages： English
K computer takes first place in Graph 500 supercomputer ranking

2015/07/31

Languages： English
K computer regains first place in Graph500

2015/07/27

Languages： Japanese
Supercharging a supercomputer

2009/07/31

Languages： English

Degree

Master of Science
Doctor of Science

Research Areas

Informatics / High performance computing / Supercomputer, high performance AI, power-efficient computing, HPC-driven Big Data, Heterogeneous HPC

Education

The University of Tokyo Graduate School, Division of Science

- 1989

Country： Japan

researchmap
The University of Tokyo Faculty of Science

- 1986

Country： Japan

researchmap

Research History

Institute of Science Tokyo School of Computing Visiting Professor

2024.10

researchmap
Tokyo Institute of Technology School of Computing Visiting Professor

2023.4 - 2024.9

researchmap
RIKEN Center for Computational Science (R-CCS) Director

2018.4

Country：Japan

researchmap
Tokyo Institute of Technology School of Computing Specially Appointed Professor

2018.4 - 2023.3

researchmap
Tokyo Institute of Technology Global Scientific Information and Computing Center Professor

2001 - 2018.3

Country：Japan

researchmap
Tokyo Institute of Technology Associate Professor

1996

researchmap
The University of Tokyo Lecturer

1993

researchmap
The University of Tokyo

1989

researchmap

Professional Memberships

IEEE Supercomputing

researchmap
HPC Asia 2004

researchmap
ACM Object-Oriented Programming: Languages, Systems and Applications (OOPSLA 2002)

researchmap
IEEE Computing Clusters and the Grid (CCGrid 2003)

researchmap

Committee Memberships

IEEE Supercomputing Area Chair

2004

Committee type：Academic society

researchmap
HPC Asia 2004 Program Co-chair

2004

Committee type：Academic society

researchmap
IEEE Computing Clusters and the Grid (CCGrid 2003) Program Chair

2003

Committee type：Academic society

researchmap
ACM Object-Oriented Programming: Languages, Systems and Applications (OOPSLA 2002) Program Chair

2002

Committee type：Academic society

researchmap

Papers

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers

Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Proceedings of the 39th ACM International Conference on Supercomputing 57 - 72 2025.6

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3721145.3730422

researchmap
A General and Scalable GCN Training Framework on CPU Supercomputers.

Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

PPoPP 566 - 568 2025

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3710848.3710860

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ppopp/ppopp2025.html#ZhuangCLYD0EMW25
Real-time High-resolution X-Ray Computed Tomography Invited Reviewed

In proceedings of ACM International Conference on Supercomputing (ICS 2024), Kyoto, June 2024. 2024.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3650200.3656634

researchmap
Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage.

Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IEEE International Conference on Cluster Computing 196 - 197 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/CLUSTERWorkshops61563.2024.00056

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#WuCTTEMW24
Communication Optimization for Distributed GCN Training on ABCI Supercomputer.

Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IEEE International Conference on Cluster Computing 160 - 161 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/CLUSTERWorkshops61563.2024.00038

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#Zhuang0LEMW24
Investigating Nvidia GPU Architecture Trends via Microbenchmarks.

Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IEEE International Conference on Cluster Computing 174 - 175 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/CLUSTERWorkshops61563.2024.00045

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#ZhangBCWEMW24
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.

Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

ACM Transactions on Architecture and Code Optimization 20 ( 4 ) 57 - 26 2023.12

Publishing type：Research paper (scientific journal)

DOI： 10.1145/3629520

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/taco/taco20.html#DomkeVGKWPMPZCDM23
Myths and legends in high-performance computing.

Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler

International Journal of High Performance Computing Applications 37 ( 3-4 ) 245 - 259 2023.7

Publishing type：Research paper (scientific journal)

DOI： 10.1177/10943420231166608

researchmap
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications Invited Reviewed

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

In proceedings of ACM International Conference on Supercomputing (ICS 2023), Orlando, June 2023. 2023.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3577193.3593705

researchmap
Revisiting Temporal Blocking Stencil Optimizations Invited Reviewed

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

In proceedings of ACM International Conference on Supercomputing (ICS 2023), Orlando, June 2023. 2023.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3577193.3593716

researchmap
Exploiting Scratchpad Memory for Deep Temporal Blocking

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Proceedings of the 15th Workshop on General Purpose Processing Using GPU 2023.2

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3589236.3589242

researchmap
Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips

Huaipeng Zhang, Nhut-Minh Ho, Yigit Polat Dogukan, Peng Chen, Mohamed Wahib, Truong Thao Nguyen, Jintao Meng, Rick Siow Mong Goh, Satoshi Matsuoka, Tao Luo, Weng-Fai Wong

IEEE Transactions on Parallel and Distributed Systems 1 - 16 2023

Publishing type：Research paper (scientific journal) Publisher：Institute of Electrical and Electronics Engineers (IEEE)

DOI： 10.1109/tpds.2023.3291795

researchmap
Scalable FBP decomposition for cone-beam CT reconstruction

Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard Boardman, Thomas Blumensath, Satoshi Matsuoka

International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2021.11

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3458817.3476139

Scopus

researchmap
Performance portable back-projection algorithms on CPUs

Peng Chen, Mohamed Wahib, Xiao Wang, Shinichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka

Proceedings of the ACM International Conference on Supercomputing 2021.6

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3447818.3460353

researchmap
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey C. Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, Toshiyuki Imamura, Akihiko Kasagi, Kentaro Kawakami, Shuhei Kudo, Akiyoshi Kuroda, Maxime Martinasso, Satoshi Matsuoka, Henrique Mendonça, Kazuki Minami, Prabhat Ram, Takashi Sawada, Mallikarjun Shankar, Tom St. John, Akihiro Tabuchi, Venkatram Vishwanath, Mohamed Wahib, Masafumi Yamazaki, Junqi Yin

CoRR abs/2110.11466 2021

Publishing type：Research paper (scientific journal)

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2110.html#abs-2110-11466
Scalable FBP decomposition for cone-beam CT reconstruction.

Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard P. Boardman, Thomas Blumensath, Satoshi Matsuoka

SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis(SC) 9 - 9 2021

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3458817.3476139

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/sc/sc2021.html#ChenWWHOBBBM21
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

1056 - 1065 2021

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS49936.2021.00114

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ipps/ipdps2021.html#DomkeVDCO0SMPWM21
A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.

Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 483 - 493 2020

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

GPUs are playing an increasingly important role in general-purpose computing.
Many algorithms require synchronizations at different levels of granularity in
a single GPU. Additionally, the emergence of dense GPU nodes also calls for
multi-GPU synchronization. Nvidia's latest CUDA provides a variety of
synchronization methods. Until now, there is no full understanding of the
characteristics of those synchronization methods. This work explores important
undocumented features and provides an in-depth analysis of the performance
considerations and pitfalls of the state-of-art synchronization methods for
Nvidia GPUs. The provided analysis would be useful when making design choices
for applications, libraries, and frameworks running on single and/or multi-GPU
environments. We provide a case study of the commonly used reduction operator
to illustrate how the knowledge gained in our analysis can be useful. We also
describe our micro-benchmarks and measurement methods.

DOI： 10.1109/IPDPS47924.2020.00057

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ipps/ipdps2020.html#ZhangWZM20
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism.

Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen

CoRR abs/2007.12856 2020

Publishing type：Research paper (scientific journal)

We present scalable hybrid-parallel algorithms for training large-scale 3D
convolutional neural networks. Deep learning-based emerging scientific
workflows often require model training with large, high-dimensional samples,
which can make training much more costly and even infeasible due to excessive
memory usage. We solve these challenges by extensively applying hybrid
parallelism throughout the end-to-end training pipeline, including both
computations and I/O. Our hybrid-parallel algorithm extends the standard data
parallelism with spatial parallelism, which partitions a single sample in the
spatial domain, realizing strong scaling beyond the mini-batch dimension with a
larger aggregated memory capacity. We evaluate our proposed training algorithms
with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive
performance studies show that good weak and strong scaling can be achieved for
both networks using up to 2K GPUs. More importantly, we enable training of
CosmoFlow with much larger samples than previously possible, realizing an
order-of-magnitude improvement in prediction accuracy.

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2007.html#abs-2007-12856
AN5D: automated stencil framework for high-degree temporal blocking on GPUs.

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization(CGO) 199 - 211 2020

Publishing type：Research paper (international conference proceedings) Publisher：ACM

Stencil computation is one of the most widely-used compute patterns in high
performance computing applications. Spatial and temporal blocking have been
proposed to overcome the memory-bound nature of this type of computation by
moving memory pressure from external memory to on-chip memory on GPUs. However,
correctly implementing those optimizations while considering the complexity of
the architecture and memory hierarchy of GPUs to achieve high performance is
difficult. We propose AN5D, an automated stencil framework which is capable of
automatically transforming and optimizing stencil patterns in a given C source
code, and generating corresponding CUDA code. Parameter tuning in our framework
is guided by our performance model. Our novel optimization strategy reduces
shared memory and register pressure in comparison to existing implementations,
allowing performance scaling up to a temporal blocking degree of 10. We achieve
the highest performance reported so far for all evaluated stencil benchmarks on
the state-of-the-art Tesla V100 GPU.

DOI： 10.1145/3368826.3377904

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/cgo/cgo2020.html#MatsumuraZWEM20
A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective.

Artur Podobas, Kentaro Sano, Satoshi Matsuoka

CoRR abs/2004.04509 2020

Publishing type：Research paper (scientific journal)

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-04509
A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures.

Artur Podobas, Kentaro Sano, Satoshi Matsuoka

31st IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) 1 - 8 2020

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ASAP49362.2020.00010

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/asap/asap2020.html#PodobasSM20
AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs.

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

CoRR abs/2001.01473 2020

Publishing type：Research paper (scientific journal)

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2001.html#abs-2001-01473
A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.

Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka

CoRR abs/2004.05371 2020

Publishing type：Research paper (scientific journal)

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-05371
Scaling distributed deep learning workloads beyond the memory capacity with KARMA.

Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

19 - 19 2020

Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/SC41405.2020.00023

researchmap

Other Link： https://dblp.uni-trier.de/conf/sc/2020
Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors

Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydin Buluc

PARALLEL COMPUTING 90 2019.12

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.parco.2019.102545

Web of Science

researchmap
iFDK

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2019.11

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3295500.3356163

researchmap
Scaling Word2Vec on Big Corpus.

Bofang Li, Aleksandr Drozd, Yuhe Guo, Tao Liu, Satoshi Matsuoka, Xiaoyong Du

Data Sci. Eng. 4 ( 2 ) 157 - 175 2019

Publishing type：Research paper (scientific journal)

DOI： 10.1007/s41019-019-0096-6

researchmap
How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications.

Aamer Shah, Chih-Song Kuo, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf

Supercomput. Front. Innov. 6 ( 2 ) 29 - 55 2019

Publishing type：Research paper (scientific journal)

DOI： 10.14529/jsfi190203

researchmap
A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3295500.3356162

Web of Science

arXiv

researchmap

Other Link： http://arxiv.org/pdf/1907.06154v2
iFDK: A Scalable Framework for Instant High-resolution Image Reconstruction

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3295500.3356163

Web of Science

arXiv

researchmap

Other Link： http://arxiv.org/pdf/1909.02724v1
Learning Neural Representations for Predicting GPU Performance

Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka

HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2019 11501 40 - 58 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-030-20656-7_3

Web of Science

researchmap
MH-QEMU: Memory-State-Aware Fault Injection Platform

Hideyuki Jitsumoto, Yuya Kobayashi, Akihiro Nomura, Satoshi Matsuoka

SUPERCOMPUTING FRONTIERS, SCFA 2019 11416 71 - 85 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-030-18645-6_5

Web of Science

researchmap
The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees?

Jens Domke, Satoshi Matsuoka, Ivan Radanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis Lee Floyd, Nicolas Dubé

2019 IEEE Symposium on High-Performance Interconnects 1 - 4 2019

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/HOTI.2019.00013

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/hoti/hoti2019.html#DomkeMRTY0MMFD19
Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method

Yohei Tsuji, Kazuki Osawa, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019) 21 - 8 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3339186.3339202

Web of Science

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/icppw/icppw2019.html#TsujiOUNYM19
Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35

Kazuki Oosawa, Youhei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

2019

researchmap
HyperX Topology: First At-Scale Implementation and Comparison to the Fat-Tree

Jens Domke, Satoshi Matsuoka, Ivan R. Ivanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis L. Floyd, Nicolas Dube

PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3295500.3356140

Web of Science

researchmap
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks

Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) 12351 - 12359 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CVPR.2019.01264

Web of Science

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/conf/cvpr/2019
Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) 231 - 240 2019

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2019.00037

Web of Science

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ccgrid/ccgrid2019.html#NagasakaNKM19
Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations

James Lin, Zhigeng Xu, Linjin Cai, Akira Nukada, Satoshi Matsuoka

PARALLEL COMPUTING 77 128 - 143 2018.9

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.parco.2018.06.001

Web of Science

researchmap
Interference between I/O and MPI Traffic on Fat-tree Networks

Kevin A. Brown, Nikhil Jain, Satoshi Matsuoka, Martin Schulz, Abhinav Bhatele

Proceedings of the 47th International Conference on Parallel Processing 2018.8

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3225058.3225144

researchmap
MRG8 - Random Number Generation for the Exascale Era

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka, Kenichi Miura, John Shalf

Proceedings of the Platform for Advanced Scientific Computing Conference 2018.7

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3218176.3218230

researchmap
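Overview of the AI Bridging Cloud Infrastructure (ABCI), the World's Largest-Scale Open AI Infrastructure

Hirotaka Ogawa, Satoshi Matsuoka, Hitoshi Sato, Ryousei Takano, Shinichiro Takizawa, Yusuke Tanimura, Shin'ichi Miura, Satoshi Sekiguchi

IPSJ SIG Technical Reports (Web) 2018 ( HPC-165 ) Vol. 2018-HPC-165, No. 19, 1-7 (web only) 2018.7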

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
An Ultra-Green AI Data Center Supporting a 0.55 AI-EFLOPS Computing Infrastructure

Ryousei Takano, Shin'ichi Miura, 杉田正, Hirotaka Ogawa, Satoshi Matsuoka

IPSJ SIG Technical Reports (Web) 2018 ( HPC-165 ) Vol. 2018-HPC-165, No. 20, 1-7 (web only) 2018.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç

2018.4

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive
that is widely used in areas ranging from traditional numerical applications to
recent big data analysis and machine learning. Although many SpGEMM algorithms
have been proposed, hardware specific optimizations for multi- and many-core
processors are lacking and a detailed analysis of their performance under
various use cases and matrices is not available. We firstly identify and
mitigate multiple bottlenecks with memory management and thread scheduling on
Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and
many-core processors, we develop a hash-table-based algorithm and optimize a
heap-based shared-memory SpGEMM algorithm. We examine their performance
together with other publicly available codes. Different from the literature,
our evaluation also includes use cases that are representative of real graph
algorithms, such as multi-source breadth-first search or triangle counting. Our
hash-table and heap-based algorithms are showing significant speedups from
libraries in the majority of the cases while different algorithms dominate the
other scenarios with different matrix size, sparsity, compression factor and
operation type. We wrap up in-depth evaluation results and make a recipe to
give the best SpGEMM algorithm for target scenario. A critical finding is that
hash-table-based SpGEMM gets a significant performance boost if the nonzeros
are not required to be sorted within each row of the output matrix.

arXiv

researchmap

Other Link： http://arxiv.org/pdf/1804.01698v2
Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

2018.2

Recent developments in High Level Synthesis tools have attracted software
programmers to accelerate their high-performance computing applications on
FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms
of performance for stencil computation, most previous work achieve this by
avoiding spatial blocking and restricting input dimensions relative to FPGA
on-chip memory. In this work we create a stencil accelerator using Intel FPGA
SDK for OpenCL that achieves high performance without having such restrictions.
We combine spatial and temporal blocking to avoid input size restrictions, and
employ multiple FPGA-specific optimizations to tackle issues arisen from the
added design complexity. Accelerator parameter tuning is guided by our
performance model, which we also use to project performance for the upcoming
Intel Stratix 10 devices. On an Arria 10 GX 1150 device, our accelerator can
reach up to 760 and 375 GFLOP/s of compute performance, for 2D and 3D stencils,
respectively, which rivals the performance of a highly-optimized GPU
implementation. Furthermore, we estimate that the upcoming Stratix 10 devices
can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D
stencil computation, respectively.

DOI： 10.1145/3174243.3174248

arXiv

researchmap

Other Link： http://arxiv.org/pdf/1802.00438v1
Lock Contention Management in Multithreaded MPI

Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Milind Chabbi, Yanjie Wei, Jeff Hammond, Satoshi Matsuoka

ACM TRANSACTIONS ON PARALLEL COMPUTING 5 ( 3 ) 2018.1

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1145/3275443

Web of Science

researchmap
Machine Learning Predictions for Underestimation of Job Runtime on HPC System

Jian Guo, Akihiro Nomura, Ryan Barton, Haoyu Zhang, Satoshi Matsuoka

Supercomputing Frontiers 179 - 198 2018

Publisher：Springer International Publishing

DOI： 10.1007/978-3-319-69953-0_11

researchmap
Machine Learning Predictions for Underestimation of Job Runtime on HPC System

Jian Guo, Akihiro Nomura, Ryan Barton, Haoyu Zhang, Satoshi Matsuoka

SUPERCOMPUTING FRONTIERS, SCFA 2018 10776 179 - 198 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-69953-0_11

Web of Science

researchmap
Efficient Solving of Scan Primitive on Multi-GPU Systems

Adrian P. Dieguez, Margarita Amor, Ramon Doallo, Akira Nukada, Satoshi Matsuoka

2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) 794 - 803 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2018.00089

Web of Science

researchmap
Predicting Performance Using Collaborative Filtering

Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka

2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 504 - 514 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CLUSTER.2018.00066

Web of Science

researchmap
Hardware Implementation of POSITs and Their Application in FPGAs

Artur Podobas, Satoshi Matsuoka

2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018) 138 - 145 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2018.00029

Web of Science

researchmap
Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka

2018 IEEE 25TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) 92 - 101 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/HiPC.2018.00019

Web of Science

arXiv

researchmap

Other Link： http://arxiv.org/pdf/1812.10321v1
Explorations of Data Swapping on Burst Buffer

Tianqi Xu, Kento Sato, Satoshi Matsuoka

2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018) 517 - 526 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPADS.2018.00074

Web of Science

researchmap
DRAGON: Breaking GPU Memory Capacity Limits with Direct NVM Access

Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Satoshi Matsuoka

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18) 2018

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Optimizing Preconditioned Conjugate Gradient on TaihuLight for OpenFOAM

James Lin, Minhua Wen, Delong Meng, Xin Liu, Akira Nukada, Satoshi Matsuoka

2018 18TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) 273 - 282 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2018.00042

Web of Science

researchmap
Accelerating Deep Learning Frameworks with Micro-batches

Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 402 - 412 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CLUSTER.2018.00058

Web of Science

researchmap
Cambrian Explosion of Computing and Big Data in the Post-Moore Era

Satoshi Matsuoka

HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING 105 - 105 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3208040.3225055

Web of Science

researchmap
Efficient Algorithms for the Summed Area Tables Primitive on GPUs

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 482 - 493 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CLUSTER.2018.00064

Web of Science

researchmap
High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018) 123 - 130 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2018.00027

Web of Science

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2002.html#abs-2002-05983
MACC: An OpenACC Transpiler for Automatic Multi-GPU Use Reviewed

Kazuaki Matsumura, Mitsuhisa Sato, Taisuke Boku, Artur Podobas, Satoshi Matsuoka

SUPERCOMPUTING FRONTIERS, SCFA 2018 10776 109 - 127 2018

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-69953-0_7

Web of Science

Scopus

researchmap
Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data

松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

Tsubame e-Science Journal 16 2 - 8 (JA), 20 - 27 (EN) 2017.11

Language：Japanese

CiNii Books

J-GLOBAL

researchmap
Applying Temporal Blocking with a Directive-based Approach

Shota Kuroda, Toshio Endo, Satoshi Matsuoka

Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC 2017.11

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3148173.3148190

researchmap
Concept of the AI Bridging Cloud Infrastructure (ABCI)

小川宏高, 松岡聡, 佐藤仁, 高野了成, 滝澤真一朗, 谷村勇輔, 三浦信一, 関口智嗣

IPSJ SIG Technical Reports (Web) 2017 ( HPC-160 ) Vol.2017-HPC-160, No.28, 1-7 (WEB ONLY) 2017.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data

松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

IPSJ SIG Technical Reports (Web) 2017 ( HPC-160 ) Vol.2017-HPC-160, No.29, 1-6 (WEB ONLY) 2017.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration)

Kevin Brown, Tianqi Xu, Keita Iwabuchi, Kento Sato, Adam Moody, Kathryn Mohror, Nikhil Jain, Abhinav Bhatele, Martin Schulz, Roger Pearce, Maya Gokhale, Satoshi Matsuoka

2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW) 343 - 347 2017.6

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icdcsw.2017.74

researchmap
Efficient Breadth-First Search on Massively Parallel and Distributed-Memory Machines

Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka

Data Science and Engineering 2 ( 1 ) 22 - 35 2017.3

Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

DOI： 10.1007/s41019-016-0024-y

researchmap

Other Link： http://link.springer.com/article/10.1007/s41019-016-0024-y/fulltext.html
Fast Recognition of Bird Sounds Using Extreme Learning Machines

Kun Qian, Jian Guo, Ken Ishida, Satoshi Matsuoka

IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING 12 ( 2 ) 294 - 296 2017.3

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1002/tee.22378

Web of Science

researchmap
Co-locating Graph Analytics and HPC Applications

Kevin Brown, Satoshi Matsuoka

2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 659 - 660 2017

Language：English

DOI： 10.1109/CLUSTER.2017.111

Web of Science

researchmap
Optimizations of Two Compute-bound Scientific Kernels on the SW26010 Many-core Processor

James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka

2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) 432 - 441 2017

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPP.2017.52

Web of Science

researchmap
GPU-based Training of Autoencoders for Bird Sound Data Processing

Jian Guo, Kun Qian, Bjorn Schuller, Satoshi Matsuoka

2017 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW) 2017

Language：English

DOI： 10.1109/icce-china.2017.7991037

Web of Science

researchmap
High-performance and Memory-saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) 101 - 110 2017

Language：English

DOI： 10.1109/ICPP.2017.19

Web of Science

researchmap
Being "BYTES-oriented" in HPC leads to an Open Big Data/AI Ecosystem and Further Advances into the Post-Moore Era

Satoshi Matsuoka

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 5 - 5 2017

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Benchmarking SW26010 Many-core Processor

Zhigeng Xu, James Lin, Satoshi Matsuoka

2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) 743 - 752 2017

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2017.9

Web of Science

researchmap
Asynchronous, Data-Parallel Deep Convolutional Neural Network Training with Linear Prediction Model for Parameter Transition

Ikuro Sato, Ryo Fujisaki, Yosuke Oyama, Akihiro Nomura, Satoshi Matsuoka

NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II 10635 305 - 314 2017

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-70096-0_32

Web of Science

researchmap
Evaluating High-Level Design Strategies on FPGAs for High-Performance Computing

Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka

2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) 2017

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Designing and Accelerating Spiking Neural Networks using OpenCL for FPGAs

Artur Podobas, Satoshi Matsuoka

2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT) 255 - 258 2017

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Evaluation of HPC-Big Data Applications Using Cloud Platforms

Shweta Salaria, Kevin Brown, Hideyuki Jitsumoto, Satoshi Matsuoka

2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) 1053 - 1061 2017

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2017.143

Web of Science

researchmap
Migrating Legacy Fortran to Python While Retaining Fortran-Level Performance through Transpilation and Type Hints

Mateusz Bysiek, Aleksandr Drozd, Satoshi Matsuoka

2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC) 2016.11

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/pyhpc.2016.006

researchmap
Special Issue on Cluster Computing

Michela Taufer, Pavan Balaji, Satoshi Matsuoka

PARALLEL COMPUTING 58 25 - 26 2016.10

Language：English

DOI： 10.1016/j.parco.2016.09.001

Web of Science

researchmap
Critical mass in the emergence of collective intelligence: a parallelized simulation of swarms in noisy environments

Aleksandr Drozd, Olaf Witkowski, Satoshi Matsuoka, Takashi Ikegami

Artificial Life and Robotics 21 ( 3 ) 317 - 323 2016.9

Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

DOI： 10.1007/s10015-016-0303-8

researchmap

Other Link： http://link.springer.com/article/10.1007/s10015-016-0303-8/fulltext.html
Evaluating Application Error under Specific Fault Patterns Using a Virtual Machine Emulator

小林佑矢, 實本英之, 野村哲弘, 松岡聡

IPSJ SIG Technical Reports (Web) 2016 ( HPC-155 ) Vol.2016-HPC-155, No.10, 1-7 (WEB ONLY) 2016.8

Language：Japanese

J-GLOBAL

researchmap
Routing on the Dependency Graph

Jens Domke, Torsten Hoefler, Satoshi Matsuoka

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing 2016.5

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2907294.2907313

researchmap
From FLOPS to BYTES: Disruptive change in high-performance computing towards the post-Moore era Reviewed

Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo

2016 ACM International Conference on Computing Frontiers - Proceedings 274 - 281 2016.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Association for Computing Machinery, Inc

DOI： 10.1145/2903150.2906830

Scopus

researchmap
From FLOPS to BYTES

Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo

Proceedings of the ACM International Conference on Computing Frontiers 274 - 281 2016.5

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2903150.2906830

researchmap
Serving More GPU Jobs, with Low Penalty, using Remote GPU Execution and Migration

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 485 - 488 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CLUSTER.2016.36

Web of Science

researchmap
Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures

Abdelhalim Amer, Satoshi Matsuoka, Miquel Pericas, Naoya Maruyama, Kenjiro Taura, Rio Yokota, Pavan Balaji

OPENMP: MEMORY, DEVICES, AND TASKS 9903 156 - 170 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-45550-1_12

Web of Science

researchmap
GPU-Based Fast Signal Processing for Large Amounts of Snore Sound Data

Jian Guo, Kun Qian, Huijie Xu, Christoph Janott, Bjoern Schuller, Satoshi Matsuoka

2016 IEEE 5TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS 2016

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Tapas: An Implicitly Parallel Programming Framework For Hierarchical N-body Algorithms

Keisuke Fukuda, Motohiko Matsuda, Naoya Maruyama, Rio Yokota, Kenjiro Taura, Satoshi Matsuoka

2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 1100 - 1109 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPADS.2016.143

Web of Science

researchmap
Benchmarks for Extracting Application-Driven Requirements for Future HPCI Systems

野村哲弘, 鈴木惣一朗, 三上和徳, 丸山直也, 松岡聡

2016

researchmap
Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't.

Anna Gladkova, Aleksandr Drozd, Satoshi Matsuoka

Proceedings of the NAACL Student Research Workshop 2016

Publishing type：Research paper (international conference proceedings) Publisher：Association for Computational Linguistics

DOI： 10.18653/v1/n16-2002

researchmap
Word embeddings, analogies, and machine learning: beyond king - man + woman = queen

Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers 2016

Publisher：The COLING 2016 Organizing Committee

researchmap
Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs

Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka

SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS 409 - 420 2016

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
CloudBB: Scalable I/O Accelerator for Shared Cloud Storage

Tianqi Xu, Kento Sato, Satoshi Matsuoka

2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 509 - 518 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPADS.2016.72

Web of Science

researchmap
I/O Chunking and Latency Hiding Approach for Out-of-core Sorting Acceleration using GPU and Flash NVM Reviewed

Hitoshi Sato, Ryo Mizote, Satoshi Matsuoka, Hirotaka Ogawa

2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 398 - 403 2016

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers

Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka

2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 66 - 75 2016

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016) 80 131 - 142 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1016/j.procs.2016.05.304

Web of Science

researchmap
Extreme Scale Breadth-First Search on Supercomputers

Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka

2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 1040 - 1047 2016

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Towards a Distributed Large-Scale Dynamic Graph Data Store

Keita Iwabuchi, Scott Sallinen, Roger Pearce, Brian Van Essen, Maya Gokhale, Satoshi Matsuoka

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) 892 - 901 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2016.189

Web of Science

researchmap
A Directive-based Data Layout Abstraction for Performance Portability of OpenACC Applications

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) 1147 - 1154 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/HPCC-SmartCity-DSS.2016.34

Web of Science

researchmap
Towards Convergence of Extreme Computing and Big Data Centers

Satoshi Matsuoka

DIDC'16: PROCEEDINGS OF THE ACM INTERNATIONAL WORKSHOP ON DATA-INTENSIVE DISTRIBUTED COMPUTING 1 - 1 2016

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2912152.2912159

Web of Science

researchmap
Discovering Aspectual Classes of Russian Verbs in Untagged Large Corpora

Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

2015 IEEE International Conference on Data Science and Data Intensive Systems 2015.12

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/dsdis.2015.30

researchmap
MPI plus Threads: Runtime Contention and Remedies

Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka

ACM SIGPLAN NOTICES 50 ( 8 ) 239 - 248 2015.8

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1145/2688500.2688522

Web of Science

researchmap
Efforts to Improve Scheduling Efficiency on TSUBAME2 and Visualization of User Trends

野村哲弘, 佐々木淳, 三浦信一, 遠藤敏夫, 松岡聡

IPSJ SIG Technical Reports (Web) 2015 ( HPC-150 ) Vol.2015-HPC-150, No.2 (WEB ONLY) 2015.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers

Toshio Endo, Yuki Takasaki, Satoshi Matsuoka

2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 625 - 632 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPADS.2015.84

Web of Science

researchmap
Evaluating AVX2 Vgather Instruction with Stencils

James Lin, Qiang Qin, Shuo Li, Minhua Wen, Satoshi Matsuoka

2015

researchmap
Optimization and Performance Model of Large-Scale Urban Airflow Simulation on GPU Clusters

高嵜祐樹, 遠藤敏夫, 松岡聡

2015

researchmap
Python, performance, and natural language processing

Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing - PyHPC '15 2015

Publishing type：Research paper (international conference proceedings) Publisher：ACM Press

DOI： 10.1145/2835857.2835858

researchmap
Design of a NVRAM Specialized Degree Aware Dynamic Graph Data Structure

Keita Iwabuchi, Roger Pearce, Brian Van Essen, Maya Gokhale, Satoshi Matsuoka

2015

researchmap
Performance Analysis of MapReduce Implementations for High Performance Homology Search

Chaojie Zhang, Koichi Shirahata, Shuji Suzuki, Yutaka Akiyama, Satoshi Matsuoka

2015

researchmap
Porting and Optimizing GTC-P on Sunway TaihuLight Supercomputer with Sunway OpenACC

Yichao Wang, James Lin, Linjin Cai, William Tang, Stephane Ethier, Bei Wang, Simon See, Satoshi

2015

researchmap
Message Delivery Optimization in Pregel Graph Processing Systems

上野晃司, 鈴村豊太郎, 松岡聡

2015

researchmap
Signal-Driven Swarming: A Parallel Implementation of Evolved Autonomous Agents to Perform A Foraging Task

Aleksandr Drozd, Olaf Witkowski, Satoshi Matsuoka, Takashi Ikegami

2015

researchmap
Software Certification in Legal and Scientific Metrology

MATSUOKA Satoshi

Journal of The Society of Instrument and Control Engineers 54 ( 10 ) 766 - 769 2015

Language：Japanese Publisher：The Society of Instrument and Control Engineers

DOI： 10.11499/sicejl.54.766

CiNii Books

researchmap

Other Link： https://jlc.jst.go.jp/DN/JLC/20016439007?from=CiNii
Exploration of Lossy Compression for Application-level Checkpoint/Restart Reviewed

Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka

2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) 914 - 922 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2015.67

Web of Science

researchmap
Efficient Execution of Multiple CUDA Applications Using Transparent Suspend, Resume and Migration

Taichiro Suzuki, Akira Nukada, Satoshi Matsuoka

EURO-PAR 2015: PARALLEL PROCESSING 9233 687 - 699 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-662-48096-0_53

Web of Science

researchmap
Understanding Performance Portability of OpenACC for Supercomputers

Suttinee Sawadsitang, James Lin, Simon See, Francois Bodin, Satoshi Matsuoka

2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS 699 - 707 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2015.60

Web of Science

researchmap
Hardware-Centric Analysis of Network Performance for MPI Applications

Kevin A. Brown, Jens Domke, Satoshi Matsuoka

2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 692 - 699 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPADS.2015.92

Web of Science

researchmap
Characterizing MPI and Hybrid MPI plus Threads Applications at Scale: Case Study with BFS

Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Satoshi Matsuoka

2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING 1075 - 1083 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGrid.2015.93

Web of Science

researchmap
Modeling Gather and Scatter with Hardware Performance Counters for Xeon Phi

James Lin, Akira Nukada, Satoshi Matsuoka

2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING 713 - 716 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGrid.2015.59

Web of Science

researchmap
An OpenACC Extension for Data Layout Transformation

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

2014 First Workshop on Accelerator Programming using Directives 12 - 18 2014.11

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/waccpd.2014.12

researchmap
Tracing data movements within MPI collectives

Kevin A. Brown, Jens Domke, Satoshi Matsuoka

ACM International Conference Proceeding Series 09-12-September-2014 117 - 118 2014.9

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2642769.2642789

Scopus

researchmap
Development of a Computer Evaluation Benchmark Using Real Applications and a Performance Repository

野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡

IPSJ SIG Technical Reports (Web) 2014 ( 29 ) 1 - 7 2014.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Information Processing Society of Japan

CiNii Books

J-GLOBAL

researchmap
Construction and Operation of the HPCI Advanced Software Operation Infrastructure

三浦信一, 滝澤真一朗, 松岡聡, 棟朝雅晴, 實本英之, 小林泰三

IPSJ SIG Technical Reports (Web) 2014 ( 30 ) 1 - 6 2014.2

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Information Processing Society of Japan

CiNii Books

J-GLOBAL

researchmap
Cache-aware Sparse Matrix Formats for Kepler GPU

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 281 - 288 2014

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Special issue: SC13-The International Conference for High Performance Computing, Networking, Storage and Analysis

William Gropp, Satoshi Matsuoka

SCIENTIFIC PROGRAMMING 22 ( 2 ) 57 - 58 2014

Language：English

DOI： 10.1155/2014/915921

Web of Science

researchmap
Petascale General Solver for Semidefinite Programming Problems with over Two Million Constraints

Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki

2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2014.121

Web of Science

researchmap
NVM-based Hybrid BFS with Memory Efficient Data Structure

Keita Iwabuchi, Hitoshi Sato, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka

2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 529 - 538 2014

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Hybrid BFS Approach Using Semi-External Memory

Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka

PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) 1698 - 1707 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2014.189

Web of Science

researchmap
Fail-in-Place Network Design: Interaction between Topology, Routing Algorithm and Failures

Jens Domke, Torsten Hoefler, Satoshi Matsuoka

SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS 597 - 608 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/SC.2014.54

Web of Science

researchmap
TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype towards Exascale Becoming the Greenest Supercomputer in the World

Toshio Endo, Akira Nukada, Satoshi Matsuoka

2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 360 - 367 2014

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Efficient String Sorting on Multi- and Many-Core Architectures

Aleksandr Drozd, Miquel Pericas, Satoshi Matsuoka

2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS) 637 - 644 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/BigData.Congress.2014.97

Web of Science

researchmap
Extreme Big Data (EBD): Next Generation Big Data Infrastructure Technologies Towards Yottabyte/Year.

Satoshi Matsuoka, Hitoshi Sato, Osamu Tatebe, Michihiro Koibuchi, Ikki Fujiwara, Shuji Suzuki, Masanori Kakuta, Takashi Ishida, Yutaka Akiyama, Toyotaro Suzumura, Koji Ueno, Hiroki Kanezashi, Takemasa Miyoshi

Supercomput. Front. Innov. 1 ( 2 ) 89 - 107 2014

Publishing type：Research paper (scientific journal)

DOI： 10.14529/jsfi140206

researchmap
Tracing Data Movements within MPI Collectives.

Kevin A. Brown, Jens Domke, Satoshi Matsuoka

117 - 117 2014

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2642769.2642789

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/pvm/eurompi2014.html#BrownDM14
Latent Fault Detection With Unbalanced Workloads

Moshe Gabel, Kento Sato, Daniel Keren, Satoshi Matsuoka, Assaf Schuster

2014

researchmap
Node-level Memory Access Optimization on Intel Knights Corner

James Lin, Shuo Li, Jiaming Zhao, Satoshi Matsuoka

2014

researchmap
Out-of-core GPU Memory Management for MapReduce-based Large-scale Graph Processing

Koichi Shirahata, Hitoshi Sato, Satoshi Matsuoka

2014 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 221 - 229 2014

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Large-scale Distributed Sorting for GPU-based Heterogeneous Supercomputers

Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, Satoshi Matsuoka

2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 510 - 518 2014

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Scalable Analysis of Multicore Data Reuse and Sharing

Miquel Pericas, Kenjiro Taura, Satoshi Matsuoka

PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, (ICS'14) 353 - 362 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2597652.2597674

Web of Science

researchmap
Analysis of Data Reuse in Task-Parallel Runtimes

Miquel Pericas, Abdelhalim Amer, Kenjiro Taura, Satoshi Matsuoka

HIGH PERFORMANCE COMPUTING SYSTEMS: PERFORMANCE MODELING, BENCHMARKING AND SIMULATION 8551 73 - 87 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-10214-6_4

Web of Science

researchmap
FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery

Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2014.126

Web of Science

researchmap
A User-level InfiniBand-based File System and Checkpoint Strategy for Burst Buffers

Kento Sato, Kathryn Mohror, Adam Moody, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

2014 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) 21 - 30 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGrid.2014.24

Web of Science

researchmap
Using rCUDA to Reduce GPU Resource-assignment Fragmentation caused by Job Scheduler

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

2014 15TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2014) 105 - 112 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/PDCAT.2014.26

Web of Science

researchmap
How File Access Patterns Influence Interference Among Cluster Applications

Chih-Song Kuo, Aamer Shah, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf

2014 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 185 - 193 2014

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
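Construction of an Application Performance Repository for System Evaluation and Evaluation of Performance Models

Akihiro Nomura, Shinichi Miura, Toshio Endo, Satoshi Matsuoka, Soichiro Suzuki, Naoya Maruyama

IPSJ SIG Technical Reports (Web) 2013 ( 4 ) 1 - 6 2013.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)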

CiNii Books

J-GLOBAL

researchmap
Guest Editors' Introduction: Special Issue on Applications for the Heterogeneous Computing Era

Pavan Balaji, Satoshi Matsuoka

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 27 ( 2 ) 87 - 88 2013.5

　More details

Language：English

DOI： 10.1177/1094342013481158

Web of Science

researchmap
Tsubame2.0: The first petascale supercomputer in Japan and the greatest production in the world

Satoshi Matsuoka, Takayuki Aoki, Toshio Endo, Hitoshi Sato, Shin'ichiro Takizawa, Akihiko Nomura, Kento Sato

Contemporary High Performance Computing: From Petascale toward Exascale 525 - 555 2013.1

　More details

Publishing type：Part of collection (book)

Scopus

researchmap
Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-scale Heterogeneous Supercomputers

Koichi Shirahata, Hitoshi Sato, Toyotaro Suzumura, Satoshi Matsuoka

PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013) 277 - 284 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGrid.2013.85

Web of Science

researchmap
Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system

Takafumi Saito, Kento Sato, Hitoshi Sato, Satoshi Matsuoka

FTXS 2013 - Proceedings of the 3rd ACM Workshop on Fault-Tolerance for HPC at eXtreme Scale 41 - 47 2013

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2465813.2465822

Scopus

researchmap
Proceedings of SC13 The International Conference for High Performance Computing, Networking, Storage and Analysis Denver, Colorado 17-22 November 2013

William Gropp, Satoshi Matsuoka

2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
A Multi GPU Read Alignment Algorithm with Model-Based Performance Optimization

Aleksandr Drozd, Naoya Maruyama, Satoshi Matsuoka

HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012 7851 270 - 277 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs

Guanghao Jin, Toshio Endo, Satoshi Matsuoka

2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka, Ryoji Takaki

PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013) 136 - 143 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGrid.2013.12

Web of Science

researchmap
Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM

Abdelhalim Amer, Naoya Maruyama, Miquel Pericas, Kenjiro Taura, Rio Yokota, Satoshi Matsuoka

SUPERCOMPUTING (ISC 2013) 7905 255 - 266 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system.

Takafumi Saito, Kento Sato, Hitoshi Sato, Satoshi Matsuoka

41 - 48 2013

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2465813.2465822

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/hpdc/ftxs2013.html#SaitoSSM13
Multi-GPU Implementation of the NICAM Atmospheric Model

Irina Demeshko, Naoya Maruyama, Hirofumi Tomita, Satoshi Matsuoka

EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS 7640 175 - 184 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing

Mohamed Slim Bouguerra, Ana Gainaru, Leonardo Bautista Gomez, Franck Cappello, Satoshi Matsuoka, Naoya Maruyama

IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013) 501 - 512 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2013.74

Web of Science

researchmap
Towards Exascale with the ANR-JST Japanese-French Project FP3C Reviewed

G. Antoniu, T. Boku, A. Buttari, C. Calvin, P. Codognet, M. Dayde, N. Emad, Y. Ishikawa, G. Joslin, S. Matsuoka, K. Nakajima, H. Nakashima, R. Namyst, S. Petiton, T. Sakurai, M. Sato

2013 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES (CSIT) 2013

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CSITechnol.2013.6710357

Web of Science

researchmap
Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds

Leonardo Bautista Gomez, Bogdan Nicolae, Naoya Maruyama, Franck Cappello, Satoshi Matsuoka

EURO-PAR 2012 PARALLEL PROCESSING 7484 313 - 324 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Hierarchical Clustering Strategies for Fault Tolerance in Large Scale HPC Systems

Leonardo Bautista-Gomez, Thomas Ropars, Naoya Maruyama, Franck Cappello, Satoshi Matsuoka

2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 355 - 363 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CLUSTER.2012.71

Web of Science

researchmap
Using Bittorrent and SVC for Efficient Video Sharing and Streaming

Amer Abdelhalim, Toufik Ahmed, Hidouci Walid-Khaled, Satoshi Matsuoka

2012 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC) 537 - 543 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Design and Modeling of a Non-blocking Checkpointing System

Kento Sato, Kathryn Mohror, Adam Moody, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Topic 16: GPU and Accelerators Computing

Alex Ramirez, Dimitrios S. Nikolopoulos, David Kaeli, Satoshi Matsuoka

EURO-PAR 2012 PARALLEL PROCESSING 7484 857 - 858 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Design and Implementation of Portable and Efficient Non-blocking Collective Communication Reviewed

Akihiro Nomura, Yutaka Ishikawa, Naoya Maruyama, Satoshi Matsuoka

12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) 1 - 8 2012

　More details

Publishing type：Research paper (international conference proceedings) Publisher：IEEE Computer Society

DOI： 10.1109/CCGrid.2012.96

researchmap
High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems

Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Makoto Yamashita, Satoshi Matsuoka, Maho Nakata

2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Sequence Alignment on Massively Parallel Heterogeneous Systems

Aleksandr Drozd, Naoya Maruyama, Satoshi Matsuoka

2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW) 2498 - 2501 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPSW.2012.311

Web of Science

researchmap
Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Akira Nukada, Kento Sato, Satoshi Matsuoka

2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) 2012

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
The International Exascale Software Project roadmap Reviewed

Jack Dongarra, Pete Beckman, Terry Moore, Patrick Aerts, Giovanni Aloisio, Jean-Claude Andre, David Barkai, Jean-Yves Berthou, Taisuke Boku, Bertrand Braunschweig, Franck Cappello, Barbara Chapman, Xuebin Chi, Alok Choudhary, Sudip Dosanjh, Thom Dunning, Sandro Fiore, Al Geist, Bill Gropp, Robert Harrison, Mark Hereld, Michael Heroux, Adolfy Hoisie, Koh Hotta, Zhong Jin, Yutaka Ishikawa, Fred Johnson, Sanjay Kale, Richard Kenway, David Keyes, Bill Kramer, Jesus Labarta, Alain Lichnewsky, Thomas Lippert, Bob Lucas, Barney Maccabe, Satoshi Matsuoka, Paul Messina, Peter Michielse, Bernd Mohr, Matthias S. Mueller, Wolfgang E. Nagel, Hiroshi Nakashima, Michael E. Papka, Dan Reed, Mitsuhisa Sato, Ed Seidel, John Shalf, David Skinner, Marc Snir, Thomas Sterling, Rick Stevens, Fred Streitz, Bob Sugar, Shinji Sumimoto, William Tang, John Taylor, Rajeev Thakur, Anne Trefethen, Mateo Valero, Aad van der Steen, Jeffrey Vetter, Peg Williams, Robert Wisniewski, Kathy Yelick

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 25 ( 1 ) 3 - 60 2011.2

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1177/1094342010391989

Web of Science

researchmap
Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers.

Naoya Maruyama, Tatsuo Nomura, Kento Sato, Satoshi Matsuoka

11 - 12 2011

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2063384.2063398

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/sc/sc2011.html#MaruyamaNSM11
Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers

Naoya Maruyama, Tatsuo Nomura, Kento Sato, Satoshi Matsuoka

Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis 2011

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/2063384.2063398

Scopus

researchmap
Performance Characteristics of Graph500 on Large-Scale Distributed Environment

Toyotaro Suzumura, Koji Ueno, Hitoshi Sato, Katsuki Fujisawa, Satoshi Matsuoka

2011 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC) 149 - 158 2011

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Model-based Fault Localization: Finding Behavioral Outliers in Large-scale Computing Systems

Naoya Maruyama, Satoshi Matsuoka

NEW GENERATION COMPUTING 28 ( 3 ) 237 - 255 2010.7

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1007/s00354-009-0088-6

Web of Science

researchmap
The International Exascale Software Project: A Call to Cooperative Action by the Global High-Performance Community

Jack Dongarra, Pete Beckman, Patrick Aerts, Franck Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Terry Moore, Rick Stevens, Anne Trefethen, Mateo Valero

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 23 ( 4 ) 309 - 322 2009.11

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1177/1094342009347714

Web of Science

researchmap
Linpack Tuning on a Heterogeneous Supercomputer with Four Types of Processors

ENDO TOSHIO, NUKADA AKIRA, MATSUOKA SATOSHI, MARUYAMA NAOYA, JITSUMOTO HIDEYUKI

IPSJ SIG Notes 182 ( 14 ) 13 - 18 2009.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We report Linpack benchmark results on the TSUBAME supercomputer, a large-scale heterogeneous system with two types of general-purpose processors and two types of accelerators. Although accelerator architectures are promising for improving the performance of computer systems while keeping power consumption and footprint low, there are only a few reports of large-scale computations on a large number of accelerators, apart from our previous trials. Using all of about 10,000 Opteron cores, 500 Xeon cores, 640 ClearSpeed accelerators and 620 NVIDIA Tesla GPUs, we achieved 77 TFlops in Linpack. The keys to obtaining this result are modifications to the program code and careful tuning that preserve the performance of the accelerators. With this result, TSUBAME is ranked 29th in the latest Top500 supercomputer ranking, and it is the second largest heterogeneous system in the world.

CiNii Books

researchmap
Speculative Checkpointing: Exploiting Temporal Affinity of Memory Operations Reviewed

Satoshi Matsuoka, Ikuhei Yamagata, Hideyuki Jitsumoto

Conference on High Performance Computing (HPC Asia 2009) 2009

　More details

researchmap
Adaptive Resource Indexing Technique for Unstructured Peer-to-Peer Networks

Sumeth Lerthirunwong, Naoya Maruyama, Satoshi Matsuoka

CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 172 - 179 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2009.41

Web of Science

researchmap
Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

Tomoaki Hamano, Toshio Endo, Satoshi Matsuoka

2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5 1912 - 1919 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Auto-Tuning 3-D FFT Library for CUDA GPUs

Akira Nukada, Satoshi Matsuoka

PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Fast Conjugate Gradients with Multiple GPUs

Ali Cevahir, Akira Nukada, Satoshi Matsuoka

COMPUTATIONAL SCIENCE - ICCS 2009, PART I 5544 893 - 903 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
A Model-Based Algorithm for Optimizing I/O Intensive Applications in Clouds using VM-Based Migration

Kento Sato, Hitoshi Sato, Satoshi Matsuoka

CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 466 - + 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2009.24

Web of Science

researchmap
File Clustering Based Replication Algorithm in a Grid Environment Reviewed

Hitoshi Sato, Satoshi Matsuoka, Toshio Endo

CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 204 - 211 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2009.73

Web of Science

researchmap
Aspects of GPU for General Purpose High Performance Computing

Reiji Suda, Takayuki Aoki, Shoichi Hirasawa, Akira Nukada, Hiroki Honda, Satoshi Matsuoka

PROCEEDINGS OF THE ASP-DAC 2009: ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 2009 216 - + 2009

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Coupled-Simulation e-Science Support in the NAREGI Grid Reviewed

Satoshi Matsuoka, Kazushige Saga, Mutsumi Aoyagi

COMPUTER 41 ( 11 ) 42 - + 2008.11

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/MC.2008.449

Web of Science

researchmap
GridARS: An Advance Reservation-based Grid Co-allocation Framework for Distributed Computing and Network Resources Reviewed

Atsuko Takefusa, Hidemoto Nakada, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi

Proc. 13th Workshop on Job Scheduling Strategies for Parallel Processing (LNCS 4942) 152 - 168 2008.4

　More details

Language：English Publishing type：Research paper (international conference proceedings)

For high-performance parallel computing on actual Grids, one of the important issues is to co-allocate the distributed resources that are managed by various local schedulers with advance reservation. To address this issue, we proposed and developed the GridARS resource co-allocation framework and a general advance reservation protocol that uses WSRF/GSI and a two-phase commit (2PC) protocol to enable a generic and secure advance reservation process based on distributed transactions, and that provides an interface module for various existing resource schedulers. To confirm the effectiveness of GridARS, we describe the performance of a simultaneous reservation process and a case study of GridARS grid co-allocation over transpacific computing and network resources. Our experiments showed that: 1) the GridARS simultaneous 2PC reservation process is scalable and practical and 2) GridARS can stably co-allocate distributed resources managed by various local schedulers.

DOI： 10.1007/978-3-540-78699-3_9

researchmap
An efficient, model-based CPU-GPU heterogeneous FFT library Reviewed

Yasuhiko Ogata, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 380 - + 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Model-Based Optimization for Data-Intensive Application on Virtual Cluster

Kento Sato, Hitoshi Sato, Satoshi Matsuoka

2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING 367 - 368 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Bandwidth Intensive 3-D FFT kernel for GPUs using CUDA

Akira Nukada, Yasuhiko Ogata, Toshio Endo, Satoshi Matsuoka

INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS 273 - 283 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Index Distribution Technique for Efficient Search on Unstructured Peer-to-Peer Networks

Sumeth Lerthirunwong, Naoya Maruyama, Satoshi Matsuoka

ECTI-CON 2008: PROCEEDINGS OF THE 2008 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, VOLS 1 AND 2 97 - + 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
The Rise of the Commodity Vectors

Satoshi Matsuoka

HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2008 5336 53 - 62 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Model-based fault localization in large-scale computing systems

Naoya Maruyama, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 1841 - 1852 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
The road to TSUBAME and beyond

Satoshi Matsuoka

HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2007 265 - 267 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-540-74384-2_19

Web of Science

researchmap
Connecting Text Mining and Pathways using the PathText Resource Reviewed

Kemper, Oda, Okazaki, Saetre, Matsuoka, Kikuchi, Kitano, Ananiadou, Tsujii, Tsuruoka

SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 1736 - 1740 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment Reviewed

Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Naoya Maruyama

2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING 250 - 257 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method Reviewed

Yuto Hosogaya, Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 862 - 869 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Massive supercomputing coping with heterogeneity of modern accelerators Reviewed

Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 1179 - 1188 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Locality aware MPI communication on a commodity opto-electronic hybrid network Reviewed

Shin'ichiro Takizawa, Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 2158 - + 2008

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
An Advance Reservation-based Computation Resource Manager for Global Scheduling Reviewed

Hidemoto Nakada, Atsuko Takefusa, Katsuhiko Ookubo, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi

Proc. of GCA 2007 3 - 14 2007.6

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Advance reservation is one possible way to enable resource co-allocation on the Grid. This method requires all the resources to have advance reservation capability as well as coordination protocol support. We employed a two-phase commit protocol, which is common in the distributed transaction area, as the coordination protocol, and implemented an advance reservation manager called PluS. PluS works with existing local queuing managers, such as TORQUE or Grid Engine, and provides users with advance reservation capability. There are two ways to implement this capability: 1) completely replacing the scheduling module of the queuing manager, or 2) representing each reservation as a queue and controlling the queues through an external interface. We designed and implemented a reservation manager both ways and evaluated them. We found that the former has smaller overhead and allows arbitrary scheduling policies, while the latter is much easier to implement with acceptable response time.

DOI： 10.1142/9789812708823_0001

researchmap
Foundations for Dependable Computing Infrastructure in the Information Explosion Era (Special Issue: Grant-in-Aid for Scientific Research on Priority Areas: Cyber Infrastructure for the Information Explosion Era)

MATSUOKA Satoshi, SHIBAYAMA Etsuya, CHIKAYAMA Takashi, NAKAJIMA Tatsuo, TAURA Kenjiro

Journal of the Japanese Society for Artificial Intelligence 22 ( 2 ) 222 - 228 2007.3

　More details

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence

DOI： 10.11517/jjsai.22.2_222

CiNii Books

CiNii Research

researchmap

Other Link： http://id.nii.ac.jp/1004/00006713/
A decentralized, scalable, and autonomous grid monitoring system Reviewed

Laurent Baduel, Satoshi Matsuoka

PRINCIPLES OF DISTRIBUTED SYSTEMS, PROCEEDINGS 4878 1 - 15 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
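Foundations for Dependable Computing Infrastructure in the Information Explosion Era Reviewed

Satoshi Matsuoka, Etsuya Shibayama, Takashi Chikayama, Kenjiro Taura

Journal of the Japanese Society for Artificial Intelligence 22 222 - 228 2007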

　More details

Publishing type：Research paper (scientific journal)

CiNii Research

researchmap

Other Link： https://kaken.nii.ac.jp/grant/KAKENHI-PLANNED-18049073/
Grid'BnB: A parallel branch and bound framework for grids

Denis Caromel, Alexandre di Costanzo, Laurent Baduel, Satoshi Matsuoka

HIGH PERFORMANCE COMPUTING - HIPC 2007, PROCEEDINGS 4873 566 - 579 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Teddy: a sketching interface for 3D freeform design.

Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

21 - 21 2007

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/1281500.1281532

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/siggraph/siggraph2007courses.html#IgarashiMT07
ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. Reviewed

Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA 1 - 8 2007

　More details

Publisher：IEEE

DOI： 10.1109/IPDPS.2007.370603

researchmap
A peer-to-peer infrastructure for autonomous grid monitoring Reviewed

Laurent Baduel, Satoshi Matsuoka

Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2007.370653

Scopus

researchmap
Virtual clusters on the fly - Fast, scalable, and flexible installation

Hideo Nishimura, Naoya Maruyama, Satoshi Matsuoka

CCGRID 2007: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 549 - + 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
High-performance MPI broadcast algorithm for grid environments utilizing multi-lane NICs Reviewed

Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka

CCGRID 2007: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 487 - + 2007

　More details

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Design and Implementation of a Local Scheduling System with Advance Reservation for Co-allocation on the Grid Reviewed

Hidemoto Nakada, Atsuko Takefusa, Katsuhiko Ookubo, Makoto Kishimoto, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi

Proceedings of CIT2006 2006.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

While advance reservation is an essential capability for co-allocating several resources in Grid environments, it is not obvious how it can co-exist with priority-based First Come First Served scheduling, which is widely used as a local scheduling policy today. To investigate this problem, we 1) developed a scheduling API in Java for TORQUE, a variant of OpenPBS, that enables users to implement their own schedulers and replace the original scheduling module with them, and 2) implemented, with the API, a prototype scheduler module that has advance reservation capability. We also provide an external interface for the reservation capability based on WSRF to enable co-allocation of resources over the Grid. Using this interface with the job submission module from Globus Toolkit 4, users can make reservations for resources and submit jobs over the Grid.

DOI： 10.1109/CIT.2006.71

researchmap
Interactive beautification: A technique for rapid geometric design Reviewed

Takeo Igarashi, Satoshi Matsuoka, Sachiko Kawachiya, Hidehiko Tanaka

SIGGRAPH 2006 - ACM SIGGRAPH 2006 Courses 2006.7

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Association for Computing Machinery, Inc

DOI： 10.1145/1185657.1185769

Scopus

researchmap
G-lambda: Coordination of a Grid Scheduler and Lambda Path Service over GMPLS Reviewed

Atsuko Takefusa, Michiaki Hayashi, Naohide Nagatsu, Hidemoto Nakada, Tomohiro Kudoh, Takahiro Miyamoto, Tomohiro Otani, Hideaki Tanaka, Masatoshi Suzuki, Yasunori Samejima, Wataru Imajuku, Masahiko Jinno, Yoshihiro Takigawa, Shuichi Okamoto, Yoshio Tanaka, Satoshi Sekiguchi

Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications 22 ( 2006 ) 868 - 875 2006.6

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.future.2006.03.005

Scopus

researchmap
MegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology Reviewed

Taisuke Boku, Mitsuhisa Sato, Daisuke Takahashi, Hiroshi Nakashima, Hiroshi Nakamura, Satoshi Matsuoka, Yoshihiko Hotta

Proc. 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), The Second Workshop on High-Performance, Power-Aware Computing (HP-PAC 2006) 2006.4

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE Computer Society

researchmap
Profile-based Optimization of Power Performance by using Dynamic Voltage Scaling on a PC cluster Reviewed

HOTTA YOSHIHIKO, SATO MITSUHISA, KIMURA HIDEAKI, MATSUOKA SATOSHI, BOKU TAISUKE, TAKAHASHI DAISUKE

IPSJ SIG Notes 2006 ( 20 ) 139 - 144 2006.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Currently, several of the high-performance processors used in PC clusters have a DVS (Dynamic Voltage Scaling) architecture that can dynamically scale processor voltage and frequency. Adaptive scheduling of the voltage and frequency enables us to reduce power dissipation without a performance slowdown during communication and memory access. In this paper, we propose a method of profile-based power-performance optimization by DVFS scheduling in a high-performance PC cluster. We divide the program execution into several regions and select the best gear for power efficiency in each. We propose an optimization algorithm that selects a gear using the execution and power profile while taking the transition overhead into account. We have designed and built a power-profiling system, Power Watch. With this system we examined the effectiveness of our optimization algorithm on two types of power-scalable clusters (Crusoe and Turion). According to the results of benchmark tests, we achieved an almost 40% reduction in EDP (energy-delay product) with little performance impact (less than 5%) compared to results using the standard clock frequency.

CiNii Books

researchmap
Making wide-area, multi-site MPI feasible using Xen VM

Masaki Tatezono, Naoya Maruyama, Satoshi Matsuoka

FRONTIERS OF HIGH PERFORMANCE COMPUTING AND NETWORKING - ISPA 2006 WORKSHOPS, PROCEEDINGS 4331 387 - + 2006

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Teddy: a sketching interface for 3D freeform design.

Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

11 - 11 2006

Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/1185657.1185772

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/siggraph/siggraph2006courses.html#IgarashiMT06
Multi-Replication with Intelligent Staging in Data-Intensive Grid Applications Reviewed

Yuya Machida, Shin'ichiro Takizawa, Hidemoto Nakada, Satoshi Matsuoka

2006 7TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING 88 - + 2006

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICGRID.2006.311002

Web of Science

researchmap
MegaProto: A Low-power and Compact Cluster for High-performance Computing Reviewed

NAKASHIMA HIROSHI, NAKAMURA HIROSHI, SATO MITSUHISA, BOKU TAISUKE, MATSUOKA SATOSHI, TAKAHASHI DAISUKE, HOTTA YOSHIHIKO

46 ( 12 ) 46 - 61 2005.8

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

MegaProto is a proof-of-concept prototype for our project "Mega-Scale Computing Based on Low-Power Technology and Workload Modeling", implementing our key idea that a million-scale parallel system should be built with densely mounted low-power commodity processors. It also serves as a platform to implement and evaluate our new technologies such as power-conscious compilation, highly reliable and high-performance networking, highly dependable cluster management, and multi-level scalable parallel programming. The building block of MegaProto is a 1U-high, 19-inch rack-mountable motherboard unit on which 16 low-power, one-dollar-note-sized, commodity PC-architecture daughterboards are mounted with a high-bandwidth network, 2Gbps per processor, based on Gigabit Ethernet. The peak performance of each unit is 14.4GFlops for the first version and will improve to 32.0GFlops in the second version through a processor/daughterboard upgrade. The intra- and inter-unit network bandwidths are 32Gbps and 16Gbps respectively. As for power consumption, the entire unit idles at less than 150W and consumes 300-320W maximum under extreme computational stress; this is comparable to or better than conventional 1U servers composed of dual high-performance, power-hungry processors, while benchmarks exhibit up to 279% superior performance for some NPB programs. This demonstrates that higher performance can be achieved with low-power, densely populated architectures built from commodity components.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018370/
MegaProto: A Low-Power and Compact Cluster for High-Performance Computing Reviewed

H. Nakashima, H. Nakamura, M. Sato, T. Boku, S. Matsuoka, D. Takahashi, Y. Hotta

Proc. of HP-PAC05 (in IPDPS2005), Denver CDROM 2005.1

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
A scalable multi-replication framework for data grid Reviewed

S Takizawa, Y Takamiya, H Nakada, S Matsuoka

2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS 2005 310 - 315 2005

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

Scopus

researchmap

Other Link： http://orcid.org/0000-0002-8901-2504
Parallelization of phylogenetic tree inference using grid technologies

Yo Yamamoto, Hidemoto Nakada, Hidetoshi Shimodaira, Satoshi Matsuoka

Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science) 3370 103 - 116 2005

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-540-32251-1_10

Scopus

researchmap
The second trans-pacific grid datafarm testbed and experiments for SC2003 Reviewed

O Tatebe, H Ogawa, Y Kodama, T Kudoh, S Sekiguchi, S Matsuoka, K Aida, T Boku, M Sato, Y Morita, Y Kitatsuji, J Williams, J Hicks

2004 INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS 602 - 607 2004

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Parallelization of Phylogenetic Tree Inference Using Grid Technologies.

Yo Yamamoto, Hidemoto Nakada, Hidetoshi Shimodaira, Satoshi Matsuoka

103 - 116 2004

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-540-32251-1_10

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/lsgrid/lsgrid2004.html#YamamotoNSM04
A Java-based programming environment for hierarchical Grid: Jojo Reviewed

H Nakada, S Matsuoka, S Sekiguchi

2004 IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID - CCGRID 2004 51 - 58 2004

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

Scopus

researchmap

Other Link： http://orcid.org/0000-0002-8901-2504
GridSpeed: A Web-based grid portal generation server Reviewed

Toyotaro Suzumura, Satoshi Matsuoka, Hidemoto Nakada, Henri Casanova

Proceedings - Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, HPCAsia 2004 26 - 33 2004

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/HPCASIA.2004.1324013

Scopus

researchmap
The design and implementation of a fault-tolerant RPC system: Ninf-C Reviewed

Hidemoto Nakada, Satoshi Matsuoka, Yoshio Tanaka, Satoshi Sekiguchi

Proceedings - Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, HPCAsia 2004 9 - 18 2004

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/HPCASIA.2004.1324011

Scopus

researchmap
Autonomous configuration of grid monitoring systems Reviewed

K Shirose, S Matsuoka, H Nakada, H Ogawa

2004 INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS 651 - 657 2004

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

Scopus

researchmap

Other Link： http://orcid.org/0000-0002-8901-2504
Performance of a Deadline-Scheduling Scheme on the Computational Grids Reviewed

TAKEFUSA Atsuko, MATSUOKA Satoshi

The Transactions of the Institute of Electronics, Information and Communication Engineers. 86 ( 9 ) 661 - 670 2003.9

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture Reviewed

TAKEFUSA ATSUKO, TATEBE OSAMU, MATSUOKA SATOSHI, MORITA YOUHEI

44 ( 11 ) 57 - 67 2003.8

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Data Grid is a Grid environment for ubiquitous access and analysis of large-scale data. Because research on it is at an early stage, the performance of petabyte-scale Data Grid models in a realistic data processing setting has not been well investigated. By enhancing our Bricks Grid simulator to simulate Data Grid scenarios, we investigate and compare the performance of different Data Grid models in the Grid Datafarm architecture, mainly categorized into the central and the tier models but with varying scheduling and replication strategies, under realistic assumptions of job processing for the CERN LHC experiments. Our results show that the central model is efficient, but that the tier model, with a greater amount of resources and a speculative class of background replication policies, is quite effective and achieves higher performance even though each tier is smaller than the central model.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018532/
Worldwide Fast File Replication on Grid Datafarm Reviewed

Osamu Tatebe, Satoshi Sekiguchi, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda

CoRR cs.PF/0306090 2003.6

The Grid Datafarm architecture is designed for global petascale
data-intensive computing. It provides a global parallel filesystem with online
petascale storage, scalable I/O bandwidth, and scalable parallel processing,
and it can exploit local I/O in a grid of clusters with tens of thousands of
nodes. One of its features is that it manages file replicas in filesystem metadata
for fault tolerance and load balancing.
This paper discusses and evaluates several techniques to support
long-distance fast file replication. The Grid Datafarm manages a ranked group
of files as a Gfarm file, each file, called a Gfarm file fragment, being stored
on a filesystem node, or replicated on several filesystem nodes. Each Gfarm
file fragment is replicated independently and in parallel using rate-controlled
HighSpeed TCP with network striping. On a US-Japan testbed with 10,000 km
distance, we achieve 419 Mbps using 2 nodes on each side, and 741 Mbps using 4
nodes out of 893 Mbps with two transpacific networks.

arXiv

researchmap

Other Link： http://arxiv.org/pdf/cs/0306090v1
Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High Energy Physics Applications Reviewed

Atsuko Takefusa, Osamu Tatebe, Satoshi Matsuoka, Youhei Morita

Proc. 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12) 34 - 43 2003.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/HPDC.2003.1210014

researchmap
Performance Evaluation of Scheduling and Replication Methods on Grid Datafarm Reviewed

Atsuko Takefusa, Osamu Tatebe, Satoshi Matsuoka, Youhei Morita

IPSJ / 電気通信処理学会 SACSIS2003 Symposium Proceedings 121 - 128 2003.5

Language：Japanese

researchmap
Building A High Performance Parallel File System Using Grid Datafarm and ROOT I/O Reviewed

Youhei Morita, Hiroyuki Sato, Yoshiyuki Watase, Osamu Tatebe, Satoshi Sekiguchi, Satoshi Matsuoka, Noriyuki Soda, A. Dell'Acqua

CoRR cs.DC/0306092 2003

researchmap
Ninf-G: A Reference Implementation of RPC-based Programming Middleware for Grid Computing. Reviewed

Yoshio Tanaka, Hidemoto Nakada, Satoshi Sekiguchi, Toyotaro Suzumura, Satoshi Matsuoka

J. Grid Comput. 1 ( 1 ) 41 - 51 2003

DOI： 10.1023/A:1024083511032

researchmap
Performance Evaluation Model for Scheduling in Global Computing Systems Reviewed

Kento Aida, Atsuko Takefusa, Satoshi Matsuoka, Hidemoto Nakada, Umpei Nagashima

International Journal of High-Performance Computing Applications 14 ( 3 ) 268 - 279 2000.10

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1177/109434200001400308

researchmap
OMPC++ - A portable high-performance implementation of DSM using OpenC++ reflection Reviewed

Y Sohda, H Ogawa, S Matsuoka

PARALLEL AND DISTRIBUTED COMPUTING FOR SYMBOLIC AND IRREGULAR APPLICATIONS 316 - 320 2000

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Design Issues of Network Enabled Server Systems for the Grid. Reviewed

Satoshi Matsuoka, Mitsuhisa Sato, Hidemoto Nakada, Satoshi Sekiguchi

Grid Computing - GRID 2000, First IEEE/ACM International Workshop, Bangalore, India, December 17, 2000, Proceedings 4 - 17 2000

Publisher：Springer

DOI： 10.1007/3-540-44444-0_2

researchmap
Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms Reviewed

Atsuko Takefusa, Satoshi Matsuoka, Hidemoto Nakada, Kento Aida, Umpei Nagashima

Proc. 8th IEEE International Symposium on High Performance Distributed Computing (HPDC-8) 97 - 104 1999.8

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/HPDC.1999.805287

researchmap
A Scheduling Framework for Global Computing Reviewed

Hidemoto Nakada, Atsuko Takefusa, Satoshi Matsuoka, Mitsuhisa Sato, Satoshi Sekiguchi

IPSJ / 電気通信処理学会 Symposium on Parallel Processing JSPP'99 Proceedings 277 - 284 1999.6

Language：Japanese

researchmap
Enhancing and Porting an Efficient Constraint Solver for Hierarchical Linear Systems

Satoshi Matsuoka, Hiroshi Hosobe

Proceedings of the Workshop of the FY1998 Research Funding Program 2 - 7 1999.3

Language：Japanese Publisher：AITEC, JIPDEC

researchmap
HiRise : An Incremental Constraint Solver for Constructing Graphical User Interfaces Reviewed

Hosobe Hiroshi, Matsuoka Satoshi, Yonezawa Akinori

Computer Software 16 ( 6 ) 6_549 - 6_561 1999

Language：Japanese Publisher：Japan Society for Software Science and Technology



DOI： 10.11309/jssst.16.6_549

CiNii Books

researchmap
HiRise: An Incremental Constraint Solver for Constructing Graphical User Interfaces Reviewed

Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

Michiaki Yasumura (Ed.), Interactive Systems and Software VI--JSSST WISS'98, Lecture Notes in Software Science 21 73 - 82 1998.12

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Kindai-Kagaku-Sha

researchmap
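Evaluation of Wide-Area Computing Systems by Simulation - Toward Job Scheduling for the Ninf System in Wide-Area Distributed Environments - Reviewed

Atsuko Takefusa, Kento Aida, Hirotaka Ogawa, Hidemoto Nakada, Satoshi Matsuoka, Mitsuhisa Sato, Satoshi Sekiguchi, Umpei Nagashima

IPSJ / 電気通信処理学会 Symposium on Parallel Processing JSPP'98 Proceedings 127 - 134 1998.6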

Language：Japanese

researchmap
Development of an Efficient Solver for Hierarchical Linear Systems

Satoshi Matsuoka, Hiroshi Hosobe

Proceedings of the Workshop of the FY1997 Research Funding Program 4 - 9 1998.3

Language：Japanese Publisher：AITEC, JIPDEC

researchmap
An Interactive Drawing Editor with Low Cognitive Overload

Kawachiya Sachiko, Igarashi Takeo, Matsuoka Satoshi, Tanaka Hidehiko

Computer Software 15 ( 4 ) 4_296 - 4_306 1998

Language：Japanese Publisher：Japan Society for Software Science and Technology



DOI： 10.11309/jssst.15.4_296

CiNii Books

researchmap
Ninflet: a migratable parallel objects framework using Java. Reviewed

Hiromitsu Takagi, Satoshi Matsuoka, Hidemoto Nakada, Satoshi Sekiguchi, Mitsuhisa Sato, Umpei Nagashima

Concurrency - Practice and Experience 10 ( 11-13 ) 1063 - 1078 1998

DOI： 10.1002/(SICI)1096-9128(199809/11)10:11/13<1063::AID-CPE414>3.0.CO;2-1

researchmap
Efficient Satisfaction of Constraint Hierarchies Using Hierarchical Linear Systems (short paper) Reviewed

Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

Rikio Onai (Ed.), Interactive Systems and Software V--JSSST WISS'97, Lecture Notes in Software Science 18 129 - 134 1997.12

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Kindai-Kagaku-Sha

researchmap
Preliminary Evaluation of Scheduling in Ninf: a Global Computing System Reviewed

Satoshi Matsuoka, Hirotaka Ogawa, Atsuko Takefusa, Hidemoto Nakada, Kento Aida, Umpei Nagashima, Mitsuhisa Sato, Satoshi Sekiguchi

Proc. International Workshop on Innovative Architectures '97 1 - 7 1997.10

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
In Search for an Ideal Computer-Assisted Drawing System Reviewed

Takeo Igarashi, Sachiko Kawachiya, Satoshi Matsuoka, Hidehiko Tanaka

INTERACT'97 ( The Sixth IFIP Conference on Human-Computer Interaction ) Sydney, Australia 104 - 111 1997.7

Publishing type：Research paper (international conference proceedings) Publisher：Chapman & Hall

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/interact/interact1997.html#IgarashiKMT97
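Performance of Ninf, a Network Numerical Information System, with Multiple Clients Reviewed

Atsuko Takefusa, Hirotaka Ogawa, Satoshi Matsuoka, Hidemoto Nakada, Mitsuhisa Sato, Satoshi Sekiguchi, Umpei Nagashima

IPSJ / 電気通信処理学会 Symposium on Parallel Processing JSPP'97 Proceedings 273 - 280 1997.5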

Language：Japanese

researchmap
Towards a Parallel C++ Programming Language Based on Commodity Object-Oriented Technologies.

Satoshi Matsuoka, A. Nikami, Hirotaka Ogawa, Yutaka Ishikawa

Scientific Computing in Object-Oriented Parallel Environments(ISCOPE) 81 - 88 1997

Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/3-540-63827-X_47

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/iscope/iscope1997.html#MatsuokaNOI97
Ninf: A Network Based Information Library for Global World-Wide Computing Infrastructure. Reviewed

Mitsuhisa Sato, Hidemoto Nakada, Satoshi Sekiguchi, Satoshi Matsuoka, Umpei Nagashima, Hiromitsu Takagi

High-Performance Computing and Networking, International Conference and Exhibition, HPCN Europe 1997, Vienna, Austria, April 28-30, 1997, Proceedings 491 - 502 1997

Publisher：Springer

DOI： 10.1007/BFb0031622

researchmap
Generalized Local Propagation: A Framework for Solving Constraint Hierarchies Reviewed

Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

Eugene C. Freuder (Ed.), Principles and Practice of Constraint Programming--CP'96, Lecture Notes in Computer Science 1118 237 - 251 1996.8

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer-Verlag

researchmap
GIGA:A Pen-Based Constraint Drawing System Reviewed

Sachiko Kawachiya, Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

Proc. of OZCHI'96 (6th Australian Conference on Computer-Human Interaction) 314 - 315 1996

researchmap
Efficient Satisfaction of Constraint Hierarchies with Inequalities Reviewed

Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

Jiro Tanaka (Ed.), Interactive Systems and Software III--JSSST WISS'95, Lecture Notes in Software Science 12 123 - 132 1995.12

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Kindai-Kagaku-Sha

researchmap
Adaptive Recognition of Implicit Structure in Human-Organized Layouts Reviewed

Takeo Igarashi, Satoshi Matsuoka, Toshiyuki Masui

Proceedings of Visual Languages '95 51 ( 0 ) 265 - 266 1995.9

Language：Japanese

CiNii Books

researchmap
A Constraint-Based Approach for Visualization and Animation Reviewed

Shin Takahashi, Satoshi Matsuoka, Ken Miyashita, Hiroshi Hosobe, Akinori Yonezawa, Tomihisa Kamada

Proceedings of the International Workshop on Constraints for Graphics and Visualization 103 - 117 1995.9

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Stackthreads: An abstract machine for scheduling fine-grain threads on stock CPUs Reviewed

Kenjiro Taura, Satoshi Matsuoka, Akinori Yonezawa

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 907 121 - 136 1995

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer Verlag

DOI： 10.1007/BFb0026567

Scopus

researchmap
Locally Simultaneous Constraint Satisfaction Reviewed

Hiroshi Hosobe, Ken Miyashita, Shin Takahashi, Satoshi Matsuoka, Akinori Yonezawa

Alan Borning (Ed.), Principles and Practice of Constraint Programming--PPCP'94, Lecture Notes in Computer Science 874 51 - 62 1994.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer-Verlag

researchmap
Locally Simultaneous Constraint Satisfaction Reviewed

Hiroshi Hosobe, Ken Miyashita, Shin Takahashi, Satoshi Matsuoka, Akinori Yonezawa

Akikazu Takeuchi (Ed.), Interactive Systems and Software I--JSSST WISS'93, Lecture Notes in Software Science 7 49 - 56 1994.9

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Kindai-Kagaku-Sha

researchmap
Highly efficient and encapsulated re-use of synchronization code in concurrent object-oriented languages Reviewed

Satoshi Matsuoka, Kenjiro Taura, Akinori Yonezawa

Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA 129674 109 - 126 1993.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Association for Computing Machinery

DOI： 10.1145/165854.165874

Scopus

researchmap
A Constraint Solving Algorithm for Real-Time Interaction in User Interfaces

Hiroshi Hosobe, Ken Miyashita, Shin Takahashi, Satoshi Matsuoka, Akinori Yonezawa, Tomihisa Kamada

Proceedings of the 10th JSSST Conference 77 - 80 1993.6

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

researchmap
Implementing Concurrent Object-Oriented Languages on Multicomputers

Akinori Yonezawa, Satoshi Matsuoka, Masahiro Yasugi, Kenjiro Taura

IEEE Parallel and Distributed Technology 1 ( 2 ) 49 - 61 1993.5

Publishing type：Research paper (scientific journal)

DOI： 10.1109/88.218175

Scopus

researchmap
An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers

Kenjiro Taura, Satoshi Matsuoka, Akinori Yonezawa

ACM SIGPLAN Notices 28 ( 7 ) 218 - 228 1993.1

Publishing type：Research paper (scientific journal)

DOI： 10.1145/173284.155355

Scopus

researchmap
An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers

Kenjiro Taura, Satoshi Matsuoka, Akinori Yonezawa

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 748 LNCS 402 - 403 1993

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/bfb0018667

Scopus

researchmap
OBJECT-ORIENTED CONCURRENT REFLECTIVE LANGUAGES CAN BE IMPLEMENTED EFFICIENTLY

H MASUHARA, S MATSUOKA, T WATANABE, A YONEZAWA

SIGPLAN NOTICES 27 ( 10 ) 127 - 144 1992.10

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1145/141937.141948

Web of Science

researchmap
OBJECT-ORIENTED CONCURRENT REFLECTIVE LANGUAGES CAN BE IMPLEMENTED EFFICIENTLY Reviewed

H MASUHARA, S MATSUOKA, T WATANABE, A YONEZAWA

OOPSLA '92 CONFERENCE PROCEEDINGS: CONFERENCE ON OBJECT-ORIENTED PROGRAMMING SYSTEMS, LANGUAGES, AND APPLICATIONS 127 - 144 1992

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Object-oriented concurrent reflective architectures

Satoshi Matsuoka, Takuo Watanabe, Yuuji Ichisugi, Akinori Yonezawa

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 612 LNCS 211 - 226 1992

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/3-540-55613-3_11

Scopus

researchmap

Books

The Challenge of the Supercomputer Fugaku: How Japan Competes Without GAFA

（ Role： Sole author）

Bungeishunju 2022.10 （ ISBN:9784166613663 ）

researchmap
Supercomputers

Yoshio Oyanagi, Hiroshi Nakamura, Mitsuhisa Sato, Satoshi Matsuoka

Iwanami Shoten 2012.3 （ ISBN:9784000113076 ）

Total pages：x, 204p Language：Japanese

CiNii Books

researchmap
The road to TSUBAME and beyond

Chapman & Hall/CRC Computational Science Series 2007 （ ISBN:9781584889090 ）

researchmap
Network Applications

Hideki Sunahara, Ken-ichi Chinen, Hidemoto Nakada, Satoshi Matsuoka, Shigeki Goto

Iwanami Shoten 2003.3 （ ISBN:9784000110549 ）

Total pages：xviii, 196p Language：Japanese

CiNii Books

researchmap
Network Applications (Iwanami Lecture Series: The Internet, Vol. 4)

Iwanami Shoten 2003

researchmap
Metalevel architectures and separation of crosscutting concerns : Third International Conference, REFLECTION 2001, Kyoto, Japan, September 25-28, 2001 : proceedings

Reflection, Akinori Yonezawa, Satoshi Matsuoka

Springer 2001 （ ISBN:3540426183 ）

Total pages：xi, 281 p. Language：English

CiNii Books

researchmap
Computing in object-oriented parallel environments : third International Symposium, ISCOPE 99, San Francisco, CA, USA, December 8-10, 1999 : proceedings

ISCOPE (Conference), Satoshi Matsuoka, Rodney R. Oldehoeft, Marydell Tholburn

Springer 1999 （ ISBN:3540668187 ）

Total pages：viii, 203 p. Language：English

CiNii Books

researchmap
A Proposal of an Interactive Drawing Method with Low Cognitive Load

Computer Software 15-4 1998

researchmap
ECOOP'97 - object-oriented programming : 11th European Conference, Jyväskylä, Finland, June 9-13, 1997 : proceedings

European Conference on Object-Oriented Programming, Mehmet Akşit, Satoshi Matsuoka

Springer-Verlag 1997 （ ISBN:3540630899 ）

Total pages：xi, 529 p. Language：English

CiNii Books

researchmap
Compiling Concurrent Objects for MPPs In Parallel Language and Compiler Research in Japan, Bic, Nicolau and Sato (eds.)

Kluwer Academic Press 1996

researchmap
Object technologies for advanced software : Second JSSST International Symposium, ISOTAS '96, Kanazawa, Japan, March 11-15, 1996 : proceedings

International Symposium on Object Technologies for Advanced Software, Kokichi Futatsugi, Satoshi Matsuoka, Japan Society for Software Science and Technology

Springer-Verlag 1996 （ ISBN:3540609547 ）

Total pages：x, 307 p. Language：English

CiNii Books

researchmap
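Highly Efficient Implementation of Concurrent Object-Oriented Language Systems on General-Purpose MIMD Parallel Computers

Akinori Yonezawa, Satoshi Matsuoka, Naoki Kobayashi

Akinori Yonezawa 1996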

Total pages：259p Language：English

CiNii Books

researchmap
Object-Oriented Computing

Katsumi Tanaka, Shojiro Nishio, Akinori Yonezawa, Satoshi Matsuoka, Rikio Onai

Kindai-Kagaku-Sha 1993 （ ISBN:4764902133 ）

Total pages：3 volumes Language：Japanese

CiNii Books

researchmap
Reflection in Concurrent Object-Oriented Computing and Its Applications

Akinori Yonezawa, Satoshi Matsuoka, Takuo Watanabe, Yuuji Ichisugi

Akinori Yonezawa 1992

Total pages：vii,152p Language：English

CiNii Books

researchmap

MISC

Efficient FDK Algorithms on SIMD-accelerated Processors

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka

IPSJ SIG Technical Reports: High Performance Computing (HPC) 2020-HPC-175 ( 6 ) 1 - 11 2020.7

Computed Tomography (CT) is a widely used 3D imaging technology that requires compute-intensive algorithms to generate volume data (or images). We propose a collection of novel back-projection algorithms that reduce the arithmetic computation and improve data locality. We also implement these algorithms as efficient back-projection kernels that are performance-portable over a wide range of CPUs. Unlike the conventional approaches that use OpenMP and target-specific SIMD intrinsics, we employ a high-level OpenCL implementation to generate the vectorized code and use the OpenCL local memory to prefetch the pixels at sub-pixel precision in a regular memory access fashion. Performance evaluation using a variety of Intel CPU generations demonstrates that our back-projection implementation runs up to 10 times faster than the multi-threading optimized implementation.

researchmap
A Study of Synchronization Methods in Modern GPUs

Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka

IEEE International Parallel & Distributed Processing Symposium 2020 2020.4

Language：English

GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes also calls for multi-GPU synchronization. Nvidia's latest CUDA provides a variety of synchronization methods. Until now, there has been no full understanding of the characteristics of those synchronization methods. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. The provided analysis would be useful when making design choices for applications, libraries, and frameworks running on single and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis can be useful. We also describe our micro-benchmarks and measurement methods.

researchmap
High resolution Image Reconstruction on Super computers

Chen Peng, Wahib Mohamed, Takizawa Shinichiro, Matsuoka Satoshi

2020.3

researchmap
A Software Systolic Array on GPUs

Chen Peng, Wahib Mohamed, Takizawa Shinichiro, Matsuoka Satoshi

2020.3

researchmap
A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective

Artur Podobas, Kentaro Sano, Satoshi Matsuoka

IEEE ACCESS 8 146719 - 146743 2020

Language：English

DOI： 10.1109/ACCESS.2020.3012084

Web of Science

arXiv

researchmap
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA.

Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

CoRR abs/2008.11421 2020

The dedicated memory of hardware accelerators can be insufficient to store
all weights and/or intermediate states of large deep learning models. Although
model parallelism is a viable approach to reduce the memory pressure issue,
significant modification of the source code and considerations for algorithms
are required. An alternative solution is to use out-of-core methods instead of,
or in addition to, data parallelism. We propose a performance model based on
the concurrency analysis of out-of-core training behavior, and derive a
strategy that combines layer swapping and redundant recomputing. We achieve an
average of 1.52x speedup in six different models over the state-of-the-art
out-of-core methods. We also introduce the first method to solve the
challenging problem of out-of-core multi-node training by carefully pipelining
gradient exchanges and performing the parameter updates on the host. Our data
parallel out-of-core solution can outperform complex hybrid model parallelism
in training large models, e.g. Megatron-LM and Turing-NLG.

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2008.html#abs-2008-11421
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

CoRR abs/2010.14373 2020

Matrix engines or units, in different forms and affinities, are becoming a
reality in modern processors; CPUs and otherwise. The current and dominant
algorithmic approach to Deep Learning merits the commercial investments in
these units, and deduced from the No. 1 benchmark in supercomputing, namely
High Performance Linpack, one would expect an awakened enthusiasm by the HPC
community, too. Hence, our goal is to identify the practical added benefits for
HPC and machine learning applications by having access to matrix engines. For
this purpose, we perform an in-depth survey of software stacks, proxy
applications and benchmarks, and historical batch job records. We provide a
cost-benefit analysis of matrix engines, both asymptotically and in conjunction
with state-of-the-art processors. While our empirical data will temper the
enthusiasm, we also outline opportunities to "misuse" these dense
matrix-multiplication engines if they come for free.

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2010.html#abs-2010-14373
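Predicting Early-Stopping Timing: Change-Point Detection in the Distribution of Stochastic Gradients in Deep Learning

Keita Yashima, Kota Ishikawa, Ikuro Sato, Akihiro Nomura, Rio Yokota, Satoshi Matsuoka

22nd Information-Based Induction Sciences Workshop (IBIS 2019) 2019.9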

researchmap
Understanding the Overheads of Launching CUDA Kernels

Lingqi Zhang, Mohamed Wahib, Satoshi Matsuoka

2019.8

researchmap
Towards Performance Portability and Modernization of FLASH via Transpilation with High-Level Intermediate Representation

Mateusz Bysiek, Saurabh Chawdhary, Mohamed Wahib, Anshu Dubey, Satoshi Matsuoka

2019-HPC-170 ( 30 ) 1 - 9 2019.7

　More details

With the concurrent increase in application complexity and hardware heterogeneity, the large multiphysics code FLASH faces huge challenges to its continued usability on high performance computing platforms. We are building a novel transpilation framework that relies on a high-level intermediate representation to confront this challenge and enable FLASH to adapt to accelerator-based architectures via performance-oriented refactoring. Additionally, we use the framework to modernize the code by enabling a higher level of abstraction in expressing computations. We evaluate the effectiveness of the tool in terms of speedup relative to the original code's performance, and also quantify productivity gains.

researchmap
Classifying Applications by Machine Learning on Memory Access Data

土川稔生, 遠藤敏夫, 野村哲弘, 近藤正章, 大山洋介, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2019-HPC-170 ( 12 ) 1 - 7 2019.7

　More details

Language：Japanese

researchmap
Breaking the Limitation of GPU memory for Deep Learning

Haoyu Zhang, Mohamed Wahib, Lingqi Zhang, Yohei Tsuji, Satoshi Matsuoka

2019-HPC-170 ( 10 ) 1 - 7 2019.7

　More details

GPU memory can be insufficient for Deep Learning workloads with respect to the model and dataset sizes. Although model parallelism could help, significant modification of the code is needed for every case. An alternative general solution to this problem is to use out-of-core methods. Recent work proposed data-swapping and CUDA Unified Memory (UM) methods to break the limitation of GPU memory capacity. However, there is a lack of detailed analysis, via performance modeling, of the behavior and limitations of those methods. In this paper we analyze the behavior of both a single layer and the whole model, and propose a performance model based on the analysis to study how out-of-core training behaves and hence empower the co-design process for Deep Learning workloads.

researchmap
Analysis of Gradient Data During Training Toward Understanding DNN Generalization

八島慶汰, 石川康太, 佐藤育郎, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2019-HPC-170 ( 7 ) 1 - 5 2019.7

　More details

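In recent years, deep learning with Deep Neural Networks (DNNs) has achieved excellent results in many fields such as image recognition and natural language processing. However, much remains unexplained about the relationship between the SGD-based training mechanism and generalization performance on unseen data. By analyzing the eigenvalues of the Fisher information matrix and the gradient data obtained from the training data during training, we show experimentally that the eigenvalues of the Fisher information matrix, which have been regarded as an indicator of generalization, are unstable. Furthermore, based on the hypothesis suggested by these experiments that gradient outliers and the gradient distribution are related to generalization performance, we carry out a time-series analysis of the gradients obtained from the model over the entire training data set.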

researchmap
Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization

Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen

2019.6

　More details

Language：English

researchmap
Boosting GCN Application with Batched Sparse Matrix Multiplication

Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

2019.3

　More details

researchmap
Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?

Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, Satoshi Matsuoka

2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019) 78 - 88 2019

　More details

Language：English

DOI： 10.1109/IPDPS.2019.00019

Web of Science

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ipps/ipdps2019.html#DomkeMWZYTTPM19
The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

Hamid Reza Zohouri, Satoshi Matsuoka

PROCEEDINGS OF H2RC 2019: 2019 FIFTH IEEE/ACM INTERNATIONAL WORKSHOP ON HETEROGENEOUS HIGH-PERFORMANCE RECONFIGURABLE COMPUTING (H2RC) 11 - 18 2019

　More details

Language：English

DOI： 10.1109/H2RC49586.2019.00007

Web of Science

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/sc/h2rc2019.html#ZohouriM19
Cloud-based Burst Buffers for I/O Acceleration

2018.7

　More details

Cloud computing offers abundant computational resources, scalability, and ease of access. Cloud environments give users virtually unlimited computational resources to run HPC applications at a larger scale than in-house systems can provide. Because large-scale data-intensive applications typically generate huge amounts of intermediate data that are shared by hundreds or thousands of compute nodes, they require high I/O throughput to shared storage. However, current shared storage in cloud environments cannot provide enough I/O throughput for these applications. The low I/O throughput becomes a performance bottleneck, and the prolonged execution time costs users more because most cloud providers employ pay-as-you-go pricing. Furthermore, the eventual consistency policy adopted by most cloud storage services causes multi-node jobs to fail because of inconsistent read-after-write behavior. To solve these problems, we propose a cloud-based burst buffer system as a new tier in cloud storage systems. The system uses compute nodes as burst buffer nodes and buffers applications' data in them. Because throughput between compute nodes is much higher and more stable than shared storage throughput, we can accelerate I/O for data-intensive applications. Moreover, by maintaining data consistency among burst buffer nodes, we can avoid job failures caused by eventual consistency. To explore the effectiveness of cloud-based burst buffers, we implement a prototype and evaluate it on Amazon EC2/S3. Our experiments show that the system resolves the eventual consistency issue, improves the performance of a real-world data-intensive application by up to 4.5 times, and reduces monetary cost by 56.3%.

researchmap
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

2018.4

　More details

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used
in deep learning. Specifically, cuDNN implements several equivalent convolution
algorithms, whose performance and memory footprint may vary considerably,
depending on the layer dimensions. When an algorithm is automatically selected
by cuDNN, the decision is made on a per-layer basis, and thus it often
resorts to slower algorithms that fit the workspace size constraints. We
present μ-cuDNN, a transparent wrapper library for cuDNN, which divides
layers' mini-batch computation into several micro-batches. Based on Dynamic
Programming and Integer Linear Programming, μ-cuDNN enables faster
algorithms by decreasing the workspace requirements. At the same time,
μ-cuDNN keeps the computational semantics unchanged, so that it safely
decouples statistical efficiency from hardware efficiency. We demonstrate the
effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow,
achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on a P100-SXM2
GPU. These results indicate that using micro-batches can seamlessly increase
the performance of deep learning, while maintaining the same memory footprint.

arXiv

researchmap

Other Link： http://arxiv.org/pdf/1804.04806v1
HuronFS : Hierarchical, User-level and On-demand Burst Buffer File System

Tianqi Xu, Kento Sato, Satoshi Matsuoka

ISC2018 2018.4

　More details

researchmap
Pushing the Limits for 2D Convolution Computation On CUDA-enabled GPUs

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Satoshi Matsuoka

2018-HPC-163 ( 22 ) 1 - 9 2018.2

　More details

The 2D convolution operator is the computational bottleneck in a variety of image processing and machine learning applications. We propose an algorithm to compute convolution by employing register files to cache image data (known as register cache), rather than using the user-managed scratch-pad memory. We take advantage of CUDA's warp shuffle functions to accelerate the intra-warp communication of partial results. Unlike the GEMM-based, FFT-based, or Winograd methods, our algorithm executes the convolution computation without using any GPU memory as a workspace, and applies to all filter shapes. Our algorithm performs better than state-of-the-art 2D convolution implementations. Using a single Titan Xp GPU, it is on average 4.7x faster than NPP (NVIDIA Performance Primitives), and 1.8x faster than the highly optimized ArrayFire library.

researchmap
Efficiently Enlarging GPU Memory Capacity with NVM

Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Satoshi Matsuoka

2018.1

　More details

researchmap
Automatic Generation of Computer Traces by Machine Learning

土川稔生, 大山洋介, 野村哲弘, 松岡聡

IPSJ SIG Technical Report (Web) 2018 ( HPC-165 ) 2018

　More details

J-GLOBAL

researchmap
Toward Building a Framework for Optimizing Large-Scale Data Center Operations

滝澤真一朗, 高野了成, 松岡聡

2017.12

　More details

researchmap
Less is More: Accelerating Deep Neural Networks with Micro-Batching

Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

2017-HPC-162 ( 22 ) 1 - 9 2017.12

　More details

researchmap
OpenCL-Based High-Performance 3D Stencil Computation on FPGAs

Hamid Reza Zohouri, Artur Podobas, Naoya Maruyama, Satoshi Matsuoka

2017.11

　More details

researchmap
A Simulation-Based Analysis on the Configuration of the Burst Buffer

Tianqi Xu, Kento Sato, Satoshi Matsuoka

2017.11

　More details

researchmap
Power Optimization by Controlling Computers with a Deep Q-Network

寺西賢人, 野村哲弘, 松岡聡

IPSJ SIG Technical Report 2017-HPC-158 ( 3 ) 2017.8

　More details

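Modern supercomputers consume large amounts of power, and power efficiency has become a challenge for improving the performance of practical supercomputers. One power-saving approach is to optimize power by controlling CPU frequency, voltage, and similar settings, and much research has been devoted to computing suitable control values from data such as performance counters. However, existing work analyzes each data source in detail, which limits the number of data sources that can be handled and requires re-analysis whenever the environment changes. We therefore propose a highly general control method that performs the analysis with deep learning, which has been studied actively in recent years. In particular, we implement and evaluate a device that directly controls a computer using Deep Q-Network, a deep reinforcement learning method used in game-playing and Go AIs.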

researchmap
Fault Tolerance and Performance Evaluation of Distributed Deep Learning with Dynamic Control of the Number of Processes

辻陽平, 野村哲弘, 實本英之, 佐藤育郎, 松岡聡

IPSJ Technical Report (Web) 2017 ( HPC-160 ) 2017.7

　More details

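Deep learning is being actively researched and developed because of its high recognition accuracy, and applications incorporating it can be seen in the real world. Deep learning requires large amounts of data and long computation on GPUs and similar hardware before sufficient recognition accuracy is reached, so distributed processing on high-performance machines such as HPC clusters is used. Distributed systems tend to have shorter intervals between failures, so fault-tolerance techniques are needed to let applications continue computing correctly. In this work, we propose detect/respawn, a new approach to fault tolerance for deep learning on large-scale systems that differs from conventional checkpoint/restart, and implement it with ULFM-MPI. Comparing the proposed and existing methods on 16 nodes (128 GPUs) of TSUBAME-KFC using a distributed deep learning application called SPRINT, the proposed method achieved a 2.5% lower error rate after 10 hours of training, and thus higher recognition accuracy. (Author abstract)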

researchmap
Accelerating Spiking Neural Networks on FPGAs using OpenCL

Artur Podobas, Satoshi Matsuoka

2017-ARC-227 ( 23 ) 1 - 7 2017.7

　More details

Spiking Neural Networks (SNNs) are artificial neural networks inspired by the biological brain. They are used to study everything from various aspects of neuroscience to artificial intelligence and machine learning. SNNs are typically computed using general-purpose processors, and the use of custom hardware is still fairly uncommon. Creating custom hardware is often time-consuming and error-prone. However, with the recent maturity of High-Level Synthesis (HLS) tools, algorithms can now be described using more abstract C/C++/Java programming models and automatically transformed into custom hardware. In the present work we report our findings and experience in using HLS to accelerate SNNs. We describe our prototype framework and our FPGA design and empirically evaluate its performance against modern general-purpose processors. Our evaluation shows that our design can reach up to 82% (73% on average) of the performance delivered by an Intel Xeon E5-2650v3, a CPU that is two years younger and built using better technology than our FPGA platform.

researchmap
Concept of the AI Bridging Cloud Infrastructure (ABCI): A Large-Scale, Power-Efficient Cloud Platform for AI Processing

小川宏高, 松岡聡, 佐藤仁, 高野了成, 滝澤真一朗, 谷村勇輔, 三浦信一, 関口智嗣

IPSJ SIG Technical Report 2017-HPC-160 2017.7

　More details

researchmap
Performance Modeling and Improvement of the Metagenome Analysis Application GHOSTZ-GPU

山川智史, 野村哲弘, 松岡聡

IPSJ SIG Technical Report 2017-HPC-160 2017.7

　More details

researchmap
Prototype Modular Framework for Deep Learning Performance Testing

Aleksandr Drozd, Satoshi Matsuoka

2017.4

　More details

researchmap
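Reducing Communication Volume with Low-Precision Floating-Point Numbers in Data-Parallel Deep Learning Training

大山洋介, 野村哲弘, 佐藤育郎, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2017-HPC-158 ( 30 ) 1 - 10 2017.3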

　More details

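Deep learning, a training approach based on Deep Neural Networks, has attracted great attention in recent years because it achieves higher recognition accuracy than other machine learning methods. However, because the computation of the network and the amount of training data are enormous, training takes a very long time even on a GPU cluster. In particular, when a network with many parameters is trained with a fixed mini-batch size, inter-GPU and inter-node communication of gradients becomes the scalability bottleneck, and it has been pointed out that training is feasible only at a scale far smaller than the parallelism available on existing GPU supercomputers. In this paper, we propose a method for reducing communication volume using a floating-point type with even fewer bits than single precision. The proposed method represents communicated data by the upper 8 bits of a half-precision floating-point number and dynamically adjusts the representable range for each layer, achieving fast communication that does not greatly degrade post-training recognition accuracy compared with single precision. In training CaffeNet and GoogLeNet on 2 nodes (16 GPUs) of TSUBAME-KFC/DL, the proposed method achieved speedups of 2.71x and 2.19x, respectively, over the existing approach using single-precision floating point, without loss of recognition accuracy.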

researchmap
Evolutionary Power Modeling for Energy Efficiency in CPU-GPU based systems

Patricia Arroba, José M. Moya, José L. Ayala, Satoshi Matsuoka

2017-HPC-158 ( 2 ) 1 - 7 2017.3

　More details

Supercomputers have reached massive energy consumption due to computational demand, so there is an urgent need to keep them on a more scalable curve. In recent years, there has been rising interest in reducing the power consumption of these systems. Recent research focuses on adjusting their power states by reducing clock frequency and applying power capping, and on analyzing the thermal impact on static consumption. These techniques rely on power models to predict the power consumption of the infrastructure. However, power consumption in these complex systems involves a vast number of interacting variables of different kinds that may include non-linear dependencies. Extracting the relationships between the most representative parameters and the power consumption therefore requires enormous effort and knowledge of the problem. We propose an automatic method based on Grammatical Evolution to obtain a model that minimizes the power prediction error of a supercomputer node that incorporates both CPU and GPU devices. We monitor the system at runtime using performance counters and frequency, temperature, and power measurements. This evolutionary technique provides both Feature Engineering and Symbolic Regression to infer accurate models that depend only on the most suitable variables, with little designer expertise and effort required. Our work improves the possibilities of deriving proactive energy-efficient policies in supercomputers that are simultaneously aware of complex considerations of different kinds.

researchmap
Optimizing Convolution Operations in CNNs with Low-Rank Approximation Matrices

本山義史, 遠藤敏夫, 松岡聡, 横田理央, 福田圭祐, 佐藤育郎

IPSJ SIG Technical Report: High Performance Computing (HPC) 2017-HPC-158 ( 25 ) 1 - 7 2017.3

　More details

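In image recognition by machine learning, excellent recognition results using Convolutional Neural Networks (CNNs) have been reported. Because the data sets are huge, training takes a very long time and requires a large amount of memory. To reduce the memory used in DL computation, we propose using low-rank approximation matrices for the matrix multiplications that account for about 70% of the convolution operations. We applied SVD and hierarchical matrices to the matrices in a CNN application and evaluated them. In particular, with SVD, in the trade-off between compression ratio and accuracy, we achieved up to about 90% memory reduction for the especially large image matrices with almost no loss of recognition accuracy.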

researchmap
Assessing the Interference Between Internode Communication and Network I/O Traffic

Kevin Brown, Nikhil Jain, Abhinav Bhatele, Alfredo Gimenez, Kathryn Mohror, Satoshi Matsuoka, Martin Schulz

2017-HPC-158 ( 11 ) 1 - 6 2017.3

　More details

Parallel file systems are used by supercomputers to support a range of applications that require concurrent access to high-performance shared storage for data workflow and resilience. The design of most of these systems results in the logical storage network sharing the same physical network infrastructure that is used for inter-process communication. Resource sharing in this manner on shared systems is a potential source of contention, which can be significant for communication- and I/O-intensive applications. We assess the interference caused by inter-process communication on the I/O throughput to the parallel file system when both kinds of traffic share the same network resources. For our experiments, we used the miranda io and IOR I/O benchmarks to generate I/O traffic, and the pF3D FFT kernel and the NPB FT MPI benchmark to generate inter-node communication traffic. Our preliminary results from running I/O and communication benchmarks simultaneously indicate that inter-process communication does not affect the performance of typical I/O workloads.

researchmap
Predicting Probabilistic Parameters of a Large-Scale Asynchronous SGD Deep Learning System

Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka

2017.2

　More details

researchmap
Towards Making Legacy HPC Codes Maintainable: Two-Way Fortran-Python Transpilation with Python Type Hints (Unrefereed Workshop Manuscript)

Mateusz Bysiek, Aleksandr Drozd, Satoshi Matsuoka

2016-HPC-157 ( 9 ) 1 - 10 2016.12

　More details

We propose a method of accelerating Python code by just-in-time compilation that leverages the type hints mechanism introduced in Python 3.5. In our approach, performance-critical kernels are expected to be written as if Python were a strictly typed language, but without the need to extend Python syntax. This approach can be applied to any Python application; however, we focus on the special case in which legacy Fortran applications are automatically translated into Python for easier maintenance. We developed a framework implementing two-way transpilation and achieved performance equivalent to that of Python manually translated to Fortran, and better than that of other currently available JIT alternatives (up to 5x faster than Numba in some experiments).

researchmap
Automatic Application of Spatio-Temporal Blocking via Directives

黒田勝汰, 遠藤敏夫, 松岡聡

IPSJ Technical Report (Web) 2016 ( HPC-157 ) 2016.12

　More details

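Spatio-temporal blocking, a loop optimization for stencil computation, is highly effective, but because loop control becomes complex its programming cost is high, and it is not a general-purpose optimization. It has therefore been implemented as a feature of loop transformation tools and stencil DSL compilers. However, these suffer from limited flexibility in parameter settings and from requiring extensive rewriting of the target program. We therefore propose applying spatio-temporal blocking through directives. We implemented a tool that applies spatio-temporal blocking, with parameters specified by directives, to loops that satisfy certain conditions. We evaluate the performance improvement and the programming cost of the proposed system using stencil benchmarks. (Author abstract)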

researchmap
Fast Sparse General Matrix-Matrix Multiplication on GPU with Low Memory Usage

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

2016.11

　More details

Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels of preconditioners such as the algebraic multigrid method, and of graph algorithms. The performance of SpGEMM is quite low because of its random memory access to both input and output matrices. Moreover, the pattern of non-zero elements of the resulting matrix is not known beforehand, which makes it hard to manage memory usage. Several GPU implementations offer fast SpGEMM computation but consume large amounts of temporary memory. We devise a new SpGEMM algorithm that requires only a small amount of memory, so that larger matrices can be computed within the limited device memory of a GPU. Accesses to input matrices are optimized for coalesced memory access. We devise an efficient hash table in shared memory to calculate the output matrix, with appropriate case analysis for better load balancing. Our algorithm achieves speedups of up to 4.0x in single precision and 3.3x in double precision compared to existing fast SpGEMM libraries.

researchmap
Performance Evaluation of Out-of-Core GPU Set Intersection Incorporating Latency Hiding through I/O Chunking

佐藤仁, 溝手竜, 松岡聡, 小山宏高

2016.8

　More details

researchmap
GPU Acceleration of Sparse Matrix-Matrix Multiplication with Reduced Memory Usage

長坂侑亮, 額田彰, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2016-HPC-156 ( 15 ) 1 - 9 2016.8

　More details

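Sparse matrix-matrix multiplication, used in preconditioning for iterative solvers such as the AMG method, is hard to speed up because of random memory accesses, and the placement of non-zero elements in the output matrix is unknown when the computation starts. Existing algorithms aimed at GPU acceleration require far more memory than the output matrix actually needs, which limits the matrices they can handle. We propose a GPU sparse matrix-matrix multiplication method that keeps memory usage low through appropriate case analysis and use of shared memory, making it applicable to a wide range of matrices while achieving even higher performance. We evaluated performance on a Maxwell-generation GPU with 12 matrices of varying characteristics and achieved speedups of up to 4.77x in single precision and up to 3.84x in double precision over existing sparse matrix libraries.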

researchmap
Towards Understanding HPC-Big Data Convergence Using Cloud Platforms

Shweta Salaria, Kevin Brown, Hideyuki Jitsumoto, Satoshi Matsuoka

2016-HPC-155 ( 2 ) 1 - 5 2016.8

　More details

The path to HPC-big data convergence has produced numerous studies that demonstrate the performance-cost tradeoff between running applications on supercomputers and on cloud platforms. Previous studies typically focus on either scientific HPC benchmarks or a specific cloud configuration, failing to consider all the opportunities offered by cloud platforms. We present a comparative study of the performance of representative big data benchmarks, or "Big Data Ogres", and HPC benchmarks running on a supercomputer and in the cloud. Our work differs from previous studies in that we explore multiple cloud configurations: Shared, Dedicated, and Spot Instances. Our results provide a more comprehensive view of the performance-cost tradeoff, thereby highlighting the gap that needs to be bridged to attain HPC-big data convergence.

researchmap
Accelerating OpenACC Applications with Data Layout Optimization Directives

IPSJ SIG Technical Report 2016-HPC-155 ( 9 ) 2016.8

　More details

researchmap
Performance Modeling of a Large-Scale Asynchronous Deep Learning System Considering Training Conditions

大山洋介, 野村哲弘, 佐藤育郎, 西村裕紀, 玉津幸政, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2016-HPC-155 ( 5 ) 1 - 9 2016.8

　More details

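In image recognition by machine learning, high recognition accuracy has been reported using Convolutional Neural Networks (CNNs) and large data sets. An optimization method called mini-batch Stochastic Gradient Descent (SGD) is widely used to train CNNs, but recognition performance is known to degrade with an inappropriate mini-batch size. Asynchronous SGD, in which CNN computation on GPUs and parameter updates are performed asynchronously, has been proposed to speed up SGD, but because the mini-batch size is determined dynamically, the optimal values of training conditions such as the number of nodes are unclear. In this paper, we propose a performance model for SPRINT, a system that trains CNNs with asynchronous SGD. The model takes the CNN structure and the machine performance and configuration as inputs, and predicts the time required to train on the whole data set and the average mini-batch size. In an evaluation on 1 to 16 nodes of TSUBAME-KFC/DL, the mean prediction error of training time and average mini-batch size was 8% or less for multiple CNN structures. In addition, when we searched on two different machines for the training conditions that minimize training time within a given range of average mini-batch size, the ranking predicted by the model matched the measured ranking.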

researchmap
The Shift from FLOPS to BYTES in the Post-Moore Era

松岡聡, 天野英晴, 中島研吾, 井上弘士, 工藤知宏, 丸山直也, 田浦健次, 岩下武史, 片桐孝弘, 塙敏博, 遠藤敏夫

IPSJ SIG Technical Report HPC-155-2016 2016.8

　More details

researchmap
Evaluating tolerance of applications against realistic DRAM faults

Yuya Kobayashi, Hideyuki Jitsumoto, Akihiro Nomura, Satoshi Matsuoka

2016.6

　More details

researchmap
Training Condition Conscious Performance Modeling of an Asynchronous Data-Parallel Deep Learning System

Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka

2016.6

　More details

researchmap
Out-of-Core Memory Extension of the Large-Scale Graph Processing Library ScaleGraph

岩渕圭太, 佐藤仁, 松岡聡

Proceedings of the Symposium on High Performance Computing and Computational Science 2016 56 - 56 2016.5

　More details

researchmap
Reducing Remote GPU Execution’s Overhead with mrCUDA

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

2016.4

　More details

researchmap
Large-Scale Distributed Sorting Considering GPU Accelerators and Non-Volatile Memory

社本秀之, 佐藤仁, 松岡聡

IPSJ SIG Technical Report 2015-HPC-154 2016.4

　More details

researchmap
Towards Understanding the Performance of FPGAs using OpenCL Benchmarks

Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka

HiPEAC Workshop on Reconfigurable Computing 2016.3

　More details

We evaluate the performance of a subset of the benchmarks available in the Rodinia Suite, using Altera’s OpenCL SDK and the Terasic DE5-Net FPGA board, equipped with an Altera Stratix V GXA7 FPGA, and present timing and power estimation results and comparison with a modern CPU and GPU. The results are presented for multiple versions of each benchmark, each with a varying degree of optimization for FPGAs, ranging from direct ports from the initial OpenCL implementation to loop-pipelined kernels specifically optimized for FPGAs. Our results show that, while it is possible to use a common programming language available for other more-widely used accelerators in HPC, the implementation method optimal for FPGAs is significantly different from those for other accelerators such as GPUs. Specifically, we find that multithreaded kernels typically used for GPUs do not perform as efficiently as those optimized with FPGA-specific optimizations such as sliding windows. However, by exploiting the FPGA-specific optimizations, FPGA with OpenCL shows promising performance. Our results using the Altera Stratix V 5SGXA7 FPGA indicate that, with FPGA-specific optimizations, it is possible to achieve up to 3.9x better power efficiency in comparison to an Nvidia K20C GPU.

researchmap
GPU-Accelerated Large-scale Distributed Sorting Coping with Device Memory Capacity

Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, Satoshi Matsuoka

IEEE Transactions on Big Data Volume 1 ( Issue 1 ) 57 - 69 2016.3

　More details

Publisher：IEEE

Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting due to their low communication complexity. Although GPU accelerators can help reduce computation cost in general, their effectiveness in distributed sorting algorithms remains unclear. We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs. To cope with data volumes exceeding the GPU memory capacity, an out-of-core local sort is used, with a small overhead of about 7.5 percent when the data size is tripled. We evaluate the performance of our implementation on the TSUBAME2.5 supercomputer, which comprises over 4,000 NVIDIA K20x GPUs. Weak scaling analysis shows a 389-times speedup with 0.25 TB/s throughput when sorting 4 TB of 64-bit integer values on 1,024 nodes compared to running on one node; this is 1.40 times faster than the reference CPU implementation. Detailed analysis, however, reveals that the performance is mostly bottlenecked by the CPU-GPU host-to-device bandwidth. With orders-of-magnitude improvements announced for next-generation GPUs, the performance boost will be tremendous, in line with other successful GPU accelerations.

researchmap
Linguistic Regularities from Multiple Samples

Research Reports on Mathematical and Computing Sciences. Ser. C, Computer Science ( 283 ) 1 - 6 2016.2

　More details

Language：English Publisher：Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology

researchmap
Performance Evaluation of Out-of-Core GPU Set Intersection Using Latency Hiding through I/O Chunking (Unrefereed Workshop Manuscript)

佐藤仁, 溝手竜, 松岡聡, 小川宏高

IPSJ SIG Technical Report (Web) 2016 ( HPC-155 ) 2016

　More details

J-GLOBAL

researchmap
Distributed Computing for Machine Learning on Large-Scale Image Dataset

佐藤育郎, 渡邉隆太郎, 西村裕紀, 野村哲弘, 松岡聡

Tsubame e-Science Journal 14 2016

　More details

J-GLOBAL

researchmap
Optimizing the Rodinia Benchmark for FPGAs

Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka

IPSJ SIG Technical Report 2015-HPC-152 No.16 2015.12

　More details

researchmap
Design and Modelling of Cloud-based Burst Buffers

Tianqi Xu, Kento Sato, Satoshi Matsuoka

2015.11

　More details

researchmap
Performance Evaluation of a GPU Sparse Matrix-Vector Multiplication Method that Reduces Memory Access Volume through Multi-Level Blocking

長坂侑亮, 額田彰, 松岡聡

2015.9

　More details

researchmap
mrCUDA: Low-Overhead Middleware for Live Migrating Remote GPU Execution to Local GPU Execution

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

2015.9

　More details

researchmap
A Memory Access Reduction Technique for Sparse Matrix-Vector Multiplication on GPUs

長坂侑亮, 額田彰, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2015-HPC-151 ( 8 ) 1 - 7 2015.9

　More details

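When solving systems of linear equations with large sparse matrices in scientific computing, sparse matrix-vector multiplication (SpMV) accounts for most of the execution time. Although many GPU accelerations of SpMV have been proposed, performance gains are hindered by the fact that SpMV is a memory-bound kernel and by the loss of locality caused by random accesses to the input vector. We propose the AMB (Adaptive Multi-level Blocking) format, a sparse matrix format that effectively reduces the memory access volume and access frequency of SpMV on GPUs. It compresses column indices by using 16-bit integers and several blocking techniques, thereby reducing memory access volume. On 40 matrices selected from the University of Florida sparse matrix collection, we compared against existing methods and achieved speedups of up to 2.81x and 1.77x on average over cuSparse, and up to 1.38x and 1.13x on average over yaSpMV, a recently proposed fast SpMV library.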

researchmap
Modeling Per-Process Power Consumption with Performance Counters for Jobs Running Concurrently within a Node

寺西賢人, 野村哲弘, 遠藤敏夫, 松岡聡

IPSJ SIG Technical Report 2015.8

　More details

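Modern supercomputers consume large amounts of power, and power efficiency has become a challenge for improving the performance of practical supercomputers. Controlling power consumption efficiently requires more detailed power measurement. At present, however, power consumption can be measured per node but not per process. In this paper, we model power consumption using performance counters that can be measured per process, and propose estimating per-process power consumption when jobs run concurrently on the same node. We carried out power estimation experiments with the resulting model for the single-process case and for two processes running concurrently; the maximum error was 5.16% in the single-process case, and in the two-process case 84.8% of the measured combinations had an error within 4%.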

researchmap
mrCUDA: A middleware for migrating rCUDA virtual GPUs to native GPUs

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

2015.8

　More details

researchmap
External Sorting Considering GPU Accelerators and Non-Volatile Memory

佐藤仁, 溝手竜, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2015-HPC-150 ( 24 ) 1 - 7 2015.7

　More details

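We propose xtr2sort (extreme external sort), an external sorting algorithm that takes GPU accelerators and non-volatile memory into account. To exploit the high compute performance and memory bandwidth of GPUs while hiding the latency of data movement among non-volatile memory, host memory, and device memory, the records to be sorted on non-volatile memory are split into chunks that fit in device memory, and for each chunk the I/O operations to non-volatile memory, the CPU-GPU memory transfers, and the sort on the GPU are performed asynchronously in a pipeline. This enables fast sorting even for record sets that exceed the capacity of device memory and host memory. We evaluated the method on a single server with 2-way Intel Xeon E5-2699 v3 2.30 GHz (18 cores) and an NVIDIA Tesla K40. With an implementation using non-blocking I/O via Linux Asynchronous I/O (libaio), the method sorted 25.6 × 10^9 records of int64_t integers, 4 times the number of records that can be processed on the CPU and 64 times the number that can be processed on the GPU, at 78,121,548 records/sec, which is 2.16 times the performance of a CPU-based out-of-core sort using non-blocking I/O running on 2 sockets with 72 threads. These results suggest that, for out-of-core processing with GPU accelerators, combining non-volatile memory with I/O chunking and latency hiding is an effective approach.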

researchmap
A QEMU-Based Fault Injector for Injecting Memory-Access-Pattern-Dependent Faults

小林佑矢, 實本英之, 野村哲弘, 松岡聡

IPSJ Technical Report (Web) 2017 ( HPC-160 ) 2015.7

　More details

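As parallel computers grow in scale, there is concern about reduced reliability due to Silent Data Corruption (SDC). SDC is a failure that is hard to detect and costly to handle. To build or choose an appropriate countermeasure, it is important to evaluate overhead and fault tolerance through fault injection. However, most existing fault injectors perform random bit flips and cannot reproduce hardware-specific fault patterns. In this work, aiming to inject realistic faults, we extended the virtual machine emulator QEMU to create the fault injector MH-QEMU. MH-QEMU provides a memory access handler capability that can not only modify memory state but also detect and handle accesses to the virtual machine's memory. This makes it possible to inject memory-access-pattern-dependent faults and permanent faults. The overhead of these features varies with the workload on the virtual machine; with the NAS Parallel Benchmarks (NPB), we confirmed that execution time is held to about 20x in the best case. Furthermore, for the NPB CG kernel, the computation completed normally in about 100% of runs under single-bit-flip injection, whereas under Row-Hammer injection abnormal termination occurred in about 40% of runs and SDC occurred in 3%. (Author abstract)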

researchmap
Efforts Toward More Efficient Job Scheduling on TSUBAME2 and Their Verification

野村哲弘, 佐々木淳, 三浦信一, 遠藤敏夫, 松岡聡

IPSJ Technical Report (Web) 2015 ( HPC-150 ) 2015.7

　More details

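To use supercomputer resources efficiently, it is important to understand accurately the information about submitted jobs and to optimize job scheduling. On TSUBAME2 at the Global Scientific Information and Computing Center, Tokyo Institute of Technology, various log and sensor data had been accumulated, but the analysis of the accumulated data was insufficient. In this report, we describe the efforts made on TSUBAME to make users' resource specifications accurate, and the analysis of various log and sensor data to confirm the results. (Author abstract)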

researchmap
Performance Optimization of Large-Scale Traffic Simulation on Parallel and Distributed Systems

Hiroki Kanezashi, Toyotaro Suzumura, Satoshi Matsuoka

2015.7

　More details

researchmap
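Mini-Apps for Co-Design of Computational Science and Computer Science (Introduction to the FIBER Mini-App Suite / Collecting Profile Information for Building Empirical Performance Models of Applications / Performance of the FIBER Mini-Apps and Its Modeling)

丸山直也, 鈴木惣一朗, 三上和徳, 小村幸浩, 滝澤真一朗, 松田元彦, 野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡

Proceedings of the Symposium on High Performance Computing and Computational Science ( 2015 ) 107 - 108 2015.5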

　More details

Language：Japanese

researchmap
Scaling Up an Urban Airflow Simulation on GPU-Equipped Systems and Its Performance Model

高嵜祐樹, 遠藤敏夫, 松岡聡

IPSJ SIG Technical Report. [High Performance Computing] 2015 ( 13 ) 1 - 8 2015.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

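The problem size of stencil computations on GPUs is normally limited by GPU memory capacity, but a technique called temporal blocking makes it possible to scale up without performance degradation. In this work, as a way to scale up an urban airflow simulation, a GPU-cluster application of more than 10,000 lines of code, while maintaining high performance, we introduced a method combining HHRT with temporal blocking, and succeeded in limiting both performance degradation and programming cost as the problem size grows. For further performance optimization, we propose a method for reducing the swap data size of HHRT. As a result, performance improved by about 1.3x to 1.9x, reaching about 19% to 85% of the performance of the original program. In addition, by building a performance prediction model, we were able to narrow down the parameters that affect performance.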

CiNii Books

researchmap
Towards Cloud-based Burst Buffers for I/O Intensive Computing in Cloud

Tianqi Xu, Kento Sato, Satoshi Matsuoka

2015.2

　More details

researchmap
mrCUDA: Low-Overhead Middleware for Transparently Migrating CUDA Execution from Remote to Local GPUs

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

2015

　More details

researchmap
Preliminary Evaluation of FPGAs with OpenCL

丸山直也, Hamid Reza Zohouri, 松田元彦, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC) 2015-HPC-150 2015

　More details

researchmap
Efficient Utilization of Multi-level Memory System for Stencil Computation (Unrefereed Workshop Manuscript)

Tianqi Xu, Guanghao Jin, Toshio Endo, Satoshi Matsuoka

IPSJ SIG Notes 2014 ( 10 ) 1 - 7 2014.12

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

This paper aims to use the multi-level memory system efficiently for stencil computation, enabling tera-scale computation on a single GPU. We build a performance model to explain the relationship between the different memories and propose a new algorithm that reduces the communication cost between memories and uses their capacity efficiently. We evaluated a 7-point stencil computation on a multi-level memory system comprising GPU memory, CPU memory, and SSD. The evaluation on a real system shows that our algorithm enables computation on a domain 23 times larger than the GPU memory capacity and achieves 5.5 times higher performance than other optimization methods.

CiNii Books

researchmap
HPC and Interactive Big Data Analytics: Case Study of Distributional Semantics

Aleksandr Drozd, Satoshi Matsuoka

IPSJ SIG Notes 2014-HPC-146(12) 2014.10

　More details

researchmap
Evaluation of Locality Improvement for Scaling Up a Real Stencil Application on GPU Clusters

高嵜祐樹, 遠藤敏夫, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2014 ( 23 ) 1 - 8 2014.9

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

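With the advent of GPUs, the performance of the stencil computations used in CFD has improved. However, the problem size is limited by the capacity of GPU memory, which is smaller than host memory. Solutions that improve the locality of memory accesses in stencil computations have been proposed, but they increase programming cost and are considered difficult to apply to large stencil applications. The goal of this work is to achieve three things for stencil applications: high performance, large problem sizes, and low programming cost. To that end, we propose a programming model that combines a locality-improving algorithm with a memory swap runtime that automates data transfer between CPU and GPU. We applied the proposed method to an urban airflow simulation, a real stencil application, and evaluated its performance.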

CiNii Books

researchmap
Optimization of Hybrid BFS on Large-Scale Distributed-Memory Environments

上野晃司, 鈴村豊太郎, 丸山直也, 松岡聡

2014.9

　More details

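In recent years, large graph data sets such as Web graphs and social graphs have become common, and interest in large-scale graph analysis has grown. This paper proposes a method for efficiently running hybrid BFS, a breadth-first search (BFS) algorithm that is effective on graphs with a relatively small diameter, on large supercomputers with thousands to tens of thousands of compute nodes. We improve efficiency through a bitmap-based sparse matrix representation, selection of data structures according to vertex density, and increased parallelism in the bottom-up search, and obtain scalability at the scale of tens of thousands of nodes. In a performance evaluation on the K computer, we achieved 17,997 GTEPS on 65,536 nodes, and the K computer took first place in the June 2014 Graph500 ranking.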
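
As a rough illustration of what a hybrid BFS does, the sketch below switches between a top-down step and a bottom-up step on a single machine. It is not the distributed implementation from the paper: the function name `hybrid_bfs`, the adjacency-list input, and the simple frontier-size threshold `alpha` are assumptions made for this example, and the graph is assumed to be undirected.

```python
def hybrid_bfs(adj, source, alpha=0.25):
    """Return the BFS level of every vertex (-1 if unreachable).

    adj    : adjacency lists of an undirected graph (hypothetical input format)
    alpha  : assumed switching threshold; when the frontier holds more than
             alpha * n vertices, the bottom-up step is used instead of top-down
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        if len(frontier) > alpha * n:
            # Bottom-up step: every unvisited vertex looks for any neighbour
            # in the current frontier and stops at the first one it finds.
            for v in range(n):
                if level[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            level[v] = depth
                            next_frontier.add(v)
                            break
        else:
            # Top-down step: every frontier vertex visits its unvisited neighbours.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.add(v)
        frontier = next_frontier
    return level


if __name__ == "__main__":
    graph = [[1, 2, 3], [0, 4], [0], [0], [1]]
    print(hybrid_bfs(graph, 0))
```

Both steps produce the same levels; the bottom-up step tends to pay off when the frontier is large, because each unvisited vertex can stop scanning as soon as it finds one parent.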

researchmap
Increasing GPU batch queue’s utilization using rCUDA

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

IPSJ SIG Technical Report 2014-HPC-145 ( 24 ) 2014.7

　More details

In heterogeneous supercomputers, a GPU job queue whose nodes each contain multiple GPUs can be under-utilized due to resource-assignment fragmentation. For example, when each node has three GPUs, as in TSUBAME2.5, a node that has already been assigned to a job requesting two GPUs cannot be assigned to another job requesting more than one GPU until the current job leaves the node. We examine this problem on TSUBAME2.5's GPU batch-queue system and present a scheduling algorithm that uses rCUDA to alleviate it. Our simulation shows that the proposed scheduling algorithm finishes all simulated jobs on a simulated congested queue 15% to 30% faster. Moreover, using job patterns obtained from the scheduler log of the TSUBAME GPU queue, the proposed algorithm shows a 5.06% decrease in average job lifetime (from arrival until processing finishes). It also shows that even when the number of nodes in the queue is reduced by around 4%, the average job lifetime remains about the same as with the present algorithm.
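
The fragmentation example in the abstract above can be sketched in a few lines. This is only an illustration of the problem, not the scheduling algorithm proposed in the paper; the function names `find_local_node` and `fits_with_remote_gpus` and the list-of-free-GPU-counts representation are assumptions made for this example.

```python
def find_local_node(free_gpus_per_node, requested):
    """Index of the first node with enough free local GPUs, or None."""
    for index, free in enumerate(free_gpus_per_node):
        if free >= requested:
            return index
    return None


def fits_with_remote_gpus(free_gpus_per_node, requested):
    """True if the request fits when idle GPUs on other nodes may be borrowed,
    as remote GPU access (for example via rCUDA) would allow in principle."""
    return sum(free_gpus_per_node) >= requested


if __name__ == "__main__":
    # Two 3-GPU nodes, each already running a 2-GPU job: one GPU idle per node.
    free = [1, 1]
    print(find_local_node(free, 2))        # no single node can host a 2-GPU job
    print(fits_with_remote_gpus(free, 2))  # the two idle GPUs together suffice
```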

researchmap
Efficient Parallel Execution through Migration between GPUs

鈴木太一郎, 額田彰, 松岡聡

情報処理学会研究報告 Vol.2014-HPC-145(42) 2014.7

　More details

researchmap
Visualizing Collectives over InfiniBand Networks

Kevin Brown, Jens Domke, Satoshi Matsuoka

IPSJ SIG Technical Report 2014-HPC-145 ( 13 ) 2014.7

　More details

As the scale of high performance computing systems increases, optimizing interprocess communication becomes more challenging while remaining critical for ensuring good performance. Furthermore, the hardware layer abstraction provided by MPI makes it difficult to perform any application optimization that links network utilization with application communication. We overcome this barrier by extending the Peruse utility in Open MPI to track network events within MPI operations from the application layer. We also develop a non-intrusive profiling library that makes use of our Peruse enhancement, and show how BoxFish can be used with our profiling library to visualize the flow of application traffic over each link within large-scale InfiniBand networks. The tool-chain that we describe can be used without any modification to the target application and incurs less than 1% application runtime overhead.

researchmap
Towards Cloud Bursting for Extreme Scale Supercomputers

Tianqi Xu, Kento Sato, Satoshi Matsuoka

2014-HPC-145 ( 5 ) 1 - 8 2014.7

　More details

Extreme-scale HPC systems, which consist of a large number of compute nodes, can provide high computational capacity for multiple users. However, the compute nodes in these systems occasionally cannot meet demand because of bursty job requests within short periods of time. To accommodate the bursty requests, we consider federating HPC systems with public clouds, which is known as cloud bursting. Although federated systems can acquire virtually infinite computational power with cloud bursting, QoS may not be guaranteed because of a significant performance gap between HPC systems and public clouds. The most critical problem is the gap in I/O performance. In this paper, we propose an I/O acceleration technique using distributed cloud bursting buffers. We also create an I/O performance model to explore its effectiveness. Our model-based simulations, which target the TSUBAME supercomputer as the HPC system and Amazon EC2 as the public cloud, show that the distributed cloud bursting buffers can improve I/O throughput while reducing cost.

researchmap
A Lossy Compression Method for Checkpoint Data Using the Wavelet Transform in a Real Application

佐々木尚人, 佐藤賢斗, 遠藤敏夫, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2014-HPC-145 ( 7 ) 1 - 8 2014.7

　More details

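In recent years, HPC systems and supercomputers have been growing rapidly in scale, and the mean time between failures of such systems tends to shrink as a result. Many systems adopt checkpointing for fault tolerance, but there is concern that checkpoint time may eventually exceed the mean time between failures. To shorten checkpoint time, we propose a lossy compression method for checkpoint data. Specifically, we perform lossy compression by applying a wavelet transform, quantization, and encoding to the checkpoint data, followed by a standard compression method. We applied the proposed method to the checkpoint data of the climate application NICAM and measured and evaluated the resulting error, compression ratio, and compression time. We confirmed that, under certain conditions, checkpoint time can be reduced by about 70% while keeping the maximum relative error within 5%.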
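
The pipeline named in the abstract above (wavelet transform, quantization, encoding, then a standard compressor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a single-level Haar transform, uniform quantization with a fixed `step`, 32-bit integer encoding, and `zlib` as the standard compressor are all choices made for this example, and the input length is assumed to be even.

```python
import struct
import zlib


def haar_forward(values):
    """One level of the Haar transform: pairwise averages, then pairwise half-differences."""
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages + details


def haar_inverse(coeffs):
    """Invert haar_forward: each (average, detail) pair yields two samples."""
    half = len(coeffs) // 2
    restored = []
    for average, detail in zip(coeffs[:half], coeffs[half:]):
        restored += [average + detail, average - detail]
    return restored


def compress(values, step):
    """Transform, quantize (the lossy step), encode as int32, then deflate."""
    quantized = [round(c / step) for c in haar_forward(values)]
    raw = struct.pack(f"<{len(quantized)}i", *quantized)
    return zlib.compress(raw)


def decompress(blob, count, step):
    """Reverse of compress; count is the number of original samples."""
    quantized = struct.unpack(f"<{count}i", zlib.decompress(blob))
    return haar_inverse([q * step for q in quantized])


if __name__ == "__main__":
    data = [20.0 + 0.01 * i for i in range(64)]
    blob = compress(data, step=0.01)
    restored = decompress(blob, len(data), step=0.01)
    print(len(blob), max(abs(a - b) for a, b in zip(data, restored)))
```

With uniform quantization each coefficient is off by at most half a step, so every restored sample in this sketch differs from the original by at most about one `step`; a larger step trades accuracy for a smaller checkpoint.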

researchmap
Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)

Keisuke Fukuda, Naoya Maruyama, Jeremy S.Meredith, Jeffrey S.Vetter, Satoshi Matsuoka

IPSJ SIG Notes 2014 ( 26 ) 1 - 8 2014.7

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

Hierarchical algorithms are considered important in next-generation large-scale scientific computing. Such algorithms are typically compute-intensive and have higher communication locality, both of which are beneficial on future supercomputers with a much lower B/F ratio. However, one of the big challenges of such algorithms is that their data structures and computation/communication patterns are irregular, which makes it difficult to analyze and predict performance. In this paper, we introduce a performance modeling method for the Fast Multipole Method, a typical example of hierarchical algorithms for N-body problems, using the domain-specific performance modeling language Aspen. We show that our modeling scheme can adapt to various particle distribution parameters and provides useful information that helps application researchers optimize algorithmic parameters.

CiNii Books

researchmap
Data Layout Optimization through OpenACC Directive Extensions

星野哲也, 丸山直也, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2014 ( 45 ) 1 - 8 2014.7

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

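As a way to port existing programs to computing environments equipped with accelerators such as GPUs, which have been increasing in recent years, high-level directive-based programming models such as OpenACC are attracting attention as an alternative to low-level programming models such as CUDA and OpenCL. An advantage of directive-based programming models is high functional portability across devices, because a program can be ported while the original code is preserved. However, current high-level programming models such as OpenACC cannot handle the difference between the data layouts favored by scalar processors and by many-core accelerators, so performance portability across devices with different characteristics is a problem. In this work, we prototyped OpenACC directive extensions that abstract the data layout to improve performance portability across devices, used a translator to change the data layout of the Himeno benchmark, and evaluated the result on a multicore CPU, an Intel Xeon Phi, and a K20X GPU. We confirmed performance improvements of 27% on the Intel Xeon Phi and 24% on the K20X GPU compared with the original data layout.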

CiNii Books

researchmap
Performance modeling of a tree-based hierarchical N-body algorithm with arbitrary particle distributions

Keisuke Fukuda, Naoya Maruyama, Jeremy S.Meredith, Jeffrey S.Vetter, Satoshi Matsuoka

2014.7

　More details

researchmap
TSUBAME-KFC : the Greenest Supercomputer in the World With Liquid Submersion Cooling

Tsubame ESJ. : e-science journal 11 18 - 23 2014.6

　More details

Language：English

researchmap
TSUBAME-KFC : the Greenest Supercomputer in the World With Liquid Submersion Cooling

Tsubame ESJ. : e-science journal 11 2 - 7 2014.6

　More details

Language：Japanese

researchmap
Performance Evaluation of a Cache-Aware Sparse Matrix-Vector Multiplication Method for GPUs

長坂侑亮, 額田彰, 松岡聡

情報処理学会研究報告 2014-HPC-144 ( 5 ) 2014.5

　More details

researchmap
Lustre 2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE

Hitoshi Sato, Shuichi Ihara, Satoshi Matsuoka

2014.4

　More details

researchmap
Abstractions for Convergence of Big Data and HPC in Deep Memory Hierarchy Machines

Satoshi Matsuoka, Hitoshi Sato

Workshop on Programming Abstractions for Data Locality (PADAL 2014) 2014.4

　More details

researchmap
Enabling Large-Scale Stencil Computation through Automatic Temporal Blocking

河村知輝, 丸山直也, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2014 ( 32 ) 1 - 6 2014.2

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

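Solving partial differential equations with the finite difference method reduces to stencil computation. Because this computation demands high memory bandwidth, it can be accelerated with GPUs. However, GPU memory capacity is small and becomes the limiting factor when solving large problems. Prior work has shown that temporal blocking makes it possible to solve domains larger than GPU memory capacity without performance loss, but it suffers from high programming cost. In this work, we achieved automatic optimization by incorporating temporal blocking into a framework. We also built a performance model to derive optimal values for parameters such as the number of blocking steps.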

CiNii Books

researchmap
OpenACC Directive Extensions That Allow Selecting the Optimal Data Layout for CPU and GPU Individually

星野哲也, 丸山直也, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2014 ( 5 ) 1 - 5 2014.2

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

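As a way to port existing programs to computing environments equipped with accelerators such as GPUs, which have been increasing in recent years, one alternative to low-level programming models such as CUDA and OpenCL is to use high-level directive-based programming models such as OpenACC. An advantage of such directive-based models is high portability across devices, because a program can be ported without breaking the original code. However, current programming models such as OpenACC cannot handle differences such as those between the data layouts favored by scalar processors and by many-core accelerators, so performance portability across devices with different characteristics is a problem. In this work, we prototyped and evaluated OpenACC directive extensions that abstract the data layout to improve performance portability across devices.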

CiNii Books

researchmap
Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5

Guanghao Jin, Toshio Endo, Satoshi Matsuoka

IPSJ SIG Notes 2014 ( 33 ) 1 - 8 2014.2

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

The domain of a stencil computation is limited by the memory capacity of the GPUs on a GPU cluster. As the domain grows to cope with higher accuracy requirements, more GPUs need to be employed to extend the memory capacity. In this paper, we propose new methods that apply temporal blocking to the device memory and registers of a set of GPUs, allowing computation on domains larger than the memory capacity of the GPUs while maintaining high performance on TSUBAME2.5. We also analyze the parameters and performance differences between TSUBAME2.0 and TSUBAME2.5 in order to apply our methods to a wide range of GPU clusters.

CiNii Books

researchmap
A Hybrid BFS Algorithm Using Non-Volatile Memory

岩渕圭太, 佐藤仁, 溝手竜, 安井雄一郎, 藤澤克樹, 松岡聡

研究報告アルゴリズム（AL） 2014 ( 7 ) 1 - 1 2014.2

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

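In recent years, fast processing of large graphs has been required in many fields, including SNS analysis, route search on road networks, smart grids, drug discovery, and genetic analysis. With conventional methods, however, all data must be loaded into DRAM to obtain reasonable performance, and the resulting increase in power consumption and price from adding DRAM capacity has become a problem. We therefore propose, and are developing, a method that uses NVM (non-volatile memory) as auxiliary storage for BFS in order to process graphs larger than DRAM capacity at high speed while limiting performance degradation. So far, we have achieved fourth place (first in the world for a single node) in the Big Data category list of GreenGraph500 (November 2013), a ranking of energy-efficient big data processing.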

CiNii Books

researchmap
Burst SSD Buffer: Checkpoint Strategy at Extreme Scale

Kento Sato, Satoshi Matsuoka, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R.DeSupinski, Naoya Maruyama

IPSJ SIG Notes 2013 ( 19 ) 1 - 9 2013.9

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

Checkpointing is an indispensable fault tolerance technique, commonly used by HPC applications that run continuously for hours or days at a time. However, when checkpointing extreme scale systems, the bursty nature of the I/O pattern of checkpointing overburdens file systems and also causes huge overhead to be added to an application's runtime. In order to alleviate the overhead and achieve fast checkpoint/restart, we propose a highly-resilient mini-SSD-based burst buffer system, and explore a checkpoint strategy on the system based on our checkpointing model.

CiNii Books

researchmap
Optimization and Performance Analysis of a Hybrid-BFS Algorithm Using Non-Volatile Memory

岩渕圭太, 佐藤仁, 安井雄一郎, 藤澤克樹, 松岡聡

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2013 ( 3 ) 1 - 9 2013.9

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

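In recent years, fast processing of large graphs has been required in various fields. Because of the nature of this processing, all data must be loaded into DRAM to obtain reasonable performance, and the resulting increase in power consumption and price from adding DRAM capacity has become a problem. We therefore evaluated whether large-scale graph processing can be carried out with limited performance degradation, by optimizing I/O and analyzing the causes of performance degradation when non-volatile memory is used as auxiliary storage for the Hybrid-BFS algorithm. As a result, by offloading part of the data to non-volatile memory, we were able to limit the performance degradation to 47.1% in an environment with half the DRAM capacity. We also confirmed that offloading additional, rarely referenced edge data can further reduce DRAM usage while limiting performance degradation, and we identified the causes of performance degradation and presented improvements for them. These results suggest that large-scale graph processing with limited performance degradation is feasible.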

CiNii Books

researchmap
Preliminary Evaluation toward Large-Scale Execution of the Graph500 Benchmark Using Non-Volatile Memory

岩渕圭太, 佐藤仁, 安井雄一郎, 藤澤克樹, 松岡聡

先進的計算基盤システムシンポジウム論文集 2013 130 - 131 2013.5

　More details

Language：Japanese

researchmap
Design and Implementation of a Data-Parallel Processing Framework for Large-Scale Heterogeneous Supercomputers

佐藤仁, 白幡晃一, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2013 ( 24 ) 1 - 7 2013.2

　More details

Language：Japanese

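We are developing Hamar (Highly Accelerated MapReduce), a software infrastructure aimed at scalable data-parallel processing on supercomputers equipped with thousands to tens of thousands of accelerators. This paper describes its initial design and implementation and, as an application, a case study that applies it to GIM-V, a general-purpose graph processing model based on MapReduce. We confirmed that in Hamar the Map and Reduce operations can be written flexibly in CUDA and OpenMP. In a preliminary experiment, we ran both implementations on a single compute node with one GPU: for Map, the CUDA version was on average 1.2 times faster than the OpenMP version, while for Reduce it was more than 10 times slower. Although this configuration, a single compute node with one attached GPU, put the CUDA implementation at a disadvantage, the results revealed open issues such as application to larger computing environments, performance optimization, and automatic task scheduling.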

CiNii Books

researchmap
Preliminary Evaluation toward Large-Scale Execution of the Graph500 Benchmark Using Non-Volatile Memory

岩渕圭太, 佐藤仁, 安井雄一郎, 藤澤克樹, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2013 ( 31 ) 1 - 6 2013.2

　More details

Language：Japanese

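Large graphs have recently appeared in many fields, raising the problem of increased power consumption from added DRAM capacity, and some graphs exceed the DRAM capacity of a single node altogether. This work aims to handle graphs as large as possible on a single node while keeping performance degradation to a minimum, by using non-volatile memory as auxiliary storage for the Graph 500 benchmark. As a first step, this paper proposes a method for running problem sizes that do not fit in DRAM and presents a preliminary evaluation of how the ratio of DRAM capacity to non-volatile memory capacity affects execution performance.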

CiNii Books

researchmap
Performance Evaluation of the Directive-Based Programming Language OpenACC

星野哲也, 丸山直也, 松岡聡

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2013 91 - 91 2013.1

　More details

Language：Japanese

researchmap
Efforts on the TSUBAME Supercomputer toward the Extreme Big Data Era

佐藤仁, 松岡聡

大学ICT推進協議会年次大会論文集 8p 2013

　More details

Language：Japanese Publisher：[大学ICT推進協議会]

researchmap
Evaluating Resilience Towards Exascale-Tsubame2.0 as an Example-

松岡聡, 佐藤賢斗, 遠藤敏夫

情報処理学会研究報告(Web) 2013 ( HPC-141 ) 2013

　More details

J-GLOBAL

researchmap
Performance Evaluation of the Multi-rail InfiniBand Network on TSUBAME2.0

野村哲弘, 遠藤敏夫, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2012 ( 3 ) 1 - 5 2012.12

　More details

Language：Japanese

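Although the TSUBAME2.0 network uses a fat-tree topology, degraded collective communication performance has been observed in large-scale runs. This paper focuses on two suspected causes, packet collisions on inter-switch links and performance degradation of inter-switch links, presents network settings that mitigate each problem, and shows their effect on bandwidth and collective communication performance. With the improved network settings, stochastic communication delays were almost eliminated, and we confirmed a 16.0% to 39.5% improvement in injection bandwidth in large-scale runs.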

CiNii Books

researchmap
Performance Evaluation of the Multi-rail InfiniBand Network on TSUBAME2.0

野村哲弘, 遠藤敏夫, 松岡聡

研究報告計算機アーキテクチャ（ARC） 2012 ( 3 ) 1 - 5 2012.12

　More details

Language：Japanese Publisher：情報処理学会

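Although the TSUBAME2.0 network uses a fat-tree topology, degraded collective communication performance has been observed in large-scale runs. This paper focuses on two suspected causes, packet collisions on inter-switch links and performance degradation of inter-switch links, presents network settings that mitigate each problem, and shows their effect on bandwidth and collective communication performance. With the improved network settings, stochastic communication delays were almost eliminated, and we confirmed a 16.0% to 39.5% improvement in injection bandwidth in large-scale runs.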

CiNii Books

researchmap
A Fast Stencil Computation Method for the Domain to Surpass Memory Capacity of GPU

2012 ( 31 ) 1 - 6 2012.12

　More details

Language：Japanese

CiNii Books

researchmap
Implementation and Performance Evaluation of KIFMM with the Dynamic Task Scheduling Engine StarPU

福田圭祐, 丸山直也, MiquelPericas, 松岡聡

研究報告ハイパフォーマンスコンピューティング（HPC） 2012 ( 13 ) 1 - 7 2012.9

　More details

Language：Japanese

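The Fast Multipole Method (FMM) is an algorithm for N-body problems that achieves O(N) complexity through approximation. FMM consists of several computational steps that differ in their computational characteristics and whose load varies with the input data. This work aims to balance load appropriately between CPU and GPU in the face of load variation caused by the FMM input data (the particle distribution). To that end, we adopted a dynamic task scheduling engine, implemented a Kernel Independent FMM (KIFMM) application on StarPU, a library for this purpose, and evaluated its performance. We compared this implementation with one that can determine the optimal static schedule by exhaustive search for each input. For a uniform distribution, introducing one simple heuristic yielded 137.9% of the performance of the static scheduling implementation; for a (non-uniform) sphere-surface distribution, we obtained 89.5% of its performance without any heuristic. These results indicate that dynamic task scheduling can deliver performance competitive with an optimal static scheduling implementation while balancing load despite input-dependent load variation.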

CiNii Books

researchmap
Towards a Dataflow FMM using the OmpSs Programming Model

2012 ( 12 ) 1 - 7 2012.9

　More details

Language：English

CiNii Books

researchmap
Challenges in Green Supercomputing towards 50GFlops/W, PUE<1,100KW/rack, for TSUBAME3.0 and future Exascale

Matsuoka Satoshi

IEICE technical report. Internet Architecture 112 ( 212 ) 63 - 63 2012.9

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

The current global supercomputing grand challenge is to achieve an exaflop within a 20 MW power budget by 2020. This is 25 times the power efficiency of the most power-efficient supercomputer in the world, namely BlueGene/Q, and requires innovations in memory storage, network, and cooling. Such innovations will not only benefit supercomputing but will have broad impact on the overall IT infrastructure. At Tokyo Tech we were awarded the "Greenest Production Supercomputer in the World" award from the Green500 in 2010 with our TSUBAME2.0 supercomputer, and we are currently designing TSUBAME3.0, which will be a stepping stone for this drive to exascale.
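
The efficiency target quoted in the abstract above follows from a short calculation. The BlueGene/Q figure on the right is only what the abstract's "25 times" statement implies, not an independently sourced measurement.

```latex
\[
\frac{1\ \text{EFlops}}{20\ \text{MW}}
  = \frac{10^{18}\ \text{Flops}}{2 \times 10^{7}\ \text{W}}
  = 5 \times 10^{10}\ \text{Flops/W}
  = 50\ \text{GFlops/W},
\qquad
\frac{50\ \text{GFlops/W}}{25} = 2\ \text{GFlops/W}.
\]
```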

CiNii Books

researchmap
Evaluation of Portability for a Real-world CFD Application with CUDA and OpenACC

2012 ( 42 ) 1 - 9 2012.7

　More details

Language：Japanese

CiNii Books

researchmap
Evaluation of GPU Acceleration Methods for a Large-Scale Fluid Application

星野哲也, 丸山直也, 松岡聡

先進的計算基盤システムシンポジウム論文集 2012 73 - 74 2012.5

　More details

Language：Japanese

CiNii Research

researchmap
A Multi GPU Implementation of Generalized Graph Processing Model GIM-V with Data Transfer Optimization

2012 ( 34 ) 1 - 8 2012.3

　More details

Language：Japanese

CiNii Books

researchmap
Physis: A Stencil Computation Framework for Heterogeneous Supercomputers

丸山直也, 野村達男, 佐藤賢斗, 松岡聡

Tsubame e-Science Journal ( 5 ) 2012

　More details

J-GLOBAL

researchmap
[Invited Talk] One Year with TSUBAME2.0 and the Leap to Exascale

松岡聡

研究報告数理モデル化と問題解決（MPS） 2011 ( 1 ) 1 - 1 2011.11

　More details

Language：Japanese

CiNii Books

researchmap
Towards Optimizations of FMM on CPU-GPU Heterogeneous Environments using Dynamic Task Scheduling Runtimes

Keisuke Fukuda, Naoya Maruyama, Satoshi Matsuoka

IPSJ SIG Notes 2011 ( 28 ) 1 - 9 2011.11

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

FMM is an O(N) approximate algorithm for N-body problems and is recognized as more scalable and promising than other N-body computation methods. Effectively utilizing heterogeneous systems in FMM, however, is a challenging issue because FMM consists of several phases with different performance characteristics that call for careful load balancing for optimal performance. This paper extends our previous work [18], which partially ported the CPU implementation of kifmm3d to CUDA, and presents a complete CUDA implementation. To exploit heterogeneous processing elements, we further extend the implementation with StarPU, which allows dynamic task scheduling on CPU-GPU heterogeneous environments. We have found several technical issues and challenges in achieving good load balancing, such as failing CUDA kernel invocations, phase splitting, and implementation of filters.

CiNii Books

researchmap
Towards an Asynchronous Checkpointing System

2011 ( 18 ) 1 - 8 2011.11

　More details

Language：English

CiNii Books

researchmap
Operation of TSUBAME 2.0 Green Supercomputer dealing with Power Crisis

Toshio Endo, Satoshi Matsuoka, Akira Nukada, Masamichi Nagasaka, Tadayasu Yotsu

IPSJ SIG Notes 2011 ( 12 ) 1 - 9 2011.11

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We report on the operation of the TSUBAME2.0 supercomputer during the power crisis caused by the powerful earthquake of March 11, 2011. While saving energy consumption is and will remain the most important issue in the design and operation of supercomputers, capping 'peak power consumption' also becomes essential in a power crisis. We report the measures taken in operating TSUBAME2.0 this summer within the limits on time and resources, and the issues that remain to be solved.

CiNii Books

researchmap
Achievement of Linpack Performance of over 1PFlops on TSUBAME 2.0 Supercomputer

4 ( 4 ) 169 - 179 2011.10

　More details

Language：Japanese

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00078062/
Fast GPU Read Alignment with Burrows Wheeler Transform Based Index

2011 ( 13 ) 1 - 4 2011.7

　More details

Language：English

CiNii Books

researchmap
Analysis of Workflow Application I/O Performance on Large Parallel File System

2011 ( 40 ) 1 - 8 2011.7

　More details

Language：Japanese

CiNii Books

researchmap
Towards On Demand Hierarchical Data Store for Massive Amounts of Small File Access

2011 ( 27 ) 1 - 8 2011.7

　More details

Language：Japanese

CiNii Books

researchmap
Design of Advanced Software Deployment Infrastructure in HPCI Wide-area Distributed Environment

2011 ( 68 ) 1 - 7 2011.7

　More details

Language：Japanese

CiNii Books

researchmap
Towards GPGPU-Based Large-Scale Fast Graph Processing

2011 ( 14 ) 1 - 8 2011.7

　More details

Language：Japanese

CiNii Books

researchmap
Ultralow-power, high-performance computation

MATSUOKA Satoshi

80 ( 7 ) 579 - 584 2011.7

　More details

Language：Japanese

CiNii Books

researchmap
Optimization of Resource Allocation for Data-intensive Workflow Applications

2010 ( 6 ) 1 - 7 2011.4

　More details

Language：Japanese

CiNii Books

researchmap
Performance Studies with Hadoop in the TSUBAME2.0 Supercomputer

2010 ( 6 ) 1 - 8 2011.4

　More details

Language：Japanese

CiNii Books

researchmap
Optimization of FMM on CPU-GPU heterogeneous environment

2010 ( 6 ) 1 - 8 2011.4

　More details

Language：Japanese

CiNii Books

researchmap
High Performance Large Data Transfer for Inter-Clouds

2010 ( 6 ) 1 - 7 2011.4

　More details

Language：Japanese

CiNii Books

researchmap
Performance Evaluation of TSUBAME 2.0 Heterogeneous Supercomputer with Linpack Benchmark

2010 ( 5 ) 1 - 6 2011.2

　More details

Language：Japanese

CiNii Books

researchmap
Optimization of electric power efficiency based on model in GPU

2010 ( 2 ) 1 - 6 2010.12

　More details

Language：Japanese

CiNii Books

researchmap
Towards Characteristic-aware Optimization of OpenCL programs on Heterogeneous GPUs

2010 ( 23 ) 1 - 7 2010.12

　More details

Language：Japanese

CiNii Books

researchmap
A Code Generation Framework for Stencil Computations on Large Scale GPU Clusters

2010 ( 7 ) 1 - 9 2010.12

　More details

Language：Japanese

CiNii Books

researchmap
Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments

2010 ( 3 ) 1 - 7 2010.10

　More details

Language：Japanese

researchmap
Resource Federation for e-science by a Point-of-Presence

TAKIZAWA Shin'ichiro, MATSUOKA Satoshi, SATO Hitoshi, HIGASHIDA Manabu, TOMOISHI Masahiko

IEICE technical report. Internet Architecture 110 ( 206 ) 19 - 24 2010.9

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

As an e-Science infrastructure, we propose a network environment where site resources are federated by a point-of-presence named RENKEI-PoP. RENKEI-PoPs are located at sites that provide resources for e-Science, are integrated with site-local resources, and relay communications between sites by cooperating with each other using a grid security infrastructure. RENKEI-PoP provides 1) a virtual machine hosting environment that executes e-Science infrastructure services and 2) a general-purpose data transfer/sharing environment. We installed RENKEI-PoPs at eight sites in Japan and connected them to the SINET 10 Gbps network. We show the current RENKEI-PoP system and its network and storage access performance.

CiNii Books

researchmap
The total picture of TSUBAME 2.0

Tsubame ESJ. 1 2 - 4 2010.9

　More details

Language：Japanese

researchmap
POP (Point-of-Presence) Linkage between Computer Centers as an E-Science Infrastructure

TAKIZAWA SHIN'ICHIRO, MATSUOKA SATOSHI, SATO HITOSHI, HIGASHIDA MANABU, TOMOISHI MASAHIKO, JITSUMOTO HIDEYUKI

126 e1 - e8 2010.8

　More details

Language：Japanese

CiNii Books

researchmap
Towards an Automatic Code Generation Framework for Parallel Stencil Computations on GPU Clusters

NOMURA TATSUO, MARUYAMA NAOYA, ENDO TOSHIO, MATSUOKA SATOSHI

126 ( 9 ) I1 - I10 2010.8

　More details

Language：Japanese

CiNii Books

researchmap
Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments

SHIRAHATA KOICHI, SATO HITOSHI, MATSUOKA SATOSHI

126 ( 5 ) E1 - E8 2010.8

　More details

Language：Japanese

CiNii Books

researchmap
MPI-CUDA Applications Checkpointing

TOAN Nguyen, NOMURA Tatsuo, JITSUMOTO Hideyuki, MARUYAMA Naoya, ENDO Toshio, MATSUOKA Satoshi

2010 ( 18 ) 1 - 7 2010.7

　More details

Language：English

CiNii Books

researchmap
Dynamic Optimization for Large Data Broadcast on Clouds

3 ( 2 ) 126 - 137 2010.6

　More details

Language：Japanese

Data-intensive parallel applications on clouds need to deploy large data sets from the cloud's storage facility to all compute nodes as fast as possible. Many optimal broadcast algorithms have been proposed for clusters and grid environments. The most common approach is, for example, to construct one or more optimal spanning trees, which can maximize available bandwidth or avoid bottleneck links based on network topology and network monitoring data. When the available bandwidth changes dynamically, however, it is difficult to maintain optimal performance. In this paper we focus on Amazon EC2/S3, the most commonly used cloud, and propose high-performance broadcast algorithms that make it possible to broadcast large data from Amazon S3 to multiple Amazon EC2 nodes. The salient features of our algorithms are that they construct an overlay network on clouds without network topology information, optimize the available throughput of nodes dynamically, and increase the download throughput by letting nodes cooperate with each other. As a result, all nodes can download files from S3 quickly, even when the network performance changes while the algorithm is running. We evaluate our algorithms on EC2/S3 and show that they are scalable and consistently achieve high throughput. Both algorithms perform much better than having each node download all data directly from S3.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00069740/
Auto-Tuning of a Scientific Application on GPU clusters

IPSJ SIG Notes 2009 ( 6 ) 1 - 9 2010.4

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Auto-Tuning of a Scientific Application on GPU clusters

WATANABE Yuya, ENDO Toshio, MATSUOKA Satoshi

124 ( 18 ) R1 - R7 2010.2

　More details

Language：Japanese

CiNii Books

researchmap
Power-Aware Task Scheduling on GPU Accelerated Clusters

HAMANO TOMOAKI, NUKADA AKIRA, ENDO TOSHIO, MATSUOKA SATOSHI

124 ( 17 ) Q1 - Q9 2010.2

　More details

Language：Japanese

CiNii Books

researchmap
A Resource Selection Support Expert System for Large-Scale Computing Environments

KOKUBU RIO, SATO HITOSHI, MATSUOKA SATOSHI

124 ( 12 ) L1 - L8 2010.2

　More details

Language：Japanese

CiNii Books

researchmap
Linpack Evaluation on the TSUBAME Supercomputer with Hybrid Accelerators(<Special Topics>GPGPU Computing)

Endo Toshio, Nukada Akira, Matsuoka Satoshi

Bulletin of the Japan Society for Industrial and Applied Mathematics 20 ( 2 ) 117 - 124 2010

　More details

Language：Japanese Publisher：The Japan Society for Industrial and Applied Mathematics

This paper reports Linpack benchmark evaluation on the TSUBAME supercomputer, a large scale hybrid supercomputer equipped with graphics processing units (GPUs) and ClearSpeed SIMD accelerators. With all of about 10,000 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87 TFlops. This paper also describes our design policy and tuning method that take characteristics of accelerators into account, which are essential to achieve scalability on hybrid supercomputers. The design is significantly different from that of LANL RoadRunner, a hybrid system equipped with Cell processors. We discuss the difference from the viewpoint of system architecture.

DOI： 10.11540/bjsiam.20.2_117

CiNii Books

researchmap
Accelerating Large-Scale Data Access through Dynamic Relocation of Virtual Machines

佐藤賢斗, 佐藤仁, 松岡聡

情報処理学会シンポジウム論文集 2010 ( 5 ) 2010

　More details

J-GLOBAL

researchmap
MapReduce Implementation on the TSUBAME Supercomputer

SATO HITOSHI, KONISHI FUMIKAZU, YAMAMOTO YASUNORI, TAKAGI TOSHIHISA, MATSUOKA SATOSHI

123 F1 - F7 2009.11

　More details

Language：Japanese

CiNii Books

researchmap
Software Framework for GPU Memory Errors

MARUYAMA NAOYA, NUKADA AKIRA, MATSUOKA SATOSHI

123 ( 8 ) H1 - H6 2009.11

　More details

Language：Japanese

CiNii Books

researchmap
The Efficient Checkpoint based on Erasure Coding with Incremental Method

JITSUMOTO HIDEYUKI, NAKAMURA SYUNSUKE, ENDO TOSHIO, MATSUOKA SATOSHI

122 ( 9 ) I1 - I6 2009.10

　More details

Language：Japanese

CiNii Books

researchmap
Linpack Tuning Method on a Heterogeneous Supercomputer with Hybrid Accelerators

ENDO T.

Proc. Summer United Workshops on Parallel, Distributed and Cooperative Processing, SWoPP2009, Sendai, Aug. 2009 ( 3 ) 1 - 8 2009.10

　More details

Language：Japanese

CiNii Books

researchmap
Correlative Analysis of Performance Counters and Power Consumption on GPUs

2009 ( 3 ) 1 - 5 2009.10

　More details

Language：Japanese

CiNii Books

researchmap
Power-Performance Evaluation of Fault Tolerant Numerics on GPUs

2009 ( 3 ) 1 - 6 2009.10

　More details

Language：Japanese

CiNii Books

researchmap
Auto-Tuning FFT Library for CUDA GPUs

2 ( 3 ) 107 - 115 2009.9

　More details

Language：Japanese

NVIDIA CUDA-capable GPUs have extremely high memory bandwidth, which benefits memory-intensive applications such as FFT. There are already several implementations of FFT using CUDA, but they are optimized for specific transform sizes, such as powers of two, that suit the GPU architecture. In this paper, we present our auto-tuning method to generate high-performance CUDA kernels for FFTs of varying transform sizes. The optimized kernels outperform not only the NVIDIA CUFFT library but also many existing implementations.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00066245/
A Study of MPI Communication on a Next Generation Optical Interconnect

TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

26 ( 3 ) 5 - 19 2009.7

　More details

Language：Japanese

DOI： 10.11309/jssst.26.3_5

CiNii Books

researchmap
Towards Resource Management Considering User's Satisfaction in Large Distributed Computing Environments

KOKUBU Rio, SATO Hitoshi, MATSUOKA Satoshi

IEICE technical report 109 ( 168 ) 19 - 24 2009.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Application users on large-scale distributed computing systems are forced to select resource parameters for effective application execution, which may degrade the usability of the systems for users who are not HPC experts. Expert systems, which recommend suitable resource selections by considering the users' demands, can resolve this situation; however, the demands of application users are not clear in production large-scale computing systems. To address this problem, we sampled actual users' demands for application executions in the TSUBAME system through questionnaire surveys. We then modeled application usage patterns from the surveys and confirmed that the model is adequate for recommending resource selections.

CiNii Books

researchmap
HPC Application Performance Improvement by a Supplemental Optical Circuit Switching Network

Shinichiro Takizawa, Toshio Endo, Satoshi Matsuoka

IPSJ Transactions on Advanced Computing Systems 2 ( 2 ) 110 - 121 2009.7

　More details

Language：Japanese

For large-scale HPC systems that consist of many nodes, it will be infeasible to construct a fully connected network with high bisection bandwidth because of cost, power consumption, and other factors. We propose a hybrid network composed of an electronic packet switching (EPS) network with low bisection bandwidth and a supplemental high-bandwidth optical circuit switching (OCS) network, together with a communication method for this network. In this network, each node connects to the EPS network with one link, and a subset of the nodes also connects to the OCS network with one additional link. We assign optical pathways to pairs of nodes that are connected to the OCS network and are not under the same EPS switch, taking the application's communication pattern into account. We avoid contention on the EPS upstream network by letting these nodes relay messages from other nodes. Through simulations, we confirmed that our approach can improve the performance of applications that require high bisection bandwidth while connecting only half of the nodes to the OCS network. Moreover, the performance of all-to-all communication on our system was almost the same as that on a fat-tree EPS-only network.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00060776/
Acceleration of Himeno Benchmark on Multi-node GPU System by Overlapping Communication with Calculation : Over 700 GFLOPS of Sustained Performance is Achieved with 32 GPUs

KATO Toshihiro, AOKI Takayuki, NUKADA Akira, ENDO Toshio, MATSUOKA Satoshi, HASEGAWA Atsushi

120 ( 3 ) C1 - C6 2009.6

　More details

Language：Japanese

CiNii Books

researchmap
Dynamic Estimation of Swap Cost for Reducing Memory Energy

HOSOGAYA YUTO, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 182 ( 14 ) 85 - 90 2009.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Recently, the memory system has become one of the most power-consuming parts of high-performance computers. This is mainly because computers are equipped with more DRAM capacity than applications actually need, so there is an opportunity to reduce power by decreasing the capacity. We have already proposed a system that uses flash memory as the swap device and shown that decreasing DRAM can reduce energy for some applications, even if it causes page swapping. In such systems, the best DRAM capacity, the one that achieves the lowest energy consumption, depends on the characteristics of applications and on problem sizes, so finding it is challenging. We propose an algorithm that monitors memory accesses while applications are running and optimizes the memory capacity dynamically. Our algorithm assumes that the capacity of the DRAM system can be controlled dynamically, and it estimates the energy consumption for all selectable DRAM capacities. Through trace-driven simulation, we show that energy consumption can be reduced by 25% with a performance loss of 8%.

CiNii Books

researchmap
Migration Optimization Accounting for Similarity of Process Images

YAMASAKI SHOHEI, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 182 ( 14 ) 145 - 150 2009.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Demand for migrating large-scale jobs is growing on large-scale systems for several reasons. For example, jobs may be migrated to different machines to avoid machine maintenance or performance degradation. In many cases, the destination and timing should be determined dynamically. To reduce the migration costs of large-scale jobs, this work presents an optimization method that exploits the similarity among memory images of parallel processes. In addition to reducing the amount of communication, this method is highly scalable because it computes the differences between images in parallel. We evaluated migration costs with this method in detail on a real cluster, for several problem sizes and numbers of nodes.

CiNii Books

researchmap
Accelerator Again, - Key for Super Computing -:Light and Shadow of Accelerator Technologies - Mainstream Devices Towards Next-Generation Petascale and Exascale HPC

MATSUOKA Satoshi

IPSJ Magazine 50 ( 2 ) 95 - 99 2009.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00000017/
Performance Evaluation of Software-Based ECC for GPUs

MARUYAMA NAOYA, NUKADA AKIRA, MATSUOKA SATOSHI

IPSJ SIG Notes 2009 ( 14 ) 25 - 30 2009

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

General-purpose processing on GPUs (GPGPU) has rapidly been recognized as a promising HPC technology because of GPUs' much higher peak floating-point performance. However, GPUs were originally developed for graphics applications, such as 3D games, where reliability is not considered as important an issue as it is in HPC communities. One notable example is the lack of ECC in graphics memory systems. To improve the reliability of GPUs for HPC applications, we propose a software-based technique to generate and check ECC for graphics memory. Our library-based approach allows CUDA-based GPGPU applications to be easily extended with ECC-based error checking with little manual intervention. To evaluate the applicability of our approach, we extended three CUDA applications with our ECC library: 3-D FFT, matrix multiplication, and an N-body problem. Our performance studies showed that while FFT and matrix multiplication can incur up to 300% overhead, the N-body application incurs only 15% overhead. These results suggest that software-based ECC is a promising approach for computation-intensive applications such as N-body problems.

CiNii Books

researchmap
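HPC Application Performance Improvement by a Supplemental Optical Circuit Switching Network

TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

IPSJ Transactions on Advanced Computing Systems (ACS) 2 ( 2 ) 110 - 121 2009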

　More details

researchmap
High Performance 3-D FFT in CUDA Environment

1 ( 2 ) 231 - 239 2008.8

　More details

Language：Japanese

The CUDA environment, which is supported by the latest NVIDIA GPUs, allows data sharing between threads through shared memory and also provides more flexible memory accesses. We propose a high-performance 3-D FFT algorithm for the CUDA environment. Using GeForce 8 series GPUs, we achieved up to 79.5 GFLOPS for 3-D FFT, which is 3.1 to 3.3 times the performance of the CUFFT library 1.1.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018177/
Creating Vital Information Technologies for the Info-plosion Era : Information Explosion Makes Information Systems Explode

MATSUOKA Satoshi

IPSJ Magazine 49 ( 8 ) 904 - 911 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00061025/
Interoperability Testing of NAREGI Grid Middleware for Large-Scale Cooperation

HIGASHIDA Manabu, TOMOISHI Masahiko, SAKANE Eisaku, SATO Hitoshi, YAMANASHI Takeshi, OOBA Junichi, KOBAYASHI Taizo, MIZUTANI Fumiyasu, YAMADA Kiyoshi, TSUDA Tomoko, KONO Takahisa, AIDA Kento, MATSUOKA Satoshi, AOYAGI Mutsumi, SHIMOJO Shinji

IPSJ SIG Notes 109 ( 77 ) 133 - 140 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Four information technology centers, at Osaka University, Tokyo Institute of Technology, Kyushu University, and Nagoya University, deployed the NAREGI Grid Middleware on their supercomputing resources under real operational scenarios and tested its interoperability for nationwide large-scale cooperation with two NAREGI R&D centers: the National Institute of Informatics and the Institute for Molecular Science. We successfully demonstrated that we can form virtual organizations with certificates from multiple authorities and manage their grid resources, and that we can submit real grid applications to authorized computing resources with resource reservations, with coordinated execution across multiple meta-schedulers that independently issue reservation requests to potentially the same resource.

CiNii Books

researchmap
Optimization of MPI_Scatter/Gather Algorithm for Grid Environment

CHIBA TATSUHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 116 ( 74 ) 13 - 18 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Many collective algorithms have been proposed for grid environments that enable us to construct optimized network topologies and perform fast collective communications, but they are optimized under the assumption that the WAN is a low-bandwidth bottleneck. However, WAN bandwidth has recently become much higher, and many nodes in a LAN are connected by high-speed networks, so the previous assumption no longer holds. In this paper, we propose multilane MPI_Scatter/Gather algorithms that effectively utilize the available WAN and LAN bandwidth. We assume that MPI systems use TCP/IP for low-level communication, and experiments on an emulated network environment show that the proposed multilane collective algorithms achieve higher performance than traditional methods.

CiNii Books

researchmap
HPC Performance Improvement by Supplementing a Small Optical Network

TAKIZAWA SHIN'ICHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 116 67 - 72 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

For future petascale HPC systems, it will be infeasible to construct a fully connected network with high bisection bandwidth because of cost, power consumption, and other factors. We propose a network composed of an electronic packet switching (EPS) network with low bisection bandwidth and a supplemental high-bandwidth optical circuit switching (OCS) network, together with a communication methodology for MPI applications in which messages are relayed from EPS to OCS and vice versa. In this network, a subset of the nodes is connected to the OCS network and relays messages from other nodes under the same EPS switch to nodes under other EPS switches. Simulation results confirmed that our approach reduces execution time by up to 30% compared with an EPS-only network.

CiNii Books

researchmap
Model-based Optimization for Data-Intensive Applications on a Virtual Cluster

SATO KENTO, SATO HITOSHI, MATSUOKA SATOSHI

IPSJ SIG Notes 116 ( 74 ) 25 - 30 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a model-based optimization algorithm that determines virtual machine migration strategies, i.e., which virtual machines should be migrated to which nodes, while minimizing I/O access costs, on the assumption that the network bandwidth between nodes and the order, sizes, and locations of target files are given. Our algorithm models this problem as a directed acyclic graph in which a vertex represents the location of a virtual machine when a target file is accessed, an edge represents a flow of data access that includes a virtual machine migration and a remote I/O access, and the edge weight represents the cost of that data access. We solve this as a shortest path problem that minimizes the overall data access cost of the target file accesses. Our simulation-based studies suggest that the proposed algorithm can achieve higher performance than simple techniques, such as those that never migrate virtual machines or always migrate virtual machines onto the nodes that hold the target files.
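
The shortest-path formulation in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `plan_migrations`, the cost formulas (size divided by bandwidth, with local access and staying in place treated as free), and all inputs are assumptions made for illustration.

```python
def plan_migrations(accesses, nodes, bandwidth, vm_size, start):
    """Choose the VM location for each file access by a layered-DAG shortest path.

    accesses:  list of (file_size, file_node) in access order
    nodes:     list of node names
    bandwidth: dict mapping (src, dst) -> bandwidth for src != dst
    vm_size:   size of the VM image transferred on a migration
    start:     node where the VM initially runs
    Returns (total_cost, list of VM locations, one per access).
    """
    def transfer(size, src, dst):
        # Assumption: local access and "no migration" cost nothing.
        return 0.0 if src == dst else size / bandwidth[(src, dst)]

    # best[v] = (cheapest cost with the VM on node v, locations chosen so far)
    best = {start: (0.0, [])}
    for file_size, file_node in accesses:
        nxt = {}
        for v in nodes:  # vertex: VM located on v during this access
            for u, (cost, path) in best.items():
                # Edge weight: migration u -> v plus remote read from file_node.
                c = cost + transfer(vm_size, u, v) + transfer(file_size, file_node, v)
                if v not in nxt or c < nxt[v][0]:
                    nxt[v] = (c, path + [v])
        best = nxt
    # The shortest path ends at whichever final location is cheapest.
    return min(best.values(), key=lambda cp: cp[0])
```

Because the graph is layered by access order, a single forward pass over the layers is enough to find the shortest path; no general-purpose graph search is needed in this simplified setting.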

CiNii Books

researchmap
Parallel Numerical Computation on Multiple GPUs with Self Scheduling

WATANABE YUYA, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 179 ( 75 ) 85 - 90 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In the high-performance computing area, commodity accelerators, especially GPUs, attract considerable attention for their superior cost performance, so systems with a large number of such accelerators are promising. In such systems, incremental upgrades will make the accelerators heterogeneous. With the rapid advance of GPU performance, techniques to utilize heterogeneous GPUs effectively will become important. To achieve this goal without precise knowledge of GPU performance, we adopt the self-scheduling technique for dynamic task distribution. We take SGEMM, a dense matrix multiplication, as the target computation and evaluate its performance on a machine with multiple heterogeneous GPUs. The results show that self scheduling achieves 94% of the ideal speed, which is the sum of the individual GPU speeds.
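
The self-scheduling idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: the names `self_schedule` and `efficiency` are hypothetical, tasks are equal-sized, and each worker is modeled only by a constant speed.

```python
import heapq

def self_schedule(num_tasks, task_work, speeds):
    """Simulate self scheduling: whichever worker is idle first takes the next task.

    num_tasks: number of equal-sized tasks (e.g. matrix blocks)
    task_work: work per task, in arbitrary units (hypothetical)
    speeds:    work units per second for each worker (hypothetical)
    Returns (makespan, tasks_done_per_worker).
    """
    # Heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    done = [0] * len(speeds)
    makespan = 0.0
    for _ in range(num_tasks):
        t, w = heapq.heappop(idle)  # earliest idle worker pulls a task
        finish = t + task_work / speeds[w]
        done[w] += 1
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, w))
    return makespan, done

def efficiency(num_tasks, task_work, speeds):
    """Achieved speed relative to the ideal, i.e. the sum of individual speeds."""
    makespan, _ = self_schedule(num_tasks, task_work, speeds)
    ideal_time = num_tasks * task_work / sum(speeds)
    return ideal_time / makespan
```

No worker speed has to be known in advance by the scheduler itself: faster workers simply return to the queue more often, which is why the approach suits heterogeneous accelerators.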

CiNii Books

researchmap
Access-Pattern and Bandwidth Aware File Replication Algorithm for a Grid File System

SATO HITOSHI, MATSUOKA SATOSHI, ENDO TOSHIO

IPSJ SIG Notes 116 ( 74 ) 211 - 216 2008.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose an automated replication algorithm for a grid file system that considers file access frequency and replica maintenance policy, and that allows most I/O accesses to be performed within given throughput and storage usage thresholds while minimizing replica transfer time. Our algorithm models the replication problem as a combinatorial optimization problem whose constraints are derived from the given throughput and storage usage thresholds and from various system parameters collected by direct file access monitoring. Our simulation-based studies suggest that the proposed algorithm can achieve higher performance than simple techniques, such as those that always or never create replicas, while keeping storage usage very low. The results also indicate that the proposed algorithm can perform comparably to manual replica placement.

CiNii Books

researchmap
Power-Saving Task Scheduling on Heterogeneous Environment

HAMANO Tomoaki, ENDO Toshio, MATSUOKA Satoshi

IEICE technical report 108 ( 180 ) 97 - 102 2008.8

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Recent accelerators, such as GPUs, which were originally designed as graphics devices, and ClearSpeed SIMD accelerators, achieve better cost-performance and performance-per-watt ratios than general-purpose CPUs, although their range of applications is more limited. Thus, clusters and supercomputers equipped with both accelerators and general-purpose CPUs are becoming popular. To optimize power efficiency and throughput on such systems, we require (1) that each task be compiled so that it can be executed on either a CPU or an accelerator, and (2) that tasks be managed by a task scheduler that is aware of energy consumption. Assuming that the former is realized, we describe a method for modeling heterogeneous cluster systems. We then propose a task scheduling method that considers the properties of each task and evaluate it by simulation.

CiNii Books

researchmap
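Implementation and Evaluation of Fault Tolerance for GPU Memory Using Software ECC

MARUYAMA Naoya, MATSUOKA Satoshi, OGATA Yasuhiko, NUKADA Akira, ENDO Toshio

IEICE technical report. DC, Dependable Computing 108 ( 181 ) 9 - 15 2008.8

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

GPGPU, the use of GPUs for HPC, is attracting attention because of the high floating-point performance of GPUs. However, GPUs were originally developed for graphics, and their fault tolerance is insufficient in some respects for HPC use. One such respect is the detection and correction of memory errors. No current GPU is equipped with ECC, so GPUs are less reliable than typical HPC compute nodes. To improve the reliability of GPUs, we propose a technique that detects and corrects memory errors in software. The technique adds code that computes and checks ECC to a GPGPU application, thereby detecting and correcting errors such as bit flips in graphics memory. We implemented the proposed technique as a library for CUDA, NVIDIA's extension of the C language, and applied it to matrix multiplication and N-body applications. Using both applications, we investigated the performance overhead of the ECC computation and found it to be up to about 300% for matrix multiplication and about 15% for the N-body problem, confirming that the technique can be realized with relatively small overhead for applications, such as the N-body problem, that perform a large amount of computation relative to their memory access frequency.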

CiNii Books

researchmap
An Efficient, Model-based CPU-GPU Heterogeneous FFT Library

1 ( 1 ) 40 - 50 2008.6

　More details

Language：Japanese

General-purpose computing on graphics processing units (GPGPU) is becoming popular in HPC because of its high peak performance. However, in spite of the potential performance improvements, a GPU might not necessarily perform better than current high-performance CPUs, especially given the recent trend toward more cores on a single die. This is because GPU performance can be severely limited by restrictions such as memory size and I/O bandwidth. For this reason, we can expect performance to improve by using the CPU and the GPU simultaneously. In heterogeneous environments, we need to find the optimal load distribution ratio. We implement a 2D-FFT library that uses heterogeneous CPU-GPU computing resources. To find optimal load distribution ratios, we construct a performance model that predicts the execution time of 2D-FFT and captures the respective contributions of the CPU and the GPU. The model parameters are determined by pre-stage performance profiling; based on this, we predict the overall execution time of 2D-FFT for arbitrary problem sizes and load distributions. Preliminary evaluation shows that the performance model can predict the execution time for problem sizes 16 times as large as the profile runs with less than 15% error, and that the predicted optimal load distribution ratios have less than 5% error; the performance overhead caused by this error is less than 1%. We show that the resulting performance improvement from such heterogeneous parallelization can be 1.19 to 1.55 times compared with using only a CPU core or a GPU.
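
How a performance model can yield a load distribution ratio is sketched below. The model here is a deliberately simplified assumption, not the one in the paper: each processor's time is taken to be linear in its share of the work, the GPU pays an extra per-unit transfer cost, and the two run concurrently. The names `predicted_time` and `best_ratio` and all cost parameters are hypothetical.

```python
def predicted_time(r, work, cpu_cost, gpu_cost, xfer_cost):
    """Predicted time when the GPU gets fraction r of the work and the CPU the rest.

    cpu_cost, gpu_cost: seconds per unit of work (would come from profiling runs)
    xfer_cost:          seconds per unit of work for host-device transfers
    """
    t_cpu = cpu_cost * (1.0 - r) * work
    t_gpu = (gpu_cost + xfer_cost) * r * work
    # Both processors run at the same time, so the slower one decides.
    return max(t_cpu, t_gpu)

def best_ratio(work, cpu_cost, gpu_cost, xfer_cost, steps=1000):
    """Search the GPU share in [0, 1] that minimizes the predicted time."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda r: predicted_time(r, work, cpu_cost, gpu_cost, xfer_cost))
```

A grid search is used instead of a closed-form solution so that the sketch would still work if the per-processor terms were replaced by a non-linear model.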

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018188/
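Performance Evaluation of a Storage System for Processing and Archiving Satellite Observation Data

TANIMURA Yusuke, YAMAMOTO Naotaka, ISHIBASHI Takuya, TANAKA Yoshio, NISHIKAWA Takeshi, MATSUOKA Satoshi, SEKIGUCHI Satoshi

IPSJ Symposium Series 2008 ( 5 ) 27 - 28 2008.6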

　More details

Language：Japanese

J-GLOBAL

researchmap
Intelligent data staging with overlapped execution of grid applications

Yuya Machida, Shin'ichiro Takizawa, Hidemoto Nakada, Satoshi Matsuoka

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE 24 ( 5 ) 425 - 433 2008.5

　More details

Language：English

DOI： 10.1016/j.future.2007.07.005

Web of Science

J-GLOBAL

researchmap
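Proposal of a Fault-Tolerant MPI Framework for the Information Explosion

JITSUMOTO Hideyuki, ENDO Toshio, MATSUOKA Satoshi

Proceedings of the National Convention 70 ( 0 ) 133 - 134 2008.3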

　More details

Language：Japanese

CiNii Books

researchmap
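Fast Virtual Cluster Construction by Model-Based Resource Selection in the Information Explosion Era

YAMASAKI Shohei, MARUYAMA Naoya, MATSUOKA Satoshi

Proceedings of the National Convention 70 ( 0 ) 119 - 120 2008.3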

　More details

Language：Japanese

CiNii Books

researchmap
Autonomic, Scalable Fault Localization for the Information Explosion Era

MARUYAMA Naoya, MATSUOKA Satoshi

70 ( 0 ) 127 - 128 2008.3

　More details

Language：Japanese

CiNii Books

researchmap
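Large-Scale Data Management on a Grid File System in the Information Explosion Era

SATO Hitoshi, MATSUOKA Satoshi, ENDO Toshio

Proceedings of the National Convention 70 ( 0 ) 121 - 122 2008.3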

　More details

Language：Japanese

CiNii Books

researchmap
MPI Communication Algorithm over an Optical Interconnect for the Information Explosion Era

TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

70 ( 0 ) 137 - 138 2008.3

　More details

Language：Japanese

CiNii Books

researchmap
Supercomputing on Heterogeneous Architecture toward the Information Explosion Era

ENDO Toshio, MATSUOKA Satoshi

70 ( 0 ) 131 - 132 2008.3

　More details

Language：Japanese

CiNii Books

researchmap
Our experiences at TSUBAME Grid Cluster : Managing the super computer for the information explosion era

NISHIKAWA Takeshi, MATSUOKA Satoshi

70 ( 0 ) 129 - 130 2008.3

　More details

Language：Japanese

CiNii Books

researchmap
Optimization for MPI Collective Operations on Grid Utilizing Multilane Transfer

CHIBA Tatsuhiro, ENDO Toshio, MATSUOKA Satoshi

70 ( 0 ) 135 - 136 2008.3

　More details

Language：Japanese

CiNii Books

researchmap
Environmental-Aware Optimization of MPI Checkpointing Intervals Reviewed

Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING 326 - 329 2008

　More details

Language：English

DOI： 10.1109/CLUSTR.2008.4663790

Web of Science

researchmap
Towards Performance Modeling and Optimization of a Data-Intensive Application Execution Environment Using Virtual Clusters

SATO Kento, SATO Hitoshi, MATSUOKA Satoshi

IPSJ Symposium Series 2008 ( 5 ) 2008

　More details

J-GLOBAL

researchmap
Building A Large-Scale Storage System Using Sun Fire X4500 and Gfarm

TANIMURA YUSUKE, YAMAMOTO NAOTAKA, ISHIBASHI TAKUYA, TANAKA YOSHIO, NISHIKAWA TAKESHI, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 113 ( 122(HPC-113) ) 1 - 6 2007.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The Sun Fire X4500 server integrates a four-way x86-64 server and 24 TB of storage, which may deliver remarkable benefits to large-scale data processing applications. In particular, the server architecture is expected to match the data processing model of Gfarm, which uses file-affinity scheduling. This paper discusses system integration for building a storage system from 20 X4500 nodes and Gfarm. The configuration of the ZFS/RAID-Z storage pool or UFS is determined so that Gfarm achieves high performance in I/O throughput and metadata operations. Based on the discussion and preliminary experiments, a storage system with 256.5 TB capacity was constructed, and its basic performance was measured with benchmarks. The results reveal several issues in building a petabyte-scale storage system with such an architecture.

CiNii Books

J-GLOBAL

researchmap
Evaluation of the issue of time stamps scalability of the distributed time stamping authority grid on the Internet

NISHIKAWA Takeshi, MATSUOKA Satoshi

IPSJ SIG Notes 112 ( 88 ) 1 - 5 2007.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We have previously proposed a distributed time stamping scheme called "K=L+M among N for G generation" that solves the scalability problem exhibited by both a centralized TSA (Time Stamping Authority) and other earlier distributed time stamping schemes, and we implemented it and demonstrated its viability by issuing one million time stamps per second on a LAN testbed with low latency and high bandwidth. To verify the global scalability of our approach, we installed distributed time stamping units (TSUs) at various locations on the Internet with varying access characteristics, such as the NTT East B-Flets network (a regional shared optical 100 Mbps best-effort service) and a European WiFi Internet service provider network. A realistic operational experiment of the distributed time stamping grid system showed good scalability: a sufficient number of TSUs distributed on the Internet allows one million time stamps to be issued per second, even in the presence of unpredictable network delays and response delays caused by Java VM garbage collection, just as in a LAN environment.

CiNii Books

researchmap
Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation

CEVAHIR ALI, MATSUOKA SATOSHI

IPSJ SIG Notes 2007 ( 88 ) 19 - 24 2007.9

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

The efficiency of the PageRank computation is important because the constantly evolving nature of the Web requires this computation to be repeated many times. The PageRank computation consists of repeated sparse matrix-vector multiplications. Because of the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph partitioning techniques are widely used for efficient parallelization of matrix-vector multiplications, but they suffer from high preprocessing overhead for the PageRank algorithm. In this work, we propose Web-site-based partitioning techniques to reduce the preprocessing overhead of parallel PageRank computation.
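
For context, the repeated sparse matrix-vector multiplication mentioned in this abstract can be sketched as a plain power iteration. This shows only the standard serial PageRank computation, not the Web-site-based partitioning proposed in the paper; the function name, the damping factor of 0.85, the fixed iteration count, and the uniform handling of dangling pages are assumptions for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration for PageRank on a graph stored as adjacency lists.

    out_links[i] is the list of pages that page i links to, which is a
    sparse representation of the Web matrix.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Teleportation term shared by every page.
        new = [(1.0 - damping) / n] * n
        for src, targets in enumerate(out_links):
            if targets:
                # One column of the sparse matrix-vector product.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling page: spread its rank uniformly (an assumption).
                for dst in range(n):
                    new[dst] += damping * rank[src] / n
        rank = new
    return rank
```

In a parallel setting, the rows of `out_links` would be divided among processors, and the quality of that division determines how much of `new` has to be communicated in each iteration.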

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00028745/
High-Performance Distributed Solar Computing (?) : Towards a Grid that Computes like Trees

MATSUOKA Satoshi

IPSJ SIG Notes 112 ( 88 ) 61 - 66 2007.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Power-heat dissipation as well as the associated CO_2 emission are becoming serious bottlenecks in scaling of large supercomputers. Indeed a single day's operation of TSUBAME, the fastest supercomputer in Asia-Pacific circa 2007, incurs as much CO_2 emission as an entire Formula-1 race. Instead, the use of photovoltaic power generation is promising to minimize or eliminate the emission altogether. While the traditional methods would incur simple attachment to a power grid, and involve very little effect or merit from grid computing, we actually claim that grids and distributed power generation go hand-in-hand to create a robust and self-sustainable computing infrastructure that could scale to TSUBAME-class applications. For robust operation as a pragmatic operational infrastructure, much continuing research would be required customizing and integrating the results from P2P, autonomic computing, sensor networks, etc.

CiNii Books

researchmap
Evaluation of MPI Communication Performance on Next Generation Optical Interconnect

24 1 - 11 2007.9

　More details

Language：Japanese

CiNii Books

researchmap
Distributed Time-stamping Authority Grid and Analysis of Parameter Dependencies

NISHIKAWA TAKESHI, MATSUOKA SATOSHI

48 ( 13 ) 117 - 126 2007.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Digital time stamping is a technique for proving that digital data existed prior to a specific point in time. The centralized time-stamping scheme, which is currently the mainstream, cannot withstand a concentration of numerous time-stamping requests and is therefore vulnerable to distributed DoS (DDoS) attacks. Distributed time stamping schemes have been proposed to solve performance scalability problems such as tolerance to DDoS attacks. However, they still suffer from high costs caused by the use of atomic clocks and by audits from a trusted third party. In this paper, we define a reliable, high-performance, robust, and inexpensive distributed time stamping scheme. It is named "TSA Grid" with the (N,K = L+M,G) scheme, and it is based on a network of peer-to-peer time-stamping programs managed by administratively independent entities. It solves the cost problem of previously proposed distributed time stamping schemes. In the (N,K = L+M,G) scheme, one time stamp request propagates for G generations to N Time Stamping Units (TSUs). In each generation, L time stamps are returned by reliable TSUs and M time stamps are returned by randomly chosen TSUs. The G and L parameters enable us to estimate the authorized time of time-stamping, and they also enable TSUs to audit one another mutually and automatically. We also investigate the basic characteristics of the parameter dependencies of the TSA Grid.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018232/
Experiments of Distributed Time Stamping Grid on the Internet

NISHIKAWA Takeshi, MATSUOKA Satoshi

IEICE technical report 107 ( 175 ) 61 - 64 2007.8

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

We have proposed a distributed time stamping scheme named "K=L+M among N for G generation", which solves the problems of a single-point TSA and of other distributed time stamping schemes. We have also implemented the scheme and shown that issuing one million time stamps per second is possible in a LAN environment with low latency and high bandwidth. In this work, time stamping units (TSUs) were installed on networks including NTT East B-flets and a European WiFi Internet service provider network, and an operational experiment of the distributed time stamping grid system was carried out on the Internet. The results show that a sufficient number of TSUs on the Internet makes it possible to issue one million time stamps per second.

CiNii Books

researchmap
Modeling of Virtual Cluster Construction Time and Its Optimization

YAMASAKI Shohei, MARUYAMA Naoya, MATSUOKA Satoshi

IEICE technical report 107 ( 175 ) 65 - 70 2007.8

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

When constructing virtual clusters on grids, randomly selecting nodes from the available computing resources can incur a large installation time overhead. This is because the installation time of each node can vary greatly in heterogeneous grid environments, and the total installation time of a virtual cluster is determined by the slowest node. To achieve fast virtual cluster installation, we propose a model-based resource selection policy that chooses a near-optimal combination of nodes to assemble each cluster. We divide the VM setup process into five steps and generate a model for each step. The time of each step is represented as a linear combination of CPU frequency, disk performance, and the size of the packages to install. Experiments using our virtual cluster installer VPC show that the model-based selection policy is indeed effective, especially when the size of the packages to install differs from site to site. The proposed policy reduced the installation time by up to 68% compared with the most naive method, by 60% compared with a method considering only CPU frequency, and by 58% compared with a method considering only disk performance.
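
The selection policy described here can be sketched as follows. This is an illustration under stated assumptions rather than the VPC implementation: the function names, the node fields (`cpu_ghz`, `disk_mbps`, `pkg_mb`), and every coefficient are hypothetical, and in practice the coefficients would be fitted from measurements.

```python
def predict_install_time(node, step_models):
    """Sum the per-step linear models for one node.

    step_models: list of (a_cpu, a_disk, a_pkg, const) tuples, one per setup
                 step; each step time is a linear combination of CPU frequency,
                 disk performance, and package size.
    """
    total = 0.0
    for a_cpu, a_disk, a_pkg, const in step_models:
        total += (a_cpu * node["cpu_ghz"] + a_disk * node["disk_mbps"]
                  + a_pkg * node["pkg_mb"] + const)
    return total

def select_nodes(candidates, step_models, k):
    """Pick k nodes so that the slowest predicted installation is as fast as possible."""
    ranked = sorted(candidates, key=lambda n: predict_install_time(n, step_models))
    chosen = ranked[:k]
    # The cluster is ready only when its slowest node has finished.
    cluster_time = max(predict_install_time(n, step_models) for n in chosen)
    return [n["name"] for n in chosen], cluster_time
```

Because the cluster time is the maximum over the chosen nodes, taking the k nodes with the smallest predicted times minimizes it when nodes are installed independently, which is the simplifying assumption made in this sketch.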

CiNii Books

researchmap
Analysis of MPI Applications over Next Generation Optical Interconnect

TAKIZAWA SHIN'ICHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 111 183 - 188 2007.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

For future systems with tens of thousands of processors, it is difficult to construct interconnects that fully connect all nodes with high bandwidth because of cost and power consumption. We propose a network that uses both a fully connected low-bandwidth electronic packet-switched network and an optical circuit-switched network. The optical network is used supplementally, only when a node communicates with nodes under other packet switches. An MPI application runs in this environment in such a manner that processes connected to optical circuits forward other processes' messages that cross packet switches, in accordance with a topology constructed from the communication pattern. Our evaluations show that our proposal achieves a lower inter-process distance than the electronic network alone.

CiNii Books

researchmap
Node Grouping for Large-Scale Data Management on the Grid

SATO HITOSHI, MATSUOKA SATOSHI, ENDO TOSHIO

IPSJ SIG Notes 111 ( 80 ) 109 - 114 2007.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In parallel computing environments such as HPC clusters and the Grid, data-intensive applications incur large overhead because accesses concentrate on files on commonly shared nodes. A grid filesystem with an automatic data management mechanism is one solution for avoiding such performance degradation. However, the metrics needed to achieve efficient large-scale data management are not clear for a given real grid environment. We federated five geographically distributed HPC clusters using a grid filesystem and measured various performance metrics of file access on the filesystem. We observed that, although remote file access performance is affected by inter-node bandwidth, other factors are also at play, which makes it difficult to predict performance solely from limited inter-node information such as RTT or network bandwidth, and that even for local file access, performance can differ by an order of magnitude depending on file access patterns because of access contention.

CiNii Books

researchmap
Proposal and Evaluation of a FFT Library That Uses CPU and GPU Together

OGATA YASUHIKO, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 111 ( 80 ) 13 - 18 2007.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

General-purpose computation on graphics processing units (GPGPU) is becoming popular in the HPC field, in expectation of the excellent peak performance of GPUs. Their effective performance is, however, not so far from that of recent multi-core CPUs. Therefore we can expect to improve performance by using GPUs and CPUs cooperatively. One of the key challenges in such heterogeneous environments is to determine the optimal load balancing ratio among processors. It depends not only on the characteristics of the target computation and processors, but also on problem sizes. Our approach is to construct a performance prediction model that covers the computational cost and data transfer cost of the target computation. We train the model with a small number of test runs to determine the model parameters. Then we use the model to obtain the optimal load balancing ratio for arbitrary problem sizes. Following this approach, we have implemented a two-dimensional FFT library for heterogeneous environments and constructed its performance model. We have evaluated the accuracy of our model by comparing predicted and real performance on a heterogeneous system with a GeForce8800GTX GPU and a Core2Duo CPU. After training the model with test runs of 512^2 FFT, we have evaluated larger (up to 8192^2) problem sizes. The results show that our model succeeds in predicting the optimal load balancing ratio within 5% accuracy, while prediction errors in execution time are 15% or less.
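The idea of choosing a load balancing ratio from a cost model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the cost is taken as proportional to n^2 log2(n), the per-unit costs `cpu_cost` and `gpu_cost` and the fixed GPU overhead `gpu_fixed` are hypothetical parameters that would come from test runs, and the ratio is found by a simple grid search.

```python
import math

def predict_time(ratio, n, cpu_cost, gpu_cost, gpu_fixed=0.0):
    """Predicted time when the GPU takes `ratio` of an n x n FFT and the CPU the rest.

    Hypothetical model: work ~ n^2 * log2(n); the GPU pays a fixed
    overhead (e.g. data transfer setup) whenever it is used at all.
    """
    work = n * n * math.log2(n)
    gpu_time = (gpu_fixed + ratio * work * gpu_cost) if ratio > 0 else 0.0
    cpu_time = (1.0 - ratio) * work * cpu_cost
    # Both processors run concurrently, so the slower one sets the total time.
    return max(gpu_time, cpu_time)

def best_ratio(n, cpu_cost, gpu_cost, gpu_fixed=0.0, steps=1000):
    """Grid-search the GPU share in [0, 1] that minimizes the predicted time."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda r: predict_time(r, n, cpu_cost, gpu_cost, gpu_fixed))
```

With a non-zero fixed overhead the best ratio shifts with the problem size, which is one simple way a model can capture the size dependence mentioned in the abstract.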

CiNii Books

researchmap
Evaluation of Power Saving of Parallel Applications with Next Generation Low Power Memory

HOSOGAYA YUTO, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 174 49 - 54 2007.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

With the increasing demand for low-power high performance computing, reducing the power of not only CPUs but also memories is becoming important. In typical HPC environments a large capacity of DRAM is installed to avoid memory swapping, although not all of the memory is used in many cases. Since DRAM is a volatile memory, such unused memory can waste a significant amount of power even in a standby state. We propose a next generation low power system that intends to reduce the DRAM capacity without causing application performance degradation. In this system, MRAM and DRAM are used as main memory, while flash memory is used as swap space. Our profile-based paging algorithm optimizes memory accesses by avoiding I/O with slower memories and using faster memories as much as possible. Results from our simulation of parallel applications show that the power consumption can be reduced up to one third, with 30% performance loss.

CiNii Books

researchmap
Evaluation of I/O Performance of IP-SAN on Cluster System using Parallel Benchmark

KAMISAKA KIKUKO, YAMAGUCHI SANEYASU, OGUCHI MASATO, MATSUOKA SATOSHI

IPSJ SIG Notes 111 ( 80 ) 225 - 230 2007.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In supercomputing and large-scale HPC clusters, cluster systems that integrate the interconnection network of computing nodes with the storage network are beginning to be realized. Such a cluster system simplifies network composition and reduces its cost. However, it has not been clarified how the integration affects the total performance of the system. In this paper, a cluster system connected with IP-SAN, as one such integrated network, is evaluated using a parallel benchmark with I/O. As a result, the performance of the IP-SAN integrated cluster is about the same as that of a cluster using local storage. According to the result, the bottleneck of the entire system's performance should be parallel processing and/or I/O processing of the storage, rather than data transfer processing of the network storage.

CiNii Books

researchmap
Performance Evaluation of TSUBAME Heterogeneous Supercomputer with Linpack

ENDO TOSHIO, MATSUOKA SATOSHI, HASHIZUME NOBUAKI, NAGASAKA MASAMICHI

48 ( 8 ) 62 - 70 2007.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The TSUBAME supercomputer is a heterogeneous large-scale cluster system, which is equipped with 10480 Opteron CPU cores on 655 nodes and 360 ClearSpeed SIMD accelerator boards. This paper describes techniques to run HPL, which is a parallel Linpack implementation, on the TSUBAME system efficiently, and evaluates the performance. The techniques include sharing heterogeneous computing resources among fine grained processes, and using asynchronous communications. Through the evaluation of the system with the modified HPL, we have observed 47.38TFlops, which is the world's fastest Linpack performance on heterogeneous systems. The result of this work shows that heterogeneous supercomputers, which are expected to be much more popular in the near future, are promising for large scale parallel computations.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018254/
MPI Collective Operations Algorithm by Using Multi-lane for Grid Environment

CHIBA TATSUHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

48 ( 8 ) 104 - 113 2007.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The performance of MPI collective operations, such as broadcast and reduction, is heavily affected by network topologies, especially in grid environments. Many techniques to construct efficient broadcast trees have been proposed for grids. On the other hand, recent high performance computing nodes are often equipped with multi-lane network interface cards (NICs), which most previous collective communication methods fail to harness effectively. Our new broadcast algorithm for grid environments harnesses almost all downward and upward bandwidths of multi-lane NICs; a message to be broadcast is split into two pieces, which are broadcast along two independent binary trees in a pipelined fashion, and swapped between both trees. The salient feature of our algorithm is generality; it works effectively on both large clusters and grid environments. It can also be applied to nodes with a single NIC, by making multiple sockets share the NIC. Experiments on an emulated network environment show that we achieve higher performance than traditional methods, regardless of network topologies or message sizes.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018258/
Creating Informatics - What the KAKEN Project is Aiming at : How Does ICT Affect on Progress of Science ?

SHIMOJO Shinji, NOZAKI Kazunori, MATSUOKA Satoshi

IPSJ Magazine 48 ( 5 ) 521 - 526 2007.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00065970/
Building Time-Stamp Authority Grid and Basic Experiment

NISHIKAWA Takeshi, MATSUOKA Satoshi

IEICE technical report 107 ( 16 ) 13 - 18 2007.4

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Digital time stamping is a technique to prove the existence of digital data prior to a specific point in time. A centralized time-stamping authority (TSA) has two major difficulties: one is administrative cost and the other is scalability of performance. Distributed time-stamping methods were proposed to solve these problems, but they still require many TSAs to be prepared. We have shown that the "K=L+M among N for G-generation" method can solve these existing problems. The method performs mutual time-stamping using many time-stamping units (TSUs). We also reported the basic characteristics of the method and its dependence on configuration parameters in a computer cluster environment on a LAN. In this report, we describe a TSA Grid that we built on the Internet by installing several distributed TSUs. We investigated the influence of network delay on the authorized time. We also considered how many time-stamping units are needed for the arithmetic mean of the responding authorized times to fall within 1 second. As a result, we found that with more than 256 TSUs the arithmetic mean of the authorized time falls within 1 second. It also became clear that the mode yields a smaller delay than the arithmetic mean.

CiNii Books

researchmap
Job invocation interoperability between NAREGI middleware Beta and gLite

NAKADA HIDEMOTO, SATO HITOSHI, SAGA KAZUSHIGE, HATANAKA MASAYUKI, SAEKI YUJI, MATSUOKA SATOSHI

IPSJ SIG Notes 2007 ( 17 ) 269 - 274 2007.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

As grid middleware matures, interoperation among middleware stacks is becoming more important. There is a community group called GIN (Grid Interoperation Now) in the OGF (Open Grid Forum), a standardization body for grid-related technologies, which aims to establish interoperation technologies among several grid middleware stacks. We performed experiments on interoperation between NAREGI Middleware beta and EGEE gLite, as one of the contributions to the group. For the experiments, we implemented several modules to enable information exchange and mutual job submission. As a result of the experiments, we confirmed the following: 1) in the security layer, such as certificate and virtual organization management, there is no essential difference between them; 2) while the information services differ substantially, the resource information can be translated to enable information exchange; 3) jobs can be mutually submitted based on the exchanged information.

CiNii Books

J-GLOBAL

researchmap
ABARIS: an adaptable fault detection/recovery component framework for MPIs

JITSUMOTO HIDEYUKI, ENDO TOSHIO, MATSUOKA SATOSHI

IPSJ SIG Notes 2007 ( 17 ) 163 - 168 2007.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Long-running MPI applications on clusters and grids that are prone to node and network failures motivate the use of fault tolerant MPI implementations. However, previous fault tolerant MPIs lack the ability to allow the user to easily choose appropriate fault recovery strategies according to the execution environment, independent of the application codes. ABARIS is our new Fault/Recovery model aware component framework for MPI, where users can customize MPI fault detection and recovery algorithms according to their application and execution environment requirements by merely selecting appropriate fault/recovery components, independent of the application code. Currently, the ABARIS framework prototype is implemented on top of MPICH-P4MPD. Preliminary evaluation of the prototype using NPB on our MPI fault simulator demonstrates that the overhead compared to the original MPICH-P4MPD is almost negligible (less than 1%) under normal execution, and that when faults occur, appropriate selection and pairing of fault model and recovery method components for the execution environment has a significant effect on the overall execution time.

CiNii Books

researchmap
Performance evaluation of a cache-based virtual cluster installation method

NISHIMURA HIDEO, MARUYAMA NAOYA, MATSUOKA SATOSHI

IPSJ SIG Notes 2007 ( 17 ) 121 - 126 2007.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Recently, clusters of virtual machines, called virtual clusters, have been proposed as a means to share Grid resources efficiently. Such virtual cluster construction should be not only customizable at a fine grain but also fast and scalable. However, existing approaches have not fulfilled these requirements. We have been proposing a novel virtual cluster installation system which is fast, scalable and fully customizable in cooperation with existing cluster installer tools. To achieve efficiency in the presence of such full customization, it automatically caches frequently constructed virtual disk images to save software installation time in common cases. On broader environments, our experimental studies show that the average installation time could be reduced by approximately 66.7% after creation of cache images, and that installation of a 204-node virtual cluster can be done in 40 seconds with our prototype implementation. From this result along with a scalability study, we estimate that installation of a 1000-node virtual cluster could be done in several tens of seconds.

CiNii Books

researchmap
Multi-site MPI execution with virtual cluster

TATEZONO MASAKI, MARUYAMA NAOYA, MATSUOKA SATOSHI

IPSJ SIG Notes 2007 ( 17 ) 115 - 120 2007.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Recently, large-scale MPI applications have come to require large amounts of computational resources. We propose an MPI execution environment on geographically distributed computational resources using virtual clusters, and experiments on a prototype virtual cluster confirm that the proposal is feasible, depending on application characteristics. Moreover, a virtual cluster makes dynamic relocation of virtual nodes possible. Using this feature, we propose a system that automatically relocates virtual nodes, including their MPI processes, to suitable resources based on MPI application characteristics and resource usage. We confirm our approach in an experiment on the prototype, and find that the amount of cross-site communication gives an indication of whether cross-site MPI execution is possible.

CiNii Books

researchmap
High Performance Distributed Time-Stamping Authority : How to Issue Millions Time-Stamp

NISHIKAWA TAKESHI, MATSUOKA SATOSHI

IPSJ SIG Notes 2007 ( 17 ) 221 - 226 2007

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Time stamping is a technique to prove the existence of digital data prior to a specific point in time. The centralized time-stamping scheme, which is the mainstream at present, cannot withstand a concentration of numerous time-stamping requests. The centralized scheme is therefore vulnerable to distributed DoS (DDoS) attacks. It also suffers from high cost, caused by the use of an expensive time source such as an atomic clock. We solved these problems by developing a distributed time-stamping scheme. In this report, we investigate an implementation and parameter configuration that make a million time-stamps per second possible.

CiNii Books

researchmap
Outil autonome de surveillance de grilles

Laurent Baduel, Satoshi Matsuoka

Revue de l'Ingenierie des Systemes d'Information 12 ( 3 ) 85 - 104 2007

　More details

researchmap
Flight of TSUBAME (Extended Abstract)

MATSUOKA Satoshi

IEICE technical report. CPSY, Computer systems 106 ( 287 ) 33 - 36 2006.10

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
Next Generation High Performance Computing Systems and Aspect of Ultimate System

TANABE NOBORU, IKEI Mitsuru, ENDO Toshio, MATSUOKA Satoshi, HATAZAKI Takao, SUMIMOTO SHINJI

IEICE technical report. Computer systems 106 ( 287 ) 49 - 49 2006.10

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

System Area Networks (SANs), which realize high-bandwidth, low-latency communication, have become widely used as the interconnection network of PC clusters. This panel discusses some directions toward next generation computer systems and the ultimate computer system, including advanced hardware and software technologies for high performance computing.

CiNii Books

researchmap
ORE Grid: A Virtual-machine Based Fast Deployment Tool for Grid Execution Environment

TAKAMIYA YASUHITO, YAMAGATA IKUHEI, AOKI TAKAFUMI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

47 ( 12 ) 229 - 239 2006.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

With the increased variety of jobs executed in the Grid, the execution environments requested by such jobs, such as OSes, software, and libraries, have become increasingly diversified. However, it is difficult for grid users to acquire the environment suited to each job because job execution environments on the grid are strongly tied to local administration policies. Recently proposed solutions may achieve virtualization of the execution environment at a certain level, but are still incomplete in that constructing execution environments again requires manual operations and/or expert knowledge of the underlying systems. Instead, we propose a system called ORE (Open Resource Environment) Grid, which automatically and dynamically builds an exclusive execution environment for each submitted job. Moreover, the GUI setup front-end offers succinct methods to pick the necessary features and generate an execution environment description automatically, instead of resorting to tool-dependent VM description forms such as shell scripts or DAG descriptions. Our experience has shown that setup of 16 VM nodes itself takes only 151 seconds, and the setup cost is well within an allowable range compared to the accumulated running time of general Grid jobs (several hours to several days).

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018304/
Profile-based Optimization of Power Performance by Using Dynamic Voltage Scaling on a PC Cluster

HOTTA YOSHIHIKO, SATO MITSUHISA, KIMURA HIDEAKI, MATSUOKA SATOSHI, BOKU TAISUKE, TAKAHASHI DAISUKE

47 ( 12 ) 272 - 284 2006.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Currently, several of the high performance processors used in PC clusters have a DVS (Dynamic Voltage Scaling) architecture that can dynamically scale processor voltage and frequency. Adaptive scheduling of the voltage and frequency enables us to reduce power dissipation without a performance slowdown during communication and memory access. In this paper, we propose a method of profile-based power-performance optimization by DVS scheduling in a high-performance PC cluster. We divide the program execution into several regions and select the best gear (a combination of clock frequency and voltage) for power efficiency. Selecting the best gear is not straightforward since a DVFS transition is not free. We propose an optimization algorithm that selects a gear using the execution and power profile, taking the transition overhead into account. We have designed and built a power-profiling system, PowerWatch. With this system we examined the effectiveness of our optimization algorithm on two types of power-scalable clusters (Crusoe and Turion). According to the results of benchmark tests, we achieved an almost 30% reduction in EDP (energy-delay product) without performance impact (less than 5%) compared to results using the standard clock frequency.
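The energy-delay product named in this abstract is energy multiplied by execution time. The sketch below is only an illustration under stated assumptions and is not the paper's algorithm: it handles a single region, the profile values and the `switch_time_s` penalty are hypothetical, and the transition overhead is modeled simply as extra time charged at the new gear's power.

```python
def edp(time_s, power_w):
    """Energy-delay product: energy (power * time) multiplied by time."""
    return power_w * time_s * time_s

def choose_gear(profile, current_gear, switch_time_s=0.0):
    """Pick the gear with the lowest EDP for one program region.

    `profile` maps a gear name to (time_s, power_w) measured for the region.
    Changing away from `current_gear` adds `switch_time_s` to the region time
    (a hypothetical, simplified model of the transition overhead).
    """
    def cost(gear):
        time_s, power_w = profile[gear]
        if gear != current_gear:
            time_s += switch_time_s
        return edp(time_s, power_w)
    return min(profile, key=cost)
```

For example, with a hypothetical profile `{"high": (1.0, 30.0), "low": (1.2, 18.0)}` and the processor currently in the "high" gear, the "low" gear wins when switching is free, but a large enough `switch_time_s` makes staying in "high" the better choice.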

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018308/
B-12-10 Demonstration of Collective Communication in Grid Computing over OBS Network

ONO Takashi, TAKADA Atsushi, KOGA Masafumi, TAKIZAWA Shin'ichiro, MATSUOKA Satoshi

Proceedings of the Society Conference of IEICE 2006 ( 2 ) 296 - 296 2006.9

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
Construction and Operation of the Grid Challenge Testbed

AIDA KENTO, OSAWA KIYOSHI, OSUMI TOMOTAKA, KASAI TAKEFUMI, ONO ISAO, JITSUMOTO HIDEYUKI, MATSUOKA SATOSHI, SAITO HIDEO, ENDO TOSHIO, YOKOYAMA DAISAKU, TAURA KENJIRO, CHIKAYAMA TAKASHI, TANAKA YOSHIO, SHIMOSAKA HISASHI, KAJIWARA HIROKI, HIROYASU TOMOYUKI, FUJISAWA KATSUKI

IPSJ SIG Notes 107 ( 87 ) 49 - 54 2006.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper presents a case study to operate the Grid testbed for the Grid Challenge in SACSIS2006. The Grid Challenge is a programming competition on a Grid testbed, which is organized by multiple computing resources installed in universities and laboratories. In the last competition, the Grid testbed with more than 1200 CPUs was operated. The paper shows hardware/software specifications of the Grid testbed, and reports experience of the operation, which includes accounting, job management, and troubleshooting.

CiNii Books

researchmap
Flight of TSUBAME: Construction of `Supercomputer for Everyone' toward Petascale

IPSJ SIG Notes 107 ( 87 ) 37 - 42 2006.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Large Scale Distributed Time-Stamp Authority : Its design, implementation and performance evaluation

NISHIKAWA Takeshi, MATSUOKA Satoshi

IEICE technical report 106 ( 199 ) 25 - 30 2006.8

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Time stamping is a technique to prove the existence of digital data prior to a specific point in time. The centralized time-stamping scheme, which is the mainstream at present, cannot withstand a concentration of numerous time-stamping requests. The centralized scheme is therefore vulnerable to distributed DoS (DDoS) attacks. It also suffers from high cost, caused by the use of an expensive time source such as an atomic clock. In this report, we define a reliable, high-performance, robust, and inexpensive distributed time-stamping scheme. This scheme is based on a network of peer-to-peer time-stamping programs managed by administratively independent entities. It solves the DDoS tolerance and cost problems.

CiNii Books

researchmap
Performance Evaluation of TSUBAME Heterogeneous Supercomputer with Linpack

ENDO TOSHIO, MATSUOKA SATOSHI, HASHIZUME NOBUAKI, NAGASAKA MASAMICHI, GOTO KAZUSHIGE

IPSJ SIG Notes 107 ( 87 ) 43 - 48 2006.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The TSUBAME supercomputer is a heterogeneous large-scale cluster system, which is equipped with 10480 Opteron CPU cores on 655 nodes and 360 ClearSpeed SIMD accelerator boards. The TSUBAME system has achieved 38.18TFlops with Linpack benchmark and is ranked 7th in the Top500 supercomputer ranking in June 2006, even though the measurement is done without any accelerator boards. This paper discusses issues to obtain high Linpack performance on heterogeneous systems with general purpose processors and accelerators, and describes solutions. Through preliminary experiments with 256 CPU cores on sixteen nodes, we observed +8.2% speed-up when eight accelerators are added, and +19% with sixteen accelerators.

CiNii Books

researchmap
Virtual Cluster with Virtual Machines and Virtual Network

NISHIMURA HIDEO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 107 ( 87 ) 73 - 78 2006.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Recently, virtual clusters constructed using virtual machines and virtual networks have attracted attention as a technique for hiding the heterogeneity of the grid environment. To construct a proper virtual cluster, it is necessary to distribute a VM image containing the requested environment to the real computing resources. However, the transfer time of the VM image generally cannot be disregarded, since image sizes range from 100 MBytes to several GBytes. Existing research proposes comparatively fast virtual cluster construction systems, but they limit the execution environment. We therefore propose a virtual cluster construction system that builds the environment the user wants dynamically and at high speed. This system automatically generates cache images that contain frequently used package compositions. Moreover, by estimating the construction time beforehand and using the cache, we confirmed that the construction time was shortened from about 100 seconds to about 75 seconds, and obtained an indicator for further speed-up.

CiNii Books

researchmap
The Proposal and Evaluation of Cuckoo FTMPI : Framework of Fault/Recovery model aware Component-based FTMPI

JITSUMOTO Hideyuki, MATSUOKA Satoshi

IEICE technical report 106 ( 198 ) 73 - 78 2006.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Execution of MPI applications on clusters and Grid deployments that suffer from node and network failures motivates the use of fault tolerant MPI implementations. Several fault tolerant MPI implementations have therefore been developed. However, these implementations do not make it easy to choose an appropriate recovery method according to the environment. We present Cuckoo FTMPI, a Fault/Recovery model aware component framework. Users can obtain an MPI implementation suited to their execution environment by selecting components. This paper presents the architecture of Cuckoo FTMPI, its theoretical foundation, and the performance of the implementation. In a preliminary evaluation using NPB, there was no overhead from using Cuckoo FTMPI on MPICH. We also demonstrate the validity of the Fault/Recovery model aware component framework.

CiNii Books

researchmap
Towards Fault Diagnosis for Large-Scale Distributed Systems

MARUYAMA Naoya, MATSUOKA Satoshi

IEICE technical report 106 ( 198 ) 19 - 24 2006.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

As distributed systems, such as clusters and grids, become larger in scale and more commoditized, analysis of faults in such systems is becoming significantly harder than before. Nonetheless, none of the existing analysis techniques is effective for such platforms, resulting in a huge burden on system administrators. We detect and analyze faults as follows. First, we take function-call traces from each process of the target distributed system. Next, to find anomalous behaviors, we apply an online analysis to the call traces. Based on the premise that most processing in distributed systems is request-driven or event-driven, we analyze the call trace of each routine that processes requests or events. We implemented a prototype fault analyzer, applied it to a cluster resource manager, and evaluated the efficacy of our method.

CiNii Books

researchmap
Design and implementation of NAREGI SuperScheduler based on the OGSA architecture

Satoshi Matsuoka, Masayuki Hatanaka, Yasumasa Nakano, Yuji Iguchi, Toshio Ohno, Kazushige Saga, Hidemoto Nakada

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 21 ( 4 ) 521 - 528 2006.7

　More details

Language：English

DOI： 10.1007/s11390-006-0521-y

Web of Science

J-GLOBAL

researchmap
Dependability and Security : Devices, Architecture and Software

SAKAI Shuichi, NAKAMURA Hiroshi, GOSHIMA Masahiro, MATSUOKA Satoshi, HASHIMOTO Mikio, KOHIYAMA Kiyoshi, NAKAMURA Tomohiro

IEICE technical report. Dependable computing 106 ( 4 ) 67 - 67 2006.4

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Dependability and security are two of the most significant concerns in information systems. Dependability, however, is a complex concept whose meaning depends on who uses the word: one person regards it as LSI design reliability, and another thinks of it as the reliability and security of the Internet. Here, six distinguished researchers, LSI designers, computer architects and software developers will take the rostrum and discuss what dependability is, what the most significant problem is, and how we should solve it. After reaching a common understanding, they discuss what is necessary for ensuring dependability of the whole information system.

CiNii Books

researchmap
Decentralized Job Scheduling System based on Information Sharing framework for Large-Scale Computing Environment

UMEDA NORIHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 2006 ( 20 ) 223 - 228 2006.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

A job scheduling system makes it possible to unify and use distributed computer resources. However, such systems have a single point of failure, in that just a few computers assign jobs to resources, and they lack scalability as the number of resources and jobs increases. We propose a decentralized job scheduling system that shares resource status using a communication framework for large-scale computing environments, and evaluate it in simulations. The results show that our proposal reduces the decline in efficiency in large-scale computing environments.

CiNii Books

researchmap
A virtual-machine based fast deployment tool for Grid execution environment

YAMAGATA IKUHEI, TAKAMIYA YASUHITO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 2006 ( 20 ) 127 - 132 2006.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

With the increased variety of jobs executed in the Grid, the execution environments requested by such jobs have become increasingly diversified. We therefore implemented a system that supplies an exclusive virtual execution environment for every submitted job using virtual machines and an installer. This system can dynamically set up 16 machines that can execute a BLAST job in about 173 seconds. This research proposes a system that saves and reuses virtual execution environments to shorten the time needed to build a job execution environment. We implemented this system and evaluated it against the old system. Our experience shows that the build time is reduced by 12% compared with the older one.

CiNii Books

researchmap
MPI Collective Communication on All Optical Network

TAKIZAWA SHIN'ICHIRO, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 2006 ( 20 ) 193 - 198 2006.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

On an Optical Burst Switching network, it is necessary to establish an optical path connection between nodes before communication and release it after communication. This process takes about 10 ms on average. For this reason, in compute-intensive applications such as MPI applications, the execution of collective communication is heavily influenced by the cost of establishing and releasing connections. We propose a method that establishes and releases connections independently of when communication occurs, to reduce the cost of collective communication. In this method, we accomplish fast execution by changing algorithms and by connecting simultaneously based on the ports on each node. The evaluation result shows that our proposed technique outperforms the method that establishes connections whenever communication occurs.

CiNii Books

researchmap
A Scheduling System Coupled with a Replica Management System for Data-intensive Applications

MACHIDA YUYA, TAKIZAWA SHIN'ICHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 2006 ( 20 ) 229 - 234 2006.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Existing scheduling systems for the Grid mostly handle huge I/O via a shared file system or simple staging. However, when numerous nodes access a single I/O node simultaneously, major performance degradation occurs, or in the worst case, the I/O node hangs. Moreover, when a user launches a job consisting of hundreds or even thousands of tasks which share the same data set, it becomes extremely inefficient to stage essentially the same data set to each compute node after every dynamic brokering and allocation of the compute nodes. We therefore propose to tightly couple replica management and computation scheduling in order to reuse already replicated data effectively. We implemented a prototype system which uses a replica management system that embodies a scalable multi-replication framework, where multiple copies can be made in O(1) transfer time, and which enables computation and data transfer to a single node to be scheduled simultaneously. The evaluation result shows that our proposed technique outperforms traditional techniques and improves throughput.

CiNii Books

researchmap
Cyber Science Infrastructure Initiative for Boosting Japan’s Scientific Research

Masao Sakauchi, Shigeki Yamada, Noboru Sonehara, Shigeo Urushidani, Jun Adachi, Kazunobu Konishi, Satoshi Matsuoka

CTWatch Quarterly Journal 2 ( 1 ) 20 - 26 2006

　More details

researchmap
Speculative Checkpointing

Ikuhei Yamagata, Satoshi Matsuoka, Hidemoto Nakada

Proceedings of DSW '06 1 2006

　More details

researchmap
Cyber science infrastructure initiative for boosting Japan's scientific research

Masao Sakauchi, Shigeki Yamada, Noboru Sonehara, Shigeo Urushidani, Jun Adachi, Kazunobu Konishi, Satoshi Matsuoka

CTWatch Quarterly Journal 2 ( 1 ) 20 - 26 2006

　More details

researchmap
MPI Collective Communication in Optical Network Environments

滝澤真一朗, 松岡聡, 中田秀基

情報処理学会シンポジウム論文集 2006 ( 5 ) 2006

　More details

J-GLOBAL

researchmap
BS-6-4 The Next-generation e-Science Infrastructure based on High-Speed Networking and Grid Technologies

Matsuoka Satoshi

Proceedings of the Society Conference of IEICE 2005 ( 2 ) S-60 - S-61 2005.9

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Books

researchmap
Gridifying a Genetic Algorithm for NMR Three-dimensional Protein Structure Determination by Using Ninf-1 and Ninf-G

ONO ISAO, MIZUGUCHI NAOAKI, NAKASHIMA NAOTOSHI, ONO NORIHIKO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, TATE SHINICHI

46 ( 12 ) 396 - 406 2005.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we parallelize the genetic algorithm (GA) for NMR protein three-dimensional structure determination, which has been proposed by Ono et al., on a grid that consists of multiple PC clusters on the WAN and report some results on the performance evaluation of the proposed system. The proposed system is parallelized with the hierarchical master-worker paradigm and consists of a master, submasters and workers. The communication between the master and each PC cluster is realized with Ninf-G, which is a secure GridRPC middleware, and that in each PC cluster is implemented by using Ninf-1, which is a fast GridRPC middleware. In the proposed system, we employ the slide transfer technique in order to hide the latency of communication on the Internet by using Ninf-G. The experimental results on the grid testbed consisting of 5 sites/1,196 CPUs showed that the proposed system effectively utilized computing resources on the grid testbed when it was applied to a problem of determining the three-dimensional structure of a 78-residue protein.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018402/
Gridifying a Genetic Algorithm for NMR Three-dimensional Protein Structure Determination by Using Ninf-1 and Ninf-G

ONO ISAO, MIZUGUCHI NAOAKI, NAKASHIMA NAOTOSHI, ONO NORIHIKO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, TATE SHINICHI

情報処理学会論文誌コンピューティングシステム（ACS） 46 ( SIG12(ACS11) ) 396 - 406 2005.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we parallelize the genetic algorithm (GA) for NMR protein three-dimensional structure determination, which has been proposed by Ono et al., on a grid that consists of multiple PC clusters on the WAN and report some results on the performance evaluation of the proposed system. The proposed system is parallelized with the hierarchical master-worker paradigm and consists of a master, submasters and workers. The communication between the master and each PC cluster is realized with Ninf-G, which is a secure GridRPC middleware, and that in each PC cluster is implemented by using Ninf-1, which is a fast GridRPC middleware. In the proposed system, we employ the slide transfer technique in order to hide the latency of communication on the Internet by using Ninf-G. The experimental results on the grid testbed consisting of 5 sites/1,196 CPUs showed that the proposed system effectively utilized computing resources on the grid testbed when it was applied to a problem of determining the three-dimensional structure of a 78-residue protein.

J-GLOBAL

researchmap
Optimization of Power-Performance by controlling DVS on a PC cluster

HOTTA YOSHIHIKO, SATO MITSUHISA, KIMURA HIDEAKI, BOKU TAISUKE, TAKAHASHI DAISUKE, MATSUOKA SATOSHI

ARC 2005 ( 80 ) 49 - 54 2005.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Recently, some high-performance processors provide a DVS (Dynamic Voltage Scaling) functionality that dynamically changes the processor's voltage and frequency. DVS can be used to reduce power consumption by controlling the clock frequency according to the dynamic characteristics of program execution. In parallel applications on a PC cluster, we expect to reduce power dissipation without losing performance by using DVS to run communication phases at a lower frequency. In this paper, we propose a method to optimize power-performance by controlling DVS at each phase of the program, based on the execution profile and observation of dynamic power consumption. We select the appropriate frequency for each phase from trial runs and decide the optimal DVS control for the actual run. We focus on the communication and computation phases and examine whether our proposed method can be used to optimize the power-performance of some parallel benchmarks. We conducted experiments on two kinds of PC clusters, Transmeta Crusoe and AMD Turion, and found that our method makes possible a 30% reduction in terms of EDP relative to running at the regular frequency.

CiNii Books

researchmap
A scheduling system coupled with a replica management system for data-intensive applications

MACHIDA Yuya, TAKIZAWA Shin'ichiro, NAKADA Hidemoto, MATSUOKA Satoshi

IEICE technical report. Computer systems 105 ( 226 ) 67 - 72 2005.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Existing scheduling systems for the Grid mostly handle huge I/O via a shared file system or simple staging. However, when numerous nodes access a single I/O node simultaneously, major performance degradation occurs or, in the worst case, the I/O nodes hang. Moreover, when a user launches a job consisting of hundreds or even thousands of tasks which share the same data set, it becomes extremely inefficient to stage essentially the same data set to each compute node after every dynamic brokering and allocation of the compute nodes. Instead, we propose to utilize a replica management system that embodies a scalable multi-replication framework as a data staging mechanism, where multiple copies can be made in O(1) transfer time, and to make intelligent reuse of already-created replicas in scheduling for efficiency. A prototype executing a sample data-intensive application proved to be clearly superior to shared files or traditional staging techniques.

CiNii Books

researchmap
Job execution in Grid on customizable virtual machine

YAMAGATA Ikuhei, AOKI Takafumi, TAKAMIYA Yasuhito, NAKADA Hidemoto, MATSUOKA Satoshi

IEICE technical report. Computer systems 105 ( 225 ) 13 - 18 2005.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

As the diversity of users' jobs increases, their requirements on execution environments have been getting richer as well. However, a given execution environment does not necessarily satisfy their specific requirements. To fill the gap, we propose a technique that allows the user to easily generate an execution environment that is tailored to his/her own requirements. To provide the customizability, our system utilizes a virtual machine monitor and a customizable installation tool. With our initial prototype, the user submits a job using GRAM, a remote job invocation infrastructure, with a description of his/her requirements for the environment. Given the description, the system creates the tailored environment using Lucie[5], a customizable installation infrastructure, where the job is finally executed. To illustrate the effectiveness of our technique, we have conducted several preliminary studies. Based on the results, we show that the system can execute jobs without interfering with the existing environments, and demonstrate that the environment creation time is not significantly large compared to typical job execution time on Grids. These preliminary experiments show that the proposed system achieves the customizability of execution environments on Grids.

CiNii Books

researchmap
A Flexible Configuration and Packaging Method for Cluster Installers

TAKAMIYA Yasuhito, SAKAE Yoshiaki, YAMAGATA Ikuhei, MATSUOKA Satoshi

IEICE technical report. Computer systems 105 ( 225 ) 19 - 24 2005.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Although automated cluster installers are becoming better known, they have not attained widespread popularity for several reasons, one of which is that customizing cluster configurations according to the needs of the underlying environment, as well as configuring multiple user-level packages, is quite difficult for the layman. Recently proposed solutions may reduce the expertise required to a certain level, but are incomplete in that detailed customization and/or packaging will again require expert knowledge. Instead, we propose the notion of metapackages, which treats a set of packages that define a certain functionality, together with their mutual configurations, as a templatable package in itself and as a first-class entity in the installation process. We show that, with associated tool support, metapackages provide very high flexibility, rigorous dependency management, and ease of end-user customizability, without sacrificing performance or expressive power in cluster configurations. We demonstrate the effectiveness by implementing the metapackage feature on top of our automated cluster installer Lucie. Experience has shown that cluster installation itself takes only 5-6 minutes after a set of necessary metapackages has been selected, with appropriate dependency and conflict checks performed. Even with low-level debugging, we expect that with our support a layman can pick the necessary features from a list, get a full account of possible conflicts, and build a cluster in less than an hour by resolving such dependencies with alternate picks.

CiNii Books

researchmap
MPI Environment with Load Balancing using Virtual Machine

TATEZONO Masaki, NAKADA Hidemoto, MATSUOKA Satoshi

IEICE technical report. Computer systems 105 ( 225 ) 7 - 12 2005.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Load balancing is one of the key features for efficient execution of long-running jobs, idle-cycle harvesting, etc. We propose a technique to achieve a load-balanced MPI execution environment using a virtual machine monitor. The key idea here is that transparent migration of an MPI process running on a virtual machine is made possible by moving the underlying virtual machine itself. Our initial implementation of the technique uses a virtual machine monitor called Xen, which has an integrated VM migration ability, and VPN for migration among different subnets. We experimentally show that the prototype successfully achieves runtime migration of MPI processes, and that the overhead due to virtualization ranges from 10% for computation-intensive applications to 50% for network-intensive ones. We also implemented a simple load-balancing algorithm on top of the prototype. Experiments with it suggest that runtime migration is effective for efficient execution of long-running jobs even with the virtualization overhead.

CiNii Books

researchmap
Design and Implementation of NAREGI Super-Scheduler based on OGSA Architecture

HATANAKA Masayuki, NAKANO Yasumasa, IGUCHI Yuji, OHNO Toshio, SAGA Kazushige, AKIOKA Sayaka, NAKADA Hidemoto, MATSUOKA Satoshi

IPSJ SIG Notes 102 ( 57 ) 33 - 38 2005.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we describe the design and implementation of the NAREGI Super-Scheduler based on the OGSA-EMS architecture. Through our experience with its design and implementation, we confirmed that the OGSA-EMS architecture is feasible. We also clarify the issues in the specification of resource allocation for an MPI parallel job that requires many heterogeneous computational resources, and propose a set of extensions to OGSA-EMS components to resolve these issues.

CiNii Books

researchmap
An Interactive Job Scheduling System that Allows Job Steering by Users

IINO AKIKO, NAKADA HIDEMOTO, SHIMODAIRA HIDETOSHI, MATSUOKA SATOSHI

IPSJ SIG Notes 102 ( 57 ) 39 - 44 2005.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Since the grid environment is suitable for long-running workflow execution, workflow engines for the grid have become one of the hot research areas. However, applications that require user steering during workflow execution are not addressed by existing workflow engine research. Although there is a lot of work on computational steering, the requirements and nature of steering for long-running workflow execution are totally different from those of conventional computational steering and should therefore be addressed differently. We designed and implemented a workflow scheduling framework that allows users to control the execution of their applications. It is implemented using Condor DAGMan as the workflow engine and provides users with steering capability via e-mail and a web-enabled interface. We also evaluated the system with a phylogenetic tree inference application.

CiNii Books

researchmap
Contribution of Information Science and Engineering Departments to informatization in Universities

IWANO Kazuo, TOKUDA Hideyuki, MATSUOKA Satoshi, MURAKAMI Kazuaki, NISHIMURA Yoshio, YONEZAKI Naoki; affiliations: Software Development Laboratory, IBM Japan Ltd.; Graduate School of Media and Governance, Faculty of Environmental Information, Keio University; Global Scientific Information and Computing Center, Tokyo Institute of Technology; Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University; Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Computer Software 22 ( 2 ) 1 - 20 2005.4

　More details

Language：Japanese Publisher：Japan Society for Software Science and Technology

DOI： 10.11309/jssst.22.2_1

CiNii Books

researchmap
NMR Protein Three-dimensional Structure Analysis Using a Genetic Algorithm on the Grid

小野功, 水口尚亮, 中島直敏, 松原彬光, 小野典彦, 中田秀基, 松岡聡, 関口智嗣, 楯真一

電気学会全国大会講演論文集 2005 ( 3 ) 3.S18(11)-3.S18(14) 2005.3

　More details

Language：Japanese

J-GLOBAL

researchmap
Autonomous Deployment of Grid Monitoring Systems

SHIROSE KEN'ICHIRO, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 162 ( 19 ) 1 - 6 2005.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The problem with practical, large-scale deployment of a Grid monitoring system is that it takes considerable management cost and skill to maintain the level of quality required for production usage, since the monitoring system will fundamentally be distributed, needs to run continuously, and will itself likely be affected by the various faults and dynamic reconfigurations of the Grid. Given our goal of developing a generalized autonomous management framework for Grid monitoring, we have built a prototype on top of NWS, featuring automatic configuration of the components as well as coping with single-node faults without user intervention. An experimental deployment on the Tokyo Institute of Technology's Campus Grid (The Titech Grid), consisting of over 15 sites and 800 processors, has shown the system to be robust in handling faults and reconfigurations, automatically deriving an ideal configuration for the head login nodes of each PC cluster in about ten minutes.

CiNii Books

researchmap
Towards a Portable Fault Tolerant Component Framework for MPI

JITSUMOTO HIDEYUKI, MATSUOKA SATOSHI

IPSJ SIG Notes 2005 ( 19 ) 193 - 198 2005.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The execution of MPI applications on clusters and Grid deployments that suffer from node and network failures motivates the use of fault-tolerant MPI implementations. Several fault-tolerant MPI implementations have therefore been developed. However, these implementations cannot easily choose an appropriate recovery method according to the environment. We present CuckooMPI, a component framework that is aware of fault/recovery models. Users can obtain an MPI implementation suited to their execution environment by selecting components. This paper presents the architecture of CuckooMPI, its theoretical foundation, and the performance of the implementation. A preliminary evaluation using NPB-CG with 32 processes showed that CuckooMPI achieves a 10% performance improvement compared with MPICH.

CiNii Books

researchmap
Towards a high-performance infrastructure to recover the Internet connectivity

HAMANO TOMOYUKI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 2005 ( 19 ) 85 - 90 2005.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Much research and development has been done, or is being carried out, to recover Internet connectivity. However, most of it is not suitable for Grid environments in terms of connectivity, security, independence from site policies, and high performance. We propose an infrastructure for Grid environments to recover Internet connectivity. We have also implemented a prototype system, JRouter, and evaluated its performance. The results showed that the system meets the requirements of Grid environments except for high performance. We therefore consider means of achieving high-throughput communication.

CiNii Books

researchmap
MegaProto : A Low-Power and Compact Cluster for High-Performance Computing

NAKASHIMA HIROSHI, NAKAMURA HIROSHI, SATO MITSUHISA, BOKU TAISUKE, MATSUOKA SATOSHI, TAKAHASHI DAISUKE, HOTTA YOSHIHIKO

IPSJ SIG Notes 2005 ( 19 ) 121 - 126 2005.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

DOI： 10.1109/IPDPS.2005.278

Scopus

CiNii Books

J-GLOBAL

researchmap
A Study on the Effect of Cooperative Superschedulers on the Computational Grid

AKIOKA SAYAKA, TAKEFUSA ATSUKO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, MIURA KENICHI

IPSJ SIG Notes 2005 ( 19 ) 55 - 60 2005.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we evaluate the effect of cooperative superschedulers on the computational Grid using a Grid simulator. Many studies have proposed utilizing superschedulers to achieve effective load balancing and improve resource utilization. However, the effects of different ways of cooperation among superschedulers have not been considered in depth. Through simulation, we obtained many results supporting the effectiveness of cooperating superschedulers. Cooperative superschedulers shorten the waiting times of applications and increase the utilization of resources. In addition to these results, we discuss the pros and cons of two major cooperative styles: tiers and distributed networks.

CiNii Books

researchmap
Distributed File System with Automatic File Access Distribution for the Grid

SATO HITOSHI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 162 ( 19 ) 7 - 12 2005.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In parallel computing environments such as HPC clusters or the Grid, some applications incur large overhead due to access concentration on the node that holds a file. To avoid this problem on a traditional distributed file system, users have to distribute file accesses manually. However, doing so is difficult for users in a Grid environment because of the heterogeneity of its resources. We propose an automatic file distribution scheme that uses detection of access concentration in the file system together with file replication. We implemented a prototype on Gfarm and evaluated its performance. The results showed that our prototype outperforms Gfarm when file accesses are concentrated.

CiNii Books

researchmap
MegaProto: 1 TFlops/10kW rack is feasible even with only commodity technology

Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, Yoshihiko Hotta

Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05 2005 2005

　More details

Language：English

DOI： 10.1109/SC.2005.45

Scopus

J-GLOBAL

researchmap
Distributed File System with Automatic File Access Distribution for the Grid

佐藤仁, 松岡聡, 中田秀基

コンピュータシステム・シンポジウム 2005 91 - 98 2005

　More details

researchmap
Low Power Computing for Fleas, Mice, and Mammoth - Do They Speak the Same Language?

Satoshi Matsuoka

CTWatch Quarterly Journal 1 ( 3 ) 2 - 11 2005

　More details

researchmap
Job Invocation via UNICORE and GT4 in the GridRPC System Ninf-G

中田秀基, 田中良夫, 関口智嗣

情報処理学会研究報告 2005-HPC-102 45 - 50 2005

　More details

researchmap
Towards a high-performance overlay network infrastructure for grid computing

2005 36 - 42 2005

　More details

Language：Japanese

CiNii Books

researchmap
The Titech Grid ～Can a Center Sustain a Large Production Grid on Campus? ～History, Lessons Learned, and the Future～

Satoshi Matsuoka

4 ( 2 ) 17 - 27 2005

　More details

researchmap
Primary Study of A Task Farming API over The GridRPC Framework

Yusuke Tanimura, Hidemoto Nakada, Yoshio Tanaka, Satoshi Sekiguchi

Eighth International Conference on High-Performance Computing in Asia-Pacific Region, 2005 2005 339 - 345 2005

　More details

DOI： 10.1109/HPCASIA.2005.78

J-GLOBAL

researchmap
Design and implementation of Condor-UNICORE bridge

Hidemoto Nakada, Jaime Frey, Motohiro Yamada, Yasuyoshi Itou, Yasumasa Nakano, Satoshi Matsuoka

Eighth International Conference on High-Performance Computing in Asia-Pacific Region, Proceedings 307 - 314 2005

　More details

Language：English

DOI： 10.1109/HPCASIA.2005.32

Web of Science

J-GLOBAL

researchmap
Gridifying a Genetic Algorithm for NMR Three-dimensional Protein Structure Determination by Using Ninf-1 and Ninf-G

小野功, 水口尚亮, 中島直敏, 小野典彦, 中田秀基, 松岡聡, 関口智嗣, 楯真一

先進的計算基盤システムシンポジウム SACSIS2005 143 - 152 2005

　More details

researchmap
Megaproto: A low-power and compact cluster for high-performance computing

Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, Yoshihiko Hotta

Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005 2005 2005

　More details

Language：English

DOI： 10.1109/IPDPS.2005.278

Scopus

J-GLOBAL

researchmap
Low Power Computing for Fleas, Mice, and Mammoth - Do They Speak the Same Language?

Satoshi Matsuoka

CTWatch Quarterly Journal 1 ( 3 ) 2005

　More details

researchmap
An Interactive Job Scheduling System

飯野彰子, 中田秀基, 下平英寿, 松岡聡

情報処理学会シンポジウム論文集 2005 ( 5 ) 2005

　More details

J-GLOBAL

researchmap
Design and Implementation of a Fault-Tolerant RPC System: Ninf-C

NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

45 ( 11 ) 160 - 170 2004.10

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we describe the design and implementation of a fault-tolerant RPC system, Ninf-C. Ninf-C is designed for large-scale master-worker programs that take from a few days to a few months to execute. Ninf-C uses Condor, developed by the University of Wisconsin, as the base structure of the system. It uses file transmission and checkpointing mechanisms and provides system-wide robustness for programmers. In Ninf-C, the master and workers communicate with each other using files rather than sockets, making crash recovery easy. To demonstrate the robustness of the system, we performed an experiment on a heterogeneous cluster consisting of x86 and SPARC machines. We ran a simple but long-running master-worker program on the cluster and rebooted several machines of the cluster to disturb the program execution. The program execution nevertheless finished normally, showing the robustness of Ninf-C.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018447/
Design and Implementation of a Fault-Tolerant RPC System: Ninf-C

NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

情報処理学会論文誌コンピューティングシステム（ACS） 45 ( SIG11(ACS7) ) 160 - 170 2004.10

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we describe the design and implementation of a fault-tolerant RPC system, Ninf-C. Ninf-C is designed for large-scale master-worker programs that take from a few days to a few months to execute. Ninf-C uses Condor, developed by the University of Wisconsin, as the base structure of the system. It uses file transmission and checkpointing mechanisms and provides system-wide robustness for programmers. In Ninf-C, the master and workers communicate with each other using files rather than sockets, making crash recovery easy. To demonstrate the robustness of the system, we performed an experiment on a heterogeneous cluster consisting of x86 and SPARC machines. We ran a simple but long-running master-worker program on the cluster and rebooted several machines of the cluster to disturb the program execution. The program execution nevertheless finished normally, showing the robustness of Ninf-C.

J-GLOBAL

researchmap
Design and Implementation of Configuration Packaging Methods for Cluster Installers

TAKAMIYA YASUHITO, MATSUOKA SATOSHI

IPSJ SIG Notes 99 ( 81 ) 55 - 60 2004.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Despite the wide spread of commodity clusters, fully automatic cluster installers have not become the setup tool of choice for many users because of their complexity and difficulty of configuration. This paper introduces two methods. 1) By packaging a set of typical cluster installer configurations into a common software package format (MetaPackaging) and making it downloadable for end users with standard package managers, the first method allows automatic generation of customized configurations for each local site through an integrated template customization GUI. 2) By making use of package dependency information stored in the underlying software package management system and a pseudo installation environment built by the automatic installer, the second method allows a sanity check of whether dependencies between user-selected software packages are satisfied, without actually executing the installer. Moreover, for MetaPackage developers, we provide a helper toolkit that detects package dependency problems that occur when building metapackages and generates code for the metapackage customization front-end.

CiNii Books

researchmap
Scalable MultiReplication Framework on The Grid

TAKIZAWA SHIN'ICHIRO, TAKAMIYA YASUHITO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 99 ( 81 ) 247 - 252 2004.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a "MultiReplication Framework" for data replication in a data grid environment, where redundant and inefficient long-haul copying of replica data is avoided on local or near-local usage of the same sets of data by employing aggregated and efficient multicast-based replication schemes. The replica manager manages replica creation centrally using an XML-based schema, and when multiple peers request a replica in a near-simultaneous fashion, this is detected and an organized multi-replication over the multiple peers involved is initiated using our multi-replication tool Dolly+. Benchmarks on a prototype implementation show that our scheme scales well in a realistic data grid environment consisting of multiple clusters connected to a distant data archiver, maintaining constant replication time even as the number of nodes increases, and is superior to simpler schemes such as maintaining a local cache of replicated data within a cluster.

CiNii Books

researchmap
Design and Implementation of a Highly Portable Job Scheduling System

MACHIDA YUYA, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 99 ( 81 ) 217 - 222 2004.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We present Jay, a job-scheduling system for the Grid. Jay handles two difficulties inherent in the Grid, namely heterogeneity and instability. Jay is based on the techniques of Condor, which was developed at the University of Wisconsin, and has been implemented in Java for better portability. Although Java does not have a secure way of changing the user ID of an arbitrary process, we resolved the problem in Jay by developing a highly portable C++ daemon that achieves this property and can run in Java environments that do not support JNI. The results of small-scale experiments show its fault tolerance and high portability.

CiNii Books

researchmap
Implementation and Evaluation of Dynamic Load Balancing for Performance Heterogeneous Clusters on Omni/SCASH

SAKAE YOSHIAKI, MATSUOKA SATOSHI, SATO MITSUHISA, HARADA HIROSHI

IPSJ SIG Notes 99 ( 81 ) 61 - 66 2004.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Increasingly large-scale clusters of PCs/workstations continue to become the majority platform in the HPC field. In such a commodity cluster environment, incremental upgrades may occur for several reasons, such as rapid progress in processor technologies or user needs, and they may cause performance heterogeneity between nodes, which the application programmer experiences as load imbalance. To overcome these problems, dynamic load balancing mechanisms are needed. We have previously implemented and reported on loop re-partitioning mechanisms based on runtime performance. However, loop re-partitioning involves changes of data access ranges, so some applications whose performance depends on data locality show performance degradation. In this paper, we report our recent work on page migration mechanisms based on page reference counting, and their performance. Results show that we can achieve about a 60% performance gain with Laplace on a 4-node cluster through page migration, and restore the performance that was degraded by loop re-partitioning.

CiNii Books

researchmap
Design and Implementation of a Cluster-Aware Fault Injector

MARUYAMA Naoya, MATSUOKA Satoshi

IEICE technical report. Dependable computing 104 ( 239 ) 25 - 30 2004.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

We report our design and implementation of a cluster-aware fault injector, or CFI, which enables systematic reproduction of faulty clustered environments in software. CFI focuses on building a testbed for research on fault-tolerant cluster computing. It allows investigation of the behavior of systems software and applications in clustered environments when user-specified faults occur. It consists of a Linux kernel module, which is responsible for injecting network-related faults into the running operating system of each node, and a set of tools that control the kernel module. It injects faults across the cluster nodes based on a fault scenario written by the user. Our preliminary experiments showed that its intrusiveness is not significant, and that it is a promising tool for further research on fault tolerance.

CiNii Books

researchmap
Design and implementation of Speculative Checkpointing

YAMAGATA Ikuhei, JITSUMOTO Hideyuki, NAKADA Hidemoto, MATSUOKA Satoshi

IEICE technical report. Dependable computing 104 ( 239 ) 31 - 36 2004.7

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Checkpointing parallel processes places high temporal and spatial pressure on I/O subsystems. To decrease this pressure, we propose a new checkpointing technique, called Speculative Checkpointing, that improves upon incremental checkpointing by speculatively distributing the checkpointing workload and avoiding the need for file synchronization. Experiments with our prototype Speculative Checkpointer on a variety of parallel workloads on a cluster showed marked improvements when speculation works effectively, exhibiting up to 33% improvement over conventional incremental checkpointing schemes. We expect that, in a production environment with a larger number of nodes and dedicated backend checkpointing storage, this improvement would be even higher.

CiNii Books

researchmap
耐故障性を重視したRPCシステムNinf‐Cの設計と実装

中田秀基, 田中良夫, 松岡聡, 関口智嗣

情報処理学会シンポジウム論文集 2004 ( 6 ) 77 - 84 2004.5

　More details

Language：Japanese

J-GLOBAL

researchmap
The Future of Grid Computing with Optical Networks : Future Grids with Fast Optical Networks

MATSUOKA Satoshi

104 ( 1 ) 1 - 4 2004.4

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

Future Grid infrastructures for high-performance, large-scale science will require high-bandwidth, low-latency optical networks. The talk will introduce several projects that base themselves on such network infrastructures, and furthermore will describe future Grids that will assume peer-to-peer optical connectivity.

CiNii Books

researchmap
Parallelization of the Genetic Programming using Jojo

TOKUDA TAKU, TANAKA KOUJI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 157 ( 20 ) 187 - 192 2004.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Estimating mutual interactions of genetic networks mainly means inferring the mutual control relationships among multiple genes from gene expression data. Such correlations are typically expressible in the form of nonlinear simultaneous differential equations. However, most work to date has employed S-systems as an expression of such differential equations, allowing only rough approximations of mass actions, and as such it was difficult to determine the actual correlations between the genes. Instead, we formulate the mutual interactions as actual simultaneous partial differential equations, and automatically determine their structure and coefficients using genetic programming (GP) from a given data series. A parallel implementation of the scheme in a Grid environment using our Jojo Grid programming system for Java has resulted in precise determination of the equations in many cases within a reasonable time.

CiNii Books

researchmap
Examination of the job execution on VM in Grid Environment

OGURA SHOJI, KOUNO KENJI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 157 ( 20 ) 25 - 30 2004.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Despite recent proposals for fine-grained resource control on Grid computing nodes using virtual machine technologies, the impact of the respective virtualization schemes, as well as the feasibility of actually imposing control, has not been investigated comprehensively. We (1) propose a virtual machine framework for the Grid that allows selection of different virtualization schemes depending on application characteristics, and (2) perform comprehensive measurements of the impact of individual schemes, as well as of the schemes when they are actually used for resource control, and derive a guideline that would lead to (semi-)automated selection of virtualization schemes. The prototype runs as a job manager in Globus 2.4, and allows selection of virtualization schemes, as well as pluggable resource control depending on the user and the intended policy. Benchmarks using NPB2.4 show that we can minimize the overhead by appropriate selection of virtualization schemes, and yield several guidelines, such as that communication-intensive applications favor virtualization via library call interposition, whereas virtualization of multiple processes tends to favor kernel modules.

CiNii Books

researchmap
Parallelization of phylogenetic tree inference using Grid technology

YAMAMOTO YO, NAKADA HIDEMOTO, SHIMODAIRA HIDETOSHI, MATSUOKA SATOSHI

IPSJ SIG Notes 157 ( 20 ) 181 - 186 2004.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The maximum likelihood method is considered one of the most reliable methods for phylogenetic tree inference. However, as the number of species increases, it becomes impossible to evaluate all phylogenetic trees, since the number of trees increases explosively. An approximation method using split decomposition has been proposed. It reduces the calculation cost drastically, although the cost for a larger number of species is still too high. We propose a method to reduce the cost using a combinatorial optimization technique. We also parallelize it in a master-worker style using Grid middleware. A 64.0-times speedup was obtained using 16 workers on a problem with 9 species.
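
To give a sense of how quickly the search space grows, the following minimal sketch counts unrooted bifurcating tree topologies using the standard formula (2n-5)!! for n labelled species. The abstract does not say which tree class the paper enumerates, so treating the trees as unrooted and bifurcating is an assumption, and the function name is illustrative.

```python
def count_unrooted_trees(n_species: int) -> int:
    """Number of unrooted bifurcating trees on n labelled species: (2n-5)!!.

    Assumption: unrooted, strictly bifurcating topologies; the abstract
    does not specify the tree class used in the paper.
    """
    if n_species < 3:
        raise ValueError("need at least 3 species")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5 (empty product for n = 3).
    for factor in range(3, 2 * n_species - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (4, 9, 10, 20):
        print(n, count_unrooted_trees(n))
```

Under this assumption, 9 species already give 135,135 candidate topologies, and each additional species multiplies the count by the next odd number.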

CiNii Books

researchmap
Design and implementation of Condor-UNICORE bridge

NAKADA HIDEMOTO, FREY JAIME, YAMADA MOTOHIRO, ITOU YASUYOSHI, NAKANO YASUMASA, MATSUOKA SATOSHI

IPSJ SIG Notes 157 ( 20 ) 37 - 42 2004.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper, we describe the design and implementation of a generic Grid interface for Condor. Although Condor has interfaces for specific Grid systems, such as Globus GRAM, it is not easy to add a new interface for another Grid system, since doing so requires code modification inside Condor. With our new interface, support for a new Grid system can be added without any code modification in Condor itself. We also implemented a bridge for the UNICORE system and validated that our approach is effective.

CiNii Books

researchmap
Current Status of Grid Computing Projects in Japan(Special Issue on Large-scale Computer Simulation)

SHIMOJO Shinji, SEKIGUCHI Satoshi, MIURA Kenichi, MATSUOKA Satoshi

SYSTEMS, CONTROL AND INFORMATION 48 ( 7 ) 244 - 249 2004

　More details

Language：Japanese Publisher：THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS

DOI： 10.11509/isciesci.48.7_244

CiNii Books

researchmap
MegaProto : A Prototype of the Ultra Low-Power Mega-Scale System

NAKASHIMA HIROSHI, NAKAMURA HIROSHI, SATO MITSUHISA, BOKU TAISUKE, MATSUOKA SATOSHI

IPSJ SIG Notes 96 ( 102 ) 85 - 90 2003.10

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper gives the conceptual design of the MegaProto machine, a prototype mega-scale system developed in a research project named "Mega-Scale Computing Based on Low-Power Technology and Workload Modeling". MegaProto is a prototype implementation of our key idea that million-scale parallel systems should be built with densely mounted low-power processors. It will also act as a platform to implement and evaluate our new technologies, such as power-conscious compilation, highly reliable and high-performance networks, highly dependable cluster management, and multi-level parallel programming. The building block of MegaProto is a 1U-height, 19-inch rack-mountable motherboard unit on which 16 low-power processors are mounted with a high-bandwidth network of 2 Gbps per processor. The peak performance of the unit is 14.4 GFlops, and the intra-unit and inter-unit network bandwidths are 32 Gbps and 8 Gbps respectively, while the unit consumes at most 300 W, achieving high performance and density with low power consumption.
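
The per-unit figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below only derives ratios from the numbers stated above; the variable names are illustrative, and reading the 32 Gbps intra-unit figure as 16 links of 2 Gbps is an interpretation rather than something the abstract states.

```python
import math

# Figures taken from the abstract (per 1U unit).
PROCESSORS_PER_UNIT = 16
LINK_GBPS_PER_PROCESSOR = 2
PEAK_GFLOPS_PER_UNIT = 14.4
MAX_POWER_W_PER_UNIT = 300

# Derived quantities (illustrative; not stated in the abstract).
aggregate_link_gbps = PROCESSORS_PER_UNIT * LINK_GBPS_PER_PROCESSOR
gflops_per_processor = PEAK_GFLOPS_PER_UNIT / PROCESSORS_PER_UNIT
gflops_per_watt = PEAK_GFLOPS_PER_UNIT / MAX_POWER_W_PER_UNIT


def units_needed(target_processors: int) -> int:
    """Units required to host a given processor count (hypothetical helper)."""
    return math.ceil(target_processors / PROCESSORS_PER_UNIT)


if __name__ == "__main__":
    print("aggregate link bandwidth (Gbps):", aggregate_link_gbps)
    print("peak GFlops per processor:", gflops_per_processor)
    print("peak GFlops per watt:", round(gflops_per_watt, 3))
    print("units for one million processors:", units_needed(1_000_000))
```

The aggregate of the per-processor links matches the quoted 32 Gbps intra-unit bandwidth, and a million-processor system built from such units would need 62,500 of them.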

CiNii Books

researchmap
A Task-Farming API on GridRPC and its implementation

NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 96 ( 102(HPC-96) ) 61 - 66 2003.10

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Task-farming means computing a huge number of tasks with different parameters on a large number of computers. GridRPC is a kind of Grid middleware that is considered suitable for task-farming. While GridRPC provides basic functionality for task-farming, it lacks high-level features such as scheduling and fault tolerance because of its design principle, and leaves the burden of implementing them to the application programmer. In this paper, we describe a task-farming API implemented on top of the GridRPC API. It is designed to ease this burden and to support master-worker computation. We also present the implementation of the API and a sample program that uses it.
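
As a rough illustration of what a task-farming layer adds on top of plain remote calls, the sketch below runs one function over many parameters on a local thread pool and resubmits tasks that fail. It is not the GridRPC API or the API described in the paper: `farm`, `workers`, and `max_retries` are hypothetical names, and local threads stand in for remote servers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def farm(task_fn, params, workers=4, max_retries=2):
    """Apply task_fn to every parameter, retrying failed tasks.

    Hypothetical sketch: a thread pool stands in for remote Grid servers.
    Results are returned in the same order as params.
    """
    results = {}
    attempts = {i: 0 for i in range(len(params))}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(task_fn, p): i for i, p in enumerate(params)}
        while pending:
            # Iterate over a snapshot, since retries add new futures.
            for future in as_completed(list(pending)):
                index = pending.pop(future)
                try:
                    results[index] = future.result()
                except Exception:
                    attempts[index] += 1
                    if attempts[index] > max_retries:
                        raise
                    # Simple fault tolerance: resubmit the failed task.
                    pending[pool.submit(task_fn, params[index])] = index
    return [results[i] for i in range(len(params))]


if __name__ == "__main__":
    print(farm(lambda x: x * x, [1, 2, 3, 4]))
```

A real task-farming layer would also choose which server receives each task; here the pool's internal queue plays that role.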

CiNii Books

J-GLOBAL

researchmap
グリッドコンピューティングの現状と未来

中田秀基, 松岡聡

計算数理工学レビュー 2003 ( 1 ) 9 - 12 2003.10

　More details

Language：Japanese Publisher：日本計算数理工学会

researchmap
Lucie: A Fast Installation and Administration Tool for Large-scaled Clusters

TAKAMIYA YASUHITO, MANABE ATSUSHI, MATSUOKA SATOSHI

44 ( 11 ) 79 - 88 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The rapid increase in the number of nodes in commodity clusters makes it necessary to routinely manage the potential cost of setting up and maintaining clusters. Moreover, with the rise of data-intensive applications that require several GBs of data on each cluster node, it has become clear that no installation tool was aimed at installation-time setup of such large-scale data. In this paper, we propose a new cluster installation and administration tool called Lucie, which provides a network boot and installation mechanism requiring no specific installation media, together with configurability that allows the installer itself to be reconstructed on demand. Additionally, we propose a new data distribution mechanism called Dolly+, which performs fault-tolerant, high-speed data transfer over a virtual ring topology. With Dolly+, one can distribute several GBs of images to all cluster nodes at installation time while maintaining fault tolerance. Our benchmarks show that Lucie and Dolly+ can install and set up the whole cluster in constant time. This result shows that Lucie and Dolly+ are scalable and efficient, and could well serve as a basis for 'Plug-and-Play' clustering.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018534/
A Java-based Programming Environment for the Grid: Jojo

NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

44 ( 11 ) 46 - 56 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper introduces Jojo, a Java-based programming environment for the Grid. Jojo is a distributed programming environment implemented in Java that is suitable for hierarchical Grid environments. Jojo provides several features, including remote invocation using Globus GRAM, an intuitive message-passing API suitable for parallel execution, and automatic staging of user programs. Using Jojo, users can construct parallel distributed applications on the Grid with ease. In this paper, we show the design and implementation of Jojo, its programming API, its configuration file syntax, and a working program example. We also show preliminary performance evaluation results that demonstrate the effectiveness of multi-level hierarchical execution.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00018531/
A Java-based Programming Environment for the Grid: Jojo

NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

情報処理学会論文誌コンピューティングシステム（ACS） 44 ( SIG11(ACS3) ) 46 - 56 2003.8

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

This paper introduces Jojo, a Java-based programming environment for the Grid. Jojo is a distributed programming environment implemented in Java that is suitable for hierarchical Grid environments. Jojo provides several features, including remote invocation using Globus GRAM, an intuitive message-passing API suitable for parallel execution, and automatic staging of user programs. Using Jojo, users can construct parallel distributed applications on the Grid with ease. In this paper, we show the design and implementation of Jojo, its programming API, its configuration file syntax, and a working program example. We also show preliminary performance evaluation results that demonstrate the effectiveness of multi-level hierarchical execution.

J-GLOBAL

researchmap
You Don't Really Need Big Fat Switches Anymore-Almost

MATSUOKA SATOSHI

IPSJ SIG Notes 154 ( 84 ) 157 - 162 2003.8

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

Although commodity cluster computing based on very fast and inexpensive commodity processors is proliferating today, one of the prohibitive factors for its large-scale deployment is the high cost of the network switching fabric needed to retain sufficiently high bandwidth. We argue that, except for the most demanding applications, appropriate aggregation of inexpensive switches, with collective communication algorithms that utilize the characteristics of such networks, will accommodate the bulk of parallel applications, even those with substantial communication requirements. We present 3 techniques for implementing high-bandwidth collective communications in such a setting, and provide preliminary performance measurements that hint at the effectiveness of our proposal. The techniques can be extended to interconnect a set of clusters to implement a high-bandwidth Grid interconnect, as well as to replace a SAN for high-bandwidth I/O.

CiNii Books

researchmap
Execution of the replica exchange molecular dynamics simulator on the Grid

SATO HITOSHI, ITO MASAKATSU, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 95 ( 83 ) 41 - 46 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The replica-exchange method is considered suitable for execution in a Grid environment because of its coarse granularity and small data transfers. To confirm this suitability, we performed several experiments in various environments, using an application program called REMD toolkit that implements the replica-exchange method. We also improved the REMD toolkit to cope with performance-heterogeneous environments. The results showed that 1) the REMD toolkit is scalable up to around 100 workers, and 2) the improved version is faster than the original version in a performance-heterogeneous environment.

CiNii Books

researchmap
Development of A Grid Portal Construction Toolkit(PCT4G) Supporting Application Installation and Data Distribution/Update

SHIRASUNA SATOSHI, SUZUMURA TOYOTARO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 95 ( 83 ) 173 - 178 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

As Grid technologies become more practical, a number of Grid Portals have been constructed and used in various fields to offer user-friendly interfaces to Grid resources. Along with that, several toolkits for generating Grid portals have been developed in order to reduce the burden on portal developers. However, even with the aid of those toolkits, portal developers still have to install target applications on each node of the Grid. In addition, for some applications, especially those in the bioinformatics field, it is necessary to keep application data up to date. In order to automate these tasks, we are implementing a toolkit, PCT4G, which automates application installation, data management, and interface generation. Users can also construct Grid Portals for their own applications on the fly through the Web interfaces of PCT4G.

CiNii Books

researchmap
Automatic Management System for Monitoring System on the Grid

SHIROSE KEN'ICHIRO, OGAWA HIROTAKA, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 95 ( 83 ) 89 - 94 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Monitoring CPU, memory, and disk usage as well as network performance is needed in a Grid computing environment. Generally, monitoring systems for Grid computing consist of multiple components, each with its own function. It is difficult to configure them all manually, because there are many dependencies between them and monitoring systems must run continuously. We propose an automatic management system for monitoring systems on the Grid, implement part of its functionality, and evaluate its usefulness.

CiNii Books

researchmap
Towards a C Language Hosting Environment for OGSA

HAMANO TOMOYUKI, NAKADA HIDEMOTO, SUZUMURA TOYOTARO, MATSUOKA SATOSHI

IPSJ SIG Notes 95 ( 83 ) 179 - 184 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

OGSA is a new architecture that is a hybrid of the traditional Grid architecture and Web services. OGSI-based OGSA has two issues: 1) current OGSI implementations do not provide a hosting environment for the C language, and 2) XML-based protocol communication degrades performance. In this paper, we propose a system that provides a C hosting environment on OGSI, and provide an auxiliary tool that eases the development of services on the system. We also show performance evaluation results that demonstrate the effectiveness of the system and reveal issues in OGSI implementations.

CiNii Books

researchmap
Implementation and Evaluation of a Fault Tolerant MPI with Reliable TCP/IP Sockets

JITSUMOTO HIDEYUKI, TAKAMIYA YASUHITO, MATSUOKA SATOSHI

IPSJ SIG Notes 95 ( 83 ) 149 - 154 2003.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

On cluster systems, failure rates tend to be high because of the large number of components. Therefore, to perform stable long-running computation on cluster systems, middleware support for fault tolerance is required. We implemented a fault-tolerant MPI prototype system and measured its overhead. Our MPI system implements a coordinated checkpointing and recovery protocol on MPICH using a single-process checkpointer called ckpt and a reliable network layer called Rocks. A preliminary evaluation using NPB-CG with 32 processes showed that the overhead imposed by Rocks stayed within just 8%.

CiNii Books

researchmap
Javaによる階層型グリッド環境Jojoの設計と実装

中田秀基, 松岡聡, 関口智嗣

情報処理学会シンポジウム論文集 2003 ( 8 ) 113 - 120 2003.5

　More details

Language：Japanese

J-GLOBAL

researchmap
An Efficient NAS Parallel Benchmarks Algorithm for Heterogeneous Clusters

SASOU TAKERU, MATSUOKA SATOSHI

IPSJ SIG Notes 93 ( 29 ) 1 - 6 2003.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this study, we optimized the kernel benchmarks of the NAS Parallel Benchmarks for heterogeneous cluster systems and evaluated them on a CPU-heterogeneous cluster. The optimization technique we used is load sharing by changing the data size assigned to each node in accordance with that node's performance. The experimental results show that our method improves performance on EP, IS, and MG. In the case of CG and FT, however, the increase in communication overhead affects performance, and our method performs worse than the original CG and FT.
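
The load-sharing idea described here, assigning each node a data size in proportion to its performance, can be sketched as follows. This is a generic proportional split, not the partitioning code used in the study; the function name and the use of integer relative speeds are assumptions.

```python
def partition_by_speed(total_items, speeds):
    """Split total_items into per-node sizes proportional to node speed.

    Assumptions: speeds are positive integers giving relative node
    performance; leftover items go to the fastest nodes first.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be positive")
    total_speed = sum(speeds)
    # Floor of each node's proportional share.
    sizes = [total_items * s // total_speed for s in speeds]
    # Flooring leaves fewer than len(speeds) items unassigned.
    leftover = total_items - sum(sizes)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Two slow nodes and one node that is twice as fast.
    print(partition_by_speed(100, [1, 1, 2]))
```

A split like this balances computation only; as the abstract notes for CG and FT, uneven data sizes can also raise communication cost, which this sketch does not model.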

CiNii Books

researchmap
A Parallel Minimal Generation Gap Model Using Ninf for Evolutionary Analysis of Protein Structures and Its Performance Evaluation

ONO Isao, IMADE Hiroaki, NAKADA Hidemoto, ONO Norihiko, MATSUOKA Satoshi, SEKIGUCHI Satoshi, TATE Shin-ichi

IPSJ SIG Notes 93 ( 29 ) 119 - 154 2003.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Nuclear Magnetic Resonance (NMR) spectroscopy is a promising method for determining the three-dimensional structure of proteins, which is one of the most important problems in the post-sequence era. This method has a serious drawback: it takes experts several months to analyze the data of a single protein. To remedy the problem, Ono et al. have proposed an automatic method based on a genetic algorithm (GA) for analyzing the data and determining the three-dimensional structures of proteins, and reported good results on relatively small problems. In this report, to speed up the GA, we propose a parallel implementation of the generation alternation model, Minimal Generation Gap (MGG), which is employed in the GA. In the implementation, we employ Ninf, proposed by the National Institute of Advanced Industrial Science and Technology (AIST), as the middleware. We perform several experiments to examine its performance.

CiNii Books

researchmap
Protein structure optimization using Genetic Algorithm on Jojo

NAKADA HIDEMOTO, NAKAJIMA NAOTOSHI, ONO ISAO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, ONO NORIHIKO, TATE SHIN'ICHI

IPSJ SIG Notes 93 ( 29 ) 155 - 160 2003.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The Java language is suitable for Grid environments because of 1) its portability across heterogeneous architectures and 2) its integrated multi-thread capability, which can effectively hide latency. Genetic algorithms (GAs) are a good application area for the Grid because of their affinity for parallel execution. From these viewpoints, we are developing a Java-based programming framework for GA called jPoP-GA. However, we did not have concrete knowledge of effective parallel implementations of GA. In this paper, we implemented a real GA application on top of the Java-based Grid programming environment Jojo using several parallelization methods. As the application, we used protein three-dimensional structure optimization using NMR spectroscopy. We performed several experiments in a Grid environment and obtained knowledge on the parallelization of GA applications.

CiNii Books

researchmap
Implementation and Evaluation of Dynamic Load Balancing Using Runtime Performance Monitoring on Omni/SCASH

SAKAE YOSHIAKI, MATSUOKA SATOSHI, SATO MITSUHISA, HARADA HIROSHI

IPSJ SIG Notes 93 ( 29 ) 131 - 136 2003.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In a commodity cluster environment, there may be performance heterogeneity between nodes for several reasons, which the application programmer experiences as load imbalance. To overcome these problems, dynamic load balancing mechanisms are needed. In this paper, we report our ongoing work on a dynamic load balancing extension to Omni/SCASH. Using our dynamic load balancing mechanisms, we expect that programmers can have load imbalances adjusted automatically by the runtime system, without explicit definition of data and task placement, in a commodity cluster environment with nodes of possibly heterogeneous performance. The evaluation results indicate that our loop re-partitioning scheme, which is one of the dynamic load balancing extensions, works well, and that it is important to combine loop re-partitioning with dynamic page migration.

CiNii Books

researchmap
Preliminary evaluation of dynamic load balancing using loop re-partitioning on Omni/SCASH

Y Sakae, S Matsuoka, M Sato, H Harada

CCGRID 2003: 3RD IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS 463 - 470 2003

　More details

Language：English

DOI： 10.1109/CCGRID.2003.1199402

Web of Science

J-GLOBAL

researchmap
蛋白質立体構造の進化的解析のためのNinf 版並列MGG とその性能評価

小野功, 今出広明, 中田秀基, 小野典彦, 松岡聡, 関口智嗣, 楯真一

情報処理学会研究報告 2002-HPC-93(HOKKE2003) 149 - 154 2003

　More details

researchmap
Omni/SCASH のループ再分割を用いた動的負荷分散拡張の実装と評価

栄純明, 松岡聡, 佐藤三久, 原田浩

先進的計算基盤システムシンポジウム SACSIS2003 論文集 307 - 314 2003

　More details

researchmap
Evaluation of the inter-cluster data transfer on Grid environment

S Ogura, S Matsuoka, H Nakada

CCGRID 2003: 3RD IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS 374 - 381 2003

　More details

Language：English

DOI： 10.1109/CCGRID.2003.1199390

Web of Science

J-GLOBAL

researchmap
グリッド環境に適した並列組み合わせ最適化システム jPoP における分枝限定法の実装

秋山智宏, 中田秀基, 松岡聡, 関口智嗣

第6回プログラミングおよび応用のシステムに関するワークショップ SPA 2003 2003

　More details

researchmap
Lucie: 大規模クラスタに適した高速セットアップ・管理ツール

高宮安仁, 真鍋篤, 松岡聡

先進的計算基盤システムシンポジウム SACSIS2003 論文集 365 - 372 2003

　More details

researchmap
A Java-based Programming Environment for the Grid : Jojo

NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 92 ( 99 ) 31 - 36 2002.10

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper introduces Jojo, a Java-based programming environment for the Grid. Jojo is a distributed programming environment implemented in Java that is suitable for hierarchical Grid environments. Jojo provides several features, including remote invocation using Globus GRAM, an intuitive message-passing API suitable for parallel execution, and automatic staging of user programs. Using Jojo, users can construct parallel distributed applications on the Grid. In this paper, we show the design and implementation of Jojo, its programming API, its configuration file syntax, and a working program example. We also show preliminary performance evaluation results.

CiNii Books

researchmap
Grid Portal Toolkit Ninf-Portal

NAKADA HIDEMOTO, SAITO MASAYUKI, SUZUMURA TOYOTARO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

情報処理学会論文誌. ハイパフォーマンスコンピューティングシステム 43 ( 5 ) 172 - 183 2002.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

As the Grid proliferates as the next-generation wide-area high-performance computing infrastructure, end-user Grid interfaces in the form of "Grid Portals" are becoming increasingly important, especially for computational scientists and engineers. Although several Grid portal toolkits have been proposed, a Grid Portal creator must construct and deploy both the user interface and the application portions of the Grid Portal, which requires considerable programming effort. We aim to ease this burden by applying state-of-the-art Web/XML interface generation technologies for the former and the Ninf-G GridRPC system, which makes it easy to "Gridify" existing applications, for the latter, and by integrating the two seamlessly. The resulting system, which we call the "Ninf Portal", allowed concise description and easy deployment of a sample application on the Grid with very little programming effort.

CiNii Books

J-GLOBAL

researchmap
Grid Datafarm Architecture for Global Petascale Data-intensive Computing

TATEBE OSAMU, MORITA YOUHEI, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, SODA NORIYUKI

43 ( 5 ) 184 - 195 2002.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The Grid Datafarm architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage using a grid of clusters with tens of thousands of nodes. New parallel I/O APIs and file-affinity scheduling make it possible to achieve scalable I/O bandwidth and scalable parallel processing. Preliminary performance evaluation of a Gfarm reference implementation has shown scalable disk I/O and network bandwidth on the Presto III Athlon cluster. Gfarm parallel I/O write and read operations achieved 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. Gfarm parallel file copy achieved 443 MB/s with 23 parallel streams on Myrinet 2000.

CiNii Books

researchmap
Evaluation of the inter-cluster data transfer on Grid environment

OGURA SHOJI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 91 ( 80 ) 155 - 160 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Large-scale storage systems for DataGrid settings, implemented by interconnecting and federating large-scale storage clusters, are being proposed and constructed. In peer-to-peer data transfer between two large clusters, two major factors are involved. On one hand, network pipes with a large RTT×bandwidth product typically become data-starved, resulting in bandwidth loss. On the other hand, when multiple nodes in the clusters attempt simultaneous transfers, the network pipe can become saturated, resulting in packet loss, which again may degrade bandwidth in networks with a large RTT×bandwidth product. By dynamically and automatically adjusting transfer parameters between the two clusters, such as the number of network nodes and the number of socket stripes, we could achieve optimal bandwidth even when the network is under heavy contention. We have conducted several simulations in a few environments to evaluate and determine the appropriate transfer parameters for this purpose.
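
The "data-starved" side of this trade-off follows from the bandwidth-delay product: a single stream can keep at most one window of data in flight per round trip. The sketch below estimates how many parallel streams are needed to fill a pipe. The link speed, RTT, and window size in the example are hypothetical, and the model ignores packet loss and congestion control, which the paper's simulations address.

```python
import math


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product in bytes for a link speed and round-trip time."""
    return bandwidth_bps * rtt_ms / 8000  # bits*ms -> bytes


def streams_to_fill(bandwidth_bps: int, rtt_ms: int, window_bytes: int) -> int:
    """Minimum number of parallel streams whose windows cover the BDP.

    Simplified model: each stream keeps exactly one full window in flight
    per RTT; loss and congestion control are ignored.
    """
    return math.ceil(bdp_bytes(bandwidth_bps, rtt_ms) / window_bytes)


if __name__ == "__main__":
    # Hypothetical link: 1 Gbps, 100 ms RTT, 64 KiB window per stream.
    print(bdp_bytes(1_000_000_000, 100))
    print(streams_to_fill(1_000_000_000, 100, 64 * 1024))
```

This gives only a lower bound on the stripe count; pushing well beyond it risks the saturation and packet loss described above, which is why the parameters are adjusted dynamically.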

CiNii Books

researchmap
Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High Energy Physics Applications

TAKEFUSA ATSUKO, TATEBE OSAMU, MATSUOKA SATOSHI, MORITA YOUHEI

IPSJ SIG Notes 91 ( 80 ) 137 - 142 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

A Data Grid is a Grid environment for ubiquitous access to and analysis of large-scale data. Because research is at an early stage, the performance of petabyte-scale Data Grid models in a realistic data processing setting has not been well investigated. By enhancing our Bricks Grid simulator to simulate Data Grid scenarios, we investigate and compare the performance of different Data Grid models in the Grid Datafarm architecture, mainly categorized into the central and the tier models but with varying scheduling and replication strategies, under realistic assumptions of job processing for the CERN LHC experiments.

CiNii Books

researchmap
Lucie : A fast installation and administration tool for large-scaled clusters

TAKAMIYA YASUHITO, MANABE ATSUSHI, SHIRASUNA SATOSHI, MATSUOKA SATOSHI

IPSJ SIG Notes 91 ( 80 ) 131 - 136 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The rapid increase in the number of nodes in commodity clusters makes it necessary to routinely manage the potential cost of setting up and maintaining clusters. Moreover, with the rise of data-intensive applications that require several GBs of data on each cluster node, it has become clear that no installation tool was aimed at installation-time setup of such large-scale data. In this paper, we propose a new cluster installation and administration tool called Lucie, which provides a network boot and installation mechanism requiring no specific installation media, together with configurability that allows the installer itself to be reconstructed on demand. Additionally, using a data distribution mechanism over a virtual ring topology network, one can distribute several GBs of images to all cluster nodes at installation time while maintaining fault tolerance. Our benchmarks show that Lucie can install and set up a whole cluster in constant time. This result shows that Lucie is scalable and efficient, and could well serve as a basis for Plug-and-Play clustering.

CiNii Books

researchmap
Dynamic Application Development and Execution Environment for Grid Portals

SUZUMURA TOYOTARO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 91 ( 80 ) 191 - 196 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

As the Grid proliferates as the next-generation computing infrastructure, a user interface in the form of "Grid Portals" is becoming increasingly important, especially for computational scientists and engineers. Although several Grid Portal toolkits have been proposed, the portal developer still must build and deploy both the user interface and the application, which requires considerable programming effort. We aim to ease this burden by generating the portal frontend (which consists of JSP and Java Servlets) from an XML document for the former, using a GridRPC system, Ninf-G, to easily "gridify" existing applications for the latter, and integrating the two seamlessly. The resulting system, which we call the Ninf Portal, allowed concise description and easy deployment of a real Grid application with very little programming effort. This paper describes the off-the-shelf architecture, which automatically generates an application portal when a Grid application is developed interactively in a scripting language and user interface information is given on the web page. This allows portal users to utilize a large variety of applications, including default applications defined by portal administrators as well as user-defined applications generated by this architecture.

CiNii Books

researchmap
Performance Tuning High-Performance Linpack (HPL)

SASOU TAKERU, MATSUOKA SATOSHI

IPSJ SIG Notes 91 ( 80 ) 125 - 130 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

HPL is one of the implementations of the LINPACK benchmark and is used by many users for the performance evaluation behind the Top500. Good performance can be achieved by tuning parameters, but because HPL has many parameters it is difficult to determine the best setting. Information about HPL parameter setups on various parallel systems is therefore very useful for users. In this paper, we present the HPL configuration with which the PrestoIII cluster ranked 47th in the 19th Top500 list, and evaluate a wide range of parameter settings on the PrestoIII cluster. From this we obtained guidelines for the best parameter tuning.

CiNii Books

researchmap
A Proposal for Parallel Combinatorial Optimization System for the Grid

AKIYAMA TOMOHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 91 ( 80 ) 143 - 148 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

For combinatorial optimization problems, which compute the optimal value of a multidimensional parameter function, several methods are known to be effective, such as Branch-and-Bound methods and Genetic Algorithms. Since these methods can be massively parallelized and the granularity of computation tasks is easily controllable, they are considered suitable for execution on the Grid. However, distributed parallel programming on the Grid is quite complicated, and setting up the Grid-wide computing environment is a heavy burden. Here, we propose a system called jPoP, which makes it easy to develop and execute optimization-problem solvers on the Grid. To support development, jPoP provides a template class for each algorithm. To reduce the cost of setup, it automatically stages user programs to the Grid environment. This paper describes the design and implementation of the jPoP system. The template classes for Genetic Algorithms are also shown.

CiNii Books

researchmap
Evaluating Web Service Based Implementations of GridRPC

SHIRASUNA SATOSHI, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 91 ( 80 ) 197 - 202 2002.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

GridRPC is a class of Grid middleware for scientific computing. Interoperability has been an important issue, because each current GridRPC system employs its own protocol. Web services, where XML-based standards such as SOAP and WSDL are expected to see widespread use, could be the medium of interoperability; however, it is not clear 1) whether XML-based schemas have sufficient expressive power for GridRPC, and 2) whether performance could be made sufficient. Our experiments indicate that the use of such technologies is more promising than previously reported. Although a naive implementation of SOAP-based GridRPC has severe performance overhead, applying a series of optimizations improves performance. However, encoding various features of GridRPC proved to be somewhat difficult due to WSDL limitations. The results show that GridRPC systems can be based on Web technologies, but work is needed to extend the WSDL specifications.

CiNii Books

researchmap
Grid as the Future of Wide-Area Distributed Computing

MATSUOKA Satoshi

7 ( 3 ) 529 - 532 2002.7

　More details

Language：Japanese

CiNii Books

researchmap
A Case Study of Access Grid Node Construction and a Global Technical Conference

SHUDO Kazuyuki, TANAKA Yoshio, KOMATSU Hiroyuki, MATSUOKA Satoshi, NANRI Takeshi, OKAMURA Koji, SEKIGUCHI Satoshi

IPSJ SIG Notes 147 ( 22 ) 31 - 36 2002.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Access Grid is a project and a software suite that support human interaction across the Grid. The main and basic part of the technology is a large-scale video conference system. We have designed and constructed a packaged Access Grid node, named "Delivery Grid". Utilizing the technology, SC Global was held at the SC2001 conference. It was the first global technical conference on the Grid. Over 40 nodes participated in the event. We contributed to the event by planning and hosting a panel discussion related to the Asia-Pacific Grid and by attending from Japan and Denver. This paper describes our experiences in the construction of the Access Grid node and with SC Global.

CiNii Books

researchmap
Implementation and Evaluation of a Scalable Job Management Architecture for Large-Scale PC Cluster on the Grid Environment

IWASAKI SATORU, MATSUOKA SATOSHI, SODA NORIYUKI, HIRANO MOTONORI, TATEBE OSAMU, SEKIGUCHI SATOSHI

IPSJ SIG Notes 147 ( 22 ) 37 - 42 2002.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this paper we describe the design and implementation of the job launch architecture for the Grid Data Farm (Gfarm) system. The Gfarm system is composed of PC clusters with tens of thousands of nodes on the Grid. The Gfarm system uses GSI for communication and authentication between nodes. Because of this, if a naive method is used to start a job on the Gfarm system, a GSI authentication cost proportional to the number of nodes is incurred, and starting a job that consists of thousands of processes is expected to take several thousand seconds. We avoid the authentication cost by using connections that have been established in advance. With our system, the job launch time is 3.5 seconds with 15 nodes and 6 seconds with 63 nodes. We think that we can achieve more scalability by improving the job-launching protocol.

CiNii Books

researchmap
The End of the Information Processing Society of Japan? : The Way IPSJ Should Be

村岡洋一, 土居範久, 戸田巖, 萩谷昌己, 松岡聡

IPSJ Magazine 43 ( 2 ) 37 - 37 2002.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

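When a society debates "what the society should be," the discussion, put very simply, tends to follow this pattern: (1) the society is an accreditation body for papers; (2) in addition, it provides opportunities to present research; (3) we want to carry out these activities cost-effectively, which requires a sound financial base; (4) so let us increase the number of general members, and a high-quality magazine is indispensable as a member benefit; (5) but can it compete with commercial magazines? (6) and so we devise all manner of member-entertainment schemes. But is this really enough? For example, the following questions arise. (1) As a venue for research activity, is the society moving nimbly enough to keep up with this so-called dog-year era? (2) Is the research activity we supposedly value so highly really of benefit to the world and to ourselves? One hopes it is not mere self-satisfaction. (3) Has the society, which should be a place to present outstanding research results nimbly, become heavy? To become a place where anything is possible, does it not need a much stronger support infrastructure? (4) To begin with, is it enough for the society to be only a venue for presenting research results? To survive the coming great recession as engineers and researchers, does it not need to become a place where we exercise more ingenuity? We have invited debaters who regularly hold "constructively destructive opinions (?)" about what the society should be, including on questions like these, and hope to stir up a "spring storm." Young people, if you stay silent, the society will be dismantled!!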

CiNii Books

researchmap
Overview of GridRPC: A remote procedure call API for grid computing

K Seymour, H Nakada, S Matsuoka, J Dongarra, C Lee, H Casanova

GRID COMPUTING - GRID 2002 2536 274 - 278 2002

　More details

Language：English

Web of Science

researchmap
Titech Grid : Toward the Next Generation Computation Infrastructure

NAKADA Hidemoto, MATSUOKA Satoshi

The Proceedings of The Computational Mechanics Conference 2002 ( 0 ) 685 - 686 2002

　More details

Language：Japanese Publisher：The Japan Society of Mechanical Engineers

Evolving "e-science" requires more and more computing power. Considering power consumption and cooling, it is becoming impossible to provide computing resources from a single centralized site, such as a conventional "University Computer Center". The Global Scientific Information and Computing Center (GSIC) at Tokyo Institute of Technology has deployed a new form of computation infrastructure, called Titech Grid, utilizing Grid technology, commodity PC cluster technology, and a newly introduced high-speed network. Here, we give a precise description of the Titech Grid configuration and operation.

DOI： 10.1299/jsmecmd.2002.15.685

CiNii Books

researchmap
The Ninf Portal : An Automatic Generation Tool for Computing Portals

Toyotaro Suzumura, Hidemoto Nakada, Masayuki Saito, Satoshi Matsuoka, Yoshio Tanaka, Satoshi Sekiguchi

Joint ACM Java Grande - ISCOPE 2002 Conference, Seattle, Washington, November 3-5, 2002 2002

　More details

researchmap
Evaluating Web Services Based Implementations of GridRPC.

Satoshi Shirasuna, Hidemoto Nakada, Satoshi Matsuoka, Satoshi Sekiguchi

In Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002) 237 - 245 2002

　More details

DOI： 10.1109/HPDC.2002.1029923

J-GLOBAL

researchmap
Towards Dynamic Load Balancing Using Page Migration and Loop Re-partitioning on Omni/SCASH

Yoshiaki Sakae, Satoshi Matsuoka, Mitsuhisa Sato, Hiroshi Harada

In Proceedings of The Fourth European Workshop on OpenMP (EWOMP 2002) 2002

　More details

researchmap
Grid datafarm architecture for petascale data intensive computing

Osamu Tatebe, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda, Satoshi Sekiguchi

2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2002 2002

　More details

Language：English

DOI： 10.1109/CCGRID.2002.1017117

Scopus

J-GLOBAL

researchmap
Towards MPI with user-transparent fault tolerance

高宮安仁, 松岡

IPSJ / 電気通信処理学会 Joint Symposium on Parallel Processing JSPP2002, Proceedings 217 - 224 2002

　More details

researchmap
A Parallel LINPACK Algorithm for Heterogeneous Cluster Environments

笹生健, 松岡聡, 建部修見

IPSJ / 電気通信処理学会 Joint Symposium on Parallel Processing JSPP2002, Proceedings 71 - 78 2002

　More details

researchmap
Grid Datafarm architecture for petascale data intensive computing

O Tatebe, Y Morita, S Matsuoka, N Soda, S Sekiguchi

CCGRID 2002: 2ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS 102 - 110 2002

　More details

Language：English

DOI： 10.1109/CCGRID.2002.1017117

Web of Science

J-GLOBAL

researchmap
Grid Datafarm Architecture for Petascale Wide-Area Distributed Data Analysis

建部修見, 森田洋平, 松岡聡, 関口智嗣, 曽田哲之

IPSJ Transactions: High Performance Computing Systems, HPCS2002 Proceedings, IPSJ 89 - 96 2002

　More details

researchmap
Construction and Evaluation of an XML-Based GridRPC System

白砂哲, 中田秀基, 松岡聡, 関口智嗣

Japan Society for Software Science and Technology, 5th Workshop on Programming and Application Systems (SPA2002) 2002

　More details

researchmap
Evaluating Web services based implementations of GridRPC

S. Shirasuna, H. Nakada, S. Matsuoka, S. Sekiguchi

Proceedings of the IEEE International Symposium on High Performance Distributed Computing 2002- 237 - 245 2002

　More details

Language：English Publisher：Institute of Electrical and Electronics Engineers Inc.

DOI： 10.1109/HPDC.2002.1029923

Scopus

J-GLOBAL

researchmap
Ninf-Portal: A Toolkit for Building Grid Portals

中田秀基, 齊藤真幸, 鈴村豊太郎, 田中良夫, 松岡聡, 関口智嗣

IPSJ / 電気通信処理学会 Joint Symposium on Parallel Processing JSPP2002, Proceedings 209 - 216 2002

　More details

researchmap
A Proposal for API of GridRPC

NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

IPSJ SIG Notes 88 ( 78 ) 37 - 42 2001.10

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Computation systems based on RPC (Remote Procedure Call) are promising candidates for Grid middleware. Several systems, including Ninf and NetSolve, have already been proposed and are used in various areas. However, the Grid RPC API is not yet standardised, which is hindering the further spread of Grid RPC systems. In this paper, we examine two existing Grid RPC APIs and, based on them, propose a Grid RPC API intended to be a standard. The API is designed to be minimal but sufficient for applications. We are planning to promote this API as a standard for Grid RPC in the Global Grid Forum.

CiNii Books

researchmap
Towards performance evaluation of high-performance computing on multiple Java platforms

S Matsuoka, S Itou

FUTURE GENERATION COMPUTER SYSTEMS 18 ( 2 ) 281 - 291 2001.10

　More details

Language：English

DOI： 10.1016/S0167-739X(00)00099-6

Web of Science

J-GLOBAL

researchmap
Fault Tolerance on the Ninf System

SHIRASUNA SATOSHI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 87 ( 77 ) 153 - 158 2001.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Fault tolerance is becoming an increasingly important research topic in the Grid as it gains widespread use. The availability of abundant, albeit unstable, resources in non-dedicated environments mandates that all faults in the stages of user computation be handled in a transparent and graceful fashion. Our analysis shows that, in GridRPC, which is one of the viable programming models and systems for the Grid, the various stages of a computation exhibit different facets of fault tolerance, and as such they must be handled on a stage-by-stage basis. An experiment in integrating Ninf, a Grid RPC system, with the Condor system for checkpointing to enable fault tolerance for computation shows that the integration is largely transparent to the user, and for large-grained computations, the overhead is relatively small. On the other hand, smaller-grained computations exhibit anomalous and spurious overhead, in addition to the overhead incurred for transfer of the checkpointing library on each invocation, and we are conducting further investigation on its viability.

CiNii Books

researchmap
Evaluation of Monitoring Method in the Grid

AKIYAMA TOMOHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

IPSJ SIG Notes 87 ( 77 ) 159 - 164 2001.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The Grid allows distributed resources to be coordinated in order to facilitate large-scale computing over a wide-area network. In such an environment, fault detection and performance monitoring, as well as performance prediction, become important features that need to be agreed upon and possibly standardized. The Grid Performance Working Group within the Global Grid Forum has recently proposed and defined the basic architecture of Grid monitoring and the XML-based data format definitions, but the proposal has not yet been tested in practice. In particular, technical concerns include 1) scalability of the proposed architecture, 2) the cost of XML representation of instrumentation events, and 3) extensibility and flexibility of the data definition schema. Our experimental implementation of part of the proposed architecture on our Ninf GridRPC system has shown that, within a realistic Grid setting, the architecture seems reasonably scalable, the added cost of data representation is within permissible bounds, and the schema is sufficiently extensible to accommodate the specifics of the Ninf system.

CiNii Books

researchmap
Towards MPI with user-transparent fault tolerance

TAKAMIYA YASUHITO, MATSUOKA SATOSHI

IPSJ SIG Notes 87 ( 77 ) 129 - 134 2001.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The rapid increase in the number of nodes, as well as the massive scale of computing in terms of both time and memory space, on commodity clusters is mandating that potential failures of applications and systems be handled as the norm, while inherent fault tolerance and recovery have not been an integral part of the software tools being developed for parallel computing on such clusters. Moreover, flexible fault tolerance mechanisms in which the user can manage the balance of reliability vs. transparency vs. execution overhead would be vital, but most previous work on cluster fault tolerance has made available only a single policy and/or mechanism, and its overhead has not been precisely measured for practical applications. Instead, we propose a new fault-tolerant MPI system called Parakeet, which allows various fault tolerance and recovery mechanisms to be easily specified by the user while retaining efficiency. As a preliminary basis, we have implemented a user-level, coordinated checkpointing and migration protocol on top of MPICH in a user-transparent fashion. By specifying new protocols based on the underlying Parakeet mechanism, one could achieve Plug-and-Play management of large-scale clusters. Preliminary benchmarks show that Parakeet is portable and efficient, and could well serve as a basis for Plug-and-Play clustering.

CiNii Books

researchmap
Design and Implementation of a Jini-based Computing Portal System

SUZUMURA TOYOTARO, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 87 ( 78 ) 171 - 176 2001.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

JiPANG (Jini-based Portal Augmenting Grids) is a portal system and toolkit that provides a uniform access interface layer to a variety of Grid systems, and is built on top of the Jini distributed object technology. JiPANG performs uniform higher-level management of the computing services and resources being managed by individual Grid systems such as Ninf, NetSolve, Globus, etc. In order to give the user a uniform interface to the Grids, JiPANG provides a set of simple Java APIs called the JiPANG Toolkits, and furthermore allows the user to interact with Grid systems, again in a uniform way, using the JiPANG Browser application. With JiPANG, users need not install any client packages beforehand to interact with Grid systems, nor be concerned about updating to the latest version. We believe that such uniform, transparent services, available in a ubiquitous manner, are essential for the success of the Grid as a viable computing platform for the next generation.

CiNii Books

researchmap
Grid Datafarm Architecture for Petascale Data Intensive Computing

TATEBE OSAMU, MORITA YOUHEI, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, SODA NORIYUKI

IPSJ SIG Notes 87 ( 77 ) 177 - 182 2001.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The design of the Grid Datafarm architecture for Petascale data-intensive computing is described. Grid Datafarm provides global data-parallel filesystems with online Petascale storage and scalable I/O bandwidth by exploiting the local disks of a group of PC clusters on the Grid. Gfarm parallel I/O APIs and Gfarm commands provide a single system image for the filesystem. Automatic management of fault tolerance and load balancing is also an important issue, and is handled by file duplication and re-computation using a command history.

CiNii Books

researchmap
An Implementation of Java Based Software DSM System

NAKADA HIDEMOTO, SOHDA YUKIHIKO, OGAWA HIROTAKA, MATSUOKA SATOSHI

42 ( 7 ) 85 - 85 2001.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Due to the rapid commoditization of advanced hardware, parallel machines are being commoditized in the form of PC clusters. Software DSM systems using the Java language, which is portable across heterogeneous systems, are good candidates for such computing environments. In our previous paper, we proposed a Java-based software DSM system for clusters. The system successfully proved its usefulness, but we found some defects, including 1) long startup time due to remote invocation of the Java VM and 2) the tedious work of transferring class files to each node. In this paper, we introduce our new Java DSM system, which keeps Java VMs resident on each node, reducing startup time. It automatically transfers application class files and provides access to the client file system.

CiNii Books

researchmap
The Optimization of The LINPACK Benchmark for Heterogeneous Clusters

SASOU TAKERU, MATSUOKA SATOSHI, TATEBE OSAMU

IPSJ SIG Notes 86 ( 49 ) 49 - 54 2001.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In this study, we optimized HPL, one implementation of the LINPACK Benchmark, for heterogeneous cluster systems and evaluated it on a cluster with heterogeneous CPUs. The optimization shares load by changing the data size assigned to each node according to that node's performance. In our experiments, we attained 57.1% efficiency relative to theoretical peak performance and up to 1.49 times the best performance of HPL.

CiNii Books

researchmap
Implementation of Software DSM in Java

SOHDA YUKIHIKO, NAKADA HIDEMOTO, OGAWA HIROTAKA, MATSUOKA SATOSHI

情報処理学会論文誌プログラミング（PRO） 42 ( 3 ) 14 - 26 2001.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Rapid commoditization of advanced hardware and progress in networking technology are now making wide-area high-performance computing, a.k.a. 'Grid' computing, a reality. Since a Grid will consist of vastly heterogeneous sets of compute nodes, especially commodity clusters, some have advocated the use of Java as a suitable technology to achieve portability across different machines. Since Java's natural model of parallelism is shared-memory multithreading, one will have to support distributed shared memory (DSM) in a portable manner; however, none of the previous work on implementing Java on DSM has been a portable solution. Instead, we propose a software architecture whose goal is to achieve portability of the DSM implementation across different commodity clustering platforms, and have implemented a prototype system, JDSM. Benchmark results show that the current implementation in Java incurs increased memory coherency maintenance cost compared to C-based DSMs, thus limiting scalability to some degree, and we are currently working on a solution to alleviate this cost.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00016879/
Evaluating OpenMP Performance on SDSM using SPLASH2: Omni/SCASH Benchmarks.

SAKAE YOSHIAKI, MATSUOKA SATOSHI, SATO MITSUHISA, HASEGAWA ATSUSHI, HARADA HIROSHI

IPSJ SIG Notes 2001 ( 22 ) 187 - 192 2001.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Omni/SCASH is an implementation of OpenMP on top of the DSM system SCASH, allowing portable execution of shared-memory OpenMP programs on SMPs as well as on clusters. To validate the effectiveness of Omni/SCASH, we conduct the following benchmarks: we port selected sets of SPLASH2 benchmarks to OpenMP and execute them on Omni/SCASH to measure the effectiveness of the implementation, such as the costs and frequencies of cache hits, cache misses, DSM fault handler invocations, and barrier invocations. We then test whether Omni/SCASH serves as an effective programming platform for heterogeneous clusters. Preliminary results are mixed, and indicate that further work is needed for portable parallel programming on (heterogeneous) clusters.

CiNii Books

researchmap
An evaluation of multiple pointing input systems

K Fukuchi, S Matsuoka

HUMAN-COMPUTER INTERACTION - INTERACT'01 739 - 740 2001

　More details

Language：English

Web of Science

researchmap
A Foundation of Solution Methods for Constraint Hierarchies

Hiroshi Hosobe, Satoshi Matsuoka

Constraints Journal, Special Issue on Soft Constraints 2001

　More details

researchmap
Implementation of a Portable Software DSM in Java

Yukihiko Sohda, Hidemoto Nakada, Satoshi Matsuoka, Hirotaka Ogawa

Proceedings of ACM JavaGrande/ISCOPE 2001, San Francisco, June 2001 163 - 172 2001

　More details

researchmap
Grid Data Farm for Petascale Data Intensive Computing

Osamu Tatebe, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda, Hiroyuki Sato, Yoshio Tanaka, Satoshi Sekiguchi, Yoshiyuki Watase, Masatoshi Imori, Tomio Kobaya

Technical Report, Electrotechnical Laboratory, TR-2001-4 2001

　More details

researchmap
A Grid Programming Primer, (Draft 2.4)

Craig Lee, Satoshi Matsuoka, Domenico Talia, Alan Sussman, Nicholas Karonis, Gabrielle Allen, Mary Thomas

Whitepaper for Global Grid Forum Advanced Programming Models Working Group 2001

　More details

researchmap
OpenJIT 2: The Design and Implementation of Application Framework for JIT Compilers

Fuyuhiko Maruyama, Satoshi Matsuoka, Hirotaka Ogawa, Naoya Maruyama, Kouya Shimura

USENIX Java Virtual Machine Research and Technology Symposium (JVM'01), Work in Progress session. Monterey. April 23-24 2001 2001

　More details

researchmap
A Jini-based Computing Portal System

Toyotaro Suzumura, Satoshi Matsuoka, Hidemoto Nakada

Proceedings of IEEE/ACM Supercomputing '2001, IEEE Computer Society, Denver, Colorado, Nov. 2001 2001

　More details

researchmap
Network-Enabled Server Systems and the Computational Grid

Henri Casanova, Satoshi Matsuoka, Jack Dongarra

High Performance Computing Symposium (HPC'01), Advanced Simulation Technologies Conference, April 22-26 in Seattle, Washington (USA), 2001 2001

　More details

researchmap
Fault Tolerance on the Ninf System

白砂哲, 中田秀基, 松岡聡

IPSJ SIG Notes 2001-HPC-87 (SwoPP2001 Okinawa), July 2001 159 - 164 2001

　More details

researchmap
Grid data farm for atlas simulation data challenges

Y Morita, O Tatebe, S Matsuoka, N Soda, H Sato, Y Tanaka, S Sekiguchi, S Kawabata, Y Watase, M Imori, T Kobayashi

PROCEEDINGS OF CHEP 2001 699 - 701 2001

　More details

Language：English

Web of Science

researchmap
A study of deadline scheduling for client-server systems on the Computational Grid

A Takefusa, H Casanova, S Matsuoka, F Berman

10TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS 406 - 415 2001

　More details

Language：English

Web of Science

researchmap
Performance of Deadline Scheduling Schemes in Grid Computing Environments

竹房あつ子, 松岡聡

IPSJ / 電気通信処理学会 Joint Symposium on Parallel Processing JSPP 2001, Proceedings 2001.06 263 - 270 2001

　More details

researchmap
Implementation of Software Distributed Shared Memory for Java

早田恭彦, 中田秀基, 小川宏高, 松岡聡

IPSJ Transactions, Vol.42 No.SIG 3 (PRO10), March 2001 12 - 24 2001

　More details

researchmap
Problem Solving Environment Comparison

Rajkumar Buyya, Tom Eidson, Dennis Gannon, Erwin Laure, Satoshi Matsuoka, Thierry Priol, Joel Saltz, Seidel, Yoshio Tanaka

Whitepaper for Global Grid Forum Advanced Programming Models Working Group 2001

　More details

researchmap
MPC plus plus performance for commodity clustering

Y Sakae, S Matsuoka

HIGH-PERFORMANCE COMPUTING AND NETWORKING 2110 503 - 512 2001

　More details

Language：English

Web of Science

researchmap
OpenJIT: Open Just-In-Time Compiler Technology in Java

小川宏高, 松岡

2001

　More details

researchmap
Implementation of a Computing Portal System Using the Jini Distributed Object Technology

鈴村豊太郎, 松岡聡, 中田秀基

IPSJ SIG Notes 2001-HPC-87 (SwoPP2001 Okinawa), July 2001 171 - 176 2001

　More details

researchmap
2010 : A Simulation Roadmap : A Road to PetaFLOPS Using Commodity Technology

MATSUOKA Satoshi

Journal of the Japan Society for Simulation Technology 19 ( 4 ) 238 - 245 2000.12

　More details

Language：Japanese Publisher：Japan Society for Simulation Technology

Commodity High-Performance Computing, which utilizes commodity computing building blocks for high-performance computing, is expected to reduce the cost of computing by a factor of over ten thousand over the next ten years, realizing so-called Petaflops computing as well as making Terascale computing prevalent. As a result, simulations of unprecedented scale or resolution will become possible, making the role of simulation ever more important in science and technology. We attempt to predict the advances in computing power by exploring technical trends, and investigate how such advances will revolutionize simulation.

CiNii Books

researchmap

Other Link： http://dl.ndl.go.jp/info:ndljp/pid/11082261
Evaluation of MPC++-on-MPI on Commodity Cluster Environment

SAKAE YOSHIAKI, ISHIKAWA YUTAKA, MATSUOKA SATOSHI, TAKAHASHI TOSHIYUKI

41 ( 2 ) 60 - 72 2000.11

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Parallel programming languages such as MPC++, which provide finer-grained multithreading, remote method invocation, global memory read/write, and synchronized data structures at the language level, have often been claimed to allow parallelism to be expressed in a much richer, easier style than programming with libraries such as C+MPI. Due to their reliance on finer-grained language mechanisms, such languages have traditionally been implemented only on specialized user-level libraries on top of fast, expensive networks. On the other hand, for such languages to gain common acceptance, they must be implemented on top of portable messaging libraries running on commodity hardware with substantially less expensive networking. However, few systematic studies have been done to identify (1) whether the languages allow easy expression of traditional parallel programs, (2) if so, how much performance one loses by using commodity software/hardware, and (3) the degree of scalability compared to dedicated software/hardware implementations. To verify the viability of a commodity implementation, we ported the MPC++ language on top of different breeds of MPI, to be executed on two networks of substantially different performance and cost, namely Myrinet and 100Base-T Ethernet. We then investigated whether some NASPAR applications can be ported "naturally" on top of MPC++ and benchmarked in such an environment. Results were quite positive for MPC++ and its commodity implementation: (a) the port was quite effortless, (b) the small penalty caused by the additional MPI layer was negligible for NASPAR applications, and (c) for large data sets, MPC++/MPI running on the 100Base-T network was surprisingly competitive with both C+MPI on Myrinet and the original dedicated implementation of MPC++ on PM/Myrinet. The results are quite promising for wider acceptance of higher-level parallel languages in commodity clustering environments.

CiNii Books

researchmap
The Potential of Commodity Parallel Processing in OR : Trends in Cluster and Grid Computing (Special Committee Session : Mathematical Programming)

松岡聡

日本オペレーションズ・リサーチ学会秋季研究発表会アブストラクト集 2000 258 - 259 2000.9

　More details

Language：Japanese Publisher：公益社団法人日本オペレーションズ・リサーチ学会

CiNii Books

researchmap
Will a x86 Android Dream of an Electronic Cow?(Broadcasting and Information Processing)

MATSUOKA Satoshi

IPSJ Magazine 41 ( 9 ) 1072 - 1074 2000.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Implementing and Evaluating of MPC++ Multi-Thread Template Library on Multiple Communication Layers

NODA YUSUKE, SAKAE YOSHIAKI, MATSUOKA SATOSHI, OGAWA HIROTAKA

IPSJ SIG Notes 82 ( 73 ) 137 - 142 2000.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Parallel programming languages such as MPC++ embody various features required for finer-grained parallel processing, such as user-level threads, remote thread invocation, and remote memory access. Such languages are often implemented assuming fixed, high-performance hardware to eliminate software overhead as much as possible, and portable, high-performance implementations on top of commodity clusters, which could involve a variety of execution environments (different CPUs, OSes, networks), have not been well investigated. In order to clarify the commodity-level viability of such languages, we have been experimenting with a variety of combinations of the underlying execution environments. In particular, we are testing the use of VIA as an underlying messaging layer for MPC++. Although VIA is a general low-level messaging layer, the semantics of MPC++ nonetheless make a straightforward port non-trivial. Although current problems in the implementation prevent us from running large-scale benchmarks, initial experiments with NAS Parallel CG show that the VIA implementation on 100Base-T achieves a notable speedup compared to MPI implementations due to low-latency messaging, with throughput increasing by 190% for small, 32-byte messages, which are often used in such languages.

CiNii Books

researchmap
Evaluation of Fast Barrier Synchronization on commodity PC Cluster connected with Ethernet

IWASAKI SATORU, MATSUOKA SATOSHI, SAKAE YOSHIAKI, OGAWA HIROTAKA

IPSJ SIG Notes 82 ( 73 ) 131 - 136 2000.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

It is still a typical belief that high-performance clusters require expensive networks with low latency and high bandwidth such as Myrinet, especially for communication-intensive situations such as barrier synchronization. In order to achieve a similar level of barrier synchronization performance with commodity networks, in particular Fast Ethernet, we propose and investigate the design space of using multicasts and multiple networks. Our experimental library employs VIA-style low-latency access to Ethernet cards and also supports multicasts, both of which are used to construct several barrier algorithms. Benchmarks show that the Shuffle Exchange algorithm on our library achieves a barrier latency as low as 170 μs with 32 nodes, almost matching the best performance on Myrinet. Although the use of multicast is currently found to be slower, at 200 μs, theoretical analysis using the LogP model reveals that a better design of the library will likely yield even lower latency than Shuffle Exchange. The results show that commodity networks are sufficient for clustering, allowing lower cost and, as a result, wider acceptance.
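As a rough illustration of why a pairwise-exchange barrier can finish in log2(N) rounds, the sketch below simulates a butterfly-style exchange pattern in plain Python. It is not the library described in the abstract: the function names and the XOR partner rule are assumptions chosen for illustration, and the abstract does not specify the exact Shuffle Exchange schedule.

```python
def _check_power_of_two(num_nodes):
    # The XOR partner rule used here only pairs nodes cleanly for powers of two.
    if num_nodes <= 0 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")


def exchange_partners(rank, num_nodes):
    """Return the partner of `rank` in each round (hypothetical XOR rule)."""
    _check_power_of_two(num_nodes)
    partners = []
    distance = 1
    while distance < num_nodes:
        partners.append(rank ^ distance)
        distance <<= 1
    return partners


def simulate_barrier(num_nodes):
    """Simulate all rounds; return (round count, arrival knowledge per node)."""
    _check_power_of_two(num_nodes)
    # known[r] is the set of nodes that node r knows have reached the barrier.
    known = [{r} for r in range(num_nodes)]
    rounds = 0
    distance = 1
    while distance < num_nodes:
        # Exchanges within a round are simultaneous, so work from a snapshot.
        snapshot = [set(s) for s in known]
        for r in range(num_nodes):
            known[r] = snapshot[r] | snapshot[r ^ distance]
        distance <<= 1
        rounds += 1
    return rounds, known
```

With 32 nodes, `simulate_barrier(32)` completes in 5 rounds, after which every node has learned that all 32 nodes arrived. Each round costs roughly one message latency, which is why per-message latency dominates barrier time on a commodity network.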

CiNii Books

researchmap
Interactive Essay : For the Future of Japanese Super Computers / Keep It up and Let's Work Smart for Japanese Supercomputers! / Why in the World Would a Software Guy Like Me Want to Create a Large-Scale Commodity Cluster ? / Supercomputing Business in Japan, for Tomorrow

41 ( 7 ) 877 - 884 2000.7

　More details

Language：Japanese

CiNii Books

researchmap
For the Future of Japanese Super Computers

BOKU Taisuke

IPSJ Magazine 41 ( 7 ) 877 - 878 2000.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Why in the World Would a Software Guy Like Me Want to Create a Large-Scale Commodity Cluster?

MATSUOKA Satoshi

IPSJ Magazine 41 ( 7 ) 880 - 882 2000.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Implementation of Multiple Pointing Input System

FUKUCHI KENTAROU, MATSUOKA SATOSHI

89 ( 61 ) 15 - 21 2000.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper describes a prototype of the Multiple Pointing Input System (MPIS), which allows concurrent manipulation of multiple pointing devices. In a traditional GUI system, users have to manipulate each object sequentially with one pointing device. Our MPIS allows users to point at multiple places on the GUI screen simultaneously, and to manipulate multiple graphical objects (icons, sliders) concurrently. The devices are manipulated on a clear acrylic table, and the coordinates of each device are calculated from images captured by a video camera below the table.

CiNii Books

researchmap
Design issues for Network Enabled Server Systems

NAKADA HIDEMOTO, MATSUOKA SATOSHI, SATO MITSUHISA, SEKIGUCHI SATOSHI

IPSJ SIG Notes 81 ( 57(HPC-81) ) 69 - 74 2000.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The Network Enabled Server is considered a good candidate for global computing middleware. This paper clarifies design issues for Network Enabled Server systems and discusses alternatives for each issue. The issues are connection methods, protocol command representation, and security methods. We implemented a new Ninf system that takes these issues into account. We also show the design of the system, focusing on the security facility.

CiNii Books

J-GLOBAL

researchmap
Overview of a Jini-based Computing Portal System

SUZUMURA TOYOTARO, MATSUOKA SATOSHI, NAKADA HIDEMOTO, SEKIGUCHI SATOSHI

IPSJ SIG Notes 81 ( 57 ) 57 - 62 2000.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

JiPANG (Jini-based Portal Augmenting Grids) is a portal system which provides a uniform access interface layer to a variety of Grid systems, and is built on top of the Jini distributed object technology. JiPANG supports a virtual computing infrastructure called the JiPANG pool, which performs uniform higher-level management of the computing services and resources managed by individual Grid systems such as Globus or Ninf. In order to give the user a uniform interface to the system, JiPANG provides a set of simple Java Grid APIs called the JiPANG API, and furthermore allows the user to interact with Grid systems, again in a uniform way, using the JiPANG Browser application. With JiPANG, users need not install any client packages beforehand to interact with Grid systems, nor be concerned about updating to the latest version. We believe that such uniform, transparent services, available in a ubiquitous manner, are essential for the success of the Grid as a viable computing platform for the next generation.

CiNii Books

researchmap
A Scheduling Framework for Global Computing

NAKADA Hidemoto, TAKEFUSA Atsuko, MATSUOKA Satoshi, SATO Mitsuhisa, SEKIGUCHI Satoshi

Transactions of Information Processing Society of Japan 41 ( 5 ) 1617 - 1627 2000.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Rapid progress in networking technology is now making global computing systems feasible. Although global computing systems have been proposed, how to achieve efficient usage of computing resources in global computing is still a research issue. In particular, we need to devise appropriate scheduling strategies and algorithms for computing resources over wide-area networks, which are often dynamic and unstable in nature. This paper presents our preliminary scheduling framework for unifying application and job scheduling in global computing. The proposed framework establishes a layer of scheduling and resource allocation subframeworks. We present our software framework, the Ninf metaserver, which provides a low-level scheduler and a resource monitor. We also evaluate some scheduling strategies using the framework. The evaluation results show that the framework is flexible enough to implement multiple scheduling algorithms on top of it.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00012320/
Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms

TAKEFUSA Atsuko, AIDA Kento, MATSUOKA Satoshi, NAKADA Hidemoto, NAGASHIMA Umpei

Transactions of Information Processing Society of Japan 41 ( 5 ) 1628 - 1638 2000.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

While there have been several proposals of high performance global computing systems, scheduling schemes for the systems have not been well investigated. The reason is the difficulty of evaluation by large-scale benchmarks with reproducible results. Our Bricks performance evaluation system allows analysis and comparison of various scheduling schemes in a typical high-performance global computing setting. Bricks can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. Moreover, Bricks is componentized such that not only can its constituents be replaced to simulate various system algorithms, but existing global computing components can also be incorporated via its foreign interface. To test the validity of the latter characteristic, we incorporated the NWS system, which monitors and forecasts the behavior of global computing systems. Experiments were conducted by running NWS under a real environment versus the simulated environment given the observed parameters of the real environment. We observed that Bricks behaved in the same manner as the real environment, and NWS also behaved similarly, making quite comparable forecasts under both environments.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00012321/
A Design of OpenJIT Frontend System

OGAWA HIROTAKA, MATSUOKA SATOSHI, MARUYAMA FUYUHIKO, SOHDA YUKIHIKO, SHIMURA KOUYA

41 ( 2 ) 1 - 12 2000.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The so-called 'open compiler' approach is a technique for incorporating various self-descriptive modules for language customization and optimization based on computational reflection. We apply the open compiler technique to a Java Just-In-Time compiler to develop the OpenJIT compiler, which allows class-specific customization and optimization, fostering research on new compilation techniques such as application-specific customization and dynamic optimization. OpenJIT is largely divided into the frontend and the backend. The frontend takes Java bytecode as input, performs higher-level optimizations involving source-to-source transformations, and passes the intermediate code on to the backend. The backend takes the intermediate code from the frontend as input, performs lower-level optimizations, and outputs native code for direct execution. In this paper, we describe the internal architecture of the frontend system and evaluate it on a simple loop example.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00016944/
Evaluation of Parallel LU Factorization in Java

HASEGAWA HIROKAZU, MATSUOKA SATOSHI, ITOU SHIGEO

IPSJ SIG Notes 137 ( 23 ) 83 - 88 2000.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Most previous attempts at utilizing Java for HPC sacrificed Java's portability or did not achieve the performance required for HPC. Instead, we propose an alternative methodology based on a Downloadable Self-tuning Library, and have constructed an experimental prototype called AJaPACK, a portable and high-performance parallel BLAS library for Java which "tunes" itself to the environment in which it is installed. Once AJaPACK is downloaded and executed, the Java version of ATLAS (ATLAS for Java) and the parallelized version of JLAPACK combine to achieve optimized pure Java execution for the given environment. Benchmarks have shown that AJaPACK achieves approximately 1/2 to 1/5 of the speed of optimized C-ATLAS and vendor-supplied BLAS libraries and, with portable parallelization in SMP environments, achieves performance superior to single-threaded C-based native libraries. This is an order of magnitude better performance than previous pure Java BLAS libraries. For blocked LU decomposition, reasonable speedup was also achieved; on the other hand, the AJaPACK version suffers from the high overhead of subarray manipulation, resulting in a loss of performance compared to previous routines such as JLAPACK. This shows that building numerical libraries in Java is still not straightforward, and that programming techniques specific to Java should be developed for high performance.
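The idea of a library that "tunes" itself can be sketched as timing a few candidate blocking factors on the target machine and keeping the fastest. The Python below is only a minimal illustration under that assumption; it is not AJaPACK or ATLAS code, and the function names, matrix size, and candidate block sizes are hypothetical.

```python
import time


def blocked_matmul(a, b, block):
    """Multiply square matrices (lists of lists) using block x block tiles."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile of a by one tile of b into a tile of c.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def pick_block_size(n=48, candidates=(4, 8, 16, 48)):
    """Time each candidate block size on this machine; return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        start = time.perf_counter()
        blocked_matmul(a, b, block)
        timings[block] = time.perf_counter() - start
    return min(timings, key=timings.get)
```

The chosen block size depends on the machine and the runtime, which is the point: the same portable code selects different parameters in different environments. A real tuner would repeat each measurement and search a larger parameter space.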

CiNii Books

researchmap
Are Global Computing Systems Useful? Comparison of Client-server Global Computing Systems Ninf, NetSolve Versus CORBA

SATOSHI MATSUOKA

14th IEEE International Parallel & Distributed Processing Symposium 547 - 556 2000

　More details

researchmap
Performance Issues in Client-Server Global Computing

SATOSHI MATSUOKA

International Workshop on Global and Cluster Computing (WGCC'2000).2000.03 2000

　More details

researchmap
AJaPack; A Performance Portable Parallel Java Numerical Library

SATOSHI MATSUOKA

Proceedings of the ACM 2000 Java Grande Conference, The ACM Press,June, 2000 140 - 149 2000

　More details

researchmap
OpenJIT: An Open-Ended, Reflective JIT Compiler Framework for Java

松岡聡

日本ソフトウエア科学会第三回プログラミングおよび応用システムに関するワークショップ（SPA2000,口頭発表）,March 2000 2000

　More details

researchmap
NetCFD: A Ninf CFD component for global computing, and its Java applet GUI

M. Sato, K. Kusano, H. Nakada, S. Sekiguchi, S. Matsuoka

Proceedings - 4th International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region, HPC-Asia 2000 1 501 - 506 2000

　More details

Language：English Publisher：Institute of Electrical and Electronics Engineers Inc.

DOI： 10.1109/HPC.2000.846605

Scopus

J-GLOBAL

researchmap
OpenJIT: An open-ended, reflective JIT compiler framework for Java

H Ogawa, K Shimura, S Matsuoka, F Maruyama, Y Sohda, Y Kimura

ECOOP 2000 - OBJECT-ORIENTED PROGRAMMING 1850 362 - 387 2000

　More details

Language：English

Web of Science

researchmap
An Effective Decompilation Algorithm for Java Bytecodes

MARUYAMA FUYUHIKO, OGAWA HIROTAKA, MATSUOKA SATOSHI

40 ( 10 ) 39 - 50 1999.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The technique called decompilation, which reads sequences of machine code and generates the corresponding source program, has been known for some time and is used primarily for reverse engineering. For Java and its bytecode, although several decompilers have been proposed, most generate output that inappropriately extends the Java language, such as by inserting gotos not present in Java. Moreover, the decompilation algorithms are somewhat ad hoc, and it is difficult to extend them or verify their applicability, which is a hindrance to our OpenJIT compiler, which requires a decompiler frontend to recover the correct source structure from arbitrary bytecode. Instead, we have devised a new and effective algorithm for decompilation, with emphasis on properly recovering control structures. The key idea is based on the observation that, for a properly nested block-structured language, each part of a program representing a control structure corresponds to just a single subtree in the dominator tree. As such, the algorithm is general enough to be applied to languages other than Java. The evaluation of our preliminary implementation in OpenJIT shows that our algorithm properly recovers control structures where other existing decompilers fail, with roughly equivalent execution speed.
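The observation about the dominator tree can be made concrete with a small sketch that computes immediate dominators for a control-flow graph using the simple iterative set-based method. This is not the paper's decompilation algorithm, only the dominator computation it builds on; the function names and the example graph shape are assumptions for illustration, and every node is assumed to be reachable from the entry.

```python
def dominators(cfg, entry):
    """Iteratively compute the dominator set of every node.

    cfg maps each node to the list of its successors; every successor
    must also appear as a key.
    """
    nodes = list(cfg)
    preds = {n: [] for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = set.intersection(*incoming) if incoming else set()
            new = new | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(cfg, entry):
    """Return each node's parent in the dominator tree."""
    dom = dominators(cfg, entry)
    idom = {}
    for n, ds in dom.items():
        strict = ds - {n}
        for d in strict:
            # The closest strict dominator is itself dominated by all the others.
            if dom[d] == strict:
                idom[n] = d
    return idom
```

For an if/else shaped graph (`entry -> cond`, `cond -> then | else`, both branches `-> join`, `join -> exit`), the nodes `then`, `else`, and `join` all have `cond` as their immediate dominator, so the whole conditional sits in the single dominator subtree rooted at `cond`.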

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00016961/
Teddy, a 3D Modeling System Based on Hand-Drawn Sketches: Easily Create 3D Models Just by Drawing Freeform Curves Freehand

五十嵐健夫, 松岡聡, 田中英彦

日経CG ( 156 ) 110 - 117 1999.9

　More details

Language：Japanese Publisher：日経BP社

CiNii Books

researchmap
Comparison of Client-Server Global Computing Systems : Performance Evaluation of Ninf, NetSolve, CORBA, Ninf-on-Globus

SUZUMURA TOYOTARO, NAKAGAWA TAKAYUKI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

IPSJ SIG Notes 77 ( 66 ) 197 - 202 1999.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Recent developments in global computing systems such as Ninf, NetSolve, and Globus have opened up opportunities for providing high-performance computing services over wide-area networks. However, most research has focused on individual architectural aspects of a system, or on application deployment examples, rather than on the characteristics such systems should intrinsically satisfy or on how such systems relate to each other. Our comparative study publishes example network-based libraries using Ninf, NetSolve, and CORBA. We find that dedicated systems for global computing such as Ninf and NetSolve have clear management, programmability, and performance advantages over CORBA. Furthermore, deployment of Ninf on top of Nexus, the communication layer of Globus, exhibited some loss of performance as well as somewhat kludgy gluing, due to fundamental differences in the assumptions of the underlying communication models. Such results indicate that further basic research is necessary across multiple systems to identify the ideal software architectures for global computing.

CiNii Books

researchmap
Implementing and Evaluating the MPC++ Multi-Thread Template Library on Multiple MPI Platforms

SAKAE YOSHIAKI, ISHIKAWA YUTAKA, MATSUOKA SATOSHI, OGAWA HIROTAKA

IPSJ SIG Notes 77 ( 66 ) 41 - 46 1999.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Our parallel programming language MPC++ has been available only on the SCore cluster system software developed at the Real World Computing Partnership. In order to achieve better portability across multiple platforms, the scope of MPC++ is being widened via an implementation using MPI as the underlying communication layer. This raises the question of applicability, since MPI performance varies considerably across platforms. Our evaluation results show that the communication overhead is negligible when the data size is larger than 8 Kbytes. Furthermore, the CG kernel benchmark of the NAS Parallel Benchmarks written in MPC++ using MPI achieves a speed comparable to one written in MPI when the number of nodes is small. However, an increase in the number of nodes causes severe loss of performance on commodity platforms with low network performance, while it continues to scale well on those with high-performance networks, as well as on MPIs on MPPs with fast communication infrastructure. These results suggest that, although MPC++ on MPI is viable on high-performance platforms, further research is needed on optimizing for commodity networks.

CiNii Books

researchmap
netCFD: A Computational Fluid Dynamics Component Using the Global Computing System Ninf

佐藤三久, 草野和寛, 中田秀基, 関口智嗣, 松岡聡

年会一般講演 18 369 - 370 1999.7

　More details

Language：Japanese

CiNii Books

researchmap
Parallel Processing: Ninflet, a Java-based Global Parallel Computing Environment

高木浩光, 松岡聡, 中田秀基, 関口智嗣, 佐藤三久, 長嶋雲兵

情報処理学会論文誌 40 ( 5 ) 2203 - 2214 1999.5

　More details

Language：Japanese

J-GLOBAL

researchmap
Ninflet : A Java - based Global Parallel Computing Environment

TAKAGI Hiromitsu, MATSUOKA Satoshi, NAKADA Hidemoto, SEKIGUCHI Satoshi, SATO Mitsuhisa, NAGASHIMA Umpei

Transactions of Information Processing Society of Japan 40 ( 5 ) 2203 - 2214 1999.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

To make a globally distributed computing system attractive, the system should be open to arbitrary individuals not only for its usage but also for the construction of a wide variety of application programs. For this purpose, the system must supply a secure environment for safely executing arbitrary programs. Our proposed global computing environment "Ninflet" fulfills such a requirement by exploiting the security mechanism of the Java language, allowing computation to occur on machines not owned or administered by the individual invoking the computation. Ninflet realizes a globally shared metacomputer which allows "lending" the computing cycles of machines that would otherwise be unused at night to the other side of the globe, or simply building a parallel execution environment on a heterogeneous set of workstation clusters. We present the system architecture of Ninflet and a preliminary performance evaluation when used as a parallel execution environment.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00012665/
Additive Interaction Nets : Yet Another Linear Logic Programming Language

MATSUOKA SATOSHI

40 ( 4 ) 72 - 72 1999.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a new programming language, which is an extension of Lafont's interaction nets to the additive case. The extension here is to introduce first-order unification variables: each agent in interaction nets has several first-order terms with unification variables. When agents interact, information on interaction nets can be distributed by first-order unification. In contrast with standard interaction nets, our interaction nets with first-order terms do not have the Church-Rosser property: several rewrite rules may apply to an additive interaction net. Girard's additive proof nets can be considered a special case of our interaction nets with first-order terms. We consider the extended interaction nets a better substitute for linear logic programming languages based on backward proof search, which is a concurrent object-oriented programming language, for some purposes, especially for the formalization of component-based programming, which is a trend in real computing, e.g. Java Beans and ActiveX. We can encode a π-calculus-like logic programming language as well as SLD-resolution into additive interaction nets.

CiNii Books

researchmap
Performance Evaluation of Global Computing Systems by Simulation

TAKEFUSA Atsuko, AIDA Kento, NAKADA Hidemoto, OGAWA Hirotaka, MATSUOKA Satoshi, SATO Mitsuhisa, SEKIGUCHI Satoshi, NAGASHIMA Umpei

Transactions of Information Processing Society of Japan 40 ( 5 ) 2192 - 2202 1999.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

While there have been several proposals of high performance global computing systems, scheduling schemes for the systems have not been well investigated. The reason is the difficulty of evaluation by large-scale benchmarks with reproducible results. This paper describes the design and implementation of a simulator that evaluates scheduling schemes on a typical high-performance global computing system. The simulator can simulate various features of global computing systems by adopting a queueing model. The effectiveness of the simulator was verified by simulation results that were very similar to experimental results on a real global computing system. This paper also shows simulation results for simple scheduling schemes. The results show that it is important to consider resource conditions appropriately for overall system performance.
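To give a feel for what a queueing-model simulation involves, the sketch below simulates a single first-come-first-served server and computes the mean response time of a job trace. It is a deliberately minimal, hypothetical example and not the simulator described in the abstract: the function names and the single-server FIFO assumption are illustrative only.

```python
def simulate_fifo_server(jobs):
    """Simulate one FIFO server.

    jobs is a list of (arrival_time, service_time) pairs sorted by arrival.
    Returns a list of (start_time, finish_time) pairs, one per job.
    """
    schedule = []
    server_free_at = 0.0
    for arrival, service in jobs:
        # A job starts when it has arrived and the server is idle.
        start = max(arrival, server_free_at)
        finish = start + service
        schedule.append((start, finish))
        server_free_at = finish
    return schedule


def mean_response_time(jobs):
    """Average time from a job's arrival to its completion."""
    schedule = simulate_fifo_server(jobs)
    total = sum(finish - arrival
                for (arrival, _), (_, finish) in zip(jobs, schedule))
    return total / len(jobs)
```

Because the trace is fixed input, the same job list always produces the same schedule, which is the reproducibility that real wide-area benchmarks lack. Comparing scheduling schemes then amounts to feeding the same trace through different assignment policies.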

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00012664/
A Navigational Interface for Mobile Computing using 3D Spatial Audio

KII MANABU, MATSUOKA SATOSHI, HAYASHI KAZUTERU

83 31 - 36 1999.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose the 3D Audio Compass, a navigational interface for mobile computing using 3D spatial audio. The 3D Audio Compass can guide users to their destination intuitively, allowing them to concentrate on their real-world task. A prototype system was tested using VRML, and experimental results suggest that guidance by 3D spatial audio is effective.

CiNii Books

researchmap
Evaluation of Portable Software DSM employing Reflection

YAGISAWA NAOYA, OGAWA HIROTAKA, SOHDA YUKIHIKO, MATSUOKA SATOSHI

IPSJ SIG Notes 132 ( 21 ) 109 - 114 1999.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Platform portability is one of the most demanded properties of a system today, due to the diversity of runtime execution environments in wide-area networks, and parallel programs are no exception. However, parallel execution environments are very diverse and can change dynamically, while performance must be portable as well. As a result, techniques for achieving platform portability are sometimes not appropriate, or may restrict the programming model, e.g., to simple message passing. Instead, we propose the use of reflection for achieving platform portability of parallel programs. As a prototype experiment, we created a software DSM system which utilizes the compile-time metaprogramming features of OpenC++ 2.5 to generate message-passing MPC++ code from an SPMD-style, shared-memory C++ program. To characterize the effect of our system, we ran SPLASH2 on a PC cluster linked by the Myrinet gigabit network, and obtained reasonable performance compared to a high-performance SMP. We also indicate that it can achieve performance comparable to low-overhead DSMs such as Shasta.

CiNii Books

researchmap
Towards Performance Evaluation of High-Performance Computing on Multiple Java Systems

ITOU SHIGEO, MATSUOKA SATOSHI

IPSJ SIG Notes 132 ( 21 ) 25 - 30 1999.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Despite claims of platform portability, it is not clear whether Java is suitable for high-performance scientific computing. In fact, optimizations by, e.g., JIT compilers may not be effective for achieving high performance in various scientific codes, i.e., "performance portability" may not be guaranteed in current Java systems. To address this situation, we are constructing a benchmarking platform for Java that candidly compares different Java systems. In particular, we have constructed a Java version of ATLAS, a program generator that outputs platform-specific optimized BLAS, to investigate the peak performance of each Java system. We then compared this performance to typical source-level optimizations that a user or a compiler might perform, to see how closely such optimizations can approach the peak performance.

CiNii Books

researchmap
Overview of a Global Computing Simulator

TAKEFUSA ATSUKO, AIDA KENTO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, NAGASHIMA UMPEI

IPSJ SIG Notes 132 ( 21 ) 31 - 36 1999.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

While there have been several proposals of high performance global computing systems, scheduling schemes for the systems have not been well investigated. The reason is the difficulty of evaluation by large-scale benchmarks with reproducible results. This paper describes an overview of the Bricks simulator, which evaluates scheduling schemes on a typical high-performance global computing system. Bricks can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. Moreover, Bricks is componentized such that not only can its constituents be replaced to simulate various system algorithms, but existing global computing components can also be incorporated via its foreign interface. To test the validity of the latter characteristic, we incorporated the NWS system, which monitors and forecasts the behavior of global computing systems. Experiments were conducted by running NWS under a real environment versus the simulated environment given the observed parameters of the real environment. Under both environments, NWS behaved similarly, making quite comparable forecasts.

CiNii Books

researchmap
Development and preliminary evaluation of remote computing resource access systems using Ninf

SATO MITSUHISA, TANAKA YOSHIO, KUSANO KAZUHIRO, NAKADA HIDEMOTO, SEKIGUCHI SATOSHI, NAGASHIMA UMPEI, MATSUOKA SATOSHI

IPSJ SIG Notes 132 ( 21 ) 37 - 42 1999.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We are developing prototype systems to access remote computing resources by using the global computing middleware Ninf. Ninf allows users to make use of remote computing resources as computational components in their programs. As our prototypes, we designed a Computational Fluid Dynamics (CFD) component, netCFD, and a Computational Chemistry component, netMO, for molecular orbital computation. Although CFD applications may store a large amount of data in each time step, the overhead of the I/O can be reduced by overlapping I/O and computation, even in remote CFD computation. As a demonstration of netCFD, we designed a GUI using a Java applet to make use of the netCFD component through Web browsers. The GUI applet invokes the CFD computation on a remote Ninf server, and receives the results through the callback interface in Ninf to visualize them at each time step.

CiNii Books

J-GLOBAL

researchmap
"End-of-Century Debate : What Good Was Computer Science in the 20th Century? : <A Fierce Battle Between Field Engineers and Theoretical Researchers>"

竹内郁雄, 鯵坂恒夫, 荒木啓二郎, 石田喬也, 上原三八, 土屋正登, 松岡聡

情報処理 40 ( 2 ) 32 - 32 1999.2

　More details

Language：Japanese Publisher：一般社団法人情報処理学会

CiNii Books

researchmap
Implementation of DSM Using OpenC++ Reflection

SOHDA YUKIHIKO, OGAWA HIROTAKA, MATSUOKA SATOSHI

40 ( 1 ) 13 - 22 1999.2

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Platform portability is one of the most demanded properties of a system today, due to the diversity of runtime execution environments in wide-area networks, and parallel programs are no exception. However, parallel execution environments are very diverse and can change dynamically, while performance must be portable as well. As a result, techniques for achieving platform portability are sometimes not appropriate, or may restrict the programming model, e.g., to simple message passing. Instead, we propose the use of reflection for achieving platform portability of parallel programs. As a prototype experiment, we created a software DSM system which utilizes the compile-time metaprogramming features of OpenC++ 2.5 to generate message-passing MPC++ code from an SPMD-style, shared-memory C++ program. The translation creates memory management objects on each node to manage the consistency protocols for object arrays residing on different nodes. Read and write barriers are automatically inserted on references to shared objects. We evaluated this system on a PC cluster linked by the Myrinet gigabit network.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00017020/
Teddy: A sketching interface for 3D freeform design

T Igarashi, S Matsuoka, H Tanaka

SIGGRAPH 99 CONFERENCE PROCEEDINGS 409 - 416 1999

　More details

Language：English

Web of Science

researchmap
OMPC++ --- A Portable High-Performance Implementation of DSM using OpenC++ Reflection

Yukihiko Sohda, Hirotaka Ogawa, Satoshi Matsuoka

Proc. of Reflection'99, Springer LNCS 1616 215 - 234 1999

　More details

researchmap
Teddy: A Sketching Interface for 3D Freeform Design

Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

Proc. ACM SIGGRAPH'99 409 - 416 1999

　More details

researchmap
OMPC++-A Portable High-Performance Implementation of DSM using OpenC++ Reflection

SOHDA Y.

LNCS 1616 215 - 234 1999

　More details

researchmap
OpenJIT: A Dynamically Modifiable Java JIT Compiler Based on Reflective Computation (Special Feature: New Developments in Network Computing: What Open Java Brings)

松岡聡

Computer today 15 ( 6 ) 4 - 11 1998.11

　More details

Language：Japanese Publisher：サイエンス社

CiNii Books

researchmap
History and Developments of Java Implementation Technologies

MATSUOKA Satoshi

Journal of The Society of Instrument and Control Engineers 37 ( 9 ) 627 - 632 1998.9

　More details

Language：Japanese Publisher：The Society of Instrument and Control Engineers

DOI： 10.11499/sicejl1962.37.627

CiNii Books

researchmap
Implementation of Communication Library on Ninflet : A Java-based Global Parallel Computing System

OOHISA MITSUTAKA, TAKAGI HIROMITSU, MATSUOKA SATOSHI, OGAWA HIROTAKA

IPSJ SIG Notes 72 ( 72 ) 67 - 72 1998.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

There have been several recent proposals for high-performance distributed systems that utilize computing resources while they sit idle, for example at night. These systems typically employ highly portable programming language systems such as Java, and our Ninflet is one such system. However, these systems have mostly been evaluated with simple master-worker styles only, and more complex parallel programming styles have had to resort to low-level communication primitives such as RMI and MPI. Instead, we design and encapsulate several high-level parallel programming patterns as class libraries for Ninflet using so-called 'design patterns', and evaluate their effectiveness by comparison with traditional parallel programming styles.

CiNii Books

researchmap
Authentication for Ninf : Global Computation System

NAKADA HIDEMOTO, MATSUOKA SATOSHI, SATOH MITSUHISA, SEKIGUCHI SATOSHI

IPSJ SIG Notes 72 ( 72(HPC-72) ) 79 - 84 1998.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Rapid growth of network technology has made high-performance distributed computing possible. The technical aspects of software frameworks for high-performance distributed computing are already almost established. However, from a social perspective, some important issues, such as access control and accounting, remain open. In this paper, we discuss the authentication mechanism needed to address these issues. There is a tradeoff between the strictness of authentication and the ease of using the system, so the authentication mechanism has to be chosen according to how the system is used.

CiNii Books

J-GLOBAL

researchmap
Implementation and Preliminary Evaluation of Global Scheduling Framework in Ninf

TAKEFUSA ATSUKO, NAKADA HIDEMOTO, AIDA KENTO, OGAWA HIROTAKA, MATSUOKA SATOSHI, NAGASHIMA UMPEI

IPSJ SIG Notes 72 ( 72 ) 73 - 78 1998.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Rapid progress in networking technology is now making global computing systems feasible. Although global computing systems have been proposed, how to use computing resources efficiently in global computing remains a research issue. In particular, we need to devise appropriate strategies and algorithms for scheduling computing resources over wide-area networks, which are often dynamic and unstable in nature. This paper presents our preliminary scheduling framework for unifying application and job scheduling in global computing. The proposed framework establishes a layer of scheduling and resource allocation subframeworks. We present our software framework, the Ninf metaserver, which provides a low-level scheduler and a resource monitor. We also evaluate several scheduling strategies using both an actual environment and our performance evaluation model.

CiNii Books

researchmap
OpenJIT : A Reflective Java JIT Compiler

MATSUOKA Satoshi, OGAWA Hirotaka, SHIMURA Kouya, KIMURA Yasunori, HOTTA Kohichior, TAKAGI Hiromitsu

IEICE technical report. Computer systems 98 ( 234 ) 49 - 56 1998.8

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

The so-called 'open compiler' approach is a technique for incorporating various self-descriptive modules for language customization and optimization based on computational reflection. We apply the open compiler technique to a Java Just-In-Time compiler to develop the OpenJIT compiler, which allows class-specific customization and optimization, fostering research on new compilation techniques such as application-specific customization and dynamic optimization.

CiNii Books

researchmap
Global Parallel Computation using Ninf (Special Issue on Parallel Processings)

NAKADA Hideki, TAKAGI Hiromitsu, MATSUOKA Satoshi, NAGASHIMA Umpei, SATOH Mitsuhisa, SEKIGUCHI Satoshi

Transactions of Information Processing Society of Japan 39 ( 6 ) 1818 - 1826 1998.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Distributed computing using message passing libraries in a LAN (Local Area Network) environment is already accepted as an effective supercomputing methodology. On the other hand, although distributed computing in a WAN (Wide Area Network) environment is becoming practical thanks to the recent development of high-speed network facilities, a software framework for supercomputing in a WAN is yet to be established. We propose 'Ninf', a distributed computing framework for globally distributed computing environments. Ninf enables parallel computing in a WAN based on the macro dataflow model, and facilitates automatic dynamic load distribution and scheduling. Ninf has the following advantages over using existing message passing libraries for WAN supercomputing: (1) a communication protocol suited to globally distributed environments, (2) ease of programming, (3) reuse of existing libraries, and (4) integration with existing data resources on the Internet.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013059/
OMPI : A Compile-time Optimizer for MPI Programs (Special Issue on Parallel Processings)

OGAWA Hirotaka, MATSUOKA Satoshi

Transactions of Information Processing Society of Japan 39 ( 6 ) 1700 - 1708 1998.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

MPI is gaining widespread acceptance as a standard for message passing in high-performance computing, owing to its powerful and flexible support for various communication styles. However, the complexity of its API imposes significant software overhead, and as a result, the applicability of MPI has been restricted to rather regular, coarse-grained computations. Our OMPI (Optimizing MPI) system removes much of the excess overhead by employing partial evaluation techniques, which exploit static information about MPI calls. Because partial evaluation alone is insufficient, we also utilize template functions for further optimization. Benchmarks show that OMPI improves execution efficiency by as much as a factor of two for a communication-intensive application core, with minimal code increase. It also performs significantly better than a previous dynamic optimization technique.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013046/
Multi-client LAN/WAN Performance Analysis of Ninf (Special Issue on Parallel Processings)

TAKEFUSA Atsuko, OGAWA Hirotaka, MATSUOKA Satoshi, NAKADA Hideki, TAKAGI Hiromitsu, SATO Mitsuhisa, SEKIGUCHI Satoshi, NAGASHIMA Umpei

Transactions of Information Processing Society of Japan 39 ( 6 ) 1827 - 1838 1998.6

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The rapid increase in the speed and availability of networks of supercomputers is making high-performance global computing possible, including our Ninf system. However, critical issues regarding system performance characteristics in global computing have been little investigated, especially under multi-client, multi-site WAN settings. To investigate the feasibility of Ninf and similar systems, we conducted benchmarks under various LAN and WAN environments, and observed the following results: 1) given sufficient communication bandwidth, Ninf performance quickly overtakes client local performance; 2) current supercomputers are sufficient platforms for supporting Ninf and similar systems in terms of performance and OS fault resiliency; 3) for a vector-parallel machine (Cray J90), employing an optimized data-parallel library is a better choice than the conventional task-parallel execution employed for non-numerical data servers; 4) computationally intensive tasks such as EP can readily be supported under the current Ninf infrastructure; and 5) for communication-intensive applications such as Linpack, server CPU utilization dominates LAN performance, while communication bandwidth dominates WAN performance, and furthermore, aggregate bandwidth could be sustained for multiple clients located at different Internet sites. As a result, distributing multiple tasks to computing servers on different networks would be essential for achieving higher client-observed performance.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013060/
Ninflet: A Global Parallel Computing Environment in Java

高木浩光, 松岡聡, 中田秀基, 関口智嗣, 佐藤三久, 長嶋雲兵

IPSJ Symposium Proceedings 98 ( 7 ) 135 - 142 1998.6

　More details

Language：Japanese

J-GLOBAL

researchmap
Interactive Beautification : A Technique for Rapid Geometric Design (Special Issue on Next Generation Human Interface and Interaction)

IGARASHI Takeo, MATSUOKA Satoshi, KAWACHIYA Sachiko, TANAKA Hidehiko

Transactions of Information Processing Society of Japan 39 ( 5 ) 1373 - 1384 1998.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Drawing diagrams with conventional computer-assisted drawing editors often takes a considerable amount of time despite their seeming ease of use. The causes are an excess of commands and the unintuitive procedures needed to satisfy geometric constraints. To solve this problem, we propose interactive beautification, a technique for rapid geometric design, and developed a prototype system, Pegasus, to verify the efficiency of the technique. An interactive beautification system receives the user's free stroke and beautifies it, taking into account geometric constraints among segments. Using the technique, the user can rapidly draw precise diagrams with geometric relations without using any editing commands. The current prototype system supports drawings composed of straight lines, and a user study was performed using the prototype system, a commercial CAD system, and an OO-based drawing system. The results showed that users can draw the required diagrams more rapidly and more precisely using the prototype system.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013100/
Essay: Java Versus Programming Language Research

MATSUOKA Satoshi

IPSJ Magazine 39 ( 4 ) 301 - 301 1998.4

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
The Frontiers of Java-based Frameworks : Applications to Metacomputing

TAKAGI Hiromitsu, MATSUOKA Satoshi

IPSJ Magazine 39 ( 4 ) 302 - 305 1998.4

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Implementation of Global Numerical Information Database Server system "NinfDB"

56 ( 0 ) 258 - 259 1998.3

　More details

Language：Japanese

CiNii Books

researchmap
Evaluation of Implicit Co-scheduling on Clustered Parallel Computer

FUKUCHI KENTAROU, MATSUOKA SATOSHI, HORI ATSUSHI, ISHIKAWA YUTAKA

IPSJ SIG Notes 128 ( 18 ) 43 - 48 1998.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Implicit co-scheduling is a parallel job scheduling methodology proposed by the UC Berkeley NOW project, and embodies favorable characteristics such as the lack of global schedulers, low overhead, and easy implementation. Previous literature has claimed that the overhead versus traditional gang schedulers was about a factor of 0.6 to 1.6; however, evaluations were not performed using real-life workloads. We have implemented an implicit co-scheduler on a large-scale, high-performance cluster, and used the NAS Parallel Benchmarks to measure effective performance. We found that for FT and CG, the overhead versus gang scheduling can be as high as a factor of 2.3, contradicting the Berkeley results. We conjecture that this is due to excessive network traffic, but are still in the process of performing additional experiments.

CiNii Books

researchmap
Ninf and PM: Communication libraries for global computing and high-performance cluster computing

M Sato, H Tezuka, A Hori, Y Ishikawa, S Sekiguchi, H Nakada, S Matsuoka, U Nagashima

FUTURE GENERATION COMPUTER SYSTEMS 13 ( 4-5 ) 349 - 359 1998.3

　More details

Language：English

DOI： 10.1016/S0167-739X(97)00036-8

Web of Science

J-GLOBAL

researchmap
Ninflet: a Migratable Parallel Objects Framework using Java

Hiromitsu Takagi, Satoshi Matsuoka, Hidemoto Nakada, Satoshi Sekiguchi, Mitsuhisa Satoh, Umpei Nagashima

ACM 1998 Workshop on Java for High-Performance Network Computing 151 - 159 1998

　More details

researchmap
Pegasus: A Drawing System for Rapid Geometric Design

Takeo Igarashi, Satoshi Matsuoka, Sachiko Kawachiya, Hidehiko Tanaka

CHI'98 Summary (ACM Conference on Human Factors in Computing Systems) 24 - 25 1998

　More details

Publisher：ACM

DOI： 10.1145/286498.286511

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/chi/chi1998a.html#IgarashiKTM98
Popup Vernier: A Tool for Sub-Pixel-Pitch Dragging with a Smooth Mode Transition

Yuji Ayatsuka, Satoshi Matsuoka, Jun Rekimoto

Proceedings of ACM Symposium on User Interface Software and Technology (UIST'98) 39 - 48 1998

　More details

Publisher：ACM

DOI： 10.1145/288392.288407

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/uist/uist1998.html#AyatsukaRM98
Utilizing the Metaserver Architecture in the Ninf Global Computing System

Hidemoto Nakada, Hiromitsu Takagi, Satoshi Matsuoka, Umpei Nagashima, Mitsuhisa Sato, Satoshi Sekiguchi

Proc. High-Performance Computing and Networking '98, Springer LNCS 1401 607 - 616 1998

　More details

researchmap
OpenJIT ---A Reflective Java JIT Compiler

S. Matsuoka, H. Ogawa, K. Shimura, Y. Kimura, K. Hotta, H. Takagi

Proc. OOPSLA '98 Workshop on Reflective Programming in C++ and Java 16 - 20 1998

　More details

researchmap
Layered penumbrae: An effective 3D feedback technique

Y Ayatsuka, S Matsuoka, J Rekimoto

3RD ASIA PACIFIC COMPUTER HUMAN INTERACTION, PROCEEDINGS 202 - 209 1998

　More details

Language：English

Web of Science

researchmap
A Constraint-Based Approach for Visualization and Animation

Shin Takahashi, Satoshi Matsuoka, Ken Miyashita, Hiroshi Hosobe, Tomihisa Kamada

Constraints 3 ( 1 ) 61 - 86 1998

　More details

Language：English Publisher：Kluwer Academic Publishers

DOI： 10.1023/A:1009708715411

Scopus

researchmap
Pegasus: a drawing system for rapid geometric design.

Takeo Igarashi, Sachiko Kawachiya, Hidehiko Tanaka, Satoshi Matsuoka

CHI 98 Conference Summary on Human Factors in Computing Systems 24 - 25 1998

　More details

Publisher：ACM

DOI： 10.1145/286498.286511

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/chi/chi1998a.html#IgarashiKTM98
Utilizing the metaserver architecture in the Ninf global computing system

H Nakada, H Takagi, S Matsuoka, U Nagashima, M Sato, S Sekiguchi

HIGH-PERFORMANCE COMPUTING AND NETWORKING 1401 607 - 616 1998

　More details

Language：English

Web of Science

researchmap
A performance evaluation model for effective job scheduling in global computing systems

K Aida, A Takefusa, H Nakada, S Matsuoka, U Nagashima

SEVENTH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING - PROCEEDINGS 352 - 353 1998

　More details

Language：English

Web of Science

researchmap
Reduction of overhead in drawing figures with computer - Detailed analyses of drawing tasks

S Kawachiya, T Igarashi, S Matsuoka, H Tanaka

3RD ASIA PACIFIC COMPUTER HUMAN INTERACTION, PROCEEDINGS 11 - 18 1998

　More details

Language：English

Web of Science

researchmap
A Report on Grandprix for Java, 1997

MATSUOKA Satoshi

IPSJ Magazine 38 ( 12 ) 1093 - 1098 1997.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00003323/
ABCL/EM-4 : An Implementation and Evaluation of a Concurrent Object-oriented Language System on a Data-driven Parallel Computer

YASUGI Masahiro, MATSUOKA Satoshi, YONEZAWA Akinori

Transactions of Information Processing Society of Japan 38 ( 9 ) 1790 - 1799 1997.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Concurrent object-oriented computing provides modeling power and natural MIMD parallelism through concurrency of objects. Unfortunately the high costs of inter-node message passing and intra-node scheduling make the implementation of concurrent object-oriented languages inefficient. To overcome these problems, we have proposed a new software/hardware architecture (ABCL/EM-4) which realizes efficient parallel execution of programs based on a concurrent object-oriented computation model. Our ABCL/EM-4 achieved high performance with a combination of simple and fast hardware mechanisms and sophisticated software design, where the cost of a remote message-passing and/or a context-switch can be almost comparable to that of a sequential procedure call. This paper shows the evaluation results with the developed ABCL/ST compiler on the data-driven parallel computer EM-4.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013334/
The Plan-Do Style Compilation Technique for Eager Data Transfer in Thread-based Execution and Its Evaluation

YASUGI Masahiro, MATSUOKA Satoshi, YONEZAWA Akinori

Transactions of Information Processing Society of Japan 38 ( 9 ) 1840 - 1848 1997.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Plan-Do compilation technique is a new, advanced compilation framework for eager data transfer on distributed-memory parallel architectures. The technique is especially effective for a recent breed of fine-grain architectures by realizing a high-throughput low-latency communication scheme, pipelined sends. The compilation of high-level, plan-do style code into low-level, eager data transfer code is achieved via straightforward application of the translation function. Benchmark results on a real parallel architecture, EM-4, with the developed ABCL/ST compiler exhibit good performance.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013338/
Preliminary Study of Global Job Scheduling for Ninf : a High-Performance Global Computing System

OGAWA HIROTAKA, TAKEFUSA ATSUKO, NAKADA HIDEMOTO, AIDA KENTO, MATSUOKA SATOSHI

IPSJ SIG Notes 67 ( 75 ) 85 - 90 1997.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The rapid increase in the speed and availability of global networks may make global supercomputing possible, including our Ninf system. However, the performance characteristics of these systems have been little investigated, especially in multi-client, multi-site situations. To establish a methodology for effectively scheduling multiple job requests onto multiple computational servers while guaranteeing performance for each client, we conducted benchmarks under various WAN environments. We observed that communication bandwidth dominated performance for communication-intensive applications such as Linpack, and that aggregate bandwidth could be sustained for multiple clients located at different Internet sites. Based on these observations, we proposed a simulation model based on queuing theory. We also performed preliminary benchmarks using our scheduling server, named Metaserver.

CiNii Books

researchmap
Wide-Area Distributed Parallel Computing with Ninf

中田秀基, 高木浩光, 松岡聡, 長嶋雲兵, 佐藤三久, 関口智嗣

Parallel Processing Symposium Proceedings 1997 281 - 288 1997.5

　More details

Language：Japanese

J-GLOBAL

researchmap
A Constraint Drawing System Combining Dexterity and Precision

54 ( 0 ) 425 - 426 1997.3

　More details

Language：Japanese

CiNii Books

researchmap
Network Numerical Information System Ninf : Performance for Multi-Clients

TAKEFUSA ATSUKO, OGAWA HIROTAKA, MATSUOKA SATOSHI, NAKADA HIDEMOTO, SATO MITSUHISA, SEKIGUCHI SATOSHI, NAGASHIMA UMPEI

IPSJ SIG Notes 65 ( 21(HPC-65) ) 3 - 8 1997.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

To establish a basis for globally distributed parallel computing in numerical computing, we are currently working on the Ninf (Network based Information Library towards a Globally High Performance Computing) software system. To evaluate the Ninf system, we ran the Linpack Benchmark with Ninf on a Cray J90 vector-parallel supercomputer, a DEC Alpha cluster of workstations, and Sun workstations. The results demonstrate the utility and robustness of the Ninf system, and show that multicomputers such as vector-parallel computers and MPPs can effectively support network information services via Ninf.

CiNii Books

J-GLOBAL

researchmap
Supporting Multiple Parallel Programming Styles with MPC++ and their Performance

NIKAMI ATSUYUKI, MATSUOKA SATOSHI, ISHIKAWA YUTAKA, SATOH MITSUHISA

IPSJ SIG Notes 65 ( 21 ) 57 - 62 1997.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

For parallel processing to become general, its underlying basis should be advanced commodity technology, and parallel (programming) languages are no exception. On the other hand, parallel languages must also satisfy requirements that stem inherently from parallel processing, such as support for a wide range of parallel programming styles, ease of programming, and high performance. We investigated whether existing object-oriented languages satisfy such requirements by showing that C++ can support a wide range of parallel programming styles without special language extensions. More concretely, based on MPC++, a parallel dialect of C++ extended using only templates and inheritance, we created a class/template library that supports three major kinds of parallel programming styles. We tested its performance on a workstation cluster with representative benchmark programs for each programming style.

CiNii Books

researchmap
Global Parallel Computation using Ninf

NAKADA HIDEMOTO, TAKAGI HIROMITSU, MATSUOKA SATOSHI, NAGASHIMA UMPEI, SATOH MITSUHISA, SEKIGUCHI SATOSHI

IPSJ SIG Notes 65 ( 21(HPC-65) ) 9 - 14 1997.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Distributed computing using message passing libraries in a LAN (Local Area Network) environment is already accepted as an effective supercomputing methodology. On the other hand, although distributed computing in a WAN (Wide Area Network) environment is becoming practical thanks to the recent development of high-speed network facilities, a software framework for supercomputing in a WAN is yet to be established. We propose 'Ninf', a distributed computing framework for globally distributed computing environments. Ninf enables parallel computing in a WAN based on the macro dataflow model, and facilitates automatic dynamic load distribution and scheduling. Ninf has the following advantages over using existing message passing libraries for WAN supercomputing: (1) a communication protocol suited to globally distributed environments, (2) ease of programming, (3) reuse of existing libraries, and (4) integration with existing data resources on the Internet.

CiNii Books

J-GLOBAL

researchmap
Ninflet: A World-Wide High Performance Computing Environment in Java (Proceedings of Internet Conference '97) -- (Session 3 (Application) [in Japanese])

高木浩光, 松岡聡, 中田秀基

Internet Conference Proceedings ( 1997 ) 133 - 147 1997

　More details

Language：Japanese Publisher：Japan Society for Software Science and Technology, Internet Technology Research Group [et al.]

CiNii Books

researchmap
Multi-client LAN/WAN Analysis of Ninf : a High-Performance Global Computing System

SATOSHI MATSUOKA

Proceedings of IEEE Supercomputing '97, San Jose, CA 1997

　More details

DOI： 10.1145/509593.509615

J-GLOBAL

researchmap
Interactive Beautification : A Technique for Rapid Geometric Design

SATOSHI MATSUOKA

Proceedings of ACM Symposium on User Interface Software and Technology (UIST'97), Banff, Canada 1997

　More details

researchmap
A Methodology for Specifying Data Distribution using only Standard Object-Oriented Features

SATOSHI MATSUOKA

Proceedings of ACM/IEEE International Conference on Supercomputing (ICS'97), Vienna, Austria 116 - 123 1997

　More details

researchmap
Development of NinfCalc: Network Based Table Calculator for Matrix

53 ( 0 ) 467 - 468 1996.9

　More details

Language：Japanese

CiNii Books

researchmap
Ninf : Network Based Information Library for Global World-Wide Computing Infrastructure : the Software Architecture and its Performance

SEKIGUCHI SATOSHI, NAKADA HIDEMOTO, SATO MITSUHISA, NAGASHIMA UMPEI, MATSUOKA SATOSHI

IPSJ SIG Notes 62 ( 81(HPC-62) ) 153 - 158 1996.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

To establish a framework for information sharing over the Internet, we have proposed Ninf, a Network based information library for high performance computing. Ninf is based on the server-client model: servers residing on the network handle information resources, either numerical executables or scientific constants, and clients are programmed by users with the Ninf client library, which establishes RPC connections to the servers. In this article, we give an overview of the Ninf software system, followed by preliminary results on the Linpack Benchmark with Ninf-RPC. These results lead to the conclusion that network computing with Ninf is feasible even when the granularity of the program is rather small.

CiNii Books

J-GLOBAL

researchmap
Parallel Programming using Parallel STL

NAKADA HIDEMOTO, SATOH MITSUHISA, MATSUOKA SATOSHI, ISHIKAWA YUTAKA, MATSUDA MOTOHIKO

IPSJ SIG Notes 96 ( 82 ) 85 - 90 1996.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

There are several works on parallel processing in C++ using template libraries. In this paper, we discuss template technology as an interface for parallel programming. A parallel template library allows us to write parallel programs without regard to the target machine configuration. A data-parallel template library enables optimization using data locality. A task-parallel template library provides methods for distributing workloads.

CiNii Books

researchmap
Ninf API for Distributed Memory Multiprocessors

OGAWA HIROTAKA, MATSUOKA SATOSHI, NAKADA HIDEMOTO, SATO MITSUHISA, SEKIGUCHI SATOSHI

IPSJ SIG Notes 62 ( 81(HPC-62) ) 159 - 164 1996.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

To establish a basis for globally distributed parallel computing in numerical computing, we are currently working on the Ninf (Network based Information library for High Performance Computing) software system. When this system is used on distributed memory multiprocessors, the Ninf core server runs on a single node such as a front-end machine or an I/O processor. As a result, computation data concentrates on that node, which easily becomes a bottleneck and might even exhaust its memory resources. To prevent this problem, we propose a new common API for describing initial data distributions and a mechanism for successively handing off the connection with the Ninf client to the target node. We conducted a preliminary evaluation of our hand-off mechanism on Fujitsu's AP1000. The results show that it improves total execution time.

CiNii Books

J-GLOBAL

researchmap
Implementing MPI in a High-Performance, Multithreaded Language MPC++

O'CARROLL FRANCIS B., HORI ATSUSHI, TEZUKA HIROSHI, ISHIKAWA YUTAKA, MATSUOKA SATOSHI

IPSJ SIG Notes 62 ( 81 ) 141 - 146 1996.8

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

We have ported the MPICH implementation of MPI to the high-performance, multithreaded programming language MPC++. We discuss our modifications to the design of MPICH to support multiple threads. MPICH now runs experimentally on top of MPC++ on a Sun workstation cluster connected by Myrinet and achieves higher performance than standard MPICH on Myrinet TCP/IP on the same hardware.

CiNii Books

researchmap
A Pen-Based Constraint Drawing System Combining Dexterity and Precision

IGARASHI Takeo, KAWACHIYA Sachiko, MATSUOKA Satoshi, TANAKA Hidehiko

IPSJ SIG Notes 81 ( 77 ) 85 - 90 1996.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Traditional sketching on computers has lacked the freedom of real pens. Our drawing editor combines the dexterity of real pens with computer-assisted precision, based on techniques such as two-phased sketch interaction with pie menus and sliders, beautification with perceptual constraints, and segment-based drawing representation. A prototype implementation on an IBM pen PC and a Xerox Liveboard has shown that the system is fast and easy to use.

CiNii Books

researchmap
A Compilation Technique for Parallel Reflective Language Systems Using Partial Evaluation

MASUHARA Hidehiko, MATSUOKA Satoshi, YONEZAWA Akinori

Transactions of Information Processing Society of Japan 37 ( 7 ) 1290 - 1298 1996.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Meta-programmability of parallel reflective language systems is beneficial for parallel applications to describe optimizations, etc. On the other hand, their execution model based on interpretation is an obstacle to efficient implementation. We propose a compilation technique for parallel reflective languages using partial evaluation. The technique, which effectively eliminates program interpretation, includes partial evaluation extended for side-effects, and several program transformation techniques. Benchmarks on a MPP show that parallel applications with meta-level optimizations can be executed with small overhead.

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00013611/
UbiquitousLinks : Hypermedia Links Embeded in the Real World

AYATSUKA Yuji, REKIMOTO Jun, MATSUOKA Satoshi

67 23 - 30 1996.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Progress in hardware technology has made computers portable, so that they can now mediate between users and the real world. At the same time, the World Wide Web (WWW) has become widely available. Although information related to objects in the real world exists on many WWW sites, these relations are unclear. This paper proposes a hypermedia system that links real-world objects with information on the WWW. We use IDs attached to real-world objects and a portable computer connected to the Internet via wireless networks. This combination makes it possible to retrieve information on the WWW from real-world objects.

CiNii Books

researchmap
A Commentary on Program Language Design and Implementation Research at ICOT-Honoring our Great Predecessors

MATSUOKA Satoshi

IPSJ Magazine 37 ( 5 ) 407 - 410 1996.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Analysis of the structures of diagrams created in drawing editors

52 ( 0 ) 89 - 90 1996.3

　More details

Language：Japanese

CiNii Books

researchmap
A Meta Server Architecture for Ninf : Networked Information Library for High Performance Computing

NAKADA HIDEMOTO, HUSANO TAKAYUKI, MATSUOKA SATOSHI, SATOH MITSUHISA, SEKIGUCHI SATOSHI

IPSJ SIG Notes 60 ( 22(HPC-60) ) 77 - 82 1996.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

To establish a framework for information sharing in the numerical computation area, we have proposed Ninf, a network-based information library for high performance computing. In this paper, we present the Meta Server architecture, which is a component of the Ninf system. The Meta Server stands between the Server and the Client and hides the Server from the Client. It also enables easy distributed concurrent computation.

CiNii Books

J-GLOBAL

researchmap
OMPI : Optimizing MPI programs using Partial Evaluation

SATOSHI MATSUOKA

Proc. IEEE/ACM Supercomputing '96, Pittsburgh, PA, IEEE Society Press, 1996 (proceedings in CD-ROM). 1996

　More details

researchmap
COMPILING AWAY THE META-LEVEL IN OBJECT-ORIENTED CONCURRENT REFLECTIVE LANGUAGES USING PARTIAL EVALUATION

H MASUHARA, S MATSUOKA, K ASAI, A YONEZAWA

SIGPLAN NOTICES 30 ( 10 ) 300 - 315 1995.10

　More details

Language：English

DOI： 10.1145/217839.217869

Web of Science

J-GLOBAL

researchmap
Extension to a Parallel Constraint Logic Programming Language for Applications in Optimization Problems.

51 ( 0 ) 77 - 78 1995.9

　More details

Language：Japanese

CiNii Books

researchmap
Report on the Object-Oriented '95 Symposium (OO '95)

AOYAMA Mikio, NISHIOKA Kenji, KISHI Tomoji, UEHARA Sanya, MATSUOKA Satoshi, CHUSHO Takeshi, FUKAZAWA Yoshiaki

IPSJ SIG Notes 105 ( 84 ) 89 - 97 1995.9

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The Object-Oriented '95 Symposium was held on June 1-2, 1995 at the Mita Campus of Keio University in Tokyo. Under the theme of "Theory and Practice of Object-Oriented Systems Development", opening speeches, tutorials, general sessions, and a panel session covered a wide spectrum of development technologies based on object orientation. This report highlights the major topics of the symposium as well as two special sessions: one session presented experience with object-oriented systems development, and the other was a panel on the theory and practice of object-oriented development technology.

CiNii Books

researchmap
Hierarchical Collection : a Simple Scheme for the Separation of Parallelism and Distribution

SATO Naohito, MATSUOKA Satoshi, YONEZAWA Akinori

IPSJ SIG Notes 57 ( 81 ) 37 - 42 1995.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Separation of parallelism and distribution is one of the major concerns in efficient massively parallel computation. The details of distribution should be hidden from users of parallel/distributed class frameworks, but should be easily modifiable by (library) programmers. We have proposed a new scheme for building object-oriented parallel distributed class frameworks based on a simple but mathematically disciplined model called a hierarchy of collections. Based on this model, classes can be easily derived to achieve high-performance massively parallel computation on a variety of physical platforms. We describe in detail how to define hierarchical collections for typical examples of distributions.

CiNii Books

researchmap
Evaluation of MPI Optimization Method by Eliminating Software Overhead

OGAWA Hirotaka, MATSUOKA Satoshi

IPSJ SIG Notes 57 ( 81 ) 13 - 18 1995.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

In a generic implementation of MPI (the standard message-passing library interface for parallel computers), dynamic data types and communication contexts must be supported extensively. As a result, the software overhead per communication becomes very large, despite the inherently low-latency communication performance of the target architecture. In this paper, we propose a method of generating optimized MPI programs by 1) analyzing an MPI program statically and 2) specializing the program using static information to eliminate software overhead. We give a preliminary evaluation of the basic communication performance of this method on Fujitsu's AP1000. The results show that simple static analysis decreases the overhead from 338μsec to 76μsec, and greatly improves both latency and throughput.

CiNii Books

researchmap
Meta-Level Architecture of ABCL/f and its Use in Parallel Programs

Masuhara Hidehiko, Matsuoka Satoshi, Yonezawa Akinori

IPSJ SIG Notes 95 ( 82 ) 65 - 72 1995.8

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Meta-level programming via computational reflection has come to be recognized as beneficial for parallel applications. Whether we can clearly program practical meta-programs greatly depends on the design of the language's meta-architecture. This paper presents a design of the meta-architecture of ABCL/f, an object-oriented concurrent reflective language. Its features are customization via the meta-interpreters and the meta-objects, annotations that serve as directives that are implemented by the meta-programs, re-use via inheritance in meta-programs, etc. The effectiveness of the architecture is examined through examples from several parallel programs.

CiNii Books

researchmap
Adaptive Recognition of Implicit Structures in Human-Organized Layouts

IGARASHI Takeo, MATSUOKA Satoshi, MASUI Toshiyuki

61 ( 70 ) 33 - 38 1995.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

Card handling in a hypertext editor can be a powerful methodology for understanding complex problems. To support such activity, recognizing implicit structure in the arrangement of cards would be useful. However, because these structures are by nature ambiguous and highly dependent on user-specific perception, it is difficult for conventional rule-based spatial parsing algorithms to achieve this task. We propose techniques for building a spatial parser suitable for finding such ambiguous structures based on the mechanics of human perception. Moreover, our parser is adaptively customized to reflect a particular user's preferences through an interactive suggestion process, supported by a genetic algorithm.

CiNii Books

researchmap
Editor's Message to Special Session on Parallel Processings

IPSJ Journal 36 ( 7 ) 1503 - 1503 1995.7

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

CiNii Books

researchmap
Evaluation of performance and reliability of client-server system coded by TCP/IP and PVM for Ninf system

IIOKA Mie, NII Yukako, NAGASHIMA Umpei, SEKIGUCHI Satoshi, SATO Mitsuhisa, MATSUOKA Satoshi, HOSOYA Haruo

IPSJ SIG Notes 55 ( 28(HPC-55) ) 81 - 88 1995.3

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

A prototype of a numerical information library system based on a high performance wide area network (Ninf) was developed to evaluate the performance and reliability of communication using TCP/IP and PVM. Although a client-server system built on TCP/IP is almost 10 times faster than one built on PVM, its programming cost is quite high because much of the overhead and error handling must be coded by the user. On the other hand, a client-server program is easy to write using PVM.

CiNii Books

J-GLOBAL

researchmap
Interactive Generation of Graphical User Interfaces by Multiple Visual Examples

MIYASHITA Ken (Information & Communication System Research Division, Research Center, SONY Co. Ltd.), MATSUOKA Satoshi (Department of Mathematical Engineering, Faculty of Engineering, University of Tokyo), TAKAHASHI Shin (Department of Information Science, Faculty of Science, University of Tokyo), YONEZAWA Akinori (Department of Information Science, Faculty of Science, University of Tokyo)

11 ( 6 ) 41 - 51 1994.11

　More details

Language：Japanese

CiNii Books

researchmap
Constructing Algorithm Animations via Declarative Specifications

TAKAHASHI Shin (Department of Information Science, The University of Tokyo), MIYASHITA Ken (Department of Information Science, The University of Tokyo; present address: Research Center, Sony Co.), MATSUOKA Satoshi (Department of Mathematical Engineering, The University of Tokyo), YONEZAWA Akinori (Department of Information Science, The University of Tokyo)

11 ( 6 ) 83 - 94 1994.11

　More details

Language：Japanese

CiNii Books

researchmap
An Object-Oriented Concurrent Reflective Language for Dynamic Resource Management in Highly Parallel Computing

Masuhara Hidehiko, Matsuoka Satoshi, Yonezawa Akinori

94 ( 65 ) 57 - 64 1994.7

　More details

Language：English Publisher：Information Processing Society of Japan (IPSJ)

Irregular parallel applications, whose data and communication patterns are determined only at run time, often require good dynamic resource management (DRM) tailored to the application and/or hardware architecture for efficient execution. To make such DRM systems easy to provide, this paper proposes an object-oriented concurrent reflective language, ABCL/R3. In ABCL/R3, various DRM systems, including scheduling, object allocation, and load balancing, can be realized by modifying or extending the abstracted meta-level of the language in an encapsulated way. This paper also presents a preliminary evaluation of the language, including the basic cost of reflection and a simple DRM system, developed in a prototype system running on the AP1000 multicomputer.

CiNii Books

researchmap
Design and Implementation of an Object-Oriented Concurrent Reflective Language ABCL/R2.

MASUHARA Hidehiko (Department of Information Science, The University of Tokyo), MATSUOKA Satoshi (Department of Information Science, The University of Tokyo; present address: Department of Mathematical Engineering, The University of Tokyo), WATANABE Takuo (Department of Information Science, The University of Tokyo; present address: School of Information Science, Japan Advanced Institute of Science and Technology)

11 ( 3 ) 15 - 32 1994.5

　More details

Language：Japanese

CiNii Books

researchmap
The Implementation of a Reflective Object-Oriented Concurrent Language without a Run-time Kernel.

ICHISUGI Yuuji (Electrotechnical Laboratory), MATSUOKA Satoshi (Department of Mathematical Engineering, The University of Tokyo), YONEZAWA Akinori (Department of Information Science, The University of Tokyo)

11 ( 3 ) 65 - 77 1994.5

　More details

Language：Japanese

CiNii Books

researchmap
The Plan-Do Style Compilation Technique for Eager Data Transfer in Thread-Based Execution

Yasugi M., Matsuoka S., Yonezawa A.

94 ( 65 ) 9 - 16 1994

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

The plan-do compilation technique is a new, advanced compilation framework for eager data transfer on distributed-memory parallel architectures. The technique is especially effective for a recent breed of fine-grained architectures because it realizes a high-throughput, low-latency communication scheme, pipelined sends. The compilation of high-level, plan-do style code into low-level, eager data transfer code is achieved via straightforward application of a set of translation rules. Preliminary low-level benchmark results on a real parallel architecture, EM-4, exhibit good speedups.

CiNii Books

researchmap
An Efficient Implementation of Concurrent Object-Oriented Languages on Multicomputers

TAURA Kenjiro, MATSUOKA Satoshi, YONEZAWA Akinori

7 39 - 42 1993.7

　More details

Language：Japanese

CiNii Books

researchmap
IMSA'92 International Workshop on Reflection

MATSUOKA Satoshi (Department of Information Science, Faculty of Science, University of Tokyo), MASUHARA Hidehiko (Department of Information Science, Faculty of Science, University of Tokyo)

Computer Software 10 ( 4 ) 76 - 82 1993.7

　More details

Language：Japanese Publisher：Japan Society for Software Science and Technology

CiNii Books

researchmap
Operational Semantics and Implementation of Multilisp

ASAI Kenichi, MATSUOKA Satoshi, YONEZAWA Akinori

Proceedings of the National Convention 41 ( 0 ) 8 - 9 1990.9

　More details

Language：Japanese

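In recent years, parallel Lisp has attracted attention as a way to draw out much of the latent parallelism of functional languages. Many parallel Lisps, including Multilisp [3], Multischeme [6], Mul-T [5], and Qlisp [2], have been developed, and high performance has been reported on parallel computers. However, because parallel Lisps have so far aimed at improving performance on parallel computers, discussion has focused almost exclusively on performance, and the semantics of the languages has received little consideration. As a result, language specifications become ambiguous and are difficult to change. This in turn fixes the scheduling scheme and ultimately makes computational reflection [8] hard to realize. We therefore give an operational semantics [1] for Multilisp and use it to build an interpreter in Scheme [7] on a sequential computer. Based on this, we further give a denotational semantics. We also describe the interaction between future and call/cc that follows from that description.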

CiNii Books

researchmap
On Synchronization Constraints and Inheritance in Concurrent Object-Oriented Languages

MATSUOKA Satoshi, YONEZAWA Akinori

Proceedings of the National Convention 41 ( 0 ) 28 - 29 1990.9

　More details

Language：English

In developing large-scale programs with object-oriented concurrent programming (OOCP) languages, we generally acknowledge that inheritance is one of the most essential features. However, it has been previously pointed out that inheritance and synchronization constraints in concurrent object systems often conflict with each other. For this reason, some languages such as ABCL/1 [13] do not employ inheritance. Although several solutions [3, 4, 7, 10, 12] have been proposed in the past, we argue that, unfortunately, most of the proposals render inheritance totally useless.

CiNii Books

researchmap
On Introducing Safe Inheritance into Concurrent Object-Oriented Languages

WAKITA Ken, MATSUOKA Satoshi

Proceedings of the National Convention 41 ( 0 ) 26 - 27 1990.9

　More details

Language：Japanese

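Since the difficulty of inheriting descriptions of synchronization constraints in concurrent object-oriented languages was pointed out, various languages have devised ways of describing such constraints. Most of them express a synchronization constraint as a set of acceptable messages. This paper points out problems with that approach and, as a solution, presents a way of expressing synchronization constraints as logical formulas. It further proposes implementing programs expressed in this way by means of program transformation.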

CiNii Books

researchmap
On Formal Treatment of Interactive Graphics

39 ( 0 ) 846 - 847 1989.10

　More details

Language：Japanese

CiNii Books

researchmap


Presentations

Distributed Diskless Checkpoint for Large Scale Systems

10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2010) 2010

　More details

researchmap
HPC in the Cloud---A Hype, the End of SCs, or Peaceful Coexistence?

2010

　More details

researchmap
Auto-Tuning of a Scientific Application on GPU clusters

IPSJ SIG Technical Report 2010

　More details

researchmap
Dynamic Optimization of Large-Scale Data Broadcast in Cloud Environments

Symposium on High Performance Computing and Computational Science (HPCS2010) 2010

　More details

researchmap
Improving the Large-Scale Data Access Using Virtual Machine Migration

2010

　More details

researchmap
Performance Evaluation of Software Framework for Memory Fault Tolerance in GPU Accelerators

SIAM Conference on Parallel Processing and Scientific Computing (PP10), MS36: Trends and Experiences in Heterogeneous Many-core Computing 2010

　More details

researchmap
Accelerated Computing in TSUBAME 1.2/2.0

2010

　More details

researchmap
Automatic Optimization of Scientific Computing on GPU Clusters

HPC SIG Meeting 2010

　More details

researchmap
Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators

IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010) 2010

　More details

researchmap
A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs

24th IEEE International Parallel and Distributed Processing Symposium (IPDPS'10) 2010

　More details

researchmap
GPU Acceleration: a Fad or the Yellow Brick Road onto Exascale

2010

　More details

researchmap
An Expert System Supporting Resource Selection in Large-Scale Computer Systems

IPSJ SIG Technical Report 2009-HPC-124 2010

　More details

researchmap
Power-Saving Task Scheduling on GPU Clusters

124th HPC SIG Meeting 2010

　More details

researchmap
Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds

The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing 2010

　More details

researchmap
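Accelerating Large-Scale Data Access through Dynamic Virtual Machine Relocation

Proceedings of the IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS2010) 2010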

　More details

researchmap
Rise of the commodity vectors

2008 8th International Meeting High Performance Computing for Computational Science 2008

　More details

researchmap
An Efficient FFT Library Using Both CPUs and GPUs Based on a Performance Model

IPSJ Symposium on High Performance Computing and Computational Science (HPCS2008) 2008

　More details

researchmap
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

2008 ACM/IEEE conference on Supercomputing (SC08) 2008

　More details

researchmap
HPC-GPGPU: Large-scale commodity accelerated clusters and its application to advanced structural proteomics

AHeDD2008/IPAB2008 Joint Symposium 2008

　More details

Presentation type：Poster presentation

researchmap
Hundred million cores in commodity---Why not? (or, will `custom'*finally* prevail?)

CCGSC2008 2008

　More details

researchmap
Coupled-simulation e-science support in the NAREGI grid

IEEE Computer 2008

　More details

researchmap
Optimization of MPI Collective Communication Algorithms for Grid Environments in the Information Explosion Era

70th IPSJ National Convention 2008

　More details

researchmap
HPC-GPGPU: Large-scale commodity accelerated clusters and its application to advanced structural proteomics

Microsoft Science All-Hands-Meeting 2008

　More details

Presentation type：Poster presentation

researchmap
Proposal of a Fault-Tolerant MPI Framework for the Information Explosion

70th IPSJ National Convention 2008

　More details

researchmap
MPI Communication Algorithms on Optical Interconnects in the Information Explosion Era

70th IPSJ National Convention 2008

　More details

researchmap
Grid'BnB: A parallel branch & bound framework for grids

14th International Conference on High Performance Computing (HiPC) 2008

　More details

researchmap
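Evaluation of Parallel Programs on a Next-Generation Memory Architecture Implementing a Power-Saving Paging Method

IPSJ Symposium on High Performance Computing and Computational Science (HPCS2008) 2008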

　More details

researchmap
Supercomputing on Heterogeneous Architectures toward the Information Explosion Era

70th IPSJ National Convention 2008

　More details

researchmap
Large-Scale Data Management on Grid File Systems in the Information Explosion Era

70th IPSJ National Convention 2008

　More details

researchmap
Scalable and Autonomous Failure Analysis for the Information Explosion

70th IPSJ National Convention 2008

　More details

researchmap
Fast Virtual Cluster Construction through Model-Based Resource Selection in the Information Explosion Era

70th IPSJ National Convention 2008

　More details

researchmap
An efficient, model-based CPU-GPU heterogeneous FFT library

International Heterogeneity in Computing Workshop (HCW '08) 2008

　More details

researchmap
Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method

The Fourth Workshop on High-Performance 2008

　More details

researchmap
Massive supercomputing coping with heterogeneity of modern accelerators

IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008) 2008

　More details

researchmap
Locality aware MPI communication on a commodity opto-electronic hybrid network

Workshop on Large-Scale Parallel Processing (LSPP) 2008

　More details

researchmap
Supercomputer Operation Experience in the Information Explosion Era: On the TSUBAME Grid Cluster

70th IPSJ National Convention 2008

　More details

researchmap
Large-Scale Federated Interconnection Demonstration Experiment Using the NAREGI Grid Middleware

IPSJ SIG Technical Report 2008

　More details

researchmap
Index distribution technique for efficient search on unstructured peer-to-peer networks

2008

　More details

researchmap
A decentralized, scalable, and autonomous grid monitoring system

11th International Conference on Principles of Distributed Systems (OPODIS) 2008

　More details

researchmap
Model-based fault localization in large-scale computing systems

The 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS'08) 2008

　More details

researchmap
Index distribution technique for efficient search on unstructured peer-to-peer networks

The International Conference in Electrical Engineering/Electronics 2008

　More details

researchmap
Efficient Virtual Cluster Construction through Model-Based Resource Selection

IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS2008) 2008

　More details

researchmap
Optimization of MPI Scatter/Gather Communication Algorithms in Grid Environments

Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2008) 2008

　More details

researchmap
Implementation and Evaluation of GPU Memory Fault Tolerance Using Software ECC

Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2008) 2008

　More details

researchmap
High-Performance 3-D FFT in the CUDA Environment

IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS2008) 2008

　More details

researchmap
Time stamping authority grid

Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'08) 2008

　More details

researchmap
Parallel Numerical Computation by Self-Scheduling on Multiple Heterogeneous GPUs

IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS2008) 2008

　More details

Presentation type：Poster presentation

researchmap
Power-Saving Task Scheduling for Heterogeneous Computing Environments

IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS2008) 2008

　More details

Presentation type：Poster presentation

researchmap
Power-Saving Task Scheduling for Heterogeneous Computing Environments

Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2008) 2008

　More details

researchmap
Rise of the Commodity Vectors or Democratization of Supercomputing

NVISION2008 2008

　More details

researchmap
Access-pattern and bandwidth aware file replication algorithm in a grid environment

The 9th IEEE/ACM International Conference on Grid Computing (Grid 2008) 2008

　More details

researchmap
Environmental-aware optimization of MPI checkpointing intervals

The 2008 IEEE International Conference on Cluster Computing (Cluster 2008) 2008

　More details

Presentation type：Poster presentation

researchmap
Model-based Optimization for Data-Intensive Application on Virtual Cluster

The 2008 9th IEEE/ACM International Conference on Grid Computing (Grid 2008) 2008

　More details

Presentation type：Poster presentation

researchmap
Improving HPC Performance through Auxiliary Use of Optical Networks

Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2008) 2008

　More details

researchmap
File Placement Considering Access Patterns and Performance in Wide-Area Distributed File Systems

Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2008) 2008

　More details

researchmap
Toward Performance Model Construction and Optimization of a Data-Intensive Application Execution Environment Using Virtual Clusters

2008

　More details

Presentation type：Poster presentation

researchmap
Performance Model Construction and Optimization of a Data-Intensive Application Execution Environment Using Virtual Clusters

IPSJ SIG Technical Report 2008

　More details

researchmap
Parallel Numerical Computation by Self-Scheduling on Multiple GPUs

Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2008) 2008

　More details

researchmap
High-performance MPI broadcast algorithm for grid environments utilizing multi-lane NICs

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid'07) 2007

　More details

researchmap
Virtual clusters on the fly - fast, scalable, and flexible installation

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid'07) 2007

　More details

researchmap
Web-site-based partitioning techniques for efficient parallelization of the PageRank Computation

2007

　More details

researchmap
High-performance distributed solar computing (?) --- Towards a grid that computes like trees---

2007

　More details

researchmap
Data management on grid filesystem for data-intensive computing

SAINT 2007 Workshop on Middleware Architecture in the Internet 2007

　More details

researchmap
Peer-to-peer scheduling system with scalable information sharing protocol

SAINT 2007 Workshop on Middleware Architecture in the Internet 2007

　More details

researchmap
A peer-to-peer infrastructure for autonomous grid monitoring

The 3rd International Workshop on Hot Topics in Peer-to-Peer Systems at the International Parallel & Distributed Processing Symposium 2007 2007

　More details

researchmap
ABARIS: An adaptable fault detection/recovery component framework for MPIs

12th IEEE Workshop on Dependable Parallel 2007

　More details

researchmap
Overview of TSUBAME 1.2---The World's First GPU-Accelerated Large-Scale Supercomputer

SGI Seminar 2008

　More details

researchmap
The Latest TSUBAME System

IPAB Seminar 2008

　More details

researchmap
Mutual Job Invocation Experiments between NAREGI Middleware Beta and gLite

IPSJ SIG Technical Report 2007-HPC-109 (HOKKE2007) 2007

　More details

researchmap
A High-Performance Distributed Time Stamping Authority: Achieving One Million Timestamps Issued per Second

IPSJ SIG Technical Report 2007-HPC-109 (HOKKE2007) 2007

　More details

researchmap
MPI Collective Communication Algorithms Using Multiple Lanes in Grid Environments

IPSJ Symposium on High Performance Computing and Computational Science (HPCS2007) 2007

　More details

researchmap
Performance Evaluation of the Heterogeneous Supercomputer TSUBAME Using Linpack

2007 Symposium on High Performance Computing and Computational Science (HPCS2007) 2007

　More details

researchmap
Autonomically-adapting master-worker programming framework for multi-layered grid-of-clusters

HPC Asia 2007 2007

　More details

researchmap
Model-based resource selection for efficient virtual cluster deployment

2nd International Workshop on Virtualization Technology in Distributed Computing (VTDC'07) 2007

　More details

researchmap
Job invocation interoperability between NAREGI Middleware Beta and gLite

HPC Asia 2007 2007

　More details

researchmap
Proposal and Evaluation of ABARIS, a Fault-Tolerant MPI Framework That Takes Fault/Recovery Models into Account

IPSJ SIG Technical Report 2007-HPC-109 (HOKKE2007) 2007

　More details

researchmap
Proposal and Evaluation of an FFT Library Using Both CPUs and GPUs

IPSJ SIG Technical Report 2007-HPC-111 (SWOPP2007) 2007

　More details

researchmap
Evaluation of I/O Processing Using IP-SAN in Cluster Systems with Parallel Benchmarks

IPSJ SIG Technical Report 2007-HPC-111 (SWOPP2007) 2007

　More details

researchmap
An MPI Execution Environment across Multiple Sites Using Virtual Clusters

IPSJ SIG Technical Report 2007-HPC-109 (HOKKE2007) 2007

　More details

researchmap
Performance Evaluation of a Cache-Based Method for Fast Virtual Cluster Construction

IPSJ SIG Technical Report 2007-HPC-109 (HOKKE2007) 2007

　More details

researchmap
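A Distributed Time Stamping Authority Grid and Analysis of Its Parameter Dependence

IPSJ Symposium on Advanced Computing Systems and Infrastructures (SACSIS2007) 2007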

　More details

researchmap
Modeling and Optimization of Virtual Cluster Construction Time

IEICE Technical Report 2007-CPSY (SWOPP2007) 2007

　More details

researchmap
Evaluation of Power Savings for Parallel Programs Using Next-Generation Low-Power Memory

IPSJ SIG Technical Report 2007-HPC-111 (SWOPP2007) 2007

　More details

researchmap
Autonomous Failure Analysis in Distributed Systems

24th Conference of the Japan Society for Software Science and Technology 2007

　More details

researchmap
Evaluation of the Timestamp Issuance Scalability of a Distributed Time Stamping Grid on the Internet

IPSJ SIG Technical Report 2007-HPC-112, HPC Asia co-located workshop 2007

　More details

researchmap
Construction of a Time Stamping Grid and Basic Experiments

IEICE Technical Report 2007-CPSY (SWOPP2007) 2007

　More details

researchmap
Operational Experiments with a Distributed Time Stamping Grid on the Internet

IEICE Technical Report 2007-CPSY (SWOPP2007) 2007

　More details

researchmap
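Evaluation of MPI Communication Performance on Next-Generation Optical Interconnects

24th Conference of the Japan Society for Software Science and Technology (FY2007) 2007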

　More details

researchmap
Node Grouping for Large-Scale Data Management in Wide-Area Distributed Environments

IPSJ SIG Technical Report 2007

　More details

researchmap
Evaluation of MPI Applications on Next-Generation Optical Interconnects

IPSJ SIG Technical Report 2007-HPC-111 (SWOPP2007) 2007

　More details

researchmap
Multi-replication with intelligent staging in data-intensive grid applications

The 7th IEEE/ACM International Conference on Grid Computing 2006

　More details

researchmap
Speculative checkpointing

DSW 2006 2006

　More details

researchmap
Profile-based optimization of power-performance by using dynamic voltage scaling on a PC cluster

20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006) 2006

　More details

researchmap
Construction and Operation of the Grid Challenge Testbed

2006

　More details

researchmap
MegaProto/E: Power-aware high-performance cluster with commodity technology

20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006) 2006

　More details

researchmap
MPI Collective Communication in Optical Network Environments

159th Computer Architecture and 105th High Performance Computing Joint Research Meeting (HOKKE-2006) 2006

　More details

researchmap
A Scheduling System for Data-Intensive Applications Using a Replica Management System

159th Computer Architecture and 105th High Performance Computing Joint Research Meeting (HOKKE-2006) 2006

　More details

researchmap
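Accelerating a System for Building Job Execution Environments Using Virtual Machines on the Grid

159th Computer Architecture and 105th High Performance Computing Joint Research Meeting (HOKKE-2006) 2006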

　More details

researchmap
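A Distributed Job Scheduling System Using an Information Sharing Method for Large-Scale Environments

159th Computer Architecture and 105th High Performance Computing Joint Research Meeting (HOKKE-2006) 2006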

　More details

researchmap
The Flight of TSUBAME: The Vision of a "Supercomputer for Everyone" toward Petascale.

IPSJ SIG Technical Report 2006-HPC-107 2006

　More details

researchmap
Jojo2: A Hierarchical Grid Environment with a Dynamic Node Group Configuration Mechanism

Symposium on Advanced Computing Systems and Infrastructures SACSIS2006 2006

　More details

researchmap
An MPI Execution Environment on the Grid Using Virtual Machines

Symposium on Advanced Computing Systems and Infrastructures SACSIS2006 2006

　More details

researchmap
ORE Grid: A Fast Deployment Tool for Grid Execution Environments Using Virtual Machines

Symposium on Advanced Computing Systems and Infrastructures SACSIS2006 2006

　More details

researchmap
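Construction and Operation of the Grid Challenge Testbed: How to Build the Grid Challenge Testbed

Summer United Workshops on Parallel, Distributed and Cooperative Processing, Kochi (SWoPP2006) 2006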

　More details

researchmap
Building Virtual Clusters Using Virtual Machines and Virtual Networks

Summer United Workshops on Parallel, Distributed and Cooperative Processing, Kochi (SWoPP2006) 2006

　More details

researchmap
Proposal and Evaluation of Cuckoo FTMPI, a Fault-Tolerant MPI Framework That Takes Fault Models into Account

IEICE Technical Report 2006

　More details

researchmap
Performance Evaluation of the Heterogeneous Supercomputer TSUBAME Using Linpack

Summer United Workshops on Parallel, Distributed and Cooperative Processing, Kochi (SWoPP2006) 2006

　More details

researchmap
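The Flight of TSUBAME: The Vision of a "Supercomputer for Everyone" Toward Petascale

Summer United Workshops on Parallel, Distributed and Cooperative Processing, Kochi (SWoPP2006) 2006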

　More details

researchmap
Being "BYTES-oriented" in HPC leads to an Open Big Data/AI Ecosystem and Further Advances into the Post-Moore Era (Keynote Talk) Invited

Satoshi Matsuoka

2017 IEEE International Conference on Big Data 2017.12

　More details

researchmap
A Scheduling System for Data-Intensive Applications Using a Replica Management System

High Performance Computing and Computational Science Symposium 2006

　More details

researchmap
Converging HPC and Big Data / AI in an Open Public Infrastructure: Tokyo Tech. Tsubame3 and AIST ABCI Invited

Satoshi Matsuoka

The 19th IEEE International Conference on High Performance Computing and Communications (HPCC2017). 2017.12

　More details

researchmap
Energy Efficiency Gains From Software: Retrospectives and Perspectives (Panelist Talk) Invited

Satoshi Matsuoka

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17) 2017.11

　More details

researchmap
Multi-Replication with Intelligent Staging in Data-Intensive Grid Applications.

In The 7th IEEE/ACM International Conference on Grid Computing 2006

　More details

researchmap
Efficient Sparse General Matrix-Matrix Multiplication Algorithms for Many-Core Processors

Yusuke Nagasaka, Aydın Buluç, Ariful Azad, Akira Nukada, Satoshi Matsuoka

SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP'18) 2018.3

　More details

researchmap
Analysis of Failures in Large-Scale Distributed Systems

IEICE Technical Report DC2006-16 2006

　More details

researchmap
Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization

Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen

PDML19 @ ICPP2019 2019.8

　More details

researchmap
Data Management on a Grid File System for Data-Intensive Computing

Computer System Symposium (Compsys 2006) 2006

　More details

researchmap
Cambrian Explosion of Computing and Big Data in the Post-Moore Era

HPDC 2018 2018

　More details

researchmap
Multi-replication with intelligent staging in data-intensive grid applications

The 7th IEEE/ACM International Conference on Grid Computing 2006

　More details

researchmap
From Post-K to Cambrian Explosion of Computing and Big Data in the Post-Moore Era

HPC2018 - International Advanced Workshop, From Clouds and Big Data to Exascale and Beyond 2018

　More details

researchmap
You Don't Really Need Big Fat Switches Anymore --- Almost

IPSJ SIG Technical Report 2003-ARC-154, SWoPP 2003 2003

　More details

researchmap
Converging HPC and BD/AI: Tokyo Tech TSUBAME3.0 and AIST ABCI (Booth Talk at Nvidia Booth) Invited

Satoshi Matsuoka

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17) 2017.11

　More details

researchmap
Design and Implementation of Adaptive Partial Evaluation for the Java Language

6th Workshop on Systems for Programming and Applications (SPA 2003) 2003

　More details

researchmap
Blurring the Lines: High-End Computing and Data Science (Panelist Talk) Invited

Satoshi Matsuoka

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17) 2017.11

　More details

researchmap
A Run-Time Specialization System in the OpenJIT Compiler Framework

4th JSSST Workshop on Systems for Programming and Applications (SPA2001), March 2001 2001

　More details

researchmap
Converging HPC and BD/AI: Tokyo Tech TSUBAME3.0 and AIST ABCI (Booth Talk at Tokyo Tech Booth) Invited

Satoshi Matsuoka

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17) 2017.11

　More details

researchmap
Grid RPC meets Data Grid: Network Enabled Services for Data Farming on the Grid

Proceedings of IEEE Symposium on Cluster Computing and the Grid Brisbane, Australia, May 2001 (Invited Paper) 2001

　More details

researchmap
Converging HPC and BD/AI: Tokyo Tech TSUBAME3.0 and AIST ABCI (Booth Talk at DDN Booth) Invited

Satoshi Matsuoka

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17) 2017.11

　More details

researchmap
Japanese Computational Grid Research Project: NAREGI

Proceedings of the IEEE 2005

　More details

researchmap
Large-scale Dynamic Graph Processing on HPC Systems

Keita Iwabuchi, Roger Pearce, Maya Gokhale, Satoshi Matsuoka

Minisymposium @ SIAM 2017 2017.1

　More details

researchmap
Exploring User Level Burst Buffer on Public Cloud and HPC Invited

Satoshi Matsuoka

Dagstuhl Seminar: Challenges and Opportunities of User-Level File Systems for HPC 2017.5

　More details

researchmap
Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms

Proceedings of 8th IEEE International Symposium on High Performance Distributed Computing (HPDC8) 1999

　More details

researchmap
Converged Infrastructure for HPC and Big Data/AI Invited

Satoshi Matsuoka

AIST IMPULSE Consortium Seminar (3rd) 2017.10

　More details

researchmap
Results from Tsubame 3.0 - A 47 AI-PFLOPS System for HPC and AI Convergence Invited

Satoshi Matsuoka

HP-CAST29 2017.11

　More details

researchmap
Highly Efficient and Encapsulated Re-use of Synchronization Code in Concurrent Object-Oriented Languages Washington D. C., Sep. 1993.

Proceedings of ACM OOPSLA '93 1993

　More details

researchmap
FLOPS to BYTES: Accelerating Beyond Moore's Law is Data-Oriented Invited

Satoshi Matsuoka

PPAM2017 2017.9

　More details

researchmap
TSUBAME3/ABCI and AI Invited

Satoshi Matsuoka

The 3rd International High Performance Computing Forum (IHPCF2017) 2017.9

　More details

researchmap
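Collecting Profile Information for Building Empirical Performance Models of Applications (Organized Session: Mini-Apps for Co-Design of Computational Science and Computer Science)

Akihiro Nomura, Shinichi Miura, Toshio Endo, Satoshi Matsuoka

High Performance Computing and Computational Science Symposium 2015 2015.5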

　More details

researchmap
Can Local Binary Convolutions Make Neural Networks Models Smaller?

Haoyu Zhang, Wahib Mohamed, Pen Chen, Satoshi Matsuoka

International Conference on Parallel Processing (ICPP' 2019)

　More details

researchmap
Finishing GPU Jobs running on a Multi-GPU Batch-Queue Node-Sharing System Earlier with Remote GPU Execution and Migration

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

ISC2016 PhD Forum 2016.6

　More details

researchmap
Dynamic Optimization for large data Broadcast on Clouds

2010

　More details

researchmap
Evaluations of Directive Based Programming Model for GPUs and Extensions for Performance Portability

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

SIAM Conference on Computational Science and Engineering (CSE) 2015 2015.3

　More details

researchmap
A General Framework for Bi-Directional Translation between Abstract and Pictorial Data.

ACM Transactions on Information Systems 1992

　More details

researchmap
Increasing Jobs that a Multi-GPU Batch-Queue System can serve, with GPU Remoting and Migration

Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

TJIA 2016 : The 8th Thailand-Japan International Academic Conference (TJIA) 2016.10

　More details

researchmap
A Resource Selection Support Expert System for Large-Scale Computing Environments

2010

　More details

researchmap
Performance Evaluation of Software Framework for Memory Fault Tolerance in GPU Accelerators

SIAM Conference on Parallel Processing for Scientific Computing (PP10), MS36: Trends and Experiences in Heterogeneous Many-core Computing 2010

　More details

researchmap
Accelerated Computing in TSUBAME 1.2/2.0

2010

　More details

researchmap
Power-Aware Task Scheduling on GPU Accelerated Clusters

2010

　More details

researchmap
GPU Acceleration: a Fad or the Yellow Brick Road onto Exascale

2010

　More details

researchmap
Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators

IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010) 2010

　More details

researchmap
A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs

24th IEEE International Parallel and Distributed Processing Symposium (IPDPS'10) 2010

　More details

researchmap
Access-pattern and bandwidth aware file replication algorithm in a grid environment

The 9th IEEE/ACM International Conference on Grid Computing (Grid 2008) 2008

　More details

researchmap
TSUBAME 1.2 and the Road to TSUBAME 2.0 - Accelerated Multi-Petascale Commodity Computing for Everyone

2009

　More details

researchmap
Speculative checkpointing: Exploiting temporal affinity of memory operations

HPC ASIA 2009 2009

　More details

researchmap
Fast Conjugate Gradients with Multiple GPUs

Lecture Notes in Computer Science 2009

　More details

researchmap
A Model-Based Algorithm for Optimizing I/O Intensive Applications in Clouds using VM-Based Migration

Proceedings of Cloud2009 in the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid 2009

　More details

researchmap
Preliminary evaluation of software-based memory fault tolerance for GPGPU

2009

　More details

Presentation type：Poster presentation

researchmap
Fast conjugate gradient solver on multi-GPU systems

2009

　More details

Presentation type：Poster presentation

researchmap
Environmental-aware optimization of MPI checkpointing intervals

HPC ASIA 2009 2009

　More details

researchmap
HPC Application Performance Improvement by a Supplemental Optical Circuit Switching Network

High Performance Computing Symposium 2009 2009

　More details

researchmap
An Efficient Conjugate Gradient Solver on Double Precision Multi-GPU Systems

2009

　More details

researchmap
Adaptive Resource Indexing Technique for Unstructured Peer-to-Peer Networks

9th IEEE/ACM International Symposium on Cluster Computing and the Grid 2009

　More details

researchmap
Linpack Tuning Method on a Heterogeneous Supercomputer with Hybrid Accelerators

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2009) 2009

　More details

researchmap
Towards user-satisfaction-based resource management in a large-scale computing environment

SWoPP2009 2009

　More details

researchmap
Petascaling Commodity onto Exascale: GPUs as Multithreaded Massively-Parallel Vector Processors - the Only Road to Exascale

2009

　More details

researchmap
A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs

2009

　More details

Presentation type：Poster presentation

researchmap
GPU Accelerated Computing---From Hype to Mainstream, the Rebirth of Vector Computing

2009

　More details

researchmap
File Clustering Based Replication Algorithm in a Grid Environment

The 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid 2009

　More details

researchmap
GPU accelerated computing - from hype to mainstream, the rebirth of vector computing

Scientific Discovery through Advanced Computing (SciDAC 2009) 2009

　More details

researchmap
Software-Based ECC for GPUs

2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC'09) 2009

　More details

researchmap
The Efficient Checkpoint based on Erasure Coding with Incremental Method

SIG HPC 2009

　More details

researchmap
Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

The Fifth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IEEE IPDPS 2009 2009

　More details

researchmap
Migration Optimization Considering Memory Images Shared Between Processes

IPSJ High Performance Computing and Computational Science Symposium (HPCS2009) 2009

　More details

Presentation type：Poster presentation

researchmap
Linpack Tuning on a Heterogeneous Supercomputer Composed of Four Types of Processors

IPSJ High Performance Computing and Computational Science Symposium (HPCS2009) 2009

　More details

Presentation type：Poster presentation

researchmap
A Memory Power-Saving Method Based on Run-Time Estimation of Swap Access Counts

IPSJ High Performance Computing and Computational Science Symposium (HPCS2009) 2009

　More details

Presentation type：Poster presentation

researchmap
Petascaling Commodity onto Exascale with GPUs on TSUBAME1.2 onto TSUBAME2.0

2009

　More details

researchmap
Auto-Tuning 3-D FFT Library for CUDA GPUs

2009 ACM/IEEE conference on Supercomputing (SC09) 2009

　More details

researchmap
An Auto-Tuning 3D-FFT Library for Multi-GPU Systems

IPSJ High Performance Computing and Computational Science Symposium (HPCS2009) 2009

　More details

Presentation type：Poster presentation

researchmap
Petascaling Commodity onto Exascale with GPUs and Windows HPC

2009

　More details

researchmap
MapReduce Implementation on the TSUBAME Supercomputer

2009

　More details

researchmap
CG on GPU-enhanced Clusters

2009

　More details

researchmap
A Memory Power-Saving Method Based on Dynamic Estimation of Swap Cost

Computer Architecture and High Performance Computing Joint Workshop (HOKKE-2009) 2009

　More details

researchmap
Linpack Tuning on a Heterogeneous Supercomputer Composed of Four Types of Processors

Computer Architecture and High Performance Computing Joint Workshop (HOKKE-2009) 2009

　More details

researchmap
Migration Optimization Considering Memory Images Shared Between Processes

Computer Architecture and High Performance Computing Joint Workshop (HOKKE-2009) 2009

　More details

researchmap
Performance Evaluation of Software ECC for GPUs

Computer Architecture and High Performance Computing Joint Workshop (HOKKE-2009) 2009

　More details

researchmap
Improving HPC Application Performance Through Supplemental Use of an Optical Circuit Network

IPSJ High Performance Computing and Computational Science Symposium (HPCS2009) 2009

　More details

researchmap
The Potential of High-Bandwidth Petaflops Computing on TSUBAME2.0

2009

　More details

researchmap
Accelerating Multi-Node GPU Execution of the Himeno Benchmark by Overlapping Communication and Computation: Achieving over 700 GFLOPS on 32 GPUs

SIG HPC 2009

　More details

researchmap
Power Performance of Fault-Tolerance-Aware Numerical Computation on GPUs

2009

　More details

Presentation type：Poster presentation

researchmap
Analysis of the Correlation Between Performance and Power Consumption on GPUs

2009

　More details

Presentation type：Poster presentation

researchmap
An Auto-Tuning FFT Library for CUDA GPUs

Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2009) 2009

　More details

researchmap
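A Method for Improving Linpack Performance on a Heterogeneous Supercomputer with Different Types of Accelerators

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2009) 2009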

　More details

researchmap
Scalable Petaflops Vector Supercomputing with GPGPU on TSUBAME2.0

2009

　More details

researchmap
Power Performance of Fault-Tolerance-Aware Numerical Computation on GPUs

IPSJ SIG Technical Report 2009-HPC-121 2009

　More details

researchmap
Analysis of the Correlation Between Performance and Power Consumption on GPUs

IPSJ SIG Technical Report 2009-HPC-121 2009

　More details

researchmap
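Towards User-Satisfaction-Aware Resource Management in Large-Scale Computing Environments

2009 Summer United Workshops on Parallel, Distributed and Cooperative Processing, Sendai (SWoPP Sendai 2009) 2009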

　More details

researchmap
A Fast Checkpointing Method Using Incremental Data and Erasure Coding

SIG HPC 2009

　More details

researchmap
A Memory-Error-Tolerant Software Framework for GPUs

IPSJ SIG Technical Report 2009-HPC-123 2009

　More details

researchmap
Hundred million cores in commodity---Why not? (or, will `custom' *finally* prevail?)

CCGSC2008 2008

　More details

researchmap
Coupled-simulation e-science support in the NAREGI grid

IEEE Computer 2008

　More details

researchmap
Grid'BnB: A parallel branch & bound framework for grids

14th International Conference on High Performance Computing (HiPC) 2008

　More details

researchmap
Implementing MapReduce on the TSUBAME Supercomputer

IPSJ SIG Technical Report 2009-HPC-123 (HOKKE17) 2009

　More details

researchmap
Index distribution technique for efficient search on unstructured peer-to-peer networks

2008

　More details

researchmap
HPC-GPGPU: Large-scale commodity accelerated clusters and its application to advanced structural proteomics

Microsoft Science All-Hands-Meeting 2008

　More details

Presentation type：Poster presentation

researchmap
An efficient, model-based CPU-GPU heterogeneous FFT library

International Heterogeneity in Computing Workshop (HCW '08) 2008

　More details

researchmap
Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method

The Fourth Workshop on High-Performance 2008

　More details

researchmap
Time stamping authority grid

Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'08) 2008

　More details

researchmap
Index distribution technique for efficient search on unstructured peer-to-peer networks

The International Conference in Electrical Engineering/Electronics 2008

　More details

researchmap
Massive supercomputing coping with heterogeneity of modern accelerators

IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008) 2008

　More details

researchmap
Locality aware MPI communication on a commodity opto-electronic hybrid network

Workshop on Large-Scale Parallel Processing (LSPP) 2008

　More details

researchmap
A decentralized, scalable, and autonomous grid monitoring system

11th International Conference on Principles of Distributed Systems (OPODIS) 2008

　More details

researchmap
Model-based fault localization in large-scale computing systems

The 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS'08) 2008

　More details

researchmap
Environmental-aware optimization of MPI checkpointing intervals

The 2008 IEEE International Conference on Cluster Computing (Cluster 2008) 2008

　More details

Presentation type：Poster presentation

researchmap
Rise of the Commodity Vectors or Democratization of Supercomputing

NVISION2008 2008

　More details

researchmap
Model-based Optimization for Data-Intensive Application on Virtual Cluster

The 2008 9th IEEE/ACM International Conference on Grid Computing (Grid 2008) 2008

　More details

Presentation type：Poster presentation

researchmap


Works

The Lucie project, a turn-key network installer for large-scale clusters

2003 - 2006

　More details

Work type：Artistic work

researchmap
Practical Deployment of Setup and Management Tools for Large-Scale Clusters

2003 - 2006

　More details

Work type：Artistic work

researchmap
Technology for Building Dependable Large-Scale Commodity Clusters Based on Grid Technology

2001 - 2006

　More details

Work type：Artistic work

researchmap
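Development of a Cluster Grid Testbed; Construction of Mathematical Optimization Library Applications on the Grid; Research on Scalable, High-Reliability Extensions to Grid RPC/Ninf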

2001 - 2004

　More details

Work type：Artistic work

researchmap

Awards

HPCwire 35 Legends

2024.11 HPC Wire

　More details

researchmap
IPSJ Fellow, FY2022

2023.3

　More details

Award type：Award from Japanese society, conference, symposium, etc. Country：Japan

"For contributions to research and development of high-performance computing technology"

researchmap
FY2022 SCAT Chairman's Award

2023.1

　More details

Award type：Award from publisher, newspaper, foundation, etc. Country：Japan

"For contributions to COVID-19 countermeasures using Fugaku, the world's highest-performance supercomputer."

researchmap
2022 IEEE Computer Society Seymour Cray Computer Engineering Award

2022.10 IEEE Computer Society

　More details

“Long-term global leadership in supercomputing system design, such as TSUBAME and Fugaku.”

https://www.computer.org/press-room/2022-news/matsuoka-receives-2022-ieee-cs-seymour-cray-award

researchmap
2022 C&C Prize

2022.10 The NEC C&C Foundation

　More details

Award type：Award from publisher, newspaper, foundation, etc. Country：Japan

For Contributions to Pioneering Research and Development of Energy-saving and General-purpose Ultra-high-performance Computer Systems

https://www.candc.or.jp/en/2022/2022_prize_cc.html

researchmap
Medal with Purple Ribbon, Spring 2022 (Reiwa 4)

2022.4

　More details

Country：Japan

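In the field of computer science, he not only contributed greatly through pioneering research to the dramatic performance evolution of modern supercomputers, but also applied those results directly to world-class production supercomputers and spread them as world-leading technology. He was recognized for his achievements in computer science research.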

researchmap
IPSJ Contribution Award, FY2021

2022

　More details

Award type：Award from Japanese society, conference, symposium, etc. Country：Japan

researchmap
The Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research

2021.11 Association for Computing Machinery

　More details

researchmap
HPCwire 2020 Editor's Choice Awards Outstanding Leadership in HPC

2020.11 HPCwire

　More details

researchmap
Fellow, Japan Society for Software Science and Technology

2020.8

　More details

Country：Japan

researchmap
SC Asia 2019 Asia HPC Leadership Award

2019

　More details

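The Asia HPC Leadership Award is given to individuals who have demonstrated outstanding leadership in the supercomputing community. It was awarded in recognition of achievements in high-performance computing, particularly system software and system design, performance modeling and measurement, low-power computing, and the convergence of HPC with big data/AI.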

researchmap
2018 ACM HPDC Achievement Award

2018 ACM

　More details

researchmap
IEICE Communications Society Distinguished Contributions Award

2018

　More details

researchmap
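Tokyo Institute of Technology Suematsu Award ("Fundamentals and Developments of Digital Technology" Support), Special Award Commemorating Its Founding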

2018

　More details

researchmap
People to Watch in 2017

2017 The HPC Wire

　More details

researchmap
HPC Wire 2015 Readers Choice Awards Outstanding Leadership in HPC

2015 HPC Wire

co-award with Prof. Jack Dongarra, Univ. Tennessee

　More details

researchmap
Sidney Fernbach Memorial Award

2014 IEEE Computer Society

　More details

researchmap
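Okawa Publications Prize, FY2013

2014 The Okawa Foundation for Information and Telecommunications, for "Supercomputers", supplementary volume of the Iwanami Lecture Series on Computational Science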

　More details

Country：Japan

researchmap
Rakuten Technology Award 2013 Gold

2013

　More details

researchmap
The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prizes for Science and Technology, Development Category)

2012

　More details

researchmap
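59th Electrical Science and Technology Promotion Award

2011 The Promotion Foundation for Electrical Science, for "Research and development that realized TSUBAME2.0, the greenest production supercomputer in the world"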

　More details

Country：Japan

researchmap
Fellow

2011 Association for Computing Machinery

　More details

researchmap
The Gordon Bell Prize

2011 Association for Computing Machinery

　More details

researchmap
ESPS Award

2011 The Promotion Foundation for Electrical Science

　More details

researchmap
People to Watch in 2010

2010 The HPC Wire

　More details

researchmap
Fellow

2009 International Supercomputing Conference

　More details

researchmap
ISC 2008 Award

2008 International Supercomputing Conference

　More details

researchmap
The 28th Top500 "No.1 SuperComputer in Asia"

2006

　More details

researchmap
2nd JSPS Prize

2006 Japan Society for the Promotion of Science

　More details

Country：Japan

researchmap
IEEE Supercomputing StorCloud Challenge "Most Innovative Use of Storage in Support of Science" Award

2005 IEEE

　More details

Award type：Award from international society, conference, symposium, etc.

researchmap
Computerworld Computing Honors Laureate

2002

　More details

Award type：Award from international society, conference, symposium, etc.

researchmap
ACM Recognition of Service Award

2002

　More details

Award type：Award from international society, conference, symposium, etc.

researchmap
IPSJ Sakai Memorial Special Award, FY1998

1999 Information Processing Society of Japan

　More details

Award type：Award from Japanese society, conference, symposium, etc. Country：Japan

researchmap
IPSJ Best Paper Award, FY1996

1996 Information Processing Society of Japan

　More details

Award type：Award from Japanese society, conference, symposium, etc. Country：Japan

researchmap
Best Paper Award, IEEE Visual Languages Symposium

1995 IEEE

　More details

Award type：Award from international society, conference, symposium, etc.

researchmap


Research Projects

-

2021.4 - 2023.3

Fujitsu Ltd Collaborative Research

　 More details

Authorship：Principal investigator Grant type：Collaborative (industry/university)

researchmap
Fast and cost-effective deep learning algorithm platform for video processing in social infrastructure

2019.4 - 2023.3

Japan Science and Technology Agency CREST

　 More details

Authorship：Coinvestigator(s)

researchmap
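Research on Software and Hardware Architectures and Target Applications for Next-Generation Computer Systems

2017.4 - 2021.3

Fujitsu Laboratories Ltd. Collaborative Research

Satoshi Matsuoka, Toshio Endo, Akihiro Nomura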

　 More details

Authorship：Principal investigator Grant type：Collaborative (industry/university)

researchmap
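Fast and cost-effective deep learning algorithm platform for video processing in social infrastructure (Small Phase project)

2016.12 - 2019.3

Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST)

Koichi Shinoda, Satoshi Matsuoka, Tsuyoshi Murata, Rio Yokota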

　 More details

Authorship：Coinvestigator(s) Grant type：Competitive

researchmap
Accelerating High-Performance Computing Application Kernels Through Reconfigurable Hardware

Grant number：16F16764 2016.11 - 2019.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows Grant-in-Aid for JSPS Fellows

　 More details

Grant amount：¥2,200,000 （ Direct Cost: ¥2,200,000 ）

researchmap
Research on Accelerating Machine Learning Processing

2016.10 - 2020.3

Denso IT Laboratory, Inc. Collaborative Research

Satoshi Matsuoka, Rio Yokota, Akihiro Nomura

　 More details

Authorship：Principal investigator Grant type：Collaborative (industry/university)

researchmap
Research on Software and Hardware Architectures and Target Applications for Next-Generation HPC

2016.10 - 2017.3

Fujitsu Laboratories Ltd. Collaborative Research

　 More details

Authorship：Principal investigator Grant type：Collaborative (industry/university)

researchmap
Research on Accelerating a Compressible Fluid Analysis Program

2015.11 - 2016.3

IHI Corporation Collaborative Research

　 More details

Authorship：Principal investigator Grant type：Collaborative (industry/university)

researchmap
Research on Accelerating Machine Learning Processing

2015.10 - 2016.9

Denso Corporation Collaborative Research

　 More details

Authorship：Principal investigator Grant type：Collaborative (industry/university)

researchmap
EBD: Extreme Big Data - Convergence of Big Data and HPC for Yottabyte Processing

2013.10 - 2019.3

Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST)

　 More details

Authorship：Principal investigator Grant type：Competitive

researchmap
Programming Models for High-Performance Computing

Grant number：12F02044 2012 - 2013

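Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows Grant-in-Aid for JSPS Fellows

Satoshi Matsuoka, PERICASGLEIM M.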

　 More details

Grant amount：¥2,300,000 （ Direct Cost: ¥2,300,000 ）

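The main objective of this research is to develop parallel programming methods that achieve both high performance with high power efficiency and productivity on next-generation supercomputers. This fiscal year, the research plan focused on runtime evaluation and resource management for the task-parallel and dataflow models. This is because the analysis of exaFMM carried out in the previous fiscal year showed that the differences in application performance between schedulers could not be explained by the processor idle time arising from differences in scheduling methods, and were thought to stem from resource management. On future systems, where interconnect constraints will grow, this point is expected to become more important for both performance and power.
To this end, we developed an analysis technique based on the reuse distance method for the task-parallel and dataflow models. Reuse distance is a metric that indicates the amount of data accessed between two accesses to a given data element. It is a powerful technique for analyzing the temporal locality of memory accesses, which is the most important factor in resource management, and it correlates strongly with processor cache utilization efficiency. However, it was originally developed for single-core processors, and how to implement it for use in this research was not clear. We therefore identified the challenges to be overcome (trace generation, trace size, and computational complexity) and proposed a way to realize it.
First, we presented a technique that can greatly reduce trace size while still computing accurate reuse distances for distances sufficiently larger than the data size of the computational kernel under study. To demonstrate its effectiveness, we built a prototype showing that traces can be generated with almost no measurement overhead. In addition, the technique scales to larger and longer executions than prior work. These results were published in two papers.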

researchmap
Fault Tolerant Infrastructure Toward Billion of Parallelization and Exa-scale Supercomputer

Grant number：23220003 2011.4 - 2016.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S) Grant-in-Aid for Scientific Research (S)

Matsuoka Satoshi, Hideyuki Jitsumoto, Toshio Endo, Hitoshi Sato, Naoya Maruyama, Shinichiro Takizawa, Kento Sato, Leonardo Bautista Gomez, Jens Domke

　 More details

Grant amount：¥213,720,000 （ Direct Cost: ¥164,400,000 、 Indirect Cost：¥49,320,000 ）

Fault tolerance has been recognized as an indispensable technique for exascale computing as supercomputers grow towards billion-way parallelism. For future exascale supercomputers, we proposed advanced fault-tolerant infrastructures, including a scalable checkpoint/restart library, a fault-tolerant messaging interface and a highly resilient burst buffer architecture. We validated their effectiveness based on mathematical statistics. We also released the software and made an impact on the community.

researchmap
Fault-Tolerance Infrastructure for 100-Million-Way Parallel, Exascale Supercomputers

Grant number：23240006 2011

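Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A) Grant-in-Aid for Scientific Research (A)

Satoshi Matsuoka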

　 More details

Grant amount：¥13,000,000 （ Direct Cost: ¥10,000,000 、 Indirect Cost：¥3,000,000 ）

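In science and technology, the use of supercomputers has become indispensable for large-scale simulation. However, as the equipment installed in supercomputers grows in number and complexity, failure rates rise, and there is concern that systems will effectively cease to function. Applying fault-tolerance techniques such as checkpoint/restart has therefore become unavoidable, but technical challenges remain for post-petascale supercomputers. For this reason, in the first year we explored mathematical and performance models of fault tolerance for basic composite checkpoint/restart targeting 100 million threads and post-petascale systems, with the particular aim of clarifying the quantitative properties, at the scale of hundreds of millions of threads, of fault-tolerance techniques suited to post-petascale applications. In practice, checkpoint/restart involved various technical difficulties on hybrid architectures that combine fine-grained, massively parallel processors with coarse-grained processors, the leading candidate architecture for post-petascale and exascale supercomputers. We extended the single-GPU "replay technique" (which records a history of memory allocations and memory copies and, on restart, re-executes, or "replays", based on that history to obtain a consistent checkpoint) and achieved stable, consistent checkpointing for applications that use multiple CPUs and GPUs within and across nodes. In terms of performance, we also kept the overhead to an acceptable level. Moreover, the library we developed achieves this without any changes to existing programs, and so also offers excellent usability. Realizing transparent checkpointing on ultra-fine-grained parallel, hybrid architectures is expected to have a significant academic impact.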

researchmap
Platform of large scale and high quality genomics and bioinformatics: Towards the advancement of genome sciences in academia

Grant number：221S0002 2010.4 - 2016.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

KOHARA Yuji, KATO Kazuto, TOYODA Atsushi, KUROKI Yoko, SUGANO Sumio, SUZUKI Yutaka, HAYASHI Tetsuya, YAMAMOTO Ken, TSUJI Shoji, INOUE Ituro, KUROKAWA Ken, MORISHITA Shinichi, NAKAMURA Yasukazu, TABATA Satoshi, KUHARA Satoshi, IWASAKI Wataru, SESE Jun, TAKAHASHI Hiroki, ASAI Kiyoshi, KASAHARA Masahiro, SAKAKIBARA Yasubumi, YADA Tetsushi, YAMAGATA Zentaro, MUTO Kaori, IDA Ryuichi, MASUI Tohru, KURIYAMA Mariko, TAKAGI Toshihisa, FUJIYAMA Asao, HATTORI Masahira, OGURA Yoshitoshi, TOKUNAGA Katsushi, KUWANO Ryozo, OHASHI Jun, ITOH Takehiko, HIRAKAWA Hideki, NOGUCHI Hideki, MATSUOKA Satoshi, OGASAWARA Naotake, NAKAMURA Kensuke, HAMADA Michiaki, KANAYA Shigehiko, ANZAI Yuichiro, OKADA Kiyotaka, SAKAKI Yoshiyuki, TAKAKU Fumimaro, TOYOSHIMA Kumao, NAKAMURA Keiko, HOTTA Yoshiki, YONEZAWA Akinori, YOSHIKAWA Hiroshi, YOSHIDA Mitsuaki, INOKO Hidetoshi, TODA Tatsushi, INAZAWA Johji, GOJOBORI Takashi, URUSHIHARA Hideko, TAKEDA Hiroyuki, SHIROISHI Toshihiko, ITOH Takashi, SATOH Noriyuki, MATSUDA Hideo, GOTO Susumu, TSUDA Masataka

We have provided large-scale, high-quality genomics and bioinformatics technologies to many KAKENHI projects, 60 to 90 subjects every year and 464 subjects in total, selected on the basis of applications. This kind of support became possible by concentrating resources in a limited number of DNA sequencing centers at a time when these technologies were advancing unexpectedly fast worldwide. Our activity has led to 363 papers, including the coelacanth genome paper. The KAKENHI subjects that we supported cover all KAKENHI items and almost all divisions of the life science domain. Furthermore, we have developed new methodologies to solve problems that emerged from the support activity; one of them is the genome assembly software PLATANUS, which has become a key method for deciphering difficult genomes. This virtuous circle and its outcomes show that the platform is essential and effective for the life sciences.

researchmap
ULP-HPC: Ultra Low-Power, High Performance Computing via Modeling and Optimization of Next Generation HPC Technologies

2007.10 - 2013.3

Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST)

Authorship：Principal investigator Grant type：Competitive

researchmap
Design and Development of Advanced IT Research Platform for Information Explosion Era

Grant number：18049073 2006 - 2010

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas Grant-in-Aid for Scientific Research on Priority Areas

ADACHI Jun, TANAKA Katsumi, NISHIDA Toyoaki, KUNIYOSHI Yasuo, SUDOH Osamu, KUROHASHI Sadao, HARA Takahiro, MATSUOKA Satoshi, TAURA Kenjiro, TATEBE Osami, MUNETOMO Masaharu, HIROTSU Toshio, MATSUBARA Jin, SHIMOJYO Shinji, CHIBA Shigeru, YUASA Taichi, MATSUYAMA Takashi, CHIKAYAMA Takashi, KONDO Toru, KONO Kenji, OKAMOTO Masahiro, AIDA Kento, KAMADA Tomio, KITSUREGAWA Masaru, YAMANA Hayato, NAKAMURA Yutaka, KOBAYASHI Hiroaki, NAKAJIMA Hiroshi

Grant amount：¥644600000 （ Direct Cost: ¥644600000 ）

This project implemented a common research infrastructure for all the research groups participating in this priority-area research initiative and thereby supported all research activities in the initiative. By providing this infrastructure, we succeeded in accelerating the shared use of research facilities and resources within the limits of the research funding and in strengthening collaboration among research groups. The shared facilities include (a) TSUBAKI: an open search engine for large-scale corpora, (b) InTrigger: a widely distributed computing test-bed, (c) IMADE: an environment for real-world interaction measurement and analysis, and (d) prototyping for sensor-network-based preventive medicine.

researchmap
Highly Scalable, High Performance and Autonomous Distributed Execution for Information Explosion Environments

Grant number：18049028 2006 - 2010

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas Grant-in-Aid for Scientific Research on Priority Areas

MATSUOKA Satoshi, AIDA Kento, NAKADA Hidemoto, TAKEFUSA Atsuko, MARUYAMA Naoya, JITSUMOTO Hideyuki, SATO Hitoshi, TAKIZAWA Shinichiro

Grant amount：¥87100000 （ Direct Cost: ¥87100000 ）

We have conducted several fundamental research activities toward constructing highly scalable, high-performance, and autonomous distributed execution environments, called "resilient grids", for the information explosion era. We have built the constituent techniques for resilient grids, including modeling and simulation, covering both the autonomous construction of high-performance application execution environments and the federation of future networks with those environments.

researchmap
New IT Infrastructure for the Information-explosion Era

Grant number：17077001 2005 - 2011

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas Grant-in-Aid for Scientific Research on Priority Areas

KITSUREGAWA Masaru, ADACHI Jun, MATSUOKA Satoshi, MATSUYAMA Takashi, SUDOU Osamu

Grant amount：¥60400000 （ Direct Cost: ¥60400000 ）

researchmap
Comparing the Grid programming model ProActive with GridRPC and others on a large-scale testbed

Grant number：05F05791 2005 - 2007

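Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows Grant-in-Aid for JSPS Fellows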

MATSUOKA Satoshi, BADUEL Laurent

Grant amount：¥1000000 （ Direct Cost: ¥1000000 ）

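We have been building an autonomous, scalable, and efficient grid monitoring system. To operate autonomously, this monitoring system exchanges information over a P2P network. In general, a system is monitored by observing events and communicating the resulting information to the systems that need it.
Current monitoring systems have two problems: they are centrally managed, and they assume a statically determined configuration. Central management introduces a single point of failure and creates a bottleneck. Static configuration requires knowing the exact location of every constituent node in a large-scale system, which places a heavy burden on administrators. To solve these problems, we proposed an autonomous monitoring system that exploits the scalability of P2P networks and the robustness gained by avoiding a single point of failure. With the proposed system, self-configuration, self-optimization, self-healing, and self-protection become feasible, enabling fully autonomous operation of the system.
This fiscal year, we continued developing the proposed system. The previous prototype implemented self-configuration and could be operated in a real environment. In addition, we implemented self-optimization based on the system's runtime behavior. Self-optimization focuses on communication volume, the distributed database, the speed of information dissemination, system adaptability and dynamic sizing, cooperation among grid services, and the configuration of the components that make up the system.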

researchmap
Study on advanced programming environment using OpenMP for a next generation high performance cluster system

Grant number：14208026 2002 - 2004

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A) Grant-in-Aid for Scientific Research (A)

SATO Mitsuhisa, ISHIKAWA Yutaka, MATSUOKA Satoshi, HONDA Hiroki, BOKU Taisuke, TAKAHASHI Daisuke

Grant amount：¥34970000 （ Direct Cost: ¥26900000 、 Indirect Cost：¥8070000 ）

We have studied the OpenMP programming environment for next-generation 64-bit high-performance clusters, using a software distributed shared memory (SDSM) system to enable OpenMP programs to run on the cluster. We have also developed a programming support system for OpenMP and numerical libraries that use OpenMP.
1. We ported the SCore cluster system software to 64-bit processor architectures and evaluated the performance of the SCASH DSM system, which runs on SCore.
2. We have designed and implemented a very portable SDSM system, SCASH-MPI, which uses MPI as its communication layer. MPI is the most portable communication library and is supported on many kinds of high-speed communication networks, so this approach provides high portability. It also allows users to make use of the wide address space of 64-bit processors. We found that the overhead of this implementation is just 6% compared with the original SCASH.
3. We have designed a new SDSM system, FDSM, based on access pattern analysis of applications. The access pattern is detected by a hardware mechanism provided by IA64 and is used for efficient communication. FDSM achieves higher performance than SCASH.
4. We have studied the optimization of OpenMP programs running on a DSM system for heterogeneous clusters. We found that performance can be improved by combining loop re-partitioning with page migration.
5. We have designed and implemented an interactive tool, OMP/iPat, to support programmers in developing OpenMP programs. It allows the programmer to develop an OpenMP program interactively using information from the compiler's parallelism analysis.
6. We have evaluated performance using the OpenMP benchmark SPEC-OMP. We have also designed and implemented a parallel recursive FFT algorithm using OpenMP for IA-64 shared-memory multiprocessors.

researchmap
Research on Peer-to-peer large-scale data processing on the Grid

Grant number：13224034 2001 - 2005

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas Grant-in-Aid for Scientific Research on Priority Areas

MATSUOKA Satoshi, AIDA Kento, MATSUDA Yuko, MORITA Youhei, NAKADA Hidemoto, TATEBE Osamu

Grant amount：¥113200000 （ Direct Cost: ¥113200000 ）

The aim of this project is to develop basic technologies for petabyte-scale data processing. The goal is to support actual processing of data from the LHC/ATLAS detector, as part of the 'Gfarm' data grid middleware project. We have studied the following themes.
1) Wide-area peer-to-peer federation and data transfer among large clusters on the Grid.
2) Programming, performance analysis, and simulation techniques for the peer-to-peer data Grid.
3) High-performance, large-scale data management for the data Grid.
The results for each theme are as follows.
1) We have promoted research on wide-area, large-scale data transfer, and participated in the Bandwidth Challenge 2002 and 2003 and StorCloud 2005, held in the United States, to demonstrate the performance of large data transfers. Through these challenges, we succeeded in running real applications on Gfarm and gained a clear prospect of realizing very large-scale data centers and very large-scale data analyses for international collaborative experiments.
2) We have developed a Grid simulator named 'Bricks' to construct performance models and to analyze the data Grid. With it, we ran many simulations of the data Grid and investigated the characteristics of the data Grid and the Gfarm architecture. We have also developed a Grid programming environment, 'Jay', which is portable and suitable for peer-to-peer Grid environments.
3) We have developed a widely distributed file system that avoids access concentration on a small number of nodes on the data Grid. We have implemented the system as an extension to Gfarm and verified its validity.

researchmap
Polyhedral Homotopy Continuation Methods for Computing All Real and Complex Solutions of Systems of Polynomial Equations

Grant number：13650444 2001 - 2002

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) Grant-in-Aid for Scientific Research (C)

KOJIMA Masakazu, FUJISAWA Katsuki, MATSUOKA Satoshi

Grant amount：¥3000000 （ Direct Cost: ¥3000000 ）

The purpose of this research project is to develop practical numerical methods for all real and complex solutions of large scale systems of polynomial equations. The polyhedral homotopy continuation method used in this research consists of the following three phases :
Phase 1 : Construction of polyhedral homotopy systems.
Phase 2 : Numerical tracing of homotopy paths by the predictor-corrector method.
Phase 3 : Verification of solutions.
In 2001, we designed and developed basic algorithms for each phase above. In 2002, we studied the following issues.
1. Improvement of computational efficiency in each phase. In phase 1, we proposed an efficient construction of homotopy systems arising from symmetric systems of polynomial equations. We incorporated a linear algebra library LAPACK into phase 2, and developed a new method for verifying and classifying solutions of the cyclic polynomial.
2. Improvement of numerical stability in each phase. The linear systems to be solved in phase 2 often become so ill-conditioned that computing accurate solutions is difficult. We devised new dynamic scaling techniques to resolve this difficulty. We confirmed through numerical experiments that using these scaling techniques together with the singular value decomposition was very effective in improving the numerical stability of phase 2.
3. We combined the three phases into a software package, PHoM, and released it on the Internet. This software solved some large-scale systems of polynomial equations that had not been solved before. In conclusion, this research project has accomplished the purpose stated above.
4. We have started a parallel implementation of PHoM, which will continue in the next year.
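Phase 2, path tracing by the predictor-corrector method, can be illustrated on a toy problem. The sketch below is not PHoM and does not construct a polyhedral homotopy; as a simplifying assumption it uses a plain linear homotopy H(x, t) = (1 - t)·γ·g(x) + t·p(x) between a start polynomial g(x) = x² - 1 and a target p(x) = x² - 5x + 6, with a fixed complex constant γ chosen for illustration. The function name `track_roots` and all parameters are hypothetical.

```python
def track_roots(steps=200, newton_iters=5):
    """Trace both solution paths from the roots of g (t = 0) to the roots of p (t = 1)."""
    gamma = 0.6 + 0.8j  # complex constant that keeps the paths away from singular points

    def g(x):
        return x * x - 1           # start system, roots +1 and -1

    def p(x):
        return x * x - 5 * x + 6   # target system, roots 2 and 3

    def H(x, t):
        return (1 - t) * gamma * g(x) + t * p(x)

    def H_x(x, t):
        return (1 - t) * gamma * (2 * x) + t * (2 * x - 5)

    def H_t(x, t):
        return -gamma * g(x) + p(x)

    roots = []
    dt = 1.0 / steps
    for x in (1 + 0j, -1 + 0j):
        for k in range(steps):
            t = k / steps
            t_next = (k + 1) / steps
            # Predictor: Euler step along dx/dt = -H_t / H_x.
            x = x - dt * H_t(x, t) / H_x(x, t)
            # Corrector: Newton iterations on H(., t_next) = 0.
            for _ in range(newton_iters):
                x = x - H(x, t_next) / H_x(x, t_next)
        roots.append(x)
    return roots


roots = track_roots()
```

Each start root yields one path, and the paths are independent of one another, which is what makes a parallel implementation such as the one mentioned in item 4 natural.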

researchmap
Descriptions and Negotiation Models of Security Policies

Grant number：12133205 2000 - 2003

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas Grant-in-Aid for Scientific Research on Priority Areas

SHIBAYAMA Etsuya, TAKAHASHI Shin, WAKITA Ken, MATSUOKA Satoshi

Grant amount：¥32200000 （ Direct Cost: ¥32200000 ）

As a first step to building next generation secure information infrastructures, we have investigated the following three areas, representing three different viewpoints : descriptions, users, and systems.
1. Flexible Security Policy Description Schemes and Their Enforcement Mechanisms: Taking into account that mutually untrusted parties may have to collaborate or trade with one another in the Internet era, we propose a new model of security policy that is compatible with privacy protection. Our research results include a model of policy negotiation using attribute authentication, description schemes based upon security automata, an enforcement mechanism with instrumentation, and optimization with partial evaluation.
2. Convenient Methodologies for Constructions and Operations of Secure Software Systems: We propose (semi-)automated construction and operation of secure software systems by developers, operators, and end-users. Our research results include automatic extraction of security policies from package-manager information, semi-automated construction of secure programming language processors, and development environments for secure software, including a visual language system and a debugger.
3. Foundations of Next Generation Information Infrastructures: We propose various security mechanisms for computing systems that utilize massive resources. Our research results include a fault-tolerant, high-performance communication library, a scalable authentication algorithm, a remote installation and recovery tool for PC clusters, and a virtual machine technology for resolving interference among virtual organizations.

researchmap
Multi-focus Zooming Interfaces with Focus Predictions

Grant number：12480070 2000 - 2002

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) Grant-in-Aid for Scientific Research (B)

SHIBAYAMA Etsuya, TAKAHASHI Shin, MATSUOKA Satoshi, TANAKA Jiro

Grant amount：¥7200000 （ Direct Cost: ¥7200000 ）

We have investigated two application domains of multi-focus zooming interfaces, that is, interactive visualizations of declarative data-flow visual program executions and hierarchical directory structures. In addition, for future enhancement of our zooming interfaces, we have been doing basic research on human-computer interactions beyond traditional desktop environments.
1. We have implemented an interactive browser with a multi-focus zooming interface, which supports navigation of a huge, static diagram representing an entire execution of a declarative data-flow visual program. This browser can effectively put multiple foci on arbitrary portions of a diagram and render an image consisting of not only those focal points but also their overall contexts. Also, based upon the notion of a "trace view," we have proposed techniques to simultaneously depict asynchronous events, such as inputs and outputs of a process, and to illustrate dependences among processes.
2. We have implemented a prototype system that can interactively visualize a directory structure through a multi-focus zooming interface. This system supports three kinds of navigation patterns: navigation via parent-child links, keywords, and similarities. An important feature of this system is that it shows the enclosing directories of foci as contexts.
3. For the purposes of future enhancement of our zooming interfaces, we have investigated fundamental interaction techniques in wall display and mixed reality environments.

researchmap
j-GRID

Grant number：12558031 2000 - 2001

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) Grant-in-Aid for Scientific Research (B)

MURAOKA Yoichi, SEKIGUCHI Satoshi, AIDA Kento, MATSUOKA Satoshi, TANABE Osamu, TANAKA Yoshio

Grant amount：¥13200000 （ Direct Cost: ¥13200000 ）

In this project, we have developed basic technologies for constructing the Knowledge GRID, a next-generation GRID that enables people to share knowledge. The basic technologies include virtual space, hyperbook, and soft computing.

researchmap
Wide-Area Grid Cluster for Parallel Optimization

Grant number：12480068 2000 - 2001

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) Grant-in-Aid for Scientific Research (B)

MATSUOKA Satoshi, AIDA Kento, DAI You, KOJIMA Masakazu, OGAWA Hirotaka, FUJISAWA Katsuki

Grant amount：¥12500000 （ Direct Cost: ¥12500000 ）

We employed Grid technology to construct a fleet of compute nodes as an aggregation of computing clusters over a wide-area network, used this "federation of cluster resources" to tackle non-convex quadratic optimization problems of unprecedented scale, and made it accessible from throughout the Internet. More specifically, we developed an algorithm called SCRM (Successive Convex Relaxation Method), which relies heavily on large numbers of SDP (Semidefinite Programming) subsolvers; the subsolver, SDPA, is a very fast SDP solver based on interior-point methods. By efficiently spreading the SDP solvers over the Grid, we showed that non-convex quadratic problems of very large scale can be solved very efficiently, with almost linear speedup. For this purpose, we constructed a fleet of PC clusters spread over several locations, including Titech Oo-okayama Campus, Titech Suzukake-dai Campus, and Kyoto University. We achieved nearly 100-fold speedup using 128 processors. The key issue was not only the algorithm but also efficient programming using the Ninf GridRPC system, which had to be modified extensively; new programming methodologies also had to be developed to cope with massively parallel execution of hundreds of tasks over the Grid.
In addition, we parallelized SDPA with OpenMP using the worksharing methodology to achieve nearly perfect parallel speedup within each cluster on the Grid. We also automated the process of selecting the best solver based on the data structure of the problem as well as the "shape" of the non-zero elements in the problem matrix. Then, using 256 nodes' worth of clusters spread over the country and the Ninf GridRPC middleware, we constructed an "optimization solver server", achieving good speedup as mentioned above. The result not only set several world records for benchmark problems but also led to even larger Grid research in the coming years.

researchmap
Reconfigurable Parallel Processing Plug&Play Clustering

Grant number：12558025 2000 - 2001

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) Grant-in-Aid for Scientific Research (B)

MATSUOKA Satoshi, ISHIKAWA Yutaka, OGAWA Hirotaka, AIDA Kento, TAKAGI Hiromitsu

Grant amount：¥8100000 （ Direct Cost: ¥8100000 ）

The objective of the project is to push the technological envelope of fault tolerance and reconfigurability in large-scale clustering so that clusters become almost self-sustaining and reconfiguration is a matter of "Plug&Play". Some of the salient results are as follows:
1) Construction of the "Plug&Play" clustering testbed (20 nodes of DELL Inspiron, Mobile Celeron 600 MHz, 128 MB Memory, 20 GB HDD, 3COM Plug&Play PCMCIA 100Base-T Network Card). This served as a flexible testbed for middleware development. It was also very compact (a small rack) and low power (less than 400 watts for 20 nodes).
2) Development of Parakeet, a fault-tolerant, high-performance cluster MPI that allows the user to select a checkpointing algorithm from a set of available algorithms according to the characteristics of the application. Parakeet is an entirely user-level implementation, is portable and efficient, and frees users from checkpointing concerns within their code. We have implemented various checkpointing strategies to achieve the best efficiency and conducted a detailed performance analysis in comparison with full restart.
3) Self-organizing cluster middleware, the Lucie prototype. As a basic technology, plug&play clustering requires hot swapping of nodes, reconfiguration of the software organization within a node, and dynamic partition management. Lucie builds on existing Linux tools to implement full cluster configuration capabilities in an automated fashion. Lucie allows fully automated, very rapid (re)installation and configuration of every node in a cluster.
4) Prototyping scalable, secure, and self-organizing cluster communication. We have identified scalable, reliable, secure, and self-organizing communication within the cluster node as the essential foundation for reliable plug&play clustering. We have prototyped some of these ideas in the job manager of Gfarm (cluster middleware for Petascale Datagrid processing): there, a self-organizing process ring structure governs all the nodes, and jobs can be started rapidly in parallel, safely and securely.

researchmap
Fundamental research on ultra-wide-area high-performance computing environments

1998 - 2001

Japan Science and Technology Corporation (JST) Strategic Basic Research Programs (PRESTO)

Authorship：Principal investigator Grant type：Competitive

researchmap
Interactive Software Architecture for Advanced Mobile Interface

Grant number：10480055 1998 - 1999

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) Grant-in-Aid for Scientific Research (B)

MATSUOKA Satoshi, YONEZAWA Akinori, TAKAHASHI Shin, SHIBAYAMA Etsuya

Grant amount：¥10400000 （ Direct Cost: ¥10400000 ）

(1) Pen-Based Drawing System for Geometric Design:
We proposed two novel interaction techniques called "interactive beautification" and "predictive drawing." Interactive beautification receives the user's free stroke input and generates multiple candidates by considering possible geometric constraints. Predictive drawing suggests the next drawing operation based on the spatial relationships among existing figures. We then developed a prototype system, Pegasus, and achieved fast and flexible drawing interactions.
(2) Pen-Based Sketching Interface for 3D Freeform Design:
We developed a pen-based drawing system, "Teddy", a sketching interface for creating 3D polygonal surface models. The user can create a 3D model by drawing a 2D silhouette with a pen. Our informal user study showed that a first-time user typically masters the operations within 10 minutes and can construct 3D models within minutes. This paper received the Impact Paper award at SIGGRAPH '99.
(3) Context-aware information system for mobile computers:
We proposed a new information system architecture called the "WEBPAD system". WEBPAD aims to be a general-purpose context-aware UI system that recognizes and predicts the current user's "context" information. Our system employs a thin-client technique to offload much of the data processing to the server. The server collects useful information from the WWW automatically. The user can create new information via a camera and a microphone, and the information is distributed to others via information filtering techniques.
(4) Multiple Pointing Input System:
We developed a prototype of a multiple pointing input system. Our system recognizes the movements of multiple devices that a user moves simultaneously, so the user can point at and manipulate multiple objects on the screen.

researchmap
Development of System-LSI Architectures Based on Merged Memory/Logic Technology

Grant number：09358005 1997 - 1999

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A) Grant-in-Aid for Scientific Research (A)

MURAKAMI Kazuaki, MIYAZAKI Akio, TANIGUCHI Hideo, YASUURA Hiroto, SAWADA Sunao, IWAIHARA Mizuho

Grant amount：¥28600000 （ Direct Cost: ¥28600000 ）

The objectives of this research project are to develop system-LSI architectures and computer-system architectures, called PPRAM (Parallel Processing RAM), which are based mainly on merged memory/logic technology, parallel/distributed processing technology, and inter-LSI high-speed interconnect technology. The project produced the following results.
(1) Inter-LSI high-speed interconnect standard, or PPRAM-Link: The project has defined a set of specifications for the physical layer, logical layer, and API of PPRAM-Link, and has implemented these specifications in several ways.
(2) Reference PPRAM architectures: The project has developed several architectures suited to merged DRAM/logic system-LSI, such as (i) a shared-register CMP (chip multiprocessor), (ii) a statically/dynamically variable line-size cache, and (iii) a way-predicting set-associative cache.
(3) DRAM refresh architectures for merged DRAM/logic LSI: The project has developed several architectures for merged DRAM/logic system-LSI that mitigate the degradation of DRAM refresh characteristics caused by on-chip logic.
(4) Hardware/software codesign methodology for embedded system-LSI : The project has developed a hardware/software codesign methodology based on soft-core processor and Valen-C technologies.
(5) Software-controlled low power architectures : The project has designed a processor architecture, or PowerPro, which can optimize the power consumption by means of software control according to the system load.
(6) Test methodology for system-LSI : The project has proposed a test methodology good for system-LSI, which combines BIST and external test.
(7) PPRAM-based MOE (molecular orbital calculation engine) : The project has developed some PPRAM applications, including MOE chips and MOE system. The MOE chip consists of a 32-bit integer RISC processor, a 76-bit MO-specific floating-point processor, 1Mb SRAM, and a PPRAM-Link interface. The MOE system consists of a number of MOE boards, each of which includes five MOE chips and a bridge chip for PPRAM-Link and IEEE1394.
(8) PPRAM-based realtime digital-watermarking engine for movies: Another PPRAM application is a realtime digital-watermarking engine for movies. The project has implemented a suite consisting of a wavelet transform function, a PPRAM-Link interface, and a PCI-bus interface on FPGAs.

researchmap
A Methodology of Pattern-Oriented Visual Parallel Programming and Its Interactive Supports

Grant number：09680328 1997 - 1999

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) Grant-in-Aid for Scientific Research (C)

SHIBAYAMA Etsuya, TAKAHASHI Shin, MATSUOKA Satoshi

Grant amount：¥3100000 （ Direct Cost: ¥3100000 ）

1. We have proposed a visual language, in which higher levels of abstractions for object-oriented parallel programming can be effectively described. Thanks to a visual notation, inherently diagrammatic concepts such as patterns and architectures can have comprehensible representations in the language.
2. We have proposed the notion of visual pattern and a programming methodology based upon this notion. We have also designed an interactive and integrated environment for uses/reuses of visual patterns.
3. We have designed and implemented a visual language environment, KLIEG, whose major features are as follows:
(1) A single notation is available in design, programming, and debugging phases.
(2) A simple graphical user interface is provided for uses/reuses of visual patterns, which encapsulate design information of object compositions.
(3) Software architectures are represented as nested compositions of visual patterns. Each visual pattern in any level is replaceable through a simple graphical user interface.
(4) When displaying portions of a program, the environment may place more emphasis on (allocate more area to) the objects that should be replaced.
(5) For each hole in a visual pattern, multiple alternative implementations can be defined.
(6) A visual tracer that automatically animates visual program executions is available.

researchmap
StackThreads/MP : Integrating Futures into Calling Standards

Grant number：08408008 1996 - 1998

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A) Grant-in-Aid for Scientific Research (A)

YONEZAWA Akinori, MASUHARA Hidehiko, KOBAYASHI Naoki, MATSUOKA Satoshi, TAURA Kenjiro

Grant amount：¥34600000 （ Direct Cost: ¥34600000 ）

We implemented a scheme for fine-grain multithreading that requires no changes to current calling standards for sequential languages and only modest extensions to sequential compilers. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call and detaches the callee from the caller when the callee suspends or when either of them migrates to another processor. Unlike previous similar systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers that obey calling standards. As a consequence, it requires neither a frontend preprocessor nor a native code generator with a built-in notion of parallelism. In practice, the system works with the unmodified GNU C compiler (GCC). Desirable extensions to sequential compilers for guaranteeing the portability and correctness of the scheme are identified and shown to be modest. Experiments indicate that sequential performance is not sacrificed for practical applications and that both sequential and parallel performance are comparable to Cilk, whose current implementation requires a fairly sophisticated preprocessor for C. These results show that efficient asynchronous calls (i.e., future calls) can be integrated into current calling standards with very little impact on either sequential performance or compiler engineering.

researchmap
Advanced User Interface Construction via Multiple Visual Examples

Grant number：06452388 1994 - 1996

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) Grant-in-Aid for Scientific Research (B)

MATSUOKA Satoshi, TAKAHASHI Shin, YONEZAWA Akinori

Grant amount：¥7700000 （ Direct Cost: ¥7700000 ）

(1) Interactive construction of GUI via multiple user examples: We have constructed a framework in which a user can construct declarative rules by providing multiple examples of user interfaces. In particular, the system infers the intentions behind user corrections, so that a cycle of system presentation and user correction refines the interface.
(2) Declarative Animation Interface : We extended our previously proposed model of GUI called "bidirectional translation of abstract data and pictures" by incorporating the notion of time, thereby achieving semi-automated visualization of dynamic behavior of application data structures. The user merely declares the correspondence between the program's actions on the data structure and the desired animation effects, and the rest of the animation is generated by interpolation. The system has also been extended to incorporate 3-D interfaces.
(3) A theory of Generalized Local Propagation : We generalized the theory of local propagation in solving the constraints in a hierarchical constraint system. First, we refined the definition of hierarchical constraints by Alan Borning et.al. ; then, we defined the notion of local semi-monotonicity and global-monotonicity in the solution graph, obtaining the necessary and sufficient condition under which the solution obtained by the local propagation algorithm can be considered "correct". We then categorized different solvers and comparators. Finally, we developed a constraint solver DETAIL based on the theory, and used it in our prototype systems.
(4) New interaction techniques for pen computing : We proposed a model of recognition in local structures called the "Link Model". We then developed a pen-interaction techniques in a prototype pen-system Pegasus. Which presents multiple candidates per user actions in drawing, achieving fast and flexible drawing interactions.
(5) Interactive Penumbrae : We proposed a new use of penumbrae in 3-D interaction, called the "Interactive Penumbrae". An artificially-drawing penumbrae in 3-D space enhances user perception of height and location from the projection plane. A fast rendering algorithm has been developed which makes the technique useful for real-time interaction in 3-D space.
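
As a rough illustration of the local propagation idea mentioned in item (3), the following Python sketch solves a set of one-way constraints one at a time in dependency order. It is only a minimal sketch under stated assumptions: the `(output, inputs, function)` constraint representation, the function name `local_propagation`, and the layout variables are hypothetical, and it models neither constraint hierarchies nor the DETAIL solver.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def local_propagation(known, constraints):
    """Solve one-way constraints individually, in dependency order.

    known:       dict of variable name -> value fixed by the user
    constraints: list of (output, inputs, fn); hypothetical representation
                 in which fn computes `output` from the values of `inputs`
    """
    by_output = {out: (ins, fn) for out, ins, fn in constraints}
    # Each output variable depends on the input variables of its constraint.
    graph = {out: set(ins) for out, (ins, _fn) in by_output.items()}
    values = dict(known)
    # static_order() raises graphlib.CycleError when constraints depend on
    # each other cyclically, which is the case plain local propagation
    # cannot handle.
    for var in TopologicalSorter(graph).static_order():
        if var in by_output:
            ins, fn = by_output[var]
            values[var] = fn(*(values[name] for name in ins))
    return values


# Hypothetical layout example: a box defined by its left edge and width.
constraints = [
    ("right", ("left", "width"), lambda left, width: left + width),
    ("center", ("left", "right"), lambda left, right: (left + right) / 2),
]
layout = local_propagation({"left": 10, "width": 40}, constraints)
print(layout["right"], layout["center"])
```

Each constraint is evaluated exactly once, which is what makes this style of solving fast; the price is that mutually dependent constraints are rejected rather than solved.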

researchmap
Implementation of Parallel Functional Programming Systems

Grant number：06558039 1994 - 1996

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (A)

TAKEICHI Masato, TANAKA Tetsuro, MATSUOKA Satoshi, IWASAKI Hideya, YONEZAWA Akinori

Grant amount：¥15800000 （ Direct Cost: ¥15800000 ）

This project aims to develop functional programming systems for parallel computers. The implementation of a parallel functional system, Parallel Gofer, on the AP1000 computer has been completed and is under evaluation. This implementation is based on the Gofer system developed by Mark Jones and accepts any Gofer program for sequential evaluation. Programs may include references to extended library functions for parallelization.
Several new ideas for this implementation have already been published. One such idea is the so-called "unboxing" technique for data construction. This implementation showed that the idea is promising, although some optimization should be considered for practical application.
Along with this implementation, a novel idea for optimization has been explored and implemented. Although most optimization so far relies on heuristics, our new system is completely mechanical. It is based on hylomorphisms, which come from research on constructive algorithmics. This technique is applicable to both sequential and parallel functional programs.
These results have been made public at international conferences and published in the proceedings.
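
For readers unfamiliar with the term, a hylomorphism composes an unfold (which generates a structure from a seed) with a fold (which consumes it) without building the intermediate structure. The Python sketch below shows the textbook list-shaped case only; the names `hylo`, `countdown`, and `product` are hypothetical, and it is not the transformation system developed in this project.

```python
def hylo(algebra, coalgebra, seed):
    """List-shaped hylomorphism: unfold from `seed` and fold the result
    without ever materializing the intermediate list.

    coalgebra(seed) -> None to stop, or (element, next_seed)
    algebra(None)   -> result for the empty case
    algebra((element, folded_rest)) -> result for the non-empty case
    """
    step = coalgebra(seed)
    if step is None:
        return algebra(None)
    element, next_seed = step
    return algebra((element, hylo(algebra, coalgebra, next_seed)))


def countdown(n):
    # Unfold part: n, n-1, ..., 1 (conceptually the list [n, ..., 1]).
    return None if n == 0 else (n, n - 1)


def product(node):
    # Fold part: multiply the elements together, with 1 for the empty list.
    if node is None:
        return 1
    element, rest = node
    return element * rest


# Factorial expressed as "unfold a countdown, then fold with product".
factorial_5 = hylo(product, countdown, 5)
print(factorial_5)
```

Because generation and consumption are interleaved in one recursion, no list is allocated; this is the kind of fusion that makes the hylomorphism form attractive as a target for mechanical optimization.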

researchmap
Design and Implementation of Concurrent Programming Language based on Linear Logic

Grant number：06452389 1994 - 1995

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for General Scientific Research (B)

YONEZAWA Akinori, KOBAYASHI Naoki, MATSUOKA Satoshi

Grant amount：¥6000000 （ Direct Cost: ¥6000000 ）

The goal of our research is to develop a theoretical foundation of concurrent computation based on linear logic so that we can uniformly discuss various issues of concurrent programming languages : program analyses, language design, and implementation techniques. The concrete research achievements are as follows.
1. Development of a concurrent linear logic programming framework ACL/Higher-order ACL : We showed that the essence of concurrent computation is captured by proof search in first-order linear logic. We further extended it to a higher-order system, and showed that static type systems and higher-order processes for concurrent programming languages are naturally introduced in the system.
2. Design and implementation of a typed concurrent linear logic programming language HACL : We designed and implemented a programming language, HACL, based on the concurrent linear logic programming framework. A compiler for a single-processor workstation was constructed, and programming experiments were carried out using the compiler. We also constructed a prototype compiler for a cluster of workstations.
3. Study of high-level mechanisms for concurrent object-oriented languages through HACL : We showed that various high-level mechanisms of concurrent objects, such as inheritance and access control for methods, are easily constructed on top of HACL. The result implies not only that we can construct a concurrent object-oriented interface to HACL, but also that we can uniformly discuss various issues of other concurrent object-oriented languages.
4. Development of program analysis techniques : Novel program analysis techniques for concurrent programming languages were developed through HACL. The proposed techniques enable compile-time optimizations and also improve the reliability of concurrent programs.

researchmap
Efficient Implementation of Concurrent Object-Oriented Languages for General Purpose MIMD Parallel Computers.

Grant number：05558026 1993 - 1995

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Developmental Scientific Research (B)

YONEZAWA Akinori, KOBAYASHI Naoki, MATSUOKA Satoshi

Grant amount：¥20500000 （ Direct Cost: ¥20500000 ）

The goal of our research is to develop a highly efficient language processor (i.e., compiler and runtime system) for concurrent object-oriented languages on general purpose MIMD parallel machines, demonstrating the viability of the concurrent object-oriented paradigm in a practical setting. After a three-year research effort, our goal has largely been achieved; furthermore, many good results have been obtained in designing a new efficient debugging scheme for multi-threaded parallel programs. The concrete research achievements are as follows.
(1) A new concurrent object-oriented language called ABCL/f was designed. In the normal style of programming in this language, a mutable data structure is represented as a concurrent object whose access is allowed only through its associated methods, which are invoked mutually exclusively. [Ref 7]
(2) By simplifying ABCL/f, we designed a new language called Schematic. This language can be viewed as a concurrent object-oriented extension of the Scheme language, which is a very popular dialect of Lisp. [Ref 14]
(3) We designed an abstract machine called StackThreads, and highly efficient techniques for implementing StackThreads were developed. [Ref 1, 2, 8]
(4) Based on the implementation of StackThreads, we implemented a language processor for ABCL/f, for which good performance numbers were obtained.
(5) A new garbage collection scheme was designed and implemented, and its performance was measured. This scheme was incorporated in our language processor mentioned in (4). [Ref 10, 11]
(6) We designed a new debugging scheme that supports replay and tracing and requires only a small amount of logging information even when a large number of threads are active in a program. This scheme has been implemented. [Ref 15]
(7) In ABCL/f, we developed programs for (a) predicting RNA secondary structures and (b) finite element method and N-body problem applications. [Ref 9, 12, 13]

researchmap
Implementation of a Fast Constraint Solver for Direct Manipulation Interfaces

Grant number：05780227 1993

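Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Encouragement of Young Scientists (A)

MATSUOKA Satoshi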

Grant amount：¥900000 （ Direct Cost: ¥900000 ）

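In graphical user interfaces (GUIs), the cost of developing programs that visualize information held in the computer and support direct manipulation of that visualization is high. One approach that has attracted attention and been studied actively is to represent the structure of a figure with geometric constraints, construct the figure by solving the constraints, and realize direct manipulation by changing the constraints dynamically. However, the constraint-solving methods proposed so far have, for the sake of speed, severely restricted the class of constraint systems, for example by prohibiting simultaneous constraints and nonlinear constraints, so GUI systems that adopted them had difficulty expressing the figures needed in practice. In this research we therefore developed a fast constraint-solving method that can also handle simultaneous and nonlinear constraints. It rests on the following analysis and observations. In the constraint systems typically used in GUIs, simultaneous constraints appear only locally, and in most of the system the constraints can be solved individually. Moreover, a constraint solved on its own can be solved quickly even if it is nonlinear, and if the simultaneous parts are small, their effect on overall speed is also small. The method we developed therefore analyzes the constraint system to find the minimal parts that must be solved simultaneously, and handles them in a unified way together with the remaining constraints, which can be solved individually. We built a constraint solver based on this method and evaluated its performance; even compared with constraint solvers that severely restrict the class of constraint systems, it showed no major slowdown. Furthermore, using this constraint solver, we built a system, Image, for interactively realizing direct manipulation interfaces from multiple visual examples. This system makes extensive use of the ability to solve constraints simultaneously. We also plan to adopt the solver in an algorithm animation authoring system based on declarative descriptions that is currently under development.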

researchmap
Declarative Programming of Graphical User Interfaces via Visual Examples

Grant number：04780025 1992

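Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Encouragement of Young Scientists (A)

MATSUOKA Satoshi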

Grant amount：¥900000 （ Direct Cost: ¥900000 ）

researchmap
New Techniques for Visualizing and Manipulating Abstract Data in Graphical User Interfaces

Grant number：03780021 1991

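Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Encouragement of Young Scientists (A)

MATSUOKA Satoshi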

Grant amount：¥900000 （ Direct Cost: ¥900000 ）

researchmap
Computational Reflection in Object-Oriented Concurrent Computing and its Applications

Grant number：01420045 1989 - 1991

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for General Scientific Research (A)

YONEZAWA Akinori, WATANABE Takuo, MATSUOKA Satoshi

Grant amount：¥11800000 （ Direct Cost: ¥11800000 ）

1. An object-oriented concurrent language, ABCL/R, which can describe reflective computation, was designed and its preliminary implementation has been completed. [1]
2. We introduced a new notion called "Group-Wide Reflection", which is the reflective capability of a group of concurrent objects as a whole. [4, 5, 9]
3. An Actor-based reflective model and language, ACT/R, which has the group-wide reflective capability, was designed. [4, 5, 9]
4. Various results including correctness of the reflective model have been obtained by analyzing the notion of Group-Wide Reflection. [4, 5, 9]
5. A prototype implementation of ACT/R was completed.
6. We defined a new reflective notion called "Hybrid Group Reflection" by incorporating the results of our research on Group-Wide Reflection into our language ACT/R. [6, 13, 14]
7. In conventional languages, important aspects of parallel computation such as scheduling and shared resource coordination could only be programmed in an ad hoc way. Our research results enabled us to model and program such aspects of parallel computation in reflective (user) languages. Using our reflective language, we also showed that such aspects of parallel discrete event simulation can be controlled and programmed very succinctly in the same language in which the simulation is written. [13, 14, 11]
8. Our implementation of ABCL/R2 on a multi-processor workstation, the Omron Luna88k, demonstrated the effectiveness of reflective capability in programming parallel discrete event simulation.
9. Using examples, we showed that the reflective capability provides an effective means for coping with the inheritance anomaly problem. [6]
10. The runtime kernel for an object-oriented concurrent language includes not only its intermediate-code interpreter, method dispatcher, and garbage collector, but also its scheduler and inter-node communication facilities. In distributed computing environments, the behavior of the runtime kernel often needs to adapt to its execution environment. For this purpose, we constructed a reflective architecture system called RbCl, in which almost all runtime facilities can be dynamically replaced with user-defined ones. [8, 11]
Other Contributions : - International Workshops -
Two of our research members, Satoshi Matsuoka and Takuo Watanabe, organized (as program committee members) the following two international workshops :
1. ECOOP/OOPSLA'90 Workshop on Reflection and Metalevel Architectures in Object-Oriented Programming, Ottawa, October 21, 1990.
2. OOPSLA'91 Workshop on Reflection and Metalevel Architectures in Object-Oriented Programming, Phoenix, October 7, 1991.

researchmap
