Faculty Profiles - ENDO TOSHIO

ENDO TOSHIO

Organization

Institute of Integrated Research Supercomputing Research Center Professor

Homepage

http://www.el.scrc.iir.isct.ac.jp/endo/

Degree

Doctor of Science ( The University of Tokyo )

Research Interests

memory hierarchy
high performance computing
GPGPU
Supercomputers

Research Areas

Informatics / High performance computing

Education

The University of Tokyo Graduate School, Division of Science Department of Information Science

1996.4 - 2001.9

Country： Japan

Notes： Master course, Doctor course

The University of Tokyo Faculty of Science Department of Information Science

1992.4 - 1996.3

Country： Japan

Research History

Institute of Science Tokyo Supercomputing Research Center, IIR Professor

2024.10

Country：Japan

Tokyo Institute of Technology GSIC Professor

2018.4 - 2024.9

Tokyo Institute of Technology GSIC Associate Professor

2012.6 - 2018.3

Professional Memberships

IPSJ

ACM

IEEE-CS

Papers

Efficient counting of locally flat-foldable 45-degree grid crease patterns Reviewed

Ryosuke Sakurai, Toshio Endo

( 11 ) 1 - 17 2026.6

Authorship：Last author Language：Japanese Publishing type：Research paper (scientific journal)

FRUGAL: Pushing GPU Applications beyond Memory Limits Reviewed International coauthorship International journal

Lingqi Zhang, Tengfei Wang, Jiajun Huang, Chen Zhuang, Ivan R. Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib

2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) 188 - 201 2026.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/cgo68049.2026.11395210

Optimizing Intra-Layer Parallel Communication for LLM Training on Systems with Fully-Connected Mesh GPU Topology Reviewed International coauthorship International journal

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region 328 - 339 2026.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3773656.3773675

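次世代HPC・AI研究開発支援拠点の発足と国内におけるGPUプログラム開発支援 [Launch of the Next-Generation HPC/AI Research and Development Support Center and Support for GPU Program Development in Japan]

朴泰祐, 石田真一, 額田彰, 藤田典久, 下川辺隆史, 中島研吾, 三木洋平, 横田理央, 遠藤敏夫, 小林諒平

IPSJ SIG Technical Report 2025-HPC-202 ( 9 ) 1 - 9 2025.12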

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Dynamic Thread Coarsening for CPU and GPU OpenMP Code Reviewed International coauthorship International journal

Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert

Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 1066 - 1074 2025.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3731599.3767482

Other Link： https://dl.acm.org/doi/pdf/10.1145/3731599.3767482
Physical System Study on Balancing Interactive and Batch Job Performance through Oversubscribing Scheduling Reviewed International journal

Shohei Minami, Toshio Endo, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida

Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 2137 - 2145 2025.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3731599.3767472

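CXLメモリプール実験システムの初期評価 [Initial Evaluation of an Experimental CXL Memory Pool System]

遠藤敏夫, 坂本龍一, 野村哲弘, 小林諒平, 大辻弘貴, 加藤純, 古藤明音, 三輪真弘

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2025), IPSJ SIG Technical Report 2025-HPC-200 ( 24 ) 2025.8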

Authorship：Lead author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

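オーバーコミット状態におけるノード内ジョブスケジューリング手法の性能比較 [Performance Comparison of Intra-Node Job Scheduling Methods under Overcommitment]

矢澤智成, 遠藤敏夫, 南将平, 細木隆豊

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2025), IPSJ SIG Technical Report 2025-HPC-200 ( 19 ) 2025.8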

Language：Japanese

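分散並列学習における通信圧縮技術のボトルネック解析 [Bottleneck Analysis of Communication Compression Techniques in Distributed Parallel Training]

小宮山悠輔, 細木隆豊, 遠藤敏夫

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2025), IPSJ SIG Technical Report 2025-HPC-200 ( 18 ) 2025.8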

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

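大規模スーパーコンピュータ向けネットワークトポロジの媒介中心性を用いた評価 [Evaluation of Network Topologies for Large-Scale Supercomputers Using Betweenness Centrality]

佐藤智優, 遠藤敏夫, 細木隆豊

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2025), IPSJ SIG Technical Report 2025-HPC-200 ( 17 ) 2025.8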

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers Reviewed International coauthorship International journal

Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Proceedings of the 39th ACM International Conference on Supercomputing 57 - 72 2025.6

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3721145.3730422

An Optimization Technique for Hiding Communication Costs in 3D Parallel Training of Deep Learning Reviewed International journal

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) 472 - 481 2025.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ccgrid64434.2025.00044

Challenges in Computing Resource Sharing Towards Next-Gen Interactive Accelerated HPC Reviewed International journal

Toshio Endo, Shohei Minami, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida, Tomoya Yuki, Ryuichi Sakamoto

Interactive and Urgent High-Performance Computing (CIW-IUS), in conjunction with ISC24, LNCS 15058 231 - 242 2024.12

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer Nature Switzerland

DOI： 10.1007/978-3-031-73716-9_16

TSUBAME4.0の処理量担保のための計算ノード分割 [Compute Node Partitioning to Guarantee Throughput on TSUBAME4.0]

野村哲弘, 遠藤敏夫

AXIES Annual Conference 2024, 10AM2C-5 2024.12

Authorship：Last author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

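TSUBAME4.0: HPC-AI時代に向けた東京科学大学のもっとみんなのスパコン [TSUBAME4.0: Institute of Science Tokyo's Even More "Everybody's Supercomputer" for the HPC-AI Era]

安良岡由規, 遠藤敏夫, 野村哲弘, 渡邊寿雄, 鶴見慶

AXIES Annual Conference 2024, 10AM1C-1 2024.12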

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

SuperGCN: General and Scalable Framework for GCN Training on CPU-powered Supercomputers International coauthorship International journal

Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

arXiv:2411.16025 [cs.DC] 2024.11

Language：English Publishing type：Research paper (other academic)

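HPC-AI時代に向けたもっとみんなのスパコンTSUBAME4.0 [TSUBAME4.0: An Even More "Everybody's Supercomputer" for the HPC-AI Era]

遠藤敏夫, 野村哲弘, 渡邊寿雄, 安良岡由規, 鶴見慶

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024-HPC-195 ( 8 ) 2024.8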

Authorship：Lead author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

タンパク質構造予測プログラムOmegaFoldのマルチGPUを用いた高速化 [Accelerating the Protein Structure Prediction Program OmegaFold Using Multiple GPUs] Reviewed

大沢泰生, 遠藤敏夫, 細木隆豊

Cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2024) 2024.8

Language：Japanese Publishing type：Research paper (other academic)

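スパコンTSUBAMEシリーズにおけるリソース分割戦略 [Resource Partitioning Strategies in the TSUBAME Supercomputer Series]

野村哲弘, 遠藤敏夫

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024-HPC-195 ( 7 ) 2024.8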

Authorship：Last author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

FRUGAL: Reducing GPU Memory Requirement of HPC Applications

Tengfei Wang, Lingqi Zhang, Ivan Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024-HPC-195 ( 27 ) 2024.8

Language：English Publishing type：Research paper (conference, symposium, etc.)

Leveraging GPUDirect Storage for Efficient Image Reconstruction

Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024-HPC-195 ( 5 ) 2024.8

Language：English Publishing type：Research paper (conference, symposium, etc.)

High-performance Graph Convolutional Networks Training on Fugaku and ABCI Supercomputers International coauthorship

Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024-HPC-195 ( 14 ) 2024.8

Language：English Publishing type：Research paper (conference, symposium, etc.)

Real-time High-resolution X-Ray Computed Tomography Reviewed International coauthorship International journal

Du Wu, Peng Chen, Xiao Wang, Issac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Proceedings of the 38th ACM International Conference on Supercomputing 110 - 123 2024.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3650200.3656634

Other Link： https://dl.acm.org/doi/pdf/10.1145/3650200.3656634
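ステンシル計算の時間ブロッキングフレームワークの実装と評価 [Implementation and Evaluation of a Temporal Blocking Framework for Stencil Computations]

瓜生侑, 遠藤敏夫

IPSJ SIG Technical Report 2024-HPC-194 ( 3 ) 2024.5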

Authorship：Last author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

An optimization pass for training speed-up and strategy search in 3D parallelism International coauthorship

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

IPSJ SIG Technical Report 2024-HPC-194 ( 7 ) 2024.5

Language：English Publishing type：Research paper (conference, symposium, etc.)

Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking Reviewed

Futa Kambe, Toshio Endo

16th Workshop on General Purpose Processing Using GPU 1 - 6 2024.3

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3649411.3649412

Other Link： https://dl.acm.org/doi/pdf/10.1145/3649411.3649412
Retargeting and Respecializing GPU Workloads for Performance Portability Reviewed

Ivan R. Ivanov, Oleksandr Zinenko, Jens Domke, Toshio Endo, William S. Moses

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) 119 - 132 2024.3

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/cgo57630.2024.10444828

Automatic Parallelization and OpenMP Offloading of Fortran Array Notation Reviewed

Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert

Proceedings of the 20th International Workshop on OpenMP (IWOMP 2024), LNCS 15195 197 - 209 2024.3

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-72567-8_13

High Throughput 3D Image Reconstruction with GPUDirect and Tensor Core

Du Wu, Peng Chen, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IPSJ SIG Technical Report 2024-HPC-193 ( 25 ) 2024.3

Language：English Publishing type：Research paper (conference, symposium, etc.)

AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training Reviewed

Ryubu Hosoki, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region 117 - 126 2024.1

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3635035.3635045

Other Link： https://dl.acm.org/doi/pdf/10.1145/3635035.3635045
Communication Optimization for Distributed GCN Training on ABCI Supercomputer

Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IEEE International Conference on Cluster Computing 160 - 161 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/CLUSTERWorkshops61563.2024.00038

Other Link： https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#Zhuang0LEMW24
Investigating Nvidia GPU Architecture Trends via Microbenchmarks

Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IEEE International Conference on Cluster Computing 174 - 175 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/CLUSTERWorkshops61563.2024.00045

Other Link： https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#ZhangBCWEMW24
Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage

Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

IEEE International Conference on Cluster Computing 196 - 197 2024

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/CLUSTERWorkshops61563.2024.00056

Other Link： https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#WuCTTEMW24
Pyramid Swin Transformer for Multi-task: Expanding to More Computer Vision Tasks Reviewed

Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

Proceedings of Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2023), Springer, LNCS Vol. 14124 53 - 65 2023.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer Nature Switzerland

DOI： 10.1007/978-3-031-45382-3_5

The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System Reviewed

Shohei Minami, Toshio Endo, Akihiro Nomura

2023 IEEE High Performance Extreme Computing Conference (HPEC) 1 - 7 2023.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/hpec58863.2023.10363580

Scalable Training of Graph Convolutional Networks on Supercomputers

Chen Zhuang, Peng Chen, Xin Liu, Satoshi Matsuoka, Toshio Endo, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report 2023-HPC-190 ( 19 ) 2023.8

Language：English Publishing type：Research paper (conference, symposium, etc.)

High-performance Temporal Blocking Stencils at Low GPU Occupancy

Lingqi Zhang, Mohamed Wahib, Peng Chen, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report 2023-HPC-190 ( 26 ) 2023.8

Language：English Publishing type：Research paper (conference, symposium, etc.)

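動的スケジューリングライブラリを用いたPythonにおける分散コレスキー分解の実装と評価 [Implementation and Evaluation of Distributed Cholesky Factorization in Python Using a Dynamic Scheduling Library]

岡本洸琉, 遠藤敏夫

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report 2023-HPC-190 ( 15 ) 2023.8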

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

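GPU上のTensor coreを使ったステンシル計算の時間ブロッキングによる高速化 [Accelerating Stencil Computations Using Tensor Cores on GPUs with Temporal Blocking]

神戸風太, 遠藤敏夫

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report 2023-HPC-190 ( 29 ) 2023.8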

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Enhancing the Performance of AlphaFold Through Modified Storage Method and Optimization of HHblits on TSUBAME3.0 Supercomputer

Hayato Fujita, Akihiro Nomura, Toshio Endo, Masakazu Sekijima

2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE) 2023.7

Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/csce60160.2023.00351

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications Invited Reviewed

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Proceedings of the ACM International Conference on Supercomputing (ICS 2023), Orlando 2023.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3577193.3593705

Revisiting Temporal Blocking Stencil Optimizations Invited Reviewed

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Proceedings of the ACM International Conference on Supercomputing (ICS 2023), Orlando 2023.6

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3577193.3593716

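次世代高性能メモリシステムにおけるステンシル計算の局所性向上技術の評価 [Evaluation of Locality Improvement Techniques for Stencil Computations on Next-Generation High-Performance Memory Systems]

幸朋矢, 遠藤敏夫

IPSJ SIG Technical Report 2023-HPC-188 ( 31 ) 2023.3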

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Effectiveness of the Oversubscribing Scheduling on Supercomputer Systems Reviewed

Shohei Minami, Toshio Endo, Akihiro Nomura

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region 18 - 28 2023.2

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3578178.3578221

Other Link： https://dl.acm.org/doi/pdf/10.1145/3578178.3578221
Exploiting Scratchpad Memory for Deep Temporal Blocking Reviewed

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Proceedings of the 15th Workshop on General Purpose Processing Using GPU 34 - 35 2023.2

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3589236.3589242

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming 119 - 134 2023.2

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3572848.3577475

Other Link： https://dl.acm.org/doi/pdf/10.1145/3572848.3577475
Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection

Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 583 - 590 2023

Publishing type：Research paper (international conference proceedings) Publisher：SCITEPRESS - Science and Technology Publications

DOI： 10.5220/0011675800003417

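機械学習を用いた音声処理に向けたデータ拡張手法の研究 [A Study of Data Augmentation Methods for Machine-Learning-Based Speech Processing]

丸山翼, 池上努, 遠藤敏夫, 広渕崇宏

IEICE Technical Report, Technical Committee on Engineering Acoustics (EA) 2022.12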

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Efficient Stencil Computation with Temporal Blocking by Halide DSL Reviewed

Hiroki Aikawa, Toshio Endo, Tomoya Yuki, Takahiro Hirofuchi, Tsutomu Ikegami

2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) 870 - 877 2022.12

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ispa-bdcloud-socialcom-sustaincom57177.2022.00116

Breaking the Memory Bottleneck for Iterative Memory-bound Applications Via Persistent Kernels

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

IPSJ SIG Technical Report 2022-HPC-187 ( 18 ) 2022.12

Language：English Publishing type：Research paper (conference, symposium, etc.)

Speed-Up Single Shot Detector on GPU with CUDA Reviewed

Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

Proceedings of SNPD2022-summer, Studies in Computational Intelligence 1074 89 - 106 2022.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer International Publishing

DOI： 10.1007/978-3-031-19604-1_7

3D Stacked SRAMを活用したHPC向けメモリアーキテクチャの検討 [A Study of Memory Architectures for HPC Utilizing 3D Stacked SRAM]

萩原汐, 吉川隆英, 幸朋矢, 遠藤敏夫

Design Gaia 2022, IPSJ SIG Technical Report 2022-SLDM-200 ( 31 ) 2022.11

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

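ラムダ式を用いる移植性の高い並列プログラムの実装とCPU・GPU上の評価 [Implementation of Highly Portable Parallel Programs Using Lambda Expressions and Their Evaluation on CPUs and GPUs]

瓜生侑, 遠藤敏夫

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2022), IPSJ SIG Technical Report 2022-HPC-185 ( 20 ) 2022.7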

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

arXiv:2207.00257 [cs.PL] 2022.7

Language：English Publishing type：Research paper (other academic)

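負荷分散を改善したハイブリッドパイプライン並列深層学習手法 [A Hybrid Pipeline-Parallel Deep Learning Method with Improved Load Balancing]

細木隆豊, 遠藤敏夫, 広渕崇宏, 池上努

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2022), IPSJ SIG Technical Report 2022-HPC-185 ( 17 ) 2022.7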

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

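タンパク質構造解析システムAlphafoldの実行時ファイルステージングを用いた高速化 [Accelerating the Protein Structure Analysis System AlphaFold Using Runtime File Staging]

大沢泰生, 遠藤敏夫, 野村哲弘

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2022), IPSJ SIG Technical Report 2022-HPC-185 ( 24 ) 2022.7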

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations

Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuke Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi

arXiv:2203.14188 [cs.LG] 2022.3

Language：English Publishing type：Research paper (other academic)

Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems Reviewed

Shohei Minami, Toshio Endo, Akihiro Nomura

Proceedings of the 24th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2021), LNCS 12985 59 - 79 2021.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer International Publishing

DOI： 10.1007/978-3-030-88224-2_4

Performance Modeling of HPC Applications on Overcommitted Systems

Shohei Minami, Toshio Endo, Akihiro Nomura

HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region 129 - 132 2021

Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3432261.3439866

Other Link： https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2021.html#MinamiE021
Integrating Cache Oblivious Approach with Modern Processor Architecture Reviewed

Toshio Endo

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region 123 - 130 2020.1

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3368474.3368477

Other Link： https://dl.acm.org/doi/pdf/10.1145/3368474.3368477
AN5D: automated stencil framework for high-degree temporal blocking on GPUs Reviewed

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO) 199 - 211 2020

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving memory pressure from external memory to on-chip memory on GPUs. However, correctly implementing those optimizations while considering the complexity of the architecture and memory hierarchy of GPUs to achieve high performance is difficult. We propose AN5D, an automated stencil framework which is capable of automatically transforming and optimizing stencil patterns in a given C source code, and generating corresponding CUDA code. Parameter tuning in our framework is guided by our performance model. Our novel optimization strategy reduces shared memory and register pressure in comparison to existing implementations, allowing performance scaling up to a temporal blocking degree of 10. We achieve the highest performance reported so far for all evaluated stencil benchmarks on the state-of-the-art Tesla V100 GPU.

DOI： 10.1145/3368826.3377904

Other Link： https://dblp.uni-trier.de/db/conf/cgo/cgo2020.html#MatsumuraZWEM20
AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

CoRR abs/2001.01473 2020

Publishing type：Research paper (scientific journal)

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2001.html#abs-2001-01473
Profiling based Out-of-core Hybrid Method for Large Neural Networks

Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo

arXiv:1907.05013 [cs.LG] 2019.7

Language：English Publishing type：Research paper (other academic)

An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation Reviewed

Yukinori Sato, Tomoya Yuki, Toshio Endo

ACM Transactions on Architecture and Code Optimization 15 ( 4 ) 1 - 23 2018.12

Language：English Publishing type：Research paper (scientific journal) Publisher：Association for Computing Machinery (ACM)

On modern many-core CPUs, performance tuning against complex memory subsystems and scalability for parallelism is mandatory to achieve their potential. In this article, we focus on loop tiling, which plays an important role in performance tuning, and develop a novel framework that analytically models the load balance and empirically autotunes unpredictable cache behaviors through iterative polyhedral compilation using LLVM/Polly. From an evaluation on many-core CPUs, we demonstrate that our autotuner achieves a performance superior to those that use conventional static approaches and well-known autotuning heuristics. Moreover, our autotuner achieves almost the same performance as a brute-force search-based approach.

DOI： 10.1145/3293449

Other Link： https://dl.acm.org/doi/pdf/10.1145/3293449
Scalable RMA-based Communication Library Featuring Node-local NVMs Reviewed

Ryo Matsumiya, Toshio Endo

2018 IEEE High Performance Extreme Computing Conference (HPEC) 1 - 7 2018.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/hpec.2018.8547546

Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy Reviewed

Toshio Endo

2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA) 19 - 24 2018.8

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/nvmsa.2018.00016

Exhaustive evaluation of memory-latency sensitivity on manycore processors with large cache Reviewed

Noboru Tanabe, Toshio Endo

Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications 27 - 34 2018.3

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3195612.3195616

Other Link： https://dl.acm.org/doi/pdf/10.1145/3195612.3195616
Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels Reviewed

Noboru Tanabe, Toshio Endo

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) 249 - 254 2018.3

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/pdp2018.2018.00042

ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity Reviewed

Yuki Ito, Ryo Matsumiya, Toshio Endo

2017 IEEE International Conference on Big Data (Big Data) 2017.12

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/bigdata.2017.8257926

Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data

松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

Tsubame e-Science Journal 16 02‐08 (JA),20‐27 (EN) - 9 2017.11

Language：Japanese Publishing type：Research paper (bulletin of university, research institution)

Applying Temporal Blocking with a Directive-based Approach Reviewed

Shota Kuroda, Toshio Endo, Satoshi Matsuoka

Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC 1 - 11 2017.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3148173.3148190

Other Link： https://dl.acm.org/doi/pdf/10.1145/3148173.3148190
A Stencil framework to realize large-scale computations beyond device memory capacity on GPU supercomputers Reviewed

Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki

Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet) 525 - 529 2017.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

Stencil-based applications such as CFD have succeeded in obtaining high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality improvement technique that combines temporal blocking with memory swapping between host and device enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. A framework-based application for airflow in an urban city maintains 80% of its performance even when the problem size is twice the GPU memory capacity, and it demonstrates good weak scalability on the TSUBAME 2.5 supercomputer.

DOI： 10.1109/cluster.2017.97

An Accurate Simulator of Cache-Line Conflicts to Exploit the Underlying Cache Performance Reviewed

Yukinori Sato, Toshio Endo

Proceedings of the 23rd International European Conference on Parallel and Distributed Computing (Euro-Par 2017) 119 - 133 2017.8

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer International Publishing

DOI： 10.1007/978-3-319-64203-1_9

researchmap
HPCとビッグデータ・AIを融合するグリーン・クラウドスパコンTSUBAME3.0の概要

松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

情報処理学会研究報告(Web) 2017 ( HPC-160 ) Vol.2017‐HPC‐160,No.29,1‐6 (WEB ONLY) 2017.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime Reviewed

Yukinori Sato, Tomoya Yuki, Toshio Endo

Proceedings of the Computing Frontiers Conference 191 - 200 2017.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3075564.3077627

researchmap

Other Link： https://dl.acm.org/doi/pdf/10.1145/3075564.3077627
Evaluating the impacts of code-level performance tunings on power efficiency Reviewed

Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono

2016 IEEE International Conference on Big Data (Big Data) 362 - 369 2016.12

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/bigdata.2016.7840624

researchmap
PGAS Communication Runtime for Extreme Large Data Computation Reviewed

Ryo Matsumiya, Toshio Endo

2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2) 10 - 16 2016.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/espm2.2016.007

researchmap
Realizing Out-of-Core Stencil Computations Using Multi-tier Memory Hierarchy on GPGPU Clusters Reviewed

Toshio Endo

2016 IEEE International Conference on Cluster Computing (CLUSTER) 21 - 29 2016.9

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/cluster.2016.61

researchmap
From FLOPS to BYTES: Disruptive change in high-performance computing towards the post-Moore era Reviewed

Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo

2016 ACM International Conference on Computing Frontiers - Proceedings 274 - 281 2016.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Association for Computing Machinery, Inc

DOI： 10.1145/2903150.2906830

Scopus

researchmap

Other Link： http://dblp.uni-trier.de/db/conf/cd/cf2016.html#conf/cd/MatsuokaANIKMTI16
Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers. Reviewed

Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9725 265 - 274 2016.3

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-42432-3_33

Web of Science

Scopus

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/icms/icms2016.html#FujisawaEY16
Dynamic Compilation for Transparent Data Locality Analysis and Memory Subsystem Tuning Reviewed

Yukinori Sato, Toshio Endo

The International Workshop on Architectural and Micro-Architectural Support for Dynamic Optimization (AMAS-DO) 2016.3

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
A Cache-aware Temporal Blocking Method for 3D Stencil Computation Reviewed

Shimpei Sato, Yukinori Sato, Toshio Endo

3rd International Workshop on High-Performance Stencil Computations (HiStencils 2016), In conjunction with HiPEAC 2016 2016.1

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs. Reviewed

Yuki Tsujita, Toshio Endo, Katsuki Fujisawa

Proceedings of ESPM2 2015: 1st International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis 38 - 45 2015.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2832241.2832245

Scopus

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/sc/espm2015.html#TsujitaEF15
Exana: an execution-driven application analysis tool for assisting productive performance tuning Reviewed

Yukinori Sato, Shimpei Sato, Toshio Endo

Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems 1 - 10 2015.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2837476.2837477

researchmap

Other Link： https://dl.acm.org/doi/pdf/10.1145/2837476.2837477
Investigating potential performance benefits of memory layout optimization based on roofline model Reviewed

Shimpei Sato, Yukinori Sato, Toshio Endo

Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems 50 - 56 2015.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2837476.2837483

researchmap

Other Link： https://dl.acm.org/doi/pdf/10.1145/2837476.2837483
Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers Reviewed

Katsuki Fujisawa, Toyotaro Suzumura, Hitoshi Sato, Koji Ueno, Yuichiro Yasui, Keita Iwabuchi, Toshio Endo

Optimization in the Real World - Toward Solving Real-World Optimization Problems -, Series of Mathematics for Industry 1 - 13 2015.9

Language：English Publishing type：Part of collection (book) Publisher：Springer Japan

DOI： 10.1007/978-4-431-55420-2_1

researchmap
TSUBAME2におけるスケジュール効率化への取り組みとユーザ動向の見える化

野村哲弘, 佐々木淳, 三浦信一, 遠藤敏夫, 松岡聡

情報処理学会研究報告(Web) 2015 ( HPC-150 ) VOL.2015-HPC-150,NO.2 (WEB ONLY) 2015.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

J-GLOBAL

researchmap
Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models Reviewed

Kazuki Tsuzuku, Toshio Endo

Proceedings of the 4th International Conference on Smart Cities and Green ICT Systems 226 - 233 2015.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：SCITEPRESS - Science and Technology Publications

DOI： 10.5220/0005445102260233

researchmap
Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition Reviewed

Yuki Tsujita, Toshio Endo

Proceedings of Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP) 2015.5

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers Reviewed

Toshio Endo, Yuki Takasaki, Satoshi Matsuoka

2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 625 - 632 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/ICPADS.2015.84

Web of Science

researchmap
Exploration of Lossy Compression for Application-level Checkpoint/Restart Reviewed

Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka

2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) 914 - 922 2015

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2015.67

Web of Science

researchmap
Efficient utilization of memory hierarchy to enable the computation on bigger domains for stencil computation in CPU-GPU based systems Reviewed

Guanghao Jin, James Lin, Toshio Endo

2014 International Conference on High Performance Computing and Applications (ICHPCA) 1 - 6 2014.12

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ichpca.2014.7045354

researchmap
Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations Reviewed

Toshio Endo, Guanghao Jin

2014 IEEE International Conference on Cluster Computing (CLUSTER) 132 - 139 2014.9

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/cluster.2014.6968747

researchmap
実アプリケーションを用いた計算機評価ベンチマークと性能リポジトリの開発

野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡

情報処理学会研究報告(Web) 2014 ( 29 ) 1 - 7 2014.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：一般社団法人情報処理学会

J-GLOBAL

researchmap
An evaluation of the potential of flash SSD as large and slow memory for stencil computations Reviewed

Hiroko Midorikawa, Hideyuki Tan, Toshio Endo

2014 International Conference on High Performance Computing & Simulation (HPCS) 268 - 277 2014.7

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/hpcsim.2014.6903695

researchmap
Petascale general solver for semidefinite programming problems with over two million constraints Reviewed

Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS 1171 - 1180 2014.5

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/IPDPS.2014.121

Web of Science

Scopus

researchmap
TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype towards Exascale Becoming the Greenest Supercomputer in the World Reviewed

Toshio Endo, Akira Nukada, Satoshi Matsuoka

2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) 360 - 367 2014

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/padsw.2014.7097829

Web of Science

researchmap
Accelerating Quantum Chemistry Calculations with Graphical Processing Units - Toward in High-Density (HD) Silico Drug Discovery Reviewed

Yohsuke Hagiwara, Kazuki Ohno, Masaya Orita, Ryota Koga, Toshio Endo, Yutaka Akiyama, Masakazu Sekijima

CURRENT COMPUTER-AIDED DRUG DESIGN 9 ( 3 ) 396 - 401 2013.9

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.2174/15734099113099990031

Web of Science

PubMed

researchmap
システム評価のためのアプリケーション性能リポジトリの構築と性能モデルの評価

野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡, 鈴木惣一朗, 丸山直也

情報処理学会研究報告(Web) 2013 ( 4 ) 1 - 6 2013.7

Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：一般社団法人情報処理学会

J-GLOBAL

researchmap
A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU Reviewed

Guanghao Jin, Toshio Endo, Satoshi Matsuoka

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum 1080 - 1087 2013.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/ipdpsw.2013.58

researchmap
Tsubame2.0: The first petascale supercomputer in Japan and the greatest production in the world

Satoshi Matsuoka, Takayuki Aoki, Toshio Endo, Hitoshi Sato, Shin'ichiro Takizawa, Akihiko Nomura, Kento Sato

Contemporary High Performance Computing: From Petascale toward Exascale 525 - 555 2013.1

Publishing type：Part of collection (book)

Scopus

researchmap
A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs Reviewed

Guanghao Jin, Toshio Endo, Satoshi Matsuoka

2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) 1 - 8 2013

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/cluster.2013.6702633

Web of Science

researchmap
TSUBAME2.0におけるMulti-rail InfiniBandネットワークの性能評価

野村哲弘, 遠藤敏夫, 松岡聡

情報処理学会研究報告 ( 2012 ) 2012.12

Language：Japanese Publisher：情報処理学会

researchmap
High-performance general solver for extremely large-scale semidefinite programming problems Reviewed

Katsuki Fujisawa, Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Makoto Yamashita, Maho Nakata

International Conference for High Performance Computing, Networking, Storage and Analysis, SC 93 - 93 2012.11

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/SC.2012.67

Web of Science

Scopus

researchmap

Other Link： http://dl.acm.org/citation.cfm?id=2389122
Petaflop biofluidics simulations on a two million-core system Reviewed

Massimo Bernaschi, Mauro Bisson, Toshio Endo, Satoshi Matsuoka, Massimiliano Fatica, Simone Melchionna

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis 1 - 12 2011.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2063384.2063389

researchmap

Other Link： https://dl.acm.org/doi/pdf/10.1145/2063384.2063389
Molecular Dynamics Simulation of a Biomolecule with High Speed, Low Power and Accuracy Using GPU-Accelerated TSUBAME2.0 Supercomputer Reviewed

Shiqiao Du, Takuro Udagawa, Toshio Endo, Masakazu Sekijima

Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011) 2011.10

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer Reviewed

Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka

Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis 1 - 11 2011

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/2063384.2063388

Scopus

researchmap

Other Link： https://dl.acm.org/doi/pdf/10.1145/2063384.2063388
An 80-fold speedup, 15.0 TFlops GPU acceleration of non-hydrostatic weather model ASUCA production code Reviewed

Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010 2010

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/SC.2010.9

Scopus

researchmap
Linpack Evaluation on the TSUBAME Supercomputer with Hybrid Accelerators (Special Topics: GPGPU Computing)

Endo Toshio, Nukada Akira, Matsuoka Satoshi

Bulletin of the Japan Society for Industrial and Applied Mathematics 20 ( 2 ) 117 - 124 2010

Language：Japanese Publisher：The Japan Society for Industrial and Applied Mathematics

This paper reports Linpack benchmark evaluation on the TSUBAME supercomputer, a large scale hybrid supercomputer equipped with graphics processing units (GPUs) and ClearSpeed SIMD accelerators. With all of about 10,000 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87 TFlops. This paper also describes our design policy and tuning method that take characteristics of accelerators into account, which are essential to achieve scalability on hybrid supercomputers. The design is significantly different from that of LANL RoadRunner, a hybrid system equipped with Cell processors. We discuss the difference from the viewpoint of system architecture.

DOI： 10.11540/bjsiam.20.2_117

CiNii Books

researchmap
The Efficient Checkpoint based on Erasure Coding with Incremental Method

JITSUMOTO HIDEYUKI, NAKAMURA SYUNSUKE, ENDO TOSHIO, MATSUOKA SATOSHI

研究報告ハイパフォーマンスコンピューティング（HPC） 2009 ( 9 ) 1 - 6 2009.10

Language：Japanese Publisher：情報処理学会

Checkpointing/restarting is a well-known fault tolerance mechanism in large-scale HPC systems. However, the overhead of this method tends to grow, since the memory size of recent systems is increasing rapidly while the improvement in I/O bandwidth of file systems is relatively mild. The purpose of this work is to achieve checkpointing that supports multiple faults with low overhead by utilizing erasure coding. To eliminate the bottleneck, we parallelize encoding and store process images in node-local storage instead of shared file systems. Furthermore, to reduce the sizes of process images, we adopt incremental checkpointing, which stores only the parts of the process image that have been modified since the previous checkpoint. Through parallel experiments using matrix multiplication and the NPB LU benchmark, we have observed a 28 to 84% performance improvement from introducing incremental checkpointing.

researchmap
A Study of MPI Communication on a Next Generation Optical Interconnect

TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

26 ( 3 ) 5 - 19 2009.7

Language：Japanese

DOI： 10.11309/jssst.26.3_5

CiNii Books

researchmap
Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

Tomoaki Hamano, Toshio Endo, Satoshi Matsuoka

2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5 1912 - 1919 2009

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
File Clustering Based Replication Algorithm in a Grid Environment Reviewed

Hitoshi Sato, Satoshi Matsuoka, Toshio Endo

CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 204 - 211 2009

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CCGRID.2009.73

Web of Science

researchmap
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA Reviewed

A. Nukada, Y. Ogata, T. Endo, S. Matsuoka

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis 1 - 11 2008.11

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/sc.2008.5213210

Web of Science

researchmap
Massive supercomputing coping with heterogeneity of modern accelerators Reviewed

Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 1179 - 1188 2008

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Locality aware MPI communication on a commodity opto-electronic hybrid network Reviewed

Shin'ichiro Takizawa, Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 2158 - + 2008

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
An efficient, model-based CPU-GPU heterogeneous FFT library Reviewed

Yasuhito Ogata, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 380 - + 2008

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment Reviewed

Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Naoya Maruyama

2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING 250 - 257 2008

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method Reviewed

Yuto Hosogaya, Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8 862 - 869 2008

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap
性能モデルに基づくCPU及びGPUを併用する効率的なFFTライブラリ

尾形泰彦, 遠藤敏夫, 丸山直也, 松岡聡

情報処理学会論文誌コンピューティングシステム 1 ( 1 ) 40 - 50 2008

researchmap
ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. Reviewed

Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA 1 - 8 2007

Publisher：IEEE

DOI： 10.1109/IPDPS.2007.370603

researchmap
High-performance MPI broadcast algorithm for grid environments utilizing multi-lane NICs Reviewed

Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka

CCGRID 2007: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID 487 - + 2007

Language：English Publishing type：Research paper (international conference proceedings)

Web of Science

researchmap

Books

Advanced software technologies for post-peta scale computing : the Japanese post-peta CREST research project

Mitsuhisa Sato（ Role： Contributor, Endo, Midorikawa, Sato: Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era (pp. 227-248)）

Springer 2019 （ ISBN:9789811319235 ）

Total pages：viii, 317 p. Language：English

CiNii Books

researchmap

Presentations

Bridge Over Troubled Water: Offloading OpenMP Regions to XLA via StableHLO International coauthorship International conference

Muyao Xiao, Ivan R. Ivanov, Jens Domke, Toshio Endo

The 25th International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2026), poster session 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation

researchmap
(Status Report) Supercomputing Research Center/Center for Information Infrastructure, Institute of Science Tokyo Invited International conference

Toshio Endo

Vision and Strategy: How will supercomputing centers contribute to the future development of HPC/AI+?, Invited Session at SCA/HPC Asia 2026 2026.1

Event date： 2026.1

Language：English Presentation type：Oral presentation (invited, special)

researchmap
プロダクションHPC環境向けコンテナオーケストレーションツールの研究開発

坂本龍一, 加藤純, 古藤明音, 植木美和, 小野功, 野村哲弘, 小林諒平, 板倉宏太, 伊東利雄, 大辻弘貴, 遠藤敏夫, 三輪真弘

情報処理学会第37回コンピュータシステム・シンポジウム（ComSys 2025） 2025.12

Event date： 2025.12

Language：Japanese Presentation type：Oral presentation (general)

researchmap
TSUBAME4.0: More of Everyone's Supercomputer toward Future Computing Invited International conference

Toshio Endo

The 9th ISM-ISCT-NII-ZIB-NUS-MODAL Workshop on Optimization and Machine Learning for Data Science and Future Computing 2025.9

Event date： 2025.9

Language：English Presentation type：Oral presentation (invited, special)

researchmap
An Optimization Technique for Hiding Communication Costs in 3D Parallel Training of Deep Learning International coauthorship International conference

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) 2025.5 IEEE

Event date： 2025.5

Language：English Presentation type：Oral presentation (general)

researchmap
Polyhedral Rescheduling of GPU Kernels To Exploit Async Memory Movement International coauthorship International conference

Ivan R. Ivanov, William Moses, Emil Vatai, Toshio Endo, Jens Domke, Oleksandr Zinenko

Ninth LLVM Performance Workshop at CGO 2025.3

Event date： 2025.3

Language：English Presentation type：Oral presentation (general)

researchmap
Challenges in Computing Resource Sharing Towards Next-Gen Interactive Accelerated HPC International conference

Toshio Endo, Shohei Minami, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida, Tomoya Yuki, Ryuichi Sakamoto

Interactive and Urgent High-Performance Computing (CIW-IUS), in conjunction with ISC24, LNCS 15058 2024.12 Springer Nature Switzerland

Event date： 2024.12

Language：English Presentation type：Oral presentation (general)

researchmap
TSUBAME4.0の処理量担保のための計算ノード分割

野村哲弘, 遠藤敏夫

2024年度大学ICT推進協議会(AXIES)年次大会, 10AM2C-5 2024.12

Event date： 2024.12

Language：Japanese Presentation type：Oral presentation (general)

researchmap
TSUBAME4.0: HPC-AI時代に向けた東京科学大学のもっとみんなのスパコン

安良岡由規, 遠藤敏夫, 野村哲弘, 渡邊寿雄, 鶴見慶

2024年度大学ICT推進協議会(AXIES)年次大会, 10AM1C-1 2024.12

Event date： 2024.12

Language：Japanese Presentation type：Oral presentation (general)

researchmap
System Research on TSUBAME Supercomputer Series in Tokyo Tech and Science Tokyo Invited International conference

Toshio Endo

Co-Creation Monthly Seminar, Fujitsu-Co-Creation Research Lab at the University of Toronto 2024.11

Event date： 2024.11

Language：English Presentation type：Public lecture, seminar, tutorial, course, or other speech

researchmap
Asynchronous I/O Optimization for X-ray Imaging via GPUDirect Storage International coauthorship International conference

Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) 2024.9

Event date： 2024.9

Language：English Presentation type：Poster presentation

researchmap
Investigating Nvidia GPU Architecture Trends via Microbenchmarks International coauthorship International conference

Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) 2024.9

Event date： 2024.9

Language：English Presentation type：Poster presentation

researchmap
An optimization pass for training speed-up and strategy search in 3D parallelism International coauthorship International conference

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) 2024.9

Event date： 2024.9

Language：English Presentation type：Poster presentation

researchmap
Communication Optimization for Distributed GCN Training on ABCI Supercomputer International coauthorship International conference

Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

2024 IEEE International Conference on Cluster Computing (CLUSTER 2024), poster session, Kobe, Sep 24-27, 2024 2024.9

Event date： 2024.9

Language：English Presentation type：Poster presentation

researchmap
HPC-AI時代に向けたもっとみんなのスパコンTSUBAME4.0

遠藤敏夫, 野村哲弘, 渡邊寿雄, 安良岡由規, 鶴見慶

並列/分散/協調処理に関するサマーワークショップ(SWoPP2024)，情報処理学会研究報告 2024.8

Event date： 2024.8

Language：Japanese Presentation type：Oral presentation (general)

researchmap
Leveraging GPUDirect Storage for Efficient Image Reconstruction

Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024.8

Event date： 2024.8

Language：English Presentation type：Oral presentation (general)

researchmap
High-performance Graph Convolutional Networks Training on Fugaku and ABCI Supercomputers International coauthorship

Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024.8

Event date： 2024.8

Language：English Presentation type：Oral presentation (general)

researchmap
FRUGAL: Reducing GPU Memory Requirement of HPC Applications

Tengfei Wang, Lingqi Zhang, Ivan Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib

Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report 2024.8

Event date： 2024.8

Language：English Presentation type：Oral presentation (general)

researchmap
タンパク質構造予測プログラムOmegaFoldのマルチGPUを用いた高速化

大沢泰生, 遠藤敏夫, 細木隆豊

Cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2024) 2024.8

Event date： 2024.8

Language：Japanese Presentation type：Oral presentation (general)

researchmap
スパコンTSUBAMEシリーズにおけるリソース分割戦略

野村哲弘, 遠藤敏夫

並列/分散/協調処理に関するサマーワークショップ(SWoPP2024)，情報処理学会研究報告 2024.8

Event date： 2024.8

Language：Japanese Presentation type：Oral presentation (general)

researchmap
Experiences with making a power measurement and submission for TSUBAME4.0, Level 3 International conference

Toshio Endo, Akihiro Nomura

EE HPC WG Workshop 2024 2024.6

Event date： 2024.6

Language：English Presentation type：Oral presentation (general)

researchmap
Real-time High-resolution X-Ray Computed Tomography International coauthorship International conference

Du Wu, Peng Chen, Xiao Wang, Issac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Proceedings of the 38th ACM International Conference on Supercomputing 2024.5 ACM

Event date： 2024.5

Language：English Presentation type：Oral presentation (general)

researchmap

Other Link： https://dl.acm.org/doi/pdf/10.1145/3650200.3656634
ステンシル計算の時間ブロッキングフレームワークの実装と評価

瓜生侑, 遠藤敏夫

情報処理学会研究報告 2024.5

Event date： 2024.5

Language：Japanese Presentation type：Oral presentation (general)

researchmap
An optimization pass for training speed-up and strategy search in 3D parallelism International coauthorship

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

IPSJ SIG Technical Report 2024.5

Event date： 2024.5

Language：English Presentation type：Oral presentation (general)

researchmap
General and Scalable Framework for GCN Training on CPU-powered Supercomputers

Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Mohamed Wahib

The 6th R-CCS International Symposium, poster session 2024.1

Event date： 2024.1

Language：English Presentation type：Poster presentation

researchmap
Optimizing Matrix Multiplication on Arm Architectures

Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

The 6th R-CCS International Symposium 2024.1

Event date： 2024.1

Language：English Presentation type：Poster presentation

researchmap
The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System

Shohei Minami, Toshio Endo, Akihiro Nomura

The cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2023) 2023.8

Event date： 2023.8

Language：English Presentation type：Poster presentation

researchmap
TSUBAMEスパコンシリーズのデータセンターとしての側面 Invited

遠藤敏夫

電子情報通信学会集積回路研究専門委員会 LSIとシステムのワークショップ2023 2023.5

Event date： 2023.5

Language：Japanese Presentation type：Public lecture, seminar, tutorial, course, or other speech

researchmap
Environmental-Aware Optimization of MPI Checkpointing Intervals

Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING 2008.9

　More details

Event date： 2008.9

Language：English Presentation type：Poster presentation

researchmap


Awards

IEEE CCGrid 2025 Best Paper Award

2025.5 IEEE CCGrid 2025 Program Committee

An Optimization Technique for Hiding Communication Costs in 3D Parallel Training of Deep Learning

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

　More details

Award type：Award from international society, conference, symposium, etc. Country：Norway

researchmap
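FY2012 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Development Category)

2012.4 Ministry of Education, Culture, Sports, Science and Technology (MEXT)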

　More details

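Awarded jointly with Prof. Satoshi Matsuoka and Prof. Takayuki Aoki for "Development of the World's Greenest Production Petascale Supercomputer"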

researchmap
FY2011 IPSJ Yamashita SIG Research Award

2012.3 IPSJ

　More details

Award type：Award from Japanese society, conference, symposium, etc.

Awarded for the paper presented at HOKKE-18, "Linpack Performance Evaluation of the Heterogeneous Supercomputer TSUBAME 2.0"

researchmap
2011 ACM Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution

2011.11 ACM

　More details

Award type：Award from international society, conference, symposium, etc.

"Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer" (8 authors are awarded)

researchmap

Research Projects

Deployment of Scalable System Software for Machine Learning Technology to Saving Computing Resources

Grant number：23K28059 2023.4 - 2027.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

　 More details

Grant amount：¥18460000 (Direct Cost: ¥14200000, Indirect Cost: ¥4260000)

researchmap
Scalable System Software for Machine Learning on Heterogeneous Parallel Computing Environments

Grant number：20H04165 2020.4 - 2023.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

　 More details

Grant amount：¥17550000 (Direct Cost: ¥13500000, Indirect Cost: ¥4050000)

researchmap
ExaPath: Hierarchical Routing for Next-Gen Supercomputers and Beyond

Grant number：19H04119 2019.4 - 2024.3

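Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Jens Domke, Toshio Endo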

　 More details

Grant amount：¥17160000 (Direct Cost: ¥13200000, Indirect Cost: ¥3960000)

In FY2020, the second year of the ExaPath project, we conducted two distinct studies on routing in HPC interconnects.
The first paper published this FY is a survey of data center and supercomputer networks. It investigates how multipathing is implemented in those systems, what types of routing they deploy, and how to utilize them effectively under heavy communication loads. The survey, titled "High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers", was published in the journal IEEE Transactions on Parallel and Distributed Systems.
The second published work, a peer-reviewed poster presented at the 3rd R-CCS International Symposium, is based on the Bachelor's thesis of our intern from Tokyo Tech. The thesis and poster tackled the fault resiliency of lossless interconnects and how to reroute the network while preserving certain properties, such as deadlock freedom.
Furthermore, we collaborated with researchers at ETH Zurich to build a real Slim Fly testbed and deploy the routing we developed in the previous FY. In parallel, together with a colleague from ETH, we co-supervised a second Bachelor's thesis on routing in low-diameter topologies.
Lastly, we disseminated our research findings through invited talks at the ISC High Performance conference (ISC'20) in a focus session on 'Photonics & Interconnects', and discussed our work and related routing and network topics with colleagues from academia and industry at various meetings and conferences.

researchmap
Autonomous HPC data center using machine learning

Grant number：19H04121 2019.4 - 2022.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Matsuba Hiroya

　 More details

Grant amount：¥17290000 (Direct Cost: ¥13300000, Indirect Cost: ¥3990000)

To automate data center operations, we studied methods for acquiring data from data centers and reproducing it in a virtual space, as well as methods for optimizing operational policies in that virtual space.
For data acquisition, we defined a general data format and storage format that can collect general-purpose data useful for various operations from many data centers, independent of differences in per-device data formats or in how the data is used. For operational optimization, we optimized job scheduling using reinforcement learning, which learns control methods automatically, and implemented a scheduling and cooling-equipment simulator as the environment for that learning.

researchmap
Advancement of HPC Applications for Manufacturing Technology to Exascale

Grant number：26220002 2014.5 - 2019.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

Aoki Takayuki, ONODERA Naoyuki, NUKADA Akira, ENDO Toshio

　 More details

Grant amount：¥188370000 (Direct Cost: ¥144900000, Indirect Cost: ¥43470000)

In exascale supercomputers, which have poor memory bandwidth and low inter-node connection speed relative to their computational performance, applications for manufacturing technology must undergo revolutionary changes to minimize "Time-to-Solution" by introducing new numerical methods and innovative numerical algorithms. Explicit schemes, the Adaptive Mesh Refinement (AMR) method, and dynamic load balancing greatly improve the "Time-to-Solution" and the computational scale of these exascale applications for fluids, materials, and particles. We are strongly confident that exascale applications for manufacturing technology can be executed.

researchmap
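Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era

Grant number：12101604 2012 - 2017

Japan Science and Technology Agency Promotion of Strategic Research and Development / Strategic Basic Research Programs / CREST

Toshio Endo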

　 More details

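The memory wall problem, in which growth in memory speed and capacity cannot keep up with the growth of increasingly many-core processors, is expected to become pronounced in future supercomputer architectures and to hinder efforts to make important simulations in areas such as weather, medicine, and disaster prevention larger and more precise. To address it, we assume supercomputer architectures that mix heterogeneous memories, including non-volatile memory, and pursue research and development of new software technologies, spanning compilers, memory management techniques, and simulation algorithms, that make effective use of such architectures.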

researchmap
Fault Tolerant Infrastructure Toward Billion of Parallelization and Exa-scale Supercomputer

Grant number：23220003 2011.4 - 2016.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)

Matsuoka Satoshi, Hideyuki Jitsumoto, Toshio Endo, Hitoshi Sato, Naoya Maruyama, Shinichiro Takizawa, Kento Sato, Leonardo Bautista Gomez, Jens Domke

　 More details

Grant amount：¥213720000 (Direct Cost: ¥164400000, Indirect Cost: ¥49320000)

Fault tolerance has been recognized as an indispensable technique for exascale computing as supercomputers grow toward billion-way parallelism. For future exascale supercomputers, we proposed advanced fault-tolerant infrastructure comprising a scalable checkpoint/restart library, a fault-tolerant messaging interface, and a highly resilient burst buffer architecture. We validated its effectiveness using mathematical statistics. We also released the software, which has had an impact on the community.

researchmap
Highly Scalable Software Construction Basis for Information Explosion Era

Grant number：18049015 2006 - 2010

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

CHIKAYAMA Takashi, YUASA Taiichi, UEDA Kazunori, TAURA Kenjiro, ENDO Toshio, YOKOYAMA Daisaku, UMATANI Seiji

　 More details

Grant amount：¥64600000 (Direct Cost: ¥64600000)

To cope with the explosive increase in the amount of data, frameworks are required for flexibly describing software for widely distributed, highly parallel information systems. For this purpose, we investigated programming languages, middleware systems, and verification systems for highly complicated software; such systems were proposed, designed, and implemented, and their performance was evaluated. Representative resulting software systems have been made publicly available.

researchmap
Research on a High-Performance, Easy-to-Write Parallel Programming System for Wide-Area Distributed Environments

Grant number：17700050 2005 - 2006

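Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

Toshio Endo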

　 More details

Grant amount：¥3400000 (Direct Cost: ¥3400000)

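The goal of this research is to design and evaluate a parallel programming system for large-scale distributed environments that are dynamic, heterogeneous, and high-latency. It targets programming environments that support computations with complex dependencies between tasks and take the characteristics of large-scale environments into account, as well as applied distributed algorithms. This fiscal year, in light of recent trends in high-performance architectures, we proposed and evaluated techniques for heterogeneous architectures built from multi-core and multi-CPU nodes, accelerators, and GPUs. Specifically, the approach adjusts the number of parallel processes across heterogeneous nodes and has each process run the kernel part of the computation on either a CPU or an accelerator, thereby using heterogeneous computing resources efficiently. We implemented and evaluated the proposed programming method on TSUBAME, the large-scale computer at Tokyo Institute of Technology. Because each node has heterogeneous computing resources, namely general-purpose CPUs and ClearSpeed SIMD accelerators, we adjusted the number of processes (manually, for now) so that both are used efficiently. An evaluation using parallel Linpack showed that the method scales well (is weakly scalable) when the problem size is sufficiently large. The results were presented at an IEICE technical meeting (as an invited talk) and at the IPSJ symposium HPCS2007, and were accepted by an IPSJ journal. Of the items purchased this fiscal year, the notebook PC, a low-cost equipment item, was used for system implementation and experiments. The Xeon server, an equipment item with a total of 8 CPU cores, was used for performance evaluation of multi-core architectures.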

researchmap
Secure Programming Languages Based on Mobile Code

Grant number：12133203 2000 - 2003

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

YONEZAWA Akinori, OYAMA Yoshihiro, MASUHARA Hidehiko, TAURA Kenjiro

　 More details

Grant amount：¥37800000 (Direct Cost: ¥37800000)

We studied secure programming languages based on mobile code. We worked on essential research issues in the security of programming languages and system software from both theoretical and practical perspectives. First, we gave a systematic solution to problems in programming languages, the most important element in software development. The solution is not ad hoc but based on solid theory. Furthermore, we addressed research issues in system software with a systematic solution based on a deep understanding of the target systems. The results of this research include a secure compiler for the C programming language (Fail-Safe C), an interface definition language for Fail-Safe C, an operating system that guarantees security using type systems (Kernel Mode Linux), a system that defends networks against distributed denial-of-service attacks (MovingFirewall), a cryptographic lambda calculus, regular expression types for string operations, self-repairing reference monitors, a fine-grained access control mechanism between program modules, programming languages that support the description of mobile code (JavaGO and JavaGoX), object usage analysis for Java, efficient region-based memory management for the dynamically typed programming language Scheme, and a type system for access control in distributed computation. The results are highly regarded by the international computer science community. This research had a significant impact on academia and industry through the release of three software packages and the publication of about 30 refereed papers. We received four prestigious awards, including the best paper award from the Japan Society for Software Science and the best technology award from Nikkei Business Publications Inc.

researchmap
Adaptive Software Substrate for High Performance Wide Area Computing

Grant number：12308012 2000 - 2002

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

YONEZAWA Akinori, MASUHARA Hidehiko, SUMII Eijirou, TAURA Kenjirou, KOBAYASHI Naoki

　 More details

Grant amount：¥41120000 (Direct Cost: ¥36200000, Indirect Cost: ¥4920000)

The objective of this project was to establish foundations for building adaptive runtime systems, which behave well under various resource conditions (of CPU, memory, and network) that become known only at runtime. The main results are as follows. (1) We established an efficient access control (e.g., mutual exclusion) method for shared data that adapts to the degree of parallelism in the application (published in ACM PaCT). (2) We proposed a framework in which parallel applications can migrate from one set of resources to another at runtime, depending on resource conditions (published or to be published in ACM PPoPP and ACM/IEEE CCGrid). (3) We tackled the resource selection problem, in which the system, given the computation/communication requirements of the application and the computation/communication capacity of available resources, tries to select good resources for the application automatically. We proposed a problem formulation and an algorithm that assumes resource requirements and conditions do not change over time, and conducted a simulation. (4) We established a dynamic memory management scheme that can trade off thread-level locality of allocated memory against total memory requirements. (5) We achieved a very short pause time (less than 10 ms) for conservative garbage collectors, which had been considered difficult in the community (published in ACM ISMM).

researchmap
Research on Optimization of Parallel Language Implementations with a Focus on Automatic Memory Management

Grant number：00J08839 2000 - 2001

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows

Toshio Endo

　 More details

Grant amount：¥2000000 (Direct Cost: ¥2000000)

researchmap
