Updated on 2026/03/11

ENDO TOSHIO
 
Organization
Institute of Integrated Research, Supercomputing Research Center, Professor
Title
Professor
Degree

  • Doctor of Science ( The University of Tokyo )

Research Interests

  • memory hierarchy

  • high performance computing

  • GPGPU

  • Supercomputers

Research Areas

  • Informatics / High performance computing

Education

  • The University of Tokyo   Graduate School, Division of Science   Department of Information Science

    1996.4 - 2001.9

    Country: Japan

    Notes: Master's course, Doctoral course

  • The University of Tokyo   Faculty of Science   Department of Information Science

    1992.4 - 1996.3

    Country: Japan

Research History

  • Institute of Science Tokyo   Supercomputing Research Center, IIR   Professor

    2024.10

    Country:Japan

  • Tokyo Institute of Technology   GSIC   Professor

    2018.4 - 2024.9

  • Tokyo Institute of Technology   GSIC   Associate Professor

    2012.6 - 2018.3

Papers

  • FRUGAL: Pushing GPU Applications beyond Memory Limits

    Lingqi Zhang, Tengfei Wang, Jiajun Huang, Chen Zhuang, Ivan R. Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib

    2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)   188 - 201   2026.1

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/cgo68049.2026.11395210

  • Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers

    Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Proceedings of the 39th ACM International Conference on Supercomputing   57 - 72   2025.6

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3721145.3730422

  • An Optimization Technique for Hiding Communication Costs in 3D Parallel Training of Deep Learning Reviewed International coauthorship International journal

    Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

    2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)   1 - 10   2025.5

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/ccgrid64434.2025.00044

  • A General and Scalable GCN Training Framework on CPU Supercomputers

    Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    PPoPP   566 - 568   2025

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3710848.3710860

    Other Link: https://dblp.uni-trier.de/db/conf/ppopp/ppopp2025.html#ZhuangCLYD0EMW25

  • Challenges in Computing Resource Sharing Towards Next-Gen Interactive Accelerated HPC Reviewed International journal

    Toshio Endo, Shohei Minami, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida, Tomoya Yuki, Ryuichi Sakamoto

    Interactive and Urgent High-Performance Computing (CIW-IUS), in conjunction with ISC24, LNCS 15058   231 - 242   2024.12

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Nature Switzerland  

    DOI: 10.1007/978-3-031-73716-9_16

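  • Compute Node Partitioning to Guarantee Throughput on TSUBAME4.0

    野村 哲弘, 遠藤 敏夫

    FY2024 Annual Conference of the Academic eXchange for Information Environment and Strategy (AXIES), 10AM2C-5   2024.12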

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

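  • TSUBAME4.0: Institute of Science Tokyo's Even More "Supercomputer for Everyone" for the HPC-AI Era

    安良岡由規, 遠藤敏夫, 野村哲弘, 渡邊 寿雄, 鶴見 慶

    FY2024 Annual Conference of the Academic eXchange for Information Environment and Strategy (AXIES), 10AM1C-1   2024.12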

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • SuperGCN: General and Scalable Framework for GCN Training on CPU-powered Supercomputers International coauthorship International journal

    Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    arXiv:2411.16025 [cs.DC]   2024.11

    Language:English   Publishing type:Research paper (other academic)  

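  • TSUBAME4.0: An Even More "Supercomputer for Everyone" for the HPC-AI Era

    遠藤 敏夫, 野村 哲弘, 渡邊 寿雄, 安良岡 由規, 鶴見 慶

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report   2024-HPC-195 ( 8 )   2024.8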

    Authorship:Lead author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • Leveraging GPUDirect Storage for Efficient Image Reconstruction

    Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report   2024-HPC-195 ( 5 )   2024.8

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • High-performance Graph Convolutional Networks Training on Fugaku and ABCI Supercomputers International coauthorship

    Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report   2024-HPC-195 ( 14 )   2024.8

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • FRUGAL: Reducing GPU Memory Requirement of HPC Applications

    Tengfei Wang, Lingqi Zhang, Ivan Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report   2024-HPC-195 ( 27 )   2024.8

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • Accelerating the Protein Structure Prediction Program OmegaFold Using Multiple GPUs Reviewed

    大沢 泰生, 遠藤 敏夫, 細木 隆豊

    Cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2024)   2024.8

    Language:Japanese   Publishing type:Research paper (other academic)  

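  • Resource Partitioning Strategies in the TSUBAME Supercomputer Series

    野村 哲弘, 遠藤 敏夫

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report   2024-HPC-195 ( 7 )   2024.8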

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • Real-time High-resolution X-Ray Computed Tomography Reviewed International coauthorship International journal

    Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Proceedings of the 38th ACM International Conference on Supercomputing   110 - 123   2024.5

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3650200.3656634

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3650200.3656634

  • Implementation and Evaluation of a Temporal Blocking Framework for Stencil Computations

    瓜生 侑, 遠藤 敏夫

    IPSJ SIG Technical Report   2024-HPC-194 ( 3 )   2024.5

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • An optimization pass for training speed-up and strategy search in 3D parallelism International coauthorship

    Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

    IPSJ SIG Technical Report   2024-HPC-194 ( 7 )   2024.5

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking Reviewed

    Futa Kambe, Toshio Endo

    16th Workshop on General Purpose Processing Using GPU   1 - 6   2024.3

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3649411.3649412

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3649411.3649412

  • Retargeting and Respecializing GPU Workloads for Performance Portability Reviewed

    Ivan R. Ivanov, Oleksandr Zinenko, Jens Domke, Toshio Endo, William S. Moses

    2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)   119 - 132   2024.3

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/cgo57630.2024.10444828

  • Automatic Parallelization and OpenMP Offloading of Fortran Array Notation Reviewed

    Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert

    Proceedings of the 20th International Workshop on OpenMP (IWOMP 2024), LNCS 15195   197 - 209   2024.3

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-031-72567-8_13

  • High Throughput 3D Image Reconstruction with GPUDirect and Tensor Core

    Du Wu, Peng Chen, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IPSJ SIG Technical Report   2024-HPC-193 ( 25 )   2024.3

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training Reviewed

    Ryubu Hosoki, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

    Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region   117 - 126   2024.1

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3635035.3635045

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3635035.3635045

  • Communication Optimization for Distributed GCN Training on ABCI Supercomputer

    Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IEEE International Conference on Cluster Computing   160 - 161   2024

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/CLUSTERWorkshops61563.2024.00038

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#Zhuang0LEMW24

  • Investigating Nvidia GPU Architecture Trends via Microbenchmarks

    Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IEEE International Conference on Cluster Computing   174 - 175   2024

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/CLUSTERWorkshops61563.2024.00045

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#ZhangBCWEMW24

  • Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage

    Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IEEE International Conference on Cluster Computing   196 - 197   2024

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/CLUSTERWorkshops61563.2024.00056

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#WuCTTEMW24

  • Pyramid Swin Transformer for Multi-task: Expanding to More Computer Vision Tasks Reviewed

    Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

    Proceedings of Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2023), Springer, LNCS Vol. 14124   53 - 65   2023.11

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Nature Switzerland  

    DOI: 10.1007/978-3-031-45382-3_5

  • The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System Reviewed

    Shohei Minami, Toshio Endo, Akihiro Nomura

    2023 IEEE High Performance Extreme Computing Conference (HPEC)   1 - 7   2023.9

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/hpec58863.2023.10363580

  • Scalable Training of Graph Convolutional Networks on Supercomputers

    Chen Zhuang, Peng Chen, Xin Liu, Satoshi Matsuoka, Toshio Endo, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report   2023-HPC-190 ( 19 )   2023.8

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • High-performance Temporal Blocking Stencils at Low GPU Occupancy

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report   2023-HPC-190 ( 26 )   2023.8

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

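  • Implementation and Evaluation of Distributed Cholesky Decomposition in Python Using a Dynamic Scheduling Library

    岡本 洸琉, 遠藤 敏夫

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report   2023-HPC-190 ( 15 )   2023.8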

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

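  • Accelerating Stencil Computations Using Tensor Cores on GPUs with Temporal Blocking

    神戸 風太, 遠藤 敏夫

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report   2023-HPC-190 ( 29 )   2023.8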

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • Enhancing the Performance of AlphaFold Through Modified Storage Method and Optimization of HHblits on TSUBAME3.0 Supercomputer

    Hayato Fujita, Akihiro Nomura, Toshio Endo, Masakazu Sekijima

    2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE)   2023.7

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/csce60160.2023.00351

  • PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications Invited Reviewed

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    Proceedings of the ACM International Conference on Supercomputing (ICS 2023), Orlando, June 2023   2023.6

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3577193.3593705

  • Revisiting Temporal Blocking Stencil Optimizations Invited Reviewed

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    Proceedings of the ACM International Conference on Supercomputing (ICS 2023), Orlando, June 2023   2023.6

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3577193.3593716

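  • Evaluation of Locality Improvement Techniques for Stencil Computations on Next-Generation High-Performance Memory Systems

    幸 朋矢, 遠藤 敏夫

    IPSJ SIG Technical Report   2023-HPC-188 ( 31 )   2023.3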

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • Effectiveness of the Oversubscribing Scheduling on Supercomputer Systems Reviewed

    Shohei Minami, Toshio Endo, Akihiro Nomura

    Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region   18 - 28   2023.2

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3578178.3578221

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3578178.3578221

  • Exploiting Scratchpad Memory for Deep Temporal Blocking Reviewed

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    Proceedings of the 15th Workshop on General Purpose Processing Using GPU   34 - 35   2023.2

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3589236.3589242

  • High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

    William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

    Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming   119 - 134   2023.2

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3572848.3577475

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3572848.3577475

  • Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection

    Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

    Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications   583 - 590   2023

    Publishing type:Research paper (international conference proceedings)   Publisher:SCITEPRESS - Science and Technology Publications  

    DOI: 10.5220/0011675800003417

  • A Study of Data Augmentation Methods for Machine-Learning-Based Speech Processing

    丸山 翼, 池上 努, 遠藤 敏夫, 広渕 崇宏

    IEICE Technical Report, Engineering Acoustics (EA)   2022.12

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • Efficient Stencil Computation with Temporal Blocking by Halide DSL Reviewed

    Hiroki Aikawa, Toshio Endo, Tomoya Yuki, Takahiro Hirofuchi, Tsutomu Ikegami

    2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)   870 - 877   2022.12

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/ispa-bdcloud-socialcom-sustaincom57177.2022.00116

  • Breaking the Memory Bottleneck for Iterative Memory-bound Applications Via Persistent Kernels

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    IPSJ SIG Technical Report   2022-HPC-187 ( 18 )   2022.12

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • Speed-Up Single Shot Detector on GPU with CUDA Reviewed

    Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami

    Proceedings of SNPD2022-summer, Studies in Computational Intelligence   1074   89 - 106   2022.11

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer International Publishing  

    DOI: 10.1007/978-3-031-19604-1_7

  • A Study of Memory Architectures for HPC Utilizing 3D Stacked SRAM

    萩原 汐, 吉川 隆英, 幸 朋矢, 遠藤 敏夫

    Design Gaia 2022, IPSJ SIG Technical Report   2022-SLDM-200 ( 31 )   2022.11

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

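  • Implementation of Highly Portable Parallel Programs Using Lambda Expressions and Their Evaluation on CPUs and GPUs

    瓜生 侑, 遠藤 敏夫

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2022), IPSJ SIG Technical Report   2022-HPC-185 ( 20 )   2022.7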

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

    William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

    arXiv:2207.00257 [cs.PL]   2022.7

    Language:English   Publishing type:Research paper (other academic)  

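  • A Hybrid Pipeline-Parallel Deep Learning Method with Improved Load Balancing

    細木 隆豊, 遠藤 敏夫, 広渕 崇宏, 池上 努

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2022), IPSJ SIG Technical Report   2022-HPC-185 ( 17 )   2022.7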

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

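  • Accelerating the Protein Structure Analysis System AlphaFold Using Runtime File Staging

    大沢 泰生, 遠藤 敏夫, 野村 哲弘

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2022), IPSJ SIG Technical Report   2022-HPC-185 ( 24 )   2022.7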

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  • mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations

    Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuke Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi

    arXiv:2203.14188 [cs.LG]   2022.3

    Language:English   Publishing type:Research paper (other academic)  

  • Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems Reviewed

    Shohei Minami, Toshio Endo, Akihiro Nomura

    Proceedings of the 24th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2021), LNCS   12985   59 - 79   2021.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer International Publishing  

    DOI: 10.1007/978-3-030-88224-2_4

  • Performance Modeling of HPC Applications on Overcommitted Systems

    Shohei Minami, Toshio Endo, Akihiro Nomura

    HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region   129 - 132   2021

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3432261.3439866

    Other Link: https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2021.html#MinamiE021

  • Integrating Cache Oblivious Approach with Modern Processor Architecture Reviewed

    Toshio Endo

    Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region   123 - 130   2020.1

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3368474.3368477

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3368474.3368477

  • AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs Reviewed

    Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

    CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization(CGO)   199 - 211   2020

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    Stencil computation is one of the most widely-used compute patterns in high
    performance computing applications. Spatial and temporal blocking have been
    proposed to overcome the memory-bound nature of this type of computation by
    moving memory pressure from external memory to on-chip memory on GPUs. However,
    correctly implementing those optimizations while considering the complexity of
    the architecture and memory hierarchy of GPUs to achieve high performance is
    difficult. We propose AN5D, an automated stencil framework which is capable of
    automatically transforming and optimizing stencil patterns in a given C source
    code, and generating corresponding CUDA code. Parameter tuning in our framework
    is guided by our performance model. Our novel optimization strategy reduces
    shared memory and register pressure in comparison to existing implementations,
    allowing performance scaling up to a temporal blocking degree of 10. We achieve
    the highest performance reported so far for all evaluated stencil benchmarks on
    the state-of-the-art Tesla V100 GPU.

    DOI: 10.1145/3368826.3377904

    Other Link: https://dblp.uni-trier.de/db/conf/cgo/cgo2020.html#MatsumuraZWEM20

  • AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

    Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

    CoRR   abs/2001.01473   2020

    Publishing type:Research paper (scientific journal)  

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2001.html#abs-2001-01473

  • Profiling based Out-of-core Hybrid Method for Large Neural Networks

    Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo

    arXiv:1907.05013 [cs.LG]   2019.7

    Language:English   Publishing type:Research paper (other academic)  

  • An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation Reviewed

    Yukinori Sato, Tomoya Yuki, Toshio Endo

    ACM Transactions on Architecture and Code Optimization   15 ( 4 )   1 - 23   2018.12

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Association for Computing Machinery (ACM)  

    On modern many-core CPUs, performance tuning against complex memory subsystems and scalability for parallelism is mandatory to achieve their potential. In this article, we focus on loop tiling, which plays an important role in performance tuning, and develop a novel framework that analytically models the load balance and empirically autotunes unpredictable cache behaviors through iterative polyhedral compilation using LLVM/Polly. From an evaluation on many-core CPUs, we demonstrate that our autotuner achieves a performance superior to those that use conventional static approaches and well-known autotuning heuristics. Moreover, our autotuner achieves almost the same performance as a brute-force search-based approach.

    DOI: 10.1145/3293449

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3293449

  • Scalable RMA-based Communication Library Featuring Node-local NVMs Reviewed

    Ryo Matsumiya, Toshio Endo

    2018 IEEE High Performance Extreme Computing Conference (HPEC)   1 - 7   2018.9

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/hpec.2018.8547546

  • Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy Reviewed

    Toshio Endo

    2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA)   19 - 24   2018.8

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/nvmsa.2018.00016

  • Exhaustive evaluation of memory-latency sensitivity on manycore processors with large cache Reviewed

    Noboru Tanabe, Toshio Endo

    Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications   27 - 34   2018.3

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3195612.3195616

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3195612.3195616

  • Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels Reviewed

    Noboru Tanabe, Toshio Endo

    2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)   249 - 254   2018.3

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/pdp2018.2018.00042

  • ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity Reviewed

    Yuki Ito, Ryo Matsumiya, Toshio Endo

    2017 IEEE International Conference on Big Data (Big Data)   2017.12

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/bigdata.2017.8257926

  • Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data

    松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

    Tsubame e-Science Journal   16   02‐08 (JA),20‐27 (EN) - 9   2017.11

    Language:Japanese   Publishing type:Research paper (bulletin of university, research institution)  

  • Applying Temporal Blocking with a Directive-based Approach Reviewed

    Shota Kuroda, Toshio Endo, Satoshi Matsuoka

    Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC   1 - 11   2017.11

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3148173.3148190

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3148173.3148190

  • A Stencil framework to realize large-scale computations beyond device memory capacity on GPU supercomputers Reviewed

    Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki

    Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet)   525 - 529   2017.9

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    Stencil-based applications such as CFD have succeeded in obtaining high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality improvement technique using the temporal blocking method with memory swapping between host and device enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. The framework-based application for airflow in an urban city maintains 80% of its performance even when the problem size is twice the GPU memory capacity, and it demonstrates good weak scalability on the TSUBAME 2.5 supercomputer.

    DOI: 10.1109/cluster.2017.97

  • An Accurate Simulator of Cache-Line Conflicts to Exploit the Underlying Cache Performance Reviewed

    Yukinori Sato, Toshio Endo

    Proceedings of the 23rd International European Conference on Parallel and Distributed Computing (Euro-Par 2017)   119 - 133   2017.8

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer International Publishing  

    DOI: 10.1007/978-3-319-64203-1_9

  • HPCとビッグデータ・AIを融合するグリーン・クラウドスパコンTSUBAME3.0の概要

    松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

    IPSJ SIG Technical Reports (Web)   2017 ( HPC-160 )   Vol.2017-HPC-160, No.29, 1-6 (WEB ONLY)   2017.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime Reviewed

    Yukinori Sato, Tomoya Yuki, Toshio Endo

    Proceedings of the Computing Frontiers Conference   191 - 200   2017.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3075564.3077627

    researchmap

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3075564.3077627

  • Evaluating the impacts of code-level performance tunings on power efficiency Reviewed

    Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono

    2016 IEEE International Conference on Big Data (Big Data)   362 - 369   2016.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/bigdata.2016.7840624

    researchmap

  • PGAS Communication Runtime for Extreme Large Data Computation Reviewed

    Ryo Matsumiya, Toshio Endo

    2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)   10 - 16   2016.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/espm2.2016.007

    researchmap

  • Realizing Out-of-Core Stencil Computations Using Multi-tier Memory Hierarchy on GPGPU Clusters Reviewed

    Toshio Endo

    2016 IEEE International Conference on Cluster Computing (CLUSTER)   21 - 29   2016.9

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/cluster.2016.61

    researchmap

  • From FLOPS to BYTES: Disruptive change in high-performance computing towards the post-Moore era Reviewed

    Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo

    2016 ACM International Conference on Computing Frontiers - Proceedings   274 - 281   2016.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    DOI: 10.1145/2903150.2906830

    Scopus

    researchmap

    Other Link: http://dblp.uni-trier.de/db/conf/cd/cf2016.html#conf/cd/MatsuokaANIKMTI16

  • Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers. Reviewed

    Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9725   265 - 274   2016.3

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-319-42432-3_33

    Web of Science

    Scopus

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/icms/icms2016.html#FujisawaEY16

  • Dynamic Compilation for Transparent Data Locality Analysis and Memory Subsystem Tuning Reviewed

    Yukinori Sato, Toshio Endo

    The International Workshop on Architectural and Micro-Architectural Support for Dynamic Optimization (AMAS-DO)   2016.3

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • A Cache-aware Temporal Blocking Method for 3D Stencil Computation Reviewed

    Shimpei Sato, Yukinori Sato, Toshio Endo

    3rd International Workshop on High-Performance Stencil Computations (HiStencils 2016), In conjunction with HiPEAC 2016   2016.1

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs. Reviewed

    Yuki Tsujita, Toshio Endo, Katsuki Fujisawa

    Proceedings of ESPM2 2015: 1st International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis   38 - 45   2015.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2832241.2832245

    Scopus

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/sc/espm2015.html#TsujitaEF15

  • Exana: an execution-driven application analysis tool for assisting productive performance tuning Reviewed

    Yukinori Sato, Shimpei Sato, Toshio Endo

    Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems   1 - 10   2015.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2837476.2837477

    researchmap

    Other Link: https://dl.acm.org/doi/pdf/10.1145/2837476.2837477

  • Investigating potential performance benefits of memory layout optimization based on roofline model Reviewed

    Shimpei Sato, Yukinori Sato, Toshio Endo

    Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems   50 - 56   2015.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2837476.2837483

    researchmap

    Other Link: https://dl.acm.org/doi/pdf/10.1145/2837476.2837483

  • Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers Reviewed

    Katsuki Fujisawa, Toyotaro Suzumura, Hitoshi Sato, Koji Ueno, Yuichiro Yasui, Keita Iwabuchi, Toshio Endo

    Optimization in the Real World - Toward Solving Real-World Optimization Problems -, Series of Mathematics for Industry   1 - 13   2015.9

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:Springer Japan  

    DOI: 10.1007/978-4-431-55420-2_1

    researchmap

  • Efforts to Improve Scheduling Efficiency on TSUBAME2 and Visualization of User Trends

    野村哲弘, 佐々木淳, 三浦信一, 遠藤敏夫, 松岡聡

    IPSJ SIG Technical Reports (Web)   2015 ( HPC-150 )   Vol.2015-HPC-150, No.2 (WEB ONLY)   2015.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models Reviewed

    Kazuki Tsuzuku, Toshio Endo

    Proceedings of the 4th International Conference on Smart Cities and Green ICT Systems   226 - 233   2015.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SCITEPRESS - Science and Technology Publications  

    DOI: 10.5220/0005445102260233

    researchmap

  • Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition Reviewed

    Yuki Tsujita, Toshio Endo

    Proceedings of Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP)   2015.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers Reviewed

    Toshio Endo, Yuki Takasaki, Satoshi Matsuoka

    2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   625 - 632   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPADS.2015.84

    Web of Science

    researchmap

  • Exploration of Lossy Compression for Application-level Checkpoint/Restart Reviewed

    Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka

    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)   914 - 922   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2015.67

    Web of Science

    researchmap

  • Efficient utilization of memory hierarchy to enable the computation on bigger domains for stencil computation in CPU-GPU based systems Reviewed

    Guanghao Jin, James Lin, Toshio Endo

    2014 International Conference on High Performance Computing and Applications (ICHPCA)   1 - 6   2014.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/ichpca.2014.7045354

    researchmap

  • Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations Reviewed

    Toshio Endo, Guanghao Jin

    2014 IEEE International Conference on Cluster Computing (CLUSTER)   132 - 139   2014.9

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/cluster.2014.6968747

    researchmap

  • Development of a Computer Evaluation Benchmark and Performance Repository Using Real Applications

    野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡

    IPSJ SIG Technical Reports (Web)   2014 ( 29 )   1 - 7   2014.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    J-GLOBAL

    researchmap

  • An evaluation of the potential of flash SSD as large and slow memory for stencil computations Reviewed

    Hiroko Midorikawa, Hideyuki Tan, Toshio Endo

    2014 International Conference on High Performance Computing & Simulation (HPCS)   268 - 277   2014.7

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/hpcsim.2014.6903695

    researchmap

  • Petascale general solver for semidefinite programming problems with over two million constraints Reviewed

    Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki

    Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS   1171 - 1180   2014.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2014.121

    Web of Science

    Scopus

    researchmap

  • TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype towards Exascale Becoming the Greenest Supercomputer in the World Reviewed

    Toshio Endo, Akira Nukada, Satoshi Matsuoka

    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   360 - 367   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/padsw.2014.7097829

    Web of Science

    researchmap

  • Accelerating Quantum Chemistry Calculations with Graphical Processing Units - Toward in High-Density (HD) Silico Drug Discovery Reviewed

    Yohsuke Hagiwara, Kazuki Ohno, Masaya Orita, Ryota Koga, Toshio Endo, Yutaka Akiyama, Masakazu Sekijima

    CURRENT COMPUTER-AIDED DRUG DESIGN   9 ( 3 )   396 - 401   2013.9

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.2174/15734099113099990031

    Web of Science

    PubMed

    researchmap

  • Construction of an Application Performance Repository for System Evaluation and Evaluation of Performance Models

    野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡, 鈴木惣一朗, 丸山直也

    IPSJ SIG Technical Reports (Web)   2013 ( 4 )   1 - 6   2013.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    J-GLOBAL

    researchmap

  • A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU Reviewed

    Guanghao Jin, Toshio Endo, Satoshi Matsuoka

    2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum   1080 - 1087   2013.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/ipdpsw.2013.58

    researchmap

  • Tsubame2.0: The first petascale supercomputer in Japan and the greatest production in the world

    Satoshi Matsuoka, Takayuki Aoki, Toshio Endo, Hitoshi Sato, Shin'ichiro Takizawa, Akihiko Nomura, Kento Sato

    Contemporary High Performance Computing: From Petascale toward Exascale   525 - 555   2013.1

     More details

    Publishing type:Part of collection (book)  

    Scopus

    researchmap

  • A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs Reviewed

    Guanghao Jin, Toshio Endo, Satoshi Matsuoka

    2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   1 - 8   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/cluster.2013.6702633

    Web of Science

    researchmap

  • Performance Evaluation of the Multi-rail InfiniBand Network on TSUBAME2.0

    Akihiro Nomura, Toshio Endo, Satoshi Matsuoka

    IPSJ SIG Technical Reports   ( 2012 )   2012.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan  

    researchmap

  • High-performance general solver for extremely large-scale semidefinite programming problems Reviewed

    Katsuki Fujisawa, Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Makoto Yamashita, Maho Nakata

    International Conference for High Performance Computing, Networking, Storage and Analysis, SC   93 - 93   2012.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/SC.2012.67

    Web of Science

    Scopus

    researchmap

    Other Link: http://dl.acm.org/citation.cfm?id=2389122

  • Petaflop biofluidics simulations on a two million-core system Reviewed

    Massimo Bernaschi, Mauro Bisson, Toshio Endo, Satoshi Matsuoka, Massimiliano Fatica, Simone Melchionna

    Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis   1 - 12   2011.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2063384.2063389

    researchmap

    Other Link: https://dl.acm.org/doi/pdf/10.1145/2063384.2063389

  • Molecular Dynamics Simulation of a Biomolecule with High Speed, Low Power and Accuracy Using GPU-Accelerated TSUBAME2.0 Supercomputer Reviewed

    Shiqiao Du, Takuro Udagawa, Toshio Endo, Masakazu Sekijima

    Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)   2011.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer Reviewed

    Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka

    Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis   1 - 11   2011

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2063384.2063388

    Scopus

    researchmap

    Other Link: https://dl.acm.org/doi/pdf/10.1145/2063384.2063388

  • An 80-fold speedup, 15.0 TFlops GPU acceleration of non-hydrostatic weather model ASUCA production code Reviewed

    Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka

    2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010   2010

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/SC.2010.9

    Scopus

    researchmap

  • Linpack Evaluation on the TSUBAME Supercomputer with Hybrid Accelerators (Special Topics: GPGPU Computing)

    Endo Toshio, Nukada Akira, Matsuoka Satoshi

    Bulletin of the Japan Society for Industrial and Applied Mathematics   20 ( 2 )   117 - 124   2010

     More details

    Language:Japanese   Publisher:The Japan Society for Industrial and Applied Mathematics  

    This paper reports Linpack benchmark evaluation on the TSUBAME supercomputer, a large scale hybrid supercomputer equipped with graphics processing units (GPUs) and ClearSpeed SIMD accelerators. With all of about 10,000 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87 TFlops. This paper also describes our design policy and tuning method that take characteristics of accelerators into account, which are essential to achieve scalability on hybrid supercomputers. The design is significantly different from that of LANL RoadRunner, a hybrid system equipped with Cell processors. We discuss the difference from the viewpoint of system architecture.

    DOI: 10.11540/bjsiam.20.2_117

    CiNii Books

    researchmap

  • The Efficient Checkpoint based on Erasure Coding with Incremental Method

    JITSUMOTO HIDEYUKI, NAKAMURA SYUNSUKE, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2009 ( 9 )   1 - 6   2009.10

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan  

    Checkpointing/restarting is a well-known fault tolerance mechanism in large-scale HPC systems. However, the overhead of this method tends to grow, since the memory size of recent systems is increasing rapidly while the I/O bandwidth of file systems is improving relatively slowly. The purpose of this work is to achieve checkpointing that supports multiple faults with low overhead by utilizing erasure coding. To eliminate the bottleneck, we parallelize encoding and store process images in node-local storage instead of shared file systems. Furthermore, to reduce the sizes of process images, we adopt incremental checkpointing, which stores only the parts of the process image that have been modified since the previous checkpoint. Through parallel experiments using matrix multiplication and the NPB LU benchmark, we observed a 28 to 84% performance improvement from introducing incremental checkpointing.

    researchmap

  • A Study of MPI Communication on a Next Generation Optical Interconnect

    TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

    26 ( 3 )   5 - 19   2009.7

     More details

  • Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

    Tomoaki Hamano, Toshio Endo, Satoshi Matsuoka

    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5   1912 - 1919   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • File Clustering Based Replication Algorithm in a Grid Environment Reviewed

    Hitoshi Sato, Satoshi Matsuoka, Toshio Endo

    CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   204 - 211   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2009.73

    Web of Science

    researchmap

  • Bandwidth intensive 3-D FFT kernel for GPUs using CUDA Reviewed

    A. Nukada, Y. Ogata, T. Endo, S. Matsuoka

    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis   1 - 11   2008.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/sc.2008.5213210

    Web of Science

    researchmap

  • Massive supercomputing coping with heterogeneity of modern accelerators Reviewed

    Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   1179 - 1188   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Locality aware MPI communication on a commodity opto-electronic hybrid network Reviewed

    Shin'ichiro Takizawa, Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   2158 - +   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • An efficient, model-based CPU-GPU heterogeneous FFT library Reviewed

    Yasuhito Ogata, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   380 - +   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment Reviewed

    Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Naoya Maruyama

    2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING   250 - 257   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method Reviewed

    Yuto Hosogaya, Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   862 - 869   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • An Efficient FFT Library Using Both CPUs and GPUs Based on a Performance Model

    尾形泰彦, 遠藤敏夫, 丸山直也, 松岡聡

    IPSJ Transactions on Advanced Computing Systems   1 ( 1 )   40 - 50   2008

     More details

  • ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. Reviewed

    Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

    21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA   1 - 8   2007

     More details

  • High-performance MPI broadcast algorithm for grid environments utilizing multi-lane NICs Reviewed

    Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka

    CCGRID 2007: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   487 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap


Books

MISC

  • Performance Improvement of AlphaFold on the TSUBAME3.0 Supercomputer by Changing the Database Storage Method and Optimizing HHblits

    藤田隼斗, 野村哲弘, 遠藤敏夫, 関嶋政和

    IPSJ SIG Technical Reports (Web)   2023 ( MPS-144 )   2023

  • Proposal of a Standard Job History Schema for Realizing Sensor-Information-Aware Job Scheduling

    Akihiro Nomura, Toshio Endo

    IPSJ SIG Technical Reports   HPC-178 ( No. 14 )   1 - 8   2021.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan  

    identifier:oai:t2r2.star.titech.ac.jp:50567350

    CiNii Research

    researchmap

  • Efforts to Improve Job Scheduling Efficiency on TSUBAME2 and Their Verification

    Akihiro Nomura, Atsushi Sasaki, Shinichi Miura, Toshio Endo, Satoshi Matsuoka

    AXIES 2015 Annual Conference, Organized Session: HPC Technology   2015.12

     More details

    Language:Japanese  

    identifier:oai:t2r2.star.titech.ac.jp:50307764

    CiNii Research

    researchmap

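  • Mini-apps for Co-design of Computational Science and Computer Science (Introduction to the FIBER Mini-app Suite / Collecting Profile Information for Building Empirical Performance Models of Applications / Performance of the FIBER Mini-apps and Its Modeling)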

    丸山, 直也, 鈴木, 惣一朗, 三上, 和徳, 小村, 幸浩, 滝澤, 真一朗, 松田, 元彦, 野村, 哲弘, 三浦, 信一, 遠藤, 敏夫, 松岡, 聡

    Proceedings of the Symposium on High Performance Computing and Computational Science   2015   107 - 108   2015.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan  

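    We are developing and maintaining mini-apps, which are simplified versions of computational science applications. A mini-app is derived from the original full-scale application by omitting the parts that are not essential for evaluation, and we are developing and maintaining them as tools for speeding up evaluation in computer science research and development. In computer science research for HPC, evaluation with real applications is desirable, but real applications are often not widely available, which makes them hard to use for evaluation. Our mini-app suite, FIBER, is distinctive in that its source code is, in principle, freely available and no restrictions are placed on its use (http://fiber-miniapp.github.io/). In this session, we first introduce our mini-apps and then present our efforts on their performance evaluation and modeling. Through this session, we aim to promote mini-apps widely as tools useful for future evaluation in computer science research. We also hope the session will serve as an opportunity to call on computational science researchers to cooperate in expanding the mini-app suite.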

    CiNii Research

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00141717/

Presentations

  • An Optimization Technique for Hiding Communication Costs in 3D Parallel Training of Deep Learning International coauthorship International conference

    Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

    2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)  2025.5  IEEE

     More details

    Event date: 2025.5

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • Polyhedral Rescheduling of GPU Kernels To Exploit Async Memory Movement International coauthorship International conference

    Ivan R. Ivanov, William Moses, Emil Vatai, Toshio Endo, Jens Domke, Oleksandr Zinenko

    Ninth LLVM Performance Workshop at CGO  2025.3 

     More details

    Event date: 2025.3

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • Challenges in Computing Resource Sharing Towards Next-Gen Interactive Accelerated HPC International conference

    Toshio Endo, Shohei Minami, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida, Tomoya Yuki, Ryuichi Sakamoto

    Interactive and Urgent High-Performance Computing (CIW-IUS), in conjunction with ISC24, LNCS 15058  2024.12  Springer Nature Switzerland

     More details

    Event date: 2024.12

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • Compute Node Partitioning to Guarantee Throughput on TSUBAME4.0

    野村 哲弘, 遠藤 敏夫

    AXIES 2024 Annual Conference, 10AM2C-5  2024.12 

     More details

    Event date: 2024.12

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • TSUBAME4.0: Institute of Science Tokyo's Even More 'Supercomputer for Everyone' for the HPC-AI Era

    安良岡由規, 遠藤敏夫, 野村哲弘, 渡邊 寿雄, 鶴見 慶

    AXIES 2024 Annual Conference, 10AM1C-1  2024.12 

     More details

    Event date: 2024.12

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • System Research on TSUBAME Supercomputer Series in Tokyo Tech and Science Tokyo Invited International conference

    Toshio Endo

    Co-Creation Monthly Seminar, Fujitsu-Co-Creation Research Lab at the University of Toronto  2024.11 

     More details

    Event date: 2024.11

    Language:English   Presentation type:Public lecture, seminar, tutorial, course, or other speech  

    researchmap

  • Asynchronous I/O Optimization for X-ray Imaging via GPUDirect Storage International coauthorship International conference

    Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    2024 IEEE International Conference on Cluster Computing (CLUSTER 2024)  2024.9 

     More details

    Event date: 2024.9

    Language:English   Presentation type:Poster presentation  

    researchmap

  • Investigating Nvidia GPU Architecture Trends via Microbenchmarks International coauthorship International conference

    Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    2024 IEEE International Conference on Cluster Computing (CLUSTER 2024)  2024.9 

     More details

    Event date: 2024.9

    Language:English   Presentation type:Poster presentation  

    researchmap

  • An optimization pass for training speed-up and strategy search in 3D parallelism International coauthorship International conference

    Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

    2024 IEEE International Conference on Cluster Computing (CLUSTER 2024)  2024.9 

     More details

    Event date: 2024.9

    Language:English   Presentation type:Poster presentation  

    researchmap

  • Communication Optimization for Distributed GCN Training on ABCI Supercomputer International coauthorship International conference

    Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    2024 IEEE International Conference on Cluster Computing (CLUSTER 2024), poster session, Kobe, Sep 24-27, 2024  2024.9 

     More details

    Event date: 2024.9

    Language:English   Presentation type:Poster presentation  

    researchmap

  • TSUBAME4.0: An Even More 'Supercomputer for Everyone' for the HPC-AI Era

    遠藤 敏夫, 野村 哲弘, 渡邊 寿雄, 安良岡 由規, 鶴見 慶

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report  2024.8 

     More details

    Event date: 2024.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Leveraging GPUDirect Storage for Efficient Image Reconstruction

    Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report  2024.8 

     More details

    Event date: 2024.8

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • High-performance Graph Convolutional Networks Training on Fugaku and ABCI Supercomputers International coauthorship

    Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report  2024.8 

     More details

    Event date: 2024.8

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • FRUGAL: Reducing GPU Memory Requirement of HPC Applications

    Tengfei Wang, Lingqi Zhang, Ivan Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report  2024.8 

     More details

    Event date: 2024.8

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • Accelerating the Protein Structure Prediction Program OmegaFold Using Multiple GPUs

    大沢 泰生, 遠藤 敏夫, 細木 隆豊

    Cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2024)  2024.8 

     More details

    Event date: 2024.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Resource Partitioning Strategies in the TSUBAME Supercomputer Series

    野村 哲弘, 遠藤 敏夫

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report  2024.8 

     More details

    Event date: 2024.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Experiences with making a power measurement and submission for TSUBAME4.0, Level 3 International conference

    Toshio Endo, Akihiro Nomura

    EE HPC WG Workshop 2024  2024.6 

     More details

    Event date: 2024.6

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • Real-time High-resolution X-Ray Computed Tomography International coauthorship International conference

    Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Proceedings of the 38th ACM International Conference on Supercomputing  2024.5  ACM

    Event date: 2024.5

    Language:English   Presentation type:Oral presentation (general)  

    Other Link: https://dl.acm.org/doi/pdf/10.1145/3650200.3656634

  • Implementation and Evaluation of a Temporal Blocking Framework for Stencil Computations

    瓜生 侑, 遠藤 敏夫

    IPSJ SIG Technical Report  2024.5 

    Event date: 2024.5

    Language:Japanese   Presentation type:Oral presentation (general)  

  • An optimization pass for training speed-up and strategy search in 3D parallelism International coauthorship

    Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit

    IPSJ SIG Technical Report  2024.5 

    Event date: 2024.5

    Language:English   Presentation type:Oral presentation (general)  

  • General and Scalable Framework for GCN Training on CPU-powered Supercomputers

    Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Mohamed Wahib

    The 6th R-CCS International Symposium, poster session  2024.1 

    Event date: 2024.1

    Language:English   Presentation type:Poster presentation  

  • Optimizing Matrix Multiplication on Arm Architectures

    Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    The 6th R-CCS International Symposium  2024.1 

    Event date: 2024.1

    Language:English   Presentation type:Poster presentation  

  • The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System

    Shohei Minami, Toshio Endo, Akihiro Nomura

    The cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2023)  2023.8 

    Event date: 2023.8

    Language:English   Presentation type:Poster presentation  

  • Aspects of the TSUBAME Supercomputer Series as a Data Center Invited

    Toshio Endo

    IEICE Technical Committee on Integrated Circuits and Devices, LSI and Systems Workshop 2023  2023.5 

    Event date: 2023.5

    Language:Japanese   Presentation type:Public lecture, seminar, tutorial, course, or other speech  

  • Environmental-Aware Optimization of MPI Checkpointing Intervals

    Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

    2008 IEEE International Conference on Cluster Computing  2008.9 

    Event date: 2008.9

    Language:English   Presentation type:Poster presentation  

Awards

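  • FY2012 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Development Category)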

    2012.4   Ministry of Education, Culture, Sports, Science and Technology (MEXT)  

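    Awarded jointly with Prof. Satoshi Matsuoka and Prof. Takayuki Aoki for "Development of the World's Greenest Production Petascale Supercomputer"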

  • FY2011 IPSJ Yamashita SIG Research Award

    2012.3   IPSJ  

    Award type:Award from Japanese society, conference, symposium, etc. 

    Awarded for the HOKKE-18 paper "Performance Evaluation of the Heterogeneous Supercomputer TSUBAME 2.0 Using Linpack"

  • 2011 ACM Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution

    2011.11   ACM  

    Award type:Award from international society, conference, symposium, etc. 

    "Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer" (8 authors are awarded)

Research Projects

  • Deployment of Scalable System Software for Machine Learning Technology to Saving Computing Resources

    Grant number:23K28059  2023.4 - 2027.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Grant amount:¥18460000 ( Direct Cost: ¥14200000, Indirect Cost: ¥4260000 )

  • Scalable System Software for Machine Learning on Heterogeneous Parallel Computing Environments

    Grant number:20H04165  2020.4 - 2023.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Grant amount:¥17550000 ( Direct Cost: ¥13500000, Indirect Cost: ¥4050000 )

  • ExaPath: Hierarchical Routing for Next-Gen Supercomputers and Beyond

    Grant number:19H04119  2019.4 - 2024.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Jens Domke, Toshio Endo

    Grant amount:¥17160000 ( Direct Cost: ¥13200000, Indirect Cost: ¥3960000 )

    In FY2020, the second year of the ExaPath project, we conducted two distinct studies on routing in HPC interconnects.
    The first paper published this FY is a survey of data center and supercomputer networks. It investigates how multipathing is implemented in those systems, what types of routing they deploy, and how effectively these are utilized under extensive communication loads. The survey, titled "High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers", was published in the journal IEEE Transactions on Parallel and Distributed Systems.
    The second published work is a peer-reviewed poster, presented at the 3rd R-CCS International Symposium and based on the Bachelor's thesis of our intern from Tokyo Tech. The thesis and poster tackled the fault resiliency of lossless interconnects and how to reroute the network while preserving certain properties, such as deadlock-freedom.
    Furthermore, we collaborated with researchers at ETH Zurich to build a real Slim Fly testbed and deploy the routing we developed in the previous FY. At the same time, together with a colleague from ETH, we supervised a second Bachelor's thesis on routing for low-diameter topologies.
    Lastly, we disseminated our research findings through invited talks at the ISC High Performance conference (ISC'20) in a focus session on 'Photonics & Interconnects', and discussed our work and related routing and network topics with colleagues from academia and industry at various meetings and conferences.

  • Autonomous HPC data center using machine learning

    Grant number:19H04121  2019.4 - 2022.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Matsuba Hiroya

    Grant amount:¥17290000 ( Direct Cost: ¥13300000, Indirect Cost: ¥3990000 )

    To automate data center operations, we studied methods for acquiring data from data centers and reproducing it in a virtual space, as well as methods for optimizing operational policies in that virtual space.
    For the former, data acquisition, we succeeded in defining a general data format and storage format that can collect general-purpose data from many data centers for use in various operations, independent of device-specific data formats and of how the data is used. For the latter, operational optimization, we succeeded in optimizing job scheduling using reinforcement learning, which automatically learns control methods, and implemented a simulator of scheduling and cooling equipment as an environment for that learning.

  • Advancement of HPC Applications for Manufacturing Technology to Exascale

    Grant number:26220002  2014.5 - 2019.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (S)

    Aoki Takayuki, ONODERA Naoyuki, NUKADA Akira, ENDO Toshio

    Grant amount:¥188370000 ( Direct Cost: ¥144900000, Indirect Cost: ¥43470000 )

    In exascale supercomputers, which have relatively low memory bandwidth and inter-node connection speed compared with their computational performance, applications for manufacturing technology must change radically to minimize “Time-to-Solution” by introducing new numerical methods and innovative numerical algorithms. Explicit schemes, the Adaptive Mesh Refinement (AMR) method, and dynamic load balancing greatly improve the “Time-to-Solution” and the computational scale of these exascale applications for fluids, materials, and particles. We are strongly confident that exascale applications for manufacturing technology can be executed.

  • Fault Tolerant Infrastructure Toward Billion of Parallelization and Exa-scale Supercomputer

    Grant number:23220003  2011.4 - 2016.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (S)

    Matsuoka Satoshi, Hideyuki Jitsumoto, Toshio Endo, Hitoshi Sato, Naoya Maruyama, Shinichiro Takizawa, Kento Sato, Leonardo Bautista Gomez, Jens Domke

    Grant amount:¥213720000 ( Direct Cost: ¥164400000, Indirect Cost: ¥49320000 )

    Fault tolerance has been recognized as an indispensable technique for exascale computing as supercomputers grow toward billion-way parallelism. For future exascale supercomputers, we proposed advanced fault-tolerant infrastructures, which include a scalable checkpoint/restart library, a fault-tolerant messaging interface, and a highly resilient burst buffer architecture. We validated their effectiveness based on mathematical statistics. We also released the software and made an impact on the community.

  • Highly Scalable Software Construction Basis for Information Explosion Era

    Grant number:18049015  2006 - 2010

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research on Priority Areas

    CHIKAYAMA Takashi, YUASA Taiichi, UEDA Kazunori, TAURA Kenjiro, ENDO Toshio, YOKOYAMA Daisaku, UMATANI Seiji

    Grant amount:¥64600000 ( Direct Cost: ¥64600000 )

    To cope with the explosive increase in the amount of data, frameworks are required that allow flexible description of software for widely distributed, highly parallel information systems. For this purpose, we investigated programming languages, middleware systems, and verification systems for highly complicated software; we proposed, designed, and implemented such systems and evaluated their performance. Representative resulting software systems have been made publicly available.

  • Research on High-Performance and Easy-to-Describe Parallel Programming Systems for Wide-Area Distributed Environments

    Grant number:17700050  2005 - 2006

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists (B)

    Toshio Endo

    Grant amount:¥3400000 ( Direct Cost: ¥3400000 )

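    The goal of this research is to design and evaluate a parallel programming system for large-scale distributed environments characterized by dynamism, heterogeneity, and high latency. It targets programming environments that handle computations with complex dependencies among tasks and that take the characteristics of large-scale environments into account, as well as applied distributed algorithms. This fiscal year, in light of recent trends in high-performance architectures, we proposed and evaluated techniques for heterogeneous architectures built from multi-core and multi-CPU nodes, accelerators, and GPUs. Specifically, the approach adjusts the number of parallel processes across heterogeneous nodes and has each process run the kernel computation on either a CPU or an accelerator, so that heterogeneous computing resources are used efficiently. We implemented and evaluated the proposed programming method on TSUBAME, the large-scale computer at Tokyo Institute of Technology. Because its nodes have heterogeneous computing resources, namely general-purpose CPUs and ClearSpeed SIMD accelerators, we adjusted the number of processes (currently by hand) so that both are used efficiently. An evaluation with parallel Linpack showed that the method scales well when the problem size is sufficiently large (weakly scalable). The results were presented at an IEICE technical meeting (as an invited talk) and at the IPSJ symposium HPCS2007, and were accepted by the IPSJ Journal. Of the items purchased this fiscal year, the notebook PC (a low-cost equipment item) was used for system implementation and experiments. The Xeon server, an equipment item with a total of 8 CPU cores, was used to evaluate the performance of multi-core architectures.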

  • Secure Programming Languages Based on Mobile Code

    Grant number:12133203  2000 - 2003

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research on Priority Areas

    YONEZAWA Akinori, OYAMA Yoshihiro, MASUHARA Hidehiko, TAURA Kenjiro

    Grant amount:¥37800000 ( Direct Cost: ¥37800000 )

    We studied secure programming languages based on mobile code. We worked on essential research issues in the security of programming languages and system software from both theoretical and practical aspects. First, we gave a systematic solution to problems in programming languages, which are the most important element in software development. The solution is not ad hoc but based on a solid theory. Furthermore, we addressed research issues in system software with a systematic solution based on a deep understanding of the target systems. The results of this research include a secure compiler for the C programming language (Fail-Safe C), an interface definition language for Fail-Safe C, an operating system that guarantees security using type systems (Kernel Mode Linux), a system that defends networks against distributed denial-of-service attacks (MovingFirewall), cryptographic lambda calculus, regular expression types for string operations, self-repairing reference monitors, a fine-grained access control mechanism between program modules, programming languages that support the description of mobile code (JavaGo and JavaGoX), object usage analysis for Java, efficient region-based memory management for the dynamically typed programming language Scheme, and a type system for access control in distributed computation. The results are highly regarded by the international computer science community. This research had a significant impact on academic and industrial fields through the release of three software packages and the publication of about 30 refereed papers. We received four prestigious awards, including the best paper award from the Japan Society for Software Science and the best technology award from Nikkei Business Publications Inc.

  • Adaptive Software Substrate for High Performance Wide Area Computing

    Grant number:12308012  2000 - 2002

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (A)

    YONEZAWA Akinori, MASUHARA Hidehiko, SUMII Eijirou, TAURA Kenjirou, KOBAYASHI Naoki

    Grant amount:¥41120000 ( Direct Cost: ¥36200000, Indirect Cost: ¥4920000 )

    The objective of this project was to establish foundations for building adaptive runtime systems, which behave well under various resource conditions (of CPU, memory, and network) that become apparent only at runtime. The main results are as follows. (1) We established an efficient access control (e.g., mutual exclusion) method for shared data, which adapts to the degree of parallelism in the application (published in ACM PaCT). (2) We proposed a framework in which parallel applications can migrate from one set of resources to another at runtime, depending on resource conditions (published or to be published in ACM PPoPP and ACM/IEEE CCGrid). (3) We tackled the resource selection problem, in which the system, given the computation/communication requirements of the application and the computation/communication capacity of the available resources, tries to select good resources for the application automatically. We proposed a problem formulation and an algorithm that assumes resource requirements and conditions do not change over time, and conducted a simulation. (4) We established a dynamic memory management scheme that can trade off thread-level locality of allocated memory against total memory requirements. (5) We achieved a very short pause time (less than 10 ms) for conservative garbage collectors, which had been considered difficult in the community (published in ACM ISMM).

  • Research on Optimization of Parallel Language Implementations with a Focus on Automatic Memory Management

    Grant number:00J08839  2000 - 2001

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

    Toshio Endo

    Grant amount:¥2000000 ( Direct Cost: ¥2000000 )
