Updated on 2026/03/10

MATSUOKA SATOSHI
 
Organization
School of Computing Visiting Professor
Title
Visiting Professor

Degree

  • Master of Science

  • Doctor of Science

Research Areas

  • Informatics / High performance computing  / Supercomputer, high performance AI, power-efficient computing, HPC-driven Big Data, Heterogeneous HPC

Education

  • The University of Tokyo   Graduate School, Division of Science

    - 1989

    Country: Japan

    researchmap

  • The University of Tokyo   Faculty of Science

    - 1986

    Country: Japan

    researchmap

Research History

  • Institute of Science Tokyo   School of Computing   Visiting Professor

    2024.10

  • Tokyo Institute of Technology   School of Computing   Visiting Professor

    2023.4 - 2024.9

  • RIKEN   RIKEN Center for Computational Science (R-CCS)   Director

    2018.4

    Country:Japan

    researchmap

  • Tokyo Institute of Technology   School of Computing   Specially Appointed Professor

    2018.4 - 2023.3

  • Tokyo Institute of Technology   Global Scientific Information and Computing Center   Professor

    2001 - 2018.3

    Country:Japan

    researchmap

  • Tokyo Institute of Technology   Associate Professor

    1996

  • The University of Tokyo   Lecturer

    1993

  • The University of Tokyo

    1989

Professional Memberships

  • IEEE Supercomputing

  • HPC Asia 2004

  • ACM Object-Oriented Programming: Languages, Systems and Applications (OOPSLA 2002)

  • IEEE Computing Clusters and the Grid (CCGrid 2003)

Committee Memberships

  • IEEE Supercomputing   Area Chair  

    2004   

    Committee type:Academic society

    researchmap

  • HPC Asia 2004   Program Co-chair  

    2004   

    Committee type:Academic society

    researchmap

  • IEEE Computing Clusters and the Grid (CCGrid 2003)   Program Chair  

    2003   

    Committee type:Academic society

    researchmap

  • ACM Object-Oriented Programming: Languages, Systems and Applications (OOPSLA 2002)   Program Chair  

    2002   

    Committee type:Academic society

    researchmap

Papers

  • Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers

    Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    Proceedings of the 39th ACM International Conference on Supercomputing   57 - 72   2025.6

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3721145.3730422

    researchmap

  • A General and Scalable GCN Training Framework on CPU Supercomputers.

    Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    PPoPP   566 - 568   2025

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3710848.3710860

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/ppopp/ppopp2025.html#ZhuangCLYD0EMW25

  • Real-time High-resolution X-Ray Computed Tomography Invited Reviewed

    In Proceedings of the ACM International Conference on Supercomputing (ICS 2024), Kyoto, June 2024.   2024.6

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3650200.3656634

    researchmap

  • Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage.

    Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IEEE International Conference on Cluster Computing   196 - 197   2024

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/CLUSTERWorkshops61563.2024.00056

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#WuCTTEMW24

  • Communication Optimization for Distributed GCN Training on ABCI Supercomputer.

    Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IEEE International Conference on Cluster Computing   160 - 161   2024

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/CLUSTERWorkshops61563.2024.00038

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#Zhuang0LEMW24

  • Investigating Nvidia GPU Architecture Trends via Microbenchmarks.

    Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

    IEEE International Conference on Cluster Computing   174 - 175   2024

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/CLUSTERWorkshops61563.2024.00045

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/clusterw2024.html#ZhangBCWEMW24

  • At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.

    Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

    ACM Transactions on Architecture and Code Optimization   20 ( 4 )   57 - 26   2023.12

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1145/3629520

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/taco/taco20.html#DomkeVGKWPMPZCDM23

  • Myths and legends in high-performance computing.

    Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler

    International Journal of High Performance Computing Applications   37 ( 3-4 )   245 - 259   2023.7

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1177/10943420231166608

    researchmap

  • PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications Invited Reviewed

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    In proceedings of ACM International Conference on Supercomputing (ICS 2023), Orlando, June 2023.   2023.6

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3577193.3593705

    researchmap

  • Revisiting Temporal Blocking Stencil Optimizations Invited Reviewed

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    In proceedings of ACM International Conference on Supercomputing (ICS 2023), Orlando, June 2023.   2023.6

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3577193.3593716

    researchmap

  • Exploiting Scratchpad Memory for Deep Temporal Blocking

    Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

    Proceedings of the 15th Workshop on General Purpose Processing Using GPU   2023.2

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3589236.3589242

    researchmap

  • Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips

    Huaipeng Zhang, Nhut-Minh Ho, Yigit Polat Dogukan, Peng Chen, Mohamed Wahib, Truong Thao Nguyen, Jintao Meng, Rick Siow Mong Goh, Satoshi Matsuoka, Tao Luo, Weng-Fai Wong

    IEEE Transactions on Parallel and Distributed Systems   1 - 16   2023

    Publishing type:Research paper (scientific journal)   Publisher:Institute of Electrical and Electronics Engineers (IEEE)  

    DOI: 10.1109/tpds.2023.3291795

    researchmap

  • Scalable FBP decomposition for cone-beam CT reconstruction

    Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard Boardman, Thomas Blumensath, Satoshi Matsuoka

    International Conference for High Performance Computing, Networking, Storage and Analysis, SC   2021.11

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3458817.3476139

    Scopus

    researchmap

  • Performance portable back-projection algorithms on CPUs

    Peng Chen, Mohamed Wahib, Xiao Wang, Shinichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka

    Proceedings of the ACM International Conference on Supercomputing   2021.6

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3447818.3460353

    researchmap

  • MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

    Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey C. Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, Toshiyuki Imamura, Akihiko Kasagi, Kentaro Kawakami, Shuhei Kudo, Akiyoshi Kuroda, Maxime Martinasso, Satoshi Matsuoka, Henrique Mendonça, Kazuki Minami, Prabhat Ram, Takashi Sawada, Mallikarjun Shankar, Tom St. John, Akihiro Tabuchi, Venkatram Vishwanath, Mohamed Wahib, Masafumi Yamazaki, Junqi Yin

    CoRR   abs/2110.11466   2021

    Publishing type:Research paper (scientific journal)  

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2110.html#abs-2110-11466

  • Scalable FBP decomposition for cone-beam CT reconstruction.

    Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard P. Boardman, Thomas Blumensath, Satoshi Matsuoka

    SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis(SC)   9 - 9   2021

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3458817.3476139

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/sc/sc2021.html#ChenWWHOBBBM21

  • Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

    Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

    2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)   1056 - 1065   2021

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS49936.2021.00114

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2021.html#DomkeVDCO0SMPWM21

  • A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.

    Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka

    2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)   483 - 493   2020

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    GPUs are playing an increasingly important role in general-purpose computing.
    Many algorithms require synchronizations at different levels of granularity in
    a single GPU. Additionally, the emergence of dense GPU nodes also calls for
    multi-GPU synchronization. Nvidia's latest CUDA provides a variety of
    synchronization methods. Until now, there has been no full understanding of the
    characteristics of those synchronization methods. This work explores important
    undocumented features and provides an in-depth analysis of the performance
    considerations and pitfalls of the state-of-the-art synchronization methods for
    Nvidia GPUs. The provided analysis would be useful when making design choices
    for applications, libraries, and frameworks running on single and/or multi-GPU
    environments. We provide a case study of the commonly used reduction operator
    to illustrate how the knowledge gained in our analysis can be useful. We also
    describe our micro-benchmarks and measurement methods.

    DOI: 10.1109/IPDPS47924.2020.00057

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2020.html#ZhangWZM20

  • The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism.

    Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen

    CoRR   abs/2007.12856   2020

    Publishing type:Research paper (scientific journal)  

    We present scalable hybrid-parallel algorithms for training large-scale 3D
    convolutional neural networks. Deep learning-based emerging scientific
    workflows often require model training with large, high-dimensional samples,
    which can make training much more costly and even infeasible due to excessive
    memory usage. We solve these challenges by extensively applying hybrid
    parallelism throughout the end-to-end training pipeline, including both
    computations and I/O. Our hybrid-parallel algorithm extends the standard data
    parallelism with spatial parallelism, which partitions a single sample in the
    spatial domain, realizing strong scaling beyond the mini-batch dimension with a
    larger aggregated memory capacity. We evaluate our proposed training algorithms
    with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive
    performance studies show that good weak and strong scaling can be achieved for
    both networks using up to 2K GPUs. More importantly, we enable training of
    CosmoFlow with much larger samples than previously possible, realizing an
    order-of-magnitude improvement in prediction accuracy.

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2007.html#abs-2007-12856

  • AN5D: automated stencil framework for high-degree temporal blocking on GPUs.

    Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

    CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization(CGO)   199 - 211   2020

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    Stencil computation is one of the most widely-used compute patterns in high
    performance computing applications. Spatial and temporal blocking have been
    proposed to overcome the memory-bound nature of this type of computation by
    moving memory pressure from external memory to on-chip memory on GPUs. However,
    correctly implementing those optimizations while considering the complexity of
    the architecture and memory hierarchy of GPUs to achieve high performance is
    difficult. We propose AN5D, an automated stencil framework which is capable of
    automatically transforming and optimizing stencil patterns in a given C source
    code, and generating corresponding CUDA code. Parameter tuning in our framework
    is guided by our performance model. Our novel optimization strategy reduces
    shared memory and register pressure in comparison to existing implementations,
    allowing performance scaling up to a temporal blocking degree of 10. We achieve
    the highest performance reported so far for all evaluated stencil benchmarks on
    the state-of-the-art Tesla V100 GPU.

    DOI: 10.1145/3368826.3377904

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/cgo/cgo2020.html#MatsumuraZWEM20

  • A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective.

    Artur Podobas, Kentaro Sano, Satoshi Matsuoka

    CoRR   abs/2004.04509   2020

    Publishing type:Research paper (scientific journal)  

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-04509

  • A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures.

    Artur Podobas, Kentaro Sano, Satoshi Matsuoka

    31st IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)   1 - 8   2020

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/ASAP49362.2020.00010

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/asap/asap2020.html#PodobasSM20

  • AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs.

    Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

    CoRR   abs/2001.01473   2020

    Publishing type:Research paper (scientific journal)  

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2001.html#abs-2001-01473

  • A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.

    Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka

    CoRR   abs/2004.05371   2020

    Publishing type:Research paper (scientific journal)  

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-05371

  • Scaling distributed deep learning workloads beyond the memory capacity with KARMA.

    Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

    SC20: International Conference for High Performance Computing, Networking, Storage and Analysis   19 - 19   2020

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/SC41405.2020.00023

    researchmap

    Other Link: https://dblp.uni-trier.de/conf/sc/2020

  • Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors

    Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydin Buluc

    PARALLEL COMPUTING   90   2019.12

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.parco.2019.102545

    Web of Science

    researchmap

  • iFDK

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis   2019.11

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3295500.3356163

    researchmap

  • Scaling Word2Vec on Big Corpus.

    Bofang Li, Aleksandr Drozd, Yuhe Guo, Tao Liu, Satoshi Matsuoka, Xiaoyong Du

    Data Sci. Eng.   4 ( 2 )   157 - 175   2019

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1007/s41019-019-0096-6

    researchmap

  • How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications.

    Aamer Shah, Chih-Song Kuo, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf

    Supercomput. Front. Innov.   6 ( 2 )   29 - 55   2019

    Publishing type:Research paper (scientific journal)  

    DOI: 10.14529/jsfi190203

    researchmap

  • A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3295500.3356162

    Web of Science

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1907.06154v2

  • iFDK: A Scalable Framework for Instant High-resolution Image Reconstruction

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3295500.3356163

    Web of Science

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1909.02724v1

  • Learning Neural Representations for Predicting GPU Performance

    Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka

    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2019   11501   40 - 58   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-030-20656-7_3

    Web of Science

    researchmap

  • MH-QEMU: Memory-State-Aware Fault Injection Platform

    Hideyuki Jitsumoto, Yuya Kobayashi, Akihiro Nomura, Satoshi Matsuoka

    SUPERCOMPUTING FRONTIERS, SCFA 2019   11416   71 - 85   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-030-18645-6_5

    Web of Science

    researchmap

  • The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees?

    Jens Domke, Satoshi Matsuoka, Ivan Radanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis Lee Floyd, Nicolas Dubé

    2019 IEEE Symposium on High-Performance Interconnects   1 - 4   2019

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/HOTI.2019.00013

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/hoti/hoti2019.html#DomkeMRTY0MMFD19

  • Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method

    Yohei Tsuji, Kazuki Osawa, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

    PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019)   21 - 8   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3339186.3339202

    Web of Science

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/icppw/icppw2019.html#TsujiOUNYM19

  • Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35

    Kazuki Oosawa, Youhei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

    2019

  • HyperX Topology: First At-Scale Implementation and Comparison to the Fat-Tree

    Jens Domke, Satoshi Matsuoka, Ivan R. Ivanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis L. Floyd, Nicolas Dube

    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3295500.3356140

    Web of Science

    researchmap

  • Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks

    Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019)   12351 - 12359   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CVPR.2019.01264

    Web of Science

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/conf/cvpr/2019

  • Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

    Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID)   231 - 240   2019

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2019.00037

    Web of Science

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/ccgrid/ccgrid2019.html#NagasakaNKM19

  • Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations

    James Lin, Zhigeng Xu, Linjin Cai, Akira Nukada, Satoshi Matsuoka

    PARALLEL COMPUTING   77   128 - 143   2018.9

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.parco.2018.06.001

    Web of Science

    researchmap

  • Interference between I/O and MPI Traffic on Fat-tree Networks

    Kevin A. Brown, Nikhil Jain, Satoshi Matsuoka, Martin Schulz, Abhinav Bhatele

    Proceedings of the 47th International Conference on Parallel Processing   2018.8

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3225058.3225144

    researchmap

  • MRG8 - Random Number Generation for the Exascale Era

    Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka, Kenichi Miura, John Shalf

    Proceedings of the Platform for Advanced Scientific Computing Conference   2018.7

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3218176.3218230

    researchmap

  • Overview of the AI Bridging Cloud Infrastructure (ABCI), the World's Largest-Scale Open AI Infrastructure

    小川宏高, 松岡聡, 佐藤仁, 高野了成, 滝澤真一朗, 谷村勇輔, 三浦信一, 関口智嗣

    IPSJ SIG Technical Reports (Web)   2018 ( HPC-165 )   Vol.2018-HPC-165, No.19, 1-7 (WEB ONLY)   2018.7

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • An Ultra-Green AI Data Center Supporting a 0.55 AI-EFLOPS Computing Infrastructure

    高野了成, 三浦信一, 杉田正, 小川宏高, 松岡聡

    IPSJ SIG Technical Reports (Web)   2018 ( HPC-165 )   Vol.2018-HPC-165, No.20, 1-7 (WEB ONLY)   2018.7

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

    Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç

    2018.4

    Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive
    that is widely used in areas ranging from traditional numerical applications to
    recent big data analysis and machine learning. Although many SpGEMM algorithms
    have been proposed, hardware specific optimizations for multi- and many-core
    processors are lacking and a detailed analysis of their performance under
    various use cases and matrices is not available. We first identify and
    mitigate multiple bottlenecks with memory management and thread scheduling on
    Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and
    many-core processors, we develop a hash-table-based algorithm and optimize a
    heap-based shared-memory SpGEMM algorithm. We examine their performance
    together with other publicly available codes. Different from the literature,
    our evaluation also includes use cases that are representative of real graph
    algorithms, such as multi-source breadth-first search or triangle counting. Our
    hash-table and heap-based algorithms show significant speedups over
    libraries in the majority of cases, while different algorithms dominate the
    other scenarios depending on matrix size, sparsity, compression factor, and
    operation type. We summarize the in-depth evaluation results and provide a
    recipe for choosing the best SpGEMM algorithm for a target scenario. A critical finding is that
    hash-table-based SpGEMM gets a significant performance boost if the nonzeros
    are not required to be sorted within each row of the output matrix.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1804.01698v2

  • Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

    Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

    2018.2

    Recent developments in High Level Synthesis tools have attracted software
    programmers to accelerate their high-performance computing applications on
    FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms
    of performance for stencil computation, most previous work achieves this by
    avoiding spatial blocking and restricting input dimensions relative to FPGA
    on-chip memory. In this work we create a stencil accelerator using Intel FPGA
    SDK for OpenCL that achieves high performance without having such restrictions.
    We combine spatial and temporal blocking to avoid input size restrictions, and
    employ multiple FPGA-specific optimizations to tackle issues arising from the
    added design complexity. Accelerator parameter tuning is guided by our
    performance model, which we also use to project performance for the upcoming
    Intel Stratix 10 devices. On an Arria 10 GX 1150 device, our accelerator can
    reach up to 760 and 375 GFLOP/s of compute performance, for 2D and 3D stencils,
    respectively, which rivals the performance of a highly-optimized GPU
    implementation. Furthermore, we estimate that the upcoming Stratix 10 devices
    can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D
    stencil computation, respectively.

    DOI: 10.1145/3174243.3174248

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1802.00438v1

  • Lock Contention Management in Multithreaded MPI

    Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Milind Chabbi, Yanjie Wei, Jeff Hammond, Satoshi Matsuoka

    ACM TRANSACTIONS ON PARALLEL COMPUTING   5 ( 3 )   2018.1

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1145/3275443

    Web of Science

    researchmap

  • Machine Learning Predictions for Underestimation of Job Runtime on HPC System

    Jian Guo, Akihiro Nomura, Ryan Barton, Haoyu Zhang, Satoshi Matsuoka

    Supercomputing Frontiers   179 - 198   2018

    Publisher:Springer International Publishing  

    DOI: 10.1007/978-3-319-69953-0_11

    researchmap

  • Machine Learning Predictions for Underestimation of Job Runtime on HPC System

    Jian Guo, Akihiro Nomura, Ryan Barton, Haoyu Zhang, Satoshi Matsuoka

    SUPERCOMPUTING FRONTIERS, SCFA 2018   10776   179 - 198   2018

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-319-69953-0_11

    Web of Science

    researchmap

  • Efficient Solving of Scan Primitive on Multi-GPU Systems

    Adrian P. Dieguez, Margarita Amor, Ramon Doallo, Akira Nukada, Satoshi Matsuoka

    2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)   794 - 803   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2018.00089

    Web of Science

    researchmap

  • Predicting Performance Using Collaborative Filtering

    Shweta Salaria, Aleksandr Drozd, Artur Podobas, Satoshi Matsuoka

    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   504 - 514   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CLUSTER.2018.00066

    Web of Science

    researchmap

  • Hardware Implementation of POSITs and Their Application in FPGAs

    Artur Podobas, Satoshi Matsuoka

    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018)   138 - 145   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2018.00029

    Web of Science

    researchmap

  • Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

    Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka

    2018 IEEE 25TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC)   92 - 101   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/HiPC.2018.00019

    Web of Science

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1812.10321v1

  • Explorations of Data Swapping on Burst Buffer

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018)   517 - 526   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPADS.2018.00074

    Web of Science

    researchmap

  • DRAGON: Breaking GPU Memory Capacity Limits with Direct NVM Access

    Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Satoshi Matsuoka

    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18)   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Optimizing Preconditioned Conjugate Gradient on TaihuLight for OpenFOAM

    James Lin, Minhua Wen, Delong Meng, Xin Liu, Akira Nukada, Satoshi Matsuoka

    2018 18TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID)   273 - 282   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2018.00042

    Web of Science

    researchmap

  • Accelerating Deep Learning Frameworks with Micro-batches

    Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   402 - 412   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CLUSTER.2018.00058

    Web of Science

    researchmap

  • Cambrian Explosion of Computing and Big Data in the Post-Moore Era

    Satoshi Matsuoka

    HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING   105 - 105   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3208040.3225055

    Web of Science

    researchmap

  • Efficient Algorithms for the Summed Area Tables Primitive on GPUs

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   482 - 493   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CLUSTER.2018.00064

    Web of Science

    researchmap

  • High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

    Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018)   abs/2002.05983   123 - 130   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2018.00027

    Web of Science

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2002.html#abs-2002-05983

  • MACC: An OpenACC Transpiler for Automatic Multi-GPU Use Reviewed

    Kazuaki Matsumura, Mitsuhisa Sato, Taisuke Boku, Artur Podobas, Satoshi Matsuoka

    SUPERCOMPUTING FRONTIERS, SCFA 2018   10776   109 - 127   2018

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-319-69953-0_7

    Web of Science

    Scopus

    researchmap

  • Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data

    松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

    Tsubame e-Science Journal   16   2 - 8 (JA), 20 - 27 (EN)   2017.11

     More details

  • Applying Temporal Blocking with a Directive-based Approach

    Shota Kuroda, Toshio Endo, Satoshi Matsuoka

    Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC   2017.11

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3148173.3148190

    researchmap

  • The Concept of the AI Bridging Cloud Infrastructure (ABCI)

    小川宏高, 松岡聡, 佐藤仁, 高野了成, 滝澤真一朗, 谷村勇輔, 三浦信一, 関口智嗣

    情報処理学会研究報告(Web)   2017 ( HPC-160 )   Vol.2017‐HPC‐160,No.28,1‐7 (WEB ONLY)   2017.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • Overview of TSUBAME3.0, a Green Cloud Supercomputer Converging HPC with Big Data and AI

    松岡聡, 遠藤敏夫, 額田彰, 三浦信一, 野村哲弘, 佐藤仁, 實本英之, DROZD Aleksandr

    情報処理学会研究報告(Web)   2017 ( HPC-160 )   Vol.2017‐HPC‐160,No.29,1‐6 (WEB ONLY)   2017.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration)

    Kevin Brown, Tianqi Xu, Keita Iwabuchi, Kento Sato, Adam Moody, Kathryn Mohror, Nikhil Jain, Abhinav Bhatele, Martin Schulz, Roger Pearce, Maya Gokhale, Satoshi Matsuoka

    2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW)   343 - 347   2017.6

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icdcsw.2017.74

    researchmap

  • Efficient Breadth-First Search on Massively Parallel and Distributed-Memory Machines

    Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka

    Data Science and Engineering   2 ( 1 )   22 - 35   2017.3

     More details

    Publishing type:Research paper (scientific journal)   Publisher:Springer Science and Business Media LLC  

    DOI: 10.1007/s41019-016-0024-y

    researchmap

    Other Link: http://link.springer.com/article/10.1007/s41019-016-0024-y/fulltext.html

  • Fast Recognition of Bird Sounds Using Extreme Learning Machines

    Kun Qian, Jian Guo, Ken Ishida, Satoshi Matsuoka

    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING   12 ( 2 )   294 - 296   2017.3

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1002/tee.22378

    Web of Science

    researchmap

  • Co-locating Graph Analytics and HPC Applications

    Kevin Brown, Satoshi Matsuoka

    2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   659 - 660   2017

     More details

  • Optimizations of Two Compute-bound Scientific Kernels on the SW26010 Many-core Processor

    James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka

    2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP)   432 - 441   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPP.2017.52

    Web of Science

    researchmap

  • GPU-based Training of Autoencoders for Bird Sound Data Processing

    Jian Guo, Kun Qian, Bjorn Schuller, Satoshi Matsuoka

    2017 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW)   2017

     More details

  • High-performance and Memory-saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU

    Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

    2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP)   101 - 110   2017

     More details

  • Being "BYTES-oriented" in HPC leads to an Open Big Data/AI Ecosystem and Further Advances into the Post-Moore Era

    Satoshi Matsuoka

    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)   5 - 5   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Benchmarking SW26010 Many-core Processor

    Zhigeng Xu, James Lin, Satoshi Matsuoka

    2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)   743 - 752   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2017.9

    Web of Science

    researchmap

  • Asynchronous, Data-Parallel Deep Convolutional Neural Network Training with Linear Prediction Model for Parameter Transition

    Ikuro Sato, Ryo Fujisaki, Yosuke Oyama, Akihiro Nomura, Satoshi Matsuoka

    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II   10635   305 - 314   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-319-70096-0_32

    Web of Science

    researchmap

  • Evaluating High-Level Design Strategies on FPGAs for High-Performance Computing

    Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka

    2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL)   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Designing and Accelerating Spiking Neural Networks using OpenCL for FPGAs

    Artur Podobas, Satoshi Matsuoka

    2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT)   255 - 258   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Evaluation of HPC-Big Data Applications Using Cloud Platforms

    Shweta Salaria, Kevin Brown, Hideyuki Jitsumoto, Satoshi Matsuoka

    2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID)   1053 - 1061   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2017.143

    Web of Science

    researchmap

  • Migrating Legacy Fortran to Python While Retaining Fortran-Level Performance through Transpilation and Type Hints

    Mateusz Bysiek, Aleksandr Drozd, Satoshi Matsuoka

    2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC)   2016.11

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/pyhpc.2016.006

    researchmap

  • Special Issue on Cluster Computing

    Michela Taufer, Pavan Balaji, Satoshi Matsuoka

    PARALLEL COMPUTING   58   25 - 26   2016.10

     More details

  • Critical mass in the emergence of collective intelligence: a parallelized simulation of swarms in noisy environments

    Aleksandr Drozd, Olaf Witkowski, Satoshi Matsuoka, Takashi Ikegami

    Artificial Life and Robotics   21 ( 3 )   317 - 323   2016.9

     More details

    Publishing type:Research paper (scientific journal)   Publisher:Springer Science and Business Media LLC  

    DOI: 10.1007/s10015-016-0303-8

    researchmap

    Other Link: http://link.springer.com/article/10.1007/s10015-016-0303-8/fulltext.html

  • Evaluation of Application Errors under Specific Fault Patterns Using a Virtual Machine Emulator

    小林佑矢, 實本英之, 野村哲弘, 松岡聡

    情報処理学会研究報告(Web)   2016 ( HPC-155 )   Vol.2016‐HPC‐155,No.10,1‐7 (WEB ONLY)   2016.8

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • Routing on the Dependency Graph

    Jens Domke, Torsten Hoefler, Satoshi Matsuoka

    Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing   2016.5

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2907294.2907313

    researchmap

  • From FLOPS to BYTES: Disruptive change in high-performance computing towards the post-Moore era Reviewed

    Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo

    2016 ACM International Conference on Computing Frontiers - Proceedings   274 - 281   2016.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    DOI: 10.1145/2903150.2906830

    Scopus

    researchmap

  • From FLOPS to BYTES

    Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo

    Proceedings of the ACM International Conference on Computing Frontiers   274 - 281   2016.5

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/2903150.2906830

    researchmap

  • Serving More GPU Jobs, with Low Penalty, using Remote GPU Execution and Migration

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   485 - 488   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CLUSTER.2016.36

    Web of Science

    researchmap

  • Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures

    Abdelhalim Amer, Satoshi Matsuoka, Miquel Pericas, Naoya Maruyama, Kenjiro Taura, Rio Yokota, Pavan Balaji

    OPENMP: MEMORY, DEVICES, AND TASKS   9903   156 - 170   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-319-45550-1_12

    Web of Science

    researchmap

  • GPU-Based Fast Signal Processing for Large Amounts of Snore Sound Data

    Jian Guo, Kun Qian, Huijie Xu, Christoph Janott, Bjoern Schuller, Satoshi Matsuoka

    2016 IEEE 5TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Tapas: An Implicitly Parallel Programming Framework For Hierarchical N-body Algorithms

    Keisuke Fukuda, Motohiko Matsuda, Naoya Maruyama, Rio Yokota, Kenjiro Taura, Satoshi Matsuoka

    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   1100 - 1109   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPADS.2016.143

    Web of Science

    researchmap

  • Benchmarks for Extracting Requirements for Future HPCI Systems from an Application Perspective

    野村哲弘, 鈴木惣一朗, 三上和徳, 丸山直也, 松岡聡

    2016

     More details

  • Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't.

    Anna Gladkova, Aleksandr Drozd, Satoshi Matsuoka

    Proceedings of the NAACL Student Research Workshop   2016

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computational Linguistics  

    DOI: 10.18653/v1/n16-2002

    researchmap

  • Word embeddings, analogies, and machine learning: beyond king - man + woman = queen

    Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

    Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers   2016

     More details

    Publisher:The COLING 2016 Organizing Committee  

    researchmap

  • Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs

    Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka

    SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS   409 - 420   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • CloudBB: Scalable I/O Accelerator for Shared Cloud Storage

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   509 - 518   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPADS.2016.72

    Web of Science

    researchmap

  • I/O Chunking and Latency Hiding Approach for Out-of-core Sorting Acceleration using GPU and Flash NVM Reviewed

    Hitoshi Sato, Ryo Mizote, Satoshi Matsuoka, Hirotaka Ogawa

    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)   398 - 403   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers

    Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka

    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)   66 - 75   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU

    Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016)   80   131 - 142   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1016/j.procs.2016.05.304

    Web of Science

    researchmap

  • Extreme Scale Breadth-First Search on Supercomputers

    Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka

    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)   1040 - 1047   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Towards a Distributed Large-Scale Dynamic Graph Data Store

    Keita Iwabuchi, Scott Sallinen, Roger Pearce, Brian Van Essen, Maya Gokhale, Satoshi Matsuoka

    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)   892 - 901   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2016.189

    Web of Science

    researchmap

  • A Directive-based Data Layout Abstraction for Performance Portability of OpenACC Applications

    Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

    PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS)   1147 - 1154   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/HPCC-SmartCity-DSS.2016.34

    Web of Science

    researchmap

  • Towards Convergence of Extreme Computing and Big Data Centers

    Satoshi Matsuoka

    DIDC'16: PROCEEDINGS OF THE ACM INTERNATIONAL WORKSHOP ON DATA-INTENSIVE DISTRIBUTED COMPUTING   1 - 1   2016

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2912152.2912159

    Web of Science

    researchmap

  • Discovering Aspectual Classes of Russian Verbs in Untagged Large Corpora

    Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

    2015 IEEE International Conference on Data Science and Data Intensive Systems   2015.12

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/dsdis.2015.30

    researchmap

  • MPI plus Threads: Runtime Contention and Remedies

    Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka

    ACM SIGPLAN NOTICES   50 ( 8 )   239 - 248   2015.8

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1145/2688500.2688522

    Web of Science

    researchmap

  • Efforts to Improve Scheduling Efficiency on TSUBAME2 and Visualization of User Trends

    野村哲弘, 佐々木淳, 三浦信一, 遠藤敏夫, 松岡聡

    情報処理学会研究報告(Web)   2015 ( HPC-150 )   VOL.2015-HPC-150,NO.2 (WEB ONLY)   2015.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers

    Toshio Endo, Yuki Takasaki, Satoshi Matsuoka

    2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   625 - 632   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPADS.2015.84

    Web of Science

    researchmap

  • Evaluating AVX2 Vgather Instruction with Stencils

    James Lin, Qiang Qin, Shuo Li, Minhua Wen, Satoshi Matsuoka

    2015

     More details

  • Optimization and Performance Model of Large-Scale Urban Airflow Simulation on GPU Clusters

    高嵜 祐樹, 遠藤 敏夫, 松岡 聡

    2015

     More details

  • Python, performance, and natural language processing

    Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

    Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing - PyHPC '15   2015

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:ACM Press  

    DOI: 10.1145/2835857.2835858

    researchmap

  • Design of a NVRAM Specialized Degree Aware Dynamic Graph Data Structure

    Keita Iwabuchi, Roger Pearce, Brian Van Essen, Maya Gokhale, Satoshi Matsuoka

    2015

     More details

  • Performance Analysis of MapReduce Implementations for High Performance Homology Search

    Chaojie Zhang, Koichi Shirahata, Shuji Suzuki, Yutaka Akiyama, Satoshi Matsuoka

    2015

     More details

  • Porting and Optimizing GTC-P on Sunway TaihuLight Supercomputer with Sunway OpenACC

    Yichao Wang, James Lin, Linjin Cai, William Tang, Stephane Ethier, Bei Wang, Simon See, Satoshi Matsuoka

    2015

     More details

  • Message Delivery Optimization in Pregel Graph Processing Systems

    上野 晃司, 鈴村 豊太郎, 松岡 聡

    2015

     More details

  • Signal-Driven Swarming: A Parallel Implementation of Evolved Autonomous Agents to Perform A Foraging Task

    Aleksandr Drozd, Olaf Witkowski, Satoshi Matsuoka, Takashi Ikegami

    2015

     More details

  • Software Certification in Legal and Scientific Metrology

    MATSUOKA Satoshi

    Journal of The Society of Instrument and Control Engineers   54 ( 10 )   766 - 769   2015

     More details

    Language:Japanese   Publisher:The Society of Instrument and Control Engineers  

    DOI: 10.11499/sicejl.54.766

    CiNii Books

    researchmap

    Other Link: https://jlc.jst.go.jp/DN/JLC/20016439007?from=CiNii

  • Exploration of Lossy Compression for Application-level Checkpoint/Restart Reviewed

    Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka

    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)   914 - 922   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2015.67

    Web of Science

    researchmap

  • Efficient Execution of Multiple CUDA Applications Using Transparent Suspend, Resume and Migration

    Taichiro Suzuki, Akira Nukada, Satoshi Matsuoka

    EURO-PAR 2015: PARALLEL PROCESSING   9233   687 - 699   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-662-48096-0_53

    Web of Science

    researchmap

  • Understanding Performance Portability of OpenACC for Supercomputers

    Suttinee Sawadsitang, James Lin, Simon See, Francois Bodin, Satoshi Matsuoka

    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS   699 - 707   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2015.60

    Web of Science

    researchmap

  • Hardware-Centric Analysis of Network Performance for MPI Applications

    Kevin A. Brown, Jens Domke, Satoshi Matsuoka

    2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   692 - 699   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICPADS.2015.92

    Web of Science

    researchmap

  • Characterizing MPI and Hybrid MPI plus Threads Applications at Scale: Case Study with BFS

    Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Satoshi Matsuoka

    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING   1075 - 1083   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGrid.2015.93

    Web of Science

    researchmap

  • Modeling Gather and Scatter with Hardware Performance Counters for Xeon Phi

    James Lin, Akira Nukada, Satoshi Matsuoka

    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING   713 - 716   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGrid.2015.59

    Web of Science

    researchmap

  • An OpenACC Extension for Data Layout Transformation

    Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

    2014 First Workshop on Accelerator Programming using Directives   12 - 18   2014.11

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/waccpd.2014.12

    researchmap

  • Tracing data movements within MPI collectives

    Kevin A. Brown, Jens Domke, Satoshi Matsuoka

    ACM International Conference Proceeding Series   09-12-September-2014   117 - 118   2014.9

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2642769.2642789

    Scopus

    researchmap

  • Development of a Computer Evaluation Benchmark Using Real Applications and a Performance Repository

    野村哲弘, 三浦信一, 遠藤敏夫, 松岡聡

    情報処理学会研究報告(Web)   2014 ( 29 )   1 - 7   2014.7

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:一般社団法人情報処理学会  

    CiNii Books

    J-GLOBAL

    researchmap

  • Construction and Operation of the HPCI Advanced Software Operation Infrastructure

    三浦信一, 滝澤真一朗, 松岡聡, 棟朝雅晴, 實本英之, 小林泰三

    情報処理学会研究報告(Web)   2014 ( 30 )   1 - 6   2014.2

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:一般社団法人情報処理学会  

    CiNii Books

    J-GLOBAL

    researchmap

  • Cache-aware Sparse Matrix Formats for Kepler GPU

    Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   281 - 288   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Special issue: SC13-The International Conference for High Performance Computing, Networking, Storage and Analysis

    William Gropp, Satoshi Matsuoka

    SCIENTIFIC PROGRAMMING   22 ( 2 )   57 - 58   2014

     More details

  • Petascale General Solver for Semidefinite Programming Problems with over Two Million Constraints

    Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki

    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2014.121

    Web of Science

    researchmap

  • NVM-based Hybrid BFS with Memory Efficient Data Structure

    Keita Iwabuchi, Hitoshi Sato, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka

    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)   529 - 538   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Hybrid BFS Approach Using Semi-External Memory

    Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka

    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)   1698 - 1707   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2014.189

    Web of Science

    researchmap

  • Fail-in-Place Network Design: Interaction between Topology, Routing Algorithm and Failures

    Jens Domke, Torsten Hoefler, Satoshi Matsuoka

    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS   597 - 608   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/SC.2014.54

    Web of Science

    researchmap

  • TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype towards Exascale Becoming the Greenest Supercomputer in the World

    Toshio Endo, Akira Nukada, Satoshi Matsuoka

    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)   360 - 367   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Efficient String Sorting on Multi- and Many-Core Architectures

    Aleksandr Drozd, Miquel Pericas, Satoshi Matsuoka

    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS)   637 - 644   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/BigData.Congress.2014.97

    Web of Science

    researchmap

  • Extreme Big Data (EBD): Next Generation Big Data Infrastructure Technologies Towards Yottabyte/Year.

    Satoshi Matsuoka, Hitoshi Sato, Osamu Tatebe, Michihiro Koibuchi, Ikki Fujiwara, Shuji Suzuki, Masanori Kakuta, Takashi Ishida, Yutaka Akiyama, Toyotaro Suzumura, Koji Ueno, Hiroki Kanezashi, Takemasa Miyoshi

    Supercomput. Front. Innov.   1 ( 2 )   89 - 107   2014

     More details

    Publishing type:Research paper (scientific journal)  

    DOI: 10.14529/jsfi140206

    researchmap

  • Tracing Data Movements within MPI Collectives.

    Kevin A. Brown, Jens Domke, Satoshi Matsuoka

    117 - 117   2014

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2642769.2642789

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/pvm/eurompi2014.html#BrownDM14

  • Latent Fault Detection With Unbalanced Workloads

    Moshe Gabel, Kento Sato, Daniel Keren, Satoshi Matsuoka, Assaf Schuster

    2014

     More details

  • Node-level Memory Access Optimization on Intel Knights Corner

    James Lin, Shuo Li, Jiaming Zhao, Satoshi Matsuoka

    2014

     More details

  • Out-of-core GPU Memory Management for MapReduce-based Large-scale Graph Processing

    Koichi Shirahata, Hitoshi Sato, Satoshi Matsuoka

    2014 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   221 - 229   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Large-scale Distributed Sorting for GPU-based Heterogeneous Supercomputers

    Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, Satoshi Matsuoka

    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)   510 - 518   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Scalable Analysis of Multicore Data Reuse and Sharing

    Miquel Pericas, Kenjiro Taura, Satoshi Matsuoka

    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, (ICS'14)   353 - 362   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2597652.2597674

    Web of Science

    researchmap

  • Analysis of Data Reuse in Task-Parallel Runtimes

    Miquel Pericas, Abdelhalim Amer, Kenjiro Taura, Satoshi Matsuoka

    HIGH PERFORMANCE COMPUTING SYSTEMS: PERFORMANCE MODELING, BENCHMARKING AND SIMULATION   8551   73 - 87   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-319-10214-6_4

    Web of Science

    researchmap

  • FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery

    Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2014.126

    Web of Science

    researchmap

  • A User-level InfiniBand-based File System and Checkpoint Strategy for Burst Buffers

    Kento Sato, Kathryn Mohror, Adam Moody, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

    2014 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID)   21 - 30   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGrid.2014.24

    Web of Science

    researchmap

  • Using rCUDA to Reduce GPU Resource-assignment Fragmentation caused by Job Scheduler

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    2014 15TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2014)   105 - 112   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/PDCAT.2014.26

    Web of Science

    researchmap

  • How File Access Patterns Influence Interference Among Cluster Applications

    Chih-Song Kuo, Aamer Shah, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf

    2014 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   185 - 193   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

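  • Construction of an Application Performance Repository for System Evaluation and Evaluation of Performance Models

    Akihiro Nomura, Shinichi Miura, Toshio Endo, Satoshi Matsuoka, Soichiro Suzuki, Naoya Maruyama

    IPSJ SIG Technical Reports (Web)   2013 ( 4 )   1 - 6   2013.7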

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    J-GLOBAL

    researchmap

  • Guest Editors' Introduction: Special Issue on Applications for the Heterogeneous Computing Era

    Pavan Balaji, Satoshi Matsuoka

    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS   27 ( 2 )   87 - 88   2013.5

     More details

  • Tsubame2.0: The first petascale supercomputer in Japan and the greatest production in the world

    Satoshi Matsuoka, Takayuki Aoki, Toshio Endo, Hitoshi Sato, Shin'ichiro Takizawa, Akihiro Nomura, Kento Sato

    Contemporary High Performance Computing: From Petascale toward Exascale   525 - 555   2013.1

     More details

    Publishing type:Part of collection (book)  

    Scopus

    researchmap

  • Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-scale Heterogeneous Supercomputers

    Koichi Shirahata, Hitoshi Sato, Toyotaro Suzumura, Satoshi Matsuoka

    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013)   277 - 284   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGrid.2013.85

    Web of Science

    researchmap

  • Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system

    Takafumi Saito, Kento Sato, Hitoshi Sato, Satoshi Matsuoka

    FTXS 2013 - Proceedings of the 3rd ACM Workshop on Fault-Tolerance for HPC at eXtreme Scale   41 - 47   2013

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2465813.2465822

    Scopus

    researchmap

  • Proceedings of SC13 The International Conference for High Performance Computing, Networking, Storage and Analysis Denver, Colorado 17-22 November 2013

    William Gropp, Satoshi Matsuoka

    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC)   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • A Multi GPU Read Alignment Algorithm with Model-Based Performance Optimization

    Aleksandr Drozd, Naoya Maruyama, Satoshi Matsuoka

    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012   7851   270 - 277   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs

    Guanghao Jin, Toshio Endo, Satoshi Matsuoka

    2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application

    Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka, Ryoji Takaki

    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013)   136 - 143   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGrid.2013.12

    Web of Science

    researchmap

  • Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM

    Abdelhalim Amer, Naoya Maruyama, Miquel Pericas, Kenjiro Taura, Rio Yokota, Satoshi Matsuoka

    SUPERCOMPUTING (ISC 2013)   7905   255 - 266   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system.

    Takafumi Saito, Kento Sato, Hitoshi Sato, Satoshi Matsuoka

    41 - 48   2013

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2465813.2465822

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/hpdc/ftxs2013.html#SaitoSSM13

  • Multi-GPU Implementation of the NICAM Atmospheric Model

    Irina Demeshko, Naoya Maruyama, Hirofumi Tomita, Satoshi Matsuoka

    EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS   7640   175 - 184   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing

    Mohamed Slim Bouguerra, Ana Gainaru, Leonardo Bautista Gomez, Franck Cappello, Satoshi Matsuoka, Naoya Maruyama

    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013)   501 - 512   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2013.74

    Web of Science

    researchmap

  • Towards Exascale with the ANR-JST Japanese-French Project FP3C Reviewed

    G. Antoniu, T. Boku, A. Buttari, C. Calvin, P. Codognet, M. Dayde, N. Emad, Y. Ishikawa, G. Joslin, S. Matsuoka, K. Nakajima, H. Nakashima, R. Namyst, S. Petiton, T. Sakurai, M. Sato

    2013 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES (CSIT)   2013

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CSITechnol.2013.6710357

    Web of Science

    researchmap

  • Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds

    Leonardo Bautista Gomez, Bogdan Nicolae, Naoya Maruyama, Franck Cappello, Satoshi Matsuoka

    EURO-PAR 2012 PARALLEL PROCESSING   7484   313 - 324   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Hierarchical Clustering Strategies for Fault Tolerance in Large Scale HPC Systems

    Leonardo Bautista-Gomez, Thomas Ropars, Naoya Maruyama, Franck Cappello, Satoshi Matsuoka

    2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)   355 - 363   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CLUSTER.2012.71

    Web of Science

    researchmap

  • Using Bittorrent and SVC for Efficient Video Sharing and Streaming

    Amer Abdelhalim, Toufik Ahmed, Hidouci Walid-Khaled, Satoshi Matsuoka

    2012 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC)   537 - 543   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Design and Modeling of a Non-blocking Checkpointing System

    Kento Sato, Kathryn Mohror, Adam Moody, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

    2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC)   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Topic 16: GPU and Accelerators Computing

    Alex Ramirez, Dimitrios S. Nikolopoulos, David Kaeli, Satoshi Matsuoka

    EURO-PAR 2012 PARALLEL PROCESSING   7484   857 - 858   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Design and Implementation of Portable and Efficient Non-blocking Collective Communication. Reviewed

    Akihiro Nomura, Yutaka Ishikawa, Naoya Maruyama, Satoshi Matsuoka

    12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing(CCGRID)   1 - 8   2012

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    DOI: 10.1109/CCGrid.2012.96

    researchmap

  • High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems

    Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Makoto Yamashita, Satoshi Matsuoka, Maho Nakata

    2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC)   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Sequence Alignment on Massively Parallel Heterogeneous Systems

    Aleksandr Drozd, Naoya Maruyama, Satoshi Matsuoka

    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW)   2498 - 2501   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPSW.2012.311

    Web of Science

    researchmap

  • Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

    Akira Nukada, Kento Sato, Satoshi Matsuoka

    2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC)   2012

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • The International Exascale Software Project roadmap Reviewed

    Jack Dongarra, Pete Beckman, Terry Moore, Patrick Aerts, Giovanni Aloisio, Jean-Claude Andre, David Barkai, Jean-Yves Berthou, Taisuke Boku, Bertrand Braunschweig, Franck Cappello, Barbara Chapman, Xuebin Chi, Alok Choudhary, Sudip Dosanjh, Thom Dunning, Sandro Fiore, Al Geist, Bill Gropp, Robert Harrison, Mark Hereld, Michael Heroux, Adolfy Hoisie, Koh Hotta, Zhong Jin, Yutaka Ishikawa, Fred Johnson, Sanjay Kale, Richard Kenway, David Keyes, Bill Kramer, Jesus Labarta, Alain Lichnewsky, Thomas Lippert, Bob Lucas, Barney Maccabe, Satoshi Matsuoka, Paul Messina, Peter Michielse, Bernd Mohr, Matthias S. Mueller, Wolfgang E. Nagel, Hiroshi Nakashima, Michael E. Papka, Dan Reed, Mitsuhisa Sato, Ed Seidel, John Shalf, David Skinner, Marc Snir, Thomas Sterling, Rick Stevens, Fred Streitz, Bob Sugar, Shinji Sumimoto, William Tang, John Taylor, Rajeev Thakur, Anne Trefethen, Mateo Valero, Aad van der Steen, Jeffrey Vetter, Peg Williams, Robert Wisniewski, Kathy Yelick

    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS   25 ( 1 )   3 - 60   2011.2

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1177/1094342010391989

    Web of Science

    researchmap

  • Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers.

    Naoya Maruyama, Tatsuo Nomura, Kento Sato, Satoshi Matsuoka

    11 - 12   2011

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2063384.2063398

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/sc/sc2011.html#MaruyamaNSM11

  • Physis: An implicitly parallel programming model for stencil computations on large-scale gpu-accelerated supercomputers

    Naoya Maruyama, Tatsuo Nomura, Kento Sato, Satoshi Matsuoka

    Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis   2011

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/2063384.2063398

    Scopus

    researchmap

  • Performance Characteristics of Graph500 on Large-Scale Distributed Environment

    Toyotaro Suzumura, Koji Ueno, Hitoshi Sato, Katsuki Fujisawa, Satoshi Matsuoka

    2011 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC)   149 - 158   2011

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Model-based Fault Localization: Finding Behavioral Outliers in Large-scale Computing Systems

    Naoya Maruyama, Satoshi Matsuoka

    NEW GENERATION COMPUTING   28 ( 3 )   237 - 255   2010.7

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1007/s00354-009-0088-6

    Web of Science

    researchmap

  • THE INTERNATIONAL EXASCALE SOFTWARE PROJECT: A CALL TO COOPERATIVE ACTION BY THE GLOBAL HIGH-PERFORMANCE COMMUNITY

    Jack Dongarra, Pete Beckman, Patrick Aerts, Franck Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Terry Moore, Rick Stevens, Anne Trefethen, Mateo Valero

    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS   23 ( 4 )   309 - 322   2009.11

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1177/1094342009347714

    Web of Science

    researchmap

  • Linpack Tuning on a Heterogeneous Supercomputer with Four Types of Processors

    ENDO TOSHIO, NUKADA AKIRA, MATSUOKA SATOSHI, MARUYAMA NAOYA, JITSUMOTO HIDEYUKI

    IPSJ SIG Notes   182 ( 14 )   13 - 18   2009.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We report Linpack benchmark results on the TSUBAME supercomputer, a large-scale heterogeneous system with two types of general-purpose processors and two types of accelerators. Although accelerator architectures are promising for improving the performance of computer systems while keeping power consumption and footprint low, there are only a few reports of large-scale computations on a large number of accelerators, apart from our previous trials. Using about 10,000 Opteron cores, 500 Xeon cores, 640 ClearSpeed accelerators, and 620 NVIDIA Tesla GPUs together, we achieved 77 TFlops in Linpack. The keys to this result are modifications to the program code and careful tuning that preserves accelerator performance. With this result, TSUBAME is ranked 29th in the latest Top500 supercomputer ranking and is the second-largest heterogeneous system in the world.

    CiNii Books

    researchmap

  • Speculative Checkpointing: Exploiting Temporal Affinity of Memory Operations Reviewed

    Satoshi Matsuoka, Ikuhei Yamagata, Hideyuki Jitsumoto

    Conference on High Performance Computing (HPC Asia 2009)   2009

     More details

  • Adaptive Resource Indexing Technique for Unstructured Peer-to-Peer Networks

    Sumeth Lerthirunwong, Naoya Maruyama, Satoshi Matsuoka

    CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   172 - 179   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2009.41

    Web of Science

    researchmap

  • Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

    Tomoaki Hamano, Toshio Endo, Satoshi Matsuoka

    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5   1912 - 1919   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Auto-Tuning 3-D FFT Library for CUDA GPUs

    Akira Nukada, Satoshi Matsuoka

    PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Fast Conjugate Gradients with Multiple GPUs

    Ali Cevahir, Akira Nukada, Satoshi Matsuoka

    COMPUTATIONAL SCIENCE - ICCS 2009, PART I   5544   893 - 903   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • A Model-Based Algorithm for Optimizing I/O Intensive Applications in Clouds using VM-Based Migration

    Kento Sato, Hitoshi Sato, Satoshi Matsuoka

    CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   466 - +   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2009.24

    Web of Science

    researchmap

  • File Clustering Based Replication Algorithm in a Grid Environment Reviewed

    Hitoshi Sato, Satoshi Matsuoka, Toshio Endo

    CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   204 - 211   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/CCGRID.2009.73

    Web of Science

    researchmap

  • Aspects of GPU for General Purpose High Performance Computing

    Reiji Suda, Takayuki Aoki, Shoichi Hirasawa, Akira Nukada, Hiroki Honda, Satoshi Matsuoka

    PROCEEDINGS OF THE ASP-DAC 2009: ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 2009   216 - +   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Coupled-Simulation e-Science Support in the NAREGI Grid Reviewed

    Satoshi Matsuoka, Kazushige Saga, Mutsumi Aoyagi

    COMPUTER   41 ( 11 )   42 - +   2008.11

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/MC.2008.449

    Web of Science

    researchmap

  • GridARS: An Advance Reservation-based Grid Co-allocation Framework for Distributed Computing and Network Resources Reviewed

    Atsuko Takefusa, Hidemoto Nakada, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi

    Proc. 13th Workshop on Job Scheduling Strategies for Parallel Processing (LNCS 4942)   152 - 168   2008.4

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    For high performance parallel computing on actual Grids, one of the important issues is to co-allocate the distributed resources that are managed by various local schedulers with advance reservation. To address the issue, we proposed and developed the GridARS resource co-allocation framework, and a general advance reservation protocol that uses WSRF/GSI and a two-phased commit (2PC) protocol to enable a generic and secure advance reservation process based on distributed transactions, and provides the interface module for various existing resource schedulers. To confirm the effectiveness of GridARS, we describe the performance of a simultaneous reservation process and a case study of GridARS grid co-allocation over transpacific computing and network resources. Our experiments showed that: 1) the GridARS simultaneous 2PC reservation process is scalable and practical and 2) GridARS can co-allocate distributed resources managed by various local schedulers stably.

    DOI: 10.1007/978-3-540-78699-3_9

    researchmap

  • An efficient, model-based CPU-GPU heterogeneous FFT library Reviewed

    Yasuhiko Ogata, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   380 - +   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Model-Based Optimization for Data-Intensive Application on Virtual Cluster

    Kento Sato, Hitoshi Sato, Satoshi Matsuoka

    2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING   367 - 368   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Bandwidth Intensive 3-D FFT kernel for GPUs using CUDA

    Akira Nukada, Yasuhiko Ogata, Toshio Endo, Satoshi Matsuoka

    INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS   273 - 283   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Index Distribution Technique for Efficient Search on Unstructured Peer-to-Peer Networks

    Sumeth Lerthirunwong, Naoya Maruyama, Satoshi Matsuoka

    ECTI-CON 2008: PROCEEDINGS OF THE 2008 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, VOLS 1 AND 2   97 - +   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • The Rise of the Commodity Vectors

    Satoshi Matsuoka

    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2008   5336   53 - 62   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Model-based fault localization in large-scale computing systems

    Naoya Maruyama, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   1841 - 1852   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • The road to TSUBAME and beyond

    Satoshi Matsuoka

    HIGH PERFORMANCE COMPUTING ON VECTOR SYSTEMS 2007   265 - 267   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-540-74384-2_19

    Web of Science

    researchmap

  • Connecting Text Mining and Pathways using the PathText Resource Reviewed

    Kemper, Oda, Okazaki, Saetre, Matsuoka, Kikuchi, Kitano, Ananiadou, Tsujii, Tsuruoka

    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008   1736 - 1740   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment Reviewed

    Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Naoya Maruyama

    2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING   250 - 257   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method Reviewed

    Yuto Hosogaya, Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   862 - 869   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Massive supercomputing coping with heterogeneity of modern accelerators Reviewed

    Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   1179 - 1188   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Locality aware MPI communication on a commodity opto-electronic hybrid network Reviewed

    Shin'ichiro Takizawa, Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8   2158 - +   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • An Advance Reservation-based Computation Resource Manager for Global Scheduling Reviewed

    Hidemoto Nakada, Atsuko Takefusa, Katsuhiko Ookubo, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi

    Proc. of GCA 2007   3 - 14   2007.6

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Advance reservation is one possible way to enable resource co-allocation on the Grid. This method requires all the resources to have advance reservation capability as well as coordination protocol support. We employed a two-phase commit protocol, which is common in the distributed transaction area, as the coordination protocol and implemented an advance reservation manager called PluS. PluS works with existing local queuing managers, such as TORQUE or Grid Engine, and provides users with advance reservation capability. There are two ways to implement this capability: 1) completely replacing the scheduling module of the queuing manager, or 2) representing each reservation as a queue and controlling the queues through an external interface. We designed and implemented a reservation manager in both ways and evaluated them. We found that the former has smaller overhead and allows arbitrary scheduling policies, while the latter is much easier to implement with acceptable response time.

    DOI: 10.1142/9789812708823_0001

    researchmap

  • Foundations for Dependable Computing Infrastructure in the Information Explosion Era (Special Issue: Grant in Aid for Scientific Research on Priority Areas: Cyber Infrastructure for the Information Explosion Era)

    MATSUOKA Satoshi, SHIBAYAMA Etsuya, CHIKAYAMA Takashi, NAKAJIMA Tatsuo, TAURA Kenjiro

    Journal of the Japanese Society for Artificial Intelligence   22 ( 2 )   222 - 228   2007.3

     More details

    Language:Japanese   Publisher:The Japanese Society for Artificial Intelligence  

    DOI: 10.11517/jjsai.22.2_222

    CiNii Books

    CiNii Research

    researchmap

    Other Link: http://id.nii.ac.jp/1004/00006713/

  • A decentralized, scalable, and autonomous grid monitoring system Reviewed

    Laurent Baduel, Satoshi Matsuoka

    PRINCIPLES OF DISTRIBUTED SYSTEMS, PROCEEDINGS   4878   1 - 15   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

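  • Safe and Secure IT System Infrastructure in the Information Explosion Era Reviewed

    Satoshi Matsuoka, Etsuya Shibayama, Takashi Chikayama, Kenjiro Taura

    Journal of the Japanese Society for Artificial Intelligence   22   222 - 228   2007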

     More details

    Publishing type:Research paper (scientific journal)  

    CiNii Research

    researchmap

    Other Link: https://kaken.nii.ac.jp/grant/KAKENHI-PLANNED-18049073/

  • Grid'BnB: A parallel branch and bound framework for grids

    Denis Caromel, Alexandre di Costanzo, Laurent Baduel, Satoshi Matsuoka

    HIGH PERFORMANCE COMPUTING - HIPC 2007, PROCEEDINGS   4873   566 - 579   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Teddy: a sketching interface for 3D freeform design.

    Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

    21 - 21   2007

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/1281500.1281532

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/siggraph/siggraph2007courses.html#IgarashiMT07

  • ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. Reviewed

    Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

    21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA   1 - 8   2007

     More details

  • A peer-to-peer infrastructure for autonomous grid monitoring Reviewed

    Laurent Baduel, Satoshi Matsuoka

    Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2007.370653

    Scopus

    researchmap

  • Virtual clusters on the fly - Fast, scalable, and flexible installation

    Hideo Nishimura, Naoya Maruyama, Satoshi Matsuoka

    CCGRID 2007: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   549 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • High-performance MPI broadcast algorithm for grid environments utilizing multi-lane NICs Reviewed

    Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka

    CCGRID 2007: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID   487 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Design and Implementation of a Local Scheduling System with Advance Reservation for Co-allocation on the Grid Reviewed

    Hidemoto Nakada, Atsuko Takefusa, Katsuhiko Ookubo, Makoto Kishimoto, Tomohiro Kudoh, Yoshio Tanaka, Satoshi Sekiguchi

    Proceedings of CIT2006   2006.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    While advance reservation is an essential capability for co-allocating several resources in Grid environments, it is not obvious how it can coexist with priority-based First Come First Served scheduling, which is widely used as a local scheduling policy today. To investigate this problem, we 1) developed a scheduling API in Java for TORQUE, a variant of OpenPBS, that enables users to implement their own schedulers and replace the original scheduling module with them, and 2) used the API to implement a prototype scheduler module with advance reservation capability. We also provide an external interface for the reservation capability based on WSRF to enable co-allocation of resources over the Grid. Using this interface with the job submission module from Globus Toolkit 4, users can make reservations for resources and submit jobs over the Grid.

    DOI: 10.1109/CIT.2006.71

    researchmap

  • Interactive beautification: A technique for rapid geometric design Reviewed

    Takeo Igarashi, Satoshi Matsuoka, Sachiko Kawachiya, Hidehiko Tanaka

    SIGGRAPH 2006 - ACM SIGGRAPH 2006 Courses   2006.7

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    DOI: 10.1145/1185657.1185769

    Scopus

    researchmap

  • G-lambda: Coordination of a Grid Scheduler and Lambda Path Service over GMPLS Reviewed

    Atsuko Takefusa, Michiaki Hayashi, Naohide Nagatsu, Hidemoto Nakada, Tomohiro Kudoh, Takahiro Miyamoto, Tomohiro Otani, Hideaki Tanaka, Masatoshi Suzuki, Yasunori Samejima, Wataru Imajuku, Masahiko Jinno, Yoshihiro Takigawa, Shuichi Okamoto, Yoshio Tanaka, Satoshi Sekiguchi

    Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications   22 ( 2006 )   868 - 875   2006.6

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.future.2006.03.005

    Scopus

    researchmap

  • MegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology Reviewed

    Taisuke Boku, Mitsuhisa Sato, Daisuke Takahashi, Hiroshi Nakashima, Hiroshi Nakamura, Satoshi Matsuoka, Yoshihiko Hotta

    Proc. 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), The Second Workshop on High-Performance, Power-Aware Computing (HP-PAC 2006)   2006.4

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    researchmap

  • Profile-based Optimization of Power Performance by using Dynamic Voltage Scaling on a PC cluster Reviewed

    HOTTA YOSHIHIKO, SATO MITSUHISA, KIMURA HIDEAKI, MATSUOKA SATOSHI, BOKU TAISUKE, TAKAHASHI DAISUKE

    IPSJ SIG Notes   2006 ( 20 )   139 - 144   2006.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Currently, several of the high-performance processors used in PC clusters have a DVS (Dynamic Voltage Scaling) architecture that can dynamically scale processor voltage and frequency. Adaptive scheduling of the voltage and frequency enables us to reduce power dissipation without a performance slowdown during communication and memory access. In this paper, we propose a method of profile-based power-performance optimization by DVFS scheduling in a high-performance PC cluster. We divide the program execution into several regions and select the best gear for power efficiency. We propose an optimization algorithm that selects a gear using the execution and power profiles, taking the transition overhead into account. We have designed and built a power-profiling system, Power Watch. With this system we examined the effectiveness of our optimization algorithm on two types of power-scalable clusters (Crusoe and Turion). According to the results of benchmark tests, we achieved almost 40% reduction in EDP (energy-delay product) with little performance impact (less than 5%) compared to results using the standard clock frequency.

    CiNii Books

    researchmap

  • Making wide-area, multi-site MPI feasible using Xen VM

    Masaki Tatezono, Naoya Maruyama, Satoshi Matsuoka

    FRONTIERS OF HIGH PERFORMANCE COMPUTING AND NETWORKING - ISPA 2006 WORKSHOPS, PROCEEDINGS   4331   387 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Teddy: a sketching interface for 3D freeform design.

    Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

    11 - 11   2006

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/1185657.1185772

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/siggraph/siggraph2006courses.html#IgarashiMT06

  • Multi-Replication with Intelligent Staging in Data-Intensive Grid Applications Reviewed

    Yuya Machida, Shin'ichiro Takizawa, Hidemoto Nakada, Satoshi Matsuoka

    2006 7TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING   88 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICGRID.2006.311002

    Web of Science

    researchmap

  • MegaProto: A Low-power and Compact Cluster for High-performance Computing Reviewed

    NAKASHIMA HIROSHI, NAKAMURA HIROSHI, SATO MITSUHISA, BOKU TAISUKE, MATSUOKA SATOSHI, TAKAHASHI DAISUKE, HOTTA YOSHIHIKO

    46 ( 12 )   46 - 61   2005.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    MegaProto is a proof-of-concept prototype for our project "Mega-Scale Computing Based on Low-Power Technology and Workload Modeling", implementing our key idea that a million-scale parallel system should be built with densely mounted low-power commodity processors. It also serves as a platform to implement and evaluate our new technologies such as power-conscious compilation, highly reliable and high-performance networking, highly dependable cluster management, and multi-level scalable parallel programming. The building block of MegaProto is a 1U-high, 19-inch rack-mountable motherboard unit on which 16 low-power, one-dollar-note-sized, commodity PC-architecture daughterboards are mounted with a high-bandwidth, 2 Gbps per processor network based on Gigabit Ethernet. The peak performance of each unit is 14.4 GFlops for the first version and will improve to 32.0 GFlops in the second version through a processor/daughterboard upgrade. The intra- and inter-unit network bandwidths are 32 Gbps and 16 Gbps respectively. As for power consumption, the entire unit idles at less than 150 W and consumes 300-320 W maximum under extreme computational stress; this is comparable to or better than conventional 1U servers composed of dual high-performance, power-hungry processors, while benchmarks exhibit up to 279% superior performance for some NPB programs. This demonstrates that higher performance can be achieved with low-power, densely populated architectures built from commodity components.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018370/

  • MegaProto: A Low-Power and Compact Cluster for High-Performance Computing Reviewed

    H. Nakashima, H. Nakamura, M. Sato, T. Boku, S. Matsuoka, D. Takahashi, Y. Hotta

    Proc. of HP-PAC05 (in IPDPS2005), Denver   CDROM   2005.1

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • A scalable multi-replication framework for data grid Reviewed

    S Takizawa, Y Takamiya, H Nakada, S Matsuoka

    2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS   2005   310 - 315   2005

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    Scopus

    researchmap

    Other Link: http://orcid.org/0000-0002-8901-2504

  • Parallelization of phylogenetic tree inference using grid technologies

    Yo Yamamoto, Hidemoto Nakada, Hidetoshi Shimodaira, Satoshi Matsuoka

    Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)   3370   103 - 116   2005

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-540-32251-1_10

    Scopus

    researchmap

  • The second trans-pacific grid datafarm testbed and experiments for SC2003 Reviewed

    O Tatebe, H Ogawa, Y Kodama, T Kudoh, S Sekiguchi, S Matsuoka, K Aida, T Boku, M Sato, Y Morita, Y Kitatsuji, J Williams, J Hicks

    2004 INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS   602 - 607   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Parallelization of Phylogenetic Tree Inference Using Grid Technologies.

    Yo Yamamoto, Hidemoto Nakada, Hidetoshi Shimodaira, Satoshi Matsuoka

    103 - 116   2004

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/978-3-540-32251-1_10

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/lsgrid/lsgrid2004.html#YamamotoNSM04

  • A Java-based programming environment for hierarchical Grid: Jojo Reviewed

    H Nakada, S Matsuoka, S Sekiguchi

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID - CCGRID 2004   51 - 58   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    Scopus

    researchmap

    Other Link: http://orcid.org/0000-0002-8901-2504

  • GridSpeed: A Web-based grid portal generation server Reviewed

    Toyotaro Suzumura, Satoshi Matsuoka, Hidemoto Nakada, Henri Casanova

    Proceedings - Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, HPCAsia 2004   26 - 33   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/HPCASIA.2004.1324013

    Scopus

    researchmap

  • The design and implementation of a fault-tolerant RPC system: Ninf-C Reviewed

    Hidemoto Nakada, Satoshi Matsuoka, Yoshio Tanaka, Satoshi Sekiguchi

    Proceedings - Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, HPCAsia 2004   9 - 18   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/HPCASIA.2004.1324011

    Scopus

    researchmap

  • Autonomous configuration of grid monitoring systems Reviewed

    K Shirose, S Matsuoka, H Nakada, H Ogawa

    2004 INTERNATIONAL SYMPOSIUM ON APPLICATIONS AND THE INTERNET WORKSHOPS, PROCEEDINGS   651 - 657   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    Scopus

    researchmap

    Other Link: http://orcid.org/0000-0002-8901-2504

  • Performance of a Deadline-Scheduling Scheme on the Computational Grids Reviewed

    TAKEFUSA Atsuko, MATSUOKA Satoshi

    The Transactions of the Institute of Electronics, Information and Communication Engineers   86 ( 9 )   661 - 670   2003.9

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Books

    researchmap

  • Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture Reviewed

    TAKEFUSA ATSUKO, TATEBE OSAMU, MATSUOKA SATOSHI, MORITA YOUHEI

    44 ( 11 )   57 - 67   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Data Grid is a Grid environment for ubiquitous access and analysis of large-scale data. Due to its early research status, the performance of petabyte-scale Data Grid models in a realistic data processing setting has not been well investigated. By enhancing our Bricks Grid simulator to simulate Data Grid scenarios, we investigate and compare the performance of different Data Grid models in the Grid Datafarm architecture, mainly categorized into the central and the tier models but with varying scheduling and replication strategies, under realistic assumptions of job processing for the CERN LHC experiments. Our results show that the central model is efficient, but that the tier model, with a greater amount of resources and a speculative class of background replication policies, is quite effective and achieves higher performance even though each tier is smaller than the central model.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018532/

  • Worldwide Fast File Replication on Grid Datafarm Reviewed

    Osamu Tatebe, Satoshi Sekiguchi, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda

    CoRR   cs.PF/0306090   2003.6

     More details

    The Grid Datafarm architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. One of its features is that it manages file replicas in filesystem metadata for fault tolerance and load balancing. This paper discusses and evaluates several techniques to support long-distance fast file replication. The Grid Datafarm manages a ranked group of files as a Gfarm file, each file, called a Gfarm file fragment, being stored on a filesystem node, or replicated on several filesystem nodes. Each Gfarm file fragment is replicated independently and in parallel using rate-controlled HighSpeed TCP with network striping. On a US-Japan testbed with 10,000 km distance, we achieve 419 Mbps using 2 nodes on each side, and 741 Mbps using 4 nodes out of 893 Mbps with two transpacific networks.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/cs/0306090v1

  • Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High Energy Physics Applications Reviewed

    Atsuko Takefusa, Osamu Tatebe, Satoshi Matsuoka, Youhei Morita

    Proc. 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12)   34 - 43   2003.6

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/HPDC.2003.1210014

    researchmap

  • Performance Evaluation of Scheduling and Replication Methods on Grid Datafarm Reviewed

    竹房 あつ子, 建部 修見, 松岡 聡, 森田 洋平

    情報処理学会・電気通信処理学会 SACSIS2003 Symposium Proceedings   121 - 128   2003.5

     More details

    Language:Japanese  

    researchmap

  • Building A High Performance Parallel File System Using Grid Datafarm and ROOT I/O Reviewed

    Youhei Morita, Hiroyuki Sato, Yoshiyuki Watase, Osamu Tatebe, Satoshi Sekiguchi, Satoshi Matsuoka, Noriyuki Soda, A. Dell'Acqua

    CoRR   cs.DC/0306092   2003

     More details

  • Ninf-G: A Reference Implementation of RPC-based Programming Middleware for Grid Computing. Reviewed

    Yoshio Tanaka, Hidemoto Nakada, Satoshi Sekiguchi, Toyotaro Suzumura, Satoshi Matsuoka

    J. Grid Comput.   1 ( 1 )   41 - 51   2003

  • Performance Evaluation Model for Scheduling in Global Computing Systems Reviewed

    Kento Aida, Atsuko Takefusa, Satoshi Matsuoka, Hidemoto Nakada, Umpei Nagashima

    International Journal of High-Performance Computing Applications   14 ( 3 )   268 - 279   2000.10

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1177/109434200001400308

    researchmap

  • OMPC++ - A portable high-performance implementation of DSM using OpenC++ reflection Reviewed

    Y Sohda, H Ogawa, S Matsuoka

    PARALLEL AND DISTRIBUTED COMPUTING FOR SYMBOLIC AND IRREGULAR APPLICATIONS   316 - 320   2000

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Design Issues of Network Enabled Server Systems for the Grid. Reviewed

    Satoshi Matsuoka, Mitsuhisa Sato, Hidemoto Nakada, Satoshi Sekiguchi

    Grid Computing - GRID 2000, First IEEE/ACM International Workshop, Bangalore, India, December 17, 2000, Proceedings   4 - 17   2000

     More details

    Publisher:Springer  

    DOI: 10.1007/3-540-44444-0_2

    researchmap

  • Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms Reviewed

    Atsuko Takefusa, Satoshi Matsuoka, Hidemoto Nakada, Kento Aida, Umpei Nagashima

    Proc. 8th IEEE International Symposium on High Performance Distributed Computing (HPDC-8)   97 - 104   1999.8

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/HPDC.1999.805287

    researchmap

  • A Scheduling Framework for Global Computing Reviewed

    中田 秀基, 竹房 あつ子, 松岡 聡, 佐藤 三久, 関口 智嗣

    情報処理学会・電気通信処理学会 Parallel Processing Symposium JSPP'99 Proceedings   277 - 284   1999.6

     More details

    Language:Japanese  

    researchmap

  • Enhancing and Porting an Efficient Constraint Solver for Hierarchical Linear Systems

    Satoshi Matsuoka, Hiroshi Hosobe

    Proceedings of the Workshop of the FY1998 Research Funding Program   2 - 7   1999.3

     More details

    Language:Japanese   Publisher:AITEC, JIPDEC  

    researchmap

  • HiRise: An Incremental Constraint Solver for Constructing Graphical User Interfaces Reviewed

    Hosobe Hiroshi, Matsuoka Satoshi, Yonezawa Akinori

    Computer Software   16 ( 6 )   6_549 - 6_561   1999

     More details

    Language:Japanese   Publisher:Japan Society for Software Science and Technology  

    DOI: 10.11309/jssst.16.6_549

    CiNii Books

    researchmap

  • HiRise: An Incremental Constraint Solver for Constructing Graphical User Interfaces Reviewed

    Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

    Michiaki Yasumura (Ed.), Interactive Systems and Software VI--JSSST WISS'98, Lecture Notes in Software Science   21   73 - 82   1998.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Kindai-Kagaku-Sha  

    researchmap

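  • Simulation-Based Evaluation of Global Computing Systems: Toward Job Scheduling for the Ninf System in Wide-Area Distributed Environments Reviewed

    竹房 あつ子, 合田 憲人, 小川宏高, 中田 秀基, 松岡 聡, 佐藤 三久, 関口 智嗣, 長嶋 雲兵

    情報処理学会・電気通信処理学会 Parallel Processing Symposium JSPP'98 Proceedings   127 - 134   1998.6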

     More details

    Language:Japanese  

    researchmap

  • Development of an Efficient Solver for Hierarchical Linear Systems

    Satoshi Matsuoka, Hiroshi Hosobe

    Proceedings of the Workshop of the FY1997 Research Funding Program   4 - 9   1998.3

     More details

    Language:Japanese   Publisher:AITEC, JIPDEC  

    researchmap

  • An Interactive Drawing Editor with Low Cognitive Overload

    Kawachiya Sachiko, Igarashi Takeo, Matsuoka Satoshi, Tanaka Hidehiko

    Computer Software   15 ( 4 )   4_296 - 4_306   1998

     More details

    Language:Japanese   Publisher:Japan Society for Software Science and Technology  

    DOI: 10.11309/jssst.15.4_296

    CiNii Books

    researchmap

  • Ninflet: a migratable parallel objects framework using Java. Reviewed

    Hiromitsu Takagi, Satoshi Matsuoka, Hidemoto Nakada, Satoshi Sekiguchi, Mitsuhisa Sato, Umpei Nagashima

    Concurrency - Practice and Experience   10 ( 11-13 )   1063 - 1078   1998

  • Efficient Satisfaction of Constraint Hierarchies Using Hierarchical Linear Systems (short paper) Reviewed

    Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

    Rikio Onai (Ed.), Interactive Systems and Software V--JSSST WISS'97, Lecture Notes in Software Science   18   129 - 134   1997.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Kindai-Kagaku-Sha  

    researchmap

  • Preliminary Evaluation of Scheduling in Ninf: a Global Computing System Reviewed

    Satoshi Matsuoka, Hirotaka Ogawa, Atsuko Takefusa, Hidemoto Nakada, Kento Aida, Umpei Nagashima, Mitsuhisa Sato, Satoshi Sekiguchi

    Proc. International Workshop on Innovative Architectures '97   1 - 7   1997.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • In Search for an Ideal Computer-Assisted Drawing System Reviewed

    Takeo Igarashi, Sachiko Kawachiya, Satoshi Matsuoka, Hidehiko Tanaka

    INTERACT'97 ( The Sixth IFIP Conference on Human-Computer Interaction ) Sydney, Australia   104 - 111   1997.7

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:Chapman & Hall  

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/interact/interact1997.html#IgarashiKMT97

  • Performance of the Ninf Network Numerical Information System with Multiple Clients Reviewed

    竹房 あつ子, 小川宏高, 松岡 聡, 中田 秀基, 佐藤 三久, 関口 智嗣, 長嶋 雲兵

    情報処理学会・電気通信処理学会 Parallel Processing Symposium JSPP'97 Proceedings   273 - 280   1997.5

     More details

    Language:Japanese  

    researchmap

  • Towards a Parallel C++ Programming Language Based on Commodity Object-Oriented Technologies.

    Satoshi Matsuoka, A. Nikami, Hirotaka Ogawa, Yutaka Ishikawa

    Scientific Computing in Object-Oriented Parallel Environments(ISCOPE)   81 - 88   1997

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/3-540-63827-X_47

    researchmap

    Other Link: https://dblp.uni-trier.de/db/conf/iscope/iscope1997.html#MatsuokaNOI97

  • Ninf: A Network Based Information Library for Global World-Wide Computing Infrastructure. Reviewed

    Mitsuhisa Sato, Hidemoto Nakada, Satoshi Sekiguchi, Satoshi Matsuoka, Umpei Nagashima, Hiromitsu Takagi

    High-Performance Computing and Networking, International Conference and Exhibition, HPCN Europe 1997, Vienna, Austria, April 28-30, 1997, Proceedings   491 - 502   1997

     More details

    Publisher:Springer  

    DOI: 10.1007/BFb0031622

    researchmap

  • Generalized Local Propagation: A Framework for Solving Constraint Hierarchies Reviewed

    Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

    Eugene C. Freuder (Ed.), Principles and Practice of Constraint Programming--CP'96, Lecture Notes in Computer Science   1118   237 - 251   1996.8

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer-Verlag  

    researchmap

  • GIGA: A Pen-Based Constraint Drawing System Reviewed

    Sachiko Kawachiya, Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

    Proc. of OZCHI'96 (6th Australian Conference on Computer-Human Interaction)   314 - 315   1996

     More details

  • Efficient Satisfaction of Constraint Hierarchies with Inequalities Reviewed

    Hiroshi Hosobe, Satoshi Matsuoka, Akinori Yonezawa

    Jiro Tanaka (Ed.), Interactive Systems and Software III--JSSST WISS'95, Lecture Notes in Software Science   12   123 - 132   1995.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Kindai-Kagaku-Sha  

    researchmap

  • Adaptive Recognition of Implicit Structure in Human-Organized Layouts Reviewed

    Takeo Igarashi, Satoshi Matsuoka, Toshiyuki Masui

    Proceedings of Visual Languages '95   51 ( 0 )   265 - 266   1995.9

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • A Constraint-Based Approach for Visualization and Animation Reviewed

    Shin Takahashi, Satoshi Matsuoka, Ken Miyashita, Hiroshi Hosobe, Akinori Yonezawa, Tomihisa Kamada

    Proceedings of the International Workshop on Constraints for Graphics and Visualization   103 - 117   1995.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Stackthreads: An abstract machine for scheduling fine-grain threads on stock CPUs Reviewed

    Kenjiro Taura, Satoshi Matsuoka, Akinori Yonezawa

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   907   121 - 136   1995

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

    DOI: 10.1007/BFb0026567

    Scopus

    researchmap

  • Locally Simultaneous Constraint Satisfaction Reviewed

    Hiroshi Hosobe, Ken Miyashita, Shin Takahashi, Satoshi Matsuoka, Akinori Yonezawa

    Alan Borning (Ed.), Principles and Practice of Constraint Programming--PPCP'94, Lecture Notes in Computer Science   874   51 - 62   1994.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer-Verlag  

    researchmap

  • Locally Simultaneous Constraint Satisfaction Reviewed

    Hiroshi Hosobe, Ken Miyashita, Shin Takahashi, Satoshi Matsuoka, Akinori Yonezawa

    Akikazu Takeuchi (Ed.), Interactive Systems and Software I--JSSST WISS'93, Lecture Notes in Software Science   7   49 - 56   1994.9

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Kindai-Kagaku-Sha  

    researchmap

  • Highly efficient and encapsulated re-use of synchronization code in concurrent object-oriented languages Reviewed

    Satoshi Matsuoka, Kenjiro Taura, Akinori Yonezawa

    Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA   129674   109 - 126   1993.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery  

    DOI: 10.1145/165854.165874

    Scopus

    researchmap

  • A Constraint Solving Algorithm for Real-Time Interaction in User Interfaces

    Hiroshi Hosobe, Ken Miyashita, Shin Takahashi, Satoshi Matsuoka, Akinori Yonezawa, Tomihisa Kamada

    Proceedings of the 10th JSSST Conference   77 - 80   1993.6

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Implementing Concurrent Object-Oriented Languages on Multicomputers

    Akinori Yonezawa, Satoshi Matsuoka, Masahiro Yasugi, Kenjiro Taura

    IEEE Parallel and Distributed Technology   1 ( 2 )   49 - 61   1993.5

     More details

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/88.218175

    Scopus

    researchmap

  • An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers

    Kenjiro Taura, Satoshi Matsuoka, Akinori Yonezawa

    ACM SIGPLAN Notices   28 ( 7 )   218 - 228   1993.1

     More details

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1145/173284.155355

    Scopus

    researchmap

  • An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers

    Kenjiro Taura, Satoshi Matsuoka, Akinori Yonezawa

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   748 LNCS   402 - 403   1993

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/bfb0018667

    Scopus

    researchmap

  • OBJECT-ORIENTED CONCURRENT REFLECTIVE LANGUAGES CAN BE IMPLEMENTED EFFICIENTLY

    H MASUHARA, S MATSUOKA, T WATANABE, A YONEZAWA

    SIGPLAN NOTICES   27 ( 10 )   127 - 144   1992.10

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1145/141937.141948

    Web of Science

    researchmap

  • OBJECT-ORIENTED CONCURRENT REFLECTIVE LANGUAGES CAN BE IMPLEMENTED EFFICIENTLY Reviewed

    H MASUHARA, S MATSUOKA, T WATANABE, A YONEZAWA

    OOPSLA '92 CONFERENCE PROCEEDINGS: CONFERENCE ON OBJECT-ORIENTED PROGRAMMING SYSTEMS, LANGUAGES, AND APPLICATIONS   127 - 144   1992

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Web of Science

    researchmap

  • Object-oriented concurrent reflective architectures

    Satoshi Matsuoka, Takuo Watanabe, Yuuji Ichisugi, Akinori Yonezawa

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   612 LNCS   211 - 226   1992

     More details

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1007/3-540-55613-3_11

    Scopus

    researchmap

Books

MISC

  • Efficient FDK Algorithms on SIMD-accelerated Processors

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Takahiro Hirofuchi, Ogawa Hirotaka, Satoshi Matsuoka

    Research Report: High Performance Computing (HPC)   2020-HPC-175 ( 6 )   1 - 11   2020.7

     More details

    Computed Tomography (CT) is a widely used 3D imaging technology that requires compute-intensive algorithms to generate volume data (or images). We propose a collection of novel back-projection algorithms that reduce the arithmetic computation and improve data locality. We also implement these algorithms as efficient back-projection kernels that are performance portable over a wide range of CPUs. Unlike the conventional approaches that use OpenMP and target-specific SIMD intrinsics, we employ a high-level OpenCL implementation to generate the vectorized code and use the OpenCL local memory to prefetch the pixels at sub-pixel precision in a regular memory access fashion. Performance evaluation using a variety of Intel CPU generations demonstrates that our back-projection implementation runs up to 10 times faster than the multi-threading optimized implementation.

    researchmap

  • A Study of Synchronization Methods in Modern GPUs

    Lingqi Zhang, Wahib Mohamed, Haoyu Zhang, Satoshi Matsuoka

    IEEE International Parallel & Distributed Processing Symposium 2020   2020.4

    Language:English  

    GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia's latest CUDA provides a variety of synchronization methods, but until now there has been no full understanding of their characteristics. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. The analysis is intended to inform design choices for applications, libraries, and frameworks running on single-GPU and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis can be applied. We also describe our micro-benchmarks and measurement methods.

    researchmap

  • High resolution Image Reconstruction on Super computers

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Satoshi Matsuoka

    2020.3

  • A Software Systolic Array on GPUs

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Satoshi Matsuoka

    2020.3

  • A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective

    Artur Podobas, Kentaro Sano, Satoshi Matsuoka

    IEEE Access   8   146719 - 146743   2020

  • Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA.

    Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

    CoRR   abs/2008.11421   2020

    The dedicated memory of hardware accelerators can be insufficient to store
    all weights and/or intermediate states of large deep learning models. Although
    model parallelism is a viable approach to reduce the memory pressure issue,
    significant modification of the source code and considerations for algorithms
    are required. An alternative solution is to use out-of-core methods instead of,
    or in addition to, data parallelism. We propose a performance model based on
    the concurrency analysis of out-of-core training behavior, and derive a
    strategy that combines layer swapping and redundant recomputing. We achieve an
    average of 1.52x speedup in six different models over the state-of-the-art
    out-of-core methods. We also introduce the first method to solve the
    challenging problem of out-of-core multi-node training by carefully pipelining
    gradient exchanges and performing the parameter updates on the host. Our data
    parallel out-of-core solution can outperform complex hybrid model parallelism
    in training large models, e.g. Megatron-LM and Turing-NLG.

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2008.html#abs-2008-11421

  • Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

    Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

    CoRR   abs/2010.14373   2020

    Matrix engines or units, in different forms and affinities, are becoming a
    reality in modern processors; CPUs and otherwise. The current and dominant
    algorithmic approach to Deep Learning merits the commercial investments in
    these units, and deduced from the No. 1 benchmark in supercomputing, namely
    High Performance Linpack, one would expect an awakened enthusiasm by the HPC
    community, too. Hence, our goal is to identify the practical added benefits for
    HPC and machine learning applications by having access to matrix engines. For
    this purpose, we perform an in-depth survey of software stacks, proxy
    applications and benchmarks, and historical batch job records. We provide a
    cost-benefit analysis of matrix engines, both asymptotically and in conjunction
    with state-of-the-art processors. While our empirical data will temper the
    enthusiasm, we also outline opportunities to "misuse" these dense
    matrix-multiplication engines if they come for free.

    arXiv

    researchmap

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2010.html#abs-2010-14373

  • Predicting the Early-Stopping Timing: Change-Point Detection in the Distribution of Stochastic Gradients in Deep Learning

    八島慶汰, 石川康太, 佐藤育郎, 野村哲弘, 横田理央, 松岡聡

    第22回情報論的学習理論ワークショップ (IBIS 2019)   2019.9

  • Understanding the Overheads of Launching CUDA Kernels

    Lingqi Zhang, Mohamed Wahib, Satoshi Matsuoka

    2019.8

  • Towards Performance Portability and Modernization of FLASH via Transpilation with High-Level Intermediate Representation

    Mateusz Bysiek, Saurabh Chawdhary, Mohamed Wahib, Anshu Dubey, Satoshi Matsuoka

    2019-HPC-170 ( 30 )   1 - 9   2019.7

    With the concurrent increase in application complexity and hardware heterogeneity, the large multiphysics code FLASH faces huge challenges to its continued usability on high performance computing platforms. We are building a novel transpilation framework that relies on a high-level intermediate representation to confront this challenge and enable FLASH to adapt to accelerator-based architectures via performance-oriented refactoring. Additionally, we use the framework to modernize the code by enabling a higher level of abstraction in expressing computations. We evaluate the effectiveness of the tool in terms of the speedup obtained relative to the original code's performance, and also quantify productivity gains.

    researchmap

  • Classifying Applications by Machine Learning on Memory Access Data

    土川 稔生, 遠藤 敏夫, 野村 哲弘, 近藤 正章, 大山 洋介, 松岡 聡

    研究報告ハイパフォーマンスコンピューティング(HPC)   2019-HPC-170 ( 12 )   1 - 7   2019.7

    Language:Japanese  

    researchmap

  • Breaking the Limitation of GPU memory for Deep Learning

    Haoyu Zhang, Mohamed Wahib, Lingqi Zhang, Yohei Tsuji, Satoshi Matsuoka

    2019-HPC-170 ( 10 )   1 - 7   2019.7

    GPU memory can be insufficient for Deep Learning workloads, given the sizes of models and datasets. Although model parallelism could help, it requires significant modification of the code in every case. An alternative general solution to this problem is to use out-of-core methods. Recent work proposed data-swapping and CUDA Unified Memory (UM) methods to break the limitation of GPU memory capacity. However, there is a lack of detailed analysis, via performance modeling, of the behavior and limitations of those methods. In this paper we analyze the behavior at the level of both a single layer and the whole model, and propose a performance model based on that analysis to study how out-of-core training behaves and hence empower the co-design process for Deep Learning workloads.

    researchmap

  • Analysis of Gradient Data During Training Toward Understanding DNN Generalization

    八島慶汰, 石川康太, 佐藤育郎, 松岡聡

    研究報告ハイパフォーマンスコンピューティング(HPC)   2019-HPC-170 ( 7 )   1 - 5   2019.7

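    In recent years, deep learning with Deep Neural Networks (DNNs) has achieved excellent results in many fields such as image recognition and natural language processing. Nevertheless, much remains unexplained about the relationship between the SGD-based training mechanism and generalization performance on unseen data. By analyzing the eigenvalues of the Fisher information matrix and the gradient data obtained from the training data during training, we show experimentally that the eigenvalues of the Fisher information matrix, which have been regarded as an indicator of generalization, are unstable. Based on the hypothesis suggested by this experiment, namely that gradient outliers and gradient distributions are related to generalization performance, we also performed a time-series analysis of the gradients obtained from the entire training set for the model being trained.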

    researchmap

  • Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization

    Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen

    2019.6

    Language:English  

    researchmap

  • Boosting GCN Application with Batched Sparse Matrix Multiplication

    Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

    2019.3

  • Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?

    Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, Satoshi Matsuoka

    2019 IEEE 33rd International Parallel and Distributed Processing Symposium (IPDPS 2019)   78 - 88   2019

  • The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

    Hamid Reza Zohouri, Satoshi Matsuoka

    Proceedings of H2RC 2019: 2019 Fifth IEEE/ACM International Workshop on Heterogeneous High-Performance Reconfigurable Computing (H2RC)   11 - 18   2019

  • Cloud-based Burst Buffers for I/O Acceleration

    2018.7

    Cloud computing offers high computational resources, scalability, and ease of access. Such cloud environments provide users with virtually unlimited computational resources to run HPC applications at a larger scale than in-house systems can provide. Since large-scale data-intensive applications typically generate huge amounts of intermediate data that are shared by hundreds or thousands of compute nodes, such applications require high I/O throughput to shared storage. However, current shared storage in cloud environments cannot provide enough I/O throughput for these applications. The low I/O throughput becomes a performance bottleneck, and the prolonged execution time incurs more cost to users because most cloud providers employ pay-as-you-go pricing models. Furthermore, the eventual consistency policy adopted by most cloud storage systems causes multi-node job failures due to inconsistent read-after-write behavior. To solve these problems, we propose a cloud-based burst buffer system as a new tier in cloud storage systems. The cloud-based burst buffer system uses compute nodes as burst buffer nodes and buffers applications' data in them. Because throughput between compute nodes is much higher and more stable than shared storage throughput, we can accelerate I/O performance for data-intensive applications. Moreover, by maintaining data consistency among burst buffer nodes, we can avoid job failures caused by the eventual consistency issue. To explore the effectiveness of cloud-based burst buffers, we implement a prototype and evaluate the system on Amazon EC2/S3. Our experiments show that our system resolves the eventual consistency issue, improves the performance of a real-world data-intensive application by up to 4.5 times, and reduces monetary cost by 56.3%.

    researchmap

  • μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

    Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

    2018.4

    NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used
    in deep learning. Specifically, cuDNN implements several equivalent convolution
    algorithms, whose performance and memory footprint may vary considerably,
    depending on the layer dimensions. When an algorithm is automatically selected
    by cuDNN, the decision is performed on a per-layer basis, and thus it often
    resorts to slower algorithms that fit the workspace size constraints. We
    present μ-cuDNN, a transparent wrapper library for cuDNN, which divides
    layers' mini-batch computation into several micro-batches. Based on Dynamic
    Programming and Integer Linear Programming, μ-cuDNN enables faster
    algorithms by decreasing the workspace requirements. At the same time,
    μ-cuDNN keeps the computational semantics unchanged, so that it decouples
    statistical efficiency from the hardware efficiency safely. We demonstrate the
    effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow,
    achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on P100-SXM2
    GPU. These results indicate that using micro-batches can seamlessly increase
    the performance of deep learning, while maintaining the same memory footprint.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1804.04806v1

  • HuronFS : Hierarchical, User-level and On-demand Burst Buffer File System

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    ISC2018   2018.4

  • Pushing the Limits for 2D Convolution Computation On CUDA-enabled GPUs

    Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Satoshi Matsuoka

    2018-HPC-163 ( 22 )   1 - 9   2018.2

    The 2D convolution operator is the computational bottleneck in a variety of image processing and machine learning applications. We propose an algorithm that computes convolution by using register files to cache image data (known as a register cache), rather than the user-managed scratch-pad memory. We take advantage of CUDA's warp shuffle functions to accelerate the intra-warp communication of partial results. Unlike the GEMM-based, FFT-based, or Winograd methods, our algorithm executes the convolution computation without using any GPU memory as a workspace, and applies to all filter shapes. Our algorithm performs better than state-of-the-art 2D convolution implementations. Using a single TitanXp GPU, it is on average 4.7x faster than NPP (Nvidia Performance Primitives) and 1.8x faster than the highly optimized ArrayFire library.

    researchmap

  • Efficiently Enlarging GPU Memory Capacity with NVM

    Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Satoshi Matsuoka

    2018.1

  • Automatic Generation of Computer Traces by Machine Learning

    土川稔生, 大山洋介, 野村哲弘, 松岡聡

    情報処理学会研究報告(Web)   2018 ( HPC-165 )   2018

  • Toward Building a Framework for Optimizing Large-Scale Data Center Operations

    滝澤真一朗, 高野了成, 松岡聡

    2017.12

  • Less is More: Accelerating Deep Neural Networks with Micro-Batching

    Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

    2017-HPC-162 ( 22 )   1 - 9   2017.12

  • OpenCL-Based High-Performance 3D Stencil Computation on FPGAs

    Hamid Reza Zohouri, Artur Podobas, Naoya Maruyama, Satoshi Matsuoka

    2017.11

  • A Simulation-Based Analysis on the Configuration of the Burst Buffer

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    2017.11

  • Power Optimization by Controlling Computers with Deep Q-Network

    寺西 賢人, 野村 哲弘, 松岡 聡

    情報処理学会研究報告   2017-HPC-158 ( 3 )   2017.8

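    Recent supercomputers consume large amounts of power, and power efficiency has become a key challenge in improving the performance of practical supercomputers. One power-saving approach optimizes power by controlling parameters such as CPU frequency and voltage, and much research computes suitable control values from data such as performance counters. However, existing work analyzes each data source in detail, which limits the number of data items that can be handled and requires re-analysis whenever the environment changes. We therefore propose a highly general control method that performs the analysis with deep learning, which has been actively studied in recent years. In particular, we implement and evaluate a system that directly controls a computer using Deep Q-Network, a deep reinforcement learning method used in game-playing and Go AI.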

    researchmap

  • Fault Tolerance and Performance Evaluation of Distributed Deep Learning with Dynamic Process Count Adjustment

    辻 陽平, 野村 哲弘, 實本 英之, 佐藤 育郎, 松岡 聡

    情報処理学会研究報告(Web) (IPSJ Technical Report (Web))   2017 ( HPC-160 )   2017.7

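    Deep learning is being actively researched and developed because of its high recognition accuracy, and applied technologies that incorporate it can already be seen in the real world. Reaching sufficient recognition accuracy requires large amounts of data and long computation on GPUs and similar devices, so distributed processing on high-performance machines such as HPC clusters is used. In distributed systems the interval between failures tends to become shorter, so fault-tolerance techniques are needed to let an application continue computing correctly. In this work we address fault tolerance, which is important for deep learning on large-scale systems, by proposing detect/respawn, a new technique that differs from conventional checkpoint/restart, and implementing it with ULFM-MPI. Comparing the proposed and conventional techniques on 16 nodes (128 GPUs) of TSUBAME-KFC with a distributed deep learning application called SPRINT, the proposed technique reached a 2.5% lower error rate after 10 hours of training, achieving higher recognition accuracy. (Author's abstract)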

    researchmap

  • Accelerating Spiking Neural Networks on FPGAs using OpenCL

    Artur Podobas, Satoshi Matsuoka

    2017-ARC-227 ( 23 )   1 - 7   2017.7

    Spiking Neural Networks (SNNs) are artificial neural networks inspired by the biological brain. They are used to study everything from various aspects of neuroscience to artificial intelligence and machine learning. SNNs are typically computed using general-purpose processors, and the use of custom hardware is still fairly uncommon. Creating custom hardware is often time-consuming and error-prone. However, with the recent maturity of High-Level Synthesis (HLS) tools, algorithms can now be described using more abstract C/C++/Java programming models and automatically transformed into custom hardware. In this work we present our findings and experience in using HLS to accelerate SNNs. We describe our prototype framework and our FPGA design and empirically evaluate its performance against modern general-purpose processors. Our evaluation shows that our design can reach up to 82% (73% on average) of the performance delivered by an Intel Xeon E5-2650v3, a CPU that is two years younger and built using better technology than our FPGA platform.

    researchmap

  • The Concept of AI Bridging Cloud Infrastructure (ABCI), a Large-Scale, Power-Efficient Cloud Platform for AI Processing

    小川宏高, 松岡聡, 佐藤仁, 高野了成, 滝澤真一朗, 谷村勇輔, 三浦信一, 関口智嗣

    情報処理学会 研究報告   2017-HPC-160   2017.7

  • Performance Modeling and Improvement of the Metagenome Analysis Application GHOSTZ-GPU

    山川智史, 野村哲弘, 松岡聡

    情報処理学会 研究報告 2017-HPC-160   2017.7

  • Prototype Modular Framework for Deep Learning Performance Testing

    Aleksandr Drozd, Satoshi Matsuoka

    2017.4

  • Reducing Communication Volume in Data-Parallel Deep Learning Training Using Low-Precision Floating-Point Numbers

    大山洋介, 野村哲弘, 佐藤育郎, 松岡聡

    研究報告ハイパフォーマンスコンピューティング(HPC)   2017-HPC-158 ( 30 )   1 - 10   2017.3

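    Deep learning, a training approach based on Deep Neural Networks, has attracted great attention in recent years because it achieves higher recognition accuracy than other machine learning methods. However, because the computational cost of the networks and the amount of training data are enormous, training takes a very long time even on GPU clusters. In particular, when a network with many parameters is trained with a fixed mini-batch size, inter-GPU and inter-node communication of gradients becomes the scalability bottleneck, and it has been pointed out that training is feasible only at scales far smaller than the parallelism available on existing GPU supercomputers. In this paper we propose a method that reduces communication volume by using a floating-point type with fewer bits than single precision. The proposed method represents the communicated data with the upper 8 bits of a half-precision floating-point number and dynamically adjusts the representable range for each layer, achieving fast communication without greatly degrading post-training recognition accuracy relative to single precision. When training CaffeNet and GoogLeNet on 2 nodes (16 GPUs) of TSUBAME-KFC/DL, the proposed method achieved speedups of 2.71x and 2.19x, respectively, over the existing single-precision floating-point approach without loss of recognition accuracy.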

    researchmap

  • Evolutionary Power Modeling for Energy Efficiency in CPU-GPU based systems

    Patricia Arroba, José M. Moya, José L. Ayala, Satoshi Matsuoka

    2017-HPC-158 ( 2 )   1 - 7   2017.3

    Supercomputers have reached massive levels of energy consumption due to computational demand, so there is an urgent need to keep them on a more scalable curve. In recent years, there has been rising interest in reducing the power consumption of these systems. Recent research focuses on adjusting their power states by reducing clock frequency and applying power capping, and on analyzing the thermal impact on static consumption. These techniques rely on power models to predict the power consumption of the infrastructure. However, power consumption in these complex systems involves a vast number of interacting variables of different natures, which may include non-linear dependencies. Extracting the relationships between the most representative parameters and the power consumption therefore requires enormous effort and knowledge of the problem. We propose an automatic method based on Grammatical Evolution to obtain a model that minimizes the power prediction error of a supercomputer node that incorporates both CPU and GPU devices. We monitor the system at runtime using performance counters and frequency, temperature, and power measurements. This evolutionary technique provides both Feature Engineering and Symbolic Regression to infer accurate models that depend only on the most suitable variables and demand little designer expertise or effort. Our work improves the possibilities of deriving proactive energy-efficient policies in supercomputers that are simultaneously aware of complex considerations of different natures.

    researchmap

  • Optimizing Convolution Operations in CNNs with Low-Rank Approximate Matrices

    本山 義史, 遠藤 敏夫, 松岡 聡, 横田 理央, 福田 圭祐, 佐藤 育郎

    研究報告ハイパフォーマンスコンピューティング(HPC)   2017-HPC-158 ( 25 )   1 - 7   2017.3

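    In image recognition by machine learning, excellent recognition results have been reported using Convolutional Neural Networks (CNNs). Because the datasets are huge, training takes a very long time and requires a large amount of memory. To reduce the memory required for deep learning (DL) computation, we propose using low-rank approximate matrices for the matrix multiplications that account for about 70% of the convolution computation. We applied SVD and hierarchical matrices to the matrices in a CNN application and evaluated them. In particular, with SVD, in the trade-off between compression ratio and accuracy, we reduced memory usage by up to about 90% for the especially large image matrices with almost no loss in recognition accuracy.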

    researchmap

  • Assessing the Interference Between Internode Communication and Network I/O Traffic

    Kevin Brown, Nikhil Jain, Abhinav Bhatele, Alfredo Gimenez, Kathryn Mohror, Satoshi Matsuoka, Martin Schulz

    2017-HPC-158 ( 11 )   1 - 6   2017.3

    Parallel file systems are used by supercomputers to support a range of applications that require concurrent access to high-performance shared storage for data workflow and resilience. The design of most of these systems results in the logical storage network sharing the same physical network infrastructure that is used for inter-process communication. Resource sharing of this kind on shared systems is a potential source of contention, which can be significant for communication-intensive and I/O-intensive applications. We assess the interference caused by inter-process communication on the I/O throughput to the parallel file system when both kinds of traffic share the same network resources. For our experiments, we used the miranda io and IOR I/O benchmarks to generate I/O traffic, and the pF3D FFT kernel and the NPB FT MPI benchmark to generate inter-node communication traffic. Our preliminary results from running I/O and communication benchmarks simultaneously indicate that inter-process communication does not affect the performance of typical I/O workloads.

    researchmap

  • Predicting Probabilistic Parameters of a Large-Scale Asynchronous SGD Deep Learning System

    Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka

    2017.2

  • Towards Making Legacy HPC Codes Maintainable: Two-Way Fortran-Python Transpilation with Python Type Hints (Unrefereed Workshop Manuscript)

    Mateusz Bysiek, Aleksandr Drozd, 松岡 聡

    2016-HPC-157 ( 9 )   1 - 10   2016.12

    We propose a method of accelerating Python code by just-in-time compilation that leverages the type hints mechanism introduced in Python 3.5. In our approach, performance-critical kernels are expected to be written as if Python were a strictly typed language, but without the need to extend Python syntax. This approach can be applied to any Python application; however, we focus on the special case in which legacy Fortran applications are automatically translated into Python for easier maintenance. We developed a framework implementing two-way transpilation and achieved performance equivalent to that of Python manually translated to Fortran, and better than that of other currently available JIT alternatives (up to 5x faster than Numba in some experiments).

    researchmap

  • Automatic Application of Spatio-Temporal Blocking via Directives

    黒田 勝汰, 遠藤 敏夫, 松岡 聡

    情報処理学会研究報告(Web) (IPSJ Technical Report (Web))   2016 ( HPC-157 )   2016.12

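    Spatio-temporal blocking, a loop optimization for stencil computations, is highly effective, but because loop control becomes complex its programming cost is high, and it is not a general-purpose optimization. It has therefore been implemented as a feature of loop transformation tools and stencil DSL compilers. These, however, suffer from limited flexibility in parameter settings or require extensive rewriting of the target program. We therefore propose applying spatio-temporal blocking through directives. We implemented a tool that applies spatio-temporal blocking, with parameters specified by directives, to loops that satisfy several conditions. We evaluate the performance improvement and programming cost of the proposed system using stencil benchmarks. (Author's abstract)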

    researchmap

  • Fast Sparse General Matrix-Matrix Multiplication on GPU with Low Memory Usage

    Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

    2016.11

    Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels in preconditioners such as the algebraic multigrid method and in graph algorithms. The performance of SpGEMM is quite low because of its random memory accesses to both input and output matrices. Moreover, the pattern of non-zero elements in the resulting matrix is not known beforehand, which makes memory usage hard to manage. Several GPU implementations compute SpGEMM quickly, but they consume a large amount of temporary memory. We devise a new SpGEMM algorithm that requires only a small amount of memory, so that larger matrices can be computed within the limited device memory of a GPU. Accesses to the input matrices are optimized for coalesced memory access. We devise an efficient hash table in shared memory to calculate the output matrix, with appropriate case analysis for better load balancing. Our algorithm achieves speedups of up to 4.0x in single precision and 3.3x in double precision compared to existing fast SpGEMM libraries.

    researchmap

  • Performance Evaluation of Out-of-core GPU Set Intersection with Latency Hiding through I/O Chunking

    佐藤仁, 溝手竜, 松岡聡, 小山宏高

    2016.8

  • GPU Acceleration of Sparse Matrix-Matrix Multiplication with Reduced Memory Usage

    長坂 侑亮, 額田 彰, 松岡 聡

    研究報告ハイパフォーマンスコンピューティング(HPC)   2016-HPC-156 ( 15 )   1 - 9   2016.8

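    Sparse matrix-matrix multiplication, used in preconditioning for iterative solvers such as the AMG method, is hard to accelerate because of its random memory accesses, and the placement of non-zero elements in the output matrix is unknown when the computation starts. Existing algorithms aimed at GPU acceleration require far more memory than the output matrix actually needs, which limits the matrices they can handle. We propose a GPU sparse matrix-matrix multiplication method that keeps memory usage low through appropriate case analysis and use of shared memory, making it applicable to a wide range of matrices while delivering still higher performance. In a performance evaluation on a Maxwell-generation GPU with 12 matrices of varying characteristics, we achieved speedups of up to 4.77x in single precision and 3.84x in double precision over existing sparse matrix libraries.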

    researchmap

  • Towards Understanding HPC-Big Data Convergence Using Cloud Platforms

    Shweta Salaria, Kevin Brown, Hideyuki Jitsumoto, Satoshi Matsuoka

    2016-HPC-155 ( 2 )   1 - 5   2016.8

    The path to HPC-big data convergence has produced numerous studies that demonstrate the performance-cost trade-off between running applications on supercomputers and on cloud platforms. Previous studies typically focus on either scientific HPC benchmarks or a specific cloud configuration, failing to consider all the opportunities offered by cloud platforms. We present a comparative study of the performance of representative big data benchmarks, or "Big Data Ogres", and HPC benchmarks running on a supercomputer and in the cloud. Our work differs from previous studies in that we explore multiple cloud configurations: Shared, Dedicated, and Spot Instances. Our results provide a more comprehensive view of the performance-cost trade-off, thereby highlighting the gap that needs to be bridged to attain HPC-big data convergence.

    researchmap

  • Accelerating OpenACC Applications with Data Layout Optimization Directives

    情報処理学会研究報告   2016-HPC-155 ( 9 )   2016.8

  • Training-Condition-Aware Performance Modeling of a Large-Scale Asynchronous Deep Learning System

    大山 洋介, 野村 哲弘, 佐藤 育郎, 西村 裕紀, 玉津 幸政, 松岡 聡

    研究報告ハイパフォーマンスコンピューティング(HPC)   2016-HPC-155 ( 5 )   1 - 9   2016.8

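    In image recognition by machine learning, high recognition accuracy has been reported using Convolutional Neural Networks (CNNs) and large datasets. An optimization method called mini-batch Stochastic Gradient Descent (SGD) is widely used to train CNNs, but recognition performance is known to degrade with an inappropriate mini-batch size. Asynchronous SGD, which performs the CNN computation on GPUs and the parameter updates asynchronously, has been proposed to accelerate SGD, but because the mini-batch size is determined dynamically, the optimal values of training conditions such as the number of nodes are not obvious. In this paper we propose a performance model of SPRINT, a system that trains CNNs with asynchronous SGD. The model takes the CNN structure and the machine performance and configuration as inputs and predicts the time needed to process the entire dataset in training and the average mini-batch size. In an evaluation on 1 to 16 nodes of TSUBAME-KFC/DL, the average prediction error for training time and average mini-batch size was at most 8% across several CNN structures. Furthermore, when we searched on two different machines for the training conditions that minimize training time within a given range of average mini-batch sizes, the ranking predicted by the model matched the measured ranking.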

    researchmap

  • From FLOPS to BYTES: A Transformation for the Post-Moore Era

    松岡 聡, 天野 英晴, 中島 研吾, 井上 弘士, 工藤 知宏, 丸山 直也, 田浦 健次, 岩下 武史, 片桐 孝弘, 塙 敏博, 遠藤 敏夫

    情報処理学会研究報告HPC-155-2016   2016.8

  • Evaluating tolerance of applications against realistic DRAM faults

    Yuya Kobayashi, Hideyuki Jitsumoto, Akihiro Nomura, Satoshi Matsuoka

    2016.6

  • Training Condition Conscious Performance Modeling of an Asynchronous Data-Parallel Deep Learning System

    Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, Satoshi Matsuoka

    2016.6

  • Out-of-core Memory Extension of the Large-Scale Graph Processing Library ScaleGraph

    岩渕 圭太, 佐藤 仁, 松岡 聡

    ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集   2016   56 - 56   2016.5

  • Reducing Remote GPU Execution’s Overhead with mrCUDA

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    2016.4

  • Large-Scale Distributed Sorting for GPU Accelerators and Non-Volatile Memory

    社本 秀之, 佐藤 仁, 松岡 聡

    情報処理学会 研究報告   2015-HPC-154   2016.4

  • Towards Understanding the Performance of FPGAs using OpenCL Benchmarks

    Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka

    HiPEAC Workshop on Reconfigurable Computing   2016.3

    We evaluate the performance of a subset of the benchmarks available in the Rodinia Suite, using Altera’s OpenCL SDK and the Terasic DE5-Net FPGA board, equipped with an Altera Stratix V GXA7 FPGA, and present timing and power estimation results and comparison with a modern CPU and GPU. The results are presented for multiple versions of each benchmark, each with a varying degree of optimization for FPGAs, ranging from direct ports from the initial OpenCL implementation to loop-pipelined kernels specifically optimized for FPGAs. Our results show that, while it is possible to use a common programming language available for other more-widely used accelerators in HPC, the implementation method optimal for FPGAs is significantly different from those for other accelerators such as GPUs. Specifically, we find that multithreaded kernels typically used for GPUs do not perform as efficiently as those optimized with FPGA-specific optimizations such as sliding windows. However, by exploiting the FPGA-specific optimizations, FPGA with OpenCL shows promising performance. Our results using the Altera Stratix V 5SGXA7 FPGA indicate that, with FPGA-specific optimizations, it is possible to achieve up to 3.9x better power efficiency in comparison to an Nvidia K20C GPU.

    researchmap

  • GPU-Accelerated Large-scale Distributed Sorting Coping with Device Memory Capacity

    Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, Satoshi Matsuoka

    IEEE Transactions on Big Data   Volume 1 ( Issue 1 )   57 - 69   2016.3

    Publisher:IEEE  

    Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting due to their low communication complexity. Although GPU accelerators could help reduce the computation cost in general, their effectiveness in distributed sorting algorithms remains unclear. We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs. To cope with data volumes exceeding the GPU memory capacity, an out-of-core local sort is used, with a small overhead of about 7.5 percent when the data size is tripled. We evaluate the performance of our implementation on the TSUBAME2.5 supercomputer, which comprises over 4,000 NVIDIA K20x GPUs. Weak scaling analysis shows a 389-fold speedup with 0.25 TB/s throughput when sorting 4 TB of 64-bit integer values on 1,024 nodes compared to running on one node; this is 1.40 times faster than the reference CPU implementation. Detailed analysis, however, reveals that the performance is mostly bottlenecked by the CPU-GPU host-to-device bandwidth. With orders-of-magnitude improvements announced for next-generation GPUs, the performance boost is expected to be substantial, in line with other successful GPU accelerations.

    researchmap

  • Linguistic Regularities from Multiple Samples

    Research Reports on Mathematical and Computing Sciences. Ser. C, Computer Science   ( 283 )   1 - 6   2016.2

    Language:English   Publisher:Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology  

    researchmap

  • Performance Evaluation of Out-of-core GPU Set Intersection Using Latency Hiding through I/O Chunking (Unrefereed Workshop Manuscript)

    佐藤仁, 溝手竜, 松岡聡, 小川宏高

    情報処理学会研究報告(Web)   2016 ( HPC-155 )   2016

  • Distributed Computing for Machine Learning on Large-Scale Image Dataset

    佐藤育郎, 渡邉隆太郎, 西村裕紀, 野村哲弘, 松岡聡

    Tsubame e-Science Journal   14   2016

  • Optimizing the Rodinia Benchmark for FPGAs

    Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka

    IPSJ SIG Technical Report   2015-HPC-152 No.16   2015.12

  • Design and Modelling of Cloud-based Burst Buffers

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    2015.11

  • Performance Evaluation of a Sparse Matrix-Vector Multiplication Method for GPUs That Reduces Memory Access Volume through Multi-level Blocking

    長坂侑亮, 額田彰, 松岡聡

    2015.9

  • mrCUDA: Low-Overhead Middleware for Live Migrating Remote GPU Execution to Local GPU Execution

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    2015.9

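  • A Memory Access Reduction Method for Sparse Matrix-Vector Multiplication on GPUs

    長坂 侑亮, 額田 彰, 松岡 聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2015-HPC-151 ( 8 )   1 - 7   2015.9

    In scientific computing, when solving systems of linear equations with large, sparse coefficient matrices, sparse matrix-vector multiplication (SpMV) accounts for most of the execution time. Although many efforts have been made to accelerate SpMV on GPUs, performance gains are hindered by the memory-bound nature of the SpMV kernel and by the loss of locality caused by random accesses to the input vector. We propose the AMB (Adaptive Multi-level Blocking) format, a sparse matrix format that effectively reduces both the volume and the frequency of memory accesses during SpMV on GPUs. By using 16-bit integers together with several blocking techniques, it compresses column indices and thereby reduces memory access volume. For 40 matrices selected from the University of Florida sparse matrix dataset, we compared our method with existing approaches and achieved speedups of up to 2.81x and 1.77x on average over cuSparse, and up to 1.38x and 1.13x on average over yaSpMV, a recently proposed fast SpMV library.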

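  • Modeling Per-Process Power Consumption with Performance Counters for Jobs Running Concurrently within a Node

    寺西 賢人, 野村 哲弘, 遠藤 敏夫, 松岡 聡

    IPSJ SIG Technical Reports   2015.8

    Recent supercomputers consume large amounts of power, and power efficiency has become a key issue in improving the performance of practical supercomputers. Efficient control of power consumption requires more detailed power measurement. At present, however, power consumption can be measured per node but not per process. In this paper, we model power consumption using performance counters, which can be measured per process, and propose a method for estimating per-process power consumption when jobs run concurrently on the same node. We conducted power estimation experiments with the resulting model for both single-process execution and two-process concurrent execution: the maximum error was 5.16% for a single process, and for two processes, 84.8% of the measured combinations had an error within 4%.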

  • mrCUDA: A middleware for migrating rCUDA virtual GPUs to native GPUs

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    2015.8

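  • External Sorting for GPU Accelerators and Non-volatile Memory

    佐藤 仁, 溝手 竜, 松岡 聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2015-HPC-150 ( 24 )   1 - 7   2015.7

    We propose xtr2sort (extreme external sort), an external sorting algorithm designed for GPU accelerators and non-volatile memory. To exploit the high compute performance and memory bandwidth of GPUs while hiding the latency of data movement among non-volatile memory, host memory, and device memory, xtr2sort splits the records to be sorted on non-volatile memory into chunks that fit in device memory and, for each chunk, asynchronously pipelines I/O operations to non-volatile memory, CPU-GPU memory transfers, and sorting on the GPU. This enables fast sorting even for record sets that exceed the capacity of device memory and host memory. We evaluated the proposed method on a single server equipped with 2-way Intel Xeon E5-2699 v3 2.30GHz (18 cores) processors and an NVIDIA Tesla K40. With an implementation that uses non-blocking I/O via Linux Asynchronous I/O (libaio), the method sorted 25.6 × 10^9 records of int64_t integers, which is 4 times the number of records that can be handled on the CPU and 64 times the number that can be handled on the GPU, at 78,121,548 records/sec. This is 2.16 times the performance of a CPU-based out-of-core sort using non-blocking I/O running with 72 threads on 2 sockets. These results suggest that combining non-volatile memory with I/O chunking and latency hiding is a promising approach to out-of-core processing with GPU accelerators.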

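  • A QEMU-based Fault Injector for Injecting Memory-Access-Pattern-Dependent Faults

    小林 佑矢, 實本 英之, 野村 哲弘, 松岡 聡

    IPSJ Technical Report (Web)   2017 ( HPC-160 )   2015.7

    As parallel computers grow in scale, there is concern about reduced reliability due to Silent Data Corruption (SDC). SDC is a failure that is difficult to detect, and handling it is costly. To build and select appropriate countermeasures, it is important to evaluate their overhead and fault tolerance through fault injection. However, most existing fault injectors perform random bit flips and cannot reproduce hardware-specific fault patterns. In this study, aiming to inject realistic faults, we extended the virtual machine emulator QEMU and created a fault injector, MH-QEMU. MH-QEMU provides not only modification of memory state but also a memory access handler that can detect and process accesses to the virtual machine's memory. This makes it possible to inject memory-access-pattern-dependent faults and permanent faults. The overhead of these features varies with the workload on the virtual machine; using the NAS Parallel Benchmarks (NPB), we confirmed that in the best case the execution time is kept to about 20 times the original. Furthermore, for the NPB CG kernel, computations completed normally in about 100% of cases under single-bit-flip injection, whereas under Row-Hammer injection abnormal termination occurred in about 40% of cases and SDC occurred in 3% of cases. (Author's abstract)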

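  • Efforts toward More Efficient Job Scheduling on TSUBAME2 and Their Verification

    野村 哲弘, 佐々木 淳, 三浦 信一, 遠藤 敏夫, 松岡 聡

    IPSJ Technical Report (Web)   2015 ( HPC-150 )   2015.7

    To use supercomputer resources efficiently, it is important to accurately understand information about submitted jobs and to optimize job scheduling. TSUBAME2 at the Global Scientific Information and Computing Center, Tokyo Institute of Technology, has accumulated various log and sensor data, but the accumulated data had not been analyzed sufficiently. This report describes efforts on TSUBAME to make users' resource specifications more accurate, and the analysis of various log and sensor data carried out to confirm the results of those efforts. (Author's abstract)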

  • Performance Optimization of Large-Scale Traffic Simulation on Parallel and Distributed Systems

    Hiroki Kanezashi, Toyotaro Suzumura, Satoshi Matsuoka

    2015.7

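  • Mini-apps for Co-design of Computational Science and Computer Science (Introduction to the FIBER Mini-app Suite / Collecting Profile Information for Building Empirical Application Performance Models / Performance of the FIBER Mini-apps and Its Modeling)

    丸山 直也, 鈴木 惣一朗, 三上 和徳, 小村 幸浩, 滝澤 真一朗, 松田 元彦, 野村 哲弘, 三浦 信一, 遠藤 敏夫, 松岡 聡

    Proceedings of the High Performance Computing and Computational Science Symposium (HPCS)   ( 2015 )   107 - 108   2015.5

    Language:Japanese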

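  • Scaling Up an Urban Airflow Simulation on GPU-equipped Systems and Its Performance Model

    高嵜 祐樹, 遠藤 敏夫, 松岡 聡

    IPSJ SIG Technical Reports. [High Performance Computing]   2015 ( 13 )   1 - 8   2015.2

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    The problem size of stencil computations on GPUs is normally limited by GPU memory capacity, but a technique called temporal blocking makes it possible to scale up without performance degradation. In this work, to scale up an urban airflow simulation, a GPU-cluster application of more than 10,000 lines of code, while maintaining high performance, we introduced an approach that combines HHRT with temporal blocking, and succeeded in limiting both performance degradation and programming cost as the problem size grows. For further performance optimization, we propose a technique that reduces the size of the data swapped by HHRT. As a result, performance improved by about 1.3 to 1.9 times, reaching about 19 to 85% of the performance of the original program. In addition, by building a performance prediction model, we were able to narrow down the parameters that affect performance.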

  • Towards Cloud-based Burst Buffers for I/O Intensive Computing in Cloud

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    2015.2

  • mrCUDA: Low-Overhead Middleware for Transparently Migrating CUDA Execution from Remote to Local GPUs

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    2015

  • Preliminary Evaluation of FPGAs Using OpenCL

    丸山直也, Hamid Reza Zohouri, 松田元彦, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2015-HPC-150   2015

  • Efficient Utilization of Multi-level Memory System for Stencil Computation (Unrefereed Workshop Manuscript)

    Tianqi Xu, Guanghao Jin, Toshio Endo, Satoshi Matsuoka

    IPSJ SIG Notes   2014 ( 10 )   1 - 7   2014.12

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)

    This paper aims to use the multi-level memory system efficiently for stencil computation, enabling tera-scale computation on a single GPU. We build a performance model to explain the relationship between the different memories and propose a new algorithm that reduces the communication cost between memories and uses their capacity efficiently. We evaluated a 7-point stencil computation on a multi-level memory system that includes GPU memory, CPU memory, and SSD. The evaluation on a real system shows that our algorithm enables computation on a domain 23 times bigger than the GPU memory capacity and achieves 5.5 times higher performance than other optimization methods.

  • HPC and Interactive Big Data Analytics: Case Study of Distributional Semantics

    Aleksandr Drozd, Satoshi Matsuoka

    IPSJ SIG Notes   2014-HPC-146(12)   2014.10

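  • Evaluation of Locality Improvement for Scaling Up a Real Stencil Application on GPU Clusters

    高嵜祐樹, 遠藤敏夫, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2014 ( 23 )   1 - 8   2014.9

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    The advent of GPUs has improved the performance of the stencil computations used in CFD. However, problem sizes are limited by the capacity of GPU memory, which is smaller than host memory. Solutions based on techniques that improve memory access locality for stencil computations have been proposed, but they increase programming cost and are considered difficult to apply to large-scale stencil applications. The goal of this work is to achieve three things in stencil applications: high performance, large problem sizes, and low programming cost. To this end, we propose a programming model that combines a locality improvement algorithm with a memory-swapping runtime that automates data transfers between CPU and GPU. We applied the proposed approach to an urban airflow simulation, a real stencil application, and evaluated its performance.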

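  • Optimization of Hybrid BFS on Large-Scale Distributed Memory Environments

    上野 晃司, 鈴村 豊太郎, 丸山 直也, 松岡 聡

    2014.9

    In recent years, large graph data such as Web graphs and social graphs have become common, and interest in large-scale graph analysis has grown. In this paper, we propose a method for efficiently computing hybrid BFS, a breadth-first search (BFS) algorithm that is effective for graphs with a relatively small diameter, on large-scale supercomputers with thousands to tens of thousands of compute nodes. Through a bitmap-based sparse matrix representation, data structure selection according to vertex density, and increased parallelism in the bottom-up search, we obtained scalability at the scale of tens of thousands of nodes. In a performance evaluation on the K computer, we achieved 17,997 GTEPS on 65,536 nodes, and the K computer took first place in the June 2014 Graph500 ranking.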

  • Increasing GPU batch queue’s utilization using rCUDA

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    IPSJ SIG Technical Report   2014-HPC-145 ( 24 )   2014.7

    In heterogeneous supercomputers, a GPU job queue whose nodes each contain multiple GPUs can be under-utilized due to resource-assignment fragmentation. For example, when each node has three GPUs, as in TSUBAME2.5, a node that has already been assigned to a job requesting two GPUs cannot be assigned to another job requesting more than one GPU until the current job leaves the node. We examine this problem on TSUBAME2.5’s GPU batch-queue system and present a scheduling algorithm that uses rCUDA to alleviate it. Our simulation shows that the proposed scheduling algorithm finishes all simulated jobs on a simulated congested queue 15% to 30% faster. Moreover, using job patterns obtained from the scheduler log of the TSUBAME GPU queue, the proposed algorithm shows a 5.06% decrease in average job lifetime (from arrival until processing finishes). It also shows that even when the number of nodes in the queue is reduced by around 4%, the average job lifetime remains about the same as with the present algorithm.

  • Efficient Parallel Execution through Migration between GPUs

    鈴木太一郎, 額田彰, 松岡聡

    IPSJ SIG Technical Reports   Vol.2014-HPC-145(42)   2014.7

  • Visualizing Collectives over InfiniBand Networks

    Kevin Brown, Jens Domke, Satoshi Matsuoka

    IPSJ SIG Technical Report   2014-HPC-145 ( 13 )   2014.7

    As the scale of high performance computing systems increases, optimizing interprocess communication becomes more challenging while remaining critical for ensuring good performance. Furthermore, the hardware layer abstraction provided by MPI makes it difficult to perform any application optimization that links network utilization with application communication. We overcome this barrier by extending the Peruse utility in Open MPI to track network events within MPI operations from the application layer. We also develop a non-intrusive profiling library to make use of our Peruse enhancement and show how we can use BoxFish with our profiling library to visualize the flow of application traffic over each link within large scale InfiniBand networks. The tool-chain that we describe can be used without any modification to the target application and incurs less than 1% application runtime overhead.

  • Towards Cloud Bursting for Extreme Scale Supercomputers

    Tianqi Xu, Kento Sato, Satoshi Matsuoka

    2014-HPC-145 ( 5 )   1 - 8   2014.7

    Extreme-scale HPC systems, which consist of a large number of compute nodes, can provide high computational capacity for multiple users. However, the compute nodes in these systems occasionally cannot meet demand because of bursty job requests within short periods of time. To accommodate the bursty requests, we consider federating HPC systems with public clouds, which is known as cloud bursting. Although federated systems can acquire virtually infinite computational power with cloud bursting, QoS may not be guaranteed because of a significant performance gap between HPC systems and public clouds. The most critical problem is the gap in I/O performance. In this paper, we propose an I/O acceleration technique using distributed cloud bursting buffers. We also create an I/O performance model to explore its effectiveness. Our model-based simulations, which target the TSUBAME supercomputer as the HPC system and Amazon EC2 as the public cloud, show that the distributed cloud bursting buffer can improve I/O throughput while reducing cost.

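  • A Lossy Compression Method for Checkpoint Data Using the Wavelet Transform in a Real Application

    佐々木尚人, 佐藤賢斗, 遠藤敏夫, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2014-HPC-145 ( 7 )   1 - 8   2014.7

    In recent years, the scale of HPC systems and supercomputers has been growing rapidly, and the mean time between failures of these systems tends to shorten accordingly. Many systems adopt checkpointing as their fault tolerance mechanism, but there is concern that checkpoint time may eventually exceed the mean time between failures. To shorten checkpoint time, we propose a lossy compression method for checkpoint data. Specifically, we perform lossy compression by applying a wavelet transform, quantization, and encoding to the checkpoint data, followed by a standard compression method. We applied the proposed method to the checkpoint data of the climate application NICAM and measured and evaluated the resulting error, compression ratio, and compression time. As a result, we confirmed that under specific conditions checkpoint time can be reduced by about 70% while keeping the maximum relative error within 5%.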

  • Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)

    Keisuke Fukuda, Naoya Maruyama, Jeremy S.Meredith, Jeffrey S.Vetter, Satoshi Matsuoka

    IPSJ SIG Notes   2014 ( 26 )   1 - 8   2014.7

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)

    Hierarchical algorithms are considered to be important in next-generation large scale scientific computing. Such algorithms are typically compute-intensive and have higher communication locality, properties that are beneficial on future supercomputers with a much lower B/F ratio. However, one of the big challenges of such algorithms is that their data structures and computation/communication patterns are irregular, which makes it difficult to analyze and predict performance. In this paper, we introduce a performance modeling method for the Fast Multipole Method, a typical example of hierarchical algorithms for N-body problems, using the domain-specific performance modeling language Aspen. We show that our modeling scheme can adapt to various particle distribution parameters and provides useful information that helps application researchers optimize algorithmic parameters.

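  • Data Layout Optimization through OpenACC Directive Extensions

    星野哲也, 丸山直也, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2014 ( 45 )   1 - 8   2014.7

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    As a way of porting existing programs to computing environments equipped with accelerators such as GPUs, which have been increasing in recent years, high-level directive-based programming models such as OpenACC are attracting attention as an alternative to low-level programming models typified by CUDA and OpenCL. An advantage of such directive-based programming models is high functional portability across devices, because a program can be ported while the original code is preserved. However, current high-level programming models such as OpenACC cannot accommodate the difference between the data layouts preferred by scalar processors and by many-core accelerators, and therefore have a problem with performance portability across devices with different characteristics. In this work, we prototyped OpenACC extension directives that abstract the data layout and improve performance portability across different devices, changed the data layout of the Himeno benchmark with a translator, and evaluated the result on a multi-core CPU, an Intel Xeon Phi, and a K20X GPU. We confirmed performance improvements of 27% on the Intel Xeon Phi and 24% on the K20X GPU compared with the same data layout as the original.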

  • Performance modeling of a tree-based hierarchical N-body algorithm with arbitrary particle distributions

    Keisuke Fukuda, Naoya Maruyama, Jeremy S.Meredith, Jeffrey S.Vetter, Satoshi Matsuoka

    2014.7

  • TSUBAME-KFC : the Greenest Supercomputer in the World With Liquid Submersion Cooling

    Tsubame ESJ. : e-science journal   11   18 - 23   2014.6

    Language:English

  • TSUBAME-KFC : the Greenest Supercomputer in the World With Liquid Submersion Cooling

    Tsubame ESJ. : e-science journal   11   2 - 7   2014.6

    Language:Japanese

  • Performance Evaluation of a Cache-aware Sparse Matrix-Vector Multiplication Method for GPUs

    長坂侑亮, 額田彰, 松岡聡

    IPSJ SIG Technical Reports   014-HPC-144 ( 5 )   2014.5

  • Lustre 2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE

    Hitoshi Sato, Shuichi Ihara, Satoshi Matsuoka

    2014.4

  • Abstractions for Convergence of Big Data and HPC in Deep Memory Hierarchy Machines

    Satoshi Matsuoka, Hitoshi Sato

    Workshop on Programming Abstractions for Data Locality (PADAL 2014)   2014.4

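  • Enabling Large-Scale Stencil Computation through Automatic Temporal Blocking

    河村知輝, 丸山直也, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2014 ( 32 )   1 - 6   2014.2

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    Solving partial differential equations with the finite difference method reduces to stencil computation. Because this computation demands high memory bandwidth, it can be accelerated using GPUs. However, GPU memory capacity is small and becomes a constraint when solving large problems. Prior work has shown that temporal blocking makes it possible to solve domains larger than GPU memory capacity without performance loss, but it suffers from high programming cost. In this work, we therefore achieved automatic optimization by incorporating temporal blocking into a framework. We also built a performance model to derive optimal values for parameters such as the number of blocking steps.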

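  • OpenACC Directive Extensions That Allow Selecting the Optimal Data Layout for Each of CPU and GPU

    星野哲也, 丸山直也, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2014 ( 5 )   1 - 5   2014.2

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    As a way of porting existing programs to computing environments equipped with accelerators such as GPUs, which have been increasing in recent years, one can use high-level directive-based programming models such as OpenACC instead of low-level programming models typified by CUDA and OpenCL. An advantage of such directive-based programming models is high portability across devices, because a program can be ported without breaking the original code. However, current programming models such as OpenACC cannot accommodate differences such as the data layouts preferred by scalar processors and by many-core accelerators, and therefore have a problem with performance portability across devices with different characteristics. In this work, we prototyped and evaluated OpenACC extension directives that abstract the data layout and improve performance portability across different devices.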

  • Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5

    Guanghao Jin, Toshio Endo, Satoshi Matsuoka

    IPSJ SIG Notes   2014 ( 33 )   1 - 8   2014.2

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)

    The domain of a stencil computation is limited by the memory capacity of the GPUs on a GPU cluster. As the domain grows to cope with higher accuracy requirements, more GPUs need to be employed to extend the memory capacity. In this paper, we propose new methods that apply temporal blocking to the device memory and registers of a set of GPUs, allowing computation on domains bigger than the memory capacity of the GPUs while maintaining high performance on TSUBAME2.5. We also analyze the parameters and performance differences between TSUBAME2.0 and TSUBAME2.5 in order to apply our methods to a wide range of GPU clusters.

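  • A Hybrid BFS Algorithm Using Non-volatile Memory

    岩渕圭太, 佐藤仁, 溝手竜, 安井雄一郎, 藤澤克樹, 松岡聡

    IPSJ SIG Technical Reports: Algorithms (AL)   2014 ( 7 )   1 - 1   2014.2

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    In recent years, fast processing of large-scale graphs has been demanded in various fields such as SNS analysis, route search on road networks, smart grids, drug discovery, and genetic analysis. With conventional methods, however, all data must be loaded into DRAM to obtain reasonable performance, and the resulting increase in cost, in terms of both power consumption and price, from adding DRAM capacity has become a problem. We therefore propose, and are developing, a method that uses NVM (non-volatile memory) as auxiliary storage for BFS in order to process graphs larger than DRAM capacity at high speed while limiting performance degradation. To date, we have achieved 4th place (and 1st in the world for a single node) on the Big Data category list of the GreenGraph500 (November 2013), a ranking of power-efficient big data processing.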

  • Burst SSD Buffer: Checkpoint Strategy at Extreme Scale

    Kento Sato, Satoshi Matsuoka, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R.DeSupinski, Naoya Maruyama

    IPSJ SIG Notes   2013 ( 19 )   1 - 9   2013.9

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

    Checkpointing is an indispensable fault tolerance technique, commonly used by HPC applications that run continuously for hours or days at a time. However, when checkpointing extreme scale systems, the bursty nature of the I/O pattern of checkpointing overburdens file systems and also causes huge overhead to be added to an application's runtime. In order to alleviate the overhead and achieve fast checkpoint/restart, we propose a highly-resilient mini-SSD-based burst buffer system, and explore a checkpoint strategy on the system based on our checkpointing model.

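  • Optimization and Performance Analysis of the Hybrid-BFS Algorithm Using Non-volatile Memory

    岩渕圭太, 佐藤仁, 安井雄一郎, 藤澤克樹, 松岡聡

    IPSJ SIG Technical Reports. [High Performance Computing]   2013 ( 3 )   1 - 9   2013.9

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    In recent years, fast processing of large-scale graphs has been demanded in various fields. Because of the characteristics of such processing, all data must be loaded into DRAM to obtain reasonable performance, and the resulting increase in cost, in terms of power consumption and price, from adding DRAM capacity has become a problem. We therefore optimized I/O and analyzed the causes of performance degradation when non-volatile memory is used as auxiliary storage for the Hybrid-BFS algorithm, and evaluated whether large-scale graph processing can be carried out while limiting performance degradation. As a result, by offloading part of the data to non-volatile memory, we were able to limit performance degradation to 47.1% in an environment with half the DRAM capacity. We also confirmed that DRAM usage can be reduced further, with limited performance degradation, by additionally offloading edge data that is rarely referenced. Finally, we identified the causes of performance degradation and presented remedies, which suggests that large-scale graph processing with limited performance degradation is feasible.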

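  • Preliminary Evaluation toward Large-Scale Execution of the Graph500 Benchmark Using Non-volatile Memory

    岩渕圭太, 佐藤仁, 安井雄一郎, 藤澤克樹, 松岡聡

    Proceedings of the Symposium on Advanced Computing Systems and Infrastructures (SACSIS)   2013   130 - 131   2013.5

    Language:Japanese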

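  • Design and Implementation of a Data-Parallel Processing Framework for Large-Scale Heterogeneous Supercomputers

    佐藤仁, 白幡晃一, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2013 ( 24 )   1 - 7   2013.2

    Language:Japanese

    We are currently developing Hamar (Highly Accelerated MapReduce), a software platform aimed at scalable data-parallel processing on supercomputers equipped with thousands to tens of thousands of accelerators. This paper describes its initial design and implementation, and presents a case study applying it to GIM-V, a general-purpose graph processing model based on MapReduce. Through this application, we confirmed that in Hamar the Map and Reduce operations can be written flexibly in CUDA and OpenMP. In a preliminary experiment running both implementations on a single compute node equipped with one GPU, the CUDA version of the Map operation was on average 1.2 times faster than the OpenMP version, while the Reduce operation was more than 10 times slower. This configuration, a single compute node with one GPU attached, put the CUDA implementation at a disadvantage; nevertheless, the results revealed challenges such as application to larger computing environments, performance optimization, and automatic task scheduling.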

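  • Preliminary Evaluation toward Large-Scale Execution of the Graph500 Benchmark Using Non-volatile Memory

    岩渕圭太, 佐藤仁, 安井雄一郎, 藤澤克樹, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2013 ( 31 )   1 - 6   2013.2

    Language:Japanese

    Large-scale graphs have recently appeared in various fields, raising the problem of increased power consumption from adding DRAM capacity, and graphs that exceed the DRAM capacity of a single node have also emerged. In this work, we aim to handle graphs that are as large as possible on a single node while minimizing performance degradation, by using non-volatile memory as auxiliary storage for the Graph 500 benchmark. As a first step, this paper proposes a method for running problem sizes that do not fit in DRAM and presents a preliminary evaluation of how the ratio of DRAM capacity to non-volatile memory capacity affects execution performance.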

  • Performance Evaluation of the Directive-based Programming Language OpenACC

    星野哲也, 丸山直也, 松岡聡

    Proceedings of the High Performance Computing and Computational Science Symposium (HPCS)   2013   91 - 91   2013.1

    Language:Japanese

  • Efforts on the TSUBAME Supercomputer toward the Era of Extreme Big Data

    佐藤 仁, 松岡 聡

    Proceedings of the AXIES Annual Conference   8p   2013

    Language:Japanese   Publisher:[Academic eXchange for Information Environment and Strategy (AXIES)]

  • Evaluating Resilience Towards Exascale-Tsubame2.0 as an Example-

    松岡聡, 佐藤賢斗, 遠藤敏夫

    IPSJ SIG Technical Reports (Web)   2013 ( HPC-141 )   2013

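  • Performance Evaluation of the Multi-rail InfiniBand Network on TSUBAME2.0

    野村哲弘, 遠藤敏夫, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2012 ( 3 )   1 - 5   2012.12

    Language:Japanese

    Although the TSUBAME2.0 network has a fat-tree topology, degradation of collective communication performance has been observed in large-scale runs. In this paper, we focus on packet collisions on inter-switch links and performance degradation of inter-switch links as the probable causes, present network settings that mitigate each problem, and show their impact on bandwidth and collective communication performance. With the improved network settings, we were able to almost eliminate the occurrence of stochastic communication delays and confirmed a 16.0% to 39.5% improvement in injection bandwidth in large-scale runs.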

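  • Performance Evaluation of the Multi-rail InfiniBand Network on TSUBAME2.0

    野村哲弘, 遠藤敏夫, 松岡聡

    IPSJ SIG Technical Reports: Computer Architecture (ARC)   2012 ( 3 )   1 - 5   2012.12

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    Although the TSUBAME2.0 network has a fat-tree topology, degradation of collective communication performance has been observed in large-scale runs. In this paper, we focus on packet collisions on inter-switch links and performance degradation of inter-switch links as the probable causes, present network settings that mitigate each problem, and show their impact on bandwidth and collective communication performance. With the improved network settings, we were able to almost eliminate the occurrence of stochastic communication delays and confirmed a 16.0% to 39.5% improvement in injection bandwidth in large-scale runs.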

  • A Fast Stencil Computation Method for the Domain to Surpass Memory Capacity of GPU

    2012 ( 31 )   1 - 6   2012.12

    Language:Japanese

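  • Implementation and Performance Evaluation of KIFMM with the Dynamic Task Scheduling Engine StarPU

    福田圭祐, 丸山直也, MiquelPericas, 松岡聡

    IPSJ SIG Technical Reports: High Performance Computing (HPC)   2012 ( 13 )   1 - 7   2012.9

    Language:Japanese

    The Fast Multipole Method (FMM) is an algorithm for N-body problems that achieves O(N) computational complexity through approximation. FMM consists of multiple computational steps that differ in their computational characteristics and whose load varies with the input data. The goal of this work is to balance the load appropriately between CPU and GPU in the face of load variation caused by the FMM input data (particle distribution). To this end, we adopted a dynamic task scheduling engine, implemented a Kernel Independent FMM (KIFMM) application on StarPU, a library for that purpose, and evaluated its performance. We compared this implementation with one that can determine the optimal static schedule by exhaustive search for each input. For a uniform distribution, by introducing one simple heuristic, we obtained 137.9% of the performance of the static scheduling implementation; for a (non-uniform) spherical-surface distribution, we obtained 89.5% of its performance without any heuristics. From these results, we conclude that dynamic task scheduling can achieve load balancing that withstands load variation due to input data while delivering performance competitive with an optimal static scheduling implementation.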

  • Towards a Dataflow FMM using the OmpSs Programming Model

    2012 ( 12 )   1 - 7   2012.9

    Language:English

  • Challenges in Green Supercomputing towards 50GFlops/W, PUE<1,100KW/rack, for TSUBAME3.0 and future Exascale

    Matsuoka Satoshi

    IEICE technical report. Internet Architecture   112 ( 212 )   63 - 63   2012.9

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers

    The current global supercomputing grand challenge is to achieve an exaflop within a 20MW power budget by 2020. This is 25 times the power efficiency of the most power-efficient supercomputer in the world, namely BlueGene/Q, and requires innovations in memory storage, network, and cooling. Such innovations will not only benefit supercomputing but will have broad impact on the overall IT infrastructure. At Tokyo Tech we were awarded the "Greenest Production Supercomputer in the World" award from the Green500 in 2010 for our Tsubame2.0 supercomputer, and we are currently designing Tsubame3.0, which will be a stepping stone in this drive to exascale.

  • Evaluation of Portability for a Real-world CFD Application with CUDA and OpenACC

    2012 ( 42 )   1 - 9   2012.7

    Language:Japanese

  • Evaluation of GPU Acceleration Methods for a Large-Scale Fluid Application

    星野哲也, 丸山直也, 松岡聡

    Proceedings of the Symposium on Advanced Computing Systems and Infrastructures (SACSIS)   2012   73 - 74   2012.5

    Language:Japanese

  • A Multi GPU Implementation of Generalized Graph Processing Model GIM-V with Data Transfer Optimization

    2012 ( 34 )   1 - 8   2012.3

    Language:Japanese

  • Physis: A Stencil Computation Framework for Heterogeneous Supercomputers

    丸山直也, 野村達男, 佐藤賢斗, 松岡聡

    Tsubame e-Science Journal   ( 5 )   2012

  • [Invited Talk] One Year with TSUBAME2.0 and the Leap toward Exascale

    松岡聡

    IPSJ SIG Technical Reports: Mathematical Modeling and Problem Solving (MPS)   2011 ( 1 )   1 - 1   2011.11

    Language:Japanese

  • Towards Optimizations of FMM on CPU-GPU Heterogeneous Environments using Dynamic Task Scheduling Runtimes

    Keisuke Fukuda, Naoya Maruyama, Satoshi Matsuoka

    IPSJ SIG Notes   2011 ( 28 )   1 - 9   2011.11

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    FMM is an O(N) approximate algorithm for N-body problems and is recognized as more scalable and promising than other N-body computation methods. Effectively utilizing heterogeneous systems in FMM, however, is a challenging issue because FMM consists of several phases with different performance characteristics that call for careful load balancing for optimal performance. This paper extends our previous work18) that partially ported the CPU implementation of kifmm3d to CUDA, and presents a complete CUDA implementation. To exploit heterogeneous processing elements, we further extend the implementation with StarPU, which allows dynamic task scheduling on CPU-GPU heterogeneous environments. We have found several technical issues and challenges in achieving good load balancing, such as failing CUDA kernel invocations, phase splitting, and implementation of filters.

  • Towards an Asynchronous Checkpointing System

    2011 ( 18 )   1 - 8   2011.11

    Language:English

  • Operation of TSUBAME 2.0 Green Supercomputer dealing with Power Crisis

    Toshio Endo, Satoshi Matsuoka, Akira Nukada, Masamichi Nagasaka, Tadayasu Yotsu

    IPSJ SIG Notes   2011 ( 12 )   1 - 9   2011.11

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)

    We report on the operation of the TSUBAME2.0 supercomputer in dealing with the power crisis caused by the powerful earthquake of March 11, 2011. While saving energy consumption is and will remain the most important issue in the design and operation of supercomputers, capping 'peak power consumption' also becomes essential in a power crisis. We report the measures taken in operating TSUBAME2.0 this summer under limited time and resources, and the issues that remain to be solved.

  • Achievement of Linpack Performance of over 1PFlops on TSUBAME 2.0 Supercomputer

    4 ( 4 )   169 - 179   2011.10

  • Fast GPU Read Alignment with Burrows Wheeler Transform Based Index

    2011 ( 13 )   1 - 4   2011.7

    Language:English

  • Analysis of Workflow Application I/O Performance on Large Parallel File System

    2011 ( 40 )   1 - 8   2011.7

    Language:Japanese

  • Towards On Demand Hierarchical Data Store for Massive Amounts of Small File Access

    2011 ( 27 )   1 - 8   2011.7

    Language:Japanese

  • Design of Advanced Software Deployment Infrastructure in HPCI Wide-area Distributed Environment

    2011 ( 68 )   1 - 7   2011.7

    Language:Japanese

  • Towards GPGPU-Based Large-Scale Fast Graph Processing

    2011 ( 14 )   1 - 8   2011.7

    Language:Japanese

  • Ultralow-power, high-performance computation

    MATSUOKA Satoshi

    80 ( 7 )   579 - 584   2011.7

    Language:Japanese

  • Optimization of Resource Allocation for Data-intensive Workflow Applications

    2010 ( 6 )   1 - 7   2011.4

    Language:Japanese

  • Performance Studies with Hadoop in the TSUBAME2.0 Supercomputer

    2010 ( 6 )   1 - 8   2011.4

    Language:Japanese

  • Optimization of FMM on CPU-GPU heterogeneous environment

    2010 ( 6 )   1 - 8   2011.4

    Language:Japanese

  • High Performance Large Data Transfer for Inter-Clouds

    2010 ( 6 )   1 - 7   2011.4

    Language:Japanese

  • Performance Evaluation of TSUBAME 2.0 Heterogeneous Supercomputer with Linpack Benchmark

    2010 ( 5 )   1 - 6   2011.2

    Language:Japanese

  • Optimization of electric power efficiency based on model in GPU

    2010 ( 2 )   1 - 6   2010.12

    Language:Japanese

  • Towards Characteristic-aware Optimization of OpenCL programs on Heterogeneous GPUs

    2010 ( 23 )   1 - 7   2010.12

    Language:Japanese

  • A Code Generation Framework for Stencil Computations on Large Scale GPU Clusters

    2010 ( 7 )   1 - 9   2010.12

    Language:Japanese

  • Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments

    2010 ( 3 )   1 - 7   2010.10

    Language:Japanese

  • Resource Federation for e-science by a Point-of-Presence

    TAKIZAWA Shin'ichiro, MATSUOKA Satoshi, SATO Hitoshi, HIGASHIDA Manabu, TOMOISHI Masahiko

    IEICE technical report. Internet Architecture   110 ( 206 )   19 - 24   2010.9

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers

    As an e-Science infrastructure, we propose a network environment where site resources are federated by a point-of-presence named RENKEI-PoP. RENKEI-PoPs are located at sites that provide resources for e-Science, are integrated with site-local resources, and relay communications between sites by cooperating with each other using a grid security infrastructure. RENKEI-PoP provides 1) a virtual machine hosting environment that executes e-Science infrastructure services and 2) a general-purpose data transfer/sharing environment. We installed RENKEI-PoPs at eight sites in Japan and connected them to the SINET 10Gbps network. We show the current RENKEI-PoP system and its network and storage access performance.

  • The total picture of TSUBAME 2.0

    Tsubame ESJ.   1   2 - 4   2010.9

     More details

    Language:Japanese  

    researchmap

  • POP (Point-of-Presence) Linkage between Computer Centers as an E-Science Infrastructure

    TAKIZAWA SHIN'ICHIRO, MATSUOKA SATOSHI, SATO HITOSHI, HIGASHIDA MANABU, TOMOISHI MASAHIKO, JITSUMOTO HIDEYUKI

    126   e1 - e8   2010.8

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Towards an Automatic Code Generation Framework for Parallel Stencil Computations on GPU Clusters

    NOMURA TATSUO, MARUYAMA NAOYA, ENDO TOSHIO, MATSUOKA SATOSHI

    126 ( 9 )   I1 - I10   2010.8

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments

    SHIRAHATA KOICHI, SATO HITOSHI, MATSUOKA SATOSHI

    126 ( 5 )   E1 - E8   2010.8

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • MPI-CUDA Applications Checkpointing

    TOAN Nguyen, NOMURA Tatsuo, JITSUMOTO Hideyuki, MARUYAMA Naoya, ENDO Toshio, MATSUOKA Satoshi

    2010 ( 18 )   1 - 7   2010.7

     More details

    Language:English  

    CiNii Books

    researchmap

  • Dynamic Optimization for Large Data Broadcast on Clouds

    3 ( 2 )   126 - 137   2010.6

     More details

    Language:Japanese  

    Data-intensive parallel applications on clouds need to deploy large data sets from the cloud's storage facility to all compute nodes as fast as possible. Many optimal broadcast algorithms have been proposed for cluster and grid environments. The most common approach is to construct one or more optimal spanning trees, which can maximize available bandwidth or avoid bottleneck links based on network topology and network monitoring data. When available bandwidth changes dynamically, however, it is difficult to maintain optimal performance. In this paper we focus on Amazon EC2/S3, one of the most commonly used clouds, and propose high-performance broadcast algorithms that make it possible to broadcast large data from Amazon S3 to multiple Amazon EC2 nodes. The salient features of our algorithms are that they construct an overlay network on clouds without network topology information, optimize the available throughput of each node dynamically, and increase download throughput by letting nodes cooperate with each other. As a result, all nodes can download files from S3 quickly, even when network performance changes while the algorithm is running. We evaluate our algorithms on EC2/S3 and show that they are scalable and consistently achieve high throughput. Both algorithms perform much better than having each node download all data directly from S3.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00069740/

  • Auto-Tuning of a Scientific Application on GPU clusters

    IPSJ SIG Notes   2009 ( 6 )   1 - 9   2010.4

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Auto-Tuning of a Scientific Application on GPU clusters

    WATANABE Yuya, ENDO Toshio, MATSUOKA Satoshi

    124 ( 18 )   R1 - R7   2010.2

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Power-Aware Task Scheduling on GPU Accelerated Clusters

    HAMANO TOMOAKI, NUKADA AKIRA, ENDO TOSHIO, MATSUOKA SATOSHI

    124 ( 17 )   Q1 - Q9   2010.2

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • A Resource Selection Support Expert System for Large-Scale Computing Environments

    KOKUBU RIO, SATO HITOSHI, MATSUOKA SATOSHI

    124 ( 12 )   L1 - L8   2010.2

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Linpack Evaluation on the TSUBAME Supercomputer with Hybrid Accelerators(<Special Topics>GPGPU Computing)

    Endo Toshio, Nukada Akira, Matsuoka Satoshi

    Bulletin of the Japan Society for Industrial and Applied Mathematics   20 ( 2 )   117 - 124   2010

     More details

    Language:Japanese   Publisher:The Japan Society for Industrial and Applied Mathematics  

    This paper reports Linpack benchmark evaluation on the TSUBAME supercomputer, a large scale hybrid supercomputer equipped with graphics processing units (GPUs) and ClearSpeed SIMD accelerators. With all of about 10,000 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87 TFlops. This paper also describes our design policy and tuning method that take characteristics of accelerators into account, which are essential to achieve scalability on hybrid supercomputers. The design is significantly different from that of LANL RoadRunner, a hybrid system equipped with Cell processors. We discuss the difference from the viewpoint of system architecture.

    DOI: 10.11540/bjsiam.20.2_117

    CiNii Books

    researchmap

  • Accelerating Large-Scale Data Access through Dynamic Virtual Machine Relocation

    SATO Kento, SATO Hitoshi, MATSUOKA Satoshi

    IPSJ Symposium Proceedings   2010 ( 5 )   2010

  • MapReduce Implementation on the TSUBAME Supercomputer

    SATO HITOSHI, KONISHI FUMIKAZU, YAMAMOTO YASUNORI, TAKAGI TOSHIHISA, MATSUOKA SATOSHI

    123   F1 - F7   2009.11

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Software Framework for GPU Memory Errors

    MARUYAMA NAOYA, NUKADA AKIRA, MATSUOKA SATOSHI

    123 ( 8 )   H1 - H6   2009.11

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • The Efficient Checkpoint based on Erasure Coding with Incremental Method

    JITSUMOTO HIDEYUKI, NAKAMURA SYUNSUKE, ENDO TOSHIO, MATSUOKA SATOSHI

    122 ( 9 )   I1 - I6   2009.10

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Linpack Tuning Method on a Heterogeneous Supercomputer with Hybrid Accelerators

    ENDO T.

    Proc. Summer United Workshops on Parallel, Distributed and Cooperative Processing, SWoPP2009, Sendai, Aug.   2009 ( 3 )   1 - 8   2009.10

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Correlative Analysis of Performance Counters and Power Consumption on GPUs

    2009 ( 3 )   1 - 5   2009.10

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Power-Performance Evaluation of Fault Tolerant Numerics on GPUs

    2009 ( 3 )   1 - 6   2009.10

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Auto-Tuning FFT Library for CUDA GPUs

    2 ( 3 )   107 - 115   2009.9

     More details

    Language:Japanese  

    NVIDIA CUDA capable GPUs have extremely high memory bandwidth which benefits memory intensive applications such as FFT. Already there are several implementations of FFT using CUDA but they are optimized for specific transform sizes like powers of two which are suitable for GPU architecture. In this paper, we present our auto-tuning method to generate high performance CUDA kernels for FFTs of varying transform sizes. The optimized kernels outperform not only NVIDIA CUFFT libraries but also many of existing implementations.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00066245/

  • A Study of MPI Communication on a Next Generation Optical Interconnect

    TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

    26 ( 3 )   5 - 19   2009.7

     More details

  • Towards Resource Management Considering User's Satisfaction in Large Distributed Computing Environments

    KOKUBU Rio, SATO Hitoshi, MATSUOKA Satoshi

    IEICE technical report   109 ( 168 )   19 - 24   2009.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Application users on large-scale distributed computing systems are forced to select resource parameters for effective application execution, which may degrade the usability of the systems for users who are not HPC experts. Expert systems, which recommend suitable resource selections by considering users' demands, address this situation; however, the demands of application users on production large-scale computing systems are not clear. To address this problem, we sampled actual users' demands for application executions on the TSUBAME system through questionnaire surveys. We then modeled application usage patterns from the surveys. We confirmed that the model is adequate for recommending resource selections.

    CiNii Books

    researchmap

  • HPC Application Performance Improvement by a Supplemental Optical Circuit Switching Network

    Shinichiro Takizawa, Toshio Endo, Satoshi Matsuoka

    IPSJ Transactions on Advanced Computing Systems   2 ( 2 )   110 - 121   2009.7

     More details

    Language:Japanese  

    For large-scale HPC systems that consist of many nodes, it will be infeasible to construct a fully connected network with high bisection bandwidth because of cost, power consumption, and similar constraints. We propose a hybrid network composed of an electronic packet switching (EPS) network with low bisection bandwidth and a high-bandwidth supplemental optical circuit switching (OCS) network, together with a communication method for this network. In this network, each node connects to the EPS network with one link, and a subset of the nodes also connects to the OCS network with one additional link. We assign optical pathways to pairs of nodes that are connected to the OCS network and are not under the same EPS switch, taking the application's communication pattern into account. We avoid contention on the EPS upstream network by letting these nodes relay messages from other nodes. Through simulations, we confirmed that our approach can improve the performance of applications that require high bisection bandwidth while connecting only half of the nodes to the OCS network. Moreover, the performance of all-to-all communication on our system was almost the same as that on a fat-tree EPS-only network.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00060776/

  • Acceleration of Himeno Benchmark on Multi-node GPU System by Overlapping Communication with Calculation : Over 700 GFLOPS of Sustained Performance is Achieved with 32 GPUs

    KATO Toshihiro, AOKI Takayuki, NUKADA Akira, ENDO Toshio, MATSUOKA Satoshi, HASEGAWA Atsushi

    120 ( 3 )   C1 - C6   2009.6

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Dynamic Estimation of Swap Cost for Reducing Memory Energy

    HOSOGAYA YUTO, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   182 ( 14 )   85 - 90   2009.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, the memory system has become one of the most power-consuming parts of high performance computers. This is mainly because computers are equipped with more DRAM capacity than applications actually need, so there is an opportunity to reduce power by decreasing the capacity. We have already proposed a system that uses flash memory as the swap device, and have shown that decreasing DRAM can reduce energy for some applications, even if it causes page swapping. In such systems, the best DRAM capacity, which achieves the lowest energy consumption, depends on the characteristics of the applications and on problem sizes, so finding that capacity is challenging. We propose an algorithm that monitors memory accesses while applications are running and optimizes the memory capacity dynamically. Our algorithm assumes that the capacity of the DRAM system can be controlled dynamically, and it estimates energy consumption for all selectable DRAM capacities. Through trace-driven simulation, we show that energy consumption can be reduced by 25% with a performance loss of 8%.

    CiNii Books

    researchmap

  • Migration Optimization Accounting for Similarity of Process Images

    YAMASAKI SHOHEI, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   182 ( 14 )   145 - 150   2009.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Demand for migrating large-scale jobs is growing on large-scale systems for several reasons. For example, jobs may be migrated to different machines to avoid machine maintenance or performance degradation. In many cases, the destination and timing should be determined dynamically. To reduce the migration costs of large-scale jobs, this work presents an optimization method that exploits similarity among the memory images of parallel processes. In addition to reducing the amount of communication, this method is highly scalable, since it computes the differences between images in parallel. Using this method, we evaluated migration costs on a real cluster in detail, with several problem sizes and numbers of nodes.

    CiNii Books

    researchmap

  • Accelerator Again, - Key for Super Computing -:Light and Shadow of Accelerator Technologies - Mainstream Devices Towards Next-Generation Petascale and Exascale HPC

    MATSUOKA Satoshi

    IPSJ Magazine   50 ( 2 )   95 - 99   2009.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00000017/

  • Performance Evaluation of Software-Based ECC for GPUs

    MARUYAMA NAOYA, NUKADA AKIRA, MATSUOKA SATOSHI

    IPSJ SIG Notes   2009 ( 14 )   25 - 30   2009

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    General-Purpose Processing on GPUs (GPGPU) has rapidly been recognized as a promising HPC technology because of GPUs' much higher peak floating-point performance. However, GPUs were originally developed for graphics applications, such as 3D games, where reliability is not considered as important an issue as in HPC communities. One notable example is the lack of ECC in graphics memory systems. To improve the reliability of GPUs for HPC applications, we propose a software-based technique to generate and check ECC for graphics memory. Our library-based approach allows CUDA-based GPGPU applications to be easily extended with ECC-based error checking with little manual intervention. To evaluate the applicability of our approach, we extended three CUDA applications with our ECC library: 3-D FFT, matrix multiplication, and an N-body problem. Our performance studies showed that while FFT and matrix multiplication can incur up to 300% overhead, the N-body application incurs only 15% overhead. These results suggest that software-based ECC would be a promising approach for computation-intensive applications such as N-body problems.

    CiNii Books

    researchmap

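  • HPC Application Performance Improvement by a Supplemental Optical Circuit Switching Network

    TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

    IPSJ Transactions on Advanced Computing Systems (ACS)   2 ( 2 )   110 - 121   2009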

     More details

  • High Performance 3-D FFT in CUDA Environment

    1 ( 2 )   231 - 239   2008.8

     More details

    Language:Japanese  

    The CUDA environment, which is supported by the latest NVIDIA GPUs, allows data sharing between threads through shared memory and also provides more flexible memory accesses. We propose a high-performance 3-D FFT algorithm for the CUDA environment. Using GeForce 8 series GPUs, we achieved up to 79.5 GFLOPS for 3-D FFT, which is 3.1 to 3.3 times the performance of the CUFFT library 1.1.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018177/

  • Creating Vital Information Technologies for the Info-plosion Era : Information Explosion Makes Information Systems Explode

    MATSUOKA Satoshi

    IPSJ Magazine   49 ( 8 )   904 - 911   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00061025/

  • Interoperability Testing of NAREGI Grid Middleware for Large-Scale Cooperation

    HIGASHIDA Manabu, TOMOISHI Masahiko, SAKANE Eisaku, SATO Hitoshi, YAMANASHI Takeshi, OOBA Junichi, KOBAYASHI Taizo, MIZUTANI Fumiyasu, YAMADA Kiyoshi, TSUDA Tomoko, KONO Takahisa, AIDA Kento, MATSUOKA Satoshi, AOYAGI Mutsumi, SHIMOJO Shinji

    IPSJ SIG Notes   109 ( 77 )   133 - 140   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Four information technology centers, at Osaka University, Tokyo Institute of Technology, Kyushu University and Nagoya University, deployed the NAREGI Grid Middleware on their supercomputing resources under real operational scenarios and tested its interoperability for nationwide large-scale cooperation with two NAREGI R&D centers: the National Institute of Informatics and the Institute for Molecular Science. We successfully demonstrated that we can form virtual organizations with certificates from multiple authorities and manage their grid resources, and that we can submit real grid applications to authorized computing resources with resource reservations, with coordinated execution across multiple meta-schedulers that independently issue reservation requests to what may be the same resource.

    CiNii Books

    researchmap

  • Optimization of MPI_Scatter/Gather Algorithm for Grid Environment

    CHIBA TATSUHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   116 ( 74 )   13 - 18   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Many collective algorithms have been proposed for grid environments. They enable us to construct optimized network topologies and to perform fast collective communications, but they are optimized under the assumption that the WAN is a low-bandwidth bottleneck. However, recent WANs have become much faster and many nodes in LANs are connected by high-speed networks, so the previous assumption no longer holds. In this paper, we propose multilane MPI_Scatter/Gather algorithms to effectively utilize the available WAN and LAN bandwidth. We assume that MPI systems use TCP/IP for low-level communication, and experiments on an emulated network environment show that the proposed multilane collective algorithms achieve higher performance than traditional methods.

    CiNii Books

    researchmap

  • HPC Performance Improvement by Supplementing a Small Optical Network

    TAKIZAWA SHIN'ICHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   116   67 - 72   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    For future peta-scale HPC systems, it will be infeasible to construct a fully connected network with high bisection bandwidth because of cost, power consumption, and similar constraints. We propose a network composed of an electronic packet switching (EPS) network with low bisection bandwidth and a high-bandwidth supplemental optical circuit switching (OCS) network, and a communication methodology for MPI applications in which messages are relayed from EPS to OCS and vice versa. In this network, a subset of nodes connected to the OCS network relays messages from other nodes under the same EPS switch to nodes under other EPS switches. From simulation results, we confirmed that our approach reduces execution time by up to 30% compared with an EPS-only network.

    CiNii Books

    researchmap

  • Model-based Optimization for Data-Intensive Applications on a Virtual Cluster

    SATO KENTO, SATO HITOSHI, MATSUOKA SATOSHI

    IPSJ SIG Notes   116 ( 74 )   25 - 30   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a model-based optimization algorithm that determines virtual machine migration strategies, i.e., which virtual machines should be migrated to which nodes, while minimizing I/O access costs, on the assumption that the network bandwidth between nodes and the order, sizes and locations of target files are given. Our algorithm models this problem as a directed acyclic graph, where a vertex represents the location of a virtual machine when target files are accessed, an edge represents a flow of data access that includes a virtual machine migration and a remote I/O access, and the edge weight represents the cost of data access; we solve this problem as a shortest path problem that minimizes the overall data access cost of the target file accesses. Our simulation-based studies suggest that the proposed algorithm can achieve higher performance than simple techniques, such as ones that never migrate virtual machines or always migrate virtual machines onto the nodes that hold the target files.

    CiNii Books

    researchmap
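
    The shortest-path formulation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `plan_migrations`, the cost model (transfer time = size / bandwidth, zero cost for local access), and all parameter names are assumptions made for illustration only.

```python
def plan_migrations(nodes, start, accesses, bandwidth, vm_size):
    """Choose where the VM should sit for each file access (hypothetical model).

    accesses:  list of (file_location, file_size) in access order.
    bandwidth: dict mapping (src, dst) to bytes/sec for distinct nodes.
    vm_size:   size of the VM image that must be moved on migration.
    Returns (total_cost, list of VM locations, one per access).
    """
    def transfer(src, dst, size):
        # Assumed cost model: local transfers are free, remote ones take size/bandwidth.
        return 0.0 if src == dst else size / bandwidth[(src, dst)]

    # best[v] = (cheapest cost of any path that leaves the VM on node v, that path).
    # Each access adds one layer to the DAG; edges join consecutive layers.
    best = {start: (0.0, [])}
    for file_loc, file_size in accesses:
        layer = {}
        for v in nodes:
            io_cost = transfer(file_loc, v, file_size)  # remote I/O if the VM is not at the file
            layer[v] = min(
                ((cost + transfer(u, v, vm_size) + io_cost, path + [v])
                 for u, (cost, path) in best.items()),
                key=lambda item: item[0],
            )
        best = layer
    return min(best.values(), key=lambda item: item[0])


if __name__ == "__main__":
    bw = {("A", "B"): 100.0, ("B", "A"): 100.0}
    print(plan_migrations(["A", "B"], "A", [("B", 1000), ("B", 1000)], bw, 100))
```

    Because every edge goes from one access to the next, a single layer-by-layer pass is sufficient; no general-purpose shortest-path routine is needed in this sketch.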

  • Parallel Numerical Computation on Multiple GPUs with Self Scheduling

    WATANABE YUYA, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   179 ( 75 )   85 - 90   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In the high performance computing area, commodity accelerators, especially GPUs, attract considerable attention for their superior cost performance. Systems with a large number of such accelerators are therefore promising. In such systems, incremental upgrades will make the accelerators heterogeneous. With the rapid advance of GPU performance, techniques to utilize heterogeneous GPUs effectively will become important. To achieve this goal without precise knowledge of GPU performance, we adopt the self-scheduling technique for dynamic task distribution. We take SGEMM, a dense matrix multiplication, as the target computation and evaluate its performance on a machine with multiple heterogeneous GPUs. The results show that self scheduling achieves 94% of the ideal speed, which is the sum of the individual GPU speeds.

    CiNii Books

    researchmap
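
    The self-scheduling idea in the abstract above, where each device takes the next task as soon as it becomes idle, can be sketched as a small simulation. This is an illustrative sketch only: the function names, the equal-sized tasks, and the fixed per-device speeds are assumptions, not details taken from the paper.

```python
import heapq


def self_schedule(num_tasks, task_work, speeds):
    """Simulate self scheduling: an idle worker always takes the next task.

    speeds[w] is the assumed work rate of worker w (work units per second).
    Returns (makespan, number of tasks executed by each worker).
    """
    # Min-heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    counts = [0] * len(speeds)
    makespan = 0.0
    for _ in range(num_tasks):
        now, w = heapq.heappop(idle)          # earliest idle worker gets the task
        finish = now + task_work / speeds[w]
        counts[w] += 1
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, w))
    return makespan, counts


def efficiency(num_tasks, task_work, speeds):
    """Ratio of achieved speed to the ideal speed (the sum of individual speeds)."""
    makespan, _ = self_schedule(num_tasks, task_work, speeds)
    ideal_time = num_tasks * task_work / sum(speeds)
    return ideal_time / makespan


if __name__ == "__main__":
    print(self_schedule(6, 1.0, [1.0, 2.0]))
    print(efficiency(6, 1.0, [1.0, 2.0]))
```

    No worker needs to know its own speed in advance: faster workers simply return to the queue more often, which is why the approach suits heterogeneous devices.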

  • Access-Pattern and Bandwidth Aware File Replication Algorithm for a Grid File System

    SATO HITOSHI, MATSUOKA SATOSHI, ENDO TOSHIO

    IPSJ SIG Notes   116 ( 74 )   211 - 216   2008.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose an automated replication algorithm for a grid file system that considers file access frequency and replica maintenance policy, and that allows most I/O accesses to be performed within given throughput and storage usage thresholds while simultaneously minimizing replica transfer time. Our algorithm models the replication problem as a combinatorial optimization problem, where the constraints are derived from the given throughput and storage usage thresholds and from various system parameters collected by direct file access monitoring. Our simulation-based studies suggest that the proposed algorithm can achieve higher performance than simple techniques, such as ones that always or never create replicas, while keeping storage usage very low. The results also indicate that the proposed algorithm can perform comparably with manual replica placement.

    CiNii Books

    researchmap

  • Power-Saving Task Scheduling on Heterogeneous Environment

    HAMANO Tomoaki, ENDO Toshio, MATSUOKA Satoshi

    IEICE technical report   108 ( 180 )   97 - 102   2008.8

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Recent accelerators such as GPUs, which were originally designed as graphics devices, and ClearSpeed SIMD accelerators achieve better cost-performance and performance-per-watt ratios, while their range of application is more limited than that of general CPUs. Clusters and supercomputers equipped with both accelerators and general CPUs are therefore becoming popular. To optimize power efficiency and throughput on such systems, we require (1) that each task is compiled so that it can be executed on either a CPU or an accelerator, and (2) that tasks are managed by a task scheduler that is aware of energy consumption. Assuming that the former is realized, we describe a modeling method for heterogeneous cluster systems. We also propose a task scheduling method that considers the properties of each task, and evaluate it by simulation.

    CiNii Books

    researchmap

  • Realization and Evaluation of Fault Tolerance for GPU Memory Using Software ECC

    MARUYAMA Naoya, MATSUOKA Satoshi, OGATA Yasuhiko, NUKADA Akira, ENDO Toshio

    IEICE technical report. DC, Dependable Computing   108 ( 181 )   9 - 15   2008.8

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

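    GPGPU, the use of GPUs for HPC, is attracting attention because of the high floating-point performance of GPUs. However, GPUs were originally developed for graphics, and their fault tolerance is insufficient in some respects for HPC use. One such respect is the detection and correction of memory errors. No current GPU is equipped with ECC, so GPUs are less reliable than typical HPC compute nodes. To improve the reliability of GPUs, we propose a technique that detects and corrects memory errors in software. The technique adds code that computes and checks ECC to GPGPU applications, and thereby detects and corrects errors such as bit flips in graphics memory. We implemented the proposed technique as a library for CUDA, NVIDIA's extension of the C language, and applied it to matrix multiplication and an N-body application. Using both applications, we investigated the performance overhead of the ECC computation and found an overhead of up to about 300% for matrix multiplication and about 15% for the N-body problem. This confirms that the technique is feasible with relatively small overhead for applications, such as the N-body problem, that perform a large amount of computation relative to their memory access frequency.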

    CiNii Books

    researchmap

  • An Efficient, Model-based CPU-GPU Heterogeneous FFT Library

    1 ( 1 )   40 - 50   2008.6

     More details

    Language:Japanese  

    General Purpose computing on Graphics Processing Units (GPGPU) is becoming popular in HPC because of its high peak performance. However, in spite of the potential performance improvements, a GPU might not necessarily perform better than current high-performance CPUs, especially given the recent trend toward more cores on a single die. This is because GPU performance can be severely limited by restrictions such as memory size and I/O bandwidth. For this reason, we can expect performance to improve when the CPU and GPU are used simultaneously. In heterogeneous environments, we need to find the optimal load distribution ratio. We implement a 2D-FFT library that uses heterogeneous CPU-GPU computing resources. To find optimal load distribution ratios, we construct a performance model that predicts the execution time of 2D-FFT and captures the respective contributions of the CPU and the GPU. The model parameters are determined by pre-stage performance profiling; based on this, we predict the overall execution time of 2D-FFT for arbitrary problem sizes and load distributions. Preliminary evaluation shows that the performance model can predict the execution time of problem sizes that are 16 times as large as the profile runs with less than 15% error, and that the predicted optimal load distribution ratios have less than 5% error; the performance overhead caused by this error is less than 1%. We show that the resulting performance improvement from such heterogeneous parallelization can be 1.19 to 1.55 times compared with using only a CPU core or a GPU.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018188/

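  • Performance Evaluation of a Storage System for Processing and Archiving Satellite Observation Data

    TANIMURA Yusuke, YAMAMOTO Naotaka, ISHIBASHI Takuya, TANAKA Yoshio, NISHIKAWA Takeshi, MATSUOKA Satoshi, SEKIGUCHI Satoshi

    IPSJ Symposium Proceedings   2008 ( 5 )   27 - 28   2008.6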

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • Intelligent data staging with overlapped execution of grid applications

    Yuya Machida, Shin'ichiro Takizawa, Hidemoto Nakada, Satoshi Matsuoka

    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE   24 ( 5 )   425 - 433   2008.5

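  • Proposal of a Fault-Tolerant MPI Framework for the Information Explosion

    JITSUMOTO Hideyuki, ENDO Toshio, MATSUOKA Satoshi

    Proceedings of the National Convention   70 ( 0 )   133 - 134   2008.3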

     More details

    Language:Japanese  

    CiNii Books

    researchmap

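  • Fast Virtual Cluster Construction by Model-Based Resource Selection in the Information Explosion Era

    YAMASAKI Shohei, MARUYAMA Naoya, MATSUOKA Satoshi

    Proceedings of the National Convention   70 ( 0 )   119 - 120   2008.3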

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Autonomic, Scalable Fault Localization for the Information Explosion Era

    MARUYAMA Naoya, MATSUOKA Satoshi

    70 ( 0 )   127 - 128   2008.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

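  • Large-Scale Data Management on a Grid File System in the Information Explosion Era

    SATO Hitoshi, MATSUOKA Satoshi, ENDO Toshio

    Proceedings of the National Convention   70 ( 0 )   121 - 122   2008.3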

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • MPI Communication Algorithm over an Optical Interconnect for the Information Explosion Era

    TAKIZAWA Shin'ichiro, ENDO Toshio, MATSUOKA Satoshi

    70 ( 0 )   137 - 138   2008.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Supercomputing on Heterogeneous Architecture toward the Information Explosion Era

    ENDO Toshio, MATSUOKA Satoshi

    70 ( 0 )   131 - 132   2008.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Our experiences at TSUBAME Grid Cluster : Managing the super computer for the information explosion era

    NISHIKAWA Takeshi, MATSUOKA Satoshi

    70 ( 0 )   129 - 130   2008.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Optimization for MPI Collective Operations on Grid Utilizing Multilane Transfer

    CHIBA Tatsuhiro, ENDO Toshio, MATSUOKA Satoshi

    70 ( 0 )   135 - 136   2008.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Environmental-Aware Optimization of MPI Checkpointing Intervals Reviewed

    Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka

    2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING   326 - 329   2008

     More details

  • Towards Performance Modeling and Optimization of Data-Intensive Application Execution Environments Using Virtual Clusters

    SATO Kento, SATO Hitoshi, MATSUOKA Satoshi

    IPSJ Symposium Proceedings   2008 ( 5 )   2008

  • Building A Large-Scale Storage System Using Sun Fire X4500 and Gfarm

    TANIMURA YUSUKE, YAMAMOTO NAOTAKA, ISHIBASHI TAKUYA, TANAKA YOSHIO, NISHIKAWA TAKESHI, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   113 ( 122(HPC-113) )   1 - 6   2007.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Sun Fire X4500 server integrates a four-way x86-64 server and 24TB of storage, which may deliver remarkable benefits to large-scale data processing applications. In particular, the server architecture is expected to match the data processing model of Gfarm, which uses file-affinity scheduling. In this paper, we discuss system integration for building a storage system using 20 X4500 nodes and Gfarm. The configuration of the ZFS/RAID-Z storage pool or UFS is determined so that Gfarm achieves high performance in I/O throughput and metadata operations. Based on the discussions and preliminary experiments, a storage system with a capacity of 256.5TB was constructed and its basic performance was measured with benchmarks. The results indicate several issues in building a petabyte-scale storage system with such an architecture.

    CiNii Books

    J-GLOBAL

    researchmap

  • Evaluation of the issue of time stamps scalability of the distributed time stamping authority grid on the Internet

    NISHIKAWA Takeshi, MATSUOKA Satoshi

    IPSJ SIG Notes   112 ( 88 )   1 - 5   2007.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We previously proposed a distributed time stamping scheme called "K=L+M among N for G generation" that solves the scalability problem exhibited by both a centralized TSA (Time Stamping Authority) and earlier distributed time stamping schemes, and we implemented it and tested its viability by issuing one million time stamps per second on a low-latency, high-bandwidth LAN testbed. To verify the global scalability of our approach, we installed distributed time stamping units (TSUs) at various locations on the Internet with varying access characteristics, such as the NTT East B-Flets network (a regional shared 100 Mbps best-effort optical service) and a European WiFi Internet service provider network. A realistic operational experiment showed that the distributed time stamping grid system scales well: a sufficient number of TSUs distributed across the Internet can issue one million time stamps per second even in the presence of unpredictable network delays and response delays caused by Java VM garbage collection, just as in the LAN environment.

    CiNii Books

    researchmap

  • Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation

    CEVAHIR ALI, MATSUOKA SATOSHI

    IPSJ SIG Notes   2007 ( 88 )   19 - 24   2007.9

     More details

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

    The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation consists of repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph partitioning techniques are widely used for efficient parallelization of matrix-vector multiplications. These techniques suffer from high preprocessing overhead for the PageRank algorithm. In this work, we propose Web-site-based partitioning techniques to reduce the preprocessing overhead of parallel PageRank computation.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00028745/

  • High-Performance Distributed Solar Computing (?) : Towards a Grid that Computes like Trees

    MATSUOKA Satoshi

    IPSJ SIG Notes   112 ( 88 )   61 - 66   2007.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Power-heat dissipation as well as the associated CO_2 emission are becoming serious bottlenecks in scaling of large supercomputers. Indeed a single day's operation of TSUBAME, the fastest supercomputer in Asia-Pacific circa 2007, incurs as much CO_2 emission as an entire Formula-1 race. Instead, the use of photovoltaic power generation is promising to minimize or eliminate the emission altogether. While the traditional methods would incur simple attachment to a power grid, and involve very little effect or merit from grid computing, we actually claim that grids and distributed power generation go hand-in-hand to create a robust and self-sustainable computing infrastructure that could scale to TSUBAME-class applications. For robust operation as a pragmatic operational infrastructure, much continuing research would be required customizing and integrating the results from P2P, autonomic computing, sensor networks, etc.

    CiNii Books

    researchmap

  • Evaluation of MPI Communication Performance on Next Generation Optical Interconnect

    24   1 - 11   2007.9

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Distributed Time-stamping Authority Grid and Analysis of Parameter Dependencies

    NISHIKAWA TAKESHI, MATSUOKA SATOSHI

    48 ( 13 )   117 - 126   2007.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Digital time stamping is a technique to prove the existence of digital data prior to a specific point in time. The centralized time-stamping scheme, which is the mainstream at present, cannot withstand a concentration of numerous time-stamping requests. The centralized scheme is therefore vulnerable to distributed DoS (DDoS) attacks. Distributed time stamping schemes have been proposed to solve performance scalability problems such as tolerance to DDoS attacks. They still have high cost problems, which are caused by the use of an atomic clock and by audits by a trusted third party. In this paper, we define a reliable, high-performance, robust, and inexpensive distributed time stamping scheme. It is named "TSA Grid" with the (N,K = L+M,G) scheme, and it is based on a network of peer-to-peer time-stamping programs managed by administratively independent entities. It solves the cost problem of previously proposed distributed time stamping schemes. In the (N,K = L+M,G) scheme, one time stamp request propagates for G generations to N Time Stamping Units (TSUs). In each generation, L time stamps are returned by reliable TSUs and M time stamps are returned by randomly chosen TSUs. The G and L parameters enable us to predict the authorized time of time-stamping. They also enable TSUs to audit one another mutually and automatically. We also investigate the basic characteristics of the parameter dependencies of the TSA Grid.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018232/

  • Experiments of Distributed Time Stamping Grid on the Internet

    NISHIKAWA Takeshi, MATSUOKA Satoshi

    IEICE technical report   107 ( 175 )   61 - 64   2007.8

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We have proposed a distributed time stamping scheme named K=L+M among N for G generation, which solved the problems of a single-point TSA and of other distributed time stamping schemes. We have not only proposed the scheme but also implemented it. We have shown that issuing one million time stamps per second is possible in a LAN environment with low latency and high bandwidth. In this work, time stamping units (TSUs) were installed on the NTT East B-flets network, on a European WiFi Internet service provider network, and elsewhere, and an operational experiment of the distributed time stamping grid system on the Internet was carried out. The results show that a sufficient number of time-stamping units on the Internet makes it possible to issue one million time stamps per second.

    CiNii Books

    researchmap

  • Modeling of Virtual Cluster Construction Time and Its Optimization

    YAMASAKI Shohei, MARUYAMA Naoya, MATSUOKA Satoshi

    IEICE technical report   107 ( 175 )   65 - 70   2007.8

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    When constructing virtual clusters on grids, randomly selecting nodes from the available computing resources can incur a large overhead in installation time. This is because the installation time of each node can vary greatly in heterogeneous grid environments, and the total installation time of a virtual cluster is determined by the slowest node. To achieve fast virtual cluster installation, we propose a model-based resource selection policy that chooses a near-optimal combination of nodes to assemble each cluster. We divide the VM setup process into five steps and generate a model for each step. The time of each step is represented as a linear combination of CPU frequency, disk performance, and the size of the packages to install. Experiments using our virtual cluster installer VPC show that the model-based selection policy is indeed effective, especially when the size of the packages to install differs from site to site. The proposed policy has been shown to reduce the installation time by up to 68% compared to the most naive method, 60% compared to the method considering only CPU frequency, and 58% compared to the method considering only disk performance.

    CiNii Books

    researchmap

  • Analysis of MPI Applications over Next Generation Optical Interconnect

    TAKIZAWA SHIN'ICHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   111   183 - 188   2007.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    For future systems with tens of thousands of processors, it is difficult to construct interconnects that fully connect all nodes with high bandwidth, due to cost and power consumption. We propose a network that utilizes both a fully-connected low-bandwidth electronic packet-switched network and an optical circuit-switched network. The optical network is used supplementally, only when a node communicates with nodes under other packet switches. An MPI application runs in this environment in such a manner that processes connected to optical circuits forward other processes' messages that cross packet switches, in accordance with a topology constructed from the communication pattern. Our evaluations show that our proposal achieves a lower inter-process distance than the electronic network.

    CiNii Books

    researchmap

  • Node Grouping for Large-Scale Data Management on the Grid

    SATO HITOSHI, MATSUOKA SATOSHI, ENDO TOSHIO

    IPSJ SIG Notes   111 ( 80 )   109 - 114   2007.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In parallel computing environments such as HPC clusters and the Grid, data-intensive applications incur large overhead due to access concentration on files on commonly shared nodes. A grid filesystem with an automatic data management mechanism is one solution to avoid such performance degradation. However, the metrics needed to achieve efficient large-scale data management are not clear for a given real grid environment. We federated 5 geographically distributed HPC clusters using a grid filesystem and measured various performance metrics of file access on the filesystem. We observed that, although remote file access performance is affected by inter-node bandwidth, other factors are also in play, which makes it difficult to predict performance solely from limited inter-node information such as RTT or network bandwidth, and that even for local file access, performance can differ by an order of magnitude depending on file access patterns due to access contention.

    CiNii Books

    researchmap

  • Proposal and Evaluation of a FFT Library That Uses CPU and GPU Together

    OGATA YASUHIKO, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   111 ( 80 )   13 - 18   2007.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    General purpose computation on graphics processing units (GPGPU) is becoming popular in the HPC field, in expectation of the excellent peak performance of GPUs. Their effective performance is, however, not so far from that of recent multi-core CPUs. Therefore we can expect to improve performance by using GPUs and CPUs cooperatively. One of the key challenges in such heterogeneous environments is to determine the optimal load balancing ratio among processors. It depends not only on the characteristics of the target computation and processors, but also on problem sizes. Our approach is to construct a performance prediction model that covers the computational cost and data transfer cost of the target computation. We train the model with a small number of test runs to determine model parameters. Then we use the model to obtain the optimal load balancing ratio for arbitrary problem sizes. Following this approach, we have implemented a two-dimensional FFT library for heterogeneous environments and constructed its performance model. We have evaluated the accuracy of our model by comparing predicted and real performance on a heterogeneous system with a GeForce8800GTX GPU and a Core2Duo CPU. After training the model with test runs of 512^2 FFT, we have evaluated larger (up to 8192^2) problem sizes. The results show that our model succeeds in predicting the optimal load balancing ratio within 5% accuracy, while prediction errors in execution time are 15% or less.

    CiNii Books

    researchmap

  • Evaluation of Power Saving of Parallel Applications with Next Generation Low Power Memory

    HOSOGAYA YUTO, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   174   49 - 54   2007.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    With the increasing demand for low power high performance computing, reducing the power of not only CPUs but also memories is becoming important. In typical HPC environments a large capacity of DRAM is installed to avoid memory swapping, although not all of the memory is used in many cases. Since DRAM is a volatile memory, such unused memory can waste a significant amount of power even in a standby state. We propose a next generation low power system that intends to reduce the DRAM capacity without causing application performance degradation. In this system, MRAM and DRAM are used as main memory, while FLASH is used as swap. Our profile-based paging algorithm optimizes memory accesses by avoiding I/O with slower memories and using faster memories as much as possible. Results from our simulation of parallel applications show that the power consumption can be reduced up to one third, with 30% performance loss.

    CiNii Books

    researchmap

  • Evaluation of I/O Performance of IP-SAN on Cluster System using Parallel Benchmark

    KAMISAKA KIKUKO, YAMAGUCHI SANEYASU, OGUCHI MASATO, MATSUOKA SATOSHI

    IPSJ SIG Notes   111 ( 80 )   225 - 230   2007.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In supercomputing and large-scale HPC clusters, cluster systems that integrate the interconnection networks of computing nodes and storage are beginning to be realized. Such cluster systems simplify network composition and reduce its costs. However, it has not been clarified how the integration affects the total performance of the system. In this paper, as one such integrated network, a cluster system connected with IP-SAN is evaluated using a parallel benchmark with I/O. As a result, the performance of the IP-SAN integrated cluster is about the same as that of a cluster using local storage. According to the result, the bottleneck of the entire system's performance should be parallel processing and/or I/O processing of the storage, rather than data transfer processing of the network storage.

    CiNii Books

    researchmap

  • Performance Evaluation of TSUBAME Heterogeneous Supercomputer with Linpack

    ENDO TOSHIO, MATSUOKA SATOSHI, HASHIZUME NOBUAKI, NAGASAKA MASAMICHI

    48 ( 8 )   62 - 70   2007.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The TSUBAME supercomputer is a heterogeneous large-scale cluster system, which is equipped with 10480 Opteron CPU cores on 655 nodes and 360 ClearSpeed SIMD accelerator boards. This paper describes techniques to run HPL, which is a parallel Linpack implementation, on the TSUBAME system efficiently, and evaluates the performance. The techniques include sharing heterogeneous computing resources among fine grained processes, and using asynchronous communications. Through the evaluation of the system with the modified HPL, we have observed 47.38TFlops, which is the world's fastest Linpack performance on heterogeneous systems. The result of this work shows that heterogeneous supercomputers, which are expected to be much more popular in the near future, are promising for large scale parallel computations.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018254/

  • MPI Collective Operations Algorithm by Using Multi-lane for Grid Environment

    CHIBA TATSUHIRO, ENDO TOSHIO, MATSUOKA SATOSHI

    48 ( 8 )   104 - 113   2007.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The performance of MPI collective operations, such as broadcast and reduction, is heavily affected by network topologies, especially in grid environments. Many techniques to construct efficient broadcast trees have been proposed for grids. On the other hand, recent high performance computing nodes are often equipped with multi-lane network interface cards (NICs), which most previous collective communication methods fail to harness effectively. Our new broadcast algorithm for grid environments harnesses almost all downward and upward bandwidths of multi-lane NICs; a message to be broadcast is split into two pieces, which are broadcast along two independent binary trees in a pipelined fashion, and swapped between both trees. The salient feature of our algorithm is generality; it works effectively on both large clusters and grid environments. It can also be applied to nodes with a single NIC, by making multiple sockets share the NIC. Experiments on an emulated network environment show that we achieve higher performance than traditional methods, regardless of network topologies or message sizes.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018258/

  • Creating Informatics - What the KAKEN Project is Aiming at : How Does ICT Affect on Progress of Science ?

    SHIMOJO Shinji, NOZAKI Kazunori, MATSUOKA Satoshi

    IPSJ Magazine   48 ( 5 )   521 - 526   2007.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00065970/

  • Building Time-Stamp Authority Grid and Basic Experiment

    NISHIKAWA Takeshi, MATSUOKA Satoshi

    IEICE technical report   107 ( 16 )   13 - 18   2007.4

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Digital time stamping is a technique to prove the existence of digital data prior to a specific point in time. A centralized time-stamping authority has two major difficulties: one is administrative cost and the other is performance scalability. Distributed time-stamping methods were proposed to solve these problems, but they still require many TSAs to be prepared. We have shown that the K=L+M among N for G-generation method is able to solve these existing problems. In that method, many time-stamping units time-stamp one another mutually. We have also reported the basic characteristics of the method and its dependence on configuration parameters in a computer cluster environment on a LAN. In this report, we describe how we built a TSA Grid on the Internet by installing several distributed TSUs, and we investigate the influence of network delay on the authorized time. We also consider how many time-stamping units are needed for the arithmetic mean of the responding authorized times to fall within 1 second. As a result, we found that with more than 256 TSUs the arithmetic mean of the authorized time falls within 1 second. It also became clear that the mode yields a smaller delay than the arithmetic mean.

    CiNii Books

    researchmap

  • Job invocation interoperability between NAREGI middleware Beta and gLite

    NAKADA HIDEMOTO, SATO HITOSHI, SAGA KAZUSHIGE, HATANAKA MASAYUKI, SAEKI YUJI, MATSUOKA SATOSHI

    IPSJ SIG Notes   2007 ( 17 )   269 - 274   2007.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    As grid middleware matures, interoperation among middleware stacks is becoming more important. GIN (Grid Interoperation Now) is a community group in the OGF (Open Grid Forum), a standardization body for grid-related technologies, which aims to establish interoperation technologies among several grid middleware systems. We performed experiments on interoperation between NAREGI Middleware beta and EGEE gLite as one of the contributions to the group. For the experiments, we implemented several modules to enable information exchange and mutual job submission. As a result of the experiments, we confirmed the following: 1) in the security layer, such as certificate and virtual organization management, there is no essential difference between them; 2) while the information services differ substantially, the resource information can be translated to enable information exchange; 3) jobs can be mutually submitted based on the exchanged information.

    CiNii Books

    J-GLOBAL

    researchmap

  • ABARIS: an adaptable fault detection/recovery component framework for MPIs

    JITSUMOTO HIDEYUKI, ENDO TOSHIO, MATSUOKA SATOSHI

    IPSJ SIG Notes   2007 ( 17 )   163 - 168   2007.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Long-running MPI applications on clusters and grids that are prone to node and network failures motivate the use of fault-tolerant MPI implementations. However, previous fault-tolerant MPIs lack the ability to let the user easily choose appropriate fault recovery strategies according to the execution environment, independent of the application code. ABARIS is our new fault/recovery-model-aware component framework for MPI, where users can customize MPI fault detection and recovery algorithms according to their application and execution environment requirements by merely selecting appropriate fault/recovery components, independent of the application code. Currently, the ABARIS framework prototype is implemented on top of MPICH-P4MPD. Preliminary evaluation of the prototype using NPB on our MPI fault simulator demonstrates that the overhead compared to the original MPICH-P4MPD is almost negligible (less than 1%) under normal execution, and that when faults occur, appropriate selection and pairing of fault model and recovery method components for the execution environment has a significant effect on the overall execution time.

    CiNii Books

    researchmap

  • Performance evaluation of a cache-based virtual cluster installation method

    NISHIMURA HIDEO, MARUYAMA NAOYA, MATSUOKA SATOSHI

    IPSJ SIG Notes   2007 ( 17 )   121 - 126   2007.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, clusters of virtual machines, called virtual clusters, have been proposed as a means to share Grid resources efficiently. Such virtual cluster construction should be not only customizable at a fine grain but also fast and scalable. However, existing methods have not fulfilled these requirements. We have been proposing a novel virtual cluster installation system which is fast, scalable, and fully customizable in cooperation with existing cluster installer tools. To achieve efficiency in the presence of such full customization, it automatically caches frequently constructed virtual disk images to save software installation time in common cases. On broader environments, our experimental studies show that the average installation time could be reduced by approximately 66.7% after creation of cache images, and that installation of a 204-node virtual cluster can be done in 40 seconds with our prototype implementation. From the result along with a scalability study, we estimate that installation of a 1000-node virtual cluster could be done in several tens of seconds.

    CiNii Books

    researchmap

  • Multi-site MPI execution with virtual cluster

    TATEZONO MASAKI, MARUYAMA NAOYA, MATSUOKA SATOSHI

    IPSJ SIG Notes   2007 ( 17 )   115 - 120   2007.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, large-scale MPI applications require a large amount of computational resources. We propose an MPI execution environment on geographically distributed computational resources using virtual clusters, and we confirm through experiments on a virtual cluster prototype that our proposal is feasible depending on the application characteristics. Moreover, virtual clusters make dynamic relocation of virtual nodes possible. Using this feature, we propose a system that automatically relocates virtual nodes, including their MPI processes, to suitable resources, based on MPI application characteristics and resource usage. We confirm our approach in an experiment on the prototype, and find that the amount of cross-site communication gives an indication of the feasibility of cross-site MPI execution.

    CiNii Books

    researchmap

  • High Performance Distributed Time-Stamping Authority : How to Issue Millions Time-Stamp

    NISHIKAWA TAKESHI, MATSUOKA SATOSHI

    IPSJ SIG Notes   2007 ( 17 )   221 - 226   2007

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Time stamping is a technique to prove the existence of digital data prior to a specific point in time. The centralized time-stamping scheme, which is the mainstream at present, cannot withstand a concentration of numerous time-stamping requests. The centralized scheme is therefore vulnerable to distributed DoS (DDoS) attacks. It also has a high cost problem, caused by the use of an expensive time source such as an atomic clock. We solved these problems by developing a distributed time stamping scheme. In this report, we investigate an implementation and parameter configuration that make one million time stamps per second possible.

    CiNii Books

    researchmap

  • Outil autonome de surveillance de grilles

    Laurent Baduel, Satoshi Matsuoka

    Revue de l'Ingenierie des Systemes d'Information   12 ( 3 )   85 - 104   2007

     More details

  • Flight of TSUBAME (Extended Abstract)

    MATSUOKA Satoshi

    IEICE technical report. Computer systems   106 ( 287 )   33 - 36   2006.10

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Books

    researchmap

  • Next Generation High Performance Computing Systems and Aspect of Ultimate System

    TANABE NOBORU, IKEI Mitsuru, ENDO Toshio, MATSUOKA Satoshi, HATAZAKI Takao, SUMIMOTO SHINJI

    IEICE technical report. Computer systems   106 ( 287 )   49 - 49   2006.10

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    System Area Networks (SANs), which realize high-bandwidth and low-latency communication, have become widely used as the interconnection networks of PC clusters. This panel discusses some directions toward next-generation computer systems and the ultimate computer system, including advanced hardware and software technologies for high performance computing.

    CiNii Books

    researchmap

  • ORE Grid: A Virtual-machine Based Fast Deployment Tool for Grid Execution Environment

    TAKAMIYA YASUHITO, YAMAGATA IKUHEI, AOKI TAKAFUMI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    47 ( 12 )   229 - 239   2006.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    With the increased variety of jobs executed on the Grid, the execution environments requested by such jobs, such as OSes, software, and libraries, have become increasingly diversified. However, it is difficult for grid users to acquire the environment suited to each job because job execution environments on the grid are strongly tied to local administration policies. Recently proposed solutions may achieve virtualization of the execution environment at a certain level, but are still incomplete in that construction of execution environments again requires manual operations and/or expert knowledge of the underlying systems. Instead, we propose a system called ORE (Open Resource Environment) Grid, which automatically and dynamically builds an exclusive execution environment for each submitted job. Moreover, the GUI setup front-end offers succinct methods to pick the necessary features and generate an execution environment description automatically, instead of resorting to tool-dependent VM description forms such as shell scripts or DAG descriptions. Our experience has shown that setup of 16 VM nodes takes only 151 seconds, and that the setup cost is well within an allowable range compared to the accumulated running time of general Grid jobs (several hours to several days).

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018304/

  • Profile-based Optimization of Power Performance by Using Dynamic Voltage Scaling on a PC Cluster

    HOTTA YOSHIHIKO, SATO MITSUHISA, KIMURA HIDEAKI, MATSUOKA SATOSHI, BOKU TAISUKE, TAKAHASHI DAISUKE

    47 ( 12 )   272 - 284   2006.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Currently, several of the high performance processors used in PC clusters have a DVS (Dynamic Voltage Scaling) architecture that can dynamically scale processor voltage and frequency. Adaptive scheduling of the voltage and frequency enables us to reduce power dissipation without a performance slowdown during communication and memory access. In this paper, we propose a method of profile-based power-performance optimization by DVS scheduling in a high-performance PC cluster. We divide the program execution into several regions and select the best gear (combination of clock frequency and voltage) for power efficiency. Selecting the best gear is not straightforward since DVFS transitions incur overhead. We propose an optimization algorithm that selects a gear using the execution and power profile while taking the transition overhead into account. We have designed and built a power-profiling system, PowerWatch. With this system we examined the effectiveness of our optimization algorithm on two types of power-scalable clusters (Crusoe and Turion). According to the results of benchmark tests, we achieved almost 30% reduction in terms of EDP (energy-delay product) without performance impact (less than 5%) compared to results using the standard clock frequency.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018308/

  • B-12-10 Demonstration of Collective Communication in Grid Computing over OBS Network

    ONO Takashi, TAKADA Atsushi, KOGA Masafumi, TAKIZAWA Shin'ichiro, MATSUOKA Satoshi

    Proceedings of the Society Conference of IEICE   2006 ( 2 )   296 - 296   2006.9

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Books

    researchmap

  • Construction and Operation of the Grid Challenge Testbed

    AIDA KENTO, OSAWA KIYOSHI, OSUMI TOMOTAKA, KASAI TAKEFUMI, ONO ISAO, JITSUMOTO HIDEYUKI, MATSUOKA SATOSHI, SAITO HIDEO, ENDO TOSHIO, YOKOYAMA DAISAKU, TAURA KENJIRO, CHIKAYAMA TAKASHI, TANAKA YOSHIO, SHIMOSAKA HISASHI, KAJIWARA HIROKI, HIROYASU TOMOYUKI, FUJISAWA KATSUKI

    IPSJ SIG Notes   107 ( 87 )   49 - 54   2006.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper presents a case study to operate the Grid testbed for the Grid Challenge in SACSIS2006. The Grid Challenge is a programming competition on a Grid testbed, which is organized by multiple computing resources installed in universities and laboratories. In the last competition, the Grid testbed with more than 1200 CPUs was operated. The paper shows hardware/software specifications of the Grid testbed, and reports experience of the operation, which includes accounting, job management, and troubleshooting.

    CiNii Books

    researchmap

  • Flight of TSUBAME: Construction of `Supercomputer for Everyone' toward Petascale

    IPSJ SIG Notes   107 ( 87 )   37 - 42   2006.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Large Scale Distributed Time-Stamp Authority : Its design, implementation and performance evaluation

    NISHIKAWA Takeshi, MATSUOKA Satoshi

    IEICE technical report   106 ( 199 )   25 - 30   2006.8

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Time stamping is a technique to prove the existence of digital data prior to a specific point in time. The centralized time-stamping scheme, which is the mainstream at present, cannot withstand a concentration of numerous time-stamping requests. The centralized scheme is therefore vulnerable to distributed DoS (DDoS) attacks. It also has a high cost problem, caused by the use of an expensive time source such as an atomic clock. In this report, we define a reliable, high-performance, robust, and inexpensive distributed time stamping scheme. This scheme is based on a network of peer-to-peer time-stamping programs managed by administratively independent entities. It solves both the DDoS tolerance problem and the cost problem.

    CiNii Books

    researchmap

  • Performance Evaluation of TSUBAME Heterogeneous Supercomputer with Linpack

    ENDO TOSHIO, MATSUOKA SATOSHI, HASHIZUME NOBUAKI, NAGASAKA MASAMICHI, GOTO KAZUSHIGE

    IPSJ SIG Notes   107 ( 87 )   43 - 48   2006.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The TSUBAME supercomputer is a heterogeneous large-scale cluster system, which is equipped with 10480 Opteron CPU cores on 655 nodes and 360 ClearSpeed SIMD accelerator boards. The TSUBAME system has achieved 38.18TFlops with Linpack benchmark and is ranked 7th in the Top500 supercomputer ranking in June 2006, even though the measurement is done without any accelerator boards. This paper discusses issues to obtain high Linpack performance on heterogeneous systems with general purpose processors and accelerators, and describes solutions. Through preliminary experiments with 256 CPU cores on sixteen nodes, we observed +8.2% speed-up when eight accelerators are added, and +19% with sixteen accelerators.

    CiNii Books

    researchmap

  • Virtual Cluster with Virtual Machines and Virtual Network

    NISHIMURA HIDEO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   107 ( 87 )   73 - 78   2006.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, virtual clusters constructed from virtual machines and virtual networks have attracted attention as a technique for hiding the heterogeneity of the grid environment. To construct a suitable virtual cluster, a VM image containing the requested environment must be distributed to the physical computing resources. However, the transfer time of VM images generally cannot be ignored, since their sizes range from 100 MBytes to several GBytes. Existing research has proposed comparatively fast virtual cluster construction systems, but these limit the available execution environments. We therefore propose a virtual cluster construction system that builds the environment the user wants dynamically and at high speed. The system automatically generates cache images that contain frequently used package compositions. Moreover, by estimating the construction time beforehand and using the cache, we confirmed that the construction time was shortened from about 100 seconds to about 75 seconds, and obtained an indication of how to achieve further speed-up.

    CiNii Books

    researchmap

  • The Proposal and Evaluation of Cuckoo FTMPI : Framework of Fault/Recovery model aware Component-based FTMPI

    JITSUMOTO Hideyuki, MATSUOKA Satoshi

    IEICE technical report   106 ( 198 )   73 - 78   2006.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Execution of MPI applications on clusters and Grid deployments that suffer from node and network failures motivates the use of fault-tolerant MPI implementations, and several have been implemented. However, these implementations cannot easily select an appropriate recovery method according to the environment. We present Cuckoo FTMPI, a Fault/Recovery model aware component framework. Users can obtain an MPI implementation suited to their execution environment by selecting components. This paper presents the architecture of Cuckoo FTMPI, its theoretical foundation, and the performance of the implementation. A preliminary evaluation using NPB showed no overhead from using Cuckoo FTMPI on MPICH, and we demonstrated the validity of the Fault/Recovery model aware component framework.

    CiNii Books

    researchmap

  • Towards Fault Diagnosis for Large-Scale Distributed Systems

    MARUYAMA Naoya, MATSUOKA Satoshi

    IEICE technical report   106 ( 198 )   19 - 24   2006.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    As distributed systems such as clusters and grids become larger in scale and more commoditized, analysis of faults in such systems is becoming significantly harder than before. Nonetheless, none of the existing analysis techniques is effective for such platforms, resulting in a huge burden on system administrators. We detect and analyze faults as follows. First, we take function-call traces from each process of the target distributed system. Next, to find anomalous behaviors, we apply an online analysis to the call traces. Based on the premise that most distributed-system processing is request-driven or event-driven, we analyze the call trace of each request- or event-processing routine. We implemented a prototype fault analyzer, applied it to a cluster resource manager, and evaluated the efficacy of our method.

    CiNii Books

    researchmap

  • Design and implementation of NAREGI SuperScheduler based on the OGSA architecture

    Satoshi Matsuoka, Masayuki Hatanaka, Yasumasa Nakano, Yuji Iguchi, Toshio Ohno, Kazushige Saga, Hidemoto Nakada

    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY   21 ( 4 )   521 - 528   2006.7

  • Dependability and Security : Devices, Architecture and Software

    SAKAI Shuichi, NAKAMURA Hiroshi, GOSHIMA Masahiro, MATSUOKA Satoshi, HASHIMOTO Mikio, KOHIYAMA Kiyoshi, NAKAMURA Tomohiro

    IEICE technical report. Dependable computing   106 ( 4 )   67 - 67   2006.4

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Dependability and security are two of the most significant concerns in information systems. Dependability, however, is a complex concept whose meaning depends on who uses the word: one person regards it as LSI design reliability, while another thinks of it as the reliability and security of the Internet. Here, six distinguished researchers (LSI designers, computer architects, and software developers) will take the rostrum and discuss what dependability is, what the most significant problem is, and how we should solve it. After reaching a mutual understanding, they discuss what is necessary to ensure the dependability of the whole information system.

    CiNii Books

    researchmap

  • Decentralized Job Scheduling System based on Information Sharing framework for Large-Scale Computing Environment

    UMEDA NORIHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   2006 ( 20 )   223 - 228   2006.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    A job scheduling system makes it possible to unify and use distributed computing resources. However, such systems have a single point of failure, in that only a few computers assign jobs to resources, and they lack the scalability to handle growing numbers of resources and jobs. We propose a decentralized job scheduling system that shares resource status through a communication framework for large-scale computing environments, and evaluate it through simulation. The results show that our proposal reduces the decline in efficiency in large-scale computing environments.

    CiNii Books

    researchmap

  • A virtual-machine based fast deployment tool for Grid execution environment

    YAMAGATA IKUHEI, TAKAMIYA YASUHITO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   2006 ( 20 )   127 - 132   2006.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    With the increased variety of jobs executed on the Grid, the execution environments requested by such jobs have become increasingly diversified. We therefore implemented a system that supplies an exclusive virtual execution environment for every submitted job, using a virtual machine and an installer. This system can dynamically set up 16 machines capable of executing a BLAST job in about 173 seconds. In this work we propose a system that saves and reuses virtual execution environments to shorten the time needed to build a job execution environment. We implemented this system and evaluated it against the previous system. Our experience shows that the build time was reduced by 12% compared with the previous system.

    CiNii Books

    researchmap

  • MPI Collective Communication on All Optical Network

    TAKIZAWA SHIN'ICHIRO, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   2006 ( 20 )   193 - 198   2006.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    On the Optical Burst Switching network, it is necessary to establish an optical path connection between nodes before communication and release it after communication. This process takes about 10 ms on average. For this reason, in compute-intensive applications, like MPI applications, an execution of collective communication is heavily influenced by the cost of establishing and releasing a connection. We propose a method which establishes and releases connections independent of communication occurrence to reduce cost in collective communication. In this method, we accomplish fast execution by changing algorithms and simultaneously connecting based on ports on node. The evaluation result shows our proposed technique performs superior to the method which establishes connections whenever communication occurs.

    CiNii Books

    researchmap

  • A Scheduling System Coupled with a Replica Management System for Data-intensive Applications

    MACHIDA YUYA, TAKIZAWA SHIN'ICHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   2006 ( 20 )   229 - 234   2006.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Existing scheduling systems for the Grid mostly handle huge I/O via a shared file system or simple staging. However, when numerous nodes access a single I/O node simultaneously, major performance degradation occurs, or in a worst case, causes I/O nodes to hang. Moreover, when a user launches a job consisting of hundreds or even thousands of tasks which share the same data set, it becomes extremely inefficient to stage essentially the same data set to each compute node after every dynamic brokering and allocation of the compute nodes. So we propose to tightly couple replica management and computation scheduling in order to reuse already replicated data effectively. We implemented a prototype system which uses a replica management system that embodies a scalable multi-replication framework, where multiple copies could be made in O(1) transfer time, and enables scheduling computation and data transfer to a single node simultaneously. The evaluation result shows our proposed technique performs superior to the traditional techniques and improves the throughput.

    CiNii Books

    researchmap

  • Cyber Science Infrastructure Initiative for Boosting Japan’s Scientific Research

    Masao Sakauchi, Shigeki Yamada, Noboru Sonehara, Shigeo Urushidani, Jun Adachi, Kazunobu Konishi, Satoshi Matsuoka

    CTWatch Quarterly Journal   2 ( 1 )   20 - 26   2006

     More details

  • Speculative Checkpointing

    Ikuhei Yamagata, Satoshi Matsuoka, Hidemoto Nakada

    Proceedings of DSW '06   1   2006

     More details

  • Cyber science infrastructure initiative for boosting Japan's scientific research

    Masao Sakauchi, Shigeki Yamada, Noboru Sonehara, Shigeo Urushidani, Jun Adachi, Kazunobu Konishi, Satoshi Matsuoka

    CTWatch Quarterly Journal   2 ( 1 )   20 - 26   2006

     More details

  • MPI Collective Communication in an Optical Network Environment

    滝澤真一朗, 松岡聡, 中田秀基

    IPSJ Symposium Proceedings   2006 ( 5 )   2006

  • BS-6-4 The Next-generation e-Science Infrastructure based on High-Speed Networking and Grid Technologies

    Matsuoka Satoshi

    Proceedings of the Society Conference of IEICE   2005 ( 2 )   S-60 - S-61   2005.9

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Books

    researchmap

  • Gridifying a Genetic Algorithm for NMR Three-dimensional Protein Structure Determination by Using Ninf-1 and Ninf-G

    ONO ISAO, MIZUGUCHI NAOAKI, NAKASHIMA NAOTOSHI, ONO NORIHIKO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, TATE SHINICHI

    46 ( 12 )   396 - 406   2005.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we parallelize the genetic algorithm (GA) for NMR protein three-dimensional structure determination, which has been proposed by Ono et al., on a grid that consists of multiple PC clusters on the WAN and report some results on the performance evaluation of the proposed system. The proposed system is parallelized with the hierarchical master-worker paradigm and consists of a master, submasters and workers. The communication between the master and each PC cluster is realized with Ninf-G, which is a secure GridRPC middleware, and that in each PC cluster is implemented by using Ninf-1, which is a fast GridRPC middleware. In the proposed system, we employ the slide transfer technique in order to hide the latency of communication on the Internet by using Ninf-G. The experimental results on the grid testbed consisting of 5 sites/1,196 CPUs showed that the proposed system effectively utilized computing resources on the grid testbed when it was applied to a problem of determining the three-dimensional structure of a 78-residue protein.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018402/

  • Gridifying a Genetic Algorithm for NMR Three-dimensional Protein Structure Determination by Using Ninf-1 and Ninf-G

    ONO ISAO, MIZUGUCHI NAOAKI, NAKASHIMA NAOTOSHI, ONO NORIHIKO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, TATE SHINICHI

    IPSJ Transactions on Advanced Computing Systems (ACS)   46 ( SIG12(ACS11) )   396 - 406   2005.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we parallelize the genetic algorithm (GA) for NMR protein three-dimensional structure determination, which has been proposed by Ono et al., on a grid that consists of multiple PC clusters on the WAN and report some results on the performance evaluation of the proposed system. The proposed system is parallelized with the hierarchical master-worker paradigm and consists of a master, submasters and workers. The communication between the master and each PC cluster is realized with Ninf-G, which is a secure GridRPC middleware, and that in each PC cluster is implemented by using Ninf-1, which is a fast GridRPC middleware. In the proposed system, we employ the slide transfer technique in order to hide the latency of communication on the Internet by using Ninf-G. The experimental results on the grid testbed consisting of 5 sites/1,196 CPUs showed that the proposed system effectively utilized computing resources on the grid testbed when it was applied to a problem of determining the three-dimensional structure of a 78-residue protein.

    J-GLOBAL

    researchmap

  • Optimization of Power-Performance by controlling DVS on a PC cluster

    HOTTA YOSHIHIKO, SATO MITSUHISA, KIMURA HIDEAKI, BOKU TAISUKE, TAKAHASHI DAISUKE, MATSUOKA SATOSHI

    ARC   2005 ( 80 )   49 - 54   2005.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, some high-performance processors provide DVS (Dynamic Voltage Scaling) functionality, which dynamically changes the processor's voltage and frequency. DVS can be used to reduce power consumption by controlling the clock frequency according to the dynamic characteristics of program execution. For parallel applications on a PC cluster, we expect that power dissipation can be reduced without losing performance by using DVS to run communication phases at a lower frequency. In this paper, we propose a method to optimize power-performance by controlling DVS at each phase of the program, based on the execution profile and observation of dynamic power consumption. We select the appropriate frequency for each phase from trial runs and decide the optimal DVS control for the actual run. We focus on the communication and computation phases and examine whether the proposed method can be used to optimize the power-performance of several parallel benchmarks. We conducted experiments on two kinds of PC clusters, Transmeta Crusoe and AMD Turion, and found that our method makes possible a 30% reduction in terms of EDP relative to running at the regular frequency.

    CiNii Books

    researchmap

  • A scheduling system coupled with a replica management system for data-intensive applications

    MACHIDA Yuya, TAKIZAWA Shin'ichiro, NAKADA Hidemoto, MATSUOKA Satoshi

    IEICE technical report. Computer systems   105 ( 226 )   67 - 72   2005.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Existing scheduling systems for the Grid mostly handle huge I/O via a shared file system or simple staging. However, when numerous nodes access a single I/O node simultaneously, major performance degradation occurs, or in a worst case, causes I/O nodes to hang. Moreover, when a user launches a job consisting of hundreds or even thousands of tasks which share the same data set, it becomes extremely inefficient to stage essentially the same data set to each compute node after every dynamic brokering and allocation of the compute nodes. Instead, we propose to utilize a replica management system that embodies a scalable multi-replication framework as a data staging mechanism, where multiple copies could be made in O(1) transfer time as well as make intelligent reuse of already-created replicas in scheduling for efficiency. A prototype executing a sample data-intensive application proved to be quite superior to shared files or traditional staging techniques.

    CiNii Books

    researchmap

  • Job execution in Grid on customizable virtual machine

    YAMAGATA Ikuhei, AOKI Takafumi, TAKAMIYA Yasuhito, NAKADA Hidemoto, MATSUOKA Satoshi

    IEICE technical report. Computer systems   105 ( 225 )   13 - 18   2005.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    As the diversity of users' jobs increases, their requirements on execution environments have been getting richer as well. However, a given execution environment does not necessarily satisfy their specific requirements. To fill the gap, we propose a technique that allows the user to easily generate an execution environment that is tailored to his/her own requirements. To provide the customizability, our system utilizes a virtual machine monitor and a customizable installation tool. With our initial prototype, the user submits a job using GRAM, a remote job invocation infrastructure, with a description of his/her requirements for the environment. Given the description, the system creates the tailored environment using Lucie[5], a customizable installation infrastructure, where the job is finally executed. To illustrate the effectiveness of our technique, we have conducted several preliminary studies. Based on the results, we show the system can execute jobs without interfering with the existing environments, and demonstrate the environment creation time is not significantly large compared to typical job execution time on Grids. These preliminary experiments show that the proposed system achieves the customizability of execution environments on Grids.

    CiNii Books

    researchmap

  • A Flexible Configuration and Packaging Method for Cluster Installers

    TAKAMIYA Yasuhito, SAKAE Yoshiaki, YAMAGATA Ikuhei, MATSUOKA Satoshi

    IEICE technical report. Computer systems   105 ( 225 )   19 - 24   2005.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Although automated cluster installers are becoming better known, they have not attained widespread popularity for several reasons, one of which is that customizing cluster configurations according to the needs of the underlying environment, as well as configuring multiple user-level packages, is quite difficult for the layman. Recently proposed solutions may reduce the expertise required to a certain degree, but are incomplete in that detailed customization and/or packaging again requires expert knowledge. Instead, we propose the notion of metapackages, which treats a set of packages that define a certain functionality, together with their mutual configurations, as a templatable package in itself and as a first-class entity in the installation process. We show that, with associated tool support, metapackages provide very high flexibility, rigorous dependency management, and ease of end-user customization, without sacrificing performance or expressive power in cluster configurations. We demonstrate the effectiveness by implementing the metapackage feature on top of our automated cluster installer Lucie. Experience has shown that cluster installation itself takes only 5-6 minutes after a set of necessary metapackages has been selected, with appropriate dependency and conflict checks performed. Even allowing for low-level debugging, with our support we expect that a layman can pick the necessary features from a list, get a full account of possible conflicts, and build a cluster in less than an hour by resolving such dependencies with alternate picks.

    CiNii Books

    researchmap

  • MPI Environment with Load Balancing using Virtual Machine

    TATEZONO Masaki, NAKADA Hidemoto, MATSUOKA Satoshi

    IEICE technical report. Computer systems   105 ( 225 )   7 - 12   2005.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Load balancing is one of the key features for efficient execution of long-running jobs, idle-cycle harvesting, etc. We propose a technique to achieve a load-balanced MPI execution environment using a virtual machine monitor. The key idea here is that transparent migration of a MPI process running on a virtual machine would be made possible by moving the underlying virtual machine itself. Our initial implementation of the technique uses a virtual machine monitor called Xen, which has an integrated VM migration ability, and VPN for migration among different subnets. We experimentally show that the prototype successfully achieves runtime migration of MPI processes, and that the overhead due to virtualization ranges from 10% for computation-intensive applications to 50% for network-intensive ones. We also implemented a simple load-balancing algorithm on top of the prototype. Experiments with it suggest that the runtime migration is effective for efficient execution of long-running jobs even with the virtualization overhead.

    CiNii Books

    researchmap

  • Design and Implementation of NAREGI Super-Scheduler based on OGSA Architecture

    HATANAKA Masayuki, NAKANO Yasumasa, IGUCHI Yuji, OHNO Toshio, SAGA Kazushige, AKIOKA Sayaka, NAKADA Hidemoto, MATSUOKA Satoshi

    IPSJ SIG Notes   102 ( 57 )   33 - 38   2005.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we describe the design and implementation of the NAREGI Super-Scheduler based on the OGSA-EMS architecture. Through our experience with its design and implementation, we confirmed that the OGSA-EMS architecture is feasible. We also clarify issues in the specification of resource allocation for an MPI parallel job that requires many heterogeneous computational resources, and propose a set of extensions to OGSA-EMS components to resolve these issues.

    CiNii Books

    researchmap

  • An Interactive Job Scheduling System that Allows Job Steering by Users

    IINO AKIKO, NAKADA HIDEMOTO, SHIMODAIRA HIDETOSHI, MATSUOKA SATOSHI

    IPSJ SIG Notes   102 ( 57 )   39 - 44   2005.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Since the grid environment is suitable for long-running workflow execution, workflow engines for the grid have become one of the hot research areas. However, applications that require user steering during workflow execution are not addressed by existing workflow engine research. Although there is a lot of work on computational steering, the requirements and nature of steering for long-running workflow execution are totally different from those of conventional computational steering, and it should therefore be addressed differently. We designed and implemented a workflow scheduling framework that allows users to control the execution of their applications. It is implemented using Condor DAGMan as the workflow engine and provides users with steering capability via e-mail and a web-enabled interface. We also evaluated the system with a phylogenetic tree inference application.

    CiNii Books

    researchmap

  • Contribution of Information Science and Engineering Departments to informatization in Universities

    IWANO Kazuo, TOKUDA Hideyuki, MATSUOKA Satoshi, MURAKAMI Kazuaki, NISHIMURA Yoshio, YONEZAKI Naoki (Software Development Laboratory, IBM Japan Ltd.; Graduate School of Media and Governance Faculty of Environmental Information Keio University; Global Scientific Information and Computing Center Tokyo Institute of Technology; Department of Informatics Graduate School of Information Science and Electrical Engineering Kyushu University; Department of Computer Science Graduate School of Information Science and Engineering Tokyo Institute of Technology)

    Computer Software   22 ( 2 )   1 - 20   2005.4

     More details

    Language:Japanese   Publisher:Japan Society for Software Science and Technology  

    DOI: 10.11309/jssst.22.2_1

    CiNii Books

    researchmap

  • NMR Protein Three-dimensional Structure Analysis Using a Genetic Algorithm on the Grid

    小野功, 水口尚亮, 中島直敏, 松原彬光, 小野典彦, 中田秀基, 松岡聡, 関口智嗣, 楯真一

    Proceedings of the National Convention of the Institute of Electrical Engineers of Japan   2005 ( 3 )   3.S18(11)-3.S18(14)   2005.3

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • Autonomous Deployment of Grid Monitoring Systems

    SHIROSE KEN'ICHIRO, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   162 ( 19 )   1 - 6   2005.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The problem with practical, large-scale deployment of a Grid monitoring system is that it takes considerable management cost and skill to maintain the level of quality required for production use, since the monitoring system is fundamentally distributed, needs to run continuously, and is itself likely to be affected by the various faults and dynamic reconfigurations of the Grid. With the goal of developing a generalized autonomous management framework for Grid monitoring, we have built a prototype on top of NWS, featuring automatic configuration of the components as well as coping with single-node faults without user intervention. An experimental deployment on the Tokyo Institute of Technology's Campus Grid (the Titech Grid), consisting of over 15 sites and 800 processors, has shown the system to be robust in handling faults and reconfigurations, automatically deriving an ideal configuration for the head login nodes of each PC cluster in about ten minutes.

    CiNii Books

    researchmap

  • Towards a Portable Fault Tolerant Component Framework for MPI

    JITSUMOTO HIDEYUKI, MATSUOKA SATOSHI

    IPSJ SIG Notes   2005 ( 19 )   193 - 198   2005.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Execution of MPI applications on clusters and Grid deployments that suffer from node and network failures motivates the use of fault-tolerant MPI implementations, and several have been implemented. However, these implementations cannot easily select an appropriate recovery method according to the environment. We present CuckooMPI, which uses a Fault/Recovery model aware component framework. Users can obtain an MPI implementation suited to their execution environment by selecting components. This paper presents the architecture of CuckooMPI, its theoretical foundation, and the performance of the implementation. A preliminary evaluation using NPB-CG with 32 processes showed that CuckooMPI achieves a 10% performance improvement compared with MPICH.

    CiNii Books

    researchmap

  • Towards a high-performance infrastructure to recover the Internet connectivity

    HAMANO TOMOYUKI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   2005 ( 19 )   85 - 90   2005.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Much research and development has been done, or is being carried out, to recover Internet connectivity. However, most of it is not suitable for the Grid environment in terms of connectivity, security, independence from site policies, and high performance. We propose an infrastructure for the Grid environment to recover Internet connectivity. We have also implemented a prototype system, JRouter, and evaluated its performance. The results showed that the system meets the requirements of the Grid environment except for high performance, so we consider means of achieving high-throughput communication.

    CiNii Books

    researchmap

  • MegaProto : A Low-Power and Compact Cluster for High-Performance Computing

    NAKASHIMA HIROSHI, NAKAMURA HIROSHI, SATO MITSUHISA, BOKU TAISUKE, MATSUOKA SATOSHI, TAKAHASHI DAISUKE, HOTTA YOSHIHIKO

    IPSJ SIG Notes   2005 ( 19 )   121 - 126   2005.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    DOI: 10.1109/IPDPS.2005.278

    Scopus

    CiNii Books

    J-GLOBAL

    researchmap

  • A Study on the Effect of Cooperative Superschedulers on the Computational Grid

    AKIOKA SAYAKA, TAKEFUSA ATSUKO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, MIURA KENICHI

    IPSJ SIG Notes   2005 ( 19 )   55 - 60   2005.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we evaluate the effect of cooperative superschedulers on the computational Grid using a Grid simulator. Many studies have proposed utilizing superschedulers to achieve effective load balancing and improve resource utilization. On the other hand, there has been no deep consideration of the effects caused by superschedulers under different forms of cooperation. Through the simulation, we obtained many results that support the effectiveness of cooperating superschedulers. Cooperative superschedulers shorten the waiting times of applications and increase resource utilization. In addition to these results, we discuss the pros and cons of two major cooperative styles: tiers and distributed network.

    CiNii Books

    researchmap

  • Distributed File System with Automatic File Access Distribution for the Grid

    SATO HITOSHI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   162 ( 19 )   7 - 12   2005.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In parallel computing environments such as HPC clusters or the Grid, some applications incur large overhead because accesses concentrate on the node that holds a file. To avoid this problem on a traditional distributed file system, users have to distribute file accesses manually. However, such manual distribution is difficult on the Grid because of the heterogeneity of its resources. We propose an automatic file distribution scheme that combines detection of access concentration in the file system with file replication. We implemented a prototype on Gfarm and evaluated its performance. The results showed that our prototype performs better than Gfarm when file accesses are concentrated.

    CiNii Books

    researchmap

  • MegaProto: 1 TFlops/10kW rack is feasible even with only commodity technology

    Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, Yoshihiko Hotta

    Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05   2005   2005

     More details

  • Distributed File System with Automatic File Access Distribution for the Grid

    佐藤仁, 松岡聡, 中田秀基

    Computer System Symposium 2005   91 - 98   2005

     More details

  • Low Power Computing for Fleas, Mice, and Mammoth - Do They Speak the Same Language?

    Satoshi Matsuoka

    CTWatch Quarterly Journal   1 ( 3 )   2 - 11   2005

     More details

  • Job Invocation via UNICORE and GT4 in the GridRPC System Ninf-G

    中田秀基, 田中良夫, 関口智嗣

    IPSJ SIG Technical Reports 2005-HPC-102   45 - 50   2005

     More details

  • Towards a high-performance overlay network infrastructure for grid computing

    2005   36 - 42   2005

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • The Titech Grid ~Can a Center Sustain a Large Production Grid on Campus? ~History, Lessons Learned, and the Future~

    Satoshi Matsuoka

    4 ( 2 )   17 - 27   2005

     More details

  • Primary Study of A Task Farming API over The GridRPC Framework

    Yusuke Tanimura, Hidemoto Nakada, Yoshio Tanaka, Satoshi Sekiguchi

    Eighth International Conference on High-Performance Computing in Asia-Pacific Region, 2005   2005   339 - 345   2005

  • Design and implementation of Condor-UNICORE bridge

    Hidemoto Nakada, Jaime Frey, Motohiro Yamada, Yasuyoshi Itou, Yasumasa Nakano, Satoshi Matsuoka

    Eighth International Conference on High-Performance Computing in Asia-Pacific Region, Proceedings   307 - 314   2005

  • Grid-Enabling a Genetic Algorithm for NMR Protein Structure Determination Using Ninf-1/Ninf-G

    小野功, 水口尚亮, 中島直敏, 小野典彦, 中田秀基, 松岡聡, 関口智嗣, 楯真一

    Symposium on Advanced Computing Systems and Infrastructures SACSIS2005   143 - 152   2005

     More details

  • Megaproto: A low-power and compact cluster for high-performance computing

    Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, Yoshihiko Hotta

    Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005   2005   2005

     More details

  • An Interactive Job Scheduling System

    飯野彰子, 中田秀基, 下平英寿, 松岡聡

    IPSJ Symposium Series   2005 ( 5 )   2005

  • Design and Implementation of a Fault-Tolerant RPC System: Ninf-C

    NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    45 ( 11 )   160 - 170   2004.10

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we describe the design and implementation of a fault-tolerant RPC system, Ninf-C. Ninf-C is designed for large-scale master-worker programs that take from a few days to a few months to execute. Ninf-C uses Condor, developed at the University of Wisconsin, as the base structure of the system. It uses file transmission and checkpointing mechanisms to provide system-wide robustness for programmers. In Ninf-C, the master and workers communicate with each other through files rather than sockets, which makes crash recovery easy. To demonstrate the robustness of the system, we performed an experiment on a heterogeneous cluster consisting of x86 and SPARC machines. We ran a simple but long-running master-worker program on the cluster and rebooted several machines to disturb the program's execution. The program finished normally, demonstrating the robustness of Ninf-C.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018447/

  • Design and Implementation of a Fault-Tolerant RPC System: Ninf-C

    NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ Transactions on Advanced Computing Systems (ACS)   45 ( SIG11(ACS7) )   160 - 170   2004.10

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we describe the design and implementation of a fault-tolerant RPC system, Ninf-C. Ninf-C is designed for large-scale master-worker programs that take from a few days to a few months to execute. Ninf-C uses Condor, developed at the University of Wisconsin, as the base structure of the system. It uses file transmission and checkpointing mechanisms to provide system-wide robustness for programmers. In Ninf-C, the master and workers communicate with each other through files rather than sockets, which makes crash recovery easy. To demonstrate the robustness of the system, we performed an experiment on a heterogeneous cluster consisting of x86 and SPARC machines. We ran a simple but long-running master-worker program on the cluster and rebooted several machines to disturb the program's execution. The program finished normally, demonstrating the robustness of Ninf-C.

    J-GLOBAL

    researchmap

  • Design and Implementation of Configuration Packaging Methods for Cluster Installers

    TAKAMIYA YASUHITO, MATSUOKA SATOSHI

    IPSJ SIG Notes   99 ( 81 )   55 - 60   2004.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Despite the widespread adoption of commodity clusters, fully automatic cluster installers have not become the setup tool of choice for many users because of their complexity and the difficulty of configuring them. This paper introduces two methods: 1) packaging a set of typical cluster-installer configurations in a common software package format (MetaPackaging) and making them downloadable by end users with standard package managers, which allows automatic generation of customized configurations for each local site through an integrated template-customization GUI; and 2) using the package dependency information stored in the underlying software package management system, together with a pseudo installation environment built by the automatic installer, which allows a sanity check that dependencies between user-selected software packages are satisfied without actually running the installer. Moreover, for MetaPackage developers, we provide a helper toolkit that detects package dependency problems occurring when metapackages are built and generates code for the metapackage customization front end.

    CiNii Books

    researchmap

  • Scalable MultiReplication Framework on The Grid

    TAKIZAWA SHIN'ICHIRO, TAKAMIYA YASUHITO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   99 ( 81 )   247 - 252   2004.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a "MultiReplication Framework" for data replication in a data grid environment, in which redundant and inefficient long-haul copying of replica data is avoided when the same data sets are used locally or near-locally, by employing aggregated and efficient multicast-based replication schemes. The replica manager manages replica creation centrally using an XML-based schema; when multiple peers request a replica near-simultaneously, this is detected and an organized multi-replication across the peers involved is initiated using our multi-replication tool Dolly+. Benchmarks on a prototype implementation show that our scheme scales well in a realistic data grid environment consisting of multiple clusters interconnected to a distant data archiver, maintaining constant replication time even as the number of nodes increases, and is superior to simpler schemes such as maintaining a local cache of replicated data within a cluster.

    CiNii Books

    researchmap

  • Design and Implementation of a Highly Portable Job Scheduling System

    MACHIDA YUYA, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   99 ( 81 )   217 - 222   2004.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We present Jay, a job-scheduling system for the Grid. Jay handles two difficulties inherent in the Grid, namely heterogeneity and instability. Jay is based on the techniques of Condor, which was developed at the University of Wisconsin, and has been implemented in Java for better portability. Although Java does not have a secure way of changing the user ID of an arbitrary process, we resolved this problem in Jay by developing a highly portable C++ daemon that provides this capability and can run in Java environments that do not support JNI. The results of small-scale experiments show its fault tolerance and high portability.

    CiNii Books

    researchmap

  • Implementation and Evaluation of Dynamic Load Balancing for Performance Heterogeneous Clusters on Omni/SCASH

    SAKAE YOSHIAKI, MATSUOKA SATOSHI, SATO MITSUHISA, HARADA HIROSHI

    IPSJ SIG Notes   99 ( 81 )   61 - 66   2004.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Increasingly large-scale clusters of PCs and workstations continue to become the majority platform in the HPC field. In such commodity cluster environments, incremental upgrades may occur for several reasons, such as rapid progress in processor technologies or user needs, and they may cause performance heterogeneity between nodes, which the application programmer experiences as load imbalance. To overcome this problem, dynamic load balancing mechanisms are needed. We have previously implemented and reported on loop re-partitioning mechanisms based on runtime performance. However, loop re-partitioning changes data access ranges, so applications whose performance depends heavily on data locality show performance degradation. In this paper, we report our recent work on page migration mechanisms based on page reference counting, and their performance. Results show that page migration achieves about a 60% performance gain with Laplace on a 4-node cluster and restores the performance degraded by loop re-partitioning.

    CiNii Books

    researchmap

  • Design and Implementation of a Cluster-Aware Fault Injector

    MARUYAMA Naoya, MATSUOKA Satoshi

    IEICE technical report. Dependable computing   104 ( 239 )   25 - 30   2004.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We report our design and implementation of a cluster-aware fault injector, or CFI, which enables systematic reproduction of faulty clustered environments in software. CFI focuses on building a testbed for research on fault-tolerant cluster computing. It allows users to investigate the behavior of systems software and applications in clustered environments when user-specified faults occur. It consists of a Linux kernel module, which is responsible for injecting network-related faults into the running operating system of each node, and a set of tools that control the kernel module. It injects faults across the cluster nodes according to a fault scenario written by the user. Our preliminary experiments showed that its intrusiveness is not significantly large, and that it is a promising tool for further research on fault tolerance.

    CiNii Books

    researchmap

  • Design and implementation of Speculative Checkpointing

    YAMAGATA Ikuhei, JITSUMOTO Hideyuki, NAKADA Hidemoto, MATSUOKA Satoshi

    IEICE technical report. Dependable computing   104 ( 239 )   31 - 36   2004.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Checkpointing parallel processes puts high temporal and spatial pressure on I/O subsystems. To decrease this pressure, we propose a new checkpointing technique, called Speculative Checkpointing, that improves upon incremental checkpointing by speculatively distributing the checkpointing workload and avoiding the need for file synchronization. Experiments with our prototype Speculative Checkpointer on a variety of parallel workloads on a cluster showed marked improvements when speculation works effectively, exhibiting up to a 33% improvement over conventional incremental checkpointing schemes. We expect that this improvement would be even higher in a production environment with a larger number of nodes and dedicated backend checkpointing storage.

    CiNii Books

    researchmap

  • Design and Implementation of Ninf-C, an RPC System with Emphasis on Fault Tolerance

    中田秀基, 田中良夫, 松岡聡, 関口智嗣

    IPSJ Symposium Series   2004 ( 6 )   77 - 84   2004.5

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • The Future of Grid Computing with Optical Networks : Future Grids with Fast Optical Networks

    MATSUOKA Satoshi

    104 ( 1 )   1 - 4   2004.4

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Future Grid infrastructures for high-performance, large-scale science will require high-bandwidth, low-latency optical networks. The talk will introduce several projects that base themselves on such network infrastructures, and furthermore will describe future Grids that will assume peer-to-peer optical connectivity.

    CiNii Books

    researchmap

  • Parallelization of the Genetic Programming using Jojo

    TOKUDA TAKU, TANAKA KOUJI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   157 ( 20 )   187 - 192   2004.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Estimating the mutual interactions in genetic networks is mainly a matter of inferring the mutual control relationships among multiple genes from gene expression data. Such correlations are typically expressible in the form of nonlinear simultaneous differential equations. However, most work to date has employed S-systems to express such differential equations, which allows only rough approximations of mass action, and as a result it has been difficult to determine the actual correlations between the genes. Instead, we formulate the mutual interactions as actual simultaneous partial differential equations, and automatically determine their structure and coefficients from a given data series using genetic programming (GP). A parallel implementation of the scheme in a Grid environment, using our Jojo Grid programming system for Java, has resulted in precise determination of the equations in many cases within reasonable time.

    CiNii Books

    researchmap

  • Examination of the job execution on VM in Grid Environment

    OGURA SHOJI, KOUNO KENJI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   157 ( 20 )   25 - 30   2004.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Despite recent proposals for fine-grained resource control on Grid computing nodes using virtual machine technologies, neither the impact of the respective virtualization schemes nor the feasibility of actually imposing control has been investigated comprehensively. We (1) propose a virtual machine framework for the Grid that allows selection of different virtualization schemes depending on application characteristics, and (2) perform comprehensive measurements of the impact of the individual schemes, including when they are actually used for resource control, and derive a guideline that would lead to (semi-)automated selection of virtualization schemes. The prototype runs as a job manager in Globus 2.4 and allows selection of virtualization schemes, as well as pluggable resource control depending on the user and the intended policy. Benchmarks using NPB2.4 show that we can minimize the overhead by appropriate selection of virtualization schemes, and yield several guidelines: for example, communication-intensive applications favor virtualization via library call interposition, whereas virtualization of multiple processes tends to favor kernel modules.

    CiNii Books

    researchmap

  • Parallelization of phylogenetic tree inference using Grid technology

    YAMAMOTO YO, NAKADA HIDEMOTO, SHIMODAIRA HIDETOSHI, MATSUOKA SATOSHI

    IPSJ SIG Notes   157 ( 20 )   181 - 186   2004.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The maximum likelihood method is considered one of the most reliable methods for phylogenetic tree inference. However, as the number of species increases, it becomes impossible to evaluate all phylogenetic trees, since the number of trees grows explosively. An approximation method using split decomposition has been proposed. It reduces the calculation cost drastically, but the cost for larger numbers of species is still too high. We propose a method that reduces the cost using a combinatorial optimization technique. We also parallelize it in a master-worker style using Grid middleware. A 64.0-fold speedup was obtained using 16 workers on a 9-species problem.

    CiNii Books

    researchmap

  • Design and implementation of Condor-UNICORE bridge

    NAKADA HIDEMOTO, FREY JAIME, YAMADA MOTOHIRO, ITOU YASUYOSHI, NAKANO YASUMASA, MATSUOKA SATOSHI

    IPSJ SIG Notes   157 ( 20 )   37 - 42   2004.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we describe the design and implementation of a generic Grid interface for Condor. Although Condor has interfaces for specific Grid systems, such as Globus GRAM, it is not easy to add a new interface for another Grid system, since this requires code modification inside Condor. With our new interface, support for a new Grid system can be added without any code modification in Condor itself. We also implemented a bridge for the UNICORE system and validated that our approach is effective.

    CiNii Books

    researchmap

  • Current Status of Grid Computing Projects in Japan (Special Issue on Large-scale Computer Simulation)

    SHIMOJO Shinji, SEKIGUCHI Satoshi, MIURA Kenichi, MATSUOKA Satoshi

    SYSTEMS, CONTROL AND INFORMATION   48 ( 7 )   244 - 249   2004

     More details

    Language:Japanese   Publisher:THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS  

    DOI: 10.11509/isciesci.48.7_244

    CiNii Books

    researchmap

  • MegaProto : A Prototype of the Ultra Low-Power Mega-Scale System

    NAKASHIMA HIROSHI, NAKAMURA HIROSHI, SATO MITSUHISA, BOKU TAISUKE, MATSUOKA SATOSHI

    IPSJ SIG Notes   96 ( 102 )   85 - 90   2003.10

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper gives the conceptual design of the MegaProto machine, a prototype mega-scale system developed in a research project named "Mega-Scale Computing Based on Low-Power Technology and Workload Modeling". MegaProto is a prototype implementation of our key idea that million-scale parallel systems should be built with densely mounted low-power processors. It will also act as a platform on which to implement and evaluate our new technologies, such as power-conscious compilation, a highly reliable and high-performance network, highly dependable cluster management, and multi-level parallel programming. The building block of MegaProto is a 1U-high, 19-inch rack-mountable motherboard unit on which 16 low-power processors are mounted with a high-bandwidth network of 2 Gbps per processor. The peak performance of the unit is 14.4 GFlops, and the intra-unit and inter-unit network bandwidths are 32 Gbps and 8 Gbps respectively, while the unit consumes at most 300 W, achieving high performance and density with low power consumption.

    CiNii Books

    researchmap

  • A Task-Farming API on GridRPC and its implementation

    NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   96 ( 102(HPC-96) )   61 - 66   2003.10

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Task farming means computing a huge number of tasks with different parameters on a large number of computers. GridRPC is a kind of Grid middleware that is considered suitable for task farming. While GridRPC provides the basic functionality for task farming, by design it lacks high-level features such as scheduling and fault tolerance, and it burdens the application programmer with implementing them. In this paper we describe a task-farming API implemented on top of the GridRPC API. It is designed to ease this burden and to support master-worker computation. We also describe the implementation of the API and a sample program that uses it.

    CiNii Books

    J-GLOBAL

    researchmap

  • The Present and Future of Grid Computing

    中田 秀基, 松岡 聡

    計算数理工学レビュー   2003 ( 1 )   9 - 12   2003.10

     More details

    Language:Japanese   Publisher:Japan Society for Computational Methods in Engineering (JASCOME)  

    researchmap

  • Lucie: A Fast Installation and Administration Tool for Large-scaled Clusters

    TAKAMIYA YASUHITO, MANABE ATSUSHI, MATSUOKA SATOSHI

    44 ( 11 )   79 - 88   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The rapid increase in the number of nodes in commodity clusters makes it essential to address the cost of setting up and maintaining clusters. Moreover, with the rise of data-intensive applications that require several GBs of data on each cluster node, it has become apparent that no installation tool is aimed at installation-time setup of such large-scale data. In this paper, we propose a new cluster installation and administration tool called Lucie, which provides a network boot and installation mechanism requiring no specific installation media, and configurability that allows the installer itself to be reconstructed on demand. Additionally, we propose a new data distribution mechanism called Dolly+, which employs fault-tolerant, high-speed data transfer over a virtual ring topology. With Dolly+, one can distribute several GBs of images to all cluster nodes at installation time while maintaining fault tolerance. Our benchmarks show that Lucie and Dolly+ can install and set up a whole cluster in constant time. This result shows that Lucie and Dolly+ are scalable and efficient, and could well serve as a basis for 'Plug-and-Play' clustering.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018534/

  • A Java-based Programming Environment for the Grid: Jojo

    NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    44 ( 11 )   46 - 56   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper introduces Jojo, a Java-based programming environment for the Grid. Jojo is a distributed programming environment implemented in Java that is suitable for hierarchical Grid environments. Jojo provides several features, including remote invocation using Globus GRAM, an intuitive message-passing API suitable for parallel execution, and automatic staging of user programs. Using Jojo, users can construct parallel distributed applications on the Grid with ease. In this paper, we present the design and implementation of Jojo, its programming API, its configuration file syntax, and a working program example. We also present preliminary performance evaluation results that demonstrate the effectiveness of multi-hierarchical execution.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00018531/

  • A Java-based Programming Environment for the Grid: Jojo

    NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ Transactions on Advanced Computing Systems (ACS)   44 ( SIG11(ACS3) )   46 - 56   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper introduces Jojo, a Java-based programming environment for the Grid. Jojo is a distributed programming environment implemented in Java that is suitable for hierarchical Grid environments. Jojo provides several features, including remote invocation using Globus GRAM, an intuitive message-passing API suitable for parallel execution, and automatic staging of user programs. Using Jojo, users can construct parallel distributed applications on the Grid with ease. In this paper, we present the design and implementation of Jojo, its programming API, its configuration file syntax, and a working program example. We also present preliminary performance evaluation results that demonstrate the effectiveness of multi-hierarchical execution.

    J-GLOBAL

    researchmap

  • You Don't Really Need Big Fat Switches Anymore-Almost

    MATSUOKA SATOSHI

    IPSJ SIG Notes   154 ( 84 )   157 - 162   2003.8

     More details

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

    Although commodity cluster computing based on very fast and inexpensive commodity processors is proliferating today, one of the prohibitive factors for its large-scale deployment is the high cost of the network switching fabric needed to retain sufficiently high bandwidth. We argue that, except for the most demanding applications, appropriate aggregation of inexpensive switches, together with collective communication algorithms that exploit the characteristics of such networks, will accommodate the bulk of parallel applications, even those with substantial communication requirements. We present three techniques for implementing high-bandwidth collective communication in such a setting, and provide preliminary performance measurements that hint at the effectiveness of our proposal. The techniques can be extended to interconnect a set of clusters to implement a high-bandwidth Grid interconnect, as well as to replace SANs for high-bandwidth I/O.

    CiNii Books

    researchmap

  • Execution of the replica exchange molecular dynamics simulator on the Grid

    SATO HITOSHI, ITO MASAKATSU, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   95 ( 83 )   41 - 46   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The replica-exchange method is considered suitable for execution in Grid environments because of its coarse granularity and small data transfers. To confirm this suitability, we performed several experiments in various environments using an application program called REMD toolkit, which implements the replica-exchange method. We also improved REMD toolkit to cope with performance-heterogeneous environments. The results showed that 1) REMD toolkit is scalable up to around 100 workers, and 2) the improved version is faster than the original version in a performance-heterogeneous environment.

    CiNii Books

    researchmap

  • Development of A Grid Portal Construction Toolkit (PCT4G) Supporting Application Installation and Data Distribution/Update

    SHIRASUNA SATOSHI, SUZUMURA TOYOTARO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   95 ( 83 )   173 - 178   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    As Grid technologies become more practical, a number of Grid portals have been constructed and used in various fields to offer user-friendly interfaces to Grid resources. Along with that, several toolkits for generating Grid portals have been developed to reduce the burden on portal developers. However, even with the aid of those toolkits, portal developers still have to install target applications on each node of the Grid. In addition, for some applications, especially those in the bioinformatics field, application data must be kept up to date. To automate these tasks, we are implementing a toolkit, PCT4G, which automates application installation, data management, and interface generation. Users can also construct Grid portals for their own applications on the fly through the Web interfaces of PCT4G.

    CiNii Books

    researchmap

  • Automatic Management System for Monitoring System on the Grid

    SHIROSE KEN'ICHIRO, OGAWA HIROTAKA, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   95 ( 83 )   89 - 94   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Monitoring CPU, memory, and disk usage and network performance is necessary in Grid computing environments. Generally, monitoring systems for Grid computing consist of multiple components, each with its own function. It is difficult to configure them all manually, because there are many dependencies between them and the monitoring systems must run continuously. We propose an automatic management system for monitoring systems on the Grid, implement part of its functionality, and evaluate its usefulness.

    CiNii Books

    researchmap

  • Towards a C Language Hosting Environment for OGSA

    HAMANO TOMOYUKI, NAKADA HIDEMOTO, SUZUMURA TOYOTARO, MATSUOKA SATOSHI

    IPSJ SIG Notes   95 ( 83 )   179 - 184   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    OGSA is a new architecture that is a hybrid of the traditional Grid architecture and Web services. OGSI-based OGSA has two issues: 1) current OGSI implementations do not provide a hosting environment for the C language, and 2) XML-based protocol communication degrades performance. In this paper, we propose a system that provides a C hosting environment on OGSI, together with an auxiliary tool that eases the development of services on the system. We also present performance evaluation results that demonstrate the effectiveness of the system and reveal issues in OGSI implementations.

    CiNii Books

    researchmap

  • Implementation and Evaluation of a Fault Tolerant MPI with Reliable TCP/IP Sockets

    JITSUMOTO HIDEYUKI, TAKAMIYA YASUHITO, MATSUOKA SATOSHI

    IPSJ SIG Notes   95 ( 83 )   149 - 154   2003.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    On cluster systems, failure rates tend to be high because of the large number of components. Therefore, to perform stable, long-running computations on cluster systems, middleware support for fault tolerance is required. We implemented a prototype fault-tolerant MPI system and measured its overhead. Our MPI system implements a coordinated checkpointing and recovery protocol on MPICH using a single-process checkpointer called ckpt and a reliable network called Rocks. A preliminary evaluation using NPB-CG with 32 processes showed that the overhead imposed by Rocks stayed within just 8%.

    CiNii Books

    researchmap

  • Design and Implementation of Jojo, a Hierarchical Grid Environment in Java

    中田秀基, 松岡聡, 関口智嗣

    IPSJ Symposium Series   2003 ( 8 )   113 - 120   2003.5

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • An Efficient NAS Parallel Benchmarks Algorithm for Heterogeneous Clusters

    SASOU TAKERU, MATSUOKA SATOSHI

    IPSJ SIG Notes   93 ( 29 )   1 - 6   2003.3

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this study, we optimized the kernel benchmarks of the NAS Parallel Benchmarks for heterogeneous cluster systems and evaluated them on a cluster with heterogeneous CPUs. Our optimization shares load by changing the data size assigned to each node according to that node's performance. Experimental results show that our method improves performance on EP, IS, and MG. For CG and FT, however, increased communication overhead dominates, and our method performs worse than the original CG and FT.

    CiNii Books

    researchmap
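
    The load-sharing idea in the abstract above (give each node a data size matched to its performance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `partition_by_speed` and the use of a single relative-speed number per node are assumptions made only for illustration.

```python
def partition_by_speed(total_items, speeds):
    """Split total_items across nodes in proportion to their relative speeds.

    Hypothetical helper: `speeds` holds one positive number per node,
    e.g. a measured benchmark score. Returns one integer share per node.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")

    total_speed = sum(speeds)
    # Ideal (possibly fractional) share for each node.
    ideal = [total_items * s / total_speed for s in speeds]
    # Start from the integer part of every share.
    shares = [int(x) for x in ideal]

    # Hand the leftover items to the nodes with the largest fractional parts,
    # so the shares always add up to total_items.
    leftover = total_items - sum(shares)
    by_remainder = sorted(
        range(len(speeds)), key=lambda i: ideal[i] - shares[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares
```

    For example, `partition_by_speed(10, [1, 2])` returns `[3, 7]`: the node that is twice as fast receives roughly twice the data. As the abstract notes, such a split helps only when communication overhead does not grow enough to cancel the gain.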

  • A Parallel Minimal Generation Gap Model Using Ninf for Evolutionary Analysis of Protein Structures and Its Performance Evaluation

    ONO Isao, IMADE Hiroaki, NAKADA Hidemoto, ONO Norihiko, MATSUOKA Satoshi, SEKIGUCHI Satoshi, TATE Shin-ichi

    IPSJ SIG Notes   93 ( 29 )   119 - 154   2003.3

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Nuclear Magnetic Resonance (NMR) spectroscopy is a promising method for determining the three-dimensional structure of proteins, one of the most important problems in the post-sequence era. However, the method has a serious drawback: it takes experts several months to analyze the data for a single protein. To remedy this, Ono et al. proposed an automatic method based on a genetic algorithm (GA) for analyzing the data and determining the three-dimensional structures of proteins, and reported good results on relatively small problems. In this report, to speed up the GA, we propose a parallel implementation of the generation alternation model, Minimal Generation Gap (MGG), employed in the GA. The implementation uses Ninf, proposed by the National Institute of Advanced Industrial Science and Technology (AIST), as middleware. We perform experiments to examine its performance.

    CiNii Books

    researchmap

  • Protein structure optimization using Genetic Algorithm on Jojo

    NAKADA HIDEMOTO, NAKAJIMA NAOTOSHI, ONO ISAO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, ONO NORIHIKO, TATE SHIN'ICHI

    IPSJ SIG Notes   93 ( 29 )   155 - 160   2003.3

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Java language is suitable for Grid environments because of 1) its portability across heterogeneous architectures and 2) its integrated multithreading capability, which can effectively hide latency. Genetic Algorithms (GA) are a good application area for the Grid because of their affinity for parallel execution. From these viewpoints, we are developing a Java-based programming framework for GA called jPoP-GA. However, we lacked concrete knowledge of effective parallel implementations of GA. In this paper, we implemented a real GA application on top of the Java-based Grid programming environment Jojo using several parallelization methods. As the application, we used protein three-dimensional structure optimization based on NMR spectroscopy. We performed several experiments in a Grid environment and obtained knowledge about the parallelization of GA applications.

    CiNii Books

    researchmap

  • Implementation and Evaluation of Dynamic Load Balancing Using Runtime Performance Monitoring on Omni/SCASH

    SAKAE YOSHIAKI, MATSUOKA SATOSHI, SATO MITSUHISA, HARADA HIROSHI

    IPSJ SIG Notes   93 ( 29 )   131 - 136   2003.3

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In commodity cluster environments, node performance may be heterogeneous for several reasons, and application programmers experience this as load imbalance. Overcoming the problem requires dynamic load balancing mechanisms. In this paper, we report our ongoing work on a dynamic load balancing extension to Omni/SCASH. With these mechanisms, we expect the runtime system to correct load imbalances automatically, without explicit definition of data and task placement by the programmer, in commodity clusters whose nodes may differ in performance. The evaluation results indicate that our loop re-partitioning scheme, one of the dynamic load balancing extensions, works well, and that it is important to combine loop re-partitioning with dynamic page migration.

    CiNii Books

    researchmap

  • Preliminary evaluation of dynamic load balancing using loop re-partitioning on Omni/SCASH

    Y Sakae, S Matsuoka, M Sato, H Harada

    CCGRID 2003: 3RD IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS   463 - 470   2003

  • A Ninf-based Parallel MGG for Evolutionary Analysis of Protein Three-Dimensional Structures and Its Performance Evaluation

    小野功, 今出広明, 中田秀基, 小野典彦, 松岡聡, 関口智嗣, 楯真一

    情報処理学会研究報告 2002-HPC-93(HOKKE2003)   149 - 154   2003

  • Implementation and Evaluation of a Dynamic Load Balancing Extension Using Loop Re-partitioning on Omni/SCASH

    栄純明, 松岡聡, 佐藤三久, 原田浩

    先進的計算基盤システムシンポジウム SACSIS2003 論文集   307 - 314   2003

  • Evaluation of the inter-cluster data transfer on Grid environment

    S Ogura, S Matsuoka, H Nakada

    CCGRID 2003: 3RD IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS   374 - 381   2003

  • Implementation of Branch-and-Bound in jPoP, a Parallel Combinatorial Optimization System Suited to Grid Environments

    秋山智宏, 中田秀基, 松岡聡, 関口智嗣

    第6回プログラミングおよび応用のシステムに関するワークショップ SPA 2003   2003

  • Lucie: A Fast Setup and Administration Tool for Large-Scale Clusters

    高宮安仁, 真鍋篤, 松岡聡

    先進的計算基盤システムシンポジウム SACSIS2003 論文集   365 - 372   2003

  • A Java-based Programming Environment for the Grid : Jojo

    NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   92 ( 99 )   31 - 36   2002.10

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper introduces Jojo, a Java-based programming environment for the Grid. Jojo is a distributed programming environment implemented in Java that is suited to hierarchical Grid environments. Jojo provides several features, including remote invocation using Globus GRAM, an intuitive message-passing API suitable for parallel execution, and automatic staging of user programs. Using Jojo, users can construct parallel distributed applications on the Grid. In this paper, we present the design and implementation of Jojo, its programming API, its configuration file syntax, and a working program example. We also show preliminary performance evaluation results.

    CiNii Books

    researchmap

  • Grid Portal Toolkit Ninf-Portal

    NAKADA HIDEMOTO, SAITO MASAYUKI, SUZUMURA TOYOTARO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    情報処理学会論文誌. ハイパフォーマンスコンピューティングシステム   43 ( 5 )   172 - 183   2002.9

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    As the Grid proliferates as the next-generation wide-area high-performance computing infrastructure, end-user Grid interfaces in the form of "Grid Portals" are becoming increasingly important, especially for computational scientists and engineers. Although several Grid portal toolkits have been proposed, a Grid Portal creator must construct and deploy both the user interface and the application portions of the Grid Portal, which requires considerable programming effort. We aim to ease this burden by applying state-of-the-art Web/XML interface generation technologies for the former and the Ninf-G GridRPC system for easily "Gridifying" existing applications for the latter, and by integrating them seamlessly. The resulting system, which we call the "Ninf Portal", allowed concise description and easy deployment of a sample application on the Grid with very little programming effort.

    CiNii Books

    J-GLOBAL

    researchmap

  • Grid Datafarm Architecture for Global Petascale Data-intensive Computing

    TATEBE OSAMU, MORITA YOUHEI, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, SODA NORIYUKI

    43 ( 5 )   184 - 195   2002.9

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Grid Datafarm architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage using a grid of clusters with tens of thousands of nodes. New parallel I/O APIs and file-affinity scheduling make it possible to achieve scalable I/O bandwidth and scalable parallel processing. A preliminary performance evaluation of a Gfarm reference implementation has shown scalable disk I/O and network bandwidth on the Presto III Athlon cluster. Gfarm parallel I/O write and read operations achieved 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. Gfarm parallel file copy achieved 443 MB/s with 23 parallel streams on Myrinet 2000.

    CiNii Books

    researchmap

  • Evaluation of the inter-cluster data transfer on Grid environment

    OGURA SHOJI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   91 ( 80 )   155 - 160   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Large-scale storage systems for DataGrid settings, implemented by interconnecting and federating large-scale storage clusters, are being proposed and constructed. Peer-to-peer data transfer between two large clusters involves two major factors. On one hand, network pipes with a large RTT×bandwidth product typically become data-starved, resulting in bandwidth loss. On the other hand, when multiple nodes in the clusters attempt simultaneous transfers, the network pipe can become saturated, resulting in packet loss, which again may degrade bandwidth in networks with a large RTT×bandwidth product. By dynamically and automatically adjusting transfer parameters between the two clusters, such as the number of network nodes and the number of socket stripes, we could achieve optimal bandwidth even when the network is under heavy contention. We have conducted several simulations in a few environments to evaluate and determine the appropriate transfer parameters for this purpose.

    CiNii Books

    researchmap
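
    The RTT×bandwidth product mentioned in the abstract above determines how much data must be in flight to keep a link busy. The sketch below is a back-of-the-envelope calculation only, not the paper's tuning method: the function names, the per-stream window size, and the example link figures are all assumptions chosen for illustration.

```python
import math


def bandwidth_delay_product(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to fill a link.

    bandwidth_bps: link bandwidth in bits per second.
    rtt_ms: round-trip time in milliseconds.
    """
    # bits/s * ms / 1000 gives bits in flight; divide by 8 for bytes.
    return bandwidth_bps * rtt_ms / 8000


def streams_needed(bandwidth_bps, rtt_ms, window_bytes):
    """Rough count of parallel streams needed when each stream can keep
    at most window_bytes in flight (a simplifying assumption)."""
    bdp = bandwidth_delay_product(bandwidth_bps, rtt_ms)
    return max(1, math.ceil(bdp / window_bytes))


# Hypothetical example: a 1 Gbit/s path with 100 ms RTT and 64 KiB windows.
example_bdp = bandwidth_delay_product(1_000_000_000, 100)
example_streams = streams_needed(1_000_000_000, 100, 65536)
```

    Under these assumed figures, a single stream with a small window cannot fill the pipe, which is the data-starvation effect the abstract describes. Opening too many streams at once pushes toward the opposite problem of saturation and packet loss, which is why the stream count is a parameter worth tuning.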

  • Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High Energy Physics Applications

    TAKEFUSA ATSUKO, TATEBE OSAMU, MATSUOKA SATOSHI, MORITA YOUHEI

    IPSJ SIG Notes   91 ( 80 )   137 - 142   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Data Grid is a Grid environment for ubiquitous access to and analysis of large-scale data. Because the research is at an early stage, the performance of petabyte-scale Data Grid models in a realistic data processing setting has not been well investigated. By enhancing our Bricks Grid simulator to simulate Data Grid scenarios, we investigate and compare the performance of different Data Grid models in the Grid Datafarm architecture, mainly categorized into the central and the tier models but with varying scheduling and replication strategies, under realistic assumptions of job processing for the CERN LHC experiments.

    CiNii Books

    researchmap

  • Lucie : A fast installation and administration tool for large-scaled clusters

    TAKAMIYA YASUHITO, MANABE ATSUSHI, SHIRASUNA SATOSHI, MATSUOKA SATOSHI

    IPSJ SIG Notes   91 ( 80 )   131 - 136   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The rapid increase in the number of nodes in commodity clusters makes the cost of cluster setup and maintenance a routine concern. Moreover, with the rise of data-intensive applications that require several GBs of data on each cluster node, it has become clear that no installation tool targets installation-time setup of such large-scale data. In this paper, we propose a new cluster installation and administration tool called Lucie. Lucie provides a network boot and installation mechanism that needs no specific installation media, and configurability that allows the installer itself to be reconstructed on demand. Additionally, using a data distribution mechanism over a virtual ring topology network, it can distribute several GBs of images to all cluster nodes at installation time while maintaining fault tolerance. Our benchmarks show that Lucie can install and set up a whole cluster in constant time. This result shows that Lucie is scalable and efficient, and could well serve as a basis for Plug-and-Play clustering.

    CiNii Books

    researchmap

  • Dynamic Application Development and Execution Environment for Grid Portals

    SUZUMURA TOYOTARO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   91 ( 80 )   191 - 196   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    As the Grid proliferates as the next-generation computing infrastructure, a user interface in the form of "Grid Portals" is becoming increasingly important, especially for computational scientists and engineers. Although several Grid Portal toolkits have been proposed, the portal developer still must build and deploy both the user interface and the application, which requires considerable programming effort. We aim to ease this burden by generating the portal frontend (which consists of JSP and Java Servlets) from an XML document for the former, by using a GridRPC system, Ninf-G, for easily "gridifying" existing applications for the latter, and by integrating them seamlessly. The resulting system, which we call the Ninf Portal, allowed concise description and easy deployment of a real Grid application with very little programming effort. This paper describes an off-the-shelf architecture that automatically generates an application portal: the user develops a Grid application interactively in a scripting language and supplies user interface information on a web page. This allows portal users to utilize a large variety of applications, including default applications defined by portal administrators as well as user-defined applications generated by this architecture.

    CiNii Books

    researchmap

  • Performance Tuning High-Performance Linpack (HPL)

    SASOU TAKERU, MATSUOKA SATOSHI

    IPSJ SIG Notes   91 ( 80 )   125 - 130   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    HPL is an implementation of the LINPACK benchmark and is used by many users for Top500 performance evaluation. Good performance can be achieved by tuning parameters, but because HPL has many parameters, it is difficult to determine the best settings. Information about HPL parameter setups on various parallel systems is therefore very useful for users. In this paper, we present the HPL configuration with which the PrestoIII cluster ranked 47th in the 19th Top500 list, and evaluate a wide range of parameter settings on the PrestoIII cluster. From these results, we obtained guidelines for parameter tuning.

    CiNii Books

    researchmap

  • A Proposal for Parallel Combinatorial Optimization System for the Grid

    AKIYAMA TOMOHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   91 ( 80 )   143 - 148   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    For combinatorial optimization problems, which compute the optimal value of a multidimensional parameter function, several methods are known to be effective, such as Branch-and-Bound methods and Genetic Algorithms. Since these methods can be massively parallelized and the granularity of computation tasks is easily controllable, they are considered suitable for execution on the Grid. However, distributed parallel programming on the Grid is quite complicated, and setting up a Grid-wide computing environment is a heavy burden. Here, we propose a system called jPoP, which makes it easy to develop and execute optimization-problem solvers on the Grid. To support development, jPoP provides a template class for each algorithm. To reduce setup cost, it automatically stages user programs to the Grid environment. This paper describes the design and implementation of the jPoP system. The template classes for Genetic Algorithms are also shown.

    CiNii Books

    researchmap
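
    The "template class for each algorithm" idea in the abstract above can be sketched with the template method pattern: the framework owns the search loop and the user supplies only problem-specific hooks. The sketch below is not jPoP's API (jPoP is a Java system). The names `GATemplate` and `OneMax`, the hook methods, and the simple elitist selection scheme are all hypothetical, and the sketch runs sequentially rather than on a Grid.

```python
import random
from abc import ABC, abstractmethod


class GATemplate(ABC):
    """Hypothetical GA template: the loop is fixed, the hooks are user code."""

    def __init__(self, population_size, rng):
        if population_size < 2:
            raise ValueError("population_size must be at least 2")
        self.population_size = population_size
        self.rng = rng  # a random.Random instance, passed in for repeatability

    @abstractmethod
    def random_individual(self):
        """Create one random candidate solution."""

    @abstractmethod
    def fitness(self, individual):
        """Return a number; larger means better."""

    @abstractmethod
    def crossover(self, a, b):
        """Combine two parents into one child."""

    @abstractmethod
    def mutate(self, individual):
        """Return a slightly changed copy of the individual."""

    def run(self, generations):
        population = [self.random_individual() for _ in range(self.population_size)]
        for _ in range(generations):
            # Keep the better half as parents (simple elitist selection).
            population.sort(key=self.fitness, reverse=True)
            parents = population[: max(2, self.population_size // 2)]
            children = []
            while len(parents) + len(children) < self.population_size:
                a, b = self.rng.sample(parents, 2)
                children.append(self.mutate(self.crossover(a, b)))
            population = parents + children
        return max(population, key=self.fitness)


class OneMax(GATemplate):
    """Toy problem: maximize the number of 1 bits in a fixed-length list."""

    LENGTH = 16

    def random_individual(self):
        return [self.rng.randint(0, 1) for _ in range(self.LENGTH)]

    def fitness(self, individual):
        return sum(individual)

    def crossover(self, a, b):
        cut = self.rng.randrange(1, self.LENGTH)
        return a[:cut] + b[cut:]

    def mutate(self, individual):
        child = list(individual)
        child[self.rng.randrange(self.LENGTH)] ^= 1  # flip one bit
        return child
```

    A user would call, for example, `OneMax(20, random.Random(0)).run(30)` to obtain the best individual found. In a Grid setting, a framework of this shape could farm the `fitness` evaluations out to remote workers without changing the user's subclass, which is one plausible reason for choosing a template design.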

  • Evaluating Web Service Based Implementations of GridRPC

    SHIRASUNA SATOSHI, NAKADA HIDEMOTO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   91 ( 80 )   197 - 202   2002.8

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    GridRPC is a class of Grid middleware for scientific computing. Interoperability has been an important issue because each current GridRPC system employs its own protocol. Web services, where XML-based standards such as SOAP and WSDL are expected to see widespread use, could be the medium of interoperability; however, it is not clear 1) whether XML-based schemas have sufficient expressive power for GridRPC, and 2) whether performance could be made sufficient. Our experiments indicate that the use of such technologies is more promising than previously reported. Although a naive implementation of SOAP-based GridRPC has severe performance overhead, applying a series of optimizations improves performance. However, encoding various features of GridRPC proved somewhat difficult because of WSDL limitations. The results show that GridRPC systems can be based on Web technologies, but work is needed to extend the WSDL specifications.

    CiNii Books

    researchmap

  • Grid as the Future of Wide-Area Distributed Computing

    MATSUOKA Satoshi

    7 ( 3 )   529 - 532   2002.7

    Language:Japanese  

    CiNii Books

    researchmap

  • A Case Study of Access Grid Node Construction and a Global Technical Conference

    SHUDO Kazuyuki, TANAKA Yoshio, KOMATSU Hiroyuki, MATSUOKA Satoshi, NANRI Takeshi, OKAMURA Koji, SEKIGUCHI Satoshi

    IPSJ SIG Notes   147 ( 22 )   31 - 36   2002.3

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Access Grid is a project and a software suite that support human interaction across the Grid. The core of the technology is a large-scale video conference system. We designed and constructed a packaged Access Grid node named "Delivery Grid". Using this technology, SC Global was held at the SC2001 conference; it was the first global technical conference on the Grid, and over 40 nodes took part in the event. We contributed by planning and hosting a panel discussion on the Asia-Pacific Grid and by attending from Japan and Denver. This paper describes our experiences in constructing the Access Grid node and in SC Global.

    CiNii Books

    researchmap

  • Implementation and Evaluation of a Scalable Job Management Architecture for Large-Scale PC Cluster on the Grid Environment

    IWASAKI SATORU, MATSUOKA SATOSHI, SODA NORIYUKI, HIRANO MOTONORI, TATEBE OSAMU, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   147 ( 22 )   37 - 42   2002.3

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we describe the design and implementation of the job launch architecture for the Grid Data Farm (Gfarm) system. The Gfarm system is composed of PC clusters with tens of thousands of nodes on the Grid, and uses GSI for communication and authentication between nodes. Consequently, if a naive method is used to start a job on Gfarm, a GSI authentication cost proportional to the number of nodes is incurred, and starting a job that consists of thousands of processes is expected to take several thousand seconds. We avoid the authentication cost by using connections that have been established in advance. With our system, the job launch time is 3.5 seconds with 15 nodes and 6 seconds with 63 nodes. We believe that we can achieve further scalability by improving the job-launching protocol.

    CiNii Books

    researchmap

  • The End of the Information Processing Society of Japan? : The Way Forward for IPSJ

    村岡 洋一, 土居 範久, 戸田 巖, 萩谷 昌己, 松岡 聡

    情報処理   43 ( 2 )   37 - 37   2002.2

    Language:Japanese   Publisher:一般社団法人情報処理学会  

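    When a society debates "what the society should be", the discussion, put very simply, runs as follows: (1) the society is an accreditation body for papers; (2) in addition, it is a body that provides opportunities to present research; (3) we want to carry out these activities cost-effectively, which requires a firm economic base; (4) so let us increase the number of general members, for which a high-quality magazine is indispensable as a member benefit; (5) but can it compete with commercial magazines?; (6) and so we end up devising one member-entertainment scheme after another. But is this really enough? For example, the following questions arise. (1) As a venue for research activity, is the society moving nimbly enough to keep up with this era of so-called "dog years"? (2) Is the research activity we supposedly value so highly really of benefit to the world and to ourselves? We hope it is not mere self-satisfaction. (3) Has the society, which should be a place for nimbly presenting excellent research results to the world, become heavy? To become a place where anything is possible, does it not need to build a much stronger support infrastructure? (4) Is it enough for the society to be merely a venue for presenting research results? To survive the coming great recession as engineers and researchers, does it not need to become a place for exercising more ingenuity? With these questions in mind, we have invited debaters who regularly hold "constructively destructive opinions (?)" about what the society should be, and we hope to stir up a "spring storm". Young people, if you stay silent, the society will be dismantled!!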

    CiNii Books

    researchmap

  • Overview of GridRPC: A remote procedure call API for grid computing

    K Seymour, H Nakada, S Matsuoka, J Dongarra, C Lee, H Casanova

    GRID COMPUTING - GRID 2002   2536   274 - 278   2002

    Language:English  

    Web of Science

    researchmap

  • Titech Grid : Toward the Next Generation Computation Infrastructure

    NAKADA Hidemoto, MATSUOKA Satoshi

    The Proceedings of The Computational Mechanics Conference   2002 ( 0 )   685 - 686   2002

    Language:Japanese   Publisher:The Japan Society of Mechanical Engineers  

    Evolving "e-science" requires more and more computing power. Considering power consumption and cooling, it is becoming impossible to provide computing resources from a single centralized site such as an ordinary university computer center. The Global Scientific Information and Computing Center (GSIC) at Tokyo Institute of Technology deployed a new form of computation infrastructure, called Titech Grid, utilizing Grid technology, commodity PC cluster technology, and a newly introduced high-speed network. Here, we give a precise description of the Titech Grid configuration and operation.

    DOI: 10.1299/jsmecmd.2002.15.685

    CiNii Books

    researchmap

  • The Ninf Portal : An Automatic Generation Tool for Computing Portals

    Toyotaro Suzumura, Hidemoto Nakada, Masayuki Saito, Satoshi Matsuoka, Yoshio Tanaka, Satoshi Sekiguchi

    Joint ACM Java Grande - ISCOPE 2002 Conference, Seattle, Washington, November 3-5, 2002   2002

  • Evaluating Web Services Based Implementations of GridRPC.

    Satoshi Shirasuna, Hidemoto Nakada, Satoshi Matsuoka, Satoshi Sekiguchi

    In Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002)   237 - 245   2002

  • Towards Dynamic Load Balancing Using Page Migration and Loop Re-partitioning on Omni/SCASH

    Yoshiaki Sakae, Satoshi Matsuoka, Mitsuhisa Sato, Hiroshi Harada

    In Proceedings of The Fourth European Workshop on OpenMP (EWOMP 2002)   2002

  • Grid datafarm architecture for petascale data intensive computing

    Osamu Tatebe, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda, Satoshi Sekiguchi

    2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2002   2002

  • Towards MPI with User-Transparent Fault Tolerance

    高宮安仁, 松岡

    情報処理学会・電気通信処理学会 並列処理シンポジウム JSPP2002 論文集   217 - 224   2002

  • A Parallel LINPACK Algorithm for Heterogeneous Cluster Environments

    笹生健, 松岡聡, 建部修見

    情報処理学会・電気通信処理学会 並列処理シンポジウム JSPP2002 論文集   71 - 78   2002

  • Grid Datafarm architecture for petascale data intensive computing

    O Tatebe, Y Morita, S Matsuoka, N Soda, S Sekiguchi

    CCGRID 2002: 2ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS   102 - 110   2002

  • Grid Datafarm Architecture for Petascale Wide-Area Distributed Data Analysis

    建部修見, 森田洋平, 松岡聡, 関口智嗣, 曽田哲之

    情報処理学会論文誌:ハイパフォーマンスコンピューティングシステム,HPCS2002論文集 情報処理学会   89 - 96   2002

  • Construction and Evaluation of an XML-based GridRPC System

    白砂哲, 中田秀基, 松岡聡, 関口智嗣

    日本ソフトウエア科学会 第5回プログラミングおよび応用システムに関するワークショップ(SPA2002)   2002

  • Evaluating Web services based implementations of GridRPC

    S. Shirasuna, H. Nakada, S. Matsuoka, S. Sekiguchi

    Proceedings of the IEEE International Symposium on High Performance Distributed Computing   2002-   237 - 245   2002

    Language:English   Publisher:Institute of Electrical and Electronics Engineers Inc.  

    DOI: 10.1109/HPDC.2002.1029923

    Scopus

    J-GLOBAL

    researchmap

  • Ninf-Portal: A Toolkit for Building Grid Portals

    中田秀基, 齊藤真幸, 鈴村豊太郎, 田中良夫, 松岡聡, 関口智嗣

    情報処理学会・電気通信処理学会 並列処理シンポジウム JSPP2002 論文集   209 - 216   2002

  • A Proposal for API of GridRPC

    NAKADA HIDEMOTO, TANAKA YOSHIO, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   88 ( 78 )   37 - 42   2001.10

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Computation systems based on RPC (Remote Procedure Call) are promising candidates for Grid middleware. Several systems, including Ninf and NetSolve, have already been proposed and are used in various areas. However, the Grid RPC API is not yet standardized, which is hindering the further spread of Grid RPC systems. In this paper, we examine two existing Grid RPC APIs and, based on them, propose a Grid RPC API intended to become a standard. The API is designed to be minimal but sufficient for applications. We plan to promote this API as a standard for Grid RPC in the Global Grid Forum.

    CiNii Books

    researchmap

  • Towards performance evaluation of high-performance computing on multiple Java platforms

    S Matsuoka, S Itou

    FUTURE GENERATION COMPUTER SYSTEMS   18 ( 2 )   281 - 291   2001.10

  • Fault Tolerance on the Ninf System

    SHIRASUNA SATOSHI, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   87 ( 77 )   153 - 158   2001.7

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Fault tolerance is becoming an increasingly important research topic in the Grid as it gains widespread use. The abundance of resources, albeit unstable ones, in non-dedicated environments mandates that all faults in the stages of user computation be handled in a transparent and graceful fashion. Our analysis shows that in GridRPC, one of the viable programming models and systems for the Grid, the different stages of a computation exhibit different facets of fault tolerance, and as such they must be handled on a stage-by-stage basis. An experiment integrating Ninf, a GridRPC system, with the Condor system for checkpointing to enable fault-tolerant computation shows that the integration is largely transparent to the user and that, for large-grained computations, the overhead is relatively small. On the other hand, smaller-grained computations exhibit anomalous and spurious overhead, in addition to the overhead incurred for transferring the checkpointing library on each invocation, and we are conducting further investigation into its viability.

    CiNii Books

    researchmap

  • Evaluation of Monitoring Method in the Grid

    AKIYAMA TOMOHIRO, NAKADA HIDEMOTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   87 ( 77 )   159 - 164   2001.7

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Grid allows distributed resources to be coordinated in order to facilitate large-scale computing over wide-area networks. In such an environment, fault detection and performance monitoring, as well as performance prediction, become important features that need to be agreed upon and possibly standardized. The Grid Performance Working Group within the Global Grid Forum has recently proposed and defined the basic architecture of Grid monitoring and XML-based data format definitions, but the proposal has not yet been tested in practice. In particular, technical concerns include 1) the scalability of the proposed architecture, 2) the cost of the XML representation of instrumentation events, and 3) the extensibility and flexibility of the data definition schema. Our experimental implementation of part of the proposed architecture on our Ninf GridRPC system has shown that, within a realistic Grid setting, the architecture seems reasonably scalable, the added cost of data representation is within permissible bounds, and the schema is sufficiently extensible to accommodate the specifics of the Ninf system.

    CiNii Books

    researchmap

  • Towards MPI with user-transparent fault tolerance

    TAKAMIYA YASUHITO, MATSUOKA SATOSHI

    IPSJ SIG Notes   87 ( 77 )   129 - 134   2001.7

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The rapid increase in the number of nodes, as well as the massive scale of computing in terms of both time and memory space, in commodity clusters mandates that potential failures of applications and systems be handled as the norm, yet inherent fault tolerance and recovery have not been an integral part of the software tools developed for parallel computing on such clusters. Moreover, flexible fault tolerance mechanisms with which the user can manage the balance of reliability vs. transparency vs. execution overhead would be vital, but most previous work on cluster fault tolerance has made available only a single policy and/or mechanism, and its overhead has not been precisely measured for practical applications. Instead, we propose a new fault-tolerant MPI system called Parakeet, which allows various fault tolerance and recovery mechanisms to be easily specified by the user while retaining efficiency. As a preliminary basis, we have implemented a user-level, coordinated checkpointing and migration protocol on top of MPICH in a user-transparent fashion. By specifying new protocols based on the underlying Parakeet mechanism, one could achieve Plug-and-Play management of large-scale clusters. Preliminary benchmarks show that Parakeet is portable and efficient, and could well serve as a basis for Plug-and-Play clustering.

    CiNii Books

    researchmap

  • Design and Implementation of a Jini-based Computing Portal System

    SUZUMURA TOYOTARO, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   87 ( 78 )   171 - 176   2001.7

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    JiPANG (Jini-based Portal Augmenting Grids) is a portal system and toolkit that provides a uniform access interface layer to a variety of Grid systems and is built on top of Jini distributed object technology. JiPANG performs uniform higher-level management of the computing services and resources managed by individual Grid systems such as Ninf, NetSolve, and Globus. To give the user a uniform interface to the Grids, JiPANG provides a set of simple Java APIs called the JiPANG Toolkits and, furthermore, allows the user to interact with Grid systems, again in a uniform way, using the JiPANG Browser application. With JiPANG, users need not install any client packages beforehand to interact with Grid systems, nor be concerned about updating to the latest version. We believe that such uniform, transparent services, available in a ubiquitous manner, are essential for the success of the Grid as a viable computing platform for the next generation.

    CiNii Books

    researchmap

  • Grid Datafarm Architecture for Petascale Data Intensive Computing

    TATEBE OSAMU, MORITA YOUHEI, MATSUOKA SATOSHI, SEKIGUCHI SATOSHI, SODA NORIYUKI

    IPSJ SIG Notes   87 ( 77 )   177 - 182   2001.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The design of the Grid Datafarm architecture for Petascale data-intensive computing is described. Grid Datafarm provides global data-parallel filesystems with online Petascale storage and scalable I/O bandwidth to exploit the local disks of a group of PC clusters on the Grid. The Gfarm parallel I/O APIs and Gfarm commands provide a single system image for the filesystem. Automatic management of fault tolerance and load balancing is also an important issue, and is handled by file duplication and re-computation using a command history.

    CiNii Books

    researchmap

  • An Implementation of Java Based Software DSM System

    NAKADA HIDEMOTO, SOHDA YUKIHIKO, OGAWA HIROTAKA, MATSUOKA SATOSHI

    42 ( 7 )   85 - 85   2001.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Due to the rapid commoditization of advanced hardware, parallel machines are being commoditized in the form of PC clusters. Software DSM systems written in Java, which is portable across heterogeneous systems, are good candidates for such computing environments. In our previous paper, we proposed a Java-based software DSM system for clusters. The system successfully proved its usefulness, but we found some defects, including 1) long startup time due to remote invocation of the Java VM and 2) the troublesome labor of transferring class files to each node. In this paper, we introduce our new Java DSM system, which lets Java VMs remain resident on each node, reducing startup time. It automatically transfers application class files and provides access to the client file system.

    CiNii Books

    researchmap

  • The Optimization of The LINPACK Benchmark for Heterogeneous Clusters

    SASOU TAKERU, MATSUOKA SATOSHI, TATEBE OSAMU

    IPSJ SIG Notes   86 ( 49 )   49 - 54   2001.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this study, we optimized HPL, an implementation of the LINPACK Benchmark, for heterogeneous cluster systems and evaluated it on a cluster with heterogeneous CPUs. The optimization shares load by changing the data size assigned to each node according to that node's performance. In the experiments, we attain 57.1% efficiency relative to theoretical peak performance, and up to 1.49 times the best performance of HPL.
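
    The idea of sizing each node's share of the data according to its performance can be illustrated with a minimal sketch. The abstract does not give the actual distribution scheme, so everything here is an assumption for illustration: the function name `split_by_performance`, the use of whole blocks as the unit of work, and the largest-remainder rounding rule are hypothetical and are not taken from the paper or from HPL.

    ```python
    def split_by_performance(total_blocks, node_speeds):
        """Split total_blocks among nodes in proportion to their relative speeds.

        Hypothetical illustration only: node_speeds are assumed to be measured
        per-node performance figures (any positive unit), and leftover blocks
        are handed out by the largest-remainder rule.
        """
        if total_blocks < 0 or not node_speeds or any(s <= 0 for s in node_speeds):
            raise ValueError("need a non-negative block count and positive speeds")

        total_speed = sum(node_speeds)
        # Ideal (possibly fractional) share of each node.
        ideal = [total_blocks * s / total_speed for s in node_speeds]
        # Start from the rounded-down shares.
        shares = [int(x) for x in ideal]
        leftover = total_blocks - sum(shares)
        # Give the remaining blocks to the nodes with the largest fractional parts.
        by_remainder = sorted(range(len(ideal)),
                              key=lambda i: ideal[i] - shares[i],
                              reverse=True)
        for i in by_remainder[:leftover]:
            shares[i] += 1
        return shares


    if __name__ == "__main__":
        # Two slower nodes and one node that is twice as fast.
        print(split_by_performance(100, [1.0, 1.0, 2.0]))
    ```

    With this rule a node that is twice as fast receives twice as many blocks, so all nodes would finish at about the same time if the speed figures are accurate.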

    CiNii Books

    researchmap

  • Implementation of Software DSM in Java

    SOHDA YUKIHIKO, NAKADA HIDEMOTO, OGAWA HIROTAKA, MATSUOKA SATOSHI

    IPSJ Transactions on Programming (PRO)   42 ( 3 )   14 - 26   2001.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Rapid commoditization of advanced hardware and progress in networking technology are now making wide-area high-performance computing, a.k.a. 'Grid' computing, a reality. Since a Grid will consist of vastly heterogeneous sets of compute nodes, especially commodity clusters, some have advocated the use of Java as a suitable technology for achieving portability across different machines. Since Java's natural model of parallelism is shared-memory multithreading, one will have to support distributed shared memory (DSM) in a portable manner; however, none of the previous work on implementing Java on DSM has been a portable solution. Instead, we propose a software architecture whose goal is to achieve portability of the DSM implementation across different commodity clustering platforms, and we have implemented a prototype system, JDSM. Benchmark results show that the current implementation in Java incurs increased memory coherency maintenance cost compared to C-based DSMs, thus limiting scalability to some degree, and we are currently working on a solution to alleviate this cost.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00016879/

  • Evaluating OpenMP Performance on SDSM using SPLASH2: Omni/SCASH Benchmarks.

    SAKAE YOSHIAKI, MATSUOKA SATOSHI, SATO MITSUHISA, HASEGAWA ATSUSHI, HARADA HIROSHI

    IPSJ SIG Notes   2001 ( 22 )   187 - 192   2001.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Omni/SCASH is an implementation of OpenMP on top of the DSM system SCASH, allowing portable execution of shared-memory OpenMP programs on SMPs as well as on clusters. To validate the effectiveness of Omni/SCASH, we conduct the following benchmarks: we port selected SPLASH2 benchmarks to OpenMP and execute them on Omni/SCASH to measure the effectiveness of the implementation, such as the costs and frequencies of cache hits, cache misses, DSM fault handler invocations, and barrier invocations. We then test whether Omni/SCASH serves as an effective programming platform for heterogeneous clusters. Preliminary results are mixed, and indicate that further work is needed for portable parallel programming on (heterogeneous) clusters.

    CiNii Books

    researchmap

  • An evaluation of multiple pointing input systems

    K Fukuchi, S Matsuoka

    HUMAN-COMPUTER INTERACTION - INTERACT'01   739 - 740   2001

     More details

    Language:English  

    Web of Science

    researchmap

  • A Foundation of Solution Methods for Constraint Hierarchies

    Hiroshi Hosobe, Satoshi Matsuoka

    Constraints Journal, Special Issue on Soft Constraints   2001

     More details

  • Implementation of a Portable Software DSM in Java

    Yukihiko Sohda, Hidemoto Nakada, Satoshi Matsuoka, Hirotaka Ogawa

    Proceedings of ACM JavaGrande/ISCOPE 2001,San Francisco, pp.163--172, June, 2001.JavaGrande/ISCOPE 2001 Conference, Jun. 2001   163 - 172   2001

     More details

  • Grid Data Farm for Petascale Data Intensive Computing

    Osamu Tatebe, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda, Hiroyuki Sato, Yoshio Tanaka, Satoshi Sekiguchi, Yoshiyuki Watase, Masatoshi Imori, Tomio Kobaya

    Technical Report, Electrotechnical Laboratory, TR-2001-4   2001

     More details

  • A Grid Programming Primer, (Draft 2.4)

    Craig Lee, Satoshi Matsuoka, Domenico Talia, Alan Sussman, Nicholas Karonis, Gabrielle Allen, Mary Thomas

    Whitepaper for Global Grid Forum Advanced Programming Models Working Group   2001

     More details

  • OpenJIT 2: The Design and Implementation of Application Framework for JIT Compilers

    Fuyuhiko Maruyama, Satoshi Matsuoka, Hirotaka Ogawa, Naoya Maruyama, Kouya Shimura

    USENIX Java Virtual Machine Research and Technology Symposium (JVM'01), Work in Progress session. Monterey. April 23-24 2001   2001

     More details

  • A Jini-based Computing Portal System

    Toyotaro Suzumura, Satoshi Matsuoka, Hidemoto Nakada

    Proceedings of IEEE/ACM Supercomputing '2001, IEEE Computer Society, Denver, Colorado, Nov. 2001   2001

     More details

  • Network-Enabled Server Systems and the Computational Grid

    Henri Casanova, Satoshi Matsuoka, Jack Dongarra

    High Performance Computing Symposium (HPC'01),Advanced Simulation Technologies Conference, April 22-26 in Seattle, Washington (USA), 2001   2001

     More details

  • Fault Tolerance in the Ninf System

    白砂哲, 中田秀基, 松岡聡

    IPSJ SIG Notes 2001-HPC-87 (SwoPP2001 Okinawa), July 2001   159 - 164   2001

     More details

  • Grid data farm for atlas simulation data challenges

    Y Morita, O Tatebe, S Matsuoka, N Soda, H Sato, Y Tanaka, S Sekiguchi, S Kawabata, Y Watase, M Imori, T Kobayashi

    PROCEEDINGS OF CHEP 2001   699 - 701   2001

     More details

    Language:English  

    Web of Science

    researchmap

  • A study of deadline scheduling for client-server systems on the Computational Grid

    A Takefusa, H Casanova, S Matsuoka, F Berman

    10TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS   406 - 415   2001

     More details

    Language:English  

    Web of Science

    researchmap

  • Performance of Deadline Scheduling Schemes in Grid Computing Environments

    竹房あつ子, 松岡聡

    情報処理学会 電気通信処理学会 並列シンポジウムJSPP 2001 論文集 2001.06   263 - 270   2001

     More details

  • Implementation of Software Distributed Shared Memory for Java

    早田恭彦, 中田秀基, 小川宏高, 松岡聡

    IPSJ Transactions, Vol.42 No.SIG 3 (PRO10), March 2001   12 - 24   2001

     More details

  • Problem Solving Environment Comparison

    Rajkumar Buyya, Tom Eidson, Dennis Gannon, Erwin Laure, Satoshi Matsuoka, Thierry Priol, Joel Saltz, Seidel, Yoshio Tanaka

    Whitepaper for Global Grid Forum Advanced Programming Models Working Group   2001

     More details

  • MPC++ performance for commodity clustering

    Y Sakae, S Matsuoka

    HIGH-PERFORMANCE COMPUTING AND NETWORKING   2110   503 - 512   2001

     More details

    Language:English  

    Web of Science

    researchmap

  • OpenJIT: Open Just-In-Time Compiler Technology in Java

    小川宏高, 松岡

    2001

     More details

  • Implementation of a Computing Portal System Using the Jini Distributed Object Technology

    鈴村豊太郎, 松岡聡, 中田秀基

    IPSJ SIG Notes 2001-HPC-87 (SwoPP2001 Okinawa), July 2001   171 - 176   2001

     More details

  • 2010 : A Simulation Roadmap : A Road to PetaFLOPS Using Commodity Technology

    MATSUOKA Satoshi

    Journal of the Japan Society for Simulation Technology   19 ( 4 )   238 - 245   2000.12

     More details

    Language:Japanese   Publisher:Japan Society for Simulation Technology  

    Commodity High-Performance Computing, which uses commodity building blocks for high-performance computing, is expected to reduce the cost of computing by a factor of over ten thousand over the next ten years, realizing so-called Petaflops computing as well as making Terascale computing prevalent. As a result, simulations of unprecedented scale or resolution will become possible, making the role of simulation ever more important in science and technology. We attempt to predict the advances in computing power by exploring technical trends, and investigate how such advances will revolutionize simulation.

    CiNii Books

    researchmap

    Other Link: http://dl.ndl.go.jp/info:ndljp/pid/11082261

  • Evaluation of MPC++-on-MPI on Commodity Cluster Environment

    SAKAE YOSHIAKI, ISHIKAWA YUTAKA, MATSUOKA SATOSHI, TAKAHASHI TOSHIYUKI

    41 ( 2 )   60 - 72   2000.11

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Parallel programming languages such as MPC++, which facilitate finer-grained multithreading, remote method invocation, global memory read/write, and synchronized data structures at the language level, have often been claimed to allow parallelism to be expressed in a much richer, easier style than programming with libraries such as C+MPI. Due to their reliance on finer-grained language mechanisms, such languages have traditionally been implemented only on specialized user-level libraries on top of fast, expensive networks. On the other hand, for such languages to gain common acceptance, they must be implemented on top of portable messaging libraries running on commodity hardware with substantially less expensive networking. However, few systematic studies have been done to identify (1) whether the languages allow easy expression of traditional parallel programs, (2) if so, how much performance one loses by using commodity software/hardware, and (3) the degree of scalability compared to dedicated software/hardware implementations. To verify the viability of a commodity implementation, we ported the MPC++ language on top of different breeds of MPI, to be executed on two networks of substantially different performance and cost, namely Myrinet and 100Base-T Ethernet. We then investigated whether some NASPAR applications can be ported "naturally" on top of MPC++ and benchmarked in such an environment. Results were quite positive for MPC++ and its commodity implementation: (a) the port was quite effortless, (b) the small penalty caused by the additional MPI layer was negligible for the NASPAR applications, and (c) for large data sets, MPC++/MPI running on the 100Base-T network was surprisingly competitive with both C+MPI on Myrinet and the original dedicated implementation of MPC++ on PM/Myrinet. The results are quite promising for wider acceptance of higher-level parallel languages in commodity clustering environments.

    CiNii Books

    researchmap

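  • The Potential of Commodity Parallel Processing in OR: Trends in Cluster and Grid Computing (Special Session: Mathematical Programming)

    松岡 聡

    Abstracts of the Fall Research Presentation Meeting of the Operations Research Society of Japan   2000   258 - 259   2000.9

     More details

    Language:Japanese   Publisher:The Operations Research Society of Japan  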

    CiNii Books

    researchmap

  • Will an x86 Android Dream of an Electronic Cow? (Broadcasting and Information Processing)

    MATSUOKA Satoshi

    IPSJ Magazine   41 ( 9 )   1072 - 1074   2000.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Implementing and Evaluating of MPC++ Multi-Thread Template Library on Multiple Communication Layers

    NODA YUSUKE, SAKAE YOSHIAKI, MATSUOKA SATOSHI, OGAWA HIROTAKA

    IPSJ SIG Notes   82 ( 73 )   137 - 142   2000.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Parallel programming languages such as MPC++ embody various features required for finer-grained parallel processing, such as user-level threads, remote thread invocation, and remote memory access. Such languages are often implemented assuming fixed, high-performance hardware in order to eliminate software overhead as much as possible, and portable, high-performance implementations on top of commodity clusters, which may involve a variety of execution environments (different CPUs, OSes, networks), have not been well investigated. To clarify the commodity-level viability of such languages, we have been experimenting with a variety of combinations of underlying execution environments. In particular, we are testing the use of VIA as an underlying messaging layer for MPC++. Although VIA is a general low-level messaging layer, the semantics of MPC++ nonetheless make a straightforward port non-trivial. Although current problems in the implementation prohibit large-scale benchmarks, initial experiments with NAS Parallel CG show that the implementation on VIA over 100Base-T allows notable speedup compared to MPI implementations due to low-latency messaging, with throughput increasing by 190% for small, 32-byte messages, which are often used by such languages.

    CiNii Books

    researchmap

  • Evaluation of Fast Barrier Synchronization on commodity PC Cluster connected with Ethernet

    IWASAKI SATORU, MATSUOKA SATOSHI, SAKAE YOSHIAKI, OGAWA HIROTAKA

    IPSJ SIG Notes   82 ( 73 )   131 - 136   2000.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    It is still commonly believed that high-performance clusters require expensive networks with low latency and high bandwidth, such as Myrinet, especially for communication-intensive situations such as barrier synchronization. To achieve a similar level of barrier synchronization performance with commodity networks, in particular Fast Ethernet, we propose and investigate the design space of using multicasts and multiple networks. Our experimental library employs VIA-style low-latency access to Ethernet cards and also supports multicasts, both of which are used to construct several barrier algorithms. Benchmarks show that the latency of the Shuffle Exchange algorithm on our library can be as low as 170 μs with 32 nodes, almost matching the best performance on Myrinet. Although the use of multicast is currently found to be slower at 200 μs, theoretical analysis using the LogP model reveals that a better design of the library will likely yield even lower latency than Shuffle Exchange. The results show that commodity networks are sufficient for clustering, allowing lower cost and, as a result, wider acceptance.

    CiNii Books

    researchmap

  • Interactive Essay : For the Future of Japanese Super Computers / Keep It up and Let's Work Smart for Japanese Supercomputers! / Why in the World Would a Software Guy Like Me Want to Create a Large-Scale Commodity Cluster ? / Supercomputing Business in Japan, for Tomorrow

    41 ( 7 )   877 - 884   2000.7

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • For the Future of Japanese Super Computers

    BOKU Taisuke

    IPSJ Magazine   41 ( 7 )   877 - 878   2000.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Why in the World Would a Software Guy Like Me Want to Create a Large-Scale Commodity Cluster?

    MATSUOKA Satoshi

    IPSJ Magazine   41 ( 7 )   880 - 882   2000.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Implementation of Multiple Pointing Input System

    FUKUCHI KENTAROU, MATSUOKA SATOSHI

    89 ( 61 )   15 - 21   2000.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper describes a prototype of the Multiple Pointing Input System (MPIS), which allows concurrent manipulation of multiple pointing devices. In a traditional GUI system, users have to manipulate objects sequentially with one pointing device. Our MPIS allows users to point to multiple places on the GUI screen simultaneously and to manipulate multiple graphical objects (icons, sliders) concurrently. The devices are manipulated on a clear acrylic table, and the coordinates of each device are calculated from images captured by a video camera below the table.

    CiNii Books

    researchmap

  • Design issues for Network Enabled Server Systems

    NAKADA HIDEMOTO, MATSUOKA SATOSHI, SATO MITSUHISA, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   81 ( 57(HPC-81) )   69 - 74   2000.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Network Enabled Server is considered a good candidate for global computing middleware. This paper clarifies design issues for Network Enabled Server systems and discusses the alternatives for each issue. The issues are connection methods, protocol command representation, and security methods. We implemented a new Ninf system that takes these issues into account. We also present the design of the system, focusing on its security facility.

    CiNii Books

    J-GLOBAL

    researchmap

  • Overview of a Jini-based Computing Portal System

    SUZUMURA TOYOTARO, MATSUOKA SATOSHI, NAKADA HIDEMOTO, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   81 ( 57 )   57 - 62   2000.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    JiPANG (Jini-based Portal Augmenting Grids) is a portal system that provides a uniform access interface layer to a variety of Grid systems and is built on top of Jini distributed object technology. JiPANG supports a virtual computing infrastructure called the JiPANG pool, which performs uniform, higher-level management of the computing services and resources managed by individual Grid systems such as Globus or Ninf. To give the user a uniform interface to the system, JiPANG provides a set of simple Java Grid APIs called the JiPANG API and, furthermore, allows the user to interact with Grid systems, again in a uniform way, through the JiPANG Browser application. With JiPANG, users need not install any client packages beforehand to interact with Grid systems, nor be concerned about updating to the latest version. We believe that such uniform, transparent services, available in a ubiquitous manner, are essential for the success of the Grid as a viable computing platform for the next generation.

    CiNii Books

    researchmap

  • A Scheduling Framework for Global Computing

    NAKADA Hidemoto, TAKEFUSA Atsuko, MATSUOKA Satoshi, SATO Mitsuhisa, SEKIGUCHI Satoshi

    Transactions of Information Processing Society of Japan   41 ( 5 )   1617 - 1627   2000.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Rapid progress in networking technology is now making global computing systems feasible. Although global computing systems have been proposed, how to achieve efficient usage of computing resources in global computing is still a research issue. In particular, we need to devise appropriate strategies and algorithms for scheduling computing resources over wide-area networks, which are often dynamic and unstable in nature. This paper presents our preliminary scheduling framework for unifying application and job scheduling in global computing. The proposed framework establishes a layer of scheduling and resource allocation subframeworks. We present our software framework, the Ninf metaserver, which provides a low-level scheduler and resource monitor. We also evaluate some scheduling strategies using the framework. The evaluation results show that the framework is flexible enough to implement multiple scheduling algorithms on top of it.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00012320/

  • Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms

    TAKEFUSA Atsuko, AIDA Kento, MATSUOKA Satoshi, NAKADA Hidemoto, NAGASHIMA Umpei

    Transactions of Information Processing Society of Japan   41 ( 5 )   1628 - 1638   2000.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    While there have been several proposals for high-performance global computing systems, scheduling schemes for such systems have not been well investigated. The reason is the difficulty of evaluation using large-scale benchmarks with reproducible results. Our Bricks performance evaluation system allows analysis and comparison of various scheduling schemes in a typical high-performance global computing setting. Bricks can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. Moreover, Bricks is componentized so that its constituents can be replaced to simulate various system algorithms, and existing global computing components can be incorporated via its foreign interface. To test the validity of the latter characteristic, we incorporated the NWS system, which monitors and forecasts the behavior of global computing systems. Experiments were conducted by running NWS in a real environment versus the simulated environment, given the observed parameters of the real environment. We observed that Bricks behaved in the same manner as the real environment, and that NWS also behaved similarly, making quite comparable forecasts in both environments.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00012321/

  • A Design of OpenJIT Frontend System

    OGAWA HIROTAKA, MATSUOKA SATOSHI, MARUYAMA FUYUHIKO, SOHDA YUKIHIKO, SHIMURA KOUYA

    41 ( 2 )   1 - 12   2000.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The so-called 'Open Compiler' is a technique for incorporating various self-descriptive modules for language customization and optimization based on computational reflection. We apply the open compiler technique to a Java Just-In-Time compiler to develop the OpenJIT compiler, which allows class-specific customization and optimization, fostering research on new compilation techniques such as application-specific customization and dynamic optimization. OpenJIT is broadly divided into the frontend and the backend. The frontend takes Java bytecode as input, performs higher-level optimizations involving source-to-source transformations, and passes the intermediate code on to the backend. The backend takes the intermediate code from the frontend as input, performs lower-level optimizations, and outputs native code for direct execution. In this paper, we describe the internal architecture of the frontend system and evaluate it on a simple loop example.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00016944/

  • Evaluation of Parallel LU Factorization in Java

    HASEGAWA HIROKAZU, MATSUOKA SATOSHI, ITOU SHIGEO

    IPSJ SIG Notes   137 ( 23 )   83 - 88   2000.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Most previous attempts at using Java for HPC either sacrificed Java's portability or did not achieve the performance required for HPC. Instead, we propose an alternative methodology based on a Downloadable Self-tuning Library, and have constructed an experimental prototype called AJaPACK, a portable, high-performance parallel BLAS library for Java that "tunes" itself to the environment in which it is installed. Once AJaPACK is downloaded and executed, the Java version of ATLAS (ATLAS for Java) and the parallelized version of JLAPACK combine to achieve optimized pure Java execution for the given environment. Benchmarks have shown that AJaPACK achieves approximately 1/2 to 1/5 of the speed of optimized C-ATLAS and vendor-supplied BLAS libraries and, with portable parallelization in SMP environments, achieves performance superior to single-threaded C-based native libraries. This is an order of magnitude better performance than previous pure Java BLAS libraries. For blocked LU decomposition, reasonable speedup has also been reached; on the other hand, the AJaPACK version suffers from the high overhead of subarray manipulation, resulting in a loss of performance compared to previous routines such as JLAPACK. This shows that building numerical libraries in Java is still not straightforward, and that programming techniques specific to Java should be developed for high performance.

    CiNii Books

    researchmap

  • Are Global Computing Systems Useful? Comparison of Client-server Global Computing Systems Ninf, NetSolve Versus CORBA

    SATOSHI MATSUOKA

    14th IEEE International Parallel & Distributed Processing Symposium   547 - 556   2000

     More details

  • Performance Issues in Client-Server Global Computing

    SATOSHI MATSUOKA

    International Workshop on Global and Cluster Computing (WGCC'2000).2000.03   2000

     More details

  • AJaPack; A Performance Portable Parallel Java Numerical Library

    SATOSHI MATSUOKA

    Proceedings of the ACM 2000 Java Grande Conference, The ACM Press,June, 2000   140 - 149   2000

     More details

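  • OpenJIT: An Open-Ended, Reflective JIT Compiler Framework for Java

    松岡聡

    Japan Society for Software Science and Technology, 3rd Workshop on Programming and Application Systems (SPA2000, oral presentation), March 2000   2000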

     More details

  • NetCFD: A Ninf CFD component for global computing, and its Java applet GUI

    M. Sato, K. Kusano, H. Nakada, S. Sekiguchi, S. Matsuoka

    Proceedings - 4th International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region, HPC-Asia 2000   1   501 - 506   2000

     More details

    Language:English   Publisher:Institute of Electrical and Electronics Engineers Inc.  

    DOI: 10.1109/HPC.2000.846605

    Scopus

    J-GLOBAL

    researchmap

  • OpenJIT: An open-ended, reflective JIT compiler framework for Java

    H Ogawa, K Shimura, S Matsuoka, F Maruyama, Y Sohda, Y Kimura

    ECOOP 2000 - OBJECT-ORIENTED PROGRAMMING   1850   362 - 387   2000

     More details

    Language:English  

    Web of Science

    researchmap

  • An Effective Decompilation Algorithm for Java Bytecodes

    MARUYAMA FUYUHIKO, OGAWA HIROTAKA, MATSUOKA SATOSHI

    40 ( 10 )   39 - 50   1999.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Decompilation, the technique of reading sequences of machine code and generating the corresponding source program, has been known for some time and has been used primarily for reverse engineering. For Java and its bytecode, although several decompilers have been proposed, most generate output that inappropriately extends the Java language, such as by inserting gotos that are not present in Java. Moreover, their decompilation algorithms are somewhat ad hoc and are difficult to extend or to verify for applicability, which is a hindrance for our OpenJIT compiler, which requires a decompiler frontend to recover the correct source structure from arbitrary bytecode. Instead, we have devised a new and effective algorithm for decompilation, with emphasis on properly recovering control structures. The key idea is based on the observation that, for a properly nested block-structured language, each part of a program representing a control structure corresponds to just a single subtree in the dominator tree. As such, the algorithm is general enough to be applied to languages other than Java. The evaluation of our preliminary implementation in OpenJIT shows that our algorithm properly recovers control structures where other existing decompilers fail, with roughly equivalent execution speeds.
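
    As background for the dominator-tree observation, the following minimal sketch computes the dominator tree of a small control-flow graph with the textbook iterative data-flow method. It is not the decompilation algorithm of the paper, which the abstract does not specify. The function names, the dictionary-based graph representation, and the block labels of the example if/else graph are all assumptions made for illustration.

    ```python
    def dominators(cfg, entry):
        """Return {node: set of nodes that dominate it}.

        cfg maps every node to a list of its successors (every successor must
        also appear as a key), and every node is assumed reachable from entry.
        """
        nodes = set(cfg)
        preds = {n: set() for n in nodes}
        for n, succs in cfg.items():
            for s in succs:
                preds[s].add(n)

        # Start from "everything dominates everything" and shrink to a fixed point.
        dom = {n: set(nodes) for n in nodes}
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                if preds[n]:
                    new = set.intersection(*(dom[p] for p in preds[n])) | {n}
                else:
                    new = {n}
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom


    def immediate_dominators(dom, entry):
        """Return {node: parent in the dominator tree} for every non-entry node."""
        idom = {}
        for n, ds in dom.items():
            strict = ds - {n}
            if n == entry or not strict:
                continue
            # Strict dominators form a chain; the closest one has the most dominators.
            idom[n] = max(strict, key=lambda d: len(dom[d]))
        return idom


    # Hypothetical control-flow graph of a single if/else statement.
    IF_ELSE_CFG = {
        "entry": ["cond"],
        "cond": ["then", "else"],
        "then": ["join"],
        "else": ["join"],
        "join": ["exit"],
        "exit": [],
    }

    if __name__ == "__main__":
        print(immediate_dominators(dominators(IF_ELSE_CFG, "entry"), "entry"))
    ```

    In this example both branches and the join point have "cond" as their parent in the dominator tree, so the blocks of the conditional sit together under the node that tests the condition.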

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00016961/

  • Teddy, a 3D modeling system based on hand-drawn sketches: easily create 3D models just by drawing free-form curves freehand

    五十嵐 健夫, 松岡 聡, 田中 英彦

    日経CG   ( 156 )   110 - 117   1999.9

     More details

    Language:Japanese   Publisher:日経BP社  

    CiNii Books

    researchmap

  • Comparison of Client-Server Global Computing Systems : Performance Evaluation of Ninf, NetSolve, CORBA, Ninf-on-Globus

    SUZUMURA TOYOTARO, NAKAGAWA TAKAYUKI, MATSUOKA SATOSHI, NAKADA HIDEMOTO

    IPSJ SIG Notes   77 ( 66 )   197 - 202   1999.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recent developments in global computing systems such as Ninf, NetSolve and Globus have opened up opportunities for providing high-performance computing services over wide-area networks. However, most research has focused on individual architectural aspects of a system, or on application deployment examples, rather than on the characteristics such systems should intrinsically satisfy or on how such systems relate to each other. Our comparative study performs deployment of example publications of network-based libraries using Ninf, NetSolve, and CORBA. There, we discover that dedicated systems for global computing such as Ninf and NetSolve have clear management, programmability, and performance advantages over CORBA. Furthermore, deployment of Ninf on top of Nexus, the communication layer of Globus, has exhibited some loss of performance as well as somewhat kludgy gluing, due to fundamental differences in the assumptions of the underlying communication models. Such results indicate that further basic research is necessary across multiple systems to identify the ideal software architectures for global computing.

    CiNii Books

    researchmap

  • Implementing and Evaluating the MPC++ Multi-Thread Template Library on Multiple MPI Platforms

    SAKAE YOSHIAKI, ISHIKAWA YUTAKA, MATSUOKA SATOSHI, OGAWA HIROTAKA

    IPSJ SIG Notes   77 ( 66 )   41 - 46   1999.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Our parallel programming language MPC++ has only been available on the SCore cluster system software developed at the Real World Computing Partnership. In order to achieve better portability among multiple platforms, the scope of MPC++ is being widened via an implementation using MPI as the underlying communication layer. This brings up the question of applicability, since MPI performance varies considerably across platforms. Our evaluation results show that the communication overhead is negligible when the data size is larger than 8 Kbytes. Furthermore, the CG kernel benchmark of the NAS Parallel Benchmarks written in MPC++ using MPI achieves a speed comparable to one written in MPI when the number of nodes is small. However, an increase in the number of nodes causes severe loss of performance on commodity platforms with low network performance, while it continues to scale well on those with high-performance networks, as well as on MPIs on MPPs with fast communication infrastructure. These results suggest that, although MPC++ on MPI is viable on high-performance platforms, we need further research on optimizing for commodity networks.

    CiNii Books

    researchmap

  • netCFD: A computational fluid dynamics component using the Ninf global computing system

    佐藤 三久, 草野 和寛, 中田 秀基, 関口 智嗣, 松岡 聡

    年会一般講演   18   369 - 370   1999.7

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Parallel Processing: Ninflet, a Java-based Global Parallel Computing Environment

    高木浩光, 松岡聡, 中田秀基, 関口智嗣, 佐藤三久, 長嶋雲兵

    情報処理学会論文誌   40 ( 5 )   2203 - 2214   1999.5

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • Ninflet : A Java-based Global Parallel Computing Environment

    TAKAGI Hiromitsu, MATSUOKA Satoshi, NAKADA Hidemoto, SEKIGUCHI Satoshi, SATO Mitsuhisa, NAGASHIMA Umpei

    Transactions of Information Processing Society of Japan   40 ( 5 )   2203 - 2214   1999.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    To make a globally distributed computing system attractive, the system should be open to arbitrary individuals not only for its usage but also for the construction of a wide variety of application programs. For this purpose, the system must supply a secure environment for safely executing arbitrary programs. Our proposed global computing environment "Ninflet" fulfills this requirement by exploiting the security mechanism of the Java language, allowing computation to occur on machines not owned or administered by the individual invoking the computation. Ninflet realizes a globally shared metacomputer that would allow "lending" the computing cycles of machines that would otherwise be unused at night to the other side of the globe, or simply building a parallel execution environment on heterogeneous sets of workstation clusters. We present the system architecture of Ninflet and a preliminary performance evaluation when it is used as a parallel execution environment.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00012665/

  • Additive Interaction Nets : Yet Another Linear Logic Programming Language

    MATSUOKA SATOSHI

    40 ( 4 )   72 - 72   1999.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a new programming language, which is an extension of Lafont's interaction nets to the additive case. The extension here is to introduce first-order unification variables: each agent in interaction nets has several first-order terms with unification variables. When agents interact, information on interaction nets can be distributed by first-order unification. In contrast with the standard interaction nets, our interaction nets with first-order terms do not have the Church-Rosser property: several rewrite rules may apply to an additive interaction net. Girard's additive proof nets can be considered as a special case of our interaction nets with first-order terms. We consider the extended interaction nets as a better substitute for linear logic programming languages based on backward proof search, which is a concurrent object-oriented programming language, for some purposes, especially for formalization of component-based programming, which is a trend in real computing, e.g. Java Beans and ActiveX. We can encode a π-calculus-like logic programming language as well as SLD-resolution into the additive interaction nets.

    CiNii Books

    researchmap

  • Performance Evaluation of Global Computing Systems by Simulation

    TAKEFUSA Atsuko, AIDA Kento, NAKADA Hidemoto, OGAWA Hirotaka, MATSUOKA Satoshi, SATO Mitsuhisa, SEKIGUCHI Satoshi, NAGASHIMA Umpei

    Transactions of Information Processing Society of Japan   40 ( 5 )   2192 - 2202   1999.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    While there have been several proposals of high-performance global computing systems, scheduling schemes for these systems have not been well investigated. The reason is the difficulty of evaluating them with large-scale benchmarks that yield reproducible results. This paper describes the design and implementation of a simulator that evaluates scheduling schemes on a typical high-performance global computing system. The simulator can simulate various features of global computing systems by adopting a queueing model. The effectiveness of the simulator was verified by its simulation results, which were very similar to the experimental results on a real global computing system. This paper also shows simulation results for simple scheduling schemes. The results show that it is important to consider resource conditions appropriately for overall system performance.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00012664/

  • A Navigational Interface for Mobile Computing using 3D Spatial Audio

    KII MANABU, MATSUOKA SATOSHI, HAYASHI KAZUTERU

    83   31 - 36   1999.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose the 3D Audio Compass, a navigational interface for mobile computing using 3D spatial audio. The 3D Audio Compass can guide the user to the destination intuitively, allowing the user to concentrate on the real-world task at hand. A prototype system was tested using VRML, and experimental results suggest that guidance by 3D spatial audio is effective.

    CiNii Books

    researchmap

  • Evaluation of Portable Software DSM employing Reflection

    YAGISAWA NAOYA, OGAWA HIROTAKA, SOHDA YUKIHIKO, MATSUOKA SATOSHI

    IPSJ SIG Notes   132 ( 21 )   109 - 114   1999.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Platform portability is one of the most demanded properties of a system today, due to the diversity of runtime execution environments on wide-area networks, and parallel programs are no exception. However, parallel execution environments are very diverse and can change dynamically, while performance must be portable as well. As a result, techniques for achieving platform portability are sometimes not appropriate, or could restrict the programming model, e.g., to simple message passing. Instead, we propose the use of reflection for achieving platform portability of parallel programs. As a prototype experiment, a software DSM system was created which utilizes the compile-time metaprogramming features of OpenC++ 2.5 to generate message-passing MPC++ code from an SPMD-style, shared-memory C++ program. To characterize the effect of our system, we ran SPLASH2 on a PC cluster linked by the Myrinet gigabit network, and obtained reasonable performance compared to a high-performance SMP. We also indicate that it can achieve performance comparable to low-overhead DSMs such as Shasta.

    CiNii Books

    researchmap

  • Towards Performance Evaluation of High-Performance Computing on Multiple Java Systems

    ITOU SHIGEO, MATSUOKA SATOSHI

    IPSJ SIG Notes   132 ( 21 )   25 - 30   1999.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Despite claims of platform portability, it is not clear whether Java is suitable for high-performance scientific computing. In fact, optimizations by, e.g., JIT compilers may not be effective for achieving high performance in various scientific codes; that is, "performance portability" may not be guaranteed in current Java systems. To address this situation, we are constructing a benchmarking platform for Java that candidly compares different Java systems. In particular, we have constructed a Java version of ATLAS, a program generator that outputs platform-specific optimized BLAS, to investigate the peak performance of each Java system. We then compared this performance to typical source-level optimizations that a user or a compiler might perform, to see how closely such optimizations can approach the peak performance.

    CiNii Books

    researchmap

  • Overview of a Global Computing Simulator

    TAKEFUSA ATSUKO, AIDA KENTO, NAKADA HIDEMOTO, MATSUOKA SATOSHI, NAGASHIMA UMPEI

    IPSJ SIG Notes   132 ( 21 )   31 - 36   1999.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    While there have been several proposals of high-performance global computing systems, scheduling schemes for these systems have not been well investigated. The reason is the difficulty of evaluating them with large-scale benchmarks that yield reproducible results. This paper gives an overview of the Bricks simulator, which evaluates scheduling schemes on a typical high-performance global computing system. Bricks can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. Moreover, Bricks is componentized such that not only can its constituents be replaced to simulate various different system algorithms, but existing global computing components can also be incorporated via its foreign interface. To test the validity of the latter characteristic, we incorporated the NWS system, which monitors and forecasts the behavior of global computing systems. Experiments were conducted by running NWS under a real environment versus the simulated environment given the observed parameters of the real environment. Under both environments, NWS behaved similarly, making quite comparable forecasts.

    CiNii Books

    researchmap

  • Development and preliminary evaluation of remote computing resource access systems using Ninf

    SATO MITSUHISA, TANAKA YOSHIO, KUSANO KAZUHIRO, NAKADA HIDEMOTO, SEKIGUCHI SATOSHI, NAGASHIMA UMPEI, MATSUOKA SATOSHI

    IPSJ SIG Notes   132 ( 21 )   37 - 42   1999.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We are developing prototype systems to access remote computing resources by using a global computing middleware, Ninf. Ninf allows users to make use of remote computing resources as computational components in their programs. As our prototypes, we designed a Computational Fluid Dynamics (CFD) component, netCFD, and a Computational Chemistry component, netMO, for molecular orbital computation. Though a large amount of data may be stored in each time step in CFD applications, the overhead of the I/O can be reduced by overlapping I/O and computation, even in remote CFD computation. As a demonstration of netCFD, we designed a GUI using a Java applet to make use of the netCFD component through Web browsers. The GUI applet invokes the CFD computation on a remote Ninf server, and receives the results through the callback interface in Ninf to visualize the results at each time step.

    CiNii Books

    J-GLOBAL

    researchmap

  • "End-of-the-Century Debate : What Good Did Computer Science Do in the 20th Century? : A Fierce Battle between Field Engineers and Theoretical Researchers"

    竹内 郁雄, 鯵坂 恒夫, 荒木 啓二郎, 石田 喬也, 上原 三八, 土屋 正登, 松岡 聡

    情報処理   40 ( 2 )   32 - 32   1999.2

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    CiNii Books

    researchmap

  • Implementation of DSM Using OpenC++ Reflection

    SOHDA YUKIHIKO, OGAWA HIROTAKA, MATSUOKA SATOSHI

    40 ( 1 )   13 - 22   1999.2

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Platform portability is one of the most demanded properties of a system today, due to the diversity of runtime execution environments on wide-area networks, and parallel programs are no exception. However, parallel execution environments are very diverse and can change dynamically, while performance must be portable as well. As a result, techniques for achieving platform portability are sometimes not appropriate, or could restrict the programming model, e.g., to simple message passing. Instead, we propose the use of reflection for achieving platform portability of parallel programs. As a prototype experiment, a software DSM system was created which utilizes the compile-time metaprogramming features of OpenC++ 2.5 to generate message-passing MPC++ code from an SPMD-style, shared-memory C++ program. The translation creates memory management objects on each node to manage the consistency protocols for object arrays residing on different nodes. Read and write barriers are automatically inserted on references to shared objects. We evaluated this system on a PC cluster linked by the Myrinet gigabit network.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00017020/

  • Teddy: A sketching interface for 3D freeform design

    T Igarashi, S Matsuoka, H Tanaka

    SIGGRAPH 99 CONFERENCE PROCEEDINGS   409 - 416   1999

     More details

    Language:English  

    Web of Science

    researchmap

  • OMPC++ --- A Portable High-Performance Implementation of DSM using OpenC++ Reflection

    Yukihiko Sohda, Hirotaka Ogawa, Satoshi Matsuoka

    Proc. of Reflection'99, Springer LNCS   1616   215 - 234   1999

     More details

  • Teddy: A Sketching Interface for 3D Freeform Design

    Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

    Proc. ACM SIGGRAPH'99   409 - 416   1999

     More details

  • OMPC++-A Portable High-Performance Implementation of DSM using OpenC++ Reflection

    SOHDA Y.

    LNCS   1616   215 - 234   1999

     More details

  • OpenJIT: A dynamically modifiable Java JIT compiler based on computational reflection (Special feature: New developments in network computing: what open Java brings)

    松岡 聡

    Computer today   15 ( 6 )   4 - 11   1998.11

     More details

    Language:Japanese   Publisher:サイエンス社  

    CiNii Books

    researchmap

  • History and Developments of Java Implementation Technologies

    MATSUOKA Satoshi

    Journal of The Society of Instrument and Control Engineers   37 ( 9 )   627 - 632   1998.9

     More details

    Language:Japanese   Publisher:The Society of Instrument and Control Engineers  

    DOI: 10.11499/sicejl1962.37.627

    CiNii Books

    researchmap

  • Implementation of Communication Library on Ninflet : A Java-based Global Parallel Computing System

    OOHISA MITSUTAKA, TAKAGI HIROMITSU, MATSUOKA SATOSHI, OGAWA HIROTAKA

    IPSJ SIG Notes   72 ( 72 )   67 - 72   1998.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    There have been several recent proposals of high-performance distributed systems that utilize idle computing resources, e.g., at night. These systems typically employ highly portable programming language systems such as Java, and our Ninflet is one such system. However, evaluation of these systems has mostly been done with simple master-worker styles only, and more complex parallel programming styles have resorted to low-level communication primitives such as RMI and MPI. Instead, we design and encapsulate several high-level parallel programming patterns as class libraries for Ninflet using so-called 'design patterns', and evaluate their effectiveness by comparing them with traditional parallel programming styles.

    CiNii Books

    researchmap

  • Authentication for Ninf : Global Computation System

    NAKADA HIDEMOTO, MATSUOKA SATOSHI, SATOH MITSUHISA, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   72 ( 72(HPC-72) )   79 - 84   1998.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The rapid growth of network technology has made high-performance distributed computing possible. The technical aspects of software frameworks for high-performance distributed computing are already almost established. However, from a social perspective, some important issues remain open, namely access control and accounting. In this paper, we discuss the authentication mechanisms needed to address these issues. Strictness of authentication and ease of system use are in a tradeoff, so the authentication mechanism has to be chosen according to how the system is used.

    CiNii Books

    J-GLOBAL

    researchmap

  • Implementation and Preliminary Evaluation of Global Scheduling Framework in Ninf

    TAKEFUSA ATSUKO, NAKADA HIDEMOTO, AIDA KENTO, OGAWA HIROTAKA, MATSUOKA SATOSHI, NAGASHIMA UMPEI

    IPSJ SIG Notes   72 ( 72 )   73 - 78   1998.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Rapid progress in networking technology is now making global computing systems feasible. Although there have been proposals of global computing systems, how to achieve efficient usage of computing resources in global computing is still a research issue. In particular, we need to devise appropriate scheduling strategies and algorithms for computing resources over wide-area networks, which are often dynamic and unstable in nature. This paper presents our preliminary scheduling framework for unifying application and job scheduling in global computing. The proposed framework establishes a layer of scheduling and resource allocation subframeworks. We show our software framework, the Ninf metaserver, which provides a low-level scheduler and resource monitor. We also evaluate some scheduling strategies using an actual environment and our performance evaluation model.

    CiNii Books

    researchmap

  • OpenJIT : A Reflective Java JIT Compiler

    MATSUOKA Satoshi, OGAWA Hirotaka, SHIMURA Kouya, KIMURA Yasunori, HOTTA Kohichior, TAKAGI Hiromitsu

    IEICE technical report. Computer systems   98 ( 234 )   49 - 56   1998.8

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    The so-called 'open compiler' approach is a technique for incorporating various self-descriptive modules for language customization and optimization based on computational reflection. We apply the open compiler technique to a Java Just-In-Time compiler to develop the OpenJIT compiler, which allows class-specific customization and optimization, fostering research into new compilation techniques such as application-specific customization and dynamic optimization.

    CiNii Books

    researchmap

  • Global Parallel Computation using Ninf (Special Issue on Parallel Processings)

    NAKADA Hideki, TAKAGI Hiromitsu, MATSUOKA Satoshi, NAGASHIMA Umpei, SATOH Mitsuhisa, SEKIGUCHI Satoshi

    Transactions of Information Processing Society of Japan   39 ( 6 )   1818 - 1826   1998.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Distributed computing using message passing libraries in a LAN (Local Area Network) environment is already accepted as an effective supercomputing methodology. On the other hand, although distributed computing in a WAN (Wide Area Network) environment is becoming practical due to the recent development of high-speed network facilities, a software framework for supercomputing in a WAN is yet to be established. We propose 'Ninf', a distributed computing framework for globally distributed computing environments. Ninf enables parallel computing in a WAN based on the macro dataflow model, and facilitates automatic dynamic load distribution and scheduling. Ninf has the following advantages over using existing message passing libraries in WAN supercomputing: (1) a communication protocol suited to globally distributed environments, (2) ease of programming, (3) reuse of existing libraries, and (4) integration with existing data resources on the Internet.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013059/

  • OMPI : A Compile-time Optimizer for MPI Programs (Special Issue on Parallel Processings)

    OGAWA Hirotaka, MATSUOKA Satoshi

    Transactions of Information Processing Society of Japan   39 ( 6 )   1700 - 1708   1998.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    MPI is gaining widespread acceptance as a standard for message passing in high-performance computing, due to its powerful and flexible support of various communication styles. However, the complexity of its API imposes significant software overhead, and as a result, the applicability of MPI has been restricted to rather regular, coarse-grained computations. Our OMPI (Optimizing MPI) system removes much of the excess overhead by employing partial evaluation techniques, which exploit static information about MPI calls. Because partial evaluation alone is insufficient, we also utilize template functions for further optimization. Benchmarks show that OMPI improves execution efficiency by as much as a factor of two for a communication-intensive application core with minimal code increase. It also performs significantly better than a previous dynamic optimization technique.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013046/

  • Multi-client LAN/WAN Performance Analysis of Ninf (Special Issue on Parallel Processings)

    TAKEFUSA Atsuko, OGAWA Hirotaka, MATSUOKA Satoshi, NAKADA Hideki, TAKAGI Hiromitsu, SATO Mitsuhisa, SEKIGUCHI Satoshi, NAGASHIMA Umpei

    Transactions of Information Processing Society of Japan   39 ( 6 )   1827 - 1838   1998.6

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Rapid increases in the speed and availability of networks of supercomputers are making high-performance global computing possible, including our Ninf system. However, critical issues regarding system performance characteristics in global computing have been little investigated, especially under multi-client, multi-site WAN settings. In order to investigate the feasibility of Ninf and similar systems, we conducted benchmarks under various LAN and WAN environments, and observed the following results: 1) given sufficient communication bandwidth, Ninf performance quickly overtakes client local performance; 2) current supercomputers are sufficient platforms for supporting Ninf and similar systems in terms of performance and OS fault resiliency; 3) for a vector-parallel machine (Cray J90), employing an optimized data-parallel library is a better choice than the conventional task-parallel execution employed for non-numerical data servers; 4) computationally intensive tasks such as EP can readily be supported under the current Ninf infrastructure; and 5) for communication-intensive applications such as Linpack, server CPU utilization dominates LAN performance, while communication bandwidth dominates WAN performance, and furthermore, aggregate bandwidth could be sustained for multiple clients located at different Internet sites. As a result, distribution of multiple tasks to computing servers on different networks would be essential for achieving higher client-observed performance.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013060/

  • Ninflet: A Java-based Global Parallel Computing Environment

    高木浩光, 松岡聡, 中田秀基, 関口智嗣, 佐藤三久, 長嶋雲兵

    情報処理学会シンポジウム論文集   98 ( 7 )   135 - 142   1998.6

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • Interactive Beautification : A Technique for Rapid Geometric Design (Special Issue on Next Generation Human Interface and Interaction)

    IGARASHI Takeo, MATSUOKA Satoshi, KAWACHIYA Sachiko, TANAKA Hidehiko

    Transactions of Information Processing Society of Japan   39 ( 5 )   1373 - 1384   1998.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Diagram drawing with conventional computer-assisted drawing editors often takes a considerable amount of time despite their seeming ease of use. The causes of the problem are too many commands and unintuitive procedures for satisfying geometric constraints. To solve the problem, we propose interactive beautification, a technique for rapid geometric design, and developed a prototype system, Pegasus, to verify the efficiency of the technique. An interactive beautification system receives the user's free stroke and beautifies it by considering geometric constraints among segments. Using the technique, the user can rapidly draw precise diagrams with geometric relations without using any editing commands. The current prototype system supports drawings composed of straight lines, and a user study was performed using the prototype system, a commercial CAD system, and an OO-based drawing system. The results showed that users can draw the required diagrams more rapidly and more precisely using the prototype system.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013100/

  • Essay: Java Versus Programming Language Research

    MATSUOKA Satoshi

    IPSJ Magazine   39 ( 4 )   301 - 301   1998.4

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • The Frontiers of Java-based Frameworks : Applications to Metacomputing

    TAKAGI Hiromitsu, MATSUOKA Satoshi

    IPSJ Magazine   39 ( 4 )   302 - 305   1998.4

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Implementation of Global Numerical Information Database Server system "NinfDB"

    56 ( 0 )   258 - 259   1998.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Evaluation of Implicit Co-scheduling on Clustered Parallel Computer

    FUKUCHI KENTAROU, MATSUOKA SATOSHI, HORI ATSUSHI, ISHIKAWA YUTAKA

    IPSJ SIG Notes   128 ( 18 )   43 - 48   1998.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Implicit co-scheduling is a parallel job scheduling methodology proposed by the UC Berkeley NOW project, and embodies favorable characteristics such as the lack of global schedulers, low overhead, and easy implementation. Previous literature has claimed that the overhead versus traditional gang schedulers was about a factor of 0.6 to 1.6; however, evaluations were not performed using real-life workloads. We have implemented an implicit co-scheduler on a large-scale, high-performance cluster, and used the NAS Parallel Benchmarks to measure effective performance. There, we found that for FT and CG, the overhead versus gang scheduling can be as high as a factor of 2.3, negating the Berkeley results. We conjecture that this is due to excessive network traffic, but are still in the process of performing additional experiments.

    CiNii Books

    researchmap

  • Ninf and PM: Communication libraries for global computing and high-performance cluster computing

    M Sato, H Tezuka, A Hori, Y Ishikawa, S Sekiguchi, H Nakada, S Matsuoka, U Nagashima

    FUTURE GENERATION COMPUTER SYSTEMS   13 ( 4-5 )   349 - 359   1998.3

  • Ninflet: a Migratable Parallel Objects Framework using Java

    Hiromitsu Takagi, Satoshi Matsuoka, Hidemoto Nakada, Satoshi Sekiguchi, Mitsuhisa Satoh, Umpei Nagashima

    ACM 1998 Workshop on Java for High-Performance Network Computing   151 - 159   1998

     More details

  • Pegasus: A Drawing System for Rapid Geometric Design

    Takeo Igarashi, Satoshi Matsuoka, Sachiko Kawachiya, Hidehiko Tanaka

    CHI'98 Summary (ACM Conference on Human Factors in Computing Systems)   24 - 25   1998

  • Popup Vernier: A Tool for Sub-Pixel-Pitch Dragging with a Smooth Mode Transition

    Yuji Ayatsuka, Satoshi Matsuoka, Jun Rekimoto

    Proceedings of ACM Symposium on User Interface Software and Technology (UIST'98)   39 - 48   1998

  • Utilizing the Metaserver Architecture in the Ninf Global Computing System

    Hidemoto Nakada, Hiromitsu Takagi, Satoshi Matsuoka, Umpei Nagashima, Mitsuhisa Sato, Satoshi Sekiguchi

    Proc. High-Performance Computing and Networking '98, Springer LNCS   1401   607 - 616   1998

     More details

  • OpenJIT ---A Reflective Java JIT Compiler

    S. Matsuoka, H. Ogawa, K. Shimura, Y. Kimura, K. Hotta, H. Takagi

    Proc. OOPSLA '98 Workshop on Reflective Programming in C++ and Java   16 - 20   1998

     More details

  • Layered penumbrae: An effective 3D feedback technique

    Y Ayatsuka, S Matsuoka, J Rekimoto

    3RD ASIA PACIFIC COMPUTER HUMAN INTERACTION, PROCEEDINGS   202 - 209   1998

     More details

    Language:English  

    Web of Science

    researchmap

  • A Constraint-Based Approach for Visualization and Animation

    Shin Takahashi, Satoshi Matsuoka, Ken Miyashita, Hiroshi Hosobe, Tomihisa Kamada

    Constraints   3 ( 1 )   61 - 86   1998

     More details

    Language:English   Publisher:Kluwer Academic Publishers  

    DOI: 10.1023/A:1009708715411

    Scopus

    researchmap

  • Pegasus: a drawing system for rapid geometric design.

    Takeo Igarashi, Sachiko Kawachiya, Hidehiko Tanaka, Satoshi Matsuoka

    CHI 98 Conference Summary on Human Factors in Computing Systems   24 - 25   1998

  • Utilizing the metaserver architecture in the Ninf global computing system

    H Nakada, H Takagi, S Matsuoka, U Nagashima, M Sato, S Sekiguchi

    HIGH-PERFORMANCE COMPUTING AND NETWORKING   1401   607 - 616   1998

     More details

    Language:English  

    Web of Science

    researchmap

  • A performance evaluation model for effective job scheduling in global computing systems

    K Aida, A Takefusa, H Nakada, S Matsuoka, U Nagashima

    SEVENTH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING - PROCEEDINGS   352 - 353   1998

     More details

    Language:English  

    Web of Science

    researchmap

  • Reduction of overhead in drawing figures with computer - Detailed analyses of drawing tasks

    S Kawachiya, T Igarashi, S Matsuoka, H Tanaka

    3RD ASIA PACIFIC COMPUTER HUMAN INTERACTION, PROCEEDINGS   11 - 18   1998

     More details

    Language:English  

    Web of Science

    researchmap

  • A Report on Grandprix for Java, 1997

    MATSUOKA Satoshi

    IPSJ Magazine   38 ( 12 )   1093 - 1098   1997.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00003323/

  • ABCL/EM-4: An Implementation and Evaluation of a Concurrent Object-oriented Language System on a Data-driven Parallel Computer

    YASUGI Masahiro, MATSUOKA Satoshi, YONEZAWA Akinori

    Transactions of Information Processing Society of Japan   38 ( 9 )   1790 - 1799   1997.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Concurrent object-oriented computing provides modeling power and natural MIMD parallelism through concurrency of objects. Unfortunately, the high costs of inter-node message passing and intra-node scheduling make implementations of concurrent object-oriented languages inefficient. To overcome these problems, we have proposed a new software/hardware architecture (ABCL/EM-4) that realizes efficient parallel execution of programs based on a concurrent object-oriented computation model. ABCL/EM-4 achieves high performance by combining simple, fast hardware mechanisms with sophisticated software design, so that the cost of a remote message send and/or a context switch is almost comparable to that of a sequential procedure call. This paper presents evaluation results obtained with the ABCL/ST compiler that we developed, on the data-driven parallel computer EM-4.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013334/

  • The Plan-Do Style Compilation Technique for Eager Data Transfer in Thread-based Execution and Its Evaluation

    YASUGI Masahiro, MATSUOKA Satoshi, YONEZAWA Akinori

    Transactions of Information Processing Society of Japan   38 ( 9 )   1840 - 1848   1997.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Plan-Do compilation technique is a new, advanced compilation framework for eager data transfer on distributed-memory parallel architectures. The technique is especially effective for a recent breed of fine-grain architectures because it realizes a high-throughput, low-latency communication scheme, pipelined sends. High-level, plan-do style code is compiled into low-level, eager data transfer code via straightforward application of the translation function. Benchmark results on a real parallel architecture, EM-4, with the ABCL/ST compiler that we developed exhibit good performance.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013338/

  • Preliminary Study of Global Job Scheduling for Ninf : a High-Performance Global Computing System

    OGAWA HIROTAKA, TAKEFUSA ATSUKO, NAKADA HIDEMOTO, AIDA KENTO, MATSUOKA SATOSHI

    IPSJ SIG Notes   67 ( 75 )   85 - 90   1997.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The rapid increase in the speed and availability of global networks may make global supercomputing possible, including with our Ninf system. However, the performance characteristics of such systems have been little investigated, especially in multi-client, multi-site situations. To establish a methodology for effectively scheduling multiple job requests onto multiple computational servers while guaranteeing performance for each client, we conducted benchmarks under various WAN environments. We observed that communication bandwidth dominated performance for communication-intensive applications such as Linpack, and that aggregate bandwidth could be sustained for multiple clients located at different Internet sites. Based on these observations, we propose a simulation model based on queuing theory. We also performed preliminary benchmarks using our scheduling server, named Metaserver.

    CiNii Books

    researchmap

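  • Wide-Area Distributed Parallel Computing with Ninf

    Hidemoto Nakada, Hiromitsu Takagi, Satoshi Matsuoka, Umpei Nagashima, Mitsuhisa Sato, Satoshi Sekiguchi

    Proceedings of the Symposium on Parallel Processing   1997   281 - 288   1997.5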

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • A Constraint Drawing System Combining Dexterity and Precision

    54 ( 0 )   425 - 426   1997.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Network Numerical Information System Ninf : Performance for Multi-Clients

    TAKEFUSA ATSUKO, OGAWA HIROTAKA, MATSUOKA SATOSHI, NAKADA HIDEMOTO, SATO MITSUHISA, SEKIGUCHI SATOSHI, NAGASHIMA UMPEI

    IPSJ SIG Notes   65 ( 21(HPC-65) )   3 - 8   1997.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    To establish a basis for globally distributed parallel computing in numerical computing, we are currently working on the Ninf (Network based Information Library towards a Globally High Performance Computing) software system. To evaluate the Ninf system, we ran the Linpack benchmark with Ninf on a Cray J90 vector-parallel supercomputer, a DEC Alpha workstation cluster, and Sun workstations. The results demonstrate the utility and robustness of the Ninf system, and show that multicomputers such as vector-parallel computers and MPPs can effectively support network information services via Ninf.

    CiNii Books

    J-GLOBAL

    researchmap

  • Supporting Multiple Parallel Programming Styles with MPC++ and their Performance

    NIKAMI ATSUYUKI, MATSUOKA SATOSHI, ISHIKAWA YUTAKA, SATOH MITSUHISA

    IPSJ SIG Notes   65 ( 21 )   57 - 62   1997.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    For parallel processing to become general, the underlying basis should be advanced commodity technology, and parallel (programming) languages are no exception. On the other hand, parallel languages must also satisfy the requirements that inherently stem from parallel processing, such as support for a wide range of parallel programming styles, ease of programming, and high performance. We investigated whether existing object-oriented languages satisfy such requirements by showing that C++ can support a wide range of parallel programming styles without special language extensions. More concretely, based on MPC++, which is a parallel dialect of C++ extended using only templates and inheritance, we created a class/template library which supports three major kinds of parallel programming styles. We tested its performance with representative benchmark programs for each programming style on a workstation cluster.

    CiNii Books

    researchmap

  • Global Parallel Computation using Ninf

    NAKADA HIDEMOTO, TAKAGI HIROMITSU, MATSUOKA SATOSHI, NAGASHIMA UMPEI, SATOH MITSUHISA, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   65 ( 21(HPC-65) )   9 - 14   1997.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Distributed computing using message passing libraries in a LAN (Local Area Network) environment is already accepted as an effective supercomputing methodology. On the other hand, although distributed computing in a WAN (Wide Area Network) environment is becoming practical thanks to the recent development of high-speed network facilities, a software framework for supercomputing in a WAN is yet to be established. We propose 'Ninf', a distributed computing framework for globally distributed computing environments. Ninf enables parallel computing in a WAN based on the macro dataflow model, and facilitates automatic dynamic load distribution and scheduling. Ninf has the following advantages over existing message passing libraries for WAN supercomputing: (1) a communication protocol suited to globally distributed environments, (2) ease of programming, (3) reuse of existing libraries, and (4) integration with existing data resources on the Internet.

    CiNii Books

    J-GLOBAL

    researchmap

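  • Ninflet: A World-Wide High Performance Computing Environment in Java (Proceedings of Internet Conference '97) -- (Session 3 (Application) [in Japanese])

    Hiromitsu Takagi, Satoshi Matsuoka, Hidemoto Nakada

    Proceedings of the Internet Conference   ( 1997 )   133 - 147   1997

     More details

    Language:Japanese   Publisher:Japan Society for Software Science and Technology, Special Interest Group on Internet Technology, et al.  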

    CiNii Books

    researchmap

  • Multi-client LAN/WAN Analysis of Ninf : a High-Performance Global Computing System

    SATOSHI MATSUOKA

    Proceedings of IEEE Supercomputing '97, San Jose, CA   1997

  • Interactive Beautification : A Technique for Rapid Geometric Design

    SATOSHI MATSUOKA

    Proceedings of ACM Symposium on User Interface Software and Technology (UIST'97), Banff, Canada   1997

     More details

  • A Methodology for Specifying Data Distribution using only Standard Object-Oriented Features

    SATOSHI MATSUOKA

    Proceedings of ACM/IEEE International Conference on Supercomputing (ICS'97), Vienna, Austria   116 - 123   1997

     More details

  • Development of NinfCalc: Network Based Table Calculator for Matrix

    53 ( 0 )   467 - 468   1996.9

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Ninf : Network Based Information Library for Global World-Wide Computing Infrastructure : the Software Architecture and its Performance

    SEKIGUCHI SATOSHI, NAKADA HIDEMOTO, SATO MITSUHISA, NAGASHIMA UMPEI, MATSUOKA SATOSHI

    IPSJ SIG Notes   62 ( 81(HPC-62) )   153 - 158   1996.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    To establish a framework for information sharing over the internetwork, we have proposed Ninf, a network-based information library for high performance computing. Ninf is based on the client-server model: servers residing on the network handle information resources, either numerical executables or scientific constants, and clients are programmed by users with the Ninf client library, which establishes RPC connections to servers. This article gives an overview of the Ninf software system, followed by preliminary results on the Linpack benchmark with Ninf-RPC, which suggest that network computing with Ninf is feasible even when the granularity of the program is rather small.

    CiNii Books

    J-GLOBAL

    researchmap

  • Parallel Programming using Parallel STL

    NAKADA HIDEMOTO, SATOH MITSUHISA, MATSUOKA SATOSHI, ISHIKAWA YUTAKA, MATSUDA MOTOHIKO

    IPSJ SIG Notes   96 ( 82 )   85 - 90   1996.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    There are several works on parallel processing with C++ using template libraries. In this paper, we discuss template technology as an interface for parallel programming. A parallel template library allows us to write parallel programs without regard to the target machine configuration. A data-parallel template library enables optimization using data locality. A task-parallel template library provides methods to distribute workloads.

    CiNii Books

    researchmap

  • Ninf API for Distributed Memory Multiprocessors

    OGAWA HIROTAKA, MATSUOKA SATOSHI, NAKADA HIDEMOTO, SATO MITSUHISA, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   62 ( 81(HPC-62) )   159 - 164   1996.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    To establish a basis for globally distributed parallel computing in numerical computing, we are currently working on the Ninf (Network based Information library for High Performance Computing) software system. When this system is used on distributed memory multiprocessors, the Ninf core server runs on a single node such as a front-end machine or an I/O processor. As a result, computation data concentrates on that node, which easily becomes a bottleneck and may even exhaust its memory resources. To prevent this problem, we propose a new common API for describing initial data distributions and a mechanism that successively hands off the connection with the Ninf client to the target node. We preliminarily evaluate our hand-off mechanism on Fujitsu's AP1000. Results show that it improves the total execution time.

    CiNii Books

    J-GLOBAL

    researchmap

  • Implementing MPI in a High-Performance, Multithreaded Language MPC++

    O'CARROLL FRANCIS B., HORI ATSUSHI, TEZUKA HIROSHI, ISHIKAWA YUTAKA, MATSUOKA SATOSHI

    IPSJ SIG Notes   62 ( 81 )   141 - 146   1996.8

     More details

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

    We have ported the MPICH implementation of MPI to the high-performance, multithreaded programming language MPC++. We discuss our modifications to the design of MPICH to support multiple threads. MPICH now runs experimentally on top of MPC++ on a Sun workstation cluster connected by Myrinet and achieves higher performance than standard MPICH on Myrinet TCP/IP on the same hardware.

    CiNii Books

    researchmap

  • A Pen-Based Constraint Drawing System Combining Dexterity and Precision

    IGARASHI Takeo, KAWACHIYA Sachiko, MATSUOKA Satoshi, TANAKA Hidehiko

    IPSJ SIG Notes   81 ( 77 )   85 - 90   1996.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Traditional sketching on computers lacked the freedom of real pens. Our drawing editor combines the dexterity of real pens with computer-assisted precision, based on techniques such as two-phased sketch interaction with pie menus and sliders, beautification with perceptual constraints, and segment-based drawing representation. A prototype implementation on an IBM pen PC and a Xerox Liveboard has shown that the system is fast and easy to use.

    CiNii Books

    researchmap

  • A Compilation Technique for Parallel Reflective Language Systems Using Partial Evaluation

    MASUHARA Hidehiko, MATSUOKA Satoshi, YONEZAWA Akinori

    Transactions of Information Processing Society of Japan   37 ( 7 )   1290 - 1298   1996.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Meta-programmability of parallel reflective language systems is beneficial for parallel applications, for example to describe optimizations. On the other hand, their execution model based on interpretation is an obstacle to efficient implementation. We propose a compilation technique for parallel reflective languages using partial evaluation. The technique, which effectively eliminates program interpretation, includes partial evaluation extended for side-effects, and several program transformation techniques. Benchmarks on an MPP show that parallel applications with meta-level optimizations can be executed with small overhead.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00013611/

  • UbiquitousLinks : Hypermedia Links Embedded in the Real World

    AYATSUKA Yuji, REKIMOTO Jun, MATSUOKA Satoshi

    67   23 - 30   1996.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Progress in hardware technology has made computers portable, enabling them to mediate between users and the real world. At the same time, the World Wide Web (WWW) has become widely available. Although information related to objects in the real world exists on many WWW sites, these relations are unclear. This paper proposes a hypermedia system that links real-world objects with information on the WWW. We use IDs attached to real-world objects and a portable computer connected to the Internet via wireless networks. This combination makes it possible to retrieve information on the WWW from real-world objects.

    CiNii Books

    researchmap

  • A Commentary on Program Language Design and Implementation Research at ICOT-Honoring our Great Predecessors

    MATSUOKA Satoshi

    IPSJ Magazine   37 ( 5 )   407 - 410   1996.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Analysis of the structures of diagrams created in drawing editors

    52 ( 0 )   89 - 90   1996.3

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • A Meta Server Architecture for Ninf : Networked Information Library for High Performance Computing

    NAKADA HIDEMOTO, HUSANO TAKAYUKI, MATSUOKA SATOSHI, SATOH MITSUHISA, SEKIGUCHI SATOSHI

    IPSJ SIG Notes   60 ( 22(HPC-60) )   77 - 82   1996.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    To establish a framework for information sharing in the numerical computation area, we have proposed Ninf, a network-based information library for high performance computing. In this paper, we present the Meta Server architecture, a component of the Ninf system. The Meta Server stands between the Server and the Client and hides the Server from the Client. It also enables easy distributed concurrent computation.

    CiNii Books

    J-GLOBAL

    researchmap

  • OMPI : Optimizing MPI programs using Partial Evaluation

    SATOSHI MATSUOKA

    Proc. IEEE/ACM Supercomputing '96, Pittsburgh, PA, IEEE Society Press, 1996 (proceedings in CD-ROM).   1996

     More details

  • COMPILING AWAY THE META-LEVEL IN OBJECT-ORIENTED CONCURRENT REFLECTIVE LANGUAGES USING PARTIAL EVALUATION

    H MASUHARA, S MATSUOKA, K ASAI, A YONEZAWA

    SIGPLAN NOTICES   30 ( 10 )   300 - 315   1995.10

     More details

  • Extension to a Parallel Constraint Logic Programming Language for Applications in Optimization Problems.

    51 ( 0 )   77 - 78   1995.9

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Report on the Object-Oriented '95 Symposium (OO '95)

    AOYAMA Mikio, NISHIOKA Kenji, KISHI Tomoji, UEHARA Sanya, MATSUOKA Satoshi, CHUSHO Takeshi, FUKAZAWA Yoshiaki

    IPSJ SIG Notes   105 ( 84 )   89 - 97   1995.9

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The Object-Oriented '95 Symposium was held on June 1-2, 1995 at the Mita Campus of Keio University in Tokyo. Under the theme of "Theory and Practice of Object-Oriented Systems Development", the opening speeches, tutorials, general sessions, and panel session covered a wide spectrum of development technologies based on object-orientation. This report highlights the major topics of the symposium as well as two special sessions: one presented experience with object-oriented systems development, and the other was a panel on the theory and practice of object-oriented development technology.

    CiNii Books

    researchmap

  • Hierarchical Collection : a Simple Scheme for the Separation of Parallelism and Distribution

    SATO Naohito, MATSUOKA Satoshi, YONEZAWA Akinori

    IPSJ SIG Notes   57 ( 81 )   37 - 42   1995.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Separation of parallelism and distribution is one of the major concerns in efficient massively parallel computation. The details of distribution should be hidden from users of parallel/distributed class frameworks, but should be easily modifiable by (library) programmers. We have proposed a new scheme for building object-oriented parallel distributed class frameworks based on a simple but mathematically disciplined model called a hierarchy of collections. Based on this model, classes can be easily derived to achieve high performance massively parallel computation on a variety of physical platforms. We describe in detail how to define hierarchical collections for typical examples of distributions.

    CiNii Books

    researchmap

  • Evaluation of MPI Optimization Method by Eliminating Software Overhead

    OGAWA Hirotaka, MATSUOKA Satoshi

    IPSJ SIG Notes   57 ( 81 )   13 - 18   1995.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    With a generic implementation of MPI (the standard interface for message passing libraries on parallel computers), dynamic data types and communication contexts must be supported extensively. As a result, the software overhead per communication becomes very large, despite the inherently low-latency communication performance of the target architecture. In this paper, we propose a method of generating optimized MPI programs by 1) analyzing an MPI program statically, and 2) specializing the program using the static information to eliminate software overhead. We preliminarily evaluate the basic communication performance of this method on Fujitsu's AP1000. Results show that simple static analysis decreases the overhead from 338μsec to 76μsec, and greatly improves both latency and throughput.

    CiNii Books

    researchmap

  • Meta-Level Architecture of ABCL/f and its Use in Parallel Programs

    Masuhara Hidehiko, Matsuoka Satoshi, Yonezawa Akinori

    IPSJ SIG Notes   95 ( 82 )   65 - 72   1995.8

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Meta-level programming via computational reflection has come to be recognized as beneficial for parallel applications. Whether we can clearly program practical meta-programs greatly depends on the design of the language's meta-architecture. This paper presents a design of the meta-architecture of ABCL/f, an object-oriented concurrent reflective language. Its features are customization via the meta-interpreters and the meta-objects, annotations that serve as directives that are implemented by the meta-programs, re-use via inheritance in meta-programs, etc. The effectiveness of the architecture is examined through examples from several parallel programs.

    CiNii Books

    researchmap

  • Adaptive Recognition of Implicit Structures in Human-Organized Layouts

    IGARASHI Takeo, MATSUOKA Satoshi, MASUI Toshiyuki

    61 ( 70 )   33 - 38   1995.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Card handling in a hypertext editor can be a powerful methodology for understanding complex problems. To support such activity, recognizing the implicit structure in the arrangement of cards would be useful. However, because these structures are by nature ambiguous and highly dependent on user-specific perception, it is difficult for conventional rule-based spatial parsing algorithms to achieve this task. We propose techniques for building a spatial parser suited to finding such ambiguous structures, based on the mechanics of human perception. Moreover, our parser is adaptively customized to reflect a particular user's preferences through an interactive suggestion process, supported by the application of a genetic algorithm.

    CiNii Books

    researchmap

  • Editor's Message to Special Session on Parallel Processings

    IPSJ Journal   36 ( 7 )   1503 - 1503   1995.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    CiNii Books

    researchmap

  • Evaluation of performance and reliability of client-server system coded by TCP/IP and PVM for Ninf system

    IIOKA Mie, NII Yukako, NAGASHIMA Umpei, SEKIGUCHI Satoshi, SATO Mitsuhisa, MATSUOKA Satoshi, HOSOYA Haruo

    IPSJ SIG Notes   55 ( 28(HPC-55) )   81 - 88   1995.3

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    A prototype of a numerical information library system based on a high performance wide area network (Ninf) was developed to evaluate the performance and reliability of communication using TCP/IP and PVM. Although the client-server system built on TCP/IP is almost 10 times faster than the one built on PVM, its programming cost is quite high because much of the overhead and error handling must be coded by the user. On the other hand, a client-server program is easily written using PVM.

    CiNii Books

    J-GLOBAL

    researchmap

  • Interactive Generation of Graphical User Interfaces by Multiple Visual Examples

    MIYASHITA Ken (Information & Communication System Research Division, Research Center, SONY Co. Ltd.), MATSUOKA Satoshi (Department of Mathematical Engineering, Faculty of Engineering, University of Tokyo), TAKAHASHI Shin (Department of Information Science, Faculty of Science, University of Tokyo), YONEZAWA Akinori (Department of Information Science, Faculty of Science, University of Tokyo)

    11 ( 6 )   41 - 51   1994.11

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • Constructing Algorithm Animations via Declarative Specifications

    TAKAHASHI Shin (Department of Information Science, The University of Tokyo), MIYASHITA Ken (Department of Information Science, The University of Tokyo; present address: Research Center, Sony Co.), MATSUOKA Satoshi (Department of Mathematical Engineering, The University of Tokyo), YONEZAWA Akinori (Department of Information Science, The University of Tokyo)

    11 ( 6 )   83 - 94   1994.11

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • An Object-Oriented Concurrent Reflective Language for Dynamic Resource Management in Highly Parallel Computing

    Masuhara Hidehiko, Matsuoka Satoshi, Yonezawa Akinori

    94 ( 65 )   57 - 64   1994.7

     More details

    Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

    Irregular parallel applications, whose data and communication patterns are determined only at run-time, often require good dynamic resource management (DRM) tailored to the application and/or hardware architecture for efficient execution. To make such DRM systems easy to provide, this paper proposes an object-oriented concurrent reflective language, ABCL/R3. In ABCL/R3, various DRM systems, including scheduling, object allocation, and load balancing, can be realized by modifying or extending the abstracted meta-level of the language in an encapsulated way. This paper also presents a preliminary evaluation of the language, including the basic cost of reflection and a simple DRM system, developed in a prototype system running on the AP1000 multicomputer.

    CiNii Books

    researchmap

  • Design and Implementation of an Object-Oriented Concurrent Reflective Language ABCL/R2.

    MASUHARA Hidehiko (Department of Information Science, The University of Tokyo), MATSUOKA Satoshi (Department of Information Science, The University of Tokyo; present address: Department of Mathematical Engineering, The University of Tokyo), WATANABE Takuo (Department of Information Science, The University of Tokyo; present address: School of Information Science, Japan Advanced Institute of Science and Technology)

    11 ( 3 )   15 - 32   1994.5

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • The Implementation of a Reflective Object-Oriented Concurrent Language without a Run-time Kernel.

    ICHISUGI Yuuji (Electrotechnical Laboratory), MATSUOKA Satoshi (Department of Mathematical Engineering, The University of Tokyo), YONEZAWA Akinori (Department of Information Science, The University of Tokyo)

    11 ( 3 )   65 - 77   1994.5

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • The Plan-Do Style Compilation Technique for Eager Data Transfer in Thread-Based Execution

    Yasugi M., Matsuoka S., Yonezawa A.

    94 ( 65 )   9 - 16   1994

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    The plan-do compilation technique is a new, advanced compilation framework for eager data transfer on distributed-memory parallel architectures. The technique is especially effective for a recent breed of fine-grained architectures because it realizes a high-throughput, low-latency communication scheme, pipelined sends. High-level, plan-do style code is compiled into low-level, eager data transfer code via straightforward application of a set of translation rules. Preliminary low-level benchmark results on a real parallel architecture, EM-4, exhibit good speedups.

    CiNii Books

    researchmap

  • An Efficient Implementation of Concurrent Object-Oriented Languages on Multicomputers

    TAURA Kenjiro, MATSUOKA Satoshi, YONEZAWA Akinori

    7   39 - 42   1993.7

     More details

    Language:Japanese  

    CiNii Books

    researchmap

  • IMSA'92 International Workshop on Reflection

    Satoshi Matsuoka (Department of Information Science, The University of Tokyo), Hidehiko Masuhara (Department of Information Science, The University of Tokyo)

    Computer Software   10 ( 4 )   76 - 82   1993.7

     More details

    Language:Japanese   Publisher:Japan Society for Software Science and Technology  

    CiNii Books

    researchmap

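  • Operational Semantics and Implementation of Multilisp

    Kenichi Asai, Satoshi Matsuoka, Akinori Yonezawa

    Proceedings of the National Convention   41 ( 0 )   8 - 9   1990.9

     More details

    Language:Japanese  

    In recent years, parallel Lisp has attracted attention as a way to draw out much of the latent parallelism of functional languages. Indeed, many parallel Lisps have been developed, beginning with Multilisp [3] and including Multischeme [6], Mul-T [5], and Qlisp [2], and high performance has been reported on parallel computers. However, because parallel Lisps have so far aimed at improving performance on parallel computers, discussion has focused almost exclusively on performance, and little consideration has been given to the semantics of the languages. As a result, the language specifications become ambiguous and are difficult to change. This fixes the scheduling strategy in place and, in turn, makes computational reflection [8] difficult to realize. We therefore give an operational semantics for Multilisp [1] and use it to build an interpreter in Scheme [7] on a sequential computer. Based on this, we further give a denotational semantics. We also discuss the interaction between future and call/cc that follows from this description.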

    CiNii Books

    researchmap

  • 並列オブジェクト指向言語におけるSynchronization Constraintsと継承について

    松岡 聡, 米澤 明憲

    全国大会講演論文集   41 ( 0 )   28 - 29   1990.9

     More details

    Language:English  

    When developing large-scale programs with object-oriented concurrent programming (OOCP) languages, inheritance is generally acknowledged to be one of the most essential features. However, it has previously been pointed out that inheritance and synchronization constraints in concurrent object systems often conflict with each other. For this reason, some languages such as ABCL/1 [13] do not employ inheritance. Although several solutions [3, 4, 7, 10, 12] have been proposed in the past, we argue that, unfortunately, most of the proposals render inheritance totally useless.

    CiNii Books

    researchmap

  • 並列オブジェクト指向言語への安全な継承の導入について

    脇田 建, 松岡 聡

    全国大会講演論文集   41 ( 0 )   26 - 27   1990.9

     More details

    Language:Japanese  

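    Since the difficulty of inheriting synchronization-constraint descriptions in concurrent object-oriented languages was pointed out, various languages have devised ways of describing synchronization constraints. Most of them represent a synchronization constraint as the set of acceptable messages. This paper points out problems with that approach and, as a solution, presents a representation of synchronization constraints as logical formulas. It further proposes implementing programs expressed in this way by means of program transformation.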

    CiNii Books

    researchmap

  • On Formal Treatment of Interactive Graphics

    39 ( 0 )   846 - 847   1989.10

     More details

    Language:Japanese  

    CiNii Books

    researchmap

Presentations

  • Distributed Diskless Checkpoint for Large Scale Systems

    10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2010)  2010 

     More details

  • HPC in the Cloud---A Hype, the End of SCs, or Peaceful Coexistence?

    2010 

     More details

  • Auto-Tuning of a Scientific Application on GPU clusters

    IPSJ SIG Technical Report  2010 

     More details

  • クラウド環境における大規模データブロードキャストの動的最適化

    ハイパフォーマンスコンピューティングと計算科学シンポジウム (HPCS2010)  2010 

     More details

  • Improving the Large-Scale Data Access Using Virtual Machine Migration

    2010 

     More details

  • Performance Evaluation of Software Framework for Memory Fault Tolerance in GPU Accelerators

    SIAM Conference on Parallel Processing for Scientific Computing (PP10), MS36: Trends and Experiences in Heterogeneous Many-core Computing  2010 

     More details

  • Accelerated Computing in TSUBAME 1.2/2.0

    2010 

     More details

  • GPU クラスタにおける科学技術計算の自動最適化

    HPC研究会  2010 

     More details

  • Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators

    IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010)  2010 

     More details

  • A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs

    24th IEEE International Parallel and Distributed Processing Symposium (IPDPS'10)  2010 

     More details

  • GPU Acceleration: a Fad or the Yellow Brick Road onto Exascale

    2010 

     More details

  • 大規模計算機システムの資源選択を支援するエキスパートシステム

    情報処理学会研究報告2009-HPC-124  2010 

     More details

  • GPUクラスタにおける省電力タスクスケジューリング

    第124回HPC研究会  2010 

     More details

  • Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds

    The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  2010 

     More details

  • 仮想マシン動的再配置による大規模データアクセスの高速化

    情報処理学会先進的計算基盤システムシンポジウム論文集 (SACSIS2010)  2010 

     More details

  • Rise of the commodity vectors

    2008 8th International Meeting High Performance Computing for Computational Science  2008 

     More details

  • 性能モデルに基づくCPU及びGPUを併用する効率的なFFTライブラリ

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2008)  2008 

     More details

  • Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

    2008 ACM/IEEE conference on Supercomputing (SC08)  2008 

     More details

  • HPC-GPGPU: Large-scale commodity accelerated clusters and its application to advanced structural proteomics

    AHeDD2008/IPAB2008 Joint Symposium  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Hundred million cores in commodity---Why not? (or, will `custom'*finally* prevail?)

    CCGSC2008  2008 

     More details

  • Coupled-simulation e-science support in the NAREGI grid

    IEEE Computer  2008 

     More details

  • 情報爆発時代のグリッド環境に対応したMPI集団通信アルゴリズムの最適化

    第70回情報処理学会全国大会  2008 

     More details

  • HPC-GPGPU: Large-scale commodity accelerated clusters and its application to advanced structural proteomics

    Microsoft Science All-Hands-Meeting  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • 情報爆発に対応する耐故障性 MPI フレームワークの提案

    第70回情報処理学会全国大会  2008 

     More details

  • 情報爆発時代の光インターコネクト上でのMPI通信アルゴリズム

    第70回情報処理学会全国大会  2008 

     More details

  • Grid'BnB: A parallel branch & bound framework for grids

    14th International Conference on High Performance Computing (HiPC)  2008 

     More details

  • 省電力ページング方式を実装した次世代メモリアーキテクチャ上での並列プログラムの評価

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2008)  2008 

     More details

  • 情報爆発時代へ向けた不均一アーキテクチャにおけるスーパーコンピューティング

    第70回情報処理学会全国大会  2008 

     More details

  • 情報爆発時代のグリッドファイルシステム上での大規模データ管理

    第70回情報処理学会全国大会  2008 

     More details

  • 情報爆発に対応するスケーラブルかつ自律的な障害解析

    情報処理学会第70回全国大会  2008 

     More details

  • 情報爆発時代におけるモデルベース資源選択による高速な仮想クラスタ構築

    情報処理学会第70回全国大会  2008 

     More details

  • An efficient, model-based CPU-GPU heterogeneous FFT library

    International Heterogeneity in Computing Workshop (HCW '08)  2008 

     More details

  • Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method

    The Fourth Workshop on High-Performance  2008 

     More details

  • Massive supercomputing coping with heterogeneity of modern accelerators

    IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008)  2008 

     More details

  • Locality aware MPI communication on a commodity opto-electronic hybrid network

    Workshop on Large-Scale Parallel Processing (LSPP)  2008 

     More details

  • 情報爆発時代のスーパコンピュータ運用経験:TSUBAME Grid Clusterにて

    情報処理学会第70回全国大会  2008 

     More details

  • NAREGIグリッドミドルウェアによる大規模連携接続実証実験

    情報処理学会研究報告  2008 

     More details

  • Index distribution technique for efficient search on unstructured peer-to-peer networks

    2008 

     More details

  • A decentralized, scalable, and autonomous grid monitoring system

    11th International Conference on Principles of Distributed Systems (OPODIS)  2008 

     More details

  • Model-based fault localization in large-scale computing systems

    The 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS'08)  2008 

     More details

  • Index distribution technique for efficient search on unstructured peer-to-peer networks

    The International Conference in Electrical Engineering/Electronics  2008 

     More details

  • モデルベース資源選択による効率的な仮想クラスタ構築

    情報処理学会 先進的計算基盤システムシンポジウム(SACSIS2008)  2008 

     More details

  • グリッド環境におけるMPI Scatter/Gather通信アルゴリズムの最適化

    並列/分散/協調処理に関するサマーワークショップ(SWoPP2008)  2008 

     More details

  • ソフトウェアECCによるGPUメモリの耐故障性の実現と評価

    並列/分散/協調処理に関するサマーワークショップ(SWoPP2008)  2008 

     More details

  • CUDA 環境における高性能3次元FFT

    情報処理学会 先進的計算基盤システムシンポジウム(SACSIS2008)  2008 

     More details

  • Time stamping authority grid

    Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'08)  2008 

     More details

  • 不均一な複数GPUにおけるセルフスケジューリングによる並列数値演算

    情報処理学会 先進的基盤システムシンポジウム (SACSIS2008)  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • ヘテロ計算環境のための省電力タスクスケジューリング

    情報処理学会 先進的基盤システムシンポジウム (SACSIS2008)  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • ヘテロ計算環境のための省電力タスクスケジューリング

    並列/分散/協調処理に関するサマーワークショップ(SWoPP2008)  2008 

     More details

  • Rise of the Commodity Vectors or Democratization of Supercomputing

    NVISION2008  2008 

     More details

  • Access-pattern and bandwidth aware file replication algorithm in a grid environment

    The 9th IEEE/ACM International Conference on Grid Computing (Grid 2008)  2008 

     More details

  • Environmental-aware optimization of MPI checkpointing intervals

    The 2008 IEEE International Conference on Cluster Computing (Cluster 2008)  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Model-based Optimization for Data-Intensive Application on Virtual Cluster

    The 2008 9th IEEE/ACM International Conference on Grid Computing (Grid 2008)  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • 光ネットワークの補助的利用によるHPC性能向上

    並列/分散/協調処理に関するサマーワークショップ(SWoPP2008)  2008 

     More details

  • 広域分散ファイルシステムにおけるアクセスパターンと性能を考慮したファイル配置

    並列/分散/協調処理に関するサマーワークショップ(SWoPP2008)  2008 

     More details

  • 仮想クラスタを用いたData-Intensive Application 実行環境の 性能モデル構築と最適化に向けて

    2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • 仮想クラスタを用いたData-Intensive Application 実行環境の性能モデル構築と最適化

    情報処理学会研究報告  2008 

     More details

  • 複数GPUにおけるセルフスケジューリングによる並列数値演算

    並列/分散/協調処理に関するサマーワークショップ(SWoPP2008)  2008 

     More details

  • High-performance MPI broadcast algorithm for grid environments utilizing multi-lane NICs

    Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid'07)  2007 

     More details

  • Virtual clusters on the fly - fast, scalable, and flexible installation

    Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid'07)  2007 

     More details

  • Web-site-based partitioning techniques for efficient parallelization of the PageRank Computation

    2007 

     More details

  • High-performance distributed solar computing (?) --- Towards a grid that computes like trees---

    2007 

     More details

  • Data management on grid filesystem for data-intensive computing

    SAINT 2007 Workshop on Middleware Architecture in the Internet  2007 

     More details

  • Peer-to-peer scheduling system with scalable information sharing protocol

    SAINT 2007 Workshop on Middleware Architecture in the Internet  2007 

     More details

  • A peer-to-peer infrastructure for autonomous grid monitoring

    The 3rd International Workshop on Hot Topics in Peer-to-Peer Systems at the International Parallel & Distributed Processing Symposium 2007  2007 

     More details

  • ABARIS: An adaptable fault detection/recovery component framework for MPIs

    12th IEEE Workshop on Dependable Parallel  2007 

     More details

  • TSUBAME 1.2 の概要---世界初のGPU加速された大規模スパコン

    SGI セミナー  2008 

     More details

  • 最新TSUBAMEシステム

    IPAB セミナー  2008 

     More details

  • NAREGIミドルウェアβ-gLite 間における相互ジョブ起動実験

    情報処理学会研究報告2007-HPC-109(HOKKE2007)  2007 

     More details

  • ハイパフォーマンス分散時刻認証局:毎秒百万タイムスタンプ発行の実現

    情報処理学会研究報告2007-HPC-109(HOKKE2007)  2007 

     More details

  • グリッド環境におけるマルチレーンを用いたMPIコレクティブ通信アルゴリズム

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2007)  2007 

     More details

  • ヘテロ型スーパーコンピュータTSUBAMEのLinpackによる性能評価

    2007年ハイパフォーマンスコンピューティングと計算科学シンポジウムHPCS2007  2007 

     More details

  • Autonomically-adapting master-worker programming framework for multi-layered grid-of-clusters

    HPC Asia 2007  2007 

     More details

  • Model-based resource selection for efficient virtual cluster deployment

    2nd International Workshop on Virtualization Technology in Distributed Computing (VTDC'07)  2007 

     More details

  • Job invocation interoperability between NAREGI Middleware Beta and gLite

    HPC Asia 2007  2007 

     More details

  • フォールト/リカバリモデルを考慮した耐故障性をもつMPI フレームワークABARIS の提案と評価

    情報処理学会研究報告2007-HPC-109(HOKKE2007)  2007 

     More details

  • CPUおよびGPUを併用するFFTライブラリの提案と評価

    情報処理学会研究報告 2007-HPC-111(SWOPP2007)  2007 

     More details

  • クラスタシステムにおけるIP-SANを用いたI/O処理の並列ベンチマークによる評価

    情報処理学会研究報告 2007-HPC-111(SWOPP2007)  2007 

     More details

  • 仮想クラスタを用いた複数サイト上でのMPI実行環境

    情報処理学会研究報告2007-HPC-109(HOKKE2007)  2007 

     More details

  • キャッシュを用いた仮想クラスタ高速構築手法の性能評価

    情報処理学会研究報告2007-HPC-109(HOKKE2007)  2007 

     More details

  • 分散時刻認証局グリッドとパラメータ依存性の解析

    情報処理学会 先進的計算基盤システムシンポジウム(SACSIS2007)  2007 

     More details

  • 仮想クラスタ構築時間のモデリングおよびその最適化

    電子情報通信学会技術研究報告 2007-CPSY (SWOPP2007)  2007 

     More details

  • 次世代省電力メモリを用いた並列プログラムの省電力化の評価

    情報処理学会研究報告 2007-HPC-111(SWOPP2007)  2007 

     More details

  • 分散システムにおける故障の自律的な解析

    ソフトウェア科学会第24回大会  2007 

     More details

  • インターネット上での分散時刻認証グリッドのタイムスタンプ発行スケーラビリティの評価

    情報処理学会研究報告2007-HPC-112,HPC Asia併設WS  2007 

     More details

  • 時刻認証グリッドの構築と基礎実験

    電子情報通信学会技術研究報告 2007-CPSY (SWOPP2007)  2007 

     More details

  • 分散時刻認証グリッドのインターネット上での動作実験

    電子情報通信学会技術研究報告 2007-CPSY (SWOPP2007)  2007 

     More details

  • 次世代光インターコネクトでの MPI 通信性能の評価

    日本ソフトウェア科学会第24回大会(2007年度)  2007 

     More details

  • 広域分散環境における大規模データ管理のためのノードグルーピング

    情報処理学会研究報告  2007 

     More details

  • 次世代光インターコネクト上での MPI アプリケーションの評価

    情報処理学会研究報告 2007-HPC-111(SWOPP2007)  2007 

     More details

  • Multi-replication with intelligent staging in data-intensive grid applications

    The 7th IEEE/ACM International Conference on Grid Computing  2006 

     More details

  • Multi-Replication with Intelligent Staging in Data-Intensive Grid Applications

    The 7th IEEE/ACM International Conference on Grid Computing  2006 

     More details

  • Speculative checkpointing

    DSW 2006  2006 

     More details

  • Profile-based optimization of power-performance by using dynamic voltage scaling on a PC cluster

    20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006)  2006 

     More details

  • Construction and Operation of the Grid Challenge Testbed

    2006 

     More details

  • MegaProto/E: Power-aware high-performance cluster with commodity technology

    20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006)  2006 

     More details

  • 光ネットワーク環境におけるMPI集団通信

    第159回計算機アーキテクチャ・第105回ハイパフォーマンスコンピューティング合同研究発表会(HOKKE-2006)  2006 

     More details

  • レプリカ管理システムを利用したデータインテンシブアプリケーション向けスケジューリングシステム

    第159回計算機アーキテクチャ・第105回ハイパフォーマンスコンピューティング合同研究発表会(HOKKE-2006)  2006 

     More details

  • グリッド上における仮想計算機を用いたジョブ実行環境構築システムの高速化

    第159回計算機アーキテクチャ・第105回ハイパフォーマンスコンピューティング合同研究発表会(HOKKE-2006)  2006 

     More details

  • 大規模環境向け情報共有手法を用いた分散ジョブスケジューリングシステム

    第159回計算機アーキテクチャ・第105回ハイパフォーマンスコンピューティング合同研究発表会(HOKKE-2006)  2006 

     More details

  • TSUBAMEの飛翔: ペタスケールへ向けた「みんなのスパコン」の構想.

    情報処理学会研究報告 2006-HPC-107  2006 

     More details

  • 動的なノード群構成機構を備えた階層型グリッド環境: Jojo2

    先進的計算基盤システムシンポジウム SACSIS2006  2006 

     More details

  • 仮想計算機を用いたグリッド上でのMPI実行環境

    先進的計算基盤システムシンポジウム SACSIS2006  2006 

     More details

  • ORE Grid:仮想計算機を用いたグリッド実行環境の高速な配置ツール

    先進的計算基盤システムシンポジウム SACSIS2006  2006 

     More details

  • グリッドチャレンジテストベッドの構築と運用〜グリチャレテストベッドの作り方〜

    並列/分散/協調処理に関する『高知』サマー・ワークショップ(SWoPP2006)  2006 

     More details

  • 仮想計算機と仮想ネットワークを用いた仮想クラスタの構築

    並列/分散/協調処理に関する『高知』サマー・ワークショップ(SWoPP2006)  2006 

     More details

  • フォールトモデルを考慮した耐故障性をもつ MPI フレームワーク Cuckoo FTMPI の提案と評価

    電子情報通信学会技術研究報告  2006 

     More details

  • ヘテロ型スーパーコンピュータTSUBAMEのLinpackによる性能評価

    並列/分散/協調処理に関する『高知』サマー・ワークショップ(SWoPP2006)  2006 

     More details

  • TSUBAMEの飛翔:ペタスケールへ向けた「みんなのスパコン」の構想

    並列/分散/協調処理に関する『高知』サマー・ワークショップ(SWoPP2006)  2006 

     More details

  • Being "BYTES-oriented" in HPC leads to an Open Big Data/AI Ecosystem and Further Advances into the Post-Moore Era (Keynote Talk) Invited

    Satoshi Matsuoka

    2017 IEEE International Conference on Big Data  2017.12 

     More details

  • レプリカ管理システムを利用したデータインテンシブアプリケーション向けスケジューリングシステム

    ハイパフォーマンスコンピューティングと計算科学シンポジウム  2006 

     More details

  • Converging HPC and Big Data / AI in an Open Public Infrastructure: Tokyo Tech. Tsubame3 and AIST ABCI Invited

    Satoshi Matsuoka

    The 19th IEEE International Conference on High Performance Computing and Communications (HPCC2017).  2017.12 

     More details

  • Energy Efficiency Gains From Software: Retrospectives and Perspectives (Panelist Talk) Invited

    Satoshi Matsuoka

    The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17)  2017.11 

     More details

  • Efficient Sparse General Matrix-Matrix Multiplication Algorithms for Many-Core Processors

    Yusuke Nagasaka, Aydın Buluç, Ariful Azad, Akira Nukada, Satoshi Matsuoka

    SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP'18)  2018.3 

     More details

  • 大規模分散システムにおける故障の解析

    電子情報通信学会技術研究報告 DC2006-16  2006 

     More details

  • Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization

    Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen

    PDML19 @ ICPP2019  2019.8 

     More details

  • データインテンシブコンピューティングのためのグリッドファイルシステム上でのデータ管理

    コンピュータシステム・シンポジウム(Compsys 2006)  2006 

     More details

  • Cambrian Explosion of Computing and Big Data in the Post-Moore Era

    HPDC 2018  2018 

     More details

  • From Post-K to Cambrian Explosion of Computing and Big Data in the Post-Moore Era

    HPC2018 - International Advanced Workshop, From Clouds and Big Data to Exascale and Beyond  2018 

     More details

  • You Don't Really Need Big Fat Switches Anymore --- Almost

    情報処理学会研究報告 2003-ARC-154, SWoPP 2003  2003 

     More details

  • Converging HPC and BD/AI: Tokyo Tech TSUBAME3.0 and AIST ABCI (Booth Talk at Nvidia Booth) Invited

    Satoshi Matsuoka

    The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17)  2017.11 

     More details

  • Java 言語向け適応的部分計算の設計と実装

    第6回プログラミングおよび応用のシステムに関するワークショップ SPA 2003  2003 

     More details

  • Blurring the Lines: High-End Computing and Data Science (Panelist Talk) Invited

    Satoshi Matsuoka

    The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17)  2017.11 

     More details

  • OpenJITコンパイラフレームワークにおける実行時特化システム

    日本ソフトウエア科学会 第4回プログラミングおよび応用システムに関するワークショップ(SPA2001),March 2001  2001 

     More details

  • Converging HPC and BD/AI: Tokyo Tech TSUBAME3.0 and AIST ABCI (Booth Talk at Tokyo Tech Booth) Invited

    Satoshi Matsuoka

    The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17)  2017.11 

     More details

  • Grid RPC meets Data Grid: Network Enabled Services for Data Farming on the Grid

    Proceedings of IEEE Symposium on Cluster Computing and the Grid Brisbane, Australia, May 2001 (Invited Paper)  2001 

     More details

  • Converging HPC and BD/AI: Tokyo Tech TSUBAME3.0 and AIST ABCI (Booth Talk at DDN Booth) Invited

    Satoshi Matsuoka

    The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17)  2017.11 

     More details

  • Japanese Computational Grid Research Project: NAREGI

    Proceedings of the IEEE  2005 

     More details

  • Large-scale Dynamic Graph Processing on HPC Systems

    Keita Iwabuchi, Roger Pearce, Maya Gokhale, Satoshi Matsuoka

    Minisymposium @ SIAM 2017  2017.1 

     More details

  • Exploring User Level Burst Buffer on Public Cloud and HPC Invited

    Satoshi Matsuoka

    Dagstuhl Seminar: Challenges and Opportunities of User-Level File Systems for HPC  2017.5 

     More details

  • Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms

    Proceedings of 8th IEEE International Symposium on High Performance Distributed Computing (HPDC8)  1999 

     More details

  • HPCとビッグデータ・AIの融合インフラ Invited

    松岡 聡

    産総研IMPULSEコンソ セミナー(第3回)  2017.10 

     More details

  • Results from Tsubame 3.0 - A 47 AI-PFLOPS System for HPC and AI Convergence Invited

    Satoshi Matsuoka

    HP-CAST29  2017.11 

     More details

  • Highly Efficient and Encapsulated Re-use of Synchronization Code in Concurrent Object-Oriented Languages

    Proceedings of ACM OOPSLA '93, Washington, D.C., Sep. 1993  1993 

     More details

  • FLOPS to BYTES: Accelerating Beyond Moore's Law is Data-Oriented Invited

    Satoshi Matsuoka

    PPAM2017  2017.9 

     More details

  • TSUBAME3/ABCI and AI Invited

    Satoshi Matsuoka

    The 3rd International High Performance Computing Forum (IHPCF2017)  2017.9 

     More details

  • アプリケーションのEmpiricalな性能モデル構築のためのプロファイル情報の収集 (オーガナイズドセッション: 計算科学と計算機科学のコデザインのためのミニアプリ)

    野村 哲弘, 三浦 信一, 遠藤 敏夫, 松岡 聡

    2015年ハイパフォーマンスコンピューティングと計算科学シンポジウム  2015.5 

     More details

  • Can Local Binary Convolutions Make Neural Networks Models Smaller?

    Haoyu Zhang, Wahib Mohamed, Pen Chen, Satoshi Matsuoka

    International Conference on Parallel Processing (ICPP' 2019) 

     More details

  • Finishing GPU Jobs running on a Multi-GPU Batch-Queue Node-Sharing System Earlier with Remote GPU Execution and Migration

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    ISC2016 PhD Forum  2016.6 

     More details

  • Dynamic Optimization for large data Broadcast on Clouds

    2010 

     More details

  • Evaluations of Directive Based Programming Model for GPUs and Extensions for Performance Portability

    Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

    SIAM Conference on Computational Science and Engineering (CSE) 2015  2015.3 

     More details

  • A General Framework for Bi-Directional Translation between Abstract and Pictorial Data.

    ACM Transactions on Information Systems  1992 

     More details

  • Increasing Jobs that a Multi-GPU Batch-Queue System can serve, with GPU Remoting and Migration

    Pak Markthub, Akihiro Nomura, Satoshi Matsuoka

    TJIA 2016 : The 8th Thailand-Japan International Academic Conference (TJIA)  2016.10 

     More details

  • A Resource Selection Support Expert System for Large-Scale Computing Environments

    2010 

     More details

  • Power-Aware Task Scheduling on GPU Accelerated Clusters

    2010 

     More details

  • TSUBAME 1.2 and the Road to TSUBAME 2.0 - Accelerated Multi-Petascale Commodity Computing for Everyone

    2009 

     More details

  • Speculative checkpointing: Exploiting temporal affinity of memory operations

    HPC ASIA 2009  2009 

     More details

  • Fast Conjugate Gradients with Multiple GPUs

    Lecture Notes in Computer Science  2009 

     More details

  • A Model-Based Algorithm for Optimizing I/O Intensive Applications in Clouds using VM-Based Migration

    Proceedings of Cloud2009 in the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid  2009 

     More details

  • Preliminary evaluation of software-based memory fault tolerance for GPGPU

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Fast conjugate gradient solver on multi-GPU systems

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Environmental-aware optimization of MPI checkpointing intervals

    HPC ASIA 2009  2009 

     More details

  • HPC Application Performance Improvement by a Supplemental Optical Circuit Switching Network

    High Performance Computing Symposium 2009  2009 

     More details

  • An Efficient Conjugate Gradient Solver on Double Precision Multi-GPU Systems

    2009 

     More details

  • Adaptive Resource Indexing Technique for Unstructured Peer-to-Peer Networks

    9th IEEE/ACM International Symposium on Cluster Computing and the Grid  2009 

     More details

  • Linpack Tuning Method on a Heterogeneous Supercomputer with Hybrid Accelerators

    Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2009)  2009 

     More details

  • Towards user-satisfaction-based resource management in a large-scale computing environment

    SWoPP2009  2009 

     More details

  • Petascaling Commodity onto Exascale: GPUs as Multithreaded Massively-Parallel Vector Processors - the Only Road to Exascale

    2009 

     More details

  • A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • GPU Accelerated Computing---From Hype to Mainstream, the Rebirth of Vector Computing

    2009 

     More details

  • File Clustering Based Replication Algorithm in a Grid Environment

    The 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid  2009 

     More details

  • GPU accelerated computing---from hype to mainstream, the rebirth of vector computing

    Scientific Discovery through Advanced Computing (SciDAC 2009)  2009 

     More details

  • Software-Based ECC for GPUs

    2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC'09)  2009 

     More details

  • The Efficient Checkpoint based on Erasure Coding with Incremental Method

    SIG HPC  2009 

     More details

  • Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

    The Fifth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction to IEEE IPDPS 2009  2009 

     More details

  • プロセス間共通メモリイメージを考慮したマイグレーション最適化

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2009)  2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • 四種プロセッサからなるヘテロ型スーパーコンピュータにおけるLinpackチューニング

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2009)  2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • SWAPアクセス数の実行時推定によるメモリの省電力化手法

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2009)  2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Petascaling Commodity onto Exascale with GPUs on TSUBAME1.2 onto TSUBAME2.0

    2009 

     More details

  • Auto-Tuning 3-D FFT Library for CUDA GPUs

    2009 ACM/IEEE conference on Supercomputing (SC09)  2009 

     More details

  • 複数 GPU システムに対応する自動最適化 3D-FFT ライブラリ

    情報処理学会 ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2009)  2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Petascaling Commodity onto Exascale with GPUs and Windows HPC

    2009 

     More details

  • MapReduce Implementation on the TSUBAME Supercomputer

    2009 

     More details

  • CG on GPU-enhanced Clusters

    2009 

     More details

  • TSUBAME 1.2 and the Road to TSUBAME 2.0 - Accelerated Multi-Petascale Commodity Computing for Everyone

    2009 

     More details

  • Speculative checkpointing: Exploiting temporal affinity of memory operations

    HPC ASIA 2009  2009 

     More details

  • A Model-Based Algorithm for Optimizing I/O Intensive Applications in Clouds using VM-Based Migration

    Proceedings of Cloud2009 in the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid  2009 

     More details

  • A Memory Power-Saving Method Based on Dynamic Estimation of Swap Cost

    Joint Workshop on Computer Architecture and High Performance Computing (HOKKE-2009)  2009 

     More details

  • Linpack Tuning on a Heterogeneous Supercomputer Comprising Four Types of Processors

    Joint Workshop on Computer Architecture and High Performance Computing (HOKKE-2009)  2009 

     More details

  • Environmental-aware optimization of MPI checkpointing intervals

    HPC ASIA 2009  2009 

     More details

  • Migration Optimization Considering Memory Images Shared among Processes

    Joint Workshop on Computer Architecture and High Performance Computing (HOKKE-2009)  2009 

     More details

  • Preliminary evaluation of software-based memory fault tolerance for GPGPU

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Performance Evaluation of Software ECC for GPUs

    Joint Workshop on Computer Architecture and High Performance Computing (HOKKE-2009)  2009 

     More details

  • Improving HPC Application Performance through Supplementary Use of an Optical Circuit Network

    IPSJ Symposium on High Performance Computing and Computational Science (HPCS2009)  2009 

     More details

  • The Potential for High-Bandwidth Petaflops Computing on TSUBAME2.0

    2009 

     More details

  • Accelerating Multi-Node GPU Execution of the Himeno Benchmark by Overlapping Communication and Computation: Over 700 GFLOPS Achieved with 32 GPUs

    SIG HPC  2009 

     More details

  • Power Performance of Fault-Tolerance-Aware Numerical Computation on GPUs

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Analysis of the Correlation between Performance and Power Consumption on GPUs

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

  • An Auto-Tuning FFT Library for CUDA GPUs

    Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2009)  2009 

     More details

  • File Clustering Based Replication Algorithm in a Grid Environment

    The 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid  2009 

     More details

  • Adaptive Resource Indexing Technique for Unstructured Peer-to-Peer Networks

    9th IEEE/ACM International Symposium on Cluster Computing and the Grid  2009 

     More details

  • Fast Conjugate Gradients with Multiple GPUs

    Lecture Notes in Computer Science  2009 

     More details

  • Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters

    The Fifth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IEEE IPDPS 2009  2009 

     More details

  • An Efficient Conjugate Gradient Solver on Double Precision Multi-GPU Systems

    2009 

     More details

  • Software-Based ECC for GPUs

    2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC'09)  2009 

     More details

  • GPU Accelerated Computing---From Hype to Mainstream, the Rebirth of Vector Computing

    2009 

     More details

  • A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs

    2009 

     More details

    Presentation type:Poster presentation  

    researchmap

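  • Techniques for Improving Linpack Performance on a Heterogeneous Supercomputer with Different Types of Accelerators

    Summer Workshop on Parallel/Distributed/Cooperative Processing (SWoPP2009)  2009 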

     More details

  • Scalable Petaflops Vector Supercomputing with GPGPU on TSUBAME2.0

    2009 

     More details

  • Petascaling Commodity onto Exascale: GPUs as Multithreaded Massively-Parallel Vector Processors - the Only Road to Exascale

    2009 

     More details

  • Power Performance of Fault-Tolerance-Aware Numerical Computation on GPUs

    IPSJ SIG Technical Report 2009-HPC-121  2009 

     More details

  • Analysis of the Correlation between Performance and Power Consumption on GPUs

    IPSJ SIG Technical Report 2009-HPC-121  2009 

     More details

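  • Toward Resource Management That Considers User Satisfaction in Large-Scale Computing Environments

    2009 Summer Workshop on Parallel/Distributed/Cooperative Processing in Sendai (SWoPP Sendai 2009)  2009 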

     More details

  • GPU accelerated computing---from hype to mainstream, the rebirth of vector computing

    Scientific Discovery through Advanced Computing (SciDAC 2009)  2009 

     More details

  • A Fast Checkpointing Method Using Incremental Data and Erasure Coding

    SIG HPC  2009 

     More details

  • CG on GPU-enhanced Clusters

    2009 

     More details

  • A Memory-Error-Tolerant Software Framework for GPUs

    IPSJ SIG Technical Report 2009-HPC-123  2009 

     More details

  • Hundred million cores in commodity---Why not? (or, will `custom'*finally* prevail?)

    CCGSC2008  2008 

     More details

  • Coupled-simulation e-science support in the NAREGI grid

    IEEE Computer  2008 

     More details

  • Grid'BnB: A parallel branch & bound framework for grids

    14th International Conference on High Performance Computing (HiPC)  2008 

     More details

  • Auto-Tuning 3-D FFT Library for CUDA GPUs

    2009 ACM/IEEE conference on Supercomputing (SC09)  2009 

     More details

  • Implementing MapReduce on the TSUBAME Supercomputer

    IPSJ SIG Technical Report 2009-HPC-123 (HOKKE17)  2009 

     More details

  • Petascaling Commodity onto Exascale with GPUs and Windows HPC

    2009 

     More details

  • Petascaling Commodity onto Exascale with GPUs on TSUBAME1.2 onto TSUBAME2.0

    2009 

     More details

  • Index distribution technique for efficient search on unstructured peer-to-peer networks

    2008 

     More details

  • HPC-GPGPU: Large-scale commodity accelerated clusters and its application to advanced structural proteomics

    Microsoft Science All-Hands-Meeting  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • An efficient, model-based CPU-GPU heterogeneous FFT library

    International Heterogeneity in Computing Workshop (HCW '08)  2008 

     More details

  • Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method

    The Fourth Workshop on High-Performance  2008 

     More details

  • Time stamping authority grid

    Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'08)  2008 

     More details

  • Index distribution technique for efficient search on unstructured peer-to-peer networks

    The International Conference in Electrical Engineering/Electronics  2008 

     More details

  • Massive supercomputing coping with heterogeneity of modern accelerators

    IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008)  2008 

     More details

  • Locality aware MPI communication on a commodity opto-electronic hybrid network

    Workshop on Large-Scale Parallel Processing (LSPP)  2008 

     More details

  • A decentralized, scalable, and autonomous grid monitoring system

    11th International Conference on Principles of Distributed Systems (OPODIS)  2008 

     More details

  • Model-based fault localization in large-scale computing systems

    The 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS'08)  2008 

     More details

  • Environmental-aware optimization of MPI checkpointing intervals

    The 2008 IEEE International Conference on Cluster Computing (Cluster 2008)  2008 

     More details

    Presentation type:Poster presentation  

    researchmap

  • Rise of the Commodity Vectors or Democratization of Supercomputing

    NVISION2008  2008 

     More details

  • Model-based Optimization for Data-Intensive Application on Virtual Cluster

    The 2008 9th IEEE/ACM International Conference on Grid Computing (Grid 2008)  2008 

     More details

    Presentation type:Poster presentation  

    researchmap


Works

  • The Lucie project, a turn-key network installer for large-scale clusters

    2003 - 2006

     More details

    Work type:Artistic work  

    researchmap

  • Practical Deployment of a Setup and Management Tool for Large-Scale Clusters

    2003 - 2006

     More details

    Work type:Artistic work  

    researchmap

  • Technology for Building Dependable Large-Scale Commodity Clusters Based on Grid Technology

    2001 - 2006

     More details

    Work type:Artistic work  

    researchmap

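  • Development of a Cluster Grid Testbed; Construction of Mathematical Optimization Library Applications on the Grid; Research on Scalable, Highly Reliable Extensions of GridRPC/Ninf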

    2001 - 2004

     More details

    Work type:Artistic work  

    researchmap

Awards


Research Projects

  • -

    2021.4 - 2023.3

    Fujitsu Ltd  Collaborative Research 

      More details

    Authorship:Principal investigator  Grant type:Collaborative (industry/university)

    researchmap

  • Fast and cost-effective deep learning algorithm platform for video processing in social infrastructure

    2019.4 - 2023.3

    Japan Science and Technology Agency  CREST 

      More details

    Authorship:Coinvestigator(s) 

    researchmap

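  • Research on Software and Hardware Architectures and Target Applications for Next-Generation Computer Systems

    2017.4 - 2021.3

    Fujitsu Laboratories Ltd.  Collaborative Research 

    MATSUOKA Satoshi, ENDO Toshio, NOMURA Akihiro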

      More details

    Authorship:Principal investigator  Grant type:Collaborative (industry/university)

    researchmap

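  • Fast and cost-effective deep learning algorithm platform for video processing in social infrastructure (small-phase project)

    2016.12 - 2019.3

    Japan Science and Technology Agency (JST)  Core Research for Evolutional Science and Technology (CREST) 

    SHINODA Koichi, MATSUOKA Satoshi, MURATA Tsuyoshi, YOKOTA Rio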

      More details

    Authorship:Coinvestigator(s)  Grant type:Competitive

    researchmap

  • Accelerating High-Performance Computing Application Kernels Through Reconfigurable Hardware

    Grant number:16F16764  2016.11 - 2019.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows  Grant-in-Aid for JSPS Fellows

      More details

    Grant amount:¥2200000 ( Direct Cost: ¥2200000 )

    researchmap

  • Research on Accelerating Machine Learning Processing

    2016.10 - 2020.3

    Denso IT Laboratory, Inc.  Collaborative Research 

    MATSUOKA Satoshi, YOKOTA Rio, NOMURA Akihiro

      More details

    Authorship:Principal investigator  Grant type:Collaborative (industry/university)

    researchmap

  • Research on Software and Hardware Architectures and Target Applications for Next-Generation HPC

    2016.10 - 2017.3

    Fujitsu Laboratories Ltd.  Collaborative Research 

      More details

    Authorship:Principal investigator  Grant type:Collaborative (industry/university)

    researchmap

  • Research on Accelerating a Compressible Fluid Analysis Program

    2015.11 - 2016.3

    IHI Corporation  Collaborative Research 

      More details

    Authorship:Principal investigator  Grant type:Collaborative (industry/university)

    researchmap

  • Research on Accelerating Machine Learning Processing

    2015.10 - 2016.9

    DENSO Corporation  Collaborative Research 

      More details

    Authorship:Principal investigator  Grant type:Collaborative (industry/university)

    researchmap

  • EBD: Extreme Big Data - Convergence of Big Data and HPC for Yottabyte Processing

    2013.10 - 2019.3

    Japan Science and Technology Agency (JST)  Core Research for Evolutional Science and Technology (CREST) 

      More details

    Authorship:Principal investigator  Grant type:Competitive

    researchmap

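  • Programming Models for High-Performance Computing

    Grant number:12F02044  2012 - 2013

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows  Grant-in-Aid for JSPS Fellows

    MATSUOKA Satoshi, PERICASGLEIM M.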

      More details

    Grant amount:¥2300000 ( Direct Cost: ¥2300000 )

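    The main goal of this research is to develop parallel programming methods that combine high performance and high power efficiency with productivity on next-generation supercomputers. This fiscal year we advanced the research plan with a focus on runtime evaluation and resource management for the task-parallel and dataflow models. The reason is that, in the analysis of exaFMM carried out in the previous fiscal year, the differences in application performance between schedulers could not be explained by processor idle time arising from differences in scheduling policy, and were instead attributed to resource management. In future systems, where interconnect constraints will grow, this point is expected to become more important for both performance and power.
    To this end, we developed an analysis technique based on the reuse distance method for the task-parallel and dataflow models. Reuse distance is a metric that indicates the amount of data accessed between two accesses to a particular data element. It is a powerful technique for analyzing the temporal locality of memory accesses, which matters most in resource management, and it correlates strongly with processor cache utilization efficiency. However, it was originally developed for single-core processors, and how to implement it for use in this research was not clear. We therefore identified the challenges to be overcome (trace generation, trace size, and computational complexity) and proposed a way to realize it.
    First, we presented a technique that can substantially reduce trace size while still computing accurate reuse distances for distances sufficiently larger than the data size of the computational kernels under study. To demonstrate its effectiveness, we built a prototype showing that traces can be generated and measured with almost no overhead. In addition, the technique scales to larger and longer executions than prior work. These results were published in two papers.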
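
    The reuse distance metric described above can be illustrated with a minimal sketch. It assumes the common definition (the number of distinct other elements touched between two consecutive accesses to the same element) and a plain in-memory list as the trace; the function name `reuse_distances` is hypothetical, and this naive version does not reflect the trace-reduction technique developed in the project.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return one reuse distance per access in `trace`.

    The distance is the number of distinct other elements accessed since
    the previous access to the same element, or None on first use.
    Illustrative only: this naive version costs O(N * M) time for N
    accesses over M distinct elements.
    """
    recency = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in recency:
            keys = list(recency)
            # Elements positioned after `addr` were touched since its last use.
            distances.append(len(keys) - 1 - keys.index(addr))
            recency.move_to_end(addr)
        else:
            distances.append(None)
            recency[addr] = True
    return distances


if __name__ == "__main__":
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
    # [None, None, None, 2, 2, 0]
```

    Under the usual interpretation, an access with a small distance would hit in a fully associative LRU cache of at least that many lines plus one, which is why the metric tracks cache utilization.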

    researchmap

  • Fault Tolerant Infrastructure Toward Billion of Parallelization and Exa-scale Supercomputer

    Grant number:23220003  2011.4 - 2016.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (S)  Grant-in-Aid for Scientific Research (S)

    Matsuoka Satoshi, Hideyuki Jitsumoto, Toshio Endo, Hitoshi Sato, Naoya Maruyama, Shinichiro Takizawa, Kento Sato, Leonardo Bautista Gomez, Jens Domke

      More details

    Grant amount:¥213720000 ( Direct Cost: ¥164400000, Indirect Cost: ¥49320000 )

    Fault tolerance has been recognized as an indispensable technique for exascale computing as supercomputers grow towards billion-way parallelism. For future exascale supercomputers, we proposed advanced fault-tolerant infrastructures, which include a scalable checkpoint/restart library, a fault-tolerant messaging interface, and a highly resilient burst buffer architecture. We validated their effectiveness based on mathematical statistics. We also released the software and made an impact on the community.

    researchmap

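  • Fault Tolerant Infrastructure for 100-Million-Way Parallel, Exascale Supercomputers

    Grant number:23240006  2011

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)  Grant-in-Aid for Scientific Research (A)

    MATSUOKA Satoshi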

      More details

    Grant amount:¥13000000 ( Direct Cost: ¥10000000, Indirect Cost: ¥3000000 )

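    In science and technology, supercomputers have become indispensable for large-scale simulation. However, as the components installed in supercomputers grow in number and complexity, failure rates rise, and there is concern that systems will effectively cease to operate. Applying fault-tolerance techniques such as checkpoint/restart has therefore become unavoidable, yet technical challenges remain for post-petascale supercomputers. In the first year, we therefore explored mathematical and performance models of fault tolerance for basic, composite checkpoint/restart targeting 100 million threads and post-petascale systems, with the particular aim of clarifying the quantitative properties, at the scale of hundreds of millions of threads, of fault-tolerance techniques suited to post-petascale applications. Checkpoint/restart has involved various technical difficulties on hybrid architectures that combine fine-grained, massively parallel processors with coarse-grained processors, the leading candidate architecture for post-petascale and exascale supercomputers. We extended the "replay technique" for a single GPU (which records a history of memory allocations and memory copies and, at restart, re-executes, or "replays", that history to obtain a consistent checkpoint) and achieved stable, consistent checkpointing for applications that use multiple CPUs and GPUs both within and across nodes. In terms of performance, we also kept the overhead to an acceptable level. Furthermore, the library we developed achieves this without any changes to existing programs, which makes it highly usable. Realizing transparent checkpointing on ultra-fine-grained parallel, hybrid architectures is expected to have a large academic impact.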

    researchmap

  • Platform of large scale and high quality genomics and bioinformatics: Towards the advancement of genome sciences in academia

    Grant number:221S0002  2010.4 - 2016.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

    KOHARA Yuji, KATO Kazuto, TOYODA Atsushi, KUROKI Yoko, SUGANO Sumio, SUZUKI Yutaka, HAYASHI Tetsuya, YAMAMOTO Ken, TSUJI Shoji, INOUE Ituro, KUROKAWA Ken, MORISHITA Shinichi, NAKAMURA Yasukazu, TABATA Satoshi, KUHARA Satoshi, IWASAKI Wataru, SESE Jun, TAKAHASHI Hiroki, ASAI Kiyoshi, KASAHARA Masahiro, SAKAKIBARA Yasubumi, YADA Tetsushi, YAMAGATA Zentaro, MUTO Kaori, IDA Ryuichi, MASUI Tohru, KURIYAMA Mariko, TAKAGI Toshihisa, FUJIYAMA Asao, HATTORI Masahira, OGURA Yoshitoshi, TOKUNAGA Katsushi, KUWANO Ryozo, OHASHI Jun, ITOH Takehiko, HIRAKAWA Hideki, NOGUCHI Hideki, MATSUOKA Satoshi, OGASAWARA Naotake, NAKAMURA Kensuke, HAMADA Michiaki, KANAYA Shigehiko, ANZAI Yuichiro, OKADA Kiyotaka, SAKAKI Yoshiyuki, TAKAKU Fumimaro, TOYOSHIMA Kumao, NAKAMURA Keiko, HOTTA Yoshiki, YONEZAWA Akinori, YOSHIKAWA Hiroshi, YOSHIDA Mitsuaki, INOKO Hidetoshi, TODA Tatsushi, INAZAWA Johji, GOJOBORI Takashi, URUSHIHARA Hideko, TAKEDA Hiroyuki, SHIROISHI Toshihiko, ITOH Takashi, SATOH Noriyuki, MATSUDA Hideo, GOTO Susumu, TSUDA Masataka

      More details

    We provided large-scale, high-quality genomics and bioinformatics technologies to many KAKENHI projects, 60 to 90 subjects every year and 464 subjects in total, based on application and selection. This kind of support became possible by concentrating resources in a limited number of DNA sequencing centers at a time when these technologies were advancing unexpectedly fast worldwide. Our activity has led to 363 papers, including the Coelacanth genome paper. The KAKENHI subjects that we supported cover all the KAKENHI items and almost all divisions of the life science domain. Furthermore, we developed new methodologies to solve problems that emerged from the support activity; one of them is the genome assembly software PLATANUS, which has become a key method for deciphering difficult genomes. Such a virtuous circle and its outcomes show that the platform is essential and effective in the life sciences.

    researchmap

  • ULP-HPC: Ultra Low-Power, High Performance Computing via Modeling and Optimization of Next Generation HPC Technologies

    2007.10 - 2013.3

    Japan Science and Technology Agency (JST)  Core Research for Evolutional Science and Technology (CREST) 

      More details

    Authorship:Principal investigator  Grant type:Competitive

    researchmap

  • Design and Development of Advanced IT Research Platform for Information Explosion Era

    Grant number:18049073  2006 - 2010

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas  Grant-in-Aid for Scientific Research on Priority Areas

    ADACHI Jun, TANAKA Katsumi, NISHIDA Toyoaki, KUNIYOSHI Yasuo, SUDOH Osamu, KUROHASHI Sadao, HARA Takahiro, MATSUOKA Satoshi, TAURA Kenjiro, TATEBE Osami, MUNETOMO Masaharu, HIROTSU Toshio, MATSUBARA Jin, SHIMOJYO Shinji, CHIBA Shigeru, YUASA Taichi, MATSUYAMA Takashi, CHIKAYAMA Takashi, KONDO Toru, KONO Kenji, OKAMOTO Masahiro, AIDA Kento, KAMADA Tomio, KITSUREGAWA Masaru, YAMANA Hayato, NAKAMURA Yutaka, KOBAYASHI Hiroaki, NAKAJIMA Hiroshi

      More details

    Grant amount:¥644600000 ( Direct Cost: ¥644600000 )

    This project implemented a common research infrastructure for all the research groups participating in this priority-area research initiative, and thereby supported all research activities in the initiative. By providing this infrastructure, we succeeded in accelerating the shared use of research facilities and resources within the limits of research funding and in strengthening collaboration among research groups. The shared facilities include (a) TSUBAKI: an open search engine for a large-scale corpus, (b) InTrigger: a widely distributed computing test-bed, (c) IMADE: an environment for real-world interaction measurement and analysis, and (d) prototyping for sensor-network-based preventive medicine.

    researchmap

  • Highly Scalable, High Performance and Autonomous Distributed Execution for Information Explosion Environments

    Grant number:18049028  2006 - 2010

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas  Grant-in-Aid for Scientific Research on Priority Areas

    MATSUOKA Satoshi, AIDA Kento, NAKADA Hidemoto, TAKEFUSA Atsuko, MARUYAMA Naoya, JITSUMOTO Hideyuki, SATO Hitoshi, TAKIZAWA Shinichiro

      More details

    Grant amount:¥87100000 ( Direct Cost: ¥87100000 )

    We have conducted several fundamental research activities for constructing highly scalable, high performance and autonomous distributed execution environments, called "resilient grids", for the information explosion era. We have built the constituent techniques, including modeling and simulation, for the resilient grids in terms of autonomous construction of high performance application execution environments and federation of future-networks and the environments.

    researchmap

  • New IT Infrastructure for the Information-explosion Era

    Grant number:17077001  2005 - 2011

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas  Grant-in-Aid for Scientific Research on Priority Areas

    KITSUREGAWA Masaru, ADACHI Jun, MATSUOKA Satoshi, MATSUYAMA Takashi, SUDOU Osamu

      More details

    Grant amount:¥60400000 ( Direct Cost: ¥60400000 )

    researchmap

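  • Comparing the ProActive Grid Programming Model with GridRPC and Others on a Large-Scale Testbed

    Grant number:05F05791  2005 - 2007

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows  Grant-in-Aid for JSPS Fellows

    MATSUOKA Satoshi, BADUEL Laurent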

      More details

    Grant amount:¥1000000 ( Direct Cost: ¥1000000 )

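    We have been building an autonomous, scalable, and efficient grid monitoring system. To operate autonomously, this monitoring system exchanges information through a P2P network. In general, system monitoring is carried out by monitoring events and communicating the resulting information to the systems that need it.
    The problems with current monitoring systems are that they are centrally managed and that they assume a statically determined configuration. Central management introduces a single point of failure and creates bottlenecks. Static configuration of the environment, in a large-scale system, requires knowing the exact location of every constituent node, which places a heavy burden on administrators. To solve these problems, we proposed an autonomous monitoring system that exploits the large scale of P2P networks and the robustness gained by avoiding a single point of failure. With the proposed system, self-configuration, self-optimization, self-healing, and self-protection become feasible, enabling fully autonomous operation of the system.
    This fiscal year we continued development of the above proposal. The prototype so far implemented self-configuration and could be operated in a real environment. In addition, we implemented self-optimization that uses the system's behavior during operation. Self-optimization focuses on communication volume, the distributed database, the speed of information dissemination, system adaptability and dynamic sizing, cooperation among grid services, and the composition of the components that make up the system.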

    researchmap

  • Study on advanced programming environment using OpenMP for a next generation high performance cluster system

    Grant number:14208026  2002 - 2004

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)  Grant-in-Aid for Scientific Research (A)

    SATO Mitsuhisa, ISHIKAWA Yutaka, MATSUOKA Satoshi, HONDA Hiroki, BOKU Taisuke, TAKAHASHI Daisuke

      More details

    Grant amount:¥34970000 ( Direct Cost: ¥26900000, Indirect Cost: ¥8070000 )

    We have studied the OpenMP programming environment for next-generation 64-bit high-performance clusters, using a software distributed shared memory (SDSM) system to enable OpenMP programs to run on the cluster. We have also developed a programming support system for OpenMP, and numerical libraries using OpenMP.
    1. We ported the SCore cluster system software to 64-bit processor architectures. We conducted a performance evaluation of the SCASH DSM system, which runs on SCore.
    2. We have designed and implemented a very portable SDSM system, SCASH-MPI, which uses MPI as its communication layer. MPI is the most portable communication library, supported on many kinds of high-speed communication networks, so this approach provides high portability. It also allows users to make use of the wide address space of 64-bit processors. We found that the overhead of this implementation is just 6% compared with the original SCASH.
    3. We have designed a new SDSM system, FDSM, based on access pattern analysis of applications. The access pattern is detected by a hardware mechanism provided by IA64, and is used for efficient communication. It achieves higher performance than SCASH.
    4. We have studied the optimization of OpenMP programs running on a DSM system on heterogeneous clusters. We found that performance can be improved by combining loop re-partitioning and page migration.
    5. We have designed and implemented an interactive tool, OMP/iPat, to support programmers in developing OpenMP programs. It allows the programmer to develop an OpenMP program interactively using information from parallelism analysis by the compiler.
    6. We have conducted a performance evaluation using the OpenMP benchmark SPEC-OMP. We have designed and implemented a parallel recursive FFT algorithm using OpenMP for IA-64 shared memory multi-processors.

    researchmap

  • Research on Peer-to-peer large-scale data processing on the Grid

    Grant number:13224034  2001 - 2005

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas  Grant-in-Aid for Scientific Research on Priority Areas

    MATSUOKA Satoshi, AIDA Kento, MATSUDA Yuko, MORITA Youhei, NAKADA Hidemoto, TATEBE Osamu

      More details

    Grant amount:¥113200000 ( Direct Cost: ¥113200000 )

    The aim of this project is to develop basic technologies for petabyte-scale data processing. The goal is to handle actual data processing from the LHC/ATLAS detector, as part of the 'Gfarm' data grid middleware project. We studied the following themes.
    1). Wide area peer-to-peer federation and data transfer among the large clusters on the Grid.
    2). Programming, performance analyses and simulation techniques on the peer-to-peer data Grid.
    3). High performance large scale data management for the data Grid.
    1) We promoted research on wide-area large-scale data transfer, and participated in the Bandwidth Challenge 2002 and 2003 and StorCloud 2005, held in the United States, to demonstrate large data transfer performance. Through these challenges, we succeeded in running real applications on Gfarm and gained a clear outlook toward realizing very large-scale data centers and data analyses for international collaborative experiments.
    2) We developed a Grid simulator named 'Bricks' to construct performance models and analyze the data Grid. With it, we ran many simulations of the data Grid and investigated the characteristics of the data Grid and the Gfarm architecture. We also developed a Grid programming environment, 'Jay', which is portable and suited to peer-to-peer Grid environments.
    3) We developed a widely distributed file system that can avoid access concentration on a small number of nodes on the data Grid. We implemented the system as an extension to Gfarm and verified its validity.

    researchmap

  • Polyhedral Homotopy Continuation Methods for Computing All Real and Complex Solutions of Systems of Polynomial Equations

    Grant number:13650444  2001 - 2002

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)  Grant-in-Aid for Scientific Research (C)

    KOJIMA Masakazu, FUJISAWA Katsuki, MATSUOKA Satoshi

      More details

    Grant amount:¥3000000 ( Direct Cost: ¥3000000 )

    The purpose of this research project is to develop practical numerical methods for all real and complex solutions of large scale systems of polynomial equations. The polyhedral homotopy continuation method used in this research consists of the following three phases :
    Phase 1 : Construction of polyhedral homotopy systems.
    Phase 2 : Numerical tracing of homotopy paths by the predictor-corrector method.
    Phase 3 : Verification of solutions.
    In 2001, we designed and developed basic algorithms for each phase above. In 2002, we studied the following issues.
    1. Improvement of computational efficiency in each phase. In phase 1, we proposed an efficient construction of homotopy systems arising from symmetric systems of polynomial equations. We incorporated a linear algebra library LAPACK into phase 2, and developed a new method for verifying and classifying solutions of the cyclic polynomial.
    2. Improvement of numerical stability in each phase. The linear systems to be solved in phase 2 often become so ill-conditioned that computing accurate solutions is difficult. We devised new dynamic scaling techniques to resolve this difficulty. We confirmed through numerical experiments that using these scaling techniques together with the singular value decomposition worked very effectively to improve the numerical stability of phase 2.
    3. We combined the three phases into a software package, PHoM, and released it through the Internet. This software solved some large scale systems of polynomial equations that had not been solved before. In conclusion, this research project has accomplished its purpose mentioned above.
    4. We have started a parallel implementation of PHoM, which will continue in the next year.

    researchmap

  • Descriptions and Negotiation Models of Security Policies

    Grant number:12133205  2000 - 2003

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas  Grant-in-Aid for Scientific Research on Priority Areas

    SHIBAYAMA Etsuya, TAKAHASHI Shin, WAKITA Ken, MATSUOKA Satoshi

      More details

    Grant amount:¥32200000 ( Direct Cost: ¥32200000 )

    As a first step to building next generation secure information infrastructures, we have investigated the following three areas, representing three different viewpoints : descriptions, users, and systems.
    1. Flexible Security Policy Description Schemes and Their Enforcement Mechanisms. Taking into account that mutually untrusted parties may have to collaborate or trade with one another in the Internet era, we propose a new model of security policy that is compatible with privacy protection. Our research results include a model of policy negotiation using attribute authentication, description schemes based on security automata, an enforcement mechanism with instrumentation, and optimization with partial evaluation.
    2. Convenient Methodologies for Construction and Operation of Secure Software Systems. We propose (semi-)automated construction and operation of secure software systems by developers, operators, and end users. Our research results include automatic exploitation of security policies from package manager information, semi-automated construction of secure programming language processors, and development environments for secure software, including a visual language system and a debugger.
    3. Foundations of Next Generation Information Infrastructures. We propose various security mechanisms for computing systems utilizing massive resources. Our research results include a fault-tolerant and high performance communication library, a scalable authentication algorithm, a remote installation and recovery tool for PC clusters, and a virtual machine technology for resolving interference among virtual organizations.

    researchmap

  • Multi-focus Zooming Interfaces with Focus Predictions

    Grant number:12480070  2000 - 2002

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)  Grant-in-Aid for Scientific Research (B)

    SHIBAYAMA Etsuya, TAKAHASHI Shin, MATSUOKA Satoshi, TANAKA Jiro

      More details

    Grant amount:¥7200000 ( Direct Cost: ¥7200000 )

    We have investigated two application domains of multi-focus zooming interfaces, that is, interactive visualizations of declarative data-flow visual program executions and hierarchical directory structures. In addition, for future enhancement of our zooming interfaces, we have been doing basic research on human-computer interactions beyond traditional desktop environments.
    1. We have implemented an interactive browser with a multi-focus zooming interface, which supports navigation of a huge, static diagram representing an entire execution of a declarative data-flow visual program. This browser can effectively put multiple foci on arbitrary portions of a diagram and render an image consisting of not only those focal points but also their overall contexts. Also, based upon the notion of a "trace view," we have proposed techniques to simultaneously depict asynchronous events such as inputs and outputs of a process and to illustrate dependences among processes.
    2. We have implemented a prototype system that can interactively visualize a directory structure through a multi-focus zooming interface. This system provides a support for three sorts of navigation patterns, that is, navigation via parent-child links, keywords, and similarities. An important feature of this system is to show the enclosing directories of foci as contexts.
    3. For the purposes of future enhancement of our zooming interfaces, we have investigated fundamental interaction techniques in wall display and mixed reality environments.

    researchmap

  • j-GRID

    Grant number:12558031  2000 - 2001

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)  Grant-in-Aid for Scientific Research (B)

    MURAOKA Yoichi, SEKIGUCHI Satoshi, AIDA Kento, MATSUOKA Satoshi, TANABE Osamu, TANAKA Yoshio

      More details

    Grant amount:¥13200000 ( Direct Cost: ¥13200000 )

    In this project, we have developed basic technologies for constructing a Knowledge GRID, a next-generation GRID that enables people to share knowledge. These basic technologies include virtual space, hyperbook, and soft computing.

  • Wide-Area Grid Cluster for Parallel Optimization

    Grant number:12480068  2000 - 2001

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)  Grant-in-Aid for Scientific Research (B)

    MATSUOKA Satoshi, AIDA Kento, DAI You, KOJIMA Masakazu, OGAWA Hirotaka, FUJISAWA Katsuki

    Grant amount:\12500000 ( Direct Cost: \12500000 )

    We employ Grid technology to aggregate computing cluster nodes over a wide-area network into a fleet of compute nodes and, using this "federation of cluster resources," tackle non-convex quadratic optimization problems of unprecedented scale, making the solver accessible from throughout the Internet. More specifically, we developed an algorithm called SCRM (Successive Convex Relaxation Method), which relies heavily on large numbers of SDP (Semidefinite Programming) subsolvers; the subsolver, called SDPA, is a very fast SDP solver based on interior point methods. By efficiently spreading out the SDP solvers over the Grid, we showed that we can solve non-convex quadratic problems of very large scale very efficiently, achieving almost linear speedup. For this purpose, we constructed a fleet of PC clusters spread out over several locations, including Titech Oo-okayama Campus, Titech Suzukake-dai Campus, and Kyoto University. We achieved nearly 100-fold speedup using 128 processors. The key issue was not only the algorithm but also efficient programming using the Ninf GridRPC system, which had to be modified extensively; new programming methodologies also had to be developed to cope with massively parallel execution of hundreds of tasks over the Grid.
    More specifically, we parallelized SDPA with OpenMP using the worksharing methodology to achieve nearly perfect parallel speedup within each cluster on the Grid. Also, we automated the process of selecting the best solver based on the data structure of the problem as well as the "shape" of the non-zero elements in the problem matrix. Then, using 256 nodes' worth of clusters spread out over the country and the Ninf GridRPC middleware, we constructed an "optimization solver server," achieving good speedup as mentioned above. The result not only set several world records for benchmark problems but also led to even larger Grid research in the coming years.
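
    As a quick reading of the speedup figure reported above, parallel efficiency can be taken as speedup divided by processor count. The helper below is only an illustrative sketch; the function name is an assumption and is not part of the project's software.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup per processor (1.0 would be perfectly linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# "Nearly 100-fold speedup using 128 processors", treated here as exactly 100.
efficiency = parallel_efficiency(100.0, 128)
print(f"parallel efficiency: {efficiency:.3f}")
```

    Under that rounding, the reported result corresponds to roughly 78% parallel efficiency.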

  • Reconfigurable Parallel Processing Plug&Play Clustering

    Grant number:12558025  2000 - 2001

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)  Grant-in-Aid for Scientific Research (B)

    MATSUOKA Satoshi, ISHIKAWA Yutaka, OGAWA Hirotaka, AIDA Kento, TAKAGI Hiromitsu

    Grant amount:\8100000 ( Direct Cost: \8100000 )

    The objective of the project is to push the technological envelope of fault tolerance and reconfigurability in large-scale clustering such that clusters become almost self-sustaining and reconfiguration is a matter of "Plug&Play". Some of the salient results are as follows:
    1) Construction of the "Plug&Play" clustering testbed (20 nodes of DELL Inspiron, Mobile Celeron 600 MHz, 128 MB Memory, 20 GB HDD, 3COM Plug&Play PCMCIA 100Base-T Network Card). This served as a flexible testbed for middleware development. It was also very compact (a small rack) and low power (less than 400 watts/20 nodes).
    2) Development of the Parakeet Fault Tolerant, High-Performance Cluster MPI, which allows the user to select a checkpointing algorithm from a set of available algorithms according to the application's characteristics. Parakeet is an entirely user-level implementation, is portable and efficient, and frees users from checkpointing concerns within their code. We have implemented various checkpointing strategies to achieve the best efficiency, and conducted detailed performance analysis comparing them with full restart.
    3) Self-organizing cluster middleware, the Lucie prototype. As a basic technology, plug&play clustering requires hot swapping of nodes, reconfiguration of software organization within a node, and dynamic partition management. Lucie builds on existing Linux tools to implement full cluster configuration capabilities in an automated fashion. Lucie allows fully automated (re)installation and configuration of every node in a cluster in a very rapid fashion.
    4) Prototyping scalable, secure and self-organizing cluster communication. We have identified that scalable, reliable, secure, and self-organizing communication within the cluster node is the essential foundation for reliable, plug&play clustering. We have prototyped some of the ideas in the Gfarm (cluster middleware for Petascale Datagrid processing) job manager: there, the self-organizing process ring structure governs all the nodes, and jobs can be started up rapidly in parallel, in a safe, secure manner.
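
    The contrast between checkpointed recovery and full restart mentioned in item 2) can be sketched as follows. This is a generic single-process illustration, not Parakeet's implementation: the function name, the pickle-based state file, and the simulated failure are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(total_steps, ckpt_path, every=10, fail_at=None):
    """Sum 0..total_steps-1, saving state every `every` steps (hypothetical helper)."""
    # Resume from the last checkpoint if one exists; otherwise start from scratch.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temporary file first so a crash never leaves a torn checkpoint.
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_path, ckpt_path)
    return state["acc"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    try:
        run_with_checkpoints(50, path, every=10, fail_at=25)
    except RuntimeError:
        pass  # the second run resumes from step 20 instead of step 0
    print(run_with_checkpoints(50, path, every=10))
```

    After the simulated failure at step 25, the second run redoes only the steps since the last checkpoint rather than the whole computation, which is the trade-off a comparison against full restart measures.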

  • Fundamental Research on Ultra-Wide-Area High-Performance Computing Environments

    1998 - 2001

    Japan Science and Technology Corporation  Strategic Basic Research Programs (PRESTO) 

    Authorship:Principal investigator  Grant type:Competitive

  • Interactive Software Architecture for Advanced Mobile Interface

    Grant number:10480055  1998 - 1999

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)  Grant-in-Aid for Scientific Research (B)

    MATSUOKA Satoshi, YONEZAWA Akinori, TAKAHASHI Shin, SHIBAYAMA Etsuya

    Grant amount:\10400000 ( Direct Cost: \10400000 )

    (1) Pen-Based Drawing System for Geometric Design:
    We proposed two novel interaction techniques called "interactive beautification" and "predictive drawing." Interactive beautification receives the user's free stroke input and generates multiple candidates by considering possible geometric constraints. Predictive drawing suggests the next drawing operation based on the spatial relationships among existing figures. We then developed a prototype system, Pegasus, and achieved fast and flexible drawing interactions.
    (2) Pen-Based Sketching Interface for 3D Freeform Design :
    We developed a pen-based drawing system "Teddy", a sketching interface for creating 3D polygonal surface models. The user can create a 3D model by drawing a 2D silhouette with a pen. Our informal user study showed that a first-time user typically masters the operations within 10 minutes, and can construct 3D models within minutes. This paper was awarded Impact Paper at SIGGRAPH'99.
    (3) Context-aware information system for mobile computers :
    We proposed a new information system architecture called the "WEBPAD system". WEBPAD attempts to be a general-purpose context-aware UI system, which recognizes and predicts the current user's "context" information. Our system employs a thin-client technique to offload much of the data processing to the server. The server collects useful information from the WWW automatically. The user can create new information via a camera and a microphone, and the information is distributed to others via information filtering techniques.
    (4) Multiple Pointing Input System :
    We developed a prototype of a multiple pointing input system. Our system recognizes the movements of multiple devices that a user moves simultaneously, so the user can point at and manipulate multiple objects on the screen.

  • Development of System-LSI Architectures Based on Merged Memory/Logic Technology

    Grant number:09358005  1997 - 1999

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)  Grant-in-Aid for Scientific Research (A)

    MURAKAMI Kazuaki, MIYAZAKI Akio, TANIGUCHI Hideo, YASUURA Hiroto, SAWADA Sunao, IWAIHARA Mizuho

    Grant amount:\28600000 ( Direct Cost: \28600000 )

    The objectives of this research project are to develop system-LSI architectures and computer-system architectures, or PPRAM (Parallel Processing RAM), which are based mainly on merged memory/logic technology, parallel/distributed processing technology, and inter-LSI high-speed interconnect technology. The project has produced the following research results.
    (1) Inter-LSI high-speed interconnect standard, or PPRAM-Link : The project has defined a set of specifications for physical layer, logical layer, and API of PPRAM-Link ; and then it has implemented these specifications in several ways.
    (2) Reference PPRAM architectures : The project has developed several architectures suited to merged DRAM/logic system-LSI, such as (i) shared-register CMP (chip multiprocessor), (ii) statically/dynamically variable line-size cache, (iii) way-predicting set-associative cache.
    (3) DRAM refresh architectures for merged DRAM/logic LSI : The project has developed several architectures suited to merged DRAM/logic system-LSI that alleviate the degradation of DRAM refresh characteristics caused by on-chip logic.
    (4) Hardware/software codesign methodology for embedded system-LSI : The project has developed a hardware/software codesign methodology based on soft-core processor and Valen-C technologies.
    (5) Software-controlled low power architectures : The project has designed a processor architecture, or PowerPro, which can optimize the power consumption by means of software control according to the system load.
    (6) Test methodology for system-LSI : The project has proposed a test methodology good for system-LSI, which combines BIST and external test.
    (7) PPRAM-based MOE (molecular orbital calculation engine) : The project has developed some PPRAM applications, including MOE chips and MOE system. The MOE chip consists of a 32-bit integer RISC processor, a 76-bit MO-specific floating-point processor, 1Mb SRAM, and a PPRAM-Link interface. The MOE system consists of a number of MOE boards, each of which includes five MOE chips and a bridge chip for PPRAM-Link and IEEE1394.
    (8) PPRAM-based realtime digital-watermarking engine for movies : Another PPRAM application is a realtime digital-watermarking engine for movies. The project has implemented a suite consisting of a wavelet transformation function, a PPRAM-Link interface, and a PCI-bus interface on FPGAs.

  • A Methodology of Pattern-Oriented Visual Parallel Programming and Its Interactive Supports

    Grant number:09680328  1997 - 1999

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)  Grant-in-Aid for Scientific Research (C)

    SHIBAYAMA Etsuya, TAKAHASHI Shin, MATSUOKA Satoshi

    Grant amount:\3100000 ( Direct Cost: \3100000 )

    1. We have proposed a visual language, in which higher levels of abstractions for object-oriented parallel programming can be effectively described. Thanks to a visual notation, inherently diagrammatic concepts such as patterns and architectures can have comprehensible representations in the language.
    2. We have proposed the notion of visual pattern and a programming methodology based upon this notion. We have also designed an interactive and integrated environment for uses/reuses of visual patterns.
    3. We have designed and implemented a visual language environment, KLIEG, whose major features are as follows :
    (1) A single notation is available in design, programming, and debugging phases.
    (2) A simple graphical user interface is provided for uses/reuses of visual patterns, which encapsulate design information of object compositions.
    (3) Software architectures are represented as nested compositions of visual patterns. Each visual pattern in any level is replaceable through a simple graphical user interface.
    (4) When displaying portions of a program, the environment may put more stress on (allocate more area for) the objects that should be replaced.
    (5) For each hole in a visual pattern, multiple alternative implementations can be defined.
    (6) A visual tracer that automatically animates visual program executions is available.

  • StackThreads/MP : Integrating Futures into Calling Standards

    Grant number:08408008  1996 - 1998

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)  Grant-in-Aid for Scientific Research (A)

    YONEZAWA Akinori, MASUHARA Hidehiko, KOBAYASHI Naoki, MATSUOKA Satoshi, TAURA Kenjiro

    Grant amount:\34600000 ( Direct Cost: \34600000 )

    An implementation scheme for fine-grain multithreading is implemented that needs no changes to current calling standards for sequential languages and only modest extensions to sequential compilers. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call, and detaches the callee from the caller when the callee suspends or either of them migrates to another processor. Unlike previous similar systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers obeying calling standards. As a consequence, it requires neither a frontend preprocessor nor a native code generator that has a built-in notion of parallelism. The system works in practice with an unmodified GNU C compiler (GCC). Desirable extensions to sequential compilers for guaranteeing portability and correctness of the scheme are clarified and argued to be modest. Experiments indicate that sequential performance is not sacrificed for practical applications and that both sequential and parallel performance are comparable to Cilk, whose current implementation requires a fairly sophisticated preprocessor to C. These results show that efficient asynchronous calls (i.e., future calls) can be integrated into current calling standards with a very small impact on both sequential performance and compiler engineering.
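
    The programming model of a future call, as described above, can be illustrated with Python's standard `concurrent.futures` module. This sketch shows only the caller-visible semantics (issue the call, keep working, touch the result later); it uses a thread pool and says nothing about how StackThreads/MP detaches stack frames. The function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Plain sequential Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_with_future(n, pool):
    """Compute fib(n) with one branch issued as a future call."""
    if n < 2:
        return n
    left = pool.submit(fib, n - 1)   # asynchronous call: returns a future immediately
    right = fib(n - 2)               # the caller keeps running in the meantime
    return left.result() + right     # "touching" the future blocks until it is ready


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(fib_with_future(20, pool))
```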

  • Advanced User Interface Construction via Multiple Visual Examples

    Grant number:06452388  1994 - 1996

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)  Grant-in-Aid for Scientific Research (B)

    MATSUOKA Satoshi, TAKAHASHI Shin, YONEZAWA Akinori

    Grant amount:\7700000 ( Direct Cost: \7700000 )

    (1) Interactive construction of GUI via multiple user examples : We have constructed a framework where a user can construct declarative rules by providing multiple examples of user interfaces. In particular, the system infers the intentions of user corrections, whereby a cycle of system presentation versus user correction refines the interface.
    (2) Declarative Animation Interface : We extended our previously proposed model of GUI called "bidirectional translation of abstract data and pictures" by incorporating the notion of time, thereby achieving semi-automated visualization of dynamic behavior of application data structures. The user merely declares the correspondence between the program's actions on the data structure and the desired animation effects, and the rest of the animation is generated by interpolation. The system has also been extended to incorporate 3-D interfaces.
    (3) A theory of Generalized Local Propagation : We generalized the theory of local propagation in solving the constraints in a hierarchical constraint system. First, we refined the definition of hierarchical constraints by Alan Borning et al.; then, we defined the notions of local semi-monotonicity and global monotonicity in the solution graph, obtaining the necessary and sufficient condition under which the solution obtained by the local propagation algorithm can be considered "correct". We then categorized different solvers and comparators. Finally, we developed a constraint solver, DETAIL, based on the theory, and used it in our prototype systems.
    (4) New interaction techniques for pen computing : We proposed a model of recognition in local structures called the "Link Model". We then developed pen-interaction techniques in a prototype pen system, Pegasus, which presents multiple candidates per user action in drawing, achieving fast and flexible drawing interactions.
    (5) Interactive Penumbrae : We proposed a new use of penumbrae in 3-D interaction, called the "Interactive Penumbrae". Artificially drawn penumbrae in 3-D space enhance user perception of height and location relative to the projection plane. A fast rendering algorithm has been developed which makes the technique useful for real-time interaction in 3-D space.

  • Implementation of Parallel Functional Programming Systems

    Grant number:06558039  1994 - 1996

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)  Grant-in-Aid for Scientific Research (A)

    TAKEICHI Masato, TANAKA Tetsuro, MATSUOKA Satoshi, IWASAKI Hideya, YONEZAWA Akinori

    Grant amount:\15800000 ( Direct Cost: \15800000 )

    This project aims to develop functional programming systems for parallel computers. Implementation of a parallel functional system, Parallel Gofer, on the AP1000 computer has been finished and is under evaluation. This implementation is based on the Gofer system developed by Mark Jones and accepts any Gofer program for sequential evaluation. Programs are allowed to include references to extended library functions for parallelization.
    Several new ideas for this implementation have already been published. One such idea is the so-called "unboxing" technique for data construction. This implementation showed that the idea is promising, although some optimization should be considered for practical application.
    Along with this implementation, a novel idea for optimization has been explored and implemented. Although most optimization so far relies on heuristics, our new system is completely mechanical. It is based on hylomorphisms, which come from research on constructive algorithmics. This technique is applicable to sequential and parallel functional programs.
    These results have been made public at international conferences and published in the proceedings.
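
    For readers unfamiliar with the term, a hylomorphism is an unfold (generating a structure from a seed) composed with a fold (consuming that structure), which can be fused so the intermediate structure is never built. The sketch below is a generic list-shaped illustration in Python; the names `hylo`, `countdown`, and `factorial` are assumptions, and the project's actual optimizer is not described in this summary.

```python
def hylo(unfold, fold, base, seed):
    """Fused unfold-then-fold: no intermediate list is materialized.

    unfold(seed) returns None to stop, or a pair (value, next_seed).
    fold(value, rest) combines one value with the folded remainder.
    """
    step = unfold(seed)
    if step is None:
        return base
    value, next_seed = step
    return fold(value, hylo(unfold, fold, base, next_seed))


def countdown(k):
    """Unfold step: produce k, k-1, ..., 1 and stop at 0."""
    return None if k == 0 else (k, k - 1)


def factorial(n):
    """Factorial as 'unfold n..1' fused with 'fold by multiplication'."""
    return hylo(countdown, lambda value, rest: value * rest, 1, n)


if __name__ == "__main__":
    print(factorial(5))
```

    Because the producer and consumer are combined step by step, the list n, n-1, ..., 1 is never constructed, which is the kind of intermediate-structure elimination that hylomorphism-based transformation targets.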

  • Design and Implementation of Concurrent Programming Language based on Linear Logic

    Grant number:06452389  1994 - 1995

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for General Scientific Research (B)  Grant-in-Aid for General Scientific Research (B)

    YONEZAWA Akinori, KOBAYASHI Naoki, MATSUOKA Satoshi

    Grant amount:\6000000 ( Direct Cost: \6000000 )

    The goal of our research is to develop a theoretical foundation of concurrent computation based on linear logic so that we can uniformly discuss various issues of concurrent programming languages : program analyses, language design, and implementation techniques. The concrete research achievements are as follows.
    1. Development of a concurrent linear logic programming framework ACL/Higher-order ACL : We showed that the essence of concurrent computation is captured by proof search in first-order linear logic. We further extended it to a higher-order system, and showed that static type systems and higher-order processes for concurrent programming languages are naturally introduced in the system.
    2. Design and implementation of a typed concurrent linear logic programming language HACL : We designed and implemented a programming language HACL based on concurrent linear logic programming framework. A compiler on a single processor workstation was constructed, and programming experiments were made by using the compiler. We also constructed a prototype compiler on a cluster of workstations.
    3. Study of high-level mechanisms for concurrent object-oriented languages through HACL : We showed that various high-level mechanisms of concurrent objects (inheritance, access control for methods) are easily constructed on top of HACL. The result implies not only that we can construct a concurrent object-oriented interface of HACL, but also that we can uniformly discuss various issues of other concurrent object-oriented languages.
    4. Development of program analysis techniques : Novel program analysis techniques for concurrent programming languages were developed through HACL. The proposed techniques enable compile-time optimizations, and also improve reliability of concurrent programs.

  • Efficient Implementation of Concurrent Object-Oriented Languages for General Purpose MIMD Parallel Computers.

    Grant number:05558026  1993 - 1995

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Developmental Scientific Research (B)  Grant-in-Aid for Developmental Scientific Research (B)

    YONEZAWA Akinori, KOBAYASHI Naoki, MATSUOKA Satoshi

    Grant amount:\20500000 ( Direct Cost: \20500000 )

    The goal of our research is to develop a highly efficient language processor (i.e., compiler and runtime system) for concurrent object-oriented languages on general-purpose MIMD parallel machines, demonstrating the viability of the concurrent object-oriented paradigm in a practical setting. After a three-year research effort, our goal has been basically achieved and, furthermore, many good results have been obtained in designing a new efficient debugging scheme for multi-threaded parallel programs. The concrete research achievements are as follows.
    (1) A new concurrent object-oriented language called ABCL/f was designed. In the normal style of programming in this language, a mutable data structure is represented as a concurrent object where its access is only allowed through its associated methods which are invoked mutually exclusively [Ref 7].
    (2) By simplifying ABCL/f, we newly designed a language called Schematic. This language can be viewed as a concurrent object-oriented extension of the Scheme language, which is a very popular dialect of Lisp. [Ref 14]
    (3) We designed an abstract machine called StackThreads, and highly efficient techniques for implementing StackThreads were developed. [Ref 1,2,8]
    (4) Based on the implementation of StackThreads, we implemented a language processor for ABCL/f, with which good performance numbers were obtained.
    (5) A new garbage collection scheme was designed and implemented and its performance was measured. This scheme was incorporated in our language processor mentioned in (4). [10,11]
    (6) We designed a new debugging scheme which supports replay and tracing, requiring a small amount of logging information even where a large number of threads are active in a program. This scheme has been implemented. [Ref 15]
    (7) In ABCL/f, we developed programs for (a) predicting RNA secondary structures and (b) finite element method application and N-body problem applications [9,12,13].
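
    The programming style described in (1), where mutable state is wrapped in a concurrent object and reached only through mutually exclusive methods, can be approximated in Python with a lock per object. This is an analogy rather than ABCL/f code; the class and helper names are assumptions made for the example.

```python
import threading


class Counter:
    """Mutable state hidden behind methods that never run concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:   # only one method body executes at a time
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def hammer(counter, n_threads=4, n_iters=1000):
    """Call increment() from several threads and wait for all of them."""
    def work():
        for _ in range(n_iters):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    c = Counter()
    hammer(c)
    print(c.value())
```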

  • Implementation of a Fast Constraint Solver for Direct Manipulation Interfaces

    Grant number:05780227  1993

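    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)  Grant-in-Aid for Encouragement of Young Scientists (A)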

    MATSUOKA Satoshi

    Grant amount:\900000 ( Direct Cost: \900000 )

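    In graphical user interfaces (GUIs), the cost of developing programs that visualize information inside a computer and realize direct manipulation of that visualization is high. As one solution, an approach that represents the structure of a figure with geometric constraints, constructs the figure by solving the constraints, and realizes direct manipulation by dynamically modifying the constraints has attracted attention and is being actively studied. However, the constraint-solving methods proposed so far greatly restrict the class of constraint systems for the sake of speed, for example by prohibiting simultaneous constraints and nonlinear constraints, so GUI systems that adopt them have had difficulty representing the figures needed in practice. In this research, we therefore developed a fast constraint-solving method that can also handle simultaneous constraints and nonlinear constraints. It is based on the following analysis and observations. In the constraint systems typically used in GUIs, simultaneous constraints appear only locally, and in most parts constraints can be solved individually. Moreover, even a nonlinear constraint can be solved quickly when it stands alone, and if the simultaneous parts are small, their effect on overall speed is also small. The constraint-solving method we developed therefore analyzes the constraint system to find the minimal parts that must be solved simultaneously, and handles them in a unified manner together with the parts in which the remaining constraints can be solved individually. We built a constraint solver based on this method and evaluated its performance, and found that it does not incur a particularly large slowdown even compared with constraint solvers that greatly restrict the class of constraint systems. Furthermore, using this constraint solver, we built Image, a system for interactively realizing direct manipulation interfaces from multiple visual examples. This system makes extensive use of the ability to solve constraints simultaneously. We also plan to adopt the solver in a system, currently under development, for creating algorithm animations based on declarative descriptions.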

  • Programming Declarative Graphical User Interfaces by Visual Examples

    Grant number:04780025  1992

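    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)  Grant-in-Aid for Encouragement of Young Scientists (A)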

    MATSUOKA Satoshi

    Grant amount:\900000 ( Direct Cost: \900000 )

  • New Techniques for Visualizing and Manipulating Abstract Data in Graphical User Interfaces

    Grant number:03780021  1991

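    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)  Grant-in-Aid for Encouragement of Young Scientists (A)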

    MATSUOKA Satoshi

    Grant amount:\900000 ( Direct Cost: \900000 )

  • Computational Reflection in Object-Oriented Concurrent Computing and its Applications

    Grant number:01420045  1989 - 1991

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for General Scientific Research (A)  Grant-in-Aid for General Scientific Research (A)

    YONEZAWA Akinori, WATANABE Takuo, MATSUOKA Satoshi

    Grant amount:\11800000 ( Direct Cost: \11800000 )

    1. An Object-oriented Concurrent Language ABCL/R which is able to describe reflective computation was designed and its preliminary implementation has been completed. [1]
    2. We introduced a new notion called "Group-Wide Reflection", which is the reflective capability for a group of concurrent objects as a whole. [4, 5, 9]
    3. An Actor-based reflective model and language, ACT/R, was designed, which has the group-wide reflective capability. [4, 5, 9]
    4. Various results including correctness of the reflective model have been obtained by analyzing the notion of Group-Wide Reflection. [4, 5, 9]
    5. A prototype implementation of ACT/R was completed.
    6. We defined a new reflective notion called "Hybrid Group Reflection" by incorporating the results of our research on Group-Wide Reflection into our language ACT/R. [6, 13, 14]
    7. In conventional languages, important aspects of parallel computation such as scheduling and shared resource coordination could only be programmed in an ad hoc way. Our research results enabled us to model and program such aspects of parallel computation in reflective (user) languages. Using our reflective language, we also showed that controlling and programming such aspects of parallel discrete event simulation can be done very succinctly in the same language in which the simulation is written. [13, 14, 11]
    8. Our implementation of ABCL/R2 on a multi-processor workstation, the Omron Luna88k, demonstrated the effectiveness of the use of reflective capability in programming parallel discrete event simulation.
    9. Using examples, we showed that the reflective capability provides an effective means for coping with the inheritance anomaly problem. [6]
    10. The runtime kernel for an object-oriented concurrent language includes not only its intermediate-code interpreter, method dispatcher, and garbage collector, but also its scheduler and inter-node communication facilities. It is often the case in distributed computing environments that the behavior of the runtime kernel needs to adapt to its execution environment. For this purpose, we constructed a reflective architecture system called RbCl, almost all of whose runtime facilities can be dynamically replaced with user-defined ones. [8, 11]
    Other Contributions : - International Workshops -
    Two of our research members, Satoshi Matsuoka and Takuo Watanabe, organized (as program committee members) the following two international workshops :
    1. ECOOP/OOPSLA'90 Workshop on Reflection and Metalevel Architectures in Object-Oriented Programming, Ottawa, October 21, 1990.
    2. OOPSLA'91 Workshop on Reflection and Metalevel Architectures in Object-Oriented Programming, Phoenix, October 7, 1991.
