Updated 2025/02/28

ササキ ヒロシ
佐々木 広
SASAKI HIROSHI
Affiliation
School of Engineering, Associate Professor
Title
Associate Professor

Degree

  • Doctorate ( March 2008   The University of Tokyo )

Research Areas

  • Information and communications / Computer systems / Computer architecture

  • Information and communications / Information security / Computer security

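Education

  • The University of Tokyo   Graduate School of Engineering   Department of Advanced Interdisciplinary Studies, doctoral program completed

    - March 2008

  • The University of Tokyo   Graduate School of Information Science and Technology   Department of Information Physics and Computing, master's program completed

    - March 2005

  • The University of Tokyo   Faculty of Engineering   Department of Mathematical Engineering and Information Physics, graduated

    - March 2003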

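Career

  • Tokyo Institute of Technology   School of Engineering, Department of Information and Communications Engineering   Associate Professor

    April 2020 - present

    Country: Japan

  • Columbia University   Department of Computer Science   Researcher (Associate Research Scientist)

    April 2016 - March 2020

    Country: United States

  • Columbia University   Department of Computer Science   Visiting Researcher (JSPS Overseas Research Fellow) (Visiting Research Scientist)

    April 2014 - March 2016

    Country: United States

  • IBM T.J. Watson Research Center   Visiting Researcher (Visiting Research Scientist)

    July 2013 - March 2014

    Country: United States

  • Kyushu University   Faculty of Information Science and Electrical Engineering   Specially Appointed Associate Professor

    August 2011 - March 2014

    Country: Japan

  • The University of Tokyo   Graduate School of Information Science and Technology   Project Assistant Professor

    April 2010 - July 2011

    Country: Japan

  • The University of Tokyo   Research Center for Advanced Science and Technology   Project Assistant Professor

    April 2008 - March 2010

    Country: Japan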

Papers

  • RAPLET: Demystifying Publish/Subscribe Latency for ROS Applications (peer-reviewed)

    Keisuke Nishimura, Takahiro Ishikawa, Hiroshi Sasaki, Shinpei Kato

    In Proceedings of the IEEE 27th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)   41 - 50   2021

    Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    DOI: 10.1109/rtcsa52859.2021.00013

  • Practical Byte-Granular Memory Blacklisting using Califorms (peer-reviewed)

    Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

    In Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO)   558 - 571   2019

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1145/3352460.3358299

  • Why Do Programs Have Heavy Tails? (peer-reviewed)

    Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan

    In Proceedings of the 2017 IEEE International Symposium on Workload Characterization (IISWC)   135 - 145   2017

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/IISWC.2017.8167771

  • Characterization and Mitigation of Power Contention across Multiprogrammed Workloads (peer-reviewed)

    Hiroshi Sasaki, Alper Buyuktosunoglu, Augusto Vega, Pradip Bose

    In Proceedings of the 2016 IEEE International Symposium on Workload Characterization (IISWC)   55 - 64   2016

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    Shared resource contention has been a major performance issue for CMPs. In this paper we focus on power, which is one of the most valuable shared resources of CMPs. We believe it is important to study power contention, especially with the prevalence of power capping features among modern commercial microprocessors. When multiple processes compete for power in such systems, the power management system attempts to mitigate the contention (i.e., reduce the power consumption) by slowing down the processor, which results in degraded total system performance. We characterize this phenomenon using a real testbed with an Intel processor with power capping capability realized by the RAPL technology. We observe noticeable performance degradation for SPEC CPU2006, especially at tighter power caps. In order to solve this problem, we develop a shared resource-aware scheduling algorithm that improves system performance by mitigating the contention for power and the shared memory subsystem at the same time. Evaluation results across a variety of multiprogrammed workloads show performance improvements over a state-of-the-art scheduling policy which only considers memory subsystem contention. In addition, we present a guard mechanism implemented on top of the proposed scheduler that greatly improves performance when there is severe power contention that introduces performance anomalies.

    DOI: 10.1109/IISWC.2016.7581266

  • Power and Performance Characterization and Modeling of GPU-Accelerated Systems (peer-reviewed)

    Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres

    In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS)   113 - 122   2014

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/IPDPS.2014.23

  • McRouter: Multicast within a Router for High Performance Network-on-Chips (peer-reviewed)

    Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)   319 - 329   2013

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    The inevitable advent of the multi-core era has driven an increasing demand for low latency on-chip interconnection networks (or NoCs). Being a critical part of the memory hierarchy for modern chip multi-processors (CMPs), these networks face stringent design constraints to provide fast communication with a tight power budget. A modern NoC's first-order concern is clearly its latency, while we also find that the internal bandwidth of its routers is relatively plentiful; thus, we present a low latency router design utilizing a technique we call "multicast within a router" or McRouter, which allows productive utilization of remaining bandwidth inside a NoC router. McRouter allows a single cycle transfer of flits which shortens the communication latency when there is enough remaining bandwidth within the router. The key idea is to transmit a header flit to all possible output ports (multicast) so that it is always transmitted to the correct output port without relying on route computation. In addition, we find it is affordable with marginal power overhead while still being a stand-alone design by maintaining portability and modularity (unlike look-ahead routing based designs). Our evaluation with application traffic shows that McRouter helps achieve system speed-ups of 1.28, 1.17 and 1.05 over the conventional router (CR), the VSA router (VSAR) and the prediction router (PR), respectively.

    DOI: 10.1109/PACT.2013.6618828

    Other link: http://doi.ieeecomputersociety.org/10.1109/PACT.2013.6618828

  • Coordinated Power-Performance Optimization in Manycores (peer-reviewed)

    Hiroshi Sasaki, Satoshi Imamura, Koji Inoue

    In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)   51 - 61   2013

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/PACT.2013.6618803

    Other link: http://doi.ieeecomputersociety.org/10.1109/PACT.2013.6618803

  • Scalability-Based Manycore Partitioning (peer-reviewed)

    Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura

    In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)   107 - 116   2012

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is not growing at a historical rate, but the existence of a number of active processes in the computer system and the continuing development of multi-threaded applications benefit from the growing core counts to sustain system throughput. This trend brings us a situation where a number of parallel applications are simultaneously executed on a single system. Since multi-threaded applications try to maximize their throughput by utilizing the whole system, each of them usually creates an equal or larger number of threads compared to the underlying logical core count. This introduces a much greater number of threads to be co-scheduled in the entire system. However, each program has different characteristics (or scalability) and contends with the others for shared resources, which are the CPU cores and memory hierarchies. Therefore, it is clear that OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs via the use of hardware performance monitoring units, (2) decides the optimal number of cores to be allocated for each program, and (3) allocates the cores to programs while maximizing the system utilization to achieve fair and maximum performance. The evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.

    DOI: 10.1145/2370816.2370833

  • Practical Byte-Granular Memory Blacklisting using Califorms (peer-reviewed)

    Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

    CoRR   abs/1906.01838   2019

    Publication type: Research paper (scientific journal)

  • Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs (peer-reviewed)

    Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa

    IEICE Transactions on Information and Systems   E101D ( 9 )   2247 - 2257   September 2018

    Language: English   Publication type: Research paper (scientific journal)   Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

    The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them is increased. Especially, the capacity of DRAM continues to grow, and it is not rare that DRAM consumes higher power than processors on modern servers. Therefore, a reduction in the DRAM energy consumption is a critical challenge to reduce the system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective to reduce the DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied to modern servers including multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. In order to address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces the DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively.

    DOI: 10.1587/transinf.2017EDP7296

  • Power-Efficient Breadth-First Search with DRAM Row Buffer Locality-Aware Address Mapping (peer-reviewed)

    Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa

    In 2016 High Performance Graph Data Management and Processing Workshop (HPGDMP)   17 - 24   January 2017

    Publication type: Research paper (international conference proceedings)

    Graph analysis applications have been widely used in real services such as road-traffic analysis and social network services. Breadth-first search (BFS) is one of the most representative algorithms for such applications; therefore, many researchers have tuned it to maximize performance. On the other hand, owing to the strict power constraints of modern HPC systems, it is necessary to improve power efficiency (i.e., performance per watt) when executing BFS. In this work, we focus on the power efficiency of DRAM and investigate the memory access pattern of a state-of-the-art BFS implementation using a cycle-accurate processor simulator. The results reveal that the conventional address mapping schemes of modern memory controllers do not efficiently exploit row buffers in DRAM. Thus, we propose a new scheme called per-row channel interleaving and improve the DRAM power efficiency by 30.3% compared to a conventional scheme for a certain simulator setting. Moreover, we demonstrate that this proposed scheme is effective for various configurations of memory controllers.

    DOI: 10.1109/HPGDMP.2016.010

  • Enhanced Dependence Graph Model for Critical Path Analysis on Modern Out-of-Order Processors (peer-reviewed)

    Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Hiroshi Sasaki

    IEEE Computer Architecture Letters   16 ( 2 )   111 - 114   2017

    Publication type: Research paper (scientific journal)

    DOI: 10.1109/LCA.2017.2684813

  • Heavy Tails in Program Structure (peer-reviewed)

    Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan

    IEEE Computer Architecture Letters   16 ( 1 )   34 - 37   2017

    Language: English   Publication type: Research paper (scientific journal)   Publisher: Institute of Electrical and Electronics Engineers Inc.

    Designing and optimizing computer systems require deep understanding of the underlying system behavior. Historically many important observations that led to the development of essential hardware and software optimizations were driven by empirical observations about program behavior. In this paper, we report an interesting property of program structures by viewing dynamic program execution as a changing network. By analyzing the communication network created as a result of dynamic program execution, we find that communication patterns follow heavy-tailed distributions. In other words, a few instructions have consumers that are orders of magnitude larger than most instructions in a program. Surprisingly, these heavy-tailed distributions follow the iconic power law previously seen in man-made and natural networks. We provide empirical measurements based on the SPEC CPU2006 benchmarks to validate our findings as well as perform semantic analysis of the source code to reveal the causes of such behavior.

    DOI: 10.1109/LCA.2016.2574350

  • Mitigating Power Contention: A Scheduling Based Approach (peer-reviewed)

    Hiroshi Sasaki, Alper Buyuktosunoglu, Augusto Vega, Pradip Bose

    IEEE Computer Architecture Letters   16 ( 1 )   60 - 63   2017

    Publication type: Research paper (scientific journal)

    DOI: 10.1109/LCA.2016.2572080

  • A Runtime Optimization Selection Framework to Realize Energy Efficient Networks-on-Chip (peer-reviewed)

    Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    IEICE Transactions on Information and Systems   E99D ( 12 )   2881 - 2890   December 2016

    Language: English   Publication type: Research paper (scientific journal)   Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

    Networks-on-Chip (or NoCs, for short) play important roles in modern and future multi-core processors as they are highly related to both performance and power consumption of the entire chip. Up to date, many optimization techniques have been developed to improve NoC's bandwidth, latency and power consumption. But a clear answer to how energy efficiency is affected with these optimization techniques is yet to be found since each of these optimization techniques comes with its own benefits and overheads while there are also too many of them. Thus, here comes the problem of when and how such optimization techniques should be applied. In order to solve this problem, we build a runtime framework to throttle these optimization techniques based on concise performance and energy models. With the help of this framework, we can successfully establish adaptive selections over multiple optimization techniques to further improve performance or energy efficiency of the network at runtime.

    DOI: 10.1587/transinf.2016PAP0026

  • A scalability analysis of many cores and on-chip mesh networks on the TILE-Gx platform (peer-reviewed)

    Ye Liu, Hiroshi Sasaki, Shinpei Kato, Masato Edahiro

    In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)   46 - 52   2016

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    TILE-Gx processors, which have emerged in recent years, can be considered representative of prevailing manycore processors. The available TILE-Gx processors feature a directory-based cache coherence protocol, two-dimensional mesh networks and up to 72 on-chip cores. In this paper, we study and analyze problems of performance scalability and network collision of many-core processors using the TILE-Gx36 processor.
    We find that most multi-threaded programs from the PARSEC benchmark suite, which target shared-memory on-chip processors, cannot scale well on Linux as the number of cores increases. Meanwhile, applications compiled with Pthreads are affected by the approach of task-to-core assignment. The results also show that current multi-threaded applications do not entirely utilize the hardware resources on the TILE-Gx36 processor. Moreover, OS designers might need to pay attention to memory allocation if memory striping is not supported, because a huge number of memory accesses to only one memory controller can burden the two-dimensional mesh network. The same observation holds if cores intensively access the farther memory controllers.

    DOI: 10.1109/MCSoC.2016.40

  • Runtime Multi-Optimizations for Energy Efficient On-chip Interconnections (peer-reviewed)

    Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    In Proceedings of the 33rd IEEE International Conference on Computer Design (ICCD)   455 - 458   2015

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    On-chip interconnection (or NoC) is a major performance and power contributor to modern and future multicore processors. So far, many optimization techniques have been developed to improve its bandwidth, latency and power consumption. But it is not clear how energy efficiency is affected since an optimization technique normally comes with overheads. This paper thus attempts to address when and how such optimization techniques should be applied and tuned to help achieve better energy efficiency. We firstly model the performance and energy impacts of representative NoC optimization techniques. These models help us more easily understand the consequences when applying these optimization techniques and their combinations under different circumstances. Moreover, based on such modeling, we propose and implement an adaptive control over these NoC optimization techniques to improve both performance and energy efficiency of the network. Our results show that, this proposal can achieve an average improvement of 26% and 57% on network performance and energy delay product, respectively.

    DOI: 10.1109/ICCD.2015.7357147

  • A Flexible Hardware Barrier Mechanism for Many-Core Processors (peer-reviewed)

    Takeshi Soga, Hiroshi Sasaki, Tomoya Hirao, Masaaki Kondo, Koji Inoue

    In Proceedings of the 20th Asia and South Pacific Design Automation Conference (ASP-DAC)   61 - 68   2015

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    This paper proposes a new hardware barrier mechanism which offers the flexibility to select which cores should join the synchronization, allowing for executing multiple multi-threaded applications by dividing a many-core processor into several groups. Experimental results based on an RTL simulation show that our hardware barrier achieves a 66-fold reduction in latency over typical software based implementations, with a hardware overhead of the processor of only 1.8%. Additionally, we demonstrate that the proposed mechanism is sufficiently flexible to cover a variety of core groups with minimal hardware overhead.

    DOI: 10.1109/ASPDAC.2015.7058982

  • Power-Capped DVFS and Thread Allocation with ANN Models on Modern NUMA Systems (peer-reviewed)

    Satoshi Imamura, Hiroshi Sasaki, Koji Inoue, Dimitrios S. Nikolopoulos

    In Proceedings of the 32nd IEEE International Conference on Computer Design (ICCD)   324 - 331   2014

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/ICCD.2014.6974701

  • SMYLEref: A Reference Architecture for Manycore-Processor SoCs (invited, peer-reviewed)

    Masaaki Kondo, S. T. Nguyen, Tomoya Hirao, Takeshi Soga, Hiroshi Sasaki, Koji Inoue

    In Proceedings of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC)   561 - 564   2013

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    Nowadays, the trend of developing microprocessors with tens of cores brings a promising prospect for embedded systems. Realizing a high performance and low power many-core processor is becoming a primary technical challenge. We are currently developing a many-core processor architecture for embedded systems as part of a NEDO project. This paper introduces the many-core architecture called SMYLEref along with the concept of Virtual Accelerator on Many-core, in which many cores on a chip are utilized as a hardware platform for realizing multiple virtual accelerators. We are developing its prototype system with off-the-shelf FPGA evaluation boards. In this paper, we introduce the architecture of SMYLEref and the details of the prototype system. In addition, several initial experiments with the prototype system are also presented.

    DOI: 10.1109/ASPDAC.2013.6509656

  • Line Sharing Cache: Exploring Cache Capacity with Frequent Line Value Locality (peer-reviewed)

    Keitarou Oka, Hiroshi Sasaki, Koji Inoue

    In Proceedings of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC)   669 - 674   2013

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE

    This paper proposes a new last level cache architecture called line sharing cache (LSC), which can reduce the number of cache misses without increasing the size of the cache memory. It stores lines that contain identical values in a single line entry, which makes it possible to store a greater number of lines. Evaluation results show performance improvements of up to 35% across a set of SPEC CPU2000 benchmarks.

    DOI: 10.1109/ASPDAC.2013.6509677

  • Predict-More Router: A Low Latency NoC Router with More Route Predictions (peer-reviewed)

    Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    In Proceedings of the 2013 IEEE International Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), Communication Architecture for Scalable Systems (CASS)   842 - 850   2013

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: IEEE Computer Society

    Network-on-Chip (NoC) is a critical part of the memory hierarchy of emerging multicores. Lowering its communication latency while preserving its bandwidth is key to achieving high system performance. By now, one of the most effective methods for achieving this goal is the prediction router (PR). PR works by predicting the route an incoming packet may be transferred to, and it speculatively allocates resources (virtual channels and the switch crossbar) to the packet and traverses the packet's flits using this predicted route in a single cycle without waiting for route computation; however, if the prediction misses, the packet will then be processed in the conventional pipeline (in our work, four cycles) and the speculatively allocated router resources will be wasted. Obviously, prediction accuracy contributes to the amount of successful predictions, latency reduction and bandwidth consumption. We find that predictions hit around 65% of the time for most applications even under the best algorithm, so in such cases PR can at most accelerate about 65% of the packets while the remaining 35% will consume extra router resources and bandwidth. In order to increase the prediction accuracy, we propose a technique which makes use of multiple prediction algorithms at the same time for one incoming packet. Such a prediction is more accurate. With this proposal, we design and implement the predict-more router (PmR). While effectively increasing the prediction accuracy, PmR also helps utilize the remaining bandwidth within the router more productively. When both PmR and PR are evaluated under their best algorithm(s), we find that PmR is over 15% higher in prediction accuracy than PR, which helps PmR outperform PR by 3.5% on average in speeding up the system. We also find that although PmR creates more contentions in prediction, these contentions can be well resolved and are kept within the router, so both router internal bandwidth and link bandwidth are not exacerbated by it.

    DOI: 10.1109/IPDPSW.2013.40

  • Power and Performance of GPU-Accelerated Systems: A Closer Look (peer-reviewed)

    Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres

    In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC)   109 - 110   2013

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/IISWC.2013.6704675

  • Power and Performance Analysis of GPU-Accelerated Systems (peer-reviewed)

    Yuki Abe, Hiroshi Sasaki, Martin Peres, Koji Inoue, Kazuaki J. Murakami, Shinpei Kato

    In 2012 Workshop on Power-Aware Computing and Systems (HotPower)   2012

    Publication type: Research paper (international conference proceedings)

  • Performance Evaluation of 3D Stacked Multi-Core Processors with Temperature Consideration (peer-reviewed)

    Takaaki Hanada, Hiroshi Sasaki, Koji Inoue, Kazuaki J. Murakami

    In Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC)   1 - 5   2012

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/3DIC.2012.6263025

  • Energy-Efficient Dynamic Instruction Scheduling Logic Through Instruction Grouping (peer-reviewed)

    Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    IEEE Transactions on Very Large Scale Integration Systems (TVLSI)   17 ( 6 )   848 - 852   June 2009

    Language: English   Publication type: Research paper (scientific journal)   Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

    Dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out-of-order execution. We propose a novel microarchitectural technique to reduce the complexity and energy consumption of the dynamic instruction scheduling logic. The proposed method groups several instructions as a single issue unit and reduces the required number of ports and the size of the structure. This paper describes the microarchitecture mechanisms and shows evaluation results for energy savings and performance. These results reveal that the proposed technique can greatly reduce energy with almost no performance degradation, compared to the conventional dynamic instruction scheduling logic.

    DOI: 10.1109/TVLSI.2009.2013397

  • Power-Performance Modeling of Heterogeneous Cluster-Based Web Servers (peer-reviewed)

    Hiroshi Sasaki, Takatsugu Oya, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 2009 20th IEEE/ACM International Conference on Grid Computing (Grid)   35 ( 1 )   225 - 231   2009

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1109/GRID.2009.5353057

  • Cooperative Shared Resource Access Control for Low-Power Chip Multiprocessors (peer-reviewed)

    Noriko Takagi, Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)   177 - 182   2009

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: ASSOC COMPUTING MACHINERY

    In a single-chip multiprocessor (CMP), the last-level cache and its lower memory hierarchy components are typically shared by multiple processors. Conflicts in these resources lead to poor overall performance of the CMP and/or unpredictable performance of the individual cores. If applications on different cores have different performance constraints, even though these constraints can be satisfied by dynamic voltage and frequency scaling (DVFS) control of each core, conflicts in shared resources will lead to increased power consumption. Therefore, in the present paper, we use a newly developed power consumption model to derive a condition under which, in the presence of resource conflicts, the total power consumption is minimized, and we propose a method to minimize the power consumption of CMPs by cooperative access control of multiple shared resources and DVFS control. Experimental results reveal that the proposed technique can reduce power consumption by 15% on average in a dual-core CMP and by 13% in a quad-core CMP, as compared to the case in which only DVFS control is applied.

    DOI: 10.1145/1594233.1594278

  • Improving Fairness, Throughput and Energy-Efficiency on a Chip Multiprocessor through DVFS (peer-reviewed)

    Masaaki Kondo, Hiroshi Sasaki, Hiroshi Nakamura

    SIGARCH Computer Architecture News   35 ( 1 )   31 - 38   2007

    Publication type: Research paper (scientific journal)

    DOI: 10.1145/1241601.1241609

  • An Intra-Task DVFS Technique Based on Statistical Analysis of Hardware Events (peer-reviewed)

    Hiroshi Sasaki, Yoshimichi Ikeda, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 4th ACM International Conference on Computing Frontiers (CF)   123 - 130   2007

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1145/1242531.1242551

  • Energy-Efficient Dynamic Instruction Scheduling Logic through Instruction Grouping (peer-reviewed)

    Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 2006 ACM International Symposium on Low Power Electronics and Design (ISLPED)   43 - 48   2006

    Publication type: Research paper (international conference proceedings)

    DOI: 10.1145/1165573.1165585

  • Dynamic Instruction Cascading on GALS Microprocessors (peer-reviewed)

    Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    In 2005 International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)   30 - 39   2005

    Language: English   Publication type: Research paper (international conference proceedings)   Publisher: SPRINGER-VERLAG BERLIN

    As difficulty and the costs of distributing a single global clock throughout a processor is growing generation by generation, Globally-Asynchronous Locally-Synchronous (GALS) designs are an alternative approach to the conventional synchronous processors.
    In this paper, we propose Dynamic Instruction Cascading (DIC). DIC is a technique to execute two dependent instructions in one cycle by scaling down the clock frequency. Lowering the clock frequency enables the signal to reach farther, thereby computing two instructions in one cycle becomes possible. DIC is effectively applied to GALS processors because lowering only the clock frequency of the target domain is needed and therefore unwanted performance degradation will be prevented.
    The results showed average performance improvement of 7% on SPEC CPU2000 Integer and MediaBench applications when assuming that DIC is possible by lowering the clock frequency to 80%.

    DOI: 10.1007/11556930_4

Research Projects (Joint Research, Competitive Funding, etc.)

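  • Low-Cost, Flexible, and Highly Reliable Memory Architecture Using Network Analysis

    Project/Area Number: 22K19771  2022 - 2024

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Challenging Research (Exploratory)

    Hiroshi Sasaki

    Role: Principal investigator

    Amount awarded: 6,500,000 yen ( Direct costs: 5,000,000 yen, Indirect costs: 1,500,000 yen )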

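  • Research and Development of a RISC-V System Design Platform

    2021 - 2024

    New Energy and Industrial Technology Development Organization (NEDO)  Technology Development for AI Chips and Next-Generation Computing Enabling High-Efficiency, High-Speed Processing / R&D Item 4: Technology Development for Accelerating Industrial Application of AI Edge Computing

    Role: Co-investigator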

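  • Research on Energy-Secure Computer Systems

    Project/Area Number: 26700004  2014 - 2018

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists (A)

    Hiroshi Sasaki

    Role: Principal investigator

    Amount awarded: 4,290,000 yen ( Direct costs: 3,300,000 yen, Indirect costs: 990,000 yen )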

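  • Research on Hardware-Software Cooperative Techniques for Safe, Stable, and Power-Efficient Computer Systems

    2014 - 2016

    Japan Society for the Promotion of Science  Human Resource Development Program  Overseas Research Fellowships

    Hiroshi Sasaki

    Role: Principal investigator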

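  • Research on Dynamic Optimization of Chip Multiprocessors Based on Statistical Modeling

    Project/Area Number: 21700054  2009 - 2010

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists (B)

    Hiroshi Sasaki

    Amount awarded: 4,420,000 yen ( Direct costs: 3,400,000 yen, Indirect costs: 1,020,000 yen )

    Taking into account shared-resource contention, a key problem of chip multiprocessors (CMPs) that integrate multiple processor cores on a single chip, we proposed a dynamic power-reduction technique that is used together with dynamic voltage and frequency scaling (DVFS). The technique models execution on a CMP as a set of equations and derives the cooperative control that minimizes power. The evaluation showed that, compared with conventional techniques, it can reduce power consumption while maintaining the required performance.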