Researcher Details - 佐々木　広 (Hiroshi Sasaki)

ササキ　ヒロシ

佐々木　広

SASAKI HIROSHI

Affiliation

Associate Professor, School of Engineering

Contact

Homepage

https://hiroshi-sasaki.github.io

Degree

Doctorate (The University of Tokyo, March 2008)

Research Fields

Informatics / Computer systems / Computer architecture
Informatics / Information security / Computer security

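Education

Completed the doctoral program, Department of Advanced Interdisciplinary Studies, Graduate School of Engineering, The University of Tokyo

- March 2008

Completed the master's program, Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo

- March 2005

Graduated from the Department of Mathematical Engineering and Information Physics, Faculty of Engineering, The University of Tokyo

- March 2003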

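Professional Experience

Associate Professor, Department of Information and Communications Engineering, School of Engineering, Tokyo Institute of Technology

April 2020 - present

Country: Japan

Associate Research Scientist, Department of Computer Science, Columbia University

April 2016 - March 2020

Country: United States

Visiting Research Scientist, Department of Computer Science, Columbia University (JSPS Overseas Research Fellow)

April 2014 - March 2016

Country: United States

Visiting Research Scientist, IBM T.J. Watson Research Center

July 2013 - March 2014

Country: United States

Specially Appointed Associate Professor, Faculty of Information Science and Electrical Engineering, Kyushu University

August 2011 - March 2014

Country: Japan

Project Assistant Professor, Graduate School of Information Science and Technology, The University of Tokyo

April 2010 - July 2011

Country: Japan

Project Assistant Professor, Research Center for Advanced Science and Technology, The University of Tokyo

April 2008 - March 2010

Country: Japan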

Papers

RAPLET: Demystifying Publish/Subscribe Latency for ROS Applications (peer-reviewed)

Keisuke Nishimura, Takahiro Ishikawa, Hiroshi Sasaki, Shinpei Kato

In Proceedings of the IEEE 27th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 41-50, 2021

Publication type: Research paper (international conference proceedings); Publisher: IEEE

DOI: 10.1109/rtcsa52859.2021.00013

Practical Byte-Granular Memory Blacklisting using Califorms (peer-reviewed)

Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

In Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 558-571, 2019

Publication type: Research paper (international conference proceedings)

DOI: 10.1145/3352460.3358299

Why Do Programs Have Heavy Tails? (peer-reviewed)

Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan

In Proceedings of the 2017 IEEE International Symposium on Workload Characterization (IISWC), pp. 135-145, 2017

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/IISWC.2017.8167771

Characterization and Mitigation of Power Contention across Multiprogrammed Workloads (peer-reviewed)

Hiroshi Sasaki, Alper Buyuktosunoglu, Augusto Vega, Pradip Bose

In Proceedings of the 2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 55-64, 2016

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

Shared resource contention has been a major performance issue for CMPs. In this paper we focus on power, which is one of the most valuable shared resources of CMPs. We believe it is important to study power contention, especially with the prevalence of power capping features among modern commercial microprocessors. When multiple processes compete for power in such systems, the power management system attempts to mitigate the contention (i.e., reduce the power consumption) by slowing down the processor, which results in degraded total system performance. We characterize this phenomenon using a real testbed with an Intel processor with power capping capability realized by the RAPL technology. We observe noticeable performance degradation for SPEC CPU2006, especially at tighter power caps. In order to solve this problem, we develop a shared resource-aware scheduling algorithm that improves system performance by mitigating the contention for power and the shared memory subsystem at the same time. Evaluation results across a variety of multiprogrammed workloads show performance improvements over a state-of-the-art scheduling policy which only considers memory subsystem contention. In addition, we present a guard mechanism implemented on top of the proposed scheduler that greatly improves performance when there is severe power contention that introduces performance anomalies.

DOI: 10.1109/IISWC.2016.7581266

Power and Performance Characterization and Modeling of GPU-Accelerated Systems (peer-reviewed)

Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres

In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 113-122, 2014

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/IPDPS.2014.23

McRouter: Multicast within a Router for High Performance Network-on-Chips (peer-reviewed)

Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 319-329, 2013

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

The inevitable advent of the multi-core era has driven an increasing demand for low latency on-chip interconnection networks (or NoCs). Being a critical part of the memory hierarchy for modern chip multi-processors (CMPs), these networks face stringent design constraints to provide fast communication with tight power budget. Modern NoC's first-order concern is clearly its latency, while we also find that internal bandwidth of its routers is relatively plentiful; thus, we present a low latency router design utilizing a technique we call "multicast within a router" or McRouter, which allows productive utilization of remaining bandwidth inside a NoC router. McRouter allows a single cycle transfer of flits which shortens the communication latency when there is enough remaining bandwidth within the router. The key idea is to transmit a header flit to all possible output ports (multicast) so that it is always transmitted to the correct output port without relying on route computation. In addition, we find it is affordable with marginal power overhead while still being a stand-alone design by maintaining portability and modularity (unlike look-ahead routing based designs). Our evaluation with application traffic shows that McRouter helps achieving system speed-ups of 1.28, 1.17 and 1.05 over the conventional router (CR), the VSA router (VSAR) and the prediction router (PR), respectively.

DOI: 10.1109/PACT.2013.6618828

Other link: http://doi.ieeecomputersociety.org/10.1109/PACT.2013.6618828

Coordinated Power-Performance Optimization in Manycores (peer-reviewed)

Hiroshi Sasaki, Satoshi Imamura, Koji Inoue

In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 51-61, 2013

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/PACT.2013.6618803

Other link: http://doi.ieeecomputersociety.org/10.1109/PACT.2013.6618803

Scalability-Based Manycore Partitioning (peer-reviewed)

Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura

In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 107-116, 2012

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is not growing at its historical rate, but the many active processes in a computer system and the continuing development of multi-threaded applications benefit from growing core counts to sustain system throughput. This trend leads to situations in which a number of parallel applications are executed simultaneously on a single system. Since multi-threaded applications try to maximize their throughput by utilizing the whole system, each of them usually creates at least as many threads as there are logical cores. This results in a much greater number of threads to be co-scheduled in the entire system. However, each program has different characteristics (or scalability) and contends with the others for shared resources, namely the CPU cores and the memory hierarchy. Therefore, OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs via the use of hardware performance monitoring units, (2) decides the optimal number of cores to be allocated for each program, and (3) allocates the cores to programs while maximizing the system utilization to achieve fair and maximum performance. The evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.

DOI: 10.1145/2370816.2370833
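Step (2) of the scheduler described above, deciding how many cores each program receives, can be illustrated with a greedy marginal-gain allocator. This is a minimal sketch under stated assumptions, not the paper's scheduler: it assumes each program's speedup curve has already been predicted (the paper derives scalability from hardware performance monitoring units, which is not modeled here), and the function name `allocate_cores` and the example curves are hypothetical.

```python
import heapq


def allocate_cores(speedups: dict[str, list[float]], total_cores: int) -> dict[str, int]:
    """Greedy core partitioning by marginal speedup (illustrative sketch).

    speedups[p][k] is the assumed, already-predicted speedup of program p
    when it runs on k + 1 cores. Every program starts with one core; each
    remaining core goes to the program whose next core adds the most speedup.
    """
    if total_cores < len(speedups):
        raise ValueError("need at least one core per program")
    alloc = {p: 1 for p in speedups}
    heap: list[tuple[float, str]] = []  # (-marginal gain, program): max-heap via negation
    for p, curve in speedups.items():
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[1] - curve[0]), p))
    remaining = total_cores - len(speedups)
    while remaining > 0 and heap:
        neg_gain, p = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no program benefits from another core; leave the rest idle
        alloc[p] += 1
        remaining -= 1
        k = alloc[p]  # cores now held by p
        curve = speedups[p]
        if k < len(curve):
            heapq.heappush(heap, (-(curve[k] - curve[k - 1]), p))
    return alloc


# Hypothetical curves: one program scales well, the other saturates early.
curves = {
    "scalable": [1.0, 1.9, 2.7, 3.4],
    "memory_bound": [1.0, 1.3, 1.4, 1.4],
}
print(allocate_cores(curves, 6))  # {'scalable': 4, 'memory_bound': 2}
```

Greedy allocation by marginal gain is a common heuristic; it is only guaranteed to be optimal when every speedup curve is concave, and it ignores the fairness goal stated in the abstract.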

Practical Byte-Granular Memory Blacklisting using Califorms (peer-reviewed)

Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

CoRR abs/1906.01838, 2019

Publication type: Research paper (scientific journal)

Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs (peer-reviewed)

Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa

IEICE Transactions on Information and Systems, Vol. E101-D, No. 9, pp. 2247-2257, September 2018

Language: English; Publication type: Research paper (scientific journal); Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them is increased. Especially, the capacity of DRAM continues to grow, and it is not rare that DRAM consumes higher power than processors on modern servers. Therefore, a reduction in the DRAM energy consumption is a critical challenge to reduce the system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective to reduce the DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied to modern servers including multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. In order to address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces the DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively.

DOI: 10.1587/transinf.2017EDP7296
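The difference between cache-line-grained and row-grained channel interleaving comes down to which address bits select the channel. The sketch below is a simplified model under assumed parameters (64-byte lines, 8 KiB rows, two channels, one open row per channel, banks and ranks ignored); real memory controllers use more elaborate and often hashed mappings, so the bit layout and function names are illustrative assumptions rather than the schemes evaluated in the paper.

```python
LINE_BYTES = 64        # assumed cache-line size
ROW_BYTES = 8 * 1024   # assumed DRAM row size per channel
NUM_CHANNELS = 2       # assumed number of memory channels


def map_line_interleaved(addr: int) -> tuple[int, int]:
    """Cache-line-grained interleaving: consecutive lines alternate channels."""
    line = addr // LINE_BYTES
    channel = line % NUM_CHANNELS
    # Byte offset inside the chosen channel, then the row holding it.
    row = (line // NUM_CHANNELS) * LINE_BYTES // ROW_BYTES
    return channel, row


def map_row_interleaved(addr: int) -> tuple[int, int]:
    """Row-grained interleaving: a row's worth of consecutive bytes stays on one channel."""
    chunk = addr // ROW_BYTES
    return chunk % NUM_CHANNELS, chunk // NUM_CHANNELS


def row_hit_rate(addrs: list[int], mapper) -> float:
    """Fraction of accesses that hit the open row of their channel.

    Simplification: one open row per channel (banks and ranks are ignored).
    """
    open_row: dict[int, int] = {}
    hits = 0
    for a in addrs:
        channel, row = mapper(a)
        if open_row.get(channel) == row:
            hits += 1
        open_row[channel] = row
    return hits / len(addrs)


# Two hypothetical threads streaming through separate regions, interleaved in time.
thread_a = [i * LINE_BYTES for i in range(128)]
thread_b = [3 * ROW_BYTES + i * LINE_BYTES for i in range(128)]
trace = [addr for pair in zip(thread_a, thread_b) for addr in pair]

print(row_hit_rate(trace, map_line_interleaved))  # 0.0
print(row_hit_rate(trace, map_row_interleaved))   # 0.9921875
```

In this contrived two-thread pattern, the line-grained mapping spreads both streams over both channels, so each channel alternates between two rows and every access misses, while the row-grained mapping keeps each stream on its own channel. The differences reported above come from simulating real benchmarks, not from a toy trace like this one.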

Power-Efficient Breadth-First Search with DRAM Row Buffer Locality-Aware Address Mapping (peer-reviewed)

Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa

In 2016 High Performance Graph Data Management and Processing Workshop (HPGDMP), pp. 17-24, January 2017

Publication type: Research paper (international conference proceedings)

Graph analysis applications have been widely used in real services such as road-traffic analysis and social network services. Breadth-first search (BFS) is one of the most representative algorithms for such applications; therefore, many researchers have tuned it to maximize performance. On the other hand, owing to the strict power constraints of modern HPC systems, it is necessary to improve power efficiency (i.e., performance per watt) when executing BFS. In this work, we focus on the power efficiency of DRAM and investigate the memory access pattern of a state-of-the-art BFS implementation using a cycle-accurate processor simulator. The results reveal that the conventional address mapping schemes of modern memory controllers do not efficiently exploit row buffers in DRAM. Thus, we propose a new scheme called per-row channel interleaving and improve the DRAM power efficiency by 30.3% compared to a conventional scheme for a certain simulator setting. Moreover, we demonstrate that this proposed scheme is effective for various configurations of memory controllers.

DOI: 10.1109/HPGDMP.2016.010

Enhanced Dependence Graph Model for Critical Path Analysis on Modern Out-of-Order Processors (peer-reviewed)

Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Hiroshi Sasaki

IEEE Computer Architecture Letters, Vol. 16, No. 2, pp. 111-114, 2017

Publication type: Research paper (scientific journal)

DOI: 10.1109/LCA.2017.2684813

Heavy Tails in Program Structure (peer-reviewed)

Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan

IEEE Computer Architecture Letters, Vol. 16, No. 1, pp. 34-37, 2017

Language: English; Publication type: Research paper (scientific journal); Publisher: Institute of Electrical and Electronics Engineers Inc.

Designing and optimizing computer systems require deep understanding of the underlying system behavior. Historically many important observations that led to the development of essential hardware and software optimizations were driven by empirical observations about program behavior. In this paper, we report an interesting property of program structures by viewing dynamic program execution as a changing network. By analyzing the communication network created as a result of dynamic program execution, we find that communication patterns follow heavy-tailed distributions. In other words, a few instructions have consumers that are orders of magnitude larger than most instructions in a program. Surprisingly, these heavy-tailed distributions follow the iconic power law previously seen in man-made and natural networks. We provide empirical measurements based on the SPEC CPU2006 benchmarks to validate our findings as well as perform semantic analysis of the source code to reveal the causes of such behavior.

DOI: 10.1109/LCA.2016.2574350
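Viewing execution as a network amounts to counting, for each static instruction, how many dynamic instructions consume the values it produces. The sketch below assumes a hypothetical trace format of `(pc, dest_reg, src_regs)` tuples and tracks register dataflow only, which is far simpler than the analysis in the paper; it only shows how the consumer-count distribution and its complementary CDF (CCDF) can be derived from such a trace.

```python
from bisect import bisect_left
from collections import Counter


def consumer_counts(trace) -> Counter:
    """Count dynamic consumers of each static (producer) instruction.

    trace: iterable of (pc, dest_reg or None, tuple_of_src_regs) in program
    order. Only register dataflow is tracked; memory dependences are ignored.
    """
    last_writer: dict[str, int] = {}  # register -> PC that last wrote it
    counts: Counter = Counter()
    for pc, dest, srcs in trace:
        for reg in srcs:
            producer = last_writer.get(reg)
            if producer is not None:
                counts[producer] += 1
        counts.setdefault(pc, 0)  # keep instructions that have no consumers
        if dest is not None:
            last_writer[dest] = pc
    return counts


def ccdf(counts: Counter) -> list[tuple[int, float]]:
    """Return (k, fraction of producers with at least k consumers) pairs."""
    values = sorted(counts.values())
    n = len(values)
    return [(k, (n - bisect_left(values, k)) / n) for k in sorted(set(values))]


# Hypothetical toy trace: the value written at 0x10 is read by every later instruction.
trace = [
    (0x10, "r1", ()),
    (0x14, "r2", ("r1",)),
    (0x18, "r3", ("r1",)),
    (0x1C, "r4", ("r1", "r2")),
    (0x20, None, ("r1", "r3")),  # e.g. a store: no register result
]
counts = consumer_counts(trace)
print(counts[0x10])   # 4
print(ccdf(counts))   # [(0, 1.0), (1, 0.6), (4, 0.2)]
```

On a real trace, plotting the CCDF on log-log axes is the usual first check for a heavy tail; a roughly straight line suggests a power law, although a proper statistical fit is needed to confirm it.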

Mitigating Power Contention: A Scheduling Based Approach (peer-reviewed)

Hiroshi Sasaki, Alper Buyuktosunoglu, Augusto Vega, Pradip Bose

IEEE Computer Architecture Letters, Vol. 16, No. 1, pp. 60-63, 2017

Publication type: Research paper (scientific journal)

DOI: 10.1109/LCA.2016.2572080

A Runtime Optimization Selection Framework to Realize Energy Efficient Networks-on-Chip (peer-reviewed)

Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

IEICE Transactions on Information and Systems, Vol. E99-D, No. 12, pp. 2881-2890, December 2016

Language: English; Publication type: Research paper (scientific journal); Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

Networks-on-Chip (or NoCs, for short) play important roles in modern and future multi-core processors, as they strongly affect both the performance and the power consumption of the entire chip. To date, many optimization techniques have been developed to improve NoC bandwidth, latency and power consumption. However, how these techniques affect energy efficiency remains unclear, since each comes with its own benefits and overheads and there are many of them. This raises the question of when and how such optimization techniques should be applied. To address it, we build a runtime framework that throttles these optimization techniques based on concise performance and energy models. With this framework, we can adaptively select among multiple optimization techniques at runtime to further improve the performance or energy efficiency of the network.

DOI: 10.1587/transinf.2016PAP0026

A scalability analysis of many cores and on-chip mesh networks on the TILE-Gx platform (peer-reviewed)

Ye Liu, Hiroshi Sasaki, Shinpei Kato, Masato Edahiro

In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 46-52, 2016

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

TILE-Gx processors that have emerged in recent years can be considered as the representative of prevailing manycore processors. The available TILE-Gx processors are featured with directory-based cache coherence protocol, two-dimensional mesh networks and up to 72 on-chip cores. In this paper, we study and analyze problems of performance scalability and network collision of many-core processors using the TILE-Gx36 processor.
We find that most multi-threaded programs from the PARSEC benchmark suite, which target shared-memory on-chip processors, cannot scale well on Linux as the number of cores increases. Meanwhile, applications compiled with Pthreads are affected by the task-to-core assignment approach. The results also show that current multi-threaded applications do not fully utilize the hardware resources of the TILE-Gx36 processor. Moreover, OS designers might need to pay attention to memory allocation if memory striping is not supported, because heavy memory accesses to only one memory controller can burden the two-dimensional mesh network. The same effect appears when cores intensively access the more distant memory controllers.

DOI: 10.1109/MCSoC.2016.40

Runtime Multi-Optimizations for Energy Efficient On-chip Interconnections (peer-reviewed)

Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

In Proceedings of the 33rd IEEE International Conference on Computer Design (ICCD), pp. 455-458, 2015

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

On-chip interconnection (or NoC) is a major performance and power contributor to modern and future multicore processors. So far, many optimization techniques have been developed to improve its bandwidth, latency and power consumption. But it is not clear how energy efficiency is affected, since an optimization technique normally comes with overheads. This paper thus attempts to address when and how such optimization techniques should be applied and tuned to help achieve better energy efficiency. We first model the performance and energy impacts of representative NoC optimization techniques. These models help us more easily understand the consequences of applying these optimization techniques and their combinations under different circumstances. Moreover, based on such modeling, we propose and implement an adaptive control over these NoC optimization techniques to improve both performance and energy efficiency of the network. Our results show that this proposal can achieve average improvements of 26% and 57% in network performance and energy-delay product, respectively.

DOI: 10.1109/ICCD.2015.7357147
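The adaptive control described in this abstract and in the IEICE paper listed earlier chooses, at runtime, which NoC optimizations to enable based on model predictions. A minimal sketch, assuming each technique's effect has already been predicted as independent, multiplicative delay and energy factors (the papers build their own models; the technique names and numbers below are made up), is to enumerate combinations and keep the one with the lowest energy-delay product (EDP).

```python
from itertools import combinations


def best_config(options: dict[str, tuple[float, float]],
                base_delay: float, base_energy: float) -> tuple[float, tuple[str, ...]]:
    """Pick the subset of optimizations with the lowest energy-delay product.

    options maps a (hypothetical) optimization name to model-predicted
    (delay_factor, energy_factor) multipliers. Effects are assumed to be
    independent and multiplicative, which real NoC optimizations need not be.
    """
    names = sorted(options)
    best_edp, best_combo = base_delay * base_energy, ()  # baseline: nothing enabled
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            delay, energy = base_delay, base_energy
            for name in combo:
                d_factor, e_factor = options[name]
                delay *= d_factor
                energy *= e_factor
            edp = delay * energy
            if edp < best_edp:
                best_edp, best_combo = edp, combo
    return best_edp, best_combo


# Made-up model predictions for three hypothetical techniques.
predicted = {
    "lookahead_routing": (0.80, 1.10),
    "power_gating": (1.15, 0.70),
    "express_links": (0.90, 1.30),
}
edp, chosen = best_config(predicted, base_delay=1.0, base_energy=1.0)
print(chosen)  # ('lookahead_routing', 'power_gating')
```

Exhaustive enumeration grows as 2^n in the number of techniques, so it only suits a handful of options; the independence assumption is also the first thing a real model would need to refine, since techniques can interact.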

A Flexible Hardware Barrier Mechanism for Many-Core Processors (peer-reviewed)

Takeshi Soga, Hiroshi Sasaki, Tomoya Hirao, Masaaki Kondo, Koji Inoue

In Proceedings of the 20th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 61-68, 2015

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

This paper proposes a new hardware barrier mechanism which offers the flexibility to select which cores should join the synchronization, allowing for executing multiple multi-threaded applications by dividing a many-core processor into several groups. Experimental results based on an RTL simulation show that our hardware barrier achieves a 66-fold reduction in latency over typical software based implementations, with a hardware overhead of the processor of only 1.8%. Additionally, we demonstrate that the proposed mechanism is sufficiently flexible to cover a variety of core groups with minimal hardware overhead.

DOI: 10.1109/ASPDAC.2015.7058982
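The flexibility of this barrier comes from letting each core group define which cores must arrive before release. The following behavioral model is not the paper's RTL design; it assumes, for illustration only, that each group is encoded as a participant bitmask and that arrivals accumulate in a second bitmask. The class name `GroupBarrier` is hypothetical.

```python
class GroupBarrier:
    """Behavioral model of a barrier whose participants are given as a core bitmask."""

    def __init__(self, num_cores: int):
        self.num_cores = num_cores
        self.groups: dict[int, int] = {}   # group id -> participant bitmask
        self.arrived: dict[int, int] = {}  # group id -> bitmask of cores already waiting

    def configure(self, group: int, cores: list[int]) -> None:
        mask = 0
        for c in cores:
            if not 0 <= c < self.num_cores:
                raise ValueError(f"core {c} out of range")
            mask |= 1 << c
        # Groups may not share cores, so independent applications can coexist.
        for other, other_mask in self.groups.items():
            if other != group and other_mask & mask:
                raise ValueError(f"cores overlap with group {other}")
        self.groups[group] = mask
        self.arrived[group] = 0

    def arrive(self, group: int, core: int) -> bool:
        """Record a core's arrival; return True when the whole group is released."""
        bit = 1 << core
        if not self.groups[group] & bit:
            raise ValueError(f"core {core} is not in group {group}")
        self.arrived[group] |= bit
        if self.arrived[group] == self.groups[group]:
            self.arrived[group] = 0  # reset for the next barrier episode
            return True
        return False


# Hypothetical split of an 8-core chip into a 4-core and a 2-core application.
barrier = GroupBarrier(8)
barrier.configure(0, [0, 1, 2, 3])
barrier.configure(1, [4, 5])
print([barrier.arrive(1, 4), barrier.arrive(1, 5)])  # [False, True]
```

Because release only requires comparing two bit vectors, a group of any shape is handled by the same check, which is what lets one mechanism serve several co-running applications.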

Power-Capped DVFS and Thread Allocation with ANN Models on Modern NUMA Systems (peer-reviewed)

Satoshi Imamura, Hiroshi Sasaki, Koji Inoue, Dimitrios S. Nikolopoulos

In Proceedings of the 32nd IEEE International Conference on Computer Design (ICCD), pp. 324-331, 2014

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/ICCD.2014.6974701

SMYLEref: A Reference Architecture for Manycore-Processor SoCs (invited, peer-reviewed)

Masaaki Kondo, S. T. Nguyen, Tomoya Hirao, Takeshi Soga, Hiroshi Sasaki, Koji Inoue

In Proceedings of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 561-564, 2013

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

Nowadays, the trend toward microprocessors with tens of cores brings a promising prospect for embedded systems. Realizing a high-performance, low-power many-core processor is becoming a primary technical challenge. We are currently developing a many-core processor architecture for embedded systems as part of a NEDO project. This paper introduces the many-core architecture called SMYLEref along with the concept of Virtual Accelerator on Many-core, in which many cores on a chip are utilized as a hardware platform for realizing multiple virtual accelerators. We are developing its prototype system with off-the-shelf FPGA evaluation boards. In this paper, we introduce the architecture of SMYLEref and the details of the prototype system. In addition, several initial experiments with the prototype system are also presented.

DOI: 10.1109/ASPDAC.2013.6509656

Line Sharing Cache: Exploring Cache Capacity with Frequent Line Value Locality (peer-reviewed)

Keitarou Oka, Hiroshi Sasaki, Koji Inoue

In Proceedings of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 669-674, 2013

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

This paper proposes a new last-level cache architecture called line sharing cache (LSC), which can reduce the number of cache misses without increasing the size of the cache memory. It stores lines that contain identical values in a single line entry, which enables the cache to hold a greater number of lines. Evaluation results show performance improvements of up to 35% across a set of SPEC CPU2000 benchmarks.

DOI: 10.1109/ASPDAC.2013.6509677
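The central idea, several tags pointing to a single stored copy of identical line contents, can be modeled in a few lines. This is a simplified sketch with hypothetical names: it is fully associative, evicts data entries in FIFO order, and compares whole lines in software. How LSC actually organizes its tag and data arrays and detects identical values in hardware is not described in the abstract and is not reproduced here.

```python
from collections import OrderedDict


class LineSharingCacheModel:
    """Toy model: many tags may point to one stored copy of identical line data."""

    def __init__(self, data_entries: int):
        self.capacity = data_entries   # number of physical data entries
        self.data = OrderedDict()      # line contents -> set of addresses sharing it (FIFO order)
        self.tag_to_data = {}          # address -> line contents

    def lookup(self, addr: int) -> bool:
        return addr in self.tag_to_data

    def _drop_tag(self, addr: int) -> None:
        contents = self.tag_to_data.pop(addr)
        sharers = self.data[contents]
        sharers.discard(addr)
        if not sharers:
            del self.data[contents]

    def fill(self, addr: int, contents: bytes) -> None:
        if addr in self.tag_to_data:
            self._drop_tag(addr)
        if contents not in self.data:
            if len(self.data) == self.capacity:
                # Evict the oldest data entry; every tag sharing it is invalidated.
                _, victims = self.data.popitem(last=False)
                for tag in victims:
                    del self.tag_to_data[tag]
            self.data[contents] = set()
        self.data[contents].add(addr)
        self.tag_to_data[addr] = contents


cache = LineSharingCacheModel(data_entries=2)
zero_line = bytes(64)
for addr in (0x000, 0x040, 0x080):
    cache.fill(addr, zero_line)  # three tags, one data entry
print(len(cache.tag_to_data), len(cache.data))  # 3 1
```

Evicting a shared data entry invalidates every tag that points to it, which is one of the costs a real design has to weigh against the capacity gained.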

Predict-More Router: A Low Latency NoC Router with More Route Predictions (peer-reviewed)

Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

In Proceedings of the 2013 IEEE International Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), Communication Architecture for Scalable Systems (CASS), pp. 842-850, 2013

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE Computer Society

Network-on-Chip (NoC) is a critical part of the memory hierarchy of emerging multicores. Lowering its communication latency while preserving its bandwidth is key to achieving high system performance. So far, one of the most effective methods for achieving this goal is the prediction router (PR). PR predicts the route an incoming packet will take, speculatively allocates resources (virtual channels and the switch crossbar) to the packet, and traverses the packet's flits along the predicted route in a single cycle without waiting for route computation; however, if the prediction misses, the packet is then processed in the conventional pipeline (in our work, four cycles) and the speculatively allocated router resources are wasted. Prediction accuracy therefore directly affects the number of successful predictions, the latency reduction, and the bandwidth consumption. We find that predictions hit only around 65% of the time for most applications, even under the best algorithm, so in such cases PR can accelerate at most about 65% of the packets while the remaining 35% consume extra router resources and bandwidth. To increase prediction accuracy, we propose a technique that applies multiple prediction algorithms at the same time to each incoming packet, which yields more accurate predictions. Based on this technique, we design and implement the predict-more router (PmR). Besides increasing prediction accuracy, PmR also uses the remaining bandwidth within the router more productively. When both PmR and PR are evaluated under their best algorithm(s), we find that PmR achieves over 15% higher prediction accuracy than PR, which lets PmR outperform PR by 3.5% on average in system speedup. We also find that although PmR creates more contention among predictions, this contention is resolved within the router, so neither the router's internal bandwidth nor the link bandwidth is adversely affected.

DOI: 10.1109/IPDPSW.2013.40
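Why issuing several predictions at once raises the hit rate can be shown with a toy model in which a packet counts as correctly predicted if any active predictor guessed its output port. The two predictors here (last-port and go-straight) and the packet sequence are illustrative assumptions, not the algorithms or traffic evaluated in the paper.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def prediction_accuracy(packets, use_last: bool = True, use_straight: bool = True) -> float:
    """Fraction of packets whose output port is among the predicted ports.

    packets: list of (input_port, actual_output_port) seen by one router.
    "last": predict the port the previous packet from this input took.
    "straight": predict the port opposite the input (keep going straight).
    Enabling both loosely mimics issuing multiple predictions per packet.
    """
    last_out: dict[str, str] = {}
    hits = 0
    for inp, out in packets:
        guesses = set()
        if use_last and inp in last_out:
            guesses.add(last_out[inp])
        if use_straight and inp in OPPOSITE:
            guesses.add(OPPOSITE[inp])
        if out in guesses:
            hits += 1
        last_out[inp] = out
    return hits / len(packets)


# Hypothetical packet sequence at one router.
packets = [("W", "E"), ("W", "E"), ("W", "N"), ("W", "N"),
           ("S", "N"), ("S", "E"), ("S", "E")]
for flags in [(True, False), (False, True), (True, True)]:
    print(flags, round(prediction_accuracy(packets, *flags), 3))
# accuracies: 0.429, 0.429, 0.714
```

The trade-off noted in the abstract still applies: every extra guess speculatively claims router resources, so the accuracy gain has to be weighed against the additional contention it creates.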

Power and Performance of GPU-Accelerated Systems: A Closer Look (peer-reviewed)

Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres

In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 109-110, 2013

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/IISWC.2013.6704675

Power and Performance Analysis of GPU-Accelerated Systems (peer-reviewed)

Yuki Abe, Hiroshi Sasaki, Martin Peres, Koji Inoue, Kazuaki J. Murakami, Shinpei Kato

In 2012 Workshop on Power-Aware Computing and Systems (HotPower), 2012

Publication type: Research paper (international conference proceedings)

Performance Evaluation of 3D Stacked Multi-Core Processors with Temperature Consideration (peer-reviewed)

Takaaki Hanada, Hiroshi Sasaki, Koji Inoue, Kazuaki J. Murakami

In Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC), pp. 1-5, 2012

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/3DIC.2012.6263025

Energy-Efficient Dynamic Instruction Scheduling Logic Through Instruction Grouping (peer-reviewed)

Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

IEEE Transactions on Very Large Scale Integration Systems (TVLSI), Vol. 17, No. 6, pp. 848-852, June 2009

Language: English; Publication type: Research paper (scientific journal); Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out-of-order execution. We propose a novel microarchitectural technique to reduce the complexity and energy consumption of the dynamic instruction scheduling logic. The proposed method groups several instructions as a single issue unit and reduces the required number of ports and the size of the structure. This paper describes the microarchitecture mechanisms and shows evaluation results for energy savings and performance. These results reveal that the proposed technique can greatly reduce energy with almost no performance degradation, compared to the conventional dynamic instruction scheduling logic.

DOI: 10.1109/TVLSI.2009.2013397

Power-Performance Modeling of Heterogeneous Cluster-Based Web Servers (peer-reviewed)

Hiroshi Sasaki, Takatsugu Oya, Masaaki Kondo, Hiroshi Nakamura

In Proceedings of the 2009 20th IEEE/ACM International Conference on Grid Computing (Grid), 35(1), pp. 225-231, 2009

Publication type: Research paper (international conference proceedings)

DOI: 10.1109/GRID.2009.5353057

Cooperative Shared Resource Access Control for Low-Power Chip Multiprocessors (peer-reviewed)

Noriko Takagi, Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

In Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), pp. 177-182, 2009

Language: English; Publication type: Research paper (international conference proceedings); Publisher: ASSOC COMPUTING MACHINERY

In a single-chip multiprocessor (CMP), the last-level cache and its lower memory hierarchy components are typically shared by multiple processors. Conflicts in these resources lead to poor overall performance of the CMP and/or unpredictable performance of the individual cores. If applications on different cores have different performance constraints, even though these constraints can be satisfied by dynamic voltage and frequency scaling (DVFS) control of each core, conflicts in shared resources will lead to increased power consumption. Therefore, in the present paper, we use a newly developed power consumption model to derive a condition under which the total power consumption is minimized under resource conflicts, and propose a method to minimize the power consumption of CMPs through cooperative access control of multiple shared resources and DVFS control. Experimental results reveal that the proposed technique can reduce power consumption by 15% on average in a dual-core CMP and by 13% in a quad-core CMP, as compared to the case in which only DVFS control is applied.

DOI: 10.1145/1594233.1594278

Improving Fairness, Throughput and Energy-Efficiency on a Chip Multiprocessor through DVFS (peer-reviewed)

Masaaki Kondo, Hiroshi Sasaki, Hiroshi Nakamura

SIGARCH Computer Architecture News, Vol. 35, No. 1, pp. 31-38, 2007

Publication type: Research paper (scientific journal)

DOI: 10.1145/1241601.1241609

An Intra-Task DVFS Technique Based on Statistical Analysis of Hardware Events (peer-reviewed)

Hiroshi Sasaki, Yoshimichi Ikeda, Masaaki Kondo, Hiroshi Nakamura

In Proceedings of the 4th ACM International Conference on Computing Frontiers (CF), pp. 123-130, 2007

Publication type: Research paper (international conference proceedings)

DOI: 10.1145/1242531.1242551

Energy-Efficient Dynamic Instruction Scheduling Logic through Instruction Grouping (peer-reviewed)

Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

In Proceedings of the 2006 ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 43-48, 2006

Publication type: Research paper (international conference proceedings)

DOI: 10.1145/1165573.1165585

Dynamic Instruction Cascading on GALS Microprocessors (peer-reviewed)

Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

In 2005 International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pp. 30-39, 2005

Language: English; Publication type: Research paper (international conference proceedings); Publisher: SPRINGER-VERLAG BERLIN

As the difficulty and cost of distributing a single global clock throughout a processor grow with each generation, Globally-Asynchronous Locally-Synchronous (GALS) designs offer an alternative to conventional synchronous processors.
In this paper, we propose Dynamic Instruction Cascading (DIC). DIC is a technique to execute two dependent instructions in one cycle by scaling down the clock frequency. Lowering the clock frequency lets signals propagate farther within a cycle, which makes it possible to compute two instructions in one cycle. DIC is well suited to GALS processors because only the clock frequency of the target domain needs to be lowered, which prevents unwanted performance degradation.
The results showed an average performance improvement of 7% on SPEC CPU2000 Integer and MediaBench applications, assuming that DIC is possible when the clock frequency is lowered to 80%.

DOI: 10.1007/11556930_4

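Joint Research and Competitive Funding Projects

Low-cost, flexible, and highly reliable memory architecture using network analysis

Project/Area number: 22K19771, 2022 - 2024

JSPS KAKENHI Grant-in-Aid for Challenging Research (Exploratory)

佐々木広

Role: Principal Investigator

Allocated amount: 6,500,000 JPY (direct costs: 5,000,000 JPY; indirect costs: 1,500,000 JPY)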

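Research and development of a RISC-V system design platform

2021 - 2024

New Energy and Industrial Technology Development Organization (NEDO), Development of AI Chips and Next-Generation Computing Technologies Enabling High-Efficiency, High-Speed Processing / R&D Item 4: Technology Development for Accelerating Industrial Applications of AI Edge Computing

Role: Co-Investigator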

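Research on energy-secure computer systems

Project/Area number: 26700004, 2014 - 2018

JSPS KAKENHI Grant-in-Aid for Young Scientists (A)

佐々木広

Role: Principal Investigator

Allocated amount: 4,290,000 JPY (direct costs: 3,300,000 JPY; indirect costs: 990,000 JPY)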

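Research on hardware/software cooperative techniques for safe, stable, and power-efficient computer systems

2014 - 2016

JSPS Human Resource Development Program: Overseas Research Fellowships

佐々木広

Role: Principal Investigator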

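Research on dynamic optimization of chip multiprocessors based on statistical modeling

Project/Area number: 21700054, 2009 - 2010

JSPS KAKENHI Grant-in-Aid for Young Scientists (B)

佐々木広

Allocated amount: 4,420,000 JPY (direct costs: 3,400,000 JPY; indirect costs: 1,020,000 JPY)

Targeting shared-resource contention, a key problem in chip multiprocessors (CMPs) that integrate multiple processor cores on a single chip, we proposed a dynamic power-reduction technique used together with dynamic voltage and frequency scaling (DVFS). The technique models execution on a CMP as mathematical expressions and derives a coordinated control that minimizes power. Evaluation showed that, compared with conventional techniques, it reduces power consumption while maintaining the required performance.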
