Updated on 2025/02/28

SASAKI HIROSHI
 
Organization
School of Engineering
Title
Associate Professor
Contact information
Email address
External link

Degree

  • Doctorate ( 2008.3   The University of Tokyo )

Research Areas

  • Informatics / Computer system  / Computer Architecture

  • Informatics / Information security  / Computer Security

Education

Research History

  • Tokyo Institute of Technology   Associate Professor

    2020.4 -

    Country:Japan

  • Department of Computer Science, Columbia University   Associate Research Scientist

    2016.4 - 2020.3

    Country:United States

  • Department of Computer Science, Columbia University   Visiting Research Scientist

    2014.4 - 2016.3

    Country:United States

  • IBM T. J. Watson Research Center   Visiting Research Scientist

    2013.7 - 2014.3

    Country:United States

  • Kyushu University   Research Associate Professor

    2011.8 - 2014.3

    Country:Japan

  • The University of Tokyo   Research Assistant Professor

    2010.4 - 2011.7

    Country:Japan

  • The University of Tokyo   Research Assistant Professor

    2008.4 - 2010.3

    Country:Japan


Papers

  • RAPLET: Demystifying Publish/Subscribe Latency for ROS Applications Reviewed

    Keisuke Nishimura, Takahiro Ishikawa, Hiroshi Sasaki, Shinpei Kato

    In Proceedings of the IEEE 27th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)   41 - 50   2021


    Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/rtcsa52859.2021.00013


  • Practical Byte-Granular Memory Blacklisting using Califorms. Reviewed

    Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

    In Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO)   558 - 571   2019


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3352460.3358299


  • Why Do Programs Have Heavy Tails? Reviewed

    Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan

    In Proceedings of the 2017 IEEE International Symposium on Workload Characterization (IISWC)   135 - 145   2017


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IISWC.2017.8167771


  • Characterization and Mitigation of Power Contention across Multiprogrammed Workloads Reviewed

    Hiroshi Sasaki, Alper Buyuktosunoglu, Augusto Vega, Pradip Bose

    In Proceedings of the 2016 IEEE International Symposium on Workload Characterization (IISWC)   55 - 64   2016


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    Shared resource contention has been a major performance issue for CMPs. In this paper we focus on power, which is one of the most valuable shared resources of CMPs. We believe it is important to study power contention, especially with the prevalence of power capping features among modern commercial microprocessors. When multiple processes compete for power in such systems, the power management system attempts to mitigate the contention (i.e., reduce the power consumption) by slowing down the processor, which results in degraded total system performance. We characterize this phenomenon using a real testbed with an Intel processor with power capping capability realized by the RAPL technology. We observe noticeable performance degradation for SPEC CPU2006, especially at tighter power caps. In order to solve this problem, we develop a shared resource-aware scheduling algorithm that improves system performance by mitigating the contention for power and the shared memory subsystem at the same time. Evaluation results across a variety of multiprogrammed workloads show performance improvements over a state-of-the-art scheduling policy which only considers memory subsystem contention. In addition, we present a guard mechanism implemented on top of the proposed scheduler that greatly improves performance when there is severe power contention that introduces performance anomalies.

    DOI: 10.1109/IISWC.2016.7581266

    Web of Science


  • Power and Performance Characterization and Modeling of GPU-Accelerated Systems. Reviewed

    Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres

    In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS)   113 - 122   2014


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IPDPS.2014.23


  • McRouter: Multicast within a Router for High Performance Network-on-Chips Reviewed

    Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)   319 - 329   2013


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    The inevitable advent of the multi-core era has driven an increasing demand for low latency on-chip interconnection networks (or NoCs). Being a critical part of the memory hierarchy for modern chip multi-processors (CMPs), these networks face stringent design constraints to provide fast communication with tight power budget. Modern NoC's first-order concern is clearly its latency, while we also find that internal bandwidth of its routers is relatively plentiful; thus, we present a low latency router design utilizing a technique we call "multicast within a router" or McRouter, which allows productive utilization of remaining bandwidth inside a NoC router. McRouter allows a single cycle transfer of flits which shortens the communication latency when there is enough remaining bandwidth within the router. The key idea is to transmit a header flit to all possible output ports (multicast) so that it is always transmitted to the correct output port without relying on route computation. In addition, we find it is affordable with marginal power overhead while still being a stand-alone design by maintaining portability and modularity (unlike look-ahead routing based designs). Our evaluation with application traffic shows that McRouter helps achieving system speed-ups of 1.28, 1.17 and 1.05 over the conventional router (CR), the VSA router (VSAR) and the prediction router (PR), respectively.

    DOI: 10.1109/PACT.2013.6618828

    Web of Science


    Other Link: http://doi.ieeecomputersociety.org/10.1109/PACT.2013.6618828

  • Coordinated Power-Performance Optimization in Manycores. Reviewed

    Hiroshi Sasaki, Satoshi Imamura, Koji Inoue

    In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)   51 - 61   2013


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/PACT.2013.6618803


    Other Link: http://doi.ieeecomputersociety.org/10.1109/PACT.2013.6618803

  • Scalability-Based Manycore Partitioning Reviewed

    Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura

    In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)   107 - 116   2012


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is no longer growing at its historical rate, but the many active processes in a computer system and the continuing development of multi-threaded applications benefit from the growing core counts to sustain system throughput. This trend brings a situation where a number of parallel applications are simultaneously executed on a single system. Since multi-threaded applications try to maximize their throughput by utilizing the whole system, each of them usually creates an equal or larger number of threads compared to the underlying logical core count. This introduces a much greater number of threads to be co-scheduled in the entire system. However, each program has different characteristics (or scalability) and contends with the others for shared resources, namely the CPU cores and memory hierarchies. Therefore, it is clear that OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs via hardware performance monitoring units, (2) decides the optimal number of cores to be allocated to each program, and (3) allocates the cores to programs while maximizing system utilization to achieve fair and maximum performance. The evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.

    DOI: 10.1145/2370816.2370833

    Web of Science


  • Practical Byte-Granular Memory Blacklisting using Califorms. Reviewed

    Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

    CoRR   abs/1906.01838   2019


    Publishing type:Research paper (scientific journal)  


  • Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs Reviewed

    Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa

    IEICE Transactions on Information and Systems   E101D ( 9 )   2247 - 2257   2018.9


    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them is increased. Especially, the capacity of DRAM continues to grow, and it is not rare that DRAM consumes higher power than processors on modern servers. Therefore, a reduction in the DRAM energy consumption is a critical challenge to reduce the system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective to reduce the DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied to modern servers including multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. In order to address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces the DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively.

    DOI: 10.1587/transinf.2017EDP7296

    Web of Science


  • Power-Efficient Breadth-First Search with DRAM Row Buffer Locality-Aware Address Mapping Reviewed

    Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa

    In 2016 High Performance Graph Data Management and Processing Workshop (HPGDMP)   17 - 24   2017.1


    Publishing type:Research paper (international conference proceedings)  

    Graph analysis applications have been widely used in real services such as road-traffic analysis and social network services. Breadth-first search (BFS) is one of the most representative algorithms for such applications; therefore, many researchers have tuned it to maximize performance. On the other hand, owing to the strict power constraints of modern HPC systems, it is necessary to improve power efficiency (i.e., performance per watt) when executing BFS. In this work, we focus on the power efficiency of DRAM and investigate the memory access pattern of a state-of-the-art BFS implementation using a cycle-accurate processor simulator. The results reveal that the conventional address mapping schemes of modern memory controllers do not efficiently exploit row buffers in DRAM. Thus, we propose a new scheme called per-row channel interleaving and improve the DRAM power efficiency by 30.3% compared to a conventional scheme for a certain simulator setting. Moreover, we demonstrate that this proposed scheme is effective for various configurations of memory controllers.

    DOI: 10.1109/HPGDMP.2016.010

    Scopus


  • Enhanced Dependence Graph Model for Critical Path Analysis on Modern Out-of-Order Processors. Reviewed

    Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Hiroshi Sasaki

    IEEE Computer Architecture Letters.   16 ( 2 )   111 - 114   2017


    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/LCA.2017.2684813


  • Heavy Tails in Program Structure. Reviewed

    Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan

    IEEE Computer Architecture Letters.   16 ( 1 )   34 - 37   2017


    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Electrical and Electronics Engineers Inc.  

    Designing and optimizing computer systems require deep understanding of the underlying system behavior. Historically many important observations that led to the development of essential hardware and software optimizations were driven by empirical observations about program behavior. In this paper, we report an interesting property of program structures by viewing dynamic program execution as a changing network. By analyzing the communication network created as a result of dynamic program execution, we find that communication patterns follow heavy-tailed distributions. In other words, a few instructions have consumers that are orders of magnitude larger than most instructions in a program. Surprisingly, these heavy-tailed distributions follow the iconic power law previously seen in man-made and natural networks. We provide empirical measurements based on the SPEC CPU2006 benchmarks to validate our findings as well as perform semantic analysis of the source code to reveal the causes of such behavior.

    DOI: 10.1109/LCA.2016.2574350

    Scopus


  • Mitigating Power Contention: A Scheduling Based Approach. Reviewed

    Hiroshi Sasaki, Alper Buyuktosunoglu, Augusto Vega, Pradip Bose

    IEEE Computer Architecture Letters.   16 ( 1 )   60 - 63   2017


    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/LCA.2016.2572080


  • A Runtime Optimization Selection Framework to Realize Energy Efficient Networks-on-Chip Reviewed

    Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    IEICE Transactions on Information and Systems   E99D ( 12 )   2881 - 2890   2016.12


    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    Networks-on-Chip (or NoCs, for short) play important roles in modern and future multi-core processors as they are highly related to both performance and power consumption of the entire chip. Up to date, many optimization techniques have been developed to improve NoC's bandwidth, latency and power consumption. But a clear answer to how energy efficiency is affected with these optimization techniques is yet to be found since each of these optimization techniques comes with its own benefits and overheads while there are also too many of them. Thus, here comes the problem of when and how such optimization techniques should be applied. In order to solve this problem, we build a runtime framework to throttle these optimization techniques based on concise performance and energy models. With the help of this framework, we can successfully establish adaptive selections over multiple optimization techniques to further improve performance or energy efficiency of the network at runtime.

    DOI: 10.1587/transinf.2016PAP0026

    Web of Science


  • A scalability analysis of many cores and on-chip mesh networks on the TILE-Gx platform Reviewed

    Ye Liu, Hiroshi Sasaki, Shinpei Kato, Masato Edahiro

    In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)   46 - 52   2016


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    TILE-Gx processors that have emerged in recent years can be considered as the representative of prevailing manycore processors. The available TILE-Gx processors are featured with directory-based cache coherence protocol, two-dimensional mesh networks and up to 72 on-chip cores. In this paper, we study and analyze problems of performance scalability and network collision of many-core processors using the TILE-Gx36 processor.
    We find that most multi-threaded programs from the PARSEC benchmark suite, which aim at shared-memory on-chip processors, cannot scale well on Linux as the number of cores increases. Meanwhile, applications compiled with Pthreads get affected by the approach of task-to-core assignment. The results also show that current multi-threaded applications do not entirely utilize the hardware resources on the TILE-Gx36 processor. Moreover, OS designers might need to pay attention to the memory allocation if memory striping is not supported, because huge memory accesses to only one memory controller can burden the two-dimensional mesh network. This observation appears if cores access the farther memory controllers intensively as well.

    DOI: 10.1109/MCSoC.2016.40

    Web of Science


  • Runtime Multi-Optimizations for Energy Efficient On-chip Interconnections Reviewed

    Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    In Proceedings of the 33nd IEEE International Conference on Computer Design (ICCD)   455 - 458   2015


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    On-chip interconnection (or NoC) is a major performance and power contributor to modern and future multicore processors. So far, many optimization techniques have been developed to improve its bandwidth, latency and power consumption. But it is not clear how energy efficiency is affected since an optimization technique normally comes with overheads. This paper thus attempts to address when and how such optimization techniques should be applied and tuned to help achieve better energy efficiency. We firstly model the performance and energy impacts of representative NoC optimization techniques. These models help us more easily understand the consequences when applying these optimization techniques and their combinations under different circumstances. Moreover, based on such modeling, we propose and implement an adaptive control over these NoC optimization techniques to improve both performance and energy efficiency of the network. Our results show that, this proposal can achieve an average improvement of 26% and 57% on network performance and energy delay product, respectively.

    DOI: 10.1109/ICCD.2015.7357147

    Web of Science


  • A Flexible Hardware Barrier Mechanism for Many-Core Processors Reviewed

    Takeshi Soga, Hiroshi Sasaki, Tomoya Hirao, Masaaki Kondo, Koji Inoue

    In Proceedings of the 20th Asia and South Pacific Design Automation Conference (ASP-DAC)   61 - 68   2015


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper proposes a new hardware barrier mechanism which offers the flexibility to select which cores should join the synchronization, allowing for executing multiple multi-threaded applications by dividing a many-core processor into several groups. Experimental results based on an RTL simulation show that our hardware barrier achieves a 66-fold reduction in latency over typical software based implementations, with a hardware overhead of the processor of only 1.8%. Additionally, we demonstrate that the proposed mechanism is sufficiently flexible to cover a variety of core groups with minimal hardware overhead.

    DOI: 10.1109/ASPDAC.2015.7058982

    Web of Science


  • Power-Capped DVFS and Thread Allocation with ANN Models on Modern NUMA Systems. Reviewed

    Satoshi Imamura, Hiroshi Sasaki, Koji Inoue, Dimitrios S. Nikolopoulos

    In Proceedings of the 32nd IEEE International Conference on Computer Design (ICCD)   324 - 331   2014


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICCD.2014.6974701


  • SMYLEref: A Reference Architecture for Manycore-Processor SoCs Invited Reviewed

    Masaaki Kondo, S. T. Nguyen, Tomoya Hirao, Takeshi Soga, Hiroshi Sasaki, Koji Inoue

    In Proceedings of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC)   561 - 564   2013


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    Nowadays, the trend of developing micro-processors with tens of cores brings a promising prospect for embedded systems. Realizing a high performance and low power many-core processor is becoming a primary technical challenge. We are currently developing a many-core processor architecture for embedded systems as a part of a NEDO project. This paper introduces the many-core architecture called SMYLEref along with the concept of Virtual Accelerator on Many-core, in which many cores on a chip are utilized as a hardware platform for realizing multiple virtual accelerators. We are developing its prototype system with off-the-shelf FPGA evaluation boards. In this paper, we introduce the architecture of SMYLEref and the details of the prototype system. In addition, several initial experiments with the prototype system are also presented.

    DOI: 10.1109/ASPDAC.2013.6509656

    Web of Science


  • Line Sharing Cache: Exploring Cache Capacity with Frequent Line Value Locality Reviewed

    Keitarou Oka, Hiroshi Sasaki, Koji Inoue

    In Proceedings of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC)   669 - 674   2013


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper proposes a new last level cache architecture called line sharing cache (LSC), which can reduce the number of cache misses without increasing the size of the cache memory. It stores lines which contain the identical value in a single line entry, which enables to store greater amount of lines. Evaluation results show performance improvements of up to 35% across a set of SPEC CPU2000 benchmarks.

    DOI: 10.1109/ASPDAC.2013.6509677

    Web of Science


  • Predict-More Router: A Low Latency NoC Router with More Route Predictions. Reviewed

    Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

    In Proceedings of the 2013 IEEE International Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), Communication Architecture for Scalable Systems (CASS)   842 - 850   2013


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    Network-on-Chip (NoC) is a critical part of the memory hierarchy of emerging multicores. Lowering its communication latency while preserving its bandwidth is key to achieving high system performance. By now, one of the most effective methods that helps achieve this goal is the prediction router (PR). PR works by predicting the route an incoming packet may be transferred to: it speculatively allocates resources (virtual channels and the switch crossbar) to the packet and traverses the packet's flits using this predicted route in a single cycle without waiting for route computation; however, if the prediction misses, the packet will then be processed in the conventional pipeline (in our work, four cycles) and the speculatively allocated router resources will be wasted. Obviously, prediction accuracy contributes to the amount of successful predictions, latency reduction and bandwidth consumption. We find that predictions hit around 65% for most applications even under the best algorithm, so in such cases PR can at most accelerate about 65% of the packets while the remaining 35% will consume extra router resources and bandwidth. In order to increase the prediction accuracy, we propose a technique which makes use of multiple prediction algorithms at the same time for one incoming packet. Such a prediction is more accurate. With this proposal, we design and implement the predict-more router (PmR). While effectively increasing the prediction accuracy, PmR also helps utilize the remaining bandwidth within the router more productively. When both PmR and PR are evaluated under their best algorithm(s), we find that PmR is over 15% higher in prediction accuracy than PR, which helps PmR outperform PR by 3.5% on average in speeding-up the system. We also find that although PmR creates more contentions in prediction, these contentions can be well resolved and are kept within the router so both router internal bandwidth and link bandwidth are not exacerbated with it.

    DOI: 10.1109/IPDPSW.2013.40

    Scopus


  • Power and Performance of GPU-Accelerated Systems: A Closer Look. Reviewed

    Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres

    In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC)   109 - 110   2013


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/IISWC.2013.6704675


  • Power and Performance Analysis of GPU-Accelerated Systems. Reviewed

    Yuki Abe, Hiroshi Sasaki, Martin Peres, Koji Inoue, Kazuaki J. Murakami, Shinpei Kato

    In 2012 Workshop on Power-Aware Computing and Systems (HotPower)   2012


    Publishing type:Research paper (international conference proceedings)  


  • Performance Evaluation of 3D Stacked Multi-Core Processors with Temperature Consideration. Reviewed

    Takaaki Hanada, Hiroshi Sasaki, Koji Inoue, Kazuaki J. Murakami

    In Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC)   1 - 5   2012


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/3DIC.2012.6263025


  • Energy-Efficient Dynamic Instruction Scheduling Logic Through Instruction Grouping Reviewed

    Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    IEEE Transactions on Very Large Scale Integration Systems (TVLSI)   17 ( 6 )   848 - 852   2009.6


    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC  

    Dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out-of-order execution. We propose a novel microarchitectural technique to reduce the complexity and energy consumption of the dynamic instruction scheduling logic. The proposed method groups several instructions as a single issue unit and reduces the required number of ports and the size of the structure. This paper describes the microarchitecture mechanisms and shows evaluation results for energy savings and performance. These results reveal that the proposed technique can greatly reduce energy with almost no performance degradation, compared to the conventional dynamic instruction scheduling logic.

    DOI: 10.1109/TVLSI.2009.2013397

    Web of Science


  • Power-Performance Modeling of Heterogeneous Cluster-Based Web Servers. Reviewed

    Hiroshi Sasaki, Takatsugu Oya, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 2009 20th IEEE/ACM International Conference on Grid Computing (Grid)   35 ( 1 )   225 - 231   2009


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/GRID.2009.5353057


  • Cooperative Shared Resource Access Control for Low-Power Chip Multiprocessors Reviewed

    Noriko Takagi, Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)   177 - 182   2009


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ASSOC COMPUTING MACHINERY  

    In a single-chip multiprocessor (CMP), the last-level cache and its lower memory hierarchy components are typically shared by multiple processors. Conflicts in these resources lead to poor overall performance of the CMP and/or unpredictable performance of the individual cores. If applications on different; cores have different performance constraints, even though these constraints can be satisfied by dynamic voltage and frequency scaling (DVFS) control of each core, conflicts in shared resources will lead to increased power consumption. Therefore, in the present paper, we derive a condition whereby, under resource conflicts, the total power consumption is minimized by a newly developed power consumption model and propose a method by which to minimize the power consumption of CMPs by cooperative access control of multiple shared resources and DVFS control. Experimental results reveal that the proposed technique can reduce power consumption by 15% on average in a dual-core CMP and by 13% in a quad-core CMP, as compared to the case in which only DVFS control is applied.

    DOI: 10.1145/1594233.1594278

    Web of Science


  • Improving Fairness, Throughput and Energy-Efficiency on a Chip Multiprocessor through DVFS. Reviewed

    Masaaki Kondo, Hiroshi Sasaki, Hiroshi Nakamura

    SIGARCH Computer Architecture News   35 ( 1 )   31 - 38   2007


    Publishing type:Research paper (scientific journal)  

    DOI: 10.1145/1241601.1241609


  • An Intra-Task DVFS Technique Based on Statistical Analysis of Hardware Events. Reviewed

    Hiroshi Sasaki, Yoshimichi Ikeda, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 4th ACM International Conference on Computing Frontiers (CF)   123 - 130   2007


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/1242531.1242551


  • Energy-Efficient Dynamic Instruction Scheduling Logic through Instruction Grouping. Reviewed

    Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    In Proceedings of the 2006 ACM International Symposium on Low Power Electronics and Design (ISLPED)   43 - 48   2006


    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/1165573.1165585


  • Dynamic Instruction Cascading on GALS Microprocessors Reviewed

    Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura

    In 2005 International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)   30 - 39   2005


    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

    As the difficulty and cost of distributing a single global clock throughout a processor grow generation by generation, Globally-Asynchronous Locally-Synchronous (GALS) designs are an alternative approach to conventional synchronous processors.
    In this paper, we propose Dynamic Instruction Cascading (DIC). DIC is a technique to execute two dependent instructions in one cycle by scaling down the clock frequency. Lowering the clock frequency enables the signal to reach farther, thereby computing two instructions in one cycle becomes possible. DIC is effectively applied to GALS processors because lowering only the clock frequency of the target domain is needed and therefore unwanted performance degradation will be prevented.
    The results showed average performance improvement of 7% on SPEC CPU2000 Integer and MediaBench applications when assuming that DIC is possible by lowering the clock frequency to 80%.

    DOI: 10.1007/11556930_4

    Web of Science


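Several of the papers above, including Scalability-Based Manycore Partitioning (PACT 2012), center on dividing the cores of a manycore machine among co-running parallel programs according to each program's scalability. As a minimal illustration of that idea, and not a reproduction of any paper's actual algorithm, the following sketch greedily hands each spare core to whichever program gains the most marginal speedup; the Amdahl's-law curves and parallel fractions are hypothetical stand-ins for the scalability predictions the paper derives from hardware performance counters.

```python
def speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's-law speedup of a program run on `cores` cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def partition_cores(parallel_fractions, total_cores):
    """Give every program one core, then assign each remaining core to
    the program with the largest marginal speedup gain."""
    alloc = [1] * len(parallel_fractions)
    for _ in range(total_cores - len(parallel_fractions)):
        gains = [speedup(p, c + 1) - speedup(p, c)
                 for p, c in zip(parallel_fractions, alloc)]
        alloc[gains.index(max(gains))] += 1
    return alloc

if __name__ == "__main__":
    # Hypothetical workload mix: 99%-, 80%-, and 40%-parallel programs
    # sharing a 16-core machine; the most scalable program gets the most cores.
    print(partition_cores([0.99, 0.80, 0.40], 16))
```

The published scheduler additionally re-estimates scalability at runtime and rebalances allocations; this static greedy pass only conveys the core-allocation objective.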

Research Projects

  • A Low-Cost, Flexible, and Highly Reliable Memory Architecture Based on Network Analysis

    Grant number:22K19771  2022 - 2024

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Challenging Research (Exploratory)

    SASAKI Hiroshi


    Authorship:Principal investigator 

    Grant amount: ¥6,500,000 ( Direct Cost: ¥5,000,000, Indirect Cost: ¥1,500,000 )


  • Research and Development of a RISC-V System Design Platform

    2021 - 2024

    New Energy and Industrial Technology Development Organization (NEDO)  Development of AI Chip and Next-Generation Computing Technologies for High-Efficiency and High-Speed Processing / R&D Item 4: Technology Development for Accelerating the Industrial Application of AI Edge Computing


    Authorship:Coinvestigator(s) 


  • Research on Energy-Secure Computer Systems

    Grant number:26700004  2014 - 2018

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists (A)

    SASAKI Hiroshi


    Authorship:Principal investigator 

    Grant amount: ¥4,290,000 ( Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000 )


  • Research on Hardware/Software Cooperative Techniques for Safe, Stable, and Power-Efficient Computer Systems

    2014 - 2016

    Japan Society for the Promotion of Science  Overseas Research Fellowships 


    Authorship:Principal investigator 


  • Dynamic optimization of CMPs based on statistical analysis

    Grant number:21700054  2009 - 2010

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists (B)

    SASAKI Hiroshi


    Grant amount: ¥4,420,000 ( Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000 )

    In a chip multiprocessor (CMP) architecture, multiple cores usually share resources in the memory hierarchy including the last-level cache, the memory bus, and the DRAM memory banks. We derive the condition under which the total CPU power consumption becomes minimum by constructing a power consumption model under resource conflicts, and propose a novel dynamic optimization method that minimizes the power consumption through cooperative access control of multiple shared resources combined with DVFS.

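The project summary above, like the DVFS papers in the publication list, revolves around trading clock frequency and supply voltage against performance to reduce power. As a rough, self-contained sketch of that trade-off rather than any project's actual method, the code below picks, from a table of hypothetical voltage/frequency operating points, the point that minimizes a simple f·V² dynamic-energy model while still meeting a runtime deadline; all numbers are illustrative, not measured values.

```python
# Hypothetical (frequency GHz, voltage V) operating points, fastest first.
OPERATING_POINTS = [(2.0, 1.10), (1.6, 1.00), (1.2, 0.90), (0.8, 0.80)]

def runtime(giga_cycles: float, freq_ghz: float) -> float:
    """Seconds needed to execute `giga_cycles` at `freq_ghz`."""
    return giga_cycles / freq_ghz

def energy(giga_cycles: float, freq_ghz: float, volt: float) -> float:
    """Relative dynamic energy: power (~ f * V^2) times runtime."""
    return (freq_ghz * volt ** 2) * runtime(giga_cycles, freq_ghz)

def pick_point(giga_cycles: float, deadline_s: float):
    """Lowest-energy point that still meets the deadline; if none does,
    fall back to the fastest point."""
    feasible = [p for p in OPERATING_POINTS
                if runtime(giga_cycles, p[0]) <= deadline_s]
    if not feasible:
        return OPERATING_POINTS[0]
    return min(feasible, key=lambda p: energy(giga_cycles, p[0], p[1]))

if __name__ == "__main__":
    # A 2.4 giga-cycle task with 2.5 s of slack can afford a lower point.
    print(pick_point(giga_cycles=2.4, deadline_s=2.5))
```

Under this simple model energy per task is proportional to V² alone, so the selector naturally gravitates to the lowest-voltage point whose frequency still meets the deadline, which is the intuition behind deadline-driven DVFS.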