Publications

FY2026

Refereed Papers

Chen Zhuang, Lingqi Zhang, Benjamin Brock, Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication. In Proceedings of ACM International Conference on Supercomputing (ICS 2026), pp. 422-435, Belfast, July 6-9, 2026. (accepted)
[DOI: 10.1145/3797905.3807833] [Conference]

Wengang Li, Lingqi Zhang, Toshio Endo, Mohamed Wahib. Understanding Cross-layer Contributions to Mixture-of-Experts Routing in LLMs. In Proceedings of The Fourteenth International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, April 23-27, 2026.
[Conference] [OpenReview]

FY2025

Refereed Papers

Lingqi Zhang, Tengfei Wang, Jiajun Huang, Chen Zhuang, Ivan R. Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib. FRUGAL: Pushing GPU Applications beyond Memory Limits. In Proceedings of the International Symposium on Code Generation and Optimization (CGO 2026), pp. 188-201, Sydney, January 31-February 4, 2026.
[DOI: 10.1109/CGO68049.2026.11395210] [Conference]

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit. Optimizing Intra-Layer Parallel Communication for LLM Training on Systems with Fully-Connected Mesh GPU Topology. In Proceedings of The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2026), pp. 328-339, Osaka, January 26-29, 2026.
[DOI: 10.1145/3773656.3773675] [Conference]

Shohei Minami, Toshio Endo, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida. Physical System Study on Balancing Interactive and Batch Job Performance through Oversubscribing Scheduling. 6th Combined Workshop on Interactive and Urgent High Performance Computing (CIW-IUS), In Proceedings of SC Workshops '25, pp. 2137-2145, Saint Louis, November 21, 2025.
[DOI: 10.1145/3731599.3767472]

Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert. Dynamic Thread Coarsening for CPU and GPU OpenMP Code. LLVM-HPC2025: The 11th Workshop on the LLVM Compiler Infrastructure in HPC, In Proceedings of SC Workshops '25, pp. 1066-1074, Saint Louis, November 17, 2025.
[DOI: 10.1145/3731599.3767482]

Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers. In Proceedings of ACM International Conference on Supercomputing (ICS 2025), pp. 57-72, Salt Lake City, June 8-11, 2025.
[DOI: 10.1145/3721145.3730422] [Conference]

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit. An Optimization Technique for Hiding Communication Costs in 3D Parallel Training of Deep Learning. In Proceedings of the 25th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2025), pp. 472-481, Tromso, May 19-22, 2025.
[DOI: 10.1109/CCGRID64434.2025.00044] [Symposium] [Best Paper Award]

Unrefereed Papers

Chen Zhuang, Lingqi Zhang, Benjamin Brock, Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication. arXiv:2512.20178 [cs.DC], December 2025.

Chen Zhuang, Emmanuel Jeannot, Toshio Endo, Mohamed Wahib. Topology-aware Process Mapping for Distributed Graph Convolutional Network Training. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2025), IPSJ SIG Technical Report, 2025-HPC-200, No.16, Takamatsu, August 4-5, 2025.

Du Wu, Enzhi Thang, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. High-performance high-throughput high-resolution X-ray CT reconstruction. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2025), IPSJ SIG Technical Report, 2025-HPC-200, No.36, Takamatsu, August 4-5, 2025.

Invited Talks

Toshio Endo. (Status Report) Supercomputing Research Center/Center for Information Infrastructure, Institute of Science Tokyo. Vision and Strategy: How will supercomputing centers contribute to the future development of HPC/AI+?, Invited Session at SCA/HPC Asia 2026, Osaka, January 28, 2026.

Toshio Endo. TSUBAME4.0: More of Everyone's Supercomputer toward Future Computing. The 9th ISM-ISCT-NII-ZIB-NUS-MODAL Workshop on Optimization and Machine Learning for Data Science and Future Computing, Tokyo, September 29, 2025.

Poster Presentations

Muyao Xiao, Ivan R. Ivanov, Jens Domke, Toshio Endo. Bridge Over Troubled Water: Offloading OpenMP Regions to XLA via StableHLO. The 25th International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2026), poster session, P-252, Osaka, January 26-29, 2026.

FY2024

Refereed Papers

Ivan R. Ivanov, Jens Domke, Toshio Endo, and Johannes Doerfert. Automatic Parallelization and OpenMP Offloading of Fortran Array Notation. In Proceedings of 20th International Workshop on OpenMP (IWOMP 2024), LNCS 15195, pp. 197-209, Perth, Sep 23-25, 2024.
[DOI: 10.1007/978-3-031-72567-8_13]

Du Wu, Peng Chen, Xiao Wang, Issac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. Real-time High-resolution X-Ray Computed Tomography. In Proceedings of ACM International Conference on Supercomputing (ICS 2024), pp. 110-123, Kyoto, June 4-7, 2024.
[DOI: 10.1145/3650200.3656634] [Conference]

Toshio Endo, Shohei Minami, Akihiro Nomura, Hiroki Ohtsuji, Jun Kato, Masahiro Miwa, Eiji Yoshida, Tomoya Yuki, and Ryuichi Sakamoto. Challenges in Computing Resource Sharing towards Next-Gen Interactive Accelerated HPC. Third Combined Workshop on Interactive and Urgent High-Performance Computing (CIW-IUS), in conjunction with ISC24, High Performance Computing. ISC High Performance 2024 International Workshops, LNCS 15058, Springer, pp. 231-242, Hamburg, May 16, 2024.
[DOI: 10.1007/978-3-031-73716-9_16] [Workshop]

Unrefereed Papers

Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. SuperGCN: General and Scalable Framework for GCN Training on CPU-powered Supercomputers. arXiv:2411.16025 [cs.DC], November 2024.

Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. Leveraging GPUDirect Storage for Efficient Image Reconstruction. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report, 2024-HPC-195, No.5, Tokushima, August 8-9, 2024.

Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. High-performance Graph Convolutional Networks Training on Fugaku and ABCI Supercomputers. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report, 2024-HPC-195, No.14, Tokushima, August 8-9, 2024.

Tengfei Wang, Lingqi Zhang, Ivan Ivanov, Peng Chen, Toshio Endo, Mohamed Wahib. FRUGAL: Reducing GPU Memory Requirement of HPC Applications. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2024), IPSJ SIG Technical Report, 2024-HPC-195, No.27, Tokushima, August 8-9, 2024.

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit. An optimization pass for training speed-up and strategy search in 3D parallelism. IPSJ SIG Technical Report, 2024-HPC-194, No.7, Yokohama, May 8, 2024.

Poster Presentations

Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, and Edouard Audit. An optimization pass for training speed-up and strategy search in 3D parallelism. 2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) poster session, Kobe, Sep 24-27, 2024.

Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, and Mohamed Wahib. Communication Optimization for Distributed GCN Training on ABCI Supercomputer. 2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) poster session, Kobe, Sep 24-27, 2024.

Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, and Mohamed Wahib. Investigating Nvidia GPU Architecture Trends via Microbenchmarks. 2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) poster session, Kobe, Sep 24-27, 2024.

Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, and Mohamed Wahib. Asynchronous I/O Optimization for X-ray Imaging via GPUDirect Storage. 2024 IEEE International Conference on Cluster Computing (CLUSTER 2024) poster session, Kobe, Sep 24-27, 2024.

FY2023

Refereed Papers

Ivan Radanov Ivanov, Oleksandr Zinenko, Jens Domke, Toshio Endo, William S. Moses. Retargeting and Respecializing GPU Workloads for Performance Portability. In Proceedings of the International Symposium on Code Generation and Optimization (CGO 2024), pp. 119-132, Edinburgh, March 2-6, 2024.
[DOI: 10.1109/CGO57630.2024.10444828] [Conference]

Ivan Radanov Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert. Automatic Parallelization and OpenMP Offloading of Fortran. In Proceedings of LLVM Performance Workshop, in conjunction with CGO 2024, Edinburgh, March 2, 2024.
[Workshop]

Futa Kambe, Toshio Endo. Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking. In Proceedings of the Workshop on General Purpose Processing using GPU (GPGPU 2024), in conjunction with PPoPP 2024, 6 pages, Edinburgh, March 2, 2024.
[DOI: 10.1145/3649411.3649412] [Workshop]

Ryubu Hosoki, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training. In Proceedings of The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2024), pp. 117-126, Nagoya, January 25-27, 2024.
[DOI: 10.1145/3635035.3635045] [Conference]

Shohei Minami, Toshio Endo, Akihiro Nomura. The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System. In Proceedings of IEEE High Performance Extreme Computing Conference (HPEC 2023), Virtual, September 23-27, 2023.
[DOI: 10.1109/HPEC58863.2023.10363580] [Conference]

Chenyu Wang, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. Pyramid Swin Transformer for Multi-Task: Expanding to More Computer Vision Tasks. In Proceedings of Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2023), Springer, LNCS Vol. 14124, pp. 53-65, Kumamoto, August 21-22, 2023.
[DOI: 10.1007/978-3-031-45382-3_5] [Conference]

Hayato Fujita, Akihiro Nomura, Toshio Endo, Masakazu Sekijima. Enhancing the Performance of AlphaFold Through Modified Storage Method and Optimization of HHblits on TSUBAME3.0 Supercomputer. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), July 24, 2023.
[DOI: 10.1109/csce60160.2023.00351]

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. In Proceedings of ACM International Conference on Supercomputing (ICS 2023), pp. 167-179, Orlando, June 21-23, 2023.
[DOI: 10.1145/3577193.3593705] [Conference]

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. Revisiting Temporal Blocking Stencil Optimizations. In Proceedings of ACM International Conference on Supercomputing (ICS 2023), pp. 251-263, Orlando, June 21-23, 2023.
[DOI: 10.1145/3577193.3593716] [Conference]

Unrefereed Papers

Du Wu, Peng Chen, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. High Throughput 3D Image Reconstruction with GPUDirect and Tensor Core. IPSJ SIG Technical Report, 2024-HPC-193, No.25, 9 pages, March 18-19, 2024.

Chen Zhuang, Peng Chen, Xin Liu, Satoshi Matsuoka, Toshio Endo, Mohamed Wahib. Scalable Training of Graph Convolutional Networks on Supercomputers. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report, 2023-HPC-190, No.19, 10 pages, August 2-4, 2023.

Lingqi Zhang, Mohamed Wahib, Peng Chen, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka. High-performance Temporal Blocking Stencils at Low GPU Occupancy. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report, 2023-HPC-190, No.26, 10 pages, August 2-4, 2023.

Poster Presentations

Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. Optimizing Matrix Multiplication on Arm Architectures. The 6th R-CCS International Symposium, poster session, January 29-30, 2024.

Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Mohamed Wahib. General and Scalable Framework for GCN Training on CPU-powered Supercomputers. The 6th R-CCS International Symposium, poster session, January 29-30, 2024.

Shohei Minami, Toshio Endo, Akihiro Nomura. The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System. The cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2023), poster session, August 2-4, 2023.

FY2022

Refereed Papers

Shohei Minami, Toshio Endo, Akihiro Nomura. Effectiveness of the Oversubscribing Scheduling on Supercomputer Systems. In Proceedings of High Performance Computing in the Asia-Pacific Region (HPC ASIA), pp. 18-28, Singapore, February 2023.
[DOI: 10.1145/3578178.3578221] [Conference]

William S. Moses, Ivan Radanov Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. In Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2023), pp. 119-134, Montreal, February 2023.
[DOI: 10.1145/3572848.3577475] [Symposium]

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. Exploiting Scratchpad Memory for Deep Temporal Blocking. In Proceedings of the 15th Workshop on General Purpose Processing Using GPU (GPGPU 2023), co-located with PPoPP 2023, short paper, Montreal, February 2023.
[DOI: 10.1145/3589236.3589242] [Workshop]

Chenyu Wang, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5 VISAPP, SciTePress, pp. 583-590, VISAPP 2023, Lisbon (hybrid), February 2023.
[DOI: 10.5220/0011675800003417] [Conference]

Hiroki Aikawa, Toshio Endo, Tomoya Yuki, Takahiro Hirofuchi, Tsutomu Ikegami. Efficient Stencil Computation with Temporal Blocking by Halide DSL. In Proceedings of 20th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp. 870-877, online, December 2022.
[DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00116] [Conference]

Chenyu Wang, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. Speed-up Single Shot Detector on GPU with CUDA. In Proceedings of 23rd ACIS International Summer Virtual Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD2022-Summer), Kyoto (online), Studies in Computational Intelligence, vol 1074. Springer, pp. 89-106, July 2022.
[DOI: 10.1007/978-3-031-19604-1_7] [Conference]

Unrefereed Papers

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. Breaking the Memory Bottleneck for Iterative Memory-bound Applications Via Persistent Kernels. IPSJ SIG Technical Report, 2022-HPC-187, No.18, 10 pages, December 2022.

William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. arXiv:2207.00257 [cs.PL], July 2022.

FY2021

Refereed Papers

Shohei Minami, Toshio Endo and Akihiro Nomura. Measurement and Modeling of Performance of HPC Applications towards Overcommitting Scheduling Systems. In Proceedings of 24th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2021), in Conjunction with IPDPS 2021, pp. 59-79, Portland (online), May 2021.
[DOI: 10.1007/978-3-030-88224-2_4] [Springer site] [slides@JSSPP site]

Unrefereed Papers

Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuke Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi. mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations. arXiv:2203.14188 [cs.LG], March 2022.

FY2020

Poster Presentations

Ivan R. Ivanov, Jens Domke, Akihiro Nomura and Toshio Endo. Improved failover for HPC interconnects through localised routing restoration. The 3rd R-CCS International Symposium, poster session, Feb 2021.

Shohei Minami, Toshio Endo, Akihiro Nomura. Performance Modeling of HPC Applications on Overcommitted Systems. HPC Asia 2021, poster session, Jan 2021.

FY2019

Refereed Papers

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka. AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs. In Proceedings of International Symposium on Code Generation and Optimization (CGO 2020), pp. 199-211, San Diego, Feb 2020.
[DOI: 10.1145/3368826.3377904] [ACM digital library]

Toshio Endo. Integrating Cache Oblivious Approach with Modern Processor Architecture: The Case of Floyd-Warshall Algorithm. In Proceedings of HPC Asia 2020, Fukuoka, Jan 2020.
[DOI: 10.1145/3368474.3368477] [ACM digital library] [paper] [slides]

Unrefereed Papers

Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo. Profiling based Out-of-core Hybrid Method for Large Neural Networks. arXiv:1907.05013 [cs.LG], July 2019.

Poster Presentations

Tomoya Yuki, Toshio Endo. Toward Latency-Aware Data Arrangement on Many-Core Processors. HPC Asia 2020, poster session, No. 51, Fukuoka, Jan 2020.
[abstract]

FY2018

Refereed Papers

Yukinori Sato, Tomoya Yuki, and Toshio Endo. An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation. ACM Transactions on Architecture and Code Optimization (TACO). Volume 15, Issue 4, Article No. 67, 23 pages. Jan 2019.
[DOI:10.1145/3293449] [ACM library]

Ryo Matsumiya, Toshio Endo. Scalable RMA-based Communication Library Featuring Node-local NVMs. In Proceedings of 2018 IEEE High Performance Extreme Computing Conference (HPEC 2018), 7 pages. Sep 2018.
[DOI:10.1109/HPEC.2018.8547546] [IEEE library] [paper]

Toshio Endo. Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy. In Proceedings of the 7th IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA 2018), pp.19-24. Aug 2018.
[DOI: 10.1109/NVMSA.2018.00016] [IEEE library] [paper] [slides]

Book Chapters

Toshio Endo, Hiroko Midorikawa, Yukinori Sato. Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era. Advanced Software Technologies for Post-Peta Scale Computing, Mitsuhisa Sato (Ed), Springer, pp. 227-248, Jan 2019.
[ISBN: 978-981-13-1923-5, 978-981-13-1924-2 (online)] [DOI: 10.1007/978-981-13-1924-2] [Springer link]

Poster Presentations

Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo. Profiling based out-of-core hybrid method for large neural networks. The 24th ACM Symposium on Principles and Practice of Parallel Programming, poster session, Washington DC, Feb 2019.
[DOI: 10.1145/3293883.3298790] [ACM library]

FY2017

Refereed Papers

Noboru Tanabe and Toshio Endo. Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels. 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2018), pp. 249-254, Cambridge, March 2018.
[DOI: 10.1109/PDP2018.2018.00042]

Noboru Tanabe and Toshio Endo. Exhaustive Evaluation of Memory-Latency Sensitivity on Manycore Processors with Large Cache. 2018 2nd International Conference on High Performance Compilation, Computing and Communications (HP3C-2018), pp. 27-34, Hong Kong, March 2018.
[DOI: 10.1145/3195612.3195616]

Yuki Ito, Ryo Matsumiya, and Toshio Endo. ooc_cuDNN: Accommodating Convolutional Neural Networks over GPU Memory Capacity. In Proceedings of 2017 IEEE International Conference on Big Data (IEEE BigData 2017), pp. 183-192, Boston, December 2017.
[DOI: 10.1109/BigData.2017.8257926] [IEEE digital library]

Shota Kuroda, Toshio Endo, Satoshi Matsuoka. Applying Temporal Blocking with a Directive-based Approach. In Proceedings of Fourth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), in conjunction with SC17, Article No. 8, Denver, November 13, 2017.
[DOI: 10.1145/3148173.3148190] [ACM digital library] [paper] [slides]

Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki. A Stencil Framework to Realize Large-scale Computations Beyond Device Memory Capacity on GPU Supercomputers. In Proceedings of IEEE International Conference on Cluster Computing (CLUSTER 2017), pp. 525-529, Honolulu, September 2017.
[DOI: 10.1109/CLUSTER.2017.97]

Yukinori Sato and Toshio Endo. An Accurate Simulator of Cache-line Conflicts to Exploit the Underlying Cache Performance. In Proceedings of 23rd International European Conference on Parallel and Distributed Computing (Euro-par 2017), pp. 119-133, Santiago, Spain, August 2017.
[DOI: 10.1007/978-3-319-64203-1_9]

Yukinori Sato, Tomoya Yuki and Toshio Endo. ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime. In Proceedings of ACM International Conference on Computing Frontiers 2017, 10 pages, Siena, May 2017.
[DOI: 10.1145/3075564.3077627]

Articles

Satoshi Matsuoka, Toshio Endo, Akira Nukada, Shinichi Miura, Akihiro Nomura, Hitoshi Sato, Hideyuki Jitsumoto, Aleksandr Drozd. Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data. Global Scientific Information and Computing Center, Tokyo Institute of Technology, e-Science Journal, Vol. 16, pp. 2-9, November 2017.

Poster Presentations

Yuki Ito, Ryo Matsumiya, and Toshio Endo. ooc_cuDNN: A Deep Learning Library Supporting CNNs over GPU Memory Capacity. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2018) Poster Session. Tokyo, January 2018.

Ryo Matsumiya and Toshio Endo. vGASNet: A PGAS Communication Library Supporting Out-of-Core Processing. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2018) Poster Session. Tokyo, January 2018.

Tomoya Yuki, Yukinori Sato, and Toshio Endo. Evaluating Autotuning Heuristics for Loop Tiling. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2018) Poster Session. Tokyo, January 2018.

Yuki Ito, Ryo Matsumiya, and Toshio Endo. ooc_cuDNN: A Deep Learning Library Supporting CNNs over GPU Memory Capacity. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), Research Poster Session. Denver, November 2017.

FY2016

Refereed Papers

Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono. Evaluating the Impacts of Code-Level Performance Tunings on Power Efficiency. In Proceedings of IEEE International Conference on Big Data (BigData 2016), 6 pages, Dec 2016.
[DOI: 10.1109/BigData.2016.7840624] [IEEE digital library]

Ryo Matsumiya, Toshio Endo. PGAS Communication Runtime for Extreme Large Data Computation. In Proceedings of Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), in conjunction with IEEE/ACM SC16, 8 pages, Salt Lake City, November 18, 2016.
[DOI: 10.1109/ESPM2.2016.007] [ACM digital library]

Toshio Endo. Realizing Out-of-Core Stencil Computations using Multi-Tier Memory Hierarchy on GPGPU Clusters. In Proceedings of IEEE Cluster Computing (CLUSTER2016), pp. 21-29, Taipei, Sep 2016.
[DOI: 10.1109/CLUSTER.2016.61] [paper] [slides]

Katsuki Fujisawa, Toyotaro Suzumura, Hitoshi Sato, Koji Ueno, Yuichiro Yasui, Keita Iwabuchi, Toshio Endo. Advanced Computing & Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers. Fujisawa, Katsuki, Shinano, Yuji, and Waki, Hayato (eds.), Optimization in the Real World - Toward Solving Real-World Optimization Problems -, Series of Mathematics for Industry, Springer, pp. 1-13, 2016.
[DOI:10.1007/978-4-431-55420-2_1]

Invited Papers

Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo. From FLOPS to BYTES: Disruptive Change in High-Performance Computing towards the Post-Moore Era. In Proceedings of the ACM International Conference on Computing Frontiers (CF'16), pp. 274-281, May 2016.
[DOI: 10.1145/2903150.2906830] [ACM digital library]

Poster Presentations

Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki. Performance Evaluation of Wind Simulation Based on a GPU-computing Framework to Realize Large-scale Stencil Computations Beyond Device Memory Capacity. The 7th AICS International Symposium, Poster session, Kobe, Feb 2017.

FY2015

Refereed Papers

Yukinori Sato, Toshio Endo. Dynamic Compilation for Transparent Data Locality Analysis and Memory Subsystem Tuning. The International Workshop on Architectural and Micro-Architectural Support for Dynamic Optimization (AMAS-DO), In conjunction with CGO 2016, Barcelona, March 13, 2016.

Shimpei Sato, Yukinori Sato, Toshio Endo. A Cache-aware Temporal Blocking Method for 3D Stencil Computation. 3rd International Workshop on High-Performance Stencil Computations (HiStencils 2016), In conjunction with HiPEAC 2016, Prague, January 18, 2016.

Toshio Endo, Yuki Takasaki, Satoshi Matsuoka. Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers. In Proceedings of The 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS 2015), pp. 625-632, Melbourne, December, 2015.
[DOI: 10.1109/ICPADS.2015.84] [IEEE digital library] [paper] [slides]

Yuki Tsujita, Toshio Endo, Katsuki Fujisawa. The Scalable Petascale Data-Driven Approach for the Cholesky Factorization with Multiple GPUs. In Proceedings of First International Workshop on Extreme Scale Programming Models and Middleware (ESPM2 2015), in conjunction with IEEE/ACM SC15, Austin, November 15, 2015.
[DOI: 10.1145/2832241.2832245] [paper] [slides]

Yukinori Sato, Shimpei Sato, Toshio Endo. Exana: An Execution-driven Application Analysis Tool for Assisting Productive Performance Tuning. In Proceedings of The Second Workshop on Software Engineering for Parallel Systems (SEPS), in conjunction with ACM SPLASH 2015, Pittsburgh, October 27, 2015.
[DOI: 10.1145/2837476.2837477] [ACM digital library]

Shimpei Sato, Yukinori Sato, Toshio Endo. Investigating Potential Performance Benefits of Memory Layout Optimization based on Roofline Model. In Proceedings of The Second Workshop on Software Engineering for Parallel Systems (SEPS), in conjunction with ACM SPLASH 2015, Pittsburgh, October 27, 2015.
[DOI: 10.1145/2837476.2837483] [ACM digital library]

Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka. Exploration of Lossy Compression for Application-level Checkpoint/Restart. In Proceedings of IEEE International Conference on Parallel and Distributed Processing Symposium 2015 (IPDPS2015), pp. 914-922, Hyderabad, May 2015.
[DOI:10.1109/IPDPS.2015.67] [IEEE digital library]

Yuki Tsujita, Toshio Endo. Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition. In Proceedings of Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), in conjunction with IPDPS 2015, Hyderabad, May 2015.
[JSSPP15 site]

Kazuki Tsuzuku, Toshio Endo. Power Capping of CPU-GPU Heterogeneous Systems Using Power and Performance Models. In Proceedings of International Conference on Smart Cities and Green ICT Systems (SMARTGREENS2015), pp. 226-233, Lisbon, May 2015.
[DOI: 10.5220/0005445102260233] [IEEE digital library]

Invited Talks

Toshio Endo. Harnessing Multi-tier Memory Hierarchy of GPU, Host and Flash. 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, Taipei, February 20, 2016.

Posters

Kazuki Tsuzuku, Toshio Endo. Online Power Capping of CPU-GPU Heterogeneous Systems. GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, September 18, 2015.

Guanghao Jin, Toshio Endo. High Productive Framework to Enable Stencil Computation on Bigger Domains on TSUBAME2.5. GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, September 18, 2015.

Guanghao Jin, Toshio Endo. Efficient Utilization of GPU Cluster Resource for Stencil Computation. IPSJ HPCS 2015 symposium, Poster session, Tokyo, May 19, 2015.

FY2014

Refereed Papers

Guanghao Jin, James Lin, Toshio Endo. Efficient Utilization of Memory Hierarchy to Enable the Computation on Bigger Domains for Stencil Computation in CPU-GPU Based Systems. In Proceedings of IEEE International Conference on High Performance Computing and Applications (ICHPCA-2014), 6 pages, Bhubaneswar, December, 2014.
[DOI:10.1109/ICHPCA.2014.7045354]

Toshio Endo, Akira Nukada, Satoshi Matsuoka. TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype towards Exascale Becoming the Greenest Supercomputer in the World. In Proceedings of The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), pp. 360-367, Hsinchu, December, 2014.
[DOI:10.1109/PADSW.2014.7097829] [paper] [slides]

Toshio Endo, Guanghao Jin. Software Technologies Coping with Memory Hierarchy of GPGPU Clusters for Stencil Computations. In Proceedings of IEEE Cluster Computing (CLUSTER2014), pp. 132-139, Madrid, September 25, 2014.
[DOI:10.1109/CLUSTER.2014.6968747] [paper] [slides]

Hiroko Midorikawa, Hideyuki Tan, Toshio Endo. An Evaluation of the Potential of Flash SSD as Large and Slow Memory for Stencil Computations. In Proceedings of The 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 24, 2014.
[DOI: 10.1109/HPCSim.2014.6903695]

Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki. Peta-scale General Solver for Semidefinite Programming Problems with over Two Million Constraints. In Proceedings of IEEE International Conference on Parallel and Distributed Processing Symposium 2014 (IPDPS2014), pp. 1171-1180, Phoenix, USA, May 22, 2014.
[DOI:10.1109/IPDPS.2014.121]

Invited Talks

Toshio Endo. [Plenary Talk] Harnessing Memory Hierarchy towards Extreme Fast and Big Simulations. 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing. Taipei, Feb 27, 2015.

Toshio Endo. Experiences with the 5.7Pflop/s System TSUBAME2.5 at Tokyo Tech. HP-CAST 22. Leipzig, Jun 20, 2014.

Articles

Toshio Endo, Akira Nukada, Satoshi Matsuoka. TSUBAME-KFC: the Greenest Supercomputer in the World With Liquid Submersion Cooling. Global Scientific Information and Computing Center, Tokyo Institute of Technology, e-Science Journal, Vol. 11, pp. 2-7, June 2014.

Unrefereed Papers

Tianqi Xu, Guanghao Jin, Toshio Endo, Satoshi Matsuoka. Efficient Utilization of Multi-level Memory System for Stencil Computation. IPSJ SIG Technical Report, 2014-HPC-147, No.10, 7 pages, Otaru, December 2014.

Posters

Kazuki Tsuzuku, Toshio Endo. Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models. GPU Technology Conference (GTC 2015), poster session, San Jose, March, 2015.

Toshio Endo, Yukinori Sato, Hiroko Midorikawa. Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era. JST/CREST International Symposium on Post Petascale System Software (ISP2S2), poster session, Kobe, December 2, 2014.

Guanghao Jin, Toshio Endo. The Efficient Utilization of Memory Hierarchy on GPU Clusters. JST/CREST International Symposium on Post Petascale System Software (ISP2S2), poster session, Kobe, December 2, 2014.

Guanghao Jin, Toshio Endo. Data Management and Loop Controlling to Surpass Memory Capacity of GPU in OpenACC Framework. GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, July 16, 2014. [NVIDIA Award]

Naoto Sasaki, Kento Sato, Toshio Endo and Satoshi Matsuoka. Exploration of Application-level Lossy Compression for Fast Checkpoint/Restart. HPC in Asia poster session, held with ISC'14, Leipzig, June 2014.

Akihiro Nomura, Shin'ichi Miura, Toshio Endo and Satoshi Matsuoka. Application Performance Characterization towards Exa-scale Supercomputers. HPC in Asia poster session, held with ISC'14, Leipzig, June 2014.

Guanghao Jin, Toshio Endo and Satoshi Matsuoka. Efficient Utilization of Memory Hierarchy on GPU Clusters: Optimization Methods and Performance Models. HPC in Asia poster session, held with ISC'14, Leipzig, June 2014.

FY2013

Refereed Papers

Guanghao Jin, Toshio Endo, Satoshi Matsuoka. A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs. In Proceedings of IEEE Cluster Computing (CLUSTER2013), pp. 1-8, Indianapolis, September 2013.
[DOI: 10.1109/CLUSTER.2013.6702633]

Yukinori Sato, Hiroko Midorikawa, and Toshio Endo. Identifying working data set of particular loop iterations for dynamic performance tuning. In 6th Workshop on Architectural and Microarchitectural Support for Binary Translation (AMAS-BT2013). Held in conjunction with the 40th Int'l Symposium on Computer Architecture (ISCA-40), Tel-Aviv, Israel, pp. 1-6, Jun. 24, 2013.

Guanghao Jin, Toshio Endo, Satoshi Matsuoka. A Multi-level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU. In Proceedings of The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), in conjunction with IEEE IPDPS 2013, pp. 1080-1087, Boston, May 2013.
[DOI: 10.1109/IPDPSW.2013.58]

Unrefereed Papers

Jin Guanghao, Endo Toshio, Matsuoka Satoshi. Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5, IPSJ SIG Technical Report, 2014-HPC-143 No.33, 8 pages, Nanao, March 2014.

Posters

Guanghao Jin, Tomoki Kawamura, Naoya Maruyama, Toshio Endo, Satoshi Matsuoka. Optimization Methods for Efficient Utilization of Memory Hierarchy on GPU Cluster, GPU Technology Conference (GTC2014), poster session, San Jose, March 2014.

Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Yuichiro Yasui, Naoki Matsuzawa, Hayato Waki. Peta-Scale General Solver for Semidefinite Programming Problems with Over Two Million Constraints, IEEE/ACM SC13, poster session, Denver, November 2013.

Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Yuichiro Yasui, Naoki Matsuzawa, Hayato Waki. Peta-scale General Solver for Semidefinite Programming Problems with over Two Million Constraints, GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, June 2013. [NVIDIA Award]

FY2012

Refereed Papers

Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Makoto Yamashita, Satoshi Matsuoka, Maho Nakata. High-Performance General Solver for Extremely Large-scale Semidefinite Programming Problems. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC12), pp. 1-11, Salt Lake City, November 2012.
[DOI: 10.1109/SC.2012.67]

Posters

Keisuke Fukuda, Naoya Maruyama, Toshio Endo, Miquel Pericas, Satoshi Matsuoka. Fast Multipole Method on a Heterogeneous Dynamic Task Scheduling Engine, GPU Technology Conference (GTC), poster session, San Jose, March 2013.

FY2011

Refereed Papers

Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka. Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), pp. 1-11, Seattle, November 2011.
[DOI: 10.1145/2063384.2063388] [ACM Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution]

Massimo Bernaschi, Mauro Bisson, Toshio Endo, Massimiliano Fatica, Satoshi Matsuoka, Simone Melchionna, Sauro Succi. Petaflop Biofluidics Simulations On A Two Million-Core System. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), pp. 1-12, Seattle, November 2011.
[DOI: 10.1145/2063384.2063389]

Shiqiao Du, Takuro Udagawa, Toshio Endo and Masakazu Sekijima. Molecular Dynamics Simulation of a Biomolecule with High Speed, Low Power and Accuracy Using GPU-Accelerated TSUBAME2.0 Supercomputer. In Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011), Xi'an, October 2011.

Invited Talks

Toshio Endo. TSUBAME2.0: A Petascale GPU-accelerated Supercomputer, The Second International Conference on Networking and Computing (ICNC'11), Tutorial, Osaka, December 2011.

Unrefereed Papers

Irina Demeshko, Satoshi Matsuoka, Toshio Endo. GPU-based approach for elastic-plastic deformation simulation, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2011), IPSJ SIG Technical Report, 2011-HPC-130 No.12, 7 pages, Kagoshima, August 2011.

FY2010

Refereed Papers

Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka. An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC10), pp.1-11, New Orleans, November 2010.

Hitoshi Nagasaka, Naoya Maruyama, Akira Nukada, Toshio Endo, and Satoshi Matsuoka. Statistical Power Modeling of GPU Kernels Using Performance Counters. In Proceedings of International Green Computing Conference (IGCC'10), pp. 115-122, Chicago, IL, USA, Aug 2010.

Toshio Endo, Akira Nukada, Satoshi Matsuoka and Naoya Maruyama. Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators. In Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010), Atlanta, pp.1-8, April 2010. [paper] [slides]

Unrefereed Papers

Nguyen Toan, Hideyuki Jitsumoto, Naoya Maruyama, Tatsuo Nomura, Toshio Endo, Satoshi Matsuoka. MPI-CUDA Applications Checkpointing, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2010), IPSJ SIG Technical Report, 2010-HPC-126 No.18, 7 pages, Kanazawa, August 2010.

FY2009

Journal Article

Satoshi Matsuoka, Takayuki Aoki, Toshio Endo, Akira Nukada, Toshihiro Kato and Atushi Hasegawa. GPU accelerated computing–from hype to mainstream, the rebirth of vector computing. Journal of Physics: Conference Series, Vol 180, 10 pages, 2009. [DOI: 10.1088/1742-6596/180/1/012043]

Refereed Papers

Tomoaki Hamano, Toshio Endo and Satoshi Matsuoka. Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters. In Proceedings of The Fourth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IPDPS 2009, pp.1-8, May 2009.

Hitoshi Sato, Satoshi Matsuoka and Toshio Endo. File Clustering Based Replication Algorithm in a Grid Environment. In Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2009), pp.204-211, May 2009.

Invited Talks

Toshio Endo. Supercomputing on The TSUBAME GPU-Accelerated Cluster, CSIRO GPU Cluster Workshop, Melbourne, June 2009.

FY2008

Refereed Papers

Hideyuki Jitsumoto, Toshio Endo and Satoshi Matsuoka. Environmental-Aware Optimization of MPI Checkpointing Intervals. In Proceedings of HPC ASIA 2009, pp. 285-292, March 2009.

Akira Nukada, Yasuhiko Ogata, Toshio Endo and Satoshi Matsuoka. Bandwidth Intensive 3-D FFT kernel for GPUs using CUDA. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC08), pp.1-11, November 2008.

Hitoshi Sato, Satoshi Matsuoka, Toshio Endo and Naoya Maruyama. Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment. In Proceedings of IEEE/ACM International Conference on Grid Computing (Grid 2008), pp.250-257, October 2008.

Toshio Endo and Satoshi Matsuoka. Massive Supercomputing Coping with Heterogeneity of Modern Accelerators. In Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008), pp.1-10, April 2008. [paper] [slides]

Shin'ichiro Takizawa, Toshio Endo and Satoshi Matsuoka. Locality Aware MPI Communication on a Commodity Opto-Electronic Hybrid Network. In Proceedings of Workshop on Large-Scale Parallel Processing (LSPP), in conjunction with IEEE IPDPS 2008, pp.1-8, April 2008.

Yasuhiko Ogata, Toshio Endo, Naoya Maruyama and Satoshi Matsuoka. An Efficient, Model-Based CPU-GPU Heterogeneous FFT Library. In Proceedings of 17th International Heterogeneity in Computing Workshop (HCW '08), in conjunction with IEEE IPDPS 2008, pp.1-10, April 2008.

Yuto Hosogaya, Toshio Endo and Satoshi Matsuoka. Performance Evaluation of Parallel Applications on Next Generation Memory Architecture with Power-Aware Paging Method. In Proceedings of The Fourth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IPDPS 2008, pp.1-8, April 2008.

Posters

Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka. Environmental-Aware Optimization of MPI Checkpointing Intervals, IEEE International Conference on Cluster Computing (Cluster 2008), poster session, September 2008.

FY2007

Refereed Papers

Tatsuhiro Chiba, Toshio Endo and Satoshi Matsuoka. High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2007), pp.487-494, May 2007.

Posters

Toshio Endo, Satoshi Matsuoka. A Methodology for Coping with Heterogeneity of Modern Accelerators on a Massive Supercomputing Scale, ACM/IEEE Conference on Supercomputing (High Performance Computing, Networking, Storage and Analysis) (SC07), poster session, November 2007. [poster]

FY2006

Refereed Papers

Hideyuki Jitsumoto, Toshio Endo and Satoshi Matsuoka. ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. In Proceedings of 12th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems (DPDNS '07), in conjunction with IPDPS 2007, pp.1-8, March 2007.

FY2005 and before

Refereed Papers

Toshio Endo and Kenjiro Taura. Highly Latency Tolerant Gaussian Elimination. In Proceedings of IEEE/ACM International Workshop on Grid Computing (Grid2005), pp. 91-98, November 2005. [paper] [slides]

Toshio Endo, Kenji Kaneda, Kenjiro Taura and Akinori Yonezawa. High Performance LU Factorization for Non-dedicated Clusters. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2004), pp. 678-685, April 2004. [paper] [slides]

Kenjiro Taura, Toshio Endo, Kenji Kaneda, and Akinori Yonezawa. Phoenix: a Parallel Programming Model for Accommodating Dynamically Joining/Leaving Resources. In Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '03), pp.216-229, June 2003.

Toshio Endo and Kenjiro Taura. Reducing Pause Time of Conservative Collectors. In Proceedings of ACM SIGPLAN International Symposium on Memory Management (ISMM2002), Berlin, pp.119-131, June 2002. [paper] [slides]

Toshio Endo, Kenjiro Taura and Akinori Yonezawa. Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors. In Proceedings of 15th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, pp.1-6, April 2001. [paper]

Toshio Endo, Kenjiro Taura and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of ACM/IEEE Conference on Supercomputing (High Performance Networking and Computing) (SC97), San Jose, 14 pages, November 1997. [paper]

Theses for Degrees

Toshio Endo. Scalable Dynamic Memory Management Module on Shared Memory Multiprocessors. Ph.D. Thesis, Department of Information Science, Faculty of Science, University of Tokyo. June 2001. [paper] [slides in Japanese]
NOTE: The paper is written in English, but the title page and abstract page include Japanese characters.

Toshio Endo. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. Master's thesis, Department of Information Science, The University of Tokyo, February 1998. [paper]
NOTE: The paper is written in English, but the title page and abstract page include Japanese characters.

Toshio Endo. A Methodology for Constructing a Portable Garbage Collector on Parallel Machines. Senior thesis, Department of Information Science, The University of Tokyo, February 1996.

[Publications in Japanese]
[Endo lab]
[Endo's page]
