Publications

[Publications in Japanese]

FY2023

Refereed Papers

  • Ivan Radanov Ivanov, Oleksandr Zinenko, Jens Domke, Toshio Endo, William S. Moses. Retargeting and Respecializing GPU Workloads for Performance Portability. In proceedings of the International Symposium on Code Generation and Optimization (CGO 2024), pp. 119-132, Edinburgh, March 2-6, 2024.
    [DOI: 10.1109/CGO57630.2024.10444828] [Conference]
  • Ivan Radanov Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert. Automatic Parallelization and OpenMP Offloading of Fortran. In proceedings of LLVM Performance Workshop, in conjunction with CGO 2024, Edinburgh, March 2, 2024.
    [Workshop]
  • Futa Kambe, Toshio Endo. Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking. In proceedings of the Workshop on General Purpose Processing using GPU (GPGPU 2024), in conjunction with PPoPP 2024, Edinburgh, March 2, 2024.
    [Workshop]
  • Ryubu Hosoki, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training. In proceedings of The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2024), pp. 117-126, Nagoya, January 25-27, 2024.
    [DOI: 10.1145/3635035.3635045] [Conference]
  • Shohei Minami, Toshio Endo, Akihiro Nomura. The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System. In proceedings of IEEE High Performance Extreme Computing Conference (HPEC 2023), Virtual, September 23-27, 2023.
    [DOI: 10.1109/HPEC58863.2023.10363580] [Conference]
  • Chenyu Wang, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. Pyramid Swin Transformer for Multi-Task: Expanding to More Computer Vision Tasks. In proceedings of Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2023), Springer, LNCS Vol. 14124, pp. 53-65, Kumamoto, August 21-22, 2023.
    [DOI: 10.1007/978-3-031-45382-3_5] [Conference]
  • Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. In proceedings of ACM International Conference on Supercomputing (ICS 2023), pp. 167-179, Orlando, June 21-23, 2023.
    [DOI: 10.1145/3577193.3593705] [Conference]
  • Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. Revisiting Temporal Blocking Stencil Optimizations. In proceedings of ACM International Conference on Supercomputing (ICS 2023), pp. 251-263, Orlando, June 21-23, 2023.
    [DOI: 10.1145/3577193.3593716] [Conference]

Unrefereed Papers

  • Du Wu, Peng Chen, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. High Throughput 3D Image Reconstruction with GPUDirect and Tensor Core. IPSJ SIG Technical Report, 2024-HPC-193, No.25, 9 pages, March 18-19, 2024.
  • Chen Zhuang, Peng Chen, Xin Liu, Satoshi Matsuoka, Toshio Endo, Mohamed Wahib. Scalable Training of Graph Convolutional Networks on Supercomputers. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report, 2023-HPC-190, No.19, 10 pages, August 2-4, 2023.
  • Lingqi Zhang, Mohamed Wahib, Peng Chen, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka. High-performance Temporal Blocking Stencils at Low GPU Occupancy. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2023), IPSJ SIG Technical Report, 2023-HPC-190, No.26, 10 pages, August 2-4, 2023.

Poster Presentations

  • Du Wu, Peng Chen, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib. Optimizing Matrix Multiplication on Arm Architectures. The 6th R-CCS International Symposium, poster session, January 29-30, 2024.
  • Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Mohamed Wahib. General and Scalable Framework for GCN Training on CPU-powered Supercomputers. The 6th R-CCS International Symposium, poster session, January 29-30, 2024.
  • Shohei Minami, Toshio Endo, Akihiro Nomura. The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System. The cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG 2023), poster session, August 2-4, 2023.

    FY2022

    Refereed Papers

  • Shohei Minami, Toshio Endo, Akihiro Nomura. Effectiveness of the Oversubscribing Scheduling on Supercomputer Systems. In proceedings of High Performance Computing in the Asia-Pacific Region (HPC ASIA), pp. 18-28, Singapore, February 2023.
    [DOI: 10.1145/3578178.3578221] [Conference]
  • William S. Moses, Ivan Radanov Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. In proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2023), pp. 119-134, Montreal, February 2023.
    [DOI: 10.1145/3572848.3577475] [Symposium]
  • Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. Exploiting Scratchpad Memory for Deep Temporal Blocking. In proceedings of the 15th Workshop on General Purpose Processing Using GPU (GPGPU 2023), co-located with PPoPP 2023, short paper, Montreal, February 2023.
    [Workshop]
  • Chenyu Wang, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5 VISAPP, SciTePress, pp. 583-590, VISAPP 2023, Lisbon (hybrid), February 2023.
    [DOI: 10.5220/0011675800003417] [Conference]
  • Hiroki Aikawa, Toshio Endo, Tomoya Yuki, Takahiro Hirofuchi, Tsutomu Ikegami. Efficient Stencil Computation with Temporal Blocking by Halide DSL. In proceedings of 20th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp. 870-877, online, December 2022.
    [DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00116] [Conference]
  • Chenyu Wang, Toshio Endo, Takahiro Hirofuchi and Tsutomu Ikegami. Speed-up Single Shot Detector on GPU with CUDA. In proceedings of 23rd ACIS International Summer Virtual Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD2022-Summer), Kyoto (online), Studies in Computational Intelligence, vol 1074. Springer, pp. 89-106, July 2022.
    [DOI: 10.1007/978-3-031-19604-1_7] [Conference]

    Unrefereed Papers

  • Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka. Breaking the Memory Bottleneck for Iterative Memory-bound Applications Via Persistent Kernels. IPSJ SIG Technical Report, 2022-HPC-187, No.18, 10 pages, December 2022.
  • William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. arXiv:2207.00257 [cs.PL], July 2022.

    FY2021

    Refereed Papers

  • Shohei Minami, Toshio Endo and Akihiro Nomura. Measurement and Modeling of Performance of HPC Applications towards Overcommitting Scheduling Systems. In proceedings of 24th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2021), in Conjunction with IPDPS 2021, pp. 59-79, Portland (online), May 2021.
    [DOI: 10.1007/978-3-030-88224-2_4] [Springer site] [slides@JSSPP site]

    Unrefereed Papers

  • Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuke Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi. mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations. arXiv:2203.14188 [cs.LG], March 2022.

    Other Presentations

  • Ivan Ivanov, Jens Domke and Toshio Endo. Automatic translation of CUDA code into high performance CPU code using LLVM IR transformations. The 4th R-CCS International Symposium, Lightning talks session, Online, February 7, 2022.

    FY2020

    Poster Presentations

  • Ivan R. Ivanov, Jens Domke, Akihiro Nomura and Toshio Endo. Improved failover for HPC interconnects through localised routing restoration. The 3rd R-CCS International Symposium, poster session, Feb 2021.
  • Shohei Minami, Toshio Endo, Akihiro Nomura. Performance Modeling of HPC Applications on Overcommitted Systems. HPC Asia 2021, poster session, Jan 2021.

    FY2019

    Refereed Papers

  • Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka. AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs. In proceedings of International Symposium on Code Generation and Optimization (CGO 2020), pp. 199-211, San Diego, Feb 2020.
    [DOI: 10.1145/3368826.3377904] [ACM digital library]
  • Toshio Endo. Integrating Cache Oblivious Approach with Modern Processor Architecture: The Case of Floyd-Warshall Algorithm. In proceedings of HPC Asia 2020, Fukuoka, Jan 2020.
    [DOI: 10.1145/3368474.3368477] [ACM digital library] [paper] [slides]

    Unrefereed Papers

  • Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo. Profiling based Out-of-core Hybrid Method for Large Neural Networks. arXiv:1907.05013 [cs.LG], July 2019.

    Poster Presentations

  • Tomoya Yuki, Toshio Endo. Toward Latency-Aware Data Arrangement on Many-Core Processors. HPC Asia 2020, poster session, No. 51, Fukuoka, Jan 2020.
    [abstract]

    Other Presentations

  • Toshio Endo. Activity Report from Tokyo Tech: Energy Efficiency of TSUBAME3.0. Energy Efficient HPC State of the Practice Kobe Meeting, Kobe, August 4, 2019.

    FY2018

    Refereed Papers

  • Yukinori Sato, Tomoya Yuki, and Toshio Endo. An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation. ACM Transactions on Architecture and Code Optimization (TACO). Volume 15, Issue 4, Article No. 67, 23 pages. Jan 2019.
    [DOI:10.1145/3293449] [ACM library]
  • Ryo Matsumiya, Toshio Endo. Scalable RMA-based Communication Library Featuring Node-local NVMs. In proceedings of 2018 IEEE High Performance Extreme Computing Conference (HPEC 2018), 7 pages, Sep 2018.
    [DOI:10.1109/HPEC.2018.8547546] [IEEE library] [paper]
  • Toshio Endo. Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy. In proceedings of the 7th IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA 2018), pp.19-24. Aug 2018.
    [DOI: 10.1109/NVMSA.2018.00016] [IEEE library] [paper] [slides]

    Book Chapters

  • Toshio Endo, Hiroko Midorikawa, Yukinori Sato. Software Technology That Deals with Deeper Memory Hierarchy in Post-petascale Era. Advanced Software Technologies for Post-Peta Scale Computing, Mitsuhisa Sato (Ed), Springer, pp. 227-248, Jan 2019.
    [ISBN: 978-981-13-1923-5, 978-981-13-1924-2 (online)] [DOI: 10.1007/978-981-13-1924-2] [Springer link]

    Poster Presentations

  • Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo. Profiling based out-of-core hybrid method for large neural networks. The 24th ACM Symposium on Principles and Practice of Parallel Programming, poster session, Washington DC, Feb 2019.
    [DOI: 10.1145/3293883.3298790] [ACM library]

    Other Presentations

  • Toshio Endo. Current Status of TSUBAME3.0 Operation (as of Mar 2019). 7th Accelerated Data and Computing (ADAC) Workshop, Zurich, March 25, 2019.
  • Toshio Endo. Current Status of TSUBAME3.0 Operation. 6th Accelerated Data and Computing (ADAC) Workshop, Zurich, June 2018.

    FY2017

    Refereed Papers

  • Noboru Tanabe and Toshio Endo. Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels. 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2018), pp. 249-254, Cambridge, March 2018.
    [DOI: 10.1109/PDP2018.2018.00042]
  • Noboru Tanabe and Toshio Endo. Exhaustive Evaluation of Memory-Latency Sensitivity on Manycore Processors with Large Cache. 2018 2nd International Conference on High Performance Compilation, Computing and Communications (HP3C-2018), pp. 27-34, Hong Kong, March 2018.
    [DOI: 10.1145/3195612.3195616]
  • Yuki Ito, Ryo Matsumiya, and Toshio Endo. ooc_cuDNN: Accommodating Convolutional Neural Networks over GPU Memory Capacity. In Proceedings of 2017 IEEE International Conference on Big Data (IEEE BigData 2017), pp. 183-192, Boston, December 2017.
    [DOI: 10.1109/BigData.2017.8257926] [IEEE digital library]
  • Shota Kuroda, Toshio Endo, Satoshi Matsuoka. Applying Temporal Blocking with a Directive-based Approach. In Proceedings of Fourth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), in conjunction with SC17, Article No. 8, Denver, November 13, 2017.
    [DOI: 10.1145/3148173.3148190] [ACM digital library] [paper] [slides]
  • Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki. A Stencil Framework to Realize Large-scale Computations Beyond Device Memory Capacity on GPU Supercomputers. In Proceedings of IEEE International Conference on Cluster Computing (CLUSTER 2017), pp. 525-529, Honolulu, September 2017.
    [DOI: 10.1109/CLUSTER.2017.97]
  • Yukinori Sato and Toshio Endo. An Accurate Simulator of Cache-line Conflicts to Exploit the Underlying Cache Performance. In Proceedings of 23rd International European Conference on Parallel and Distributed Computing (Euro-par 2017), pp. 119-133, Santiago, Spain, August 2017.
    [DOI: 10.1007/978-3-319-64203-1_9]
  • Yukinori Sato, Tomoya Yuki and Toshio Endo. ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime. In Proceedings of ACM International Conference on Computing Frontiers 2017, 10 pages, Siena, May 2017.
    [DOI: 10.1145/3075564.3077627]

    Articles

  • Satoshi Matsuoka, Toshio Endo, Akira Nukada, Shinichi Miura, Akihiro Nomura, Hitoshi Sato, Hideyuki Jitsumoto, Aleksandr Drozd. Overview of TSUBAME3.0, Green Cloud Supercomputer for Convergence of HPC, AI and Big-Data. Global Scientific Information and Computing Center, Tokyo Institute of Technology, e-Science Journal, Vol. 16, pp. 2-9, November 2017.

    Poster Presentations

  • Yuki Ito, Ryo Matsumiya, and Toshio Endo. ooc cuDNN: A Deep Learning Library Supporting CNNs over GPU Memory capacity. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2018) Poster Session. Tokyo, January 2018.
  • Ryo Matsumiya and Toshio Endo. vGASNet: A PGAS Communication Library Supporting Out-of-Core Processing. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2018) Poster Session. Tokyo, January 2018.
  • Tomoya Yuki, Yukinori Sato, and Toshio Endo. Evaluating Autotuning Heuristics for Loop Tiling. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2018) Poster Session. Tokyo, January 2018.
  • Yuki Ito, Ryo Matsumiya, and Toshio Endo. ooc_cuDNN: A Deep Learning Library Supporting CNNs over GPU Memory Capacity. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), Research Poster Session. Denver, November 2017.

    Other Presentations

  • Toshio Endo. Realizing Extremely Large-Scale Scientific Applications Using Deep Memory Hierarchy. SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP18), Tokyo, March 2018.
  • Toshio Endo, Hiroko Midorikawa, Yukinori Sato. Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era. JST/CREST International Symposium on Post Petascale System Software (ISP2S2-2017), Tokyo, December 2017.
    [slides]
  • Toshio Endo, Satoshi Matsuoka. TSUBAME3.0: A Green, Accelerated, Big-Data Supercomputer. ATIP Workshop on International Exascale and Next-Generation Computing Programs, in conjunction with SC17. Denver, November 2017.

    FY2016

    Refereed Papers

  • Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono. Evaluating the Impacts of Code-Level Performance Tunings on Power Efficiency. In Proceedings of IEEE International Conference on Big Data (BigData 2016), 6 pages, Dec 2016.
    [DOI: 10.1109/BigData.2016.7840624] [IEEE digital library]
  • Ryo Matsumiya, Toshio Endo. PGAS Communication Runtime for Extreme Large Data Computation. In Proceedings of Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), in conjunction with IEEE/ACM SC16, 8 pages, Salt Lake City, November 18, 2016.
    [DOI: 10.1109/ESPM2.2016.007] [ACM digital library]
  • Toshio Endo. Realizing Out-of-Core Stencil Computations using Multi-Tier Memory Hierarchy on GPGPU Clusters. In Proceedings of IEEE Cluster Computing (CLUSTER2016), pp. 21-29, Taipei, Sep 2016.
    [DOI: 10.1109/CLUSTER.2016.61] [paper] [slides]
  • Katsuki Fujisawa, Toyotaro Suzumura, Hitoshi Sato, Koji Ueno, Yuichiro Yasui, Keita Iwabuchi, Toshio Endo. Advanced Computing & Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers. Fujisawa, Katsuki, Shinano, Yuji, and Waki, Hayato (eds.), Optimization in the Real World - Toward Solving Real-World Optimization Problems -, Series of Mathematics for Industry, Springer, pp. 1-13, 2016.
    [DOI:10.1007/978-4-431-55420-2_1]

    Invited Papers

  • Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo. From FLOPS to BYTES: Disruptive Change in High-Performance Computing towards the Post-Moore Era. In proceedings of the ACM International Conference on Computing Frontiers (CF'16), pp. 274-281, May 2016.
    [DOI: 10.1145/2903150.2906830] [ACM digital library]

    Poster Presentations

  • Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki. Performance Evaluation of Wind Simulation Based on a GPU-computing Framework to Realize Large-scale Stencil Computations Beyond Device Memory Capacity. The 7th AICS International Symposium, Poster session, Kobe, Feb 2017.

    Other Presentations

  • Toshio Endo. Operating Experience with SSD and GPUs. Accelerated Data and Computing (ADAC) Workshop, Lugano, June 2016.

    FY2015

    Refereed Papers

  • Yukinori Sato, Toshio Endo. Dynamic Compilation for Transparent Data Locality Analysis and Memory Subsystem Tuning. The International Workshop on Architectural and Micro-Architectural Support for Dynamic Optimization (AMAS-DO), In conjunction with CGO 2016, Barcelona, March 13, 2016.
  • Shimpei Sato, Yukinori Sato, Toshio Endo. A Cache-aware Temporal Blocking Method for 3D Stencil Computation. 3rd International Workshop on High-Performance Stencil Computations (HiStencils 2016), In conjunction with HiPEAC 2016, Prague, January 18, 2016.
  • Toshio Endo, Yuki Takasaki, Satoshi Matsuoka. Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers. In Proceedings of The 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS 2015), pp. 625-632, Melbourne, December, 2015.
    [DOI: 10.1109/ICPADS.2015.84] [IEEE digital library] [paper] [slides]
  • Yuki Tsujita, Toshio Endo, Katsuki Fujisawa. The Scalable Petascale Data-Driven Approach for the Cholesky Factorization with Multiple GPUs. In Proceedings of First International Workshop on Extreme Scale Programming Models and Middleware (ESPM2 2015), in conjunction with IEEE/ACM SC15, Austin, November 15, 2015.
    [paper] [slides]
  • Yukinori Sato, Shimpei Sato, Toshio Endo. Exana: An Execution-driven Application Analysis Tool for Assisting Productive Performance Tuning. In Proceedings of The Second Workshop on Software Engineering for Parallel Systems (SEPS), in conjunction with ACM SPLASH 2015, Pittsburgh, October 27, 2015.
    [ACM digital library]
  • Shimpei Sato, Yukinori Sato, Toshio Endo. Investigating Potential Performance Benefits of Memory Layout Optimization based on Roofline Model. In Proceedings of The Second Workshop on Software Engineering for Parallel Systems (SEPS), in conjunction with ACM SPLASH 2015, Pittsburgh, October 27, 2015.
    [ACM digital library]
  • Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka. Exploration of Lossy Compression for Application-level Checkpoint/Restart. In Proceedings of IEEE International Conference on Parallel and Distributed Processing Symposium 2015 (IPDPS2015), pp. 914-922, Hyderabad, May 2015.
    [DOI:10.1109/IPDPS.2015.67] [IEEE digital library]
  • Yuki Tsujita, Toshio Endo. Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition. In Proceedings of Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), in conjunction with IPDPS 2015, Hyderabad, May 2015.
    [JSSPP15 site]
  • Kazuki Tsuzuku, Toshio Endo. Power Capping of CPU-GPU Heterogeneous Systems Using Power and Performance Models. In Proceedings of International Conference on Smart Cities and Green ICT Systems (SMARTGREENS2015), 8 pages, Lisbon, May 2015.
    [IEEE digital library]

    Invited Talks

  • Toshio Endo. Harnessing Multi-tier Memory Hierarchy of GPU, Host and Flash. 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, Taipei, February 20, 2016.

    Other Presentations

  • Toshio Endo, Akira Nukada, Satoshi Matsuoka. Power Capping Scheduling on TSUBAME2.5 and Upgrade of TSUBAME-KFC. Building Energy Efficient HPC Working Group Workshop, held with SC15, Austin, November 16, 2015.
  • Toshio Endo, Satoshi Matsuoka. Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers with a Memory Hierarchy Management Runtime Library. Workshop on Programming Abstractions for Data Locality (PADAL 2015), Berkeley, June 25, 2015.

    Posters

  • Kazuki Tsuzuku, Toshio Endo. Online Power Capping of CPU-GPU Heterogeneous Systems, GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, September 18, 2015.
  • Guanghao Jin, Toshio Endo. High Productive Framework to Enable Stencil Computation on Bigger Domains on TSUBAME2.5, GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, September 18, 2015.
  • Guanghao Jin, Toshio Endo. Efficient Utilization of GPU Cluster Resource for Stencil Computation. IPSJ HPCS 2015 symposium, Poster session, Tokyo, May 19, 2015.

    FY2014

    Refereed Papers

  • Guanghao Jin, James Lin, Toshio Endo. Efficient Utilization of Memory Hierarchy to Enable the Computation on Bigger Domains for Stencil Computation in CPU-GPU Based Systems. In Proceedings of IEEE International Conference on High Performance Computing and Applications (ICHPCA-2014), 6 pages, Bhubaneswar, December, 2014. [DOI:10.1109/ICHPCA.2014.7045354]
  • Toshio Endo, Akira Nukada, Satoshi Matsuoka. TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype towards Exascale Becoming the Greenest Supercomputer in the World. In Proceedings of The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), pp.360-367, Hsinchu, December, 2014. [DOI:10.1109/PADSW.2014.7097829] [paper] [slides]
  • Toshio Endo, Guanghao Jin. Software Technologies Coping with Memory Hierarchy of GPGPU Clusters for Stencil Computations. In Proceedings of IEEE Cluster Computing (CLUSTER2014), pp.132-139, Madrid, September 25, 2014. [DOI:10.1109/CLUSTER.2014.6968747] [paper] [slides]
  • Hiroko Midorikawa, Hideyuki Tan, Toshio Endo. An Evaluation of the Potential of Flash SSD as Large and Slow Memory for Stencil Computations. In Proceedings of The 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 24, 2014.
  • Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki. Peta-scale General Solver for Semidefinite Programming Problems with over Two Million Constraints. In Proceedings of IEEE International Conference on Parallel and Distributed Processing Symposium 2014 (IPDPS2014), pp.1171-1180, Phoenix, USA, May 22, 2014. [DOI:10.1109/IPDPS.2014.121]

    Invited Talks

  • Toshio Endo. [Plenary Talk] Harnessing Memory Hierarchy towards Extreme Fast and Big Simulations. 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing. Taipei, Feb 27, 2015.
  • Toshio Endo. Experiences with the 5.7Pflop/s System TSUBAME2.5 at Tokyo Tech. HP-CAST 22. Leipzig, Jun 20, 2014.

    Articles

  • Toshio Endo, Akira Nukada, Satoshi Matsuoka. TSUBAME-KFC: the Greenest Supercomputer in the World With Liquid Submersion Cooling. Global Scientific Information and Computing Center, Tokyo Institute of Technology, e-Science Journal, Vol. 11, pp. 2-7, June 2014.

    Unrefereed Papers

  • Tianqi Xu, Jin Guanghao, Endo Toshio, Matsuoka Satoshi. Efficient Utilization of Multi-level Memory System for Stencil Computation, IPSJ SIG Technical Report, 2014-HPC-147 No.10, 7 pages, Otaru, December 2014.

    Other Presentations

  • Toshio Endo. Locality Improvement of Stencil Computations for Big Simulations. JST/CREST International Symposium on Post Petascale System Software (ISP2S2), Kobe, December 4, 2014.
  • Toshio Endo. Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era. 2014 ATIP Workshop: Japanese Research Toward Next-Generation Extreme Computing, New Orleans, November 17, 2014.
  • H. Nakamura, M. Kondo, K. Inoue, M. Schulz, T. Gamblin, B. Rountree, T. Endo, A. Nukada, S. Matsuoka. Power Management and Optimization toward Exascale Supercomputing. Workshop on International Cooperation for Extreme-Scale Computing, held with ISC'14, Leipzig, June 22, 2014.
  • Guanghao Jin, Mohamed Wahib, Naoya Maruyama, Toshio Endo, Satoshi Matsuoka. Locality Optimizations for Stencil Computations: Algorithms and Implementations, Workshop on Programming Abstractions for Data Locality (PADAL 2014), Lugano, April 28, 2014.

    Posters

  • Kazuki Tsuzuku, Toshio Endo. Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models. GPU Technology Conference (GTC 2015), poster session, San Jose, March, 2015.
  • Toshio Endo, Yukinori Sato, Hiroko Midorikawa. Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era. JST/CREST International Symposium on Post Petascale System Software (ISP2S2), poster session, Kobe, December 2, 2014.
  • Guanghao Jin, Toshio Endo. The Efficient Utilization of Memory Hierarchy on GPU Clusters. JST/CREST International Symposium on Post Petascale System Software (ISP2S2), poster session, Kobe, December 2, 2014.
  • Guanghao Jin, Toshio Endo. Data Management and Loop Controlling to Surpass Memory Capacity of GPU in OpenACC Framework. GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, July 16, 2014. [NVIDIA Award]
  • Naoto Sasaki, Kento Sato, Toshio Endo and Satoshi Matsuoka. Exploration of Application-level Lossy Compression for Fast Checkpoint/Restart. HPC in Asia poster session, held with ISC'14, Leipzig, June 2014.
  • Akihiro Nomura, Shin'ichi Miura, Toshio Endo and Satoshi Matsuoka. Application Performance Characterization towards Exa-scale Supercomputers. HPC in Asia poster session, held with ISC'14, Leipzig, June 2014.
  • Guanghao Jin, Toshio Endo and Satoshi Matsuoka. Efficient Utilization of Memory Hierarchy on GPU Clusters: Optimization Methods and Performance Models. HPC in Asia poster session, held with ISC'14, Leipzig, June 2014.

    FY2013

    Refereed Papers

  • Guanghao Jin, Toshio Endo, Satoshi Matsuoka. A Parallel Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPUs. In Proceedings of IEEE Cluster Computing (CLUSTER2013), pp. 1-8, Indianapolis, September 2013. [DOI: 10.1109/CLUSTER.2013.6702633]
  • Yukinori Sato, Hiroko Midorikawa, and Toshio Endo. Identifying working data set of particular loop iterations for dynamic performance tuning. In 6th Workshop on Architectural and Microarchitectural Support for Binary Translation (AMAS-BT2013). Held in conjunction with the 40th Int'l Symposium on Computer Architecture (ISCA-40), Tel-Aviv, Israel, pp. 1-6, Jun. 24, 2013.
  • Guanghao Jin, Toshio Endo, Satoshi Matsuoka. A Multi-level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU. In Proceedings of The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), in conjunction with IEEE IPDPS 2013, pp. 1080-1087, Boston, May 2013. [DOI: 10.1109/IPDPSW.2013.58]

    Unrefereed Papers

  • Jin Guanghao, Endo Toshio, Matsuoka Satoshi. Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5, IPSJ SIG Technical Report, 2014-HPC-143 No.33, 8 pages, Nanao, March 2014.

    Posters

  • Guanghao Jin, Tomoki Kawamura, Naoya Maruyama, Toshio Endo, Satoshi Matsuoka. Optimization Methods for Efficient Utilization of Memory Hierarchy on GPU Cluster, GPU Technology Conference (GTC2014), poster session, San Jose, March 2014.
  • Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Yuichiro Yasui, Naoki Matsuzawa, Hayato Waki. Peta-Scale General Solver for Semidefinite Programming Problems with Over Two Million Constraints, IEEE/ACM SC13, poster session, Denver, November 2013.
  • Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Yuichiro Yasui, Naoki Matsuzawa, Hayato Waki. Peta-scale General Solver for Semidefinite Programming Problems with over Two Million Constraints, GPU Technology Conference Japan (GTC Japan), poster session, Tokyo, June 2013. [NVIDIA Award]

    Other Presentations

  • Toshio Endo. Software Technology that Deals with Deeper Memory Hierarchy in Post-petascale Era, The Japanese Extreme Big Data Projects Workshop, Fukuoka, Japan, February 2014. [slides]

    FY2012

    Refereed Papers

  • Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Makoto Yamashita, Satoshi Matsuoka, Maho Nakata. High-Performance General Solver for Extremely Large-scale Semidefinite Programming Problems. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC12), pp. 1-11. Salt Lake City, November 2012. [DOI: 10.1109/SC.2012.67]

    Posters

  • Keisuke Fukuda, Naoya Maruyama, Toshio Endo, Miquel Pericas, Satoshi Matsuoka. Fast Multipole Method on a Heterogeneous Dynamic Task Scheduling Engine, GPU Technology Conference (GTC), poster session, San Jose, March 2013.

    FY2011

    Refereed Papers

  • Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka. Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), pp. 1-11, Seattle, November 2011. [DOI: 10.1145/2063384.2063388] [ACM Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution]
  • Massimo Bernaschi, Mauro Bisson, Toshio Endo, Massimiliano Fatica, Satoshi Matsuoka, Simone Melchionna, Sauro Succi. Petaflop Biofluidics Simulations On A Two Million-Core System. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), pp. 1-12, Seattle, November 2011. [DOI: 10.1145/2063384.2063389]
  • Shiqiao Du, Takuro Udagawa, Toshio Endo and Masakazu Sekijima. Molecular Dynamics Simulation of a Biomolecule with High Speed, Low Power and Accuracy Using GPU-Accelerated TSUBAME2.0 Supercomputer. In Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011), Xi'an, October 2011.

    Invited Talks

  • Toshio Endo. TSUBAME2.0: A Petascale GPU-accelerated Supercomputer, The Second International Conference on Networking and Computing (ICNC'11), Tutorial, Osaka, December 2011.

    Unrefereed Papers

  • Irina Demeshko, Satoshi Matsuoka, Toshio Endo. GPU-based approach for elastic-plastic deformation simulation, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2011), IPSJ SIG Technical Report, 2011-HPC-130 No.12, 7 pages, Kagoshima, August 2011.

    FY2010

    Refereed Papers

  • Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka. An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC10), pp.1-11, New Orleans, November 2010.
  • Hitoshi Nagasaka, Naoya Maruyama, Akira Nukada, Toshio Endo, and Satoshi Matsuoka. Statistical Power Modeling of GPU Kernels Using Performance Counters. In Proceedings of International Green Computing Conference (IGCC'10), pp. 115-122, Chicago, IL, USA, August 2010.
  • Toshio Endo, Akira Nukada, Satoshi Matsuoka and Naoya Maruyama. Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators. In Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010), Atlanta, pp.1-8, April 2010. [paper] [slides]

    Unrefereed Papers

  • Nguyen Toan, Hideyuki Jitsumoto, Naoya Maruyama, Tatsuo Nomura, Toshio Endo, Satoshi Matsuoka. MPI-CUDA Applications Checkpointing, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2010), IPSJ SIG Technical Report, 2010-HPC-126 No.18, 7 pages, Kanazawa, August 2010.

    FY2009

    Journal Article

  • Satoshi Matsuoka, Takayuki Aoki, Toshio Endo, Akira Nukada, Toshihiro Kato and Atushi Hasegawa. GPU accelerated computing - from hype to mainstream, the rebirth of vector computing. Journal of Physics: Conference Series, Vol 180, 10 pages, 2009. [DOI: 10.1088/1742-6596/180/1/012043]

    Refereed Papers

  • Tomoaki Hamano, Toshio Endo and Satoshi Matsuoka. Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters. In Proceedings of The Fourth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IPDPS 2009, pp.1-8, May 2009.
  • Hitoshi Sato, Satoshi Matsuoka and Toshio Endo. File Clustering Based Replication Algorithm in a Grid Environment. In Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2009), pp.204-211, May 2009.

    Invited Talks

  • Toshio Endo. Supercomputing on The TSUBAME GPU-Accelerated Cluster, CSIRO GPU Cluster Workshop, Melbourne, June 2009.

    FY2008

    Refereed Papers

  • Hideyuki Jitsumoto, Toshio Endo and Satoshi Matsuoka. Environmental-Aware Optimization of MPI Checkpointing Intervals. In Proceedings of HPC ASIA 2009, pp. 285-292, March 2009.
  • Akira Nukada, Yasuhiko Ogata, Toshio Endo and Satoshi Matsuoka. Bandwidth Intensive 3-D FFT kernel for GPUs using CUDA. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC08), pp.1-11, November 2008.
  • Hitoshi Sato, Satoshi Matsuoka, Toshio Endo and Naoya Maruyama. Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment. In Proceedings of IEEE/ACM International Conference on Grid Computing (Grid 2008), pp.250-257, October 2008.
  • Toshio Endo and Satoshi Matsuoka. Massive Supercomputing Coping with Heterogeneity of Modern Accelerators. In Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008), pp.1-10, April 2008. [paper] [slides]
  • Shin'ichiro Takizawa, Toshio Endo and Satoshi Matsuoka. Locality Aware MPI Communication on a Commodity Opto-Electronic Hybrid Network. In Proceedings of Workshop on Large-Scale Parallel Processing (LSPP), in conjunction with IEEE IPDPS 2008, pp.1-8, April 2008.
  • Yasuhiko Ogata, Toshio Endo, Naoya Maruyama and Satoshi Matsuoka. An Efficient, Model-Based CPU-GPU Heterogeneous FFT Library. In Proceedings of 17th International Heterogeneity in Computing Workshop (HCW '08), in conjunction with IEEE IPDPS 2008, pp.1-10, April 2008.
  • Yuto Hosogaya, Toshio Endo and Satoshi Matsuoka. Performance Evaluation of Parallel Applications on Next Generation Memory Architecture with Power-Aware Paging Method. In Proceedings of The Fourth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IPDPS 2008, pp.1-8, April 2008.

    Posters

  • Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka. Environmental-Aware Optimization of MPI Checkpointing Intervals, IEEE International Conference on Cluster Computing (Cluster 2008), poster session, September 2008.

    FY2007

    Refereed Papers

  • Tatsuhiro Chiba, Toshio Endo and Satoshi Matsuoka. High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2007), pp.487-494, May 2007.

    Posters

  • Toshio Endo, Satoshi Matsuoka. A Methodology for Coping with Heterogeneity of Modern Accelerators on a Massive Supercomputing Scale, ACM/IEEE Conference on Supercomputing (High Performance Computing, Networking, Storage and Analysis) (SC07), poster session, November 2007. [poster]

    Other Presentations

  • Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka. High Performance MPI Broadcast Algorithm for Grid Environments with Long-fat Pipes, Korea-Japan Grid Symposium 2007, Sapporo, Japan, July 2007.

    FY2006

    Refereed Papers

  • Hideyuki Jitsumoto, Toshio Endo and Satoshi Matsuoka. ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. In Proceedings of 12th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems (DPDNS '07), in conjunction with IPDPS 2007, pp.1-8, March 2007.

    FY2005 and before

    Refereed Papers

  • Toshio Endo and Kenjiro Taura. Highly Latency Tolerant Gaussian Elimination. In Proceedings of IEEE/ACM International Workshop on Grid Computing (Grid2005), pp. 91-98, November 2005. [paper] [slides]
  • Toshio Endo, Kenji Kaneda, Kenjiro Taura and Akinori Yonezawa. High Performance LU Factorization for Non-dedicated Clusters. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2004), pp. 678-685, April 2004. [paper] [slides]
  • Kenjiro Taura, Toshio Endo, Kenji Kaneda, and Akinori Yonezawa. Phoenix: a Parallel Programming Model for Accommodating Dynamically Joining/Leaving Resources. In Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '03), pp.216-229, June 2003.
  • Toshio Endo and Kenjiro Taura. Reducing Pause Time of Conservative Collectors. In Proceedings of ACM SIGPLAN International Symposium on Memory Management (ISMM2002), Berlin, pp.119-131, June 2002. [paper] [slides]
  • Toshio Endo, Kenjiro Taura and Akinori Yonezawa. Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors. In Proceedings of 15th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, pp.1-6, April 2001. [paper]
  • Toshio Endo, Kenjiro Taura and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of ACM/IEEE Conference on Supercomputing (High Performance Networking and Computing) (SC97), San Jose, 14 pages, November 1997. [paper]

    Theses for Degrees

  • Toshio Endo. Scalable Dynamic Memory Management Module on Shared Memory Multiprocessors. Ph.D. Thesis, Department of Information Science, Faculty of Science, University of Tokyo. June 2001. [paper] [slides in Japanese]
    NOTE: The paper is written in English, but the title page and abstract page include Japanese characters.
  • Toshio Endo. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. Master's thesis, Department of Information Science, The University of Tokyo, February 1998. [paper]
    NOTE: The paper is written in English, but the title page and abstract page include Japanese characters.
  • Toshio Endo. A Methodology for Constructing a Portable Garbage Collector on Parallel Machines. Senior thesis, Department of Information Science, The University of Tokyo, February 1996.
