Table of Contents
Parallel computing
Paper reading
Tasks suited to parallel computing
Parallel computing
CUDA, OpenMP, OpenACC
Background knowledge related to parallel computing:
Parallel computing
Paper reading
A compression-based memory-efficient optimization for out-of-core GPU stencil computation
A Compute Unified System Architecture for Graphics Clusters Incorporating Data-Locality
A Device‑Side Execution Model for Multi‑GPU Task Graphs
A Method for Estimating Task Granularity for Automating GPU Cycle Sharing
A Performance Model for Evaluating Inter-Node Device-Side Communication
Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression
Automated Code Generation of High-Order Stencils for a Dataflow Architecture
DCUDA: Dynamic GPU Scheduling with Live Migration Support
GPUサイクル共有システムにおける分散深層学習の高速化
Lightning: Scaling the GPU Programming Model Beyond a Single GPU
Multi‑GPU work sharing in a task‑based dataflow programming model
PACC: An Extension of OpenACC for Pipelined Processing of Large Data on a GPU
PaRSEC Parallel Runtime Scheduling and Execution Controller
STAPL: Standard Template Adaptive Parallel Library
Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System
GPU上のアウトオブコア・ステンシル計算を高速化するための実行パラメータの選定
単一GPUコードをマルチGPU環境で実行するための多次元データ分割手法の検討
Journals and conferences:
Future Generation Computer Systems (FGCS)
Tasks suited to parallel computing
Stencil
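Acceleration methods
Data reuse
Data compression
Index compression
Compute the interior region first
Temporal blocking (compute several time steps at once)
Trapezoidal compression (not yet clear to me?)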
Wavefront
Temporal Blocking
The boundary region is called the "halo"
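As a rough illustration of temporal blocking and the halo, the sketch below advances a 1D array several time steps per block. Each block is loaded once together with a halo whose width equals the number of time steps, so the stale values at the edge of the local copy never reach the cells the block owns. The 3-point averaging stencil, the fixed end points, and the names `step`, `naive`, and `temporal_blocked` are assumptions made for this example; they are not taken from any of the papers listed above.

```python
def step(u):
    """One Jacobi step of a 3-point averaging stencil; both end points stay fixed."""
    if len(u) < 3:
        return list(u)
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporal_blocked(u, steps, block):
    """Advance `steps` time steps block by block.

    Each block is copied once with a halo of width `steps` on each side
    (clipped at the array ends), updated locally `steps` times, and only
    the cells owned by the block are written back.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - steps, 0)   # left edge including halo
        hi = min(end + steps, n)     # right edge including halo
        local = u[lo:hi]
        for _ in range(steps):
            local = step(local)
        # Halo cells may now hold stale values, so discard them.
        out[start:end] = local[start - lo:end - lo]
    return out


if __name__ == "__main__":
    data = [float(i * i % 7) for i in range(20)]
    print(temporal_blocked(data, steps=3, block=5) == naive(data, 3))
```

The halo cells are recomputed redundantly by neighbouring blocks. That extra arithmetic is the price paid for touching each block only once per `steps` time steps, which is usually the point when the data has to be streamed in from slower memory.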
Cholesky decomposition
Can be decomposed into a task DAG
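To make the task-DAG view more concrete, here is a minimal sketch that enumerates the tasks of a right-looking tiled Cholesky factorization and derives their dependencies from which tile each task reads and writes. It builds only the graph and does no numerical work. The kernel names POTRF, TRSM, SYRK, and GEMM follow the usual LAPACK/BLAS naming and are an assumption here, as are the function names `cholesky_task_dag` and `levels`.

```python
from collections import defaultdict


def cholesky_task_dag(nt):
    """Return (tasks, deps) for a tiled Cholesky on an nt x nt grid of tiles.

    tasks: task tuples in a valid sequential order.
    deps:  maps each task to the set of tasks it must wait for.
    """
    tasks = []
    deps = defaultdict(set)
    last_writer = {}  # tile (row, col) -> task that last wrote it

    def add(task, reads, writes):
        tasks.append(task)
        # Every task also updates its output tile in place, so it depends
        # on the previous writer of that tile as well as of its inputs.
        for tile in reads + [writes]:
            if tile in last_writer:
                deps[task].add(last_writer[tile])
        last_writer[writes] = task

    for k in range(nt):
        add(("POTRF", k), [], (k, k))                 # factor diagonal tile
        for i in range(k + 1, nt):
            add(("TRSM", i, k), [(k, k)], (i, k))     # solve panel tile
        for i in range(k + 1, nt):
            add(("SYRK", i, k), [(i, k)], (i, i))     # update diagonal tile
            for j in range(k + 1, i):
                add(("GEMM", i, j, k), [(i, k), (j, k)], (i, j))  # update off-diagonal tile
    return tasks, deps


def levels(tasks, deps):
    """Assign each task the length of the longest dependency chain leading to it."""
    level = {}
    for t in tasks:  # tasks are already listed in a valid order
        level[t] = 1 + max((level[d] for d in deps[t]), default=-1)
    return level


if __name__ == "__main__":
    tasks, deps = cholesky_task_dag(3)
    for task, lvl in sorted(levels(tasks, deps).items(), key=lambda kv: kv[1]):
        print(lvl, task)
```

Tasks that share a level have no dependency path between them, so a runtime could in principle execute them concurrently; the number of levels gives the length of the critical path.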
…