Table of Contents

Parallel Computing
- Paper Reading
- Tasks Suited to Parallel Computing

Parallel Computing

CUDA, OpenMP, OpenACC

Background knowledge related to parallel computing: Parallel Computing

Paper Reading

A compression-based memory-efficient optimization for out-of-core GPU stencil computation
A Compute Unified System Architecture for Graphics Clusters Incorporating Data-Locality
A Device-Side Execution Model for Multi-GPU Task Graphs
A Method for Estimating Task Granularity for Automating GPU Cycle Sharing
A Performance Model for Evaluating Inter-Node Device-Side Communication
Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression
Automated Code Generation of High-Order Stencils for a Dataflow Architecture
DCUDA: Dynamic GPU Scheduling with Live Migration Support
Accelerating Distributed Deep Learning in a GPU Cycle-Sharing System (GPUサイクル共有システムにおける分散深層学習の高速化)
Lightning: Scaling the GPU Programming Model Beyond a Single GPU
Multi-GPU work sharing in a task-based dataflow programming model
PACC: An Extension of OpenACC for Pipelined Processing of Large Data on a GPU
PaRSEC: Parallel Runtime Scheduling and Execution Controller
STAPL: Standard Template Adaptive Parallel Library
Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System
Selecting Execution Parameters to Accelerate Out-of-Core Stencil Computation on GPUs (GPU上のアウトオブコア・ステンシル計算を高速化するための実行パラメータの選定)
A Study of Multi-Dimensional Data Decomposition for Running Single-GPU Code in a Multi-GPU Environment (単一GPUコードをマルチGPU環境で実行するための多次元データ分割手法の検討)

Journals and conferences:

Future Generation Computer Systems (FGCS)

Tasks Suited to Parallel Computing

Stencil
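- Acceleration techniques
  - Data reuse
  - Data compression
  - Index compression
  - Compute the interior (center) region first, e.g. so that halo exchange can overlap with computation
  - Temporal blocking (computing several time steps in one pass)
  - Trapezoidal compression (don't fully understand this yet?)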
  - Wavefront
  - Temporal Blocking
- The boundary (edge) region is called the halo
Cholesky decomposition
- Can be decomposed into a task DAG
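The halo and temporal-blocking items above can be illustrated with a minimal sketch, assuming a 1D 3-point averaging stencil whose first and last cells are fixed boundary values. The names `step`, `naive`, `temporal_blocked`, `tile` and `k` are hypothetical and chosen only for this example. Each tile is loaded once together with a halo of `kk` cells per side and advanced `kk` time steps before being written back (data reuse across time steps). This is the overlapped-tile variant, which pays for fewer passes over memory with redundant halo computation; wavefront or trapezoid-shaped tiles are other ways to do temporal blocking that avoid that redundancy.

```python
def step(u):
    """One Jacobi step of a 3-point averaging stencil.

    The first and last cells are treated as fixed boundary values and are
    copied unchanged; every other cell becomes the mean of itself and its
    two neighbours.
    """
    v = u[:]  # new buffer; the two end cells are copied as-is
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporal_blocked(u, steps, tile=4, k=2):
    """Overlapped-tile temporal blocking (hypothetical helper).

    Each tile [lo, hi) is loaded with a halo of kk cells on each side and
    advanced kk time steps locally, so one load serves kk steps.
    """
    n = len(u)
    done = 0
    while done < steps:
        kk = min(k, steps - done)  # the last block may be shorter than k
        out = u[:]
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            a, b = max(0, lo - kk), min(n, hi + kk)  # tile plus halo, clipped
            local = u[a:b]
            for _ in range(kk):
                # step() keeps the slice's end cells fixed. At a real boundary
                # that is correct; at a halo edge the stale value spreads
                # inward one cell per step, so after kk steps only the outer
                # kk halo cells are wrong and the tile itself is exact.
                local = step(local)
            out[lo:hi] = local[lo - a:hi - a]
        u = out
        done += kk
    return u
```

With `u0 = [1.0] + [0.0] * 14 + [1.0]`, `temporal_blocked(u0, 7, tile=4, k=3)` and `naive(u0, 7)` should return the same array. A larger `k` means fewer passes over `u` but a wider halo, and therefore more redundant work per tile.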
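The task-DAG idea can be sketched as follows, assuming a right-looking tiled Cholesky on the lower triangle of an `nt` x `nt` grid of tiles. The LAPACK/BLAS kernel names POTRF, TRSM, SYRK and GEMM serve only as task labels (nothing is computed), and the helpers `cholesky_tasks`, `build_dag` and `levels` are hypothetical. Dependencies are derived from which tiles each task reads and writes, and tasks are then grouped by their depth in the DAG. Task-based runtimes such as PaRSEC, from the reading list, schedule DAGs of this kind.

```python
def cholesky_tasks(nt):
    """Yield (name, reads, writes) for each task of a right-looking tiled
    Cholesky on the lower triangle, in sequential program order.
    Tiles are identified by (row, col) with row >= col."""
    for k in range(nt):
        yield (f"POTRF({k})", [], [(k, k)])
        for i in range(k + 1, nt):
            yield (f"TRSM({k},{i})", [(k, k)], [(i, k)])
        for i in range(k + 1, nt):
            yield (f"SYRK({k},{i})", [(i, k)], [(i, i)])
            for j in range(k + 1, i):
                yield (f"GEMM({k},{i},{j})", [(i, k), (j, k)], [(i, j)])


def build_dag(nt):
    """A task depends on the last task that wrote any tile it reads or
    writes (read-after-write and write-after-write). In this ordering each
    tile is read only after its final write, so write-after-read hazards
    cannot occur and tracking the last writer is enough here."""
    last_writer = {}
    deps = {}
    order = []
    for name, reads, writes in cholesky_tasks(nt):
        deps[name] = {last_writer[t] for t in reads + writes if t in last_writer}
        for t in writes:
            last_writer[t] = name
        order.append(name)
    return order, deps


def levels(order, deps):
    """Group tasks by longest-path depth. A dependency always has a smaller
    depth, so tasks in the same level could run concurrently."""
    depth = {}
    for name in order:  # program order is already a topological order
        depth[name] = 1 + max((depth[d] for d in deps[name]), default=-1)
    grouped = [[] for _ in range(max(depth.values()) + 1)]
    for name in order:
        grouped[depth[name]].append(name)
    return grouped
```

For `nt = 3` this gives 10 tasks in 7 levels, and only level 2 (`SYRK(0,1)`, `SYRK(0,2)`, `GEMM(0,2,1)`) holds more than two tasks. In this sketch the number of levels grows as `3 * nt - 2` (the POTRF, TRSM, SYRK chain repeated per step), while the GEMM count grows cubically, so larger tile grids expose much more parallel work per level.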
…