GPU vs CPU architecture. Memory hierarchy (briefly).
Approaches to distributing data among threads and thread blocks.
Shared memory.
Asynchronous operations. CUDA streams.
Visualization of timelines.
OpenACC. Compiler directives for programming GPUs.
Practical part.
Day 2 (April 24th)
Applying GPUs to challenging tasks. Different approaches and applications are discussed.
Multiple GPUs.
GPU-optimized libraries.
Practical part.
Day 3 (May 26th)
GPU program optimization.
CUDA + Fortran/Java/Python/C.
Questions and discussion. Here I plan to help researchers find ways to adapt their code to GPUs and make it use compute resources effectively.