Options for doing SIMD computation with fewer tears, for people, like me, who give not a damn about implementation details, but just want it to work fast enough.
Aside: you're going to have to mess around with downloading proprietary GPU toolkits from the manufacturer. Tedious. Consider instead paying some cloud provider to rent their pre-configured machines.
Hip, but not so hop as FPGA computation.
Just writing GLSL shaders using your compiler and the relevant manufacturer toolboxes. Laborious and tangential, unless you are a GPU-algorithm researcher. But could be fun I s'pose. See the Book of Shaders.
For data-oriented computational dataflow graphs, use one of the toolkits from the deep learning community. These are easy and performant, although not quite as general as just writing a damn shader.
OK, try these (emphasis on integration with Python):
numba compiles a subset of Python to run on CPUs or GPUs. This sounds uninspiring, but it turns out to be amazing, because the debugging affordances are really good when you can switch between the Python interpreter and a compiler for the same code. It generates C-speed loops from plain Python (via LLVM rather than an actual C compiler), which is incredible. OTOH the GPU stuff is not seamless and requires a little too much parallelism hinting to be plausibly useful to amateurs like me.
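To make the numba point concrete, here is a minimal sketch of the "same code, interpreted or compiled" workflow, CPU only. The function name leibniz_pi and the no-op fallback decorator are my own inventions for illustration; the only numba features assumed are the njit decorator and the py_func attribute on the compiled function.

```python
# Use numba if it is installed; otherwise fall back to a do-nothing decorator
# so the very same function still runs under the plain interpreter.
try:
    from numba import njit
except ImportError:
    def njit(func):  # hypothetical stand-in, not part of numba
        return func


@njit
def leibniz_pi(n_terms):
    """Approximate pi with the Leibniz series, as a deliberately naive loop."""
    total = 0.0
    sign = 1.0
    for k in range(n_terms):
        total += sign / (2.0 * k + 1.0)
        sign = -sign
    return 4.0 * total


# Compiled on first call when numba is present.
approx = leibniz_pi(1_000_000)

# When numba is present, py_func is the original uncompiled function,
# which you can step through in a debugger like any other Python code.
interpreted = getattr(leibniz_pi, "py_func", leibniz_pi)
approx_interpreted = interpreted(1_000)

print(approx, approx_interpreted)
```

The loop is the kind of thing that is painfully slow in the interpreter and that you would normally be told to vectorise. With the decorator you can often leave it as a loop, and drop back to the interpreted version when something looks wrong.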
cupy is an NVIDIA-backed numpy clone which includes bonus CUDA libraries and DNN operations.
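A rough sketch of what "numpy clone" buys you: write the array code against a module alias, and point the alias at cupy when a GPU is usable and at numpy otherwise. The alias xp and the helpers normalised_gram and to_host are names I made up for this example; the cupy calls assumed are cupy.zeros, cupy.arange, cupy.linalg.norm and cupy.asnumpy.

```python
import numpy as np

# Pick the array backend. The tiny allocation forces cupy to touch the device,
# so a cupy install without a working GPU also falls through to numpy.
try:
    import cupy as cp
    cp.zeros(1)
    xp = cp
except Exception:
    cp = None
    xp = np


def normalised_gram(x):
    """Scale each row of x to unit length, then return the Gram matrix x @ x.T."""
    x = x / xp.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def to_host(a):
    """Return a numpy array, copying off the GPU if the cupy backend is in use."""
    return cp.asnumpy(a) if xp is not np else a


x = xp.arange(12, dtype=xp.float64).reshape(3, 4) + 1.0
g_host = to_host(normalised_gram(x))
print(g_host.shape)
```

The explicit copy back to the host is the part that has no numpy analogue. Transfers are not free, so the usual advice is to keep data on the device for as many steps as possible.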
Gnumpy isn't fashionable, but it has been around for a while and has a very fancy pedigree:
Do you want to have both the compute power of GPUs and the programming convenience of Python numpy? Gnumpy + Cudamat will bring you that.
Gnumpy is a simple Python module that interfaces in a way almost identical to numpy, but does its computations on your computer's GPU.[…]
Gnumpy runs on top of, and therefore requires, the excellent cudamat library, written by Vlad Mnih.
Gnumpy can run in simulation mode: everything happens on the CPU, but the interface is the same. This can be helpful if you like to write your programs on your GPU-less laptop before running them on a GPU-equipped machine. It also allows you to easily test what performance gain you get from using a GPU. The simulation mode requires npmat, written by Ilya Sutskever.
And from the cudamat project itself:
The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations our GPU matrix class supports include…
The Book of Shaders:
This book will focus on the use of GLSL pixel shaders. First we'll define what shaders are; then we'll learn how to make procedural shapes, patterns, textures and animations with them. You'll learn the foundations of shading language and apply it to more useful scenarios such as: image processing (image operations, matrix convolutions, blurs, color filters, lookup tables and other effects) and simulations (Conway's game of life, Gray-Scott's reaction-diffusion, water ripples, watercolor effects, Voronoi cells, etc.). Towards the end of the book we'll see a set of advanced techniques based on Ray Marching.