I want to do machine learning without the cloud, which, as we learned previously, is awful.
But I’m also a vagabond with nowhere safe and high-bandwidth to store a giant GPU machine (campus IT don’t return my calls about it; I think they think I’m taking the piss).
So, let’s buy a Razer Blade 2016: a nice, portable, surprisingly cheap laptop with all the latest features and performance comparable to the kind of single-GPU desktop machine I could afford.
I don’t want to do anything fancy here, just process a few gigabytes of MP3 data, which is stored on the AARNET ownCloud server. It’s all quite simple, but the algorithm is just too slow without a GPU, and I don’t have a GPU machine I can leave running. I’ve developed it in Keras v1.2.2, which depends on TensorFlow 1.0.
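For flavour, here is a hypothetical sketch of the kind of job in question. Everything in it — the function name, the framing parameters — is my own illustration, not the contents of the actual job script:

```python
import numpy as np

def spectrogram_normed(signal, n_fft=512, hop=256):
    """Hypothetical sketch of a normalised-spectrogram job.
    (Name and parameters are illustrative, not the real script.)"""
    # Slice the signal into overlapping frames
    frames = [signal[i:i + n_fft]
              for i in range(0, len(signal) - n_fft + 1, hop)]
    # Window each frame and take the magnitude of the real FFT
    window = np.hanning(n_fft)
    spec = np.abs(np.fft.rfft(np.array(frames) * window, axis=1))
    # Scale into [0, 1]
    return spec / (spec.max() or 1.0)
```

Cheap on a few seconds of audio; over gigabytes of MP3s, with a model on top, it is exactly the sort of thing that crawls without a GPU.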
So installing CUDA etc on the laptop is straightforward. Making it run is not.
Switching the GPU on or off at the right times.
Confusing, because most sources seem to think you want to have NVIDIA graphics; but what if you merely want NVIDIA computing via CUDA, and don’t care about graphics?
In stark contrast to NVIDIA, the Bumblebee project claims:
There is sometimes confusion about CUDA. You don’t need Bumblebee to run CUDA. Follow the How-to to get CUDA working under Ubuntu.
Lies! Without Bumblebee you have no control over which processes are using the GPU, and they collide and explode. Specifically, TensorFlow wants the whole GPU.
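By default a TensorFlow 1.x session maps nearly all of the GPU's memory the moment it starts. The two `ConfigProto` knobs that tame this are real TensorFlow 1.x options; the little helper wrapping them is just my sketch:

```python
def make_gpu_config(memory_fraction=None):
    """Build a tf.ConfigProto that stops TensorFlow 1.x grabbing
    the whole GPU as soon as a session is created."""
    import tensorflow as tf  # assumes the TensorFlow 1.0 from the text
    config = tf.ConfigProto()
    # Allocate GPU memory incrementally instead of all at once
    config.gpu_options.allow_growth = True
    if memory_fraction is not None:
        # Or cap this process at a fixed slice of the card
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config

# usage: sess = tf.Session(config=make_gpu_config(0.5))
```

This only shares memory more politely; it doesn't solve the scheduling problem of two processes fighting over one card, which is where Bumblebee comes in.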
Their second point is interesting, though:
There is however a new feature (--no-xorg option for optirun) in Bumblebee 3.2, which makes it possible to run CUDA / OpenCL applications that does not need the graphics rendering capabilities.
There are various walkthroughs; what worked for me was a combination of all of them. There is also a distinction between optimus and primus, which I am doing my utmost to know nothing about. Try both; maybe one will work.
- Older walkthrough for a Razer Blade.
- Daniel Teichmann’s Bumblebee instructions are fresher, but for more generic hardware, and use optimus instead of primus
- Webupd8 bumblebee howto is old and non-specific but authoritative
```shell
sudo apt-get install linux-source-4.10.0 build-essential linux-headers-generic-hwe-16.04
sudo apt install bumblebee bumblebee-nvidia primus
```
Note that Teichmann doesn’t give a kernel version number, but I do, because kernel 4.4 is broken for Razer laptops: it causes infinite loops in the power-management subsystem if you ever close the laptop lid. 4.10 works, so let’s use it. 4.8 might work too; I can’t remember.
```shell
sudo update-alternatives --config i386-linux-gnu_gl_conf
sudo update-alternatives --config x86_64-linux-gnu_egl_conf
sudo update-alternatives --config x86_64-linux-gnu_gl_conf
sudo prime-select intel
sudo atom /etc/bumblebee/bumblebee.conf
sudo atom /etc/modprobe.d/bumblebee.conf
sudo atom /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf
sudo atom /usr/lib/i386-linux-gnu/mesa/ld.so.conf
sudo atom /etc/bumblebee/xorg.conf.nvidia
```
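For reference, the relevant corners of /etc/bumblebee/bumblebee.conf look roughly like this. The key names are real Bumblebee options, but the `-375` driver version suffix is an assumption — substitute whichever nvidia driver package you actually installed:

```ini
[bumblebeed]
# The driver Bumblebee manages
Driver=nvidia

[driver-nvidia]
# Version suffix is a guess; match your installed nvidia package
KernelDriver=nvidia-375
LibraryPath=/usr/lib/nvidia-375:/usr/lib32/nvidia-375
XorgModulePath=/usr/lib/nvidia-375/xorg,/usr/lib/xorg/modules
# bbswitch powers the card off when idle
PMMethod=bbswitch
```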
Test using VirtualGL:
```shell
sudo modprobe nvidia-uvm nvidia
primusrun python -i jobs/spectrogram_normed.py
primusrun nvidia-smi
```
```shell
CUDA_VISIBLE_DEVICES= primusrun python jobs/spectrogram_normed.py
```
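The `CUDA_VISIBLE_DEVICES= …` prefix is the shell idiom for setting a variable to empty for one command only, which hides every CUDA device from that process. The same trick from inside Python looks like this — the important part is doing it before TensorFlow initialises CUDA, and setting it before the import is the safe habit:

```python
import os

# An empty CUDA_VISIBLE_DEVICES hides all CUDA devices from this
# process. Set it before importing tensorflow to be safe, since
# CUDA gets initialised early.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # would now see no GPUs and fall back to CPU
```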
Fixing the kernel version, because the NVIDIA drivers don’t always seem to exist
```
Using python library path: /home/dan/.virtualenvs/keras2/lib/python3.5/site-packages
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] y
XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-8.0
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to use system default]: 6.0.21
Please specify the location where cuDNN 6.0.21 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]: /usr/include/x86_64-linux-gnu
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: "3.5,5.2"]: 6.1
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
........
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
Configuration finished
```
Bonus: Razer keyboard-lighting control via OpenRazer and Polychromatic:

```shell
sudo add-apt-repository ppa:openrazer/stable
sudo add-apt-repository ppa:lah7/polychromatic
sudo apt update
sudo apt install polychromatic
sudo gpasswd -a $USER plugdev
```