I want to do machine learning without the cloud, which, as we learned previously, is awful.
But I’m also a vagabond with nowhere safe and high-bandwidth to store a giant GPU machine (campus IT don’t return my calls about it).
So I buy a Razer Blade 2016, a portable, surprisingly cheap laptop with all the latest features and performance comparable to the kind of single-GPU desktop machine I could afford. Similar steps would probably work for any machine where you run a GUI and your GPU computation on the same machine.
I don’t want to do anything fancy here, just process a few gigabytes of MP3 data, and prototype some algorithms to finally run on a rented GPU in the cloud. It’s all quite simple, but I need to prototype the algorithms on a GPU before I try to get the cloud GPUs working.
This involves installing and enabling access to the GPU libraries, then compiling my ML software (e.g. tensorflow) to take best advantage of them.
The CUDA Bit
Installing CUDA etc on the laptop is needlessly long but straightforward. Making it run is more tedious.
There are a few moving pieces here. You need
- Driver for the graphics card
- CUDA libraries, so that tensorflow can do computation on the GPU
- Bumblebee, which allows you to switch off the graphics card except for your desired applications (e.g. tensorflow), because otherwise everything seems to work but crashes randomly.
You might not be aware of the third item if you use cloud GPUs, because they are not set up by default to use the GPU for rendering the screen, so the conflict does not arise.
Get deps, by downloading the deb then installing.
    sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
    sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda-9-1
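After the install, a quick sanity check of the three moving pieces might look like the sketch below. The `check_cmd` helper is my own name, not part of any package. I am assuming that `nvidia-smi` arrives with the driver, `nvcc` with the CUDA toolkit, and `optirun` with Bumblebee. Note that `nvcc` may be installed under `/usr/local/cuda/bin` without being on your `PATH`, in which case it is reported missing even though CUDA is there.

```shell
#!/bin/sh
# Report whether a command is on PATH.
# Prints "found: NAME" or "missing: NAME".
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_cmd nvidia-smi   # assumed to ship with the NVIDIA driver
check_cmd nvcc         # CUDA compiler; may sit in /usr/local/cuda/bin, off PATH
check_cmd optirun      # Bumblebee launcher
```

A "missing" line only tells you which piece to look at next. A "found" line does not prove the piece actually works.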
NVIDIA’s howto suggests that you will need to care about Bumblebee to survive Ubuntu. This is probably misleading. Like many sources, it assumes I want to have NVIDIA graphics; but what if I merely want NVIDIA computing via CUDA and don’t care about graphics?
In contrast, AskUbuntu claims:
There is sometimes confusion about CUDA. You don’t need Bumblebee to run CUDA. Follow the How-to to get CUDA working under Ubuntu.
This might be true but it is not useful. Without Bumblebee the NVIDIA card either gets switched off, or various other applications claim GPU memory and fight with Tensorflow, causing strange crashes in the middle of complicated jobs. Tensorflow wants the whole GPU. You need Bumblebee in order to selectively allow GPU access to bad GPU citizens like Tensorflow.
The second AskUbuntu point is interesting:
There is however a new feature (optirun) in Bumblebee 3.2, which makes it possible to run CUDA / OpenCL applications that does not [sic] need the graphics rendering capabilities.
Should try that.
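A sketch of how trying it might look. The wrapper name `run_on_gpu` and the script name `train.py` are hypothetical. The `--no-xorg` flag is, as far as I recall, the Bumblebee 3.2 option that skips starting the secondary X server, which is what a compute-only job wants; check `optirun --help` on your install before relying on it.

```shell
#!/bin/sh
# Run a command on the NVIDIA card via Bumblebee when optirun is available;
# otherwise fall back to running the command directly.
run_on_gpu() {
    if command -v optirun >/dev/null 2>&1; then
        # --no-xorg: do not start the secondary X server (assumed flag,
        # believed to have been added in Bumblebee 3.2)
        optirun --no-xorg "$@"
    else
        "$@"
    fi
}

# Hypothetical usage; train.py stands in for your own script:
# run_on_gpu python train.py
```

The fallback branch means the same launch line also works on a cloud machine where Bumblebee is not installed.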
Bumblebee is its own small world. There is a walkthrough for a Razer Blade. The most up-to-date instructions are simpler though, and I recommend these.
NB: the next part has not been updated for a while. I will revise it next time I reinstall.
It has a debugging page, which you will need. Daniel Teichmann’s Bumblebee instructions.
Install Bumblebee via the PPA (NB: as of 2018-03, the stable version is too old).
Tensorflow ACPI interaction.
Turn Off Discrete nVidia Optimus Graphics Card in Ubuntu.
Webupd8 bumblebee howto, or one with fewer manual steps from PC suggest
Monitoring NVIDIA power without using the NVIDIA
Boot without graphics in Ubuntu.
Things change in Ubuntu 18.04 Bionic Beaver.
The master bug thread tracking the configuration changes is here. There is a scruffier thread on the Bumblebee repo. AFAICT this is partially automated in the bumblebee-nvidia package.
Now you need to build tensorflow to benefit from this
Now I need to install bazel and a whole bunch of Java. This will also be useful if you wish to do discount imitations of other Google infrastructure.
    sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python openjdk-8-jdk
    echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
    curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
    sudo apt-get update && sudo apt-get install bazel
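With bazel in place, the build itself goes roughly as sketched below, assuming the Tensorflow source-build procedure of this era. The `run` helper and the `DRY_RUN` variable are my own scaffolding, and `/tmp/tensorflow_pkg` is an arbitrary output directory. `./configure` is interactive and is where you say yes to CUDA support. By default the sketch only prints the commands, so you can read them before committing to a long compile.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_tensorflow() {
    run git clone https://github.com/tensorflow/tensorflow.git
    run cd tensorflow
    # Interactive: asks about CUDA support, library paths and so on.
    run ./configure
    # Build the pip-package builder with CUDA enabled.
    run bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
    # Write a wheel into an arbitrary output directory (assumption).
    run bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
}

build_tensorflow
```

The resulting wheel in the output directory can then be installed with `pip install`.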
Pinning kernel version because the NVIDIA drivers don’t always seem to exist
This occurred a couple of times for me; I think the available versions of something didn’t match my manually selected versions of some other thing. I don’t care; why would anyone care about this? Sidestepping the issue by pinning the grub default to boot the good kernel seemed to work, although I haven’t needed to recently, for reasons I will make no attempt to discover.
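For the record, pinning the grub default amounts to rewriting the `GRUB_DEFAULT` line in `/etc/default/grub` and regenerating the config. A minimal sketch follows. The function name `set_grub_default` is mine, and the kernel version in the usage comment is a made-up example; substitute the menu entry for whichever kernel works for you.

```shell
#!/bin/sh
# Rewrite the GRUB_DEFAULT line in a grub defaults file.
#   $1: path to the file (normally /etc/default/grub)
#   $2: menu entry to boot, in "submenu>entry" form
# A backup of the original is left alongside as FILE.bak.
set_grub_default() {
    sed -i.bak "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$2\"|" "$1"
}

# Hypothetical usage, as root (the kernel version is an invented example):
# set_grub_default /etc/default/grub \
#     "Advanced options for Ubuntu>Ubuntu, with Linux 4.13.0-36-generic"
# update-grub
```

The function takes the file path as an argument so that you can rehearse on a copy of the file before touching the real one.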
Avoiding the fiddly bits with anaconda
Maybe this will work:
    conda create -n tensorflow pip python=3.6
    source activate tensorflow
    conda install -c anaconda tensorflow-gpu
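Whether it worked can be checked by asking Tensorflow which devices it sees. A sketch, assuming a Tensorflow 1.x-style install where `tensorflow.python.client.device_lib` is importable. The function name `check_tf_gpu` and the `PYTHON` override variable are my own.

```shell
#!/bin/sh
# Ask tensorflow which GPUs it can see.
# Set PYTHON to use a specific interpreter; defaults to "python".
check_tf_gpu() {
    "${PYTHON:-python}" - <<'EOF' || echo "check failed: is tensorflow installed in this environment?"
from tensorflow.python.client import device_lib

# Keep only the devices tensorflow classifies as GPUs.
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print("GPUs visible to tensorflow: %s" % gpus)
EOF
}

# Usage, inside the activated conda environment:
# check_tf_gpu
```

An empty list means Tensorflow imported fine but is running on the CPU only, which usually points back at the driver or Bumblebee setup rather than at conda.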