I want to do machine learning without the cloud, which, as we learned previously, is awful.
But also I'm a vagabond with nowhere safe and high-bandwidth to store a giant GPU machine (campus IT don't return my calls about it; I think they think I'm taking the piss).
So, let's buy a Razer Blade 2016, a portable, surprisingly cheap laptop with all the latest features and performance comparable to the kind of single-GPU desktop machine I could afford.
I don't want to do anything fancy here, just process a few gigabytes of MP3 data. My data is stored in the AARNET owncloud server. It's all quite simple, but the algorithm is just too slow without a GPU and I don't have a GPU machine I can leave running.
Miscellaneous Razer Blade steps
See comfy razer.
The CUDA Bit
Installing CUDA etc on the laptop is straightforward. Making it run is not.
Get deps, by downloading the deb then installing.
sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-9-1
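Once the packages are in, it is worth a quick check that the toolchain actually landed on the machine. A minimal sketch, assuming the `cuda-9-1` package installs under `/usr/local/cuda-9.1` (NVIDIA's usual layout; adjust if yours differs). The `check_tool` helper is a made-up name for illustration.

```shell
#!/bin/sh
# Report whether a given tool is on PATH.
# Prints "<name>: found" or "<name>: missing"; returns 0 only if found.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Assumption: the toolkit lives in /usr/local/cuda-9.1.
PATH="$PATH:/usr/local/cuda-9.1/bin"

check_tool nvcc || true        # CUDA compiler, from the toolkit
check_tool nvidia-smi || true  # driver utility, from the driver package
```

Finding both tools only shows that the software is installed; it says nothing yet about whether the GPU is powered on, which is the hard part.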
NVIDIA's howto points out that you will need to care about Bumblebee to survive Ubuntu.
Confusing, because most sources seem to think you want to have NVIDIA graphics; but what if you merely want NVIDIA computing via CUDA, and don't care about graphics?
In stark contrast to NVIDIA, AskUbuntu claims:
There is sometimes confusion about CUDA. You don't need Bumblebee to run CUDA. Follow the How-to to get CUDA working under Ubuntu.
Lies! Without Bumblebee, the NVIDIA GPU either gets switched off, or various other applications claim GPU memory and fight with TensorFlow, causing strange crashes in the middle of complicated modelling jobs. TensorFlow wants the whole GPU. You need Bumblebee in order to selectively allow GPU access to bad GPU citizens like TensorFlow.
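To see how much GPU memory other applications have already claimed before starting a job, you can ask `nvidia-smi` for the numbers in CSV form. A rough sketch; `gpu_free_mib` is a hypothetical helper, and it assumes a single GPU so that the query returns one line.

```shell
#!/bin/sh
# Given one "used, total" line in MiB, print the free MiB.
gpu_free_mib() {
    echo "$1" | awk -F', *' '{ print $2 - $1 }'
}

# Only query the card if nvidia-smi is installed and the GPU responds.
if command -v nvidia-smi >/dev/null 2>&1; then
    usage=$(nvidia-smi --query-gpu=memory.used,memory.total \
                       --format=csv,noheader,nounits 2>/dev/null) || usage=""
    if [ -n "$usage" ]; then
        echo "free GPU memory (MiB): $(gpu_free_mib "$usage")"
    fi
fi
```

If the free figure is well below the total before any job has started, something else on the desktop is already holding the card.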
The second AskUbuntu point is interesting, though:
There is however a new feature (optirun) in Bumblebee 3.2, which makes it possible to run CUDA / OpenCL applications that does not [sic] need the graphics rendering capabilities.
I should try that.
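As far as I can tell, the feature in question is the `--no-xorg` flag of `optirun`, which skips starting the secondary X server. An untested sketch of a launcher that prefers it and falls back to `primusrun`; `gpu_launcher` is a hypothetical helper name, and the flag is an assumption based on the Bumblebee 3.2 release notes rather than something verified here.

```shell
#!/bin/sh
# Print the command prefix to use for GPU jobs, or nothing if neither
# Bumblebee launcher is installed.
gpu_launcher() {
    if command -v optirun >/dev/null 2>&1; then
        # Assumption: Bumblebee >= 3.2, where optirun accepts --no-xorg.
        echo "optirun --no-xorg"
    elif command -v primusrun >/dev/null 2>&1; then
        echo "primusrun"
    fi
}

launcher=$(gpu_launcher)
echo "GPU launcher: ${launcher:-none found}"

# Usage (word-splitting of $launcher is intentional):
#   $launcher python jobs/spectrogram_normed.py
```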
Bumblebee is its own small world. There is a walkthrough for a Razer Blade. It has a debugging page, which you will need. See also Daniel Teichmann's Bumblebee instructions.
sudo apt install nvidia-prime
sudo prime-select intel
Install Bumblebee via the PPA (NB the stable version is too old as of 2018-03):
sudo add-apt-repository ppa:bumblebee/testing
sudo apt-get update
Then load the NVIDIA kernel modules and run things through primusrun:
sudo modprobe nvidia-uvm nvidia
primusrun python -i jobs/spectrogram_normed.py
primusrun nvidia-smi
CUDA_VISIBLE_DEVICES= primusrun python jobs/spectrogram_normed.py
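Setting `CUDA_VISIBLE_DEVICES` to the empty string, as in the line above, hides every GPU from CUDA so the job falls back to the CPU; handy for checking whether a crash is GPU-related. A small sketch of wrappers for both cases; the function names are made up here, and device index `0` assumes the NVIDIA card is the only CUDA device.

```shell
#!/bin/sh
# Run a command with all CUDA devices hidden (CPU fallback).
run_cpu_only() {
    env CUDA_VISIBLE_DEVICES= "$@"
}

# Run a command with only CUDA device 0 visible.
run_gpu0() {
    env CUDA_VISIBLE_DEVICES=0 "$@"
}

# Usage, mirroring the commands above:
#   run_cpu_only primusrun python jobs/spectrogram_normed.py
#   run_gpu0 primusrun python jobs/spectrogram_normed.py
```

Using `env` rather than a bare assignment keeps the variable from leaking into the calling shell when the wrapped command happens to be a shell function.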
Tensorflow ACPI interaction.
Turn Off Discrete nVidia Optimus Graphics Card in Ubuntu.
Webupd8 bumblebee howto, or one with fewer manual steps from PC suggest
Monitoring NVIDIA power without using the NVIDIA
Boot without graphics in Ubuntu.
Things change in Ubuntu 18.04 Bionic Beaver.
The master bug thread tracking the configuration changes is here. There is a scruffier thread on the Bumblebee repo. AFAICT this is partially automated in the bumblebee-nvidia package.
Now you need to build tensorflow to benefit from this
So you need to install Bazel and a whole bunch of Java. This will also be useful if you wish to do discount imitations of other Google infrastructure.
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python openjdk-8-jdk
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
Pinning kernel version because the NVIDIA drivers don't always seem to exist
This occurred a couple of times for me; I think the compatible versions of something didn't match my manually selected versions of some other thing. I don't care. Why would anyone care about this? Sidestepping the issue by pinning the GRUB default to boot the good kernel seemed to work, although I haven't needed to recently, for reasons I will make no attempt to discover.
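Pinning the GRUB default looks roughly like the following. A sketch under assumptions: the kernel version `4.13.0-36-generic` is a placeholder for whichever kernel works for you, and `list_grub_entries` is a hypothetical helper for finding the exact menu title.

```shell
#!/bin/sh
# Print the title of every menuentry in a grub.cfg-style file.
list_grub_entries() {
    sed -n "s/^[[:space:]]*menuentry '\([^']*\)'.*/\1/p" "$1"
}

# Show the available entries on this machine, if the config is readable.
if [ -r /boot/grub/grub.cfg ]; then
    list_grub_entries /boot/grub/grub.cfg
fi

# Then, in /etc/default/grub, set (placeholder kernel version):
#   GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.13.0-36-generic"
# and regenerate the boot menu:
#   sudo update-grub
```

The part before the `>` is the submenu title and the part after it is the entry title, so both must match the strings in `grub.cfg` exactly.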
Avoiding the fiddly bits with anaconda
Maybe this will work:
conda create -n tensorflow pip python=3.6
source activate tensorflow
conda install -c anaconda tensorflow-gpu
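To find out whether the install worked, ask TensorFlow directly whether it can see a GPU. A sketch for the TensorFlow 1.x API of the time (`tf.test.is_gpu_available`); `tf_gpu_check` is a hypothetical wrapper, and under Bumblebee it presumably needs to go through `primusrun` like the jobs above.

```shell
#!/bin/sh
# Ask TensorFlow whether a CUDA GPU is usable; prints True or False.
# "$@" is the command used to start Python, e.g. "primusrun python".
tf_gpu_check() {
    "$@" -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
}

# Usage, inside the activated conda environment:
#   tf_gpu_check primusrun python
```

A `False` here with a working `primusrun nvidia-smi` would point at the CUDA or cuDNN libraries in the conda environment rather than at Bumblebee.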
See comfy ubuntu.