I want to do machine learning without the cloud, which, as we learned previously, is awful.
But also I’m a vagabond with nowhere safe and high-bandwidth to store a giant GPU machine (campus IT don’t return my calls about it; I think they think I’m taking the piss.)
So, let’s buy a Razer Blade 2016, a nice, portable, surprisingly cheap laptop with all the latest features and performance comparable to the kind of single-GPU desktop machine I could afford.
I don’t want to do anything fancy here, just process a few gigabytes of MP3 data, which is stored on the AARNet ownCloud server. It’s all quite simple, but the algorithm is just too slow without a GPU, and I don’t have a GPU machine I can leave running.
Installing CUDA etc. on the laptop is straightforward. Making it run is not.
Get the dependencies by downloading NVIDIA’s repository .deb, then installing:
sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-9-1
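To check which toolkit version actually landed, one option is to read the version file the installer puts under the toolkit root. This is a minimal Python sketch. It assumes the CUDA 9.x layout, where `/usr/local/cuda/version.txt` holds a line like `CUDA Version 9.1.85`; newer toolkits ship a `version.json` instead. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumption: CUDA 9.x-era layout; newer toolkits use version.json instead.
CUDA_VERSION_FILE = Path("/usr/local/cuda/version.txt")


def parse_cuda_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract a version tuple from text like 'CUDA Version 9.1.85'."""
    match = re.search(r"CUDA Version (\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_cuda_version(path: Path = CUDA_VERSION_FILE) -> Optional[Tuple[int, ...]]:
    """Return the installed toolkit version, or None if the file is unreadable."""
    try:
        return parse_cuda_version(path.read_text())
    except OSError:
        return None
```

After the `apt-get install` above, `installed_cuda_version()` would be expected to start with `(9, 1)`.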
This is confusing, because most sources seem to assume you want NVIDIA graphics. But what if you merely want NVIDIA computing via CUDA, and don’t care about graphics?
In stark contrast to NVIDIA, AskUbuntu claims:
There is sometimes confusion about CUDA. You don’t need Bumblebee to run CUDA. Follow the How-to to get CUDA working under Ubuntu.
Lies! Without Bumblebee, the NVIDIA GPU either gets switched off, or various other applications claim GPU memory and fight with TensorFlow, causing strange crashes in the middle of complicated modelling jobs. TensorFlow wants the whole GPU. You need Bumblebee to selectively grant GPU access to bad GPU citizens like TensorFlow.
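One way to see what else is holding GPU memory before launching a TensorFlow job is to ask `nvidia-smi` for its compute processes. This is a Python sketch. It assumes `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits` is available; under Bumblebee, `nvidia-smi` itself would need to be wrapped in `primusrun` or `optirun`. The helper names are hypothetical.

```python
import subprocess
from typing import List, NamedTuple, Optional

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


class GpuProcess(NamedTuple):
    pid: int
    name: str
    used_mib: Optional[int]  # MiB with 'nounits'; None if reported as e.g. [N/A]


def _parse_mib(field: str) -> Optional[int]:
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_compute_apps(csv_text: str) -> List[GpuProcess]:
    """Parse nvidia-smi CSV rows of the form 'pid, process_name, used_memory'."""
    procs = []
    for line in csv_text.strip().splitlines():
        pid_str, rest = line.split(",", 1)
        name, used_str = rest.rsplit(",", 1)  # tolerate commas in the name
        procs.append(GpuProcess(int(pid_str), name.strip(), _parse_mib(used_str)))
    return procs


def gpu_processes() -> List[GpuProcess]:
    """Run nvidia-smi and return the processes currently holding GPU memory."""
    out = subprocess.run(QUERY, check=True, stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return parse_compute_apps(out)
```

The subprocess call uses `stdout=PIPE` with `universal_newlines` rather than the newer `capture_output`/`text` arguments, so it also runs on the Python 3.6 used later in this post.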
The second AskUbuntu point is interesting, though:
There is however a new feature (--no-xorg option for optirun) in Bumblebee 3.2, which makes it possible to run CUDA / OpenCL applications that does not [sic] need the graphics rendering capabilities.
I should try that.
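This is an untested sketch of how that might be wired up from Python. It builds a command line that runs a job through `optirun --no-xorg` (the option quoted above) and hands it to `subprocess`. Whether `--no-xorg` behaves well with TensorFlow on this laptop is an open question, and the function names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence


def optirun_command(script: str, args: Sequence[str] = (), no_xorg: bool = True) -> List[str]:
    """Build an argv list that runs a Python script on the NVIDIA GPU via Bumblebee."""
    cmd = ["optirun"]
    if no_xorg:
        # Bumblebee >= 3.2: do not start the secondary X server (compute only).
        cmd.append("--no-xorg")
    # sys.executable reuses the current interpreter, e.g. the active conda env.
    cmd += [sys.executable, script, *args]
    return cmd


def run_on_gpu(script: str, *args: str) -> int:
    """Run the script under optirun and return its exit status."""
    return subprocess.call(optirun_command(script, args))
```

For example, `run_on_gpu("jobs/spectrogram_normed.py")` would be the `--no-xorg` counterpart of the `primusrun` invocation used below.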
sudo apt install nvidia-prime
sudo prime-select intel
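`prime-select query` reports which PRIME profile is currently selected. This is a small Python check along those lines. It assumes `prime-select` from the `nvidia-prime` package is on the PATH, and the function name is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def prime_profile() -> Optional[str]:
    """Return the selected PRIME profile (e.g. 'intel'), or None if unavailable."""
    exe = shutil.which("prime-select")
    if exe is None:
        return None  # nvidia-prime not installed
    result = subprocess.run([exe, "query"], stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, universal_newlines=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

After `sudo prime-select intel`, this should return `'intel'`, meaning the display stays on the integrated GPU and the NVIDIA card is left for compute.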
sudo add-apt-repository ppa:bumblebee/testing
sudo apt-get update
sudo modprobe -a nvidia-uvm nvidia
primusrun python -i jobs/spectrogram_normed.py
primusrun nvidia-smi
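Before reaching for `primusrun`, it can help to confirm the modules are actually loaded. This minimal Python sketch reads `/proc/modules`. The kernel lists module names with underscores, so `nvidia-uvm` shows up as `nvidia_uvm`. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Set

PROC_MODULES = Path("/proc/modules")  # Linux-only


def parse_proc_modules(text: str) -> Set[str]:
    """The module name is the first whitespace-separated field of each line."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def read_loaded_modules(path: Path = PROC_MODULES) -> Set[str]:
    return parse_proc_modules(path.read_text())


def missing_modules(wanted: Iterable[str], loaded: Set[str]) -> Set[str]:
    """Return the wanted modules that are not loaded.

    /proc/modules spells names with underscores; modprobe accepts '-' or '_'.
    """
    return {name for name in wanted if name.replace("-", "_") not in loaded}
```

Once `modprobe` has succeeded, `missing_modules(["nvidia", "nvidia-uvm"], read_loaded_modules())` should return an empty set.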
CUDA_VISIBLE_DEVICES= primusrun python jobs/spectrogram_normed.py
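Setting `CUDA_VISIBLE_DEVICES` to the empty string hides every GPU from CUDA, so that invocation runs the job on the CPU even inside `primusrun`. A comma-separated list of indices, such as `0`, exposes only those devices. This is a Python sketch of launching a job with a chosen device mask; the helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional, Sequence


def cuda_env(devices: Optional[Sequence[int]],
             base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with CUDA_VISIBLE_DEVICES set.

    devices=None leaves the variable untouched; [] hides every GPU (CPU only);
    [0] exposes just the first GPU.
    """
    env = dict(os.environ if base is None else base)
    if devices is not None:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return env


def run_job(script: str, devices: Optional[Sequence[int]],
            wrapper: Sequence[str] = ("primusrun",)) -> int:
    """Run a Python script under the Bumblebee wrapper with the given device mask."""
    cmd = [*wrapper, sys.executable, script]
    return subprocess.call(cmd, env=cuda_env(devices))
```

`run_job("jobs/spectrogram_normed.py", [])` mirrors the command above, while `[0]` would let the job use the NVIDIA card.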
Things change in Ubuntu 18.04 Bionic Beaver.
The master bug thread tracking the configuration changes is here. There is a scruffier thread on the Bumblebee repo. AFAICT this is partially automated in the bumblebee-nvidia package.
So you need to install Bazel and a whole bunch of Java. This will also be useful if you wish to do discount imitations of other Google infrastructure.
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python openjdk-8-jdk
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
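Before starting a long build, it is worth a quick check that the toolchain is actually on the PATH. This Python sketch only looks for the executables. The tool list is an assumption based on the packages installed above, and the function names are hypothetical.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Executables assumed to come from the packages installed above.
REQUIRED_TOOLS = ("bazel", "java", "javac", "g++", "zip", "unzip", "pkg-config")


def locate_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Sorted names of the tools that could not be found."""
    return sorted(name for name, path in locate_tools(tools).items() if path is None)
```

An empty `missing_tools()` result only shows the binaries exist. It says nothing about whether their versions suit a given build.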
This occurred a couple of times for me; I think the versions of something didn’t match my manually selected versions of some other thing. I don’t care; why would anyone care about this? Sidestepping the issue by pinning GRUB’s default to boot the known-good kernel seemed to work, although I haven’t needed to recently, for reasons I will make no attempt to discover.
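To pin the kernel, point `GRUB_DEFAULT` in `/etc/default/grub` at the older kernel’s menu entry, then run `sudo update-grub`. On Ubuntu, older kernels sit under the "Advanced options for Ubuntu" submenu, and GRUB accepts a `>`-separated path into submenus. This Python sketch builds that line from a kernel release string. It assumes Ubuntu’s default menu titles, and the function name is hypothetical.

```python
SUBMENU = "Advanced options for Ubuntu"  # Ubuntu's default submenu title (assumed)


def grub_default_line(kernel_release: str, submenu: str = SUBMENU) -> str:
    """Build a GRUB_DEFAULT line selecting one kernel's menu entry.

    GRUB uses '>' to descend into a submenu; the value is quoted because
    the menu titles contain spaces and commas.
    """
    entry = "Ubuntu, with Linux {}".format(kernel_release)
    return 'GRUB_DEFAULT="{}>{}"'.format(submenu, entry)
```

For example, `grub_default_line(platform.release())` (after `import platform`) pins whichever kernel is running while things still work. Paste the result into `/etc/default/grub` and run `sudo update-grub`.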
Maybe this will work:
conda create -n tensorflow pip python=3.6
source activate tensorflow
conda install -c anaconda tensorflow-gpu
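To find out whether it did work, ask TensorFlow which devices it can see. This is a Python sketch. `tf.config.list_physical_devices` only exists in TensorFlow 2.1 and later, so the sketch falls back to the older `tf.test.is_gpu_available()`. It returns None when TensorFlow is not importable, for example outside the `tensorflow` conda environment. The function name is hypothetical.

```python
from typing import Optional


def tensorflow_sees_gpu() -> Optional[bool]:
    """True/False if TensorFlow imports and does/doesn't see a GPU; None if it won't import."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return len(config.list_physical_devices("GPU")) > 0
    # TensorFlow 1.x fallback (deprecated in 2.x).
    return bool(tf.test.is_gpu_available())
```

On this Bumblebee setup the check presumably needs to run inside `primusrun` or `optirun`, since otherwise the card is powered down.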