tl;dr Google Cloud ML is probably excellent if you design your algorithm from the ground up for it, but if you have something that runs perfectly well on your laptop and you wrote it to use, e.g., local files, or modern versions of python, or custom compiled code, then you are going to need to substantially rewrite it. NB this is all outdated now.
I want to do cloud machine learning using google’s CloudML offering.
The goal: working through getting and analysing the MagnaTagATune dataset in google cloud from my OSX laptop using tensorflow.
I will follow approximately the least-nerdview HOWTO, which sadly conveniently sidesteps many of my difficulties by having the input data be magically good.
There are too many ways to get the damn thing going.
There is much documentation for all these things, but it’s often unclear from page to page what the hell is happening, since it’s not clear whether you are spinning up VMs locally or in google’s cloud.
One way is the purely cloud-based cloud shell, but this is clearly too fragile and restrictive for real usage unless you are in Mountain View.
Offline there is a docker-based thing, called cloud datalab, which I guess is a somewhat monolithic machine image which approximates the online APIs or something. There is a surly google help page which implies as much.
Or you can install a bunch of python packages from the command line.
Anyway, the terminology page is the clearest explanation of all terms before they get washed out in a combinatorial explosion. Lin Yi-Jhen does the best disambiguation, though.
Do I need this datalab nonsense? I can’t tell. I just want to run tensorflow. Idea: proceed installing stuff randomly until eventually I have finished a deep-learning-based paper.
Cloud datalab thingy
To run Datalab locally, first you must install docker.
Or must I? I’m so awash right here.
Then do all the actual work.
Getting data into cloud datastore
Complicated and tiresome. Not in the sense that it is too complicated if I really am fitting a million-parameter regression, but too complicated in the sense that I am just one grad student doing a side project, and I don’t have 2 weeks of coding time to fit their data ingestion workflow.
It’s not a bad workflow as such, it’s just overkill for my small project.
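One way to shrink that rewrite, sketched here under assumptions: keep every file access behind a couple of tiny functions, so that moving from local files to a bucket means swapping one opener rather than touching the analysis code. The names `read_bytes` and `list_clips`, the `opener` and `lister` parameters, and the `.mp3` suffix are all hypothetical illustrations, not part of any Google API; locally they fall back to the standard library.

```python
import os


def read_bytes(path, opener=open):
    """Read a whole file through a swappable opener.

    `opener` is a hypothetical injection point: locally it is the builtin
    `open`; in the cloud it would be any callable taking (path, mode) and
    returning a file-like object that understands bucket URIs.
    """
    with opener(path, "rb") as f:
        return f.read()


def list_clips(root, suffix=".mp3", lister=os.listdir):
    """Return sorted paths directly under `root` whose names end in `suffix`.

    `lister` is likewise swappable; the default only knows local directories.
    """
    root = root.rstrip("/")
    return sorted(
        root + "/" + name for name in lister(root) if name.endswith(suffix)
    )
```

Locally, `list_clips("data")` just lists a directory and `read_bytes` uses plain `open`. A cloud run would pass in an opener and lister that understand bucket paths; TensorFlow's `GFile` appears to play that role, but check the docs for your version before relying on it.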
Now, port my python 3 code to python 2
This is an aspirational section; I won’t actually get here.
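For the record, a minimal sketch of how such a port usually starts, assuming the code only needs to run unchanged on both interpreters: pull Python 3 semantics into Python 2 with `__future__` imports and use `io.open` for text. The helper names `mean` and `read_text` are hypothetical examples, not anything from my actual project.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import io


def mean(values):
    """Arithmetic mean; `/` is true division on Python 2 and 3 alike
    because of the `division` import above."""
    values = list(values)
    return sum(values) / len(values)


def read_text(path):
    """Read a UTF-8 text file; io.open takes `encoding` on both
    interpreters, unlike the Python 2 builtin open()."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()
```

On Python 3 the `__future__` line is a no-op, so the same file keeps working on the laptop. Anything that leans on Python 3-only syntax or libraries would still need real rewriting.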
Oh sod it, just give me a normal virtual machine
ARGH, their tensorflow nonsense is melting my brain. I just want to use the sweet prototype on my laptop, but go slightly faster.
Maybe I can rent a machine?
That IS supported, after all. OK, but how do I get GPUs in it?
Oh wait! I don’t get GPUs, so my entire motivation for using this google stuff (I had some free credits) is hereby obliterated.
Sod this, I’m going to rent myself a big-ass GPU from Amazon and get this finished.