Lazy bookmark for practical details of processing and transforming possibly-infinite streams of data, from signals to parse trees. Disambiguating “transducers”.
Processing of very large data sets that do not fit in core.
I am imagining more general objects than singly-indexed real-valued signals; tokens, maybe. Classic DSP can be elsewhere. Infrastructure to do this is under message queues.
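A minimal sketch of what I mean by a possibly-infinite stream of tokens, using plain Python generators. The names `tokens`, `normalise` and `endless_lines` are made up for illustration, and the "source" is a fake stand-in for a socket or log tail. The point is only that each stage pulls one item at a time, so nothing needs to fit in core.

```python
import itertools
from typing import Iterable, Iterator


def tokens(lines: Iterable[str]) -> Iterator[str]:
    """Lazily split each incoming line into whitespace-separated tokens."""
    for line in lines:
        yield from line.split()


def normalise(stream: Iterable[str]) -> Iterator[str]:
    """Strip surrounding punctuation, lower-case, and drop tokens left empty."""
    for tok in stream:
        tok = tok.strip(".,;:!?\"'").lower()
        if tok:
            yield tok


def endless_lines() -> Iterator[str]:
    """Hypothetical unbounded source; a real one might be a socket or log tail."""
    for i in itertools.count():
        yield f"Event {i}: the quick brown fox."


# Nothing is computed until a consumer pulls; islice bounds the infinite stream.
first_five = list(itertools.islice(normalise(tokens(endless_lines())), 5))
print(first_five)
```

Composition here is just function nesting; the stages know nothing about each other beyond "something iterable comes in".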
In statistics and machine learning, this connects with online learning, i.e. incorporating data as it comes in, although there you probably care about good out-of-core optimisation algorithms.
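As a toy instance of incorporating data as it comes in, here is a running mean and variance that sees each observation once and keeps constant state. I am using Welford's update, which is my choice of illustration rather than anything specific to this page; the class name `RunningStats` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass mean and sample variance (Welford's update)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the *updated* mean, which keeps the recurrence numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Unbiased sample variance; defined as 0.0 for fewer than two points."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)  # in real use, x would arrive from a stream
print(stats.mean, stats.variance)
```

Proper online learning replaces the summary statistics with model parameters and the update with something like a stochastic gradient step, but the shape of the loop is the same.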
See also artificial chemistry, parallel computing. (Streams aren’t necessarily parallel, but they are a convenient way of managing asynchrony.)
CSP / reactive programming
A parallelism/streaming thing. Communicating Sequential Processes.
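A rough sketch of the CSP idea: sequential processes that share nothing and talk only over channels. I am approximating channels with `queue.Queue(maxsize=1)` and processes with threads. That is an assumption of convenience, since a real CSP channel is an unbuffered rendezvous and a one-slot queue only comes close. The `DONE` sentinel and the function names are made up for the example.

```python
import queue
import threading

DONE = object()  # sentinel marking end-of-stream on a channel


def producer(out: queue.Queue, n: int) -> None:
    """Send 0..n-1 down the channel, then signal completion."""
    for i in range(n):
        out.put(i)  # blocks while the one-slot channel is full
    out.put(DONE)


def squarer(inp: queue.Queue, out: queue.Queue) -> None:
    """Read from one channel, write squares to another, forward the sentinel."""
    while True:
        item = inp.get()
        if item is DONE:
            out.put(DONE)
            return
        out.put(item * item)


def run_pipeline(n: int) -> list:
    a: queue.Queue = queue.Queue(maxsize=1)
    b: queue.Queue = queue.Queue(maxsize=1)
    workers = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for w in workers:
        w.start()
    results = []
    while True:  # the main thread acts as the final consumer process
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


squares = run_pipeline(5)
print(squares)
```

Each stage is ordinary sequential code; the blocking `put` and `get` calls are what coordinate them.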
CSP and transducers: I don’t think these are transducers as I understand them, i.e. stack machines, but I could be wrong. See also the expositions from the Clojure authors:
ReactiveX is a particular stream processing paradigm with implementations for many languages.
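To give a flavour of the paradigm without pulling in a library, here is a toy push-based observable. This is emphatically not the ReactiveX API: the `Observable` class and its `emit`, `map`, `filter` and `subscribe` methods are invented for this sketch, and it leaves out errors, completion and unsubscription, all of which a real implementation handles.

```python
from typing import Any, Callable, List


class Observable:
    """Toy push-based stream: values are pushed to subscribers as they occur."""

    def __init__(self) -> None:
        self._observers: List[Callable[[Any], None]] = []

    def subscribe(self, observer: Callable[[Any], None]) -> None:
        self._observers.append(observer)

    def emit(self, value: Any) -> None:
        # Copy the list so observers may subscribe others during delivery.
        for observer in list(self._observers):
            observer(value)

    def map(self, fn: Callable[[Any], Any]) -> "Observable":
        """Derived stream carrying fn(value) for every upstream value."""
        out = Observable()
        self.subscribe(lambda v: out.emit(fn(v)))
        return out

    def filter(self, pred: Callable[[Any], bool]) -> "Observable":
        """Derived stream carrying only the values for which pred holds."""
        out = Observable()

        def forward(v: Any) -> None:
            if pred(v):
                out.emit(v)

        self.subscribe(forward)
        return out


clicks = Observable()  # hypothetical event source
seen: list = []
clicks.filter(lambda x: x % 2 == 0).map(lambda x: x * 10).subscribe(seen.append)
for x in range(6):
    clicks.emit(x)
print(seen)
```

The contrast with a generator pipeline is the direction of control: here the source pushes and the consumer reacts, rather than the consumer pulling.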
The Reactive Manifesto says:
We believe that a coherent approach to systems architecture is needed, and we believe that all necessary aspects are already recognized individually: we want systems that are Responsive, Resilient, Elastic and Message Driven. We call these Reactive Systems.
Systems built as Reactive Systems are more flexible, loosely-coupled and scalable. This makes them easier to develop and amenable to change. They are significantly more tolerant of failure and when failure does occur they meet it with elegance rather than disaster. Reactive Systems are highly responsive, giving users effective interactive feedback.
(Why manifesto? Because “design pattern” isn’t as cool as “manifesto” this year, and staying current is a buzzword Red Queen race.)
- intro to reactive programming
- Rx.js is the Microsoft one, which I use myself.
- flyd is a minimal one with a very functional-friendly API
- kefir also tries to be minimal
- of course there are more, why not bacon.js?
- task.js seems to be a coroutine-like hack for JS generators, by Mozilla
- Node.js has (maybe?) a front-runner coroutine implementation, node-fibers, which might be OK.
- The incredibly-similar-and-similarly-named transducers-js also works.
  - extra howto
- Heterogeneous, pretty-lookin’ application: node-red
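Since the word is so overloaded, here is a sketch of transducers in the Clojure/transducers-js sense, as I understand it: functions that take a reducing step and return a new reducing step, so that the transformation is independent of where the data comes from or what is being built. The helper names `mapping`, `filtering`, `compose` and `append` are my own, not any library's API.

```python
from functools import reduce


def mapping(fn):
    """Transducer: apply fn to each item before passing it to the next step."""
    def transducer(step):
        def new_step(acc, x):
            return step(acc, fn(x))
        return new_step
    return transducer


def filtering(pred):
    """Transducer: pass an item on only if pred holds, else leave acc alone."""
    def transducer(step):
        def new_step(acc, x):
            return step(acc, x) if pred(x) else acc
        return new_step
    return transducer


def compose(*xforms):
    """Compose transducers so that data flows through them left to right."""
    def composed(step):
        for xf in reversed(xforms):
            step = xf(step)
        return step
    return composed


def append(acc, x):
    """A reducing step that builds a list."""
    acc.append(x)
    return acc


xform = compose(filtering(lambda x: x % 2 == 1), mapping(lambda x: x * x))

# The same transformation reused with two different reducing steps.
odds_squared = reduce(xform(append), range(10), [])
total = reduce(xform(lambda acc, x: acc + x), range(10), 0)
print(odds_squared, total)
```

No intermediate collection is built between the filter and the map, which is the selling point for streams. Whether this has anything to do with transducers in the automata-theoretic sense is a separate question.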
Streaming data analysis
Online, possibly realtime, certainly memory-constrained.
Heka handles tasks such as:
- Loading and parsing log files from a file system.
- Accepting statsd type metrics data for aggregation and forwarding to upstream time series data stores such as graphite or InfluxDB.
- Launching external processes to gather operational data from the local system.
- Performing real time analysis, graphing, and anomaly detection on any data flowing through the Heka pipeline.
- Shipping data from one location to another via the use of an external transport (such as AMQP) or directly (via TCP).
- Delivering processed data to one or more persistent data stores.
Written in Go, plugins in Lua.
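To make the statsd-type aggregation task concrete, a small sketch that folds a stream of statsd-style lines (`name:value|type`) into counter totals and latest gauge values. This is not how Heka does it. The functions `parse_statsd` and `aggregate` are hypothetical, and the sketch assumes well-formed input while ignoring timers, sample rates and signed gauge deltas.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def parse_statsd(line: str) -> Tuple[str, float, str]:
    """Split 'name:value|type' into its parts; extra '|' fields are ignored."""
    name, rest = line.strip().split(":", 1)
    fields = rest.split("|")
    return name, float(fields[0]), fields[1]


def aggregate(lines: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Sum counters ('c') and keep the most recent value of each gauge ('g')."""
    counters: Dict[str, float] = defaultdict(float)
    gauges: Dict[str, float] = {}
    for line in lines:  # one pass, constant state per metric name
        name, value, kind = parse_statsd(line)
        if kind == "c":
            counters[name] += value
        elif kind == "g":
            gauges[name] = value
        # other metric types are silently skipped in this sketch
    return dict(counters), gauges


counters, gauges = aggregate(
    ["hits:1|c", "hits:2|c", "temp:21.5|g", "temp:22.0|g", "load:320|ms"]
)
print(counters, gauges)
```

In a real pipeline the aggregates would be flushed to the upstream time series store on a timer and then reset.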
- Storm-compatible, Heron aims to be Storm-but-more-reliable.