The service provides a software development kit that can be used to build complex pipelines and analyses. Like MapReduce, Cloud Dataflow will initially use the Java programming language. Other languages may be supported in the future.
The pipelines can ingest data from external sources and put it to a variety of uses. The service provides a library for preparing and reformatting data for further analysis, and users can write their own transformations.
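To make the ingest-and-transform idea concrete, here is a minimal sketch in plain Java. It does not use the Cloud Dataflow SDK, whose API is not described here; the class name `PipelineSketch`, the `NORMALIZE` transformation, and the sample records are all hypothetical and serve only to illustrate how a user-written transformation might slot into a pipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration only; this is NOT the Cloud Dataflow SDK.
class PipelineSketch {

    // Hypothetical user-written transformation: trim and lower-case a raw record.
    static final Function<String, String> NORMALIZE =
            raw -> raw.trim().toLowerCase(Locale.ROOT);

    // Run the stages in order: ingested records -> drop blank records -> normalize.
    static List<String> run(List<String> ingested) {
        return ingested.stream()
                .filter(raw -> !raw.trim().isEmpty())
                .map(NORMALIZE)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for data ingested from an external source.
        List<String> raw = Arrays.asList("  GET /index.html ", "", "POST /Login");
        System.out.println(run(raw));
    }
}
```

Each stage is an ordinary function, so a custom transformation can be swapped in without touching the rest of the chain.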
The processed dataset can then be queried with Google's BigQuery service. Alternatively, users can write modules that examine the data as it crosses the wire, looking for aberrant behavior or trends in real time.
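As a rough illustration of what a module that watches for aberrant behavior might do, the plain-Java sketch below flags any reading that exceeds a multiple of the running average of the readings before it. The class `SpikeDetectorSketch`, the `findSpikes` method, the threshold factor, and the sample latencies are all assumptions made for this example; none of them come from Google's service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is NOT the Cloud Dataflow SDK.
class SpikeDetectorSketch {

    // Flag each reading that is more than `factor` times the running mean
    // of all readings seen before it. The first reading is never flagged.
    static List<Double> findSpikes(Iterable<Double> readings, double factor) {
        List<Double> spikes = new ArrayList<>();
        double sum = 0.0;
        int count = 0;
        for (double value : readings) {
            if (count > 0 && value > factor * (sum / count)) {
                spikes.add(value);
            }
            sum += value;
            count++;
        }
        return spikes;
    }

    public static void main(String[] args) {
        // Hypothetical request latencies in milliseconds, read one at a time.
        List<Double> latenciesMs = Arrays.asList(10.0, 12.0, 11.0, 95.0, 13.0);
        System.out.println(findSpikes(latenciesMs, 3.0));
    }
}
```

Because the detector keeps only a running sum and a count, it can process records one by one as they arrive rather than waiting for the full dataset.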
Google announced Cloud Dataflow at the company's Google I/O user conference in San Francisco. A small number of Google customers are testing it and the company plans to open it up as a public preview later this year.