What is a Continuous Application?

The Databricks website says: "[w]e define a continuous application as an end-to-end application that reacts to data in real-time."

The proper term should probably be "continual application," as there may be discrete moments when no data is coming in. Many streams can be interrupted. In fact, Structured Streaming, the part of Spark used in Databricks' "continuous applications," is based on microbatching (according to this site). A series of microbatches is a continual process, not a continuous one.
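To make the continual-versus-continuous distinction concrete, here is a minimal sketch in plain Python. It is not Spark code and not how Structured Streaming is implemented; the function name `micro_batches`, the integer timestamps, and the `interval` and `horizon` parameters are all assumptions made up for illustration. It only shows that grouping events into fixed trigger intervals produces a sequence of discrete batches, some of which may be empty.

```python
def micro_batches(events, interval, horizon):
    """Group (timestamp, value) events into fixed-width trigger intervals.

    Hypothetical helper: returns one list per interval covering [0, horizon).
    An interval in which no event arrived yields an empty batch.
    """
    n_batches = -(-horizon // interval)  # ceiling division
    batches = [[] for _ in range(n_batches)]
    for ts, value in events:
        if 0 <= ts < horizon:
            batches[ts // interval].append(value)
    return batches


# Illustrative data: nothing arrives between timestamps 2 and 6.
events = [(0, "a"), (1, "b"), (7, "c"), (8, "d")]
batches = micro_batches(events, interval=2, horizon=10)

for i, batch in enumerate(batches):
    print(f"batch {i}: {batch}")
```

In this sketch the second and third batches are empty: processing happens at discrete moments and the input has gaps, which is the sense in which we call the process continual.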

In the article that defines "continuous application," Databricks refers to "Structured Streaming" as an integral component of such an application. They write: "Our long-term vision for streaming in Spark is ambitious: we want every library in Spark to work in an incremental fashion on Structured Streaming" (which can be found here). This "incremental fashion" clearly favors the word "continual" and not "continuous."

We believe that by the somewhat vague phrase "end-to-end," Databricks meant from the source of the data to its destination. We also believe that, to be considered a "continuous application" (or "continual application"), the application itself must adapt to data in real-time. That is different from merely presenting data that changes in real-time.

We like the term "adapt" because Databricks refers to "[o]nline machine learning" as an example in their discussion of a "continuous application" (as you can read here).

Packtpub.com defines "continuous applications" in a way that distinguishes them from microbatches. It also describes such a "continuous application" as being centered around the pipeline of data.

"[T]he notion of continuous applications--in contrast to batch processing--emerged, and basically means the composite of batch processing and real-time stream processing with a clear focus of the streaming part being the main driver of the application, and just accessing the data created or processed by batch processes for further augmentation. Continuous applications never stop and continuously produce data as new data arrives." (The source of this quote is here.)

Slide 9 of this slide deck presents a graphic of a continuous application. We take away from that picture that the "continuous application" embodies the streaming process and incorporates static data the whole time. We also find that the "continuous application" is consistent with individual batch jobs. Thus the term should be "continual application."

We find some semblance of a definition of "continuous application" from a source outside of Databricks, on the www.oreilly.com site:

"In addition to the venerable Lambda Architecture, emerging systems like Apache Flink, Kafka, Apache Beam and Spark 2.0’s upcoming Structured Streaming offer new ways to provide more principled implementations of continuous applications. These systems hold promise for applications where the analytics can be pre-computed, often involving simple aggregations and specialized “checkpoint” models of application state. In general, these systems are based on low-level programming primitives.

But what about continuous applications that need to support complex analytics, including joins, data science models, even analysis over various periods of history? We look at options for building complex analytic big data applications including tradeoffs for simplicity, completeness, and changing semantics over time backed by rich query engines."

The above paragraphs were taken from oreilly.com.

We add this note to help define "continuous application": "A Continuous Application is an end-to-end application that reacts to data in real-time. But it is more than a typical event-based streaming app. Continuous applications capture input streams, blend them with static/offline data and sometimes apply machine learning to the combined data before serving the results back out. These modern applications support quick ad-hoc queries along with long running batch queries." (The source of this quote is linked here.)
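The pattern described in that quote (capture an input stream, blend it with static data, serve updated results) can be sketched in a few lines of plain Python. This is only an illustration under our own assumptions, not Databricks' implementation and not the Spark API: the function `run_continuous_app`, the `static_users` lookup table, and the sample events are all hypothetical.

```python
def run_continuous_app(batches, static_lookup):
    """Process micro-batches in arrival order.

    Hypothetical sketch: each (user, amount) event is joined against static
    data to find the user's region, a running total per region is updated
    incrementally, and a snapshot of the totals is yielded after every batch.
    """
    totals = {}
    for batch in batches:
        for user, amount in batch:
            region = static_lookup.get(user, "unknown")  # blend with static data
            totals[region] = totals.get(region, 0) + amount
        yield dict(totals)  # serve the current result back out


# Static/offline data (illustrative).
static_users = {"alice": "EU", "bob": "US"}

# Three micro-batches of streamed events; the second one is empty.
incoming = [
    [("alice", 10), ("bob", 5)],
    [],
    [("carol", 2), ("alice", 1)],
]

snapshots = list(run_continuous_app(incoming, static_users))
for i, snapshot in enumerate(snapshots):
    print(f"after batch {i}: {snapshot}")
```

The result is updated batch by batch rather than recomputed from scratch, which fits the "incremental fashion" Databricks describes, and the empty second batch simply leaves the totals unchanged.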
