
Event Streaming Essentials

Overview

Event Streaming is at the heart of how data moves through Reaction’s enterprise applications and into our customers’ technology stacks. To support a truly microservice-driven architecture, we have chosen to couple Event Streaming with GraphQL, delivering core ecommerce functions and features to customers through a unified data layer.

But what exactly is Event Streaming and how does it work? How does it relate to ecommerce? This blog post aims to provide context around such questions. Let’s dive in!

The Basics

Event Streaming, also known as Event Stream Processing (ESP), is a technology model designed to power event-driven information systems. In an event-driven system, data moves through a sequence made up of three core elements:

  • Producer: An action is produced by someone, usually via the UI of a software application
  • Broker: Those actions are recorded as Events in a Record Log. This log is the source of truth: an objective history of every event that has occurred since t = 0. The Broker then distributes those events to Consumers
  • Consumer: Software applications subscribe to the events that are relevant to their specific application or use case. These events are acquired via the Broker, never directly from the Producer

In essence, data is brokered and shuttled in discrete pieces to subscriber applications that need that specific data to perform a certain function; a minimal code sketch of these three roles follows below. The notion is simple, but it has far-reaching consequences for modern software architecture.
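To make the three roles concrete, here is a minimal sketch using the kafkajs Node.js client against a local Kafka broker. The `user-actions` topic, the `analytics` consumer group, and the event payload are illustrative assumptions, not part of any particular stack:

```typescript
// A minimal sketch of the Producer → Broker → Consumer flow, assuming a
// local Kafka broker and the kafkajs Node.js client. The "user-actions"
// topic, "analytics" group, and payload are illustrative only.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "demo-app", brokers: ["localhost:9092"] });

// Producer: a user action (e.g. a click in a UI) becomes an event.
async function produce(): Promise<void> {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: "user-actions",
    messages: [{ key: "user-42", value: JSON.stringify({ action: "signup" }) }],
  });
  await producer.disconnect();
}

// Consumer: subscribes to events via the Broker, never from the Producer.
async function consume(): Promise<void> {
  const consumer = kafka.consumer({ groupId: "analytics" });
  await consumer.connect();
  await consumer.subscribe({ topics: ["user-actions"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      console.log("event received:", message.value?.toString());
    },
  });
}

produce().then(consume).catch(console.error);
```

Note that the producer and consumer never reference each other: both know only the broker and the topic, which is what keeps the coupling loose.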

Event streaming applies to a multitude of industries and use cases, including algorithmic trading in banking and capital markets, RFID-based protection applications in security, and location-based services in telecommunications.

Event Streaming in Software Architecture

In the database and software realms, much of the thought leadership around this technology is attributable to the work of the Apache Kafka team led by Jay Kreps, CEO of Confluent. In an iconic blog post published back in 2013, Kreps highlighted the tangle of problems that enterprise engineering teams typically run into with data infrastructure and architecture:

What looks feasible on paper is in practice a painful orchestration of data: coordinating connections between external apps, internal apps, databases, and querying tools. Some data flows are unidirectional, others bidirectional. As more endpoints are added, the throughput of the system suffers, and maintaining those endpoints becomes an ever-growing nightmare for engineering teams, both architecturally and operationally. The reality is that, at scale, this sort of data architecture is untenable and costly.

Taking the event-stream paradigm, Kreps’ team focused on making data more portable and synchronized, converging on a much more elegant approach:

Going back to the general form of the Event Stream Processing model, with its Producers, Consumers, and Record Log, the core difference between the two exhibits is the notion of a unified log. What Kreps’ team visualized above (Exhibit C) is an abstraction built on a unified data log, a central “source of truth” that coordinates the flow of data, and it is what created the impetus for Kafka. Kafka has since been donated to the Apache Software Foundation and remains a critical open-source project today.
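The unified log itself is a simple idea: an ordered, append-only sequence of records that consumers read from at their own pace. The toy TypeScript sketch below captures just that idea; the class and event names are invented for illustration and are not Kafka’s actual API:

```typescript
// A toy append-only log to illustrate the "unified log" idea from the text.
// Events are appended in order, and each consumer keeps its own read offset,
// so one objective history serves any number of independent readers.
type LogEvent = { type: string; payload: unknown };

class UnifiedLog {
  private records: LogEvent[] = [];

  // Append a record and return its offset in the history.
  append(event: LogEvent): number {
    this.records.push(event);
    return this.records.length - 1;
  }

  // Consumers read forward from an offset; the history is never rewritten.
  readFrom(offset: number): LogEvent[] {
    return this.records.slice(offset);
  }
}

const log = new UnifiedLog();
log.append({ type: "OrderCreated", payload: { orderId: 1 } });
log.append({ type: "ItemShipped", payload: { orderId: 1 } });

// Two consumers with independent offsets see the same source of truth.
console.log(log.readFrom(0)); // everything since t = 0
console.log(log.readFrom(1)); // only what this reader hasn't seen yet
```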

Event Streaming Unlocks “Real-Time” Data

Creating a unified log requires a change in how we think about the relationship between databases and queries. This relationship shapes the real-time nature of data processing and is best understood by comparing the traditional Database Paradigm with the newer Event Streaming Paradigm.

Database Paradigm
In the traditional database paradigm, data is passively logged in the background, while the query is the active party: someone runs it to take a snapshot of the data within a designated time-frame. This has traditionally worked well for CRUD applications and is usually managed through a simple UI that speeds up querying to yield data visualizations and insights. The model is also universally understood across technical and business functions within organizations (engineering, IT, business intelligence, etc.). But while it is commonplace, it leaves much to be desired: setting up and maintaining such a system at scale can very much come to look like the jungle of tangled data illustrated earlier in Exhibit B.
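As a point of reference, a snapshot-style query in this paradigm might look like the following sketch, which assumes a hypothetical `orders` table in PostgreSQL accessed through the `pg` client. Nothing happens until someone runs it:

```typescript
// A sketch of the traditional paradigm: the data sits passively in tables,
// and an actively run query takes a point-in-time snapshot. Assumes a
// hypothetical PostgreSQL "orders" table accessed via the pg client.
import { Client } from "pg";

async function snapshotOrders(): Promise<void> {
  const client = new Client({ connectionString: "postgres://localhost/shop" });
  await client.connect();

  // Nothing happens until someone runs this; the result reflects only
  // the moment of execution and goes stale immediately afterwards.
  const result = await client.query(
    `SELECT status, COUNT(*) AS total
       FROM orders
      WHERE created_at >= NOW() - INTERVAL '1 day'
      GROUP BY status`
  );
  console.table(result.rows);

  await client.end();
}

snapshotOrders().catch(console.error);
```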

This has implications beyond architecture; it affects the insights and learnings that business intelligence and analytics teams can extract cleanly and effectively. If the data is not well enriched to begin with, how can anyone derive truly actionable insights from it?

Event Streaming Paradigm
Going back to the relationship between databases and queries: in event streaming, the database is active and the query is passive. The query runs in the background, continuously and indefinitely, with fresh data arriving as new events occur and giving the query something to work with. No one has to query the database manually or schedule a cron job. This dynamic is what makes the data real-time: no snapshots are needed, because the data is processed as it occurs, much like live CCTV footage, hence the notion of “real-time”.
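In code, such a passive, standing query might look like the following sketch: a kafkajs consumer that folds every incoming event into a continuously updated result. The `orders` topic and the event shape are assumptions for illustration:

```typescript
// A sketch of a passive, standing query: a kafkajs consumer that folds every
// incoming event into a continuously updated result. The "orders" topic and
// the event shape are assumptions for illustration.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "standing-query", brokers: ["localhost:9092"] });

async function runStandingQuery(): Promise<void> {
  const countsByType: Record<string, number> = {}; // the live "result set"
  const consumer = kafka.consumer({ groupId: "order-stats" });
  await consumer.connect();
  await consumer.subscribe({ topics: ["orders"], fromBeginning: true });

  // The query never finishes: each new event refreshes the result, which is
  // what makes the view real-time rather than a snapshot.
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString());
      countsByType[event.type] = (countsByType[event.type] ?? 0) + 1;
      console.log(countsByType);
    },
  });
}

runStandingQuery().catch(console.error);
```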

Event Streaming and E-Commerce

Let’s use a quick (and less abstract) example of event streaming applied to ecommerce! Going back to the basics of the event streaming model, let’s use the following pieces to model a real-world application:

  • Producers: Carts, Orders, and Fulfillment
  • Broker: A layer that logs events performed by the Producer components
  • Consumers: Inventory

The following table highlights events performed by the Producer components that the Consumer component, Inventory, would be interested in subscribing to:

A visualization of the orchestration of all of these pieces would look like the following:

An illustrative route of data sequencing can be seen with Orders: an ‘OrderCreated’ event originates from the Orders application, whereupon Kafka records the event to a topic and then delivers it to the Inventory application. This process runs continuously in the background for as long as orders are being created. The Inventory application becomes the master view, rolling up relevant data from the producer applications into a real-time picture of the events affecting the inventory function across the organization.
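Here is a hedged sketch of that route, again using kafkajs: the Orders application produces an ‘OrderCreated’ event, and the Inventory application consumes it to keep its rolled-up view current. Topic names, SKUs, and the seed stock level are assumptions for the sketch, not Reaction’s actual schema:

```typescript
// An illustrative version of the route above: the Orders app produces an
// OrderCreated event, Kafka records it on an "orders" topic, and the
// Inventory app consumes it to keep its rolled-up view current.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "shop", brokers: ["localhost:9092"] });

// Orders application (Producer)
async function createOrder(orderId: string, sku: string, qty: number): Promise<void> {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: "orders",
    messages: [{
      key: orderId,
      value: JSON.stringify({ type: "OrderCreated", orderId, sku, qty }),
    }],
  });
  await producer.disconnect();
}

// Inventory application (Consumer): a real-time, rolled-up view of stock.
async function runInventoryView(): Promise<void> {
  const stock: Record<string, number> = { "sku-123": 100 }; // illustrative seed
  const consumer = kafka.consumer({ groupId: "inventory" });
  await consumer.connect();
  await consumer.subscribe({ topics: ["orders"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString());
      if (event.type === "OrderCreated") stock[event.sku] -= event.qty;
      console.log("inventory view:", stock);
    },
  });
}

createOrder("order-1", "sku-123", 2).then(runInventoryView).catch(console.error);
```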

Taking this further, let’s say we have another Consumer app that Customer Support agents use at a call center to monitor these orders. The layout would then be the following:

Note that in this instance, the Customer Support application subscribes to a different combination of events. Those events could be ‘OrderCancelled’ and ‘ItemShipped’, for a customer support agent who wants to confirm with a customer that although their item was shipped, their order was cancelled and a refund can now be processed.
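The same topics can serve this second subscriber without touching the first: a separate consumer group reads independently of the Inventory group and simply filters for the events its agents care about. The sketch below assumes the same illustrative topics and event shapes as the previous one:

```typescript
// A hypothetical Customer Support subscriber: its own consumer group reads
// the same topics independently and filters for the relevant events.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "support", brokers: ["localhost:9092"] });

async function runSupportView(): Promise<void> {
  const consumer = kafka.consumer({ groupId: "customer-support" });
  await consumer.connect();
  await consumer.subscribe({ topics: ["orders", "fulfillment"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString());
      // Only this combination matters for the support workflow described above.
      if (event.type === "OrderCancelled" || event.type === "ItemShipped") {
        console.log(`support alert for order ${event.orderId}: ${event.type}`);
      }
    },
  });
}

runSupportView().catch(console.error);
```

Because each consumer group tracks its own offsets, adding this subscriber requires no change to the producers or to the Inventory application.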

Pairing the two examples together (Exhibit E and Exhibit F), you can now see how different functions and departments of an ecommerce organization consume overlapping event data from the same core Producer applications to drive their daily functions and operations.

Key Takeaways and Benefits

Event Streaming architecture yields some of the most enriched data available and affords the following benefits:

  • Portability: Data moves through a cleaner architectural path, organized into topics for efficient processing through Kafka
  • Synchronization: Because data is continuously processed, converted into topics, and moved through Kafka, it is inherently more readily available
  • Modularity: Because data is processed modularly through topics, it can be consumed by as many microservices as desired

In our upcoming posts, we will go into further detail on how Reaction Commerce thinks about Event Streams related to our architecture and business model, and how this paradigm is empowering an exciting generation of ecommerce capabilities and collaboration provided by none other than Reaction. Stay tuned!


References:

  1. Confluent. “Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft.” October 1, 2019. https://www.youtube.com/watch?v=4QoCbhsQeyE&t=647s.
  2. Kreps, Jay. “The Log: What every software engineer should know about real-time data's unifying abstraction.” December 16, 2013. https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying.
  3. Apache Kafka. “Introduction: Apache Kafka® is a distributed streaming platform. What exactly does that mean?” https://kafka.apache.org/intro.
  4. Wikipedia. “Event stream processing.” https://en.wikipedia.org/wiki/Event_stream_processing.


