In this post, we discuss how Kafka Streams can be used to build a microservices application, with the help of a use case.

What is Apache Kafka?

Apache Kafka is a community-developed, distributed streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue to a full-fledged streaming platform.

A streaming platform would not be complete without the ability to manipulate data as it arrives. The Streams API within Apache Kafka is a powerful, lightweight library that allows for on-the-fly processing, letting you aggregate data, define windowing parameters, perform joins within a stream, and more. Perhaps best of all, it is built as a Java library on top of Kafka, keeping your workflow intact with no extra clusters to maintain.

Kafka Streams

Kafka Streams is a component of open source Apache Kafka. It is a powerful, easy-to-use library for building highly scalable, fault-tolerant, distributed stream processing applications on top of Apache Kafka. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, handling late-arriving data, and efficiently managing application state.

The following list highlights several key capabilities and aspects of Kafka Streams that make it a compelling choice for use cases such as stream processing applications, continuous queries and transformations, and microservices.

Powerful
  • Highly scalable, elastic, fault-tolerant
  • Stateful and stateless processing
  • Event-time processing with windowing, joins, aggregations
Lightweight
  • No dedicated cluster required
  • No external dependencies
  • “It’s a library, not a framework.”
Fully integrated
  • 100% compatible with Kafka 0.10.0.0
  • Easy to integrate into existing applications
  • No artificial rules for deploying applications
Real-time
  • Millisecond processing latency
  • Does not micro-batch messages
  • Windowing with out-of-order data
  • Allows for arrival of late data
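
To make the "it's a library, not a framework" point concrete, here is a minimal sketch of a complete Kafka Streams application, written against the current StreamsBuilder API (the topic names and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MinimalStreamsApp {
    public static void main(String[] args) {
        // Plain JVM application: no dedicated cluster, no external dependencies.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "minimal-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("input-topic");
        // Record-at-a-time processing (no micro-batching), then write downstream.
        events.mapValues(value -> value.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Everything here runs in an ordinary JVM process; there is no job submission step and no processing cluster to stand up.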
Use Case – A Car Fleet Company

A trucking fleet company has outfitted its fleet of trucks with sensors. The geo-event sensor captures important events from the truck (e.g. lane change, braking, start/stop, acceleration) along with its geolocation, and the speed sensor captures the speed of the truck at different intervals. These two sensors stream data into their own respective Kafka topics: syndicate-geo-event-avro and syndicate-speed-event-avro.

The requirements for the use case are the following:

  1. Create streams consuming from the two Kafka topics.
  2. Join the Geo and Speed sensor streams over a time-based aggregate window.
  3. Apply rules to the stream to filter for events of interest.
  4. Calculate the average speed of each driver over a 3-minute window and create alerts for speeding drivers.
  5. Find all drivers who have been speeding (> 80) over that 3-minute window.
  6. Send alerts for speeding drivers to a downstream alert topic.
  7. Apply access control (ACLs) to the source Kafka topics, the alert topic, and the intermediate topics created by the Kafka Streams apps.
  8. Monitor each microservice, providing a view into producers, consumers, brokers, and key metrics such as consumer group lag.

Kafka Streams Microservices Architecture

Using Kafka Streams, we can implement these requirements with a set of lightweight microservices that are highly decoupled and independently scalable.

Join Support in Kafka Streams & Integration with Schema Registry

Kafka Streams has rich support for joins, providing simple, composable APIs for stream-to-stream and stream-to-table joins using the KStream and KTable abstractions.

The diagram below illustrates how the JoinFilterGeoSpeed microservice is implemented using the join support in Kafka Streams, as well as its integration with Schema Registry to deserialize the events from the source Kafka topics.

The code for this microservice can be found here: JoinFilterMicroService.
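
The linked source has the full implementation; the following is a condensed, hypothetical sketch of the core join-and-filter logic. It assumes both topics are keyed by driver id and that Schema Registry-backed Avro serdes are configured as the application's defaults (so values arrive as GenericRecord); the field names, the 30-second join window, the filter rule, and the output topic name are likewise assumptions.

```java
import java.time.Duration;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class JoinFilterGeoSpeedSketch {
    public static void buildTopology(StreamsBuilder builder) {
        // Both source streams are assumed to be keyed by driverId, so the
        // stream-to-stream join below is co-partitioned on the same key.
        KStream<String, GenericRecord> geoStream =
                builder.stream("syndicate-geo-event-avro");
        KStream<String, GenericRecord> speedStream =
                builder.stream("syndicate-speed-event-avro");

        // Windowed stream-to-stream join: pair each geo event with a speed
        // event for the same driver that occurs within 30 seconds of it.
        KStream<String, String> joined = geoStream.join(
                speedStream,
                (geo, speed) -> geo.get("eventType") + "," + speed.get("speed"),
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofSeconds(30)));

        // Apply a rule to keep only events of interest (rule is illustrative).
        joined.filter((driverId, value) -> !value.startsWith("Normal"))
              .to("joined-filtered-geo-speed",
                  Produced.with(Serdes.String(), Serdes.String()));
    }
}
```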

Aggregation over Windows and Filtering Streams

Windowing lets you group records that share the same key into so-called windows for stateful operations such as joins or aggregations. Kafka Streams supports four types of windows: tumbling, hopping, sliding, and session windows.

The diagram below illustrates how tumbling windows are used in the CalculateDriverAvgSpeed microservice to calculate the average speed of each driver over a 3-minute window.

The code for this microservice can be found here: CalculateDriverAvgSpeedMicroService.
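
Again, the linked source is authoritative; below is a hedged sketch of a 3-minute tumbling-window average. It assumes an input topic of (driverId, speed) pairs, and it accumulates the running (count, sum) pair as a String purely to avoid defining a custom serde in the sketch.

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class CalculateDriverAvgSpeedSketch {
    public static void buildTopology(StreamsBuilder builder) {
        // (driverId, speed) pairs; the topic name is an assumption.
        KStream<String, Long> speeds = builder.stream(
                "driver-speed-events",
                Consumed.with(Serdes.String(), Serdes.Long()));

        speeds.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
              // Tumbling windows: fixed-size, non-overlapping 3-minute buckets.
              .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(3)))
              // Accumulate "count,sum" per driver per window.
              .aggregate(
                      () -> "0,0",
                      (driverId, speed, agg) -> {
                          String[] parts = agg.split(",");
                          long count = Long.parseLong(parts[0]) + 1;
                          long sum = Long.parseLong(parts[1]) + speed;
                          return count + "," + sum;
                      },
                      Materialized.with(Serdes.String(), Serdes.String()))
              // Drop the window from the key and emit the average per driver.
              .toStream((windowedKey, agg) -> windowedKey.key())
              .mapValues(agg -> {
                  String[] parts = agg.split(",");
                  return Double.parseDouble(parts[1]) / Double.parseDouble(parts[0]);
              })
              .to("driver-avg-speeds", Produced.with(Serdes.String(), Serdes.Double()));
    }
}
```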

The third microservice, AlertSpeedingDrivers, filters the stream for drivers who are speeding over that three-minute window.

The code for this microservice can be found here: AlertSpeedingDriversMicroService.java
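
The speeding filter itself is essentially a one-liner. Here is a sketch that consumes the hypothetical driver-avg-speeds topic from the previous sketch and applies the > 80 threshold from the requirements:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class AlertSpeedingDriversSketch {
    public static void buildTopology(StreamsBuilder builder) {
        builder.stream("driver-avg-speeds",
                       Consumed.with(Serdes.String(), Serdes.Double()))
               // Keep only drivers whose 3-minute average exceeds 80.
               .filter((driverId, avgSpeed) -> avgSpeed > 80)
               // Publish to the downstream alert topic (name is an assumption).
               .to("alerts-speeding-drivers",
                   Produced.with(Serdes.String(), Serdes.Double()));
    }
}
```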

Running & Scaling the Microservices without a Cluster

One of the key benefits of using Kafka Streams over other streaming engines is that the stream processing apps / microservices don't need a cluster. Rather, each microservice can be run as a standalone app (e.g., a JVM process). You can then spin up multiple instances of each to scale out the microservice. Kafka treats these instances as a single consumer group with multiple members, and Kafka Streams takes care of reassigning partitions among consumers for scalability.
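
Concretely, scaling out just means starting additional instances of the same application; below is a sketch of the relevant configuration (the application id value and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ScalingConfigSketch {
    public static Properties scalingProps() {
        Properties props = new Properties();
        // Every instance started with the same application.id joins one
        // consumer group; Kafka Streams rebalances the topic partitions
        // across instances automatically as they come and go.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "calculate-driver-avg-speed");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // Parallelism can also be raised within a single instance.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
        return props;
    }
}
```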

You can see how to start these three microservices here.

Secure & Auditable Microservices with Ranger, Ambari & Kerberos

The last two requirements for the trucking fleet application have to do with security and audit. For the JoinFilter microservice, let's distill these into the following more granular auth/authz requirements:

  1. Authentication: the microservice must identify itself with its own principal and keytab when connecting to the secure Kafka cluster.
  2. Authorization: access to the source Kafka topics must be restricted to authorized principals.
  3. Authorization: access to the intermediate topics created by the Kafka Streams app must be restricted.
  4. Authorization: access to the downstream alert topic must be restricted.
  5. Audit: all access to these Kafka resources must be logged and searchable.

For Req #1, when we start up the microservice we configure the JVM parameter java.security.auth.login.config to point to a JAAS file. This JAAS file contains the principal named truck_join_filter_microservice with its associated keytab, which the microservice uses when connecting to Kafka resources.
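
A representative sketch of such a JAAS file is shown below; the keytab path and Kerberos realm here are assumptions, as is the example JVM flag -Djava.security.auth.login.config=/etc/kafka/kafka_client_jaas.conf used to point at it.

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/truck_join_filter_microservice.keytab"
    principal="truck_join_filter_microservice@EXAMPLE.COM";
};
```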

For Reqs #2, #3, and #4, we use Ranger to configure the ACL policies.

For Req #5, Ranger provides audit services by indexing, via Solr, all access logs to Kafka resources.
