Build Secure and Governed Microservices with Kafka Streams
Tuesday, January 29, 2019
In this post, we discuss how Kafka Streams can be used to build a microservices application, with the help of a use case.
Apache Kafka is a community distributed streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue into a full-fledged streaming platform.
A streaming platform would not be complete without the ability to manipulate that data as it arrives. The Streams API within Apache Kafka is a powerful, lightweight library that allows for on-the-fly processing, letting you aggregate, create windowing parameters, perform joins of data within a stream, and more. Perhaps best of all, it is built as a Java application on top of Kafka, keeping your workflow intact with no extra clusters to maintain.
Kafka Streams is a component of open source Apache Kafka. It is a powerful, easy-to-use library for building highly scalable, fault-tolerant, distributed stream processing applications on top of Apache Kafka. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, handling late-arriving data, and managing application state efficiently.
The following list highlights several key capabilities and aspects of Kafka Streams that make it a compelling choice for building use cases such as stream processing applications, continuous queries and transformations, and microservices.
A trucking fleet company has a fleet of trucks fitted with sensors. The geo-event sensor captures important events from the truck (e.g., lane change, braking, start/stop, and acceleration, along with its geolocation), and the speed sensor captures the speed of the truck at different intervals. These two sensors stream data into their own respective Kafka topics: syndicate-geo-event-avro and syndicate-speed-event-avro.
The requirements for the use case are the following:
Using Kafka Streams, we can implement these requirements with a set of lightweight microservices that are highly decoupled and independently scalable.
Kafka Streams has rich support for joins and provides simple, composable APIs for stream-to-stream joins and stream-to-table joins using the KStream and KTable abstractions.
The diagram below illustrates how the JoinFilterGeoSpeed MicroService was implemented using the join support in Kafka Streams, as well as the integration with Schema Registry to deserialize the events from the source Kafka topics.
The code for this microservice can be found here: JoinFilterMicroService.
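To make the join semantics concrete, here is a minimal plain-Java sketch of what a windowed stream-to-stream join does: it pairs records from two streams when they share a key and their timestamps fall within a given time difference of each other. It deliberately has no Kafka dependency and is not the code of the microservice itself. The event classes, field names, and the 1.5-second window are assumptions made for illustration, and the filter step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of windowed stream-to-stream join semantics (no Kafka dependency).
class WindowedJoinSketch {

    // Hypothetical event shapes; the real Avro schemas are not shown in this post.
    static final class GeoEvent {
        final int driverId; final long timestampMs; final String eventType;
        GeoEvent(int driverId, long timestampMs, String eventType) {
            this.driverId = driverId; this.timestampMs = timestampMs; this.eventType = eventType;
        }
    }

    static final class SpeedEvent {
        final int driverId; final long timestampMs; final int speed;
        SpeedEvent(int driverId, long timestampMs, int speed) {
            this.driverId = driverId; this.timestampMs = timestampMs; this.speed = speed;
        }
    }

    // Joined record: the geo event type enriched with the matching speed reading.
    static final class GeoSpeedEvent {
        final int driverId; final String eventType; final int speed;
        GeoSpeedEvent(int driverId, String eventType, int speed) {
            this.driverId = driverId; this.eventType = eventType; this.speed = speed;
        }
    }

    // Pairs events with the same driverId whose timestamps differ by at most windowMs.
    static List<GeoSpeedEvent> join(List<GeoEvent> geo, List<SpeedEvent> speeds, long windowMs) {
        List<GeoSpeedEvent> joined = new ArrayList<>();
        for (GeoEvent g : geo) {
            for (SpeedEvent s : speeds) {
                boolean sameKey = g.driverId == s.driverId;
                boolean inWindow = Math.abs(g.timestampMs - s.timestampMs) <= windowMs;
                if (sameKey && inWindow) {
                    joined.add(new GeoSpeedEvent(g.driverId, g.eventType, s.speed));
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<GeoEvent> geo = Arrays.asList(
            new GeoEvent(1, 1_000L, "Lane Departure"),
            new GeoEvent(2, 1_000L, "Unsafe following distance"));
        List<SpeedEvent> speeds = Arrays.asList(
            new SpeedEvent(1, 1_400L, 87),   // same driver, 400 ms apart: joins
            new SpeedEvent(2, 9_000L, 65));  // same driver, 8 s apart: outside the window
        for (GeoSpeedEvent e : join(geo, speeds, 1_500L)) {
            System.out.println(e.driverId + " " + e.eventType + " " + e.speed);
        }
    }
}
```

Kafka Streams performs the same matching incrementally, buffering each side in a state store for the duration of the join window rather than scanning full lists as this sketch does.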
Windowing lets you control how records that have the same key are grouped into so-called windows for stateful operations such as joins or aggregations. Kafka Streams supports four types of windows: tumbling, hopping, sliding, and session windows.
The diagram below illustrates how tumbling windows were used in the CalculateDriverAvgSpeed MicroService to calculate the average speed of each driver over a three-minute window.
The code for this microservice can be found here: CalculateDriverAvgSpeedMicroService.
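As a rough illustration of the windowing idea, the plain-Java sketch below assigns each speed reading to a fixed, non-overlapping three-minute bucket and averages the readings in each bucket. It has no Kafka dependency and is not the microservice's actual code. The class name, the input layout (pairs of timestamp and speed for a single driver), and the sample values are assumptions. Tumbling windows in Kafka Streams are aligned to the epoch, which is what the modulo arithmetic mimics for non-negative timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of tumbling-window averaging (no Kafka dependency).
class TumblingWindowSketch {

    // Three-minute window size, as in the use case.
    static final long WINDOW_MS = 3 * 60 * 1000L;

    // Start of the tumbling window that contains the given (non-negative) timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Average speed per window start for one driver; each event is {timestampMs, speed}.
    static Map<Long, Double> averageByWindow(long[][] events) {
        Map<Long, long[]> sumAndCount = new TreeMap<>();
        for (long[] e : events) {
            long[] acc = sumAndCount.computeIfAbsent(windowStart(e[0]), k -> new long[2]);
            acc[0] += e[1]; // running sum of speeds in this window
            acc[1] += 1;    // number of readings in this window
        }
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, long[]> entry : sumAndCount.entrySet()) {
            averages.put(entry.getKey(), (double) entry.getValue()[0] / entry.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        long[][] events = { {0, 60}, {60_000, 80}, {179_999, 100}, {180_000, 50} };
        // The first three readings fall in window [0, 180000); the last opens a new window.
        System.out.println(averageByWindow(events)); // {0=80.0, 180000=50.0}
    }
}
```

Keeping a running sum and count per window, rather than the raw readings, is the same shape of state a streaming aggregation would hold per key and window.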
The third microservice, AlertSpeedingDrivers, filters the stream for drivers who are speeding over that three-minute window.
The code for this microservice can be found here: AlertSpeedingDriversMicroService.java
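The filtering step is conceptually a predicate applied to each driver's windowed average. The plain-Java sketch below shows that idea on an in-memory map. The threshold of 80 and the driver IDs are hypothetical, since the post does not state the speed limit that was used, and the class is an illustration rather than the microservice's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java illustration of filtering windowed averages for speeding drivers.
class SpeedingFilterSketch {

    // Hypothetical threshold; the actual limit is not given in this post.
    static final double SPEED_LIMIT = 80.0;

    // Keeps only the drivers whose average speed is strictly above the limit.
    static Map<Integer, Double> speedingDrivers(Map<Integer, Double> avgSpeedByDriver) {
        return avgSpeedByDriver.entrySet().stream()
            .filter(e -> e.getValue() > SPEED_LIMIT)
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (a, b) -> a,            // keys are unique, so this merge is never used
                LinkedHashMap::new));   // preserve input order
    }

    public static void main(String[] args) {
        Map<Integer, Double> averages = new LinkedHashMap<>();
        averages.put(11, 72.5);
        averages.put(12, 91.0);
        averages.put(13, 80.0); // exactly at the limit: not flagged
        System.out.println(speedingDrivers(averages)); // {12=91.0}
    }
}
```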
One of the key benefits of using Kafka Streams over other streaming engines is that the stream processing apps and microservices don’t need a separate processing cluster. Rather, each microservice can run as a standalone app (e.g., in a JVM container). You can then spin up multiple instances of each to scale out the microservice. Kafka treats these instances as members of a single consumer group, and Kafka Streams takes care of consumer partition reassignment as instances are added or removed.
You can see how to start these three microservices here.
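What ties multiple instances together is configuration: Kafka Streams uses the `application.id` setting as the consumer group ID, so every instance started with the same value joins the same group and shares the input partitions. The sketch below builds such a configuration with plain `java.util.Properties`. The application ID and broker address are placeholder values, and the actual startup scripts may set more options than these two.

```java
import java.util.Properties;

// Sketch of the minimal per-instance configuration for a Kafka Streams app.
class StreamsInstanceConfigSketch {

    // Builds the settings that every instance of one microservice would share.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // application.id doubles as the consumer group id for all instances.
        props.put("application.id", applicationId);
        // Brokers to connect to; placeholder value supplied by the caller.
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        // Two instances of the same microservice use identical settings (hypothetical values).
        Properties instanceA = configFor("join-filter-microservice", "localhost:9092");
        Properties instanceB = configFor("join-filter-microservice", "localhost:9092");
        boolean sameGroup = instanceA.getProperty("application.id")
            .equals(instanceB.getProperty("application.id"));
        System.out.println("same consumer group: " + sameGroup);
    }
}
```

In a real application these properties would be passed to the `KafkaStreams` constructor together with the topology.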
The last two requirements for the trucking fleet application have to do with security and audit. For the JoinFilter MicroService, let’s distill these into the following more granular auth/authz requirements:
For Req #1, when we start up the microservice we set the JVM parameter java.security.auth.login.config to point to the following JAAS file. This JAAS file contains the principal named truck_join_filter_microservice and its associated keytab, which the microservice uses when connecting to Kafka resources.
For Req #2, #3, and #4, we use Ranger to configure the ACL policies.
For Req #5, Ranger provides audit services by indexing all access logs for Kafka resources in Solr.
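As a hedged sketch of how these pieces usually fit together for a Kerberos-authenticated Kafka client: the comment in the code shows the typical shape of a `KafkaClient` JAAS section, and the method sets the same system property that the `-Djava.security.auth.login.config` JVM flag would set, alongside the client security settings. The keytab path, realm, choice of `SASL_PLAINTEXT`, and service name `kafka` are assumptions, not values taken from the microservice.

```java
import java.util.Properties;

// Sketch of Kerberos-related client settings; paths and values are assumptions.
class KerberosClientConfigSketch {

    /*
     * Typical shape of a JAAS file for a Kafka client (illustrative values only):
     *
     * KafkaClient {
     *   com.sun.security.auth.module.Krb5LoginModule required
     *   useKeyTab=true
     *   storeKey=true
     *   keyTab="/path/to/truck_join_filter_microservice.keytab"
     *   principal="truck_join_filter_microservice@EXAMPLE.COM";
     * };
     */

    // Points the JVM at the JAAS file and returns the matching client security settings.
    static Properties secureClientProps(String jaasFilePath) {
        // Equivalent to starting the JVM with -Djava.security.auth.login.config=<path>.
        System.setProperty("java.security.auth.login.config", jaasFilePath);

        Properties props = new Properties();
        // Assumed protocol; a TLS-enabled cluster would use SASL_SSL instead.
        props.put("security.protocol", "SASL_PLAINTEXT");
        // Kerberos service name the brokers run as; "kafka" is a common choice.
        props.put("sasl.kerberos.service.name", "kafka");
        return props;
    }

    public static void main(String[] args) {
        Properties props = secureClientProps("/path/to/microservice_jaas.conf");
        System.out.println(System.getProperty("java.security.auth.login.config"));
        System.out.println(props.getProperty("security.protocol"));
    }
}
```

Setting the property in code must happen before the first Kafka client is created; passing it as a JVM flag at startup, as described above, avoids that ordering concern.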