In the previous article, we covered how to establish a Kafka connection and how to publish messages into a Kafka topic. This article covers the other side of the operation: how to consume Kafka messages, and some important properties which can either make the process run more efficiently or overwhelm the server when misconfigured.
With the KafkaReader component, CloverDX now allows us to connect to a Kafka Topic and consume its data. As in the previous article, we're going to explain this process on a simple example that connects to a Topic, consumes its messages, and inserts them into a database table. To illustrate working with yet another component (KafkaCommit), we're going to commit our read position back to the Kafka broker, setting a new message offset. Using KafkaCommit is not strictly necessary in the solution, but we'll get to that later.
Let's first take a look at our example graph before we get to a more detailed explanation of each component's role.
Next is the Topic we're going to subscribe to. If we want to read from a single topic, we can simply provide its name via the Topic property. However, if we want to read from multiple topics, we have a couple of options. We can either:

- list the topic names in the Topic property, separated by semicolons (ex: topic1;topic2;topic3), or
- provide a regular expression in the Topic Pattern property. To achieve the same result as in the first example, we'd use topic[1-3].

If both are provided, the Topic Pattern property takes precedence.
Consumer group is another required property of KafkaReader; it allows Kafka to automatically assign partitions when there are multiple consumers in a group.
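Under the hood, these reader properties map onto standard Kafka consumer concepts. As a rough illustration (not CloverDX's internals), here is how the same subscription choices look when written directly against the plain Kafka Java consumer API; the broker address and group name are made-up values for this sketch:

```java
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicSubscriptionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cloverdx-example-group");     // the Consumer group property
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Option 1: an explicit list of topics (the Topic property)
            consumer.subscribe(List.of("topic1", "topic2", "topic3"));

            // Option 2: a regular expression (the Topic Pattern property)
            // consumer.subscribe(Pattern.compile("topic[1-3]"));
        }
    }
}
```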
The Maximum time for new event property is an important efficiency-related setting. It defines the amount of time (in seconds) our reader will wait for new messages to appear on the topic before CloverDX decides it has read all available messages and terminates the input stream. Configuring it requires some knowledge of your Kafka write velocity: if the value is too high, the job effectively turns into a streaming job that never ends, and some components (like Aggregate or Sort) would keep waiting for the end of the input data stream indefinitely.

On the other hand, too short an interval could result in a very high number of triggered jobs, which also has a negative impact on the server's performance.
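To make this idle-timeout behaviour concrete, here is a small sketch of the same idea in plain Kafka Java consumer code (again, an illustration of the concept, not what CloverDX does internally): keep polling, and stop once no new message has arrived within the configured window. The one-second poll interval and the helper's shape are assumptions:

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

/** Reads until no new message has arrived within maxWait, then returns. */
static void readUntilIdle(KafkaConsumer<String, String> consumer, Duration maxWait) {
    long lastMessageAt = System.currentTimeMillis();
    while (System.currentTimeMillis() - lastMessageAt < maxWait.toMillis()) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        if (!records.isEmpty()) {
            lastMessageAt = System.currentTimeMillis();            // the clock restarts with every new message
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());                // hand the message downstream here
            }
        }
    }
    // The window elapsed without a new event -> treat the input stream as finished.
}
```

With a very long window the loop above effectively never exits while producers keep writing; with a very short one it exits almost immediately and has to be restarted often, which mirrors the trade-off described above.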
With all of our properties set, we can go ahead and move on to the next step.
The next step of our process is going to take our Kafka message and turn it into a columnar format so that we can send it to our database.
Again, we're going to process our incoming data from the first input port as discrete values (remember my previous article?). The only "trick" here is to have the correct metadata attached to the output of FlatFileReader. That is, if the data are in CSV format. For other formats, use a different de-serialization component, such as JSONExtract, Reformat (for Avro), ParquetReader, etc., to convert the data into CloverDX format.
Finally, we can send our data to a DatabaseWriter component that will take our incoming data flow and write it to a database of our choosing. This allows us to persist our information. At this point, handling the data does not really differ from any other data source. For the simplicity of our example, we decided to just write it into a database, but this is where you "add your secret sauce".
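If it helps to picture this step outside the graph, the sketch below shows the equivalent "parse, then persist" logic in plain Java: split a CSV-formatted message value into fields and insert them with JDBC. The table name, column layout, delimiter, and connection details are all made-up assumptions; in CloverDX this work is done by the edge metadata and the DatabaseWriter configuration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

/** Parses one CSV message value and writes it to a (hypothetical) table. */
static void writeCsvMessage(String messageValue) throws Exception {
    String[] fields = messageValue.split(";");   // CSV payload -> discrete column values

    String sql = "INSERT INTO kafka_messages (id, customer, amount) VALUES (?, ?, ?)";
    try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/demo", "user", "password");   // assumed connection details
         PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setString(1, fields[0]);
        stmt.setString(2, fields[1]);
        stmt.setString(3, fields[2]);
        stmt.executeUpdate();
    }
}
```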
Kafka maintains a pointer to the current position in each partition, called an Offset. Every time we read from a partition, we need to tell Kafka how to update the Offset so that only new messages are read on our next run. This process is known as a Commit.
If you look closely at Figure 3: Configured KafkaReader, you may notice the Auto-commit checkbox is unchecked. This is because, in our little example, we've chosen the second approach: committing explicitly with the KafkaCommit component rather than relying on auto-commit.
In these two articles, we learned how to work with Kafka event streaming in the context of the CloverDX platform and described how small nuances in configuration may affect efficiency. In the future, we may look into how to pair the Kafka components with CloverDX Server's Kafka event listener. Watch out for new articles!