3 scenarios to consume an Apache Kafka Topic with MuleSoft
In this post:
Scenario #2 - Connect to a Topic but without specifying the Partition
Scenario #3 - MuleSoft acting as a cluster and reading from Kafka
Apache Kafka is one of the best options in the market for data streaming. Developed by LinkedIn engineers and later released to the open-source community, it has been very well received and is used by many organizations.
If you need to produce streams of data and move them from one point to another, Apache Kafka is a strong option for you.
Apache Kafka has a simple architecture in terms of how the information is streamed:
Those four concepts are the basis for understanding where and how a producer and/or a consumer can produce/read information from Kafka.
As we’ve mentioned, the architecture itself is not complex, but the ZooKeeper dependency can be one of the challenges when you deploy Kafka on your own. That is why Confluent is a good option for customers who want to avoid the configuration and maintenance challenges that Apache Kafka may bring to the table.
A producer is any application or platform that needs to publish information to Kafka. For example:
Log files of an application that you would like to send to Kafka for further analysis and/or consolidation
Geolocation of your fleet of buses that needs to be processed and analyzed in real-time
Information from your Point-of-Sale (PoS) terminals that you would like to process in your central office. Imagine a retailer that needs to send information from every branch to the central office.
Tweet analysis. Imagine you need to gather information from different Twitter channels.
As we’ve depicted in the diagram, the minimum unit of configuration is a partition that belongs to a Topic, and Topics belong to Brokers. Topics can be replicated across Brokers, and Brokers can work together as a cluster.
Messages are published to a Topic’s partitions, and each message is marked with an offset within its partition. Consumers use that offset to know where to start reading from the stream.
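To make the relationship between partitions and offsets concrete, here is a minimal in-memory sketch. It is not a Kafka client: `ToyTopic`, `publish`, and `read` are hypothetical names invented for this illustration, and a plain Python list stands in for each partition's log. The only behavior it tries to show is that offsets are assigned per partition, in arrival order.

```python
class ToyTopic:
    """In-memory stand-in for a topic (hypothetical, not a real Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition; the list index plays the role of the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, partition, message):
        """Append a message to one partition and return the offset it received."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, from_offset):
        """Return (offset, message) pairs from from_offset to the end of the partition."""
        log = self.partitions[partition]
        return [(offset, log[offset]) for offset in range(from_offset, len(log))]


topic = ToyTopic("sales", num_partitions=2)
first = topic.publish(0, "sale-1")   # offset 0 in partition 0
second = topic.publish(0, "sale-2")  # offset 1 in partition 0
other = topic.publish(1, "sale-3")   # offset 0 in partition 1: offsets are per partition

print(topic.read(0, from_offset=1))  # [(1, 'sale-2')]
```

A consumer that remembers the last offset it handled can ask for everything after it, which is the property the rest of this post relies on.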
Kafka is a very powerful platform, and part of that power comes from how well it handles multiple partitions for a Topic.
One of Kafka’s main capabilities is its resiliency: messages can still be read even when one of the brokers is experiencing an issue or a consumer is having a problem.
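One way to picture the consumer side of that resiliency is a consumer that only advances its committed offset after a message has been processed, so a restarted consumer picks up where the failed one stopped. The sketch below is a simplified, in-memory illustration under that assumption; `consume`, `flaky_handler`, and the list-based log are hypothetical and do not correspond to any Kafka or MuleSoft API.

```python
def consume(log, committed_offset, handler):
    """Process messages from committed_offset onward and return the new committed offset.

    The offset advances only after the handler succeeds, so a failure leaves it
    pointing at the message that still needs to be processed.
    """
    offset = committed_offset
    try:
        while offset < len(log):
            handler(log[offset])
            offset += 1  # "commit" only after successful processing
    except RuntimeError:
        pass  # simulated consumer failure; offset stays at the failed message
    return offset


log = ["m0", "m1", "m2", "m3"]  # a single partition, modeled as a list
processed = []


def flaky_handler(message):
    """Simulate a consumer that crashes while handling m2."""
    if message == "m2":
        raise RuntimeError("consumer crashed")
    processed.append(message)


after_crash = consume(log, 0, flaky_handler)               # stops at offset 2
after_restart = consume(log, after_crash, processed.append)  # resumes at offset 2

print(after_crash, after_restart, processed)  # 2 4 ['m0', 'm1', 'm2', 'm3']
```

No message is lost across the restart because the log itself is untouched by the failure; only the consumer's position matters.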
As you can see in the previous image, a message is published into a Topic partition (as a producer, you can decide which partition to target), and an offset is generated to mark each message, which will help us when we want to read the messages.
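The producer's choice of partition can be sketched as a small function: either the producer names a partition explicitly, or the partition is derived from a message key so that the same key always lands in the same place. This is an illustration only. `choose_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is in the Python standard library; Kafka's own Java client uses a different hash (murmur2) for keyed records.

```python
import zlib


def choose_partition(num_partitions, key=None, partition=None):
    """Pick a partition: an explicit choice wins, otherwise hash the key."""
    if partition is not None:
        if not 0 <= partition < num_partitions:
            raise ValueError("partition out of range")
        return partition
    if key is None:
        # A real client spreads such records across partitions; this sketch rejects them.
        raise ValueError("need either a key or an explicit partition")
    # crc32 is a stand-in for the hash a real client would use.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


explicit = choose_partition(3, partition=2)
by_key_first = choose_partition(3, key="pos-17")
by_key_again = choose_partition(3, key="pos-17")

print(explicit, by_key_first == by_key_again)  # 2 True
```

Keying by a stable identifier keeps all messages for that identifier in one partition, and therefore in one ordered sequence of offsets.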
You can have a single producer or multiple producers publishing messages to the same Topic, on the same or different partitions. It will depend on your design. That is true, but “depends on your design” is too generic an answer on its own; the objective of this type of article is to help you decide how to use things, and that’s what we are going to do.
Having multiple producers publish messages to the same Topic but to different partitions is a way to divide how the consumers will read the information. Let’s get back to the retail scenario where different branches need to send information to the central office. Every branch has a group of PoS (Point-of-Sale) terminals. The Topic may represent the branch, and each partition can represent one PoS. Then, from the central office, we can have multiple consumers reading from the same Topic (branch) but pointing to different partitions (PoS), and processing the information individually. If we need to differentiate every single PoS, we can assign a consumer to read from that particular partition.
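The retail layout described above can be sketched with plain data structures: one Topic per branch, one partition per PoS, and consumers in the central office pinned to specific partitions. Everything here is hypothetical illustration data (`branch_topic`, `assignments`, `read_assigned`, and the consumer names are invented for this example), not a MuleSoft or Kafka configuration.

```python
# Hypothetical layout: one topic for a branch, one partition per PoS terminal.
branch_topic = {
    0: ["pos0-sale-a", "pos0-sale-b"],  # partition 0 -> PoS terminal 0
    1: ["pos1-sale-a"],                 # partition 1 -> PoS terminal 1
    2: [],                              # partition 2 -> PoS terminal 2 (no sales yet)
}

# Each consumer in the central office is pinned to specific partitions.
assignments = {
    "consumer-a": [0],
    "consumer-b": [1, 2],
}


def read_assigned(topic, partitions):
    """Return the messages a consumer sees, keyed by partition number."""
    return {partition: list(topic[partition]) for partition in partitions}


seen_by_a = read_assigned(branch_topic, assignments["consumer-a"])
seen_by_b = read_assigned(branch_topic, assignments["consumer-b"])

print(seen_by_a)  # {0: ['pos0-sale-a', 'pos0-sale-b']}
print(seen_by_b)  # {1: ['pos1-sale-a'], 2: []}
```

Because no partition appears in two assignments, each PoS is processed by exactly one consumer, which is what lets the central office treat every terminal individually.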
Another idea is that a Topic represents a group of branches (a region, for example) and every partition represents a single branch, so you can have a specific consumer reading the information of a specific branch.
As we’ve mentioned, it will depend on how you design it and that is fully related to your use case.