Introduction to Kafka
This post is a 101 session on Kafka. I will cover a few more advanced Kafka topics in later posts.
The idea behind asynchronous processing is that the caller does not wait for an immediate response, or, in some cases, we need to send events to another system to perform some action. For example, in an order processing system, if the inventory for a specific item falls below a certain threshold while processing an order, we can send an event to the inventory system to restock that item.
There are many messaging platforms in the industry; examples include RabbitMQ, ActiveMQ, and IBM MQ (JMS is a Java messaging API that several of these implement).
The choice of messaging platform depends on the use case, and the following factors can play a crucial role:
- Broadcasting messages to multiple consumers
- Horizontal scaling of producers and consumers
- Parallel processing of messages
- Streaming ability
- Message ordering
For example, Kafka does not support message filtering directly (you can approximate it using multiple topics and filtered streams), so if you need it, you might look at other messaging platforms like RabbitMQ.
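As a sketch of that multiple-topics workaround (this is not built-in filtering; the broker address, topic names, and the `PRIORITY` predicate are made up for illustration), a small Kafka Streams job can route matching messages to a dedicated topic that interested consumers subscribe to:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterToTopic {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");

        // Keep only the messages one consumer cares about and route them
        // to a dedicated topic; that consumer subscribes to "priority-orders"
        // instead of filtering client-side.
        orders.filter((key, value) -> value.contains("PRIORITY"))
              .to("priority-orders");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The trade-off is an extra topic (and its storage) per filter, which is why purpose-built filtering in brokers like RabbitMQ can be a better fit when filters are many or ad hoc.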
Kafka Highlights:
- Kafka is a distributed messaging platform.
- Kafka can store messages durably and reliably for a configured retention period.
- Kafka provides the ability to publish/subscribe to messages.
- Kafka provides the ability to replay messages from a specific position or from the start.
- Producers and consumers are not bound to the same language; Kafka clients exist for many languages.
- Kafka provides parallel processing mechanism for consumers.
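As a sketch of the replay ability mentioned above (broker address, topic name, and the example offset are placeholders), a consumer can rewind its position with `seekToBeginning` or jump to a specific offset with `seek`:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromStart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign a partition explicitly so we control the position ourselves
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(partition));

            // Replay from the earliest retained message...
            consumer.seekToBeginning(Collections.singletonList(partition));
            // ...or from a specific offset:
            // consumer.seek(partition, 1234L);

            consumer.poll(Duration.ofMillis(500))
                    .forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}
```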
Kafka Eco System:
- Producers: Producers are clients that send messages to Kafka without needing to know who consumes them.
- Consumers: Consumers can subscribe to the topic and receive messages from Brokers. Consumers are typically part of a consumer group.
- Brokers: An integral component of the Kafka ecosystem. Brokers are responsible for receiving messages from producers, storing them, and coordinating with the other brokers.
- Clusters: Multiple brokers coordinating together form a Kafka cluster. Brokers can be added to or removed from a cluster easily.
- Zookeeper: Zookeeper is a distributed store that provides discovery for Kafka brokers. (There are plans to remove Zookeeper in later versions of Kafka.)
Kafka Components:
- Topic: Messages in Kafka are categorized into topics. A topic is similar to a database table in that it logically divides messages per use case.
- Offset: An offset is a message's position within a partition; consumers use offsets to track how far they have read.
- Partition: Topics are further broken down into partitions to provide scalability and parallelism (and, together with replication, redundancy).
- Replication: Each message is replicated across the cluster to maintain high availability and survive broker-loss scenarios.
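The partition and replication settings above are chosen when a topic is created. As a sketch (broker address, topic name, and the counts are placeholders), the `AdminClient` from the Kafka client library can create a topic with both:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallelism; replication factor 2 so the
            // topic survives the loss of a single broker
            NewTopic orders = new NewTopic("orders", 3, (short) 2);
            admin.createTopics(Collections.singletonList(orders)).all().get();
        }
    }
}
```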
Kafka Concepts:
- Leader: One of the brokers acts as the leader for a partition and is responsible for handling writes and replicating them to the other brokers.
- Followers: Followers are also brokers; they are responsible for staying in sync with the leader's replica, and if the leader node fails, one of them is elected as the new leader.
- Commit Log: Each partition is an append-only commit log; messages are appended to the end of a file and read in order.
- Group Coordinator: One of the brokers is designated as the group coordinator; it takes care of rebalances and manages consumers.
Kafka Java Client Examples:
Producer Example:
Writing a producer is pretty easy; Kafka provides a client jar.
- Create `Properties` specifying the Kafka brokers and how to serialize the key and message.
- Create a `ProducerRecord` with the topic name and message.
- Send the message using `KafkaProducer`. If we don't specify a key, the message is sent to a partition chosen by the default partitioner. The acknowledgment from Kafka can be received synchronously or asynchronously with a callback handler.
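A minimal sketch of those steps, assuming a broker at `localhost:9092` and a topic named `orders` (both placeholders):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SimpleProducer {
    public static void main(String[] args) {
        // 1. Properties: broker list and key/value serializers
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // 2. Record: topic name + message (no key, so the partitioner picks the partition)
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "order-42 created");

            // 3a. Asynchronous send with a callback handler
            producer.send(record, (RecordMetadata metadata, Exception e) -> {
                if (e != null) {
                    e.printStackTrace();
                } else {
                    System.out.printf("Sent to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });

            // 3b. Synchronous alternative: block until the broker acknowledges
            // producer.send(record).get();
        }
    }
}
```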
Consumer Example:
- Create `Properties` specifying the Kafka brokers and how to deserialize the key and message. We can also set the consumer group property (`group.id`), which represents a set of consumers working in parallel under one group.
- We can `subscribe` to a single topic or to multiple topics; we can also specify a regex pattern on topic names to subscribe and retrieve messages from the Kafka brokers.
- Since it's a stream of data, we need to handle a continuous flow: a `KafkaConsumer` can `poll` Kafka and retrieve a batch of records (the maximum batch size per poll is controlled by the `max.poll.records` property).
- Each retrieved record carries the current `offset`, the `partition` number, and the message itself, which can be used to run the business logic.
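A matching consumer sketch, again assuming a broker at `localhost:9092`, a topic named `orders`, and a made-up group id:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        // 1. Properties: brokers, deserializers, and the consumer group id
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // 2. Subscribe to one topic (a list of topics or a regex Pattern also works)
            consumer.subscribe(Collections.singletonList("orders"));

            // 3. Poll in a loop: each poll returns a batch of records
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // 4. Each record carries its partition, offset, key, and value
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

Running multiple copies of this program with the same `group.id` is how Kafka parallelizes consumption: the partitions of `orders` are divided among the group's members.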
I will cover Kafka architecture in the next post. Please let me know your feedback in the comments, and tell me if you want me to cover any extra content.