Introduction to Kafka
This post is a 101 session on Kafka. I will cover a few more advanced Kafka topics in later posts.
The idea behind asynchronous processing is that the caller does not wait for an immediate response, or, in some cases, we need to send events to another system to perform some action. For example, in an order processing system, if the inventory for a specific item falls below a certain threshold while processing an order, we can send an event to the inventory system to restock that item.
There are many messaging platforms in the industry; examples include RabbitMQ, ActiveMQ, and IBM MQ (JMS is a Java messaging API that several of these implement).
The choice of messaging platform depends on the use case, and the following factors can play a crucial role:
- Broadcasting messages to multiple consumers
- Horizontal scaling of producers and consumers
- Parallel processing of messages
- Streaming ability
- Message ordering
For example, Kafka does not support message filtering directly (you can approximate it using multiple topics and filtered streams), so if you need it, you might look at other messaging platforms like RabbitMQ.
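As a sketch of that multiple-topics workaround (this is not built-in filtering; the broker address, topic names, and the `PRIORITY` predicate are made up for illustration), a small Kafka Streams job can route matching messages to a dedicated topic that interested consumers subscribe to:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterToTopic {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");

        // Keep only the messages one consumer cares about and route them
        // to a dedicated topic; that consumer subscribes to "priority-orders"
        // instead of filtering client-side.
        orders.filter((key, value) -> value.contains("PRIORITY"))
              .to("priority-orders");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The trade-off is an extra topic (and its storage) per filter, which is why purpose-built filtering in brokers like RabbitMQ can be a better fit when filters are many or ad hoc.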
Kafka Highlights:
- Kafka is a distributed messaging platform.
- Kafka can store messages durably and reliably for a configured retention period.
- Kafka provides the ability to publish/subscribe to messages.
- Kafka provides the ability to replay messages from a specific position or from the start.
- Producers and consumers are not bound to the same language; Kafka clients exist for many languages.
- Kafka provides parallel processing mechanism for consumers.
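As a sketch of the replay ability mentioned above (broker address, topic name, and the example offset are placeholders), a consumer can rewind its position with `seekToBeginning` or jump to a specific offset with `seek`:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromStart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign a partition explicitly so we control the position ourselves
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(partition));

            // Replay from the earliest retained message...
            consumer.seekToBeginning(Collections.singletonList(partition));
            // ...or from a specific offset:
            // consumer.seek(partition, 1234L);

            consumer.poll(Duration.ofMillis(500))
                    .forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}
```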
Kafka Eco System:
- Producers: Producers are clients that send messages to Kafka without needing to know who consumes them.
- Consumers: Consumers can subscribe to the topic and receive messages from Brokers. Consumers are typically part of a consumer group.
- Brokers: An integral component of the Kafka ecosystem. Brokers are responsible for receiving messages from producers, storing them, and coordinating with the other brokers.
- Clusters: Multiple brokers coordinating together form a Kafka cluster. Brokers can be added to or removed from a cluster easily.
- Zookeeper: Zookeeper is a distributed store that provides discovery for Kafka brokers. (There are plans to remove Zookeeper in later versions of Kafka.)
Kafka Components:
- Topic: Messages in Kafka are categorized into topics. A topic is similar to a database table in that it logically divides messages per use case.
- Offset: An offset is a message's position within a partition; consumers use offsets to track how far they have read.
- Partition: Topics are further broken down into partitions to provide scalability and parallelism (and, together with replication, redundancy).
- Replication: Each message is replicated across the cluster to maintain high availability and survive broker-loss scenarios.
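The partition and replication settings above are chosen when a topic is created. As a sketch (broker address, topic name, and the counts are placeholders), the `AdminClient` from the Kafka client library can create a topic with both:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallelism; replication factor 2 so the
            // topic survives the loss of a single broker
            NewTopic orders = new NewTopic("orders", 3, (short) 2);
            admin.createTopics(Collections.singletonList(orders)).all().get();
        }
    }
}
```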
Kafka Concepts:
- Leader: One of the brokers acts as the leader for a partition and is responsible for handling writes and replicating them to the other brokers.
- Followers: Followers are also brokers; they are responsible for staying in sync with the leader's replica, and if the leader node fails, one of them is elected as the new leader.
- Commit Log: Each partition is an append-only commit log; messages are appended to the end of a file and read in order.
- Group Coordinator: One of the brokers is designated as the group coordinator; it takes care of rebalances and manages consumers.
Kafka Java Client Examples:
Producer Example:
Writing a producer is pretty easy; Kafka provides a client jar.
- Create `Properties` specifying the Kafka brokers and how to serialize the key and message.
- Create a `ProducerRecord` with the topic name and message.
- Send the message using `KafkaProducer`. If we don't specify a key, the message is sent to a partition chosen by the default partitioner. The acknowledgment from Kafka can be received synchronously or asynchronously with a callback handler.
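A minimal sketch of those steps, assuming a broker at `localhost:9092` and a topic named `orders` (both placeholders):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SimpleProducer {
    public static void main(String[] args) {
        // 1. Properties: broker list and key/value serializers
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // 2. Record: topic name + message (no key, so the partitioner picks the partition)
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "order-42 created");

            // 3a. Asynchronous send with a callback handler
            producer.send(record, (RecordMetadata metadata, Exception e) -> {
                if (e != null) {
                    e.printStackTrace();
                } else {
                    System.out.printf("Sent to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });

            // 3b. Synchronous alternative: block until the broker acknowledges
            // producer.send(record).get();
        }
    }
}
```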
Consumer Example:
- Create `Properties` specifying the Kafka brokers and how to deserialize the key and message. We can also set the consumer group property (`group.id`), which represents a set of consumers working in parallel under one group.
- We can `subscribe` to a single topic or to multiple topics; we can also specify a regex pattern on topic names to subscribe and retrieve messages from the Kafka brokers.
- Since it's a stream of data, we need to handle a continuous flow: a `KafkaConsumer` can `poll` Kafka and retrieve a batch of records (the maximum batch size per poll is controlled by the `max.poll.records` property).
- Each retrieved record carries the current `offset`, the `partition` number, and the message itself, which can be used to run the business logic.
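A matching consumer sketch, again assuming a broker at `localhost:9092`, a topic named `orders`, and a made-up group id:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        // 1. Properties: brokers, deserializers, and the consumer group id
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // 2. Subscribe to one topic (a list of topics or a regex Pattern also works)
            consumer.subscribe(Collections.singletonList("orders"));

            // 3. Poll in a loop: each poll returns a batch of records
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // 4. Each record carries its partition, offset, key, and value
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

Running multiple copies of this program with the same `group.id` is how Kafka parallelizes consumption: the partitions of `orders` are divided among the group's members.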
I will cover Kafka architecture in the next post. Please let me know your feedback in the comments, and tell me if you want me to cover any extra content.