Introduction to Kafka

1. Introduction:

Kafka is an open-source distributed streaming platform developed by the Apache Software Foundation. It is written in Java and Scala.

Kafka is used for real-time data streaming, data processing, and message queuing.
It is horizontally scalable, fault-tolerant, and can handle millions of messages per second.

In short, Kafka is a powerful distributed streaming platform that can handle real-time data processing and message queuing at scale.
It has a rich ecosystem of tools and libraries for building data pipelines and stream processing applications.
Understanding Kafka's architecture, topics, producers, consumers, consumer groups, message format, Kafka Connect, and Kafka Streams is essential for developing and deploying applications on Kafka.

2. Architecture of Kafka:

Kafka architecture is based on a distributed system. It has four main components: brokers, producers, consumers, and Zookeeper. Brokers are the servers that handle message storage and serving.
Producers are the applications that write data to Kafka. Consumers are the applications that read data from Kafka. Zookeeper is a distributed coordination service that manages brokers and tracks the state of Kafka clusters.

3. Kafka Topics:

Kafka organizes data into topics. A topic is a category or feed name to which messages are published. Each topic can have multiple partitions. Partitions allow Kafka to scale horizontally across multiple brokers. Kafka guarantees the order of messages within a partition but not across partitions.
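How a message's key determines its partition can be sketched in a few lines. Kafka's real default partitioner hashes the key with murmur2; crc32 is used below only as a deterministic stand-in:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Kafka's actual default partitioner uses murmur2;
    # crc32 serves here as a simple deterministic hash.
    return zlib.crc32(key) % num_partitions

# All messages with the same key land in the same partition,
# which is what gives Kafka its per-key ordering guarantee.
p1 = partition_for(b"user-42", 3)
p2 = partition_for(b"user-42", 3)
```

Because the hash of a given key never changes, every message for `user-42` is appended to the same partition, and so is read back in order.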

4. Kafka Producers:

Producers are applications that write data to Kafka topics. They can send messages synchronously or asynchronously, choose the partition a message is written to, and define custom partitioning strategies.
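A custom partitioning strategy can be illustrated with a small hypothetical function; the routing rule below is invented purely for illustration, not a real Kafka API:

```python
def region_partitioner(key: bytes, num_partitions: int) -> int:
    # Hypothetical strategy: pin keys prefixed "eu-" to partition 0
    # and spread all other keys over the remaining partitions.
    if key.startswith(b"eu-"):
        return 0
    return 1 + (sum(key) % (num_partitions - 1))
```

In a real client you would plug such logic into the producer's partitioner hook (for example, the `Partitioner` interface in the Java client) rather than call it by hand.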

5. Kafka Consumers:

Consumers are applications that read data from Kafka topics. A consumer can read from one or more partitions, specify the starting offset from which to read, and join with other consumers to form a consumer group.
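Reading from a chosen starting offset can be sketched as follows; the list and helper below are a simplified stand-in for a real partition log and a consumer poll:

```python
# A partition is just an ordered log; a consumer tracks its own offset.
log = [b"m0", b"m1", b"m2", b"m3", b"m4"]

def read_from(log, start_offset, max_records=2):
    # Return a batch of records plus the offset to resume from,
    # mimicking a poll that starts at a given offset.
    batch = log[start_offset:start_offset + max_records]
    return batch, start_offset + len(batch)

records, next_off = read_from(log, start_offset=3)
# records -> [b"m3", b"m4"], next_off -> 5
```

Note that the broker does not track consumption for the consumer; the consumer decides where to start (earliest, latest, or a specific offset) and commits its progress.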

6. Kafka Consumer Groups:

Consumer groups allow multiple consumers to work together to consume messages from a topic. Each partition in a topic can be consumed by only one consumer in a consumer group.
If there are more consumers in a consumer group than partitions, some consumers will be idle.
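The assignment rule can be sketched with a simplified round-robin assignor. Real Kafka ships range, round-robin, and sticky assignors; this is a minimal stand-in:

```python
def assign_round_robin(partitions, consumers):
    # Each partition is given to exactly one consumer in the group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 4 consumers: one consumer ends up with nothing to do.
a = assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"])
# a -> {"c1": [0], "c2": [1], "c3": [2], "c4": []}
```

This is why the partition count effectively caps the parallelism of a consumer group: a fourth consumer on a three-partition topic simply sits idle.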

7. Components of Kafka:

The main components are:

Producer: publishes messages to one or more topics.
Topic: a named stream of messages grouped under a common category.
Consumer: subscribes to topics and reads the published messages.
Broker: a server that stores messages and acts as the channel between producers and consumers.

8. Kafka Message Format:

Kafka messages are key-value pairs. The key and value are both byte arrays.
Messages can be compressed for efficient storage and transmission.
Messages can also have headers for metadata.
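A minimal sketch of a record's shape, assuming gzip as the compression codec (Kafka also supports snappy, lz4, and zstd; the dict layout below is illustrative, not Kafka's wire format):

```python
import gzip

# A Kafka record is essentially a key, a value (both byte arrays),
# and optional headers carrying metadata.
record = {
    "key": b"user-42",
    "value": gzip.compress(b'{"event": "login"}'),  # compressed payload
    "headers": [("trace-id", b"abc123")],
}

decoded = gzip.decompress(record["value"])
# decoded -> b'{"event": "login"}'
```

Because keys and values are opaque byte arrays, serialization (JSON, Avro, Protobuf, etc.) is entirely the application's choice.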

9. Kafka Load Balancing:

When load increases, Kafka balances it by spreading a topic's partitions (and their replicas) across multiple brokers, so that no single server has to store or serve all of a topic's messages.

10. What is Zookeeper in Kafka?

Q. Can we use Kafka without Zookeeper?
Historically, no: clients could not bypass Zookeeper and connect directly to a Kafka broker, because when Zookeeper was down the cluster could not fulfill client requests. Since Kafka 2.8, however, KRaft mode (KIP-500) replaces Zookeeper with a built-in Raft-based controller quorum, and Kafka 4.0 removes the Zookeeper dependency entirely.

11. Partition in Kafka:

Kafka topics are divided into partitions, each of which holds records in a fixed order. Every record in a partition is assigned a unique, sequential offset. A single topic can contain multiple partition logs.
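The partition-as-ordered-log idea can be sketched in a few lines (a deliberately minimal stand-in for a real partition log):

```python
class PartitionLog:
    # Minimal sketch of a partition: an append-only list where each
    # record's offset is simply its position in the log.
    def __init__(self):
        self.records = []

    def append(self, value: bytes) -> int:
        self.records.append(value)
        return len(self.records) - 1  # the new record's offset

log = PartitionLog()
o0 = log.append(b"first")   # offset 0
o1 = log.append(b"second")  # offset 1
```

Offsets are never reused and only grow, which is what lets a consumer resume exactly where it left off.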

12. Fault tolerance:

In Kafka, data is stored across multiple nodes in the cluster, and in any large cluster individual node failures are expected. Fault tolerance means the system stays protected and available even when nodes in the cluster fail.

13. Load balancing:

On the consumer side, load is balanced within a consumer group: a topic's partitions are divided among the group's consumers, and when a consumer joins or leaves the group, Kafka rebalances the partitions across the remaining members.

14. Replica:

In Apache Kafka, replication is a fundamental feature designed to ensure fault tolerance and data durability. Kafka achieves fault tolerance by replicating each partition across multiple brokers.

Let’s take an example:

Imagine you have a Kafka topic named “example_topic” with three partitions:

Partition 0, Partition 1, and Partition 2, and three brokers: Broker A, Broker B, and Broker C.

Let's set the replication factor to 2. This means that each partition will have two replicas distributed across different brokers.

Partition 0: Replica on Broker A, Replica on Broker B
Partition 1: Replica on Broker B, Replica on Broker C
Partition 2: Replica on Broker C, Replica on Broker A
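The replica layout above follows a simple round-robin placement, which can be sketched as follows (a simplification of Kafka's actual replica assignment):

```python
def place_replicas(num_partitions, brokers, replication_factor):
    # Round-robin placement, mirroring the example layout:
    # partition p's replicas go on brokers[p], brokers[p+1], ...
    layout = {}
    for p in range(num_partitions):
        layout[p] = [brokers[(p + r) % len(brokers)]
                     for r in range(replication_factor)]
    return layout

layout = place_replicas(3, ["A", "B", "C"], replication_factor=2)
# layout -> {0: ["A", "B"], 1: ["B", "C"], 2: ["C", "A"]}
```

Spreading replicas this way ensures no single broker holds both copies of any partition, so losing one broker never loses data.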

Fault Tolerance Scenario:

Now, let’s consider Broker A experiences a hardware failure and goes offline. Even though Broker A is down, the replicas on Broker B and Broker C are still available.

Partition 0: the replica on Broker B takes over as leader; the replica on Broker A is offline.
Partition 1: unaffected; its replicas live on Brokers B and C.
Partition 2: the replica on Broker C takes over as leader; the replica on Broker A is offline.

In this scenario, even with Broker A down, the data is still accessible because the surviving replicas on the other brokers can serve requests.
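The failover behavior can be sketched as follows. Real Kafka elects a new leader from the in-sync replica set; in this simplification, the first surviving replica simply takes over:

```python
def surviving_leaders(layout, failed_broker):
    # For each partition, pick the first replica that is still alive
    # as the new leader (a stand-in for Kafka's ISR-based election).
    leaders = {}
    for partition, replicas in layout.items():
        alive = [b for b in replicas if b != failed_broker]
        leaders[partition] = alive[0] if alive else None
    return leaders

layout = {0: ["A", "B"], 1: ["B", "C"], 2: ["C", "A"]}
leaders = surviving_leaders(layout, failed_broker="A")
# leaders -> {0: "B", 1: "B", 2: "C"}
```

With replication factor 2, every partition survives any single broker failure; losing two brokers at once could make a partition unavailable, which is why production clusters commonly use a replication factor of 3.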

Data Durability:
Kafka ensures data durability by replicating each message written to a partition across multiple brokers before acknowledging the write as successful (when the producer requests acks=all).

This way, even if one or more brokers go down, the data remains available and can be recovered from the replicas.

15. Cluster:

A Kafka cluster is a group of brokers that collaborate to manage data streams and ensure high availability.
For instance, if a topic has three partitions, each partition's data is replicated across multiple brokers, ensuring fault tolerance.
If one broker fails, data remains accessible through replicas on the other brokers, guaranteeing continuous operation.

16. Retention Policy: By default, Kafka retains messages for 7 days, after which they are deleted whether or not they have been consumed.

17. Message Size: Messages are limited to roughly 1 MB by default; larger messages require raising the limit on both the broker and the clients.
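Both limits map to broker settings in server.properties. This is a configuration sketch, not a complete broker config; verify the exact defaults against your Kafka version's documentation:

```properties
# Retain messages for 7 days (168 hours is the default)
log.retention.hours=168

# Maximum record batch size the broker accepts (~1 MB by default,
# including record overhead); raise carefully and in step with
# the consumer's fetch settings.
message.max.bytes=1048588
```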
