In Part 1 of this series, we explained why organisations adopt Apache Kafka and why traditional integration methods fail under complexity.
Now we move from why to how. This second article explores Kafka’s internal architecture – not as a list of components, but as a coherent design philosophy that enables the system to scale and withstand failure.
Kafka becomes intuitive only when you understand this architecture. Until then, it behaves like a black box.
- 1. Architecture as the Foundation of Kafka’s Behaviour
- 2. Topics: Streams of Immutable Events
- 3. Partitions: Kafka’s Scalability Mechanism
- 4. Offsets: How Consumers Control the Timeline
- 5. Producers: Controlled Routing Without Tight Coupling
- 6. Consumers and Consumer Groups: Parallelism Without Fragility
- 7. Brokers and Replication: A Cluster Without Single Points of Failure
- 8. ZooKeeper vs. KRaft: The Evolution of Coordination
- 9. Ordering and Delivery Semantics: Guarantees with Trade-Offs
- 10. Reliability Through Design, Not Luck
- Conclusion: Architecture Before Performance
1. Architecture as the Foundation of Kafka’s Behaviour
Apache Kafka is built on one premise: failure is normal.
Nodes fail, networks split, throughput spikes, and consumers fall behind. Instead of ignoring this reality, Kafka embraces it and makes resilience a primary design objective. This is what differentiates Kafka from traditional brokers and queues.
Successful Kafka adoption begins with accepting this mindset. Everything else – performance, scalability, and operational predictability – follows from it.
2. Topics: Streams of Immutable Events
A topic is Kafka’s fundamental abstraction, representing an ordered stream of events within a specific domain. Unlike queues or tables, a topic is append-only and immutable.
This means:
- consumed events are not removed when they are read,
- historical events remain available for replay for as long as the topic's retention policy keeps them,
- downstream systems can read at their own pace.
This immutability creates a clear separation between producers, consumers, and the business domain timeline. A well-designed topic structure often reflects domain-driven boundaries such as orders, payments, or shipments, rather than technical subsystems.
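As a rough illustration of the append-only idea, the sketch below models a topic as a single in-memory log. The names `Event`, `TopicLog`, `append`, and `read_from` are invented for this example and are not part of any Kafka client API; a real topic is partitioned, persisted, and replicated.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """An immutable record; frozen=True blocks attribute changes after creation."""
    key: str
    value: str


@dataclass
class TopicLog:
    """Toy single-partition topic (hypothetical): events are only ever appended."""
    _events: list = field(default_factory=list)

    def append(self, event: Event) -> int:
        """Append an event and return the position it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, position: int) -> list:
        """Return a copy of all events from `position` onward; nothing is removed."""
        return list(self._events[position:])


orders = TopicLog()
orders.append(Event("order-1", "created"))
orders.append(Event("order-1", "paid"))

# Two independent readers see the same history; reading does not consume it.
billing_view = orders.read_from(0)
audit_view = orders.read_from(0)
```

Because reading never mutates the log, adding a new downstream reader requires no change to the producer or to existing readers.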
3. Partitions: Kafka’s Scalability Mechanism
Scalability in Apache Kafka is achieved through partitioning.
Every topic consists of one or more partitions, each representing a separate ordered log.
Understanding partitions is essential because they determine:
- maximum throughput (more partitions mean more parallelism),
- ordering guarantees (ordering is preserved within a partition),
- consumer scaling (within a consumer group, each partition is assigned to exactly one consumer, so the partition count caps the group's parallelism).
Kafka deliberately avoids global ordering across a topic, as global ordering would limit throughput. By making ordering local to partitions, Kafka achieves horizontal scaling without central coordination.
Partition count is therefore not a minor configuration option; it is an architectural decision with long-term consequences.
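One reason the partition count has long-term consequences is that keyed events are typically routed by hashing the key modulo the number of partitions, so changing the count changes where keys land. The sketch below illustrates this under assumptions: `partition_for` and `count_moved_keys` are hypothetical helpers, and CRC32 stands in for the hash a real client would use.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def count_moved_keys(keys, old_count: int, new_count: int) -> int:
    """Count keys whose partition changes when the partition count changes."""
    return sum(
        1 for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    )


keys = [f"order-{i}" for i in range(1000)]
moved = count_moved_keys(keys, old_count=6, new_count=8)
print(f"{moved} of {len(keys)} keys map to a different partition")
```

Any key that moves will have its older events in one partition and its newer events in another, which breaks per-key ordering for consumers that read across the change.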
4. Offsets: How Consumers Control the Timeline
Instead of tracking which messages have been consumed, Kafka allows consumers to track their own position using offsets. An offset is simply a position in the partition’s log.
This design offers capabilities that queues cannot provide:
- the ability to replay events at any time,
- independent consumer groups reading the same topic for different workloads,
- auditability and reconstructable history,
- resilience against consumer failures (offsets can be reloaded or reset).
Offsets make Kafka a distributed log rather than a traditional broker. This is fundamental for CDC pipelines, event sourcing, and real-time analytics.
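The offset model can be sketched in a few lines. The example below is a simplified in-memory illustration, not Kafka client code: `PartitionLog` and `GroupCursor` are invented names, and each cursor stands for one consumer group's position in one partition.

```python
class PartitionLog:
    """Toy partition: an append-only list where the index is the offset."""

    def __init__(self):
        self.records = []

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


class GroupCursor:
    """Tracks one consumer group's position in one partition (hypothetical)."""

    def __init__(self, log: PartitionLog):
        self.log = log
        self.offset = 0  # next offset to read

    def poll(self, max_records: int = 10) -> list:
        """Return up to max_records from the current offset and advance past them."""
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the cursor, for example back to 0 to replay history."""
        self.offset = offset


log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

analytics = GroupCursor(log)
billing = GroupCursor(log)

first_pass = analytics.poll()    # analytics reads everything
billing_batch = billing.poll(1)  # billing is slower and unaffected by analytics
analytics.seek(0)                # rewind to replay from the beginning
replayed = analytics.poll()
```

The log itself holds no per-consumer state, which is why one group can lag, rewind, or fail without affecting another.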
5. Producers: Controlled Routing Without Tight Coupling
Producers publish data to Kafka. Their main architectural responsibility is partition selection. This is determined either by:
- a key, which ensures related events remain in the same partition, or
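- a distribution strategy for events without a key, such as round-robin or the sticky batching used by newer clients, which spreads load across partitions for maximum parallelism.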
This approach allows Kafka to maintain ordering where necessary while still achieving scale. Crucially, producers are completely unaware of consumers – a central reason Kafka decouples systems so effectively.
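The two routing modes can be illustrated with a small sketch. `choose_partition` is a hypothetical function written for this article, CRC32 is a stand-in for the client's real hash, and plain round-robin is used for keyless events as the simplest distribution strategy.

```python
import itertools
import zlib
from typing import Optional

_round_robin = itertools.count()  # shared counter for keyless events


def choose_partition(key: Optional[str], num_partitions: int) -> int:
    """Pick a partition: hash the key if present, otherwise rotate."""
    if key is not None:
        # CRC32 is a stand-in for the client's real hash function.
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return next(_round_robin) % num_partitions


partitions = {p: [] for p in range(3)}
events = [("order-1", "created"), ("order-2", "created"),
          ("order-1", "paid"), ("order-1", "shipped")]
for key, value in events:
    partitions[choose_partition(key, 3)].append((key, value))

# Keyless events simply rotate through the partitions.
keyless = [choose_partition(None, 3) for _ in range(6)]
```

All events for `order-1` land in the same partition in the order they were sent, while the function never needs to know who will read them.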
6. Consumers and Consumer Groups: Parallelism Without Fragility
Kafka consumers process events from partitions. A consumer group is a coordinated set of consumers that share the workload of a topic.
The rule is simple: one partition is processed by one consumer within a group.
This enables horizontal scaling while ensuring that no two consumers in the same group work on the same partition at once, which preserves per-partition ordering.
Different consumer groups can read the same data independently, enabling architectures where transactional processing, analytics, and monitoring pipelines coexist without interfering with each other.
This fan-out capability is one of Apache Kafka’s most powerful features.
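The one-partition-one-consumer rule can be sketched as a simple assignment function. This is a simplified round-robin illustration with invented names (`assign_partitions`, `c1`, `c2`, and so on); Kafka's actual assignors are configurable and more sophisticated.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions across consumers so each partition has one owner."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment = {c: [] for c in consumers}
    members = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

two = assign_partitions(partitions, ["c1", "c2"])
# {'c1': [0, 2], 'c2': [1, 3]}

six = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"])
# c5 and c6 receive no partitions: extra consumers add no parallelism

after_failure = assign_partitions(partitions, ["c2"])
# {'c2': [0, 1, 2, 3]} once c1 has left the group
```

Running the function again with a different member list is a crude stand-in for a rebalance: the workload is redistributed, but each partition still has exactly one owner within the group.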
7. Brokers and Replication: A Cluster Without Single Points of Failure
A Kafka cluster consists of multiple brokers, each storing a subset of partitions. Partition replicas are distributed across brokers to ensure durability.
Every partition has:
- a leader, which handles all writes and, by default, all reads,
- one or more followers, which replicate data.
If a leader broker fails, Kafka automatically elects a follower as the new leader. This makes failure handling predictable, mechanical, and fast, without manual intervention.
Kafka’s brokers are intentionally symmetrical: any broker can lead some partitions and follow others, and no master node sits in the data path. This symmetry eliminates many operational risks found in traditional systems.
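Leader failover can be sketched as follows. This is a deliberately simplified model with hypothetical names (`PartitionState`, `handle_broker_failure`): it assumes the new leader is the first replica in assignment order that is still in sync, and it ignores how the cluster detects the failure or agrees on the result.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionState:
    """Toy replication state for one partition."""
    leader: int                                # broker id of the current leader
    replicas: list                             # all broker ids holding a copy
    in_sync: set = field(default_factory=set)  # replicas caught up with the leader


def handle_broker_failure(state: PartitionState, failed_broker: int) -> PartitionState:
    """Drop the failed broker from the in-sync set and elect a new leader if needed."""
    state.in_sync.discard(failed_broker)
    if state.leader == failed_broker:
        # Prefer the first replica in assignment order that is still in sync.
        candidates = [b for b in state.replicas if b in state.in_sync]
        if not candidates:
            raise RuntimeError("no in-sync replica available; partition is offline")
        state.leader = candidates[0]
    return state


orders_p0 = PartitionState(leader=1, replicas=[1, 2, 3], in_sync={1, 2, 3})
handle_broker_failure(orders_p0, failed_broker=1)  # broker 2 takes over
```

Restricting candidates to in-sync replicas is what keeps failover from losing acknowledged data in this model: a replica that has fallen behind is never promoted.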
8. ZooKeeper vs. KRaft: The Evolution of Coordination
Historically, Kafka relied on ZooKeeper to manage metadata and elections. While effective, ZooKeeper was a separate distributed system that had to be deployed, secured, and monitored alongside Kafka.
Kafka’s modern architecture replaces ZooKeeper with KRaft, a Raft-based consensus layer built into Kafka itself. KRaft simplifies deployment, reduces dependencies, and provides clearer state management. For new deployments, KRaft is the recommended default, and from Kafka 4.0 onward ZooKeeper mode is no longer supported.
9. Ordering and Delivery Semantics: Guarantees with Trade-Offs
Apache Kafka supports multiple delivery semantics – at-most-once, at-least-once, and exactly-once – but these guarantees are not automatic. They result from configuration choices, keying strategy, and consumer behaviour.
Ordering is guaranteed only within a partition. Kafka does not provide global ordering, as this would severely limit scalability.
Successful Kafka architectures are based on understanding these constraints, not resisting them.
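The difference between at-most-once and at-least-once often comes down to whether a consumer commits its offset before or after processing. The sketch below simulates a crash between the two steps; `run_consumer` and its parameters are invented for illustration. Exactly-once is not modelled here, since it additionally relies on idempotent or transactional writes.

```python
def run_consumer(records, committed, processed, commit_first, crash_at=None):
    """Consume from the committed offset; return the new committed offset.

    commit_first=True  -> commit, then process (at-most-once)
    commit_first=False -> process, then commit (at-least-once)
    crash_at is the offset at which a simulated crash interrupts the step
    between the first and second action.
    """
    offset = committed
    while offset < len(records):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return committed  # crashed before processing
            processed.append(records[offset])
        else:
            processed.append(records[offset])
            if offset == crash_at:
                return committed  # crashed before committing
            committed = offset + 1
        offset += 1
    return committed


records = ["a", "b", "c"]

# At-most-once: "b" is lost, because its offset was committed before the crash.
amo = []
pos = run_consumer(records, 0, amo, commit_first=True, crash_at=1)
run_consumer(records, pos, amo, commit_first=True)

# At-least-once: "b" is processed twice, because the commit never happened.
alo = []
pos = run_consumer(records, 0, alo, commit_first=False, crash_at=1)
run_consumer(records, pos, alo, commit_first=False)
```

With at-least-once, the usual remedy is to make processing idempotent so that the repeated record has no additional effect.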
10. Reliability Through Design, Not Luck
Kafka’s fault tolerance relies on replication, leadership failover, and deterministic rebalancing. When a broker fails, leadership of its partitions moves to in-sync replicas on other brokers, clients refresh their metadata, and the system continues to operate.
Failure is routine, not catastrophic. This philosophy enables Kafka to support mission-critical data flows at large scale.
Conclusion: Architecture Before Performance
Apache Kafka’s architecture underpins its performance profile, scalability, resilience, and operational complexity.
Topics, partitions, offsets, replication, and consumer groups are not mere implementation details; they are intentional design choices that enable Kafka to serve as the backbone of modern data platforms.
In Part 3, we will address the next logical step: why Kafka is so fast, and how log-structured storage, page cache, zero-copy transfers, and retention policies combine to deliver exceptional throughput.
Baremon helps organisations design and operate Apache Kafka architectures that remain robust, predictable, and maintainable as they scale.