In our previous articles, we explored why Apache Kafka exists and how its architecture is structured. In this third part, we address the question every engineer asks when they first see the benchmarks: How can a disk-based system consistently outperform in-memory message brokers?
Kafka’s ability to process millions of messages per second at low, typically single-digit-millisecond latency is the result of mechanical sympathy – designing software that aligns closely with the physical realities of modern hardware and operating systems. Kafka does not fight the machine; it leverages it.
- 1. Log-Structured Storage: Sequential Writes vs Random I/O
- 2. Page Cache: The OS as a High-Throughput Buffer
- 3. Zero-Copy: Bypassing the CPU
- 4. Reliability vs Speed: The Acks and Durability Trade-off
- 5. ISR, High Watermark, and Replica Throttling
- 6. Batching, Compression, and Write Amplification
- 7. Partitioning: Scaling Through Parallelism
- 8. Retention: Why It Remains Fast at Scale
- 9. When Kafka Is NOT Fast (The Battle-Tested Reality)
- Conclusion: The Philosophy of Speed
1. Log-Structured Storage: Sequential Writes vs Random I/O
The belief that “disk is slow” applies only to random access. Although not as fast as sequential RAM access, sequential disk I/O delivers throughput orders of magnitude higher than random access, often saturating the physical bus. Kafka employs a log-structured storage model with strictly append-only writes.
- No Disk Seeks: The disk head does not need to jump; it writes linearly.
- Minimal Fragmentation: Partitions grow as contiguous files, avoiding the overhead of complex B-Tree or LSM-Tree index updates.
- Flush Policy Nuances: By default, Kafka does not call fsync after every write. It relies on the operating system to flush dirty pages to disk asynchronously. This shifts the durability guarantee from a single disk sync to distributed replication.
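To make the append-only idea concrete, here is a minimal sketch of a log that only ever writes at the end of a file. It is a toy, not Kafka's on-disk segment format: the `AppendOnlyLog` class, the 4-byte length prefix, and the in-memory position list are all assumptions made for illustration.

```python
import os
import struct

# Hypothetical framing: each record is a 4-byte big-endian length, then the payload.
HEADER = struct.Struct(">I")


class AppendOnlyLog:
    """Toy single-file log: records are appended, never updated in place."""

    def __init__(self, path):
        self._path = path
        self._file = open(path, "ab")        # append mode: writes always land at the end
        self._end = os.path.getsize(path)    # next byte position (0 for a fresh file)
        self._positions = []                 # logical offset -> byte position

    def append(self, payload):
        """Append one record and return its logical offset."""
        self._file.write(HEADER.pack(len(payload)) + payload)
        self._positions.append(self._end)
        self._end += HEADER.size + len(payload)
        return len(self._positions) - 1

    def read(self, offset):
        """Read one record back by logical offset."""
        self._file.flush()  # hand buffered bytes to the OS; this is not an fsync
        with open(self._path, "rb") as reader:
            reader.seek(self._positions[offset])
            (length,) = HEADER.unpack(reader.read(HEADER.size))
            return reader.read(length)

    def sync(self):
        """Force data to the device; the text notes Kafka does not do this per write by default."""
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self):
        self._file.close()
```

Note that `read` only flushes to the operating system. Forcing data to the device is a separate, explicit `sync` call, which mirrors the flush-policy point above.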
Expert Insight: In cloud environments, be aware of disk variability. EBS (AWS) and Managed Disks (Azure) have IOPS and throughput limits. A burstable disk may provide full performance for ten minutes and then hit a hard throttle, causing a sudden spike in Kafka producer latency.
2. Page Cache: The OS as a High-Throughput Buffer
Kafka bypasses heavyweight application-level caching and relies on the Linux Page Cache.
- Hot Data in RAM: Real-time reads are served directly from kernel memory.
- Minimal GC Pressure: By keeping message data off the JVM heap, Kafka avoids the long “stop-the-world” Garbage Collection pauses that large heaps cause in in-memory brokers.
Expert Recommendation: Do not over-allocate the JVM heap. For a 64 GB server, 6–10 GB for the Kafka process is usually the “sweet spot”. The remaining RAM should be left to the OS Page Cache to maximise I/O efficiency.
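A rough back-of-the-envelope calculation shows why the leftover RAM matters. The sketch below estimates how many seconds of recent writes fit in the Page Cache. The function name, the `other_gb` reserve for the OS and other processes, and the example ingest rate are assumptions for illustration; real cache behaviour also depends on read patterns and other files on the host.

```python
def page_cache_window_seconds(ram_gb, heap_gb, other_gb, ingest_mb_per_s):
    """Approximate how many seconds of recent writes the page cache can hold.

    Simplification: everything that is not JVM heap or reserved for other
    processes is assumed to be available for caching log data.
    """
    cache_mb = (ram_gb - heap_gb - other_gb) * 1024
    if cache_mb <= 0:
        raise ValueError("no memory left for the page cache")
    return cache_mb / ingest_mb_per_s


# Example: 64 GB server, 8 GB heap, 2 GB reserved (assumed), 100 MB/s of writes (assumed).
window = page_cache_window_seconds(64, 8, 2, 100)
```

Under these assumptions, a consumer lagging by less than roughly nine minutes would still be served from memory. A larger heap shrinks that window directly.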
3. Zero-Copy: Bypassing the CPU
Standard brokers transfer data from disk to network through four copies and multiple context switches. Kafka uses Zero-Copy (via the sendfile() system call), moving data from the Page Cache directly to the Network Interface Card (NIC) buffer.
This bypasses the application (user space) entirely. The CPU does very little work during the transfer, which helps the broker saturate a 25 Gbps link. However, note that Exactly-Once Semantics (EOS) and encryption (SSL/TLS) introduce overhead here, as data must be processed or encrypted in user space, partially negating the Zero-Copy benefit.
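The same system call is reachable from most languages. As an illustration only (Kafka itself is JVM code, and this is not its implementation), the Python sketch below sends a file over a connected socket with `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and otherwise falls back to ordinary `send()` calls. The helper name `send_file_zero_copy` is hypothetical.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send a whole file over a connected, blocking stream socket.

    Returns the number of bytes sent. Where os.sendfile() is available the
    kernel moves the data itself and the payload never enters this process.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)
```

The application code never reads the file contents into its own buffers. That is the property the broker relies on when serving consumers from the Page Cache.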
4. Reliability vs Speed: The Acks and Durability Trade-off
Performance in Kafka is directly related to your durability SLA, which is controlled via acks and min.insync.replicas.
| acks setting | Behaviour | Performance Impact |
| --- | --- | --- |
| 0 | Producer doesn’t wait for acknowledgement. | Maximum throughput, highest risk. |
| 1 | Leader writes locally and acknowledges. | Balanced. Fast, but vulnerable to leader failure. |
| all (-1) | Leader waits for all in-sync replicas (ISR) to acknowledge. | Highest durability, lowest throughput. |
The Latency Factor: In multi-AZ (Availability Zone) clusters, acks=all requires the producer to wait for a network round-trip between data centres. Even with high-speed fibre, this physical distance imposes a latency floor that no software tuning can eliminate.
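The interaction between acks and min.insync.replicas can be summarised in a few lines. The sketch below is a simplified model, not broker code: the function `acks_required` and the exception class are illustrative names. It returns how many replica acknowledgements a produce request waits for, and models the rejection that occurs with acks=all when the ISR has shrunk below min.insync.replicas.

```python
class NotEnoughReplicas(Exception):
    """Models the broker rejecting a write because the ISR is too small."""


def acks_required(acks, isr_size, min_insync_replicas):
    """Number of replica acknowledgements a produce request waits for."""
    if acks == "0":
        return 0                      # fire and forget
    if acks == "1":
        return 1                      # leader only
    if acks in ("all", "-1"):
        # min.insync.replicas is only enforced for acks=all.
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR has {isr_size} replicas, need {min_insync_replicas}"
            )
        return isr_size               # every replica currently in the ISR
    raise ValueError(f"unknown acks value: {acks!r}")
```

With a replication factor of 3 and min.insync.replicas=2, one broker can be down and acks=all writes still succeed. With two brokers down, they are rejected rather than silently under-replicated.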
5. ISR, High Watermark, and Replica Throttling
To ensure consistency, Kafka uses the In-Sync Replicas (ISR) mechanism.
- ISR: The subset of replicas currently caught up with the leader.
- High Watermark: The offset up to which data is safely replicated. Consumers can only read up to this point.
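The relationship between the two concepts fits in one line: the High Watermark is the smallest log end offset among the replicas in the ISR. The sketch below assumes a dictionary of per-replica log end offsets. The broker IDs and offsets are made-up example values.

```python
def high_watermark(log_end_offsets, isr):
    """Smallest log end offset among in-sync replicas.

    log_end_offsets: dict of replica id -> next offset that replica will write.
    isr: set of replica ids currently in sync (always includes the leader).
    Consumers may read offsets strictly below the returned value.
    """
    return min(log_end_offsets[replica] for replica in isr)


# Example values (assumed): broker 3 has fallen behind and left the ISR.
offsets = {1: 120, 2: 115, 3: 90}
hw = high_watermark(offsets, isr={1, 2})
```

Here the leader (broker 1) already holds offsets up to 119, but consumers can only read up to offset 114 until broker 2 catches up.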
War Story: Most production Kafka incidents are caused not by disk failure, but by metadata overload or replication imbalance. When a broker returns after a failure, it must resynchronise. Unless replication quotas are set via follower.replication.throttled.rate and leader.replication.throttled.rate, the resync process can consume all available bandwidth, effectively causing a denial of service for your client traffic.
6. Batching, Compression, and Write Amplification
Small I/O operations are problematic, so Kafka groups messages into batches to amortise their cost. Batching does not remove the cost of replication, however, which leads to significant write amplification.
The Multiplier Effect: In a typical cluster with a replication factor (RF) of 3, every 1 MB of data sent by a producer results in a total of 3 MB of network traffic and 3 MB of disk I/O across the brokers. Notably, inter-broker (replication) traffic is then twice the volume of producer-to-broker traffic.
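The arithmetic generalises to any replication factor. The helper below is a simple sketch (the function name and dictionary keys are illustrative). It counts only the write path and ignores consumer reads and compression.

```python
def replication_cost(produced_mb, replication_factor):
    """Write-path traffic caused by producing `produced_mb` of data."""
    followers = replication_factor - 1
    return {
        "producer_to_leader_mb": produced_mb,
        "inter_broker_mb": produced_mb * followers,        # leader -> followers
        "total_network_mb": produced_mb * replication_factor,
        "total_disk_write_mb": produced_mb * replication_factor,
    }


cost = replication_cost(1, 3)
# cost["inter_broker_mb"] is 2: twice the producer-to-leader volume.
```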
- Idempotence Overhead: Setting enable.idempotence=true adds sequence numbers to every batch. While essential for exactly-once semantics (EOS), it slightly increases header size.
- Compression Efficiency: By setting linger.ms=5, Kafka can compress a 16 KB batch instead of 1 KB messages. Using Zstd or LZ4 on larger batches can reduce inter-broker network costs by 60–80%.
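Why batches compress better than single messages is easy to demonstrate. The sketch below uses `zlib` from the Python standard library purely as a stand-in for Zstd or LZ4, and the JSON payload is an invented example. The absolute ratios will differ with real codecs and real data, but the direction of the effect is the same.

```python
import json
import zlib


def make_messages(count):
    """Build `count` small, similar JSON events (invented example payload)."""
    return [
        json.dumps(
            {"user_id": i, "event": "page_view", "path": "/products/42", "region": "eu-west-1"}
        ).encode()
        for i in range(count)
    ]


def compressed_sizes(messages):
    """Return (bytes when each message is compressed alone, bytes when compressed as one batch)."""
    individually = sum(len(zlib.compress(m)) for m in messages)
    as_batch = len(zlib.compress(b"".join(messages)))
    return individually, as_batch


individually, as_batch = compressed_sizes(make_messages(16))
```

Compressed alone, each message pays the codec's fixed overhead and shares no dictionary with its neighbours. Compressed together, the repeated field names and values are encoded once.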
7. Partitioning: Scaling Through Parallelism
This batching efficiency enables partitions to scale. As each partition is an independent append-only log, Kafka can distribute writes across hundreds of disks. However, partitions are not “free”.
- The Controller Storm: A single broker, the Controller, manages partition leadership. If you have 50,000 partitions on a 5-broker cluster, a single broker failure triggers a “metadata storm”, stalling client requests while the Controller updates thousands of ZooKeeper/KRaft entries.
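The scale of such a storm can be estimated with simple arithmetic. The sketch below assumes partition leaders and replicas are spread evenly across brokers, which real clusters only approximate. The function name and the replication factor of 3 are assumptions for illustration.

```python
def failure_blast_radius(total_partitions, brokers, replication_factor):
    """Estimate the work triggered by losing one broker, assuming an even spread."""
    return {
        # Partitions led by the failed broker, each needing a new leader.
        "leader_elections": total_partitions // brokers,
        # Replicas hosted on the failed broker, each needing to resync on return.
        "replicas_to_resync": total_partitions * replication_factor // brokers,
    }


impact = failure_blast_radius(50_000, 5, 3)
# impact["leader_elections"] is 10000 under the even-spread assumption.
```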
8. Retention: Why It Remains Fast at Scale
Unlike traditional queues that slow down as they fill up, Kafka maintains consistent performance regardless of data size.
Kafka uses log segmentation, storing data in segment files (typically 1 GB). When retention limits are reached, Kafka deletes entire files from the file system. There is no per-message deletion or “vacuuming” process. This ensures stable O(1) write and read performance whether you are storing 1 GB or 1 PB.
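Segment-level retention can be sketched as a filter over a list of files. This is a simplified model under stated assumptions: the `Segment` fields are illustrative, and `newest_record_ms` is used as a simple proxy for the age of a segment's data. The point it shows is that deletion works on whole closed segments and never touches the active one.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int        # offset of the first record in the file
    newest_record_ms: int   # timestamp of the newest record (illustrative field)
    size_bytes: int


def expired_segments(segments, now_ms, retention_ms):
    """Return closed segments whose newest data is older than the retention period.

    The segment with the highest base offset is treated as active and is
    never returned, even if it is old.
    """
    ordered = sorted(segments, key=lambda s: s.base_offset)
    closed = ordered[:-1]
    return [s for s in closed if now_ms - s.newest_record_ms > retention_ms]
```

The cost of this pass depends on the number of segment files, not on the number of messages, and each deletion is a single file removal.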
9. When Kafka Is NOT Fast (The Battle-Tested Reality)
- Metadata Overload: High partition counts cause election delays and increased memory pressure on the Controller.
- Cold Data Reads: Requests for data that is no longer in the Page Cache require physical disk reads, disrupting Zero-Copy efficiency.
- Small Message Jitter: Sending 1-byte messages with linger.ms=0 bottlenecks the CPU on system calls long before the disk or network is saturated.
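The small-message problem is visible in a simple count of requests. The sketch below compares the worst case, where every message is sent alone, with messages packed into 16 KB batches. It deliberately ignores per-record and per-batch header overhead, and the function name and message counts are assumptions for illustration.

```python
import math


def produce_requests(message_count, message_bytes, batch_bytes):
    """Number of batches needed if each batch holds at most `batch_bytes` of payload."""
    per_batch = max(1, batch_bytes // message_bytes)
    return math.ceil(message_count / per_batch)


unbatched = produce_requests(1_000_000, 1, 1)        # one message per request
batched = produce_requests(1_000_000, 1, 16 * 1024)  # 16 KB batches
```

Under these assumptions, a million 1-byte messages shrink from 1,000,000 requests to 62, which is why the per-request system call cost dominates when nothing is batched.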
Conclusion: The Philosophy of Speed
Kafka is not fast because it has “better code” than traditional brokers. Kafka is fast because it deliberately omits features that traditional queues consider essential. It discards individual message acknowledgements, ignores random-access querying, and shifts the complexity of state management entirely to the clients.
From a strategic perspective, choosing Kafka means trading flexibility for raw, mechanical efficiency. It is designed to be a high-performance backbone, not a general-purpose database.
In Part 4, we move up the stack to Kafka Connect and Kafka Streams, exploring how to build robust data pipelines and stateful computations without writing custom integration code.