1. Java object → in-process memory → kernel pagecache
2. write directly to the log file → kernel pagecache
3. message set: send in batches, write in batches
4. avoid byte copying: the broker writes the file directly, and the OS transfers it with sendfile
5. Using the zero-copy optimization above, data is copied into pagecache exactly once and reused on each consumption instead of being stored in memory and copied out to user-space every time it is read. 
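A minimal sketch of the sendfile path from items 4 and 5, assuming a Unix-like OS. The function name `send_log_segment` is made up for illustration and is not Kafka code (the broker is written in Java/Scala). Python's `socket.sendfile()` uses `os.sendfile()` where the platform provides it and otherwise falls back to ordinary `send()` calls, so the zero-copy behaviour is platform dependent.

```python
import socket


def send_log_segment(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as segment:
        # Where os.sendfile() is available, the kernel moves the bytes from
        # pagecache to the socket, so they are never copied into user space.
        return conn.sendfile(segment)
```

The file contents are never read into a Python `bytes` object here, which is the point: the application only hands the kernel a file descriptor and a socket.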

Zero Copy


Batch Compression

Batch Send

Batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound.
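A rough sketch of the two bounds (message count and wait time). The class `Batcher` and its parameters are hypothetical; the real producer exposes similar knobs as `batch.size` (in bytes, not messages) and `linger.ms`. The clock is injected so the time bound can be exercised without sleeping.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffers messages and flushes on a count bound or a latency bound."""

    def __init__(
        self,
        max_messages: int,
        max_wait_s: float,
        send: Callable[[List[bytes]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_messages = max_messages
        self._max_wait_s = max_wait_s
        self._send = send
        self._clock = clock
        self._buffer: List[bytes] = []
        self._first_at: Optional[float] = None  # arrival time of oldest message

    def add(self, message: bytes) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(message)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or its oldest message has waited too long."""
        if not self._buffer:
            return
        full = len(self._buffer) >= self._max_messages
        expired = self._clock() - self._first_at >= self._max_wait_s
        if full or expired:
            self._send(self._buffer)
            self._buffer = []
            self._first_at = None
```

In this sketch the time bound is only checked when `add` or `maybe_flush` is called; a real client would also flush from a background thread or timer.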

The Kafka consumer works by issuing "fetch" requests to the brokers leading the partitions it wants to consume. 

Sent vs Consumed

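The consumer's position in each partition is a single integer, the offset of the next message to consume. This makes the state about what has been consumed very small: just one number for each partition.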

If a producer attempts to publish a message and experiences a network error it cannot be sure if this error happened before or after the message was committed. 


  1. At most once: the consumer reads the messages, saves its position in the log, and finally processes the messages.
  2. At least once: the consumer reads the messages, processes them, and finally saves its position.
  3. Exactly once: classically, a two-phase commit between storing the consumer position and storing the consumer's output.
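The difference between the first two orderings shows up only when the consumer crashes between the two steps. A toy simulation follows, with an in-memory list standing in for a partition; `Consumer`, `Crash`, and `crash_between_steps` are invented names, not Kafka client API.

```python
from typing import List


class Crash(Exception):
    """Simulated consumer failure."""


class Consumer:
    def __init__(self, log: List[str], mode: str) -> None:
        self.log = log
        self.mode = mode                 # "at-most-once" or "at-least-once"
        self.position = 0                # saved offset (survives a crash)
        self.processed: List[str] = []   # side effects seen downstream

    def poll(self, crash_between_steps: bool = False) -> None:
        batch = self.log[self.position:]
        next_position = self.position + len(batch)
        if self.mode == "at-most-once":
            self.position = next_position      # save position first
            if crash_between_steps:
                raise Crash()
            self.processed.extend(batch)       # then process
        else:
            self.processed.extend(batch)       # process first
            if crash_between_steps:
                raise Crash()
            self.position = next_position      # then save position
```

Calling `poll(crash_between_steps=True)` and then `poll()` again models a restart: in at-most-once mode the batch is never processed, and in at-least-once mode it is processed twice.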

Otherwise, Kafka guarantees at-least-once delivery by default, and allows the user to implement at-most-once delivery by disabling retries on the producer and committing offsets in the consumer prior to processing a batch of messages.


Under non-failure conditions, each partition in Kafka has a single leader and zero or more followers. 

In-sync node

The fundamental guarantee a log replication algorithm must provide is that if we tell the client a message is committed, and the leader fails, the new leader we elect must also have that message. 

Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught up to the leader. Only members of this set are eligible for election as leader.

A write to a Kafka partition is not considered committed until all in-sync replicas have received the write.
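One way to picture the commit rule: if each replica reports how far its log extends, the committed prefix is bounded by the slowest in-sync replica. The sketch below uses hypothetical function names and broker ids; Kafka calls the resulting offset the high watermark.

```python
from typing import Dict, Set


def high_watermark(log_end_offsets: Dict[str, int], isr: Set[str]) -> int:
    """Offsets below the returned value exist on every in-sync replica."""
    return min(log_end_offsets[replica] for replica in isr)


def eligible_leaders(isr: Set[str], failed_leader: str) -> Set[str]:
    """Only the remaining ISR members may be elected as the new leader."""
    return isr - {failed_leader}
```

Because every ISR member holds all offsets below the high watermark, any of them can take over as leader without losing a committed message. A replica that falls too far behind is dropped from the ISR, which is why a slow follower does not block commits indefinitely.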


acks=all: the write is acknowledged only after all in-sync replicas have received it

partition controller (a broker)
- Message
    - write to file system cache not disk
    - zero-copy
- Producer
    - messages from the same client to the same broker are handled in order
- Consumer
- Consumer Group
    - Each Partition is only consumed by one member
    - rebalance
    - group coordinator __consumer_offsets
- Broker
    - keeps an open file handle to every segment in every partition, even inactive segments
- Cluster
- Topic
- Partition
    - the number of consumers in a group should NOT exceed the number of partitions
- Key
    - determines which partition the message is written to; the same key always goes to the same partition
    - if the number of partitions changes, a key may map to a different partition
- Offset
- Partition leader
- Replication
    - produce and fetch request must be sent to the leader replica of a partition
- in-sync, ack
- message format
    - normal message
    - wrapper message
- index
    - maps offsets to segment files and positions within those files
- compaction
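For the Key entry in the list above, a small sketch of hash-based partitioning. It assumes `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2), and `partition_for` is a made-up name.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash the key, then take it modulo the count.

    crc32 is only a stand-in here, not the hash Kafka uses.
    """
    return zlib.crc32(key) % num_partitions
```

The result depends on both the key and `num_partitions`, so the same key always lands in the same partition only while the partition count stays fixed. After partitions are added, existing keys can map elsewhere.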

- reliable
    - replication
    - in-sync replica
    - ack
    - commit offset
    - successfully committed

When the consumer starts up, it finds the coordinator for its group and sends a request to join the group. The coordinator then begins a group rebalance so that the new member is assigned its fair share of the group’s partitions. Every rebalance results in a new generation of the group.

partition assignments
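A simplified round-robin sketch of what a rebalance hands out. Kafka ships several assignment strategies and this is not a faithful copy of any of them; `assign_partitions` and the member ids are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to group members one at a time, in sorted order."""
    if not members:
        return {}
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        # Each partition goes to exactly one member of the group.
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment
```

With more members than partitions, the extra members get an empty list. This is why running more consumers in a group than there are partitions adds no throughput.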