# BufReader: Zero-Copy Network Reading with Non-Contiguous Memory Buffers
## Table of Contents
- [1. Problem: Traditional Contiguous Memory Buffer Bottlenecks](#1-problem-traditional-contiguous-memory-buffer-bottlenecks)
- [2. Core Solution: Non-Contiguous Memory Buffer Passing Mechanism](#2-core-solution-non-contiguous-memory-buffer-passing-mechanism)
- [3. Performance Validation](#3-performance-validation)
- [4. Usage Guide](#4-usage-guide)
- [5. Summary](#5-summary)
## TL;DR (Key Takeaways)
**Core Innovation**: Non-Contiguous Memory Buffer Passing Mechanism
- Data stored as **sliced memory blocks**, non-contiguous layout
- Pass references via **ReadRange callback**, zero-copy
- Memory blocks **reused from object pool**, avoiding allocation and GC
**Performance Data** (Streaming server, 100 concurrent streams):
```
bufio.Reader: 79 GB allocated, 134 GCs, 374.6 ns/op
BufReader: 0.6 GB allocated, 2 GCs, 30.29 ns/op
Result: 98.5% GC reduction, 11.6x throughput improvement
```
**Ideal For**: High-concurrency network servers, streaming media, long-running services
---
## 1. Problem: Traditional Contiguous Memory Buffer Bottlenecks
### 1.1 bufio.Reader's Contiguous Memory Model
The standard library `bufio.Reader` uses a **fixed-size contiguous memory buffer**:
```go
type Reader struct {
	buf  []byte // Single contiguous buffer (e.g., 4KB)
	r, w int    // Read/write pointers
}

func (b *Reader) Read(p []byte) (n int, err error) {
	// Copy from contiguous buffer to target
	n = copy(p, b.buf[b.r:b.w]) // Must copy
	return
}
```
**Cost of Contiguous Memory**:
```
Reading 16KB data (with 4KB buffer):
Network → bufio buffer → User buffer
↓ (4KB contiguous) ↓
1st [████] → Copy to result[0:4KB]
2nd [████] → Copy to result[4KB:8KB]
3rd [████] → Copy to result[8KB:12KB]
4th [████] → Copy to result[12KB:16KB]
Total: 4 network reads + 4 memory copies
Allocates result (16KB contiguous memory)
```
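To make the copy chain concrete, here is a minimal, self-contained sketch of the pattern above, assuming a 4KB `bufio.Reader` and 2KB application reads (a `bytes.Reader` stands in for the network connection):

```go
package main

import (
	"bufio"
	"bytes"
	"io"
)

func main() {
	src := bytes.NewReader(make([]byte, 16<<10)) // stand-in for a network conn
	r := bufio.NewReaderSize(src, 4<<10)         // 4KB contiguous buffer

	result := make([]byte, 0, 16<<10) // the 16KB contiguous target allocation
	chunk := make([]byte, 2<<10)
	for {
		n, err := r.Read(chunk) // copy #1: bufio buffer -> chunk
		if n > 0 {
			result = append(result, chunk[:n]...) // copy #2: chunk -> result
		}
		if err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
	}
	// Every byte crossed at least two copies on its way into result.
}
```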
### 1.2 Issues in High-Concurrency Scenarios
In streaming servers (100 concurrent connections, 30fps each):
```go
// Typical processing pattern
func handleStream(conn net.Conn) {
	reader := bufio.NewReaderSize(conn, 4096)
	for {
		// Allocate contiguous buffer for each packet
		packet := make([]byte, 1024) // Allocation 1
		n, _ := reader.Read(packet)  // Copy 1
		// Forward to multiple subscribers
		for _, sub := range subscribers {
			data := make([]byte, n) // Allocations 2-N
			copy(data, packet[:n])  // Copies 2-N
			sub.Write(data)
		}
	}
}

// Performance impact:
// 100 connections × 30fps × (1 + subscribers) allocations = massive temporary memory
// Triggers frequent GC, system instability
```
**Core Problems**:
1. Must maintain contiguous memory layout → Frequent copying
2. Allocate new buffer for each packet → Massive temporary objects
3. Forwarding requires multiple copies → CPU wasted on memory operations
## 2. Core Solution: Non-Contiguous Memory Buffer Passing Mechanism
### 2.1 Design Philosophy
BufReader uses **non-contiguous memory block slices**:
```
No longer require data in contiguous memory:
1. Data scattered across multiple memory blocks (slice)
2. Each block independently managed and reused
3. Pass by reference, no data copying
```
**Core Data Structures**:
```go
type BufReader struct {
	Allocator *ScalableMemoryAllocator // Object pool allocator
	buf       MemoryReader             // Memory block slice
}

type MemoryReader struct {
	Buffers [][]byte // Multiple memory blocks, non-contiguous!
	Size    int      // Total size
	Length  int      // Readable length
}
```
### 2.2 Non-Contiguous Memory Buffer Model
#### Contiguous vs Non-Contiguous Comparison
```
bufio.Reader (Contiguous Memory):
┌─────────────────────────────────┐
│ 4KB Fixed Buffer │
│ [Read][Available] │
└─────────────────────────────────┘
- Must copy to contiguous target buffer
- Fixed size limitation
- Read portion wastes space
BufReader (Non-Contiguous Memory):
┌──────┐ ┌──────┐ ┌────────┐ ┌──────┐
│Block1│→│Block2│→│ Block3 │→│Block4│
│ 512B │ │ 1KB │ │ 2KB │ │ 3KB │
└──────┘ └──────┘ └────────┘ └──────┘
- Directly pass reference to each block (zero-copy)
- Flexible block sizes
- Recycle immediately after processing
```
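This layout is not unique to BufReader; Go's own standard library embraces it for output. `net.Buffers` is a `[][]byte` that can be written to a connection without first merging the blocks into one contiguous slice (it uses vectored I/O, `writev`, where the platform supports it). A minimal sketch:

```go
package main

import "net"

// sendBlocks writes non-contiguous blocks to conn in one call, without
// copying them into a single contiguous buffer first.
func sendBlocks(conn net.Conn) error {
	blocks := net.Buffers{
		make([]byte, 512),  // Block1
		make([]byte, 1024), // Block2
		make([]byte, 2048), // Block3
	}
	_, err := blocks.WriteTo(conn) // vectored write, no reassembly copy
	return err
}
```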
#### Memory Block Chain Workflow
```mermaid
sequenceDiagram
    participant N as Network
    participant P as Object Pool
    participant B as BufReader.buf
    participant U as User Code
    N->>P: 1st read (returns 512B)
    P-->>B: Block1 (512B) - from pool or new
    B->>B: Buffers = [Block1]
    N->>P: 2nd read (returns 1KB)
    P-->>B: Block2 (1KB) - reused from pool
    B->>B: Buffers = [Block1, Block2]
    N->>P: 3rd read (returns 2KB)
    P-->>B: Block3 (2KB)
    B->>B: Buffers = [Block1, Block2, Block3]
    N->>P: 4th read (returns 1KB)
    P-->>B: Block4 (1KB)
    B->>B: Buffers = [Block1, Block2, Block3, Block4]
    U->>B: ReadRange(4096)
    B->>U: yield(Block1) - pass reference
    B->>U: yield(Block2) - pass reference
    B->>U: yield(Block3) - pass reference
    B->>U: yield(Block4[0:512]) - partial reference
    U->>B: Processing complete
    B->>P: Recycle fully consumed blocks (Block1-Block3)
    Note over P: Memory blocks return to pool for reuse
```
### 2.3 Zero-Copy Passing: ReadRange API
**Core API**:
```go
func (r *BufReader) ReadRange(n int, yield func([]byte)) error
```
**How It Works**:
```go
// Internal implementation (simplified)
func (r *BufReader) ReadRange(n int, yield func([]byte)) error {
	remaining := n
	// Iterate through the memory block slice
	for _, block := range r.buf.Buffers {
		if remaining <= 0 {
			break
		}
		if len(block) <= remaining {
			// Pass the entire block
			yield(block) // Zero-copy: pass reference directly!
			remaining -= len(block)
		} else {
			// Pass a portion
			yield(block[:remaining])
			remaining = 0
		}
	}
	// Recycle processed blocks
	r.recycleFront()
	return nil
}
```
**Usage Example**:
```go
// Read 4096 bytes of data
reader.ReadRange(4096, func(chunk []byte) {
	// chunk is a reference to an original memory block.
	// The callback may run multiple times with different sized blocks,
	// e.g. 512B, 1KB, 2KB, 512B
	processData(chunk) // Process directly, zero-copy!
})

// Characteristics:
// - No need to allocate a target buffer
// - No need to copy data
// - Each chunk is automatically recycled after processing
```
### 2.4 Advantages in Real Network Scenarios
**Scenario: Read 10KB from network, each read returns 500B-2KB**
```
bufio.Reader (Contiguous Memory):
1. Read 2KB to internal buffer (contiguous)
2. Copy 2KB to user buffer ← Copy
3. Read 1.5KB to internal buffer
4. Copy 1.5KB to user buffer ← Copy
5. Read 2KB...
6. Copy 2KB... ← Copy
... Repeat ...
Total: Multiple network reads + Multiple memory copies
Must allocate 10KB contiguous buffer
BufReader (Non-Contiguous Memory):
1. Read 2KB → Block1, append to slice
2. Read 1.5KB → Block2, append to slice
3. Read 2KB → Block3, append to slice
4. Read 2KB → Block4, append to slice
5. Read 2.5KB → Block5, append to slice
6. ReadRange(10KB):
→ yield(Block1) - 2KB
→ yield(Block2) - 1.5KB
→ yield(Block3) - 2KB
→ yield(Block4) - 2KB
→ yield(Block5) - 2.5KB
Total: Multiple network reads + 0 memory copies
No contiguous memory needed, process block by block
```
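One way to see the benefit: any consumer that works incrementally (hashes, parsers, encoders) never needs the 10KB as a single slice. A hedged sketch using the standard library's incremental CRC32:

```go
import "hash/crc32"

// checksumStream consumes 10KB block by block; hash.Hash32 accepts
// incremental writes, so the data never needs to be contiguous.
func checksumStream(reader *BufReader) (uint32, error) {
	sum := crc32.NewIEEE()
	err := reader.ReadRange(10*1024, func(chunk []byte) {
		sum.Write(chunk) // consume each non-contiguous block as it arrives
	})
	return sum.Sum32(), err
}
```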
### 2.5 Real Application: Stream Forwarding
**Problem Scenario**: 100 concurrent streams, each forwarded to 10 subscribers
**Traditional Approach** (Contiguous Memory):
```go
func forwardStream_Traditional(reader *bufio.Reader, subscribers []net.Conn) {
	packet := make([]byte, 4096) // Alloc 1: contiguous memory
	n, _ := reader.Read(packet)  // Copy 1: from bufio buffer
	// Copy for each subscriber
	for _, sub := range subscribers {
		data := make([]byte, n) // Allocs 2-11: 10 times
		copy(data, packet[:n])  // Copies 2-11: 10 times
		sub.Write(data)
	}
}

// Per packet: 11 allocations + 11 copies
// 100 concurrent × 30fps × 11 = 33,000 allocations/sec
```
**BufReader Approach** (Non-Contiguous Memory):
```go
func forwardStream_BufReader(reader *BufReader, subscribers []net.Conn) {
	reader.ReadRange(4096, func(chunk []byte) {
		// chunk references the original memory block and may be non-contiguous.
		// All subscribers share the same memory block!
		for _, sub := range subscribers {
			sub.Write(chunk) // Send the reference directly, zero-copy
		}
	})
}

// Per packet: 0 allocations + 0 copies
// 100 concurrent × 30fps × 0 = 0 allocations/sec
```
**Performance Comparison**:
- Allocations: 33,000/sec → 0/sec
- Memory copies: 33,000/sec → 0/sec
- GC pressure: High → Very low
### 2.6 Memory Block Lifecycle
```mermaid
stateDiagram-v2
    state "Get from Pool" as GetFromPool
    state "Read Network Data" as ReadNetworkData
    state "Append to Slice" as AppendToSlice
    state "Pass to User" as PassToUser
    state "User Processing" as UserProcessing
    state "Recycle to Pool" as RecycleToPool
    [*] --> GetFromPool
    GetFromPool --> ReadNetworkData
    ReadNetworkData --> AppendToSlice
    AppendToSlice --> PassToUser
    PassToUser --> UserProcessing
    UserProcessing --> RecycleToPool
    RecycleToPool --> GetFromPool
    note right of GetFromPool
        Reuse existing blocks
        Avoid GC
    end note
    note right of PassToUser
        Pass reference, zero-copy
        May pass to multiple subscribers
    end note
    note right of RecycleToPool
        Active recycling
        Immediately reusable
    end note
```
**Key Points**:
1. Memory blocks **circularly reused** in pool, bypassing GC
2. Pass references instead of copying data, achieving zero-copy
3. Recycle immediately after processing, minimizing memory footprint
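`ReadRange` above ends with `recycleFront()`, which the document references but does not show. A hedged sketch of what that step might look like, assuming `Allocator.Free` returns a single block to the pool (an assumed method name; the real gomem allocator API may differ):

```go
// Hypothetical sketch: return fully consumed front blocks to the pool.
// Allocator.Free is an assumed method name for illustration.
func (r *BufReader) recycleFront() {
	consumed := r.buf.Size - r.buf.Length // bytes already yielded to the user
	for len(r.buf.Buffers) > 0 && consumed >= len(r.buf.Buffers[0]) {
		block := r.buf.Buffers[0]
		consumed -= len(block)
		r.buf.Size -= len(block)
		r.Allocator.Free(block) // back to the pool, immediately reusable
		r.buf.Buffers = r.buf.Buffers[1:]
	}
}
```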
### 2.7 Core Code Implementation
```go
// Create BufReader (simplified; r must exist before the closure captures it)
func NewBufReader(reader io.Reader) (r *BufReader) {
	r = &BufReader{
		Allocator: NewScalableMemoryAllocator(16384), // Object pool
	}
	r.feedData = func() error {
		// Get a memory block from the pool and read network data into it directly
		buf, err := r.Allocator.Read(reader, r.BufLen)
		if err != nil {
			return err
		}
		// Append to the slice (only adds a reference)
		r.buf.Buffers = append(r.buf.Buffers, buf)
		r.buf.Length += len(buf)
		return nil
	}
	return
}

// Zero-copy reading (simplified)
func (r *BufReader) ReadRange(n int, yield func([]byte)) error {
	for r.buf.Length < n {
		if err := r.feedData(); err != nil { // Read more data from the network
			return err
		}
	}
	// Pass references block by block
	for _, block := range r.buf.Buffers {
		yield(block) // Zero-copy passing
	}
	// Recycle processed blocks
	r.recycleFront()
	return nil
}

// Recycle memory blocks to pool
func (r *BufReader) Recycle() {
	if r.Allocator != nil {
		r.Allocator.Recycle() // Return all blocks to pool
	}
}
```
## 3. Performance Validation
### 3.1 Test Design
**Real Network Simulation**: each read returns a random size (64-2048 bytes), simulating real network fluctuation (a minimal sketch follows the scenario list below)
**Core Test Scenarios**:
1. **Concurrent Network Connection Reading** - Simulate 100+ concurrent connections
2. **GC Pressure Test** - Demonstrate long-term running differences
3. **Streaming Server** - Real business scenario (100 streams × forwarding)
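A minimal sketch of that simulation, assuming a small wrapper reader (illustrative only; the actual test code lives in `pkg/util/buf_reader_benchmark_test.go`):

```go
import "math/rand"

// jitterReader simulates network fluctuation: each Read returns a random
// 64-2048 byte chunk regardless of how much the caller asked for.
type jitterReader struct {
	rng *rand.Rand
}

func (j *jitterReader) Read(p []byte) (int, error) {
	n := 64 + j.rng.Intn(2048-64+1) // random chunk size in [64, 2048]
	if n > len(p) {
		n = len(p)
	}
	return n, nil // contents are irrelevant to the benchmark; size is what matters
}

// usage: reader := util.NewBufReader(&jitterReader{rng: rand.New(rand.NewSource(1))})
```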
### 3.2 Performance Test Results
**Test Environment**: Apple M2 Pro, Go 1.23.0
#### GC Pressure Test (Core Comparison)
| Metric | bufio.Reader | BufReader | Improvement |
|--------|-------------|-----------|-------------|
| Operation Latency | 1874 ns/op | 112.7 ns/op | **16.6x faster** |
| Allocation Count | 5,576,659 | 3,918 | **99.93% reduction** |
| Per Operation | 2 allocs/op | 0 allocs/op | **Zero allocation** |
| Throughput | 2.8M ops/s | 45.7M ops/s | **16x improvement** |
#### Streaming Server Scenario
| Metric | bufio.Reader | BufReader | Improvement |
|--------|-------------|-----------|-------------|
| Operation Latency | 374.6 ns/op | 30.29 ns/op | **12.4x faster** |
| Memory Allocation | 79,508 MB | 601 MB | **99.2% reduction** |
| **GC Runs** | **134** | **2** | **98.5% reduction** ⭐ |
| Throughput | 10.1M ops/s | 117M ops/s | **11.6x improvement** |
#### Performance Visualization
```
📊 GC Runs Comparison (Core Advantage)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bufio.Reader ████████████████████████████████████████████████████████████████ 134 runs
BufReader █ 2 runs ← 98.5% reduction!
📊 Total Memory Allocation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bufio.Reader ████████████████████████████████████████████████████████████████ 79 GB
BufReader █ 0.6 GB ← 99.2% reduction!
📊 Throughput Comparison
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bufio.Reader █████ 10.1M ops/s
BufReader ████████████████████████████████████████████████████████ 117M ops/s
```
### 3.3 Why Non-Contiguous Memory Is So Fast
**Reason 1: Zero-Copy Passing**
```go
// bufio - Must copy
buf := make([]byte, 1024)
reader.Read(buf) // Copy into the caller's contiguous memory

// BufReader - Pass reference
reader.ReadRange(1024, func(chunk []byte) {
	// chunk is the original memory block, no copy
})
```
**Reason 2: Memory Block Reuse**
```
bufio: Allocate → Use → GC → Reallocate → ...
BufReader: Allocate → Use → Return to pool → Reuse from pool → ...
↑ Same memory block reused repeatedly, no GC
```
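For intuition, the standard library's `sync.Pool` implements the same allocate → use → return → reuse cycle. BufReader's actual pool is the `ScalableMemoryAllocator` from the gomem project, so treat this as an analogy, not its implementation:

```go
import (
	"io"
	"sync"
)

// blockPool hands out reusable 4KB blocks instead of allocating fresh ones.
var blockPool = sync.Pool{
	New: func() any { return make([]byte, 4096) },
}

func readOnce(conn io.Reader) error {
	block := blockPool.Get().([]byte) // reuse an existing block if one is free
	defer blockPool.Put(block)        // return it to the pool for the next read
	_, err := conn.Read(block)
	return err
}
```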
**Reason 3: Multi-Subscriber Sharing**
```
Traditional: 1 packet → Copy 10 times → 10 subscribers
BufReader: 1 packet → Pass reference → 10 subscribers share
↑ Only 1 memory block, all 10 subscribers reference it
```
## 4. Usage Guide
### 4.1 Basic Usage
```go
func handleConnection(conn net.Conn) {
	// Create BufReader
	reader := util.NewBufReader(conn)
	defer reader.Recycle() // Return all blocks to pool

	// Zero-copy read and process
	reader.ReadRange(4096, func(chunk []byte) {
		// chunk is a non-contiguous memory block
		// Process directly, no copy needed
		processChunk(chunk)
	})
}
```
```
### 4.2 Real-World Use Cases
**Scenario 1: Protocol Parsing**
```go
// Parse FLV packet (header + data)
func parseFLV(reader *BufReader) {
	// Read packet type (1 byte)
	packetType, _ := reader.ReadByte()
	// Read data size (3 bytes)
	dataSize, _ := reader.ReadBE32(3)
	// Skip timestamp etc. (7 bytes)
	reader.Skip(7)
	// Zero-copy read data (may span multiple non-contiguous blocks)
	reader.ReadRange(int(dataSize), func(chunk []byte) {
		// chunk may be the complete data or a partial piece
		// Parse block by block, no need to wait for the complete data
		parseDataChunk(packetType, chunk)
	})
}
```
```
**Scenario 2: High-Concurrency Forwarding**
```go
// Read from one source, forward to multiple targets
func relay(source *BufReader, targets []io.Writer) {
	source.ReadRange(8192, func(chunk []byte) {
		// All targets share the same memory block
		for _, target := range targets {
			target.Write(chunk) // Zero-copy forwarding
		}
	})
}
```
```
**Scenario 3: Streaming Server**
```go
// Receive RTSP stream and distribute to subscribers
type Stream struct {
	reader      *BufReader
	subscribers []*Subscriber
}

func (s *Stream) Process() {
	s.reader.ReadRange(65536, func(frame []byte) {
		// frame may be part of a video frame (non-contiguous)
		// Send directly to all subscribers
		for _, sub := range s.subscribers {
			sub.WriteFrame(frame) // Shared memory, zero-copy
		}
	})
}
```
```
### 4.3 Best Practices
**✅ Correct Usage**:
```go
// 1. Always recycle resources
reader := util.NewBufReader(conn)
defer reader.Recycle()

// 2. Process directly in the callback, don't save references
reader.ReadRange(1024, func(data []byte) {
	processData(data) // ✅ Process immediately
})

// 3. Explicitly copy when retention is needed
var saved []byte
reader.ReadRange(1024, func(data []byte) {
	saved = append(saved, data...) // ✅ Explicit copy
})
```
**❌ Wrong Usage**:
```go
// ❌ Don't save references
var dangling []byte
reader.ReadRange(1024, func(data []byte) {
	dangling = data // Wrong: data will be recycled
})
// dangling is now a dangling reference!

// ❌ Don't forget to recycle
reader := util.NewBufReader(conn)
// Missing: defer reader.Recycle()
// Memory blocks cannot be returned to the pool
```
### 4.4 Performance Optimization Tips
**Tip 1: Batch Processing**
```go
// ✅ Optimized: Read multiple packets at once
reader.ReadRange(65536, func(chunk []byte) {
	// One chunk may contain multiple length-prefixed packets
	for len(chunk) >= 4 {
		size := int(binary.BigEndian.Uint32(chunk[:4]))
		if len(chunk) < 4+size {
			break // incomplete packet; carry the remainder into the next chunk
		}
		packet := chunk[4 : 4+size]
		processPacket(packet)
		chunk = chunk[4+size:]
	}
})
```
**Tip 2: Choose Appropriate Block Size**
```go
// Choose based on application scenario
const (
	SmallPacket  = 4 << 10  // 4KB - RTSP/HTTP
	MediumPacket = 16 << 10 // 16KB - Audio streams
	LargePacket  = 64 << 10 // 64KB - Video streams
)

reader := util.NewBufReaderWithBufLen(conn, LargePacket)
```
## 5. Summary
### Core Innovation: Non-Contiguous Memory Buffering
BufReader's core is not "better buffering" but **fundamentally changing the memory layout model**:
```
Traditional thinking: Data must be in contiguous memory
BufReader: Data can be scattered across blocks, passed by reference
Result:
✓ Zero-copy: No need to reassemble into contiguous memory
✓ Zero allocation: Memory blocks reused from object pool
✓ Zero GC pressure: No temporary objects created
```
### Key Advantages
| Feature | Implementation | Performance Impact |
|---------|---------------|-------------------|
| **Zero-Copy** | Pass memory block references | No copy overhead |
| **Zero Allocation** | Object pool reuse | 98.5% GC reduction |
| **Multi-Subscriber Sharing** | Same block referenced multiple times | 10x+ memory savings |
| **Flexible Block Sizes** | Adapt to network fluctuations | No reassembly needed |
### Ideal Use Cases
| Scenario | Recommended | Reason |
|----------|------------|---------|
| **High-concurrency network servers** | BufReader ⭐ | 98% GC reduction, 10x+ throughput |
| **Stream forwarding** | BufReader ⭐ | Zero-copy multicast, memory sharing |
| **Protocol parsers** | BufReader ⭐ | Parse block by block, no complete packet needed |
| **Long-running services** | BufReader ⭐ | Stable system, minimal GC impact |
| Simple file reading | bufio.Reader | Standard library sufficient |
### Key Points
Remember when using BufReader:
1. **Accept non-contiguous data**: Process each block via callback
2. **Don't hold references**: Data recycled after callback returns
3. **Leverage ReadRange**: This is the core zero-copy API
4. **Must call Recycle()**: Return memory blocks to pool
### Performance Data
**Streaming Server (100 concurrent streams, continuous running)**:
```
1-hour running estimation:
bufio.Reader (Contiguous Memory):
- Allocates 2.8 TB memory
- Triggers 4,800 GCs
- Frequent system pauses
BufReader (Non-Contiguous Memory):
- Allocates 21 GB memory (133x less)
- Triggers 72 GCs (67x less)
- Almost no GC impact
```
### Testing and Documentation
**Run Tests**:
```bash
sh scripts/benchmark_bufreader.sh
```
## References
- [GoMem Project](https://github.com/langhuihui/gomem) - Memory object pool implementation
- [Monibuca v5](https://m7s.live) - Streaming media server
- Test Code: `pkg/util/buf_reader_benchmark_test.go`
---
**Core Idea**: Eliminate traditional contiguous buffer copying overhead through non-contiguous memory block slices and zero-copy reference passing, achieving high-performance network data processing.