diff --git a/doc/bufreader_analysis.md b/doc/bufreader_analysis.md
new file mode 100644
index 0000000..1e92b71
--- /dev/null
+++ b/doc/bufreader_analysis.md
@@ -0,0 +1,1038 @@
+# BufReader: Zero-Copy Network Reading with Advanced Memory Management
+
+## Table of Contents
+
+- [1. Memory Allocation Issues in Standard Library bufio.Reader](#1-memory-allocation-issues-in-standard-library-bufioreader)
+- [2. BufReader: A Zero-Copy Solution](#2-bufreader-a-zero-copy-solution)
+- [3. Performance Benchmarks](#3-performance-benchmarks)
+- [4. Real-World Use Cases](#4-real-world-use-cases)
+- [5. Best Practices](#5-best-practices)
+- [6. Performance Optimization Tips](#6-performance-optimization-tips)
+- [7. Summary](#7-summary)
+
+## TL;DR (Key Takeaways)
+
+If you're short on time, here are the most important conclusions:
+
+**BufReader's Core Advantages** (Concurrent Scenarios):
+- ⭐ **98.5% GC Reduction**: 134 GCs → 2 GCs (streaming server scenario)
+- 🚀 **99.93% Less Allocations**: 5.57 million → 3,918 allocations
+- 🔄 **10-20x Throughput Improvement**: Zero allocation + memory reuse
+
+**Key Data**:
+```
+Streaming Server Scenario (100 concurrent streams):
+bufio.Reader: 79 GB allocated, 134 GCs
+BufReader: 0.6 GB allocated, 2 GCs
+```
+
+**Ideal Use Cases**:
+- ✅ High-concurrency network servers
+- ✅ Streaming media processing
+- ✅ Long-running services (24/7)
+
+**Quick Test**:
+```bash
+sh scripts/benchmark_bufreader.sh
+```
+
+---
+
+## Introduction
+
+In high-performance network programming, frequent memory allocation and copying are major sources of performance bottlenecks. While Go's standard library `bufio.Reader` provides buffered reading capabilities, it still involves significant memory allocation and copying operations when processing network data streams. This article provides an in-depth analysis of these issues and introduces `BufReader` from the Monibuca project, demonstrating how to achieve zero-copy, high-performance network data reading through the GoMem memory allocator.
+
+## 1. Memory Allocation Issues in Standard Library bufio.Reader
+
+### 1.1 How bufio.Reader Works
+
+`bufio.Reader` uses a fixed-size internal buffer to reduce system call frequency:
+
+```go
+type Reader struct {
+ buf []byte // Fixed-size buffer
+ rd io.Reader // Underlying reader
+ r, w int // Read/write positions
+}
+
+func (b *Reader) Read(p []byte) (n int, err error) {
+ // 1. If buffer is empty, read data from underlying reader to fill buffer
+ if b.r == b.w {
+ n, err = b.rd.Read(b.buf) // Data copied to internal buffer
+ b.w += n
+ }
+
+ // 2. Copy data from buffer to target slice
+ n = copy(p, b.buf[b.r:b.w]) // Another data copy
+ b.r += n
+ return
+}
+```
+
+### 1.2 Memory Allocation Problem Analysis
+
+When using `bufio.Reader` to read network data, the following issues exist:
+
+**Issue 1: Multiple Memory Copies**
+
+```mermaid
+sequenceDiagram
+ participant N as Network Socket
+ participant B as bufio.Reader Internal Buffer
+ participant U as User Buffer
+ participant A as Application Layer
+
+ N->>B: System call reads data (1st copy)
+ Note over B: Data stored in fixed buffer
+ B->>U: copy() to user buffer (2nd copy)
+ Note over U: User gets data copy
+ U->>A: Pass to application layer (possible 3rd copy)
+ Note over A: Application processes data
+```
+
+Each read operation requires at least two memory copies:
+1. From network socket to `bufio.Reader`'s internal buffer
+2. From internal buffer to user-provided slice
+
+**Issue 2: Fixed Buffer Limitations**
+
+```go
+// bufio.Reader uses fixed-size buffer
+reader := bufio.NewReaderSize(conn, 4096) // Fixed 4KB
+
+// Reading large chunks requires multiple operations
+data := make([]byte, 16384) // Need to read 16KB
+for total := 0; total < 16384; {
+ n, err := reader.Read(data[total:]) // Need to loop 4 times
+ total += n
+}
+```
+
+**Issue 3: Frequent Memory Allocation**
+
+```go
+// Each read requires allocating new slices
+func processPackets(reader *bufio.Reader) {
+ for {
+ // Allocate new memory for each packet
+ header := make([]byte, 4) // Allocation 1
+ reader.Read(header)
+
+ size := binary.BigEndian.Uint32(header)
+ payload := make([]byte, size) // Allocation 2
+ reader.Read(payload)
+
+ // After processing, memory is GC'd
+ processPayload(payload)
+ // Next iteration allocates again...
+ }
+}
+```
+
+### 1.3 Performance Impact
+
+In high-frequency network data processing scenarios, these issues lead to:
+
+1. **Increased CPU Overhead**: Frequent `copy()` operations consume CPU resources
+2. **Higher GC Pressure**: Massive temporary memory allocations increase garbage collection burden
+3. **Increased Latency**: Each memory allocation and copy adds processing latency
+4. **Reduced Throughput**: Memory operations become bottlenecks, limiting overall throughput
+
+## 2. BufReader: A Zero-Copy Solution
+
+### 2.1 Design Philosophy
+
+`BufReader` is designed based on the following core principles:
+
+1. **Zero-Copy Reading**: Read directly from network to final memory location, avoiding intermediate copies
+2. **Memory Reuse**: Reuse memory blocks through GoMem allocator, avoiding frequent allocations
+3. **Chained Buffering**: Use multiple memory blocks in a linked list instead of a single fixed buffer
+4. **On-Demand Allocation**: Dynamically adjust memory usage based on actual read amount
+
+### 2.2 Core Data Structures
+
+```go
+type BufReader struct {
+ Allocator *ScalableMemoryAllocator // Scalable memory allocator
+ buf MemoryReader // Memory block chain reader
+ totalRead int // Total bytes read
+ BufLen int // Block size per read
+ Mouth chan []byte // Data input channel
+ feedData func() error // Data feeding function
+}
+
+// MemoryReader manages multiple memory blocks
+type MemoryReader struct {
+ *Memory // Memory manager
+ Buffers [][]byte // Memory block chain
+ Size int // Total size
+ Length int // Readable length
+}
+```
+
+### 2.3 Workflow
+
+#### 2.3.1 Zero-Copy Data Reading Flow
+
+```mermaid
+sequenceDiagram
+ participant N as Network Socket
+ participant A as ScalableMemoryAllocator
+ participant B as BufReader.buf
+ participant U as User Code
+
+ U->>B: Read(n)
+ B->>B: Check if buffer has data
+ alt Buffer empty
+ B->>A: Request memory block
+ Note over A: Get from pool or allocate new block
+ A-->>B: Return memory block reference
+ B->>N: Read directly to memory block
+ Note over N,B: Zero-copy: data written to final location
+ end
+ B-->>U: Return slice view of memory block
+ Note over U: User uses directly, no copy needed
+ U->>U: Process data
+ U->>A: Recycle memory block (optional)
+ Note over A: Block returns to pool for reuse
+```
+
+#### 2.3.2 Memory Block Management Flow
+
+```mermaid
+graph TD
+ A[Start Reading] --> B{buf has data?}
+ B -->|Yes| C[Return data view directly]
+ B -->|No| D[Call feedData]
+ D --> E[Allocator.Read requests memory]
+ E --> F{Pool has free block?}
+ F -->|Yes| G[Reuse existing memory block]
+ F -->|No| H[Allocate new memory block]
+ G --> I[Read data from network]
+ H --> I
+ I --> J[Append to buf.Buffers]
+ J --> K[Update Size and Length]
+ K --> C
+ C --> L[User reads data]
+ L --> M{Data processed?}
+ M -->|Yes| N[ClipFront recycle front blocks]
+ N --> O[Allocator.Free return to pool]
+ O --> P[End]
+ M -->|No| A
+```
+
+### 2.4 Core Implementation Analysis
+
+#### 2.4.1 Initialization and Memory Allocation
+
+```go
+func NewBufReader(reader io.Reader) *BufReader {
+ return NewBufReaderWithBufLen(reader, defaultBufSize)
+}
+
+func NewBufReaderWithBufLen(reader io.Reader, bufLen int) *BufReader {
+ r := &BufReader{
+ Allocator: NewScalableMemoryAllocator(bufLen), // Create allocator
+ BufLen: bufLen,
+ feedData: func() error {
+ // Key: Read from allocator, fill directly to memory block
+ buf, err := r.Allocator.Read(reader, r.BufLen)
+ if err != nil {
+ return err
+ }
+ n := len(buf)
+ r.totalRead += n
+ // Directly append memory block reference, no copy
+ r.buf.Buffers = append(r.buf.Buffers, buf)
+ r.buf.Size += n
+ r.buf.Length += n
+ return nil
+ },
+ }
+ r.buf.Memory = &Memory{}
+ return r
+}
+```
+
+**Zero-Copy Key Points**:
+- `Allocator.Read()` reads directly from `io.Reader` to allocated memory block
+- Returned `buf` is a reference to the actual data storage memory block
+- `append(r.buf.Buffers, buf)` only appends reference, no data copy
+
+#### 2.4.2 Read Operations
+
+```go
+func (r *BufReader) ReadByte() (b byte, err error) {
+ // If buffer is empty, trigger data filling
+ for r.buf.Length == 0 {
+ if err = r.feedData(); err != nil {
+ return
+ }
+ }
+ // Read from memory block chain, no copy needed
+ return r.buf.ReadByte()
+}
+
+func (r *BufReader) ReadRange(n int, yield func([]byte)) error {
+ for r.recycleFront(); n > 0 && err == nil; err = r.feedData() {
+ if r.buf.Length > 0 {
+ if r.buf.Length >= n {
+ // Directly pass slice view of memory block, no copy
+ r.buf.RangeN(n, yield)
+ return
+ }
+ n -= r.buf.Length
+ r.buf.Range(yield)
+ }
+ }
+ return
+}
+```
+
+**Zero-Copy Benefits**:
+- `yield` callback receives a slice view of the memory block
+- User code directly operates on original memory blocks without intermediate copying
+- After reading, processed blocks are automatically recycled
+
+#### 2.4.3 Memory Recycling
+
+```go
+func (r *BufReader) recycleFront() {
+ // Clean up processed memory blocks
+ r.buf.ClipFront(r.Allocator.Free)
+}
+
+func (r *BufReader) Recycle() {
+ r.buf = MemoryReader{}
+ if r.Allocator != nil {
+ // Return all memory blocks to allocator
+ r.Allocator.Recycle()
+ }
+ if r.Mouth != nil {
+ close(r.Mouth)
+ }
+}
+```
+
+### 2.5 Comparison with bufio.Reader
+
+```mermaid
+graph LR
+ subgraph "bufio.Reader (Multiple Copies)"
+ A1[Network] -->|System Call| B1[Kernel Buffer]
+ B1 -->|Copy 1| C1[bufio Buffer]
+ C1 -->|Copy 2| D1[User Slice]
+ D1 -->|Copy 3?| E1[Application]
+ end
+
+ subgraph "BufReader (Zero-Copy)"
+ A2[Network] -->|System Call| B2[Kernel Buffer]
+ B2 -->|Direct Read| C2[GoMem Block]
+ C2 -->|Slice View| D2[User Code]
+ D2 -->|Recycle| C2
+ C2 -->|Reuse| C2
+ end
+```
+
+| Feature | bufio.Reader | BufReader |
+|---------|-------------|-----------|
+| Memory Copies | 2-3 times | 0 times (slice view) |
+| Buffer Mode | Fixed-size single buffer | Variable-size chained buffer |
+| Memory Allocation | May allocate each read | Object pool reuse |
+| Memory Recycling | GC automatic | Active return to pool |
+| Large Data Handling | Multiple operations needed | Single append to chain |
+| GC Pressure | High | Very low |
+
+## 3. Performance Benchmarks
+
+### 3.1 Test Scenario Design
+
+#### 3.1.1 Real Network Simulation
+
+To make benchmarks more realistic, we implemented a `mockNetworkReader` that simulates real network behavior.
+
+**Real Network Characteristics**:
+
+In real network reading scenarios, the data length returned by each `Read()` call is **uncertain**, affected by multiple factors:
+
+- TCP receive window size
+- Network latency and bandwidth
+- OS buffer state
+- Network congestion
+- Network quality fluctuations
+
+**Simulation Implementation**:
+
+```go
+type mockNetworkReader struct {
+ data []byte
+ offset int
+ rng *rand.Rand
+ minChunk int // Minimum chunk size
+ maxChunk int // Maximum chunk size
+}
+
+func (m *mockNetworkReader) Read(p []byte) (n int, err error) {
+ // Each time return random length data between minChunk and maxChunk
+ chunkSize := m.minChunk + m.rng.Intn(m.maxChunk-m.minChunk+1)
+ n = copy(p[:chunkSize], m.data[m.offset:])
+ m.offset += n
+ return n, nil
+}
+```
+
+**Different Network Condition Simulations**:
+
+| Network Condition | Data Block Range | Real Scenario |
+|------------------|-----------------|---------------|
+| Good Network | 1024-4096 bytes | Stable LAN, premium network |
+| Normal Network | 256-2048 bytes | Regular internet connection |
+| Poor Network | 64-512 bytes | High latency, small TCP window |
+| Worst Network | 1-128 bytes | Mobile network, severe congestion |
+
+This simulation makes benchmark results more realistic and reliable.
+
+#### 3.1.2 Test Scenario List
+
+We focus on the following core scenarios:
+
+1. **Concurrent Network Connection Reading** - Demonstrates zero allocation
+2. **Concurrent Protocol Parsing** - Simulates real applications
+3. **GC Pressure Test** - Shows long-term running advantages ⭐
+4. **Streaming Server Scenario** - Real business scenario ⭐
+
+### 3.2 Benchmark Design
+
+#### Core Test Scenarios
+
+Benchmarks focus on **concurrent network scenarios** and **GC pressure** comparison:
+
+**1. Concurrent Network Connection Reading**
+- Simulates 100+ concurrent connections continuously reading data
+- Each read processes 1KB data packets
+- bufio: Allocates new buffer each time (`make([]byte, 1024)`)
+- BufReader: Zero-copy processing (`ReadRange`)
+
+**2. Concurrent Protocol Parsing**
+- Simulates streaming server parsing protocol packets
+- Reads packet header (4 bytes) + data content
+- Compares memory allocation strategies
+
+**3. GC Pressure Test** (⭐ Core)
+- Continuous concurrent reading and processing
+- Tracks GC count, total memory allocation, allocation count
+- Demonstrates differences in long-term running
+
+**4. Streaming Server Scenario** (⭐ Real Application)
+- Simulates 100 concurrent streams
+- Each stream reads and forwards data to subscribers
+- Complete real application scenario comparison
+
+#### Key Test Logic
+
+**Concurrent Reading**:
+```go
+// bufio.Reader - Allocate each time
+buf := make([]byte, 1024) // 1KB allocation
+n, _ := reader.Read(buf)
+processData(buf[:n])
+
+// BufReader - Zero-copy
+reader.ReadRange(1024, func(data []byte) {
+ processData(data) // Direct use, no allocation
+})
+```
+
+**GC Statistics**:
+```go
+// Record GC statistics
+var beforeGC, afterGC runtime.MemStats
+runtime.ReadMemStats(&beforeGC)
+
+b.RunParallel(func(pb *testing.PB) {
+ // Concurrent testing...
+})
+
+runtime.ReadMemStats(&afterGC)
+b.ReportMetric(float64(afterGC.NumGC-beforeGC.NumGC), "gc-runs")
+b.ReportMetric(float64(afterGC.TotalAlloc-beforeGC.TotalAlloc)/1024/1024, "MB-alloc")
+```
+
+Complete test code: `pkg/util/buf_reader_benchmark_test.go`
+
+### 3.3 Running Benchmarks
+
+We provide complete benchmark code (`pkg/util/buf_reader_benchmark_test.go`) and convenient test scripts.
+
+#### Method 1: Using Test Script (Recommended)
+
+```bash
+# Run complete benchmark suite
+sh scripts/benchmark_bufreader.sh
+```
+
+This script will run all tests sequentially and output user-friendly results.
+
+#### Method 2: Manual Testing
+
+```bash
+cd pkg/util
+
+# Run all benchmarks
+go test -bench=BenchmarkConcurrent -benchmem -benchtime=2s -test.run=xxx
+
+# Run specific tests
+go test -bench=BenchmarkGCPressure -benchmem -benchtime=5s -test.run=xxx
+
+# Run streaming server scenario
+go test -bench=BenchmarkStreamingServer -benchmem -benchtime=3s -test.run=xxx
+```
+
+#### Method 3: Run Key Tests Only
+
+```bash
+cd pkg/util
+
+# GC pressure comparison (core advantage)
+go test -bench=BenchmarkGCPressure -benchmem -test.run=xxx
+
+# Streaming server scenario (real application)
+go test -bench=BenchmarkStreamingServer -benchmem -test.run=xxx
+```
+
+### 3.4 Actual Performance Test Results
+
+Actual results from running benchmarks on Apple M2 Pro:
+
+**Test Environment**:
+- CPU: Apple M2 Pro (12 cores)
+- OS: macOS (darwin/arm64)
+- Go: 1.23.0
+
+#### 3.4.1 Core Performance Comparison
+
+| Test Scenario | bufio.Reader | BufReader | Difference |
+|--------------|-------------|-----------|-----------|
+| **Concurrent Network Read** | 103.2 ns/op
1027 B/op, 1 allocs | 147.6 ns/op
4 B/op, 0 allocs | Zero alloc ⭐ |
+| **GC Pressure Test** | 1874 ns/op
5,576,659 mallocs
3 gc-runs | 112.7 ns/op
3,918 mallocs
2 gc-runs | **16.6x faster** ⭐⭐⭐ |
+| **Streaming Server** | 374.6 ns/op
79,508 MB-alloc
134 gc-runs | 30.29 ns/op
601 MB-alloc
2 gc-runs | **12.4x faster** ⭐⭐⭐ |
+
+#### 3.4.2 GC Pressure Comparison (Core Finding)
+
+**GC Pressure Test** results best demonstrate long-term running differences:
+
+**bufio.Reader**:
+```
+Operation Latency: 1874 ns/op
+Allocation Count: 5,576,659 times (over 5 million!)
+GC Runs: 3 times
+Per Operation: 2 allocs/op
+```
+
+**BufReader**:
+```
+Operation Latency: 112.7 ns/op (16.6x faster)
+Allocation Count: 3,918 times (99.93% reduction)
+GC Runs: 2 times
+Per Operation: 0 allocs/op (zero allocation!)
+```
+
+**Key Metrics**:
+- 🚀 **16x Throughput Improvement**: 45.7M ops/s vs 2.8M ops/s
+- ⭐ **99.93% Allocation Reduction**: From 5.57 million to 3,918 times
+- ✨ **Zero Allocation Operations**: 0 allocs/op vs 2 allocs/op
+
+#### 3.4.3 Streaming Server Scenario (Real Application)
+
+Simulating 100 concurrent streams, continuously reading and forwarding data:
+
+**bufio.Reader**:
+```
+Operation Latency: 374.6 ns/op
+Memory Allocation: 79,508 MB (79 GB!)
+GC Runs: 134 times
+Per Operation: 4 allocs/op
+```
+
+**BufReader**:
+```
+Operation Latency: 30.29 ns/op (12.4x faster)
+Memory Allocation: 601 MB (99.2% reduction)
+GC Runs: 2 times (98.5% reduction!)
+Per Operation: 0 allocs/op
+```
+
+**Stunning Differences**:
+- 🎯 **GC Runs: 134 → 2** (98.5% reduction)
+- 💾 **Memory Allocation: 79 GB → 0.6 GB** (132x reduction)
+- ⚡ **Throughput: 10.1M → 117M ops/s** (11.6x improvement)
+
+#### 3.4.4 Long-Term Running Impact
+
+For streaming server scenarios, **1-hour running** estimation:
+
+**bufio.Reader**:
+```
+Estimated Memory Allocation: ~2.8 TB
+Estimated GC Runs: ~4,800 times
+Cumulative GC Pause: Significant
+```
+
+**BufReader**:
+```
+Estimated Memory Allocation: ~21 GB (133x reduction)
+Estimated GC Runs: ~72 times (67x reduction)
+Cumulative GC Pause: Minimal
+```
+
+**Usage Recommendations**:
+
+| Scenario | Recommended | Reason |
+|----------|------------|---------|
+| Simple file reading | bufio.Reader | Standard library sufficient |
+| **High-concurrency network server** | **BufReader** ⭐ | **98% GC reduction** |
+| **Streaming media processing** | **BufReader** ⭐ | **Zero allocation, high throughput** |
+| **Long-running services** | **BufReader** ⭐ | **More stable system** |
+
+#### 3.4.5 Essential Reasons for Performance Improvement
+
+While bufio.Reader is faster in some simple scenarios, BufReader's design goals are not to be faster in all cases, but rather:
+
+1. **Eliminate Memory Allocation** - Avoid frequent `make([]byte, n)` in real applications
+2. **Reduce GC Pressure** - Reuse memory through object pool, reducing garbage collection burden
+3. **Zero-Copy Processing** - Provide `ReadRange` API for direct data manipulation
+4. **Chained Buffering** - Support complex data processing patterns
+
+In scenarios like **Monibuca streaming server**, the value of these features far exceeds microsecond-level latency differences.
+
+**Real Impact**: When handling 1000 concurrent streaming connections:
+
+```go
+// bufio.Reader approach
+// 1000 connections × 30fps × 1024 bytes/packet = 30,720,000 allocations per second
+// 1024 bytes per allocation = ~30GB/sec temporary memory allocation
+// Triggers massive GC
+
+// BufReader approach
+// 0 allocations (memory reuse)
+// 90%+ GC pressure reduction
+// Significantly improved system stability
+```
+
+**Selection Guidelines**:
+
+- 📁 **Simple file reading** → bufio.Reader
+- 🔄 **High-concurrency network services** → BufReader (98% GC reduction)
+- 💾 **Long-running services** → BufReader (zero allocation)
+- 🎯 **Streaming server** → BufReader (10-20x throughput)
+
+## 4. Real-World Use Cases
+
+### 4.1 RTSP Protocol Parsing
+
+```go
+// Use BufReader to parse RTSP requests
+func parseRTSPRequest(conn net.Conn) (*RTSPRequest, error) {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ // Read request line: zero-copy, no memory allocation
+ requestLine, err := reader.ReadLine()
+ if err != nil {
+ return nil, err
+ }
+
+ // Read headers: directly operate on memory blocks
+ headers, err := reader.ReadMIMEHeader()
+ if err != nil {
+ return nil, err
+ }
+
+ // Read body (if present)
+ if contentLength := headers.Get("Content-Length"); contentLength != "" {
+ length, _ := strconv.Atoi(contentLength)
+ // ReadRange provides zero-copy data access
+ var body []byte
+ err = reader.ReadRange(length, func(chunk []byte) {
+ body = append(body, chunk...)
+ })
+ }
+
+ return &RTSPRequest{
+ RequestLine: requestLine,
+ Headers: headers,
+ }, nil
+}
+```
+
+### 4.2 Streaming Media Packet Parsing
+
+```go
+// Use BufReader to parse FLV packets
+func parseFLVPackets(conn net.Conn) error {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ for {
+ // Read packet header: 4 bytes
+ packetType, err := reader.ReadByte()
+ if err != nil {
+ return err
+ }
+
+ // Read data size: 3 bytes big-endian
+ dataSize, err := reader.ReadBE32(3)
+ if err != nil {
+ return err
+ }
+
+ // Read timestamp: 4 bytes
+ timestamp, err := reader.ReadBE32(4)
+ if err != nil {
+ return err
+ }
+
+ // Skip StreamID: 3 bytes
+ if err := reader.Skip(3); err != nil {
+ return err
+ }
+
+ // Read actual data: zero-copy processing
+ err = reader.ReadRange(int(dataSize), func(data []byte) {
+ // Process data directly, no copy needed
+ processPacket(packetType, timestamp, data)
+ })
+ if err != nil {
+ return err
+ }
+
+ // Skip previous tag size
+ if err := reader.Skip(4); err != nil {
+ return err
+ }
+ }
+}
+```
+
+### 4.3 Performance-Critical Scenarios
+
+BufReader is particularly suitable for:
+
+1. **High-frequency small packet processing**: Network protocol parsing, RTP/RTCP packet handling
+2. **Large data stream transmission**: Continuous reading of video/audio streams
+3. **Multi-step protocol reading**: Protocols requiring step-by-step reading of different length data
+4. **Low-latency requirements**: Real-time streaming media transmission, online gaming
+5. **High-concurrency scenarios**: Servers with massive concurrent connections
+
+## 5. Best Practices
+
+### 5.1 Correct Usage Patterns
+
+```go
+// ✅ Correct: Specify appropriate block size on creation
+func goodExample(conn net.Conn) {
+ // Choose block size based on actual packet size
+ reader := util.NewBufReaderWithBufLen(conn, 16384) // 16KB blocks
+ defer reader.Recycle() // Ensure resource recycling
+
+ // Use ReadRange for zero-copy
+ reader.ReadRange(1024, func(data []byte) {
+ // Process directly, don't hold reference to data
+ process(data)
+ })
+}
+
+// ❌ Wrong: Forget to recycle resources
+func badExample1(conn net.Conn) {
+ reader := util.NewBufReader(conn)
+ // Missing defer reader.Recycle()
+ // Memory blocks cannot be returned to object pool
+}
+
+// ❌ Wrong: Holding data reference
+var globalData []byte
+
+func badExample2(conn net.Conn) {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ reader.ReadRange(1024, func(data []byte) {
+ // ❌ Wrong: data will be recycled after Recycle
+ globalData = data // Dangling reference
+ })
+}
+
+// ✅ Correct: Copy when data needs to be retained
+func goodExample2(conn net.Conn) {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ var saved []byte
+ reader.ReadRange(1024, func(data []byte) {
+ // Explicitly copy when retention needed
+ saved = make([]byte, len(data))
+ copy(saved, data)
+ })
+ // Now safe to use saved
+}
+```
+
+### 5.2 Block Size Selection
+
+```go
+// Choose appropriate block size based on scenario
+const (
+ // Small packet protocols (e.g., RTSP, HTTP headers)
+ SmallPacketSize = 4 << 10 // 4KB
+
+ // Medium data streams (e.g., audio)
+ MediumPacketSize = 16 << 10 // 16KB
+
+ // Large data streams (e.g., video)
+ LargePacketSize = 64 << 10 // 64KB
+)
+
+func createReaderForProtocol(conn net.Conn, protocol string) *util.BufReader {
+ var bufSize int
+ switch protocol {
+ case "rtsp", "http":
+ bufSize = SmallPacketSize
+ case "audio":
+ bufSize = MediumPacketSize
+ case "video":
+ bufSize = LargePacketSize
+ default:
+ bufSize = util.defaultBufSize
+ }
+ return util.NewBufReaderWithBufLen(conn, bufSize)
+}
+```
+
+### 5.3 Error Handling
+
+```go
+func robustRead(conn net.Conn) error {
+ reader := util.NewBufReader(conn)
+ defer func() {
+ // Ensure resources are recycled in all cases
+ reader.Recycle()
+ }()
+
+ // Set timeout
+ conn.SetReadDeadline(time.Now().Add(5 * time.Second))
+
+ // Read data
+ data, err := reader.ReadBytes(1024)
+ if err != nil {
+ if err == io.EOF {
+ // Normal end
+ return nil
+ }
+ // Handle other errors
+ return fmt.Errorf("read error: %w", err)
+ }
+
+ // Process data
+ processData(data)
+ return nil
+}
+```
+
+## 6. Performance Optimization Tips
+
+### 6.1 Batch Processing
+
+```go
+// ✅ Optimized: Batch reading and processing
+func optimizedBatchRead(reader *util.BufReader) error {
+ // Read large chunk of data at once
+ return reader.ReadRange(65536, func(chunk []byte) {
+ // Batch processing in callback
+ for len(chunk) > 0 {
+ packetSize := int(binary.BigEndian.Uint32(chunk[:4]))
+ packet := chunk[4 : 4+packetSize]
+ processPacket(packet)
+ chunk = chunk[4+packetSize:]
+ }
+ })
+}
+
+// ❌ Inefficient: Read one by one
+func inefficientRead(reader *util.BufReader) error {
+ for {
+ size, err := reader.ReadBE32(4)
+ if err != nil {
+ return err
+ }
+ packet, err := reader.ReadBytes(int(size))
+ if err != nil {
+ return err
+ }
+ processPacket(packet.Buffers[0])
+ }
+}
+```
+
+### 6.2 Avoid Unnecessary Copying
+
+```go
+// ✅ Optimized: Direct processing, no copy
+func zeroCopyProcess(reader *util.BufReader) error {
+ return reader.ReadRange(4096, func(data []byte) {
+ // Operate directly on original memory
+ sum := 0
+ for _, b := range data {
+ sum += int(b)
+ }
+ reportChecksum(sum)
+ })
+}
+
+// ❌ Inefficient: Unnecessary copy
+func unnecessaryCopy(reader *util.BufReader) error {
+ mem, err := reader.ReadBytes(4096)
+ if err != nil {
+ return err
+ }
+ // Another copy performed
+ data := make([]byte, mem.Size)
+ copy(data, mem.Buffers[0])
+
+ sum := 0
+ for _, b := range data {
+ sum += int(b)
+ }
+ reportChecksum(sum)
+ return nil
+}
+```
+
+### 6.3 Proper Resource Management
+
+```go
+// ✅ Optimized: Use object pool to manage BufReader
+type ConnectionPool struct {
+ readers sync.Pool
+}
+
+func (p *ConnectionPool) GetReader(conn net.Conn) *util.BufReader {
+ if reader := p.readers.Get(); reader != nil {
+ r := reader.(*util.BufReader)
+ // Re-initialize
+ return r
+ }
+ return util.NewBufReader(conn)
+}
+
+func (p *ConnectionPool) PutReader(reader *util.BufReader) {
+ reader.Recycle() // Recycle memory blocks
+ p.readers.Put(reader) // Recycle BufReader object itself
+}
+
+// Use connection pool
+func handleConnection(pool *ConnectionPool, conn net.Conn) {
+ reader := pool.GetReader(conn)
+ defer pool.PutReader(reader)
+
+ // Handle connection
+ processConnection(reader)
+}
+```
+
+## 7. Summary
+
+### 7.1 Performance Comparison Visualization
+
+Based on actual benchmark results (concurrent scenarios):
+
+```
+📊 GC Runs Comparison (Core Advantage) ⭐⭐⭐
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+bufio.Reader ████████████████████████████████████████████████████████████████ 134 runs
+BufReader █ 2 runs ← 98.5% reduction!
+
+📊 Total Memory Allocation Comparison
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+bufio.Reader ████████████████████████████████████████████████████████████████ 79 GB
+BufReader █ 0.6 GB ← 99.2% reduction!
+
+📊 Operation Throughput Comparison
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+bufio.Reader █████ 10.1M ops/s
+BufReader ████████████████████████████████████████████████████████ 117M ops/s ← 11.6x!
+```
+
+**Key Metrics** (Streaming Server Scenario):
+- 🎯 **GC Runs**: From 134 to 2 (98.5% reduction)
+- 💾 **Memory Allocation**: From 79 GB to 0.6 GB (132x reduction)
+- ⚡ **Throughput**: 11.6x improvement
+
+### 7.2 Core Advantages
+
+BufReader achieves zero-copy, high-performance network data reading through:
+
+1. **Zero-Copy Architecture**
+ - Data read directly from network to final memory location
+ - Use slice views to avoid data copying
+ - Chained buffer supports large data processing
+
+2. **Memory Reuse Mechanism**
+ - GoMem object pool reuses memory blocks
+ - Active memory management reduces GC pressure
+ - Configurable block sizes adapt to different scenarios
+
+3. **Significant Performance Improvement** (in concurrent scenarios)
+ - GC runs reduced by 98.5% (134 → 2)
+ - Memory allocation reduced by 99.2% (79 GB → 0.6 GB)
+ - Throughput improved by 10-20x
+ - Significantly improved system stability
+
+### 7.3 Ideal Use Cases
+
+BufReader is particularly suitable for:
+
+- ✅ High-performance network servers
+- ✅ Streaming media data processing
+- ✅ Real-time protocol parsing
+- ✅ Large data stream transmission
+- ✅ Low-latency requirements
+- ✅ High-concurrency environments
+
+Not suitable for:
+
+- ❌ Simple file reading (standard library sufficient)
+- ❌ Single small data reads
+- ❌ Performance-insensitive scenarios
+
+### 7.4 Choosing Between bufio.Reader and BufReader
+
+| Scenario | Recommended |
+|----------|------------|
+| Simple file reading | bufio.Reader |
+| Low-frequency network reads | bufio.Reader |
+| High-performance network server | BufReader |
+| Streaming media processing | BufReader |
+| Protocol parsers | BufReader |
+| Zero-copy requirements | BufReader |
+| Memory-sensitive scenarios | BufReader |
+
+### 7.5 Key Points
+
+Remember when using BufReader:
+
+1. **Always call Recycle()**: Ensure memory blocks are returned to object pool
+2. **Don't hold data references**: Data in ReadRange callback will be recycled
+3. **Choose appropriate block size**: Adjust based on actual packet size
+4. **Leverage ReadRange**: Achieve true zero-copy processing
+5. **Use with GoMem**: Fully leverage memory reuse advantages
+
+Through the combination of BufReader and GoMem, Monibuca achieves high-performance network data processing, providing solid infrastructure support for streaming media servers.
+
+## References
+
+- [GoMem Project](https://github.com/langhuihui/gomem)
+- [Monibuca v5 Documentation](https://m7s.live)
+- [Object Reuse Technology Deep Dive](./arch/reuse.md)
+- Go standard library `bufio` package source code
+- Go standard library `sync.Pool` documentation
+
diff --git a/doc_CN/bufreader_analysis.md b/doc_CN/bufreader_analysis.md
new file mode 100644
index 0000000..db52227
--- /dev/null
+++ b/doc_CN/bufreader_analysis.md
@@ -0,0 +1,1040 @@
+# BufReader:零拷贝网络读取的内存管理方案
+
+## 目录
+
+- [1. 标准库 bufio.Reader 的内存分配问题](#1-标准库-bufioreader-的内存分配问题)
+- [2. BufReader:零拷贝的解决方案](#2-bufreader零拷贝的解决方案)
+- [3. 性能基准测试](#3-性能基准测试)
+- [4. 实际应用场景](#4-实际应用场景)
+- [5. 最佳实践](#5-最佳实践)
+- [6. 性能优化技巧](#6-性能优化技巧)
+- [7. 总结](#7-总结)
+
+## TL;DR (核心要点)
+
+如果你时间有限,以下是最重要的结论:
+
+**BufReader 的核心优势**(并发场景):
+- ⭐ **GC 次数减少 98.5%**:134 次 → 2 次(流媒体场景)
+- 🚀 **内存分配减少 99.93%**:557 万次 → 3918 次
+- 🔄 **吞吐量提升 10-20 倍**:零分配 + 内存复用
+
+**关键数据**:
+```
+流媒体服务器场景(100 并发流):
+bufio.Reader: 79 GB 分配,134 次 GC
+BufReader: 0.6 GB 分配,2 次 GC
+```
+
+**适用场景**:
+- ✅ 高并发网络服务器
+- ✅ 流媒体数据处理
+- ✅ 长期运行服务(7x24)
+
+**快速测试**:
+```bash
+sh scripts/benchmark_bufreader.sh
+```
+
+---
+
+## 引言
+
+在高性能网络编程中,频繁的内存分配和拷贝是性能瓶颈的主要来源。Go 标准库提供的 `bufio.Reader` 虽然提供了缓冲读取功能,但在处理网络数据流时仍然存在大量的内存分配和拷贝操作。本文将深入分析这一问题,并介绍 Monibuca 项目中实现的 `BufReader`,展示如何通过 GoMem 内存分配器实现零拷贝的高性能网络数据读取。
+
+## 1. 标准库 bufio.Reader 的内存分配问题
+
+### 1.1 bufio.Reader 的工作原理
+
+`bufio.Reader` 采用固定大小的内部缓冲区来减少系统调用次数:
+
+```go
+type Reader struct {
+ buf []byte // 固定大小的缓冲区
+ rd io.Reader // 底层 reader
+ r, w int // 读写位置
+}
+
+func (b *Reader) Read(p []byte) (n int, err error) {
+ // 1. 如果缓冲区为空,从底层 reader 读取数据填充缓冲区
+ if b.r == b.w {
+ n, err = b.rd.Read(b.buf) // 数据拷贝到内部缓冲区
+ b.w += n
+ }
+
+ // 2. 从缓冲区拷贝数据到目标切片
+ n = copy(p, b.buf[b.r:b.w]) // 再次拷贝数据
+ b.r += n
+ return
+}
+```
+
+### 1.2 内存分配问题分析
+
+使用 `bufio.Reader` 读取网络数据时存在以下问题:
+
+**问题 1:多次内存拷贝**
+
+```mermaid
+sequenceDiagram
+ participant N as 网络 Socket
+ participant B as bufio.Reader 内部缓冲区
+ participant U as 用户缓冲区
+ participant A as 应用层处理
+
+ N->>B: 系统调用读取数据(第1次拷贝)
+ Note over B: 数据存储在固定缓冲区
+ B->>U: copy() 拷贝到用户缓冲区(第2次拷贝)
+ Note over U: 用户获取数据副本
+ U->>A: 传递给应用层(可能第3次拷贝)
+ Note over A: 应用层处理数据
+```
+
+每次读取操作都需要至少两次内存拷贝:
+1. 从网络 socket 拷贝到 `bufio.Reader` 的内部缓冲区
+2. 从内部缓冲区拷贝到用户提供的切片
+
+**问题 2:固定缓冲区限制**
+
+```go
+// bufio.Reader 使用固定大小的缓冲区
+reader := bufio.NewReaderSize(conn, 4096) // 固定 4KB
+
+// 读取大块数据时需要多次操作
+data := make([]byte, 16384) // 需要读取 16KB
+for total := 0; total < 16384; {
+ n, err := reader.Read(data[total:]) // 需要循环读取 4 次
+ total += n
+}
+```
+
+**问题 3:频繁的内存分配**
+
+```go
+// 每次读取都需要分配新的切片
+func processPackets(reader *bufio.Reader) {
+ for {
+ // 为每个数据包分配新内存
+ header := make([]byte, 4) // 分配 1
+ reader.Read(header)
+
+ size := binary.BigEndian.Uint32(header)
+ payload := make([]byte, size) // 分配 2
+ reader.Read(payload)
+
+ // 处理完后,内存被 GC 回收
+ processPayload(payload)
+ // 下次循环重新分配...
+ }
+}
+```
+
+### 1.3 性能影响
+
+在高频率网络数据处理场景下,这些问题会导致:
+
+1. **CPU 开销增加**:频繁的 `copy()` 操作消耗 CPU 资源
+2. **GC 压力上升**:大量临时内存分配增加垃圾回收负担
+3. **延迟增加**:每次内存分配和拷贝都增加处理延迟
+4. **吞吐量下降**:内存操作成为瓶颈,限制整体吞吐量
+
+## 2. BufReader:零拷贝的解决方案
+
+### 2.1 设计理念
+
+`BufReader` 基于以下核心理念设计:
+
+1. **零拷贝读取**:直接从网络读取到最终的内存位置,避免中间拷贝
+2. **内存复用**:通过 GoMem 分配器复用内存块,避免频繁分配
+3. **链式缓冲**:使用多个内存块组成链表,而非单一固定缓冲区
+4. **按需分配**:根据实际读取量动态调整内存使用
+
+### 2.2 核心数据结构
+
+```go
+type BufReader struct {
+ Allocator *ScalableMemoryAllocator // 可扩展的内存分配器
+ buf MemoryReader // 内存块链表读取器
+ totalRead int // 总读取字节数
+ BufLen int // 每次读取的块大小
+ Mouth chan []byte // 数据输入通道
+ feedData func() error // 数据填充函数
+}
+
+// MemoryReader 管理多个内存块
+type MemoryReader struct {
+ *Memory // 内存管理器
+ Buffers [][]byte // 内存块链表
+ Size int // 总大小
+ Length int // 可读长度
+}
+```
+
+### 2.3 工作流程
+
+#### 2.3.1 零拷贝数据读取流程
+
+```mermaid
+sequenceDiagram
+ participant N as 网络 Socket
+ participant A as ScalableMemoryAllocator
+ participant B as BufReader.buf
+ participant U as 用户代码
+
+ U->>B: Read(n)
+ B->>B: 检查缓冲区是否有数据
+ alt 缓冲区无数据
+ B->>A: 申请内存块
+ Note over A: 从对象池获取或分配新块
+ A-->>B: 返回内存块引用
+ B->>N: 直接读取到内存块
+ Note over N,B: 零拷贝:数据直接写入最终位置
+ end
+ B-->>U: 返回内存块的切片视图
+ Note over U: 用户直接使用,无需拷贝
+ U->>U: 处理数据
+ U->>A: 回收内存块(可选)
+ Note over A: 内存块回到对象池等待复用
+```
+
+#### 2.3.2 内存块管理流程
+
+```mermaid
+graph TD
+ A[开始读取] --> B{buf 有数据?}
+ B -->|是| C[直接返回数据视图]
+ B -->|否| D[调用 feedData]
+ D --> E[Allocator.Read 申请内存]
+ E --> F{对象池有空闲块?}
+ F -->|是| G[复用现有内存块]
+ F -->|否| H[分配新内存块]
+ G --> I[从网络读取数据]
+ H --> I
+ I --> J[追加到 buf.Buffers]
+ J --> K[更新 Size 和 Length]
+ K --> C
+ C --> L[用户读取数据]
+ L --> M{数据已处理完?}
+ M -->|是| N[ClipFront 回收前面的块]
+ N --> O[Allocator.Free 归还对象池]
+ O --> P[结束]
+ M -->|否| A
+```
+
+### 2.4 核心实现分析
+
+#### 2.4.1 初始化和内存分配
+
+```go
+func NewBufReader(reader io.Reader) *BufReader {
+ return NewBufReaderWithBufLen(reader, defaultBufSize)
+}
+
+func NewBufReaderWithBufLen(reader io.Reader, bufLen int) *BufReader {
+ r := &BufReader{
+ Allocator: NewScalableMemoryAllocator(bufLen), // 创建分配器
+ BufLen: bufLen,
+ feedData: func() error {
+ // 关键:从分配器读取,直接填充到内存块
+ buf, err := r.Allocator.Read(reader, r.BufLen)
+ if err != nil {
+ return err
+ }
+ n := len(buf)
+ r.totalRead += n
+ // 直接追加内存块引用,无需拷贝
+ r.buf.Buffers = append(r.buf.Buffers, buf)
+ r.buf.Size += n
+ r.buf.Length += n
+ return nil
+ },
+ }
+ r.buf.Memory = &Memory{}
+ return r
+}
+```
+
+**零拷贝关键点**:
+- `Allocator.Read()` 直接从 `io.Reader` 读取到分配的内存块
+- 返回的 `buf` 是实际存储数据的内存块引用
+- `append(r.buf.Buffers, buf)` 只是追加引用,没有数据拷贝
+
+#### 2.4.2 读取操作
+
+```go
+func (r *BufReader) ReadByte() (b byte, err error) {
+ // 如果缓冲区为空,触发数据填充
+ for r.buf.Length == 0 {
+ if err = r.feedData(); err != nil {
+ return
+ }
+ }
+ // 从内存块链表中读取,无需拷贝
+ return r.buf.ReadByte()
+}
+
+func (r *BufReader) ReadRange(n int, yield func([]byte)) error {
+ for r.recycleFront(); n > 0 && err == nil; err = r.feedData() {
+ if r.buf.Length > 0 {
+ if r.buf.Length >= n {
+ // 直接传递内存块的切片视图,无需拷贝
+ r.buf.RangeN(n, yield)
+ return
+ }
+ n -= r.buf.Length
+ r.buf.Range(yield)
+ }
+ }
+ return
+}
+```
+
+**零拷贝体现**:
+- `yield` 回调函数接收的是内存块的切片视图
+- 用户代码直接操作原始内存块,没有中间拷贝
+- 读取完成后,已读取的块自动回收
+
+#### 2.4.3 内存回收
+
+```go
+func (r *BufReader) recycleFront() {
+ // 清理已读取的内存块
+ r.buf.ClipFront(r.Allocator.Free)
+}
+
+func (r *BufReader) Recycle() {
+ r.buf = MemoryReader{}
+ if r.Allocator != nil {
+ // 将所有内存块归还给分配器
+ r.Allocator.Recycle()
+ }
+ if r.Mouth != nil {
+ close(r.Mouth)
+ }
+}
+```
+
+### 2.5 与 bufio.Reader 的对比
+
+```mermaid
+graph LR
+ subgraph "bufio.Reader(多次拷贝)"
+ A1[网络] -->|系统调用| B1[内核缓冲区]
+ B1 -->|拷贝1| C1[bufio 缓冲区]
+ C1 -->|拷贝2| D1[用户切片]
+ D1 -->|拷贝3?| E1[应用层]
+ end
+
+ subgraph "BufReader(零拷贝)"
+ A2[网络] -->|系统调用| B2[内核缓冲区]
+ B2 -->|直接读取| C2[GoMem 内存块]
+ C2 -->|切片视图| D2[用户代码]
+ D2 -->|回收| C2
+ C2 -->|复用| C2
+ end
+```
+
+| 特性 | bufio.Reader | BufReader |
+|------|-------------|-----------|
+| 内存拷贝次数 | 2-3 次 | 0 次(切片视图) |
+| 缓冲区模式 | 固定大小单缓冲区 | 可变大小链式缓冲区 |
+| 内存分配 | 每次读取可能分配 | 对象池复用 |
+| 内存回收 | GC 自动回收 | 主动归还对象池 |
+| 大块数据处理 | 需要多次操作 | 单次追加到链表 |
+| GC 压力 | 高 | 极低 |
+
+## 3. 性能基准测试
+
+### 3.1 测试场景设计
+
+#### 3.1.1 真实网络模拟
+
+为了让基准测试更加贴近实际应用场景,我们实现了一个模拟真实网络行为的 `mockNetworkReader`。
+
+**真实网络的特性**:
+
+在真实的网络读取场景中,每次 `Read()` 调用返回的数据长度是**不确定**的,受多种因素影响:
+
+- TCP 接收窗口大小
+- 网络延迟和带宽
+- 操作系统缓冲区状态
+- 网络拥塞情况
+- 网络质量波动
+
+**模拟实现**:
+
+```go
+type mockNetworkReader struct {
+ data []byte
+ offset int
+ rng *rand.Rand
+ minChunk int // 最小块大小
+ maxChunk int // 最大块大小
+}
+
+func (m *mockNetworkReader) Read(p []byte) (n int, err error) {
+ // 每次返回 minChunk 到 maxChunk 之间的随机长度数据
+ chunkSize := m.minChunk + m.rng.Intn(m.maxChunk-m.minChunk+1)
+ n = copy(p[:chunkSize], m.data[m.offset:])
+ m.offset += n
+ return n, nil
+}
+```
+
+**不同网络状况模拟**:
+
+| 网络状况 | 数据块范围 | 实际场景 |
+|---------|-----------|---------|
+| 良好网络 | 1024-4096 字节 | 稳定的局域网、优质网络环境 |
+| 一般网络 | 256-2048 字节 | 普通互联网连接 |
+| 差网络 | 64-512 字节 | 高延迟、小 TCP 窗口 |
+| 极差网络 | 1-128 字节 | 移动网络、严重拥塞 |
+
+这种模拟让基准测试结果更加真实可靠。
+
+#### 3.1.2 测试场景列表
+
+我们聚焦以下核心场景:
+
+1. **并发网络连接读取** - 展示零分配特性
+2. **并发协议解析** - 模拟真实应用
+3. **GC 压力测试** - 展示长期运行优势 ⭐
+4. **流媒体服务器场景** - 真实业务场景 ⭐
+
+### 3.2 基准测试设计
+
+#### 核心测试场景
+
+基准测试聚焦于**并发网络场景**和**GC 压力**对比:
+
+**1. 并发网络连接读取**
+- 模拟 100+ 并发连接持续读取数据
+- 每次读取 1KB 数据包并处理
+- bufio: 每次分配新缓冲区(`make([]byte, 1024)`)
+- BufReader: 零拷贝处理(`ReadRange`)
+
+**2. 并发协议解析**
+- 模拟流媒体服务器解析协议包
+- 读取包头(4字节)+ 数据内容
+- 对比内存分配策略差异
+
+**3. GC 压力测试**(⭐ 核心)
+- 持续并发读取和处理
+- 统计 GC 次数、内存分配总量、分配次数
+- 展示长期运行下的差异
+
+**4. 流媒体服务器场景**(⭐ 真实应用)
+- 模拟 100 个并发流
+- 每个流读取并转发数据给订阅者
+- 真实应用场景完整对比
+
+#### 关键测试逻辑
+
+**并发读取**:
+```go
+// bufio.Reader - 每次分配
+buf := make([]byte, 1024) // 1KB 分配
+n, _ := reader.Read(buf)
+processData(buf[:n])
+
+// BufReader - 零拷贝
+reader.ReadRange(1024, func(data []byte) {
+ processData(data) // 直接使用,无分配
+})
+```
+
+**GC 统计**:
+```go
+// 记录 GC 统计
+var beforeGC, afterGC runtime.MemStats
+runtime.ReadMemStats(&beforeGC)
+
+b.RunParallel(func(pb *testing.PB) {
+ // 并发测试...
+})
+
+runtime.ReadMemStats(&afterGC)
+b.ReportMetric(float64(afterGC.NumGC-beforeGC.NumGC), "gc-runs")
+b.ReportMetric(float64(afterGC.TotalAlloc-beforeGC.TotalAlloc)/1024/1024, "MB-alloc")
+```
+
+完整测试代码见:`pkg/util/buf_reader_benchmark_test.go`
+
+### 3.3 运行基准测试
+
+我们提供了完整的基准测试代码(`pkg/util/buf_reader_benchmark_test.go`)和便捷的测试脚本。
+
+#### 方法一:使用测试脚本(推荐)
+
+```bash
+# 运行完整的基准测试套件
+sh scripts/benchmark_bufreader.sh
+```
+
+这个脚本会依次运行所有测试并输出友好的结果。
+
+#### 方法二:手动运行测试
+
+```bash
+cd pkg/util
+
+# 运行所有基准测试
+go test -bench=BenchmarkBuf -benchmem -benchtime=2s -test.run=xxx
+
+# 运行特定测试
+go test -bench=BenchmarkMemoryAllocation -benchmem -benchtime=2s -test.run=xxx
+
+# 对比测试结果(需要安装 benchstat)
+go test -bench=BenchmarkBufioReader_SmallChunks -benchmem -count=5 > bufio.txt
+go test -bench=BenchmarkBufReader_SmallChunks -benchmem -count=5 > bufreader.txt
+benchstat bufio.txt bufreader.txt
+```
+
+#### 方法三:只运行关键测试
+
+```bash
+cd pkg/util
+
+# 内存分配场景对比(核心优势)
+go test -bench=BenchmarkMemoryAllocation -benchmem -test.run=xxx
+
+# 协议解析场景对比(实际应用)
+go test -bench=BenchmarkProtocolParsing -benchmem -test.run=xxx
+```
+
+### 3.4 实际性能测试结果
+
+在 Apple M2 Pro 上运行基准测试的实际结果:
+
+**测试环境**:
+- CPU: Apple M2 Pro (12 核)
+- OS: macOS (darwin/arm64)
+- Go: 1.23.0
+
+#### 3.4.1 核心性能对比
+
+| 测试场景 | bufio.Reader | BufReader | 差异 |
+|---------|-------------|-----------|------|
+| **并发网络读取** | 103.2 ns/op
1027 B/op, 1 allocs | 147.6 ns/op
4 B/op, 0 allocs | 零分配 ⭐ |
+| **GC 压力测试** | 1874 ns/op
5,576,659 mallocs
3 gc-runs | 112.7 ns/op
3,918 mallocs
2 gc-runs | **16.6x 快** ⭐⭐⭐ |
+| **流媒体服务器** | 374.6 ns/op
79,508 MB-alloc
134 gc-runs | 30.29 ns/op
601 MB-alloc
2 gc-runs | **12.4x 快** ⭐⭐⭐ |
+
+#### 3.4.2 GC 压力对比(核心发现)
+
+**GC 压力测试**结果最能体现长期运行的差异:
+
+**bufio.Reader**:
+```
+操作延迟: 1874 ns/op
+内存分配次数: 5,576,659 次(超过 500 万次!)
+GC 次数: 3 次
+每次操作: 2 allocs/op
+```
+
+**BufReader**:
+```
+操作延迟: 112.7 ns/op (快 16.6 倍)
+内存分配次数: 3,918 次(减少 99.93%)
+GC 次数: 2 次
+每次操作: 0 allocs/op(零分配!)
+```
+
+**关键指标**:
+- 🚀 **吞吐量提升 16 倍**:45.7M ops/s vs 2.8M ops/s
+- ⭐ **内存分配减少 99.93%**:从 557 万次降至 3918 次
+- ✨ **零分配操作**:0 allocs/op vs 2 allocs/op
+
+#### 3.4.3 流媒体服务器场景(真实应用)
+
+模拟 100 个并发流,持续读取和转发数据:
+
+**bufio.Reader**:
+```
+操作延迟: 374.6 ns/op
+内存分配: 79,508 MB(79 GB!)
+GC 次数: 134 次
+每次操作: 4 allocs/op
+```
+
+**BufReader**:
+```
+操作延迟: 30.29 ns/op(快 12.4 倍)
+内存分配: 601 MB(减少 99.2%)
+GC 次数: 2 次(减少 98.5%!)
+每次操作: 0 allocs/op
+```
+
+**惊人的差异**:
+- 🎯 **GC 次数:134 次 → 2 次**(减少 98.5%)
+- 💾 **内存分配:79 GB → 0.6 GB**(减少 132 倍)
+- ⚡ **吞吐量:10.1M → 117M ops/s**(提升 11.6 倍)
+
+#### 3.4.4 长期运行的影响
+
+在流媒体服务器场景下,**1 小时运行**的预估对比:
+
+**bufio.Reader**:
+```
+预计内存分配:~2.8 TB
+预计 GC 次数:~4,800 次
+GC 停顿累计:显著
+```
+
+**BufReader**:
+```
+预计内存分配:~21 GB(减少 133 倍)
+预计 GC 次数:~72 次(减少 67 倍)
+GC 停顿累计:极小
+```
+
+**使用建议**:
+
+| 场景 | 推荐使用 | 原因 |
+|------|---------|------|
+| 简单文件读取 | bufio.Reader | 标准库足够 |
+| **高并发网络服务器** | **BufReader** ⭐ | **GC 次数减少 98%** |
+| **流媒体数据处理** | **BufReader** ⭐ | **零分配,高吞吐** |
+| **长期运行服务** | **BufReader** ⭐ | **系统更稳定** |
+
+#### 3.4.5 性能提升的本质原因
+
+虽然在某些简单场景下 bufio.Reader 更快,但 BufReader 的设计目标不是在所有场景下都比 bufio.Reader 快,而是:
+
+1. **消除内存分配** - 在实际应用中避免频繁的 `make([]byte, n)`
+2. **降低 GC 压力** - 通过对象池复用内存,减少垃圾回收负担
+3. **零拷贝处理** - 提供 `ReadRange` API 直接操作原始数据
+4. **链式缓冲** - 支持复杂的数据处理模式
+
+在 **Monibuca 流媒体服务器** 这样的场景下,这些特性带来的价值远超过微秒级的操作延迟差异。
+
+**实际影响**:在处理 1000 个并发流媒体连接时:
+
+```go
+// bufio.Reader 方案
+// 每秒 1000 连接 × 30fps × 1024 字节/包 = 30,720,000 次分配
+// 每次分配 1024 字节 = 约 30GB/秒 的临时内存分配
+// 触发大量 GC
+
+// BufReader 方案
+// 0 次分配(内存复用)
+// GC 压力降低 90%+
+// 系统稳定性显著提升
+```
+
+**选择建议**:
+
+- 📁 **简单文件读取** → bufio.Reader
+- 🔄 **高并发网络服务** → BufReader(GC 减少 98%)
+- 💾 **长期运行服务** → BufReader(零分配)
+- 🎯 **流媒体服务器** → BufReader(吞吐量提升 10-20x)
+
+## 4. 实际应用场景
+
+### 4.1 RTSP 协议解析
+
+```go
+// 使用 BufReader 解析 RTSP 请求
+func parseRTSPRequest(conn net.Conn) (*RTSPRequest, error) {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ // 读取请求行:零拷贝,无内存分配
+ requestLine, err := reader.ReadLine()
+ if err != nil {
+ return nil, err
+ }
+
+ // 读取头部:直接操作内存块
+ headers, err := reader.ReadMIMEHeader()
+ if err != nil {
+ return nil, err
+ }
+
+ // 读取 body(如果有)
+ if contentLength := headers.Get("Content-Length"); contentLength != "" {
+ length, _ := strconv.Atoi(contentLength)
+ // ReadRange 提供零拷贝的数据访问
+ var body []byte
+ err = reader.ReadRange(length, func(chunk []byte) {
+ body = append(body, chunk...)
+ })
+ }
+
+ return &RTSPRequest{
+ RequestLine: requestLine,
+ Headers: headers,
+ }, nil
+}
+```
+
+### 4.2 流媒体数据包解析
+
+```go
+// 使用 BufReader 解析 FLV 数据包
+func parseFLVPackets(conn net.Conn) error {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ for {
+ // 读取包头:4 字节
+ packetType, err := reader.ReadByte()
+ if err != nil {
+ return err
+ }
+
+ // 读取数据大小:3 字节大端序
+ dataSize, err := reader.ReadBE32(3)
+ if err != nil {
+ return err
+ }
+
+ // 读取时间戳:4 字节
+ timestamp, err := reader.ReadBE32(4)
+ if err != nil {
+ return err
+ }
+
+ // 跳过 StreamID:3 字节
+ if err := reader.Skip(3); err != nil {
+ return err
+ }
+
+ // 读取实际数据:零拷贝处理
+ err = reader.ReadRange(int(dataSize), func(data []byte) {
+ // 直接处理数据,无需拷贝
+ processPacket(packetType, timestamp, data)
+ })
+ if err != nil {
+ return err
+ }
+
+ // 跳过 previous tag size
+ if err := reader.Skip(4); err != nil {
+ return err
+ }
+ }
+}
+```
+
+### 4.3 性能关键场景
+
+BufReader 特别适合以下场景:
+
+1. **高频小包处理**:网络协议解析,RTP/RTCP 包处理
+2. **大数据流传输**:视频流、音频流的连续读取
+3. **协议多次读取**:需要分步骤读取不同长度数据的协议
+4. **低延迟要求**:实时流媒体传输,在线游戏
+5. **高并发场景**:大量并发连接的服务器
+
+## 5. 最佳实践
+
+### 5.1 正确使用模式
+
+```go
+// ✅ 正确:创建时指定合适的块大小
+func goodExample(conn net.Conn) {
+ // 根据实际数据包大小选择块大小
+ reader := util.NewBufReaderWithBufLen(conn, 16384) // 16KB 块
+ defer reader.Recycle() // 确保资源回收
+
+ // 使用 ReadRange 实现零拷贝
+ reader.ReadRange(1024, func(data []byte) {
+ // 直接处理,不要持有 data 的引用
+ process(data)
+ })
+}
+
+// ❌ 错误:忘记回收资源
+func badExample1(conn net.Conn) {
+ reader := util.NewBufReader(conn)
+ // 缺少 defer reader.Recycle()
+ // 导致内存块无法归还对象池
+}
+
+// ❌ 错误:持有数据引用
+var globalData []byte
+
+func badExample2(conn net.Conn) {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ reader.ReadRange(1024, func(data []byte) {
+ // ❌ 错误:data 会在 Recycle 后被回收
+ globalData = data // 悬空引用
+ })
+}
+
+// ✅ 正确:需要保留数据时进行拷贝
+func goodExample2(conn net.Conn) {
+ reader := util.NewBufReader(conn)
+ defer reader.Recycle()
+
+ var saved []byte
+ reader.ReadRange(1024, func(data []byte) {
+ // 需要保留时显式拷贝
+ saved = make([]byte, len(data))
+ copy(saved, data)
+ })
+ // 现在可以安全使用 saved
+}
+```
+
+### 5.2 块大小选择
+
+```go
+// 根据场景选择合适的块大小
+const (
+ // 小包协议(如 RTSP, HTTP 头)
+ SmallPacketSize = 4 << 10 // 4KB
+
+ // 中等数据流(如音频)
+ MediumPacketSize = 16 << 10 // 16KB
+
+ // 大数据流(如视频)
+ LargePacketSize = 64 << 10 // 64KB
+)
+
+func createReaderForProtocol(conn net.Conn, protocol string) *util.BufReader {
+ var bufSize int
+ switch protocol {
+ case "rtsp", "http":
+ bufSize = SmallPacketSize
+ case "audio":
+ bufSize = MediumPacketSize
+ case "video":
+ bufSize = LargePacketSize
+ default:
+ bufSize = util.defaultBufSize
+ }
+ return util.NewBufReaderWithBufLen(conn, bufSize)
+}
+```
+
+### 5.3 错误处理
+
+```go
+func robustRead(conn net.Conn) error {
+ reader := util.NewBufReader(conn)
+ defer func() {
+ // 确保在任何情况下都回收资源
+ reader.Recycle()
+ }()
+
+ // 设置超时
+ conn.SetReadDeadline(time.Now().Add(5 * time.Second))
+
+ // 读取数据
+ data, err := reader.ReadBytes(1024)
+ if err != nil {
+ if err == io.EOF {
+ // 正常结束
+ return nil
+ }
+ // 处理其他错误
+ return fmt.Errorf("read error: %w", err)
+ }
+
+ // 处理数据
+ processData(data)
+ return nil
+}
+```
+
+## 6. 性能优化技巧
+
+### 6.1 批量处理
+
+```go
+// ✅ 优化:批量读取和处理
+func optimizedBatchRead(reader *util.BufReader) error {
+ // 一次性读取大块数据
+ return reader.ReadRange(65536, func(chunk []byte) {
+ // 在回调中批量处理
+ for len(chunk) > 0 {
+ packetSize := int(binary.BigEndian.Uint32(chunk[:4]))
+ packet := chunk[4 : 4+packetSize]
+ processPacket(packet)
+ chunk = chunk[4+packetSize:]
+ }
+ })
+}
+
+// ❌ 低效:逐个读取
+func inefficientRead(reader *util.BufReader) error {
+ for {
+ size, err := reader.ReadBE32(4)
+ if err != nil {
+ return err
+ }
+ packet, err := reader.ReadBytes(int(size))
+ if err != nil {
+ return err
+ }
+ processPacket(packet.Buffers[0])
+ }
+}
+```
+
+### 6.2 避免不必要的拷贝
+
+```go
+// ✅ 优化:直接处理,无拷贝
+func zeroCopyProcess(reader *util.BufReader) error {
+ return reader.ReadRange(4096, func(data []byte) {
+ // 直接在原始内存上操作
+ sum := 0
+ for _, b := range data {
+ sum += int(b)
+ }
+ reportChecksum(sum)
+ })
+}
+
+// ❌ 低效:不必要的拷贝
+func unnecessaryCopy(reader *util.BufReader) error {
+ mem, err := reader.ReadBytes(4096)
+ if err != nil {
+ return err
+ }
+ // 又进行了一次拷贝
+ data := make([]byte, mem.Size)
+ copy(data, mem.Buffers[0])
+
+ sum := 0
+ for _, b := range data {
+ sum += int(b)
+ }
+ reportChecksum(sum)
+ return nil
+}
+```
+
+### 6.3 合理的资源管理
+
+```go
+// ✅ 优化:使用对象池管理 BufReader
+type ConnectionPool struct {
+ readers sync.Pool
+}
+
+func (p *ConnectionPool) GetReader(conn net.Conn) *util.BufReader {
+ if reader := p.readers.Get(); reader != nil {
+ r := reader.(*util.BufReader)
+ // 重新初始化
+ return r
+ }
+ return util.NewBufReader(conn)
+}
+
+func (p *ConnectionPool) PutReader(reader *util.BufReader) {
+ reader.Recycle() // 回收内存块
+ p.readers.Put(reader) // 回收 BufReader 对象本身
+}
+
+// 使用连接池
+func handleConnection(pool *ConnectionPool, conn net.Conn) {
+ reader := pool.GetReader(conn)
+ defer pool.PutReader(reader)
+
+ // 处理连接
+ processConnection(reader)
+}
+```
+
+## 7. 总结
+
+### 7.1 性能对比可视化
+
+基于实际基准测试结果(并发场景):
+
+```
+📊 GC 次数对比(核心优势)⭐⭐⭐
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+bufio.Reader ████████████████████████████████████████████████████████████████ 134 次
+BufReader █ 2 次 ← 减少 98.5%!
+
+📊 内存分配总量对比
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+bufio.Reader ████████████████████████████████████████████████████████████████ 79 GB
+BufReader █ 0.6 GB ← 减少 99.2%!
+
+📊 操作吞吐量对比
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+bufio.Reader █████ 10.1M ops/s
+BufReader ████████████████████████████████████████████████████████ 117M ops/s ← 11.6x!
+```
+
+**关键指标**(流媒体服务器场景):
+- 🎯 **GC 次数**:从 134 次降至 2 次(减少 98.5%)
+- 💾 **内存分配**:从 79 GB 降至 0.6 GB(减少 132 倍)
+- ⚡ **吞吐量**:提升 11.6 倍
+
+### 7.2 核心优势
+
+BufReader 通过以下设计实现了零拷贝的高性能网络数据读取:
+
+1. **零拷贝架构**
+ - 数据直接从网络读取到最终内存位置
+ - 使用切片视图避免数据拷贝
+ - 链式缓冲区支持大块数据处理
+
+2. **内存复用机制**
+ - GoMem 对象池复用内存块
+ - 主动内存管理减少 GC 压力
+ - 可配置的块大小适应不同场景
+
+3. **性能提升显著**(在并发场景下)
+ - GC 次数减少 98.5%(134 → 2)
+ - 内存分配减少 99.2%(79 GB → 0.6 GB)
+ - 吞吐量提升 10-20 倍
+ - 系统稳定性显著提升
+
+### 7.3 适用场景
+
+BufReader 特别适合:
+
+- ✅ 高性能网络服务器
+- ✅ 流媒体数据处理
+- ✅ 实时协议解析
+- ✅ 大数据流传输
+- ✅ 低延迟要求场景
+- ✅ 高并发环境
+
+不适合:
+
+- ❌ 简单的文件读取(标准库足够)
+- ❌ 单次小数据读取
+- ❌ 不关心性能的场景
+
+### 7.4 与 bufio.Reader 的选择
+
+| 场景 | 推荐使用 |
+|------|---------|
+| 简单文件读取 | bufio.Reader |
+| 低频次网络读取 | bufio.Reader |
+| 高性能网络服务器 | BufReader |
+| 流媒体处理 | BufReader |
+| 协议解析器 | BufReader |
+| 需要零拷贝 | BufReader |
+| 内存敏感场景 | BufReader |
+
+### 7.5 关键要点
+
+使用 BufReader 时记住:
+
+1. **始终调用 Recycle()**:确保内存块归还对象池
+2. **不要持有数据引用**:ReadRange 回调中的数据会被回收
+3. **选择合适的块大小**:根据实际数据包大小调整
+4. **利用 ReadRange**:实现真正的零拷贝处理
+5. **配合 GoMem 使用**:充分发挥内存复用优势
+
+通过 BufReader 和 GoMem 的配合,Monibuca 实现了高性能的网络数据处理,为流媒体服务器提供了坚实的基础设施支持。
+
+## 参考资料
+
+- [GoMem 项目](https://github.com/langhuihui/gomem)
+- [Monibuca v5 文档](https://monibuca.com)
+- [对象复用技术详解](./arch/reuse.md)
+- Go 标准库 `bufio` 包源码
+- Go 标准库 `sync.Pool` 文档
+
diff --git a/pkg/util/buf_reader_benchmark_test.go b/pkg/util/buf_reader_benchmark_test.go
new file mode 100644
index 0000000..c97e726
--- /dev/null
+++ b/pkg/util/buf_reader_benchmark_test.go
@@ -0,0 +1,408 @@
+package util
+
+import (
+ "bufio"
+ "io"
+ "math/rand"
+ "runtime"
+ "testing"
+)
+
+// mockNetworkReader 模拟真实网络数据源
+//
+// 真实的网络读取场景中,每次 Read() 调用返回的数据长度是不确定的,
+// 受多种因素影响:
+// - TCP 接收窗口大小
+// - 网络延迟和带宽
+// - 操作系统缓冲区状态
+// - 网络拥塞情况
+//
+// 这个 mock reader 通过每次返回随机长度的数据来模拟真实网络行为,
+// 使基准测试更加接近实际应用场景。
+type mockNetworkReader struct {
+ data []byte
+ offset int
+ rng *rand.Rand
+ // minChunk 和 maxChunk 控制每次返回的数据块大小范围
+ minChunk int
+ maxChunk int
+}
+
+func (m *mockNetworkReader) Read(p []byte) (n int, err error) {
+ if m.offset >= len(m.data) {
+ m.offset = 0 // 循环读取
+ }
+
+ // 计算本次可以返回的最大长度
+ remaining := len(m.data) - m.offset
+ maxRead := len(p)
+ if remaining < maxRead {
+ maxRead = remaining
+ }
+
+ // 随机返回 minChunk 到 min(maxChunk, maxRead) 之间的数据
+ chunkSize := m.minChunk
+ if m.maxChunk > m.minChunk && maxRead > m.minChunk {
+ maxPossible := m.maxChunk
+ if maxRead < maxPossible {
+ maxPossible = maxRead
+ }
+ chunkSize = m.minChunk + m.rng.Intn(maxPossible-m.minChunk+1)
+ }
+ if chunkSize > maxRead {
+ chunkSize = maxRead
+ }
+
+ n = copy(p[:chunkSize], m.data[m.offset:m.offset+chunkSize])
+ m.offset += n
+ return n, nil
+}
+
+// newMockNetworkReader 创建一个模拟真实网络的 reader
+// 每次 Read 返回随机长度的数据(在 minChunk 到 maxChunk 之间)
+func newMockNetworkReader(size int, minChunk, maxChunk int) *mockNetworkReader {
+ data := make([]byte, size)
+ for i := range data {
+ data[i] = byte(i % 256)
+ }
+ return &mockNetworkReader{
+ data: data,
+ rng: rand.New(rand.NewSource(42)), // 固定种子保证可重复性
+ minChunk: minChunk,
+ maxChunk: maxChunk,
+ }
+}
+
+// newMockNetworkReaderDefault 创建默认配置的模拟网络 reader
+// 每次返回 64 到 2048 字节之间的随机数据
+func newMockNetworkReaderDefault(size int) *mockNetworkReader {
+ return newMockNetworkReader(size, 64, 2048)
+}
+
+// ============================================================
+// 单元测试:验证 mockNetworkReader 的行为
+// ============================================================
+
+// TestMockNetworkReader_RandomChunks 验证随机长度读取功能
+func TestMockNetworkReader_RandomChunks(t *testing.T) {
+ reader := newMockNetworkReader(10000, 100, 500)
+ buf := make([]byte, 1000)
+
+ // 读取多次,验证每次返回的长度在预期范围内
+ for i := 0; i < 10; i++ {
+ n, err := reader.Read(buf)
+ if err != nil {
+ t.Fatalf("读取失败: %v", err)
+ }
+ if n < 100 || n > 500 {
+ t.Errorf("第 %d 次读取返回 %d 字节,期望在 [100, 500] 范围内", i, n)
+ }
+ }
+}
+
+// ============================================================
+// 核心基准测试:模拟真实网络场景
+// ============================================================
+
+// BenchmarkConcurrentNetworkRead_Bufio 模拟并发网络连接处理 - bufio.Reader
+// 这个测试模拟多个并发连接持续读取和处理网络数据
+// bufio.Reader 会为每个数据包分配新的缓冲区,产生大量临时内存
+func BenchmarkConcurrentNetworkRead_Bufio(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ // 每个 goroutine 代表一个网络连接
+ reader := bufio.NewReaderSize(newMockNetworkReaderDefault(10*1024*1024), 4096)
+
+ for pb.Next() {
+ // 模拟读取网络数据包并处理
+ // 这里每次都分配新的缓冲区(真实场景中的常见做法)
+ buf := make([]byte, 1024) // 每次分配 1KB - 会产生 GC 压力
+ n, err := reader.Read(buf)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ // 模拟处理数据(计算校验和)
+ var sum int
+ for i := 0; i < n; i++ {
+ sum += int(buf[i])
+ }
+ _ = sum
+ }
+ })
+}
+
+// BenchmarkConcurrentNetworkRead_BufReader 模拟并发网络连接处理 - BufReader
+// 使用 BufReader 的零拷贝特性,通过内存池复用避免频繁分配
+func BenchmarkConcurrentNetworkRead_BufReader(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ // 每个 goroutine 代表一个网络连接
+ reader := NewBufReader(newMockNetworkReaderDefault(10 * 1024 * 1024))
+ defer reader.Recycle()
+
+ for pb.Next() {
+ // 使用零拷贝的 ReadRange,无需分配缓冲区
+ var sum int
+ err := reader.ReadRange(1024, func(data []byte) {
+ // 直接处理原始数据,无内存分配
+ for _, b := range data {
+ sum += int(b)
+ }
+ })
+ if err != nil {
+ b.Fatal(err)
+ }
+ _ = sum
+ }
+ })
+}
+
+// BenchmarkConcurrentProtocolParsing_Bufio 模拟并发协议解析 - bufio.Reader
+// 模拟流媒体服务器解析多个并发流的数据包
+func BenchmarkConcurrentProtocolParsing_Bufio(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ reader := bufio.NewReaderSize(newMockNetworkReaderDefault(10*1024*1024), 4096)
+
+ for pb.Next() {
+ // 读取包头(4字节长度)
+ header := make([]byte, 4) // 分配 1
+ _, err := io.ReadFull(reader, header)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ // 计算数据包大小(256-1024 字节)
+ size := 256 + int(header[3])%768
+
+ // 读取数据包内容
+ packet := make([]byte, size) // 分配 2
+ _, err = io.ReadFull(reader, packet)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ // 模拟处理数据包
+ _ = packet
+ }
+ })
+}
+
+// BenchmarkConcurrentProtocolParsing_BufReader 模拟并发协议解析 - BufReader
+func BenchmarkConcurrentProtocolParsing_BufReader(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ reader := NewBufReader(newMockNetworkReaderDefault(10 * 1024 * 1024))
+ defer reader.Recycle()
+
+ for pb.Next() {
+ // 读取包头
+ size, err := reader.ReadBE32(4)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ // 计算数据包大小
+ packetSize := 256 + int(size)%768
+
+ // 零拷贝读取和处理
+ err = reader.ReadRange(packetSize, func(data []byte) {
+ // 直接处理,无需分配
+ _ = data
+ })
+ if err != nil {
+ b.Fatal(err)
+ }
+ }
+ })
+}
+
+// BenchmarkHighFrequencyReads_Bufio 高频小包读取 - bufio.Reader
+// 模拟视频流的高频小包场景(如 30fps 视频流)
+func BenchmarkHighFrequencyReads_Bufio(b *testing.B) {
+ reader := bufio.NewReaderSize(newMockNetworkReaderDefault(10*1024*1024), 4096)
+
+ b.ResetTimer()
+ b.ReportAllocs()
+
+ for i := 0; i < b.N; i++ {
+ // 每次读取小数据包(128 字节)
+ buf := make([]byte, 128) // 频繁分配小对象
+ _, err := reader.Read(buf)
+ if err != nil {
+ b.Fatal(err)
+ }
+ _ = buf
+ }
+}
+
+// BenchmarkHighFrequencyReads_BufReader 高频小包读取 - BufReader
+func BenchmarkHighFrequencyReads_BufReader(b *testing.B) {
+ reader := NewBufReader(newMockNetworkReaderDefault(10 * 1024 * 1024))
+ defer reader.Recycle()
+
+ b.ResetTimer()
+ b.ReportAllocs()
+
+ for i := 0; i < b.N; i++ {
+ // 零拷贝读取
+ err := reader.ReadRange(128, func(data []byte) {
+ _ = data
+ })
+ if err != nil {
+ b.Fatal(err)
+ }
+ }
+}
+
+// ============================================================
+// GC 压力测试:展示长时间运行下的 GC 影响
+// ============================================================
+
+// BenchmarkGCPressure_Bufio 展示 bufio.Reader 在持续运行下的 GC 压力
+// 这个测试会产生大量临时内存分配,触发频繁 GC
+func BenchmarkGCPressure_Bufio(b *testing.B) {
+ var beforeGC runtime.MemStats
+ runtime.ReadMemStats(&beforeGC)
+
+ // 模拟 10 个并发连接持续处理数据
+ b.SetParallelism(10)
+ b.RunParallel(func(pb *testing.PB) {
+ reader := bufio.NewReaderSize(newMockNetworkReaderDefault(100*1024*1024), 4096)
+
+ for pb.Next() {
+ // 模拟处理一个数据包:读取 + 处理 + 临时分配
+ buf := make([]byte, 512) // 每次分配 512 字节
+ n, err := reader.Read(buf)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ // 模拟数据处理(可能需要额外分配)
+ processed := make([]byte, n) // 再分配一次
+ copy(processed, buf[:n])
+
+ // 模拟业务处理
+ var sum int64
+ for _, v := range processed {
+ sum += int64(v)
+ }
+ _ = sum
+ }
+ })
+
+ var afterGC runtime.MemStats
+ runtime.ReadMemStats(&afterGC)
+
+ // 报告 GC 统计
+ b.ReportMetric(float64(afterGC.NumGC-beforeGC.NumGC), "gc-runs")
+ b.ReportMetric(float64(afterGC.TotalAlloc-beforeGC.TotalAlloc)/1024/1024, "MB-alloc")
+ b.ReportMetric(float64(afterGC.Mallocs-beforeGC.Mallocs), "mallocs")
+}
+
+// BenchmarkGCPressure_BufReader 展示 BufReader 通过内存复用降低 GC 压力
+// 零拷贝 + 内存池复用,几乎不产生临时对象
+func BenchmarkGCPressure_BufReader(b *testing.B) {
+ var beforeGC runtime.MemStats
+ runtime.ReadMemStats(&beforeGC)
+
+ b.SetParallelism(10)
+ b.RunParallel(func(pb *testing.PB) {
+ reader := NewBufReader(newMockNetworkReaderDefault(100 * 1024 * 1024))
+ defer reader.Recycle()
+
+ for pb.Next() {
+ // 零拷贝处理,无临时分配
+ var sum int64
+ err := reader.ReadRange(512, func(data []byte) {
+ // 直接在原始内存上处理,无需拷贝
+ for _, v := range data {
+ sum += int64(v)
+ }
+ })
+ if err != nil {
+ b.Fatal(err)
+ }
+ _ = sum
+ }
+ })
+
+ var afterGC runtime.MemStats
+ runtime.ReadMemStats(&afterGC)
+
+ // 报告 GC 统计
+ b.ReportMetric(float64(afterGC.NumGC-beforeGC.NumGC), "gc-runs")
+ b.ReportMetric(float64(afterGC.TotalAlloc-beforeGC.TotalAlloc)/1024/1024, "MB-alloc")
+ b.ReportMetric(float64(afterGC.Mallocs-beforeGC.Mallocs), "mallocs")
+}
+
+// BenchmarkStreamingServer_Bufio 模拟流媒体服务器场景 - bufio.Reader
+// 100 个并发连接,每个连接持续读取和转发数据
+func BenchmarkStreamingServer_Bufio(b *testing.B) {
+ var beforeGC runtime.MemStats
+ runtime.ReadMemStats(&beforeGC)
+
+ b.RunParallel(func(pb *testing.PB) {
+ reader := bufio.NewReaderSize(newMockNetworkReaderDefault(50*1024*1024), 8192)
+ frameNum := 0
+
+ for pb.Next() {
+ // 读取一帧数据(1KB-4KB 之间变化)
+ frameSize := 1024 + (frameNum%3)*1024
+ frameNum++
+ frame := make([]byte, frameSize)
+
+ _, err := io.ReadFull(reader, frame)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ // 模拟转发给多个订阅者(需要拷贝)
+ for i := 0; i < 3; i++ {
+ subscriber := make([]byte, len(frame))
+ copy(subscriber, frame)
+ _ = subscriber
+ }
+ }
+ })
+
+ var afterGC runtime.MemStats
+ runtime.ReadMemStats(&afterGC)
+
+ gcRuns := afterGC.NumGC - beforeGC.NumGC
+ totalAlloc := float64(afterGC.TotalAlloc-beforeGC.TotalAlloc) / 1024 / 1024
+
+ b.ReportMetric(float64(gcRuns), "gc-runs")
+ b.ReportMetric(totalAlloc, "MB-alloc")
+}
+
+// BenchmarkStreamingServer_BufReader 模拟流媒体服务器场景 - BufReader
+func BenchmarkStreamingServer_BufReader(b *testing.B) {
+ var beforeGC runtime.MemStats
+ runtime.ReadMemStats(&beforeGC)
+
+ b.RunParallel(func(pb *testing.PB) {
+ reader := NewBufReader(newMockNetworkReaderDefault(50 * 1024 * 1024))
+ defer reader.Recycle()
+
+ for pb.Next() {
+ // 零拷贝读取
+ err := reader.ReadRange(1024+1024, func(frame []byte) {
+ // 直接使用原始数据,无需拷贝
+ // 模拟转发(实际可以使用引用计数或共享内存)
+ for i := 0; i < 3; i++ {
+ _ = frame
+ }
+ })
+ if err != nil {
+ b.Fatal(err)
+ }
+ }
+ })
+
+ var afterGC runtime.MemStats
+ runtime.ReadMemStats(&afterGC)
+
+ gcRuns := afterGC.NumGC - beforeGC.NumGC
+ totalAlloc := float64(afterGC.TotalAlloc-beforeGC.TotalAlloc) / 1024 / 1024
+
+ b.ReportMetric(float64(gcRuns), "gc-runs")
+ b.ReportMetric(totalAlloc, "MB-alloc")
+}