golib/ioutils/delim/doc.go
nabbar 96ed6f9a1f [Package IOUtils/Delim]
- FIX: potential CWE-400 with bufio ReadBytes & ReadSlice, used without
  a bounded read buffer
- ADD: test to check overflow buffer with discard or error
- REFACTOR: all buffering package, parsing process
- UPDATE: doc, examples, test following changes
- OPTIMIZE: rework code to optimize process
- REWORK: benchmark to check benefit of optimization
- FIX: wording error

Package IOUtils/Multi:
- REWORK: re-design all package to allow sequential/parallel mode
- UPDATE: package with adaptive mode to allow switching automatically between
  sequential and parallel mode following measurement of samples
- OPTIMIZE: code to maximize bandwidth and reduce write time
- UPDATE: documentation, test and comments
- REWORK: testing organization and benchmark aggregation

Package HttpServer:
- FIX: bug with dial addr rewrite for healthcheck & testing PortUse

Package Logger/HookFile:
- FIX: bug with race condition on aggregator counter file

Other:
- Bump dependencies
- FIX: format / import file
2025-12-21 16:56:13 +01:00

/*
* MIT License
*
* Copyright (c) 2025 Nicolas JUHEL
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*
*/
/*
Package delim provides a buffered reader for reading delimiter-separated data streams.
# Overview
The delim package wraps an io.ReadCloser with a buffered reader that efficiently
processes data separated by a configurable delimiter character. Unlike bufio.Scanner,
which splits on newlines by default and needs a custom split function for anything
else, delim accepts any rune as the delimiter and provides more control over the
reading process.
# Key Features
- Custom delimiter support (newlines, commas, pipes, tabs, null bytes, Unicode characters)
- Configurable buffer sizes for performance optimization
- Standard Go interfaces (io.ReadCloser, io.WriterTo)
- Zero-copy operations where possible
- Memory-efficient chunk processing
- Not safe for concurrent use: dedicate one instance to each goroutine
# Architecture
The package follows a layered architecture to provide efficient, buffered reading with custom delimiters.
	┌────────────────────────────────────────────────┐
	│             BufferDelim Interface              │
	│  io.ReadCloser + io.WriterTo + Custom Methods  │
	└──────────┬─────────────────────────────────────┘
	┌──────────▼────────────────────────┐
	│        dlm Implementation         │
	│                                   │
	│  ┌──────────────────────────────┐ │
	│  │    io.ReadCloser (source)    │ │
	│  └──────────────────────────────┘ │
	│                 │                 │
	│                 ▼                 │
	│  ┌──────────────────────────────┐ │
	│  │       Internal Buffer        │ │
	│  └──────────────────────────────┘ │
	│                 │                 │
	│                 ▼                 │
	│        delimiter detection        │
	│                 │                 │
	│                 ▼                 │
	│         chunk extraction          │
	└───────────────────────────────────┘
Component Characteristics:
	BufferDelim:   O(1) memory, simple API, one goroutine per instance
	dlm:           O(1) memory, internal logic, one goroutine per instance
	DiscardCloser: O(1) memory, minimal logic, always thread-safe
# Basic Usage
The primary type is BufferDelim, created using the New function:
	file, err := os.Open("data.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// Create a BufferDelim with newline delimiter and default buffer
	bd := delim.New(file, '\n', 0)
	defer bd.Close()
	// Read line by line
	for {
		line, err := bd.ReadBytes()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("Line: %s", line) // line includes '\n'
	}
# Reading Methods
BufferDelim provides several methods for reading data:
Read(p []byte) - Reads one delimited chunk into the provided buffer
	buf := make([]byte, 1024)
	n, err := bd.Read(buf)
	data := buf[:n]
ReadBytes() - Returns the next delimited chunk as a byte slice
	chunk, err := bd.ReadBytes()
	// chunk includes the delimiter
UnRead() - Returns buffered but unread data (consumes buffer)
	buffered, err := bd.UnRead()
	// The returned data is removed from the buffer, so the next Read will not see it
# Writing Methods
BufferDelim also implements io.WriterTo for efficient copying:
	outputFile, err := os.Create("output.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer outputFile.Close()
	written, err := bd.WriteTo(outputFile)
	fmt.Printf("Copied %d bytes\n", written) // written is an int64
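Why io.WriterTo matters for copying: the standard library's io.Copy calls
src.WriteTo(dst) when the source implements io.WriterTo, instead of looping
over Read. The sketch below shows that delegation with standard library types
only. countingSource and copyAll are hypothetical names made up for this
illustration; the sketch does not show how BufferDelim implements WriteTo.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingSource is a hypothetical reader used only for this sketch. It
// records how often WriteTo is called so the delegation by io.Copy is visible.
type countingSource struct {
	r            *strings.Reader
	writeToCalls int
}

// Read satisfies io.Reader by reading from the wrapped strings.Reader.
func (s *countingSource) Read(p []byte) (int, error) {
	return s.r.Read(p)
}

// WriteTo satisfies io.WriterTo; io.Copy prefers it over repeated Read calls.
func (s *countingSource) WriteTo(w io.Writer) (int64, error) {
	s.writeToCalls++
	return s.r.WriteTo(w)
}

// copyAll copies input through io.Copy and reports the copied text, the byte
// count, and how many times WriteTo was invoked.
func copyAll(input string) (string, int64, int, error) {
	src := &countingSource{r: strings.NewReader(input)}
	var dst strings.Builder
	n, err := io.Copy(&dst, src)
	return dst.String(), n, src.writeToCalls, err
}

func main() {
	out, n, calls, err := copyAll("a|b|c|")
	fmt.Println(out, n, calls, err)
}
```

Passing a value that implements io.WriterTo to io.Copy therefore has the same
effect as calling WriteTo directly.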
# Common Use Cases
CSV/TSV Processing:
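	// Read comma-separated fields (plain splitting; quoted fields are not interpreted)
	csvFile, err := os.Open("data.csv")
	if err != nil {
		log.Fatal(err)
	}
	bd := delim.New(csvFile, ',', 0)
	defer bd.Close()
	for {
		field, err := bd.ReadBytes()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		processField(field) // field includes the trailing ','
	}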
Log File Processing:
	// Process log entries separated by newlines with larger buffer
	logFile, err := os.Open("app.log")
	if err != nil {
		log.Fatal(err)
	}
	bd := delim.New(logFile, '\n', 64*size.KiB)
	defer bd.Close()
	for {
		entry, err := bd.ReadBytes()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		analyzeLogEntry(entry)
	}
Null-Terminated Strings:
	// Read C-style null-terminated strings
	dataStream, err := net.Dial("tcp", "server:port") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	bd := delim.New(dataStream, 0, 0) // delimiter 0 is the null byte
	defer bd.Close()
	msg, err := bd.ReadBytes()
	// msg is terminated by '\0'
# Performance Considerations
Buffer Size:
The sizeBufferRead parameter in New() controls the internal buffer size.
Larger buffers can improve performance when reading large data chunks:
	// Default buffer (32KB)
	bdDefault := delim.New(reader, '\n', 0)
	// Large buffer for high-throughput scenarios
	bdLarge := delim.New(reader, '\n', 64*size.KiB)
	// Very large buffer for processing huge records
	bdHuge := delim.New(reader, '\n', size.MiB)
Memory Efficiency:
ReadBytes() returns slices backed by the internal buffer, which may be
reused on subsequent reads. Copy the data if you need to retain it:
	data, _ := bd.ReadBytes()
	retained := make([]byte, len(data))
	copy(retained, data)
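The same copy-before-next-read rule applies to the standard library's
bufio.Reader.ReadSlice, whose result is documented as valid only until the
next read. The sketch below uses it as a stand-in to show the pattern; it is
an analogy, not the internals of delim. retainChunks is a hypothetical helper,
and the 16-byte buffer is chosen only to force the buffer to be refilled.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// retainChunks reads delimiter-terminated chunks with bufio.Reader.ReadSlice,
// whose result is only valid until the next read, and copies each chunk
// before reading again so the retained data cannot be overwritten.
func retainChunks(input string, delim byte) ([][]byte, error) {
	// 16 bytes is the smallest buffer bufio allows; it forces buffer reuse.
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	var kept [][]byte
	for {
		chunk, err := r.ReadSlice(delim)
		if len(chunk) > 0 {
			// Copy into fresh memory before the next ReadSlice call.
			kept = append(kept, append([]byte(nil), chunk...))
		}
		if err == io.EOF {
			return kept, nil
		}
		if err != nil {
			return kept, err
		}
	}
}

func main() {
	kept, err := retainChunks("first-record\nsecond-record\nlast\n", '\n')
	fmt.Printf("%q %v\n", kept, err)
}
```

Without the copy, an earlier chunk may end up pointing at bytes of a later
record once the buffer is refilled.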
# Error Handling
The package defines ErrInstance for invalid operations:
	data, err := bd.ReadBytes()
	if errors.Is(err, delim.ErrInstance) {
		// BufferDelim was closed or is invalid
		log.Fatal("Invalid BufferDelim instance")
	}
	if errors.Is(err, io.EOF) {
		// End of stream reached
		return
	}
# DiscardCloser
The package also provides DiscardCloser, a no-op io.ReadWriteCloser
useful for testing and benchmarking:
	dc := delim.DiscardCloser{}
	buf := make([]byte, 16)
	n, _ := dc.Write([]byte("test")) // n == 4, data discarded
	n, _ = dc.Read(buf)              // n == 0, immediate EOF
# Best Practices
- Always close the BufferDelim instance to release resources.
- Use the default buffer size (0) for most use cases unless profiling indicates a bottleneck.
- For high-throughput scenarios with large records, consider using a larger buffer (e.g., 64KB).
- Ensure thread safety by using one BufferDelim instance per goroutine.
- Check for io.EOF to detect the end of the stream gracefully.
- For comprehensive testing examples and guidelines, refer to the TESTING.md file.
# API Reference
The main interface is BufferDelim, which combines io.ReadCloser and io.WriterTo with custom delimiter methods.
	type BufferDelim interface {
		io.ReadCloser
		io.WriterTo
		Delim() rune
		Reader() io.ReadCloser
		Copy(w io.Writer) (n int64, err error)
		ReadBytes() ([]byte, error)
		UnRead() ([]byte, error)
	}
Key Functions:
- New(r io.ReadCloser, delim rune, sizeBufferRead size.Size) BufferDelim
Error Handling:
- ErrInstance: Returned when accessing a closed or invalid instance.
- io.EOF: Returned when the stream ends.
# Related Packages
- github.com/nabbar/golib/size - Convenient size constants (KiB, MiB, etc.)
- bufio - Standard library buffered I/O (delim provides similar functionality but with custom delimiters)
- io - Standard library I/O interfaces
# Concurrency
BufferDelim instances are not safe for concurrent access. Each instance
should be used by a single goroutine. If multiple goroutines need to process
the same stream, either use a mutex or create separate readers.
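A minimal sketch of the mutex option, using the standard library's
bufio.Reader as a stand-in for any reader that is not safe for concurrent
use. lockedReader and countChunks are hypothetical names for this
illustration and are not part of the delim package.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

// lockedReader is a hypothetical wrapper that serializes access to a reader
// that is not safe for concurrent use (here a bufio.Reader).
type lockedReader struct {
	mu sync.Mutex
	r  *bufio.Reader
}

// next returns the next chunk ending in delim; only one goroutine reads at a time.
func (l *lockedReader) next(delim byte) ([]byte, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.r.ReadBytes(delim)
}

// countChunks drains the reader from several goroutines and returns how many
// chunks were read in total.
func countChunks(input string, delim byte, workers int) int {
	lr := &lockedReader{r: bufio.NewReader(strings.NewReader(input))}
	var (
		wg    sync.WaitGroup
		tmu   sync.Mutex
		total int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				chunk, err := lr.next(delim)
				if len(chunk) > 0 {
					tmu.Lock()
					total++
					tmu.Unlock()
				}
				if err != nil {
					return // end of stream or a read error ends this worker
				}
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countChunks("a\nb\nc\nd\n", '\n', 4))
}
```

The lock guarantees that each chunk is delivered to exactly one goroutine,
but not the order in which goroutines receive chunks.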
# Comparison with bufio.Scanner
Advantages of delim over bufio.Scanner:
- Custom delimiters without writing a split function (any rune can be passed to New)
- Delimiter character included in returned data
- More predictable buffer behavior
- Implements standard io.ReadCloser and io.WriterTo interfaces
- Better control over buffer sizes
When to use bufio.Scanner instead:
- You only need newline-delimited text
- You need to remove delimiters from results
- You want simpler line-by-line iteration
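To make the delimiter-handling difference concrete, the sketch below uses only
the standard library: bufio.Scanner with a custom split function returns
tokens without the delimiter, while bufio.Reader.ReadBytes keeps it, which is
the behavior documented for BufferDelim.ReadBytes above. splitOn, scanFields
and readFields are hypothetical helpers for this illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOn returns a bufio.SplitFunc that cuts tokens at delim and drops it.
func splitOn(delim byte) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		if i := bytes.IndexByte(data, delim); i >= 0 {
			return i + 1, data[:i], nil // skip the delimiter, return what precedes it
		}
		if atEOF && len(data) > 0 {
			return len(data), data, nil // final token without a trailing delimiter
		}
		return 0, nil, nil // ask the Scanner for more data
	}
}

// scanFields uses bufio.Scanner: tokens come back without the delimiter.
func scanFields(input string, delim byte) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(splitOn(delim))
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

// readFields uses bufio.Reader.ReadBytes: chunks keep the trailing delimiter.
func readFields(input string, delim byte) []string {
	r := bufio.NewReader(strings.NewReader(input))
	var out []string
	for {
		chunk, err := r.ReadBytes(delim)
		if len(chunk) > 0 {
			out = append(out, string(chunk))
		}
		if err != nil {
			return out // end of stream (or a read error) stops the loop
		}
	}
}

func main() {
	fmt.Printf("%q\n", scanFields("a,b,c", ','))
	fmt.Printf("%q\n", readFields("a,b,c", ','))
}
```

Choosing between the two styles mostly comes down to whether downstream code
wants the delimiter preserved.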
# Examples
See the test files for comprehensive usage examples:
- constructor_test.go - Construction patterns
- read_test.go - Reading operations
- write_test.go - Writing operations
- edge_cases_test.go - Unicode, binary data, error handling
- benchmark_test.go - Performance optimization examples
*/
package delim