mirror of https://github.com/gonum/gonum.git
synced 2025-10-25 08:10:28 +08:00
Used 3xIncrement registers to consolidate pointer arithmetic at the end of the main loop. Moved the pipelining register calculations to after the short-vector test. Added a two-wide tail code block.
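
The loop shape described above can be sketched in plain Go. This is an illustration only, not the assembly touched by this commit: the function name `axpyIncSketch`, the choice of an axpy-style kernel, and the restriction to positive increments are all assumptions made for the example. The point it shows is that a precomputed 3x increment lets the fourth element of each unrolled step be addressed directly (x86 addressing only scales an index by 1, 2, 4, or 8), so the pointers are advanced once at the end of each iteration, and a two-wide block handles part of the tail before the final single element.

```go
package main

import "fmt"

// axpyIncSketch is a hypothetical stand-in for a strided kernel. It computes
// y[i*incY] += alpha * x[i*incX] for i in [0, n), assuming incX, incY > 0.
func axpyIncSketch(alpha float64, x, y []float64, n, incX, incY int) {
	ix, iy := 0, 0

	// Short-vector test: the 3x increments are only needed by the
	// four-wide loop, so they are computed after this check.
	if n >= 4 {
		inc3X, inc3Y := 3*incX, 3*incY
		for ; n >= 4; n -= 4 {
			y[iy] += alpha * x[ix]
			y[iy+incY] += alpha * x[ix+incX]
			y[iy+2*incY] += alpha * x[ix+2*incX]
			y[iy+inc3Y] += alpha * x[ix+inc3X]

			// All pointer arithmetic happens once, at the end of the loop body.
			ix += 4 * incX
			iy += 4 * incY
		}
	}

	// Two-wide tail block.
	if n >= 2 {
		y[iy] += alpha * x[ix]
		y[iy+incY] += alpha * x[ix+incX]
		ix += 2 * incX
		iy += 2 * incY
		n -= 2
	}

	// Final single element, if any.
	if n == 1 {
		y[iy] += alpha * x[ix]
	}
}

func main() {
	x := []float64{1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7} // stride 2
	y := []float64{1, 1, 1, 1, 1, 1, 1}                   // stride 1
	axpyIncSketch(2, x, y, 7, 2, 1)
	fmt.Println(y) // prints [3 5 7 9 11 13 15]
}
```

With n = 7 the call exercises every path: one four-wide iteration, the two-wide tail, and the single trailing element.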