update_wint2_doc (#3968)

AIbin
2025-09-08 15:53:09 +08:00
committed by GitHub
parent 83bd55100b
commit 316ac546d3
4 changed files with 2 additions and 2 deletions

@@ -4,7 +4,7 @@ Weights are compressed offline using the [CCQ (Convolutional Coding Quantization
- **Supported Hardware**: GPU
- **Supported Architecture**: MoE architecture
This method relies on convolutional coding, using overlapping bits to map 2-bit values into a larger numerical representation space, so that the quantized weights retain more of the original data's information while each value is compressed to an extremely low 2-bit width. The general principle is illustrated in the figure below:
-[Convolutional coding quantization diagram](./wint2.png)
+![Convolutional coding quantization diagram](./images/wint2.png)
CCQ WINT2 is typically used in resource-constrained, low-barrier deployment scenarios. Taking ERNIE-4.5-300B-A47B as an example, its weights are compressed to 89 GB, enabling single-card deployment on a 141 GB H20 GPU.
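To make the 2-bit compression concrete, the sketch below shows plain 2-bit packing (four weights per byte) and the back-of-envelope footprint for a ~300B-parameter model. This is only an illustration of the storage density, not the CCQ algorithm itself: CCQ additionally applies convolutional coding with overlapping bits, which this toy example omits, and the function names here are hypothetical.

```python
def pack_2bit(values):
    """Pack a sequence of 2-bit integers (0-3) into bytes, 4 values per byte."""
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            byte |= (v & 0b11) << (2 * j)  # each value occupies 2 bits
        packed.append(byte)
    return bytes(packed)

def unpack_2bit(packed, n):
    """Recover n 2-bit values from the packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

# Rough footprint: ~300e9 parameters at 2 bits/weight is ~75 GB before
# scales and other quantization overhead, consistent in order of
# magnitude with the ~89 GB figure quoted above.
params = 300e9
print(params * 2 / 8 / 1e9)  # GB for raw 2-bit weights
```

Round-tripping through `pack_2bit`/`unpack_2bit` is lossless for values already in the 0-3 range; all the information loss in real 2-bit quantization happens in the mapping from full-precision weights to those 2-bit codes, which is exactly the step CCQ's convolutional coding is designed to improve.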