Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-12-24 13:28:13 +08:00)
update_wint2_doc (#3968)
New binary file: docs/quantization/images/wint2.png (81 KiB)
Weights are compressed offline using the CCQ (Convolutional Coding Quantization) method.
- **Supported Hardware**: GPU
- **Supported Architecture**: MoE architecture
This method applies a convolutional coding scheme in which adjacent codes share overlapping bits, mapping 2-bit values into a larger effective representation space. The quantized weights therefore retain more information from the original data while the stored values are compressed to an extremely low 2-bit width. The general principle is illustrated in the figure below:
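The overlapping-bit idea described above can be sketched in a toy decoder. This is a hypothetical illustration, not FastDeploy's actual CCQ kernel: each stored symbol is 2 bits, but the decode window pairs a symbol with its neighbor to form a 4-bit index into a 16-entry codebook (the `codebook` values here are assumed for illustration), so the dequantized value space is larger than the 4 levels a lone 2-bit code could address.

```python
import numpy as np

# 2-bit stored symbols (values 0..3) and an assumed 16-entry codebook.
codes = np.array([0b01, 0b11, 0b10, 0b00], dtype=np.uint8)
codebook = np.linspace(-1.0, 1.0, 16, dtype=np.float32)

def decode_overlapping(codes, codebook):
    """Decode each symbol together with its successor (overlapping bits),
    forming a 4-bit index so 2-bit storage addresses 16 output levels."""
    out = []
    for i in range(len(codes) - 1):
        idx = (int(codes[i]) << 2) | int(codes[i + 1])  # 4-bit joint index
        out.append(codebook[idx])
    return np.array(out, dtype=np.float32)

print(decode_overlapping(codes, codebook))
```

The key point of the sketch is that storage cost stays at 2 bits per weight while the decoded values draw from a richer set, which is how convolutional coding recovers precision lost to naive 2-bit rounding.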
![Convolutional coding quantization diagram](./wint2.png)

CCQ WINT2 is typically used in resource-constrained, low-cost deployment scenarios. Taking ERNIE-4.5-300B-A47B as an example, its weights are compressed to 89 GB, enabling single-card deployment on a 141 GB H20 GPU.
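The memory figure above follows from packing four 2-bit codes into each byte. A minimal sketch of that packing, under an assumed layout (FastDeploy's on-disk format may order bits differently):

```python
import numpy as np

def pack_wint2(codes):
    """Pack four 2-bit weight codes (values 0..3) into each uint8.
    This 4x reduction vs int8 (16x vs fp32) is what drives figures
    like 89 GB for a 300B-parameter model."""
    codes = np.asarray(codes, dtype=np.uint8) & 0b11
    assert len(codes) % 4 == 0, "pad to a multiple of 4 codes"
    c = codes.reshape(-1, 4).astype(np.uint16)
    packed = c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)
    return packed.astype(np.uint8)

def unpack_wint2(packed):
    """Recover the 2-bit codes, lowest bits first within each byte."""
    p = np.asarray(packed, dtype=np.uint8)
    return np.stack([(p >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)

w = np.array([3, 0, 1, 2, 2, 2, 0, 1], dtype=np.uint8)
packed = pack_wint2(w)                       # 8 codes -> 2 bytes
assert np.array_equal(unpack_wint2(packed), w)
```

The packed tensor holds only code indices; dequantization additionally needs the per-group scales/codebooks, which is why the deployed size (89 GB) is somewhat above the raw 2-bit count.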