[Docx] add language (en/cn) switch links (#4470)
* add install docs * update docs * update docs
@@ -1,3 +1,5 @@
[简体中文](../zh/quantization/README.md)
# Quantization
FastDeploy supports multiple quantized inference precisions, including FP8, INT8, INT4, and 2-bit. Weights, activations, and KVCache tensors can each be served at different precisions, covering scenarios such as low-cost, low-latency, and long-context inference.
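
To put these bit widths into perspective, the sketch below estimates the weight-only memory footprint of a hypothetical 100B-parameter model at each precision. The parameter count and helper function are illustrative only and are not tied to any specific FastDeploy model or API.

```python
# Rough weight-only memory estimate per precision, illustrating the
# low-cost / long-context trade-off. The 100B parameter count is a
# hypothetical example, not a specific FastDeploy model.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "INT8": 8, "INT4": 4, "2-bit": 2}

def weight_memory_gib(num_params: float, bits: int) -> float:
    """GiB needed to store the weights alone at the given bit width."""
    return num_params * bits / 8 / 1024**3

if __name__ == "__main__":
    num_params = 100e9  # hypothetical 100B-parameter model
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name:>5}: {weight_memory_gib(num_params, bits):7.1f} GiB")
```
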
@@ -1,3 +1,5 @@
[简体中文](../zh/quantization/online_quantization.md)
# Online Quantization
Online quantization means the inference engine quantizes weights after loading them in BF16, rather than loading pre-quantized low-precision weights. FastDeploy supports online quantization of BF16 weights to several precisions, including INT4, INT8, and FP8.
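
The Python sketch below illustrates the idea with a generic per-channel symmetric INT8 scheme applied at load time; it is a conceptual example, not FastDeploy's internal implementation (NumPy has no native BF16, so FP32 stands in for the loaded weights).

```python
import numpy as np

def quantize_int8_per_channel(w_loaded: np.ndarray):
    """Quantize a [out_features, in_features] weight matrix to INT8 at load time."""
    w = w_loaded.astype(np.float32)                        # FP32 stands in for the loaded BF16 weights
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per output channel
    scale = np.where(scale == 0, 1.0, scale)               # guard against all-zero channels
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_int8, scale

def dequantize(w_int8: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate floating-point weight matrix for computation."""
    return w_int8.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 16)).astype(np.float32)    # toy "BF16" weight matrix
    w_q, s = quantize_int8_per_channel(w)
    err = np.abs(w - dequantize(w_q, s)).max()
    print("max abs quantization error:", err)
```
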
@@ -1,3 +1,5 @@
[简体中文](../zh/quantization/wint2.md)
# WINT2 Quantization
Weights are compressed offline using the [CCQ (Convolutional Coding Quantization)](https://arxiv.org/pdf/2507.07145) method. Weights are physically stored as INT8, with four weights packed into each INT8 value, i.e. 2 bits per weight. Activations are not quantized. At inference time, weights are decoded and dequantized on the fly to BF16, and all computation is performed in BF16.
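
The sketch below only illustrates the storage layout (four 2-bit codes packed per byte) and the on-the-fly decode back to floating point; the codebook is a made-up placeholder, and the actual CCQ coding and BF16 compute kernels are not modeled here.

```python
import numpy as np

# Hypothetical 2-bit reconstruction levels; the real WINT2 path derives
# these from CCQ, not from a fixed table like this.
CODEBOOK = np.array([-1.0, -0.33, 0.33, 1.0], dtype=np.float32)

def pack_2bit(codes: np.ndarray) -> np.ndarray:
    """Pack 2-bit codes (values 0..3, length a multiple of 4) into bytes, four per byte."""
    codes = codes.reshape(-1, 4).astype(np.uint8)
    return codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)

def unpack_2bit(packed: np.ndarray) -> np.ndarray:
    """Unpack bytes back into 2-bit codes and decode them to float values."""
    codes = np.stack([(packed >> shift) & 0b11 for shift in (0, 2, 4, 6)], axis=1)
    return CODEBOOK[codes.reshape(-1)]

if __name__ == "__main__":
    codes = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=np.uint8)  # eight 2-bit weights
    packed = pack_2bit(codes)                                   # two bytes hold eight weights
    print(packed.nbytes, "bytes store", codes.size, "weights")
    print(unpack_2bit(packed))
```
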