[Docx] add language (en/cn) switch links (#4470)

* add install docs
* update docs
* update docs
2025-12-24 13:28:13 +08:00 · 2025-10-17 15:47:41 +08:00
parent a3e0a15495
commit ba5c2b7e37
106 changed files with 206 additions and 0 deletions
--- a/docs/zh/features/chunked_prefill.md
+++ b/docs/zh/features/chunked_prefill.md
@@ -1,3 +1,5 @@
+[English](../../features/chunked_prefill.md)
+
 # Chunked Prefill and 128K Long-Context Inference Deployment

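 Chunked Prefill splits each prefill-stage request into smaller subtasks and batches them together with decode requests. This better balances compute-bound (prefill) and memory-bound (decode) operations, improves GPU utilization, and reduces the computation and GPU memory footprint of each prefill step, which lowers peak memory usage and helps avoid out-of-memory errors.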
--- a/docs/zh/features/data_parallel_service.md
+++ b/docs/zh/features/data_parallel_service.md
@@ -1,3 +1,5 @@
+[English](../../features/data_parallel_service.md)
+
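 # Data Parallelism
 For MoE models, expert parallelism (EP) can be enabled together with data parallelism (DP): EP distributes the expert load, and DP processes requests in parallel.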

--- a/docs/zh/features/disaggregated.md
+++ b/docs/zh/features/disaggregated.md
@@ -1,3 +1,5 @@
+[English](../../features/disaggregated.md)
+
 # Disaggregated Deployment

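 Large-model inference consists of two stages, Prefill and Decode, which are compute-bound (Prefill) and memory-bound (Decode), respectively. In some scenarios, deploying Prefill and Decode separately improves hardware utilization, increases throughput, and reduces full-response latency,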
--- a/docs/zh/features/early_stop.md
+++ b/docs/zh/features/early_stop.md
@@ -1,3 +1,4 @@
+[English](../../features/early_stop.md)

 # Early Stopping

--- a/docs/zh/features/graph_optimization.md
+++ b/docs/zh/features/graph_optimization.md
@@ -1,3 +1,5 @@
+[English](../../features/graph_optimization.md)
+
 # Graph Optimization Techniques in FastDeploy
 FastDeploy's `GraphOptimizationBackend` integrates several graph optimization techniques:

--- a/docs/zh/features/load_balance.md
+++ b/docs/zh/features/load_balance.md
@@ -1,3 +1,5 @@
+[English](../../features/load_balance.md)
+
 # Global Scheduler: Multi-Instance Load Balancing

 ## Design
--- a/docs/zh/features/multi-node_deployment.md
+++ b/docs/zh/features/multi-node_deployment.md
@@ -1,3 +1,5 @@
+[English](../../features/multi-node_deployment.md)
+
 # Multi-Node Deployment

 ## Overview
--- a/docs/zh/features/plas_attention.md
+++ b/docs/zh/features/plas_attention.md
@@ -1,3 +1,5 @@
+[English](../../features/plas_attention.md)
+
 # PLAS

 ## Introduction
--- a/docs/zh/features/plugins.md
+++ b/docs/zh/features/plugins.md
@@ -1,3 +1,5 @@
+[English](../../features/plugins.md)
+
 # FastDeploy Plugin Mechanism Documentation

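 FastDeploy supports a plugin mechanism that lets users extend functionality without modifying the core code. Plugins are discovered and loaded automatically through Python's `entry_points` mechanism.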
--- a/docs/zh/features/prefix_caching.md
+++ b/docs/zh/features/prefix_caching.md
@@ -1,3 +1,5 @@
+[English](../../features/prefix_caching.md)
+
 # Prefix Caching

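 Prefix Caching is a technique for improving the inference efficiency of generative models. Its core idea is to cache the intermediate results computed for the input sequence (the KV Cache) to avoid recomputation, which speeds up responses for multiple requests that share the same prefix.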
--- a/docs/zh/features/reasoning_output.md
+++ b/docs/zh/features/reasoning_output.md
@@ -1,3 +1,5 @@
+[English](../../features/reasoning_output.md)
+
 # Chain-of-Thought Content

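 Reasoning models return a `reasoning_content` field in their output. It contains the chain-of-thought content, that is, the reasoning steps that lead to the final conclusion.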
--- a/docs/zh/features/sampling.md
+++ b/docs/zh/features/sampling.md
@@ -1,3 +1,5 @@
+[English](../../features/sampling.md)
+
 # Sampling Strategies

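 Sampling strategies determine how the next token is selected from the model's output probability distribution. FastDeploy currently supports several sampling strategies: Top-p, Top-k_Top-p, and Min-p Sampling.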
--- a/docs/zh/features/speculative_decoding.md
+++ b/docs/zh/features/speculative_decoding.md
@@ -1,3 +1,5 @@
+[English](../../features/speculative_decoding.md)
+
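 # 🔮 Speculative Decoding
 This project implements an efficient **Speculative Decoding** inference framework based on PaddlePaddle. It supports Multi-Token Proposing (MTP) to accelerate large language model (LLM) generation, significantly reducing latency and improving throughput.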

--- a/docs/zh/features/structured_outputs.md
+++ b/docs/zh/features/structured_outputs.md
@@ -1,3 +1,5 @@
+[English](../../features/structured_outputs.md)
+
 # Structured Outputs

 ## Overview