bukejiyu
08b3153661
update doc ( #3990 )
...
Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0214.tjdm.baidu.com >
2025-09-08 21:04:26 +08:00
AIbin
d00faeec69
update dsk doc ( #3989 )
2025-09-08 20:42:48 +08:00
yinwei
7e0bfd024f
update release note ( #3986 )
2025-09-08 19:03:14 +08:00
JYChen
1f056a7469
[docs] update best practice docs ( #3969 )
...
* update best practice docs
* add version and v1 loader info
2025-09-08 17:39:38 +08:00
yangjianfengo1
9ead10e1bc
Update documentation ( #3975 )
2025-09-08 16:53:37 +08:00
xiaolei373
571ddc677b
Modify markdown ( #3896 )
...
* feat(log):add_request_and_response_log
* modify markdown graceful shutdown
2025-09-08 16:42:34 +08:00
AIbin
316ac546d3
update_wint2_doc ( #3968 )
2025-09-08 15:53:09 +08:00
Sunny-bot1
ed5133f704
update env docs for Machete ( #3959 )
2025-09-08 14:44:31 +08:00
qwes5s5
17169a14f2
[metrics] Add several observability metrics ( #3868 )
...
* Add several observability metrics
* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, number of queued requests, and related information
* adjust some metrics and md files
* trigger ci
* adjust ci file
* trigger ci
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-08 14:13:13 +08:00
yangjianfengo1
472402bf4e
Update sparse attn documentation ( #3954 )
...
* Update documentation
* Update documentation
* Update documentation
* Update documentation
2025-09-08 12:23:18 +08:00
ltd0924
7643e6e6b2
[Docs] add data parallel ( #3883 )
...
* [Docs] add data parallel
* [Docs] add data parallel
2025-09-04 20:33:50 +08:00
xiaolei373
ed97cf8396
Graceful shut down ( #3785 )
...
* feat(log):add_request_and_response_log
* Graceful shutdown: add a shutdown-duration parameter to the interface
2025-09-04 19:33:50 +08:00
AIbin
54b458fd98
[Doc] update wint2 doc ( #3819 )
...
* update_wint2_doc
2025-09-03 11:27:43 +08:00
Jiang-Jia-Jun
18e5d355a1
Update version in docs
2025-09-02 19:21:10 +08:00
kevin
1908465542
[Feature] mm and thinking model support structured output ( #2749 )
...
* mm support structured output
* update code
* update code
* update format
* update code
* update code
* add enable_thinking default
* update code
* add structured_outputs test case
* add ci install xgrammar
* add ci timeout time
* update test for structured_outputs
* update code
* add error traceback info
* update error msg
* update structured output code
* update code
* update code
* update config
* update torch version
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-02 16:21:09 +08:00
Jiang-Jia-Jun
27f2e7a6f1
Create faq.md
2025-09-02 11:07:37 +08:00
lizexu123
6dd61a1bab
fix Document ( #3782 )
...
Co-authored-by: example_name <example_email>
2025-09-01 20:22:43 +08:00
co63oc
d6369b4d51
fix typos ( #3684 )
2025-09-01 17:50:17 +08:00
Jiang-Jia-Jun
0513a78ecc
Update docs for reasoning-parser
2025-09-01 17:42:58 +08:00
Jiang-Jia-Jun
2bd7d90929
Remove useless parameters
2025-09-01 14:43:56 +08:00
yangjianfengo1
3754a9906d
[Feature] block sparse attention ( #3668 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* increase eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* add environment variables to config
* fix rollout config
* fix GPU memory
* add test server
* add test server
* fix mlp: use full attn for the last layer
2025-08-29 19:46:30 +08:00
Yuan Xiaolan
c71ee0831c
add w4afp8 offline script ( #3636 )
2025-08-29 17:56:05 +08:00
周周周
17b414c2df
MoE Default use triton's blockwise fp8 in TP Case ( #3678 )
2025-08-29 11:07:30 +08:00
Mattheliu
108d989d9d
[Docs] add fastdeploy_unit_test_guide.md ( #3484 )
...
* docs:add fastdeploy_unit_test_guide.md
* docs:fix fastdeploy_unit_test_guide.md
* docs: add FastDeploy unit test spec (EN) and update usage nav
* fix codestyle
2025-08-28 14:12:25 +08:00
Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8.
2025-08-27 17:35:04 +08:00
JYChen
e645db348b
[docs] Update best practice doc ( #3539 )
...
* fix some docs error
* [docs] x1 best-practice
* update docs
* fix docs
2025-08-27 15:45:30 +08:00
chen
ce9c0917c5
[Precision] Support lm_head layer running in float32 ( #3597 )
...
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
2025-08-27 11:34:53 +08:00
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* increase eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* add environment variables to config
* fix rollout config
2025-08-26 07:16:04 -07:00
Yuanle Liu
cbce94a00e
rename ernie_xxx to ernie4_5_xxx ( #3621 )
...
* rename ernie_xxx to ernie4_5_xxx
* ci fix
2025-08-26 19:29:27 +08:00
Sunny-bot1
c68c3c4b8b
[Feature] bad words support v1 scheduler and specify token ids ( #3608 )
...
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2025-08-25 20:14:51 -07:00
Kane2011
2ae7ab28d2
[MetaxGPU] adapt to the latest fastdeploy on metax gpu ( #3492 )
2025-08-25 17:44:20 +08:00
chen
9cab3f47ff
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3552 )
...
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
2025-08-25 14:11:49 +08:00
zhink
df7c31012b
Modified to support custom all reduce by default ( #3538 )
2025-08-22 16:59:05 +08:00
luukunn
371fb3f853
[Feature] add tool parser ( #3483 )
...
* add tool parser
* add x1 enable_thinking
* restart ci
* fix vl reasoning parser
* modify call style
* modify call style
* add offline enablethinking
* fix completion
* fix
* fix unit test
* fix unit test
* fix unit test
* fix vl reasoning parser
* fix vl reasoning parser
2025-08-21 17:25:44 +08:00
Yzc216
466cbb5a99
[Feature] Models api ( #3073 )
...
* add v1/models interface related
* add model parameters
* default model verification
* unit test
* check model err_msg
* unit test
* type annotation
* model parameter in response
* modify document description
* modify document description
* unit test
* verification
* verification update
* model_name
* pre-commit
* update test case
* update test case
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/entrypoints/openai/serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-08-21 17:02:56 +08:00
Zhang Yulong
33ff0bfe38
Update disaggregated.md ( #3495 )
...
Fix documentation errors
2025-08-20 19:39:18 +08:00
luukunn
9c129813f9
[Feature] add custom chat template ( #3251 )
...
* add custom chat_template
* add custom chat_template
* add unittest
* fix
* add docs
* fix comment
* add offline chat
* fix unit test
* fix unit test
* fix
* fix pre commit
* fix unit test
* add unit test
* add unit test
* add unit test
* fix pre_commit
* fix enable_thinking
* fix pre commit
* fix pre commit
* fix unit test
* add requirements
2025-08-18 16:34:08 +08:00
RAM
154308102e
[Docs] Update docs of graph opt backend ( #3442 )
...
* Update docs of graph opt backend
* update best_practices
2025-08-15 21:30:32 +08:00
yongqiangma
5703d7aa0f
update installation readme ( #3429 )
2025-08-15 19:09:41 +08:00
yangjianfengo1
615930bc05
Update README ( #3426 )
...
* Update README
* code style
* code style
2025-08-15 18:46:28 +08:00
JYChen
6f11171478
fix some docs error ( #3439 )
2025-08-15 18:45:27 +08:00
yinwei
354575b6d1
[Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 ( #3428 )
...
* XPU Update 2.1 Release Documentation
* code style check
* Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95
2025-08-15 18:34:37 +08:00
ming1753
d4e3a20300
[Docs] Release 2.1 docs and fix some description ( #3424 )
2025-08-15 14:27:19 +08:00
yinwei
fbb6dcb9e4
[Docs]XPU Update 2.1 Release Documentation ( #3423 )
...
* XPU Update 2.1 Release Documentation
* code style check
2025-08-15 14:07:47 +08:00
JYChen
562e01c979
update docs ( #3420 )
2025-08-15 13:00:08 +08:00
ltd0924
5a84324798
[Doc] Add multinode deployment documents ( #3417 )
...
* Create multi-node_deployment.md
* Create multi-node_deployment.md
* Update mkdocs.yml
2025-08-15 10:37:04 +08:00
yzwu
ce9180241e
[Iluvatar GPU] Modify the names of some variables ( #3273 )
2025-08-13 11:38:02 +08:00
yangjianfengo1
b808c49585
[Doc] Add Chinese/English language switching ( #3318 )
...
* Add Chinese/English language switching
* Add Chinese/English language switching
* Update readme
2025-08-12 11:20:45 +08:00
Sunny-bot1
19fda4e912
fix docs ( #3332 )
2025-08-11 21:03:49 +08:00
Sunny-bot1
789dc67ff7
[Docs]fix sampling docs ( #3113 )
...
* fix sampling docs
* fix sampling docs
* update
2025-08-11 20:42:27 +08:00