GoldPancake
f7cad30a38
[Feature] Add speculative decoding simulation benchmark. ( #2751 )
...
* Add speculative decoding simulation benchmark
* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
gaoziyuan
6b10c19482
【Feature】add fd commit/branch info when start server ( #2752 )
...
* add_commit_config
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-09 11:52:22 +08:00
EnflameGCU
f4f1d8de44
Support for non-CUDA builds ( #2750 )
...
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-07-09 11:48:40 +08:00
RichardWooSJTU
6610aa29d0
Revert "[Bug fix] fix attention rank init ( #2743 )" ( #2761 )
...
This reverts commit e8bbe7244b .
2025-07-09 10:38:12 +08:00
Ryan
f72c4de539
[SOT] Make custom_op dy&st unified ( #2733 )
...
* make_custom_op dy&st unified
* add instance judgement
2025-07-08 19:21:44 +08:00
xiegetest
f6ffbc3cbd
add precision check for ci ( #2732 )
...
* add precision check for ci
* add precision check for ci
* add precision check for ci
* add precision check for ci
---------
Co-authored-by: xiegegege <xiege01@baidu.com >
2025-07-08 18:43:53 +08:00
RichardWooSJTU
e8bbe7244b
[Bug fix] fix attention rank init ( #2743 )
...
* fix attention rank init
* fix attention rank init
2025-07-08 17:19:49 +08:00
Longzhi Wang
57b086dc6b
[Bug fix] Add the missing pod_ip param to the launch_cache_manager function. ( #2742 )
...
* [Bug fix] fix the missing position args in expert_service.py
* update
2025-07-08 14:52:13 +08:00
lizexu123
525be243e7
[Bug fix] Fixed the garbled text issues in Qwen3-8B ( #2737 )
...
* fix qwen3.py
* update
* update lm_head tie_word_embeddings
* update tie_word_embeddings
* fix
* fix tie_word_embedding not in config.json
---------
Co-authored-by: lizexu <lizexu@baidu.com >
2025-07-07 23:15:27 -07:00
EnflameGCU
d0f4d6ba3a
[GCU] Support gcu platform ( #2702 )
...
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-07-08 13:00:52 +08:00
gaoziyuan
26d5d737dd
【Feature】support qwen2 some func ( #2740 )
...
* add rl qwen model support
* fix
* fix
2025-07-08 12:03:04 +08:00
Ryan
fefbd65cf8
[SOT] Remove BreakGraph with paddle.maximum ( #2731 )
...
* rm if with clip
* clip -> maximum
* int64 -> int32
2025-07-08 11:44:25 +08:00
ming1753
1eb8ea7328
[Bug fix] fix compile bug when sm < 89 ( #2738 )
2025-07-08 11:24:52 +08:00
ming1753
ef6649a577
[Optimize] Optimize tensorwise fp8 performance ( #2729 )
...
* [Optimize] Optimize tensorwise fp8 performance
2025-07-07 20:06:28 +08:00
liddk1121
1b54a2831e
Adapt for iluvatar gpu ( #2684 )
2025-07-07 16:53:14 +08:00
YUNSHEN XIE
2579e8fea8
support FastDeploy version setting ( #2725 )
2025-07-07 14:50:11 +08:00
Yuanle Liu
91528f1af9
remove redundant install whl of fastdeploy ( #2726 )
...
* remove redundant install
* remove redundant install
2025-07-06 23:49:37 -07:00
lddfym
4e293e50fa
Check if the controller port is available ( #2724 )
2025-07-07 13:24:55 +08:00
chen
66b321d9ec
Update eb45-0.3B cuda memory ( #2686 )
2025-07-07 11:31:15 +08:00
ltd0924
68b4755587
[LLM] support multi node deploy ( #2708 )
...
* [LLM] support multi node deploy
* Update engine.py
* fix bugs
* fix
* [LLM] support multi node deploy
* [LLM] support multi node deploy
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-06 10:33:51 +08:00
LQX
04a8e1ef2b
Modify XPU CI, test=model ( #2721 )
2025-07-06 10:19:04 +08:00
Ting
a6e9161045
fix bug. ( #2718 )
2025-07-05 08:19:19 +08:00
Ting
90ef28d982
spec token map lazy. ( #2715 )
2025-07-05 00:14:54 +08:00
YuBaoku
b37585e693
[BugFix] fix paddle_git_commit_id error ( #2714 )
...
* set git identity to avoid merge failure in CI
* add ci cases
* [CI] Add validation for MTP and CUDAGraph
* [BugFix] fix paddle_git_commit_id error
2025-07-04 22:16:37 +08:00
lizexu123
9cb08e71e8
add support QWQ enable_thinking ( #2706 )
...
* add support QWQ enable_thinking
* add stream=True
* fix stream=true
* fix qwen
---------
Co-authored-by: lizexu <lizexu@baidu.com >
2025-07-04 20:55:23 +08:00
YuBaoku
dacc46f04c
[CI] Add validation for MTP and CUDAGraph ( #2710 )
...
* set git identity to avoid merge failure in CI
* add ci cases
* [CI] Add validation for MTP and CUDAGraph
2025-07-04 18:13:54 +08:00
Jiang-Jia-Jun
09ded7715f
Update mkdocs.yml
2025-07-04 17:55:52 +08:00
LQX
11cfdf5d89
Add XPU CI, test=model ( #2701 )
...
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
* Add XPU CI, test=model
2025-07-04 16:16:06 +08:00
GoldPancake
e7fa57ebae
Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue ( #2707 )
...
* fix mtp eh_proj layer
* fix mtp update_cfg function
* fix docstring
* simplify class name
2025-07-04 14:15:04 +08:00
gaoziyuan
a5ae88ded9
[feature]add fd whl version info ( #2698 )
2025-07-04 14:12:42 +08:00
ltd0924
87e638498c
[RL] update reschedule finish reason ( #2709 )
2025-07-04 13:47:36 +08:00
freeliuzc
667547be59
support chunk_prefill in MTP ( #2705 )
2025-07-04 11:55:48 +08:00
LiqinruiG
b38823bc66
modify reasoning_output docs ( #2696 )
2025-07-04 11:30:02 +08:00
Divano
050d9658a5
Update requirements.txt
2025-07-04 09:53:03 +08:00
Divano
be5cabaf80
add quick benchmark ( #2703 )
...
Test scripts do not need to go through CI
2025-07-04 09:32:36 +08:00
Yuanle Liu
240bdac2a4
[feat] support fa3 backend for pd disaggregated ( #2695 )
...
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
ltd0924
00863c43fd
[Bug] fix logger format ( #2689 )
2025-07-03 19:58:03 +08:00
kevin
3d3bccdf79
[doc] update docs ( #2690 )
2025-07-03 19:33:19 +08:00
Jiang-Jia-Jun
9fd74f75bd
Update dynamic_weight_manager.py
2025-07-03 15:55:22 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code ( #2679 )
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
d222248d00
Update README.md
2025-07-03 15:28:28 +08:00
Jiang-Jia-Jun
e5b94d4117
Update README.md
2025-07-03 15:28:05 +08:00
Jiang-Jia-Jun
87e2e58a22
Update gh-pages.yml
2025-07-03 15:26:21 +08:00
Jiang-Jia-Jun
de20e5a992
Update Dockerfile.xpu
2025-07-03 10:14:50 +08:00
Jiang-Jia-Jun
2f9c0618f0
Update Dockerfile.gpu
2025-07-03 10:14:39 +08:00
Yuanle Liu
9a14ab6572
add --force-reinstall --no-cache-dir when pip install fastdeploy*.whl ( #2682 )
2025-07-02 05:32:20 -07:00
Divano
d1cb3ed571
Update gh-pages.yml ( #2680 )
2025-07-02 17:36:18 +08:00
handiz
b8a8a19689
add wint2 performance ( #2673 )
2025-07-02 17:10:01 +08:00
Jiang-Jia-Jun
97ac82834f
Update nvidia_gpu.md
2025-07-02 16:54:14 +08:00
Jiang-Jia-Jun
685265a97d
Update nvidia_gpu.md
2025-07-02 15:43:35 +08:00