Commit Graph

804 Commits

Author SHA1 Message Date
yinwei
e98937cbba delete useless file (#2772)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-11 11:46:04 +08:00
Sunny-bot1
240d6236bc [Fix]fix top_k_top_p sampling (#2801)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix topk-topp

* update

* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg
59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb [BugFix] fix RMSNorm rms_norm_esp (#2797)
* fix rms

* add vl

* fix

* add vl

* fix

* fix
2025-07-10 20:02:24 +08:00
LiqinruiG
ce5adec877 [Doc] modify offline-inerence docs (#2800)
* modify offline-inerence docs

* [bug] remove tool_call_content
2025-07-10 19:41:12 +08:00
yulangz
830de5a925 [XPU] Supports TP4 deployment on 4,5,6,7 (#2794)
* 支持通过 XPU_VISIBLE_DEVICES 指定 4,5,6,7 卡运行
* 修改 XPU 文档中多卡说明
2025-07-10 16:48:08 +08:00
chen
d33105baeb [Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs

* check xpu

* check vl_gpu_model_runner and xpu_model_runner

* get_worker() check platform
2025-07-10 16:33:40 +08:00
K11OntheBoat
24f934f1f9 [BugFix] Fix low prediction accuracy of deepseekv3 (#2798) 2025-07-10 16:16:44 +08:00
Sunny-bot1
1e2319cbef Rename top_p_sampling to top_k_top_p_sampling (#2791) 2025-07-10 00:09:25 -07:00
Sunny-bot1
e45050cae3 [Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling

* fix

* add api param

* add api para

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-07-09 20:58:58 -07:00
Ryan
b0f525955c [SOT] Remove breakgraph in post processing && fix datatype (#2780) 2025-07-10 11:26:00 +08:00
Yuanle Liu
2ea267f624 assert prompt len > 0 (#2773) 2025-07-10 11:14:52 +08:00
0x3878f
1d8af7ab73 Add env variable for dy2st (#2779) 2025-07-10 11:06:06 +08:00
Jiang-Jia-Jun
a4fdb3970b [BugFix] Fix vocab size error for ernie model (#2785)
* [BugFix] Fix vocab size error for ernie model

* [BugFix] Fix vocab size error for ernie model

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657 [BugFix Revert] Fix vocab size error for ernie model 2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779 [BugFix] Fix vocab size error for ernie model 2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783) 2025-07-09 22:03:57 +08:00
chen
888780ffde [Feature] block_wise_fp8 support triton_moe_backend (#2767) 2025-07-09 19:22:47 +08:00
RAM
e3768c5a83 [Executor] Fix bug of logger.debug (#2778) 2025-07-09 04:13:43 -07:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8 Clear dead code And supplementary notes (#2757)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* 1.supplementary notes 2.delete dead code

* fix bug of forward meta

* Global modification of forward meta

* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd [Feature] support custom all-reduce (#2758)
* [Feature] support custom all-reduce

* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
be21ef5047 [XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
* fix no quant xpu moe

* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
yulangz
0350831c2b fix xpu offline demo garbled output (#2763) 2025-07-09 14:51:20 +08:00
RichardWooSJTU
fee544e808 fix ep prefill (#2762) 2025-07-09 14:03:05 +08:00
Ryan
c4718fd693 Enable SOT D2St in Multimodal Model (#2735) 2025-07-09 12:26:18 +08:00
GoldPancake
f7cad30a38 [Feature] Add speculative decoding simulation benchmark. (#2751)
* Add speculative decoding simulation benchmark

* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
gaoziyuan
6b10c19482 【Feature】add fd commit/branch info when start server (#2752)
* add_commit_config

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 11:52:22 +08:00
RichardWooSJTU
6610aa29d0 Revert "[Bug fix] fix attention rank init (#2743)" (#2761)
This reverts commit e8bbe7244b.
2025-07-09 10:38:12 +08:00
Ryan
f72c4de539 [SOT] Make custom_op dy&st unified (#2733)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* make_custom_op dy&st unified

* add instance judgement
2025-07-08 19:21:44 +08:00
RichardWooSJTU
e8bbe7244b [Bug fix] fix attention rank init (#2743)
* fix attention rank init

* fix attention rank init
2025-07-08 17:19:49 +08:00
Longzhi Wang
57b086dc6b [Bug fix] Add the missing pod_ip param to the launch_cache_manager function. (#2742)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Bug fix] fix the missing position args in expert_service.py

* update
2025-07-08 14:52:13 +08:00
lizexu123
525be243e7 [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737)
* fix qwen3.py

* update

* update lm_head tie_word_embeddings

* update tie_word_embeddings

* fix

* fix tie_word_embedding not in config.json

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-07 23:15:27 -07:00
EnflameGCU
d0f4d6ba3a [GCU] Support gcu platform (#2702)
baseline: e7fa57ebae

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00
gaoziyuan
26d5d737dd 【Fearture】support qwen2 some func (#2740)
* add rl qwen model support

* fix

* fix
2025-07-08 12:03:04 +08:00
Ryan
fefbd65cf8 [SOT] Remove BreakGraph with paddle.maximum (#2731)
* rm if with clip

* clip -> maximum

* int64 -> int32
2025-07-08 11:44:25 +08:00
ming1753
1eb8ea7328 [Bug fix] fix complie bug when sm < 89 (#2738) 2025-07-08 11:24:52 +08:00
ming1753
ef6649a577 [Optimize] Optimize tensorwise fp8 performance (#2729)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Optimize] Optimize tensorwise fp8 performance
2025-07-07 20:06:28 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
lddfym
4e293e50fa Check if the controller port is available (#2724) 2025-07-07 13:24:55 +08:00
ltd0924
68b4755587 [LLM] support multi node deploy (#2708)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] support multi node deploy

* Update engine.py

* fix bugs

* fix

* [LLM] support multi node deploy

* [LLM] support multi node deploy

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
Ting
a6e9161045 fix bug. (#2718)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-05 08:19:19 +08:00
Ting
90ef28d982 spec token map lazy. (#2715)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-05 00:14:54 +08:00
lizexu123
9cb08e71e8 add support QWQ enable_thinking (#2706)
* add support QWQ enable_thinking

* add stream=True

* fix stream=true

* fix qwen

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-04 20:55:23 +08:00
GoldPancake
e7fa57ebae Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix mtp eh_proj layer

* fix mtp update_cfg function

* fix stringdoc

* simplify class name
2025-07-04 14:15:04 +08:00
gaoziyuan
a5ae88ded9 [feature]add fd whl version info (#2698) 2025-07-04 14:12:42 +08:00
ltd0924
87e638498c [RL] update reschedule finish reason (#2709) 2025-07-04 13:47:36 +08:00
freeliuzc
667547be59 support chunk_prefill in MTP (#2705) 2025-07-04 11:55:48 +08:00
Yuanle Liu
240bdac2a4 [feat] support fa3 backend for pd disaggregated (#2695)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
ltd0924
00863c43fd [Bug] fix logger format (#2689)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-03 19:58:03 +08:00