FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 16:48:03 +08:00

Author	SHA1	Message	Date
yinwei	e98937cbba	delete useless file (#2772 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-11 11:46:04 +08:00
Sunny-bot1	240d6236bc	[Fix]fix top_k_top_p sampling (#2801 ) * fix topk-topp * update * add base_non_truncated	2025-07-10 22:35:10 +08:00
littledgg	59071268b6	[Executor] Move forward_meta.py to fastdeploy/model_executor (#2774 ) * Use PEP 563 in attention.py and fix conflict * merge commit * Change what was left out last time	2025-07-10 20:36:51 +08:00
lizexu123	8c660a0dfb	[BugFix] fix RMSNorm rms_norm_esp (#2797 ) * fix rms * add vl * fix * add vl * fix * fix	2025-07-10 20:02:24 +08:00
LiqinruiG	ce5adec877	[Doc] modify offline-inerence docs (#2800 ) * modify offline-inerence docs * [bug] remove tool_call_content	2025-07-10 19:41:12 +08:00
yulangz	830de5a925	[XPU] Supports TP4 deployment on 4,5,6,7 (#2794 ) * Support running on cards 4,5,6,7 specified via XPU_VISIBLE_DEVICES * Update the multi-card instructions in the XPU docs	2025-07-10 16:48:08 +08:00
chen	d33105baeb	[Feature] Online Chat API Support Return logprobs (#2777 ) * online chat support logprobs * check xpu * check vl_gpu_model_runner and xpu_model_runner * get_worker() check platform	2025-07-10 16:33:40 +08:00
K11OntheBoat	24f934f1f9	[BugFix] Fix low prediction accuracy of deepseekv3 (#2798 )	2025-07-10 16:16:44 +08:00
Sunny-bot1	1e2319cbef	Rename top_p_sampling to top_k_top_p_sampling (#2791 )	2025-07-10 00:09:25 -07:00
Sunny-bot1	e45050cae3	[Feature] support top_k_top_p sampling (#2753 ) * support top_k_top_p sampling * fix * add api param * add api para * fix * fix * fix * fix * fix * fix * fix	2025-07-09 20:58:58 -07:00
Ryan	b0f525955c	[SOT] Remove breakgraph in post processing && fix datatype (#2780 )	2025-07-10 11:26:00 +08:00
Yuanle Liu	2ea267f624	assert prompt len > 0 (#2773 )	2025-07-10 11:14:52 +08:00
0x3878f	1d8af7ab73	Add env variable for dy2st (#2779 )	2025-07-10 11:06:06 +08:00
Jiang-Jia-Jun	a4fdb3970b	[BugFix] Fix vocab size error for ernie model (#2785 ) * [BugFix] Fix vocab size error for ernie model * [BugFix] Fix vocab size error for ernie model --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun	2a86928657	[BugFix Revert] Fix vocab size error for ernie model	2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun	b1c53fa779	[BugFix] Fix vocab size error for ernie model	2025-07-09 22:13:41 +08:00
lizexu123	da20cf681e	[Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783 )	2025-07-09 22:03:57 +08:00
chen	888780ffde	[Feature] block_wise_fp8 support triton_moe_backend (#2767 )	2025-07-09 19:22:47 +08:00
RAM	e3768c5a83	[Executor] Fix bug of logger.debug (#2778 )	2025-07-09 04:13:43 -07:00
lifulll	1f28bdf994	dcu adapter ernie45t (#2756 ) Co-authored-by: lifu <lifu@sugon.com> Co-authored-by: yongqiangma <xing.wo@163.com>	2025-07-09 18:56:27 +08:00
RAM	03a74995b8	Clear dead code And supplementary notes (#2757 ) * 1.supplementary notes 2.delete dead code * fix bug of forward meta * Global modification of forward meta * fix vl model_runner bug	2025-07-09 16:17:34 +08:00
zhink	b89180f1cd	[Feature] support custom all-reduce (#2758 ) * [Feature] support custom all-reduce * add vllm adapted	2025-07-09 16:00:27 +08:00
yulangz	be21ef5047	[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765 ) * fix no quant xpu moe * change dir of xpu moe weight only	2025-07-09 15:57:51 +08:00
yulangz	0350831c2b	fix xpu offline demo garbled output (#2763 )	2025-07-09 14:51:20 +08:00
RichardWooSJTU	fee544e808	fix ep prefill (#2762 )	2025-07-09 14:03:05 +08:00
Ryan	c4718fd693	Enable SOT D2St in Multimodal Model (#2735 )	2025-07-09 12:26:18 +08:00
GoldPancake	f7cad30a38	[Feature] Add speculative decoding simulation benchmark. (#2751 ) * Add speculative decoding simulation benchmark * Fix the name of the parameter	2025-07-09 12:08:43 +08:00
gaoziyuan	6b10c19482	【Feature】add fd commit/branch info when start server (#2752 ) * add_commit_config * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-09 11:52:22 +08:00
RichardWooSJTU	6610aa29d0	Revert "[Bug fix] fix attention rank init (#2743 )" (#2761 ) This reverts commit `e8bbe7244b`.	2025-07-09 10:38:12 +08:00
Ryan	f72c4de539	[SOT] Make custom_op dy&st unified (#2733 ) * make_custom_op dy&st unified * add instance judgement	2025-07-08 19:21:44 +08:00
RichardWooSJTU	e8bbe7244b	[Bug fix] fix attention rank init (#2743 ) * fix attention rank init * fix attention rank init	2025-07-08 17:19:49 +08:00
Longzhi Wang	57b086dc6b	[Bug fix] Add the missing `pod_ip` param to the launch_cache_manager function. (#2742 ) * [Bug fix] fix the missing position args in expert_service.py * update	2025-07-08 14:52:13 +08:00
lizexu123	525be243e7	[Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737 ) * fix qwen3.py * update * update lm_head tie_word_embeddings * update tie_word_embeddings * fix * fix tie_word_embedding not in config.json --------- Co-authored-by: lizexu <lizexu@baidu.com>	2025-07-07 23:15:27 -07:00
EnflameGCU	d0f4d6ba3a	[GCU] Support gcu platform (#2702 ) baseline: `e7fa57ebae` Co-authored-by: yongqiangma <xing.wo@163.com>	2025-07-08 13:00:52 +08:00
gaoziyuan	26d5d737dd	【Fearture】support qwen2 some func (#2740 ) * add rl qwen model support * fix * fix	2025-07-08 12:03:04 +08:00
Ryan	fefbd65cf8	[SOT] Remove BreakGraph with `paddle.maximum` (#2731 ) * rm if with clip * clip -> maximum * int64 -> int32	2025-07-08 11:44:25 +08:00
ming1753	1eb8ea7328	[Bug fix] fix complie bug when sm < 89 (#2738 )	2025-07-08 11:24:52 +08:00
ming1753	ef6649a577	[Optimize] Optimize tensorwise fp8 performance (#2729 ) * [Optimize] Optimize tensorwise fp8 performance	2025-07-07 20:06:28 +08:00
liddk1121	1b54a2831e	Adapt for iluvatar gpu (#2684 )	2025-07-07 16:53:14 +08:00
lddfym	4e293e50fa	Check if the controller port is available (#2724 )	2025-07-07 13:24:55 +08:00
ltd0924	68b4755587	[LLM] support multi node deploy (#2708 ) * [LLM] support multi node deploy * Update engine.py * fix bugs * fix * [LLM] support multi node deploy * [LLM] support multi node deploy --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-06 10:33:51 +08:00
Ting	a6e9161045	fix bug. (#2718 )	2025-07-05 08:19:19 +08:00
Ting	90ef28d982	spec token map lazy. (#2715 )	2025-07-05 00:14:54 +08:00
lizexu123	9cb08e71e8	add support QWQ enable_thinking (#2706 ) * add support QWQ enable_thinking * add stream=True * fix stream=true * fix qwen --------- Co-authored-by: lizexu <lizexu@baidu.com>	2025-07-04 20:55:23 +08:00
GoldPancake	e7fa57ebae	Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707 ) * fix mtp eh_proj layer * fix mtp update_cfg function * fix stringdoc * simplify class name	2025-07-04 14:15:04 +08:00
gaoziyuan	a5ae88ded9	[feature]add fd whl version info (#2698 )	2025-07-04 14:12:42 +08:00
ltd0924	87e638498c	[RL] update reschedule finish reason (#2709 )	2025-07-04 13:47:36 +08:00
freeliuzc	667547be59	support chunk_prefill in MTP (#2705 )	2025-07-04 11:55:48 +08:00
Yuanle Liu	240bdac2a4	[feat] support fa3 backend for pd disaggregated (#2695 ) * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * delete use_fast_ffn	2025-07-03 22:33:27 +08:00
ltd0924	00863c43fd	[Bug] fix logger format (#2689 )	2025-07-03 19:58:03 +08:00

804 Commits