Commit Graph

2691 Commits

Author SHA1 Message Date
ophilia-lee
33db137d0b Add a YAML file of default vLLM request parameters 2025-07-15 19:31:27 +08:00
lijingning
9d6a42b334 Adapt to vLLM having no arrival_time; adapt to vLLM requiring model; add test-case number no to RequestFuncInput/RequestFuncOutput/SampleRequest 2025-07-15 19:31:27 +08:00
Jiang-Jia-Jun
1b712bba82 Update setup.py 2025-07-15 14:57:23 +08:00
AIbin
fd91da7b41 [Inference Optimize] Support wint2 triton kernel about triton_utils_v2 (#2842)
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5 [vl] Use top_k from config.json (#2831)
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d [MTP] optimize mtp infer speed (#2840)
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458 Simplify the Config code (#2770)
* simplify the code

* fix vl

* delete config

* fix

* perfect code

* fix ci

* fix xpu

* fix xpu

* fix server

* resolve conflict

* fix mtp

* resolve conflict

* fix xpu

* fix xpu

* fix vl

* fix log

* fix qwen moe

* fix qwen moe

* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
2e81792d64 [fix] fix 'force-reinstall all-depe-packages in build' (#2837) 2025-07-14 16:50:54 +08:00
AIbin
b7858c22d9 [Update Docs] update supported_models doc (#2836)
* update supported_models doc
2025-07-14 16:01:34 +08:00
GoldPancake
09bbac6de0 Add DeepGEMM pre-compile tools (#2819)
This tool compiles all possible kernels in advance, derived from the model's config.json, avoiding the situation where a request hits an uncompiled kernel and triggers JIT compilation.
2025-07-14 14:56:41 +08:00
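The pre-compile idea above can be sketched as: enumerate the GEMM shapes implied by the model's config.json and compile each kernel once, up front, so no request ever triggers JIT. All names here (`enumerate_gemm_shapes`, `precompile`, the config fields consulted) are illustrative, not the actual API of the DeepGEMM tool.

```python
import json


def enumerate_gemm_shapes(config_path, max_batch=256):
    """Derive candidate (M, N, K) GEMM shapes from a model's config.json.

    Illustrative only: the real tool inspects many more fields.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    # Each candidate batch size M pairs with the model's fixed (N, K) dims.
    for m in (1, 8, 32, max_batch):
        for n, k in ((inter, hidden), (hidden, inter)):
            yield (m, n, k)


def precompile(shapes, compile_fn):
    """Compile every shape once, ahead of time, instead of JIT-on-first-use."""
    cache = {}
    for shape in shapes:
        cache[shape] = compile_fn(*shape)
    return cache
```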
freeliuzc
7f64d408a9 [MTP] support expert-parallel in mtp (#2835) 2025-07-14 14:28:50 +08:00
lddfym
ece88596ed fix spelling error (#2827) 2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e [vl]remove duplicated load logic (#2744)
2025-07-13 07:36:26 +08:00
xiegegege
16940822a7 add result save for ci (#2824)
2025-07-12 23:34:46 +08:00
zhenwenDang
d48c03413f Feature/logprob bug fix (#2817)
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens

* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
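The two edge cases this fix describes — no logprobs produced at step 0, and an index past the end of `logprob_token_ids` — amount to a defensive accessor. A minimal sketch, assuming a hypothetical dict layout for the response; the function and field names are illustrative, not FastDeploy's actual data structures.

```python
def logprob_at_step(response_logprobs, step):
    """Return the logprob token id for a decoding step, tolerating
    missing logprobs at step 0 and out-of-range indices.

    `response_logprobs` is a hypothetical dict with a
    "logprob_token_ids" list; None means the step produced nothing.
    """
    if response_logprobs is None:
        return None  # step 0 may produce no logprobs at all
    ids = response_logprobs.get("logprob_token_ids", [])
    if step >= len(ids):
        return None  # prevent indexing past the end of the list
    return ids[step]
```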
gaoziyuan
e9e8443ea8 fix num_blocks_local when small size model in TP2 running mode (#2792) 2025-07-12 12:50:48 +08:00
gaoziyuan
749b2e9c89 support qwen3moe name_mapping (#2820) 2025-07-12 12:05:54 +08:00
Sunny-bot1
f6ad26fc08 fix topp default value (#2814)
2025-07-11 17:10:21 +08:00
zhink
c08561c13a [Feature] support tensor-parallel-size>num_key_value_heads for qwen3 (#2799) 2025-07-11 15:09:43 +08:00
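Supporting tensor-parallel-size larger than num_key_value_heads typically means replicating KV heads across ranks instead of partitioning them (relevant for GQA models like Qwen3). A minimal sketch of the per-rank head assignment under that assumption; not the commit's actual implementation.

```python
def kv_heads_for_rank(num_kv_heads, tp_size, rank):
    """Assign key/value heads to one tensor-parallel rank.

    When tp_size <= num_kv_heads, heads are partitioned evenly; when
    tp_size > num_kv_heads, each head is replicated across
    tp_size // num_kv_heads consecutive ranks. Illustrative only.
    """
    if tp_size <= num_kv_heads:
        assert num_kv_heads % tp_size == 0
        per_rank = num_kv_heads // tp_size
        start = rank * per_rank
        return list(range(start, start + per_rank))
    # Replication path: several ranks share one KV head.
    assert tp_size % num_kv_heads == 0
    ranks_per_head = tp_size // num_kv_heads
    return [rank // ranks_per_head]
```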
chen
2c3607407f check (#2811) 2025-07-11 13:54:52 +08:00
lddfym
b5e4288704 Global scheduler supports configuring hot updates (#2807)
* Check if the controller port is available

* Global scheduler supports configuring hot updates

* add interface: /controller/scheduler

* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
yulangz
abbbd0cddc [XPU] Update docker file (#2809) 2025-07-11 13:26:38 +08:00
yinwei
e98937cbba delete useless file (#2772)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-11 11:46:04 +08:00
Sunny-bot1
240d6236bc [Fix]fix top_k_top_p sampling (#2801)
* fix topk-topp

* update

* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg
59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb [BugFix] fix RMSNorm rms_norm_esp (#2797)
* fix rms

* add vl

* fix

* add vl

* fix

* fix
2025-07-10 20:02:24 +08:00
LiqinruiG
ce5adec877 [Doc] modify offline-inference docs (#2800)
* modify offline-inference docs

* [bug] remove tool_call_content
2025-07-10 19:41:12 +08:00
Zeyu Chen
36571fd2d9 Update README.md
2025-07-10 17:01:08 +08:00
yulangz
830de5a925 [XPU] Supports TP4 deployment on 4,5,6,7 (#2794)
* Support running on devices 4,5,6,7 specified via XPU_VISIBLE_DEVICES
* Update the multi-device notes in the XPU docs
2025-07-10 16:48:08 +08:00
chen
d33105baeb [Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs

* check xpu

* check vl_gpu_model_runner and xpu_model_runner

* get_worker() check platform
2025-07-10 16:33:40 +08:00
K11OntheBoat
24f934f1f9 [BugFix] Fix low prediction accuracy of deepseekv3 (#2798) 2025-07-10 16:16:44 +08:00
Sunny-bot1
1e2319cbef Rename top_p_sampling to top_k_top_p_sampling (#2791) 2025-07-10 00:09:25 -07:00
Sunny-bot1
e45050cae3 [Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling

* fix

* add api param

* add api para

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-07-09 20:58:58 -07:00
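The combined top_k_top_p filter this feature adds (later renamed from `top_p_sampling` to `top_k_top_p_sampling`) can be sketched as: keep the top_k highest logits, renormalize, then keep the smallest prefix whose cumulative probability reaches top_p. A minimal pure-Python sketch, not FastDeploy's sampling kernel.

```python
import math


def top_k_top_p_filter(logits, top_k, top_p):
    """Return the surviving token ids after top-k then top-p filtering.

    Keeps the top_k highest logits, applies softmax over the survivors,
    then keeps the smallest prefix whose cumulative mass reaches top_p.
    """
    # Rank token ids by logit, highest first, and truncate to top_k.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:top_k]
    # Softmax over the surviving logits only (shifted for stability).
    mx = max(logits[i] for i in kept)
    exps = [math.exp(logits[i] - mx) for i in kept]
    total = sum(exps)
    # Nucleus (top_p) cut: stop once cumulative mass reaches top_p.
    out, cum = [], 0.0
    for tok, e in zip(kept, exps):
        out.append(tok)
        cum += e / total
        if cum >= top_p:
            break
    return out
```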
Ryan
b0f525955c [SOT] Remove breakgraph in post processing && fix datatype (#2780) 2025-07-10 11:26:00 +08:00
Yuanle Liu
2ea267f624 assert prompt len > 0 (#2773) 2025-07-10 11:14:52 +08:00
0x3878f
1d8af7ab73 Add env variable for dy2st (#2779) 2025-07-10 11:06:06 +08:00
LiqinruiG
54affdc44b [Doc] modify offline_inference docs (#2787)
* modify reasoning_output docs

* modify offline inference docs

* modify offline inference docs

* modify offline_inference docs

* modify offline_inference docs
2025-07-10 01:06:14 +08:00
Jiang-Jia-Jun
a4fdb3970b [BugFix] Fix vocab size error for ernie model (#2785)
* [BugFix] Fix vocab size error for ernie model

* [BugFix] Fix vocab size error for ernie model

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657 [BugFix Revert] Fix vocab size error for ernie model 2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779 [BugFix] Fix vocab size error for ernie model 2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783) 2025-07-09 22:03:57 +08:00
LiqinruiG
4ccd1696ab [Doc] modify offline inference docs (#2747)
* modify reasoning_output docs

* modify offline inference docs

* modify offline inference docs
2025-07-09 20:53:26 +08:00
chen
888780ffde [Feature] block_wise_fp8 support triton_moe_backend (#2767) 2025-07-09 19:22:47 +08:00
RAM
e3768c5a83 [Executor] Fix bug of logger.debug (#2778) 2025-07-09 04:13:43 -07:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8 Clear dead code and add supplementary notes (#2757)
* 1.supplementary notes 2.delete dead code

* fix bug of forward meta

* Global modification of forward meta

* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd [Feature] support custom all-reduce (#2758)
* [Feature] support custom all-reduce

* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
be21ef5047 [XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
* fix no quant xpu moe

* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
celsowm
771e71a24d Feat/blackwell sm100 support (#2670)
* Add initial support for NVIDIA Blackwell (SM100) architecture

This change introduces initial support for the NVIDIA Blackwell GPU
architecture, specifically targeting SM100 (Compute Capability 10.x)
with '100a' architecture-specific features (e.g., for CUTLASS).

Key changes:
- Updated custom_ops/setup_ops.py to generate appropriate gencode
  flags (arch=compute_100a,code=sm_100a) when '100' is specified
  in FD_BUILDING_ARCS. Requires CUDA 12.9+.
- Updated custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h:
    - Added CutlassTileConfigSM100 enum (with placeholder tile shapes).
    - Added BLACKWELL to CandidateConfigTypeParam.
    - Updated CutlassGemmConfig struct with is_sm100 flag,
      tile_config_sm100, and new constructor for SM100.
    - Modified toString() and fromString() for SM100 support.
- Updated custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu:
    - Added get_candidate_tiles_sm100() (with placeholder tiles).
    - Added placeholder mcast support functions for SM100.
    - Updated get_candidate_configs() to include SM100 paths using
      the BLACKWELL flag and new SM100 config types.
- Updated build.sh with comments to guide users on specifying '100'
  for Blackwell in FD_BUILDING_ARCS.

Further work:
- Optimal CUTLASS tile configurations for SM100 need to be researched
  and updated in cutlass_heuristic.cu.
- Kernel auto-generation scripts in custom_ops/utils/ may need
  SM100-specific versions if Blackwell's hardware features for FP8/TMA
  differ significantly from SM90.
- Compatibility of third-party libraries (CUTLASS v3.8.0, DeepGEMM)
  with Blackwell should be fully verified.

* Feat: Implement detailed Blackwell (SM100) CUTLASS heuristics

This change integrates specific, expert-provided CUTLASS heuristic
configurations for the NVIDIA Blackwell (SM100) GPU architecture,
replacing previous placeholders. This includes:

- Updated `custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h`:
    - Populated `CutlassTileConfigSM100` enum with specific tile shapes
      (e.g., CtaShape64x64x128B, CtaShape128x128x128B) suitable for SM100.
    - Added `FP4_ONLY` to `CandidateConfigTypeParam` for new FP4 paths.

- Updated `custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu`:
    - Implemented `get_candidate_tiles_sm100` with detailed logic for
      selecting tile configurations based on GROUPED_GEMM and FP4_ONLY flags,
      using the new SM100 tile enums.
    - Implemented `supports_mcast_along_m_sm100` and
      `supports_mcast_along_n_sm100` with specific tile checks for Blackwell.
    - Updated the `sm == 100` (Blackwell) block in `get_candidate_configs`
      to use these new helper functions and accurately populate candidate
      kernel configurations for various cluster shapes.

- `custom_ops/setup_ops.py` remains configured to compile for
  `arch=compute_100a,code=sm_100a` with CUDA 12.9+ for these features.

This aligns the codebase with heuristic configurations similar to those
in upstream TensorRT-LLM / CUTLASS for Blackwell, enabling more
performant kernel selection on this new architecture.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 15:29:42 +08:00
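The gencode handling this commit describes — mapping '100' in FD_BUILDING_ARCS to the architecture-specific `arch=compute_100a,code=sm_100a` target, gated on CUDA 12.9+ — can be sketched as below. This is a hedged illustration of the behavior, not the repo's actual setup_ops.py code; the function name and version-check shape are assumptions.

```python
def gencode_flags(building_arcs, cuda_version=(12, 9)):
    """Translate architecture numbers into nvcc -gencode flags.

    '100' maps to the Blackwell '100a' architecture-specific target
    and requires CUDA 12.9 or newer; other entries get the plain
    compute_XX/sm_XX pair. Illustrative sketch only.
    """
    flags = []
    for arch in building_arcs:
        if arch == "100":
            if cuda_version < (12, 9):
                raise RuntimeError("SM100 ('100a') requires CUDA 12.9+")
            flags.append("-gencode=arch=compute_100a,code=sm_100a")
        else:
            flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags
```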
yulangz
0350831c2b fix xpu offline demo garbled output (#2763) 2025-07-09 14:51:20 +08:00