Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8.
2025-08-27 17:35:04 +08:00
chen
ce9c0917c5
[Precision] Support lm_head layer running in float32 ( #3597 )
...
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
2025-08-27 11:34:53 +08:00
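As context for the lm_head_fp32 option named in the bullets above, here is a minimal sketch of what running the lm_head layer in float32 typically means. It is illustrative only, not the FastDeploy implementation; the class and field names are assumptions.

import numpy as np

class LMHeadSketch:
    """Illustrative only: final vocab projection with an optional float32 path."""

    def __init__(self, weight: np.ndarray, lm_head_fp32: bool = False):
        self.weight = weight              # [hidden, vocab]; bf16/fp16 in a real model
        self.lm_head_fp32 = lm_head_fp32  # mirrors the flag named in the commit

    def __call__(self, hidden: np.ndarray) -> np.ndarray:
        if self.lm_head_fp32:
            # Upcast both operands so the matmul and the resulting logits are float32,
            # reducing rounding error in the final logits.
            return hidden.astype(np.float32) @ self.weight.astype(np.float32)
        return hidden @ self.weight       # keep the model's compute dtype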
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix a100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* increase eblite load time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* move environment variables into config
* fix rollout config
2025-08-26 07:16:04 -07:00
gaoziyuan
82e64b13e1
[NewFeature] Support dp multi api server && Fix some bugs in mixed ep && merge develop ( #3598 )
...
* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
bukejiyu
3200a80de3
[v1 loader]support fp8 ( #3593 )
...
* support fp8
* update ci
2025-08-26 02:42:46 -07:00
lzy
d339df2e90
Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )
...
* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log
2025-08-26 00:04:01 -07:00
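To make the DP+TP+EP entry above concrete, a hedged sketch of one common way to derive data-parallel and tensor-parallel rank groups follows; the TP-fastest rank ordering and the reuse of DP ranks for EP are assumptions, not necessarily what #3489 implements.

def hybrid_parallel_groups(world_size: int, dp_size: int, tp_size: int):
    """Illustrative rank bookkeeping for a DP x TP layout (EP often reuses the DP ranks)."""
    assert world_size == dp_size * tp_size
    # Ranks laid out TP-fastest: rank = dp_rank * tp_size + tp_rank (an assumption here).
    tp_groups = [list(range(d * tp_size, (d + 1) * tp_size)) for d in range(dp_size)]
    dp_groups = [list(range(t, world_size, tp_size)) for t in range(tp_size)]
    return tp_groups, dp_groups

# Example: 8 ranks, dp_size=2, tp_size=4
#   tp_groups -> [[0, 1, 2, 3], [4, 5, 6, 7]]
#   dp_groups -> [[0, 4], [1, 5], [2, 6], [3, 7]]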
freeliuzc
52eda7fdb3
[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram ( #3610 )
2025-08-26 14:29:22 +08:00
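For the n-gram half of the hybrid MTP + ngram method above, a generic prompt-lookup style proposer is sketched below. This is a common formulation of n-gram drafting, not necessarily the one in #3610; the function name and defaults are assumptions.

def ngram_draft(tokens: list, max_ngram: int = 3, num_draft: int = 4) -> list:
    """Illustrative n-gram proposer: find the most recent earlier occurrence of the
    current tail n-gram and replay the tokens that followed it as draft tokens."""
    for n in range(max_ngram, 0, -1):
        if len(tokens) <= n:
            continue
        tail = tokens[-n:]
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == tail:
                return tokens[start + n:start + n + num_draft]
    return []

# ngram_draft([1, 2, 3, 4, 1, 2]) -> [3, 4, 1, 2]: the tail [1, 2] also appears at the
# start, so the tokens that followed it there are proposed as the draft.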
lizexu123
c43a4bec00
[Features] support hugging face qwen3 dense and qwen2 model ( #3574 )
...
* support qwen2 and qwen3 hugging face
* fix moe
* default_v1 loader
* hugging_face_format deprecated
* modify hugging_face_format to model_format
* model_format auto
* fix environment
* fix bug
* fix qwen3-0.6 bug
* model_format is str
* fix
2025-08-26 10:54:53 +08:00
YuanRisheng
e481b7a779
fix sot ( #3556 )
2025-08-23 08:37:06 +08:00
zhink
df7c31012b
Modified to support custom all reduce by default ( #3538 )
2025-08-22 16:59:05 +08:00
YuanRisheng
c389a4013c
Unify server-side and model-side Config(Part-5) ( #3497 )
...
* move config
* fix xpu
* fix
* fix vl
* fix vl
* fix unitest
* fix args
* add unitest
* fix test
2025-08-21 19:00:21 +08:00
Jundong Liu
70ee910cd5
[Executor] Change cudagraph hashkey from batch size to num_tokens ( #3454 )
2025-08-18 16:16:48 +08:00
chenjian
b21272d9ff
[Bug fix] fix block num setting in scheduler v1 for develop ( #3303 )
...
* fix block num setting in scheduler v1
* fix block num setting in scheduler v1
* fix max_block_num and max_num_batched_tokens setting
* fix max_block_num and max_num_batched_tokens setting
* fix max_block_num and max_num_batched_tokens setting
* fix max_block_num and max_num_batched_tokens setting
2025-08-12 10:38:51 +08:00
Yuanle Liu
9571c458f0
enhance eos_tokens ( #3274 )
...
* enhance eos_tokens
* update
* update
2025-08-11 14:47:52 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimize attention and moe performance ( #3234 )
2025-08-08 10:51:24 +08:00
lizexu123
afff4d37ea
[Feature] support seed parameter ( #3161 )
...
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
2025-08-06 15:20:47 +08:00
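As an illustration of what a per-request seed buys, a minimal seeded-sampling sketch follows. It uses NumPy for brevity and is not the FastDeploy sampler, whose seed plumbing lives in SamplingParams/SamplingMetadata per the bullets above.

from typing import Optional
import numpy as np

def sample_with_seed(logits: np.ndarray, seed: Optional[int] = None) -> int:
    """Illustrative only: the same logits and the same seed reproduce the same token."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    rng = np.random.default_rng(seed)   # seed=None keeps sampling non-deterministic
    return int(rng.choice(len(probs), p=probs))

# sample_with_seed(logits, seed=42) returns the same token id on every call,
# which is the reproducibility the SamplingParams seed tests above rely on.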
bukejiyu
20839abccf
qwen3_moe ( #3084 )
2025-08-06 14:45:27 +08:00
lizhenyun01
fe540f6caa
[plugin] Custom model_runner/model support ( #3186 )
...
* support custom model&&model_runner
* fix merge
* add test && update doc
* fix codestyle
* fix unittest
* load model in rl
2025-08-04 18:52:39 -07:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
bukejiyu
db698bda01
qwen loader ( #3057 )
2025-07-30 19:09:38 +08:00
Jiang-Jia-Jun
ffa0f4d99b
[Fix] Fix version function ( #3076 )
...
* [Fix] Fix version function
* Fix commit
* Fix commit
* fix code sync
* Update coverage_run.sh
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 16:05:24 +08:00
YuanRisheng
99a70fc722
unify parallel config ( #3070 )
2025-07-30 11:41:23 +08:00
Ryan
73cfe1fd37
[SOT] Extend SOT warmup support to new hardware ( #3032 )
...
* add new hardware
* add_sot_warmup4new_hardware
* fix conflict
* rm Optional
2025-07-29 22:45:20 +08:00
Zero Rains
b2f9a42d87
[Feature] Support repetition early stop ( #3024 )
...
* support repetition early stop and allow the user to set the parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage for stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-29 22:42:54 +08:00
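A hedged sketch of a repetition early stopper follows: it stops a request when its recent tail keeps repeating one short token pattern. The actual EarlyStopper class added in #3024 may use a different criterion, so the class name, window, and threshold here are assumptions.

from collections import deque

class RepetitionEarlyStopperSketch:
    """Illustrative only: flag a request whose recent tokens repeat one short pattern."""

    def __init__(self, window: int = 32, min_repeats: int = 4):
        self.min_repeats = min_repeats
        self.tail = deque(maxlen=window)

    def should_stop(self, token_id: int) -> bool:
        self.tail.append(token_id)
        tokens = list(self.tail)
        # Check every pattern length that could repeat min_repeats times in the window.
        for period in range(1, len(tokens) // self.min_repeats + 1):
            pattern = tokens[-period:]
            if tokens[-period * self.min_repeats:] == pattern * self.min_repeats:
                return True
        return False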
YuanRisheng
502ee92a0a
Unify server-side and model-side Config (Part3) ( #3047 )
...
* merge model config
* fix arch
* fix rl
2025-07-29 17:07:44 +08:00
JYChen
dafe02a7b9
[stop sequence] support stop sequence ( #3025 )
...
* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0
2025-07-29 14:17:37 +08:00
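For the stop-sequence support above, a minimal suffix-match sketch on token ids; the real op in #3025 runs on GPU across multiple sequence ends, so this pure-Python version is only illustrative and its names are assumptions.

def hit_stop_sequence(generated: list, stop_seqs: list) -> bool:
    """Illustrative only: does the generated tail end with any stop token sequence?"""
    return any(seq and generated[-len(seq):] == seq for seq in stop_seqs)

# hit_stop_sequence([5, 7, 9, 2], [[9, 2], [3]]) -> True; generation would stop here.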
YuanRisheng
1a815b7a2a
Fix Speculative Config bug ( #3049 )
...
* fix speculative bug
* fix rl
2025-07-29 10:50:48 +08:00
YuanRisheng
bddf403576
Unify server-side and model-side Config (Part2) ( #3035 )
...
* merge speculative and graph opt config
* add attr
2025-07-28 15:31:48 +08:00
YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) ( #3018 )
...
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
Longzhi Wang
0700c90caa
[Feat] support mixed ep ( #2969 )
...
* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict
2025-07-25 15:29:30 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb ( #2997 )
...
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
Ryan
95b5af24db
[SOT] Add sot warmup (NVIDIA GPU Only) ( #2929 )
...
* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless
2025-07-22 21:36:14 +08:00
GoldPancake
dc67c10a7e
[Feature][MTP]Support multi-step MTP ( #2952 )
2025-07-22 16:26:29 +08:00
Nyakku Shigure
48e6a0ca26
[SOT] Mark dynamic dims by type annotations ( #2771 )
...
* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
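A hedged sketch of marking dynamic dims via type annotations follows: annotation metadata names which tensor dims may change between runs, and a small resolver collects them. This only illustrates the idea; the marker, resolver, and ForwardMeta field names below are assumptions, not the SOT API.

from typing import Annotated, get_type_hints

class DynamicDims:
    """Illustrative marker: which dims of an annotated field may vary between runs."""

    def __init__(self, *dims: int):
        self.dims = dims

def collect_dynamic_dims(cls) -> dict:
    """Scan a class for Annotated[..., DynamicDims(...)] markers."""
    out = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, DynamicDims):
                out[name] = meta.dims
    return out

class ForwardMetaSketch:
    input_ids: Annotated[list, DynamicDims(0)]  # dim 0 (num tokens) is dynamic
    rope_cache: list                            # fully static

# collect_dynamic_dims(ForwardMetaSketch) -> {"input_ids": (0,)}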
zhink
0262ef7eb3
custom all reduce support cuda graph ( #2938 )
...
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
gaoziyuan
6efad14b95
support vl ori_vacab_size ( #2900 )
2025-07-18 16:26:14 +08:00
RAM
cd52dc0f65
[Executor] Fix set capture sizes bug ( #2902 )
2025-07-18 15:12:19 +08:00
YuanRisheng
0eb5dc18d3
[BugFix]Fix sample rejection ( #2908 )
...
* fix config
* fix rejection
2025-07-18 13:44:30 +08:00
freeliuzc
d49f8fb30a
[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )
...
* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug
2025-07-17 17:58:08 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
...
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
YuanRisheng
0253381fb9
fix config ( #2858 )
2025-07-16 11:40:10 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
...
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
...
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
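To make "CUDA Graph support padding batch" concrete, a sketch of a typical padding policy: replay the smallest captured size that fits the batch and pad with dummy requests. The function name and fallback behaviour are assumptions about how such a policy usually looks, not the code in #2844.

import bisect
from typing import Optional

def pick_capture_size(batch_size: int, capture_sizes: list) -> Optional[int]:
    """Illustrative only: choose the smallest captured graph size that fits the batch.

    Returns None when the batch exceeds every captured size (caller falls back to eager).
    """
    sizes = sorted(capture_sizes)
    i = bisect.bisect_left(sizes, batch_size)
    return sizes[i] if i < len(sizes) else None

# pick_capture_size(3, [1, 2, 4, 8]) -> 4: the 3-request batch is padded with one dummy
# request so the graph captured for batch size 4 can be replayed.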
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
...
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
bukejiyu
bad53c6b6e
[vl]remove duplicated load logic ( #2744 )
2025-07-13 07:36:26 +08:00
lizexu123
8c660a0dfb
[BugFix] fix RMSNorm rms_norm_esp ( #2797 )
...
* fix rms
* add vl
* fix
* add vl
* fix
* fix
2025-07-10 20:02:24 +08:00