FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-04 08:16:42 +08:00

Author	SHA1	Message	Date
chen	f38b174a75	Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method (#4115 ) Some checks failed CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details * improve per_token_quant_fp8 performance * support moe wfp8apf8 * check glm test * fix noaux_tc op in cudagraph, support noaux_tc return the correct * check * check inf and overwrite score in noaux_tc --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-22 21:27:37 +08:00
chen	fbb4e0f8d1	[CP]Glm45 air 2.2 (#4073 ) Some checks failed CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details * [Feature] Support zai-org/GLM-4.5-Air BF16 model (#3928) * support glm45_air * [Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) (#4051) * check * fix v1 load for mix and wint8 * check --quantizations 'None' * check * support RL rollout * check v1 loader * check glm rollout_model, change wfp8afp8 per_token_cast_to_fp8 to native impl * check rollout moe gate begin layer_id * check rollout e_score_correction_bias * delete infer_to_train_mapping={} * code check	2025-09-15 18:52:58 +08:00
Yuanle Liu	174510180a	[BugFix] fix error of import paddle.base.core.Config (#3761 ) (#3804 ) * 延迟 import Config * support chunked_prefill * support chunked_prefill	2025-09-03 10:14:03 +08:00
ltd0924	0f42771a84	[Feature] support model weight update in ep (#3802 ) * Update config.py * Update ep.py * Update fused_moe_backend_base.py * Update dynamic_weight_manager.py * Update worker_process.py * fix ci	2025-09-02 20:52:47 +08:00
Yuanle Liu	4957908275	add input_processor plugin (#3657 ) * add input_processor plugin * update * update * update * update * update * update * update * update * update * update * update	2025-08-28 22:53:57 +08:00
gaoziyuan	82e64b13e1	[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598 ) * [Feature] update ep * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix queue ports idx * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * Update engine.py * fix ci * fix some bug in mixed ep * add server fix and op fix * rm some log * fix code style * ltd fix * fix * fix * fix some bug * fix bug * fix bug * fix style * Update config.py * Update splitwise_connector.py * Update cache_messager.py * Update __init__.py * merge and fix * Update engine.py * Update common_engine.py * Update run_ci_xpu.sh * Update ernie_processor.py * Update ernie_processor.py --------- Co-authored-by: ltd0924 <ltd0924@sina.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>	2025-08-26 19:59:02 +08:00
Yuan Xiaolan	9205c88da1	support w4afp8 EP inference (#3044 ) Some checks failed CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-08-25 11:27:45 +08:00
gaoziyuan	a799d14df1	[Bugfix] Fix model accuracy in some ops (#3231 ) * fix noaux_tc op * fix * update * fix qk norm * fix linear for prequant loader * test * fix * fix * rm some print * fix noaux_tc op * test * Fix the confused enable_early_stop when only set early_stop_config (#3214) * fix the confused early_stop_config when only set early_stop_config * pre-commit * write a general method * Add ci case for min token and max token (#3229) Co-authored-by: xujing43 <xujing43@baidu.com> * add some evil cases (#3240) * add repitation early stop cases * add repitation early stop cases * add bad cases * add bad cases * add evil cases * qwen3_moe (#3084) * [Feature] support seed parameter (#3161) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix * 【Fix Bug】修复 fa3 支持集中式bug (#3235) * fix fa3 集中式bug * 增加qknorm参数 * fix qk norm * fix * update * fix linear for prequant loader * fix * fix * rm some print * fix * fix moe init weight&scale * fix moe init weight&scale --------- Co-authored-by: bukejiyu <395822456@qq.com> Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com> Co-authored-by: Zero Rains <linjunlu@zerorains.top> Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com> Co-authored-by: xujing43 <xujing43@baidu.com> Co-authored-by: Divano <dddivano@outlook.com> Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com> Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com> Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com> Co-authored-by: qingqing01 <dangqingqing@baidu.com>	2025-08-08 17:30:37 +08:00
Yuan Xiaolan	af543b7f0f	revise get_moe_scores (#3164 )	2025-08-05 16:43:07 +08:00
RichardWooSJTU	f5c64a074c	[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182 ) * Add support for mixed-ep across multi nodes * code refine --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-08-05 15:40:11 +08:00
Longzhi Wang	907d561523	fix ep when paddle version mismatch (#3056 )	2025-07-29 15:06:49 +08:00
Longzhi Wang	0700c90caa	[Feat] support mixed ep (#2969 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * Support mixed ep * fix comment * fix comment * update mixep * fix conflict * fix typo * update * fix typo * fix code style * fix conflict	2025-07-25 15:29:30 +08:00
xiaoxiaohehe001	2970b00dfa	[Feature] Support_eplb (#2997 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * [Feature] support_eplb * [Feature] support_eplb * [Fix] fix mm ep	2025-07-24 20:22:45 +08:00
Zero Rains	0fb37ab7e4	update flake8 version to support pre-commit in python3.12 (#3000 ) * update flake8 version to support pre-commit in python3.12 * polish code	2025-07-24 01:43:31 -07:00
周周周	ff4569f135	remove some code in ep.py (#2947 )	2025-07-21 22:44:57 +08:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00
jiangjiajun	684703fd72	[LLM] First commit the llm deployment code	2025-06-09 19:20:15 +08:00

18 Commits