FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 08:37:06 +08:00

Author	SHA1	Message	Date
Mattheliu	108d989d9d	[Docs] add fastdeploy_unit_test_guide.md (#3484 ) * docs:add fastdeploy_unit_test_guide.md * docs:fix fastdeploy_unit_test_guide.md * docs: add FastDeploy unit test spec (EN) and update usage nav * fix codestyle	2025-08-28 14:12:25 +08:00
Jiang-Jia-Jun	c694fa2879	Revert "[Feature] block sparse attention (#3209 )" (#3647 ) This reverts commit `646a0c2fd8`.	2025-08-27 17:35:04 +08:00
JYChen	e645db348b	[docs] Update best practice doc (#3539 ) * fix some docs error * [docs] x1 best-practice * update docs * fix docs	2025-08-27 15:45:30 +08:00
chen	ce9c0917c5	[Precision] Support lm_head layer running in float32 (#3597 ) * support lm_head fp32 bf16 fp16 * support lm_head fp32 bf16 fp16 * add doc and check code * lm_head_fp32 specify lm_head as fp32 * code check * check doc	2025-08-27 11:34:53 +08:00
yangjianfengo1	646a0c2fd8	[Feature] block sparse attention (#3209 ) * support sparse attn * fix bug * code style * fix moba attn get kv shape * fix A100 compilation * codestyle * code style * code style * code style * fix conflict * add unit tests * code style * increase eblite load time * fix bug * for ci * for ci * for ci * for ci * support mlp block size 128 * add unit tests for small operators * fix mlp unit test * add environment variables to config * fix rollout config	2025-08-26 07:16:04 -07:00
Yuanle Liu	cbce94a00e	rename ernie_xxx to ernie4_5_xxx (#3621 ) * rename ernie_xxx to ernie4_5_xxx * ci fix	2025-08-26 19:29:27 +08:00
Sunny-bot1	c68c3c4b8b	[Feature] bad words support v1 scheduler and specifiy token ids (#3608 ) * support bad_words_token_ids * docs * fix test * fix * bad words support kvcache v1 and token ids * fix	2025-08-25 20:14:51 -07:00
Kane2011	2ae7ab28d2	[MetaxGPU] adapt to the latest fastdeploy on metax gpu (#3492 )	2025-08-25 17:44:20 +08:00
chen	9cab3f47ff	[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552 ) * [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing * infer engine support temp_scaled_logprobs and top_p_normalized_logprobs * delete some code * code check * code check and add doc * fix tokenizer.decoder(-1), return 'Invalid Token' * add ci for temp_scaled and top_p logprobs * check test * check seq len time shape * logprob clip inf --------- Co-authored-by: sunlei1024 <sunlei5788@gmail.com>	2025-08-25 14:11:49 +08:00
zhink	df7c31012b	Modified to support custom all reduce by default (#3538 )	2025-08-22 16:59:05 +08:00
luukunn	371fb3f853	[Feature] add tool parser (#3483 ) * add tool parser * add x1 enable_thinking * restart ci * fix vl reasoning parser * modify call style * modify call style * add offline enablethinking * fix completion * fix * fix unit test * fix unit test * fix unit test * fix vl reasoning parser * fix vl reasoning parser	2025-08-21 17:25:44 +08:00
Yzc216	466cbb5a99	[Feature] Models api (#3073 ) * add v1/models interface related * add model parameters * default model verification * unit test * check model err_msg * unit test * type annotation * model parameter in response * modify document description * modify document description * unit test * verification * verification update * model_name * pre-commit * update test case * update test case * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/entrypoints/openai/serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-21 17:02:56 +08:00
Zhang Yulong	33ff0bfe38	Update disaggregated.md (#3495 ) fix documentation errors	2025-08-20 19:39:18 +08:00
luukunn	9c129813f9	[Feature] add custom chat template (#3251 ) * add custom chat_template * add custom chat_template * add unittest * fix * add docs * fix comment * add offline chat * fix unit test * fix unit test * fix * fix pre commit * fix unit test * add unit test * add unit test * add unit test * fix pre_commit * fix enable_thinking * fix pre commit * fix pre commit * fix unit test * add requirements	2025-08-18 16:34:08 +08:00
RAM	154308102e	[Docs]Updata docs of graph opt backend (#3442 ) * Updata docs of graph opt backend * update best_practices	2025-08-15 21:30:32 +08:00
yongqiangma	5703d7aa0f	update installation readme (#3429 )	2025-08-15 19:09:41 +08:00
yangjianfengo1	615930bc05	Update README (#3426 ) * update README * code style * code style	2025-08-15 18:46:28 +08:00
JYChen	6f11171478	fix some docs error (#3439 )	2025-08-15 18:45:27 +08:00
yinwei	354575b6d1	[Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 (#3428 ) * XPU Update 2.1 Release Documentation * code style check * Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95	2025-08-15 18:34:37 +08:00
ming1753	d4e3a20300	[Docs] Release 2.1 docs and fix some description (#3424 )	2025-08-15 14:27:19 +08:00
yinwei	fbb6dcb9e4	[Docs]XPU Update 2.1 Release Documentation (#3423 ) * XPU Update 2.1 Release Documentation * code style check	2025-08-15 14:07:47 +08:00
JYChen	562e01c979	update docs (#3420 )	2025-08-15 13:00:08 +08:00
ltd0924	5a84324798	[Doc] Add multinode deployment documents (#3417 ) * Create multi-node_deployment.md * Create multi-node_deployment.md * Update mkdocs.yml	2025-08-15 10:37:04 +08:00
yzwu	ce9180241e	[Iluvatar GPU] Modify the names of some variables (#3273 )	2025-08-13 11:38:02 +08:00
yangjianfengo1	b808c49585	[Doc] Add Chinese/English language switch (#3318 ) * add Chinese/English language switch * add Chinese/English language switch * update readme	2025-08-12 11:20:45 +08:00
Sunny-bot1	19fda4e912	fix docs (#3332 )	2025-08-11 21:03:49 +08:00
Sunny-bot1	789dc67ff7	[Docs]fix sampling docs (#3113 ) * fix sampling docs * fix sampling docs * update	2025-08-11 20:42:27 +08:00
Yuanle Liu	9571c458f0	enhance eos_tokens (#3274 ) * enhance eos_tokens * update * update	2025-08-11 14:47:52 +08:00
ltd0924	31d4fcb425	[BugFix] fix too many open files problem (#3256 ) * Update cache_messager.py * fix too many open files problem * fix too many open files problem * fix too many open files problem * fix ci bugs * Update api_server.py * add parameter * format * format * format * format * Update parameters.md * Update parameters.md * Update serving_completion.py * Update serving_chat.py * Update envs.py --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-08 20:10:11 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00
hong19860320	93a1731891	[Doc] Update deps and fix dead links (#3252 )	2025-08-07 11:04:31 +08:00
lizhenyun01	fe540f6caa	[plugin] Custom model_runner/model support (#3186 ) * support custom model&&model_runner * fix merge * add test && update doc * fix codestyle * fix unittest * load model in rl	2025-08-04 18:52:39 -07:00
gaoziyuan	4021d66ea5	【Feature】add fd plugins && rm model_classes (#3123 ) * add fd plugins && rm model_classed * fix reviews * add docs * fix * fix unitest ci	2025-08-03 19:53:20 -07:00
ApplEOFDiscord	b71cbb466d	[Feature] remove dependency on enable_mm and refine multimodal's code (#3014 ) * remove dependency on enable_mm * fix codestyle check error * fix codestyle check error * update docs * resolve conflicts on model config * fix unit test error * fix code style check error --------- Co-authored-by: shige <1021937542@qq.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-01 20:01:18 +08:00
ming1753	fc5f43c6bc	[Docs] Optimal Deployment (#2768 )	2025-08-01 11:56:27 +08:00
LiqinruiG	25005fee30	[Doc] add chat_template_kwagrs and update params docs (#3103 ) * add chat_template_kwagrs and update params docs * add chat_template_kwagrs and update params docs * update enable_thinking * pre-commit * update test case --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:44:06 +08:00
JYChen	1ef38b1563	[doc] best practice for eb45 text models (#3002 ) * [doc] best practice for eb45 text models * fix docs	2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun	4498058722	Update README.md	2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun	66304cf921	Update sampling.md	2025-07-31 15:02:57 +08:00
yinwei	5b9aec1f10	xpu release 2.0.3 (#3105 )	2025-07-31 14:26:07 +08:00
Jiang-Jia-Jun	998968f1e8	[Doc] Update parameters of serving	2025-07-30 22:35:01 +08:00
JYChen	bd29b2aaca	add stop_seqs doc (#3090 )	2025-07-30 20:36:18 +08:00
李泳桦	b242150f94	[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058 ) * [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client * [fix] delete ci test case for enable_thinking * [fix] add reasoning_parser when server starts * [fix] fix ci consistency test error with reasoning parser * [doc] update docs related to metadata * [fix] cancel enable_thinking default value	2025-07-30 19:25:20 +08:00
Zero Rains	4dc130c5a9	[Doc] add repetition early stopping doc (#3078 ) * add repetition early stop doc * add the early_stop.md	2025-07-29 22:01:57 -07:00
lddfym	5ca684c762	update doc: load_balance.md (#3008 ) * update doc of load_balance * update doc: load_balance.md	2025-07-30 10:27:56 +08:00
Sunny-bot1	9c962343f2	[Docs] add sampling docs (#2973 ) * add sampling docs * add minp sampling docs * update sample docs * update * update * add bad words desc * update	2025-07-30 02:24:16 +08:00
Jiang-Jia-Jun	286802a070	Update ernie-4.5.md	2025-07-29 10:10:09 +08:00
Jiang-Jia-Jun	6ce3a8a497	Update index.md	2025-07-25 10:32:47 +08:00
Yzc216	980126b83a	[Feature] multi source download (#3005 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation	2025-07-24 17:42:09 +08:00
lizexu123	67990e0572	[Feature] support min_p_sampling (#2872 ) * Fastdeploy support min_p * add test_min_p * fix * min_p_sampling * update * delete vl_gpu_model_runner.py * fix * Align usage of min_p with vLLM * fix * modified unit test * fix test_min_sampling * pre-commit all files * fix * fix * fix * fix xpu_model_runner.py	2025-07-20 23:17:59 -07:00

543 Commits