FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 16:48:03 +08:00

Author	SHA1	Message	Date
yangjianfengo1	4325b737e7	【FIX】Change the name of sparse attn from moba to plas (#4006 ) (#4076 ) * 【FIX】Change the name of sparse attn from moba to plas (#4006) * update docs * 【docs】 update readme (#4000) * update docs * update readme * update docs * 【FIX】Change the name of sparse attn from moba to plas (#3845) * update docs * update docs * update docs * update docs * rename moba to plas * code style * update ci * code style * update ci * code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * fix max_num_seqs * fix test load attn --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-23 10:26:40 +08:00
lizexu123	c86945ef49	[Feature] support pool (#3827 ) * support pool * update pooling * add pooler_config and check * update * support AutoWeightsLoader load weight * fix * update * delete print * update pre-commit * fix * fix xpu * fix ModelRegistry->model_registry * fix Copilot review * fix pooler.py * delete StepPooler * fix abstract * fix default_loader_v1 * fix Pre Commit * support torch qwen3 dense * add test and fix torch-qwen * fix * fix * adapter ci: * fix review * fix pooling_params.py * fix * fix tasks.py 2025 * fix print and logger * Modefy ModelRegistry and delete AutoWeightsLoader * fix logger * fix test_embedding * fix ci bug * ernie4_5 model_registry * fix test * support Qwen3-Embedding-0.6B tp=1 load * fix extra code * fix * delete fix vocab_size * delete prepare_params_dict * fix:	2025-09-22 14:09:09 +08:00
co63oc	17a27170bc	fix typos (#4093 )	2025-09-15 18:33:30 +08:00
freeliuzc	46911f903d	[MTP]update hybrid-mtp-with-ngram (#4047 )	2025-09-15 17:13:31 +08:00
Jiang-Jia-Jun	c60adf4281	Revert "【FIX】Change the name of sparse attn from moba to plas (#3845 )" (#4001 ) This reverts commit `e31c8f7336`.	2025-09-09 11:08:23 +08:00
yangjianfengo1	e31c8f7336	【FIX】Change the name of sparse attn from moba to plas (#3845 ) * update docs * update docs * update docs * update docs * rename moba to plas * code style * update ci * code style * update ci	2025-09-09 10:56:50 +08:00
yangjianfengo1	472402bf4e	Update sparse attn documentation (#3954 ) * update docs * update docs * update docs * update docs	2025-09-08 12:23:18 +08:00
ltd0924	7643e6e6b2	[Docs] add data parallel (#3883 ) * [Docs] add data parallel * [Docs] add data parallel	2025-09-04 20:33:50 +08:00
kevin	1908465542	[Feature] mm and thinking model support structred output (#2749 ) * mm support structured output * update code * update code * update format * update code * update code * add enable_thinking default * update code * add structured_outputs test case * add ci install xgrammar * add ci timeout time * update test for structured_outputs * update code * add error traceback info * update error msg * update structred output code * update code * update code * update config * update torch version --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-02 16:21:09 +08:00
co63oc	d6369b4d51	fix typos (#3684 )	2025-09-01 17:50:17 +08:00
yangjianfengo1	3754a9906d	[Feature] block sparse attention (#3668 ) * support sparse attn * fix bug * code style * fix moba attn get kv shape * fix a100 compilation * codestyle * code style * code style * code style * fix conflict * add unit test * code style * increase eblite load time * fix bug * for ci * for ci * for ci * for ci * support mlp block size 128 * add unit tests for small operators * fix mlp unit test * add the environment variables to the config * fix rollout config * fix GPU memory * add test server * add test server * fix mlp: use full attn for the last layer	2025-08-29 19:46:30 +08:00
Jiang-Jia-Jun	c694fa2879	Revert "[Feature] block sparse attention (#3209 )" (#3647 ) This reverts commit `646a0c2fd8`.	2025-08-27 17:35:04 +08:00
yangjianfengo1	646a0c2fd8	[Feature] block sparse attention (#3209 ) * support sparse attn * fix bug * code style * fix moba attn get kv shape * fix a100 compilation * codestyle * code style * code style * code style * fix conflict * add unit test * code style * increase eblite load time * fix bug * for ci * for ci * for ci * for ci * support mlp block size 128 * add unit tests for small operators * fix mlp unit test * add the environment variables to the config * fix rollout config	2025-08-26 07:16:04 -07:00
Sunny-bot1	c68c3c4b8b	[Feature] bad words support v1 scheduler and specifiy token ids (#3608 ) * support bad_words_token_ids * docs * fix test * fix * bad words support kvcache v1 and token ids * fix	2025-08-25 20:14:51 -07:00
zhink	df7c31012b	Modified to support custom all reduce by default (#3538 )	2025-08-22 16:59:05 +08:00
RAM	154308102e	[Docs]Updata docs of graph opt backend (#3442 ) * Updata docs of graph opt backend * update best_practices	2025-08-15 21:30:32 +08:00
ltd0924	5a84324798	[Doc] Add multinode deployment documents (#3417 ) * Create multi-node_deployment.md * Create multi-node_deployment.md * Update mkdocs.yml	2025-08-15 10:37:04 +08:00
Sunny-bot1	789dc67ff7	[Docs]fix sampling docs (#3113 ) * fix sampling docs * fix sampling docs * update	2025-08-11 20:42:27 +08:00
lizhenyun01	fe540f6caa	[plugin] Custom model_runner/model support (#3186 ) * support custom model&&model_runner * fix merge * add test && update doc * fix codestyle * fix unittest * load model in rl	2025-08-04 18:52:39 -07:00
gaoziyuan	4021d66ea5	【Feature】add fd plugins && rm model_classes (#3123 ) * add fd plugins && rm model_classed * fix reviews * add docs * fix * fix unitest ci	2025-08-03 19:53:20 -07:00
LiqinruiG	25005fee30	[Doc] add chat_template_kwagrs and update params docs (#3103 ) * add chat_template_kwagrs and update params docs * add chat_template_kwagrs and update params docs * update enable_thinking * pre-commit * update test case --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:44:06 +08:00
JYChen	bd29b2aaca	add stop_seqs doc (#3090 )	2025-07-30 20:36:18 +08:00
李泳桦	b242150f94	[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058 ) * [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client * [fix] delete ci test case for enable_thinking * [fix] add reasoning_parser when server starts * [fix] fix ci consistency test error with reasoning parser * [doc] update docs related to metadata * [fix] cancel enable_thinking default value	2025-07-30 19:25:20 +08:00
Zero Rains	4dc130c5a9	[Doc] add repetition early stopping doc (#3078 ) * add repetition early stop doc * add the early_stop.md	2025-07-29 22:01:57 -07:00
lddfym	5ca684c762	update doc: load_balance.md (#3008 ) * update doc of load_balance * update doc: load_balance.md	2025-07-30 10:27:56 +08:00
Sunny-bot1	9c962343f2	[Docs] add sampling docs (#2973 ) * add sampling docs * add minp sampling docs * update sample docs * update * update * add bad words desc * update	2025-07-30 02:24:16 +08:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
LiqinruiG	b38823bc66	modify reasoning_output docs (#2696 )	2025-07-04 11:30:02 +08:00
qingqing01	90a5b18742	Update disaggregated.md	2025-06-30 11:57:12 +08:00
qingqing01	7c43500060	Update disaggregated.md	2025-06-30 11:56:33 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00

31 Commits