FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-09-26 20:41:53 +08:00

Author	SHA1	Message	Date
Zero Rains	bd30b08521	get org_vocab_size from args (#3981 )	2025-09-09 15:08:47 +08:00
Divano	1aa16146ba	Update requirements.txt (#3915 )	2025-09-05 13:51:22 +08:00
ApplEOFDiscord	dac0a00d0f	[BugFix] fix max streaming tokens invalid (#3774 ) (#3856 ) * Update serving_chat.py * Update serving_completion.py Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>	2025-09-03 17:50:29 +08:00
ltd0924	c5591c45df	[BugFix] fix max streaming tokens invalid (#3774 ) * Update serving_chat.py * Update serving_completion.py	2025-09-02 21:00:29 +08:00
chen	121ac85d7d	fix (#3640 )	2025-08-27 14:23:38 +08:00
chen	d233e3c97c	[Precision] Change lm_head layer running in float32 (#3596 ) * support lm_head fp32 bf16 fp16 * delete print * code check * check * check * code check * check * check	2025-08-26 20:20:06 +08:00
chen	2136990144	[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3536 ) * [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing * infer engine support temp_scaled_logprobs and top_p_normalized_logprobs * code check * code check * fix tokenizer.decoder(-1), return 'Invalid Token' * check seq len time shape * logprob clip inf * code check --------- Co-authored-by: sunlei1024 <sunlei5788@gmail.com>	2025-08-25 14:11:18 +08:00
kevin	b7890cbe8d	fix uvicorn multi worker error (#3339 )	2025-08-25 11:24:07 +08:00
chenjian	bc388b65c7	[Bug fix] Fix bug in logprob in release 2.0.4 (#3445 ) * fix bug for scheduler v0 * Fix logprob in release/2.0.4	2025-08-16 21:13:10 +08:00
Jiang-Jia-Jun	71af0ca04a	[BugFix] Fix default log level of paddleformers (#3378 )	2025-08-15 18:30:00 +08:00
YuBaoku	d66660a0d1	[CI] fix run_ci error in release/2.0.4 (#3411 )	2025-08-14 22:44:17 +08:00
xiaolei373	f0519aec67	feat(log):add_request_and_response_log (#3391 ) * feat(log):add_request_and_response_log * [ci] Retrigger * [ci] Retrigger	2025-08-14 19:12:42 +08:00
gaoziyuan	1f5983290c	fix mapping (#3321 )	2025-08-12 16:17:59 +08:00
chenjian	c6a133d573	[Bug fix] Fix block num in scheduler v1 for release2.0.4 (#3314 ) * fix bug for scheduler v0 * fix block num setting in scheduler v1 * fix block num setting in scheduler v1 * fix block num setting in scheduler v1 * fix block num setting in scheduler v1 * fix block num setting in scheduler v1	2025-08-11 23:55:45 +08:00
chenjian	4646aff25c	fix bug for scheduler v0 (#3307 )	2025-08-11 23:55:20 +08:00
chenjian	a84a98b107	fix scheduler bug due to async running (#3293 )	2025-08-10 13:54:59 +08:00
chenjian	c208086f61	fix scheduler bug for bs=1 (#3288 )	2025-08-09 12:22:12 +08:00
sg263	ce1d4944e7	merge develop trace FD_START (#3253 ) (#3260 ) Co-authored-by: shige <shige@baidu.com>	2025-08-07 16:06:58 +08:00
chenjian	5439fb6336	[Cherry-pick] FIx bug for scheduler V1 (#3167 ) * [BUG FIX] Fix bug when preempted request rescheduled (#3080) * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled * Fix bug for offline inference in scheduler v1 (#3117)	2025-08-04 17:08:12 +08:00
gaoziyuan	a592d17615	support qwen3 name_mapping (#3180 )	2025-08-04 16:37:34 +08:00
李泳桦	eca8fc7ca6	[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3077 ) * [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client * [fix] delete ci test case for enable_thinking * [fix] add reasoning_parser when server starts * [doc] update docs related to metadata * [fix] fix ci consistency test error with reasoning parser * [fix] cancel enable_thinking default value	2025-07-30 19:25:39 +08:00
李泳桦	0463797fc2	[feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3023 ) * [feat] add disable_chat_template in chat api as a substitute for previous raw_request * [fix] pre-commit code check	2025-07-25 20:57:06 +08:00
Jiang-Jia-Jun	0ab8645fc4	Update setup.py	2025-07-25 10:27:51 +08:00
xiaoxiaohehe001	2970b00dfa	[Feature] Support_eplb (#2997 ) * [Feature] support_eplb * [Feature] support_eplb * [Fix] fix mm ep	2025-07-24 20:22:45 +08:00
littledgg	f37d00e856	[Model] Provide clearer error for missing KV cache quantization scales (#3007 )	2025-07-24 20:15:00 +08:00
EnflameGCU	c40df1802e	[GCU] Update to develop (#2988 )	2025-07-24 19:30:52 +08:00
Yzc216	980126b83a	[Feature] multi source download (#3005 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation	2025-07-24 17:42:09 +08:00
Zero Rains	0fb37ab7e4	update flake8 version to support pre-commit in python3.12 (#3000 ) * update flake8 version to support pre-commit in python3.12 * polish code	2025-07-24 01:43:31 -07:00
Zhang Yulong	5151bc92c8	Update benchmark tools (#3004 ) * update benchmark tools * update benchmark tools	2025-07-24 15:19:23 +08:00
ltd0924	f935d6f862	[BugFix] fix multinode deployment (#2977 )	2025-07-24 15:04:04 +08:00
ltd0924	3792345c3a	[LLM] update function name (#2985 ) * [LLM] update function name	2025-07-24 15:03:40 +08:00
Yzc216	e14587a954	[Feature] multi-source download (#2986 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit	2025-07-24 14:26:37 +08:00
YUNSHEN XIE	87a2f4191d	add ci reuse action (#2968 ) * add ci reuse action * fix code formatting * update	2025-07-24 14:24:10 +08:00
xiaoxiaohehe001	2c0ff068e2	[Fix] fix mm ep empty run (#2999 )	2025-07-24 14:15:55 +08:00
xiegegege	e3a843f2c5	[benchmark] add quantization for benchmark yaml (#2995 )	2025-07-24 13:26:34 +08:00
lizhenyun01	6235ef3881	fix chunk_prefill	2025-07-24 12:00:52 +08:00
lizhenyun01	29c3292f02	support c4 attn && fix cache	2025-07-24 12:00:52 +08:00
lizexu123	832d25334a	[Code Simplification] fix init_distributed_environment() (#2982 )	2025-07-24 11:43:28 +08:00
bukejiyu	bfeb664ab8	update (#2978 )	2025-07-24 00:16:42 +08:00
chenjian	85a78d695d	[Feature] Support block scheduler v1 for FD (#2928 ) * Support FD block scheduler v1 * Support FD block scheduler v1 * Support FD block scheduler v1 * Fix according to copilot review * Fix according to review * Remove is_dummy * Fix bug when real_bsz=1 * Fix infer first token cost time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-23 20:31:31 +08:00
Zero Rains	ca0f71bd39	polish code for prefill restrictions (#2991 )	2025-07-23 05:10:14 -07:00
chen	172e69fe17	FA3 fix bug (#2987 )	2025-07-23 19:07:43 +08:00
zhink	1272c7ce98	Fix performance degradation bug of custom_all_reduce (#2981 )	2025-07-23 17:45:44 +08:00
Zero Rains	850c9d98d4	[BugFix] Add prefill restrictions for chunked_prefill+VL (#2983 )	2025-07-23 01:45:57 -07:00
freeliuzc	a39a67334c	fix mtp bug in pd-split mode (#2970 )	2025-07-23 15:31:16 +08:00
YuBaoku	6c4cfd9359	[CI] add codestyle_check action (#2972 ) * [CI] add codestyle_check action * [CI] Integrate codestyle check via pre-commit in GitHub Actions	2025-07-23 15:21:56 +08:00
lizexu123	9b22b8d2c3	delete max-len (#2959 )	2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun	5b59a97030	Update README.md	2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun	475dc6d84e	Update README.md	2025-07-23 13:47:31 +08:00
chen	ad202272ed	【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942 )	2025-07-23 13:02:50 +08:00

2819 Commits