FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-02 23:32:48 +08:00

Author	SHA1	Message	Date
gaoziyuan	82e64b13e1	[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598 ) * [Feature] update ep * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix queue ports idx * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * fix ci * Update engine.py * fix ci * fix some bug in mixed ep * add server fix and op fix * rm some log * fix code style * ltd fix * fix * fix * fix some bug * fix bug * fix bug * fix style * Update config.py * Update splitwise_connector.py * Update cache_messager.py * Update __init__.py * merge and fix * Update engine.py * Update common_engine.py * Update run_ci_xpu.sh * Update ernie_processor.py * Update ernie_processor.py --------- Co-authored-by: ltd0924 <ltd0924@sina.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>	2025-08-26 19:59:02 +08:00
Yuanle Liu	cbce94a00e	rename ernie_xxx to ernie4_5_xxx (#3621 ) * rename ernie_xxx to ernie4_5_xxx * ci fix	2025-08-26 19:29:27 +08:00
lzy	d339df2e90	Supports DP+TP+EP hybrid parallel deployment strategy (#3489 ) * Support DP+TP+EP hybrid parallel deployment strategy * Support DP+TP+EP hybrid parallel deployment strategy * fix conflict * add moe_tp_ep function split_allgather_out * del tp_group in moe_cutlass_backend * for ci * fix parallel_config for ci * del log	2025-08-26 00:04:01 -07:00
Sunny-bot1	c68c3c4b8b	[Feature] bad words support v1 scheduler and specifiy token ids (#3608 ) * support bad_words_token_ids * docs * fix test * fix * bad words support kvcache v1 and token ids * fix	2025-08-25 20:14:51 -07:00
RAM	2fa173e327	[Executor] CUDAGraph support RL training (#3265 ) * add clear graph opt backend * cuda graph support rl * add branch * 1.fix dynamic_weight_manager bug 2.add clear api for CasualLM * open test case * fix typo * update mkdocs.yaml * [Docs]Update mkdocs.yml * update test case * use unittest in graph test case	2025-08-25 20:59:30 +08:00
chen	9cab3f47ff	[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552 ) * [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing * infer engine support temp_scaled_logprobs and top_p_normalized_logprobs * delete some code * code check * code check and add doc * fix tokenizer.decoder(-1), return 'Invalid Token' * add ci for temp_scaled and top_p logprobs * check test * check seq len time shape * logprob clip inf --------- Co-authored-by: sunlei1024 <sunlei5788@gmail.com>	2025-08-25 14:11:49 +08:00
lizexu123	a053ab889b	[BugFix] fix num_running_requests in cuda_graph (#3457 ) * fix cuda_grpah * add note --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-08-19 10:47:22 +08:00
lizexu123	32b39620bc	[Code Simplification] remove cum_offsets (#3410 )	2025-08-18 20:21:25 +08:00
Jundong Liu	ea4a3b479c	[Excutor] Increase buffer size to prevent address corruption; add forward metadata debug tool (#3404 ) * Fix insufficient buffer allocation; add a tool for printing forwardmetadata * fix mistake * Make CPU tensor in CPUPlace * Add test about forward_meta_str and Add unitest_requirement --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-08-18 16:14:09 +08:00
chen	f0f00a6025	[OPs] Universal optimization and Fix early_stop cuda 700 (#3375 ) * delete nonzero * delete setup_ops_base.py * check if * check gcp infer_seed.cpu() * fix repetition_early_stopper_kernel cuda 700	2025-08-14 22:40:44 +08:00
lizexu123	7b596d0877	[BugFix] fix real_bsz in ep (#3366 ) * Your commit message here * fix ep * delete cuda_graph	2025-08-14 17:31:19 +08:00
Jiang-Jia-Jun	c56c99837a	Revert "[BugFix] num_seqs (#3291 )" (#3316 ) This reverts commit `e0aeac58e1`.	2025-08-11 16:16:51 +08:00
Yuanle Liu	9571c458f0	enhance eos_tokens (#3274 ) * enhance eos_tokens * update * update	2025-08-11 14:47:52 +08:00
lizexu123	e0aeac58e1	[BugFix] num_seqs (#3291 ) * fix num_seqs * merge develop	2025-08-11 13:38:55 +08:00
chenjian	c011cb8b16	[Bug Fix] Fix scheduler bug in develop (#3292 ) * Fix scheduler bug in develop * Fix scheduler bug in develop * Fix scheduler bug in develop	2025-08-10 13:55:38 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00
lizexu123	afff4d37ea	[Feature] support seed parameter (#3161 ) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix	2025-08-06 15:20:47 +08:00
lizexu123	b01cfd6007	[BugFix] support real batch_size (#3109 ) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-05 16:33:54 +08:00
Sunny-bot1	72ef5a9c93	[FIX]fix bad_words when sending requests consecutively (#3197 ) * fix bad_words * fix log * fix log	2025-08-04 05:59:41 -07:00
Longzhi Wang	01d7586661	[Bug fix] Fix cudagraph when use ep. (#3130 ) * fix cudagraph when use ep * fix typo * reduce full length to adapt large bsz such 128/256	2025-08-04 18:06:18 +08:00
RAM	d850660872	[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989 ) * reset decoder_block_shape_q buffer * refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch * update decode_max_tile_size * fix pre-commit * update block_multihead_attn_backend * update flas attn backend * update MLA Attention * update XPU Attention * update gcu,iluvatar model runner * Update MTP * fix MTP bug	2025-07-31 00:09:31 +08:00
bukejiyu	db698bda01	qwen loader (#3057 )	2025-07-30 19:09:38 +08:00
ming1753	5acde4eb43	[Feature] Multimodal Scheduler V1 (#3019 ) * [Feature] Support multimodal scheduler v1 * remove debug log * fix bug * fix format * modify code * fix bug * fix bug * fix bug * modify code	2025-07-30 16:05:55 +08:00
Sunny-bot1	74aa31d15b	[Feature] support bad_words (#3055 ) * support bad_words * support online infer bad_words * update * add CI test * update * update * update --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-30 09:31:29 +08:00
Zero Rains	b2f9a42d87	[Feature] Support repetition early stop (#3024 ) * support repetition early stop and support user to set the parameter * remove log * fix codestyle * add the early_stop_config to rollout_config * update config and EarlyStopper class * fix the bug for triton * modify the stop method * update description * modify the usage for stop_flags --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-29 22:42:54 +08:00
YuanRisheng	502ee92a0a	Unify server-side and model-side Config (Part3) (#3047 ) * merge model config * fix arch * fix rl	2025-07-29 17:07:44 +08:00
JYChen	dafe02a7b9	[stop sequence] support stop sequence (#3025 ) * stop seqs in multi-ends * unittest for gpu stop op * kernel tid==0	2025-07-29 14:17:37 +08:00
begin2023	dd877f38b1	[Perf] Remove unnecessary operations in non-cuda_graph (#3010 ) * [Perf] Remove unnecessary operations in non-cuda_graph * fix code logic * use suggestion comment * reduce function call * reduce function call * reduce function call * reduce function call	2025-07-27 20:38:29 -07:00
YuanRisheng	6ccc10ad47	Unify server-side and model-side Config (Part1) (#3018 ) * move cache config * fix mtp	2025-07-28 10:51:52 +08:00
Longzhi Wang	0700c90caa	[Feat] support mixed ep (#2969 ) * Support mixed ep * fix comment * fix comment * update mixep * fix conflict * fix typo * update * fix typo * fix code style * fix conflict	2025-07-25 15:29:30 +08:00
ltd0924	3792345c3a	[LLM] update function name (#2985 ) * [LLM] update function name	2025-07-24 15:03:40 +08:00
lizhenyun01	29c3292f02	support c4 attn && fix cache	2025-07-24 12:00:52 +08:00
chenjian	85a78d695d	[Feature] Support block scheduler v1 for FD (#2928 ) * Support FD block scheduler v1 * Support FD block scheduler v1 * Support FD block scheduler v1 * Fix according to copilot review * Fix according to review * Remove is_dummy * Fix bug when real_bsz=1 * Fix infer first token cost time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-23 20:31:31 +08:00
Zero Rains	ca0f71bd39	polish code for prefill restrictions (#2991 )	2025-07-23 05:10:14 -07:00
Zero Rains	850c9d98d4	[BugFix] Add prefill restrictions for chunked_prefill+VL (#2983 )	2025-07-23 01:45:57 -07:00
lizexu123	9b22b8d2c3	delete max-len (#2959 )	2025-07-23 15:11:39 +08:00
Ryan	95b5af24db	[SOT] Add sot warmup (NVIDIA GPU Only) (#2929 ) * add sot warmup * fix code style * change batch_size list * add param to config * rm free_list settings && set sot_warmup_sizes * finish debug with dynamic dims by type annotations * add profile_run guard * rm sth useless	2025-07-22 21:36:14 +08:00
Zero Rains	89a485b69f	[Feature] Support using prefix-caching + cudagraph for inference (#2924 ) * fix the bug in cudagraph+prefix-caching but still have some bug with profile Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397 * add the signal to make sure cache manager launched * fix judge condition * reomove useless control * update control stream * update * fix xpu * change the do_profile flag * update * add new threads to init cache_manager --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-07-22 00:59:45 -07:00
littledgg	2845bde964	[Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph (#2936 ) * [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph * Fix: Apply black formatting	2025-07-21 16:25:51 +08:00
lizexu123	67990e0572	[Feature] support min_p_sampling (#2872 ) * Fastdeploy support min_p * add test_min_p * fix * min_p_sampling * update * delete vl_gpu_model_runner.py * fix * Align usage of min_p with vLLM * fix * modified unit test * fix test_min_sampling * pre-commit all files * fix * fix * fix * fix xpu_model_runner.py	2025-07-20 23:17:59 -07:00
周周周	8c5407d9e4	remove cum_offsets from ForwardMeta (#2925 )	2025-07-19 23:57:27 +08:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
YuanRisheng	0eb5dc18d3	[BugFix]Fix sample rejection (#2908 ) * fix config * fix rejection	2025-07-18 13:44:30 +08:00
周周周	ddb10ac509	[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880 ) * remove padding_offsets from atten	2025-07-17 18:41:31 +08:00
ming1753	67180c1ff9	[Bug Fix] fix bug of prompt penalty (#2888 )	2025-07-17 17:21:37 +08:00
Yuanle Liu	dbb9e2506b	Fix rollout_model init (#2881 )	2025-07-16 22:36:21 -07:00
ming1753	1f15ca21e4	[Feature] support prompt repetition_penalty (#2806 )	2025-07-17 12:05:52 +08:00
Yuanle Liu	63d6e7ce06	fix and refine vl (#2866 ) * refine vl config * delete attn_sep * fix vl accuracy	2025-07-16 05:59:28 -07:00
YuanRisheng	101ad33332	[BugFix] Fix Configs (#2849 ) * fix config * fix config	2025-07-15 19:50:36 -07:00
RAM	0fad10b35a	[Executor] CUDA Graph support padding batch (#2844 ) * cuda graph support padding batch * Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes. * Do not insert max_num_seqs when the user specifies a capture list * Support set graph optimization config from YAML file * update cuda graph ci * fix ci bug * fix ci bug	2025-07-15 19:49:01 -07:00

65 Commits