FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 08:37:06 +08:00

Author	SHA1	Message	Date
luukunn	eda83ca672	add Tool Parser (#3272 ) * add tool-parser * add tool-parser * add tool parser * add tool parser * fix * add offline * add offline * fix * parsers:tool&reasoning * rename tool parser * update * fix reasoning-parser * add requirements * fix finish reason * fix * fix reasoning-parser * fix * fix * fix * fix * fix --------- Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>	2025-08-13 01:06:55 +08:00
memoryCoderC	2d1a4cacdf	Completion add raw_prediction/text_after_process (#3356 )	2025-08-12 23:06:45 +08:00
memoryCoderC	c575611a5b	[BugFix] v1/completions add finish_reason (#3246 ) * [BugFix] v1/completions add finish_reason * update TestOpenAIServingCompletion for merge --------- Co-authored-by: YUNSHEN XIE <1084314248@qq.com>	2025-08-12 19:40:26 +08:00
Jiang-Jia-Jun	90bfa0be9c	Update envs.py	2025-08-12 16:24:47 +08:00
Jiang-Jia-Jun	5620bd12de	Update envs.py	2025-08-12 16:24:33 +08:00
gaoziyuan	ccc7f1beb3	fix mapping (#3320 )	2025-08-12 16:15:59 +08:00
RichardWooSJTU	283da92bfa	fix ep lm head (#3244 ) Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-08-12 15:38:28 +08:00
ming1753	f5164215be	[Bug Fix] fix vl V1 schedule bug (#3323 ) * [Bug Fix] fix vl V1 schedule bug * fix format	2025-08-12 11:31:39 +08:00
chenjian	b21272d9ff	[Bug fix] fix block num setting in scheduler v1 for develop (#3303 ) * fix block num setting in scheduler v1 * fix block num setting in scheduler v1 * fix max_block_num and max_num_batched_tokens setting * fix max_block_num and max_num_batched_tokens setting * fix max_block_num and max_num_batched_tokens setting * fix max_block_num and max_num_batched_tokens setting	2025-08-12 10:38:51 +08:00
Jiang-Jia-Jun	183e3863e8	Remove useless code (#3337 )	2025-08-12 10:32:31 +08:00
Zero Rains	b23af29d0b	Launch expert_service before kv_cache initialization in worker_process (#3045 ) * launch expert_service before kv_cache initialization * add two signal make sure model loading and expert_service lauching finished * fix the EP bug * fix ep * update launching way * fix ep * update * roback ep * pre-commit all files --------- Co-authored-by: RAM <gstian5555@outlook.com> Co-authored-by: Divano <dddivano@outlook.com>	2025-08-11 19:38:46 +08:00
Jiang-Jia-Jun	c56c99837a	Revert "[BugFix] num_seqs (#3291 )" (#3316 ) This reverts commit `e0aeac58e1`.	2025-08-11 16:16:51 +08:00
Yuanle Liu	9571c458f0	enhance eos_tokens (#3274 ) * enhance eos_tokens * update * update	2025-08-11 14:47:52 +08:00
Zero Rains	42af0b4b64	[V1 Loader] Support DeepSeekV3(bf16) (#3294 ) * Support new loader for DeepSeekV3(bf16) * update paddle version * remove useless attr	2025-08-11 13:39:28 +08:00
lizexu123	e0aeac58e1	[BugFix] num_seqs (#3291 ) * fix num_seqs * merge develop	2025-08-11 13:38:55 +08:00
chenjian	b88537a456	fix bug for scheduler v0 (#3308 )	2025-08-11 13:07:04 +08:00
chen	46c8491201	merge logprob into batch_output (#3266 )	2025-08-11 10:03:00 +08:00
chenjian	c011cb8b16	[Bug Fix] Fix scheduler bug in develop (#3292 ) * Fix scheduler bug in develop * Fix scheduler bug in develop * Fix scheduler bug in develop	2025-08-10 13:55:38 +08:00
ltd0924	31d4fcb425	[BugFix] fix too many open files problem (#3256 ) * Update cache_messager.py * fix too many open files problem * fix too many open files problem * fix too many open files problem * fix ci bugs * Update api_server.py * add parameter * format * format * format * format * Update parameters.md * Update parameters.md * Update serving_completion.py * Update serving_chat.py * Update envs.py --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-08 20:10:11 +08:00
gaoziyuan	a799d14df1	[Bugfix] Fix model accuracy in some ops (#3231 ) * fix noaux_tc op * fix * update * fix qk norm * fix linear for prequant loader * test * fix * fix * rm some print * fix noaux_tc op * test * Fix the confused enable_early_stop when only set early_stop_config (#3214) * fix the confused early_stop_config when only set early_stop_config * pre-commit * write a general method * Add ci case for min token and max token (#3229) Co-authored-by: xujing43 <xujing43@baidu.com> * add some evil cases (#3240) * add repitation early stop cases * add repitation early stop cases * add bad cases * add bad cases * add evil cases * qwen3_moe (#3084) * [Feature] support seed parameter (#3161) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix * [Fix Bug] Fix bug in fa3 centralized-deployment support (#3235) * fix fa3 centralized-deployment bug * add qknorm parameter * fix qk norm * fix * update * fix linear for prequant loader * fix * fix * rm some print * fix * fix moe init weight&scale * fix moe init weight&scale --------- Co-authored-by: bukejiyu <395822456@qq.com> Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com> Co-authored-by: Zero Rains <linjunlu@zerorains.top> Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com> Co-authored-by: xujing43 <xujing43@baidu.com> Co-authored-by: Divano <dddivano@outlook.com> Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com> Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com> Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com> Co-authored-by: qingqing01 <dangqingqing@baidu.com>	2025-08-08 17:30:37 +08:00
Zero Rains	ce1f353c70	Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148 ) * w4a8 bug * fix w4a8 bug * remove code * modify the triton backend * fix ep * fix the bug with tensor_wise_fp8 in triton backend * fix the RL * fix bug by merge * fix the bug in w4a8 * fix the tensor_wise_fp8 bug * fix RL	2025-08-08 15:55:47 +08:00
freeliuzc	71267840f7	【Fix】fix mtp bug (#3139 )	2025-08-08 13:30:12 +08:00
bukejiyu	b76b17fc1b	qwen3 0.3B fix (#3255 )	2025-08-08 11:35:40 +08:00
Yuanle Liu	fac2f64837	delete parallel_state.py (#3250 )	2025-08-08 11:03:29 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00
bukejiyu	37569cca86	[feat]add fast_weights_iterator (#3258 ) * add fast_weights_iterator * update * update	2025-08-07 22:36:46 +08:00
chenjian	5f0b30f6d0	support logprob in scheduler v1 (#3249 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-07 20:14:01 +08:00
Yzc216	6037dd5d9c	[fix] multi source download (#3259 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation * modify model download path * add requirements * error optimization * fallback on connection failure * fallback on connection failure * fallback on connection failure * unit test * unit test * unit test * test * test * update fallback handling * Trigger CI	2025-08-07 19:30:39 +08:00
JYChen	9423c577fe	[stop_seq] fix out-bound value for stop sequence (#3216 ) * fix out-bound value for stop sequence * catch error if there are out-of-bounds value * check in offline mode * add ut tests	2025-08-07 15:40:21 +08:00
李泳桦	09cc4e2802	[fix] fix completion stream api output_tokens not in usage (#3247 )	2025-08-07 10:36:00 +08:00
Yzc216	d9e3f88f9e	[Feature] multi source download (#3125 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation * modify model download path * add requirements * error optimization * fallback on connection failure * fallback on connection failure * fallback on connection failure * unit test * unit test * unit test * test * test	2025-08-07 00:40:27 +08:00
bukejiyu	9408e667a5	[bugfix]fix blockwisefp8 and all_reduce (#3243 ) * fix * update * fix linear for prequant loader	2025-08-06 23:54:33 +08:00
yangjianfengo1	3a15e0c53e	[Fix Bug] Fix bug in fa3 centralized-deployment support (#3235 ) * fix fa3 centralized-deployment bug * add qknorm parameter	2025-08-06 16:24:27 +08:00
lizexu123	afff4d37ea	[Feature] support seed parameter (#3161 ) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix	2025-08-06 15:20:47 +08:00
bukejiyu	20839abccf	qwen3_moe (#3084 )	2025-08-06 14:45:27 +08:00
Zero Rains	36dc73470d	Fix the confused enable_early_stop when only set early_stop_config (#3214 ) * fix the confused early_stop_config when only set early_stop_config * pre-commit * write a general method	2025-08-06 11:42:27 +08:00
sg263	841e831575	[Trace]add trace when fd start (#3174 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue * fix annotation * fix annotation when add opentelemetry * fix opentelemetry-instrumentation-fastapi * fix pentelemetry-bootstrap * fix opentelemetry can not work in uvicorn * move conf to env * fd start add trace * fix pre-commit * fix pre-commit * change FD_JOB_ID --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: shige <shige@baidu.com>	2025-08-05 21:18:27 +08:00
Yuan Xiaolan	7ce00e597c	support qk norm (#3145 )	2025-08-05 16:46:14 +08:00
RAM	4a10e29804	fix mla attention backend (#3176 )	2025-08-05 16:43:15 +08:00
Yuan Xiaolan	af543b7f0f	revise get_moe_scores (#3164 )	2025-08-05 16:43:07 +08:00
lizexu123	b01cfd6007	[BugFix] support real batch_size (#3109 ) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun	55939f7942	Update engine.py	2025-08-05 16:10:36 +08:00
RichardWooSJTU	1e9a8e8cef	fix lm head bias (#3185 ) Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-08-05 15:40:24 +08:00
RichardWooSJTU	f5c64a074c	[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182 ) * Add support for mixed-ep across multi nodes * code refine --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-08-05 15:40:11 +08:00
lizhenyun01	fe540f6caa	[plugin] Custom model_runner/model support (#3186 ) * support custom model&&model_runner * fix merge * add test && update doc * fix codestyle * fix unittest * load model in rl	2025-08-04 18:52:39 -07:00
Sunny-bot1	72ef5a9c93	[FIX]fix bad_words when sending requests consecutively (#3197 ) * fix bad_words * fix log * fix log	2025-08-04 05:59:41 -07:00
Yuan Xiaolan	1f8289e106	fix expertwise_scale (#3181 )	2025-08-04 20:06:15 +08:00
SunLei	68bc1d12c0	[Bugfix] Fix uninitialized decoded_token and add corresponding unit test. (#3195 )	2025-08-04 19:23:58 +08:00
Longzhi Wang	01d7586661	[Bug fix] Fix cudagraph when use ep. (#3130 ) * fix cudagraph when use ep * fix typo * reduce full length to adapt large bsz such 128/256	2025-08-04 18:06:18 +08:00
周周周	2bd8a50649	remove useless code (#3166 )	2025-08-04 18:03:08 +08:00

804 Commits