FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-07 01:22:59 +08:00

Author	SHA1	Message	Date
Yuanle Liu	fac2f64837	delete parallel_state.py (#3250 )	2025-08-08 11:03:29 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00
bukejiyu	37569cca86	[feat]add fast_weights_iterator (#3258 ) * add fast_weights_iterator * update * update	2025-08-07 22:36:46 +08:00
chenjian	5f0b30f6d0	support logprob in scheduler v1 (#3249 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-07 20:14:01 +08:00
Yzc216	6037dd5d9c	[fix] multi source download (#3259 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation * modify model download path * add requirements * error optimization * fallback on connection failure * fallback on connection failure * fallback on connection failure * unit test * unit test * unit test * test * test * revise fallback * Trigger CI	2025-08-07 19:30:39 +08:00
JYChen	9423c577fe	[stop_seq] fix out-bound value for stop sequence (#3216 ) * fix out-bound value for stop sequence * catch error if there are out-of-bounds value * check in offline mode * add ut tests	2025-08-07 15:40:21 +08:00
李泳桦	09cc4e2802	[fix] fix completion stream api output_tokens not in usage (#3247 )	2025-08-07 10:36:00 +08:00
Yzc216	d9e3f88f9e	[Feature] multi source download (#3125 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation * modify model download path * add requirements * error optimization * fallback on connection failure * fallback on connection failure * fallback on connection failure * unit test * unit test * unit test * test * test	2025-08-07 00:40:27 +08:00
bukejiyu	9408e667a5	[bugfix]fix blockwisefp8 and all_reduce (#3243 ) * fix * update * fix linear for prequant loader	2025-08-06 23:54:33 +08:00
yangjianfengo1	3a15e0c53e	[Fix Bug] Fix bug in fa3 support for centralized mode (#3235 ) * fix fa3 centralized-mode bug * add qknorm parameter	2025-08-06 16:24:27 +08:00
lizexu123	afff4d37ea	[Feature] support seed parameter (#3161 ) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix	2025-08-06 15:20:47 +08:00
bukejiyu	20839abccf	qwen3_moe (#3084 )	2025-08-06 14:45:27 +08:00
Zero Rains	36dc73470d	Fix the confused enable_early_stop when only set early_stop_config (#3214 ) * fix the confused early_stop_config when only set early_stop_config * pre-commit * write a general method	2025-08-06 11:42:27 +08:00
sg263	841e831575	[Trace]add trace when fd start (#3174 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue * fix annotation * fix annotation when add opentelemetry * fix opentelemetry-instrumentation-fastapi * fix pentelemetry-bootstrap * fix opentelemetry can not work in uvicorn * move conf to env * fd start add trace * fix pre-commit * fix pre-commit * change FD_JOB_ID --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: shige <shige@baidu.com>	2025-08-05 21:18:27 +08:00
Yuan Xiaolan	7ce00e597c	support qk norm (#3145 )	2025-08-05 16:46:14 +08:00
RAM	4a10e29804	fix mla attention backend (#3176 )	2025-08-05 16:43:15 +08:00
Yuan Xiaolan	af543b7f0f	revise get_moe_scores (#3164 )	2025-08-05 16:43:07 +08:00
lizexu123	b01cfd6007	[BugFix] support real batch_size (#3109 ) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun	55939f7942	Update engine.py	2025-08-05 16:10:36 +08:00
RichardWooSJTU	1e9a8e8cef	fix lm head bias (#3185 ) Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-08-05 15:40:24 +08:00
RichardWooSJTU	f5c64a074c	[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182 ) * Add support for mixed-ep across multi nodes * code refine --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-08-05 15:40:11 +08:00
lizhenyun01	fe540f6caa	[plugin] Custom model_runner/model support (#3186 ) * support custom model&&model_runner * fix merge * add test && update doc * fix codestyle * fix unittest * load model in rl	2025-08-04 18:52:39 -07:00
Sunny-bot1	72ef5a9c93	[FIX]fix bad_words when sending requests consecutively (#3197 ) * fix bad_words * fix log * fix log	2025-08-04 05:59:41 -07:00
Yuan Xiaolan	1f8289e106	fix expertwise_scale (#3181 )	2025-08-04 20:06:15 +08:00
SunLei	68bc1d12c0	[Bugfix] Fix uninitialized decoded_token and add corresponding unit test. (#3195 )	2025-08-04 19:23:58 +08:00
Longzhi Wang	01d7586661	[Bug fix] Fix cudagraph when use ep. (#3130 ) * fix cudagraph when use ep * fix typo * reduce full length to adapt large bsz such 128/256	2025-08-04 18:06:18 +08:00
周周周	2bd8a50649	remove useless code (#3166 )	2025-08-04 18:03:08 +08:00
gaoziyuan	0443587a57	【Feature】support qwen3 name_mapping (#3179 ) * add fd plugins && rm model_classed * fix reviews * add docs * fix * fix unitest ci * support qwen3 name_mapping	2025-08-04 01:34:07 -07:00
ltd0924	c9e6ce1518	Update cache_messager.py (#3172 )	2025-08-04 14:32:34 +08:00
gaoziyuan	4021d66ea5	【Feature】add fd plugins && rm model_classes (#3123 ) * add fd plugins && rm model_classed * fix reviews * add docs * fix * fix unitest ci	2025-08-03 19:53:20 -07:00
bukejiyu	1582814905	fix load_pre_sharded_checkpoint (#3152 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-04 10:44:20 +08:00
ApplEOFDiscord	b71cbb466d	[Feature] remove dependency on enable_mm and refine multimodal's code (#3014 ) * remove dependency on enable_mm * fix codestyle check error * fix codestyle check error * update docs * resolve conflicts on model config * fix unit test error * fix code style check error --------- Co-authored-by: shige <1021937542@qq.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-01 20:01:18 +08:00
yangjianfengo1	64d7a3194d	Support fa3 in centralized mode (#3112 )	2025-08-01 18:03:36 +08:00
Ryan	94264bbf60	[Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937 ) * rm sth useless * refactor model forward * mv bool index to kernel	2025-08-01 17:28:07 +08:00
yinwei	3a4db15765	Fix out-of-memory issue during single-XPU deployment (#3133 )	2025-08-01 17:12:03 +08:00
chen	a2f5cc54f8	moe preprocess op support 160 experts and fused_moe triton kernel name add K (#3121 )	2025-08-01 10:46:20 +08:00
SunLei	dade19d7a4	[Feature] General support for logprobs (#2974 ) * [Feature] support logprobs in chat/completions and completions endpoints * Temporarily comment out text_offset due to incorrect logic * Clean up temporary debug prints * [Feature] support logprobs in offline mode via SamplingParams * fix: serialize Logprob as dict before zmq send to fix msgpack error * refactor: remove redundant methods to simplify codebase * Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization * refactor: centralize param validation in engine_client to reduce duplication * revert: rollback changes in offline_demo.py * revert: rollback changes in offline_demo.py * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:25:56 +08:00
chenjian	fe17410f9c	[BUG] Fix bug for pd in fd (#3034 ) * Fix bug for pd in fd * Fix bug for pd in fd --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:17:27 +08:00
Yuan Xiaolan	5f56d289a7	fix is_permuted (#3098 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:58:05 +08:00
LiqinruiG	25005fee30	[Doc] add chat_template_kwagrs and update params docs (#3103 ) * add chat_template_kwagrs and update params docs * add chat_template_kwagrs and update params docs * update enable_thinking * pre-commit * update test case --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:44:06 +08:00
kevin	22cab724e8	[Feature] block scheduler v1 support prefix caching (#3061 ) * block scheduler v1 support prefix cache * update code * update code * fix code bug * add timeout time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:29:19 +08:00
chenjian	32307283f1	Fix bug for offline inference in scheduler v1 (#3117 )	2025-07-31 17:54:24 +08:00
RAM	d850660872	[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989 ) * reset decoder_block_shape_q buffer * refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch * update decode_max_tile_size * fix pre-commit * update block_multihead_attn_backend * update flas attn backend * update MLA Attention * update XPU Attention * update gcu,iluvatar model runner * Update MTP * fix MTP bug	2025-07-31 00:09:31 +08:00
chenjian	fe0e3f508b	[BUG FIX] Fix bug when preempted request rescheduled (#3080 ) * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled	2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun	0616c208d2	[Feature] Support include_stop_str_in_output in completion api (#3096 ) * [Feature] Support include_stop_str_in_output in completion api * Fix ci test --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-30 22:18:48 +08:00
YuanRisheng	7dfdd157ac	[BugFix]Fix ep size (#3092 ) * fix ep * fix num_layer	2025-07-30 21:03:12 +08:00
ltd0924	d17886de19	[Feature] support ep in mixed mode (#3001 ) * [LLM] support ep * Update worker_process.py * Update expert_service.py * Update worker_process.py * format files	2025-07-30 20:43:39 +08:00
Zhida Hu	3f8a41e68c	[*] fix the memory leak when modify qp to rts failed (#3051 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-30 19:49:07 +08:00
李泳桦	b242150f94	[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058 ) * [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client * [fix] delete ci test case for enable_thinking * [fix] add reasoning_parser when server starts * [fix] fix ci consistency test error with reasoning parser * [doc] update docs related to metadata * [fix] cancel enable_thinking default value	2025-07-30 19:25:20 +08:00
bukejiyu	db698bda01	qwen loader (#3057 )	2025-07-30 19:09:38 +08:00

781 Commits