FastDeploy

Mirror of https://github.com/PaddlePaddle/FastDeploy.git, last synced 2025-10-08 10:00:29 +08:00

Author	SHA1	Message	Date
xiaoxiaohehe001	f265a26f8b	support mtp rope_3d (#3791) * support mtp rope_3d * Update speculate_write_cache_with_rope_kernel.cu	2025-09-04 17:18:05 +08:00
Yuan Xiaolan	fa58a9fa8f	qk norm for speculate decode C16 (#3637)	2025-09-03 14:53:56 +08:00
Liumengyuan	e93d4cfcdd	Add with_output version AppendAttention (#3302) * get use_output from fd_config * add clear TODO description * add mask_offset para to align with develop * fix bug * fix use_output logic * fix sot bug	2025-08-28 17:10:18 +08:00
lzy	1e06b9fa6d	make append_attn supports mask_offset (#3138) * make append_attn supports mask_offset * add unittest	2025-08-14 03:40:55 -07:00
Ryan	ed6bff215a	fix custom op order rms_norm_eps (#3348)	2025-08-13 10:12:49 +08:00
Yuan Xiaolan	7ce00e597c	support qk norm (#3145)	2025-08-05 16:46:14 +08:00
周周周	ddb10ac509	[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880) * remove padding_offsets from atten	2025-07-17 18:41:31 +08:00
周周周	aa76085d1f	[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)	2025-07-16 20:10:57 +08:00
jiangjiajun	684703fd72	[LLM] First commit the llm deployment code	2025-06-09 19:20:15 +08:00