bukejiyu
1539fd6056
[BugFix]Set default OMP_NUM_THREADS=3 and fix extra GPU memory usage in DeepSeek ( #5219 )
...
* fix bug
* update
* update
* update
* fix copy
* update
2025-11-28 14:22:04 +08:00
Daci
7dc06cac6e
[BugFix] race condition [is_fetching] causing multiple fetch requests ( #5238 )
...
* RouterArgs port str -> int
* fix race condition [is_fetching] causing multiple fetch requests
* bugfix: Delete duplicate input_ids tensor creation
2025-11-28 13:41:36 +08:00
SunLei
c424e08dc5
[Speculative Decoding] split draft_tokens into standalone post-processing path ( #5205 )
...
* refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs
* Restore Request.__repr__ implementation
* ci
* add envs
* fix unittest
2025-11-27 11:22:41 +08:00
kevin
bf30f45738
[BugFix] fix vl performance bug ( #5181 )
...
* fix vl performance bug
* update code
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-11-26 21:06:52 +08:00
freeliuzc
ba915e03e1
[BugFix]Fix attention mask bug in D-Node of PD-split mode ( #5245 )
2025-11-26 17:56:28 +08:00
Yonghua Li
cead6b26fa
[Metrics] Update time_to_first_token to include tokenization & queue time, and remove redundant metrics ( #4993 )
...
* [update] update time_to_first_tokens to include queue time, and remove first_token_latency and infer_latency
* [doc] update docs
* [ci] fix test
* [chore] delete redundant code
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-11-26 14:42:17 +08:00
Daci
f25ee3a26f
[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 ( #5140 )
...
* enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-26 10:22:35 +08:00
kevin
df2be1cf16
[BugFix] fix mm_positions type error ( #5182 )
...
* fix mm_positions type error
* update code
* update code
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-11-25 19:28:18 +08:00
chenjian
09b47c7111
[Bug fix] Send first token in D instance ( #5199 )
...
* [Bug fix] Send first token in D instance
* fix
2025-11-24 23:42:20 +08:00
kevin
8e4e3ff510
[Feature] support eplb in api_server ( #4782 )
...
* support eplb in api_server
* update code
* add eplb test case
* update eplb
* support tp+dp eplb
* update test case
* update code
* update code
* fix bug
* update copilot review
* update test case name
2025-11-24 20:22:29 +08:00
xiaozude
d5bd64336a
[Metax] support ENABLE_V1_KVCACHE_SCHEDULER ( #5163 )
2025-11-24 19:19:49 +08:00
Juncai
af03da5127
[BugFix] fix release block ids ( #5184 )
...
* fix release block ids
* up
2025-11-24 16:48:09 +08:00
chenjian
3ea1b44a58
[Optimization] Improve perf for fd response token with internal adapter ( #4992 )
...
* [Optimize] Improve perf for fd response token with internal adapter
* fix
* fix bug
* fix ci
* fix ci
* fix ci
* fix ci
2025-11-21 19:02:03 +08:00
Yuanle Liu
5bcf79d780
[BugFix] fix num of rdma_comm_ports check ( #5168 )
...
* fix num of rdma_comm_ports check
* update
* update
* update
2025-11-21 18:31:14 +08:00
Jiang-Jia-Jun
d2298dcb0c
[Polish] Simplify __repr__ method in Request class ( #5153 )
...
Remove detailed string representation for Request class.
2025-11-21 17:21:06 +08:00
Juncai
f9b0545a7f
[PD Disaggregation] [Refine] Refine splitwise deployment ( #5151 )
...
* Refine splitwise deployment
* up
2025-11-21 15:30:24 +08:00
kevin
7454480e07
[Feature] support bos download retry ( #5137 )
...
* support bos download retry
* update code
* update code
2025-11-21 10:18:32 +08:00
Yonghua Li
43097a512a
[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol ( #5132 )
...
* [fix] fix v1 scheduler profile run for append attention in prefill node
* [fix] skip send_signal if kv signal not inited for gpu and xpu
* [fix] extend fix to flash_attn & mla_attn
* [fix] fix v1 pd run in ipc transfer protocol
* [ci] add test for v1 pd profile run using ipc transfer protocol
* [style] fix code style check
* [style] fix code style again
* [fix] fix profile run
* [update] remove --num-gpu-blocks-override in example script
* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
Juncai
01c30f6b87
Fix schedule error in splitwise deployment ( #5149 )
2025-11-20 21:18:10 +08:00
Yuanle Liu
7ac25935c7
[Optimization] default compile rdma, reduce cudagraph buffer size in mm, fix some config bug ( #5121 )
...
* default compile rdma, reduce cudagraph buffer size in mm, fix some config logic
* update
* update
* fix bug
* enhance rdma compile
* fix
2025-11-20 17:19:47 +08:00
yangjianfengo1
af715db763
[Scheduler] Support chunk prefill for video input ( #5107 )
...
* add video chunk prefill
* add vit_merge=True for test_tokenizer_client.py
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-11-20 16:29:13 +08:00
kevin
109d48e456
[Feature] support async download features ( #5003 )
...
* support async download features
* add test case
* update code
2025-11-19 22:23:36 +08:00
chen
d58c1db8a0
[Feature][OP] Append Attn Support CUDA-PDL ( #5072 )
2025-11-17 20:47:33 +08:00
qwes5s5
36216e62f0
[Log] Add trace log and add loggingInstrumentor tool ( #4692 )
...
* add trace logger and trace print
* trigger ci
* fix unittest
* translate notes and add copyright
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-11-17 11:08:57 +08:00
fmiao2372
e43a5fc055
[Intel HPU] enable level 1 prefix caching and fix some bugs ( #4971 )
...
* [Intel HPU] enable prefix caching and dense tp moe ep and fix some bugs
* update code by copilot
* remove dense tp and moe ep code
2025-11-14 19:42:50 +08:00
chen
544ea9cbc2
check max_logprobs ( #5018 )
2025-11-14 17:18:06 +08:00
Juncai
36822fa49c
[PD Disaggregation] remove splitwise deployment on single node and refine the code ( #4891 )
...
* remove splitwise deployment on single node and refine the code
* up
* up
* up
* add test
* up
2025-11-14 09:56:53 +08:00
Yonghua Li
6c5ab727c1
[BugFix] fix num_requests_running after clear_data ( #4927 )
...
* [BugFix] fix num_requests_running after clear_data
* [fix] fix tasks_list and stop flags not cleared when _free_blocks failed
2025-11-13 13:50:21 +08:00
qwes5s5
a2d06118e1
[Logprobs]Support prompt_logprobs and max_logprobs ( #4897 )
...
* add prompt logprobs
* trigger ci
* fix unittest
* Update fastdeploy/config.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/entrypoints/llm.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/engine/sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/engine/test_sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/engine/test_sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix max_logprobs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-12 19:29:48 +08:00
bukejiyu
b09ebb2813
refactor pt loading ( #4532 )
2025-11-11 21:30:39 +08:00
SunLei
3098aee05f
[Perf] Support tensor transmission between work and engine with zero-copy to improve efficiency ( #4839 )
...
* feat(zmq): support tensor transmission with zero-copy for improved efficiency
* perf: zmq.send disable copy
* zmq recv data for debug
* convert logprobs tensor to cpu
2025-11-11 15:43:11 +08:00
Yuanle Liu
3dc0ffa46d
[TSP] Support qwen3 moe tsp + cudagraph ( #4871 )
...
* support qwen3_moe tsp mode
* fix
* fix
* update
* update
* update
* fix
* support external_rmsnorm
* update
* fix
2025-11-10 23:37:51 +08:00
chenjian
78895e2c7d
[Bug Fix] fix bug for PD EP ( #4823 )
...
* fix bug for PD EP
* fix
* optimize perf for engine worker queue
* fix bug
* fix internode ll two stage
* fix for ci
* fix bug
2025-11-10 15:33:29 +08:00
luukunn
41c0bef964
[BugFix] When the value of "temperature" is 0, adjust it to 1e-06 ( #4900 )
...
* add default temperature value
* add unit test
* update
* update
* add unit test
* update
* fix unit test
2025-11-10 13:24:33 +08:00
kevin
cc34487810
[Feature] support mm disable_chunked ( #4803 )
...
* support mm disable_chunked
* update code
* update code
* update code
2025-11-06 21:32:25 +08:00
Echo-Nie
e4f1267186
bug: fix list to List ( #4818 )
2025-11-06 16:13:12 +08:00
Juncai
08ca0f6aea
[Feature] [PD] add simple router and refine splitwise deployment ( #4709 )
...
* add simple router and refine splitwise deployment
* fix
2025-11-06 14:56:02 +08:00
chenjian
cc8f5312f5
[Feature] Add timestamp for profiler ( #4726 )
...
* [Feature] Add timestamp for profiler
* fix bug for offline inference
* fix for ci
* fix
* fix ci
2025-11-05 12:04:59 +08:00
chen
1c3ca48128
[Feature][Executor] GPU Model Runner Supports prompt_logprobs and max_logprobs ( #4769 )
2025-11-05 10:43:25 +08:00
lzy
af7e0f27f3
supports internode_ll_two_stage ( #4162 )
...
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports D internode_ll_two_stage
* fix code style
* fix xpu internode_ll_two_stage
* fix xpu internode_ll_two_stage
2025-11-04 16:35:40 +08:00
chenjian
25498efcf3
[Optimize] Support and robust for tpN for PD ( #4595 )
...
* [Optimize] Support and robust for tpN for PD
* fix
* fix
* support dpM tpN for cache messager
* fix
* fix token counter
* fix bug for merge develop
* fix bug
* robust cache messager for v0
2025-11-03 15:38:31 +08:00
chenjian
f83d0cf127
[Feature] Support eplb for fd ( #4599 )
...
* support eplb
* support eplb
---------
Co-authored-by: kevin <chengyf112@gmail.com >
2025-11-03 14:08:15 +08:00
lizexu123
4ac6de9a3c
[Feature] support pooling model runner ( #4590 )
...
* support qwen3-embedding
* support qwen3-embedding-0.6b
* fix
* fix bug
* fix test_return_token_ids.py and update enable_thinking
* fix mtp dummy_run
* merge develop
* fix np.float32
* delete FD_DISABLE_CHUNKED_PREFILL and FD_USE_GET_SAVE_OUTPUT_V1
* delete and build_stream_transfer_data
* fix test_update_v1:
* fix
* fix
* update dummy_run post_process
* delete test_update_v1
* fix
* fix dummy_run
* fix model_path
* fix model_path
* fix dummy_run
2025-10-31 22:32:05 +08:00
kevin
c801d31c9c
add checker ( #4711 )
2025-10-31 15:26:35 +08:00
李泳桦
0f75b62de2
[BugFix] Fix profile run in pd-disaggregated deployment ( #4584 )
...
* [fix] fix pd+dp+ep bug
* [fix] fix again
* [ci] fix code style
2025-10-31 14:42:00 +08:00
kevin
64e875b460
[Scheduler] update v1 prefill batch ( #4611 )
...
* update v1 prefill batch
* update code
* update code
2025-10-31 14:03:01 +08:00
ddchenhao66
b87384aa70
[XPU] xpu currently disable prefix cache for VL model ( #4695 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-31 10:36:39 +08:00
chen
b73a78155f
fix --logprobs-mode raw_logits ( #4681 )
2025-10-30 19:53:42 +08:00
zhouchong
35286ce31a
fix total_block_num init error in worker_process ( #4687 )
2025-10-30 19:53:09 +08:00
kxz2002
7dc9d9885e
[BugFix] fix offline llm chat "enable_thinking" is always "False" ( #4686 )
...
* fix enable_thinking
* recover ernie4_5_vl_processor
2025-10-30 19:45:41 +08:00