kevin
954a145d57
[Optimization] support mm prefill batch ( #5313 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support mm prefill batch
* update code
* update code
* update code
* update code
* fix encoder cache bug
* update code
* update code
* fix bug
* fix paddle ocr bug
* fix xpu bug
* update code
2025-12-11 22:21:14 +08:00
chen
747b16e021
[BugFix] Fix MTP no logprobs when enable_logprob ( #5499 )
2025-12-11 19:57:22 +08:00
Ryan
e58fed3665
[Graph Optimization][BugFix][CI] Fix 0size bug && add unitest ( #5495 )
2025-12-11 16:25:26 +08:00
chen
6289cbc434
[BugFix] fix hung when n>1 and --enable-logprob ( #5492 )
2025-12-11 10:46:27 +08:00
lizexu123
95eab9f9ee
[Feature] support stop_token_ids ( #5399 )
...
* support stop_token_ids
* fix
* delete chinese
* support both
* delete print
2025-12-09 17:49:12 +08:00
lizexu123
b0cf2c4b7a
[Feature] Support prefill batch inference for pooling models. ( #5436 )
...
* fix multi-inputs
* fix threshold
* fix threshold
* fix
* support multi-batch
* add tests
* fix test
* test
* fix
2025-12-09 16:21:00 +08:00
Nyakku Shigure
e1c4a12e34
[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part ( #5223 )
...
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-12-09 14:37:00 +08:00
chen
76649b45c1
[Optimization] compulte real max_logprobs in batch ( #5430 )
2025-12-09 14:15:05 +08:00
Daci
2f208db4e9
[Feature] Multimodal Model P / D Separation ( #5323 )
...
* RouterArgs port str -> int
* fix race condition [is_fetching] causing multiple fetch requests
* bugfix: Delete duplicate input_ids tensor creation
* mm pd splitwise json -> pickle5; multimodal_inputs only pos id;
debuglog f to %s
* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...
* update cr
* Apply suggestions from code review
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* pre-commit fix
* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 10:47:42 +08:00
周周周
438c9f785a
[BugFix] 0 not into cuda graph to save memory ( #5426 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-12-08 16:47:44 +08:00
RAM
b2908b8e82
[New][RL] Support Rollout Routing Replay ( #5405 )
...
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support orther top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 )"
This reverts commit c45e064f3d .
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun
c45e064f3d
Revert "[RL] Support Rollout Routing Replay ( #5321 )" ( #5402 )
...
This reverts commit 96d2d4877b .
2025-12-05 20:19:39 +08:00
周周周
94c57e4175
[BugFix]remove _execute_empty_input ( #5396 )
2025-12-05 20:19:01 +08:00
RAM
96d2d4877b
[RL] Support Rollout Routing Replay ( #5321 )
...
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support orther top_k
* fix ci bug
* pre-commit
* only support chatcmpl
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-12-05 20:01:33 +08:00
Ayakouji
a8f8791668
[Optimization] Qwen2.5-VL support multi-batch prefill ( #5269 )
...
* update
* fix
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix dict access
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-12-05 18:22:39 +08:00
Longzhi Wang
5cd17fd662
[Models] Add forward_meta to moe models' forward function ( #5138 )
...
* [Models] Add forward_meta to moe models' forward function
* fix missing param
* fix
* fix
* fix forward_meta
* fix test and remove chunked MoE releated in config
* fix test
* fix
* fix
2025-12-04 13:26:58 +08:00
lizexu123
946025480e
[Bug fix] fix pooling models ( #5358 )
...
* fix
* fix
* fix test
* fix gpu_model_runner
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-04 11:06:30 +08:00
lizexu123
c563eca791
[Feature] support reward model ( #5301 )
...
* Your commit message here
* add test
* update develop
* support reward
* support enable_chunk_prefill
* support bingfa
* support convert is reward
* update test
* delete print
* fix enable_thinking
* add document
* fix place
* fix test
* fix
* support enable_prefix_caching
* add no-enable_prefix-caching test
* fix
* support enable_prefix_caching
* delete print
* fix document
* fix
* fix test
* fix document and delete chinese
* udpate
* enable_thinking
* fix test
2025-12-02 14:55:31 +08:00
K11OntheBoat
2e1680838f
[PD Disaggregation] Support PD deployment of DeepSeekv3. ( #5251 )
...
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-12-02 14:11:50 +08:00
Longzhi Wang
add524d80c
[Feature] support chunked moe ( #4575 )
...
* [Feature] support chunked moe
* update
* update
* fix and add test
* update
* fix conflict and modity test
* fix fused_moe
* fix fused_moe
* fix docstring
* fix
* fix typo
* fix test
* fix
* fix
* fix test
* fix test
2025-12-01 15:17:18 +08:00
周周周
73886204d4
[Others] clean code ( #5235 )
2025-11-28 15:40:49 +08:00
GoldPancake
cfc5b0ccf9
[BugFix] fix mtp logprob bugs in chunk prefill ( #5244 )
...
* fix mtp logprob bugs in chunk prefill
* fix
* fix
2025-11-27 11:31:29 +08:00
Daci
f25ee3a26f
[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 ( #5140 )
...
* enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-26 10:22:35 +08:00
kevin
c068a4f642
[Feature] dyc8 support prefixcache ( #5125 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* dyc8 support prefixcache
* fix cache_trans test case
* update code
2025-11-21 19:46:26 +08:00
Yonghua Li
43097a512a
[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol ( #5132 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [fix] fix v1 scheduler profile run for append attention in prefill node
* [fix] skip send_signal if kv signal not inited for gpu and xpu
* [fix] extend fix to flash_attn & mla_attn
* [fix] fix v1 pd run in ipc transfer protocol
* [ci] add test for v1 pd profile run using ipc transfer protocol
* [style] fix code style check
* [style] fix code style again
* [fix] fix profile run
* [update] remove --num-gpu-blocks-override in example script
* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
K11OntheBoat
4a7739ec0b
Fix dummy run when use PD Disaggregation with EP inference. ( #5112 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-11-18 21:09:30 +08:00
Daci
5fc12eddfe
[Optimization] xgrammar async compile, multi thread, speed up ( #4835 )
...
* xgrammar async compile, multi thread, speed up
* fix test_sampler.py & pre-commit err
* add redis version check && fix request.llm_engine_recv_req_timestamp
* xgrammar prefill & decode & v0
* fix test_gpu_prompt_logprobs.py
* add test_guided_decoding.py
* Update fastdeploy/scheduler/splitwise_scheduler.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/model_executor/guided_decoding/xgrammar_backend.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/model_executor/guided_decoding/xgrammar_backend.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix torch xgrammar unittest env
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-14 18:05:26 +08:00
周周周
51b1f13547
[Executor]move batch_id_per_token ( #4853 )
2025-11-14 15:38:48 +08:00
ltd0924
5bf48de999
[KVCache] support unified cache backend ( #4903 )
...
* [Feature] support unified cache backend
* fix
* fix
* fix
* fix
* Update metax_model_runner.py
* fix
* update
* Update test_moba_attention_backend.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-11-12 14:54:52 +08:00
周周周
da6b4c10e5
[ATTENTION] make buffer alloc as a function ( #4945 )
2025-11-11 19:17:08 +08:00
K11OntheBoat
76be598129
replace paddle.max by numpy to avoid useless error log ( #4893 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-11-11 16:28:05 +08:00
SunLei
3098aee05f
[Perf] Support tensor transmission between work and engine with zero-copy to improve efficiency ( #4839 )
...
* feat(zmq): support tensor transmission with zero-copy for improved efficiency
* perf: zmq.send disable copy
* zmq recv data for debug
* convert logprobs tensor to cpu
2025-11-11 15:43:11 +08:00
chenjian
78895e2c7d
[Bug Fix] fix bug for PD EP ( #4823 )
...
* fix bug for PD EP
* fix
* optimize perf for engine worker queue
* fix bug
* fix internode ll two stage
* fix for ci
* fix bug
2025-11-10 15:33:29 +08:00
chen
80aedb82ce
[BugFix] max_lgprobes=-1 maps to ori_vocab_size ( #4884 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* -1 ori_vobal_size
* check
* check
* check
* revert config.py
2025-11-07 22:15:40 +08:00
周周周
69fa741763
remove seq_lens_this_time ( #4821 )
2025-11-06 11:06:28 +08:00
K11OntheBoat
62dfad4a5f
[PD Disaggregation] Support Qwen3-MoE use PD + EP inference. ( #4691 )
...
support Qwen-MoE PD/EP
2025-11-06 10:32:15 +08:00
周周周
876e4a8935
remove input_ids from ForwardMeta ( #4793 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-11-05 11:55:51 +08:00
chen
1c3ca48128
[Feature][Executor] GPU Model Runner Supports prompt_logprobs and max_logprobs ( #4769 )
2025-11-05 10:43:25 +08:00
freeliuzc
11398790d3
[Speculative Decoding][MTP]Support attn mask offset ( #4641 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [MTP]Merge support attn (#4591 )
* support mask_offset in speculate decoding
* fix dummpy run output
* add unit test
* fix unit test import
* support attn_mask_offset in mtp mode
* add update_attn_mask op
* fix unit test && fix code-style
2025-11-03 10:08:01 +08:00
lizexu123
4ac6de9a3c
[Feature] support pooling model runner ( #4590 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support qwen3-embedding
* support qwen3-embedding-0.6b
* fix
* fix bug
* fix test_return_token_ids.py and update enable_thinking
* fix mtp dummy_run
* merge develop
* fix np.float32
* delete FD_DISABLE_CHUNKED_PREFILL and FD_USE_GET_SAVE_OUTPUT_V1
* delete and build_stream_transfer_data
* fix test_update_v1:
* fix
* fix
* update dummy_run post_process
* delete test_update_v1
* fix
* fix dummy_run
* fix model_path
* fix model_path
* fix dummy_run
2025-10-31 22:32:05 +08:00
kevin
64e875b460
[Scheduler] update v1 prefill batch ( #4611 )
...
* update v1 prefill batch
* update code
* update code
2025-10-31 14:03:01 +08:00
RAM
cd3b7cc392
[Graph Optimization] Add the CUDAGraph usage switch for Draft Model ( #4601 )
...
* add draft model using cudagraph switch
* set default as false
* capture draft model in ci
* fix bug
2025-10-30 11:44:50 +08:00
李泳桦
a012e3608b
[Feature] support logits processors ( #4515 )
...
* [feat] provide an interface for logits processors and a builtin LogitBiasLogitsProcessor
* [chore] fix code style
* [fix] add unit test & fix existing bugs
* [feat] add engine/worker arg --logits-processors
* [fix] redefine user args as logits_processors_args and fix some bugs
* [fix] fix test_sampler
* Update fastdeploy/model_executor/logits_processor/builtin.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/model_executor/logits_processor/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/model_executor/test_logits_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix typo
* Update fastdeploy/engine/sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix bracelet
* [chore] redefine logits processor interface: pass the entire share_inputs into LP, do not copy share_inputs and logits
* [doc] add docs
* [fix] fix logit bias processor not applied when decoding is too fast & add docs and tests
* [fix] fix redundant code
* [feat] skip apply() if no bias is specified
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-10-29 00:08:53 +08:00
RAM
25a983ba9c
1.fix the bug of draft model with ep 2.fix sampler bug ( #4589 )
2025-10-27 17:47:34 +08:00
kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latecy cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normlize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move steam usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
ming1753
e4e3cede7f
[Feature] Support Paddle-OCR ( #4396 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normlize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
2025-10-24 23:34:30 +08:00
RAM
775edcc09a
[Executor] Default use CUDAGraph ( #3594 )
...
* add start intercept
* Adjustment GraphOptConfig
* pre-commit
* default use cudagraph
* set default value
* default use cuda graph
* pre-commit
* fix test case bug
* disable rl
* fix moba attention
* only support gpu
* Temporarily disable PD Disaggregation
* set max_num_seqs of test case as 1
* set max_num_seqs and temperature
* fix max_num_batched_tokens bug
* close cuda graph
* success run wint2
* profile run with max_num_batched_tokens
* 1.add c++ memchecker 2.success run wint2
* updatee a800 yaml
* update docs
* 1. delete check 2. fix plas attn test case
* default use use_unique_memory_pool
* add try-except for warmup
* ban mtp, mm, rl
* fix test case mock
* fix ci bug
* fix form_model_get_output_topp0 bug
* fix ci bug
* refine deepseek ci
* refine code
* Disable PD
* fix sot yaml
2025-10-21 14:25:45 +08:00
Yuanle Liu
cef3164c3b
Optimizing the performance of think length limit using custom operators ( #4279 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* delete impl
* delete min_length&max_length
* support limit thinking content strategy
* fix
* fix
* fix
* update
* fix set_value_by_flags_and_idx
* fix
* fix
* fix
* fix
* update
* fix
* fix
* fix typo
* fix ci
* fix
* fix
* support mtp
* fix
* fix
* update
* update
2025-10-20 21:09:13 +08:00
GoldPancake
47595a2480
[Feature] support mtp logprob ( #4464 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support mtp logprob
* fix unitest
2025-10-20 15:18:12 +08:00
RAM
528c55776e
[Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP ( #4456 )
...
* Fix MTP dummy run bug
* Target Model and Draft Model using the same flag
* In mtp replace use_cudagraph as step_use_cudagraph
2025-10-20 10:38:55 +08:00