Daci
|
a2ab1f4462
|
[BugFix] fix mix splitwise pickle load error (#5488)
* RouterArgs port str -> int
* fix race condition [is_fetching] causing multiple fetch requests
* bugfix: Delete duplicate input_ids tensor creation
* mm pd splitwise json -> pickle5; multimodal_inputs only pos id; debuglog f to %s
* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...
* update cr
* Apply suggestions from code review
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* pre-commit fix
* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0
* fix mix splitwise pickle dump
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-10 19:05:50 +08:00 |
|
Juncai
|
83ea9646f9
|
[PD Disaggregation] Unify the disaggregation info and the pd communication (#5438)
* Unify the disaggregation info and the pd communication
* up
* up
* fix
* fix conflict
* fix unittest
|
2025-12-09 14:44:59 +08:00 |
|
Daci
|
2f208db4e9
|
[Feature] Multimodal Model P / D Separation (#5323)
* RouterArgs port str -> int
* fix race condition [is_fetching] causing multiple fetch requests
* bugfix: Delete duplicate input_ids tensor creation
* mm pd splitwise json -> pickle5; multimodal_inputs only pos id; debuglog f to %s
* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...
* update cr
* Apply suggestions from code review
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* pre-commit fix
* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-09 10:47:42 +08:00 |
|
Juncai
|
80efe98f8d
|
[PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317)
* Add timestamp for analyzing splitwise deployment
* up
* up
* up
* up
* up
* up
* fix format
* fix
|
2025-12-08 10:08:44 +08:00 |
|
Juncai
|
7f4fff4d1e
|
fix get_request from scheduler (#5369)
|
2025-12-04 21:59:10 +08:00 |
|
Daci
|
5fc12eddfe
|
[Optimization] xgrammar async compile, multi thread, speed up (#4835)
* xgrammar async compile, multi thread, speed up
* fix test_sampler.py & pre-commit err
* add redis version check && fix request.llm_engine_recv_req_timestamp
* xgrammar prefill & decode & v0
* fix test_gpu_prompt_logprobs.py
* add test_guided_decoding.py
* Update fastdeploy/scheduler/splitwise_scheduler.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update fastdeploy/model_executor/guided_decoding/xgrammar_backend.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update fastdeploy/model_executor/guided_decoding/xgrammar_backend.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix torch xgrammar unittest env
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-11-14 18:05:26 +08:00 |
|
kevin
|
f72be7a2c8
|
[BUG] fix ep bug (#4275)
* fix ep bug
* update code
* update code
* update code
* [BugFix] fix config bugs (#4370)
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* Update expert_service.py
* Update expert_service.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* update code
---------
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
|
2025-10-16 16:46:40 +08:00 |
|
kevin
|
67298cf4c0
|
add error traceback info (#3419)
* add error traceback info
* update error msg
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
|
2025-08-19 19:32:04 +08:00 |
|
Zero Rains
|
25698d56d1
|
polish code with new pre-commit rule (#2923)
|
2025-07-19 23:19:27 +08:00 |
|
Jiang-Jia-Jun
|
05c670e593
|
[Sync] Update to latest code (#2679)
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
|
2025-07-03 15:43:53 +08:00 |
|
Jiang-Jia-Jun
|
92c2cfa2e7
|
Sync v2.0 version of code to github repo
|
2025-06-29 23:29:37 +00:00 |
|