Commit Graph

11 Commits

Author SHA1 Message Date
chenjian
5439fb6336 [Cherry-pick] FIx bug for scheduler V1 (#3167)
* [BUG FIX] Fix bug when preempted request rescheduled (#3080)

* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled

* Fix bug for offline inference in scheduler v1 (#3117)
2025-08-04 17:08:12 +08:00
李泳桦
0463797fc2 [feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3023)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request

* [fix] pre-commit code check
2025-07-25 20:57:06 +08:00
chenjian
85a78d695d [Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1

* Support FD block scheduler v1

* Support FD block scheduler v1

* Fix according to copilot review

* Fix according to review

* Remove is_dummy

* Fix bug when real_bsz=1

* Fix infer first token cost time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
李泳桦
8a619e9db5 [Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body

* [fix] return_token_ids not working in curl request

* [test] improve some test cases of return_token_ids and prompt_token_ids

* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ltd0924
4b14dca1d6 [LLM] delete fixed slots (#2893)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-17 19:19:54 +08:00
ltd0924
d245d1ca6c [LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] update
2025-07-16 23:42:20 +08:00
sg263
42b80182e0 [Trace] add opentelemetry (#2852)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
chen
d33105baeb [Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs

* check xpu

* check vl_gpu_model_runner and xpu_model_runner

* get_worker() check platform
2025-07-10 16:33:40 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00