Zero Rains
bd30b08521
get org_vocab_size from args ( #3981 )
2025-09-09 15:08:47 +08:00
Divano
1aa16146ba
Update requirements.txt ( #3915 )
2025-09-05 13:51:22 +08:00
ApplEOFDiscord
dac0a00d0f
[BugFix] fix max streaming tokens invalid ( #3774 ) ( #3856 )
...
* Update serving_chat.py
* Update serving_completion.py
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-09-03 17:50:29 +08:00
ltd0924
c5591c45df
[BugFix] fix max streaming tokens invalid ( #3774 )
...
* Update serving_chat.py
* Update serving_completion.py
2025-09-02 21:00:29 +08:00
chen
121ac85d7d
fix ( #3640 )
2025-08-27 14:23:38 +08:00
chen
d233e3c97c
[Precision] Change lm_head layer running in float32 ( #3596 )
...
* support lm_head fp32 bf16 fp16
* delete print
* code check
* check
* check
* code check
* check
* check
2025-08-26 20:20:06 +08:00
chen
2136990144
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3536 )
...
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* code check
* code check
* fix tokenizer.decoder(-1), return 'Invalid Token'
* check seq len time shape
* logprob clip inf
* code check
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-08-25 14:11:18 +08:00
kevin
b7890cbe8d
fix uvicorn multi worker error ( #3339 )
2025-08-25 11:24:07 +08:00
chenjian
bc388b65c7
[Bug fix] Fix bug in logprob in release 2.0.4 ( #3445 )
...
* fix bug for scheduler v0
* Fix logprob in release/2.0.4
2025-08-16 21:13:10 +08:00
Jiang-Jia-Jun
71af0ca04a
[BugFix] Fix default log level of paddleformers ( #3378 )
2025-08-15 18:30:00 +08:00
YuBaoku
d66660a0d1
[CI] fix run_ci error in release/2.0.4 ( #3411 )
2025-08-14 22:44:17 +08:00
xiaolei373
f0519aec67
feat(log):add_request_and_response_log ( #3391 )
...
* feat(log):add_request_and_response_log
* [ci] Retrigger
* [ci] Retrigger
2025-08-14 19:12:42 +08:00
gaoziyuan
1f5983290c
fix mapping ( #3321 )
2025-08-12 16:17:59 +08:00
chenjian
c6a133d573
[Bug fix] Fix block num in scheduler v1 for release2.0.4 ( #3314 )
...
* fix bug for scheduler v0
* fix block num setting in scheduler v1
* fix block num setting in scheduler v1
* fix block num setting in scheduler v1
* fix block num setting in scheduler v1
* fix block num setting in scheduler v1
2025-08-11 23:55:45 +08:00
chenjian
4646aff25c
fix bug for scheduler v0 ( #3307 )
2025-08-11 23:55:20 +08:00
chenjian
a84a98b107
fix scheduler bug due to async running ( #3293 )
2025-08-10 13:54:59 +08:00
chenjian
c208086f61
fix scheduler bug for bs=1 ( #3288 )
2025-08-09 12:22:12 +08:00
sg263
ce1d4944e7
merge develop trace FD_START ( #3253 ) ( #3260 )
...
Co-authored-by: shige <shige@baidu.com>
2025-08-07 16:06:58 +08:00
chenjian
5439fb6336
[Cherry-pick] Fix bug for scheduler V1 ( #3167 )
...
* [BUG FIX] Fix bug when preempted request rescheduled (#3080 )
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug for offline inference in scheduler v1 (#3117 )
2025-08-04 17:08:12 +08:00
gaoziyuan
a592d17615
support qwen3 name_mapping ( #3180 )
2025-08-04 16:37:34 +08:00
李泳桦
eca8fc7ca6
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3077 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [doc] update docs related to metadata
* [fix] fix ci consistency test error with reasoning parser
* [fix] cancel enable_thinking default value
2025-07-30 19:25:39 +08:00
李泳桦
0463797fc2
[feat] add disable_chat_template in chat api as a substitute for previous raw_request ( #3023 )
...
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request
* [fix] pre-commit code check
2025-07-25 20:57:06 +08:00
Jiang-Jia-Jun
0ab8645fc4
Update setup.py
2025-07-25 10:27:51 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb ( #2997 )
...
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856
[Model] Provide clearer error for missing KV cache quantization scales ( #3007 )
2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e
[GCU] Update to develop ( #2988 )
2025-07-24 19:30:52 +08:00
Yzc216
980126b83a
[Feature] multi source download ( #3005 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
Zhang Yulong
5151bc92c8
Update benchmark tools ( #3004 )
...
* update benchmark tools
* update benchmark tools
2025-07-24 15:19:23 +08:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment ( #2977 )
2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a
[LLM] update function name ( #2985 )
...
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954
[Feature] multi-source download ( #2986 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
2025-07-24 14:26:37 +08:00
YUNSHEN XIE
87a2f4191d
add ci reuse action ( #2968 )
...
* add ci reuse action
* fix code formatting
* update
2025-07-24 14:24:10 +08:00
xiaoxiaohehe001
2c0ff068e2
[Fix] fix mm ep empty run ( #2999 )
2025-07-24 14:15:55 +08:00
xiegegege
e3a843f2c5
[benchmark] add quantization for benchmark yaml ( #2995 )
2025-07-24 13:26:34 +08:00
lizhenyun01
6235ef3881
fix chunk_prefill
2025-07-24 12:00:52 +08:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
lizexu123
832d25334a
[Code Simplification] fix init_distributed_environment() ( #2982 )
2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8
update ( #2978 )
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD ( #2928 )
...
* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39
polish code for prefill restrictions ( #2991 )
2025-07-23 05:10:14 -07:00
chen
172e69fe17
FA3 fix bug ( #2987 )
2025-07-23 19:07:43 +08:00
zhink
1272c7ce98
Fix performance degradation bug of custom_all_reduce ( #2981 )
2025-07-23 17:45:44 +08:00
Zero Rains
850c9d98d4
[BugFix] Add prefill restrictions for chunked_prefill+VL ( #2983 )
2025-07-23 01:45:57 -07:00
freeliuzc
a39a67334c
fix mtp bug in pd-split mode ( #2970 )
2025-07-23 15:31:16 +08:00
YuBaoku
6c4cfd9359
[CI] add codestyle_check action ( #2972 )
...
* [CI] add codestyle_check action
* [CI] Integrate codestyle check via pre-commit in GitHub Actions
2025-07-23 15:21:56 +08:00
lizexu123
9b22b8d2c3
delete max-len ( #2959 )
2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun
5b59a97030
Update README.md
2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun
475dc6d84e
Update README.md
2025-07-23 13:47:31 +08:00
chen
ad202272ed
【Infer】Improve the performance block_wise_fp8 of triton_moe_backend ( #2942 )
2025-07-23 13:02:50 +08:00