yangjianfengo1
4325b737e7
【FIX】Change the name of sparse attn from moba to plas ( #4006 ) ( #4076 )
...
* 【FIX】Change the name of sparse attn from moba to plas (#4006 )
* Update docs
* 【docs】 update readme (#4000 )
* Update docs
* update readme
* update docs
* 【FIX】Change the name of sparse attn from moba to plas (#3845 )
* Update docs
* Update docs
* Update docs
* Update docs
* Rename moba to plas
* code style
* update ci
* code style
* update ci
* code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* fix max_num_seqs
* fix test load attn
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-23 10:26:40 +08:00
lizexu123
c86945ef49
[Feature] support pool ( #3827 )
...
* support pool
* update pooling
* add pooler_config and check
* update
* support AutoWeightsLoader load weight
* fix
* update
* delete print
* update pre-commit
* fix
* fix xpu
* fix ModelRegistry->model_registry
* fix Copilot review
* fix pooler.py
* delete StepPooler
* fix abstract
* fix default_loader_v1
* fix Pre Commit
* support torch qwen3 dense
* add test and fix torch-qwen
* fix
* fix
* adapter ci:
* fix review
* fix pooling_params.py
* fix
* fix tasks.py 2025
* fix print and logger
* Modify ModelRegistry and delete AutoWeightsLoader
* fix logger
* fix test_embedding
* fix ci bug
* ernie4_5 model_registry
* fix test
* support Qwen3-Embedding-0.6B tp=1 load
* fix extra code
* fix
* delete fix vocab_size
* delete prepare_params_dict
* fix:
2025-09-22 14:09:09 +08:00
co63oc
17a27170bc
fix typos ( #4093 )
2025-09-15 18:33:30 +08:00
freeliuzc
46911f903d
[MTP]update hybrid-mtp-with-ngram ( #4047 )
2025-09-15 17:13:31 +08:00
Jiang-Jia-Jun
c60adf4281
Revert "【FIX】Change the name of sparse attn from moba to plas ( #3845 )" ( #4001 )
...
This reverts commit e31c8f7336.
2025-09-09 11:08:23 +08:00
yangjianfengo1
e31c8f7336
【FIX】Change the name of sparse attn from moba to plas ( #3845 )
...
* Update docs
* Update docs
* Update docs
* Update docs
* Rename moba to plas
* code style
* update ci
* code style
* update ci
2025-09-09 10:56:50 +08:00
yangjianfengo1
472402bf4e
Update sparse attn documentation ( #3954 )
...
* Update docs
* Update docs
* Update docs
* Update docs
2025-09-08 12:23:18 +08:00
ltd0924
7643e6e6b2
[Docs] add data parallel ( #3883 )
...
* [Docs] add data parallel
* [Docs] add data parallel
2025-09-04 20:33:50 +08:00
kevin
1908465542
[Feature] mm and thinking model support structured output ( #2749 )
...
* mm support structured output
* update code
* update code
* update format
* update code
* update code
* add enable_thinking default
* update code
* add structured_outputs test case
* add ci install xgrammar
* add ci timeout time
* update test for structured_outputs
* update code
* add error traceback info
* update error msg
* update structured output code
* update code
* update code
* update config
* update torch version
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-02 16:21:09 +08:00
co63oc
d6369b4d51
fix typos ( #3684 )
2025-09-01 17:50:17 +08:00
Sunny-bot1
c68c3c4b8b
[Feature] bad words support v1 scheduler and specify token ids ( #3608 )
...
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2025-08-25 20:14:51 -07:00
zhink
df7c31012b
Modified to support custom all reduce by default ( #3538 )
2025-08-22 16:59:05 +08:00
Zhang Yulong
33ff0bfe38
Update disaggregated.md ( #3495 )
...
Fix documentation errors
2025-08-20 19:39:18 +08:00
RAM
154308102e
[Docs]Update docs of graph opt backend ( #3442 )
...
* Update docs of graph opt backend
* update best_practices
2025-08-15 21:30:32 +08:00
ltd0924
5a84324798
[Doc] Add multinode deployment documents ( #3417 )
...
* Create multi-node_deployment.md
* Create multi-node_deployment.md
* Update mkdocs.yml
2025-08-15 10:37:04 +08:00
Sunny-bot1
789dc67ff7
[Docs]fix sampling docs ( #3113 )
...
* fix sampling docs
* fix sampling docs
* update
2025-08-11 20:42:27 +08:00
gaoziyuan
4021d66ea5
【Feature】add fd plugins && rm model_classes ( #3123 )
...
* add fd plugins && rm model_classes
* fix reviews
* add docs
* fix
* fix unittest ci
2025-08-03 19:53:20 -07:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwagrs and update params docs ( #3103 )
...
* add chat_template_kwagrs and update params docs
* add chat_template_kwagrs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun
66304cf921
Update sampling.md
2025-07-31 15:02:57 +08:00
JYChen
bd29b2aaca
add stop_seqs doc ( #3090 )
2025-07-30 20:36:18 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3058 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
Zero Rains
4dc130c5a9
[Doc] add repetition early stopping doc ( #3078 )
...
* add repetition early stop doc
* add the early_stop.md
2025-07-29 22:01:57 -07:00
lddfym
5ca684c762
update doc: load_balance.md ( #3008 )
...
* update doc of load_balance
* update doc: load_balance.md
2025-07-30 10:27:56 +08:00
Sunny-bot1
9c962343f2
[Docs] add sampling docs ( #2973 )
...
* add sampling docs
* add minp sampling docs
* update sample docs
* update
* update
* add bad words desc
* update
2025-07-30 02:24:16 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
LiqinruiG
b38823bc66
modify reasoning_output docs ( #2696 )
2025-07-04 11:30:02 +08:00
freeliuzc
2b7f74d427
fix docs ( #2669 )
...
Co-authored-by: liuzichang01 <liuzichang01@baidu.com>
2025-07-01 18:02:44 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00