Zero Rains | e37e86b3b8 | 2025-08-28 09:49:36 +08:00
[V1 Loader] support param create and load for wint2 and xpu backend (#3581)
* support wint2 backend
* [V1 Loader] support param create and load for wint2 and xpu backend
* update weight shape name
* update
* update
* update baseline.txt
* update model name
* update baseline.txt
* fix codestyle
* remove debug code

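Here "wint2" refers to weight-only 2-bit quantization, so "param create and load" amounts to allocating packed integer weight buffers plus per-group scales instead of full-precision tensors. Below is a minimal NumPy sketch of one plausible packing layout (four 2-bit codes per uint8 with per-group scales); the function name and layout are illustrative assumptions, not FastDeploy's actual wint2 format.

```python
import numpy as np

def pack_wint2(weight_fp32: np.ndarray, group_size: int = 64):
    """Illustrative weight-only int2 packing: quantize to 2-bit codes with
    per-group scales, then pack four codes into each uint8.
    This layout is an assumption for illustration, not FastDeploy's format."""
    out_features, in_features = weight_fp32.shape
    assert in_features % group_size == 0
    groups = weight_fp32.reshape(out_features, in_features // group_size, group_size)
    # Symmetric 2-bit quantization: codes in {0, 1, 2, 3} map to {-2, -1, 0, 1} * scale.
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 2.0 + 1e-8
    codes = np.clip(np.round(groups / scales) + 2, 0, 3).astype(np.uint8)
    codes = codes.reshape(out_features, in_features)
    # Pack 4 codes per byte: the loaded parameter is 4x narrower than the fp32 weight,
    # which is why the weight shape created at load time differs from the dense shape.
    packed = (codes[:, 0::4]
              | (codes[:, 1::4] << 2)
              | (codes[:, 2::4] << 4)
              | (codes[:, 3::4] << 6))
    return packed, scales.squeeze(-1)

w = np.random.randn(128, 256).astype(np.float32)
packed, scales = pack_wint2(w)
print(packed.shape, scales.shape)  # (128, 64) packed uint8, (128, 4) per-group scales
```
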
李泳桦 | b2afdf4fc6 | 2025-08-27 17:16:23 +08:00
[fix] qwen output inconsistency when top_p=0 (#3634)
* [fix] qwen output inconsistency when top_p=0
* [fix] remove decode pre_id code

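With top_p=0 the nucleus degenerates to the single most probable token, so decoding should reduce to deterministic greedy argmax; sampling from a numerically thresholded distribution instead can produce the kind of output inconsistency this commit title describes. A minimal sketch of that special case, assuming standard nucleus-sampling semantics rather than FastDeploy's actual sampler code:

```python
import numpy as np

def sample_top_p(logits: np.ndarray, top_p: float, rng: np.random.Generator) -> int:
    """Illustrative nucleus sampling with an explicit top_p == 0 special case.
    Sketch of the expected semantics, not FastDeploy's sampler."""
    if top_p <= 0.0:
        # Degenerate nucleus: only the most probable token survives,
        # so decoding must be deterministic (greedy argmax).
        return int(np.argmax(logits))
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p) + 1)
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, 0.1])
assert sample_top_p(logits, top_p=0.0, rng=rng) == 0  # always the argmax token
```
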
Yuanle Liu | cbce94a00e | 2025-08-26 19:29:27 +08:00
rename ernie_xxx to ernie4_5_xxx (#3621)
* rename ernie_xxx to ernie4_5_xxx
* ci fix

Sunny-bot1 | c68c3c4b8b | 2025-08-25 20:14:51 -07:00
[Feature] bad words support v1 scheduler and specify token ids (#3608)
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix

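Supporting bad words as explicit token ids typically means masking those ids in the logits before each sampling step so they can never be emitted, regardless of which scheduler (v0 or the KV-cache v1 path) produced the batch. A minimal sketch of that masking step for single-token ids; the name bad_words_token_ids follows the commit message, but the function itself is illustrative, not FastDeploy's implementation:

```python
import numpy as np

def apply_bad_words_mask(logits: np.ndarray, bad_words_token_ids: list[int]) -> np.ndarray:
    """Illustrative bad-words filtering: force the listed token ids to -inf
    so they can never be sampled. Sketch only, not FastDeploy's code."""
    masked = logits.copy()
    masked[np.asarray(bad_words_token_ids, dtype=np.int64)] = -np.inf
    return masked

logits = np.array([1.2, 3.4, 0.7, 2.1])
masked = apply_bad_words_mask(logits, bad_words_token_ids=[1, 3])
assert int(np.argmax(masked)) == 0  # the banned ids (1 and 3) can no longer win
```

Multi-token bad words would additionally require matching the already-generated suffix before masking the next id; the single-token case above is the simplest form.
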
Kane2011 | 2ae7ab28d2 | 2025-08-25 17:44:20 +08:00
[MetaxGPU] adapt to the latest fastdeploy on metax gpu (#3492)

Kane2011 | b4fef2cf29 | 2025-08-13 11:11:54 +08:00
[MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
  1. change worker log;
  2. remove custom allreduce, adapt it later;
  3. remove cuda graph;
* Update __init__.py
  1. remove metax's key word comment
* Update __init__.py
  1. remove metax's key word comment;
  2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com>