YuanRisheng
85fbf5455a
[V1 Loader]Ernie VL support loader v1 ( #3494 )
...
* ernie vl support new loader
* add unittest
* fix test
2025-08-22 11:16:57 +08:00
YuanRisheng
09c979f3dd
[V1 Loader] Support Ernie text(moe and dense) ( #3110 )
...
* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code
2025-08-14 20:25:28 +08:00
Kane2011
b4fef2cf29
[MetaxGPU] Support FastDeploy on metax gpu ( #3241 )
...
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key word comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-08-13 11:11:54 +08:00
Zero Rains
42af0b4b64
[V1 Loader] Support DeepSeekV3(bf16) ( #3294 )
...
* Support new loader for DeepSeekV3(bf16)
* update paddle version
* remove useless attr
2025-08-11 13:39:28 +08:00
gaoziyuan
a799d14df1
[Bugfix] Fix model accuracy in some ops ( #3231 )
...
* fix noaux_tc op
* fix
* update
* fix qk norm
* fix linear for prequant loader
* test
* fix
* fix
* rm some print
* fix noaux_tc op
* test
* Fix the confused enable_early_stop when only set early_stop_config (#3214 )
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
* Add ci case for min token and max token (#3229 )
Co-authored-by: xujing43 <xujing43@baidu.com >
* add some evil cases (#3240 )
* add repetition early stop cases
* add repetition early stop cases
* add bad cases
* add bad cases
* add evil cases
* qwen3_moe (#3084 )
* [Feature] support seed parameter (#3161 )
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
* [Fix Bug] Fix the bug in fa3 support for centralized mode (#3235 )
* fix fa3 centralized-mode bug
* add qknorm parameter
* fix qk norm
* fix
* update
* fix linear for prequant loader
* fix
* fix
* rm some print
* fix
* fix moe init weight&scale
* fix moe init weight&scale
---------
Co-authored-by: bukejiyu <395822456@qq.com >
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
Co-authored-by: Zero Rains <linjunlu@zerorains.top >
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com >
Co-authored-by: xujing43 <xujing43@baidu.com >
Co-authored-by: Divano <dddivano@outlook.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com >
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com >
Co-authored-by: qingqing01 <dangqingqing@baidu.com >
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70
Move create_parameters to __init__ in FuseMOE for CutlassBackend and TritonBackend ( #3148 )
...
* w4a8 bug
* fix w4a8 bug
* remove code
* modify the triton backend
* fix ep
* fix the bug with tensor_wise_fp8 in triton backend
* fix the RL
* fix bug by merge
* fix the bug in w4a8
* fix the tensor_wise_fp8 bug
* fix RL
2025-08-08 15:55:47 +08:00
bukejiyu
20839abccf
qwen3_moe ( #3084 )
2025-08-06 14:45:27 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:58:05 +08:00
Yuan Xiaolan
35935da9e5
support W4A8 EPLB ( #3075 )
2025-07-30 14:34:12 +08:00
Yuan Xiaolan
3214fb5393
support model loading for w4a8 offline quant ( #3064 )
...
Support loading offline-quantized weights for W4A8 EP
2025-07-29 21:54:37 +08:00
YuanRisheng
502ee92a0a
Unify server-side and model-side Config (Part3) ( #3047 )
...
* merge model config
* fix arch
* fix rl
2025-07-29 17:07:44 +08:00
Yuan Xiaolan
b1d787a272
[fix] w4a8 model loading and hadamard config ( #3013 )
2025-07-28 18:17:59 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb ( #2997 )
...
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
bukejiyu
bfeb664ab8
update ( #2978 )
2025-07-24 00:16:42 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
...
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
yulangz
be21ef5047
[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B ( #2765 )
...
* fix no quant xpu moe
* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
EnflameGCU
d0f4d6ba3a
[GCU] Support gcu platform ( #2702 )
...
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-07-08 13:00:52 +08:00
gaoziyuan
26d5d737dd
[Feature] support some qwen2 functions ( #2740 )
...
* add rl qwen model support
* fix
* fix
2025-07-08 12:03:04 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code ( #2679 )
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00