gaoziyuan
82e64b13e1
[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
lizexu123
32b39620bc
[Code Simplification] remove cum_offsets (#3410)
2025-08-18 20:21:25 +08:00
chen
f0f00a6025
[OPs] Universal optimization and Fix early_stop cuda 700 (#3375)
* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
EnflameGCU
d1a92e3e17
[GCU] Enable gcu CI (#3190)
* [GCU] Update to the latest version
* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
Yuanle Liu
9571c458f0
enhance eos_tokens (#3274)
* enhance eos_tokens
* update
* update
2025-08-11 14:47:52 +08:00
lizexu123
afff4d37ea
[Feature] support seed parameter (#3161)
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
2025-08-06 15:20:47 +08:00
lizexu123
b01cfd6007
[BugFix] support real batch_size (#3109)
* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-05 16:33:54 +08:00
Sunny-bot1
72ef5a9c93
[FIX]fix bad_words when sending requests consecutively (#3197)
* fix bad_words
* fix log
* fix log
2025-08-04 05:59:41 -07:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
bukejiyu
db698bda01
qwen loader (#3057)
2025-07-30 19:09:38 +08:00
Sunny-bot1
74aa31d15b
[Feature] support bad_words (#3055)
* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
Ryan
73cfe1fd37
[SOT] Extend SOT warmup support to new hardware (#3032)
* add new hardware
* add_sot_warmup4new_hardware
* fix conflict
* rm Optional
2025-07-29 22:45:20 +08:00
begin2023
dd877f38b1
[Perf] Remove unnecessary operations in non-cuda_graph (#3010)
* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call
2025-07-27 20:38:29 -07:00
YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) (#3018)
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
EnflameGCU
8c167e130c
[GCU] Update post_process (#3012)
2025-07-25 11:03:03 +08:00
EnflameGCU
c40df1802e
[GCU] Update to develop (#2988)
2025-07-24 19:30:52 +08:00
ltd0924
3792345c3a
[LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
lizexu123
9b22b8d2c3
delete max-len (#2959)
2025-07-23 15:11:39 +08:00
Zero Rains
89a485b69f
[Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* remove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Zero Rains
25698d56d1
polish code with new pre-commit rule (#2923)
2025-07-19 23:19:27 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs (#2849)
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
YuanRisheng
4c7b8bc458
Simplify the Config code (#2770)
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
Sunny-bot1
f6ad26fc08
fix topp default value (#2814)
2025-07-11 17:10:21 +08:00
Sunny-bot1
240d6236bc
[Fix]fix top_k_top_p sampling (#2801)
* fix topk-topp
* update
* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg
59071268b6
[Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time
2025-07-10 20:36:51 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
Sunny-bot1
e45050cae3
[Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling
* fix
* add api param
* add api para
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-07-09 20:58:58 -07:00
RAM
03a74995b8
Clear dead code And supplementary notes (#2757)
* 1.supplementary notes 2.delete dead code
* fix bug of forward meta
* Global modification of forward meta
* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
EnflameGCU
d0f4d6ba3a
[GCU] Support gcu platform (#2702)
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00