apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-12-24 13:28:13 +08:00
c415885a94da83787f8ec9a0af4e9ef00e87c410
FastDeploy/custom_ops/gpu_ops
chen
7c1fd19f0f
[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)
2025-09-24 16:39:51 +08:00
append_attn
[FIX] Fix CUDA error(700): 'cudaErrorIllegalAddress' in CascadeAppendWriteCacheKVQKV cache_kernel(). Continue when batch_id_per_token[token_idx] is default value -1. (#4218)
2025-09-24 14:08:49 +08:00
common
…
custom_all_reduce
fix typos (#4176)
2025-09-22 14:27:17 +08:00
cutlass_extensions
…
cutlass_kernels
…
flash_mask_attn
…
fp8_gemm_with_cutlass
…
glog
…
int8_gemm_with_cutlass
…
machete
…
mla_attn
…
moba_attn
…
moe
…
quantization
[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)
2025-09-24 16:39:51 +08:00
sample_kernels
…
speculate_decoding
…
w4afp8_gemm
…
wfp8afp8_sparse_gemm
…
append_attention.cu
…
beam_search_softmax.cu
…
cpp_extensions.cc
Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
2025-09-23 18:41:33 +08:00
cuda_multiprocess.h
…
dequant_int8.cu
…
enforce_generation.cu
…
env.h
…
extract_text_token_output.cu
…
fused_get_rotary_embedding.cu
…
fused_hadamard_quant_fp8.cu
…
fused_rotary_position_encoding.cu
…
gather_idx.cu
…
get_data_ptr_ipc.cu
…
get_img_boundaries.cc
…
get_mm_split_fuse.cc
…
get_output_ep.cc
…
get_output_msg_with_topk.cc
…
get_output.cc
…
get_padding_offset_system.cu
…
get_padding_offset.cu
…
get_position_ids_and_mask_encoder_batch.cu
…
helper.cu
…
helper.h
[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)
2025-09-24 16:39:51 +08:00
init_signal_layerwise.cc
…
ipc_sent_key_value_cache_by_remote_ptr.cu
…
merge_prefill_decode_output.cu
…
msg_utils.h
…
multi_head_latent_attention.cu
…
ngram_mask.cu
…
noaux_tc.cu
Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
2025-09-23 18:41:33 +08:00
noauxtc_kernel.h
Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
2025-09-23 18:41:33 +08:00
open_shm_and_get_meta_signal.cc
…
per_token_quant_fp8.cu
…
read_data_ipc.cu
…
read_ids.py
…
read_temp_ids.py
…
rebuild_padding.cu
…
recover_decode_task.cu
…
remote_cache_kv_ipc.cc
…
remote_cache_kv_ipc.h
…
save_output_msg_with_topk.cc
…
save_with_output_msg.cc
…
save_with_output_msg.h
…
save_with_output.cc
…
scaled_gemm_f8_i4_f16_gemm.cu
…
scaled_gemm_f8_i4_f16_weight_quantize.cu
…
seqs2seqs.cu
…
set_data_ipc.cu
…
set_flags.cu
…
set_mask_value.cu
…
set_value_by_flags_and_idx.cu
…
share_external_data.cu
…
step_reschedule.cu
…
step_system_cache.cu
…
step.cu
…
stop_generation_multi_ends.cu
…
stop_generation.cu
…
swap_cache_batch.cu
…
swap_cache.cu
…
system2group.cu
…
text_image_gather_scatter.cu
…
text_image_index_out.cu
…
token_penalty_multi_scores.cu
…
token_penalty_only_once.cu
…
token_transfer.hpp
…
transfer_output.cc
…
tune_cublaslt_gemm.cu
…
update_inputs_beam.cu
…
update_inputs_v1.cu
[Feature] Support pd ep deployment with yiyan adapter (#4029)
2025-09-22 16:41:38 +08:00
update_inputs.cu
…
update_split_fuse_input.cu
…