FastDeploy/docs/usage/code_overview.md
2025-07-19 23:19:27 +08:00

Code Overview

Below is an overview of the FastDeploy code structure and functionality organized by directory.

  • custom_ops: Contains C++ operators used by FastDeploy for large model inference. Operators for different hardware are placed in corresponding subdirectories (e.g., cpu_ops, gpu_ops). The root-level setup_*.py files are used to compile these C++ operators.
  • dockerfiles: Stores Dockerfiles for building FastDeploy runtime environment images.
  • docs: Documentation related to the FastDeploy codebase.
  • fastdeploy
    • agent: Scripts for launching large model services.
    • cache_manager: Cache management module for large models.
    • engine: Core engine classes for managing large model execution.
    • entrypoints: User-facing APIs for interaction.
    • input: Input processing module, including preprocessing, multimodal input handling, tokenization, etc.
    • model_executor
      • layers: Layer modules required by large model architectures.
      • model_runner: Model inference execution module.
      • models: Built-in large model classes in FastDeploy.
      • ops: Python-callable operator modules compiled from custom_ops, organized by hardware platform.
    • output: Post-processing for large model outputs.
    • platforms: Platform-specific modules for underlying hardware support.
    • scheduler: Request scheduling module for large models.
    • metrics: Core component for collecting, managing, and exporting Prometheus metrics. It tracks key runtime performance data (e.g., request latency, resource utilization, successful request counts).
    • splitwise: Modules for PD (prefill-decode) disaggregation deployment.
  • scripts/tools: Utility scripts for FastDeploy operations (e.g., compilation, unit testing, code style fixes).
  • test: Code for unit testing and validation.
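
To make the layout above easier to verify against a local checkout, here is a minimal sketch that reports which of the listed paths are present. The path list is copied from the overview. The helper names (`check_layout`, `missing_dirs`) and the repository root argument are hypothetical and not part of FastDeploy. The `scripts/tools` entry is left out because the overview does not make clear whether it is one path or two directories.

```python
from pathlib import Path

# Relative paths taken from the overview above. This is an illustrative
# list, not an authoritative manifest of the FastDeploy repository.
EXPECTED_DIRS = [
    "custom_ops",
    "dockerfiles",
    "docs",
    "fastdeploy/agent",
    "fastdeploy/cache_manager",
    "fastdeploy/engine",
    "fastdeploy/entrypoints",
    "fastdeploy/input",
    "fastdeploy/model_executor/layers",
    "fastdeploy/model_executor/model_runner",
    "fastdeploy/model_executor/models",
    "fastdeploy/model_executor/ops",
    "fastdeploy/output",
    "fastdeploy/platforms",
    "fastdeploy/scheduler",
    "fastdeploy/metrics",
    "fastdeploy/splitwise",
    "test",
]


def check_layout(repo_root):
    """Map each expected relative path to whether it exists under repo_root."""
    root = Path(repo_root)
    return {rel: (root / rel).exists() for rel in EXPECTED_DIRS}


def missing_dirs(repo_root):
    """Return the expected paths that are absent, in overview order."""
    return [rel for rel, found in check_layout(repo_root).items() if not found]


if __name__ == "__main__":
    # Assumes the script is run from the root of a FastDeploy checkout.
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```

Run from the repository root, the script prints one line per path that the overview mentions but the checkout lacks. An empty output means the checkout matches the structure described here.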