# bench: Benchmark Testing

## 1. bench latency: Offline Latency Test

### Parameters

| Parameter            | Description                                 | Default |
| -------------------- | ------------------------------------------- | ------- |
| --input-len          | Input sequence length (tokens)              | 32      |
| --output-len         | Output sequence length (tokens)             | 128     |
| --batch-size         | Batch size                                  | 8       |
| --n                  | Number of sequences generated per prompt    | 1       |
| --use-beam-search    | Whether to use beam search                  | False   |
| --num-iters-warmup   | Number of warmup iterations                 | 10      |
| --num-iters          | Number of actual test iterations            | 30      |
| --profile            | Whether to enable performance profiling     | False   |
| --output-json        | Path to save latency results as a JSON file | None    |
| --disable-detokenize | Whether to disable detokenization           | False   |

### Example

```
# Run latency benchmark on the inference engine
fastdeploy bench latency --model baidu/ERNIE-4.5-0.3B-Paddle
```

## 2. bench serve: Online Latency and Throughput Test

### Parameters

| Parameter         | Description                           | Default                |
| ----------------- | ------------------------------------- | ---------------------- |
| --backend         | Backend type                          | "openai-chat"          |
| --base-url        | Base URL of the server or API         | None                   |
| --host            | Host address                          | "127.0.0.1"            |
| --port            | Port                                  | 8000                   |
| --endpoint        | API endpoint path                     | "/v1/chat/completions" |
| --model           | Model name                            | Required               |
| --dataset-name    | Dataset name                          | "sharegpt"             |
| --dataset-path    | Path to dataset                       | None                   |
| --num-prompts     | Number of prompts to process          | 1000                   |
| --request-rate    | Requests per second                   | inf                    |
| --max-concurrency | Maximum concurrency                   | None                   |
| --top-p           | Sampling top-p (OpenAI backend)       | None                   |
| --top-k           | Sampling top-k (OpenAI backend)       | None                   |
| --temperature     | Sampling temperature (OpenAI backend) | None                   |

### Example

```
# Run online performance test
fastdeploy bench serve --backend openai-chat \
  --model baidu/ERNIE-4.5-0.3B-Paddle \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 8891 \
  --dataset-name EBChat \
  --dataset-path /datasets/filtered_sharedgpt_2000_input_1136_output_200.json \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 1 \
  --max-concurrency 1 \
  --save-result
```

## 3. bench throughput: Throughput Test

### Parameters

| Parameter            | Description                              | Default      |
| -------------------- | ---------------------------------------- | ------------ |
| --backend            | Inference backend                        | "fastdeploy" |
| --dataset-name       | Dataset name                             | "random"     |
| --model              | Model name                               | Required     |
| --input-len          | Input sequence length                    | None         |
| --output-len         | Output sequence length                   | None         |
| --prefix-len         | Prefix length                            | 0            |
| --n                  | Number of sequences generated per prompt | 1            |
| --num-prompts        | Number of prompts                        | 50           |
| --output-json        | Path to save results as a JSON file      | None         |
| --disable-detokenize | Whether to disable detokenization        | False        |
| --lora-path          | Path to LoRA adapter                     | None         |

### Example

```
# Run throughput benchmark on the inference engine
fastdeploy bench throughput --model baidu/ERNIE-4.5-0.3B-Paddle \
  --backend fastdeploy-chat \
  --dataset-name EBChat \
  --dataset-path /datasets/filtered_sharedgpt_2000_input_1136_output_200.json \
  --max-model-len 32768
```

## 4. bench eval: Online Task Evaluation

### Parameters

| Parameter         | Description                     | Default |
| ----------------- | ------------------------------- | ------- |
| --model, -m       | Model name                      | "hf"    |
| --tasks, -t       | List of evaluation tasks        | None    |
| --model_args, -a  | Model arguments                 | ""      |
| --num_fewshot, -f | Number of few-shot examples     | None    |
| --samples, -E     | Number of samples               | None    |
| --batch_size, -b  | Batch size                      | 1       |
| --device          | Device                          | None    |
| --output_path, -o | Output file path                | None    |
| --write_out, -w   | Whether to write output results | False   |

### Example

```
# Run task evaluation on an online service
fastdeploy bench eval --model local-completions \
  --model_args pretrained=./baidu/ERNIE-4.5-0.3B-Paddle,base_url=http://0.0.0.0:8490/v1/completions --write_out \
  --tasks ceval-valid_accountant
```
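
The `bench serve` flags documented above can be combined into a small concurrency sweep. The following is a minimal sketch, not part of FastDeploy itself: the `run` helper, the `DRY_RUN` switch, the concurrency levels (1, 4, 16), and the rule of sending ten prompts per concurrent slot are all illustrative assumptions, and the host, port, and dataset path are placeholders to adjust for your deployment. It uses only flags listed in the `bench serve` parameter table and example.

```shell
#!/usr/bin/env bash
# Sketch: sweep --max-concurrency for `fastdeploy bench serve`.
# All values below are placeholders (assumptions), not recommended settings.
set -euo pipefail

MODEL="baidu/ERNIE-4.5-0.3B-Paddle"
HOST="127.0.0.1"
PORT=8000
DATASET_PATH="/datasets/filtered_sharedgpt_2000_input_1136_output_200.json"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One benchmark run per concurrency level; the prompt count scales with
# concurrency (10 prompts per slot is an arbitrary choice).
for c in 1 4 16; do
  run fastdeploy bench serve \
    --backend openai-chat \
    --model "$MODEL" \
    --host "$HOST" \
    --port "$PORT" \
    --dataset-name EBChat \
    --dataset-path "$DATASET_PATH" \
    --num-prompts "$((c * 10))" \
    --max-concurrency "$c"
done
```

By default the script only prints the three commands so they can be reviewed first; running it with `DRY_RUN=0` executes them against a server that is already listening on the given host and port.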