mirror of https://github.com/langgenius/dify.git synced 2026-01-01 04:21:09 +00:00

Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using Locust - optimized for measuring Server-Sent Events (SSE) streaming performance.

Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

Active SSE Connections - Real-time count of open SSE connections
New Connection Rate - Connections per second (conn/sec)
Time to First Event (TTFE) - Latency until first SSE event arrives
Event Throughput - Events per second (events/sec)

Features

True SSE Support: Properly handles Server-Sent Events streaming without premature connection closure
Real-time Metrics: Live reporting every 5 seconds during tests
Comprehensive Tracking:
- Active connection monitoring
- Connection establishment rate
- Event processing throughput
- TTFE distribution analysis
Multiple Interfaces:
- Web UI for real-time monitoring (http://localhost:8089)
- Headless mode with periodic console updates
Detailed Reports: Final statistics with overall rates and averages
Easy Configuration: Uses existing API key configuration from setup

What Gets Measured

The stress test measures SSE streaming performance for the workflow execution API.

Primary Endpoint: `/v1/workflows/run`

The suite exercises this single endpoint and tracks SSE metrics for every request:

Request Type: POST request to workflow execution API
Response Type: Server-Sent Events (SSE) stream
Payload: Random questions from a configurable pool
Concurrency: Configurable from 1 to 1000+ simultaneous users

Key Performance Metrics

1. Active Connections

What it measures: Number of concurrent SSE connections open at any moment
Why it matters: Shows the system's ability to handle parallel streams
Good values: Should remain stable under load without drops

2. Connection Rate (conn/sec)

What it measures: How fast new SSE connections are established
Why it matters: Indicates the system's ability to handle connection spikes
Good values:
- Light load: 5-10 conn/sec
- Medium load: 20-50 conn/sec
- Heavy load: 100+ conn/sec

3. Time to First Event (TTFE)

What it measures: Latency from request sent to first SSE event received
Why it matters: Critical for user experience, since a lower TTFE means better perceived performance
Good values:
- Excellent: < 50ms
- Good: 50-100ms
- Acceptable: 100-500ms
- Poor: > 500ms
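
To make the TTFE definition concrete, here is a minimal sketch of how it could be measured over an SSE stream. It is not the code in sse_benchmark.py: the function name `measure_ttfe` and the injectable `clock` parameter are assumptions made for illustration, and the timer starts when the function is called rather than when the HTTP request is sent.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttfe(
    lines: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (TTFE in milliseconds, number of events) for SSE text lines.

    Hypothetical helper: an event is counted when a blank line follows at
    least one "data:" field, as in the SSE wire format.
    """
    start = clock()  # assumed to be taken right before the request is sent
    ttfe_ms: Optional[float] = None
    events = 0
    has_data = False  # True once the current event carries a data field

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            has_data = True
        elif line == "" and has_data:
            # A blank line terminates one event.
            events += 1
            if ttfe_ms is None:
                ttfe_ms = (clock() - start) * 1000.0
            has_data = False
        # Comment lines (starting with ":") and other fields are ignored.

    return ttfe_ms, events
```

With a real response, `lines` could be any iterator of decoded text lines from the stream; comment-only keep-alive pings do not count as events in this sketch.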

4. Event Throughput (events/sec)

What it measures: Rate of SSE events being delivered across all connections
Why it matters: Shows actual data delivery performance
Expected values: Depends on workflow complexity and number of connections
- Single connection: 10-20 events/sec
- 10 connections: 50-100 events/sec
- 100 connections: 200-500 events/sec
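
The connection and throughput metrics above reduce to three counters and a division by elapsed time. The following is a sketch under that assumption; the class name `SSEMetrics` and its methods are hypothetical and are not taken from sse_benchmark.py.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SSEMetrics:
    """Hypothetical counters for active connections, connections, and events."""

    active: int = 0
    total_connections: int = 0
    total_events: int = 0

    def connection_opened(self) -> None:
        self.active += 1
        self.total_connections += 1

    def connection_closed(self) -> None:
        # Guard against going negative if close is reported twice.
        self.active = max(0, self.active - 1)

    def event_received(self) -> None:
        self.total_events += 1

    def rates(self, elapsed_seconds: float) -> Tuple[float, float]:
        """Return (conn/sec, events/sec) averaged over the elapsed time."""
        if elapsed_seconds <= 0:
            return 0.0, 0.0
        return (
            self.total_connections / elapsed_seconds,
            self.total_events / elapsed_seconds,
        )
```

These are overall averages. A live display would typically compute the same ratios over a short reporting window instead of the whole run.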

5. Request/Response Times

P50 (Median): 50% of requests complete within this time
P95: 95% of requests complete within this time
P99: 99% of requests complete within this time
Min/Max: Best and worst case response times
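
As an illustration of what the percentile figures mean, the sketch below computes them with the nearest-rank method. Locust computes its own percentiles internally and may round differently; the `percentile` helper here is only an assumption-free way to reason about the definition.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to keep integer inputs exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

For response times of 1 ms through 100 ms (one sample each), P50 is 50 ms, P95 is 95 ms, and P99 is 99 ms.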

Prerequisites

Dependencies are automatically installed when running setup:
- Locust (load testing framework)
- sseclient-py (SSE client library)

Complete Dify setup:

# Run the complete setup
python scripts/stress-test/setup_all.py

Ensure services are running:

IMPORTANT: For accurate stress testing, run the API server with Gunicorn in production mode:

# Run from the api directory
cd api
uv run gunicorn \
  --bind 0.0.0.0:5001 \
  --workers 4 \
  --worker-class gevent \
  --timeout 120 \
  --keep-alive 5 \
  --log-level info \
  --access-logfile - \
  --error-logfile - \
  app:app

Configuration options explained:

--workers 4: Number of worker processes (adjust based on CPU cores)
--worker-class gevent: Async worker for handling concurrent connections
--timeout 120: Worker timeout for long-running requests
--keep-alive 5: Seconds to wait for the next request on an idle keep-alive connection

NOT RECOMMENDED for stress testing:

# Debug mode - DO NOT use for stress testing (slow performance)
./dev/start-api  # This runs Flask in debug mode with single-threaded execution

Also start the Mock OpenAI server:

python scripts/stress-test/setup/mock_openai_server.py

Running the Stress Test

# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089

The script will:

Validate that all required services are running
Check API token availability
Execute the Locust stress test with SSE support
Generate comprehensive reports in the reports/ directory

Configuration

The stress test configuration is in locust.conf:

users = 10           # Number of concurrent users
spawn-rate = 2       # Users spawned per second
run-time = 1m        # Test duration (30s, 5m, 1h)
headless = true      # Run without web UI
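
The interaction between these settings can be worked out directly: with 10 users spawned at 2 per second, ramp-up takes 5 seconds of a 1-minute run. The sketch below shows the arithmetic. The helper names `parse_run_time` and `ramp_up_seconds` are hypothetical, and the parser handles only the `h`/`m`/`s` forms shown in the comment above, which is an assumption about the inputs rather than a copy of Locust's own parser.

```python
import re

# Accepts forms such as "30s", "5m", "1h", or "1h30m".
_RUN_TIME = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_run_time(text: str) -> int:
    """Convert a run-time string into seconds."""
    cleaned = text.strip()
    match = _RUN_TIME.fullmatch(cleaned)
    if not cleaned or match is None:
        raise ValueError(f"unsupported run-time value: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def ramp_up_seconds(users: int, spawn_rate: float) -> float:
    """Time needed to reach the full user count at a constant spawn rate."""
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be positive")
    return users / spawn_rate


# With the values shown above: 60 s total, 5 s of ramp-up, 55 s at full load.
steady_state = parse_run_time("1m") - ramp_up_seconds(10, 2)
```

Keeping the ramp-up short relative to the run time makes the averaged metrics more representative of steady-state load.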

Custom Question Sets

Modify the questions list in sse_benchmark.py:

self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
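
For orientation, this is roughly how a question from the pool might become a request body. It is a sketch, not the code in sse_benchmark.py: the function `build_payload`, the input variable name `question`, and the `user` value are assumptions, and the field names should be checked against the workflow configured by the setup scripts.

```python
import json
import random
from typing import Any, Dict, Optional, Sequence

QUESTIONS = [
    "Your custom question 1",
    "Your custom question 2",
]


def build_payload(
    questions: Sequence[str],
    rng: Optional[random.Random] = None,
    user: str = "stress-test-user",  # hypothetical user identifier
) -> Dict[str, Any]:
    """Pick a random question and wrap it in a streaming request body."""
    chooser = rng if rng is not None else random
    return {
        # "question" is an assumed workflow input name.
        "inputs": {"question": chooser.choice(questions)},
        "response_mode": "streaming",
        "user": user,
    }


# The body is sent as JSON in the POST request.
body = json.dumps(build_payload(QUESTIONS, random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing results across runs.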

Understanding the Results

Report Structure

After running the stress test, you'll find these files in the reports/ directory:

locust_summary_YYYYMMDD_HHMMSS.txt - Complete console output with metrics
locust_report_YYYYMMDD_HHMMSS.html - Interactive HTML report with charts
locust_YYYYMMDD_HHMMSS_stats.csv - CSV with detailed statistics
locust_YYYYMMDD_HHMMSS_stats_history.csv - Time-series data

Key Metrics

Requests Per Second (RPS):

Excellent: > 50 RPS
Good: 20-50 RPS
Acceptable: 10-20 RPS
Needs Improvement: < 10 RPS

Response Time Percentiles:

P50 (Median): 50% of requests complete within this time
P95: 95% of requests complete within this time
P99: 99% of requests complete within this time

Success Rate:

Should be > 99% for production readiness
Lower rates indicate errors or timeouts

Example Output

============================================================
DIFY SSE STRESS TEST
============================================================

[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec

============================================================
SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================

Type     Name                          # reqs  # fails |    Avg     Min     Max    Med | req/s  failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST     /v1/workflows/run                  142   0(0.00%) |     41      18     192     38 |   2.37        0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
         Aggregated                         142   0(0.00%) |     41      18     192     38 |   2.37        0.00

============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events:      2841
Average TTFE:      43 ms
============================================================

How to Read the Results

Live SSE Metrics Box (Updates every 10 seconds):

SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms

Active: Current number of open SSE connections
Total Conn: Cumulative connections established
Events: Total SSE events received
conn/s: Connection establishment rate
events/s: Event delivery rate
TTFE: Average time to first event

Standard Locust Table:

Type     Name                # reqs  # fails |    Avg     Min     Max    Med | req/s
POST     /v1/workflows/run      142   0(0.00%) |     41      18     192     38 |   2.37

Type: Always POST for our SSE requests
Name: The API endpoint being tested
# reqs: Total requests made
# fails: Failed requests (should be 0)
Avg/Min/Max/Med: Average, minimum, maximum, and median response times (ms)
req/s: Request throughput

Performance Targets:

✅ Good Performance:

Zero failures (0.00%)
TTFE < 100ms
Stable active connections
Consistent event throughput

⚠️ Warning Signs:

Failures > 1%
TTFE > 500ms
Dropping active connections
Declining event rate over time
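
The numeric thresholds above can be turned into a simple automated check, for example in a CI job. This is a sketch: the function `assess_run` and the `"borderline"` label for results between the two lists are assumptions, and the connection-stability and event-rate criteria are not covered because they need time-series data.

```python
def assess_run(failure_pct: float, avg_ttfe_ms: float) -> str:
    """Classify a run using the failure-rate and TTFE thresholds."""
    # Warning signs: more than 1% failures or TTFE above 500 ms.
    if failure_pct > 1.0 or avg_ttfe_ms > 500.0:
        return "warning"
    # Good performance: zero failures and TTFE under 100 ms.
    if failure_pct == 0.0 and avg_ttfe_ms < 100.0:
        return "good"
    # Anything in between is not classified by the lists above.
    return "borderline"
```

For the example output shown earlier (0.00% failures, 43 ms TTFE), `assess_run(0.0, 43.0)` returns `"good"`.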

Test Scenarios

Light Load

concurrency: 10
iterations: 100

Normal Load

concurrency: 100
iterations: 1000

Heavy Load

concurrency: 500
iterations: 5000

Stress Test

concurrency: 1000
iterations: 10000

Performance Tuning

API Server Optimization

Gunicorn Tuning for Different Load Levels:

# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app

Worker calculation formula:

Workers = (2 × CPU cores) + 1
For SSE/WebSocket: Use gevent worker class
For CPU-bound tasks: Use sync workers
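
The worker formula is easy to evaluate for the machine at hand. A minimal sketch, where `recommended_workers` is a hypothetical helper and `os.cpu_count()` reports logical cores, which may differ from physical cores:

```python
import os
from typing import Optional


def recommended_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply Workers = (2 x CPU cores) + 1."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cores + 1
```

On a 4-core machine this gives 9 workers. Treat the result as a starting point and adjust it based on measured CPU and memory use.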

Database Optimization

PostgreSQL Connection Pool Tuning:

For high-concurrency stress testing, increase the PostgreSQL max connections in docker/middleware.env:

# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500

After changing, restart the PostgreSQL container:

docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db

Note: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

100 connections: ~1GB RAM
200 connections: ~2GB RAM
500 connections: ~5GB RAM
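
The sizing guidance above can be expressed as a small lookup plus the memory estimate. This is a sketch of that arithmetic only: the function names are hypothetical, the boundary values 50 and 200 are assigned to the lower tier as an assumption, and 1 GB is treated as 1000 MB to match the rounded figures in the note.

```python
def recommend_max_connections(concurrent_users: int) -> int:
    """Map a user count to the POSTGRES_MAX_CONNECTIONS values listed above."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    if concurrent_users <= 50:
        return 100  # light load (default)
    if concurrent_users <= 200:
        return 200  # medium load
    if concurrent_users <= 1000:
        return 500  # heavy load
    raise ValueError("no recommendation is documented above 1000 users")


def estimated_ram_gb(max_connections: int, mb_per_connection: float = 10.0) -> float:
    """Approximate RAM needed, at roughly 10 MB per connection."""
    return max_connections * mb_per_connection / 1000.0
```

For 150 concurrent users this suggests 200 connections and about 2 GB of RAM for the database server.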

System Optimizations

Increase file descriptor limits:
```
ulimit -n 65536
```
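
Because `ulimit -n` only affects the current shell, it can help to verify the limit from the process that actually runs the test. The sketch below uses Python's standard `resource` module, which is available on Unix-like systems only; `fd_limit_ok` is a hypothetical helper name.

```python
import resource
from typing import Tuple


def fd_limit_ok(required: int = 65536) -> Tuple[bool, int, int]:
    """Return (ok, soft_limit, hard_limit) for open file descriptors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means no limit is enforced.
    ok = soft == resource.RLIM_INFINITY or soft >= required
    return ok, soft, hard


if __name__ == "__main__":
    ok, soft, hard = fd_limit_ok()
    print(f"soft={soft} hard={hard} sufficient={ok}")
```

Each open SSE connection consumes one file descriptor on both the client and the server, so the check is relevant on both sides.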

TCP tuning for high concurrency (Linux):

# Increase TCP buffer sizes
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728

# Enable TCP fast open
sudo sysctl -w net.ipv4.tcp_fastopen=3

macOS specific:

# Increase maximum connections
sudo sysctl -w kern.ipc.somaxconn=2048

Troubleshooting

Common Issues

"ModuleNotFoundError: No module named 'locust'":

# Dependencies are installed automatically, but if needed:
uv --project api add --dev locust sseclient-py

"API key configuration not found":

# Run setup
python scripts/stress-test/setup_all.py

Services not running:

# Start Dify API with Gunicorn (production mode)
cd api
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

# Start Mock OpenAI server
python scripts/stress-test/setup/mock_openai_server.py

High error rate:
- Reduce concurrency level
- Check system resources (CPU, memory)
- Review API server logs for errors
- Increase timeout values if needed

Permission denied running script:
```
chmod +x scripts/stress-test/run_locust_stress_test.sh
```

Advanced Usage

Running Multiple Iterations

# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./scripts/stress-test/run_locust_stress_test.sh
    sleep 60
done

Custom Locust Options

Run Locust directly with custom options:

# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --csv reports/results

# Run for specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --run-time 5m --headless

Comparing Results

# Compare multiple stress test runs
ls -la reports/locust_summary_*.txt | tail -5

Interpreting Performance Issues

High Response Times

Possible causes:

Database query performance
External API latency
Insufficient server resources
Network congestion

Low Throughput (RPS < 10)

Check for:

CPU bottlenecks
Memory constraints
Database connection pooling
API rate limiting

High Error Rate

Investigate:

Server error logs
Resource exhaustion
Timeout configurations
Connection limits

Why Locust?

Locust was chosen over Drill for this stress test because:

Proper SSE Support: Correctly handles streaming responses without premature closure
Custom Metrics: Can track SSE-specific metrics like TTFE and stream duration
Web UI: Real-time monitoring and control via web interface
Python Integration: Seamlessly integrates with existing Python setup code
Extensibility: Easy to customize for specific testing scenarios

Contributing

To improve the stress test suite:

Edit locust.conf for configuration changes
Modify run_locust_stress_test.sh for workflow improvements
Update question sets for better coverage
Add new metrics or analysis features
