Dify Stress Test Suite
A high-performance stress test suite for Dify workflow execution using Locust - optimized for measuring Server-Sent Events (SSE) streaming performance.
Key Metrics Tracked
The stress test focuses on four critical SSE performance indicators:
- Active SSE Connections - Real-time count of open SSE connections
- New Connection Rate - Connections per second (conn/sec)
- Time to First Event (TTFE) - Latency until first SSE event arrives
- Event Throughput - Events per second (events/sec)
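The sketch below illustrates how TTFE and an event count could be derived from one SSE response. It is not the code in sse_benchmark.py; the function name `measure_stream`, the injectable `clock`, and the choice to timestamp an event at its terminating blank line are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    lines: Iterable[str],
    started_at: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (ttfe_ms, event_count) for one SSE response (hypothetical helper).

    An SSE event ends at a blank line. Only events that carried at least one
    "data:" field are counted, so comment-only keep-alives are ignored.
    """
    ttfe_ms: Optional[float] = None
    events = 0
    has_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            has_data = True
        elif line == "" and has_data:
            events += 1
            has_data = False
            if ttfe_ms is None:
                # First complete event: record latency since the request was sent.
                ttfe_ms = (clock() - started_at) * 1000.0
    return ttfe_ms, events
```

Dividing the summed event counts by the elapsed test time gives events/sec, and averaging `ttfe_ms` across connections gives the kind of TTFE figure shown in the console output.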
Features
- True SSE Support: Properly handles Server-Sent Events streaming without premature connection closure
- Real-time Metrics: Live reporting every 5 seconds during tests
- Comprehensive Tracking:
  - Active connection monitoring
  - Connection establishment rate
  - Event processing throughput
  - TTFE distribution analysis
- Multiple Interfaces:
  - Web UI for real-time monitoring (http://localhost:8089)
  - Headless mode with periodic console updates
- Detailed Reports: Final statistics with overall rates and averages
- Easy Configuration: Uses existing API key configuration from setup
What Gets Measured
The suite measures SSE streaming performance on one endpoint, using the metrics detailed in this section.
Primary Endpoint: /v1/workflows/run
Every simulated user sends the same kind of request:
- Request Type: POST request to workflow execution API
- Response Type: Server-Sent Events (SSE) stream
- Payload: Random questions from a configurable pool
- Concurrency: Configurable from 1 to 1000+ simultaneous users
Key Performance Metrics
1. Active Connections
- What it measures: Number of concurrent SSE connections open at any moment
- Why it matters: Shows system's ability to handle parallel streams
- Good values: Should remain stable under load without drops
2. Connection Rate (conn/sec)
- What it measures: How fast new SSE connections are established
- Why it matters: Indicates system's ability to handle connection spikes
- Good values:
  - Light load: 5-10 conn/sec
  - Medium load: 20-50 conn/sec
  - Heavy load: 100+ conn/sec
3. Time to First Event (TTFE)
- What it measures: Latency from request sent to first SSE event received
- Why it matters: Critical for user experience - faster TTFE = better perceived performance
- Good values:
  - Excellent: < 50ms
  - Good: 50-100ms
  - Acceptable: 100-500ms
  - Poor: > 500ms
4. Event Throughput (events/sec)
- What it measures: Rate of SSE events being delivered across all connections
- Why it matters: Shows actual data delivery performance
- Expected values: Depends on workflow complexity and number of connections
  - Single connection: 10-20 events/sec
  - 10 connections: 50-100 events/sec
  - 100 connections: 200-500 events/sec
5. Request/Response Times
- P50 (Median): 50% of requests complete within this time
- P95: 95% of requests complete within this time
- P99: 99% of requests complete within this time
- Min/Max: Best and worst case response times
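To make the percentile definitions concrete, here is a minimal sketch that computes P50/P95/P99 with the nearest-rank method. Locust computes its own percentiles internally and may round differently; the function name `percentile` is hypothetical.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

With response times of 1 ms through 100 ms, `percentile(samples, 95)` returns 95: 95% of requests completed within 95 ms.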
Prerequisites
- Dependencies are automatically installed when running setup:
  - Locust (load testing framework)
  - sseclient-py (SSE client library)
- Complete Dify setup:
# Run the complete setup
python scripts/stress-test/setup_all.py
- Ensure services are running:
IMPORTANT: For accurate stress testing, run the API server with Gunicorn in production mode:
# Run from the api directory
cd api
uv run gunicorn \
  --bind 0.0.0.0:5001 \
  --workers 4 \
  --worker-class gevent \
  --timeout 120 \
  --keep-alive 5 \
  --log-level info \
  --access-logfile - \
  --error-logfile - \
  app:app
Configuration options explained:
- --workers 4: Number of worker processes (adjust based on CPU cores)
- --worker-class gevent: Async worker for handling concurrent connections
- --timeout 120: Worker timeout for long-running requests
- --keep-alive 5: Keep connections alive for SSE streaming
NOT RECOMMENDED for stress testing:
# Debug mode - DO NOT use for stress testing (slow performance)
./dev/start-api # This runs Flask in debug mode with single-threaded execution
Also start the Mock OpenAI server:
python scripts/stress-test/setup/mock_openai_server.py
Running the Stress Test
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh
# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001
# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
The script will:
- Validate that all required services are running
- Check API token availability
- Execute the Locust stress test with SSE support
- Generate comprehensive reports in the reports/ directory
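The service validation step can be approximated by hand with a TCP reachability check. The sketch below is an assumption about what such a check might look like, not the logic in run_locust_stress_test.sh; `is_listening` is a hypothetical helper, and a successful connect only shows that something accepts connections on the port.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    # 5001 is the API port used by the Gunicorn commands in this document.
    print("Dify API reachable:", is_listening("localhost", 5001))
```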
Configuration
The stress test configuration is in locust.conf:
users = 10 # Number of concurrent users
spawn-rate = 2 # Users spawned per second
run-time = 1m # Test duration (30s, 5m, 1h)
headless = true # Run without web UI
Custom Question Sets
Modify the questions list in sse_benchmark.py:
self.questions = [
"Your custom question 1",
"Your custom question 2",
# Add more questions...
]
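For orientation, this is roughly how a question from the pool could become the POST body for /v1/workflows/run. It is a sketch, not the code in sse_benchmark.py: the input variable name `question` and the `user` value are assumptions that must match your workflow's configuration.

```python
import json
import random
from typing import Dict, List, Optional

QUESTIONS = [
    "Your custom question 1",
    "Your custom question 2",
]


def build_payload(
    questions: List[str],
    rng: Optional[random.Random] = None,
    input_key: str = "question",       # assumed workflow input variable name
    user: str = "stress-test-user",    # assumed end-user identifier
) -> Dict[str, object]:
    """Pick a random question and wrap it in a streaming workflow request body."""
    chooser = rng or random
    return {
        "inputs": {input_key: chooser.choice(questions)},
        "response_mode": "streaming",  # request an SSE stream, not a blocking reply
        "user": user,
    }


if __name__ == "__main__":
    print(json.dumps(build_payload(QUESTIONS), indent=2))
```

Passing a seeded `random.Random` makes the question sequence reproducible between runs, which helps when comparing results.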
Understanding the Results
Report Structure
After running the stress test, you'll find these files in the reports/ directory:
- locust_summary_YYYYMMDD_HHMMSS.txt - Complete console output with metrics
- locust_report_YYYYMMDD_HHMMSS.html - Interactive HTML report with charts
- locust_YYYYMMDD_HHMMSS_stats.csv - CSV with detailed statistics
- locust_YYYYMMDD_HHMMSS_stats_history.csv - Time-series data
Key Metrics
Requests Per Second (RPS):
- Excellent: > 50 RPS
- Good: 20-50 RPS
- Acceptable: 10-20 RPS
- Needs Improvement: < 10 RPS
Response Time Percentiles:
- P50 (Median): 50% of requests complete within this time
- P95: 95% of requests complete within this time
- P99: 99% of requests complete within this time
Success Rate:
- Should be > 99% for production readiness
- Lower rates indicate errors or timeouts
Example Output
============================================================
DIFY SSE STRESS TEST
============================================================
[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec
============================================================
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST /v1/workflows/run 142 0(0.00%) | 41 18 192 38 | 2.37 0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
Aggregated 142 0(0.00%) | 41 18 192 38 | 2.37 0.00
============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events: 2841
Average TTFE: 43 ms
============================================================
How to Read the Results
Live SSE Metrics Box (Updates every 10 seconds):
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
- Active: Current number of open SSE connections
- Total Conn: Cumulative connections established
- Events: Total SSE events received
- conn/s: Connection establishment rate
- events/s: Event delivery rate
- TTFE: Average time to first event
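If you want to track these numbers across runs, the two console lines can be parsed with a couple of regular expressions. This is a sketch that assumes the exact line format shown above; `parse_sse_metrics` is a hypothetical helper and will need adjusting if the output format differs.

```python
import re
from typing import Dict

# Labelled counters such as "Active: 8" or "TTFE: 43ms".
_FIELD = re.compile(r"(Active|Total Conn|Events|TTFE):\s*([\d.]+)")
# Rates such as "2.4 conn/s" or "47.3 events/s".
_RATE = re.compile(r"([\d.]+)\s*(conn/s|events/s)")


def parse_sse_metrics(text: str) -> Dict[str, float]:
    """Extract the live SSE metrics from the two console lines into a dict."""
    metrics: Dict[str, float] = {}
    for name, value in _FIELD.findall(text):
        metrics[name] = float(value)
    for value, unit in _RATE.findall(text):
        metrics[unit] = float(value)
    return metrics


if __name__ == "__main__":
    sample = (
        "SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841\n"
        "Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms"
    )
    print(parse_sse_metrics(sample))
```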
Standard Locust Table:
Type Name # reqs # fails | Avg Min Max Med | req/s
POST /v1/workflows/run 142 0(0.00%) | 41 18 192 38 | 2.37
- Type: Always POST for our SSE requests
- Name: The API endpoint being tested
- # reqs: Total requests made
- # fails: Failed requests (should be 0)
- Avg/Min/Max/Med: Average, minimum, maximum, and median response times (ms)
- req/s: Request throughput
Performance Targets:
✅ Good Performance:
- Zero failures (0.00%)
- TTFE < 100ms
- Stable active connections
- Consistent event throughput
⚠️ Warning Signs:
- Failures > 1%
- TTFE > 500ms
- Dropping active connections
- Declining event rate over time
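The numeric thresholds above can be turned into a simple pass/warn check, for example in a CI job. The sketch below covers only the failure rate and TTFE; connection stability and event-rate trends need time-series data. The function name `assess_run` and the "borderline" label for results between the two lists are assumptions.

```python
def assess_run(failure_pct: float, ttfe_ms: float) -> str:
    """Classify a run as "good", "warning", or "borderline" (hypothetical labels).

    Thresholds come from the targets above: warning if failures exceed 1% or
    TTFE exceeds 500 ms; good if there are zero failures and TTFE is under 100 ms.
    """
    if failure_pct > 1.0 or ttfe_ms > 500.0:
        return "warning"
    if failure_pct == 0.0 and ttfe_ms < 100.0:
        return "good"
    return "borderline"
```

The example run shown earlier (0.00% failures, 43 ms average TTFE) would be classified as "good".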
Test Scenarios
Light Load
concurrency: 10
iterations: 100
Normal Load
concurrency: 100
iterations: 1000
Heavy Load
concurrency: 500
iterations: 5000
Stress Test
concurrency: 1000
iterations: 10000
Performance Tuning
API Server Optimization
Gunicorn Tuning for Different Load Levels:
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app
# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app
# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
Worker calculation formula:
- Workers = (2 × CPU cores) + 1
- For SSE/WebSocket: Use gevent worker class
- For CPU-bound tasks: Use sync workers
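As a quick way to apply the formula on the current machine, the following sketch reads the core count from the operating system. `recommended_workers` is a hypothetical helper; treat its result as a starting point, since the load-level examples above use smaller worker counts.

```python
import os
from typing import Optional


def recommended_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply Workers = (2 x CPU cores) + 1, defaulting to this machine's cores."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cores + 1


if __name__ == "__main__":
    print(f"--workers {recommended_workers()}")
```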
Database Optimization
PostgreSQL Connection Pool Tuning:
For high-concurrency stress testing, increase the PostgreSQL max connections in docker/middleware.env:
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200 # Default is 100
# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500
After changing, restart the PostgreSQL container:
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
Note: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:
- 100 connections: ~1GB RAM
- 200 connections: ~2GB RAM
- 500 connections: ~5GB RAM
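The estimates above follow from simple arithmetic, sketched here so other connection counts can be checked. The ~10MB per connection figure is the rule of thumb from the note, not a measured value, and the helper name is hypothetical.

```python
def estimated_connection_ram_gb(
    max_connections: int,
    mb_per_connection: float = 10.0,  # rule-of-thumb figure from the note above
) -> float:
    """Estimate RAM used by PostgreSQL connections, in GB (1 GB = 1000 MB)."""
    if max_connections < 0:
        raise ValueError("max_connections must not be negative")
    return max_connections * mb_per_connection / 1000
```

For example, `estimated_connection_ram_gb(500)` gives 5.0, matching the ~5GB figure for the heavy-load setting.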
System Optimizations
- Increase file descriptor limits:
ulimit -n 65536
- TCP tuning for high concurrency (Linux):
# Increase TCP buffer sizes
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728
# Enable TCP fast open
sudo sysctl -w net.ipv4.tcp_fastopen=3
- macOS specific:
# Increase maximum connections
sudo sysctl -w kern.ipc.somaxconn=2048
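Each open SSE connection consumes a file descriptor, so it can help to confirm the limit actually in effect for the process that runs the test. The sketch below uses Python's standard `resource` module (Unix only); the helper names are hypothetical.

```python
import resource
from typing import Tuple


def fd_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_enough_fds(required: int) -> bool:
    """Return True if the soft limit allows at least `required` descriptors."""
    soft, _hard = fd_limits()
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft={soft} hard={hard} enough_for_1000_users={has_enough_fds(1000)}")
```

Note that `ulimit -n` only affects the current shell and its children, so run the check from the same shell that will launch Locust.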
Troubleshooting
Common Issues
- "ModuleNotFoundError: No module named 'locust'":
# Dependencies are installed automatically, but if needed:
uv --project api add --dev locust sseclient-py
- "API key configuration not found":
# Run setup
python scripts/stress-test/setup_all.py
- Services not running:
# Start Dify API with Gunicorn (production mode)
cd api
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app
# Start Mock OpenAI server
python scripts/stress-test/setup/mock_openai_server.py
- High error rate:
  - Reduce concurrency level
  - Check system resources (CPU, memory)
  - Review API server logs for errors
  - Increase timeout values if needed
- Permission denied running script:
chmod +x scripts/stress-test/run_locust_stress_test.sh
Advanced Usage
Running Multiple Iterations
# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
echo "Run $i of 3"
./scripts/stress-test/run_locust_stress_test.sh
sleep 60
done
Custom Locust Options
Run Locust directly with custom options:
# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
--host http://localhost:5001 --users 50 --spawn-rate 5
# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
--host http://localhost:5001 --csv reports/results
# Run for specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
--host http://localhost:5001 --run-time 5m --headless
Comparing Results
# Compare multiple stress test runs
ls -la reports/locust_summary_*.txt | tail -5
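The same lookup can be done from Python when a script needs to compare runs. This sketch assumes summaries follow the locust_summary_YYYYMMDD_HHMMSS.txt naming listed under Report Structure, so sorting by file name also sorts by time; `latest_summaries` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def latest_summaries(report_dir: str = "reports", count: int = 5) -> List[Path]:
    """Return up to `count` most recent summary files, oldest first.

    The timestamp embedded in each file name sorts lexicographically in
    chronological order, so no file metadata is needed.
    """
    if count <= 0:
        return []
    files = sorted(Path(report_dir).glob("locust_summary_*.txt"))
    return files[-count:]


if __name__ == "__main__":
    for path in latest_summaries():
        print(path)
```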
Interpreting Performance Issues
High Response Times
Possible causes:
- Database query performance
- External API latency
- Insufficient server resources
- Network congestion
Low Throughput (RPS < 10)
Check for:
- CPU bottlenecks
- Memory constraints
- Database connection pooling
- API rate limiting
High Error Rate
Investigate:
- Server error logs
- Resource exhaustion
- Timeout configurations
- Connection limits
Why Locust?
Locust was chosen over Drill for this stress test because:
- Proper SSE Support: Correctly handles streaming responses without premature closure
- Custom Metrics: Can track SSE-specific metrics like TTFE and stream duration
- Web UI: Real-time monitoring and control via web interface
- Python Integration: Seamlessly integrates with existing Python setup code
- Extensibility: Easy to customize for specific testing scenarios
Contributing
To improve the stress test suite:
- Edit stress_test.yml for configuration changes
- Modify run_locust_stress_test.sh for workflow improvements
- Update question sets for better coverage
- Add new metrics or analysis features