DataHub Smoke Tests
This directory contains end-to-end smoke tests for DataHub functionality. Running them locally is faster for development and debugging than going through the full CI pipeline.
Quick Start
Prerequisites
- DataHub must be running locally
# From project root
./gradlew quickstartDebug
- Set up Python environment (one-time setup)
# From project root - sets up metadata-ingestion venv
./gradlew :metadata-ingestion:installDev
# Set up smoke-test specific environment
cd smoke-test
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip wheel setuptools
pip install -r requirements.txt
Environment Variables
export DATAHUB_VERSION=v1.0.0rc3-SNAPSHOT # or current version
export TEST_STRATEGY=no_cypress_suite0 # for non-Cypress tests
Running Tests
cd smoke-test
source venv/bin/activate
# Set environment variables
export DATAHUB_VERSION=v1.0.0rc3-SNAPSHOT
export TEST_STRATEGY=no_cypress_suite0
# Run all tests (WARNING: Takes a long time, requires full setup)
pytest -vv
# Run specific test file (RECOMMENDED for development)
pytest test_system_info.py -vv
# Run specific test method
pytest test_system_info.py::test_system_info_main_endpoint -vv
# Run multiple specific tests
pytest test_e2e.py::test_healthchecks test_e2e.py::test_gms_usage_fetch -v
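The commands above can also be wrapped in a small Python launcher, which is sometimes handy when the same selection of tests is run repeatedly. This is only a sketch: `node_id`, `build_command`, `build_env`, and `run_tests` are hypothetical helpers, not part of the smoke-test suite, and the default values are simply copied from the export lines in this README. It runs `python -m pytest` rather than the bare `pytest` command, so that pytest comes from the interpreter running the script.

```python
import os
import subprocess
import sys

# Defaults copied from the export lines in this README; adjust to the current version.
DEFAULT_ENV = {
    "DATAHUB_VERSION": "v1.0.0rc3-SNAPSHOT",
    "TEST_STRATEGY": "no_cypress_suite0",
}


def node_id(test_file, test_name=None):
    """Build a pytest node ID, e.g. test_e2e.py::test_healthchecks."""
    return f"{test_file}::{test_name}" if test_name else test_file


def build_command(targets, verbosity="-vv"):
    """Return the argument list for running the given targets with pytest."""
    return [sys.executable, "-m", "pytest", *targets, verbosity]


def build_env(base=None):
    """Copy the environment and fill in the defaults only where a variable is unset."""
    env = dict(os.environ if base is None else base)
    for name, value in DEFAULT_ENV.items():
        env.setdefault(name, value)
    return env


def run_tests(targets, cwd="."):
    """Run pytest in cwd (the smoke-test directory) and return its exit code."""
    completed = subprocess.run(build_command(targets), env=build_env(), cwd=cwd)
    return completed.returncode
```

For example, `run_tests([node_id("test_e2e.py", "test_healthchecks")])` from inside the activated venv would correspond to the single-test invocation shown above.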
Test Categories
System Info Tests (test_system_info.py)
✅ FAST - Can run independently
Tests the system info API endpoints:
- /openapi/v1/system-info - Spring components only
- /openapi/v1/system-info/properties - Detailed properties
- /openapi/v1/system-info/spring-components - Component status
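For a quick look at these endpoints outside the test suite, something like the following sketch may help. It assumes GMS listens on http://localhost:8080 (the port used elsewhere in this README) and that the endpoints return JSON. The helper names are hypothetical, and the optional `token` argument assumes a bearer-style access token, which this README does not confirm.

```python
import json
import urllib.request

# Assumed GMS base URL; 8080 is the port this README uses for GMS.
GMS_URL = "http://localhost:8080"

# Endpoint paths as listed in this README.
SYSTEM_INFO_PATHS = (
    "/openapi/v1/system-info",
    "/openapi/v1/system-info/properties",
    "/openapi/v1/system-info/spring-components",
)


def system_info_urls(base_url=GMS_URL):
    """Return the full URL of each system info endpoint."""
    return [base_url.rstrip("/") + path for path in SYSTEM_INFO_PATHS]


def fetch_json(url, token=None, timeout=10):
    """GET a URL and parse the body as JSON. The bearer header is an assumption."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def print_system_info(base_url=GMS_URL, token=None):
    """Print the top-level keys of each endpoint's response, or the error raised."""
    for url in system_info_urls(base_url):
        try:
            body = fetch_json(url, token=token)
        except Exception as exc:  # e.g. connection refused or HTTP 401
            print(f"{url} -> failed: {exc}")
        else:
            summary = sorted(body) if isinstance(body, dict) else type(body).__name__
            print(f"{url} -> {summary}")
```

Calling `print_system_info()` against a running instance prints one line per endpoint. An unauthenticated call may fail with a 401, in which case the test suite remains the more reliable check.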
# Run all system info tests (takes ~30 seconds)
pytest test_system_info.py -vv
Core E2E Tests (test_e2e.py)
⚠️ SLOW - Requires full ingestion pipeline
Tests that require data ingestion and full DataHub functionality. Many tests depend on the initial ingestion fixture, which can fail if Kafka or Schema Registry is not properly configured.
# Run health checks only (fast)
pytest test_e2e.py::test_healthchecks -v
# Run authentication tests (fast)
pytest test_e2e.py::test_frontend_auth -v
# Run full e2e tests (slow, requires full setup)
pytest test_e2e.py -vv
Development Workflow
Testing System Info Changes
After making changes to system info APIs:
- Restart DataHub
# Kill existing processes
./gradlew :datahub-frontend:stop :datahub-gms:stop
# Restart
./gradlew quickstartDebug
- Run System Info Tests
cd smoke-test
source venv/bin/activate
export DATAHUB_VERSION=v1.0.0rc3-SNAPSHOT
export TEST_STRATEGY=no_cypress_suite0
pytest test_system_info.py -vv
Quick API Verification
# Check if DataHub is running
curl -s http://localhost:8080/health | head -5
# Test system info endpoint directly
curl -s http://localhost:8080/openapi/v1/system-info | jq . | head -20
Troubleshooting
Common Issues
❌ "Connection refused" errors
- DataHub is not running
- Wrong port (should be 8080 for GMS)
- Services still starting up (wait a few minutes)
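When the cause is simply that services are still starting, polling the health endpoint can replace guessing how long to wait. The sketch below assumes that an HTTP 200 from http://localhost:8080/health means GMS is ready. `is_healthy` and `wait_until_healthy` are hypothetical helpers, and the attempt count and delay are arbitrary starting values.

```python
import http.client
import time
import urllib.request


def is_healthy(url, timeout=5):
    """Return True if a GET to url answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so connection refused,
        # timeouts, and non-2xx responses all land here.
        return False


def wait_until_healthy(
    url="http://localhost:8080/health",
    attempts=30,
    delay=10.0,
    probe=is_healthy,
    sleep=time.sleep,
):
    """Poll until probe(url) succeeds.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. probe and sleep are injectable to make testing easy.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

With the defaults this waits up to roughly five minutes; a `None` result suggests the problem is not just slow startup.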
❌ "401 Unauthorized" for direct curl
- Expected behavior - tests handle authentication
- Use the test suite instead of direct curl for authenticated endpoints
❌ Kafka/Schema Registry connection errors
- Only affects full e2e tests with ingestion
- System info tests should still work
- Try running individual test methods instead of full suite
❌ Python environment issues
# Recreate virtual environment
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Environment Debug
# Check if services are running
curl -s http://localhost:8080/health
curl -sS http://localhost:9092 # Kafka (-S keeps the "connection refused" error visible if the broker is not running)
# Verify Python environment
source venv/bin/activate
which python
python --version
pip list | grep datahub
# Check environment variables
echo "DATAHUB_VERSION: $DATAHUB_VERSION"
echo "TEST_STRATEGY: $TEST_STRATEGY"
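The same checks can be gathered into a small preflight script. This is a sketch under stated assumptions: Kafka is not an HTTP service, so a plain TCP connection test is used for port 9092 in place of curl. `missing_env_vars`, `port_is_open`, and `preflight` are hypothetical helpers, and the host and ports are the ones mentioned in this README.

```python
import os
import socket

# Variables this README exports before running the tests.
REQUIRED_ENV_VARS = ("DATAHUB_VERSION", "TEST_STRATEGY")

# Ports mentioned in this README: GMS on 8080, Kafka on 9092.
SERVICE_PORTS = {"GMS": 8080, "Kafka": 9092}


def missing_env_vars(env=None, required=REQUIRED_ENV_VARS):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(host="localhost"):
    """Print one status line per check and return True if all checks pass."""
    ok = True
    for name in missing_env_vars():
        print(f"missing environment variable: {name}")
        ok = False
    for service, port in SERVICE_PORTS.items():
        reachable = port_is_open(host, port)
        print(f"{service} on {host}:{port}: {'reachable' if reachable else 'not reachable'}")
        ok = ok and reachable
    return ok
```

An open port only shows that something is listening; it does not prove the service behind it is healthy, so the `/health` check is still worth running for GMS.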
CI vs Local Testing
- CI: Uses ./gradlew :smoke-test:pytest - full pipeline with Docker containers
- Local: Uses direct pytest - faster, uses locally running DataHub instance
- Recommendation: Use local for development, CI for final validation
Test Organization
- test_e2e.py - Main test suite (1387 lines)
- test_system_info.py - System info API tests (169 lines)
- conftest.py - Test configuration and fixtures
- tests/utils.py - Test utilities and helpers
💡 Pro Tip: For rapid development, use pytest test_system_info.py -vv, which runs in ~30 seconds vs the full test suite, which can take 30+ minutes.