This connector ingests OpenAPI (Swagger) API endpoint metadata into DataHub. It extracts API endpoints from OpenAPI v2 (Swagger) and v3 specifications and represents them as datasets in DataHub, allowing you to catalog and discover your API endpoints alongside your data assets.
### Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
3. **Live API Calls (Optional)** - If `enable_api_calls_for_schema_extraction=True` and credentials are provided, the source makes GET requests to an endpoint only when all of the following hold:
   - Schema extraction from the spec fails
   - The endpoint uses the GET method
   - Valid credentials are available (username/password, token, or bearer_token)
All ingested endpoints are organized in DataHub's browse interface using browse paths based on their endpoint path structure. This makes it easy to navigate and discover related endpoints.
If you want to enable live API calls for schema extraction (`enable_api_calls_for_schema_extraction=True`), you'll need to provide authentication credentials. The source supports username/password, `token`, or `bearer_token` authentication.
Authentication is only required if you want to enable live API calls. Schema extraction from the OpenAPI specification itself does not require authentication.
When using `get_token` with `request_type: get`, the username and password are sent in the URL query parameters, which is less secure. Use `request_type: post` when possible.
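The sketch below shows the credential options in a recipe. It assumes placeholder environment variable names (`BEARER_TOKEN`, `API_TOKEN`, `API_USERNAME`, `API_PASSWORD`); these are illustrative, not required. Set only one option; the alternatives are commented out. Any `get_token` keys other than `request_type` that your API may need are not shown here.

```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    enable_api_calls_for_schema_extraction: true

    # Option A: bearer token (env var name is a placeholder)
    bearer_token: "${BEARER_TOKEN}"

    # Option B (instead of A): static token (env var name is a placeholder)
    # token: "${API_TOKEN}"

    # Option C (instead of A): username and password (placeholder env vars)
    # username: "${API_USERNAME}"
    # password: "${API_PASSWORD}"
    # If the credentials are exchanged for a token via get_token, prefer POST
    # so they are not sent as URL query parameters:
    # get_token:
    #   request_type: post
```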
For endpoints with path parameters where the source cannot automatically determine example values, you can provide them manually with `forced_examples`. This option maps each endpoint path to a list of example values (see the complete example below).
The source uses these values to construct URLs for API calls when needed.
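The fragment below is a minimal `forced_examples` sketch that reuses the petstore values from the complete example. With this mapping, the `{petId}` placeholder would presumably be filled with `1` and `{username}` with `user1` when the source builds request URLs. Live API calls must be enabled for these values to matter.

```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    enable_api_calls_for_schema_extraction: true
    bearer_token: "${BEARER_TOKEN}"
    # Endpoint path -> example values used to fill its path parameter
    forced_examples:
      /pet/{petId}: [1]
      /user/{username}: ["user1"]
```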
### Ignoring Endpoints
You can exclude specific endpoints from ingestion:
```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    ignore_endpoints:
      - /health
      - /metrics
      - /internal/debug
```
## Examples
### Basic Configuration (Schema from Spec Only)
```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    enable_api_calls_for_schema_extraction: false
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
### With API Calls Enabled
```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    bearer_token: "${BEARER_TOKEN}"
    enable_api_calls_for_schema_extraction: true
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
### Complete Example with All Options
```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    # Authentication
    bearer_token: "${BEARER_TOKEN}"
    # Optional: Enable/disable API calls
    enable_api_calls_for_schema_extraction: true
    # Optional: Ignore specific endpoints
    ignore_endpoints:
      - /user/logout
    # Optional: Provide example values for parameterized endpoints
    forced_examples:
      /pet/{petId}: [1]
      /store/order/{orderId}: [1]
      /user/{username}: ["user1"]
    # Optional: Proxy configuration
    proxies:
      http: "http://proxy.example.com:8080"
      https: "https://proxy.example.com:8080"
    # Optional: SSL verification
    verify_ssl: true
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
## Limitations
- **API calls are GET-only**: Live API calls for schema extraction are only made for GET methods. POST, PUT, and PATCH methods rely solely on schema definitions in the OpenAPI specification.
- **Authentication required for API calls**: If `enable_api_calls_for_schema_extraction=True`, valid credentials must be provided.
- **200 response codes only**: Only endpoints with 200 response codes are ingested.
- **Schema extraction from spec is preferred**: The source prioritizes extracting schemas from the OpenAPI specification. API calls are used as a fallback.
## Troubleshooting
### No schemas extracted
If schemas aren't being extracted:
1. **Check the OpenAPI specification** - Ensure your spec includes schema definitions in responses or request bodies
2. **Enable API calls** - Set `enable_api_calls_for_schema_extraction: true` and provide credentials
3. **Check authentication** - Verify your credentials are correct if API calls are enabled
4. **Review warnings** - Check the ingestion report for warnings about specific endpoints
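For reference, here is a minimal OpenAPI v3 document whose GET operation declares an inline schema on its 200 response. This is the kind of definition the source can read without making API calls. The `/pets` path and its fields are hypothetical. The schema is inlined to keep the sketch self-contained; how the source handles `$ref` references is not covered here.

```yaml
openapi: 3.0.3
info:
  title: Example API
  version: "1.0"
paths:
  /pets:
    get:
      responses:
        "200":
          description: A list of pets
          content:
            application/json:
              # Response schema the source can extract fields from
              schema:
                type: array
                items:
                  type: object
                  properties:
                    id:
                      type: integer
                    name:
                      type: string
```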
### Endpoints not appearing
If endpoints aren't appearing in DataHub:
1. **Check `ignore_endpoints`** - Ensure endpoints aren't in the ignore list
2. **Verify response codes** - Only endpoints with 200 response codes are ingested
3. **Check OpenAPI spec format** - Ensure the specification is valid OpenAPI v2 or v3
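The Swagger v2 fragment below illustrates the response-code rule. Both paths are hypothetical. Given the 200-only limitation, we would expect `/orders` to be ingested and `/orders/bulk-import` to be skipped, because the latter declares only a 201 response.

```yaml
swagger: "2.0"
info:
  title: Example API
  version: "1.0"
paths:
  /orders:
    get:
      responses:
        # Declares a 200 response: expected to be ingested
        "200":
          description: Orders found
          schema:
            type: array
            items:
              type: object
              properties:
                orderId:
                  type: integer
  /orders/bulk-import:
    post:
      responses:
        # Only a 201 response: expected to be skipped
        "201":
          description: Import accepted
```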