
---
title: Build a Connector
slug: /sdk/python/build-connector
---
# Build a Connector
This design doc walks through developing a connector for OpenMetadata.
Ingestion is a simple Python framework for ingesting metadata from various sources.
Please refer to our framework APIs.
## Workflow
A workflow is a simple orchestration job that runs its components in order.
A workflow consists of a Source and a Sink. It also supports a Stage and a BulkSink.
Workflow execution is serial:
- The workflow runs the source component first. The source retrieves a record from an external system and emits it downstream.
- If a processor component is configured, the workflow sends the record to the processor next.
- Multiple processor components can be attached to a workflow. The workflow passes the record to each processor in the order they are configured.
- Once the last processor finishes, the modified record is sent to the sink.
- These steps repeat for each record emitted by the source.
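The serial, per-record flow described above can be sketched in plain Python. This is a minimal illustration, not the OpenMetadata ingestion API: `run_workflow`, the toy source, the two processors, and the `mysql_prod` service name are all hypothetical, and processors are modeled as plain functions that take a record and return a (possibly modified) record.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical record shape; real connectors emit typed entities.
Record = Dict[str, Any]


def run_workflow(
    source: Iterable[Record],
    processors: List[Callable[[Record], Record]],
    sink: Callable[[Record], None],
) -> int:
    """Push each record through the processors in order, then into the sink.

    Returns the number of records written to the sink.
    """
    written = 0
    for record in source:  # the source emits one record at a time
        for process in processors:  # processors run in their configured order
            record = process(record)
        sink(record)  # the sink receives the final, modified record
        written += 1
    return written


# Usage: a toy source, two hypothetical processors, and a list-backed sink.
def table_source():
    yield {"name": "orders", "columns": 3}
    yield {"name": "customers", "columns": 5}


def add_service(record: Record) -> Record:
    return {**record, "service": "mysql_prod"}


def upper_name(record: Record) -> Record:
    return {**record, "name": record["name"].upper()}


collected: List[Record] = []
count = run_workflow(table_source(), [add_service, upper_name], collected.append)
print(count, collected[0])
```

Keeping processors in a list makes the configured order explicit, and because the source is a generator, only one record is in flight at a time.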
When records need to be aggregated, the stage can write them to a file or another store. The bulk sink then takes the file written by the stage and publishes its contents to external services such as OpenMetadata or Elasticsearch.
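A stage and bulk sink pair can be sketched the same way, assuming the stage writes JSON Lines to a local file. `FileStage`, `UsageBulkSink`, their method names, and the `publish` callback (a stand-in for a call to OpenMetadata or Elasticsearch) are hypothetical; the framework's actual interfaces may differ.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Any, Callable, Dict


class FileStage:
    """Hypothetical stage: appends each record to a JSON Lines file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fh = open(path, "w", encoding="utf-8")

    def stage_record(self, record: Dict[str, Any]) -> None:
        self._fh.write(json.dumps(record) + "\n")

    def close(self) -> None:
        self._fh.close()


class UsageBulkSink:
    """Hypothetical bulk sink: aggregates the staged file, then publishes once."""

    def __init__(self, path: str, publish: Callable[[Dict[str, int]], None]) -> None:
        self.path = path
        self.publish = publish  # stand-in for a call to an external service

    def write_records(self) -> None:
        counts = Counter()
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:  # one staged record per line
                counts[json.loads(line)["table"]] += 1
        self.publish(dict(counts))  # a single publish for the whole aggregate


# Usage: stage three records, then bulk-publish per-table counts.
path = os.path.join(tempfile.mkdtemp(), "staged.jsonl")
stage = FileStage(path)
for rec in [{"table": "orders"}, {"table": "orders"}, {"table": "customers"}]:
    stage.stage_record(rec)
stage.close()

published = []
UsageBulkSink(path, published.append).write_records()
print(published)
```

Because the bulk sink reads the whole staged file, it can aggregate across records (here, counting records per table) before publishing a single result, which a per-record sink cannot do.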
{% inlineCalloutContainer %}
{% inlineCallout color="violet-70" icon="source" bold="Source" href="/sdk/python/build-connector/source" %}
The connector to an external system, which outputs records for downstream components to process.
{% /inlineCallout %}
{% inlineCallout color="violet-70" icon="filter_alt" bold="Sink" href="/sdk/python/build-connector/sink" %}
Receives the records emitted by the source, one at a time.
{% /inlineCallout %}
{% inlineCallout color="violet-70" icon="storage" bold="Stage" href="/sdk/python/build-connector/stage" %}
Can be used to store records or to aggregate the work done by a processor.
{% /inlineCallout %}
{% inlineCallout color="violet-70" icon="filter_list" bold="BulkSink" href="/sdk/python/build-connector/bulk-sink" %}
Can be used to bulk update the records generated in a workflow.
{% /inlineCallout %}
{% /inlineCalloutContainer %}
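Building a connector means implementing these component contracts. The sketch below assumes hypothetical abstract base classes `Source` and `Sink` with methods `next_record` and `write_record`; these are illustrative names, so consult the framework APIs for the real classes and signatures. `InMemoryTableSource` uses a dict as a stand-in for a real external system.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]


# Hypothetical interfaces; not the actual OpenMetadata base classes.
class Source(ABC):
    """Connects to an external system and emits records downstream."""

    @abstractmethod
    def next_record(self) -> Iterator[Record]: ...


class Sink(ABC):
    """Receives the records emitted by the source, one at a time."""

    @abstractmethod
    def write_record(self, record: Record) -> None: ...


class InMemoryTableSource(Source):
    """Hypothetical connector: 'reads' tables from a dict instead of a real system."""

    def __init__(self, catalog: Dict[str, List[str]]) -> None:
        self.catalog = catalog

    def next_record(self) -> Iterator[Record]:
        for table, columns in self.catalog.items():
            yield {"table": table, "columns": columns}  # one record per table


class PrintSink(Sink):
    """Hypothetical sink: keeps each record and prints a one-line summary."""

    def __init__(self) -> None:
        self.seen: List[Record] = []

    def write_record(self, record: Record) -> None:
        self.seen.append(record)
        print(f"sink <- {record['table']} ({len(record['columns'])} columns)")


# Usage: wire the custom source directly to the sink, one record at a time.
source = InMemoryTableSource({"orders": ["id", "total"], "customers": ["id", "email", "name"]})
sink = PrintSink()
for record in source.next_record():
    sink.write_record(record)
```

Declaring the contracts with `abc.abstractmethod` means a connector that forgets to implement a required method fails at instantiation rather than midway through a run.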