description |
---|
This design doc will walk through developing a connector for OpenMetadata |
Build a Connector
Ingestion is a simple Python framework for ingesting metadata from various sources.
Please look at our framework APIs
Workflow
A workflow is a simple orchestration job that runs its components in order.
A workflow consists of a Source and a Sink, and optionally one or more Processors. It also supports Stage and BulkSink.
Workflow execution happens in a serial fashion.
- The workflow runs the source component first. The source retrieves a record from an external source and emits it downstream.
- If the processor component is configured, the workflow sends the record to the processor next.
- There can be multiple processor components attached to the workflow. The workflow passes a record to each processor in the order they are configured.
- Once the last processor is finished, the modified record is sent to the sink.
- The above steps are repeated for each record emitted from the source.
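The serial flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the framework's actual API: `run_workflow`, `sample_source`, `add_owner`, and `uppercase_table` are hypothetical names, and records are modelled as plain dictionaries for brevity.

```python
from typing import Callable, Iterable, Iterator, List


def run_workflow(
    source: Iterable[dict],
    processors: List[Callable[[dict], dict]],
    sink: Callable[[dict], None],
) -> int:
    """Run source -> processors -> sink serially and return the record count."""
    count = 0
    for record in source:            # the source emits one record at a time
        for process in processors:   # processors run in the configured order
            record = process(record)
        sink(record)                 # the sink receives the modified record
        count += 1
    return count


def sample_source() -> Iterator[dict]:
    # Hypothetical stand-in for an external system such as a database catalog.
    for name in ("orders", "customers"):
        yield {"table": name}


def add_owner(record: dict) -> dict:
    # Hypothetical processor: attach an owner to the record.
    return {**record, "owner": "data-team"}


def uppercase_table(record: dict) -> dict:
    # Hypothetical processor: normalize the table name.
    return {**record, "table": record["table"].upper()}


# The sink here simply appends to an in-memory list.
written: List[dict] = []
processed = run_workflow(sample_source(), [add_owner, uppercase_table], written.append)
```

Because the source is consumed lazily, each record travels through every processor and reaches the sink before the next record is read.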
When records need to be aggregated, use the stage to write them to a file or another store. The bulk sink then reads the staged file and publishes the result to external services such as OpenMetadata or Elasticsearch.
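The stage and bulk sink pairing might look like the sketch below. It assumes a JSON Lines file as the staging store and a simple per-table count as the aggregation. `stage_records` and `bulk_sink` are hypothetical helpers, not framework classes, and the publishing step is replaced by a return value so the example stays self-contained.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Dict, Iterable


def stage_records(records: Iterable[dict], path: str) -> None:
    """Stage: write each record to a file as one JSON line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def bulk_sink(path: str) -> Dict[str, int]:
    """Bulk sink: read the staged file and aggregate over all records."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            counts[json.loads(line)["table"]] += 1
    # A real bulk sink would publish the aggregate to an external service
    # here; this sketch returns it instead.
    return dict(counts)


# Hypothetical usage records emitted by a source.
records = [
    {"table": "orders", "user": "a"},
    {"table": "orders", "user": "b"},
    {"table": "customers", "user": "a"},
]
staged_path = os.path.join(tempfile.mkdtemp(), "stage.jsonl")
stage_records(records, staged_path)
usage = bulk_sink(staged_path)
```

Writing to a file first lets the bulk sink see every record at once, which a per-record sink cannot do.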
{% content-ref url="setup.md" %} setup.md {% endcontent-ref %}
{% content-ref url="source.md" %} source.md {% endcontent-ref %}
{% content-ref url="sink.md" %} sink.md {% endcontent-ref %}
{% content-ref url="stage.md" %} stage.md {% endcontent-ref %}
{% content-ref url="bulksink.md" %} bulksink.md {% endcontent-ref %}