parthp2107 e2578d6be3
Added documentation changes done in 0.5.0 branch to main (#1168)
* GitBook: [#177] Documentation Update - Airflow

* GitBook: [#195] Removing Cron from databaseServices

* GitBook: [#196] Added trino

* GitBook: [#197] removed cron from config

* GitBook: [#198] Added Redash Documentation

* GitBook: [#199] Added Bigquery Usage Documentation

* GitBook: [#200] Added page link for presto

* GitBook: [#201] Added Local Docker documentation

* GitBook: [#202] Added Documentation for Local Docker Setup

* GitBook: [#203] Added Git Command to clone Openmetadata in docs

* GitBook: [#207] links update

* GitBook: [#208] Updating Airflow Documentation

* GitBook: [#210] Adding Python installation package under Airflow Lineage config

* GitBook: [#211] Change the links to 0.5..0

* GitBook: [#213] Move buried connectors page up

* GitBook: [#214] Update to connectors page

* GitBook: [#215] Removed sub-categories

* GitBook: [#212] Adding Discovery tutorial

* GitBook: [#220] Updated steps to H2s.

* GitBook: [#230] Complex queries

* GitBook: [#231] Add lineage to feature overview

* GitBook: [#232] Make feature overview headers verbs instead of nouns

* GitBook: [#233] Add data reliability to features overview

* GitBook: [#234] Add complex data types to feature overview

* GitBook: [#235] Simplify and further distinguish discovery feature headers

* GitBook: [#236] Add data importance to feature overview

* GitBook: [#237] Break Connectors into its own section

* GitBook: [#238] Reorganize first section of docs.

* GitBook: [#239] Add connectors to feature overview

* GitBook: [#240] Organize layout of feature overview into feature categories as agreed with Harsha.

* GitBook: [#242] Make overview paragraph more descriptive.

* GitBook: [#243] Create a link to Connectors section from feature overview.

* GitBook: [#244] Add "discover data through association" to feature overview.

* GitBook: [#245] Update importance and owners gifs

* GitBook: [#246] Include a little more descriptive documentation for key features.

* GitBook: [#248] Small tweaks to intro paragraph.

* GitBook: [#249] Clean up data profiler paragraph.

* GitBook: [#250] Promote Complex Data Types to its own feature.

* GitBook: [#251] Update to advanced search

* GitBook: [#252] Update Roadmap

* GitBook: [#254] Remove old features page (text and screenshot based).

* GitBook: [#255] Remove references to removed page.

* GitBook: [#256] Add Descriptions and Tags section to feature overview.

* GitBook: [#257] Update title for "Know Your Data"

Co-authored-by: Ayush Shah <ayush.shah@deuexsolutions.com>
Co-authored-by: Suresh Srinivas <suresh@getcollate.io>
Co-authored-by: Shannon Bradshaw <shannon.bradshaw@arrikto.com>
Co-authored-by: OpenMetadata <github@harsha.io>
2021-11-13 09:33:20 -08:00
..

description
This design doc will walk through developing a connector for OpenMetadata

Build a Connector

Ingestion is a simple python framework to ingest the metadata from various sources.

Please look at our framework APIs

Workflow

workflow is a simple orchestration job that runs the components in an Order.

A workflow consists of Source, Processor and Sink. It also provides support for Stage and BulkSink.

Workflow execution happens in a serial fashion.

  1. The** Workflow** runs the source component first. The source retrieves a record from external sources and emits the record downstream.
  2. If the processor component is configured, the workflow sends the record to the processor next.
  3. There can be multiple processor components attached to the workflow. The workflow passes a record to each processor in the order they are configured.
  4. Once a processor is finished, it sends the modified record to the sink.
  5. The above steps are repeated for each record emitted from the source.

In the cases where we need aggregation over the records, we can use the stage to write to a file or other store. Use the file written to in stage and pass it to bulk sink to publish to external services such as openmetadata or elasticsearch.

{% content-ref url="source.md" %} source.md {% endcontent-ref %}

{% content-ref url="processor.md" %} processor.md {% endcontent-ref %}

{% content-ref url="sink.md" %} sink.md {% endcontent-ref %}

{% content-ref url="stage.md" %} stage.md {% endcontent-ref %}

{% content-ref url="bulksink.md" %} bulksink.md {% endcontent-ref %}