Mirror of https://github.com/open-metadata/OpenMetadata.git, synced 2025-08-19 06:28:03 +00:00
GitBook: [main] 11 pages and one asset modified
parent 2d70350742
commit 856be1f640
78302
docs/.gitbook/assets/openmetadata-style-guide (1).pdf
Normal file
File diff suppressed because it is too large
Binary file not shown.
After: 111 KiB
@ -1,10 +1,10 @@
# Introduction

-Data is an important asset of an organization and metadata is the key to unlock the value from that asset. It provides crucial context to turn data into information and powers not just the current limited use cases of data discovery, and governance, but also emerging use cases related to data quality, observability, and most importantly people collaboration.
+Metadata enables you to unlock the value of data assets in the common use cases of data discovery and governance, but also in emerging use cases related to data quality, observability, and people collaboration. However, poorly organized and managed metadata leads to redundant efforts within organizations and other inefficiencies that are expensive in time and dollars.

Poorly organized metadata is preventing organizations from realizing the full potential of data. Metadata is incorrect, inconsistent, stale, often missing, and fragmented in silos across various disconnected tools in proprietary formats, obscuring a holistic picture of data.

-### **OpenMetadata is an Open standard for metadata with a centralized metadata store that unifies all the data assets and metadata end-to-end to power data discovery, user collaboration, and tool interoperability.**
+### **OpenMetadata is an open standard with a centralized metadata store that unifies all your data assets end-to-end to enable data discovery, user collaboration, and tool interoperability.**


@ -1,12 +1,12 @@
# Table of contents

* [Introduction](README.md)
-* [Take it for a spin](take-it-for-a-spin.md)
+* [Try OpenMetadata](take-it-for-a-spin.md)

## OpenMetadata APIs

* [Schemas](openmetadata-apis/schemas/README.md)
-* [Schema Language](openmetadata-apis/schemas/schema-language.md)
+* [JSON Schema](openmetadata-apis/schemas/schema-language.md)
* [Schema Concepts](openmetadata-apis/schemas/overview.md)
* [OpenMetadata Types](openmetadata-apis/schemas/types/README.md)
* [Basic Types](openmetadata-apis/schemas/types/basic.md)
@ -77,11 +77,11 @@
* [Coding Style](open-source-community/developer/coding-style.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests.md)
* [Build a Connector](open-source-community/developer/build-a-connector/README.md)
* [Source](open-source-community/developer/build-a-connector/source.md)
* [Processor](open-source-community/developer/build-a-connector/processor.md)
* [Sink](open-source-community/developer/build-a-connector/sink.md)
* [Stage](open-source-community/developer/build-a-connector/stage.md)
* [BulkSink](open-source-community/developer/build-a-connector/bulksink.md)
* [Run Integration Tests](open-source-community/developer/run-integration-tests.md)
* [UX Style Guide](open-source-community/developer/ux-style-guide.md)
@ -47,7 +47,7 @@ Different Connectors require different dependencies, please go through [Connecto
Loads all the JSON connectors inside the pipeline directory as cron jobs.



### Custom run a job
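A connector file of the kind loaded from the pipeline directory might look roughly like the following. This is a hypothetical fragment for illustration only; the field names (`source`, `sink`, `type`, `config`) are assumptions, so check the Connectors documentation for the actual format:

```json
{
  "source": {
    "type": "sample-tables",
    "config": { "sample_data_folder": "./examples/sample_data" }
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  }
}
```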
@ -1,21 +1,18 @@
---
-description: >-
-  This design doc will walk through developing a connector for OpenMetadata
+description: This design doc will walk through developing a connector for OpenMetadata
---

-# Build a Connector
+# Ingestion API

Ingestion is a simple Python framework to ingest the metadata from various sources.

Please look at our framework [APIs](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/ingestion/api).

## Workflow

[workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) is a simple orchestration job that runs the components in order.

-It consists of [Source](./source.md), [Processor](./processor.md), and [Sink](./sink.md). It also provides support for [Stage](./stage.md) and [BulkSink](./bulksink.md).
+It consists of [Source](source.md), [Processor](processor.md), and [Sink](sink.md). It also provides support for [Stage](stage.md) and [BulkSink](bulksink.md).

Workflow execution happens in a serial fashion.
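The serial Source, Processor, Sink flow described above can be sketched roughly as follows. This is a minimal illustration with hypothetical demo classes, not the real workflow.py implementation; only the method names mirror the API pages in this section:

```python
# Minimal sketch of a serial ingestion workflow (illustrative only).
# The real orchestration lives in
# ingestion/src/metadata/ingestion/api/workflow.py.

class DemoSource:
    def next_record(self):
        # A source emits records one at a time for downstream components.
        yield from [{"table": "users"}, {"table": "orders"}]

class DemoProcessor:
    def process(self, record):
        record["processed"] = True  # modify or enrich the record
        return record

class DemoSink:
    def __init__(self):
        self.written = []
    def write_record(self, record):
        self.written.append(record)

def run_workflow(source, processor, sink):
    # Components run serially: each record fully passes through the
    # chain before the next one is pulled from the source.
    for record in source.next_record():
        sink.write_record(processor.process(record))

sink = DemoSink()
run_workflow(DemoSource(), DemoProcessor(), sink)
print(len(sink.written))  # 2
```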
@ -27,8 +24,6 @@ Workflow execution happens in serial fashion.
In cases where we need to aggregate over the records, we can use **stage** to write to a file or other store. Use the file written to in **stage** and pass it to **bulksink** to publish to external services such as **openmetadata** or **elasticsearch**.

{% page-ref page="source.md" %}

{% page-ref page="processor.md" %}
@ -39,7 +34,3 @@ In the cases where we need to aggregation over the records, we can use **stage**
{% page-ref page="bulksink.md" %}
@ -1,11 +1,10 @@
-#BulkSink
+# BulkSink

**BulkSink** is an optional component in the workflow. It can be used to bulk update the records generated in a workflow. It needs to be used in conjunction with Stage.

## API

```python
@dataclass  # type: ignore[misc]
class BulkSink(Closeable, metaclass=ABCMeta):
    ctx: WorkflowContext
    ...
```
@ -30,14 +29,12 @@ class BulkSink(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the bulksink.

**write\_records** this method is called only once in a Workflow. It is the developer's responsibility to perform bulk actions inside this method, such as reading the entire file or store to generate the API calls to external services.

**get\_status** to report the status of the bulk\_sink, e.g. how many records were processed, and any failures or warnings.

**close** gets called before the workflow stops. Can be used to clean up any connections or other resources.

## Example

[Example implementation](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/bulksink/metadata_usage.py#L36)
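To make the lifecycle above concrete (create, then a single write\_records call, then get\_status and close), here is a minimal self-contained sketch. All class and field names are hypothetical stand-ins, not the linked metadata_usage implementation:

```python
import json
import os
import tempfile

# Illustrative BulkSink lifecycle: create -> write_records (once)
# -> get_status -> close. Demo names only, not the real classes.

class DemoBulkSink:
    def __init__(self, staged_file):
        self.staged_file = staged_file
        self.published = 0

    @classmethod
    def create(cls, config):
        # Called at workflow instantiation.
        return cls(config["staged_file"])

    def write_records(self):
        # Called exactly once: read everything a Stage wrote,
        # then publish the records in bulk.
        with open(self.staged_file) as f:
            for line in f:
                json.loads(line)  # stand-in for a bulk API call
                self.published += 1

    def get_status(self):
        return {"records_published": self.published}

    def close(self):
        pass  # release connections or other resources here

# Usage: stage two records to a file, then bulk-publish them.
path = os.path.join(tempfile.mkdtemp(), "staged.json")
with open(path, "w") as f:
    f.write(json.dumps({"table": "users"}) + "\n")
    f.write(json.dumps({"table": "orders"}) + "\n")

sink = DemoBulkSink.create({"staged_file": path})
sink.write_records()
print(sink.get_status())  # {'records_published': 2}
sink.close()
```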
@ -1,13 +1,10 @@
-#Processor
+# Processor

**Processor** is an optional component in the workflow. It can be used to modify the record coming from sources. A Processor receives a record from the source and can modify and re-emit the event back to the workflow.

## API

```python
@dataclass
class Processor(Closeable, metaclass=ABCMeta):
    ctx: WorkflowContext
    ...
```
@ -34,17 +31,15 @@ class Processor(Closeable, metaclass=ABCMeta):
**process** this method is called for each record coming down the workflow chain and can be used to modify or enrich the record.

**get\_status** to report the status of the processor, e.g. how many records were processed, and any failures or warnings.

**close** gets called before the workflow stops. Can be used to clean up any connections or other resources.

## Example

Example implementation:

```python
class PiiProcessor(Processor):
    config: PiiProcessorConfig
    metadata_config: MetadataServerConfig
    ...
```
@ -100,3 +95,4 @@ class PiiProcessor(Processor):
```python
    def get_status(self) -> ProcessorStatus:
        return self.status
```
@ -1,11 +1,10 @@
-#Sink
+# Sink

Sink will get the event emitted by the source, one at a time. It can use this record to make external service calls to store or index it, etc. For OpenMetadata we have [MetadataRestTablesSink](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/sink/metadata_rest_tables.py).

## API

```python
@dataclass  # type: ignore[misc]
class Sink(Closeable, metaclass=ABCMeta):
    """All Sinks must inherit this base class."""
    ...
```
@ -33,19 +32,17 @@ class Sink(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the sink.

**write\_record** this method is called for each record coming down the workflow chain and can be used to store the record in external services etc.

**get\_status** to report the status of the sink, e.g. how many records were processed, and any failures or warnings.

**close** gets called before the workflow stops. Can be used to clean up any connections or other resources.

## Example

Example implementation:

```python
class MetadataRestTablesSink(Sink):
    config: MetadataTablesSinkConfig
    status: SinkStatus
    ...
```
@ -92,3 +89,4 @@ class MetadataRestTablesSink(Sink):
```python
    def close(self):
        pass
```
@ -1,10 +1,10 @@
-#Source
+# Source

Source is the connector to external systems and outputs a record for downstream to process and push to OpenMetadata.

-##Source API
+## Source API

```python
@dataclass  # type: ignore[misc]
class Source(Closeable, metaclass=ABCMeta):
    ctx: WorkflowContext
    ...
```
@ -31,16 +31,15 @@ class Source(Closeable, metaclass=ABCMeta):
**prepare** will be called through Python's init method. This will be a place where you could make connections to external sources or initiate the client library.

**next\_record** is where the client can connect to an external resource and emit the data downstream.

**get\_status** is for [workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) to call and report the status of the source, such as how many records it has processed, and any failures or warnings.

## Example

A simple example of this implementation is:

```python
class SampleTablesSource(Source):

    def __init__(self, config: SampleTableSourceConfig, metadata_config: MetadataServerConfig, ctx):
        ...
```
@ -1,11 +1,10 @@
-#Stage
+# Stage

**Stage** is an optional component in the workflow. It can be used to store the records in a file or data store and can be used to aggregate the work done by a processor.

## API

```python
@dataclass  # type: ignore[misc]
class Stage(Closeable, metaclass=ABCMeta):
    ctx: WorkflowContext
    ...
```
@ -30,19 +29,17 @@ class Stage(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the stage.

**stage\_record** this method is called for each record coming down the workflow chain and can be used to store the record. This method doesn't emit anything for the downstream to process on.

**get\_status** to report the status of the stage, e.g. how many records were processed, and any failures or warnings.

**close** gets called before the workflow stops. Can be used to clean up any connections or other resources.

## Example

Example implementation:

```python
class FileStage(Stage):
    config: FileStageConfig
    status: StageStatus
    ...
```
@ -77,3 +74,4 @@ class FileStage(Stage):
```python
    def close(self):
        self.file.close()
```
@ -1,4 +1,4 @@
-# Schema Language
+# JSON Schema

We use [JSON Schema](https://json-schema.org/) as the Schema Definition Language as it offers several advantages:
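For orientation, a type definition in this style might look like the following. This is a simplified, hypothetical fragment written for illustration, not an actual OpenMetadata schema:

```json
{
  "$id": "https://open-metadata.org/schema/example/table.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Table",
  "type": "object",
  "properties": {
    "name": {
      "description": "Name of the table.",
      "type": "string"
    },
    "columns": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name"]
}
```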
@ -1,20 +1,18 @@
-# Take it for a spin
+# Try OpenMetadata

-We want our users to get the experience OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. We appreciate it if you take it for a spin and let us know your feedback on our [mailing list](mailto:openmetadata-user@googlegroups.com), or join our [slack channel](https://join.slack.com/t/openmetadata/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ) and post a message in the [user](https://openmetadata.slack.com/archives/C02B38JFDDK) channel.
+We want our users to get the experience of OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. Please take it for a spin and let us know your feedback on our [mailing list](mailto:openmetadata-user@googlegroups.com), or join our [Slack community](https://join.slack.com/t/openmetadata/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ) and post a message in the [user](https://openmetadata.slack.com/archives/C02B38JFDDK) channel.

-Here is what to expect when you are on Sandbox...
+To set up your sandbox account:

-### Login using your Google credentials
+### 1. Login using your Google credentials



-### Add yourself as a user in the Sandbox
+### 2. Add yourself as a user in the Sandbox. Pick a few teams to be part of because data is a team game.



-### Try out few things
+### 3. Try out a few things

Don't limit yourself to just the callouts. Try other things too. We would love to get your feedback.