GitBook: [main] 11 pages and one asset modified

This commit is contained in:
Suresh Srinivas 2021-08-17 15:45:52 +00:00 committed by gitbook-bot
parent 2d70350742
commit 856be1f640
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
13 changed files with 78353 additions and 74 deletions

File diff suppressed because it is too large.

New binary asset (image, 111 KiB) not shown.

@ -1,10 +1,10 @@
# Introduction
Data is an important asset of an organization, and metadata is the key to unlocking the value of that asset. It provides crucial context to turn data into information and powers not just the current limited use cases of data discovery and governance, but also emerging use cases related to data quality, observability, and, most importantly, people collaboration.
Metadata enables you to unlock the value of data assets in the common use cases of data discovery and governance, but also in emerging use cases related to data quality, observability, and people collaboration. However, poorly organized and managed metadata leads to redundant efforts within organizations and other inefficiencies that are expensive in time and dollars.
Poorly organized metadata is preventing organizations from realizing the full potential of data. Metadata is incorrect, inconsistent, stale, often missing, and fragmented in silos across various disconnected tools in proprietary formats, obscuring a holistic picture of data.
### **OpenMetadata is an Open standard for metadata with a centralized metadata store that unifies all the data assets and metadata end-to-end to power data discovery, user collaboration, and tool interoperability.**
### **OpenMetadata is an open standard with a centralized metadata store that unifies all your data assets end-to-end to enable data discovery, user collaboration, and tool interoperability.**
![](.gitbook/assets/openmetadata-overview%20%281%29.png)


@ -1,12 +1,12 @@
# Table of contents
* [Introduction](README.md)
* [Take it for a spin](take-it-for-a-spin.md)
* [Try OpenMetadata](take-it-for-a-spin.md)
## OpenMetadata APIs
* [Schemas](openmetadata-apis/schemas/README.md)
* [Schema Language](openmetadata-apis/schemas/schema-language.md)
* [JSON Schema](openmetadata-apis/schemas/schema-language.md)
* [Schema Concepts](openmetadata-apis/schemas/overview.md)
* [OpenMetadata Types](openmetadata-apis/schemas/types/README.md)
* [Basic Types](openmetadata-apis/schemas/types/basic.md)
@ -77,11 +77,11 @@
* [Coding Style](open-source-community/developer/coding-style.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests.md)
* [Build a Connector](open-source-community/developer/build-a-connector/README.md)
* [Source](open-source-community/developer/build-a-connector/source.md)
* [Processor](open-source-community/developer/build-a-connector/processor.md)
* [Sink](open-source-community/developer/build-a-connector/sink.md)
* [Stage](open-source-community/developer/build-a-connector/stage.md)
* [BulkSink](open-source-community/developer/build-a-connector/bulksink.md)
* [Run Integration Tests](open-source-community/developer/run-integration-tests.md)
* [UX Style Guide](open-source-community/developer/ux-style-guide.md)


@ -47,7 +47,7 @@ Different Connectors require different dependencies, please go through [Connecto
Loads all the JSON connector configurations inside the pipeline directory and schedules them as cron jobs.
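A pipeline file in that directory might look roughly like the sketch below. This shape is illustrative only: key names such as `cron` and the connector `type` values are assumptions here, so check each connector's documentation for the exact format.

```json
{
  "source": {
    "type": "sample-tables",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "cron": {
    "minute": "*/5"
  }
}
```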
![](../../.gitbook/assets/screenshot-from-2021-07-26-21-08-17%20%281%29%20%282%29%20%282%29%20%282%29%20%283%29%20%284%29%20%284%29%20%285%29%20%283%29%20%281%29%20%284%29.png)
![](../../.gitbook/assets/screenshot-from-2021-07-26-21-08-17%20%281%29%20%282%29%20%282%29%20%282%29%20%283%29%20%284%29%20%284%29%20%285%29%20%283%29%20%281%29%20%285%29.png)
### Run a custom job


@ -1,21 +1,18 @@
---
description: >-
This design doc will walk through developing a connector for OpenMetadata
description: This design doc will walk through developing a connector for OpenMetadata
---
# Ingestion API
# Build a Connector
Ingestion is a simple Python framework to ingest metadata from various sources.
Please look at our framework [APIs](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/ingestion/api).
## Workflow
[workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) is a simple orchestration job that runs the components in order.
It consists of [Source](./source.md), [Processor](./processor.md), and [Sink](./sink.md). It also provides support for [Stage](./stage.md) and [BulkSink](./bulksink.md).
It consists of [Source](source.md), [Processor](processor.md), and [Sink](sink.md). It also provides support for [Stage](stage.md) and [BulkSink](bulksink.md).
Workflow execution happens in a serial fashion.
@ -27,8 +24,6 @@ Workflow execution happens in serial fashion.
In cases where we need to aggregate over the records, we can use **stage** to write them to a file or other store. The file written by **stage** is then passed to **bulksink** to publish to external services such as **openmetadata** or **elasticsearch**.
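The serial hand-off described above can be sketched with toy stand-ins. These classes are simplified illustrations only, not the framework's real base classes, which also carry config, context, and status objects:

```python
# Toy stand-ins for Source, Processor, and Sink to show the serial flow.
class ToySource:
    def next_record(self):
        # Emit a few records downstream, one at a time
        for i in range(3):
            yield {"id": i}

class ToyProcessor:
    def process(self, record):
        # Modify/enrich the record and re-emit it to the workflow
        record["enriched"] = True
        return record

class ToySink:
    def __init__(self):
        self.written = []

    def write_record(self, record):
        # Store the record (a real sink would call an external service)
        self.written.append(record)

def run_workflow(source, processor, sink):
    # Serial execution: each record flows source -> processor -> sink
    for record in source.next_record():
        sink.write_record(processor.process(record))

sink = ToySink()
run_workflow(ToySource(), ToyProcessor(), sink)
```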
{% page-ref page="source.md" %}
{% page-ref page="processor.md" %}
@ -39,7 +34,3 @@ In the cases where we need to aggregation over the records, we can use **stage**
{% page-ref page="bulksink.md" %}


@ -1,11 +1,10 @@
#BulkSink
# BulkSink
**BulkSink** is an optional component in the workflow. It can be used to bulk update the records
generated in a workflow. It needs to be used in conjunction with Stage.
**BulkSink** is an optional component in the workflow. It can be used to bulk update the records generated in a workflow. It needs to be used in conjunction with Stage.
## API
```py
```python
@dataclass # type: ignore[misc]
class BulkSink(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -30,14 +29,12 @@ class BulkSink(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the bulksink.
**write_records** this method is called only once in the workflow. It is the developer's responsibility to perform bulk actions inside this method, such as reading the entire file or store to generate the API calls to external services.
**write\_records** this method is called only once in the workflow. It is the developer's responsibility to perform bulk actions inside this method, such as reading the entire file or store to generate the API calls to external services.
**get_status** reports the status of the bulk_sink, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the bulk\_sink, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
[Example implementation](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/bulksink/metadata_usage.py#L36)
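For a rough feel of the contract, here is a toy bulk sink. The class and file layout are hypothetical, not the real base class: a Stage is assumed to have written records to a file, and `write_records` is called once to read the whole file and publish the batch.

```python
import json
import os
import tempfile

# Hypothetical illustration of the BulkSink contract: read everything
# the Stage wrote, then publish in one batch (here we just collect it).
class FileBulkSink:
    def __init__(self, path):
        self.path = path
        self.published = []

    def write_records(self):
        # Called only once per workflow run
        with open(self.path) as staged:
            batch = [json.loads(line) for line in staged]
        self.published.extend(batch)  # a real sink would call an API here

    def get_status(self):
        return {"records": len(self.published)}

    def close(self):
        pass  # nothing to clean up in this toy example

# Simulate what a Stage would have written
staged_path = os.path.join(tempfile.mkdtemp(), "staged.jsonl")
with open(staged_path, "w") as f:
    for name in ("users", "orders"):
        f.write(json.dumps({"table": name}) + "\n")

bulk_sink = FileBulkSink(staged_path)
bulk_sink.write_records()
bulk_sink.close()
```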


@ -1,13 +1,10 @@
#Processor
**Processor** is an optional component in the workflow. It can be used to modify the records
coming from sources. A Processor receives a record from the source and can modify it and re-emit the
event back to the workflow.
# Processor
**Processor** is an optional component in the workflow. It can be used to modify the records coming from sources. A Processor receives a record from the source and can modify it and re-emit the event back to the workflow.
## API
```py
```python
@dataclass
class Processor(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -34,17 +31,15 @@ class Processor(Closeable, metaclass=ABCMeta):
**process** this method is called for each record coming down the workflow chain and can be used to modify or enrich the record.
**get_status** reports the status of the processor, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the processor, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
Example implementation
```py
```python
class PiiProcessor(Processor):
config: PiiProcessorConfig
metadata_config: MetadataServerConfig
@ -100,3 +95,4 @@ class PiiProcessor(Processor):
def get_status(self) -> ProcessorStatus:
return self.status
```
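Since the fragment above is cut off in this diff view, here is a self-contained toy processor with hypothetical names that shows the `process`/`get_status` shape; the real Processor base class also carries a WorkflowContext and config.

```python
# Toy processor illustrating the contract described above.
class TagProcessor:
    def __init__(self):
        self.records = 0

    def process(self, record):
        # Enrich the record and re-emit it downstream
        self.records += 1
        record["tags"] = record.get("tags", []) + ["reviewed"]
        return record

    def get_status(self):
        return {"records": self.records}

    def close(self):
        pass  # nothing to clean up in this toy example

proc = TagProcessor()
out = proc.process({"name": "users"})
```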


@ -1,11 +1,10 @@
#Sink
# Sink
Sink will get the event emitted by the source, one at a time. It can use this record to make external service calls to store or index it. For OpenMetadata,
we have [MetadataRestTablesSink](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/sink/metadata_rest_tables.py).
Sink will get the event emitted by the source, one at a time. It can use this record to make external service calls to store or index it. For OpenMetadata, we have [MetadataRestTablesSink](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/sink/metadata_rest_tables.py).
## API
```py
```python
@dataclass # type: ignore[misc]
class Sink(Closeable, metaclass=ABCMeta):
"""All Sinks must inherit this base class."""
@ -33,19 +32,17 @@ class Sink(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the sink.
**write_record** this method is called for each record coming down the workflow chain and can be used to store the record in external services etc.
**write\_record** this method is called for each record coming down the workflow chain and can be used to store the record in external services etc.
**get_status** reports the status of the sink, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the sink, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
Example implementation
```py
```python
class MetadataRestTablesSink(Sink):
config: MetadataTablesSinkConfig
status: SinkStatus
@ -92,3 +89,4 @@ class MetadataRestTablesSink(Sink):
def close(self):
pass
```
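The fragment above is truncated in this diff, so here is a minimal toy sink obeying the `write_record`/`get_status` contract; it is not the real MetadataRestTablesSink, which makes REST calls to the OpenMetadata server instead of storing locally.

```python
# Toy in-memory sink illustrating the Sink contract described above.
class InMemorySink:
    def __init__(self):
        self.stored = []
        self.failures = 0

    def write_record(self, record):
        # Called once per record coming down the workflow chain
        if "name" not in record:
            self.failures += 1
            return
        self.stored.append(record)

    def get_status(self):
        return {"records": len(self.stored), "failures": self.failures}

    def close(self):
        pass  # nothing to clean up in this toy example

sink = InMemorySink()
sink.write_record({"name": "users"})
sink.write_record({})  # missing required field -> counted as a failure
```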


@ -1,10 +1,10 @@
#Source
# Source
Source is the connector to external systems; it outputs records for downstream components to process and push to OpenMetadata.
##Source API
## Source API
```py
```python
@dataclass # type: ignore[misc]
class Source(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -31,16 +31,15 @@ class Source(Closeable, metaclass=ABCMeta):
**prepare** will be called through Python's init method. This is a place where you could make connections to external sources or initialize the client library.
**next_record** is where the client can connect to the external resource and emit the data downstream.
**get_status** is for [workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) to call and report the status of the source, such as how many records it has processed and any failures or warnings.
**next\_record** is where the client can connect to the external resource and emit the data downstream.
**get\_status** is for [workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) to call and report the status of the source, such as how many records it has processed and any failures or warnings.
## Example
A simple example of this implementation is:
```py
```python
class SampleTablesSource(Source):
def __init__(self, config: SampleTableSourceConfig, metadata_config: MetadataServerConfig, ctx):
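The SampleTablesSource fragment above is cut off in this diff view; a complete minimal source looks roughly like the toy below. Names are hypothetical and the real base class also takes config and WorkflowContext arguments.

```python
# Toy source implementing the prepare/next_record/get_status contract.
class StaticTablesSource:
    def __init__(self, tables):
        self.tables = tables
        self.scanned = 0

    def prepare(self):
        # A real source would open connections or init a client here
        pass

    def next_record(self):
        # Emit one record per table for downstream components
        for name in self.tables:
            self.scanned += 1
            yield {"name": name}

    def get_status(self):
        return {"records": self.scanned}

src = StaticTablesSource(["users", "orders"])
src.prepare()
records = list(src.next_record())
```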


@ -1,11 +1,10 @@
#Stage
# Stage
**Stage** is an optional component in the workflow. It can be used to store the records in a
file or data store, and can be used to aggregate the work done by a processor.
**Stage** is an optional component in the workflow. It can be used to store the records in a file or data store, and can be used to aggregate the work done by a processor.
## API
```py
```python
@dataclass # type: ignore[misc]
class Stage(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -30,19 +29,17 @@ class Stage(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the stage.
**stage_record** this method is called for each record coming down the workflow chain and can be used to store the record. This method doesn't emit anything for the downstream to process.
**stage\_record** this method is called for each record coming down the workflow chain and can be used to store the record. This method doesn't emit anything for the downstream to process.
**get_status** reports the status of the stage, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the stage, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
Example implementation
```py
```python
class FileStage(Stage):
config: FileStageConfig
status: StageStatus
@ -77,3 +74,4 @@ class FileStage(Stage):
def close(self):
self.file.close()
```
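To round out the truncated FileStage fragment above, here is a toy stage, with hypothetical names, that spools records to a file so a BulkSink can later read them in one pass.

```python
import os
import tempfile

# Toy stage: stage_record stores each record without emitting anything
# downstream; a BulkSink would later read the whole file in one pass.
class ToyFileStage:
    def __init__(self, path):
        self.path = path
        self.file = open(path, "w")
        self.staged = 0

    def stage_record(self, record):
        self.file.write(record["name"] + "\n")
        self.staged += 1

    def get_status(self):
        return {"records": self.staged}

    def close(self):
        self.file.close()

path = os.path.join(tempfile.mkdtemp(), "stage.txt")
stage = ToyFileStage(path)
stage.stage_record({"name": "users"})
stage.stage_record({"name": "orders"})
stage.close()
```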


@ -1,4 +1,4 @@
# Schema Language
# JSON Schema
We use [JSON schema](https://json-schema.org/) as the Schema Definition Language because it offers several advantages:
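For instance, a small schema in this style might look like the following. The field names here are illustrative only, not taken from the real OpenMetadata schemas, and the checker covers just the `required` and top-level `type` keywords; use a full JSON Schema validator library for real validation.

```python
# Illustrative JSON Schema document for a table-like entity.
table_schema = {
    "$id": "https://example.org/schema/table.json",
    "type": "object",
    "required": ["name", "columns"],
    "properties": {
        "name": {"type": "string"},
        "columns": {"type": "array"},
    },
}

def toy_validate(instance, schema):
    # Checks only 'required' and top-level 'type' -- a toy, not a
    # full JSON Schema implementation.
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    return all(key in instance for key in schema.get("required", []))

ok = toy_validate({"name": "orders", "columns": []}, table_schema)
bad = toy_validate({"name": "orders"}, table_schema)
```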


@ -1,20 +1,18 @@
# Take it for a spin
# Try OpenMetadata
We want our users to get the experience of OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. We would appreciate it if you take it for a spin and let us know your feedback on our [mailing list](mailto:openmetadata-user@googlegroups.com), or join our [slack channel](https://join.slack.com/t/openmetadata/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ) and post a message in the [user](https://openmetadata.slack.com/archives/C02B38JFDDK) channel.
We want our users to experience OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. Please take it for a spin and let us know your feedback on our [mailing list](mailto:openmetadata-user@googlegroups.com), or join our [slack community](https://join.slack.com/t/openmetadata/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ) and post a message in the [user](https://openmetadata.slack.com/archives/C02B38JFDDK) channel.
Here is what to expect when you are on Sandbox...
To set up your sandbox account:
### Login using your Google credentials
### 1. Login using your Google credentials
![](.gitbook/assets/welcome.png)
### Add yourself as a user in the Sandbox
Pick a few teams to be part of because data is a team game.
### 2. Add yourself as a user in the Sandbox
Pick a few teams to be part of, because data is a team game.
![](.gitbook/assets/create-user.png)
### Try out a few things
### 3. Try out a few things
Don't limit yourself to just the callouts. Try other things too. We would love to get your feedback.