GitBook: [main] 11 pages and one asset modified

This commit is contained in:
Suresh Srinivas 2021-08-17 15:45:52 +00:00 committed by gitbook-bot
parent 2d70350742
commit 856be1f640
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
13 changed files with 78353 additions and 74 deletions

File diff suppressed because it is too large.

New binary asset (image, 111 KiB) not shown.

@ -1,10 +1,10 @@
# Introduction
Data is an important asset of an organization, and metadata is the key to unlocking the value of that asset. It provides crucial context to turn data into information and powers not just the current limited use cases of data discovery and governance, but also emerging use cases related to data quality, observability, and, most importantly, people collaboration.
Metadata enables you to unlock the value of data assets in the common use cases of data discovery and governance, but also in emerging use cases related to data quality, observability, and people collaboration. However, poorly organized and managed metadata leads to redundant efforts within organizations and other inefficiencies that are expensive in time and dollars.
Poorly organized metadata is preventing organizations from realizing the full potential of data. Metadata is incorrect, inconsistent, stale, often missing, and fragmented in silos across various disconnected tools in proprietary formats, obscuring a holistic picture of data.
### **OpenMetadata is an Open standard for metadata with a centralized metadata store that unifies all the data assets and metadata end-to-end to power data discovery, user collaboration, and tool interoperability.**
### **OpenMetadata is an open standard with a centralized metadata store that unifies all your data assets end-to-end to enable data discovery, user collaboration, and tool interoperability.**
![](.gitbook/assets/openmetadata-overview%20%281%29.png)


@ -1,12 +1,12 @@
# Table of contents
* [Introduction](README.md)
* [Take it for a spin](take-it-for-a-spin.md)
* [Try OpenMetadata](take-it-for-a-spin.md)
## OpenMetadata APIs
* [Schemas](openmetadata-apis/schemas/README.md)
* [Schema Language](openmetadata-apis/schemas/schema-language.md)
* [JSON Schema](openmetadata-apis/schemas/schema-language.md)
* [Schema Concepts](openmetadata-apis/schemas/overview.md)
* [OpenMetadata Types](openmetadata-apis/schemas/types/README.md)
* [Basic Types](openmetadata-apis/schemas/types/basic.md)
@ -77,11 +77,11 @@
* [Coding Style](open-source-community/developer/coding-style.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests.md)
* [Build a Connector](open-source-community/developer/build-a-connector/README.md)
* [Source](open-source-community/developer/build-a-connector/source.md)
* [Processor](open-source-community/developer/build-a-connector/processor.md)
* [Sink](open-source-community/developer/build-a-connector/sink.md)
* [Stage](open-source-community/developer/build-a-connector/stage.md)
* [BulkSink](open-source-community/developer/build-a-connector/bulksink.md)
* [Run Integration Tests](open-source-community/developer/run-integration-tests.md)
* [UX Style Guide](open-source-community/developer/ux-style-guide.md)


@ -47,7 +47,7 @@ Different Connectors require different dependencies, please go through [Connecto
Loads all the JSON connector configurations inside the pipeline directory and schedules them as cron jobs.
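A pipeline file in that directory might look roughly like the sketch below. This shape is illustrative only: key names such as `cron` and the connector `type` values are assumptions here, so check each connector's documentation for the exact format.

```json
{
  "source": {
    "type": "sample-tables",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "cron": {
    "minute": "*/5"
  }
}
```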
![](../../.gitbook/assets/screenshot-from-2021-07-26-21-08-17%20%281%29%20%282%29%20%282%29%20%282%29%20%283%29%20%284%29%20%284%29%20%285%29%20%283%29%20%281%29%20%284%29.png)
![](../../.gitbook/assets/screenshot-from-2021-07-26-21-08-17%20%281%29%20%282%29%20%282%29%20%282%29%20%283%29%20%284%29%20%284%29%20%285%29%20%283%29%20%281%29%20%285%29.png)
### Run a custom job


@ -1,21 +1,18 @@
---
description: >-
This design doc will walk through developing a connector for OpenMetadata
description: This design doc will walk through developing a connector for OpenMetadata
---
# Ingestion API
# Build a Connector
Ingestion is a simple Python framework to ingest metadata from various sources.
Please look at our framework [APIs](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/ingestion/api).
## Workflow
[workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) is a simple orchestration job that runs the components in order.
It consists of [Source](./source.md), [Processor](./processor.md), and [Sink](./sink.md). It also provides support for [Stage](./stage.md) and [BulkSink](./bulksink.md).
It consists of [Source](source.md), [Processor](processor.md), and [Sink](sink.md). It also provides support for [Stage](stage.md) and [BulkSink](bulksink.md).
Workflow execution happens in a serial fashion.
@ -27,8 +24,6 @@ Workflow execution happens in serial fashion.
In cases where we need to aggregate over the records, we can use **stage** to write them to a file or other store. The file written by **stage** is then passed to **bulksink** to publish to external services such as **openmetadata** or **elasticsearch**.
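The serial hand-off described above can be sketched with toy stand-ins. These classes are simplified illustrations only, not the framework's real base classes, which also carry config, context, and status objects:

```python
# Toy stand-ins for Source, Processor, and Sink to show the serial flow.
class ToySource:
    def next_record(self):
        # Emit a few records downstream, one at a time
        for i in range(3):
            yield {"id": i}

class ToyProcessor:
    def process(self, record):
        # Modify/enrich the record and re-emit it to the workflow
        record["enriched"] = True
        return record

class ToySink:
    def __init__(self):
        self.written = []

    def write_record(self, record):
        # Store the record (a real sink would call an external service)
        self.written.append(record)

def run_workflow(source, processor, sink):
    # Serial execution: each record flows source -> processor -> sink
    for record in source.next_record():
        sink.write_record(processor.process(record))

sink = ToySink()
run_workflow(ToySource(), ToyProcessor(), sink)
```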
{% page-ref page="source.md" %}
{% page-ref page="processor.md" %}
@ -39,7 +34,3 @@ In the cases where we need to aggregation over the records, we can use **stage**
{% page-ref page="bulksink.md" %}


@ -1,11 +1,10 @@
#BulkSink
# BulkSink
**BulkSink** is an optional component in the workflow. It can be used to bulk update the records
generated in a workflow. It needs to be used in conjunction with Stage.
**BulkSink** is an optional component in the workflow. It can be used to bulk update the records generated in a workflow. It needs to be used in conjunction with Stage.
## API
```py
```python
@dataclass # type: ignore[misc]
class BulkSink(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -30,14 +29,12 @@ class BulkSink(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the bulksink.
**write_records** this method is called only once in the workflow. It is the developer's responsibility to perform bulk actions inside this method, such as reading the entire file or store to generate the API calls to external services.
**write\_records** this method is called only once in the workflow. It is the developer's responsibility to perform bulk actions inside this method, such as reading the entire file or store to generate the API calls to external services.
**get_status** reports the status of the bulk_sink, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the bulk\_sink, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
[Example implementation](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/bulksink/metadata_usage.py#L36)
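For a rough feel of the contract, here is a toy bulk sink. The class and file layout are hypothetical, not the real base class: a Stage is assumed to have written records to a file, and `write_records` is called once to read the whole file and publish the batch.

```python
import json
import os
import tempfile

# Hypothetical illustration of the BulkSink contract: read everything
# the Stage wrote, then publish in one batch (here we just collect it).
class FileBulkSink:
    def __init__(self, path):
        self.path = path
        self.published = []

    def write_records(self):
        # Called only once per workflow run
        with open(self.path) as staged:
            batch = [json.loads(line) for line in staged]
        self.published.extend(batch)  # a real sink would call an API here

    def get_status(self):
        return {"records": len(self.published)}

    def close(self):
        pass  # nothing to clean up in this toy example

# Simulate what a Stage would have written
staged_path = os.path.join(tempfile.mkdtemp(), "staged.jsonl")
with open(staged_path, "w") as f:
    for name in ("users", "orders"):
        f.write(json.dumps({"table": name}) + "\n")

bulk_sink = FileBulkSink(staged_path)
bulk_sink.write_records()
bulk_sink.close()
```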


@ -1,13 +1,10 @@
#Processor
**Processor** is an optional component in the workflow. It can be used to modify the records
coming from sources. A Processor receives a record from the source and can modify it and re-emit the
event back to the workflow.
# Processor
**Processor** is an optional component in the workflow. It can be used to modify the records coming from sources. A Processor receives a record from the source and can modify it and re-emit the event back to the workflow.
## API
```py
```python
@dataclass
class Processor(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -34,17 +31,15 @@ class Processor(Closeable, metaclass=ABCMeta):
**process** this method is called for each record coming down the workflow chain and can be used to modify or enrich the record.
**get_status** reports the status of the processor, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the processor, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
Example implementation
```py
```python
class PiiProcessor(Processor):
config: PiiProcessorConfig
metadata_config: MetadataServerConfig
@ -100,3 +95,4 @@ class PiiProcessor(Processor):
def get_status(self) -> ProcessorStatus:
return self.status
```
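Since the fragment above is cut off in this diff view, here is a self-contained toy processor with hypothetical names that shows the `process`/`get_status` shape; the real Processor base class also carries a WorkflowContext and config.

```python
# Toy processor illustrating the contract described above.
class TagProcessor:
    def __init__(self):
        self.records = 0

    def process(self, record):
        # Enrich the record and re-emit it downstream
        self.records += 1
        record["tags"] = record.get("tags", []) + ["reviewed"]
        return record

    def get_status(self):
        return {"records": self.records}

    def close(self):
        pass  # nothing to clean up in this toy example

proc = TagProcessor()
out = proc.process({"name": "users"})
```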


@ -1,11 +1,10 @@
#Sink
# Sink
Sink will get the event emitted by the source, one at a time. It can use this record to make external service calls to store or index it. For OpenMetadata,
we have [MetadataRestTablesSink](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/sink/metadata_rest_tables.py).
Sink will get the event emitted by the source, one at a time. It can use this record to make external service calls to store or index it. For OpenMetadata, we have [MetadataRestTablesSink](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/sink/metadata_rest_tables.py).
## API
```py
```python
@dataclass # type: ignore[misc]
class Sink(Closeable, metaclass=ABCMeta):
"""All Sinks must inherit this base class."""
@ -33,19 +32,17 @@ class Sink(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the sink.
**write_record** this method is called for each record coming down the workflow chain and can be used to store the record in external services etc.
**write\_record** this method is called for each record coming down the workflow chain and can be used to store the record in external services etc.
**get_status** reports the status of the sink, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the sink, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
Example implementation
```py
```python
class MetadataRestTablesSink(Sink):
config: MetadataTablesSinkConfig
status: SinkStatus
@ -92,3 +89,4 @@ class MetadataRestTablesSink(Sink):
def close(self):
pass
```
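The fragment above is truncated in this diff, so here is a minimal toy sink obeying the `write_record`/`get_status` contract; it is not the real MetadataRestTablesSink, which makes REST calls to the OpenMetadata server instead of storing locally.

```python
# Toy in-memory sink illustrating the Sink contract described above.
class InMemorySink:
    def __init__(self):
        self.stored = []
        self.failures = 0

    def write_record(self, record):
        # Called once per record coming down the workflow chain
        if "name" not in record:
            self.failures += 1
            return
        self.stored.append(record)

    def get_status(self):
        return {"records": len(self.stored), "failures": self.failures}

    def close(self):
        pass  # nothing to clean up in this toy example

sink = InMemorySink()
sink.write_record({"name": "users"})
sink.write_record({})  # missing required field -> counted as a failure
```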


@ -1,10 +1,10 @@
#Source
# Source
Source is the connector to external systems; it outputs records for downstream components to process and push to OpenMetadata.
##Source API
## Source API
```py
```python
@dataclass # type: ignore[misc]
class Source(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -31,16 +31,15 @@ class Source(Closeable, metaclass=ABCMeta):
**prepare** will be called through Python's init method. This is a place where you could make connections to external sources or initialize the client library.
**next_record** is where the client can connect to the external resource and emit the data downstream.
**get_status** is for [workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) to call and report the status of the source, such as how many records it has processed and any failures or warnings.
**next\_record** is where the client can connect to the external resource and emit the data downstream.
**get\_status** is for [workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) to call and report the status of the source, such as how many records it has processed and any failures or warnings.
## Example
A simple example of this implementation is:
```py
```python
class SampleTablesSource(Source):
def __init__(self, config: SampleTableSourceConfig, metadata_config: MetadataServerConfig, ctx):
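The SampleTablesSource fragment above is cut off in this diff view; a complete minimal source looks roughly like the toy below. Names are hypothetical and the real base class also takes config and WorkflowContext arguments.

```python
# Toy source implementing the prepare/next_record/get_status contract.
class StaticTablesSource:
    def __init__(self, tables):
        self.tables = tables
        self.scanned = 0

    def prepare(self):
        # A real source would open connections or init a client here
        pass

    def next_record(self):
        # Emit one record per table for downstream components
        for name in self.tables:
            self.scanned += 1
            yield {"name": name}

    def get_status(self):
        return {"records": self.scanned}

src = StaticTablesSource(["users", "orders"])
src.prepare()
records = list(src.next_record())
```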


@ -1,11 +1,10 @@
#Stage
# Stage
**Stage** is an optional component in the workflow. It can be used to store the records in a
file or data store, and can be used to aggregate the work done by a processor.
**Stage** is an optional component in the workflow. It can be used to store the records in a file or data store, and can be used to aggregate the work done by a processor.
## API
```py
```python
@dataclass # type: ignore[misc]
class Stage(Closeable, metaclass=ABCMeta):
ctx: WorkflowContext
@ -30,19 +29,17 @@ class Stage(Closeable, metaclass=ABCMeta):
**create** method is called during the workflow instantiation and creates an instance of the stage.
**stage_record** this method is called for each record coming down the workflow chain and can be used to store the record. This method doesn't emit anything for the downstream to process.
**stage\_record** this method is called for each record coming down the workflow chain and can be used to store the record. This method doesn't emit anything for the downstream to process.
**get_status** reports the status of the stage, e.g. how many records were processed and any failures or warnings.
**get\_status** reports the status of the stage, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
## Example
Example implementation
```py
```python
class FileStage(Stage):
config: FileStageConfig
status: StageStatus
@ -77,3 +74,4 @@ class FileStage(Stage):
def close(self):
self.file.close()
```
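To round out the truncated FileStage fragment above, here is a toy stage, with hypothetical names, that spools records to a file so a BulkSink can later read them in one pass.

```python
import os
import tempfile

# Toy stage: stage_record stores each record without emitting anything
# downstream; a BulkSink would later read the whole file in one pass.
class ToyFileStage:
    def __init__(self, path):
        self.path = path
        self.file = open(path, "w")
        self.staged = 0

    def stage_record(self, record):
        self.file.write(record["name"] + "\n")
        self.staged += 1

    def get_status(self):
        return {"records": self.staged}

    def close(self):
        self.file.close()

path = os.path.join(tempfile.mkdtemp(), "stage.txt")
stage = ToyFileStage(path)
stage.stage_record({"name": "users"})
stage.stage_record({"name": "orders"})
stage.close()
```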


@ -1,4 +1,4 @@
# Schema Language
# JSON Schema
We use [JSON schema](https://json-schema.org/) as the Schema Definition Language because it offers several advantages:
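For instance, a small schema in this style might look like the following. The field names here are illustrative only, not taken from the real OpenMetadata schemas, and the checker covers just the `required` and top-level `type` keywords; use a full JSON Schema validator library for real validation.

```python
# Illustrative JSON Schema document for a table-like entity.
table_schema = {
    "$id": "https://example.org/schema/table.json",
    "type": "object",
    "required": ["name", "columns"],
    "properties": {
        "name": {"type": "string"},
        "columns": {"type": "array"},
    },
}

def toy_validate(instance, schema):
    # Checks only 'required' and top-level 'type' -- a toy, not a
    # full JSON Schema implementation.
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    return all(key in instance for key in schema.get("required", []))

ok = toy_validate({"name": "orders", "columns": []}, table_schema)
bad = toy_validate({"name": "orders"}, table_schema)
```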


@ -1,20 +1,18 @@
# Take it for a spin
# Try OpenMetadata
We want our users to get the experience of OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. We would appreciate it if you take it for a spin and let us know your feedback on our [mailing list](mailto:openmetadata-user@googlegroups.com), or join our [slack channel](https://join.slack.com/t/openmetadata/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ) and post a message in the [user](https://openmetadata.slack.com/archives/C02B38JFDDK) channel.
We want our users to experience OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. Please take it for a spin and let us know your feedback on our [mailing list](mailto:openmetadata-user@googlegroups.com), or join our [slack community](https://join.slack.com/t/openmetadata/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ) and post a message in the [user](https://openmetadata.slack.com/archives/C02B38JFDDK) channel.
Here is what to expect when you are on Sandbox...
To set up your sandbox account:
### Login using your Google credentials
### 1. Login using your Google credentials
![](.gitbook/assets/welcome.png)
### Add yourself as a user in the Sandbox
Pick a few teams to be part of because data is a team game.
### 2. Add yourself as a user in the Sandbox
Pick a few teams to be part of, because data is a team game.
![](.gitbook/assets/create-user.png)
### Try out a few things
### 3. Try out a few things
Don't limit yourself to just the callouts. Try other things too. We would love to get your feedback.