GitBook: [0.4.0] 11 pages modified

Sachin Chaurasiya 2021-09-20 07:39:48 +00:00 committed by gitbook-bot
parent 1703a888b5
commit faed93e03d
11 changed files with 39 additions and 39 deletions

View File

@ -51,7 +51,7 @@ metadata ingest -c ./examples/workflows/redshift_usage.json
1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions on all the databases in your Redshift installation.
2. **password** - the password for the username.
3. **service\_name** - the Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure this service name matches the one used there.
4. **filter\_pattern** - contains `includes` and `excludes` options to choose which datasets you want to ingest into OpenMetadata (see the sketch below).
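For orientation, here is a sketch of how these options might appear in the source section of a workflow config such as `./examples/workflows/redshift_usage.json`. Only the four fields described above come from this page; the surrounding structure, the `type` value, and the sample values are assumptions, so treat the bundled example file as authoritative.

```json
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "username": "openmetadata_readonly",
      "password": "<password>",
      "service_name": "aws_redshift",
      "filter_pattern": {
        "includes": ["sales.*"],
        "excludes": ["information_schema.*"]
      }
    }
  }
}
```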
## Publish to OpenMetadata

View File

@ -12,7 +12,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.
1. Python 3.7 or above
{% endhint %}
### Install from PyPI
{% tabs %}
{% tab title="Install Using PyPI" %}

View File

@ -14,15 +14,15 @@ Please look at our framework [APIs](https://github.com/open-metadata/OpenMetadat
A workflow consists of [Source](source.md), [Processor](processor.md) and [Sink](sink.md). It also provides support for [Stage](stage.md) and [BulkSink](bulksink.md).
Workflow execution happens in a serial fashion.
1. The **Workflow** runs the **source** component first. The **source** retrieves a record from external sources and emits the record downstream.
2. If the **processor** component is configured, the **workflow** sends the record to the **processor** next.
3. There can be multiple **processor** components attached to the **workflow**. The **workflow** passes a record to each **processor** in the order they are configured.
4. Once a **processor** is finished, it sends the modified record to the **sink**.
5. The above steps are repeated for each record emitted from the **source**.
In cases where we need aggregation over the records, we can use a **stage** to write them to a file or another store. The file written by the **stage** is then passed to a **bulk sink** to publish to external services such as **openmetadata** or **elasticsearch**, as sketched below.
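To make the control flow concrete, here is a minimal, self-contained sketch of this serial execution. The toy classes and names are invented for illustration; the real driver lives in the framework's `workflow.py` and uses the component APIs described on the following pages.

```python
from typing import Iterable, List, Optional

class ToySource:
    """Emits records downstream, one at a time."""
    def next_record(self) -> Iterable[dict]:
        for table in ("orders", "customers"):
            yield {"table": table}

class ToyProcessor:
    """Optionally modifies or enriches each record."""
    def process(self, record: dict) -> dict:
        record["processed"] = True
        return record

class ToySink:
    """Receives each (possibly modified) record, one at a time."""
    def write_record(self, record: dict) -> None:
        print("sink received:", record)

def run_workflow(source: ToySource, sink: ToySink,
                 processors: Optional[List[ToyProcessor]] = None) -> None:
    # The workflow drives the source first, then passes each record
    # through the configured processors in order, and finally to the sink.
    for record in source.next_record():
        for processor in processors or []:
            record = processor.process(record)
        sink.write_record(record)

run_workflow(ToySource(), ToySink(), [ToyProcessor()])
```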
{% page-ref page="source.md" %}

View File

@ -1,6 +1,6 @@
# BulkSink
**BulkSink** is an optional component in the workflow. It can be used to bulk update the records generated in a workflow, and it needs to be used in conjunction with a Stage.
## API
@ -27,15 +27,15 @@ class BulkSink(Closeable, metaclass=ABCMeta):
pass
```
**create** method is called during workflow instantiation and creates an instance of the bulk sink.
**write\_records** is called only once per workflow run. It is the developer's responsibility to perform the bulk actions inside this method, such as reading the entire file or store to generate the API calls to external services.
**get\_status** reports the status of the bulk\_sink, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
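Putting these methods together, here is a simplified, standalone sketch of the bulk-sink pattern. The class, file layout, and record shape are invented for illustration and are not the actual OpenMetadata base class or the implementation linked under Example below.

```python
import json

class ToyBulkSink:
    """Reads everything a stage wrote and publishes it in one bulk action."""

    def __init__(self, staged_file: str):
        self.staged_file = staged_file
        self.published = 0

    def write_records(self) -> None:
        # Called once per workflow run: read the entire staged file
        # and turn it into (here, simulated) bulk calls to an external service.
        with open(self.staged_file) as f:
            records = [json.loads(line) for line in f]
        print(f"publishing {len(records)} records in one bulk call")
        self.published = len(records)

    def get_status(self) -> dict:
        return {"records": self.published, "failures": 0, "warnings": 0}

    def close(self) -> None:
        pass  # nothing to clean up in this toy example

with open("/tmp/staged.jsonl", "w") as f:
    f.write('{"table": "orders"}\n')
bulk_sink = ToyBulkSink("/tmp/staged.jsonl")
bulk_sink.write_records()
print(bulk_sink.get_status())
```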
## Example
[Example implementation](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/bulksink/metadata_usage.py#L36)

View File

@ -1,6 +1,6 @@
# Processor
The **Processor** is an optional component in the workflow. It can be used to modify the records coming from sources: the Processor receives a record from the source and can modify it and re-emit the event back to the workflow.
## API
@ -27,17 +27,17 @@ class Processor(Closeable, metaclass=ABCMeta):
pass
```
**create** method is called during workflow instantiation and creates an instance of the processor.
**process** is called for each record coming down the workflow chain and can be used to modify or enrich the record.
**get\_status** reports the status of the processor, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
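As a rough, standalone illustration of this role (toy code, not the real base class or the PII example below):

```python
class ToyTagProcessor:
    """Enriches each record before re-emitting it downstream."""

    def __init__(self):
        self.processed = 0

    def process(self, record: dict) -> dict:
        # Modify or enrich the record, then hand it back to the workflow.
        record.setdefault("tags", []).append("reviewed")
        self.processed += 1
        return record

    def get_status(self) -> dict:
        return {"records": self.processed, "failures": 0, "warnings": 0}

    def close(self) -> None:
        pass

print(ToyTagProcessor().process({"table": "orders"}))
```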
## Example
Example implementation
```python
class PiiProcessor(Processor):
    ...
```

View File

@ -1,6 +1,6 @@
# Sink
The Sink gets the event emitted by the source, one at a time. It can use this record to make external service calls to store or index it. For OpenMetadata we have [MetadataRestTablesSink](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/sink/metadata_rest_tables.py).
## API
@ -30,17 +30,17 @@ class Sink(Closeable, metaclass=ABCMeta):
pass
```
**create** method is called during workflow instantiation and creates an instance of the sink.
**write\_record** is called for each record coming down the workflow chain and can be used to store the record in external services etc.
**get\_status** reports the status of the sink, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
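A minimal, standalone sketch of this contract (a toy class invented for illustration, not the MetadataRestTablesSink referenced on this page):

```python
class ToyPrintSink:
    """Receives one record at a time and 'stores' it in an external system."""

    def __init__(self):
        self.written = 0

    def write_record(self, record: dict) -> None:
        # A real sink would call an external API (OpenMetadata, Elasticsearch, ...);
        # here we just print the record.
        print("storing record:", record)
        self.written += 1

    def get_status(self) -> dict:
        return {"records": self.written, "failures": 0, "warnings": 0}

    def close(self) -> None:
        pass

ToyPrintSink().write_record({"table": "orders"})
```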
## Example
Example implementation
```python
class MetadataRestTablesSink(Sink):
    ...
```

View File

@ -1,6 +1,6 @@
# Source
The Source is the connector to external systems; it outputs a record for the downstream components to process and push to OpenMetadata.
## Source API
@ -31,9 +31,9 @@ class Source(Closeable, metaclass=ABCMeta):
**prepare** will be called through Python's init method. This is the place where you can make connections to external sources or initialize the client library.
**next\_record** is where the client can connect to an external resource and emit the data downstream.
**get\_status** is for the [workflow](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/api/workflow.py) to call and report the status of the source, such as how many records it has processed and any failures or warnings.
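A minimal, standalone sketch of a source (file name, record shape, and class are invented for illustration, not the real base class):

```python
from typing import Iterator

class ToyCsvSource:
    """Connects to an 'external system' and emits one record at a time."""

    def __init__(self, path: str):
        self.path = path
        self.emitted = 0

    def prepare(self) -> None:
        # Open connections or initialize client libraries here.
        pass

    def next_record(self) -> Iterator[dict]:
        with open(self.path) as f:
            for line in f:
                self.emitted += 1
                yield {"table_name": line.strip()}

    def get_status(self) -> dict:
        return {"records": self.emitted, "failures": 0, "warnings": 0}

with open("/tmp/tables.csv", "w") as f:
    f.write("orders\ncustomers\n")
source = ToyCsvSource("/tmp/tables.csv")
source.prepare()
for record in source.next_record():
    print(record)
```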
## Example

View File

@ -1,6 +1,6 @@
# Stage
The **Stage** is an optional component in the workflow. It can be used to store records in a file or data store, and to aggregate the work done by a processor.
## API
@ -27,17 +27,17 @@ class Stage(Closeable, metaclass=ABCMeta):
pass
```
**create** method is called during workflow instantiation and creates an instance of the stage.
**stage\_record** is called for each record coming down the workflow chain and can be used to store the record. This method doesn't emit anything for the downstream to process.
**get\_status** reports the status of the stage, e.g. how many records were processed and any failures or warnings.
**close** gets called before the workflow stops. It can be used to clean up any connections or other resources.
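A minimal, standalone sketch of a stage that writes records to a file for a bulk sink to pick up later (file path and record shape are invented for illustration, not the FileStage example below):

```python
import json

class ToyFileStage:
    """Writes each record to a file so a bulk sink can aggregate them later."""

    def __init__(self, path: str):
        self.file = open(path, "w")
        self.staged = 0

    def stage_record(self, record: dict) -> None:
        # Store the record; nothing is emitted downstream from here.
        self.file.write(json.dumps(record) + "\n")
        self.staged += 1

    def get_status(self) -> dict:
        return {"records": self.staged, "failures": 0, "warnings": 0}

    def close(self) -> None:
        self.file.close()

stage = ToyFileStage("/tmp/staged.jsonl")
stage.stage_record({"table": "orders"})
stage.close()
```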
## Example
Example implementation
```python
class FileStage(Stage):
    ...
```

View File

@ -10,7 +10,7 @@ The following commands must be run from the top-level directory.
`mvn clean install`
If you wish to skip the unit tests you can do this by adding `-DskipTests` to the command line.
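For example, `mvn clean install -DskipTests` builds the project without running the unit tests.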
## Create a distribution \(packaging\)

View File

@ -61,7 +61,7 @@ description: >-
```
3. Logging statements should be complete sentences with proper capitalization that are written to be read by a person not necessarily familiar with the source code.
4. String appending using StringBuilder should not be used for building log messages.
Formatting should be used. For example:
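As an illustration only — assuming an SLF4J-style logger and made-up names (`LOG`, `tableName`), not the document's own snippet — parameterized formatting looks like this:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingExample {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingExample.class);

  void ingest(String tableName) {
    // Preferred: parameterized formatting; the message is only rendered if the level is enabled.
    LOG.info("Ingested metadata for table {}", tableName);

    // Avoid: building the message with StringBuilder / string appending, e.g.
    // LOG.info(new StringBuilder("Ingested metadata for table ").append(tableName).toString());
  }
}
```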

View File

@ -6,13 +6,13 @@ We ❤️ all contributions, big and small!
## GitHub issues
Look for issues under the [GitHub issues tab](https://github.com/open-metadata/OpenMetadata/issues). If you have a feature request or found a bug, please file an issue. This will help us track it and will help the community overall as well.
![./images/new-issue.png](../../.gitbook/assets/new-issue.png)
## Fork the GitHub project
The OpenMetadata GitHub repository can be accessed at [https://github.com/open-metadata/OpenMetadata](https://github.com/open-metadata/OpenMetadata).
![./images/fork-github.png](../../.gitbook/assets/fork-github%20%281%29.png)
@ -22,7 +22,7 @@ Create a local clone of your fork
git clone https://github.com/<username>/OpenMetadata.git
```
Set a new remote repository that points to the OpenMetadata repository to pull changes from the open-source OpenMetadata codebase into your clone.
```bash
cd OpenMetadata/
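# Illustration only: add a remote for the upstream repository
# (the remote name "upstream" is an assumption, not taken from this doc).
git remote add upstream https://github.com/open-metadata/OpenMetadata.git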
@ -36,7 +36,7 @@ git remote -v
git checkout -b ISSUE-200
```
Make changes. Follow the [Coding Style](coding-style.md) guide for best practices and [Build the code & run tests](build-code-run-tests.md) for how to set up IntelliJ and Maven.
## Push your changes to GitHub
@ -59,5 +59,5 @@ git push origin HEAD:refs/heads/issue-200
## We are here to help
Please reach out to us anytime you need help. [Slack](https://openmetadata.slack.com/join/shared_invite/zt-udl8ris3-Egq~YtJU_yJgJTtROo00dQ#/shared-invite/email) is the fastest way to get a response.