docs(ingest): clarify adding source guide (#9161)

Harshal Sheth 2023-11-06 12:47:07 -08:00 committed by GitHub
parent 81daae815a
commit 02156662b5

@@ -6,7 +6,7 @@ There are two ways of adding a metadata ingestion source.
 2. You are writing the custom source for yourself and are not going to contribute back (yet).
 If you are going for case (1) just follow the steps 1 to 9 below. In case you are building it for yourself you can skip
-steps 4-9 (but maybe write tests and docs for yourself as well) and follow the documentation
+steps 4-8 (but maybe write tests and docs for yourself as well) and follow the documentation
 on [how to use custom ingestion sources](../docs/how/add-custom-ingestion-source.md)
 without forking Datahub.
@@ -27,6 +27,7 @@ from `ConfigModel`. The [file source](./src/datahub/ingestion/source/file.py) is
 We use [pydantic](https://pydantic-docs.helpmanual.io) conventions for documenting configuration flags. Use the `description` attribute to write rich documentation for your configuration field.
 For example, the following code:
 ```python
 from pydantic import Field
 from datahub.api.configuration.common import ConfigModel
@@ -49,12 +50,10 @@ generates the following documentation:
 <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/metadata-ingestion/generated_config_docs.png"/>
 </p>
 :::note
 Inline markdown or code snippets are not yet supported for field level documentation.
 :::
 ### 2. Set up the reporter
 The reporter interface enables the source to report statistics, warnings, failures, and other information about the run.
@@ -71,6 +70,8 @@ some [convenience methods](./src/datahub/emitter/mce_builder.py) for commonly us
 ### 4. Set up the dependencies
+Note: Steps 4-8 are only required if you intend to contribute the source back to the Datahub project.
 Declare the source's pip dependencies in the `plugins` variable of the [setup script](./setup.py).
 ### 5. Enable discoverability
@@ -131,7 +132,6 @@ class FileSource(Source):
 ```
 #### 7.2 Write custom documentation
 - Create a copy of [`source-docs-template.md`](./source-docs-template.md) and edit all relevant components.
@@ -144,12 +144,14 @@ class FileSource(Source):
 Documentation for the source can be viewed by running the documentation generator from the `docs-website` module.
 ##### Step 1: Build the Ingestion docs
 ```console
 # From the root of DataHub repo
 ./gradlew :metadata-ingestion:docGen
 ```
 If this finishes successfully, you will see output messages like:
 ```console
 Ingestion Documentation Generation Complete
 ############################################
@@ -170,6 +172,7 @@ Ingestion Documentation Generation Complete
 You can also find documentation files generated at `./docs/generated/ingestion/sources` relative to the root of the DataHub repo. You should be able to locate your specific source's markdown file here and investigate it to make sure things look as expected.
 #### Step 2: Build the Entire Documentation
 To view how this documentation looks in the browser, there is one more step. Just build the entire docusaurus page from the `docs-website` module.
 ```console
@@ -178,6 +181,7 @@ To view how this documentation looks in the browser, there is one more step. Jus
 ```
 This will generate messages like:
 ```console
 ...
 > Task :docs-website:yarnGenerate
@@ -220,6 +224,7 @@ BUILD SUCCESSFUL in 35s
 ```
 After this you need to run the following script from the `docs-website` module.
 ```console
 cd docs-website
 npm run serve
@@ -228,7 +233,6 @@ npm run serve
 Now, browse to http://localhost:3000 or whichever port npm is running on, to browse the docs.
 Your source should show up on the left sidebar under `Metadata Ingestion / Sources`.
 ### 8. Add SQL Alchemy mapping (if applicable)
 Add the source in `get_platform_from_sqlalchemy_uri` function
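
Reviewer note on step 8: the guide names `get_platform_from_sqlalchemy_uri` but does not show what adding a mapping involves. The sketch below illustrates the general idea of mapping a SQLAlchemy URI scheme to a platform name. It is not DataHub's implementation: the function `platform_from_uri_sketch`, the prefix table, the `my_new_db` entry, and the `"external"` fallback are all hypothetical names chosen for illustration.

```python
from typing import Dict

# Hypothetical prefix table, for illustration only. The real mapping lives in
# the DataHub codebase and may be structured differently.
_DIALECT_TO_PLATFORM: Dict[str, str] = {
    "postgresql": "postgres",
    "mysql": "mysql",
    "my_new_db": "my-new-db",  # hypothetical entry for a newly added source
}


def platform_from_uri_sketch(sqlalchemy_uri: str) -> str:
    """Return a platform name for a SQLAlchemy URI, or a fallback if unknown."""
    # The scheme is everything before "://", e.g. "postgresql+psycopg2".
    scheme = sqlalchemy_uri.split("://", 1)[0]
    # Drop an optional driver suffix such as "+psycopg2" and normalise case.
    dialect = scheme.split("+", 1)[0].lower()
    # Assumed fallback value for unrecognised dialects.
    return _DIALECT_TO_PLATFORM.get(dialect, "external")
```

Under these assumptions, supporting a new source amounts to adding one entry to the table, and a URI with a driver suffix such as `postgresql+psycopg2://...` resolves through the same entry as the bare dialect.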