# Adding a Metadata Ingestion Source

How you add a metadata ingestion source depends on which of two cases applies:

1. You are going to contribute the custom source directly to the DataHub project.
2. You are writing the custom source for yourself and are not going to contribute it back (yet).

If you are going for case (1), follow steps 1 to 9 below. If you are building the source for yourself, you can skip
steps 4 to 9 (though writing tests and docs for your own use is still a good idea) and instead follow the documentation
on [how to use custom ingestion sources](../docs/how/add-custom-ingestion-source.md), which works without forking DataHub.

:::note

This guide assumes that you've already followed the metadata ingestion [developing guide](./developing.md) to set up
your local environment.

:::

### 1. Set up the configuration model

We use [pydantic](https://pydantic-docs.helpmanual.io/) for configuration, and all models must inherit
from `ConfigModel`. The [file source](./src/datahub/ingestion/source/file.py) is a good example.
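
A minimal sketch of what a config model for a new source might hold. Because `ConfigModel` is a pydantic model, in the real codebase these fields would be annotated class attributes on a `ConfigModel` subclass, parsed from the recipe's `config:` block with pydantic's `parse_obj`. To keep the sketch free of third-party dependencies it uses a standard-library dataclass as a stand-in; the class `HypotheticalCsvSourceConfig`, its fields, and the `from_recipe` helper are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class HypotheticalCsvSourceConfig:
    """Stand-in for a ConfigModel subclass; the source and its fields are hypothetical."""

    path: str  # required: no default, so omitting it from the recipe is an error
    delimiter: str = ","
    env: str = "PROD"
    table_allow_list: List[str] = field(default_factory=list)

    @classmethod
    def from_recipe(cls, config: Dict[str, Any]) -> "HypotheticalCsvSourceConfig":
        # Reject misspelled keys instead of silently ignoring them.
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(config) - known)
        if unknown:
            raise ValueError(f"unknown config keys: {unknown}")
        return cls(**config)


# Parse the dict that a recipe's `config:` block would produce.
config = HypotheticalCsvSourceConfig.from_recipe({"path": "/data/tables.csv", "delimiter": ";"})
```

Rejecting unknown keys is a choice made in this sketch so that a typo such as `delimter` fails loudly; with pydantic the comparable behavior comes from setting the model's `extra` option to `"forbid"`.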

### 2. Set up the reporter

The reporter interface enables the source to report statistics, warnings, failures, and other information about the run.
Some sources use the default `SourceReport` class, but others inherit and extend that class.
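
A sketch of extending a report, using a stand-in base class so it runs without DataHub installed. `BaseReportSketch` only loosely imitates `SourceReport` (a work-unit counter plus warnings and failures grouped by key); check the real class for its exact fields and method signatures. The subclass and its `tables_filtered` field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BaseReportSketch:
    """Loose stand-in for SourceReport: counts work units, groups warnings/failures by key."""

    workunits_produced: int = 0
    warnings: Dict[str, List[str]] = field(default_factory=dict)
    failures: Dict[str, List[str]] = field(default_factory=dict)

    def report_workunit(self) -> None:
        self.workunits_produced += 1

    def report_warning(self, key: str, reason: str) -> None:
        self.warnings.setdefault(key, []).append(reason)

    def report_failure(self, key: str, reason: str) -> None:
        self.failures.setdefault(key, []).append(reason)


@dataclass
class HypotheticalCsvSourceReport(BaseReportSketch):
    """Source-specific report: additionally records tables skipped by an allow list."""

    tables_filtered: List[str] = field(default_factory=list)

    def report_dropped(self, table: str) -> None:
        self.tables_filtered.append(table)


report = HypotheticalCsvSourceReport()
report.report_workunit()
report.report_dropped("tmp_scratch")
report.report_warning("sales", "empty file, no schema inferred")
```

Because the subclass only adds fields and methods, any code that expects the base report keeps working with it.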

### 3. Implement the source itself

The core of the source is the `get_workunits` method, which produces a stream of work units wrapping MCE
(`MetadataChangeEvent`) objects. The [file source](./src/datahub/ingestion/source/file.py) is a good and simple example.
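
A sketch of the `get_workunits` pattern: a generator that walks what the upstream system reports, skips filtered items, and yields one work unit per metadata record. `WorkUnitSketch`, its dict payload, and `HypotheticalCsvSource` are stand-ins; to stay short, the sketch keeps its config and report fields directly on the source, whereas a real source would hold a config model and a report object and yield DataHub's own work-unit type wrapping an MCE.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class WorkUnitSketch:
    """Stand-in for a work unit: a unique id plus the metadata record it carries."""

    id: str
    payload: Dict[str, Any]


@dataclass
class HypotheticalCsvSource:
    tables: List[str]  # what the upstream system reports (hypothetical)
    allow_list: List[str] = field(default_factory=list)  # empty means "allow everything"
    produced_ids: List[str] = field(default_factory=list)  # minimal stand-in for the report
    filtered: List[str] = field(default_factory=list)

    def get_workunits(self) -> Iterable[WorkUnitSketch]:
        for table in self.tables:
            if self.allow_list and table not in self.allow_list:
                self.filtered.append(table)  # record the skip instead of failing
                continue
            wu = WorkUnitSketch(id=f"csv-{table}", payload={"name": table})
            self.produced_ids.append(wu.id)
            yield wu
```

Because `get_workunits` is a generator, nothing is read until the caller iterates over it, so records can flow through the pipeline one at a time instead of being collected in memory first.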

The `MetadataChangeEventClass` is defined in the metadata models, which are generated
under `metadata-ingestion/src/datahub/metadata/schema_classes.py`. There are also
some [convenience methods](./src/datahub/emitter/mce_builder.py) for commonly used operations.
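
One common operation covered by the convenience module is building URNs. The sketch below reproduces the dataset URN format that `make_dataset_urn` in `mce_builder` produced at the time of writing, so you can see what identifiers a source emits; in a real source, call the helper from `mce_builder` rather than re-implementing it. The `*_sketch` function names are hypothetical.

```python
def make_data_platform_urn_sketch(platform: str) -> str:
    """Hypothetical re-implementation of a data platform URN, for illustration only."""
    return f"urn:li:dataPlatform:{platform}"


def make_dataset_urn_sketch(platform: str, name: str, env: str = "PROD") -> str:
    """Dataset URN: platform URN, dataset name and environment inside parentheses."""
    return f"urn:li:dataset:({make_data_platform_urn_sketch(platform)},{name},{env})"


print(make_dataset_urn_sketch("postgres", "analytics.public.orders"))
# urn:li:dataset:(urn:li:dataPlatform:postgres,analytics.public.orders,PROD)
```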

### 4. Set up the dependencies

Declare the source's pip dependencies in the `plugins` variable of the [setup script](./setup.py).
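
A sketch of how a new entry might sit in the setup script, assuming `plugins` maps each plugin name to a set of pip requirement strings that are then exposed as pip extras. The plugin name `my-csv` and its requirement are hypothetical; copy the exact structure from the existing entries in `setup.py`.

```python
from typing import Dict, List, Set

plugins: Dict[str, Set[str]] = {
    # Hypothetical entry; real entries sit alongside it in setup.py.
    "my-csv": {"chardet>=4.0.0"},
}

# Each plugin becomes a pip "extra", so users install only what their sources need.
extras_require: Dict[str, List[str]] = {
    name: sorted(requirements) for name, requirements in plugins.items()
}
```

With the extra published, users would install the source's dependencies with `pip install 'acryl-datahub[my-csv]'`.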

### 5. Enable discoverability

Declare the source under the `entry_points` variable of the [setup script](./setup.py). This enables the source to be
listed when running `datahub check plugins`, and sets up the source's shortened alias for use in recipes.
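
Entry points use the standard setuptools `alias = module.path:ClassName` syntax. The sketch below registers a new source and shows how its alias resolves to an import path. `datahub.ingestion.source.plugins` is the group name the setup script used at the time of writing (verify it in `setup.py`), and `my_package.csv_source:HypotheticalCsvSource` and `resolve_alias` are hypothetical.

```python
from typing import Dict, List, Tuple

entry_points: Dict[str, List[str]] = {
    # Group name as used in DataHub's setup.py at the time of writing; verify before copying.
    "datahub.ingestion.source.plugins": [
        "my-csv = my_package.csv_source:HypotheticalCsvSource",  # hypothetical
    ],
}


def resolve_alias(group: str, alias: str) -> Tuple[str, str]:
    """Return (module, class name) for a recipe alias, mimicking an entry-point lookup."""
    for spec in entry_points[group]:
        name, _, target = spec.partition("=")
        if name.strip() == alias:
            module, _, attr = target.strip().partition(":")
            return module, attr
    raise KeyError(f"no source registered under alias {alias!r}")
```

A recipe would then select the source with `type: my-csv`. Entry points are recorded when the package is installed, so re-run `pip install -e .` after editing `setup.py` for the new alias to be picked up.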

### 6. Write tests

Tests go in the `tests` directory. We use the [pytest framework](https://pytest.org/).
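
A minimal pytest-style test, assuming a hypothetical helper `table_urns_from_file` that reads one table name per line and returns dataset URNs. It uses pytest's built-in `tmp_path` fixture for the input file. The helper is defined inline only so the example runs on its own; in the repository the test would import it from your source module.

```python
from pathlib import Path
from typing import List


def table_urns_from_file(path: Path, platform: str = "csv") -> List[str]:
    """Hypothetical source logic under test: one dataset URN per non-blank line."""
    names = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    return [f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},PROD)" for name in names]


def test_emits_one_urn_per_table(tmp_path: Path) -> None:
    # tmp_path is a per-test temporary directory provided by pytest.
    source_file = tmp_path / "tables.txt"
    source_file.write_text("sales\n\nusers\n")

    urns = table_urns_from_file(source_file)

    assert urns == [
        "urn:li:dataset:(urn:li:dataPlatform:csv,sales,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:csv,users,PROD)",
    ]
```

pytest collects functions named `test_*` from files named `test_*.py`, so a file such as `test_my_csv_source.py` under `tests` is picked up automatically.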

### 7. Write docs

Add the plugin to the table at the top of the README file, and add the source's documentation under the sources
header.

### 8. Add SQLAlchemy mapping (if applicable)

If the source is built on SQLAlchemy, add it to the `get_platform_from_sqlalchemy_uri` function
in [sql_common.py](./src/datahub/ingestion/source/sql/sql_common.py).
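
A sketch of the kind of prefix mapping this step refers to. SQLAlchemy URIs start with a dialect name, optionally followed by `+driver` (for example `postgresql+psycopg2://...`), and the dialect does not always match the DataHub platform name. The table-driven function below is a hypothetical stand-in for `get_platform_from_sqlalchemy_uri`, not its actual body; the entries are illustrative, and the `mycooldb` source and the `external` fallback are assumptions.

```python
from typing import List, Tuple

# Ordered (URI prefix, platform) pairs; illustrative, not copied from sql_common.py.
_PREFIX_TO_PLATFORM: List[Tuple[str, str]] = [
    ("postgresql", "postgres"),  # dialect name differs from the platform name
    ("mysql", "mysql"),
    ("mycooldb", "mycooldb"),  # hypothetical new source added in this step
]


def platform_from_sqlalchemy_uri(sqlalchemy_uri: str, fallback: str = "external") -> str:
    # startswith() also matches "dialect+driver://" forms such as "postgresql+psycopg2://".
    for prefix, platform in _PREFIX_TO_PLATFORM:
        if sqlalchemy_uri.startswith(prefix):
            return platform
    return fallback
```

With plain prefix matching, order matters: if one dialect name is a prefix of another, list the longer one first.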

### 9. Add logo

Add the logo image to the [images folder](../datahub-web-react/src/images) and add it to the data platforms ingested
at [boot](../metadata-service/war/src/main/resources/boot/data_platforms.json).