mirror of
https://github.com/datahub-project/datahub.git
synced 2025-12-24 16:38:19 +00:00
docs(mlflow): add docs for the mlflow dataset config (#12973)
parent b6af240e97
commit 9e28c1af63
46
metadata-ingestion/docs/sources/mlflow/mlflow_post.md
Normal file
@@ -0,0 +1,46 @@
### Auth Configuration

You can configure the MLflow source to authenticate with the MLflow server using the `username` and `password` configuration options.

```yaml
source:
  type: mlflow
  config:
    tracking_uri: "http://127.0.0.1:5000"
    username: <username>
    password: <password>
```

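To keep credentials out of the recipe file, one option is the environment-variable substitution that DataHub recipes support. A minimal sketch, assuming `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` are exported in the shell that runs the ingestion (the variable names are illustrative; any names should work):

```yaml
source:
  type: mlflow
  config:
    tracking_uri: "http://127.0.0.1:5000"
    # Assumption: both variables are set in the environment before ingestion runs.
    username: "${MLFLOW_TRACKING_USERNAME}"
    password: "${MLFLOW_TRACKING_PASSWORD}"
```
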
### Dataset Lineage

You can map MLflow run datasets to specific DataHub platforms using the `source_mapping_to_platform` configuration option. Each key is an MLflow dataset source type, and each value is the DataHub platform to associate with datasets from that source.

Example:

```yaml
source_mapping_to_platform:
  huggingface: snowflake # Maps Hugging Face datasets to the Snowflake platform
  http: s3 # Maps HTTP data sources to the S3 platform
```

By default, DataHub attempts to link lineage to existing datasets, matched by platform and name, and does not create datasets that do not already exist.

To enable automatic dataset creation and lineage mapping, use the `materialize_dataset_inputs` option:

```yaml
materialize_dataset_inputs: true # Creates new datasets if they don't exist
```

You can configure these options independently. The two configurations below are alternatives, not a single file:

```yaml
# Only map to existing datasets
materialize_dataset_inputs: false
source_mapping_to_platform:
  huggingface: snowflake # Maps Hugging Face datasets to Snowflake platform
  pytorch: snowflake # Maps PyTorch datasets to Snowflake platform

# Create new datasets and map platforms
materialize_dataset_inputs: true
source_mapping_to_platform:
  huggingface: snowflake
  pytorch: snowflake
```
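
For context, both options are settings of the MLflow source, so in a complete recipe they would sit under `source.config` next to `tracking_uri`. A minimal sketch, assuming a DataHub instance reachable over REST at `http://localhost:8080` (the `sink` block and its URL are assumptions, not part of the original examples):

```yaml
source:
  type: mlflow
  config:
    tracking_uri: "http://127.0.0.1:5000"
    # Create datasets that do not exist yet and link lineage to them.
    materialize_dataset_inputs: true
    source_mapping_to_platform:
      huggingface: snowflake
      http: s3

# Assumption: a local DataHub GMS endpoint; adjust the server URL to your deployment.
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

Keeping the lineage options in the same `config` block as the connection settings means a single recipe file describes both how to reach MLflow and how its datasets are mapped.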