docs(mlflow): add docs for the mlflow dataset config (#12973)

2025-12-24 16:38:19 +00:00 · 2025-04-01 12:20:32 +09:00 · 2025-04-01 12:20:32 +09:00 · 9e28c1af63
commit 9e28c1af63
parent b6af240e97
1 changed files with 46 additions and 0 deletions
--- a/metadata-ingestion/docs/sources/mlflow/mlflow_post.md
+++ b/metadata-ingestion/docs/sources/mlflow/mlflow_post.md
@@ -0,0 +1,46 @@
+### Auth Configuration
+
+You can configure the MLflow source to authenticate with the MLflow server using the `username` and `password` configuration options.
+
+```yaml
+source:
+  type: mlflow
+  config:
+    tracking_uri: "http://127.0.0.1:5000"
+    username: <username>
+    password: <password>
+```
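+
+To keep credentials out of the recipe file, one option is the environment-variable substitution that DataHub recipes support. The following is a minimal sketch: the variable names `MLFLOW_USERNAME` and `MLFLOW_PASSWORD` are placeholders chosen for illustration, not names the source requires.
+
+```yaml
+source:
+  type: mlflow
+  config:
+    tracking_uri: "http://127.0.0.1:5000"
+    # Hypothetical variable names; export them in the shell before running the ingestion
+    username: "${MLFLOW_USERNAME}"
+    password: "${MLFLOW_PASSWORD}"
+```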
+
+### Dataset Lineage
+You can map MLflow run datasets to specific DataHub platforms using the `source_mapping_to_platform` configuration option. Each entry maps an MLflow dataset source type (for example `huggingface` or `http`) to the DataHub platform that its datasets should be associated with.
+
+Example:
+```yaml
+source_mapping_to_platform:
+    huggingface: snowflake  # Maps Hugging Face datasets to the Snowflake platform
+    http: s3                # Maps HTTP data sources to the S3 platform
+```
+
+By default, DataHub attempts to attach lineage to existing datasets by matching on platform and name, and it does not create a dataset when no match exists.
+
+To enable automatic dataset creation and lineage mapping, use the `materialize_dataset_inputs` option:
+
+```yaml
+materialize_dataset_inputs: true  # Creates new datasets if they don't exist
+```
+
+You can configure these options independently. The two snippets below are alternatives; use one or the other, not both in the same recipe:
+
+```yaml
+# Alternative 1: only map to existing datasets
+materialize_dataset_inputs: false
+source_mapping_to_platform:
+    huggingface: snowflake  # Maps Hugging Face datasets to Snowflake platform
+    pytorch: snowflake      # Maps PyTorch datasets to Snowflake platform
+
+# Alternative 2: create new datasets and map platforms
+materialize_dataset_inputs: true
+source_mapping_to_platform:
+    huggingface: snowflake
+    pytorch: snowflake
+```
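+
+For context, the fragments above can be combined into a single source block. This is a sketch that assumes both options sit under `source.config` alongside `tracking_uri`, as the auth example does; the platform choices are illustrative only.
+
+```yaml
+source:
+  type: mlflow
+  config:
+    tracking_uri: "http://127.0.0.1:5000"
+    # Create datasets that do not yet exist in DataHub
+    materialize_dataset_inputs: true
+    # Map MLflow dataset source types to DataHub platforms
+    source_mapping_to_platform:
+      huggingface: snowflake
+      http: s3
+```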