2021-08-08 16:40:51 -04:00
|
|
|
# SageMaker
|
|
|
|
|
|
|
|
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
|
|
|
|
|
|
|
## Setup
|
|
|
|
|
|
|
|
To install this plugin, run `pip install 'acryl-datahub[sagemaker]'`.
|
|
|
|
|
|
|
|
## Capabilities
|
|
|
|
|
|
|
|
This plugin extracts the following:
|
|
|
|
|
|
|
|
- Feature groups
|
|
|
|
- Models, jobs, and lineage between the two (e.g. when jobs output a model or a model is used by a job)
|
|
|
|
|
|
|
|
## Quickstart recipe
|
|
|
|
|
|
|
|
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
|
|
|
|
|
|
|
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
|
|
|
|
|
|
|
```yml
|
|
|
|
source:
|
|
|
|
type: sagemaker
|
|
|
|
config:
|
|
|
|
# Coordinates
|
|
|
|
aws_region: "my-aws-region"
|
|
|
|
|
|
|
|
sink:
|
|
|
|
# sink configs
|
|
|
|
```
|
|
|
|
|
|
|
|
## Config details
|
|
|
|
|
|
|
|
Note that a `.` is used to denote nested fields in the YAML recipe.
|
|
|
|
|
|
|
|
| Field | Required | Default | Description |
|
|
|
|
| ------------------------------------- | -------- | ------------ | ---------------------------------------------------------------------------------- |
|
|
|
|
| `aws_region` | ✅ | | AWS region code. |
|
|
|
|
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
|
|
|
| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
|
|
|
| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
|
|
|
| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
|
|
|
| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
2021-08-09 13:00:47 -04:00
|
|
|
| `extract_feature_groups` | | `True` | Whether to extract feature groups. |
|
2021-08-08 16:40:51 -04:00
|
|
|
| `extract_models` | | `True` | Whether to extract models. |
|
|
|
|
| `extract_jobs.auto_ml` | | `True` | Whether to extract AutoML jobs. |
|
|
|
|
| `extract_jobs.compilation` | | `True` | Whether to extract compilation jobs. |
|
|
|
|
| `extract_jobs.edge_packaging` | | `True` | Whether to extract edge packaging jobs. |
|
|
|
|
| `extract_jobs.hyper_parameter_tuning` | | `True` | Whether to extract hyperparameter tuning jobs. |
|
|
|
|
| `extract_jobs.labeling` | | `True` | Whether to extract labeling jobs. |
|
|
|
|
| `extract_jobs.processing` | | `True` | Whether to extract processing jobs. |
|
|
|
|
| `extract_jobs.training` | | `True` | Whether to extract training jobs. |
|
|
|
|
| `extract_jobs.transform` | | `True` | Whether to extract transform jobs. |
|
|
|
|
|
|
|
|
## Compatibility
|
|
|
|
|
|
|
|
Coming soon!
|
|
|
|
|
|
|
|
## Questions
|
|
|
|
|
|
|
|
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|