mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-07-13 20:18:24 +00:00

* GitBook: [#50] BigQuery, Glue, MSSQL, Postgres, Redshift, Snowflake - V2 * GitBook: [#62] No subject * GitBook: [#63] No subject * GitBook: [#64] Beta * GitBook: [#65] Make Harsha's requested changes to connectors section organization * GitBook: [#66] Kerberos authentication with Hive * GitBook: [#67] Fix procedure overview links * GitBook: [#68] Fix procedure overview links * GitBook: [#69] correct step reference * GitBook: [#70] Add Kerberos connection troubleshooting * updated json schema and schema docs (#3219) * updated json schema and schema docs * added glossay to readme * GitBook: [#72] Metrics & Tests Co-authored-by: Parth Panchal <parth.panchal@deuexsolutions.com> Co-authored-by: Shilpa V <vernekar.shilpa@gmail.com> Co-authored-by: Shannon Bradshaw <shannon.bradshaw@arrikto.com> Co-authored-by: parthp2107 <83201188+parthp2107@users.noreply.github.com> Co-authored-by: pmbrull <peremiquelbrull@gmail.com>
313 lines
11 KiB
Markdown
313 lines
11 KiB
Markdown
# Pipeline
|
|
|
|
This schema defines the Pipeline entity. A pipeline enables the flow of data from source to destination through a series of processing steps. ETL is a type of pipeline where the series of steps Extract, Transform and Load the data.
|
|
|
|
**$id:**[**https://open-metadata.org/schema/entity/data/pipeline.json**](https://open-metadata.org/schema/entity/data/pipeline.json)
|
|
|
|
Type: `object`
|
|
|
|
This schema does not accept additional properties.
|
|
|
|
## Properties
|
|
<<<<<<< HEAD
|
|
=======
|
|
- **id** `required`
|
|
- Unique identifier that identifies a pipeline instance.
|
|
- $ref: [../../type/basic.json#/definitions/uuid](../types/basic.md#uuid)
|
|
- **name** `required`
|
|
- Name that identifies this pipeline instance uniquely.
|
|
- Type: `string`
|
|
- Length: between 1 and 128
|
|
- **displayName**
|
|
- Display Name that identifies this Pipeline. It could be title or label from the source services.
|
|
- Type: `string`
|
|
- **fullyQualifiedName**
|
|
- A unique name that identifies a pipeline in the format 'ServiceName.PipelineName'.
|
|
- Type: `string`
|
|
- **description**
|
|
- Description of this Pipeline.
|
|
- Type: `string`
|
|
- **version**
|
|
- Metadata version of the entity.
|
|
- $ref: [../../type/entityHistory.json#/definitions/entityVersion](../types/entityhistory.md#entityversion)
|
|
- **updatedAt**
|
|
- Last update time corresponding to the new version of the entity in Unix epoch time milliseconds.
|
|
- $ref: [../../type/basic.json#/definitions/timestamp](../types/basic.md#timestamp)
|
|
- **updatedBy**
|
|
- User who made the update.
|
|
- Type: `string`
|
|
- **pipelineUrl**
|
|
- Pipeline URL to visit/manage. This URL points to respective pipeline service UI.
|
|
- Type: `string`
|
|
- String format must be a "uri"
|
|
- **concurrency**
|
|
- Concurrency of the Pipeline.
|
|
- Type: `integer`
|
|
- **pipelineLocation**
|
|
- Pipeline Code Location.
|
|
- Type: `string`
|
|
- **startDate**
|
|
- Start date of the workflow.
|
|
- $ref: [../../type/basic.json#/definitions/dateTime](../types/basic.md#datetime)
|
|
- **tasks**
|
|
- All the tasks that are part of pipeline.
|
|
- Type: `array`
|
|
- **Items**
|
|
- $ref: [#/definitions/task](#task)
|
|
- **pipelineStatus**
|
|
- Series of pipeline executions and its status.
|
|
- Type: `array`
|
|
- **Items**
|
|
- $ref: [#/definitions/pipelineStatus](#pipelinestatus)
|
|
- **followers**
|
|
- Followers of this Pipeline.
|
|
- $ref: [../../type/entityReference.json#/definitions/entityReferenceList](../types/entityreference.md#entityreferencelist)
|
|
- **tags**
|
|
- Tags for this Pipeline.
|
|
- Type: `array`
|
|
- **Items**
|
|
- $ref: [../../type/tagLabel.json](../types/taglabel.md)
|
|
- **href**
|
|
- Link to the resource corresponding to this entity.
|
|
- $ref: [../../type/basic.json#/definitions/href](../types/basic.md#href)
|
|
- **owner**
|
|
- Owner of this pipeline.
|
|
- $ref: [../../type/entityReference.json](../types/entityreference.md)
|
|
- **service** `required`
|
|
- Link to service where this pipeline is hosted in.
|
|
- $ref: [../../type/entityReference.json](../types/entityreference.md)
|
|
- **serviceType**
|
|
- Service type where this pipeline is hosted in.
|
|
- $ref: [../services/pipelineService.json#/definitions/pipelineServiceType](../services/pipelineservice.md#pipelineservicetype)
|
|
- **changeDescription**
|
|
- Change that lead to this version of the entity.
|
|
- $ref: [../../type/entityHistory.json#/definitions/changeDescription](../types/entityhistory.md#changedescription)
|
|
- **deleted**
|
|
- When `true` indicates the entity has been soft deleted.
|
|
- Type: `boolean`
|
|
- Default: _false_
|
|
>>>>>>> a07bc411 (updated json schema and schema docs (#3219))
|
|
|
|
* **id** `required`
|
|
* Unique identifier that identifies a pipeline instance.
|
|
* $ref: [../../type/basic.json#/definitions/uuid](../types/basic.md#uuid)
|
|
* **name** `required`
|
|
* Name that identifies this pipeline instance uniquely.
|
|
* Type: `string`
|
|
* Length: between 1 and 128
|
|
* **displayName**
|
|
* Display Name that identifies this Pipeline. It could be title or label from the source services.
|
|
* Type: `string`
|
|
* **fullyQualifiedName**
|
|
* A unique name that identifies a pipeline in the format 'ServiceName.PipelineName'.
|
|
* Type: `string`
|
|
* **description**
|
|
* Description of this Pipeline.
|
|
* Type: `string`
|
|
* **version**
|
|
* Metadata version of the entity.
|
|
* $ref: [../../type/entityHistory.json#/definitions/entityVersion](../types/entityhistory.md#entityversion)
|
|
* **updatedAt**
|
|
* Last update time corresponding to the new version of the entity in Unix epoch time milliseconds.
|
|
* $ref: [../../type/basic.json#/definitions/timestamp](../types/basic.md#timestamp)
|
|
* **updatedBy**
|
|
* User who made the update.
|
|
* Type: `string`
|
|
* **pipelineUrl**
|
|
* Pipeline URL to visit/manage. This URL points to respective pipeline service UI.
|
|
* Type: `string`
|
|
* String format must be a "uri"
|
|
* **concurrency**
|
|
* Concurrency of the Pipeline.
|
|
* Type: `integer`
|
|
* **pipelineLocation**
|
|
* Pipeline Code Location.
|
|
* Type: `string`
|
|
* **startDate**
|
|
* Start date of the workflow.
|
|
* $ref: [../../type/basic.json#/definitions/dateTime](../types/basic.md#datetime)
|
|
* **tasks**
|
|
* All the tasks that are part of pipeline.
|
|
* Type: `array`
|
|
* **Items**
|
|
* $ref: [#/definitions/task](pipeline.md#task)
|
|
* **followers**
|
|
* Followers of this Pipeline.
|
|
* $ref: [../../type/entityReference.json#/definitions/entityReferenceList](../types/entityreference.md#entityreferencelist)
|
|
* **tags**
|
|
* Tags for this Pipeline.
|
|
* Type: `array`
|
|
* **Items**
|
|
* $ref: [../../type/tagLabel.json](../types/taglabel.md)
|
|
* **href**
|
|
* Link to the resource corresponding to this entity.
|
|
* $ref: [../../type/basic.json#/definitions/href](../types/basic.md#href)
|
|
* **owner**
|
|
* Owner of this pipeline.
|
|
* $ref: [../../type/entityReference.json](../types/entityreference.md)
|
|
* **service** `required`
|
|
* Link to service where this pipeline is hosted in.
|
|
* $ref: [../../type/entityReference.json](../types/entityreference.md)
|
|
* **serviceType**
|
|
* Service type where this pipeline is hosted in.
|
|
* $ref: [../services/pipelineService.json#/definitions/pipelineServiceType](https://github.com/open-metadata/OpenMetadata/blob/main/docs/openmetadata-apis/schemas/services/pipelineservice.md#pipelineservicetype)
|
|
* **changeDescription**
|
|
* Change that lead to this version of the entity.
|
|
* $ref: [../../type/entityHistory.json#/definitions/changeDescription](../types/entityhistory.md#changedescription)
|
|
* **deleted**
|
|
* When `true` indicates the entity has been soft deleted.
|
|
* Type: `boolean`
|
|
* Default: _false_
|
|
|
|
## Type definitions in this schema
|
|
### statusType
|
|
|
|
- Enum defining the possible Status.
|
|
- Type: `string`
|
|
- The value is restricted to the following:
|
|
1. _"Successful"_
|
|
2. _"Failed"_
|
|
3. _"Pending"_
|
|
|
|
|
|
### taskStatus
|
|
|
|
- This schema defines a time series of the status of a Pipeline or Task.
|
|
- Type: `object`
|
|
- This schema <u>does not</u> accept additional properties.
|
|
- **Properties**
|
|
- **name**
|
|
- Name of the Task.
|
|
- Type: `string`
|
|
- **executionStatus**
|
|
- Status at a specific execution date.
|
|
- $ref: [#/definitions/statusType](#statustype)
|
|
|
|
|
|
### task
|
|
|
|
* Type: `object`
|
|
- This schema <u>does not</u> accept additional properties.
|
|
* **Properties**
|
|
* **name** `required`
|
|
* Name that identifies this task instance uniquely.
|
|
* Type: `string`
|
|
* **displayName**
|
|
* Display Name that identifies this Task. It could be title or label from the pipeline services.
|
|
* Type: `string`
|
|
* **fullyQualifiedName**
|
|
* A unique name that identifies a pipeline in the format 'ServiceName.PipelineName.TaskName'.
|
|
* Type: `string`
|
|
* **description**
|
|
* Description of this Task.
|
|
* Type: `string`
|
|
* **taskUrl**
|
|
* Task URL to visit/manage. This URL points to respective pipeline service UI.
|
|
* Type: `string`
|
|
* String format must be a "uri"
|
|
* **downstreamTasks**
|
|
* All the tasks that are downstream of this task.
|
|
* Type: `array`
|
|
* **Items**
|
|
* Type: `string`
|
|
* **taskType**
|
|
* Type of the Task. Usually refers to the class it implements.
|
|
* Type: `string`
|
|
* **taskSQL**
|
|
* SQL used in the task. Can be used to determine the lineage.
|
|
* $ref: [../../type/basic.json#/definitions/sqlQuery](../types/basic.md#sqlquery)
|
|
* **tags**
|
|
* Tags for this task.
|
|
* Type: `array`
|
|
* **Items**
|
|
* $ref: [../../type/tagLabel.json](../types/taglabel.md)
|
|
|
|
_This document was updated on: Tuesday, January 25, 2022_
|
|
=======
|
|
### statusType
|
|
|
|
- Enum defining the possible Status.
|
|
- Type: `string`
|
|
- The value is restricted to the following:
|
|
1. _"Successful"_
|
|
2. _"Failed"_
|
|
3. _"Pending"_
|
|
|
|
|
|
### taskStatus
|
|
|
|
- This schema defines a time series of the status of a Pipeline or Task.
|
|
- Type: `object`
|
|
- This schema <u>does not</u> accept additional properties.
|
|
- **Properties**
|
|
- **name**
|
|
- Name of the Task.
|
|
- Type: `string`
|
|
- **executionStatus**
|
|
- Status at a specific execution date.
|
|
- $ref: [#/definitions/statusType](#statustype)
|
|
|
|
|
|
### task
|
|
|
|
- Type: `object`
|
|
- This schema <u>does not</u> accept additional properties.
|
|
- **Properties**
|
|
- **name** `required`
|
|
- Name that identifies this task instance uniquely.
|
|
- Type: `string`
|
|
- **displayName**
|
|
- Display Name that identifies this Task. It could be title or label from the pipeline services.
|
|
- Type: `string`
|
|
- **fullyQualifiedName**
|
|
- A unique name that identifies a pipeline in the format 'ServiceName.PipelineName.TaskName'.
|
|
- Type: `string`
|
|
- **description**
|
|
- Description of this Task.
|
|
- Type: `string`
|
|
- **taskUrl**
|
|
- Task URL to visit/manage. This URL points to respective pipeline service UI.
|
|
- Type: `string`
|
|
- String format must be a "uri"
|
|
- **downstreamTasks**
|
|
- All the tasks that are downstream of this task.
|
|
- Type: `array`
|
|
- **Items**
|
|
- Type: `string`
|
|
- **taskType**
|
|
- Type of the Task. Usually refers to the class it implements.
|
|
- Type: `string`
|
|
- **taskSQL**
|
|
- SQL used in the task. Can be used to determine the lineage.
|
|
- $ref: [../../type/basic.json#/definitions/sqlQuery](../types/basic.md#sqlquery)
|
|
- **tags**
|
|
- Tags for this task.
|
|
- Type: `array`
|
|
- **Items**
|
|
- $ref: [../../type/tagLabel.json](../types/taglabel.md)
|
|
|
|
|
|
### pipelineStatus
|
|
|
|
- Series of pipeline executions, its status and task status.
|
|
- Type: `object`
|
|
- This schema <u>does not</u> accept additional properties.
|
|
- **Properties**
|
|
- **executionDate**
|
|
- Date where the job was executed.
|
|
- $ref: [../../type/basic.json#/definitions/timestamp](../types/basic.md#timestamp)
|
|
- **executionStatus**
|
|
- Status at a specific execution date.
|
|
- $ref: [#/definitions/statusType](#statustype)
|
|
- **taskStatus**
|
|
- Series of task executions and its status.
|
|
- Type: `array`
|
|
- **Items**
|
|
- $ref: [#/definitions/taskStatus](#taskstatus)
|
|
|
|
|
|
|
|
|
|
_This document was updated on: Monday, March 7, 2022_
|
|
>>>>>>> a07bc411 (updated json schema and schema docs (#3219))
|