### Prerequisites

To ingest metadata from Tableau, you will need:

- Python 3.6+
- Tableau Server version 2021.1.10 or later. Older versions may also work.
- [Enable the Tableau Metadata API](https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_start.html#enable-the-tableau-metadata-api-for-tableau-server) for Tableau Server, if it's not already enabled.
- Tableau credentials (username/password or [Personal Access Token](https://help.tableau.com/current/pro/desktop/en-us/useracct.htm#create-and-revoke-personal-access-tokens))

## Integration Details
This plugin extracts metadata for Sheets, Dashboards, and Embedded and Published Data Sources within Workbooks in a given project
on a Tableau site. The plugin is in beta and has only been tested with a PostgreSQL database and sample workbooks
on Tableau Online. Metadata is extracted through Tableau's GraphQL interface. The queries used are located
in `metadata-ingestion/src/datahub/ingestion/source/tableau_common.py`.

### Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:

| Source Concept | DataHub Concept | Notes |
| -- | -- | -- |
| `"Tableau"` | [Data Platform](../../metamodel/entities/dataPlatform.md) | |
| Embedded DataSource | [Dataset](../../metamodel/entities/dataset.md) | SubType `"Embedded Data Source"` |
| Published DataSource | [Dataset](../../metamodel/entities/dataset.md) | SubType `"Published Data Source"` |
| Custom SQL Table | [Dataset](../../metamodel/entities/dataset.md) | SubTypes `"View"`, `"Custom SQL"` |
| Embedded or External Tables | [Dataset](../../metamodel/entities/dataset.md) | |
| Sheet | [Chart](../../metamodel/entities/chart.md) | |
| Dashboard | [Dashboard](../../metamodel/entities/dashboard.md) | |
| User | [User (a.k.a. CorpUser)](../../metamodel/entities/corpuser.md) | |
| Workbook | [Container](../../metamodel/entities/container.md) | SubType `"Workbook"` |
| Tag | [Tag](../../metamodel/entities/tag.md) | |

- [Workbook](#workbook)
- [Dashboard](#dashboard)
- [Sheet](#sheet)
- [Embedded Data Source](#embedded-data-source)
- [Published Data Source](#published-data-source)
- [Custom SQL Data Source](#custom-sql-data-source)

#### Workbook
Workbooks from Tableau are ingested as Containers in DataHub. <br/>
- GraphQL query <br/>
```graphql
{
  workbooksConnection(first: 15, offset: 0, filter: {projectNameWithin: ["default", "Project 2"]}) {
    nodes {
      id
      name
      luid
      projectName
      owner {
        username
      }
      description
      uri
      createdAt
      updatedAt
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
```
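
The query above requests one page of workbooks through `first` and `offset`, and reports whether more pages exist through `pageInfo.hasNextPage`. The following is a minimal sketch of how a client might walk all pages; it is not the plugin's actual implementation. `fetch_page` and `fake_fetch_page` are hypothetical names: the first stands for any function that runs the query and returns the `workbooksConnection` object as a dict, the second is an in-memory stand-in so the example runs without a Tableau server.

```python
from typing import Any, Callable, Dict, Iterator

# Assumption: fetch_page(first, offset) runs the workbook query and returns
# the "workbooksConnection" object from the response as a dict.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_workbooks(fetch_page: FetchPage, page_size: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield workbook nodes page by page until hasNextPage is false."""
    offset = 0
    while True:
        connection = fetch_page(page_size, offset)
        nodes = connection.get("nodes", [])
        yield from nodes
        has_next = connection.get("pageInfo", {}).get("hasNextPage", False)
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not has_next or not nodes:
            break
        offset += len(nodes)


def fake_fetch_page(first: int, offset: int) -> Dict[str, Any]:
    """Hypothetical stand-in for a Metadata API call, backed by a list."""
    workbooks = [{"id": str(i), "name": f"Workbook {i}"} for i in range(25)]
    page = workbooks[offset : offset + first]
    return {
        "nodes": page,
        "pageInfo": {"hasNextPage": offset + first < len(workbooks)},
        "totalCount": len(workbooks),
    }


names = [wb["name"] for wb in iter_workbooks(fake_fetch_page, page_size=10)]
print(len(names), names[0], names[-1])
```

Advancing `offset` by the number of nodes actually returned, rather than by the requested page size, keeps the loop correct if the server returns a short page.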
#### Dashboard
Dashboards from Tableau are ingested as Dashboards in DataHub. <br/>
- GraphQL query <br/>
```graphql
{
  workbooksConnection(first: 15, offset: 0, filter: {projectNameWithin: ["default", "Project 2"]}) {
    nodes {
      .....
      dashboards {
        id
        name
        path
        createdAt
        updatedAt
        sheets {
          id
          name
        }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
```
#### Sheet
Sheets from Tableau are ingested as Charts in DataHub. <br/>
- GraphQL query <br/>
```graphql
{
  workbooksConnection(first: 10, offset: 0, filter: {projectNameWithin: ["default"]}) {
    .....
      sheets {
        id
        name
        path
        createdAt
        updatedAt
        tags {
          name
        }
        containedInDashboards {
          name
          path
        }
        upstreamDatasources {
          id
          name
        }
        datasourceFields {
          __typename
          id
          name
          description
          upstreamColumns {
            name
          }
          ... on ColumnField {
            dataCategory
            role
            dataType
            aggregation
          }
          ... on CalculatedField {
            role
            dataType
            aggregation
            formula
          }
          ... on GroupField {
            role
            dataType
          }
          ... on DatasourceField {
            remoteField {
              __typename
              id
              name
              description
              folderName
              ... on ColumnField {
                dataCategory
                role
                dataType
                aggregation
              }
              ... on CalculatedField {
                role
                dataType
                aggregation
                formula
              }
              ... on GroupField {
                role
                dataType
              }
            }
          }
        }
      }
    }
    .....
  }
}
```
#### Embedded Data Source
Embedded Data Sources from Tableau are ingested as Datasets in DataHub.
- GraphQL query <br/>
```graphql
{
  workbooksConnection(first: 15, offset: 0, filter: {projectNameWithin: ["default"]}) {
    nodes {
      ....
      embeddedDatasources {
        __typename
        id
        name
        hasExtracts
        extractLastRefreshTime
        extractLastIncrementalUpdateTime
        extractLastUpdateTime
        upstreamDatabases {
          id
          name
          connectionType
          isEmbedded
        }
        upstreamTables {
          name
          schema
          columns {
            name
            remoteType
          }
        }
        fields {
          __typename
          id
          name
          description
          isHidden
          folderName
          ... on ColumnField {
            dataCategory
            role
            dataType
            defaultFormat
            aggregation
            columns {
              table {
                ... on CustomSQLTable {
                  id
                  name
                }
              }
            }
          }
          ... on CalculatedField {
            role
            dataType
            defaultFormat
            aggregation
            formula
          }
          ... on GroupField {
            role
            dataType
          }
        }
        upstreamDatasources {
          id
          name
        }
        workbook {
          name
          projectName
        }
      }
    }
    ....
  }
}
```
#### Published Data Source
Published Data Sources from Tableau are ingested as Datasets in DataHub.
- GraphQL query <br/>
```graphql
{
  publishedDatasourcesConnection(filter: {idWithin: ["00cce29f-b561-bb41-3557-8e19660bb5dd", "618c87db-5959-338b-bcc7-6f5f4cc0b6c6"]}) {
    nodes {
      __typename
      id
      name
      hasExtracts
      extractLastRefreshTime
      extractLastIncrementalUpdateTime
      extractLastUpdateTime
      downstreamSheets {
        id
        name
      }
      upstreamTables {
        name
        schema
        fullName
        connectionType
        description
        contact {
          name
        }
      }
      fields {
        __typename
        id
        name
        description
        isHidden
        folderName
        ... on ColumnField {
          dataCategory
          role
          dataType
          defaultFormat
          aggregation
          columns {
            table {
              ... on CustomSQLTable {
                id
                name
              }
            }
          }
        }
        ... on CalculatedField {
          role
          dataType
          defaultFormat
          aggregation
          formula
        }
        ... on GroupField {
          role
          dataType
        }
      }
      owner {
        username
      }
      description
      uri
      projectName
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
```
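
The query above selects published data sources by id through the `idWithin` filter. As an illustration only, the sketch below renders such a filter from a collection of ids, for example ids gathered from `upstreamDatasources` in earlier results. The function names are hypothetical and the query is trimmed to a few fields; the plugin may build its queries differently.

```python
import json
from typing import Iterable


def id_within_filter(ids: Iterable[str]) -> str:
    """Render a GraphQL filter argument such as {idWithin: ["a", "b"]}."""
    unique_ids = sorted(set(ids))  # de-duplicate and keep the output stable
    # json.dumps quotes and escapes each id the same way a GraphQL string literal is written.
    return "{idWithin: %s}" % json.dumps(unique_ids)


def published_datasource_query(ids: Iterable[str]) -> str:
    """Build a trimmed-down published data source query (hypothetical helper)."""
    filter_arg = id_within_filter(ids)
    return (
        "{ publishedDatasourcesConnection(filter: " + filter_arg + ") "
        "{ nodes { id name } pageInfo { hasNextPage endCursor } totalCount } }"
    )


example_filter = id_within_filter(["b", "a", "b"])
print(example_filter)
print(published_datasource_query(["00cce29f-b561-bb41-3557-8e19660bb5dd"]))
```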
#### Custom SQL Data Source
For custom SQL data sources, the query is viewable in the UI under the View Definition tab. <br/>
- GraphQL query <br/>
```graphql
{
  customSQLTablesConnection(filter: {idWithin: ["22b0b4c3-6b85-713d-a161-5a87fdd78f40"]}) {
    nodes {
      id
      name
      query
      columns {
        id
        name
        remoteType
        description
        referencedByFields {
          datasource {
            id
            name
            upstreamDatabases {
              id
              name
            }
            upstreamTables {
              id
              name
              schema
              connectionType
              columns {
                id
              }
            }
            ... on PublishedDatasource {
              projectName
            }
            ... on EmbeddedDatasource {
              workbook {
                name
                projectName
              }
            }
          }
        }
      }
      tables {
        id
        name
        schema
        connectionType
      }
    }
  }
}
```
#### Lineage
Lineage is emitted as received from Tableau's Metadata API for:
- Sheets contained in a Dashboard
- Embedded or Published Data Sources upstream of a Sheet
- Published Data Sources upstream of an Embedded Data Source
- Tables upstream of an Embedded or Published Data Source
- Custom SQL Data Sources upstream of an Embedded or Published Data Source
- Tables upstream of a Custom SQL Data Source
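
To make the first three relationships concrete, here is a rough sketch that derives them from a single workbook node. It assumes the node carries the `dashboards`, `sheets`, and `embeddedDatasources` fields shown in the queries above. The function name, the tuple layout, and the relation labels are invented for this example and do not reflect how the plugin represents lineage internally.

```python
from typing import Any, Dict, List, Tuple

# (relation, from_id, to_id); the relation labels are made up for this sketch.
Edge = Tuple[str, str, str]


def lineage_edges(workbook: Dict[str, Any]) -> List[Edge]:
    """Collect containment and upstream relationships from one workbook node."""
    edges: List[Edge] = []
    # Sheets contained in a dashboard.
    for dashboard in workbook.get("dashboards", []):
        for sheet in dashboard.get("sheets", []):
            edges.append(("contains", dashboard["id"], sheet["id"]))
    # Embedded or published data sources upstream of a sheet.
    for sheet in workbook.get("sheets", []):
        for datasource in sheet.get("upstreamDatasources", []):
            edges.append(("upstream_of", datasource["id"], sheet["id"]))
    # Published data sources upstream of an embedded data source.
    for embedded in workbook.get("embeddedDatasources", []):
        for published in embedded.get("upstreamDatasources", []):
            edges.append(("upstream_of", published["id"], embedded["id"]))
    return edges


sample_workbook = {
    "dashboards": [{"id": "dash-1", "sheets": [{"id": "sheet-1"}]}],
    "sheets": [{"id": "sheet-1", "upstreamDatasources": [{"id": "emb-1"}]}],
    "embeddedDatasources": [{"id": "emb-1", "upstreamDatasources": [{"id": "pub-1"}]}],
}

for edge in lineage_edges(sample_workbook):
    print(edge)
```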
#### Caveats
- The Tableau Metadata API might return an incorrect schema name for tables in some databases, leading to incorrect metadata in DataHub. This source attempts to extract the correct schema from the databaseTable's fully qualified name wherever possible. Read [Using the databaseTable object in query](https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_model.html#schema_attribute) for caveats on using the schema attribute.
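
As a rough illustration of the idea, the sketch below derives a schema from a fully qualified name. It assumes names shaped like `[schema].[table]`, with optional square brackets around each part, which is an assumption about the format and not something this page specifies. The function name is hypothetical and the source's actual logic may differ.

```python
from typing import Optional


def schema_from_full_name(full_name: str, reported_schema: Optional[str] = None) -> Optional[str]:
    """Return the schema part of a name like "[public].[orders]".

    Falls back to the schema reported by the API when the name has no
    schema component. Names whose parts contain dots are not handled.
    """
    # Split on dots and strip the square brackets around each part (assumed format).
    parts = [part.strip("[]") for part in full_name.split(".")]
    parts = [part for part in parts if part]
    if len(parts) >= 2:
        return parts[-2]  # the component just before the table name
    return reported_schema or None


print(schema_from_full_name("[public].[orders]"))
print(schema_from_full_name("orders", reported_schema="dbo"))
```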
## Troubleshooting
### Why are only some workbooks ingested from the specified project?
This can happen when the Tableau API returns a `NODE_LIMIT_EXCEEDED` error in response to a metadata query and returns partial results with the message "Showing partial results. The request exceeded the 'n' node limit. Use pagination, additional filtering, or both in the query to adjust results." To resolve this, consider:
- reducing the page size using the `workbooks_page_size` config parameter in the DataHub recipe (defaults to 10).
- increasing the Tableau configuration [metadata query node limit](https://help.tableau.com/current/server/en-us/cli_configuration-set_tsm.htm#metadata_nodelimit) to a higher value.
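
For readers debugging this by hand, the sketch below shows one way to spot the condition in a raw GraphQL response and pick a smaller page size. It assumes the error code appears under `errors[].extensions.code`; check that shape against your server's actual responses. Both function names and the halving strategy are illustrative and are not part of the plugin.

```python
from typing import Any, Dict


def hit_node_limit(response: Dict[str, Any]) -> bool:
    """Return True if a GraphQL response carries a NODE_LIMIT_EXCEEDED error."""
    for error in response.get("errors") or []:
        # Assumed location of the error code; verify against a real response.
        code = (error.get("extensions") or {}).get("code")
        if code == "NODE_LIMIT_EXCEEDED":
            return True
    return False


def next_page_size(current: int, response: Dict[str, Any], minimum: int = 1) -> int:
    """Halve the page size after a node-limit error, otherwise keep it."""
    if hit_node_limit(response):
        return max(minimum, current // 2)
    return current


partial_response = {
    "data": {"workbooksConnection": {"nodes": []}},
    "errors": [
        {
            "message": "Showing partial results. The request exceeded the 'n' node limit.",
            "extensions": {"code": "NODE_LIMIT_EXCEEDED"},
        }
    ],
}

print(next_page_size(10, partial_response))
print(next_page_size(10, {"data": {}}))
```

The resulting value is what you would then set as `workbooks_page_size` in the recipe.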