# Tableau
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
Note that this connector is currently in `BETA` and has not been validated for production use.
## Setup
To install this plugin, run `pip install 'acryl-datahub[tableau]'`.
See the documentation for Tableau's Metadata API at https://help.tableau.com/current/api/metadata_api/en-us/index.html.
## Capabilities
This plugin extracts metadata for Sheets, Dashboards, and Embedded and Published Data Sources within Workbooks in a given project
on a Tableau site. This plugin is in beta and has only been tested with a PostgreSQL database and sample workbooks
on Tableau Online.

Tableau's GraphQL interface is used to extract metadata. The queries used to extract metadata are located
in `metadata-ingestion/src/datahub/ingestion/source/tableau_common.py`.

- [Workbook](#Workbook)
- [Dashboard](#Dashboard)
- [Sheet](#Sheet)
- [Embedded Data source](#Embedded-Data-Source)
- [Published Data source](#Published-Data-Source)
- [Custom SQL Data source](#Custom-SQL-Data-Source)
### Workbook
Workbooks from Tableau are ingested as Containers in DataHub. <br />
- GraphQL query <br />
```graphql
{
  workbooksConnection(first: 15, offset: 0, filter: {projectNameWithin: ["default", "Project 2"]}) {
    nodes {
      id
      name
      luid
      uri
      projectName
      owner {
        username
      }
      description
      createdAt
      updatedAt
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
```
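
The query above requests one page of results (`first`, `offset`) and returns `pageInfo.hasNextPage`, so a client has to loop to collect every workbook. Below is a minimal Python sketch of such a loop, not the connector's actual implementation. `run_query` is a hypothetical callable that sends a GraphQL string and returns the parsed JSON response; the `{"data": ...}` envelope is the usual GraphQL response shape and is assumed here.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

# Trimmed version of the workbook query; %-style placeholders are used
# because they do not clash with GraphQL's own curly braces.
WORKBOOKS_QUERY = """
{
  workbooksConnection(first: %(first)d, offset: %(offset)d, filter: {projectNameWithin: %(projects)s}) {
    nodes { id name }
    pageInfo { hasNextPage endCursor }
    totalCount
  }
}
"""


def iter_workbooks(
    run_query: Callable[[str], Dict[str, Any]],  # hypothetical transport function
    projects: List[str],
    page_size: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield workbook nodes page by page until hasNextPage is false."""
    offset = 0
    while True:
        query = WORKBOOKS_QUERY % {
            "first": page_size,
            "offset": offset,
            # json.dumps renders the Python list as a GraphQL string list
            "projects": json.dumps(projects),
        }
        connection = run_query(query)["data"]["workbooksConnection"]
        yield from connection["nodes"]
        if not connection["pageInfo"]["hasNextPage"]:
            break
        offset += page_size
```

Passing the transport in as a callable keeps the paging logic independent of how authentication and HTTP are handled.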
### Dashboard
Dashboards from Tableau are ingested as Dashboards in DataHub. <br />
- GraphQL query <br />
```graphql
{
  workbooksConnection(first: 15, offset: 0, filter: {projectNameWithin: ["default", "Project 2"]}) {
    nodes {
      .....
      dashboards {
        id
        name
        path
        createdAt
        updatedAt
        sheets {
          id
          name
        }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
```
### Sheet
Sheets from Tableau are ingested as Charts in DataHub. <br />
- GraphQL query <br />
```graphql
{
  workbooksConnection(first: 10, offset: 0, filter: {projectNameWithin: ["default"]}) {
    .....
      sheets {
        id
        name
        path
        createdAt
        updatedAt
        tags {
          name
        }
        containedInDashboards {
          name
          path
        }
        upstreamDatasources {
          id
          name
        }
        datasourceFields {
          __typename
          id
          name
          description
          upstreamColumns {
            name
          }
          ... on ColumnField {
            dataCategory
            role
            dataType
            aggregation
          }
          ... on CalculatedField {
            role
            dataType
            aggregation
            formula
          }
          ... on GroupField {
            role
            dataType
          }
          ... on DatasourceField {
            remoteField {
              __typename
              id
              name
              description
              folderName
              ... on ColumnField {
                dataCategory
                role
                dataType
                aggregation
              }
              ... on CalculatedField {
                role
                dataType
                aggregation
                formula
              }
              ... on GroupField {
                role
                dataType
              }
            }
          }
        }
      }
    }
    .....
  }
}
```
### Embedded Data Source
Embedded data sources from Tableau are ingested as Datasets in DataHub. <br />
- GraphQL query <br />
```graphql
{
  workbooksConnection(first: 15, offset: 0, filter: {projectNameWithin: ["default"]}) {
    nodes {
      ....
      embeddedDatasources {
        __typename
        id
        name
        hasExtracts
        extractLastRefreshTime
        extractLastIncrementalUpdateTime
        extractLastUpdateTime
        upstreamDatabases {
          id
          name
          connectionType
          isEmbedded
        }
        upstreamTables {
          name
          schema
          columns {
            name
            remoteType
          }
        }
        fields {
          __typename
          id
          name
          description
          isHidden
          folderName
          ... on ColumnField {
            dataCategory
            role
            dataType
            defaultFormat
            aggregation
            columns {
              table {
                ... on CustomSQLTable {
                  id
                  name
                }
              }
            }
          }
          ... on CalculatedField {
            role
            dataType
            defaultFormat
            aggregation
            formula
          }
          ... on GroupField {
            role
            dataType
          }
        }
        upstreamDatasources {
          id
          name
        }
        workbook {
          name
          projectName
        }
      }
    }
    ....
  }
}
```
### Published Data Source
Published data sources from Tableau are ingested as Datasets in DataHub. <br />
- GraphQL query <br />
```graphql
{
  publishedDatasourcesConnection(filter: {idWithin: ["00cce29f-b561-bb41-3557-8e19660bb5dd", "618c87db-5959-338b-bcc7-6f5f4cc0b6c6"]}) {
    nodes {
      __typename
      id
      name
      hasExtracts
      extractLastRefreshTime
      extractLastIncrementalUpdateTime
      extractLastUpdateTime
      downstreamSheets {
        id
        name
      }
      upstreamTables {
        name
        schema
        fullName
        connectionType
        description
        contact {
          name
        }
      }
      fields {
        __typename
        id
        name
        description
        isHidden
        folderName
        ... on ColumnField {
          dataCategory
          role
          dataType
          defaultFormat
          aggregation
          columns {
            table {
              ... on CustomSQLTable {
                id
                name
              }
            }
          }
        }
        ... on CalculatedField {
          role
          dataType
          defaultFormat
          aggregation
          formula
        }
        ... on GroupField {
          role
          dataType
        }
      }
      owner {
        username
      }
      description
      uri
      projectName
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
```
### Custom SQL Data Source
For Custom SQL data sources, the query is viewable in the UI under the View Definition tab. <br />
- GraphQL query <br />
```graphql
{
  customSQLTablesConnection(filter: {idWithin: ["22b0b4c3-6b85-713d-a161-5a87fdd78f40"]}) {
    nodes {
      id
      name
      query
      columns {
        id
        name
        remoteType
        description
        referencedByFields {
          datasource {
            id
            name
            upstreamDatabases {
              id
              name
            }
            upstreamTables {
              id
              name
              schema
              connectionType
              columns {
                id
              }
            }
            ... on PublishedDatasource {
              projectName
            }
            ... on EmbeddedDatasource {
              workbook {
                name
                projectName
              }
            }
          }
        }
      }
      tables {
        id
        name
        schema
        connectionType
      }
    }
  }
}
```
### Lineage
Lineage is emitted as received from Tableau's Metadata API for:
- Sheets contained in Dashboard
- Embedded or Published datasources upstream to Sheet
- Published datasources upstream to Embedded datasource
- Tables upstream to Embedded or Published datasource
- Custom SQL datasources upstream to Embedded or Published datasource
- Tables upstream to Custom SQL datasource
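
As a rough illustration of the first two relationships in this list, the Python sketch below turns response nodes shaped like the Sheet and Dashboard queries into `(upstream, downstream)` pairs. The function names are hypothetical and the pairs use raw Tableau IDs, whereas the connector itself works with DataHub URNs; this is only a sketch of the idea.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[str, str]  # (upstream id, downstream id)


def sheet_edges(sheet: Dict[str, Any]) -> List[Edge]:
    """Data sources listed under upstreamDatasources feed the sheet."""
    return [(ds["id"], sheet["id"]) for ds in sheet.get("upstreamDatasources", [])]


def dashboard_edges(dashboard: Dict[str, Any]) -> List[Edge]:
    """Sheets listed under a dashboard are contained in that dashboard."""
    return [(s["id"], dashboard["id"]) for s in dashboard.get("sheets", [])]


if __name__ == "__main__":
    sheet = {"id": "sheet-1", "upstreamDatasources": [{"id": "ds-1", "name": "Orders"}]}
    dashboard = {"id": "dash-1", "sheets": [{"id": "sheet-1", "name": "Sales"}]}
    print(sheet_edges(sheet) + dashboard_edges(dashboard))
```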
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
```yml
source:
  type: tableau
  config:
    # Coordinates
    connect_uri: https://prod-ca-a.online.tableau.com
    site: acryl
    projects: ["default", "Project 2"]

    # Credentials
    username: username@acrylio.com
    password: pass

    # Options
    ingest_tags: True
    ingest_owner: True
    default_schema_map:
      mydatabase: public
      anotherdatabase: anotherschema

sink:
  # sink configs
```
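
The same recipe can also be expressed as a Python dictionary and run programmatically. The sketch below assumes DataHub's `Pipeline` class from `datahub.ingestion.run.pipeline` and a `datahub-rest` sink; the `build_recipe` and `run_recipe` helpers and the sink server URL are illustrative choices, not part of this connector.

```python
from typing import Any, Dict, List


def build_recipe(
    connect_uri: str,
    site: str,
    projects: List[str],
    username: str,
    password: str,
    sink_server: str,  # assumed: URL of a DataHub GMS endpoint
) -> Dict[str, Any]:
    """Build a recipe dict mirroring the YAML quickstart recipe."""
    return {
        "source": {
            "type": "tableau",
            "config": {
                "connect_uri": connect_uri,
                "site": site,
                "projects": projects,
                "username": username,
                "password": password,
                "ingest_tags": True,
                "ingest_owner": True,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": sink_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; requires acryl-datahub[tableau] to be installed."""
    # Imported lazily so build_recipe stays usable without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```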
## Config details
| Field                  | Required | Default   | Description                                                                                                            |
|------------------------|----------|-----------|------------------------------------------------------------------------------------------------------------------------|
| `connect_uri`          | ✅        |           | Tableau host URL.                                                                                                      |
| `site`                 |          | `""`      | Tableau Site. Always required for Tableau Online. Use an empty string `""` to connect to the Default site on Tableau Server. |
| `env`                  |          | `"PROD"`  | Environment to use in namespace when constructing URNs.                                                                |
| `username`             |          |           | Tableau username, must be set if authenticating using username/password.                                               |
| `password`             |          |           | Tableau password, must be set if authenticating using username/password.                                               |
| `token_name`           |          |           | Tableau token name, must be set if authenticating using a personal access token.                                       |
| `token_value`          |          |           | Tableau token value, must be set if authenticating using a personal access token.                                      |
| `projects`             |          | `default` | List of Tableau projects to ingest.                                                                                    |
| `workbooks_page_size`  |          | `10`      | Number of workbooks to query at a time using the Tableau API.                                                          |
| `default_schema_map` * |          |           | Default schema to use when the schema is not found.                                                                    |
| `ingest_tags`          |          | `False`   | Ingest Tags from source. This will override Tags entered from the UI.                                                  |
| `ingest_owner`         |          | `False`   | Ingest Owner from source. This will override Owner info entered from the UI.                                           |

*Tableau may not provide a schema name when ingesting a Custom SQL data source. Use `default_schema_map` to provide a default
schema name to use when constructing a table URN.
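
To make the fallback concrete, here is a small Python sketch of how a default schema could be applied when Tableau omits one. `qualified_table_name` is a hypothetical helper, and the dotted name it returns is only an illustration; the connector's actual URN construction may differ.

```python
from typing import Dict, Optional


def qualified_table_name(
    database: str,
    schema: Optional[str],
    table: str,
    default_schema_map: Dict[str, str],
) -> str:
    """Join database, schema and table, falling back to the mapped default schema."""
    # Use the schema reported by Tableau if present, else the per-database default.
    effective_schema = schema or default_schema_map.get(database)
    parts = [database, effective_schema, table]
    # Skip the schema entirely when neither Tableau nor the map provides one.
    return ".".join(p for p in parts if p)


if __name__ == "__main__":
    schema_map = {"mydatabase": "public", "anotherdatabase": "anotherschema"}
    print(qualified_table_name("mydatabase", None, "orders", schema_map))
```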
### Authentication
Currently, Tableau authentication is supported using either a username and password
or a personal access token. For more information on Tableau authentication, refer to the [How to Authenticate](https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_auth.html) guide.
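
A recipe therefore needs one complete credential pair: `username` and `password`, or `token_name` and `token_value`. The Python sketch below shows one way to check a config dictionary before running ingestion. `auth_method` is a hypothetical helper, and rejecting a config that supplies both pairs is a choice made for this sketch, not documented connector behavior.

```python
from typing import Any, Dict


def auth_method(config: Dict[str, Any]) -> str:
    """Return "password" or "token" depending on which credential pair is complete."""
    has_password = bool(config.get("username") and config.get("password"))
    has_token = bool(config.get("token_name") and config.get("token_value"))
    if has_password and has_token:
        # Sketch-level assumption: treat two complete pairs as ambiguous.
        raise ValueError("Set either username/password or token_name/token_value, not both.")
    if has_password:
        return "password"
    if has_token:
        return "token"
    raise ValueError("Set username/password or token_name/token_value.")
```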
## Compatibility
Works with Tableau Server version 2021.1.10 and above. It may also work with older versions.
## Questions
If you've got any questions on configuring this source, feel free to ping us on
[our Slack](https://slack.datahubproject.io/)!