Query Usage

The Query Usage workflow uses the query-parser processor.

After running a Metadata Ingestion workflow, we can run the Query Usage workflow. The serviceName must be the same one that was used in Metadata Ingestion, so that the ingestion bot can get the serviceConnection details from the server.

1. Define the YAML Config

This is a sample config for the Query Usage workflow:

{% codePreview %}

{% codeInfoContainer %}

{% codeInfo srNumber=25 %}

Source Configuration - Source Config

You can find all the definitions and types for the sourceConfig here.

queryLogDuration: Number of days to look back in the query logs when processing usage data.

{% /codeInfo %}

{% codeInfo srNumber=26 %}

stageFileLocation: Temporary location to store the query logs before processing. An absolute path is required.

{% /codeInfo %}

{% codeInfo srNumber=27 %}

resultLimit: Maximum number of query log entries to fetch.

{% /codeInfo %}

{% codeInfo srNumber=28 %}

queryLogFilePath: Path to a file containing the queries, used when we want to pass a file instead of getting the query logs from the database.

{% /codeInfo %}

{% codeInfo srNumber=29 %}

Processor, Stage and Bulk Sink Configuration

These sections specify where the staging files will be located.

Note that the location is a directory that will be cleaned at the end of the ingestion.

{% /codeInfo %}

{% partial file="/v1.5/connectors/yaml/workflow-config-def.md" /%}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

source:
  type: {% $connector %}-usage
  serviceName: <service name>
  sourceConfig:
    config:
      type: DatabaseUsage
      # Number of days to look back
      queryLogDuration: 7
      # This is a directory that will be DELETED after the usage runs
      stageFileLocation: <path to store the stage file>
      # resultLimit: 1000
      # If instead of getting the query logs from the database we want to pass a file with the queries
      # queryLogFilePath: path-to-file
processor:
  type: query-parser
  config: {}
stage:
  type: table-usage
  config:
    filename: /tmp/athena_usage
bulkSink:
  type: metadata-usage
  config:
    filename: /tmp/athena_usage

{% partial file="/v1.5/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}
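Before running the workflow, it can help to sanity-check the config once it has been parsed into a dictionary. The following is a minimal sketch and not part of OpenMetadata: check_usage_config is a hypothetical helper, the service name and paths in sample are placeholders, and the rule that stage and bulkSink share one filename is an assumption drawn from the sample config, where both point to the same file.

```python
from pathlib import PurePosixPath


def check_usage_config(config: dict) -> list:
    """Return a list of problems found in a parsed usage workflow config."""
    problems = []
    source_cfg = config.get("source", {}).get("sourceConfig", {}).get("config", {})

    # The sample config sets the sourceConfig type to DatabaseUsage.
    if source_cfg.get("type") != "DatabaseUsage":
        problems.append("sourceConfig.config.type should be DatabaseUsage")

    # The docs state that stageFileLocation requires an absolute path.
    stage_location = source_cfg.get("stageFileLocation")
    if not stage_location or not PurePosixPath(stage_location).is_absolute():
        problems.append("stageFileLocation must be an absolute path")

    # The Query Usage workflow uses the query-parser processor.
    if config.get("processor", {}).get("type") != "query-parser":
        problems.append("processor.type should be query-parser")

    # Assumption: stage and bulkSink point to the same file, as in the sample.
    stage_file = config.get("stage", {}).get("config", {}).get("filename")
    sink_file = config.get("bulkSink", {}).get("config", {}).get("filename")
    if stage_file is None or stage_file != sink_file:
        problems.append("stage and bulkSink should use the same filename")

    return problems


# Placeholder values; replace them with your own service name and paths.
sample = {
    "source": {
        "type": "bigquery-usage",
        "serviceName": "my_bigquery_service",
        "sourceConfig": {
            "config": {
                "type": "DatabaseUsage",
                "queryLogDuration": 7,
                "stageFileLocation": "/tmp/query_log",
            }
        },
    },
    "processor": {"type": "query-parser", "config": {}},
    "stage": {"type": "table-usage", "config": {"filename": "/tmp/bigquery_usage"}},
    "bulkSink": {"type": "metadata-usage", "config": {"filename": "/tmp/bigquery_usage"}},
}

print(check_usage_config(sample))  # prints []
```

The helper takes a plain dictionary so that it does not depend on any particular YAML parser; load the file with the parser of your choice and pass in the result.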

2. Run with the CLI

After saving the YAML config, we will run the command the same way we did for the metadata ingestion:

metadata usage -c <path-to-yaml>
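If the usage workflow needs to be launched from a script, for example by a scheduler, the same command can be wrapped in Python. This is a rough sketch that assumes the metadata CLI is installed and on the PATH; build_usage_command and run_usage are hypothetical helper names, and the config path is a placeholder.

```python
import subprocess


def build_usage_command(config_path: str) -> list:
    """Build the argument list for the usage command shown in this section."""
    return ["metadata", "usage", "-c", config_path]


def run_usage(config_path: str) -> int:
    """Run the usage workflow and return the exit code of the CLI."""
    completed = subprocess.run(build_usage_command(config_path), check=False)
    return completed.returncode


# Only the command is printed here; call run_usage(...) to actually execute it.
print(build_usage_command("/tmp/usage.yaml"))
# prints ['metadata', 'usage', '-c', '/tmp/usage.yaml']
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.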