The Collate Ingestion Agent supports metadata ingestion in hybrid deployments: it lets organizations securely push metadata from their own infrastructure into the Collate platform without exposing internal systems. Ingestion workflows run inside your network, so you keep full control over data processing. This document covers how to set up and use the agent in a hybrid environment.
### Overview
The Collate Ingestion Agent is intended for scenarios where connectors must run on-premises. Because the connectors run inside your infrastructure, your source systems are never exposed directly to the Collate platform, and metadata is processed within your own network before it is pushed to Collate.
With the Collate Ingestion Agent, you can:
- Set up ingestion workflows easily without configuring YAML files manually.
- Leverage the Collate UI for a seamless and user-friendly experience.
- Manage various ingestion types, including metadata, profiling, lineage, usage, dbt, and data quality.
### Setting Up the Collate Ingestion Agent
#### 1. Prepare Your Environment
To begin, download the Collate-provided Docker image for the Ingestion Agent. The Collate team will provide the necessary credentials to authenticate and pull the image from the repository.
**Complete the following steps:**
- **Log in to Docker**: Use the credentials provided by Collate to authenticate.
- **Pull the Docker Image**: Run the command to pull the image into your local environment.
Once the image is downloaded, you can start the Docker container to initialize the Ingestion Agent.
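The steps above can be sketched as a small Python helper that shells out to the Docker CLI. This is only an illustration: the registry, image name, and port numbers are hypothetical placeholders (Collate supplies the real registry, image, and credentials, and this document does not state which port the agent UI listens on). Only the standard `docker login`, `docker pull`, and `docker run` commands are assumed.

```python
from __future__ import annotations

import subprocess

# Hypothetical placeholders: replace with the values provided by Collate.
REGISTRY = "<registry-provided-by-collate>"
IMAGE = "<image-provided-by-collate>"


def docker_login_cmd(registry: str, username: str) -> list[str]:
    """Build a `docker login` command that reads the password from stdin."""
    return ["docker", "login", "--username", username, "--password-stdin", registry]


def docker_pull_cmd(image: str) -> list[str]:
    """Build the command that pulls the agent image."""
    return ["docker", "pull", image]


def docker_run_cmd(image: str, host_port: int, container_port: int) -> list[str]:
    """Build a command that starts the container detached and publishes one port."""
    return ["docker", "run", "--detach", "--publish", f"{host_port}:{container_port}", image]


def prepare_agent(username: str, password: str, host_port: int, container_port: int) -> None:
    """Log in, pull the image, and start the container; raises if any step fails."""
    # The password is passed on stdin so it does not appear in the process list.
    subprocess.run(docker_login_cmd(REGISTRY, username), input=password.encode(), check=True)
    subprocess.run(docker_pull_cmd(IMAGE), check=True)
    subprocess.run(docker_run_cmd(IMAGE, host_port, container_port), check=True)
```

Keeping the command builders separate from `prepare_agent` makes it easy to print and review the exact commands before running them.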
#### 2. Configure the Agent
##### Access the Local Agent UI:
- Open your browser and navigate to the local instance of the Collate Ingestion Agent.
##### Set Up the Connection:
- Enter your Collate platform URL (e.g., `https://<your-company>.collate.com/api`).
- Add the ingestion bot token from the Collate settings under **Settings > Bots > Ingestion Bot**.
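Before pasting the two connection values into the agent UI, it can help to sanity-check them. The sketch below is a hypothetical helper, not part of the agent: `AgentConnection` and `build_connection` are illustrative names, and the checks (HTTPS scheme, path ending in `/api`) are assumptions taken from the example URL above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentConnection:
    """Illustrative container for the two values the agent UI asks for."""
    platform_url: str
    bot_token: str = field(repr=False)  # keep the token out of logs and reprs


def build_connection(platform_url: str, bot_token: str) -> AgentConnection:
    """Normalize and check the connection values; raise ValueError on bad input."""
    url = platform_url.strip().rstrip("/")
    parsed = urlparse(url)
    # Assumption: the platform URL is HTTPS and ends in /api, as in the example.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("platform URL should look like https://<your-company>.collate.com/api")
    if not parsed.path.endswith("/api"):
        raise ValueError("platform URL is expected to end in /api")
    token = bot_token.strip()
    if not token:
        raise ValueError("ingestion bot token is empty")
    return AgentConnection(platform_url=url, bot_token=token)
```

For example, `build_connection(" https://acme.collate.com/api/ ", token)` returns a connection whose `platform_url` is `https://acme.collate.com/api`, with surrounding whitespace and the trailing slash removed (`acme` is a made-up company name).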
##### Verify Services:
- Open the Collate UI alongside the Ingestion Agent interface and confirm that every service available in Collate (e.g., databases) also appears in the agent.
#### 3. Add a New Service
1. Navigate to the **Database Services** section in the Ingestion Agent UI.
2. Click **Add New Service** and select the database type (e.g., Redshift).
3. Enter the necessary service configuration:
   - **Service Name**: A unique name for the database service.
   - **Host and Port**: Connection details for the database.
   - **Username and Password**: Credentials to access the database.
   - **Database Name**: The target database for ingestion.
4. Test the connection to ensure the service is properly configured.
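The service form in step 3 collects a small, fixed set of values, which can be modelled as follows. This is a sketch for gathering and checking the values before typing them into the UI; the class and field names are hypothetical and do not reflect the agent's internal schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatabaseServiceConfig:
    """Illustrative mirror of the fields requested by the Add New Service form."""
    service_name: str
    host: str
    port: int
    username: str
    password: str = field(repr=False)  # never show the password when printing
    database: str = ""

    @property
    def host_port(self) -> str:
        """Host and port joined as host:port."""
        return f"{self.host}:{self.port}"

    def problems(self) -> list[str]:
        """Return a list of human-readable issues; an empty list means the form is complete."""
        issues = []
        for name in ("service_name", "host", "username", "password", "database"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        return issues
```

A Redshift service, for instance, would typically use port 5439 (the Redshift default); the host, user, and database names remain specific to your environment.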
#### 4. Run Metadata Ingestion
1. After creating the service, navigate to the **Ingestion** tab and click **Add Ingestion**.
2. Select the ingestion type (e.g., metadata) and specify any additional configurations:
   - Include specific schemas or tables.
   - Enable options like DDL inclusion if required.
3. Choose whether to:
   - Run the ingestion immediately via the agent.
   - Download the YAML configuration file for running ingestion on an external scheduler.
4. Monitor the logs in real time to track the ingestion process.
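If you choose to download the YAML file in step 3, an external scheduler needs a command to run it. The sketch below wraps that step in Python. It assumes the ingestion package installed on the scheduler host exposes the OpenMetadata-style `metadata ingest -c <file>` command line; confirm this against the connector requirements for your installation. The function names are illustrative.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ingest_cmd(config_path: Path) -> list[str]:
    """Build the CLI call for a downloaded workflow YAML (assumed CLI: `metadata ingest`)."""
    return ["metadata", "ingest", "-c", str(config_path)]


def run_ingestion(config_path: Path) -> int:
    """Run the workflow once and return the process exit code (0 means success)."""
    if not config_path.is_file():
        raise FileNotFoundError(f"workflow file not found: {config_path}")
    completed = subprocess.run(ingest_cmd(config_path))
    return completed.returncode
```

A scheduler such as cron or an orchestration tool can call `run_ingestion` on whatever cadence you need and alert on a non-zero return code.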
#### 5. Verify Ingested Data
1. Return to the Collate platform and refresh the database service.
2. Verify that the ingested metadata, including schemas, tables, and column details, is available.
3. Explore additional ingestion options like profiling, lineage, or data quality for the service.
### Additional Features
The Collate Ingestion Agent supports various ingestion workflows, allowing you to:
- **Generate YAML Configurations**: Download YAML files for external scheduling.
- **Manage Ingestion Types**: Run metadata, profiling, lineage, usage, and other workflows as needed.
- **Monitor Progress**: View logs and monitor real-time ingestion activity.
To install it, first determine which connector you want to run from your laptop.
For example, for Snowflake, review the requirements in the connector documentation: https://docs.getcollate.io/connectors/database/snowflake/yaml#requirements