The DataHub MCP Server implements the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction), which standardizes how applications provide context to LLMs and AI agents. This enables AI agents to query DataHub metadata and use it to find relevant assets, traverse lineage, and more.
Want to learn more about the motivation, architecture, and advanced use cases? Check out our [deep dive blog post](https://datahub.com/blog/datahub-mcp-server-block-ai-agents-use-case/).
There are two ways to use the MCP server. They differ in the setup required but offer the same capabilities.
- [Managed MCP Server](#managed-mcp-server-usage) - Available on DataHub Cloud v0.3.12+
- [Self-Hosted MCP Server](#self-hosted-mcp-server-usage) - Available for DataHub Core
## Capabilities
**Search for Data** <br/>
Find the right data to use for your projects & analysis by asking questions in plain English - skip the tribal knowledge.
**Dive Deeper** <br/>
Separate the signal from noise with rich context about your data, including usage, ownership, documentation, tags, and quality.
**Lineage & Impact Analysis** <br/>
Understand the impact of upcoming changes to tables, reports, and dashboards using DataHub’s end-to-end lineage graph.
**Query Analysis & Authoring** <br/>
Understand how your mission-critical data is typically queried, or build custom queries for your tables.
**Works Where You Work** <br/>
Seamlessly integrates with AI-native tools like Cursor, Windsurf, Claude Desktop, and OpenAI to supercharge your workflows.
With DataHub MCP Server, you can instantly give AI agents visibility into your entire data ecosystem. Find and understand data stored in your databases, data lake, data warehouse, BI visualization tools, and AI/ML feature stores. Explore data lineage, understand usage & use cases, identify the data experts, and generate SQL - all through natural language.
### **Structured Search with Context Filtering**
Go beyond keyword matching with powerful query & filtering syntax:
- Field searches: `/q tag:PII` finds all PII-tagged data
- Boolean logic: `/q (sales OR revenue) AND quarterly` for complex queries
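
As a rough illustration, the two query shapes above can be assembled programmatically before being handed to an agent or to the `search` tool. This is a minimal sketch: the helper names `field_query` and `boolean_query` are hypothetical and are not part of DataHub or its MCP server. They only reproduce the `/q` strings shown in the list.

```python
def field_query(field: str, value: str) -> str:
    """Build a field search such as `/q tag:PII`."""
    return f"/q {field}:{value}"


def boolean_query(any_of: list[str], all_of: list[str]) -> str:
    """Build a query of the form `/q (a OR b) AND c`.

    `any_of` terms are alternatives grouped in parentheses;
    `all_of` terms must all match.
    """
    parts = []
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts.extend(all_of)
    return "/q " + " AND ".join(parts)


print(field_query("tag", "PII"))
print(boolean_query(["sales", "revenue"], ["quarterly"]))
```

Keeping query construction in small helpers like these makes it easier to log or review exactly what an agent searched for.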
### **SQL Intelligence & Query Generation**
Access popular SQL queries, and generate new ones with accuracy:
- See how analysts query tables (perfect for SQL generation)
- Understand join patterns and common filters
- Learn from production query patterns
### **Table & Column-Level Lineage**
Trace data flow at both the table and column level:
- Track how `user_id` becomes `customer_key` downstream
- Understand transformation logic
- Upstream and downstream exploration (1-3+ hops)
- Handle enterprise-scale lineage graphs
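
To make the idea of hop-limited exploration concrete, here is a small sketch of a breadth-first traversal over a toy lineage graph. The graph, the asset names, and the `downstream_within` function are all invented for illustration. The actual traversal is performed by DataHub, and this does not reflect its implementation.

```python
from collections import deque

# Toy downstream edges (hypothetical asset names, not real DataHub URNs).
DOWNSTREAM = {
    "raw.users": ["staging.users"],
    "staging.users": ["marts.customers"],
    "marts.customers": ["dashboard.churn"],
    "dashboard.churn": [],
}


def downstream_within(start: str, max_hops: int, edges=DOWNSTREAM) -> dict[str, int]:
    """Return every asset reachable from `start` in at most `max_hops` hops,
    mapped to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for child in edges.get(node, []):
            if child not in hops:
                hops[child] = hops[node] + 1
                queue.append(child)
    hops.pop(start)  # the starting asset is not its own downstream
    return hops


print(downstream_within("raw.users", 2))
```

Limiting the number of hops keeps results manageable on large graphs, where a full traversal could return thousands of assets.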
## Tools
The DataHub MCP Server provides the following tools:
`search`
Search DataHub using structured keyword search (/q syntax) with boolean logic, filters, pagination, and optional sorting by usage metrics.
`get_lineage`
Retrieve upstream or downstream lineage for any entity (datasets, columns, dashboards, etc.) with filtering, query-within-lineage, pagination, and hop control.
`get_dataset_queries`
Fetch real SQL queries (manual or system-generated) that reference a dataset or column, to understand usage patterns, joins, filters, and aggregation behavior.
`get_entities`
Fetch detailed metadata for one or more entities by URN; supports batch retrieval for efficient inspection of search results.
`list_schema_fields`
List schema fields for a dataset with keyword filtering and pagination, useful when search results truncate fields or when exploring large schemas.
`get_lineage_paths_between`
Retrieve the exact lineage paths between two assets or columns, including intermediate transformations and SQL query information.
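
MCP clients normally issue tool calls for you, but it can help to see what one looks like on the wire. The sketch below builds a JSON-RPC `tools/call` request, the message shape defined by the Model Context Protocol, for the `search` tool. The argument name `query` is an assumption made for illustration. Check the tool's published input schema (returned by `tools/list`) for the real parameter names.

```python
import json


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is a hypothetical argument name, not confirmed by this document.
request = tools_call_request(1, "search", {"query": "/q tag:PII"})
print(json.dumps(request, indent=2))
```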
The managed MCP server endpoint is only available with DataHub Cloud v0.3.12+. For DataHub Core and older versions of DataHub Cloud, you'll need to [self-host the MCP server](#self-hosted-mcp-server-usage).
There are two [transport types](https://modelcontextprotocol.io/docs/concepts/transports) for remote MCP servers: streamable HTTP and server-sent events (SSE). SSE has been deprecated in favor of streamable HTTP, so DataHub supports only the newer streamable HTTP transport. Some older MCP clients (e.g. chatgpt.com) may still support only SSE. In those cases, you'll need a bridge such as [mcp-remote](https://github.com/geelen/mcp-remote).
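
For clients that launch MCP servers as local commands, a bridge entry might look like the output of the sketch below. It assumes the `mcpServers` configuration layout used by several desktop clients and the usual `npx mcp-remote <url>` invocation. The server name `datahub` and the URL are placeholders, so substitute the endpoint for your own deployment.

```python
import json

# Placeholder only: replace with your actual MCP server endpoint.
MCP_URL = "https://<your-datahub-mcp-endpoint>"


def mcp_remote_entry(url: str) -> dict:
    """Describe a local command that proxies to a remote MCP server via mcp-remote."""
    return {"command": "npx", "args": ["-y", "mcp-remote", url]}


# "datahub" is an arbitrary label chosen for this example.
config = {"mcpServers": {"datahub": mcp_remote_entry(MCP_URL)}}
print(json.dumps(config, indent=2))
```

The printed JSON can then be merged into the client's MCP configuration file, whose location depends on the client.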