feat(docs): refactor source and sink ingestion docs (#3031)
This commit is contained in:
parent a7ea888612
commit 32b8fc6108
@ -159,6 +159,13 @@ function markdown_guess_title(
|
||||
} else {
|
||||
// Find first h1 header and use it as the title.
|
||||
const headers = contents.content.match(/^# (.+)$/gm);
|
||||
|
||||
if (!headers) {
|
||||
throw new Error(
|
||||
`${filepath} must have at least one h1 header for setting the title`
|
||||
);
|
||||
}
|
||||
|
||||
if (headers.length > 1 && contents.content.indexOf("```") < 0) {
|
||||
throw new Error(`too many h1 headers in ${filepath}`);
|
||||
}
|
||||
|
||||
@ -55,6 +55,14 @@ module.exports = {
|
||||
"docs/architecture/metadata-serving",
|
||||
//"docs/what/gms",
|
||||
],
|
||||
"Metadata Ingestion": [
|
||||
{
|
||||
Sources: list_ids_in_directory("metadata-ingestion/source_docs"),
|
||||
},
|
||||
{
|
||||
Sinks: list_ids_in_directory("metadata-ingestion/sink_docs"),
|
||||
},
|
||||
],
|
||||
"Metadata Modeling": [
|
||||
"docs/modeling/metadata-model",
|
||||
"docs/modeling/extending-the-metadata-model",
|
||||
|
||||
@ -40,7 +40,7 @@ Our open sourcing [blog post](https://engineering.linkedin.com/blog/2020/open-so
|
||||
- **Schema history**: view and diff historic versions of schemas
|
||||
- **GraphQL**: visualization of GraphQL schemas
|
||||
|
||||
### Jos/flows [*coming soon*]
|
||||
### Jobs/flows [*coming soon*]
|
||||
- **Search**: full-text & advanced search, search ranking
|
||||
- **Browse**: browsing through a configurable hierarchy
|
||||
- **Basic information**:
|
||||
|
||||
@ -28,38 +28,47 @@ If you run into an error, try checking the [_common setup issues_](./developing.
|
||||
|
||||
#### Installing Plugins
|
||||
|
||||
We use a plugin architecture so that you can install only the dependencies you actually need.
|
||||
We use a plugin architecture so that you can install only the dependencies you actually need. Click the plugin name to learn more about the specific source recipe and any FAQs!
|
||||
|
||||
| Plugin Name | Install Command | Provides |
|
||||
| --------------- | ---------------------------------------------------------- | ----------------------------------- |
|
||||
| file | _included by default_ | File source and sink |
|
||||
| console | _included by default_ | Console sink |
|
||||
| athena | `pip install 'acryl-datahub[athena]'` | AWS Athena source |
|
||||
| bigquery | `pip install 'acryl-datahub[bigquery]'` | BigQuery source |
|
||||
| bigquery-usage | `pip install 'acryl-datahub[bigquery-usage]'` | BigQuery usage statistics source |
|
||||
| feast | `pip install 'acryl-datahub[feast]'` | Feast source |
|
||||
| glue | `pip install 'acryl-datahub[glue]'` | AWS Glue source |
|
||||
| hive | `pip install 'acryl-datahub[hive]'` | Hive source |
|
||||
| mssql | `pip install 'acryl-datahub[mssql]'` | SQL Server source |
|
||||
| mysql | `pip install 'acryl-datahub[mysql]'` | MySQL source |
|
||||
| oracle | `pip install 'acryl-datahub[oracle]'` | Oracle source |
|
||||
| postgres | `pip install 'acryl-datahub[postgres]'` | Postgres source |
|
||||
| redshift | `pip install 'acryl-datahub[redshift]'` | Redshift source |
|
||||
| sagemaker | `pip install 'acryl-datahub[sagemaker]'` | AWS SageMaker source |
|
||||
| sqlalchemy | `pip install 'acryl-datahub[sqlalchemy]'` | Generic SQLAlchemy source |
|
||||
| snowflake | `pip install 'acryl-datahub[snowflake]'` | Snowflake source |
|
||||
| snowflake-usage | `pip install 'acryl-datahub[snowflake-usage]'` | Snowflake usage statistics source |
|
||||
| sql-profiles | `pip install 'acryl-datahub[sql-profiles]'` | Data profiles for SQL-based systems |
|
||||
| superset | `pip install 'acryl-datahub[superset]'` | Superset source |
|
||||
| mongodb | `pip install 'acryl-datahub[mongodb]'` | MongoDB source |
|
||||
| ldap | `pip install 'acryl-datahub[ldap]'` ([extra requirements]) | LDAP source |
|
||||
| looker | `pip install 'acryl-datahub[looker]'` | Looker source |
|
||||
| lookml | `pip install 'acryl-datahub[lookml]'` | LookML source, requires Python 3.7+ |
|
||||
| kafka | `pip install 'acryl-datahub[kafka]'` | Kafka source |
|
||||
| druid | `pip install 'acryl-datahub[druid]'` | Druid Source |
|
||||
| dbt | `pip install 'acryl-datahub[dbt]'` | dbt source |
|
||||
| datahub-rest | `pip install 'acryl-datahub[datahub-rest]'` | DataHub sink over REST API |
|
||||
| datahub-kafka | `pip install 'acryl-datahub[datahub-kafka]'` | DataHub sink over Kafka |
|
||||
Sources:
|
||||
|
||||
| Plugin Name | Install Command | Provides |
|
||||
| ----------------------------------------------- | ---------------------------------------------------------- | ----------------------------------- |
|
||||
| [file](./source_docs/file.md) | _included by default_ | File source and sink |
|
||||
| [athena](./source_docs/athena.md) | `pip install 'acryl-datahub[athena]'` | AWS Athena source |
|
||||
| [bigquery](./source_docs/bigquery.md) | `pip install 'acryl-datahub[bigquery]'` | BigQuery source |
|
||||
| [bigquery-usage](./source_docs/bigquery.md) | `pip install 'acryl-datahub[bigquery-usage]'` | BigQuery usage statistics source |
|
||||
| [dbt](./source_docs/dbt.md) | _no additional dependencies_ | dbt source |
|
||||
| [druid](./source_docs/druid.md) | `pip install 'acryl-datahub[druid]'` | Druid source |
|
||||
| [feast](./source_docs/feast.md) | `pip install 'acryl-datahub[feast]'` | Feast source |
|
||||
| [glue](./source_docs/glue.md) | `pip install 'acryl-datahub[glue]'` | AWS Glue source |
|
||||
| [hive](./source_docs/hive.md) | `pip install 'acryl-datahub[hive]'` | Hive source |
|
||||
| [kafka](./source_docs/kafka.md) | `pip install 'acryl-datahub[kafka]'` | Kafka source |
|
||||
| [kafka-connect](./source_docs/kafka-connect.md) | `pip install 'acryl-datahub[kafka-connect]'` | Kafka connect source |
|
||||
| [ldap](./source_docs/ldap.md) | `pip install 'acryl-datahub[ldap]'` ([extra requirements]) | LDAP source |
|
||||
| [looker](./source_docs/looker.md) | `pip install 'acryl-datahub[looker]'` | Looker source |
|
||||
| [lookml](./source_docs/lookml.md) | `pip install 'acryl-datahub[lookml]'` | LookML source, requires Python 3.7+ |
|
||||
| [mongodb](./source_docs/mongodb.md) | `pip install 'acryl-datahub[mongodb]'` | MongoDB source |
|
||||
| [mssql](./source_docs/mssql.md) | `pip install 'acryl-datahub[mssql]'` | SQL Server source |
|
||||
| [mysql](./source_docs/mysql.md) | `pip install 'acryl-datahub[mysql]'` | MySQL source |
|
||||
| [oracle](./source_docs/oracle.md) | `pip install 'acryl-datahub[oracle]'` | Oracle source |
|
||||
| [postgres](./source_docs/postgres.md) | `pip install 'acryl-datahub[postgres]'` | Postgres source |
|
||||
| [redshift](./source_docs/redshift.md) | `pip install 'acryl-datahub[redshift]'` | Redshift source |
|
||||
| [sagemaker](./source_docs/sagemaker.md) | `pip install 'acryl-datahub[sagemaker]'` | AWS SageMaker source |
|
||||
| [snowflake](./source_docs/snowflake.md) | `pip install 'acryl-datahub[snowflake]'` | Snowflake source |
|
||||
| [snowflake-usage](./source_docs/snowflake.md) | `pip install 'acryl-datahub[snowflake-usage]'` | Snowflake usage statistics source |
|
||||
| sql-profiles | `pip install 'acryl-datahub[sql-profiles]'` | Data profiles for SQL-based systems |
|
||||
| [sqlalchemy](./source_docs/sqlalchemy.md) | `pip install 'acryl-datahub[sqlalchemy]'` | Generic SQLAlchemy source |
|
||||
| [superset](./source_docs/superset.md) | `pip install 'acryl-datahub[superset]'` | Superset source |
|
||||
|
||||
Sinks:
|
||||
|
||||
| Plugin Name | Install Command | Provides |
|
||||
| --------------------------------------- | -------------------------------------------- | -------------------------- |
|
||||
| [file](./sink_docs/file.md) | _included by default_ | File source and sink |
|
||||
| [console](./sink_docs/console.md) | _included by default_ | Console sink |
|
||||
| [datahub-rest](./sink_docs/datahub.md) | `pip install 'acryl-datahub[datahub-rest]'` | DataHub sink over REST API |
|
||||
| [datahub-kafka](./sink_docs/datahub.md) | `pip install 'acryl-datahub[datahub-kafka]'` | DataHub sink over Kafka |
|
||||
|
||||
These plugins can be mixed and matched as desired. For example:
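One illustrative pairing (a sketch, not the only option) pulls metadata from BigQuery and pushes it to DataHub over REST by installing both extras in a single command:

```shell
pip install 'acryl-datahub[bigquery,datahub-rest]'
```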
|
||||
|
||||
@ -137,875 +146,7 @@ Running a recipe is quite easy.
|
||||
datahub ingest -c ./examples/recipes/mssql_to_datahub.yml
|
||||
```
|
||||
|
||||
A number of recipes are included in the examples/recipes directory.
|
||||
|
||||
## Sources
|
||||
|
||||
### Kafka Metadata `kafka`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of topics - from the Kafka broker
|
||||
- Schemas associated with each topic - from the schema registry
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "kafka"
|
||||
config:
|
||||
connection:
|
||||
bootstrap: "broker:9092"
|
||||
consumer_config: {} # passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.DeserializingConsumer
|
||||
schema_registry_url: http://localhost:8081
|
||||
schema_registry_config: {} # passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient
|
||||
```
|
||||
|
||||
The options in the consumer config and schema registry config are passed to the Kafka DeserializingConsumer and SchemaRegistryClient respectively.
|
||||
|
||||
For a full example with a number of security options, see this [example recipe](./examples/recipes/secured_kafka.yml).
|
||||
|
||||
### MySQL Metadata `mysql`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases and tables
|
||||
- Column types and schema associated with each table
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: mysql
|
||||
config:
|
||||
username: root
|
||||
password: example
|
||||
database: dbname
|
||||
host_port: localhost:3306
|
||||
table_pattern:
|
||||
deny:
|
||||
# Note that the deny patterns take precedence over the allow patterns.
|
||||
- "performance_schema"
|
||||
allow:
|
||||
- "schema1.table2"
|
||||
# Although the 'table_pattern' enables you to skip everything from certain schemas,
|
||||
# having another option to allow/deny on schema level is an optimization for the case when there is a large number
|
||||
# of schemas that one wants to skip and you want to avoid the time to needlessly fetch those tables only to filter
|
||||
# them out afterwards via the table_pattern.
|
||||
schema_pattern:
|
||||
deny:
|
||||
- "garbage_schema"
|
||||
allow:
|
||||
- "schema1"
|
||||
```
|
||||
|
||||
### Microsoft SQL Server Metadata `mssql`
|
||||
|
||||
We have two options for the underlying library used to connect to SQL Server: (1) [python-tds](https://github.com/denisenkom/pytds) and (2) [pyodbc](https://github.com/mkleehammer/pyodbc). The TDS library is pure Python and hence easier to install, but only PyODBC supports encrypted connections.
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, tables and views
|
||||
- Column types associated with each table/view
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: mssql
|
||||
config:
|
||||
username: user
|
||||
password: pass
|
||||
host_port: localhost:1433
|
||||
database: DemoDatabase
|
||||
include_views: True # whether to include views, defaults to True
|
||||
table_pattern:
|
||||
deny:
|
||||
- "^.*\\.sys_.*" # deny all tables that start with sys_
|
||||
allow:
|
||||
- "schema1.table1"
|
||||
- "schema1.table2"
|
||||
options:
|
||||
# Any options specified here will be passed to SQLAlchemy's create_engine as kwargs.
|
||||
# See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details.
|
||||
# Many of these options are specific to the underlying database driver, so that library's
|
||||
# documentation will be a good reference for what is supported. To find which dialect is likely
|
||||
# in use, consult this table: https://docs.sqlalchemy.org/en/14/dialects/index.html.
|
||||
charset: "utf8"
|
||||
# If set to true, we'll use the pyodbc library. This requires you to have
|
||||
# already installed the Microsoft ODBC Driver for SQL Server.
|
||||
# See https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development?view=sql-server-ver15
|
||||
use_odbc: False
|
||||
uri_args: {}
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Example: using ingestion with ODBC and encryption</summary>
|
||||
|
||||
This requires you to have already installed the Microsoft ODBC Driver for SQL Server.
|
||||
See https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development?view=sql-server-ver15
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: mssql
|
||||
config:
|
||||
# See https://docs.sqlalchemy.org/en/14/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc
|
||||
use_odbc: True
|
||||
username: user
|
||||
password: pass
|
||||
host_port: localhost:1433
|
||||
database: DemoDatabase
|
||||
include_views: True # whether to include views, defaults to True
|
||||
uri_args:
|
||||
# See https://docs.microsoft.com/en-us/sql/connect/odbc/dsn-connection-string-attribute?view=sql-server-ver15
|
||||
driver: "ODBC Driver 17 for SQL Server"
|
||||
Encrypt: "yes"
|
||||
TrustServerCertificate: "Yes"
|
||||
ssl: "True"
|
||||
# Trusted_Connection: "yes"
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
### Hive `hive`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
- Detailed table and storage information
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: hive
|
||||
config:
|
||||
# For more details on authentication, see the PyHive docs:
|
||||
# https://github.com/dropbox/PyHive#passing-session-configuration.
|
||||
# LDAP, Kerberos, etc. are supported using connect_args, which can be
|
||||
# added under the `options` config parameter.
|
||||
#scheme: 'hive+http' # set this if Thrift should use the HTTP transport
|
||||
#scheme: 'hive+https' # set this if Thrift should use the HTTP with SSL transport
|
||||
username: user # optional
|
||||
password: pass # optional
|
||||
host_port: localhost:10000
|
||||
database: DemoDatabase # optional, if not specified, ingests from all databases
|
||||
# table_pattern/schema_pattern is same as above
|
||||
# options is same as above
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Example: using ingestion with Azure HDInsight</summary>
|
||||
|
||||
```yml
|
||||
# Connecting to Microsoft Azure HDInsight using TLS.
|
||||
source:
|
||||
type: hive
|
||||
config:
|
||||
scheme: "hive+https"
|
||||
host_port: <cluster_name>.azurehdinsight.net:443
|
||||
username: admin
|
||||
password: "<password>"
|
||||
options:
|
||||
connect_args:
|
||||
http_path: "/hive2"
|
||||
auth: BASIC
|
||||
# table_pattern/schema_pattern is same as above
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
### PostgreSQL `postgres`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
- Also supports PostGIS extensions
|
||||
- database_alias (optional) can be used to change the name of database to be ingested
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: postgres
|
||||
config:
|
||||
username: user
|
||||
password: pass
|
||||
host_port: localhost:5432
|
||||
database: DemoDatabase
|
||||
database_alias: DatabaseNameToBeIngested
|
||||
include_views: True # whether to include views, defaults to True
|
||||
# table_pattern/schema_pattern is same as above
|
||||
# options is same as above
|
||||
```
|
||||
|
||||
### Redshift `redshift`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
- Also supports PostGIS extensions
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: redshift
|
||||
config:
|
||||
username: user
|
||||
password: pass
|
||||
host_port: example.something.us-west-2.redshift.amazonaws.com:5439
|
||||
database: DemoDatabase
|
||||
include_views: True # whether to include views, defaults to True
|
||||
# table_pattern/schema_pattern is same as above
|
||||
# options is same as above
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Extra options when running Redshift behind a proxy</summary>
|
||||
|
||||
When connecting to Redshift through a proxy, you may need to adjust the SSL options that are passed to the underlying driver, as shown below.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: redshift
|
||||
config:
|
||||
# username, password, database, etc are all the same as above
|
||||
host_port: my-proxy-hostname:5439
|
||||
options:
|
||||
connect_args:
|
||||
sslmode: "prefer" # or "require" or "verify-ca"
|
||||
sslrootcert: ~ # needed to unpin the AWS Redshift certificate
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
### AWS SageMaker `sagemaker`
|
||||
|
||||
Extracts:
|
||||
|
||||
- Feature groups
|
||||
- Models, jobs, and lineage between the two (e.g. when jobs output a model or a model is used by a job)
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: sagemaker
|
||||
config:
|
||||
aws_region: # aws_region_name, i.e. "eu-west-1"
|
||||
env: # environment for the DatasetSnapshot URN, one of "DEV", "EI", "PROD" or "CORP". Defaults to "PROD".
|
||||
|
||||
# Credentials. If not specified here, these are picked up according to boto3 rules.
|
||||
# (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html)
|
||||
aws_access_key_id: # Optional.
|
||||
aws_secret_access_key: # Optional.
|
||||
aws_session_token: # Optional.
|
||||
aws_role: # Optional (Role chaining supported by using a sorted list).
|
||||
|
||||
extract_feature_groups: True # if feature groups should be ingested, default True
|
||||
extract_models: True # if models should be ingested, default True
|
||||
extract_jobs: # if jobs should be ingested, default True for all
|
||||
auto_ml: True
|
||||
compilation: True
|
||||
edge_packaging: True
|
||||
hyper_parameter_tuning: True
|
||||
labeling: True
|
||||
processing: True
|
||||
training: True
|
||||
transform: True
|
||||
```
|
||||
|
||||
### Snowflake `snowflake`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: snowflake
|
||||
config:
|
||||
username: user
|
||||
password: pass
|
||||
host_port: account_name
|
||||
database_pattern:
|
||||
# The escaping of the \$ symbol helps us skip the environment variable substitution.
|
||||
allow:
|
||||
- ^MY_DEMO_DATA.*
|
||||
- ^ANOTHER_DB_REGEX
|
||||
deny:
|
||||
- ^SNOWFLAKE\$
|
||||
- ^SNOWFLAKE_SAMPLE_DATA\$
|
||||
warehouse: "COMPUTE_WH" # optional
|
||||
role: "sysadmin" # optional
|
||||
include_views: True # whether to include views, defaults to True
|
||||
# table_pattern/schema_pattern is same as above
|
||||
# options is same as above
|
||||
```
|
||||
|
||||
:::tip
|
||||
|
||||
You can also get fine-grained usage statistics for Snowflake using the `snowflake-usage` source.
|
||||
|
||||
:::
|
||||
|
||||
### SQL Profiles `sql-profiles`
|
||||
|
||||
The SQL-based profiler does not run alone, but rather can be enabled for other SQL-based sources.
|
||||
Enabling profiling will slow down ingestion runs.
|
||||
|
||||
Extracts:
|
||||
|
||||
- row and column counts for each table
|
||||
- for each column, if applicable:
|
||||
- null counts and proportions
|
||||
- distinct counts and proportions
|
||||
- minimum, maximum, mean, median, standard deviation, some quantile values
|
||||
- histograms or frequencies of unique values
|
||||
|
||||
Supported SQL sources:
|
||||
|
||||
- AWS Athena
|
||||
- BigQuery
|
||||
- Druid
|
||||
- Hive
|
||||
- Microsoft SQL Server
|
||||
- MySQL
|
||||
- Oracle
|
||||
- Postgres
|
||||
- Redshift
|
||||
- Snowflake
|
||||
- Generic SQLAlchemy source
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: <sql-source> # can be bigquery, snowflake, etc - see above for the list
|
||||
config:
|
||||
# username, password, etc - varies by source type
|
||||
profiling:
|
||||
enabled: true
|
||||
limit: 1000 # optional - max rows to profile
|
||||
offset: 0 # optional - offset of first row to profile
|
||||
profile_pattern:
|
||||
deny:
|
||||
# Skip all tables ending with "_staging"
|
||||
- _staging\$
|
||||
allow:
|
||||
          # Profile all tables in "myschema" that start with "gold_"
|
||||
- myschema\.gold_.*
|
||||
|
||||
# If you only want profiles (but no catalog information), set these to false
|
||||
include_tables: true
|
||||
include_views: true
|
||||
```
|
||||
|
||||
:::caution
|
||||
|
||||
Running profiling against many tables or over many rows can run up significant costs.
While we've done our best to limit the cost of the queries the profiler runs, you
should be prudent about which tables profiling is enabled on and how frequently
the profiling runs are scheduled.
|
||||
|
||||
:::
|
||||
|
||||
### Superset `superset`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of charts and dashboards
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: superset
|
||||
config:
|
||||
username: user
|
||||
password: pass
|
||||
provider: db | ldap
|
||||
connect_uri: http://localhost:8088
|
||||
env: "PROD" # Optional, default is "PROD"
|
||||
```
|
||||
|
||||
See the documentation for Superset's `/security/login` endpoint at https://superset.apache.org/docs/rest-api for more details on Superset's login API.
|
||||
|
||||
### Oracle `oracle`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
Using the Oracle source requires that you've also installed the correct drivers; see the [cx_Oracle docs](https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html). The easiest one is the [Oracle Instant Client](https://www.oracle.com/database/technologies/instant-client.html).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: oracle
|
||||
config:
|
||||
# For more details on authentication, see the documentation:
|
||||
# https://docs.sqlalchemy.org/en/14/dialects/oracle.html#dialect-oracle-cx_oracle-connect and
|
||||
# https://cx-oracle.readthedocs.io/en/latest/user_guide/connection_handling.html#connection-strings.
|
||||
username: user
|
||||
password: pass
|
||||
host_port: localhost:5432
|
||||
database: dbname
|
||||
service_name: svc # omit database if using this option
|
||||
include_views: True # whether to include views, defaults to True
|
||||
# table_pattern/schema_pattern is same as above
|
||||
# options is same as above
|
||||
```
|
||||
|
||||
### Feast `feast`
|
||||
|
||||
**Note: Feast ingestion requires Docker to be installed.**
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of feature tables (modeled as [`MLFeatureTable`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLFeatureTableProperties.pdl)s),
|
||||
features ([`MLFeature`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLFeatureProperties.pdl)s),
|
||||
and entities ([`MLPrimaryKey`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLPrimaryKeyProperties.pdl)s)
|
||||
- Column types associated with each feature and entity
|
||||
|
||||
Note: this source uses a separate Docker container to extract Feast's metadata into a JSON file, which is then
parsed into DataHub's native objects. This was done because of a dependency conflict in the `feast` module.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: feast
|
||||
config:
|
||||
core_url: localhost:6565 # default
|
||||
env: "PROD" # Optional, default is "PROD"
|
||||
use_local_build: False # Whether to build Feast ingestion image locally, default is False
|
||||
```
|
||||
|
||||
### Google BigQuery `bigquery`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: bigquery
|
||||
config:
|
||||
project_id: project # optional - can autodetect from environment
|
||||
options: # options is same as above
|
||||
# See https://github.com/mxmzdlv/pybigquery#authentication for details.
|
||||
credentials_path: "/path/to/keyfile.json" # optional
|
||||
include_views: True # whether to include views, defaults to True
|
||||
# table_pattern/schema_pattern is same as above
|
||||
```
|
||||
|
||||
:::tip
|
||||
|
||||
You can also get fine-grained usage statistics for BigQuery using the `bigquery-usage` source.
|
||||
|
||||
:::
|
||||
|
||||
### AWS Athena `athena`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases and tables
|
||||
- Column types associated with each table
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: athena
|
||||
config:
|
||||
username: aws_access_key_id # Optional. If not specified, credentials are picked up according to boto3 rules.
|
||||
# See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
|
||||
password: aws_secret_access_key # Optional.
|
||||
database: database # Optional, defaults to "default"
|
||||
aws_region: aws_region_name # i.e. "eu-west-1"
|
||||
s3_staging_dir: s3_location # "s3://<bucket-name>/prefix/"
|
||||
# The s3_staging_dir parameter is needed because Athena always writes query results to S3.
|
||||
# See https://docs.aws.amazon.com/athena/latest/ug/querying.html
|
||||
    # However, the Athena driver will transparently fetch these results as you would expect from any other SQL client.
|
||||
work_group: athena_workgroup # "primary"
|
||||
# table_pattern/schema_pattern is same as above
|
||||
```
|
||||
|
||||
### AWS Glue `glue`
|
||||
|
||||
Note: if you also have files in S3 that you'd like to ingest, we recommend you use Glue's built-in data catalog. See [here](./s3-ingestion.md) for a quick guide on how to set up a crawler on Glue and ingest the outputs with DataHub.
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of tables
|
||||
- Column types associated with each table
|
||||
- Table metadata, such as owner, description and parameters
|
||||
- Jobs and their component transformations, data sources, and data sinks
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: glue
|
||||
config:
|
||||
aws_region: # aws_region_name, i.e. "eu-west-1"
|
||||
extract_transforms: True # whether to ingest Glue jobs, defaults to True
|
||||
env: # environment for the DatasetSnapshot URN, one of "DEV", "EI", "PROD" or "CORP". Defaults to "PROD".
|
||||
|
||||
# Filtering patterns for databases and tables to scan
|
||||
database_pattern: # Optional, to filter databases scanned, same as schema_pattern above.
|
||||
table_pattern: # Optional, to filter tables scanned, same as table_pattern above.
|
||||
|
||||
# Credentials. If not specified here, these are picked up according to boto3 rules.
|
||||
# (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html)
|
||||
aws_access_key_id: # Optional.
|
||||
aws_secret_access_key: # Optional.
|
||||
aws_session_token: # Optional.
|
||||
aws_role: # Optional (Role chaining supported by using a sorted list).
|
||||
underlying_platform: #Optional (Can change platform name to be athena)
|
||||
```
|
||||
|
||||
### Druid `druid`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases, schema, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
**Note:** It is important to explicitly define a deny schema pattern for the internal Druid databases (`lookup` & `sys`)
if you add a schema pattern, since otherwise the crawler may crash before processing the relevant databases.
This deny pattern is defined by default but is overridden by user-submitted configurations.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: druid
|
||||
config:
|
||||
# Point to broker address
|
||||
host_port: localhost:8082
|
||||
schema_pattern:
|
||||
deny:
|
||||
- "^(lookup|sys).*"
|
||||
# options is same as above
|
||||
```
|
||||
|
||||
### Other databases using SQLAlchemy `sqlalchemy`
|
||||
|
||||
The `sqlalchemy` source is useful if we don't have a pre-built source for your chosen
|
||||
database system, but there is an [SQLAlchemy dialect](https://docs.sqlalchemy.org/en/14/dialects/)
|
||||
defined elsewhere. In order to use this, you must `pip install` the required dialect packages yourself.
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of schemas and tables
|
||||
- Column types associated with each table
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: sqlalchemy
|
||||
config:
|
||||
# See https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls
|
||||
connect_uri: "dialect+driver://username:password@host:port/database"
|
||||
options: {} # same as above
|
||||
schema_pattern: {} # same as above
|
||||
table_pattern: {} # same as above
|
||||
include_views: True # whether to include views, defaults to True
|
||||
```
|
||||
|
||||
### MongoDB `mongodb`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of databases
|
||||
- List of collections in each database and infers schemas for each collection
|
||||
|
||||
By default, schema inference samples 1,000 documents from each collection. Setting `schemaSamplingSize: null` will scan the entire collection.
|
||||
Moreover, setting `useRandomSampling: False` will sample the first documents found without random selection, which may be faster for large collections.
|
||||
|
||||
Note that `schemaSamplingSize` has no effect if `enableSchemaInference: False` is set.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "mongodb"
|
||||
config:
|
||||
# For advanced configurations, see the MongoDB docs.
|
||||
# https://pymongo.readthedocs.io/en/stable/examples/authentication.html
|
||||
connect_uri: "mongodb://localhost"
|
||||
username: admin
|
||||
password: password
|
||||
env: "PROD" # Optional, default is "PROD"
|
||||
authMechanism: "DEFAULT"
|
||||
options: {}
|
||||
database_pattern: {}
|
||||
collection_pattern: {}
|
||||
enableSchemaInference: True
|
||||
schemaSamplingSize: 1000
|
||||
useRandomSampling: True # whether to randomly sample docs for schema or just use the first ones, True by default
|
||||
# database_pattern/collection_pattern are similar to schema_pattern/table_pattern from above
|
||||
```
|
||||
|
||||
### LDAP `ldap`
|
||||
|
||||
Extracts:
|
||||
|
||||
- List of people
|
||||
- Names, emails, titles, and manager information for each person
|
||||
- List of groups
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "ldap"
|
||||
config:
|
||||
ldap_server: ldap://localhost
|
||||
ldap_user: "cn=admin,dc=example,dc=org"
|
||||
ldap_password: "admin"
|
||||
base_dn: "dc=example,dc=org"
|
||||
filter: "(objectClass=*)" # optional field
|
||||
drop_missing_first_last_name: False # optional
|
||||
```
|
||||
|
||||
The `drop_missing_first_last_name` option should be set to true if you have many "headless" LDAP accounts
for devices or services that should be excluded because they do not contain a first and last name. This will only
affect the ingestion of LDAP users; LDAP groups are unaffected by this config option.
|
||||
|
||||
### LookML `lookml`
|
||||
|
||||
Note! This plugin uses a package that requires Python 3.7+!
|
||||
|
||||
Extracts:
|
||||
|
||||
- LookML views from model files
|
||||
- Name, upstream table names, dimensions, measures, and dimension groups
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "lookml"
|
||||
config:
|
||||
base_folder: /path/to/model/files # where the *.model.lkml and *.view.lkml files are stored
|
||||
connection_to_platform_map: # mappings between connection names in the model files to platform names
|
||||
connection_name: platform_name (or platform_name.database_name) # for ex. my_snowflake_conn: snowflake.my_database
|
||||
model_pattern: {}
|
||||
view_pattern: {}
|
||||
env: "PROD" # optional, default is "PROD"
|
||||
parse_table_names_from_sql: False # see note below
|
||||
platform_name: "looker" # optional, default is "looker"
|
||||
```
|
||||
|
||||
Note! The integration can use [`sql-metadata`](https://pypi.org/project/sql-metadata/) to try to parse the tables that the
views depend on. As these SQL queries can be complicated, and the package doesn't officially support all the SQL dialects that
Looker supports, the result might not be correct. This parsing is disabled by default, but can be enabled by setting
`parse_table_names_from_sql: True`.
|
||||
|
||||
### Looker dashboards `looker`
|
||||
|
||||
Extracts:
|
||||
|
||||
- Looker dashboards and dashboard elements (charts)
|
||||
- Names, descriptions, URLs, chart types, input view for the charts
|
||||
|
||||
See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "looker"
|
||||
config:
|
||||
client_id: # Your Looker API3 client ID
|
||||
client_secret: # Your Looker API3 client secret
|
||||
base_url: # The url to your Looker instance: https://company.looker.com:19999 or https://looker.company.com, or similar.
|
||||
dashboard_pattern: # supports allow/deny regexes
|
||||
chart_pattern: # supports allow/deny regexes
|
||||
actor: urn:li:corpuser:etl # Optional, defaults to urn:li:corpuser:etl
|
||||
env: "PROD" # Optional, default is "PROD"
|
||||
platform_name: "looker" # Optional, default is "looker"
|
||||
```
|
||||
|
||||
### File `file`
|
||||
|
||||
Pulls metadata from a previously generated file. Note that the file sink
|
||||
can produce such files, and a number of samples are included in the
|
||||
[examples/mce_files](examples/mce_files) directory.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: file
|
||||
config:
|
||||
filename: ./path/to/mce/file.json
|
||||
```
|
||||
|
||||
### dbt `dbt`
|
||||
|
||||
Pulls metadata from dbt artifact files:
|
||||
|
||||
- [dbt manifest file](https://docs.getdbt.com/reference/artifacts/manifest-json)
|
||||
- This file contains model, source and lineage data.
|
||||
- [dbt catalog file](https://docs.getdbt.com/reference/artifacts/catalog-json)
|
||||
- This file contains schema data.
|
||||
  - dbt does not record schema data for ephemeral models; as such, DataHub will show ephemeral models in the lineage, but there will be no associated schema for them
|
||||
- [dbt sources file](https://docs.getdbt.com/reference/artifacts/sources-json)
|
||||
- This file contains metadata for sources with freshness checks.
|
||||
- We transfer dbt's freshness checks to DataHub's last-modified fields.
|
||||
- Note that this file is optional – if not specified, we'll use time of ingestion instead as a proxy for time last-modified.
|
||||
- target_platform:
|
||||
- The data platform you are enriching with dbt metadata.
|
||||
- [data platforms](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/resources/DataPlatformInfo.json)
|
||||
- load_schemas:
|
||||
- Load schemas from dbt catalog file, not necessary when the underlying data platform already has this data.
|
||||
- node_type_pattern:
|
||||
  - Use this filter to include or exclude node types using allow and deny patterns
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "dbt"
|
||||
config:
|
||||
manifest_path: "./path/dbt/manifest_file.json"
|
||||
catalog_path: "./path/dbt/catalog_file.json"
|
||||
sources_path: "./path/dbt/sources_file.json" # (optional, used for freshness checks)
|
||||
target_platform: "postgres" # optional, eg "postgres", "snowflake", etc.
|
||||
load_schemas: True or False
|
||||
node_type_pattern: # optional
|
||||
deny:
|
||||
- ^test.*
|
||||
allow:
|
||||
- ^.*
|
||||
```
|
||||
|
||||
Note: when `load_schemas` is False, models that use [identifiers](https://docs.getdbt.com/reference/resource-properties/identifier) to reference their source tables are ingested using the model identifier as the model name to preserve the lineage.
|
||||
|
||||
### Google BigQuery Usage Stats `bigquery-usage`
|
||||
|
||||
- Fetch a list of queries issued
|
||||
- Fetch a list of tables and columns accessed
|
||||
- Aggregate these statistics into buckets, by day or hour granularity
|
||||
|
||||
Note: the client must have one of the following OAuth scopes, and should be authorized on all projects you'd like to ingest usage stats from.
|
||||
|
||||
- https://www.googleapis.com/auth/logging.read
|
||||
- https://www.googleapis.com/auth/logging.admin
|
||||
- https://www.googleapis.com/auth/cloud-platform.read-only
|
||||
- https://www.googleapis.com/auth/cloud-platform
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: bigquery-usage
|
||||
config:
|
||||
projects: # optional - can autodetect a single project from the environment
|
||||
- project_id_1
|
||||
- project_id_2
|
||||
options:
|
||||
# See https://googleapis.dev/python/logging/latest/client.html for details.
|
||||
credentials: ~ # optional - see docs
|
||||
env: PROD
|
||||
|
||||
bucket_duration: "DAY"
|
||||
start_time: ~ # defaults to the last full day in UTC (or hour)
|
||||
end_time: ~ # defaults to the last full day in UTC (or hour)
|
||||
|
||||
top_n_queries: 10 # number of queries to save for each table
|
||||
```
|
||||
|
||||
:::note
|
||||
|
||||
This source only does usage statistics. To get the tables, views, and schemas in your BigQuery project, use the `bigquery` source.
|
||||
|
||||
:::
|
||||
|
||||
### Snowflake Usage Stats `snowflake-usage`
|
||||
|
||||
- Fetch a list of queries issued
|
||||
- Fetch a list of tables and columns accessed (excludes views)
|
||||
- Aggregate these statistics into buckets, by day or hour granularity
|
||||
|
||||
Note: the user/role must have access to the account usage table. The "accountadmin" role has this by default, and other roles can be [granted this permission](https://docs.snowflake.com/en/sql-reference/account-usage.html#enabling-account-usage-for-other-roles).
|
||||
|
||||
Note: the underlying access history views that we use are only available in Snowflake's enterprise edition or higher.
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: snowflake-usage
|
||||
config:
|
||||
username: user
|
||||
password: pass
|
||||
host_port: account_name
|
||||
role: ACCOUNTADMIN
|
||||
env: PROD
|
||||
|
||||
bucket_duration: "DAY"
|
||||
start_time: ~ # defaults to the last full day in UTC (or hour)
|
||||
end_time: ~ # defaults to the last full day in UTC (or hour)
|
||||
|
||||
top_n_queries: 10 # number of queries to save for each table
|
||||
```
|
||||
|
||||
:::note
|
||||
|
||||
This source only does usage statistics. To get the tables, views, and schemas in your Snowflake warehouse, ingest using the `snowflake` source.
|
||||
|
||||
:::
|
||||
|
||||
### Kafka Connect `kafka-connect`
|
||||
|
||||
Extracts:
|
||||
|
||||
- Each Kafka Connect connector as an individual `DataFlowSnapshotClass` entity
- Individual `DataJobSnapshotClass` entities, named using the `{connector_name}:{source_dataset}` convention
- Lineage information between the source database and the Kafka topic
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "kafka-connect"
|
||||
config:
|
||||
connect_uri: "http://localhost:8083"
|
||||
cluster_name: "connect-cluster"
|
||||
connector_patterns:
|
||||
deny:
|
||||
- ^denied-connector.*
|
||||
allow:
|
||||
- ^allowed-connector.*
|
||||
```
|
||||
|
||||
Current limitations:
|
||||
|
||||
- Currently works only for Debezium source connectors.
|
||||
|
||||
## Sinks
|
||||
|
||||
### DataHub Rest `datahub-rest`
|
||||
|
||||
Pushes metadata to DataHub using the GMA REST API. The advantage of the REST-based interface
is that any errors can immediately be reported.
|
||||
|
||||
```yml
|
||||
sink:
|
||||
type: "datahub-rest"
|
||||
config:
|
||||
server: "http://localhost:8080"
|
||||
```
|
||||
|
||||
### DataHub Kafka `datahub-kafka`
|
||||
|
||||
Pushes metadata to DataHub by publishing messages to Kafka. The advantage of the Kafka-based
|
||||
interface is that it's asynchronous and can handle higher throughput. This requires the
|
||||
DataHub mce-consumer container to be running.
|
||||
|
||||
```yml
|
||||
sink:
|
||||
type: "datahub-kafka"
|
||||
config:
|
||||
connection:
|
||||
bootstrap: "localhost:9092"
|
||||
producer_config: {} # passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.SerializingProducer
|
||||
schema_registry_url: "http://localhost:8081"
|
||||
schema_registry_config: {} # passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient
|
||||
```
|
||||
|
||||
The options in the producer config and schema registry config are passed to the Kafka SerializingProducer and SchemaRegistryClient respectively.
|
||||
|
||||
For a full example with a number of security options, see this [example recipe](./examples/recipes/secured_kafka.yml).
|
||||
|
||||
### Console `console`
|
||||
|
||||
Simply prints each metadata event to stdout. Useful for experimentation and debugging purposes.
|
||||
|
||||
```yml
|
||||
sink:
|
||||
type: "console"
|
||||
```
|
||||
|
||||
### File `file`
|
||||
|
||||
Outputs metadata to a file. This can be used to decouple metadata sourcing from the
|
||||
process of pushing it into DataHub, and is particularly useful for debugging purposes.
|
||||
Note that the file source can read files generated by this sink.
|
||||
|
||||
```yml
|
||||
sink:
|
||||
type: file
|
||||
config:
|
||||
filename: ./path/to/mce/file.json
|
||||
```
|
||||
A number of recipes are included in the [examples/recipes](./examples/recipes) directory. For full info and context on each source and sink, see the pages described in the [table of plugins](#installing-plugins).
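As a minimal end-to-end illustration, the file source and the datahub-rest sink shown above can be combined into a single recipe (a sketch; the file path and server URL are placeholders to adjust for your setup):

```yml
# A minimal sketch: replay previously generated MCE events from a file
# into a local DataHub instance over REST.
source:
  type: file
  config:
    filename: ./path/to/mce/file.json

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```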
|
||||
|
||||
## Transformations
|
||||
|
||||
@ -1040,10 +181,13 @@ If you're simply looking to run ingestion on a schedule, take a look at these sa
|
||||
The Airflow lineage backend is only supported in Airflow 1.10.15+ and 2.0.2+.
|
||||
|
||||
:::
|
||||
|
||||
1. You need to install the required dependency in your Airflow environment. See https://registry.astronomer.io/providers/datahub/modules/datahublineagebackend
|
||||
```shell
|
||||
pip install acryl-datahub[airflow]
|
||||
```
|
||||
|
||||
|
||||
|
||||
2. You must configure an Airflow hook for DataHub. We support both a DataHub REST hook and a Kafka-based hook, but you only need one.
|
||||
|
||||
```shell
|
||||
|
||||
@ -13,7 +13,6 @@ source:
|
||||
collection_pattern: {}
|
||||
enableSchemaInference: True
|
||||
schemaSamplingSize: 1000
|
||||
# database_pattern/collection_pattern are similar to schema_pattern/table_pattern from above
|
||||
sink:
|
||||
type: "datahub-rest"
|
||||
config:
|
||||
|
||||
33
metadata-ingestion/sink_docs/console.md
Normal file
@ -0,0 +1,33 @@
|
||||
# Console
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
Works with `acryl-datahub` out of the box.
|
||||
|
||||
## Capabilities
|
||||
|
||||
Simply prints each metadata event to stdout. Useful for experimentation and debugging purposes.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
# source configs
|
||||
|
||||
sink:
|
||||
type: "console"
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
None!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this sink, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
87
metadata-ingestion/sink_docs/datahub.md
Normal file
@ -0,0 +1,87 @@
|
||||
# DataHub
|
||||
|
||||
## DataHub Rest
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
### Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[datahub-rest]'`.
|
||||
|
||||
### Capabilities
|
||||
|
||||
Pushes metadata to DataHub using the GMA REST API. The advantage of the REST-based interface
is that any errors can immediately be reported.
|
||||
|
||||
### Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
# source configs
|
||||
sink:
|
||||
type: "datahub-rest"
|
||||
config:
|
||||
server: "http://localhost:8080"
|
||||
```
|
||||
|
||||
### Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| -------- | -------- | ------- | ---------------------------- |
|
||||
| `server` | ✅ | | URL of DataHub GMS endpoint. |
|
||||
|
||||
## DataHub Kafka
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
### Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[datahub-kafka]'`.
|
||||
|
||||
### Capabilities
|
||||
|
||||
Pushes metadata to DataHub by publishing messages to Kafka. The advantage of the Kafka-based
|
||||
interface is that it's asynchronous and can handle higher throughput.
|
||||
|
||||
### Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
# source configs
|
||||
|
||||
sink:
|
||||
type: "datahub-kafka"
|
||||
config:
|
||||
connection:
|
||||
bootstrap: "localhost:9092"
|
||||
schema_registry_url: "http://localhost:8081"
|
||||
```
|
||||
|
||||
### Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| -------------------------------------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `connection.bootstrap` | ✅ | | Kafka bootstrap URL. |
|
||||
| `connection.producer_config.<option>` | | | Passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.SerializingProducer |
|
||||
| `connection.schema_registry_url` | ✅ | | URL of schema registry being used. |
|
||||
| `connection.schema_registry_config.<option>` | | | Passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient |
|
||||
|
||||
The options in the producer config and schema registry config are passed to the Kafka SerializingProducer and SchemaRegistryClient respectively.
|
||||
|
||||
For a full example with a number of security options, see this [example recipe](../examples/recipes/secured_kafka.yml).
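As an illustration, SASL credentials can be supplied through `producer_config` and `schema_registry_config`. The option names below are standard Confluent/librdkafka client settings rather than DataHub-specific ones, so treat this as a sketch and double-check them against the linked Confluent docs and the example recipe:

```yml
sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "broker:9092"
      schema_registry_url: "https://schema-registry:8081"
      producer_config:
        security.protocol: "SASL_SSL"   # librdkafka producer setting
        sasl.mechanism: "PLAIN"         # assumption: adjust to your cluster's SASL mechanism
        sasl.username: "<username>"
        sasl.password: "<password>"
      schema_registry_config:
        basic.auth.user.info: "<username>:<password>"  # SchemaRegistryClient auth option
```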
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this sink, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
41
metadata-ingestion/sink_docs/file.md
Normal file
@ -0,0 +1,41 @@
|
||||
# File
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
Works with `acryl-datahub` out of the box.
|
||||
|
||||
## Capabilities
|
||||
|
||||
Outputs metadata to a file. This can be used to decouple metadata sourcing from the
|
||||
process of pushing it into DataHub, and is particularly useful for debugging purposes.
|
||||
Note that the [file source](../source_docs/file.md) can read files generated by this sink.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
# source configs
|
||||
|
||||
sink:
|
||||
type: file
|
||||
config:
|
||||
filename: ./path/to/mce/file.json
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| -------- | -------- | ------- | ------------------------- |
|
||||
| filename | ✅ | | Path to file to write to. |
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this sink, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
70
metadata-ingestion/source_docs/athena.md
Normal file
@ -0,0 +1,70 @@
|
||||
# Athena
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[athena]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: athena
|
||||
config:
|
||||
# Coordinates
|
||||
aws_region: my_aws_region_name
|
||||
work_group: my_work_group
|
||||
|
||||
# Credentials
|
||||
username: my_aws_access_key_id
|
||||
password: my_aws_secret_access_key
|
||||
database: my_database
|
||||
|
||||
# Options
|
||||
s3_staging_dir: "s3://<bucket-name>/<folder>/"
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | Autodetected | Username credential. If not specified, detected with boto3 rules. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
||||
| `password` | | Autodetected | Same detection scheme as `username` |
|
||||
| `database` | | Autodetected | |
|
||||
| `aws_region` | ✅ | | AWS region code. |
|
||||
| `s3_staging_dir` | ✅ | | Of format `"s3://<bucket-name>/prefix/"`. The `s3_staging_dir` parameter is needed because Athena always writes query results to S3. <br />See https://docs.aws.amazon.com/athena/latest/ug/querying.html. |
|
||||
| `work_group` | ✅ | | Name of Athena workgroup. <br />See https://docs.aws.amazon.com/athena/latest/ug/manage-queries-control-costs-with-workgroups.html. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
135
metadata-ingestion/source_docs/bigquery.md
Normal file
@ -0,0 +1,135 @@
|
||||
# BigQuery
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[bigquery]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
:::tip
|
||||
|
||||
You can also get fine-grained usage statistics for BigQuery using the `bigquery-usage` source described below.
|
||||
|
||||
:::
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: bigquery
|
||||
config:
|
||||
# Coordinates
|
||||
project_id: my_project_id
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
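If you authenticate with a service account key file rather than application default credentials, the key file path can be passed through `options`, as in the sketch below (the path is a placeholder; see https://github.com/mxmzdlv/pybigquery#authentication for details):

```yml
source:
  type: bigquery
  config:
    project_id: my_project_id
    options:
      # Forwarded to the BigQuery SQLAlchemy dialect; see the pybigquery docs for details.
      credentials_path: "/path/to/keyfile.json"

sink:
  # sink configs
```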
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `project_id` | | Autodetected | Project ID to ingest from. If not specified, will infer from environment. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
| `include_views` | | `True` | Whether views should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## BigQuery Usage Stats
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
### Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[bigquery-usage]'`.
|
||||
|
||||
### Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Statistics on queries issued and tables and columns accessed (excludes views)
|
||||
- Aggregation of these statistics into buckets, by day or hour granularity
|
||||
|
||||
Note: the client must have one of the following OAuth scopes, and should be authorized on all projects you'd like to ingest usage stats from.
|
||||
|
||||
- https://www.googleapis.com/auth/logging.read
|
||||
- https://www.googleapis.com/auth/logging.admin
|
||||
- https://www.googleapis.com/auth/cloud-platform.read-only
|
||||
- https://www.googleapis.com/auth/cloud-platform
|
||||
|
||||
:::note
|
||||
|
||||
This source only does usage statistics. To get the tables, views, and schemas in your BigQuery project, use the `bigquery` source described above.
|
||||
|
||||
:::
|
||||
|
||||
### Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: bigquery-usage
|
||||
config:
|
||||
# Coordinates
|
||||
projects:
|
||||
- project_id_1
|
||||
- project_id_2
|
||||
|
||||
# Options
|
||||
top_n_queries: 10
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
### Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
By default, we extract usage stats for the last day, with the recommendation that this source is executed every day.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | -------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `projects`             |          |                                                                | List of project IDs to ingest usage stats from.                                                                                                                                                                                                                                                                                                                                          |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `start_time` | | Last full day in UTC (or hour, depending on `bucket_duration`) | Earliest date of usage logs to consider. |
|
||||
| `end_time` | | Last full day in UTC (or hour, depending on `bucket_duration`) | Latest date of usage logs to consider. |
|
||||
| `top_n_queries` | | `10` | Number of top queries to save to each table. |
|
||||
| `extra_client_options` | | | Additional options to pass to `google.cloud.logging_v2.client.Client`. |
|
||||
| `query_log_delay`      |          |                                                                | To account for the possibility that the query event arrives after the read event in the audit logs, we wait for at least `query_log_delay` additional events to be processed before attempting to resolve BigQuery job information from the logs. If `query_log_delay` is `None`, it gets treated as an unlimited delay, which prioritizes correctness at the expense of memory usage. |
|
||||
| `max_query_duration` | | `15` | Correction to pad `start_time` and `end_time` with. For handling the case where the read happens within our time range but the query completion event is delayed and happens after the configured end time. |
|
||||
|
||||
### Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
76
metadata-ingestion/source_docs/dbt.md
Normal file
@ -0,0 +1,76 @@
|
||||
# dbt
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
Works with `acryl-datahub` out of the box.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin pulls metadata from dbt's artifact files:
|
||||
|
||||
- [dbt manifest file](https://docs.getdbt.com/reference/artifacts/manifest-json)
|
||||
- This file contains model, source and lineage data.
|
||||
- [dbt catalog file](https://docs.getdbt.com/reference/artifacts/catalog-json)
|
||||
- This file contains schema data.
|
||||
- dbt does not record schema data for ephemeral models, so DataHub will show ephemeral models in the lineage, but without an associated schema.
|
||||
- [dbt sources file](https://docs.getdbt.com/reference/artifacts/sources-json)
|
||||
- This file contains metadata for sources with freshness checks.
|
||||
- We transfer dbt's freshness checks to DataHub's last-modified fields.
|
||||
- Note that this file is optional – if not specified, we'll use time of ingestion instead as a proxy for time last-modified.
|
||||
- target_platform:
|
||||
- The data platform you are enriching with dbt metadata.
|
||||
- [data platforms](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/resources/DataPlatformInfo.json)
|
||||
- load_schemas:
|
||||
- Load schemas from dbt catalog file, not necessary when the underlying data platform already has this data.
|
||||
- node_type_pattern:
|
||||
- Use this filter to include or exclude node types, via `allow` and `deny` patterns.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "dbt"
|
||||
config:
|
||||
# Coordinates
|
||||
manifest_path: "./path/dbt/manifest_file.json"
|
||||
catalog_path: "./path/dbt/catalog_file.json"
|
||||
sources_path: "./path/dbt/sources_file.json"
|
||||
|
||||
# Options
|
||||
target_platform: "my_target_platform_id"
|
||||
load_schemas: True # note: if this is disabled, table schema details (e.g. columns) will not be ingested
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ------------------------- | -------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `manifest_path` | ✅ | | Path to dbt manifest JSON. See https://docs.getdbt.com/reference/artifacts/manifest-json |
|
||||
| `catalog_path` | ✅ | | Path to dbt catalog JSON. See https://docs.getdbt.com/reference/artifacts/catalog-json |
|
||||
| `sources_path` | | | Path to dbt sources JSON. See https://docs.getdbt.com/reference/artifacts/sources-json. If not specified, last-modified fields will not be populated. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `target_platform` | ✅ | | The platform that dbt is loading onto. |
|
||||
| `load_schemas` | ✅ | | Whether to load database schemas. If set to `False`, table schema details (e.g. columns) will not be ingested. |
|
||||
| `node_type_pattern.allow` | | | Regex pattern for dbt nodes to include in ingestion. |
|
||||
| `node_type_pattern.deny` | | | Regex pattern for dbt nodes to exclude from ingestion. |
|
||||
|
||||
Note: when `load_schemas` is False, models that use [identifiers](https://docs.getdbt.com/reference/resource-properties/identifier) to reference their source tables are ingested using the model identifier as the model name to preserve the lineage.
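As a sketch of the `node_type_pattern` filter described above — assuming your project uses standard dbt node types such as `model`, `source`, and `test` — a recipe that keeps models and sources but drops everything else might look like:

```yml
source:
  type: "dbt"
  config:
    # Coordinates
    manifest_path: "./path/dbt/manifest_file.json"
    catalog_path: "./path/dbt/catalog_file.json"

    # Options
    target_platform: "my_target_platform_id"
    load_schemas: True

    # Keep models and sources, drop other node types (e.g. tests).
    node_type_pattern:
      allow:
        - "model"
        - "source"

sink:
  # sink configs
```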
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
67
metadata-ingestion/source_docs/druid.md
Normal file
@ -0,0 +1,67 @@
|
||||
# Druid
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[druid]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
**Note**: It is important to explicitly define the deny schema pattern for internal Druid databases (lookup & sys) if adding a schema pattern. Otherwise, the crawler may crash before processing relevant databases. This deny pattern is defined by default but is overridden by user-submitted configurations.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: druid
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: "localhost:8082"
|
||||
|
||||
# Credentials
|
||||
username: admin
|
||||
password: password
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | | Database username. |
|
||||
| `password` | | | Database password. |
|
||||
| `host_port` | ✅ | | Host URL and port to connect to. |
|
||||
| `database` | | | Database to ingest. |
|
||||
| `database_alias` | | | Alias to apply to database when ingesting. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny`  |          | `"^(lookup\|sys).*"`    | Regex pattern for schemas to exclude from ingestion.                                                                                                                                       |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
| `include_views` | | `True` | Whether views should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
56
metadata-ingestion/source_docs/feast.md
Normal file
@ -0,0 +1,56 @@
|
||||
# Feast
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
**Note: Feast ingestion requires Docker to be installed.**
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[feast]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- List of feature tables (modeled as [`MLFeatureTable`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLFeatureTableProperties.pdl)s),
|
||||
features ([`MLFeature`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLFeatureProperties.pdl)s),
|
||||
and entities ([`MLPrimaryKey`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLPrimaryKeyProperties.pdl)s)
|
||||
- Column types associated with each feature and entity
|
||||
|
||||
Note: this source uses a separate Docker container to extract Feast's metadata into a JSON file, which is then
parsed into DataHub's native objects. This separation exists because of a dependency conflict in the `feast` module.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: feast
|
||||
config:
|
||||
# Coordinates
|
||||
core_url: "localhost:6565"
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ----------------- | -------- | ------------------ | ------------------------------------------------------- |
|
||||
| `core_url` | | `"localhost:6565"` | URL of Feast Core instance. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `use_local_build` | | `False` | Whether to build Feast ingestion Docker image locally. |
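For instance, if you want the Feast ingestion image to be built locally rather than pulled, a minimal sketch based on the options above would be:

```yml
source:
  type: feast
  config:
    # Coordinates
    core_url: "localhost:6565"

    # Build the Feast ingestion Docker image locally instead of pulling it.
    use_local_build: True

sink:
  # sink configs
```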
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
46
metadata-ingestion/source_docs/file.md
Normal file
@ -0,0 +1,46 @@
|
||||
# File
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
Works with `acryl-datahub` out of the box.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin pulls metadata from a previously generated file. The [file sink](../sink_docs/file.md)
|
||||
can produce such files, and a number of samples are included in the
|
||||
[examples/mce_files](../examples/mce_files) directory.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: file
|
||||
config:
|
||||
# Coordinates
|
||||
filename: ./path/to/mce/file.json
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------- | -------- | ------- | ----------------------- |
|
||||
| `filename` | ✅ | | Path to file to ingest. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
62
metadata-ingestion/source_docs/glue.md
Normal file
@ -0,0 +1,62 @@
|
||||
# Glue
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[glue]'`.
|
||||
|
||||
Note: if you also have files in S3 that you'd like to ingest, we recommend you use Glue's built-in data catalog. See [here](../s3-ingestion.md) for a quick guide on how to set up a crawler on Glue and ingest the outputs with DataHub.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Tables in the Glue catalog
|
||||
- Column types associated with each table
|
||||
- Table metadata, such as owner, description and parameters
|
||||
- Jobs and their component transformations, data sources, and data sinks
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: glue
|
||||
config:
|
||||
# Coordinates
|
||||
aws_region: "my-aws-region"
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ------------------------ | -------- | --------------------------- | ---------------------------------------------------------------------------------- |
|
||||
| `aws_region` | ✅ | | AWS region code. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
||||
| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
||||
| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
||||
| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
|
||||
| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
|
||||
| `database_pattern.allow` | | | Regex pattern for databases to include in ingestion. |
|
||||
| `database_pattern.deny` | | | Regex pattern for databases to exclude from ingestion. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `underlying_platform`    |          |                             | Override for platform name.                                                         |
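For example, to ingest only a couple of Glue databases and skip the transform jobs, a sketch along these lines should work (the database names and region are placeholders, and it assumes the pattern filters accept lists of regexes):

```yml
source:
  type: glue
  config:
    # Coordinates
    aws_region: "my-aws-region"

    # Skip Glue transform jobs and only look at two databases.
    extract_transforms: False
    database_pattern:
      allow:
        - "sales_db"
        - "marketing_db"

sink:
  # sink configs
```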
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
100
metadata-ingestion/source_docs/hive.md
Normal file
@ -0,0 +1,100 @@
|
||||
# Hive
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[hive]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, and tables
|
||||
- Column types associated with each table
|
||||
- Detailed table and storage information
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: hive
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: localhost:10000
|
||||
database: DemoDatabase # optional, if not specified, ingests from all databases
|
||||
|
||||
# Credentials
|
||||
username: user # optional
|
||||
password: pass # optional
|
||||
|
||||
# For more details on authentication, see the PyHive docs:
|
||||
# https://github.com/dropbox/PyHive#passing-session-configuration.
|
||||
# LDAP, Kerberos, etc. are supported using connect_args, which can be
|
||||
# added under the `options` config parameter.
|
||||
#scheme: 'hive+http' # set this if Thrift should use the HTTP transport
|
||||
#scheme: 'hive+https' # set this if Thrift should use the HTTP with SSL transport
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Example: using ingestion with Azure HDInsight</summary>
|
||||
|
||||
```yml
|
||||
# Connecting to Microsoft Azure HDInsight using TLS.
|
||||
source:
|
||||
type: hive
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: <cluster_name>.azurehdinsight.net:443
|
||||
|
||||
# Credentials
|
||||
username: admin
|
||||
password: password
|
||||
|
||||
# Options
|
||||
options:
|
||||
connect_args:
|
||||
http_path: "/hive2"
|
||||
auth: BASIC
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | | Database username. |
|
||||
| `password` | | | Database password. |
|
||||
| `host_port` | ✅ | | Host URL and port to connect to. |
|
||||
| `database` | | | Database to ingest. |
|
||||
| `database_alias` | | | Alias to apply to database when ingesting. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
63
metadata-ingestion/source_docs/kafka-connect.md
Normal file
@ -0,0 +1,63 @@
|
||||
# Kafka Connect
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[kafka-connect]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Each Kafka Connect connector as an individual `DataFlowSnapshotClass` entity
- Individual `DataJobSnapshotClass` entities, named using the `{connector_name}:{source_dataset}` convention
- Lineage information from the source database to the Kafka topic
|
||||
|
||||
Current limitations:
|
||||
|
||||
- Currently works only for Debezium source connectors.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "kafka-connect"
|
||||
config:
|
||||
# Coordinates
|
||||
connect_uri: "http://localhost:8083"
|
||||
cluster_name: "connect-cluster"
|
||||
|
||||
# Credentials
|
||||
username: admin
|
||||
password: password
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| -------------------------- | -------- | -------------------------- | ------------------------------------------------------- |
|
||||
| `connect_uri` | | `"http://localhost:8083/"` | URI to connect to. |
|
||||
| `username` | | | Kafka Connect username. |
|
||||
| `password` | | | Kafka Connect password. |
|
||||
| `cluster_name` | | `"connect-cluster"` | Cluster to ingest from. |
|
||||
| `connector_patterns.allow` |          |                            | Regex pattern for connectors to include in ingestion.   |
| `connector_patterns.deny`  |          |                            | Regex pattern for connectors to exclude from ingestion. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
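As a sketch, to ingest only connectors whose names start with a given prefix, you could combine the coordinates above with a connector pattern (the `mysql-` prefix is just an example, and this assumes the `allow` filter accepts a list of regexes):

```yml
source:
  type: "kafka-connect"
  config:
    # Coordinates
    connect_uri: "http://localhost:8083"
    cluster_name: "connect-cluster"

    # Only ingest connectors whose names start with "mysql-".
    connector_patterns:
      allow:
        - "^mysql-.*"

sink:
  # sink configs
```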
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
60
metadata-ingestion/source_docs/kafka.md
Normal file
@ -0,0 +1,60 @@
|
||||
# Kafka Metadata
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[kafka]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Topics from the Kafka broker
|
||||
- Schemas associated with each topic from the schema registry
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "kafka"
|
||||
config:
|
||||
# Coordinates
|
||||
connection:
|
||||
bootstrap: "broker:9092"
|
||||
|
||||
schema_registry_url: http://localhost:8081
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| -------------------------------------------- | -------- | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `connection.bootstrap`                       |          | `"localhost:9092"`        | Bootstrap servers.                                                                                                                                                                                                                                                                       |
| `connection.schema_registry_url`             |          | `"http://localhost:8081"` | Schema registry location.                                                                                                                                                                                                                                                                |
|
||||
| `connection.schema_registry_config.<option>` | | | Extra schema registry config. These options will be passed into Kafka's SchemaRegistryClient. See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html?#schemaregistryclient. |
|
||||
| `connection.consumer_config.<option>` | | | Extra consumer config. These options will be passed into Kafka's DeserializingConsumer. See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#deserializingconsumer and https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md. |
|
||||
| `connection.producer_config.<option>` | | | Extra producer config. These options will be passed into Kafka's SerializingProducer. See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#serializingproducer and https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md. |
|
||||
| `topic_patterns.allow` | | | Regex pattern for topics to include in ingestion. |
|
||||
| `topic_patterns.deny` | | | Regex pattern for topics to exclude from ingestion. |
|
||||
|
||||
The options in the consumer config and schema registry config are passed to the Kafka DeserializingConsumer and SchemaRegistryClient respectively.
|
||||
|
||||
For a full example with a number of security options, see this [example recipe](../examples/recipes/secured_kafka.yml).
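As a smaller sketch, a recipe that skips Kafka's internal topics (which usually start with an underscore, e.g. `__consumer_offsets` or `_schemas`) might look like this, assuming the `deny` filter accepts a list of regexes:

```yml
source:
  type: "kafka"
  config:
    # Coordinates
    connection:
      bootstrap: "broker:9092"
      schema_registry_url: http://localhost:8081

    # Ignore internal topics such as __consumer_offsets and _schemas.
    topic_patterns:
      deny:
        - "^_.*"

sink:
  # sink configs
```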
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
65
metadata-ingestion/source_docs/ldap.md
Normal file
@ -0,0 +1,65 @@
|
||||
# LDAP
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[ldap]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- People
|
||||
- Names, emails, titles, and manager information for each person
|
||||
- List of groups
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "ldap"
|
||||
config:
|
||||
# Coordinates
|
||||
ldap_server: ldap://localhost
|
||||
|
||||
# Credentials
|
||||
ldap_user: "cn=admin,dc=example,dc=org"
|
||||
ldap_password: "admin"
|
||||
|
||||
# Options
|
||||
base_dn: "dc=example,dc=org"
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ------------------------------ | -------- | ------------------- | ----------------------------------------------------------------------- |
|
||||
| `ldap_server` | ✅ | | LDAP server URL. |
|
||||
| `ldap_user` | ✅ | | LDAP user. |
|
||||
| `ldap_password` | ✅ | | LDAP password. |
|
||||
| `base_dn` | ✅ | | LDAP DN. |
|
||||
| `filter` | | `"(objectClass=*)"` | LDAP extractor filter. |
|
||||
| `drop_missing_first_last_name` | | `True` | If set to true, any users without first and last names will be dropped. |
|
||||
| `page_size` | | `20` | Size of each page to fetch when extracting metadata. |
|
||||
|
||||
Set `drop_missing_first_last_name` to true if you have many "headless" LDAP accounts for devices or services
that should be excluded because they do not have a first and last name. This only affects the ingestion of
LDAP users; LDAP groups are unaffected by this option.
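As a sketch, you can also narrow the LDAP filter to only pull person entries; the exact object class to use depends on your directory's schema, so treat `(objectClass=person)` as an illustrative value:

```yml
source:
  type: "ldap"
  config:
    # Coordinates
    ldap_server: ldap://localhost

    # Credentials
    ldap_user: "cn=admin,dc=example,dc=org"
    ldap_password: "admin"

    # Options
    base_dn: "dc=example,dc=org"
    # Only pull entries that represent people; adjust to your directory's schema.
    filter: "(objectClass=person)"

sink:
  # sink configs
```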
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
62
metadata-ingestion/source_docs/looker.md
Normal file
@ -0,0 +1,62 @@
|
||||
# Looker dashboards
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[looker]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Looker dashboards and dashboard elements (charts)
|
||||
- Names, descriptions, URLs, chart types, input view for the charts
|
||||
|
||||
See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "looker"
|
||||
config:
|
||||
# Coordinates
|
||||
base_url: https://company.looker.com:19999
|
||||
|
||||
# Credentials
|
||||
client_id: admin
|
||||
client_secret: password
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ------------------------- | -------- | ----------------------- | ------------------------------------------------------------------------------------------------------------ |
|
||||
| `client_id` | ✅ | | Looker API3 client ID. |
|
||||
| `client_secret` | ✅ | | Looker API3 client secret. |
|
||||
| `base_url`                | ✅       |                         | URL to your Looker instance: `https://company.looker.com:19999` or `https://looker.company.com`, or similar.  |
|
||||
| `platform_name` | | `"looker"` | Platform to use in namespace when constructing URNs. |
|
||||
| `actor` | | `"urn:li:corpuser:etl"` | Actor to use in ownership properties of ingested metadata. |
|
||||
| `dashboard_pattern.allow` | | | Regex pattern for dashboards to include in ingestion. |
|
||||
| `dashboard_pattern.deny` | | | Regex pattern for dashboards to exclude from ingestion. |
|
||||
| `chart_pattern.allow` | | | Regex pattern for charts to include in ingestion. |
|
||||
| `chart_pattern.deny` | | | Regex pattern for charts to exclude from ingestion. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
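For instance, a sketch that restricts ingestion to dashboards matching an allow pattern while excluding a particular chart might look like this; the pattern values are placeholders, and this assumes the `allow`/`deny` filters accept lists of regexes:

```yml
source:
  type: "looker"
  config:
    # Coordinates
    base_url: https://company.looker.com:19999

    # Credentials
    client_id: admin
    client_secret: password

    # Placeholder patterns: restrict which dashboards and charts get ingested.
    dashboard_pattern:
      allow:
        - "10.*"
    chart_pattern:
      deny:
        - "2"

sink:
  # sink configs
```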
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
66
metadata-ingestion/source_docs/lookml.md
Normal file
@ -0,0 +1,66 @@
|
||||
# LookML
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[lookml]'`.
|
||||
|
||||
Note: this plugin uses a package that requires Python 3.7+.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- LookML views from model files
|
||||
- Name, upstream table names, dimensions, measures, and dimension groups
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "lookml"
|
||||
config:
|
||||
# Coordinates
|
||||
base_folder: /path/to/model/files
|
||||
|
||||
# Options
|
||||
connection_to_platform_map:
|
||||
connection_name: platform_name (or platform_name.database_name) # for ex. my_snowflake_conn: snowflake.my_database
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------------------------------- | -------- | ---------- | ----------------------------------------------------------------------- |
|
||||
| `base_folder` | ✅ | | Where the `*.model.lkml` and `*.view.lkml` files are stored. |
|
||||
| `connection_to_platform_map.<connection_name>` | ✅ | | Mappings between connection names in the model files to platform names. |
|
||||
| `platform_name` | | `"looker"` | Platform to use in namespace when constructing URNs. |
|
||||
| `model_pattern.allow` | | | Regex pattern for models to include in ingestion. |
|
||||
| `model_pattern.deny` | | | Regex pattern for models to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `parse_table_names_from_sql` | | `False` | See note below. |
|
||||
|
||||
Note: the integration can use [`sql-metadata`](https://pypi.org/project/sql-metadata/) to try to parse the tables the
views depend on. As these SQL queries can be complicated, and the package doesn't officially support all the SQL dialects
that Looker supports, the results might not be correct. This parsing is disabled by default, but can be enabled by setting
`parse_table_names_from_sql: True`.
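Putting that together, a sketch that maps two LookML connections to their platforms and opts into the best-effort SQL parsing might look like this; the connection and database names are placeholders:

```yml
source:
  type: "lookml"
  config:
    # Coordinates
    base_folder: /path/to/model/files

    # Placeholder connection names mapped to a platform (optionally with a database).
    connection_to_platform_map:
      my_snowflake_conn: snowflake.my_database
      my_bigquery_conn: bigquery

    # Best-effort SQL parsing of upstream tables; results may be imperfect.
    parse_table_names_from_sql: True

sink:
  # sink configs
```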
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
73
metadata-ingestion/source_docs/mongodb.md
Normal file
@ -0,0 +1,73 @@
|
||||
# MongoDB
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[mongodb]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Databases and associated metadata
|
||||
- Collections in each database and schemas for each collection (via schema inference)
|
||||
|
||||
By default, schema inference samples 1,000 documents from each collection. Setting `schemaSamplingSize: null` will scan the entire collection.
|
||||
Moreover, setting `useRandomSampling: False` will sample the first documents found without random selection, which may be faster for large collections.
|
||||
|
||||
Note that `schemaSamplingSize` has no effect if `enableSchemaInference: False` is set.
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: "mongodb"
|
||||
config:
|
||||
# Coordinates
|
||||
connect_uri: "mongodb://localhost"
|
||||
|
||||
# Credentials
|
||||
username: admin
|
||||
password: password
|
||||
authMechanism: "DEFAULT"
|
||||
|
||||
# Options
|
||||
enableSchemaInference: True
|
||||
useRandomSampling: True
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| -------------------------- | -------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `connect_uri` | | `"mongodb://localhost"` | MongoDB connection URI. |
|
||||
| `username` | | | MongoDB username. |
|
||||
| `password` | | | MongoDB password. |
|
||||
| `authMechanism` | | | MongoDB authentication mechanism. See https://pymongo.readthedocs.io/en/stable/examples/authentication.html for details. |
|
||||
| `options` | | | Additional options to pass to `pymongo.MongoClient()`. |
|
||||
| `enableSchemaInference` | | `True` | Whether to infer schemas. |
|
||||
| `schemaSamplingSize`       |          | `1000`                  | Number of documents to sample when inferring the schema. If set to `0`, all documents will be scanned.                     |
|
||||
| `useRandomSampling` | | `True` | If documents for schema inference should be randomly selected. If `False`, documents will be selected from start. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `database_pattern.allow` | | | Regex pattern for databases to include in ingestion. |
|
||||
| `database_pattern.deny` | | | Regex pattern for databases to exclude from ingestion. |
|
||||
| `collection_pattern.allow` | | | Regex pattern for collections to include in ingestion. |
|
||||
| `collection_pattern.deny` | | | Regex pattern for collections to exclude from ingestion. |
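As a sketch, to scan every document during schema inference while skipping MongoDB's internal databases (`admin`, `config`, and `local`), you might write something like the following; the deny pattern is illustrative, and this assumes the pattern filters accept lists of regexes:

```yml
source:
  type: "mongodb"
  config:
    # Coordinates
    connect_uri: "mongodb://localhost"

    # Credentials
    username: admin
    password: password

    # Scan entire collections instead of sampling 1,000 documents.
    enableSchemaInference: True
    schemaSamplingSize: null

    # Placeholder pattern: skip MongoDB's internal databases.
    database_pattern:
      deny:
        - "^(admin|config|local)$"

sink:
  # sink configs
```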
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
101
metadata-ingestion/source_docs/mssql.md
Normal file
@ -0,0 +1,101 @@
|
||||
# Microsoft SQL Server
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[mssql]'`.
|
||||
|
||||
We have two options for the underlying library used to connect to SQL Server: (1) [python-tds](https://github.com/denisenkom/pytds) and (2) [pyodbc](https://github.com/mkleehammer/pyodbc). The TDS library is pure Python and hence easier to install, but only PyODBC supports encrypted connections.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, views and tables
|
||||
- Column types associated with each table/view
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: mssql
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: localhost:1433
|
||||
database: DemoDatabase
|
||||
|
||||
# Credentials
|
||||
username: user
|
||||
password: pass
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Example: using ingestion with ODBC and encryption</summary>
|
||||
|
||||
This requires you to have already installed the Microsoft ODBC Driver for SQL Server.
|
||||
See https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development?view=sql-server-ver15
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: mssql
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: localhost:1433
|
||||
database: DemoDatabase
|
||||
|
||||
# Credentials
|
||||
username: admin
|
||||
password: password
|
||||
|
||||
# Options
|
||||
uri_args:
|
||||
driver: "ODBC Driver 17 for SQL Server"
|
||||
Encrypt: "yes"
|
||||
TrustServerCertificate: "Yes"
|
||||
ssl: "True"
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | | MSSQL username. |
|
||||
| `password` | | | MSSQL password. |
|
||||
| `host_port` | | `"localhost:1433"` | MSSQL host URL. |
|
||||
| `database` | | | MSSQL database. |
|
||||
| `database_alias` | | | Alias to apply to database when ingesting. |
|
||||
| `use_odbc` | | `False` | See https://docs.sqlalchemy.org/en/14/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc. |
|
||||
| `uri_args.<uri_arg>` | | | Arguments to URL-encode when connecting. See https://docs.microsoft.com/en-us/sql/connect/odbc/dsn-connection-string-attribute?view=sql-server-ver15. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
| `include_views` | | `True` | Whether views should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
66
metadata-ingestion/source_docs/mysql.md
Normal file
@ -0,0 +1,66 @@
|
||||
# MySQL
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[mysql]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, and tables
|
||||
- Column types and schema associated with each table
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: mysql
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: localhost:3306
|
||||
database: dbname
|
||||
|
||||
# Credentials
|
||||
username: root
|
||||
password: example
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | | MySQL username. |
|
||||
| `password` | | | MySQL password. |
|
||||
| `host_port` | | `"localhost:3306"` | MySQL host URL. |
|
||||
| `database` | | | MySQL database. |
|
||||
| `database_alias` | | | Alias to apply to database when ingesting. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
| `include_views` | | `True` | Whether views should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
74
metadata-ingestion/source_docs/oracle.md
Normal file
@ -0,0 +1,74 @@
|
||||
# Oracle
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[oracle]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, and tables
|
||||
- Column types associated with each table
|
||||
|
||||
Using the Oracle source requires that you've also installed the correct drivers; see the [cx_Oracle docs](https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html). The easiest one is the [Oracle Instant Client](https://www.oracle.com/database/technologies/instant-client.html).
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: oracle
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: localhost:5432
|
||||
database: dbname
|
||||
|
||||
# Credentials
|
||||
username: user
|
||||
password: pass
|
||||
|
||||
# Options
|
||||
service_name: svc # omit database if using this option
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
Exactly one of `database` or `service_name` is required.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | ------------------------------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | | Oracle username. For more details on authentication, see the documentation: https://docs.sqlalchemy.org/en/14/dialects/oracle.html#dialect-oracle-cx_oracle-connect <br /> and https://cx-oracle.readthedocs.io/en/latest/user_guide/connection_handling.html#connection-strings. |
|
||||
| `password` | | | Oracle password. |
|
||||
| `host_port` | | | Oracle host URL. |
|
||||
| `database`             | If `service_name` is not set   |          | Oracle database. If using this, omit `service_name`.                                                                                                                                                                                                                                  |
| `service_name`         | If `database` is not set       |          | Oracle service name. If using this, omit `database`.                                                                                                                                                                                                                                  |
|
||||
| `database_alias` | | | Alias to apply to database when ingesting. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
| `include_views` | | `True` | Whether views should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
71
metadata-ingestion/source_docs/postgres.md
Normal file
@ -0,0 +1,71 @@
|
||||
# PostgreSQL
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[postgres]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, views, and tables
|
||||
- Column types associated with each table
|
||||
- Also supports PostGIS extensions
|
||||
- `database_alias` (optional) can be used to change the name under which the database is ingested
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: postgres
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: localhost:5432
|
||||
database: DemoDatabase
|
||||
|
||||
# Credentials
|
||||
username: user
|
||||
password: pass
|
||||
|
||||
# Options
|
||||
database_alias: DatabaseNameToBeIngested
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
## Config details
|
||||
|
||||
Note that a `.` is used to denote nested fields in the YAML recipe.
|
||||
|
||||
| Field | Required | Default | Description |
|
||||
| ---------------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `username` | | | PostgreSQL username. |
|
||||
| `password` | | | PostgreSQL password. |
|
||||
| `host_port` | ✅ | | PostgreSQL host URL. |
|
||||
| `database` | | | PostgreSQL database. |
|
||||
| `database_alias` | | | Alias to apply to database when ingesting. |
|
||||
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
|
||||
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
|
||||
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
|
||||
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
|
||||
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
|
||||
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
|
||||
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
|
||||
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
|
||||
| `include_tables` | | `True` | Whether tables should be ingested. |
|
||||
| `include_views` | | `True` | Whether views should be ingested. |
|
||||
|
||||
## Compatibility
|
||||
|
||||
Coming soon!
|
||||
|
||||
## Questions
|
||||
|
||||
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
|
||||
97
metadata-ingestion/source_docs/redshift.md
Normal file
@ -0,0 +1,97 @@
|
||||
# Redshift
|
||||
|
||||
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
|
||||
|
||||
## Setup
|
||||
|
||||
To install this plugin, run `pip install 'acryl-datahub[redshift]'`.
|
||||
|
||||
## Capabilities
|
||||
|
||||
This plugin extracts the following:
|
||||
|
||||
- Metadata for databases, schemas, views and tables
|
||||
- Column types associated with each table
|
||||
- Also supports PostGIS extensions
|
||||
|
||||
## Quickstart recipe
|
||||
|
||||
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
|
||||
|
||||
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: redshift
|
||||
config:
|
||||
# Coordinates
|
||||
host_port: example.something.us-west-2.redshift.amazonaws.com:5439
|
||||
database: DemoDatabase
|
||||
|
||||
# Credentials
|
||||
username: user
|
||||
password: pass
|
||||
|
||||
# Options
|
||||
options:
|
||||
# driver_option: some-option
|
||||
|
||||
include_views: True # whether to include views, defaults to True
|
||||
include_tables: True # whether to include tables, defaults to True
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Extra options when running Redshift behind a proxy</summary>
|
||||
|
||||
When connecting to Redshift through a proxy, you may need to adjust the SSL-related connection arguments that are passed through to the underlying driver, for example:
|
||||
|
||||
```yml
|
||||
source:
|
||||
type: redshift
|
||||
config:
|
||||
host_port: my-proxy-hostname:5439
|
||||
|
||||
options:
|
||||
connect_args:
|
||||
sslmode: "prefer" # or "require" or "verify-ca"
|
||||
sslrootcert: ~ # needed to unpin the AWS Redshift certificate
|
||||
|
||||
sink:
|
||||
# sink configs
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                  | Required | Default  | Description |
| ---------------------- | -------- | -------- | ----------- |
| `username`             |          |          | Redshift username. |
| `password`             |          |          | Redshift password. |
| `host_port`            | ✅       |          | Redshift host URL. |
| `database`             |          |          | Redshift database. |
| `database_alias`       |          |          | Alias to apply to database when ingesting. |
| `env`                  |          | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `options.<option>`     |          |          | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
| `table_pattern.allow`  |          |          | Regex pattern for tables to include in ingestion. |
| `table_pattern.deny`   |          |          | Regex pattern for tables to exclude from ingestion. |
| `schema_pattern.allow` |          |          | Regex pattern for schemas to include in ingestion. |
| `schema_pattern.deny`  |          |          | Regex pattern for schemas to exclude from ingestion. |
| `view_pattern.allow`   |          |          | Regex pattern for views to include in ingestion. |
| `view_pattern.deny`    |          |          | Regex pattern for views to exclude from ingestion. |
| `include_tables`       |          | `True`   | Whether tables should be ingested. |
| `include_views`        |          | `True`   | Whether views should be ingested. |

## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
62 metadata-ingestion/source_docs/sagemaker.md Normal file
@ -0,0 +1,62 @@
# SageMaker

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[sagemaker]'`.

## Capabilities

This plugin extracts the following:

- Feature groups
- Models, jobs, and lineage between the two (e.g. when jobs output a model or a model is used by a job)

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: sagemaker
  config:
    # Coordinates
    aws_region: "my-aws-region"

sink:
  # sink configs
```
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                                 | Required | Default      | Description |
| ------------------------------------- | -------- | ------------ | ----------- |
| `aws_region`                          | ✅       |              | AWS region code. |
| `env`                                 |          | `"PROD"`     | Environment to use in namespace when constructing URNs. |
| `aws_access_key_id`                   |          | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_secret_access_key`               |          | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_session_token`                   |          | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_role`                            |          | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `extract_feature_groups`              |          | `True`       | Whether to extract feature groups. |
| `extract_models`                      |          | `True`       | Whether to extract models. |
| `extract_jobs.auto_ml`                |          | `True`       | Whether to extract AutoML jobs. |
| `extract_jobs.compilation`            |          | `True`       | Whether to extract compilation jobs. |
| `extract_jobs.edge_packaging`         |          | `True`       | Whether to extract edge packaging jobs. |
| `extract_jobs.hyper_parameter_tuning` |          | `True`       | Whether to extract hyperparameter tuning jobs. |
| `extract_jobs.labeling`               |          | `True`       | Whether to extract labeling jobs. |
| `extract_jobs.processing`             |          | `True`       | Whether to extract processing jobs. |
| `extract_jobs.training`               |          | `True`       | Whether to extract training jobs. |
| `extract_jobs.transform`              |          | `True`       | Whether to extract transform jobs. |
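
As an illustration of the nested `extract_jobs.*` flags, the sketch below keeps feature groups and models but only extracts training and transform jobs (the region is a placeholder, and the booleans simply override the defaults listed above):

```yml
source:
  type: sagemaker
  config:
    aws_region: "my-aws-region"

    extract_feature_groups: True
    extract_models: True

    # narrow down which job types are extracted
    extract_jobs:
      auto_ml: False
      compilation: False
      edge_packaging: False
      hyper_parameter_tuning: False
      labeling: False
      processing: False
      training: True
      transform: True

sink:
  # sink configs
```
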
## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
147 metadata-ingestion/source_docs/snowflake.md Normal file
@ -0,0 +1,147 @@
# Snowflake

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[snowflake]'`.

## Capabilities

This plugin extracts the following:

- Metadata for databases, schemas, views and tables
- Column types associated with each table

:::tip

You can also get fine-grained usage statistics for Snowflake using the `snowflake-usage` source described below.

:::

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: snowflake
  config:
    # Coordinates
    host_port: account_name
    warehouse: "COMPUTE_WH"

    # Credentials
    username: user
    password: pass
    role: "sysadmin"

sink:
  # sink configs
```
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                    | Required | Default                                                              | Description |
| ------------------------ | -------- | -------------------------------------------------------------------- | ----------- |
| `username`               |          |                                                                       | Snowflake username. |
| `password`               |          |                                                                       | Snowflake password. |
| `host_port`              | ✅       |                                                                       | Snowflake host URL. |
| `warehouse`              |          |                                                                       | Snowflake warehouse. |
| `role`                   |          |                                                                       | Snowflake role. |
| `env`                    |          | `"PROD"`                                                              | Environment to use in namespace when constructing URNs. |
| `options.<option>`       |          |                                                                       | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
| `database_pattern.allow` |          |                                                                       | Regex pattern for databases to include in ingestion. |
| `database_pattern.deny`  |          | `"^UTIL_DB$"`<br />`"^SNOWFLAKE$"`<br />`"^SNOWFLAKE_SAMPLE_DATA$"`   | Regex pattern for databases to exclude from ingestion. |
| `table_pattern.allow`    |          |                                                                       | Regex pattern for tables to include in ingestion. |
| `table_pattern.deny`     |          |                                                                       | Regex pattern for tables to exclude from ingestion. |
| `schema_pattern.allow`   |          |                                                                       | Regex pattern for schemas to include in ingestion. |
| `schema_pattern.deny`    |          |                                                                       | Regex pattern for schemas to exclude from ingestion. |
| `view_pattern.allow`     |          |                                                                       | Regex pattern for views to include in ingestion. |
| `view_pattern.deny`      |          |                                                                       | Regex pattern for views to exclude from ingestion. |
| `include_tables`         |          | `True`                                                                | Whether tables should be ingested. |
| `include_views`          |          | `True`                                                                | Whether views should be ingested. |
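
For example, to restrict ingestion to a single database and skip staging schemas, the `database_pattern` and `schema_pattern` fields above can be nested like this sketch (the database name, schema regex, and other values are placeholders):

```yml
source:
  type: snowflake
  config:
    host_port: account_name
    warehouse: "COMPUTE_WH"
    username: user
    password: pass
    role: "sysadmin"

    # only ingest the ANALYTICS database
    database_pattern:
      allow:
        - "^ANALYTICS$"

    # skip any schema ending in _STAGING
    schema_pattern:
      deny:
        - ".*_STAGING$"

sink:
  # sink configs
```
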
## Compatibility

Coming soon!
## Snowflake Usage Stats

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

### Setup

To install this plugin, run `pip install 'acryl-datahub[snowflake-usage]'`.

### Capabilities

This plugin extracts the following:

- Statistics on queries issued and tables and columns accessed (excludes views)
- Aggregation of these statistics into buckets, by day or hour granularity

Note: the user/role must have access to the account usage table. The "accountadmin" role has this by default, and other roles can be [granted this permission](https://docs.snowflake.com/en/sql-reference/account-usage.html#enabling-account-usage-for-other-roles).

Note: the underlying access history views that we use are only available in Snowflake's enterprise edition or higher.

:::note

This source only ingests usage statistics. To get the tables, views, and schemas in your Snowflake warehouse, ingest using the `snowflake` source described above.

:::
### Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: snowflake-usage
  config:
    # Coordinates
    host_port: account_name
    warehouse: "COMPUTE_WH"

    # Credentials
    username: user
    password: pass
    role: "sysadmin"

    # Options
    top_n_queries: 10

sink:
  # sink configs
```

### Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field             | Required | Default                                                         | Description |
| ----------------- | -------- | --------------------------------------------------------------- | ----------- |
| `username`        |          |                                                                  | Snowflake username. |
| `password`        |          |                                                                  | Snowflake password. |
| `host_port`       | ✅       |                                                                  | Snowflake host URL. |
| `warehouse`       |          |                                                                  | Snowflake warehouse. |
| `role`            |          |                                                                  | Snowflake role. |
| `env`             |          | `"PROD"`                                                         | Environment to use in namespace when constructing URNs. |
| `bucket_duration` |          | `"DAY"`                                                          | Duration to bucket usage events by. Can be `"DAY"` or `"HOUR"`. |
| `start_time`      |          | Last full day in UTC (or hour, depending on `bucket_duration`)   | Earliest date of usage logs to consider. |
| `end_time`        |          | Last full day in UTC (or hour, depending on `bucket_duration`)   | Latest date of usage logs to consider. |
| `top_n_queries`   |          | `10`                                                             | Number of top queries to save to each table. |
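
To backfill a specific window at hourly granularity, the bucketing and time-range options can be combined as in the sketch below (the timestamps are placeholders, written as ISO 8601 strings on the assumption that standard datetime formats are accepted):

```yml
source:
  type: snowflake-usage
  config:
    host_port: account_name
    username: user
    password: pass
    role: "sysadmin"

    # bucket usage events by hour instead of the default day
    bucket_duration: "HOUR"

    # only consider usage logs inside this window (placeholder timestamps)
    start_time: "2021-07-01T00:00:00Z"
    end_time: "2021-07-02T00:00:00Z"

    top_n_queries: 20

sink:
  # sink configs
```
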
### Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
84 metadata-ingestion/source_docs/sql_profiles.md Normal file
@ -0,0 +1,84 @@
# SQL Profiles

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[sql-profiles]'`.

The SQL-based profiler does not run alone, but rather is enabled on top of other SQL-based sources.
Enabling profiling will slow down ingestion runs.

:::caution

Running profiling against many tables or over many rows can run up significant costs.
While we've done our best to limit the cost of the queries the profiler runs, you
should be prudent about the set of tables profiling is enabled on and the frequency
of the profiling runs.

:::

## Capabilities

Extracts:

- Row and column counts for each table
- For each column, if applicable:
  - null counts and proportions
  - distinct counts and proportions
  - minimum, maximum, mean, median, standard deviation, some quantile values
  - histograms or frequencies of unique values

Supported SQL sources:

- AWS Athena
- BigQuery
- Druid
- Hive
- Microsoft SQL Server
- MySQL
- Oracle
- Postgres
- Redshift
- Snowflake
- Generic SQLAlchemy source
## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: <sql-source> # can be bigquery, snowflake, etc - see above for the list
  config:
    # ... any other source-specific options ...

    # Options
    profiling:
      enabled: true

sink:
  # sink configs
```

## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                   | Required | Default | Description |
| ----------------------- | -------- | ------- | ----------- |
| `profiling.enabled`     |          | `False` | Whether profiling should be done. |
| `profiling.limit`       |          |         | Max number of documents to profile. By default, profiles all documents. |
| `profiling.offset`      |          |         | Offset in documents to profile. By default, uses no offset. |
| `profile_pattern.allow` |          |         | Regex pattern for tables to profile. |
| `profile_pattern.deny`  |          |         | Regex pattern for tables to not profile. |
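
Since profiling can be expensive, it usually pays to scope it down. The sketch below layers the `profile_pattern` and `profiling.limit` options from the table above onto a Postgres source (the coordinates, regex, and limit are placeholders):

```yml
source:
  type: postgres # profiling is enabled on top of a SQL source
  config:
    host_port: localhost:5432
    database: DemoDatabase
    username: user
    password: pass

    profiling:
      enabled: true
      limit: 10000 # placeholder cap; see `profiling.limit` above

    # only profile tables matching this pattern
    profile_pattern:
      allow:
        - 'analytics\..*'

sink:
  # sink configs
```
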
## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
62 metadata-ingestion/source_docs/sqlalchemy.md Normal file
@ -0,0 +1,62 @@
# Other SQLAlchemy databases

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[sqlalchemy]'`.

The `sqlalchemy` source is useful if we don't have a pre-built source for your chosen
database system, but there is an [SQLAlchemy dialect](https://docs.sqlalchemy.org/en/14/dialects/)
defined elsewhere. In order to use this, you must `pip install` the required dialect packages yourself.

## Capabilities

This plugin extracts the following:

- Metadata for databases, schemas, views, and tables
- Column types associated with each table

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: sqlalchemy
  config:
    # Coordinates
    platform: my_platform # required; name of the platform, used when constructing URNs
    connect_uri: "dialect+driver://username:password@host:port/database"

sink:
  # sink configs
```
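
As a concrete sketch, suppose your database ships its SQLAlchemy dialect as a separate package. For instance, with ClickHouse you might install the `clickhouse-sqlalchemy` package (package name and URI shown as an assumption; check your dialect's own documentation) and then use a recipe like:

```yml
source:
  type: sqlalchemy
  config:
    platform: clickhouse # used when constructing URNs
    connect_uri: "clickhouse://user:pass@localhost:8123/default"

sink:
  # sink configs
```
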
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                  | Required | Default  | Description |
| ---------------------- | -------- | -------- | ----------- |
| `platform`             | ✅       |          | Name of platform being ingested, used in constructing URNs. |
| `connect_uri`          | ✅       |          | URI of database to connect to. See https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls |
| `env`                  |          | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `options.<option>`     |          |          | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
| `table_pattern.allow`  |          |          | Regex pattern for tables to include in ingestion. |
| `table_pattern.deny`   |          |          | Regex pattern for tables to exclude from ingestion. |
| `schema_pattern.allow` |          |          | Regex pattern for schemas to include in ingestion. |
| `schema_pattern.deny`  |          |          | Regex pattern for schemas to exclude from ingestion. |
| `view_pattern.allow`   |          |          | Regex pattern for views to include in ingestion. |
| `view_pattern.deny`    |          |          | Regex pattern for views to exclude from ingestion. |
| `include_tables`       |          | `True`   | Whether tables should be ingested. |
| `include_views`        |          | `True`   | Whether views should be ingested. |

## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
57 metadata-ingestion/source_docs/superset.md Normal file
@ -0,0 +1,57 @@
# Superset

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[superset]'`.

See the documentation for Superset's `/security/login` endpoint at https://superset.apache.org/docs/rest-api for more details on Superset's login API.

## Capabilities

This plugin extracts the following:

- Charts, dashboards, and associated metadata

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: superset
  config:
    # Coordinates
    connect_uri: http://localhost:8088

    # Credentials
    username: user
    password: pass
    provider: ldap

sink:
  # sink configs
```

## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field         | Required | Default            | Description |
| ------------- | -------- | ------------------ | ----------- |
| `connect_uri` |          | `"localhost:8088"` | Superset host URL. |
| `username`    |          |                    | Superset username. |
| `password`    |          |                    | Superset password. |
| `provider`    |          | `"db"`             | Superset provider. |
| `env`         |          | `"PROD"`           | Environment to use in namespace when constructing URNs. |

## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!