Mirror of https://github.com/datahub-project/datahub.git, synced 2025-06-27 05:03:31 +00:00
docs: use sphinx-markdown-builder for sdk doc generation (#13721)
commit e24ef15e77 (parent f3c49c3174)
@@ -746,11 +746,26 @@ module.exports = {
       items: [
         "metadata-ingestion/as-a-library",
         {
-          "Python SDK Reference": [
-            {
-              type: "autogenerated",
-              dirName: "python-sdk",
-            },
-          ],
+          type: "category",
+          label: "SDK Reference",
+          items: [
+            {
+              type: "category",
+              label: "Builder",
+              items: [{ type: "autogenerated", dirName: "python-sdk/builder" }],
+            },
+            {
+              type: "category",
+              label: "Clients",
+              items: [{ type: "autogenerated", dirName: "python-sdk/clients" }],
+            },
+            {
+              type: "category",
+              label: "SDK V2",
+              items: [{ type: "autogenerated", dirName: "python-sdk/sdk-v2" }],
+            },
+            "python-sdk/models",
+            "python-sdk/urns",
+          ],
         },
       ],
@@ -26,10 +26,10 @@ $(VENV_SENTINEL): requirements.txt

 # Not using Python's http.server because it enables caching headers.
 serve:
-	serve -p 3001 _build/html/
+	serve -p 3001 _build/markdown/

-md: html
-	$(VENV_DIR)/bin/python3 convert_sphinx_to_docusaurus.py
+md: venv
+	@$(SPHINXBUILD) -M markdown "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) && $(VENV_DIR)/bin/python3 convert_sphinx_to_docusaurus.py

 # Route other targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
@@ -1,10 +0,0 @@
-Builder
-=======
-
-These classes and methods make it easier to construct MetadataChangeProposals and MetadataChangeEvents.
-
-.. automodule:: datahub.emitter.mcp
-
-.. automodule:: datahub.emitter.mce_builder
-
-.. automodule:: datahub.emitter.mcp_builder
docs-website/sphinx/apidocs/builder/mce-builder.rst (new file)
@@ -0,0 +1,7 @@
+MCE Builder
+=======
+
+These classes and methods make it easier to construct MetadataChangeEvents.
+
+.. automodule:: datahub.emitter.mce_builder
+   :member-order: alphabetical
docs-website/sphinx/apidocs/builder/mcp-builder.rst (new file)
@@ -0,0 +1,10 @@
+MCP Builder
+=======
+
+These classes and methods make it easier to construct MetadataChangeProposals.
+
+.. automodule:: datahub.emitter.mcp
+   :member-order: alphabetical
+
+.. automodule:: datahub.emitter.mcp_builder
+   :member-order: alphabetical
@@ -1,11 +0,0 @@
-Client
-======
-
-The Kafka emitter or Rest emitter can be used to push metadata to DataHub.
-The DataHub graph client extends the Rest emitter with additional functionality.
-
-.. automodule:: datahub.emitter.rest_emitter
-
-.. automodule:: datahub.emitter.kafka_emitter
-
-.. automodule:: datahub.ingestion.graph.client
docs-website/sphinx/apidocs/clients/graph-client.rst (new file)
@@ -0,0 +1,8 @@
+Graph Client
+======
+
+The DataHub graph client extends the Rest emitter with additional functionality.
+
+.. automodule:: datahub.ingestion.graph.client
+   :member-order: alphabetical
+
docs-website/sphinx/apidocs/clients/kafka-emitter.rst (new file)
@@ -0,0 +1,8 @@
+Kafka Emitter
+======
+
+The Kafka emitter can be used to push metadata to DataHub.
+
+.. automodule:: datahub.emitter.kafka_emitter
+   :member-order: alphabetical
+
docs-website/sphinx/apidocs/clients/rest-emitter.rst (new file)
@@ -0,0 +1,8 @@
+Rest Emitter
+======
+
+The Rest emitter can be used to push metadata to DataHub.
+
+.. automodule:: datahub.emitter.rest_emitter
+   :member-order: alphabetical
+
docs-website/sphinx/apidocs/sdk-v2/entities.rst (new file)
@@ -0,0 +1,29 @@
+Entities
+=======
+
+The DataHub SDK provides a set of entities that can be used to interact with DataHub's metadata.
+
+
+.. automodule:: datahub.sdk.dataset
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.container
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.mlmodel
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.mlmodelgroup
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.dashboard
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.chart
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.datajob
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.dataflow
+   :member-order: alphabetical
docs-website/sphinx/apidocs/sdk-v2/entity-client.rst (new file)
@@ -0,0 +1,7 @@
+Entity Client
+=======
+
+The DataHub Entity Client provides a client for interacting with DataHub entities.
+
+.. automodule:: datahub.sdk.entity_client
+   :member-order: alphabetical
docs-website/sphinx/apidocs/sdk-v2/lineage-client.rst (new file)
@@ -0,0 +1,7 @@
+Lineage Client
+=======
+
+The DataHub Lineage Client provides a client for searching and retrieving lineage metadata from DataHub.
+
+.. automodule:: datahub.sdk.lineage_client
+   :member-order: alphabetical
docs-website/sphinx/apidocs/sdk-v2/main-client.rst (new file)
@@ -0,0 +1,7 @@
+Main Client
+=======
+
+The DataHub Main Client provides a client for interacting with DataHub.
+
+.. automodule:: datahub.sdk.main_client
+   :member-order: alphabetical
docs-website/sphinx/apidocs/sdk-v2/resolver-client.rst (new file)
@@ -0,0 +1,7 @@
+Resolver Client
+=======
+
+The DataHub Resolver Client provides a client for resolving entities by their URN.
+
+.. automodule:: datahub.sdk.resolver_client
+   :member-order: alphabetical
docs-website/sphinx/apidocs/sdk-v2/search-client.rst (new file)
@@ -0,0 +1,10 @@
+Search Client
+=======
+
+The DataHub Search Client provides a client for searching and retrieving metadata from DataHub.
+
+.. automodule:: datahub.sdk.search_client
+   :member-order: alphabetical
+
+.. automodule:: datahub.sdk.search_filters
+   :member-order: alphabetical
@@ -1,6 +0,0 @@
-DataHub CLI
-===========
-
-.. click:: datahub.entrypoints:datahub
-   :prog: datahub
-   :nested: full
@@ -26,8 +26,10 @@ extensions = [
     "sphinx_autodoc_typehints",
     # This enables us to autogenerate docs for our CLI.
     "sphinx_click",
+    "sphinx_markdown_builder",
 ]

+markdown_anchor_sections = True
 napoleon_use_param = True

 # Move type hint info to function description instead of signature
@@ -1,78 +1,166 @@
 import pathlib
 import json
-from bs4 import BeautifulSoup
+import re

 SPHINX_ROOT_DIR = pathlib.Path(".")
-SPHINX_BUILD_DIR = SPHINX_ROOT_DIR / pathlib.Path("_build/html/apidocs")
+SPHINX_BUILD_DIR = SPHINX_ROOT_DIR / "_build/markdown"
 DOCS_OUTPUT_DIR = pathlib.Path("../docs/python-sdk")

+HTML_TAGS = {
+    "html", "head", "title", "base", "link", "meta", "style", "script", "noscript",
+    "body", "section", "nav", "article", "aside", "h1", "h2", "h3", "h4", "h5", "h6",
+    "header", "footer", "address", "p", "hr", "pre", "blockquote", "ol", "ul", "li",
+    "dl", "dt", "dd", "figure", "figcaption", "div", "main", "a", "em", "strong",
+    "small", "s", "cite", "q", "dfn", "abbr", "data", "time", "code", "var", "samp",
+    "kbd", "sub", "sup", "i", "b", "u", "mark", "ruby", "rt", "rp", "bdi", "bdo",
+    "span", "br", "wbr", "ins", "del", "img", "iframe", "embed", "object", "param",
+    "video", "audio", "track", "canvas", "map", "area", "svg", "math",
+    "table", "caption", "colgroup", "col", "tbody", "thead", "tfoot", "tr", "td", "th",
+    "form", "fieldset", "legend", "label", "button", "select", "datalist",
+    "optgroup", "option", "textarea", "output", "progress", "meter", "details",
+    "summary", "dialog", "template", "slot", "portal"
+}

-def html_to_mdx(html: str) -> str:
-    # Because the HTML uses `class` and has `{}` in it, it isn't valid
-    # MDX. As such, we use React's dangerouslySetInnerHTML.
-    return f"""
-<div dangerouslySetInnerHTML={{{{__html: {json.dumps(html)}}}}}></div>
-"""
+REPLACEMENTS = [
+    ("<function ", "&lt;function "),
+    ("<disabled ", "&lt;disabled "),
+    ("MDXContent.isMDXComponent = true", ""),
+    (".md#", ".mdx#"),
+]

+# ---- CLEAN HTML DANGEROUS TAGS ----
+def sanitize_mdx_unsafe_tags(content: str) -> str:
+    return re.sub(
+        r"<([a-zA-Z0-9_-]+)>",
+        lambda m: f"<{m.group(1)}>" if m.group(1).lower() in HTML_TAGS else f"&lt;{m.group(1)}&gt;",
+        content
+    )
+
+# ---- REPAIR BROKEN MARKDOWN BOLD ----
+def repair_broken_emphasis(content: str) -> str:
+    content = re.sub(r'\[\*\*([^\*]+)\*\s+\*', r'[**\1**', content)
+    content = re.sub(r'\*\*([^\*]+)\*\s+\*\]', r'**\1**]', content)
+    content = re.sub(r'\*\*([^\*]+)\*\s*,\s+\*\s*([^\*]+)\s*\*', r'**\1**, **\2**', content)
+    content = re.sub(r'\*\s*\*([^\*]+)\*\s+\*\]', r'**\1**]', content)
+    return content
+
+# ---- ONLY USED INSIDE SECTION HEADINGS ----
+def convert_md_link_to_html(arg_str: str) -> str:
+    # convert markdown links inside argument types into plain text fallback
+    return re.sub(
+        r'\[([^\]]+)\]\([^)]+\)',
+        r'<code>\1</code>',
+        arg_str
+    )

-def bs4_to_mdx(soup: BeautifulSoup) -> str:
-    # TODO: Eventually we should do something smarter here to
-    # generate something that's closer to real Markdown. This would
-    # be helpful, for example, for enabling Docusaurus to generate
-    # a table of contents for the page.
-    return html_to_mdx(str(soup))
-
-def convert_html_to_md(html_file: pathlib.Path) -> str:
-    html = html_file.read_text()
-    soup = BeautifulSoup(html, "html.parser")
-
-    body = soup.find("main").find("div", {"class": "bd-article-container"})
-    article = body.find("article")
-
-    # Remove all the "permalink to this heading" links.
-    for link in article.find_all("a", {"class": "headerlink"}):
-        link.decompose()
-
-    # Remove the trailing " – " from arguments that are missing
-    # a description.
-    for item in article.select("dl.field-list dd p"):
-        # Note - that's U+2013, not a normal hyphen.
-        if str(item).endswith(" – </p>"):
-            parent = item.parent
-            # print("orig item", item)
-            new_item = BeautifulSoup(str(item)[:-7] + "</p>", "html.parser")
-            # print("new-item", str(new_item))
-            parent.p.replace_with(new_item)
-            # print("item post replace", parent)
-
-    # Extract title from the h1.
-    title_element = article.find("h1")
-    title = title_element.text
-    title_element.decompose()
-
-    # TODO - generate nicer slugs for these pages
-    md_meta = f"""---
-title: {title}
----\n\n"""
-
-    return md_meta + bs4_to_mdx(article)
+# ---- ARGUMENT PARSER ----
+def parse_args(arg_str: str) -> str:
+    if not arg_str.strip():
+        return ""
+
+    parts = []
+    for arg in arg_str.split(","):
+        arg = arg.strip().replace("\\", "")
+        if arg == "*":
+            parts.append("*")
+            continue
+
+        for pattern, template in [
+            (r"([\w_]+)\s*:\s*([^=]+)\s*=\s*(.+)", r'<span class="arg-name">\1</span>: <span class="arg-type">\2</span> = <span class="arg-default">\3</span>'),
+            (r"([\w_]+)\s*=\s*(.+)", r'<span class="arg-name">\1</span> = <span class="arg-default">\2</span>'),
+            (r"([\w_]+)\s*:\s*(.+)", r'<span class="arg-name">\1</span>: <span class="arg-type">\2</span>')
+        ]:
+            m = re.match(pattern, arg)
+            if m:
+                parts.append(m.expand(template))
+                break
+        else:
+            parts.append(f'<span class="arg-name">{arg}</span>')
+
+    parsed = ", ".join(parts)
+    parsed = convert_md_link_to_html(parsed)
+    return parsed
+
+# ---- HEADING PARSER ----
+def parse_heading(text: str):
+    match = re.match(r"(?:\*class\*\s+)?([\w\.]+)\.([\w]+)(?:\((.*)\))?", text)
+    if match:
+        owner, name, args = match.groups()
+        parsed_args = parse_args(args or "")
+        prefix = '<span class="class-text">class</span> ' if "*class*" in text else ""
+        heading = f'{prefix}<span class="class-owner">{owner}.</span><span class="class-name">{name}</span>'
+        heading += f"({parsed_args})" if parsed_args else "()"
+        slug = f"{owner}.{name}"
+        return name, heading, slug
+
+    match = re.match(r"([\w]+)(?:\((.*)\))?", text)
+    if match:
+        name, args = match.groups()
+        parsed_args = parse_args(args or "")
+        heading = f'<span class="class-name">{name}</span>'
+        heading += f"({parsed_args})" if parsed_args else "()"
+        return name, heading, name
+
+    return text, text, text
+
+# ---- SECTION WRAPPER ----
+def wrap_section_blocks(content: str, class_name: str) -> str:
+    lines = content.splitlines()
+    out = []
+    inside = False
+
+    for line in lines:
+        m = re.match(r"^### (.+)$", line)
+        if m:
+            if inside:
+                out.append("\n\n</div>\n\n")
+
+            name, heading, slug = parse_heading(m.group(1))
+            out.append(f'\n\n### <span className="visually-hidden">{name}</span> {{#{slug}}}\n\n')
+            out.append(f'<div className="{class_name}">\n')
+            out.append(f'<div className="section-heading">{heading}<a href="#{slug}" class="hash-link"></a></div>\n')
+            inside = True
+        else:
+            out.append(line)
+
+    if inside:
+        out.append("\n\n</div>\n\n")
+
+    return "\n".join(out)
+
+# ---- PARAMETER DASH FIX ----
+def fix_parameter_dash(content: str) -> str:
+    return re.sub(r'(\*\s+\*\*[\w]+?\*\*\s+\([^\)]*\))\s+–\s*(?=\n|\r|\Z)', r'\1', content)
+
+# ---- FILE CONVERTER ----
+def convert_file(doc: pathlib.Path, outfile: pathlib.Path):
+    content = doc.read_text()
+
+    for old, new in REPLACEMENTS:
+        content = content.replace(old, new)
+
+    content = sanitize_mdx_unsafe_tags(content)
+    content = repair_broken_emphasis(content)
+    content = wrap_section_blocks(content, "h3-block")
+    content = fix_parameter_dash(content)
+
+    title_match = re.search(r"^# (.+)$", content, re.MULTILINE)
+    title = title_match.group(1).strip() if title_match else doc.stem
+    content = re.sub(r"^# .+\n?", "", content, count=1, flags=re.MULTILINE)
+
+    final = f"---\ntitle: {title}\n---\n<div className=\"sphinx-api-docs\">\n{content.strip()}\n</div>\n"
+
+    outfile.parent.mkdir(parents=True, exist_ok=True)
+    outfile.write_text(final)
+    print(f"Generated {outfile}")

 def main():
     DOCS_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

-    for doc in SPHINX_BUILD_DIR.glob("**/*.html"):
-        md = convert_html_to_md(doc)
-
-        outfile = DOCS_OUTPUT_DIR / doc.relative_to(SPHINX_BUILD_DIR).with_suffix(".md")
-        outfile.parent.mkdir(parents=True, exist_ok=True)
-        outfile.write_text(md)
-
-        print(f"Generated {outfile}")
+    for doc in SPHINX_BUILD_DIR.glob("**/*.md"):
+        if doc.stem == "index":
+            continue
+        outfile = DOCS_OUTPUT_DIR / doc.relative_to(SPHINX_BUILD_DIR / "apidocs").with_suffix(".mdx")
+        convert_file(doc, outfile)

 if __name__ == "__main__":
     main()
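To show what the converter's tag-sanitizing step is for, here is a minimal standalone sketch (not part of the commit): real HTML tag names pass through unchanged, while Sphinx artifacts such as `<MyClass>`-style pseudo-tags would crash MDX compilation and are escaped. The abbreviated tag set here is illustrative, not the script's full list.

```python
import re

HTML_TAGS = {"div", "span", "code", "pre", "p"}  # abbreviated for the sketch

def sanitize_mdx_unsafe_tags(content: str) -> str:
    # Escape <word> sequences that are not real HTML tags, so MDX does
    # not try to interpret them as JSX components.
    return re.sub(
        r"<([a-zA-Z0-9_-]+)>",
        lambda m: m.group(0) if m.group(1).lower() in HTML_TAGS else f"&lt;{m.group(1)}&gt;",
        content,
    )

print(sanitize_mdx_unsafe_tags("<div> wraps <MyClass>"))  # → <div> wraps &lt;MyClass&gt;
```

Note that the script's separate REPLACEMENTS table exists because strings like `<function ...>` contain spaces and never match this single-word tag pattern.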
@@ -5,6 +5,7 @@ sphinx-click==4.4.0
 sphinx_autodoc_typehints==1.22
 pydata-sphinx-theme==0.13.1
 snowballstemmer>=2.2,<3 # Fixes https://github.com/sphinx-doc/sphinx/issues/13533
+sphinx-markdown-builder==0.6.8

 # Because of https://github.com/pydata/pydata-sphinx-theme/issues/108
 accessible-pygments
@@ -1,124 +1,108 @@
+// Styles for Sphinx Python SDK generated docs
-$borderRadius: 5px;
 .sphinx-api-docs {
-  font-size: 14px;
+  font-size: 15px;
+  color: var(--ifm-font-color-base);

-  dl.py {
-    margin-bottom: calc(var(--ifm-spacing-vertical) * 2);
-
-    .anchor {
+  .visually-hidden {
     clip: rect(0 0 0 0);
     clip-path: inset(50%);
     height: 10px;
     width: 3rem;
     overflow: hidden;
     white-space: nowrap;
     padding: 0;
     margin: 0;
   }
+  .hash-link { display: none; }

+  .h3-block {
+    border: 1px solid var(--ifm-hr-border-color);
-    border-radius: $borderRadius;
+    border-radius: 8px;
+    overflow: hidden;

-    code {
-      border: none;
-      background: none;
+    .section-heading {
+      background-color: var(--ifm-color-primary-opaque);
       font-family: var(--ifm-font-family-monospace);
+      padding: 1rem 1.5rem;
+      font-weight: 700;
+      font-size: 1rem;
+      border-bottom: 1px solid var(--ifm-hr-border-color);
+
+      .class-text { color: #aaa; margin-right: 0.3rem; }
+      .class-owner { color: var(--ifm-font-color-secondary); }
+      .class-name { color: var(--ifm-color-primary); font-weight: 700; }
+      .arg-name { color: var(--ifm-font-color-base); font-weight: 600; }
+      .arg-type { color: gray; }
+      .arg-default { color: #999; font-weight: 600; }
+    }

+    p {
+      margin-bottom: 0;
+      padding: 1rem 1.5rem;
+    }

-    dl {
-      margin-bottom: var(--ifm-spacing-vertical);
-    }
-
-    // The parameter name:
-    em.sig-param > span:first-child {
-      font-weight: bold;
-    }
-
-    > dd:not(:empty) {
-      padding-bottom: var(--ifm-spacing-vertical);
-    }
-
-    dt.sig {
-      box-sizing: border-box;
-      font-size: 0.9rem;
-      padding: var(--ifm-spacing-vertical);
-      border-radius: $borderRadius;
+    h4 {
+      background: #FAFAFA;
       font-family: var(--ifm-font-family-monospace);
-      background-color: var(--ifm-background-surface-color);
     }

-    > dd {
-      &:not(:empty) {
-        padding-top: calc(var(--ifm-spacing-vertical) / 2);
-        margin-top: 0;
-        margin-left: var(--ifm-spacing-horizontal);
-        margin-right: var(--ifm-spacing-horizontal);
-      }
-    }
-
-    // e.g. `class`, or `def`
-    em.property {
-      color: var(--ifm-font-color-base);
-      font-weight: bold;
-    }
-
-    // e.g. `MyClass`
-    span.sig-name {
-      color: #2774b3;
-      font-weight: bold;
-    }
-
-    // e.g classmethod
-    em.property {
-      color: #66b327;
-    }
-
-    em.sig-param {
-      span.default_value {
-        color: #66b327;
-      }
-    }
-
-    span.sig-return {
-      span.sig-return-typehint {
-        color: var(--ifm-font-color-base);
-
-        pre {
-          color: var(--ifm-font-color-base);
-        }
-      }
-    }
-  }
-
-  dl.field-list {
-    padding-top: calc(var(--ifm-spacing-vertical) / 2);
-    display: grid;
-    grid-template-columns: fit-content(30%) auto;
+    dl {
+      &:not(:first-child) {
+        padding: 1rem 1.5rem;
+        margin: 0;
+        font-weight: 600;
+        border-top: 1px solid var(--ifm-hr-border-color);
+        border-bottom: 1px solid var(--ifm-hr-border-color);
+        color: #444;
+
+        em {
+          color: gray;
+          font-style: normal;
+        }
+      }
+    }

-    dt {
-      margin-right: 0.5em;
-    }
-
-    dd {
-      font-family: var(--ifm-font-family-monospace);
-    }
-
-    dt,
-    dd {
-      margin-left: 0;
-      padding-left: 0;
-
-      &:not(:first-of-type) {
-        border-top: 1px solid var(--ifm-hr-border-color);
-        padding-top: var(--ifm-spacing-vertical);
-      }
-      &:not(:last-of-type) {
-        padding-bottom: var(--ifm-spacing-vertical);
-      }
-    }
-
-    ul {
-      list-style-type: none;
-      padding-left: 0;
-      li {
-        p {
-          margin: 0;
-          padding: 0;
-        }
-      }
-    }
+    ul {
+      padding: 1rem 2rem;
+      background: #FDFDFD;
+
+      li {
+        display: flex;
+        flex-wrap: wrap;
+        margin-bottom: 0.75rem;
+
+        .param-name {
+          min-width: 160px;
+          font-weight: 600;
+          color: var(--ifm-color-primary-darker);
+        }
+
+        .param-type {
+          font-family: var(--ifm-font-family-monospace);
+          font-weight: 400;
+          color: #444;
+          background: #F5F5F5;
+          padding: 0.15rem 0.4rem;
+          border-radius: 4px;
+          margin-left: 0.5rem;
+        }
+      }
+    }

+    // pre code block (full examples)
+    pre code, pre code * {
+      color: var(--ifm-font-color-secondary);
+      padding: 1rem;
+      margin: 0;
+      font-size: 0.9rem;
+      overflow-x: auto;
+    }

+    code {
+      background-color: #F5F5F5;
+      padding: 0.1rem 0.4rem;
+      margin: auto 0.2rem;
+      font-size: 0.9em;
+      border: none;
+      line-height: 1.3;
+      border-radius: 6px;
+    }
+  }
 }
docs-website/src/theme/Root.js (new file)
@@ -0,0 +1,18 @@
+import React, { useEffect } from 'react';
+import { useLocation } from '@docusaurus/router';
+
+export default function Root({ children }) {
+  const location = useLocation();
+
+  useEffect(() => {
+    setTimeout(() => {
+      if (location.hash) {
+        const id = decodeURIComponent(location.hash.substring(1));
+        const el = document.getElementById(id);
+        if (el) el.scrollIntoView();
+      }
+    }, 0);
+  }, []);
+
+  return <>{children}</>;
+}
@@ -225,7 +225,7 @@ datahub delete --platform snowflake --only-soft-deleted --hard

 ## Deletes using the SDK and APIs

-The Python SDK's [DataHubGraph](../../python-sdk/clients.md) client supports deletes via the following methods:
+The Python SDK's [DataHubGraph](../../python-sdk/clients/graph-client.mdx) client supports deletes via the following methods:

 - `soft_delete_entity`
 - `hard_delete_entity`
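The delete hunk above names `soft_delete_entity` and `hard_delete_entity` on the `DataHubGraph` client. As a hedged illustration (not part of this commit), a call might look like the sketch below; the server URL and dataset coordinates are placeholders, and invoking the function requires a reachable DataHub GMS.

```python
def delete_dataset(server: str = "http://localhost:8080") -> None:
    # Imports are inside the function so defining the sketch does not
    # require the `acryl-datahub` package or a live connection.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    # Hypothetical dataset coordinates for illustration only.
    urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")

    graph = DataHubGraph(DatahubClientConfig(server=server))
    graph.soft_delete_entity(urn=urn)  # reversible: marks the entity as removed
    graph.hard_delete_entity(urn=urn)  # irreversible: physically deletes it
```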
@@ -21,7 +21,7 @@ If you’re using a different database system for which we don’t support colum

 ## SDK Support

-Our SDK provides a [`DataHubGraph.parse_sql_lineage()`](../../python-sdk/clients.md#datahub.ingestion.graph.client.DataHubGraph.parse_sql_lineage) method for programmatically parsing SQL queries.
+Our SDK provides a [`DataHubGraph.parse_sql_lineage()`](../../python-sdk/clients/graph-client.mdx#datahub.ingestion.graph.client.DataHubGraph.parse_sql_lineage) method for programmatically parsing SQL queries.

 The resulting object contains a `sql_parsing_result.debug_info.confidence_score` field, which is a 0-1 value indicating the confidence of the parser.
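As a hedged sketch of the usage the hunk above documents (field names follow the doc text in this hunk; the server URL is a placeholder and a reachable DataHub GMS is required when the function is invoked):

```python
def lineage_confidence(sql: str, platform: str = "snowflake") -> float:
    # Import inside the function so defining the sketch does not
    # require the `acryl-datahub` package or a live connection.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    result = graph.parse_sql_lineage(sql, platform=platform)
    # 0-1 confidence of the SQL parser, per the docs above.
    return result.debug_info.confidence_score

# Example invocation (would need a live server):
# lineage_confidence("INSERT INTO downstream SELECT * FROM upstream")
```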
@@ -669,7 +669,7 @@ This is a summary of automatic lineage extraction support in our data source. Pl
 If you’re using a different database system for which we don’t support column-level lineage out of the box, but you do have a database query log available,
 we have a SQL queries connector that generates column-level lineage and detailed table usage statistics from the query log.

-If these does not suit your needs, you can use the new `DataHubGraph.parse_sql_lineage()` method in our SDK. (See the source code [here](https://docs.datahub.com/docs/python-sdk/clients/))
+If these does not suit your needs, you can use the new `DataHubGraph.parse_sql_lineage()` method in our SDK. (See the source code [here](https://docs.datahub.com/docs/python-sdk/clients/graph-client))

 For more information, refer to the [Extracting Column-Level Lineage from SQL](https://blog.datahubproject.io/extracting-column-level-lineage-from-sql-779b8ce17567)