Steve Canny c27e0d0062
rfctr(html): replace html parser (#3218)
**Summary**
Replace the legacy HTML parser with a recursive version that captures all
content and provides the flexibility to add new metadata. It's also
substantially faster, although that's just a happy side-effect.

**Additional Context**
The prior HTML parsing algorithm that makes up the core of HTML
partitioning was buggy and very difficult to reason about because it did
not conform to the inherently recursive structure of HTML. The new
version retains `lxml` as the performant and reliable base library but
uses `lxml`'s custom element classes to efficiently classify HTML
elements by their behaviors (block-item and inline (phrasing) primarily)
and give those elements the desired partitioning behaviors.

This solves a host of existing problems with content being skipped and
elements (paragraphs) being divided improperly, but also provides a
clear domain model for reasoning about its behavior and reliably
adjusting it to suit our existing and future purposes.

The parser's operation is recursive, closely modeling the recursive
structure of HTML itself. Its behaviors are based on the HTML Standard
and reliably produce proper and explainable results even for novel
cases.
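
As a rough illustration of the recursive idea only (not the code in this PR), the sketch below walks a well-formed fragment with the standard library's `xml.etree.ElementTree` and a plain lookup set. The real parser uses `lxml` custom element classes instead; `BLOCK_TAGS`, `iter_blocks`, `phrasing_text` and `_flush` are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny classification. Anything not listed here is
# treated as inline (phrasing) content in this sketch.
BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "section", "h1", "h2", "h3"}


def phrasing_text(elem):
    """Return all text inside an inline element (assumes no nested block)."""
    return "".join(elem.itertext())


def _flush(buffer):
    """Yield the normalized buffered text, if any, then empty the buffer."""
    text = " ".join("".join(buffer).split())
    buffer.clear()
    if text:
        yield text


def iter_blocks(block):
    """Yield the text of each block in document order, recursing into children."""
    buffer = [block.text or ""]
    for child in block:
        if child.tag in BLOCK_TAGS:
            # A nested block ends the paragraph accumulated so far.
            yield from _flush(buffer)
            yield from iter_blocks(child)
        else:
            buffer.append(phrasing_text(child))
        # Text following the child still belongs to the enclosing block.
        buffer.append(child.tail or "")
    yield from _flush(buffer)


root = ET.fromstring(
    "<div>intro <b>bold</b> tail<p>para <i>one</i></p>after<ul><li>item</li></ul></div>"
)
print(list(iter_blocks(root)))
# ['intro bold tail', 'para one', 'after', 'item']
```

Because text before, between and after nested blocks is flushed rather than discarded, no content is skipped, which is the property the recursive design is meant to guarantee.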

Fixes #2325 
Fixes #2562
Fixes #2675
Fixes #3168
Fixes #3227
Fixes #3228 
Fixes #3230 
Fixes #3237 
Fixes #3245 
Fixes #3247 
Fixes #3255
Fixes #3309 

### BEHAVIOR DIFFERENCES

#### `emphasized_text_tags` encoding is changed:
- `<strong>` is encoded as `"b"` rather than `"strong"`.
- `<em>` is encoded as `"i"` rather than `"em"`.
- `<span>` is no longer recorded in `emphasized_text_tags` (because
without the CSS we can't tell whether it's used for emphasis or if so
what kind).
- nested emphasis (e.g. bold+italic) is encoded as multiple characters
("bi").
- `emphasized_text_contents` is broken on emphasis-change boundaries,
like:
  ```html
  <p>foo <b>bar <i>baz</i> bada</b> bing</p>
  ```
  produces:
  ```json
  {
    "emphasized_text_contents": ["bar", "baz", "bada"],
    "emphasized_text_tags": ["b", "bi", "b"]
  }
  ```
  whereas previously it would have produced:
  ```json
  {
    "emphasized_text_contents": ["bar baz bada", "baz"],
    "emphasized_text_tags": ["b", "i"]
  }
  ```
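
The new encoding described above can be approximated with the standard library's `html.parser`. This is a minimal sketch, not the parser added in this PR: `EMPHASIS`, `EmphasisCollector` and `emphasis_metadata` are hypothetical names, and the sketch starts a new entry at every tag boundary, which may be more often than the real parser does.

```python
from html.parser import HTMLParser

# Assumed mapping, taken from the list above: <strong> -> "b", <em> -> "i".
# <span> is intentionally absent.
EMPHASIS = {"b": "b", "strong": "b", "i": "i", "em": "i"}


class EmphasisCollector(HTMLParser):
    """Record each emphasized text run with the emphasis codes active for it."""

    def __init__(self):
        super().__init__()
        self._active = []  # emphasis codes of the currently open elements
        self.contents = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS:
            self._active.append(EMPHASIS[tag])

    def handle_endtag(self, tag):
        code = EMPHASIS.get(tag)
        # Drop the most recently opened matching emphasis, if any.
        for i in range(len(self._active) - 1, -1, -1):
            if self._active[i] == code:
                del self._active[i]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self._active:
            self.contents.append(text)
            # Nested emphasis becomes multiple characters, e.g. "bi".
            self.tags.append("".join(sorted(set(self._active))))


def emphasis_metadata(html):
    collector = EmphasisCollector()
    collector.feed(html)
    collector.close()
    return {
        "emphasized_text_contents": collector.contents,
        "emphasized_text_tags": collector.tags,
    }


print(emphasis_metadata("<p>foo <b>bar <i>baz</i> bada</b> bing</p>"))
```

For the example paragraph this yields contents `["bar", "baz", "bada"]` with tags `["b", "bi", "b"]`, matching the new output shown above.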

#### `<pre>` text is preserved as it appears in the html
The one exception is newline trimming at the edges. A leading newline is
removed only when it is the very first character of the text (position 0),
and a trailing newline is removed only when it is the very last character
(position -1) of the `<pre>` text. The old parser stripped all leading and
trailing whitespace.

Result is that:
```html
<pre>
foo
bar
baz
</pre>
```
parses to `"foo\nbar\nbaz"` which is the same result produced for:
```html
<pre>foo
bar
baz</pre>
```
This equivalence is the same behavior exhibited by a browser, which is
why we did the extra work to make it this way.
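
The trimming rule is small enough to state as code. This is a sketch of the rule as described here, with `pre_text` as a hypothetical helper name rather than a function in the codebase:

```python
def pre_text(raw: str) -> str:
    """Trim at most one leading and one trailing newline from <pre> text."""
    if raw.startswith("\n"):
        raw = raw[1:]
    if raw.endswith("\n"):
        raw = raw[:-1]
    return raw


# Both forms of the example above give the same text.
print(repr(pre_text("\nfoo\nbar\nbaz\n")))  # 'foo\nbar\nbaz'
print(repr(pre_text("foo\nbar\nbaz")))      # 'foo\nbar\nbaz'
```

Only one newline is removed at each end, so indentation and blank lines inside the `<pre>` survive.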

#### Whitespace normalization
Leading and trailing whitespace is removed from element text, just as
it is in the browser. Runs of whitespace within the element text
are reduced to a single space character (like in the browser). Note this
means that `\t`, `\n`, and `&nbsp;` are each replaced with a regular space
character. All text derived from elements is whitespace-normalized
except the text within a `<pre>` tag. A single leading or trailing newline is
trimmed from `<pre>` element text; all other whitespace is preserved
just as it appeared in the HTML source.
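
For non-`<pre>` text, the normalization described above is equivalent to splitting on whitespace and rejoining with single spaces. A minimal sketch, with `normalize_whitespace` as a hypothetical name:

```python
def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs to one space and strip both ends.

    `str.split()` with no arguments treats the no-break space (U+00A0, the
    character `&nbsp;` decodes to) as whitespace, so it is collapsed as well.
    """
    return " ".join(text.split())


print(repr(normalize_whitespace("  foo\t\n bar\u00a0baz ")))  # 'foo bar baz'
```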

#### `link_start_indexes` metadata is no longer captured. Rationale:
- It was frequently wrong, often `-1`.
- It was deprecated but then added back in a community PR.
- Maintaining it across any possible downstream transformations (e.g.
chunking) would be expensive and almost certainly lead to wrong values
as distant code evolves.
- It is complex to compute and recompute when whitespace is normalized,
adding substantial complexity to the code and reducing readability and
maintainability.

#### `<br/>` element is replaced with a single newline (`"\n"`)
That newline is usually replaced with a space in `Element.text` when the
text is whitespace-normalized. The newline is preserved within a `<pre>`
element.
  - Related: _No paragraph-break on `<br/><br/>`_
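
A small sketch of the `<br/>` handling, using the standard library's `html.parser` on a single-element fragment. `_BreakAwareText` and `element_text` are hypothetical names, and the sketch assumes the fragment holds exactly one `<p>` or `<pre>`:

```python
from html.parser import HTMLParser


class _BreakAwareText(HTMLParser):
    """Collect text, emitting a newline for every <br> encountered."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.saw_pre = False

    def handle_starttag(self, tag, attrs):
        # <br/> arrives here too: the default handle_startendtag forwards it.
        if tag == "br":
            self.parts.append("\n")
        elif tag == "pre":
            self.saw_pre = True

    def handle_data(self, data):
        self.parts.append(data)


def element_text(fragment: str) -> str:
    parser = _BreakAwareText()
    parser.feed(fragment)
    parser.close()
    text = "".join(parser.parts)
    if parser.saw_pre:
        return text  # newlines are preserved inside <pre>
    return " ".join(text.split())  # otherwise the newline becomes a space


print(repr(element_text("<p>foo<br/><br/>bar</p>")))  # 'foo bar'
print(repr(element_text("<pre>foo<br/>bar</pre>")))   # 'foo\nbar'
```

The first call shows why `<br/><br/>` does not produce a paragraph break: both newlines collapse into the same single space.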

#### Empty `h1..h6` elements are dropped.
HTML heading elements (`<h1..h6>`) are "skipped" (do not generate a
`Title` element) when they contain no text or contain only whitespace.
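
In code terms the rule amounts to a guard like the following. This is a sketch with a hypothetical helper name, not the PR's implementation:

```python
from typing import Optional


def title_text_or_none(heading_text: str) -> Optional[str]:
    """Return normalized heading text, or None when no `Title` should be emitted."""
    text = " ".join(heading_text.split())
    return text or None


print(title_text_or_none("  Create a stellar overview "))  # Create a stellar overview
print(title_text_or_none(" \n\t "))                        # None
```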

---------

Co-authored-by: scanny <scanny@users.noreply.github.com>
2024-07-11 00:14:28 +00:00

[
{
"element_id": "d5576cc299d7d8417c136933f890831c",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Create a stellar overview",
"type": "Title"
},
{
"element_id": "d36113941235a14bdacafa399698ee71",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "The overview is the first page visitors will see when they visit your space, so it helps to include some information on what the space is about and what your team is working on.",
"type": "NarrativeText"
},
{
"element_id": "21e1683c1bc71c40ea20081368bcc7f6",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Add a header image."
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Add a header image. This gives your overview visual appeal and makes it welcoming for visitors.",
"type": "NarrativeText"
},
{
"element_id": "65f03aec0f3637db38c5a3741968eeff",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Explain what the space is for."
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Explain what the space is for. Start by summarizing the purpose of the space. This could be your team's mission statement or a brief description of the kind of work you do.",
"type": "NarrativeText"
},
{
"element_id": "e2522f792c3c5ef32bf1ba342a282fdd",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Share team goals."
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
],
"link_texts": [
"OKRs",
"project plans",
"product roadmaps"
],
"link_urls": [
"https://www.atlassian.com/software/confluence/templates/okrs",
"https://www.atlassian.com/software/confluence/templates/project-plan",
"https://www.atlassian.com/software/confluence/templates/product-roadmap"
]
},
"text": "Share team goals. Add links to your team's OKRs, project plans, and product roadmaps so visitors can quickly get a sense of your team's goals.",
"type": "NarrativeText"
},
{
"element_id": "bd058a2d2c45c92a3178e327564e135a",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Tell people how to contact you."
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Tell people how to contact you. Share your timezone and links to Slack channels, email aliases, or other contact details your team uses so visitors can contact you with questions or feedback about your team's work.",
"type": "NarrativeText"
},
{
"element_id": "eab79997042ec6e273d0a13383347a57",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Use shortcuts for easy access",
"type": "Title"
},
{
"element_id": "29cdfa9dda669b1dac60890795ab526c",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Shortcuts are helpful for important pages that members of a space might need to get to often. These shortcuts are added and organized by the space administrator. Space admins can link to pages in the space, other related spaces, or relevant external web content as well as reorder the shortcuts as needed.",
"type": "NarrativeText"
},
{
"element_id": "3251fe353cdbb64ce5cf084aef00cd96",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "💭Start discussions with inline comments",
"type": "Title"
},
{
"element_id": "29a93ef334092c2a12daf86b1c1b61fb",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
],
"link_texts": [
"Inline comments"
],
"link_urls": [
"https://support.atlassian.com/confluence-cloud/docs/comment-on-pages-and-blog-posts/"
]
},
"text": "Thoughtful responses can get lost and lose context as email replies pile up. And if you neglect to copy someone or want to add them later on, it's difficult for them to get up to speed. Inline comments allow anyone (or everyone) to huddle around an idea while referencing key information on the project page.",
"type": "NarrativeText"
},
{
"element_id": "15cc91b0ec273ab28ab202cd5e7836ea",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "To leave an inline comment, highlight text on the page and the comment icon will appear.",
"type": "NarrativeText"
},
{
"element_id": "c606d30a11f8686a33c4f5305ab878fa",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Team members with permission to access the page can respond to any comment. Plus, when a comment thread comes to its natural conclusion, comments can be resolved and cleared away.",
"type": "NarrativeText"
},
{
"element_id": "9cec5c4cb40b1424590a7d2255ba5d98",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "👋Loop in team members with @mentions",
"type": "Title"
},
{
"element_id": "158ce46e2f05121666d26652b44ce556",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
],
"link_texts": [
"@mentions"
],
"link_urls": [
"https://support.atlassian.com/confluence-cloud/docs/mention-a-person-or-team/"
]
},
"text": "@mentions on Confluence function like @mentions on social media platforms like Twitter, Instagram, and Slack. Type the @ symbol on a Confluence page or in a comment, begin spelling a team member's first name, and a list will appear. Select the individual to ask a question or assign a task.",
"type": "NarrativeText"
},
{
"element_id": "aedbcb95b475418adc9e82fb50e1832f",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "👏Endorse ideas with reactions",
"type": "Title"
},
{
"element_id": "9dcf5a605331e2e0db925a329a727df8",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Use reactions when you want to support a comment or acknowledge you've seen one without clogging up the thread with another comment.",
"type": "NarrativeText"
},
{
"element_id": "a26e40b5555fb394e0844b7ae0118a90",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "You can also use reactions on a page or blog post. The author of the content will be notified, and if enough team members react or add comments to the content, it'll be surfaced on Confluence home feed",
"type": "NarrativeText"
},
{
"element_id": "04dfe464a23b5192ca7465fca96e8a56",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Take your Confluence space to the next level",
"type": "Title"
},
{
"element_id": "06b459a1ab6ee59cbf44705c24934f15",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Extend the capabilities of your Confluence pages by adding extra functionality or including dynamic content.",
"type": "NarrativeText"
},
{
"element_id": "7d4a53bc8e11c662ba62212041b24cf6",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"To add functionality:"
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "To add functionality:",
"type": "NarrativeText"
},
{
"element_id": "29eaf10632e9bd8a0f0c46ac3f6ff876",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Type ' / ' to open the list of items available to use",
"type": "NarrativeText"
},
{
"element_id": "885e34b9230d70d0c3257eef2d3f6a0f",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Find the item to be inserted and select it",
"type": "NarrativeText"
},
{
"element_id": "258ee604863fd54e308f2925d07ebd79",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Insert"
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Select Insert",
"type": "Title"
},
{
"element_id": "04a5e0e0b40cb961c84088dcc67b26b7",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Useful elements for Team space",
"type": "Title"
},
{
"element_id": "bd4f8d2535746efce21ce872c09ef973",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Introduce the team"
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Introduce the team",
"type": "Title"
},
{
"element_id": "433789f2b20ca6275f62a944390e3c1d",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
],
"link_texts": [
"user profiles"
],
"link_urls": [
"https://support.atlassian.com/confluence-cloud/docs/insert-the-user-profile-macro/"
]
},
"text": "Add user profiles to display a short summary of a given Confluence user's profile with their role, profile photo and contact details.",
"type": "NarrativeText"
},
{
"element_id": "959ffe89453ca67c279ed576df24e196",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Share news and announcements with your team"
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Share news and announcements with your team",
"type": "Title"
},
{
"element_id": "8b81b2db2cef191090cfa1d4204b8964",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
],
"link_texts": [
"blog posts"
],
"link_urls": [
"https://support.atlassian.com/confluence-cloud/docs/insert-the-blog-posts-macro/"
]
},
"text": "Display a stream of latest blog posts so your team can easily see what's been going on.",
"type": "NarrativeText"
},
{
"element_id": "3fd46bb09e57e95f1211f475c45b575b",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"emphasized_text_contents": [
"Display a list of important pages"
],
"emphasized_text_tags": [
"b"
],
"filetype": "text/html",
"languages": [
"eng"
]
},
"text": "Display a list of important pages",
"type": "NarrativeText"
},
{
"element_id": "5cbfe913e369743f1f14830c0b6572ab",
"metadata": {
"data_source": {
"date_created": "2023-07-09T12:54:45.288000",
"date_modified": "2023-07-09T12:54:45.288000",
"record_locator": {
"page_id": "1605956",
"url": "https://unstructured-ingest-test.atlassian.net"
},
"url": "https://unstructured-ingest-test.atlassian.net/wiki/rest/api/content/1605956",
"version": "1"
},
"filetype": "text/html",
"languages": [
"eng"
],
"link_texts": [
"content report table"
],
"link_urls": [
"https://support.atlassian.com/confluence-cloud/docs/insert-the-content-report-table-macro/"
]
},
"text": "Paste in page URLs to create smart links, or use the content report table to create a list of all the pages in the space.",
"type": "NarrativeText"
}
]