mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-12 03:25:54 +00:00

Part two of https://github.com/Unstructured-IO/unstructured/pull/2842. Main changes compared to part one:
* The hash computation now includes the element's sequence number on the page, the page number, the document filename, and the element text.
* There are more tests for the deterministic behavior of the IDs returned by partitioning functions and for their uniqueness (guaranteed at the document level, and highly probable across multiple documents).

This PR addresses the following issue: https://github.com/Unstructured-IO/unstructured/issues/2461
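The hash inputs listed above can be illustrated with a minimal Python sketch. It assumes the ID is a SHA-256 hex digest truncated to 32 characters (the length of the `element_id` values in this fixture); the function names `element_id` and `assign_ids`, the JSON encoding of the inputs, and the filename in the usage example are hypothetical and not taken from the library.

```python
import hashlib
import json
from collections import defaultdict
from typing import List, Optional, Tuple


def element_id(
    filename: Optional[str], page_number: Optional[int], sequence_number: int, text: str
) -> str:
    """Hypothetical: derive a deterministic ID from the element's filename,
    page number, sequence number on that page, and text."""
    # JSON-encode the fields so that, e.g., (page 1, seq 23) and (page 12, seq 3)
    # produce different inputs instead of the same concatenated string.
    payload = json.dumps([filename, page_number, sequence_number, text])
    # SHA-256 hex digest truncated to 32 hex characters (128 bits); the
    # algorithm and truncation are assumptions matching this fixture's ID length.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


def assign_ids(
    filename: Optional[str], elements: List[Tuple[Optional[int], str]]
) -> List[str]:
    """Hypothetical: assign IDs to (page_number, text) pairs in document order."""
    next_seq = defaultdict(int)  # per-page sequence counter
    ids = []
    for page_number, text in elements:
        seq = next_seq[page_number]
        next_seq[page_number] += 1
        ids.append(element_id(filename, page_number, seq, text))
    return ids


if __name__ == "__main__":
    doc = [(1, "Status:Done"), (1, "Status:Done"), (2, "Labels:")]
    ids = assign_ids("JCTP1-3.txt", doc)
    # Repeated text on the same page still gets distinct IDs via the sequence number.
    print(len(set(ids)) == len(ids))
    # Re-running on the same input reproduces the same IDs.
    print(ids == assign_ids("JCTP1-3.txt", doc))
```

Within one document, distinct (page, sequence) positions always yield distinct hash inputs; across documents, uniqueness in this sketch rests on the filename differing and on the negligible collision probability of a 128-bit truncated digest.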
296 lines
9.8 KiB
JSON
[
  {
    "element_id": "0927de880e45992628b8e899cf4b2502",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "IssueID_IssueKey:10002 JCTP1-3",
    "type": "Title"
  },
  {
    "element_id": "ce218183962c98e29c5305dcd95c6b3f",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "ProjectID_Key:JCTP1 Jira Connector Test Project 1",
    "type": "Title"
  },
  {
    "element_id": "59b9b966e5e1d80889ee854a8dcda3a8",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "IssueType:Task",
    "type": "Title"
  },
  {
    "element_id": "d17ec696109a243b36837f371251a056",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Status:Done",
    "type": "Title"
  },
  {
    "element_id": "a488ba3dd0675c7116c4461ee98fe428",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Priority:{'self': 'https://unstructured-jira-connector-test.atlassian.net/rest/api/2/priority/3', 'iconUrl': 'https://unstructured-jira-connector-test.atlassian.net/images/icons/priorities/medium.svg', 'name': 'Medium', 'id': '3'}",
    "type": "Title"
  },
  {
    "element_id": "f2b676eff2dfcd752c262a9a93659c0c",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "AssigneeID_Name:{} {}",
    "type": "Title"
  },
  {
    "element_id": "fa97e11a58a2ad94422234cb089da676",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "ReporterAdr_Name:devops+jira-connector@unstructured.io Unstructured Devops",
    "type": "Title"
  },
  {
    "element_id": "1b516aaca2fb235c0ee924f8ce803c5b",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Labels:",
    "type": "Title"
  },
  {
    "element_id": "6b5245928020d946a9ad11f297594c64",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Components:",
    "type": "Title"
  },
  {
    "element_id": "8acc6598b786e0a15f62cca7a49b699b",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Unstructured Devops My comment 1",
    "type": "Title"
  },
  {
    "element_id": "3492cd5d2f8dcb671484dabcade1adb3",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Test done 1",
    "type": "NarrativeText"
  },
  {
    "element_id": "f51bf7082d3c3ee4569b37d7a35faee1",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "_At iam decimum annum in spelunca iacet._ Quid ei reliquisti, nisi te, quoquo modo loqueretur, intellegere, quid diceret? Non laboro, inquit, de nomine. Duo Reges: constructio interrete. Nummus in Croesi divitiis obscuratur, pars est tamen divitiarum. Bork Itaque quantum adiit periculum! ad honestatem enim illum omnem conatum suum referebat, non ad voluptatem.",
    "type": "NarrativeText"
  },
  {
    "element_id": "eca36409da50b07dfb13918d41c51ff1",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "[Bork|http://loripsum.net/] Illi enim inter se dissentiunt. Et non ex maxima parte de tota iudicabis? [Refert tamen, quo modo.|http://loripsum.net/] [Quam ob rem tandem, inquit, non satisfacit?|http://loripsum.net/] Ex quo, id quod omnes expetunt, beate vivendi ratio inveniri et comparari potest.",
    "type": "NarrativeText"
  },
  {
    "element_id": "99a0634fe4beaa8d5ef99e3548ac1b90",
    "metadata": {
      "data_source": {
        "date_created": "2023-08-22T11:29:46.189000+00:00",
        "date_modified": "2023-08-23T14:36:31.252000+00:00",
        "record_locator": {
          "base_url": "https://unstructured-jira-connector-test.atlassian.net",
          "issue_key": "JCTP1-3"
        },
        "url": "https://unstructured-jira-connector-test.atlassian.net/browse/JCTP1-3"
      },
      "filetype": "text/plain",
      "languages": [
        "cat",
        "ita"
      ]
    },
    "text": "Hic nihil fuit, quod quaereremus. Itaque haec cum illis est dissensio, cum Peripateticis nulla sane. Vos autem cum perspicuis dubia debeatis illustrare, dubiis perspicua conamini tollere. [Quae cum essent dicta, discessimus.|http://loripsum.net/] Nam, ut sint illa vendibiliora, haec uberiora certe sunt. [Equidem, sed audistine modo de Carneade?|http://loripsum.net/]",
    "type": "NarrativeText"
  }
]