Ahmet Melek b7674fb97e
feat: confluence connector (cloud) (#906)
* Add confluence connector and an example script

* add test script, add dependency installations

* add authentication secret variables for ci tests and actions

* add dependency installation commands for workflows

* Update ingest test fixtures (#907)

Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

* add ingest test fixtures update workflow for python 3.10, update example script with dummy values

* change workflow name to avoid confusion

* only leave 3.8 in ingest test matrix to test consistent partitioning among python versions, remove 3.10 workflow for the test fixtures update

* only leave 3.8 in ingest test matrix to test consistent partitioning among python versions

* Update ingest test fixtures (#911)

Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

* revert back the test python version matrix

* recompile dependencies

* modifications for shellcheck

* update changelog and version

* changelog and version

* remove comments

* Update ingest test fixtures (#915)

Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

* add the option to state the number of spaces to be fetched

* add scroll functionality, expose --confluence-num-of-spaces, --confluence-list-of-spaces and --confluence-num-of-docs-from-each-space to users

* add help message

* add docstrings for two tests, validate grabbing every doc in the fetched spaces, count number of files instead of diffing for confluence2 test

* change test names

* rename connector arg

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>

* change arg name for connector

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>

* add comment to example

* change arg names

* add new tests to ingest test

* shellcheck remove redundant statement

* Update ingest test fixtures (#932)

Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

* Update ingest test fixtures (#936)

Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

* linting

* change file extensions to parse as html

* Update ingest test fixtures (#943)

Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

* remove old fixtures

* update version to 0.8.2-dev3

* change file to trigger CI

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>
2023-07-18 19:29:41 +01:00

[
{
"type": "NarrativeText",
"element_id": "74845621f15eff6061cc72e39ab19274",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 1"
},
{
"type": "NarrativeText",
"element_id": "34e254a6b1afac8256d96c37f7c39da7",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 2"
},
{
"type": "NarrativeText",
"element_id": "80338a319b6a3d406de2c0568f1abf4a",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 3"
},
{
"type": "NarrativeText",
"element_id": "27364c08c56c166059ee03d76f896b8b",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 4"
},
{
"type": "NarrativeText",
"element_id": "3083b4f1f02fe4cc80ad042cb236526d",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 5"
},
{
"type": "NarrativeText",
"element_id": "b870284cb9e641ae78b3f5069ac825ca",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 6"
},
{
"type": "NarrativeText",
"element_id": "fd7fee3b265a6113aa8867c1b42e3c33",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 7"
},
{
"type": "NarrativeText",
"element_id": "7fddc5d7d2db680eaf8b0511d8dd6199",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 8"
},
{
"type": "NarrativeText",
"element_id": "299cc72768b0b779e1abb07f743b7209",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 9"
},
{
"type": "NarrativeText",
"element_id": "73455957eed859f333fa8c54d8869dec",
"metadata": {
"data_source": {},
"filetype": "text/html",
"page_number": 1
},
"text": "Testdoc1 has only this text for 10 times: 10"
}
]
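
A fixture like the one above can be sanity-checked before it is committed. The following is a minimal sketch, not the repository's actual test code: the helper name `check_elements` and the inlined two-element excerpt are assumptions made for illustration, and the checks (required keys, unique `element_id`, `text/html` filetype) are inferred only from the shape of the fixture shown here.

```python
import json

# Two elements copied from the fixture above, inlined so the sketch is self-contained.
FIXTURE = """
[
  {"type": "NarrativeText",
   "element_id": "74845621f15eff6061cc72e39ab19274",
   "metadata": {"data_source": {}, "filetype": "text/html", "page_number": 1},
   "text": "Testdoc1 has only this text for 10 times: 1"},
  {"type": "NarrativeText",
   "element_id": "34e254a6b1afac8256d96c37f7c39da7",
   "metadata": {"data_source": {}, "filetype": "text/html", "page_number": 1},
   "text": "Testdoc1 has only this text for 10 times: 2"}
]
"""

# Keys that every element in the fixture above carries (an inferred assumption).
REQUIRED_KEYS = {"type", "element_id", "metadata", "text"}


def check_elements(elements):
    """Hypothetical helper: return a list of problems; an empty list means none found."""
    problems = []
    seen_ids = set()
    for index, element in enumerate(elements):
        missing = REQUIRED_KEYS - element.keys()
        if missing:
            problems.append(f"element {index}: missing keys {sorted(missing)}")
            continue
        if element["element_id"] in seen_ids:
            problems.append(f"element {index}: duplicate element_id")
        seen_ids.add(element["element_id"])
        if element["metadata"].get("filetype") != "text/html":
            problems.append(f"element {index}: unexpected filetype")
    return problems


elements = json.loads(FIXTURE)
problems = check_elements(elements)
print(problems)
```

To check a full fixture file, the inlined string could be replaced with the result of `json.load` on that file; the path depends on the repository layout and is left out here.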