342 Commits

Author SHA1 Message Date
Alonso Guevara
737a471d18
Pandas-ify Create Final Entities (#1225) 2024-09-26 15:09:40 -06:00
Nathan Evans
ce71bcf7fb
Collapse create final entities (#1220)
* Collapse create_final_entities

* Update smoke tests

* Semver

* Remove prints

* Update embedding assertions
2024-09-25 17:35:44 -07:00
Nathan Evans
3217013019
Revisit create final text units (#1216)
* Add embeddings to collapsed subflow

* Semver

* Fix smoke tests
2024-09-25 16:55:27 -07:00
Nathan Evans
73e709b686
Collapse create final covariates (#1215)
* Add covariate test

* Add detailed mock assertions

* Collapse create_final_covariates

* Delete unused doc_id field

* Semver

* Update smoke test

* Remove unused subject/object type columns
2024-09-25 16:30:22 -07:00
Alonso Guevara
0952014fa9
Fix issue 1173 - Nested json parsing (#1218) 2024-09-25 17:11:49 -06:00
Nathan Evans
14750f4d37
Collapse create final documents (#1217)
* Collapse create_final_documents

* Semver
2024-09-25 15:50:46 -07:00
Alonso Guevara
dda4edd0fd
Pandas-ify Create Base Documents (#1209) 2024-09-24 18:37:45 -06:00
Nathan Evans
f518c8b80b
Collapse relationship embeddings (#1199)
* Merge text_embed into a single relationships subflow

* Update smoke tests

* Semver

* Spelling
2024-09-24 15:03:26 -07:00
Nathan Evans
1755afbdec
Collapse create base text units (#1178)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Setup initial test and config

* Collapse create_base_text_units

* Semver

* Spelling

* Fix smoke tests

* Address PR comments

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-23 16:55:53 -07:00
Alonso Guevara
be7d3eb189
Remove aggregate_df from final communities and final text units (#1179)
* Remove aggregate_df from final communities and final text units

* Semver

* Ruff and format

* Format

* Format

* Fix tests, ruff and checks

* Remove some leftover prints

* Removed _final_join method
2024-09-23 16:54:15 -06:00
Nathan Evans
fbc483e4e5
Collapse create base documents (#1176)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Clean up some df/tests
2024-09-23 13:24:06 -07:00
JunHo Kim (김준호)
ea468204bc
Fix typo in documentation for customizability (#1160)
Corrected a misspelling of 'customizability' in the env_vars.md documentation. This change ensures clarity and accuracy in the description of input data handling configurations.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 14:52:44 -06:00
Nathan Evans
f8ab1b30dc
Collapse create_final_nodes (#1171)
* Collapse create_final_nodes

* Update smoke tests

* Typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 13:48:56 -07:00
Alonso Guevara
fb65989c05
Incremental indexing/update old outputs (#1155)
* Create entrypoint for cli and api (#1067)

* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests

* Incremental indexing/file delta (#1123)

* Calculate new inputs and deleted inputs on update

* Semver

* Clear ruff checks

* Fix pyright

* Fix PyRight

* Ruff again

* Update Final Entities merging in new and existing entities from delta

* Update formatting

* Pyright

* Ruff

* Fix for pyright

* Yet Another Pyright test

* Pyright

* Format
2024-09-20 14:21:50 -06:00
Chris Trevino
1dbcc42b81
Remove redundant code from error-handling code in GlobalSearch (#1170)
* remove a redundant retry

* semver

* formatting

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 11:29:56 -06:00
Alonso Guevara
16b4ea5dc9
Release v0.3.6 (#1172) v0.3.6 2024-09-19 18:29:52 -06:00
dependabot[bot]
b61c4ec737
Bump JamesIves/github-pages-deploy-action from 4.6.3 to 4.6.4 (#1104)
Bumps [JamesIves/github-pages-deploy-action](https://github.com/jamesives/github-pages-deploy-action) from 4.6.3 to 4.6.4.
- [Release notes](https://github.com/jamesives/github-pages-deploy-action/releases)
- [Commits](https://github.com/jamesives/github-pages-deploy-action/compare/v4.6.3...v4.6.4)

---
updated-dependencies:
- dependency-name: JamesIves/github-pages-deploy-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 18:07:44 -06:00
Nathan Evans
ae094bb144
Collapse create final relationships (#1158)
* Collapse pre/post embedding workflows

* Semver

* Fix smoke tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 17:38:01 -06:00
dependabot[bot]
bd2c1da9a8
Bump path-to-regexp from 6.2.1 to 6.3.0 in /docsite (#1130)
Bumps [path-to-regexp](https://github.com/pillarjs/path-to-regexp) from 6.2.1 to 6.3.0.
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v6.2.1...v6.3.0)

---
updated-dependencies:
- dependency-name: path-to-regexp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-19 15:55:31 -06:00
Alonso Guevara
84fb14ce4d
Chore/dependency cleanup (#1169)
* fix dependencies with deptry

* change order in pyproject.toml

* fix

* Dependency updates and cleanup

* Future required

---------

Co-authored-by: Florian Maas <fpgmaas@gmail.com>
2024-09-19 15:08:13 -06:00
Alonso Guevara
96a2460375
Release v0.3.5 (#1166) v0.3.5 2024-09-19 11:34:49 -06:00
longyunfeigu
95409ff4bf
Remove lancedb_dir redundant assignments (#1163)
Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 09:25:10 -06:00
Alonso Guevara
ac234f47bd
Fix prompt tune output path on cli (#1157) 2024-09-19 09:22:17 -06:00
Derek Worthen
3b09df6e07
Migrate towards using static output directories (#1113)
* Migrate towards using static output directories

- Fixes load_config eagerly resolving directories.
    Directories are only resolved when the output
    directories are local.
- Add support for `--output` and `--reporting` flags
    for index CLI. To achieve previous output structure
    `index --output run1/artifacts --reports run1/reports`.
- Use static output directories when initializing
    a new project.
- Maintains backward compatibility for those using
    timestamp outputs locally.

* fix smoke tests

* update query cli to work with static directories

* remove eager path resolution from load_config. Support CLI overrides that can be resolved.

* add docs and output logs/artifacts to same directory

* use match statement

* switch back to if statement

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-18 17:36:50 -06:00
Alonso Guevara
10910797d0
Fix seed init in clustering (#1156) 2024-09-18 17:22:52 -06:00
Josh Bradley
594084f156
Improve and cleanup logging output of indexing (#1144) 2024-09-18 14:38:13 -04:00
Nathan Evans
aa5b426f1d
Collapse final communities workflow (#1150)
* Collapse create_final_communities

* Semver

* Spellcheck

* Clean up filtering

* Add space in title

* Format

* Cleanup imports and format

* Spruce up the tests

* Update dictionary.txt

* Spellcheck

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-17 17:04:42 -07:00
Nathan Evans
a473265580
Collapse verbs: create_final_text_units (#1143)
* Load default config in verb tests

* Load proper workflow config

* Collapse text unit pre-embedding steps

* Format

* Update smoke tests

* Semver

* Format

* Merge join* subflows into create_final_text_units

* Remove join_text_units_to_covariate_ids

* Format

* Remove join_text_units_to_entity_ids

* Remove join_text_units_to_relationship_ids

* Clean up merges and aggregations

* Remove unnecessary cast
2024-09-17 10:32:25 -07:00
Josh Bradley
f7f96c31bb
Cleanup cli (#1127) 2024-09-17 01:37:27 -04:00
Nathan Evans
d22c0e7836
Covariate collapse (#1142)
* Setup basic verb test runner

* Replace join_text_units_to_entity_ids with subflow

* Update comments

* Replace join_text_units_to_relationship_ids subflow

* Roll in final select

* Reuse assertion util

* Small fix + format

* Format/typing

* Semver

* Format/typing

* Semver

* Revert format changes

* Fix smoke test subworkflow count

* Edit subworkflows for another smoke test

* Update test parquets for covariates

* Collapse covariate join

* Rework subtasks for per-flow customization

* Format

* Semver

* Fix smoke test
2024-09-16 12:35:45 -07:00
Nathan Evans
2de302ff0d
Verb merge nre1 (#1140)
* Setup basic verb test runner

* Replace join_text_units_to_entity_ids with subflow

* Update comments

* Replace join_text_units_to_relationship_ids subflow

* Roll in final select

* Reuse assertion util

* Small fix + format

* Format/typing

* Semver

* Format/typing

* Semver

* Revert format changes

* Fix smoke test subworkflow count

* Edit subworkflows for another smoke test
2024-09-16 12:10:29 -07:00
Alonso Guevara
cb4f2b43a7
Fix seeded random gen on clustering step (#1132) 2024-09-12 17:42:50 -06:00
Alonso Guevara
8c7f0dfc1b
Fix duplicates in community context builder (#1131)
* fix: fix a bug where the community context builder caused a report to be repeated twice in local mode.

* Fix duplicates in community context builder

* Small tweaks on code

---------

Co-authored-by: jarlor <zjl58960902@outlook.com>
2024-09-12 15:47:08 -06:00
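One common way to avoid repeating the same report in a context window is to deduplicate by a stable key while preserving the original order. The sketch below is an illustration only, not the code from #1131. It assumes each report is a dict keyed by a `community` id, and the `dedupe_reports` helper name is hypothetical.

```python
from typing import Iterable


def dedupe_reports(reports: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each report, keyed by a hypothetical 'community' id."""
    seen: set[str] = set()
    unique: list[dict] = []
    for report in reports:
        key = report["community"]
        if key in seen:
            continue  # this report is already in the context; skip the repeat
        seen.add(key)
        unique.append(report)
    return unique


reports = [
    {"community": "1", "title": "A"},
    {"community": "2", "title": "B"},
    {"community": "1", "title": "A"},  # duplicate of the first report
]
print([r["community"] for r in dedupe_reports(reports)])  # ['1', '2']
```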
Roberto Corno
fcfa7b1329
Update factories.py to allow the usage of the request timeout ChatOpe… (#1115)
Update factories.py to allow the usage of the request timeout ChatOpenAI parameter

allow the usage of the request timeout ChatOpenAI parameter

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-12 13:51:48 -06:00
JunHo Kim (김준호)
7b8f5ba51f
Correct links to datashaper verbs in comments (#1068)
Correct links to verbs in comments

Updated the links in comments to reflect new paths for 'derive' and 'aggregate' verbs. This improves documentation and ensures that references are up to date for future developers.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-12 12:44:38 -06:00
Alonso Guevara
8a0bc0535f
Release v0.3.4 (#1125) 2024-09-11 16:45:43 -06:00
Alonso Guevara
c0d535d0c2
Fix summarization including empty descriptions (#1124)
* Fix summarization including empty descriptions

* Update
2024-09-11 16:30:49 -06:00
Alonso Guevara
cdf5fc4d67
Deep copy txt units on local search to avoid race conditions (#1118)
* Deep copy txt units on local search to avoid race conditions

* Format
2024-09-11 14:12:03 -06:00
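The motivation for a deep copy is that a shallow copy still shares the nested objects. Concurrent searches that mutate those objects can then interfere with each other. The sketch below shows the difference in isolation. The `text_units` shape with an `entity_ids` list is an assumption made for illustration and does not reproduce the actual local-search data model.

```python
import copy

# Hypothetical text unit records with a nested, mutable list.
text_units = [{"id": "t1", "entity_ids": ["e1"]}]

shallow = list(text_units)        # new outer list, but the same inner dicts
deep = copy.deepcopy(text_units)  # independent copies of every dict and list

deep[0]["entity_ids"].append("e2")
print(text_units[0]["entity_ids"])  # ['e1'] (original untouched)

shallow[0]["entity_ids"].append("e3")
print(text_units[0]["entity_ids"])  # ['e1', 'e3'] (mutation leaks into the original)
```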
Derek Worthen
e7ee8cb8a5
release v0.3.3 (#1116) v0.3.3 2024-09-10 13:07:07 -07:00
Doug Orbaker
1b559726ac
Update create_pipeline_config.py (#1108)
* Update create_pipeline_config.py

Order switched to ensure that user settings at runtime take precedence.

* Updated semversioner.
2024-09-10 11:35:47 -06:00
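The fix in #1108 reorders how settings are combined so that values supplied at runtime override the defaults. The sketch below shows that precedence rule as a generic recursive merge in which the later source wins. It does not reproduce create_pipeline_config.py. The `merge_settings` name and the example keys and model names are hypothetical.

```python
from typing import Any


def merge_settings(defaults: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge two settings dicts; values from `user` win on conflict."""
    merged = dict(defaults)  # copy so the defaults are not mutated
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # the user-provided value takes precedence
    return merged


defaults = {"llm": {"model": "default-model", "max_retries": 10}}
user = {"llm": {"model": "my-model"}}
print(merge_settings(defaults, user))
# {'llm': {'model': 'my-model', 'max_retries': 10}}
```

Applying the sources in the opposite order would let the defaults silently overwrite the user's choices, which is the class of bug the reordering addresses.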
KennyZhang1
27c5468a8b
Load query from blob (#1095)
* Moved query loading from file to helper function

* added loading parquets from blob to function

* resolved adlfs async error

* debugging cleanup and small fixes

* added connection string support

* semversioner and ruff fixes

* completed testing for merge with main

* more ruff changes

* fixed unbound vars warning

* rewrote function to use storage utils

* removed unused vars

---------

Co-authored-by: Kenny Zhang <zhangken@microsoft.com>
2024-09-05 18:17:22 -04:00
Alonso Guevara
044516f538
Clean and organize run index code (#1090)
* Create entrypoint for cli and api (#1067)

* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests

* Clean and organize run index code

* Ruff fix

* Pyright fix

* Format fixes

* Pyright fix

* Format

* Fix integ tests

* Fix ruff

* Reorganize and clean up
2024-09-05 08:15:10 -06:00
Derek Worthen
2d45ece9b6
fix setting base_dir to full paths when not using file system. (#1096)
* fix setting base_dir to full paths when not using file system.

* add general resolve_path
2024-09-04 11:33:44 -07:00
Derek Worthen
ab29cc2a7e
Consistent config load_config (#1065)
* Consistent config load_config

- Provide a consistent way to load configuration
- Resolve potential timestamp directories upfront
    upon config object creation
- Add unit tests for resolving timestamp directories
- Resolves #599
- Resolves #1049

* fix formatting issues

* remove unnecessary path resolution

* fix smoke tests

* update prompts to use load_config

* Update none checks

* Update none checks

* Update searching for config method signature

* Update unit tests

* fix formatting issues
2024-09-03 16:33:16 -06:00
Alonso Guevara
3f9800230f
Fix img width (#1061) 2024-08-29 17:02:47 -06:00
Alonso Guevara
7ffce8d7ba
Fix img for autotune (#1060)
* Fix img for autotune

* Add line breaks to tune docs

* More line breaks
2024-08-29 16:56:34 -06:00
Alonso Guevara
6fc452b954
Update bash example in docs for prompt tune (#1059)
* Semver

* Update bash command
2024-08-29 16:35:32 -06:00
Alonso Guevara
e023882033
Update Prompt Tuning docs (#1057)
* Update Prompt Tuning docs

* Semver
2024-08-29 16:00:07 -06:00
dependabot[bot]
d13aec5dca
Bump jupyterlab from 4.2.4 to 4.2.5 (#1056)
Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 4.2.4 to 4.2.5.
- [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
- [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/lsp@4.2.5/CHANGELOG.md)
- [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/lsp@4.2.4...@jupyterlab/lsp@4.2.5)

---
updated-dependencies:
- dependency-name: jupyterlab
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-29 12:53:41 -06:00
dependabot[bot]
0b1f7db7d8
Bump notebook from 7.2.1 to 7.2.2 (#1055)
Bumps [notebook](https://github.com/jupyter/notebook) from 7.2.1 to 7.2.2.
- [Release notes](https://github.com/jupyter/notebook/releases)
- [Changelog](https://github.com/jupyter/notebook/blob/@jupyter-notebook/tree@7.2.2/CHANGELOG.md)
- [Commits](https://github.com/jupyter/notebook/compare/@jupyter-notebook/tree@7.2.1...@jupyter-notebook/tree@7.2.2)

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-29 12:37:12 -06:00