342 Commits

Author SHA1 Message Date
Alonso Guevara
fb56b7aed0
Fix circular dependency on prompt tune api (#1054) 2024-08-29 12:11:07 -06:00
guangxiangdebizi
1e8bb409f6
Update indexer_adapters.py (#895)
Update lines 71 and 72:
before:
entity_df["community"] = entity_df["community"].fillna(-1)
entity_df["community"] = entity_df["community"].astype(int)
after:
entity_df.loc[:, "community"] = entity_df["community"].fillna(-1)
entity_df.loc[:, "community"] = entity_df["community"].astype(int)

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-28 17:53:33 -06:00
Ikko Eltociear Ashimine
26bcdf39ed
docs: update manual_prompt_tuning.md (#963)
paramater -> parameter

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-28 17:49:35 -06:00
fantom845
a3048487a1
fix for issue 515 (#925)
* fix for issue 515

* semver impact document

---------

Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-28 17:47:48 -06:00
Alonso Guevara
480181769c
Fix/entity extraction strategy (#1046)
* fix strategy config in entity_extraction

* update init content

---------

Co-authored-by: KylinMountain <kose2livs@gmail.com>
2024-08-28 17:33:05 -06:00
dependabot[bot]
ee734e6003
Bump textual from 0.76.0 to 0.78.0 (#1038)
Bumps [textual](https://github.com/Textualize/textual) from 0.76.0 to 0.78.0.
- [Release notes](https://github.com/Textualize/textual/releases)
- [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Textualize/textual/compare/v0.76.0...v0.78.0)

---
updated-dependencies:
- dependency-name: textual
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-28 16:38:40 -06:00
dependabot[bot]
2f59701836
Bump lancedb from 0.11.0 to 0.12.0 (#1024)
Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.11.0 to 0.12.0.
- [Release notes](https://github.com/lancedb/lancedb/releases)
- [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md)
- [Commits](https://github.com/lancedb/lancedb/compare/python-v0.11.0...python-v0.12.0)

---
updated-dependencies:
- dependency-name: lancedb
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-28 16:11:35 -06:00
dependabot[bot]
89d1f02551
Bump json-repair from 0.26.0 to 0.28.4 (#1044)
Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.26.0 to 0.28.4.
- [Release notes](https://github.com/mangiucugna/json_repair/releases)
- [Commits](https://github.com/mangiucugna/json_repair/compare/0.26.0...v0.28.4)

---
updated-dependencies:
- dependency-name: json-repair
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-28 15:34:51 -06:00
dependabot[bot]
da440f749b
Bump pytest-asyncio from 0.23.8 to 0.24.0 (#1022)
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.8 to 0.24.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.23.8...v0.24.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-28 14:41:53 -06:00
TLP
1b51827c66
Fix INIT_YAML embeddings default settings (#1039)
Co-authored-by: Thanh Long Phan <long.phan@dida.do>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-28 14:18:59 -06:00
Alonso Guevara
22df2f80d0
Fix/text unit code cleanup (#1040)
* Optimized _build_text_unit_context function for improved time and space complexity

Refactored the _build_text_unit_context function to enhance its performance and efficiency. Key optimizations include:

1. Set for Text Unit IDs: Replaced list-based membership checks with a set (text_unit_ids_set) to achieve constant-time complexity for membership checks, reducing overall time complexity.
2. Direct Attribute Removal: Utilized pop with a default value (None) to directly remove attributes entity_order and num_relationships from text units, minimizing overhead and avoiding potential KeyError.
3. Default Dictionary for Entity Orders: Implemented defaultdict for managing entity orders, simplifying the ranking process and improving readability.

These improvements result in a more efficient function with better performance, especially when handling large datasets or numerous selected entities. The refactoring ensures that the core functionality remains unchanged while enhancing both time and space complexity.

* Format

* Ruff fixes

* semver

---------

Co-authored-by: arjun-234 <arjun.darji@yudiz.com>
Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com>
2024-08-27 16:15:16 -06:00
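The optimizations listed in #1040 (a set for membership checks, a `defaultdict` for grouping, and `dict.pop` with a default for removing temporary ranking keys) can be illustrated with a small, self-contained sketch. The function name `build_text_unit_ranking`, the tuple layouts of entities and relationships, and the ranking rule (earlier selected entity first, then more relationships first) are assumptions for illustration. This is not the actual `_build_text_unit_context` code.

```python
from collections import defaultdict


def build_text_unit_ranking(selected_entities, relationships, text_units):
    """Rank text units for selected entities (hypothetical sketch).

    selected_entities: list of (name, text_unit_ids) pairs, highest priority first.
    relationships: list of (source, target, text_unit_ids) tuples.
    text_units: dict mapping text unit id -> dict of attributes.
    """
    # Index relationships per entity once, instead of rescanning them per unit.
    rels_by_entity = defaultdict(list)
    for source, target, unit_ids in relationships:
        rels_by_entity[source].append(set(unit_ids))
        rels_by_entity[target].append(set(unit_ids))

    seen_ids = set()  # O(1) average membership checks instead of a list scan
    candidates = []
    for order, (name, unit_ids) in enumerate(selected_entities):
        for unit_id in unit_ids:
            if unit_id in seen_ids or unit_id not in text_units:
                continue
            seen_ids.add(unit_id)
            unit = dict(text_units[unit_id])  # copy so the input is not mutated
            unit["entity_order"] = order
            # Count this entity's relationships that also cite the text unit.
            unit["num_relationships"] = sum(
                unit_id in rel_units for rel_units in rels_by_entity[name]
            )
            candidates.append(unit)

    # Earlier entities first; for the same entity, more relationships first.
    candidates.sort(key=lambda u: (u["entity_order"], -u["num_relationships"]))
    for unit in candidates:
        # pop() with a default never raises KeyError if the key is absent.
        unit.pop("entity_order", None)
        unit.pop("num_relationships", None)
    return candidates
```

Because each text unit is added at most once, the first (highest-priority) entity that cites it determines its position.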
Konstantin Gukov
5d8e60ceb7
Add source URL to the package (#927)
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-27 14:41:21 -06:00
longyunfeigu
44fd35c84f
Update VectorStoreSearchResult score value range (#937)
update VectorStoreSearchResult score comment

Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-27 14:40:47 -06:00
Alonso Guevara
75735bd103
Release v0.3.2 (#1034) v0.3.2 2024-08-26 17:57:16 -06:00
Alonso Guevara
32c0cdfcc0
Patch "past" dependency issues (#1033)
* Patch "past" dependency issues

* Semver
2024-08-26 17:03:51 -06:00
Josh Bradley
a90d210497
Improve search type hint (#1031)
* update get_local_search_engine and get_global_search_engine return annotation

* add semversioner file

* reorder imports

* fix pyright errors

* revert change and ignore previous pyright error

---------

Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: longyunfeigu <2514553187@qq.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-26 15:31:46 -06:00
Alonso Guevara
4c2f5376a8
Add missing config parameter for prompt tuning docs (#1017) 2024-08-26 14:38:59 -06:00
Josh Bradley
fd8e56ce6f
Update developer guide (#1029) 2024-08-26 12:28:03 -04:00
Alonso Guevara
55e74a0c2e
Fix weight casting during graph extraction (#1016)
* Fix weight casting during graph extraction

* Format

* Format
2024-08-23 20:51:59 -06:00
Alonso Guevara
e15df44f0d
Ensure entity types to be str in prompt tune (#1015) 2024-08-23 18:35:24 -06:00
dependabot[bot]
13e17d2dac
Bump ruff from 0.5.7 to 0.6.2 (#1014)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.7 to 0.6.2.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.5.7...0.6.2)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-23 18:00:11 -06:00
dependabot[bot]
b1d4ddd799
Bump micromatch from 4.0.5 to 4.0.8 in /docsite (#1013)
Bumps [micromatch](https://github.com/micromatch/micromatch) from 4.0.5 to 4.0.8.
- [Release notes](https://github.com/micromatch/micromatch/releases)
- [Changelog](https://github.com/micromatch/micromatch/blob/4.0.8/CHANGELOG.md)
- [Commits](https://github.com/micromatch/micromatch/compare/4.0.5...4.0.8)

---
updated-dependencies:
- dependency-name: micromatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-23 17:38:26 -06:00
Alonso Guevara
cb0aae7e6b
Add graphrag_import_neo4j_cypher Notebook (#593)
* Added graphrag_import_neo4j_cypher Notebook

* changed to procedure for setting embedding property to save disk space

* Reformat and cleanup

* semver

* Poetry lock update

* Update AAIS docs

* Rename contrib folder

* Merge from main

* Revert "Merge from main"

This reverts commit a399dde97b689a5b5c62dc2e9c2290cb2503b3a4.

* Fix ruff check

* Add readme and fix tests

* Fix community reports

---------

Co-authored-by: Michael Hunger <github@jexp.de>
2024-08-23 15:18:35 -06:00
KennyZhang1
dd71135995
Change lancedb placement (#996)
* changed placement of lancedb dir to under /artifacts

* ruff checks and semversioner

* added support for static paths

* added support for streaming

* more ruff changes

* ruff format changes

* removed string concat for path formation

* added more ruff checks

* removed os.join usage

* more ruff fixes and removed unnecessary path creations

* replaced cast calls with str()

---------

Co-authored-by: Kenny Zhang <zhangken@microsoft.com>
2024-08-22 11:39:55 -06:00
Josh Bradley
4b9fdc0dfe
Add context data to query responses (#1003)
* add context data to query responses

* add semversioner file

* ignore typechecking ruff suggestion
2024-08-22 12:07:50 -04:00
Alonso Guevara
9c6f5e090a
Release v0.3.1 (#1001) v0.3.1 2024-08-21 17:03:55 -06:00
Nathan Evans
f5b4d2fea5
Ci streamline (#988)
* Remove excess vars from gh-pages build

* Delete redundant javascript ci

* Pull apart testing CI

* Clean up integration tests build

* Move storage tests to integration CI

* Take py 3.10 out of smoke tests matrix

* Use minimum supported python version for most tests

* Re-run main CI on any test change

* Add Josh and Kenny to author list

* Update auto-resolve perms
2024-08-21 15:16:15 -06:00
Nathan Evans
98cabba38b
Notebook tests (#978)
* Fix notebook test runs

* Delete old issue template

* Add notebook CI action

* Print temp directories

* Print more env

* Move printing up

* Use runner_temp

* Try using current directory

* Try TMP env

* Re-write TMP

* Wrong yml

* Fix echo

* Only export if windows

* More logging

* Move export

* Reformat env write

* Fix braces

* Switch to in-memory execution

* Downgrade action perms

* Unused import
2024-08-20 17:19:37 -06:00
dependabot[bot]
8a9a2f7574
Bump uvloop from 0.19.0 to 0.20.0 (#969)
Bumps [uvloop](https://github.com/MagicStack/uvloop) from 0.19.0 to 0.20.0.
- [Release notes](https://github.com/MagicStack/uvloop/releases)
- [Commits](https://github.com/MagicStack/uvloop/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: uvloop
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-20 16:18:45 -06:00
Derek Worthen
6b4de3d841
Index API (#953)
* Initial Index API

- Implement main API entry point: build_index
- Rely on GraphRagConfig instead of PipelineConfig
    - This unifies the API signature with the
    prompt_tune and query API entry points
- Derive cache settings, config, and resuming from
    the config and other arguments to
    simplify/reduce arguments to build_index
- Add preflight config file validations
- Add semver change

* fix smoke tests

* fix smoke tests

* Use asyncio

* Add e2e artifacts in GH actions

* Remove unnecessary E2E test, and add skip_validations flag to cli

* Nicer imports

* Reorganize API functions.

* Add license headers and module docstrings

* Fix ignored ruff rule

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-20 15:42:20 -06:00
dependabot[bot]
5a781dd234
Bump nltk from 3.8.1 to 3.9.1 (#966)
* Bump nltk from 3.8.1 to 3.9.1

Bumps [nltk](https://github.com/nltk/nltk) from 3.8.1 to 3.9.1.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.8.1...3.9.1)

---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Download punkt_tab

* Semver

* Add missing installs

* Add missing installs

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-20 14:49:39 -06:00
Josh Bradley
62546a3c14
Add streaming support for local/global search (#944)
* Added streaming output support for global search. Introduce `--streaming` flag to enable or disable streaming mode

* ran ruff format --preview

* update

* cleanup code and streaming api

* update cli argument

* remove whitespace

* checkpoint - add context data to streaming api

* cleanup help menu

* ruff format update

* add context data to streaming response

* add semversioner file

* rename variable for better readability

* rename variable for better readability

* ruff fixes

* fix abstract class type annotation

* add documentation for --streaming CLI flag

---------

Co-authored-by: 6GOD <55304045+6ixGODD@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-20 13:44:48 -06:00
longyunfeigu
a6238c654a
Move embeddings target position (#938)
move embeddings target position

Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-20 13:02:52 -06:00
Alonso Guevara
e4daf358b9
Fix gh-pages publishing (#976)
* Remove indexer run from gh-pages, and use a local zip to avoid running

* Semver
2024-08-19 16:30:55 -06:00
Nayeon Kim
84f9bae129
Update 0-architecture.md (#961) 2024-08-19 12:21:40 -06:00
KennyZhang1
3c0a98c2d8
Add preflight config file validations (#952)
Co-authored-by: Kenny Zhang <zhangken@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-08-16 17:53:32 -04:00
Nathan Evans
4040f02508
Update general_issue.yml (#956)
Copy checklist from bug/feature to general
2024-08-16 13:26:24 -07:00
Nathan Evans
bd5be7bb1a
Update issues-autoresolve.yml (#955)
Add write permissions for actions so it can update the cache
2024-08-16 13:17:23 -07:00
Alonso Guevara
0b7c5a6ae9
Add cast check on schema validation for community reports (#932)
* Add support for both float and int on schema validation for community report generation

* Cast instead of type check

* Add missing file

* Add prompt with ints to smoke tests

* Fix unit tests

* Fix unit tests
2024-08-14 16:40:47 -06:00
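#932 replaces a strict type check with a cast, so that community report values written as either ints or floats pass schema validation. The sketch below shows that idea in isolation; the field name `rating` and the function `validate_report` are hypothetical and not taken from the graphrag schema.

```python
def validate_report(report):
    """Hypothetical sketch: accept int or float ratings by casting, not type-checking."""
    rating = report.get("rating")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(rating, bool):
        return None
    try:
        # float() accepts 7, 7.0 and "7.5" alike, whereas a check such as
        # isinstance(rating, float) would reject the int 7.
        normalized = float(rating)
    except (TypeError, ValueError):
        # Missing (None) or non-numeric values fail validation.
        return None
    return {**report, "rating": normalized}
```

Casting normalizes the value to one type for downstream code, while still rejecting values that are not numeric at all.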
dependabot[bot]
36facbd000
Bump textual from 0.74.0 to 0.76.0 (#901)
Bumps [textual](https://github.com/Textualize/textual) from 0.74.0 to 0.76.0.
- [Release notes](https://github.com/Textualize/textual/releases)
- [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Textualize/textual/compare/v0.74.0...v0.76.0)

---
updated-dependencies:
- dependency-name: textual
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-14 13:06:55 -06:00
dependabot[bot]
1ec1d2f920
Bump azure-storage-blob from 12.21.0 to 12.22.0 (#900)
Bumps [azure-storage-blob](https://github.com/Azure/azure-sdk-for-python) from 12.21.0 to 12.22.0.
- [Release notes](https://github.com/Azure/azure-sdk-for-python/releases)
- [Changelog](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/esrp_release.md)
- [Commits](https://github.com/Azure/azure-sdk-for-python/compare/azure-storage-blob_12.21.0...azure-storage-blob_12.22.0)

---
updated-dependencies:
- dependency-name: azure-storage-blob
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-13 22:48:07 -06:00
dependabot[bot]
ba63eda7a4
Bump pyyaml from 6.0.1 to 6.0.2 (#898)
Bumps [pyyaml](https://github.com/yaml/pyyaml) from 6.0.1 to 6.0.2.
- [Release notes](https://github.com/yaml/pyyaml/releases)
- [Changelog](https://github.com/yaml/pyyaml/blob/main/CHANGES)
- [Commits](https://github.com/yaml/pyyaml/compare/6.0.1...6.0.2)

---
updated-dependencies:
- dependency-name: pyyaml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-13 18:48:51 -06:00
Nathan Evans
ac504e31a0
Add stricter filtering and tests for cli data directory discovery (#910)
* Add stricter filtering and tests for cli data directory discovery

* Semver

* Ignore ruff on error type

* Format

* Fix for windows paths

* Fix for windows paths

* Uncomment blob tests

* Sort by timestamp name instead of modified date

* Format

* Add additional folder name test
2024-08-13 17:34:14 -06:00
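#910 tightens how the CLI discovers the most recent run directory: candidate names are filtered strictly, and runs are ordered by the timestamp in the folder name rather than by the filesystem's modified date. A minimal sketch follows. The `YYYYMMDD-HHMMSS` naming pattern and the function name `latest_run_dir` are assumptions for illustration, not the graphrag implementation.

```python
import re
from pathlib import Path

# Assumed run-folder naming scheme: YYYYMMDD-HHMMSS (hypothetical).
RUN_DIR_PATTERN = re.compile(r"\d{8}-\d{6}")


def latest_run_dir(output_root):
    """Return the newest run directory under output_root, or None if there is none."""
    candidates = [
        path
        for path in Path(output_root).iterdir()
        # Strict filter: only directories whose entire name matches the pattern.
        if path.is_dir() and RUN_DIR_PATTERN.fullmatch(path.name)
    ]
    if not candidates:
        return None
    # Zero-padded timestamps sort lexically in chronological order. Sorting by
    # name also survives copies and checkouts that reset modified times.
    return max(candidates, key=lambda path: path.name)
```

Stray entries such as `latest`, malformed names, or plain files with timestamp-like names are ignored instead of being picked up as a run.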
Alonso Guevara
d68e323193
Disable fail fast on tests (#911) 2024-08-13 12:20:14 -06:00
Alonso Guevara
f9c1bdd748
Release v0.3.0 (#912) v0.3.0 2024-08-12 18:14:52 -06:00
Alonso Guevara
4b9f268604
Fix/query embedding (#909)
* fix strategy config in entity_extraction

* should not post token list to the embedding model

* fix embedding in local query

* add semversioner

* remove strategy

---------

Co-authored-by: KylinMountain <kose2livs@gmail.com>
2024-08-12 17:12:51 -06:00
benx13
3f31af80d2
typo summarize prompt (#907)
* typo in entity_summarization prompt

* typo in summarize prompt

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 16:03:08 -06:00
Andres Morales
5a7dbaa051
Fix sort_context max_tokens & max_tokens param in verb (#888)
* Fix sort_context max_tokens & max_tokens param in verb

* Fix sort_context for windows test

* add semversioner file

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 15:55:31 -06:00
Josh Bradley
238f1c2adc
Implement prompt tuning API (#855)
* initial setup commit

* cleanup API and CLI interfaces

* move datatype definition to types.py

* code cleanup

* add semversioner file

* remove unused import

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 15:09:00 -06:00
Josh Bradley
4bcbfd10eb
Implement query api (#839)
* initial API redesign

* typo fix

* update docstring

* update docstring

* remove artifacts caused by the merge from main

* minor typo updates

* add semversioner check

* switch API to async function calls

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 13:40:10 -06:00