* Optimized _build_text_unit_context function for improved time and space complexity
Refactored the _build_text_unit_context function to enhance its performance and efficiency. Key optimizations include:
1. Set for Text Unit IDs: Replaced list-based membership checks with a set (text_unit_ids_set), so each membership check takes constant time on average instead of time linear in the number of IDs.
2. Direct Attribute Removal: Utilized pop with a default value (None) to directly remove attributes entity_order and num_relationships from text units, minimizing overhead and avoiding potential KeyError.
3. Default Dictionary for Entity Orders: Implemented defaultdict for managing entity orders, simplifying the ranking process and improving readability.
These changes make the function more efficient, especially when handling large datasets or many selected entities, while leaving its core functionality unchanged.
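As a rough illustration of the three techniques named above (not the actual _build_text_unit_context code), the sketch below models text units as plain dicts. The helper names strip_ranking_attributes and group_by_entity_order, the dict layout, and the default order of 0 are all assumptions made for this example only.

```python
from collections import defaultdict


def strip_ranking_attributes(text_units, selected_ids):
    """Hypothetical helper: keep selected units and drop ranking-only keys."""
    # A set gives average O(1) membership checks; a list would be O(n) each.
    selected = set(selected_ids)
    kept = []
    for unit in text_units:
        if unit["id"] not in selected:
            continue
        cleaned = dict(unit)  # shallow copy so the caller's data is untouched
        # pop with a default never raises KeyError when the key is absent.
        cleaned.pop("entity_order", None)
        cleaned.pop("num_relationships", None)
        kept.append(cleaned)
    return kept


def group_by_entity_order(text_units):
    """Hypothetical helper: bucket unit ids by their entity order."""
    # defaultdict creates the empty list on first access, so no key checks.
    groups = defaultdict(list)
    for unit in text_units:
        # Assumption for this sketch: units without an order fall into bucket 0.
        groups[unit.get("entity_order", 0)].append(unit["id"])
    return dict(groups)
```

The copy inside strip_ranking_attributes is a design choice for the sketch; code that owns its text units could pop the keys in place instead.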
* Format
* Ruff fixes
* semver
---------
Co-authored-by: arjun-234 <arjun.darji@yudiz.com>
Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com>
* Added graphrag_import_neo4j_cypher Notebook
* changed to procedure for setting embedding property to save disk space
* Reformat and cleanup
* semver
* Poetry lock update
* Update AAIS docs
* Rename contrib folder
* Merge from main
* Revert "Merge from main"
This reverts commit a399dde97b689a5b5c62dc2e9c2290cb2503b3a4.
* Fix ruff check
* Add readme and fix tests
* Fix community reports
---------
Co-authored-by: Michael Hunger <github@jexp.de>
* changed placement of lancedb dir to under /artifacts
* ruff checks and semversioner
* added support for static paths
* added support for streaming
* more ruff changes
* ruff format changes
* removed string concat for path formation
* added more ruff checks
* removed os.path.join usage
* more ruff fixes and removed unnecessary path creations
* replaced cast calls with str()
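A minimal sketch of the path-handling style these commits describe, assuming pathlib is what replaced string concatenation and os.path.join. The function name vector_store_dir, the "lancedb" folder name, and the "output" root are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def vector_store_dir(root, name="lancedb"):
    """Hypothetical helper: place the vector store under <root>/artifacts."""
    # The / operator joins path segments without manual separators.
    return Path(root) / "artifacts" / name


# str() yields a plain string for libraries that expect a string path.
db_uri = str(vector_store_dir("output"))
```

Building the path from segments keeps the separator platform-specific, which avoids the hard-coded "/" that string concatenation tends to introduce.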
---------
Co-authored-by: Kenny Zhang <zhangken@microsoft.com>
* Remove excess vars from gh-pages build
* Delete redundant javascript ci
* Pull apart testing CI
* Clean up integration tests build
* Move storage tests to integration CI
* Take py 3.10 out of smoke tests matrix
* Use minimum supported python version for most tests
* Re-run main CI on any test change
* Add Josh and Kenny to author list
* Update auto-resolve perms
* Initial Index API
- Implement main API entry point: build_index
- Rely on GraphRagConfig instead of PipelineConfig
- This unifies the API signature with the
prompt_tune and query API entry points
- Derive cache settings, config, and resuming from
the config and other arguments to
simplify/reduce arguments to build_index
- Add preflight config file validations
- Add semver change
* fix smoke tests
* fix smoke tests
* Use asyncio
* Add e2e artifacts in GH actions
* Remove unnecessary E2E test, and add skip_validations flag to cli
* Nicer imports
* Reorganize API functions.
* Add license headers and module docstrings
* Fix ignored ruff rule
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Added streaming output support for global search. Introduce `--streaming` flag to enable or disable streaming mode
* ran ruff format --preview
* update
* cleanup code and streaming api
* update cli argument
* remove whitespace
* checkpoint - add context data to streaming api
* cleanup help menu
* ruff format update
* add context data to streaming response
* add semversioner file
* rename variable for better readability
* rename variable for better readability
* ruff fixes
* fix abstract class type annotation
* add documentation for --streaming CLI flag
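One way a streaming response that also carries context data could be shaped is sketched below. This is not the project's streaming API: the names stream_response and consume, and the convention that the first yielded item is the context data, are assumptions made for this example.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


async def stream_response(
    context: dict[str, Any], chunks: list[str]
) -> AsyncIterator[Any]:
    """Hypothetical stream: context data first, then text chunks."""
    yield context
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield chunk


async def consume(streaming: bool) -> str:
    """Collect the answer; print chunks as they arrive when streaming is on."""
    parts: list[str] = []
    got_context = False
    async for item in stream_response({"reports": []}, ["Hello", ", ", "world"]):
        if not got_context:
            got_context = True  # first item is the context data, not text
            continue
        if streaming:
            print(item, end="", flush=True)
        parts.append(item)
    return "".join(parts)
```

A caller would run it with asyncio.run(consume(streaming=True)); with streaming off, the same generator is drained silently and the joined text is returned at the end.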
---------
Co-authored-by: 6GOD <55304045+6ixGODD@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Add support for both float and int on schema validation for community report generation
* Cast instead of type check
* Add missing file
* Add prompt with ints to smoke tests
* Fix unit tests
* Fix unit tests
* Add stricter filtering and tests for cli data directory discovery
* Semver
* Ignore ruff on error type
* Format
* Fix for windows paths
* Fix for windows paths
* Uncomment blob tests
* Sort by timestamp name instead of modified date
* Format
* Add additional folder name test
* fix strategy config in entity_extraction
* should not post token list to the embedding model
* fix embedding in local query
* add semversioner
* remove strategy
---------
Co-authored-by: KylinMountain <kose2livs@gmail.com>
* Fix sort_context max_tokens & max_tokens param in verb
* Fix sort_context for windows test
* add semversioner file
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* initial API redesign
* typo fix
* update docstring
* update docstring
* remove artifacts caused by the merge from main
* minor typo updates
* add semversioner check
* switch API to async function calls
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>