* Add vector store id reference to embeddings config.
* generated initial vector store pytests
* cleaned up cosmosdb vector store test
* fixed class name typo and debugged cosmosdb vector store test
* reset emulator connection string
* remove unnecessary comments
* removed extra comments from azure ai search test
* ruff
* semversioner
* fix cicd issues
* bypass diskANN policy for test env
* handle floating point imprecision
---------
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
* Added support for embeddings chunking as defined by the config.
* ran semversioner -t patch
* Eliminated redundant code by using the embed_text strategy directly
* Added fix to support brackets within the corpus text; for example, inline LaTeX within a markdown file
---------
Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
* Update API overview
* Fix global search example
* Fix local search example
* Fix global dynamic example
* Fix drift example
* Update multi-index example
* Semver
* Added support for verbose logging and csv-metadata to the prompt tune client.
* Updated community report summarization file name and prompt template
* updated semversioner
* ran ruff linter
* Ran poe format
* Fix Ruff complaints
* Fix a new ruff complaint :P
* Pyright
* Fix tests
---------
Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Add pipeline state property bag to run context
* Move state creation out of context util
* Move callbacks into PipelineRunContext
* Semver
* Rename state.json to context.json to avoid confusion with stats.json
* Expand smoke test row count
* Add util to create storage and cache
* Move verb tests to regular CI
* Clean up env vars
* Update smoke runtime expectations
* Rework artifact assertions
* Fix plural in name
* remove redundant artifact len check
* Remove redundant artifact len check
* Adjust graph output expectations
* Update community expectations
* Include all workflow output
* Adjust text unit expectations
* Adjust assertions per dataset
* Fix test config param name
* Update nan allowed for optional model fields
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Add callbacks to global search
* Add callbacks to local search
* Add streaming callbacks in local search CLI
* Add callbacks to basic search
* Add callbacks to DRIFT search
* Semver
* Return generators directly in API
* Guard callbacks
* Add workflow registration
* Add ability to mutate config by workflows
* Separate graph finalization
* Separate graph pruning
* Semver
* Update tests
* Update smoke tests
* Fix iterrows on create_graph
* Remove prune_graph from llm construction
* Update test data
* Remove prune_graph from smoke tests
* Add children to the community tables
* Replace NaN children with empty list
* Replace subcommunity logic with built-in parent/child fields
* Remove restore_community_hierarchy
* Add children and frequency to migration notebook
* Format
* Semver
* Add children to reports
* Update tests
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Rework update output structure
* Semver
* Fix unit test
* Update frequency in incremental
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* consolidate query api functions and remove code duplication
* refactor and remove more code duplication
* Add semversioner file
* fix basic search
* fix drift search and update base class function names
* update example notebooks
* Add vector store id reference to embeddings config.
* changed structure of output config section
* added cli integration for multi index global
* added cli integration for multi index local
* added cli integration for multi index drift and basic
* finished local testing of multi-index cli
* ruff fixes
* partially refactored test code to align with new output section
* more test changes for new output structure
* semversioner
* refactored to align with new multi index config proposal
* locally tested new multi-index output proposal
* cleaned up tests to align with new structure
---------
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
Corrected "this values" to "these values" for improved clarity. This ensures the documentation is more accurate and professional.
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Corrected a missing backtick in a note within the `GRAPHRAG_API_KEY` description. This ensures proper code formatting and improves readability in the documentation. No content was altered aside from formatting adjustments.
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
* remove unused columns and change property document_attribute_columns to metadata
* format file
* fix 'metadata' column on output
* run check
* fix test on nltk
* remove docs changes