* Add models page
* Update config docs for new params
* Spelling
* Add comment on CoT with o-series
* Add notes about managed identity
* Update the viz guide
* Spruce up the getting started wording
* Capitalization
* Add BYOG page
* More BYOG edits
* Update dictionary
* Change example model name
* Update API overview
* Fix global search example
* Fix local search example
* Fix global dynamic example
* Fix drift example
* Update multi-index example
* Semver
* Add children to the community tables
* Replace NaN children with empty list
* Replace subcommunity logic with built-in parent/child fields
* Remove restore_community_hierarchy
* Add children and frequency to migration notebook
* Format
* Semver
* Add children to reports
* Update tests
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* consolidate query api functions and remove code duplication
* refactor and remove more code duplication
* Add semversioner file
* fix basic search
* fix drift search and update base class function names
* update example notebooks
Corrected a missing backtick in a note within the `GRAPHRAG_API_KEY` description. This ensures proper code formatting and improves readability in the documentation. No content was altered aside from formatting adjustments.
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Updated the auto prompt tuning doc with `--selection-method` instead of only `--method` as per the latest API.
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Refactor config
  - Add new ModelConfig to represent LLM settings
    - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode
  - Add top level models config that is a list of available LLM ModelConfigs
  - Remove LLMConfig inheritance and delete LLMConfig
    - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config
  - Remove all fallbacks and hydration logic from create_graphrag_config
    - This removes the automatic env variable overrides
  - Support env variables within config files using Templating
    - This requires "$" to be escaped with extra "$" so ".*\\.txt$" becomes ".*\\.txt$$"
  - Update init content to initialize new config file with the ModelConfig structure
* Use dict of ModelConfig instead of list
* Add model validations and unit tests
* Fix ruff checks
* Add semversioner change
* Fix unit tests
* validate root_dir in pydantic model
* Rename ModelConfig to LanguageModelConfig
* Rename ModelConfigMissingError to LanguageModelConfigMissingError
* Add validation for unexpected API keys
* Allow skipping pydantic validation for testing/mocking purposes.
* Add default lm configs to verb tests
* smoke test
* remove config from flows to fix llm arg mapping
* Fix embedding llm arg mapping
* Remove timestamp from smoke test outputs
* Remove unused "subworkflows" smoke test properties
* Add models to smoke test configs
* Update smoke test output path
* Send logs to logs folder
* Fix output path
* Fix csv test file pattern
* Update placeholder
* Format
* Instantiate default model configs
* Fix unit tests for config defaults
* Fix migration notebook
* Remove create_pipeline_config
* Remove several unused config models
* Remove indexing embedding and input configs
* Move embeddings function to config
* Remove skip_workflows
* Remove skip embeddings in favor of explicit naming
* fix unit test spelling mistake
* self.models[model_id] is already a language model. Remove redundant casting.
* update validation errors to instruct users to rerun graphrag init
* instantiate LanguageModelConfigs with validation
* skip validation in unit tests
* update verb tests to use default model settings instead of skipping validation
* test using llm settings
* cleanup verb tests
* remove unsafe default model config
* remove the ability to skip pydantic validation
* remove None union types when default values are set
* move vector_store from embeddings to top level of config and delete resolve_paths
* update vector store settings
* fix vector store and smoke tests
* fix serializing vector_store settings
* fix vector_store usage
* fix vector_store type
* support cli overrides for loading graphrag config
* rename storage to output
* Add --force flag to init
* Remove run_id and resume, fix Drift config assignment
* Ruff
---------
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
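
The config refactor notes above mention env variables inside config files and the need to double "$" in patterns. As a minimal sketch of why that escaping is needed, the snippet below assumes the templating behaves like Python's standard `string.Template`; the helper name `render_config_text` and the settings keys shown are illustrative assumptions, not the project's confirmed API.

```python
from string import Template


def render_config_text(text: str, env: dict[str, str]) -> str:
    """Fill ${VAR} placeholders from env; a doubled "$$" becomes one literal "$"."""
    return Template(text).substitute(env)


# Illustrative settings fragment; the key names are assumptions for this sketch.
raw_config = r"""
models:
  default_chat_model:
    api_key: ${GRAPHRAG_API_KEY}
input:
  file_pattern: ".*\\.txt$$"
"""

# The env mapping is passed explicitly here; real code might pass os.environ.
rendered = render_config_text(raw_config, {"GRAPHRAG_API_KEY": "example-key"})
print(rendered)
```

Under this assumption, a regex that ends in a single unescaped "$" is treated as a malformed placeholder and `substitute` raises `ValueError`, which is why the pattern has to be written with "$$".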
* Add new inputs and missing vector store for retrieving vectors
* Format
* Semver
* Remove .Identifier files
* Fix spellcheck
* Remove unnecessary input file for notebooks
* update index api to accept callbacks
* fix hardcoded folder name that was creating an empty folder
* add API notebook
* add semversioner file
* filename change
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Fix local question gen and example notebook
* Update global search notebook
* Add lazy blog post
* Update breaking changes doc for migration notes
* Simplify Getting Started page
* Semver
* Spellcheck
* Fix types
* Add comments on cache-free migration
* Update wording
* Spelling
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* First cut at config cleanup
* Reorder top nav
* Add query prompts to tuning page
* Remove dynamic notebook from nav
* Add more thorough yml config descriptions in docs
* Further clean out the config
* Semver
* Add new blog post
* Emphasize yaml
* Clarify output
* Fix unit test
* Fix bullet nesting
* Fix footer contrast
* Fix broken links
* Remove a few unneeded examples
* Point python API example to the whole folder
* Convert schema bullets to tables
Updated the wording of the example scenario from "global search" to "drift search" to match the topic, so the documentation accurately describes its content.
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Move indexing prompts to root
* Move query prompts to root
* Export query prompts during init
* Extract general knowledge prompt
* Load query prompts from disk
* Semver
* Fix unit tests
* Add source documents for verb tests
* Remove entity_type erroneous column
* Add new test data
* Remove source/target degree columns
* Remove top_level_node_id
* Remove chunk column configs
* Rename "chunk" to "text"
* Rename "chunk" to "text" in base
* Re-map document input to use base text units
* Revert base text units as final documents dep
* Update test data
* Split/rename node source_id
* Drop node size (dup of degree)
* Drop document_ids from covariates
* Remove unused document_ids from models
* Remove n_tokens from covariate table
* Fix missed document_ids delete
* Wire base text units to final documents
* Rename relationship rank as combined_degree
* Add rank as first-class property to Relationship
* Remove split_text operation
* Fix relationships test parquet
* Update test parquets
* Add entity ids to community table
* Remove stored graph embedding columns
* Format
* Semver
* Fix JSON typo
* Spelling
* Rename lancedb
* Sort lancedb
* Fix unit test
* Fix test to account for changing period
* Update tests for separate embeddings
* Format
* Better assertion printing
* Fix unit test for windows
* Rename document.raw_content -> document.text
* Remove read_documents function
* Remove unused document summary from model
* Remove unused imports
* Format
* Add new snapshots to default init
* Use util to construct embeddings collection name
* Align inc index model with branch changes
* Update data and tests for int ids
* Clean up embedding locs
* Switch entity "name" to "title" for consistency
* Fix short_id -> human_readable_id defaults
* Format
* Rework community IDs
* Fix community size compute
* Fix unit tests
* Fix report read
* Pare down nodes table output
* Fix unit test
* Fix merge
* Fix community loading
* Format
* Fix community id report extraction
* Update tests
* Consistent short IDs and ordering
* Update ordering and tests
* Update incremental for new nodes model
* Guard document columns loc
* Match column ordering
* Fix document guard
* Update smoke tests
* Fill NA on community extract
* Logging for smoke test debug
* Add parquet schema details doc
* Fix community hierarchy guard
* Use better empty hierarchy guard
* Back-compat shims
* Semver
* Fix warning
* Format
* Remove default fallback
* Reuse key
* update gitignore
* add dynamic community selection to updated main branch
* update SearchResult to record output_tokens.
* update search result
* dynamic search working
* format
* add llm_calls_categories and prompt_tokens and output_tokens cate
* update
* formatting
* log drift search output and prompt tokens separately
* update global_search.ipynb. update operate dulce dataset and add create_final_communities. update dynamic community selection init
* add .ipynb back to cspell.config.yaml
* format
* add notebook example on dynamic search
* rearrange
* update gitignore
* format code
* code format
* code format
* fix default variable
---------
Co-authored-by: Bryan Li <bryanlimy@gmail.com>
* New workflow to generate embeddings in a single workflow
* New workflow to generate embeddings in a single workflow
* version change
* clean tests without any embeddings references
* clean tests without any embeddings references
* remove code
* feedback implemented
* changes in logic
* feedback implemented
* store in table bug fixed
* smoke test for generate_text_embeddings workflow
* smoke test fix
* add generate_text_embeddings to the list of transient workflows
* smoke tests
* fix
* ruff formatting updates
* fix
* smoke test fixed
* smoke test fixed
* fix lancedb import
* smoke test fix
* ignore sorting
* smoke test fixed
* smoke test fixed
* check smoke test
* smoke test fixed
* change config for vector store
* format fix
* vector store changes
* revert debug profile back to empty filepath
* merge conflict solved
* merge conflict solved
* format fixed
* format fixed
* fix return dataframe
* snapshot fix
* format fix
* embeddings param implemented
* validation fixes
* fix map
* fix map
* fix properties
* config updates
* smoke test fixed
* settings change
* Update collection config and rework back-compat
* Replace . with - for embedding store
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
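
The last two entries in the list above touch how embedding collections are named, replacing "." with "-". A rough sketch of that kind of normalization follows; the function name `collection_name`, the prefix argument, and the example embedding names are hypothetical and only illustrate the idea.

```python
def collection_name(container: str, embedding: str) -> str:
    """Join a container prefix and an embedding name, swapping "." for "-"."""
    return f"{container}-{embedding}".replace(".", "-")


# Hypothetical embedding names, used only to show the transformation.
names = [collection_name("default", e) for e in ("entity.description", "text_unit.text")]
print(names)  # ['default-entity-description', 'default-text_unit-text']
```

A likely motivation, stated as an assumption, is that some vector stores restrict which characters a collection or table name may contain, so a single normalized form avoids store-specific failures.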