303 Commits

Author SHA1 Message Date
Kenny Zhang
14b1eccbff partially resolved merge conflicts 2024-12-19 17:13:14 -05:00
Kenny Zhang
9ca67643b4 partially fixed merge conflicts 2024-12-18 15:10:45 -05:00
Kenny Zhang
82548e11f8 refactored collection_name variable naming 2024-12-04 12:58:49 -05:00
Kenny Zhang
bf5c72dec0 tested cosmosdb vector store querying 2024-12-03 15:22:29 -05:00
Kenny Zhang
c3e2394304 tested cosmosdb vector_store indexing 2024-12-03 12:03:19 -05:00
Kenny Zhang
dccd4aee68 modified query string to return all cols 2024-11-26 17:20:02 -05:00
Kenny Zhang
68dfb20961 modified factory class 2024-11-26 16:09:14 -05:00
Kenny Zhang
6d9ec16efb added filter_by_id function 2024-11-26 15:09:19 -05:00
Kenny Zhang
01db1424a1 implemented similarity search methods 2024-11-26 10:50:14 -05:00
Kenny Zhang
fe2e718f8a implemented load_document and search_by_id methods 2024-11-25 14:14:52 -05:00
Kenny Zhang
b50a7a8e70 implemented container creation and deletion functions 2024-11-22 14:56:37 -05:00
Kenny Zhang
72306b2529 implemented database creation and deletion functions 2024-11-22 13:56:28 -05:00
Kenny Zhang
863363d086 added cosmosdb vector store class outline 2024-11-22 13:33:49 -05:00
Kenny Zhang
9d899fc400 removed some whitespace 2024-11-21 15:19:41 -05:00
Kenny Zhang
232cd07762 simplified create_database and create_container functions 2024-11-20 14:26:18 -05:00
Kenny Zhang
76511d0180 collapsed cosmosdb schema to use minimal containers and databases 2024-11-19 15:30:59 -05:00
Kenny Zhang
c5281bb79a tested query for cosmosdb 2024-11-19 14:45:59 -05:00
Kenny Zhang
31c0a7a316 added cosmosdb functionality to query pipeline 2024-11-18 14:53:06 -05:00
Kenny Zhang
6eb61342c0 fixed more merge conflicts 2024-11-18 11:52:22 -05:00
Kenny Zhang
594f332606 Merge branch 'main' of github.com:microsoft/graphrag into add-cosmosdb-to-storage 2024-11-18 11:49:41 -05:00
Kenny Zhang
dac0b861bd merged with main and resolved conflicts 2024-11-18 11:47:51 -05:00
Alonso Guevara
6d21ef2683
Release v0.5.0 (#1415) v0.5.0 2024-11-18 00:06:54 -06:00
Josh Bradley
22a57d14c7
Improve CLI speed with lazy imports (#1319) 2024-11-15 19:41:10 -05:00
Nathan Evans
9b4f24ebce
First cut at config cleanup (#1411)
* First cut at config cleanup

* Reorder top nav

* Add query prompts to tuning page

* Remove dynamic notebook from nav

* Add more thorough yml config descriptions in docs

* Further clean out the config

* Semver

* Add new blog post

* Emphasize yaml

* Clarify output

* Fix unit test

* Fix bullet nesting
2024-11-15 14:33:26 -08:00
Kenny Zhang
0d93d0d305 added basic support for parquet emitter using internal conversions 2024-11-15 15:54:03 -05:00
Kenny Zhang
5e5f76d281 readded initial non-parquet emitter fix 2024-11-15 14:31:11 -05:00
Kenny Zhang
66641d66d7 removed nested try statement 2024-11-15 13:46:52 -05:00
Nathan Evans
425dbc60e3
Docs update (#1408)
* Fix footer contrast

* Fix broken links

* Remove a few unneeded examples

* Point python API example to the whole folder

* Convert schema bullets to tables
2024-11-14 21:26:29 -06:00
JunHo Kim (김준호)
ec9cdcce4d
fix typo. Correct the wording "global search" to "drift search" in drift search documentation (#1383)
Updated the wording of the example scenario from "global search" to "drift search" to accurately reflect the topic. This improves clarity and ensures the documentation accurately describes its content.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-14 16:55:44 -06:00
Jeff Baumes
0a5801041a
Fix documentation for generate_indexing_prompts (#1336)
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-14 16:53:59 -06:00
Kenny Zhang
6d1a4d9914 Merge branch 'main' of github.com:microsoft/graphrag into add-cosmosdb-to-storage 2024-11-14 17:01:28 -05:00
Kenny Zhang
65c93bb098 reverted merged changes from closed branch 2024-11-14 17:01:10 -05:00
Kenny Zhang
716bfa4083 require base_dir to be typed as str 2024-11-14 15:42:43 -05:00
Alonso Guevara
c90166ca32
Add Parquet as part of the default emitters when not present (#1407)
Add Parquet as part of the default emitters when not present
2024-11-14 13:04:19 -06:00
Nathan Evans
51912b2e03
Move prompts (#1404)
* Move indexing prompts to root

* Move query prompts to root

* Export query prompts during init

* Extract general knowledge prompt

* Load query prompts from disk

* Semver

* Fix unit tests
2024-11-14 10:45:37 -08:00
Kenny Zhang
d1fc4f05df removed extraneous container_name setting 2024-11-14 12:49:33 -05:00
Kenny Zhang
d6c3afcaad first successful run of cosmosdb indexing 2024-11-14 12:24:42 -05:00
Kenny Zhang
0982efe6e0 Merge remote-tracking branch 'origin/fix/non-default-emitters' into add-cosmosdb-to-storage 2024-11-14 01:28:49 -05:00
Alonso Guevara
297066c168 ruff 2024-11-13 18:41:26 -06:00
Alonso Guevara
ea7a404098 Ruff 2024-11-13 18:23:56 -06:00
Alonso Guevara
d206e673a6 Format 2024-11-13 18:19:06 -06:00
Alonso Guevara
6d2427e118 Fix non-default emitters 2024-11-13 18:15:40 -06:00
Nathan Evans
c8c354e357
Artifact cleanup (#1341)
* Add source documents for verb tests

* Remove entity_type erroneous column

* Add new test data

* Remove source/target degree columns

* Remove top_level_node_id

* Remove chunk column configs

* Rename "chunk" to "text"

* Rename "chunk" to "text" in base

* Re-map document input to use base text units

* Revert base text units as final documents dep

* Update test data

* Split/rename node source_id

* Drop node size (dup of degree)

* Drop document_ids from covariates

* Remove unused document_ids from models

* Remove n_tokens from covariate table

* Fix missed document_ids delete

* Wire base text units to final documents

* Rename relationship rank as combined_degree

* Add rank as first-class property to Relationship

* Remove split_text operation

* Fix relationships test parquet

* Update test parquets

* Add entity ids to community table

* Remove stored graph embedding columns

* Format

* Semver

* Fix JSON typo

* Spelling

* Rename lancedb

* Sort lancedb

* Fix unit test

* Fix test to account for changing period

* Update tests for separate embeddings

* Format

* Better assertion printing

* Fix unit test for windows

* Rename document.raw_content -> document.text

* Remove read_documents function

* Remove unused document summary from model

* Remove unused imports

* Format

* Add new snapshots to default init

* Use util to construct embeddings collection name

* Align inc index model with branch changes

* Update data and tests for int ids

* Clean up embedding locs

* Switch entity "name" to "title" for consistency

* Fix short_id -> human_readable_id defaults

* Format

* Rework community IDs

* Fix community size compute

* Fix unit tests

* Fix report read

* Pare down nodes table output

* Fix unit test

* Fix merge

* Fix community loading

* Format

* Fix community id report extraction

* Update tests

* Consistent short IDs and ordering

* Update ordering and tests

* Update incremental for new nodes model

* Guard document columns loc

* Match column ordering

* Fix document guard

* Update smoke tests

* Fill NA on community extract

* Logging for smoke test debug

* Add parquet schema details doc

* Fix community hierarchy guard

* Use better empty hierarchy guard

* Back-compat shims

* Semver

* Fix warning

* Format

* Remove default fallback

* Reuse key
2024-11-13 15:11:19 -08:00
Kenny Zhang
e0a0546958 modified cosmosdb setter to require json 2024-11-12 16:28:11 -05:00
Kenny Zhang
5436166450 Merge branch 'main' of github.com:microsoft/graphrag into add-cosmosdb-to-storage 2024-11-12 13:28:09 -05:00
Alonso Guevara
e53422366d
Implement dynamic community selection for global search (#1396)
* update gitignore

* add dynamic community selection to updated main branch

* update SearchResult to record output_tokens.

* update search result

* dynamic search working

* format

* add llm_calls_categories and prompt_tokens and output_tokens cate

* update

* formatting

* log drift search output and prompt tokens separately

* update global_search.ipynb. update operate dulce dataset and add create_final_communities. update dynamic community selection init

* add .ipynb back to cspell.config.yaml

* format

* add notebook example on dynamic search

* rearrange

* update gitignore

* format code

* code format

* code format

* fix default variable

---------

Co-authored-by: Bryan Li <bryanlimy@gmail.com>
2024-11-11 16:45:07 -08:00
Kenny Zhang
73d1e42a6c Merge branch 'main' of github.com:microsoft/graphrag into add-cosmosdb-to-storage 2024-11-11 10:40:11 -05:00
Alonso Guevara
ba50caab4d
Release v0.4.1 (#1387)
* Release v0.4.1

* Spellcheck
v0.4.1
2024-11-08 17:59:57 -06:00
Kenny Zhang
a76eb54b2f replaced primary key cosmosdb initialization with connection strings 2024-11-08 13:05:12 -05:00
Kenny Zhang
b263569167 Merge branch 'main' of github.com:microsoft/graphrag into add-cosmosdb-to-storage 2024-11-07 15:08:05 -05:00