graphrag/pyproject.toml

[tool.poetry]
name = "graphrag"
# Maintainers: do not change the version here manually, use ./scripts/release.sh
version = "2.3.0"
description = "GraphRAG: A graph-based retrieval-augmented generation (RAG) system."
authors = [
"Alonso Guevara Fernández <alonsog@microsoft.com>",
"Andrés Morales Esquivel <andresmor@microsoft.com>",
"Chris Trevino <chtrevin@microsoft.com>",
"David Tittsworth <datittsw@microsoft.com>",
"Dayenne de Souza <ddesouza@microsoft.com>",
"Derek Worthen <deworthe@microsoft.com>",
"Gaudy Blanco Meneses <gaudyb@microsoft.com>",
"Ha Trinh <trinhha@microsoft.com>",
"Jonathan Larson <jolarso@microsoft.com>",
"Josh Bradley <joshbradley@microsoft.com>",
"Kate Lytvynets <kalytv@microsoft.com>",
"Kenny Zhang <zhangken@microsoft.com>",
"Mónica Carvajal",
"Nathan Evans <naevans@microsoft.com>",
"Rodrigo Racanicci <rracanicci@microsoft.com>",
"Sarah Smith <smithsarah@microsoft.com>",
]
license = "MIT"
readme = "README.md"
packages = [{ include = "graphrag" }]
[tool.poetry.urls]
"Source" = "https://github.com/microsoft/graphrag"
[tool.poetry.scripts]
graphrag = "graphrag.cli.main:app"
[tool.poetry-dynamic-versioning]
enable = true
style = "pep440"
vcs = "git"
bump = true
format-jinja = """
{%- if distance == 0 -%}
{{ serialize_pep440(base, stage, revision) }}
{%- else -%}
{{ serialize_pep440(base, stage, revision, dev=distance) }}
{%- endif -%}
"""
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
environs = "^11.0.0"
# Vector Stores
azure-search-documents = "^11.5.2"
lancedb = "^0.17.0"
# Async IO
aiofiles = "^24.1.0"
# LLM
fnllm = {extras = ["azure", "openai"], version = "^0.3.0"}
json-repair = "^0.30.3"
openai = "^1.68.0"
nltk = "3.9.1"
tiktoken = "^0.9.0"
# Data-Science
numpy = "^1.25.2"
graspologic = "^3.4.1"
networkx = "^3.4.2"
pandas = "^2.2.3"
pyarrow = ">=17.0.0"
umap-learn = "^0.5.6"
# Configuration
pyyaml = "^6.0.2"
python-dotenv = "^1.0.1"
pydantic = "^2.10.3"
rich = "^13.9.4"
devtools = "^0.12.2"
typing-extensions = "^4.12.2"
# Azure
azure-cosmos = "^4.9.0"
azure-identity = "^1.19.0"
azure-storage-blob = "^12.24.0"
future = "^1.0.0" # Needed until graspologic fixes their dependency
typer = "^0.16.0"
tqdm = "^4.67.1"
textblob = "^0.18.0.post0"
spacy = "^3.8.4"
[tool.poetry.group.dev.dependencies]
coverage = "^7.6.9"
ipykernel = "^6.29.5"
jupyter = "^1.1.1"
nbconvert = "^7.16.4"
poethepoet = "^0.31.1"
pyright = "^1.1.390"
pytest = "^8.3.4"
pytest-asyncio = "^0.24.0"
pytest-timeout = "^2.3.1"
ruff = "^0.8.2"
semversioner = "^2.0.5"
update-toml = "^0.2.1"
deptry = "^0.21.1"
mkdocs-material = "^9.5.48"
mkdocs-jupyter = "^0.25.1"
mkdocs-exclude-search = "^0.6.6"
pytest-dotenv = "^0.5.2"
mkdocs-typer = "^0.0.3"
[build-system]
requires = ["poetry-core>=1.0.0", "poetry-dynamic-versioning>=1.0.0,<2.0.0"]
build-backend = "poetry_dynamic_versioning.backend"
[tool.poe.tasks]
_sort_imports = "ruff check --select I --fix ."
_format_code = "ruff format ."
_ruff_check = 'ruff check .'
_pyright = "pyright"
_convert_local_search_nb = 'jupyter nbconvert --output-dir=docsite/posts/query/notebooks/ --output="{notebook_name}_nb" --template=docsite/nbdocsite_template --to markdown examples_notebooks/local_search.ipynb'
_convert_global_search_nb = 'jupyter nbconvert --output-dir=docsite/posts/query/notebooks/ --output="{notebook_name}_nb" --template=docsite/nbdocsite_template --to markdown examples_notebooks/global_search.ipynb'
_semversioner_release = "semversioner release"
_semversioner_changelog = "semversioner changelog > CHANGELOG.md"
_semversioner_update_toml_version = "update-toml update --path tool.poetry.version --value $(poetry run semversioner current-version)"
semversioner_add = "semversioner add-change"
coverage_report = 'coverage report --omit "**/tests/**" --show-missing'
check_format = 'ruff format . --check'
fix = "ruff check --fix ."
fix_unsafe = "ruff check --fix --unsafe-fixes ."
_test_all = "coverage run -m pytest ./tests"
test_unit = "pytest ./tests/unit"
test_integration = "pytest ./tests/integration"
test_smoke = "pytest ./tests/smoke"
test_notebook = "pytest ./tests/notebook"
test_verbs = "pytest ./tests/verbs"
index = "python -m graphrag index"
update = "python -m graphrag update"
init = "python -m graphrag init"
query = "python -m graphrag query"
prompt_tune = "python -m graphrag prompt-tune"
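# Pass in a test pattern: arguments after the task name are appended, so `poe test_only <pattern>` runs `pytest -s -k <pattern>`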
test_only = "pytest -s -k"
serve_docs = "mkdocs serve"
build_docs = "mkdocs build"
[[tool.poe.tasks.release]]
sequence = [
'_semversioner_release',
'_semversioner_changelog',
'_semversioner_update_toml_version',
]
ignore_fail = 'return_non_zero'
[[tool.poe.tasks.convert_docsite_notebooks]]
sequence = ['_convert_local_search_nb', '_convert_global_search_nb']
ignore_fail = 'return_non_zero'
[[tool.poe.tasks.format]]
sequence = ['_sort_imports', '_format_code']
ignore_fail = 'return_non_zero'
[[tool.poe.tasks.check]]
sequence = ['check_format', '_ruff_check', '_pyright']
ignore_fail = 'return_non_zero'
[[tool.poe.tasks.test]]
sequence = ['_test_all', 'coverage_report']
ignore_fail = 'return_non_zero'
[tool.ruff]
target-version = "py310"
extend-include = ["*.ipynb"]
[tool.ruff.format]
preview = true
docstring-code-format = true
docstring-code-line-length = 20
[tool.ruff.lint]
preview = true
select = [
"E4",
"E7",
"E9",
"W291",
"YTT",
"T10",
"ICN",
"INP",
"Q",
"RSE",
"SLOT",
"INT",
"FLY",
"LOG",
"C90",
"T20",
"D",
"RET",
"PD",
"N",
"PIE",
"SIM",
"S",
"G",
"ERA",
"ASYNC",
"TID",
"UP",
"SLF",
"BLE",
"C4",
"I",
"F",
"A",
"ARG",
"PTH",
"RUF",
"B",
"TCH",
"DTZ",
"PYI",
"PT",
"EM",
"TRY",
"PERF",
"CPY",
# "FBT", # use named arguments for boolean flags
# "TD", # todos
# "FIX", # fixme
# "FURB", # preview rules
# "ANN", # type annotations, re-enable when we get bandwidth
]
ignore = [
# Ignore module names shadowing Python builtins
"A005",
# Conflicts with interface argument checking
"ARG002",
"ANN204",
# TODO: Inspect these pandas rules for validity
"PD002", # prevents inplace=True
# TODO: Re-enable when we get bandwidth
"PERF203", # Needs restructuring of errors, we should bail-out on first error
"C901", # needs refactoring to remove cyclomatic complexity
"B008", # Needs to restructure our cli params with Typer into constants
]
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S", "D", "ANN", "T201", "ASYNC", "ARG", "PTH", "TRY"]
"graphrag/index/config/*" = ["TCH"]
"*.ipynb" = ["T201"]
[tool.ruff.lint.flake8-builtins]
builtins-ignorelist = ["input", "id", "bytes"]
[tool.ruff.lint.pydocstyle]
convention = "numpy"
# https://github.com/microsoft/pyright/blob/9f81564a4685ff5c55edd3959f9b39030f590b2f/docs/configuration.md#sample-pyprojecttoml-file
[tool.pyright]
include = ["graphrag", "tests", "examples_notebooks"]
exclude = ["**/node_modules", "**/__pycache__"]
[tool.pytest.ini_options]
asyncio_default_fixture_loop_scope = "function"
asyncio_mode = "auto"
timeout = 1000
env_files = [".env"]