Nathan Evans
ede6a74546
Pipeline callbacks (#1729)
...
* Add pipeline_start and pipeline_end callbacks
* Collapse redundant callback/logger logic
* Remove redundant reporting config classes
* Remove a few out-of-date type ignores
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 15:07:51 -08:00
Alonso Guevara
7bdeaee94a
Create Language Model Providers and Registry methods. Remove fnllm coupling (#1724)
...
* Base structure
* Add fnllm providers and Mock LLM
* Remove fnllm coupling, introduce llm providers
* Ruff + Tests fix
* Spellcheck
* Semver
* Format
* Default MockChat params
* Fix more tests
* Fix embedding smoke test
* Fix embeddings smoke test
* Fix MockEmbeddingLLM
* Rename LLM to model. Package organization
* Fix prompt tuning
* Oops
* Oops II
2025-02-20 08:56:20 -06:00
Josh Bradley
f14cda2b6d
Improve default llm retry logic to be more optimized (#1701)
2025-02-13 16:56:37 -05:00
Nathan Evans
c02ab0984a
Streamline workflows (#1674)
...
* Remove create_final_nodes
* Rename final entity output to "entities"
* Remove duplicate code from graph extraction
* Rename create_final_relationships output to "relationships"
* Rename create_final_communities output to "communities"
* Combine compute_communities and create_final_communities
* Rename create_final_covariates output to "covariates"
* Rename create_final_community_reports output to "community_reports"
* Rename create_final_text_units output to "text_units"
* Rename create_final_documents output to "documents"
* Remove transient snapshots config
* Move create_final_entities to finalize_entities operation
* Move create_final_relationships flow to finalize_relationships operation
* Reuse some community report functions
* Collapse most of graph and text unit-based report generation
* Unify schemas files
* Move community reports extractor
* Move NLP report prompt to prompts folder
* Fix a few pandas warnings
* Rename embeddings config to embed_text
* Rename claim_extraction config to extract_claims
* Remove nltk from standard graph extraction
* Fix verb tests
* Fix extract graph config naming
* Fix moved file reference
* Create v1-to-v2 migration notebook
* Semver
* Fix smoke test artifact count
* Raise tpm/rpm on smoke tests
* Update drift settings for smoke tests
* Reuse project directory var in api notebook
* Format
* Format
2025-02-07 11:11:03 -08:00
Nathan Evans
a2647da473
Simplify flow config (#1554)
...
* Flatten compute_communities config
* Remove cluster strategy type
* Flatten create_base_text_units config
* Move cluster seed to config default, leave as None in functions
* Remove "prechunked" logic
* Remove hard-coded encoding model
* Remove unused variables
* Strongly type embed_config
* Simplify layout_graph config
* Semver
* Fix integration test
* Fix config unit tests: ignore new config defaults
* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
Nathan Evans
c1c09bab80
Flow cleanup (#1510)
...
* Move snapshots out of flows into verbs
* Move degree compute out of extract_graph
* Move entity/relationship df merging into extract
* Move "title" to extraction source
* Move text_unit_ids agg closer to extraction
* Move data definition
* Update test data
* Semver
* Update smoke tests
* Fix empty degree field and update smoke tests and verb data
* Move extractors (#1516)
* Consolidate graph embedding and umap
* Consolidate claim extraction
* Consolidate graph extractor
* Move graph utils
* Move summarizers
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Fix syntax typo
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 18:07:44 -08:00
Nathan Evans
d0543d1fd6
Move extractors (#1516)
...
* Consolidate graph embedding and umap
* Consolidate claim extraction
* Consolidate graph extractor
* Move graph utils
* Move summarizers
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 16:21:41 -08:00
Chris Trevino
5ff2d3c76d
Remove graphrag.llm, replace with fnllm (#1315)
...
* add fnllm; remove llm folder
* remove llm unit tests
* update imports
* update imports
* formatting
* enable autosave
* update mockllm
* update community reports extractor
* move most llm usage to fnllm
* update type issues
* fix unit tests
* type updates
* update dictionary
* semver
* update llm construction, get integration tests working
* load from llmparameters model
* move ruff settings to ruff.toml
* add gitattributes file
* ignore ruff.toml spelling
* update .gitattributes
* update gitignore
* update config construction
* update prompt var usage
* add cache adapter
* use cache adapter in embeddings calls
* update embedding strategy
* add fnllm
* add pytest-dotenv
* fix some verb tests
* get verbtests running
* update ruff.toml for vscode
* enable ruff native server in vscode
* update artifact inspecting code
* remove local-test update
* use string.replace instead of string.format in community reports extractor
* bump timeout
* revert ruff.toml, vscode settings for another pr
* revert cspell config
* revert gitignore
* remove json-repair, update fnllm
* use fnllm generic type interfaces
* update load_llm to use target models
* consolidate chat parameters
* add 'extra_attributes' prop to community report response
* formatting
* update fnllm
* formatting
* formatting
* Add defaults to some llm params to avoid null on params hash
* Formatting
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-05 18:07:47 -06:00
Nathan Evans
c8c354e357
Artifact cleanup (#1341)
...
* Add source documents for verb tests
* Remove entity_type erroneous column
* Add new test data
* Remove source/target degree columns
* Remove top_level_node_id
* Remove chunk column configs
* Rename "chunk" to "text"
* Rename "chunk" to "text" in base
* Re-map document input to use base text units
* Revert base text units as final documents dep
* Update test data
* Split/rename node source_id
* Drop node size (dup of degree)
* Drop document_ids from covariates
* Remove unused document_ids from models
* Remove n_tokens from covariate table
* Fix missed document_ids delete
* Wire base text units to final documents
* Rename relationship rank as combined_degree
* Add rank as first-class property to Relationship
* Remove split_text operation
* Fix relationships test parquet
* Update test parquets
* Add entity ids to community table
* Remove stored graph embedding columns
* Format
* Semver
* Fix JSON typo
* Spelling
* Rename lancedb
* Sort lancedb
* Fix unit test
* Fix test to account for changing period
* Update tests for separate embeddings
* Format
* Better assertion printing
* Fix unit test for windows
* Rename document.raw_content -> document.text
* Remove read_documents function
* Remove unused document summary from model
* Remove unused imports
* Format
* Add new snapshots to default init
* Use util to construct embeddings collection name
* Align inc index model with branch changes
* Update data and tests for int ids
* Clean up embedding locs
* Switch entity "name" to "title" for consistency
* Fix short_id -> human_readable_id defaults
* Format
* Rework community IDs
* Fix community size compute
* Fix unit tests
* Fix report read
* Pare down nodes table output
* Fix unit test
* Fix merge
* Fix community loading
* Format
* Fix community id report extraction
* Update tests
* Consistent short IDs and ordering
* Update ordering and tests
* Update incremental for new nodes model
* Guard document columns loc
* Match column ordering
* Fix document guard
* Update smoke tests
* Fill NA on community extract
* Logging for smoke test debug
* Add parquet schema details doc
* Fix community hierarchy guard
* Use better empty hierarchy guard
* Back-compat shims
* Semver
* Fix warning
* Format
* Remove default fallback
* Reuse key
2024-11-13 15:11:19 -08:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows (#1284)
...
* Copy base documents logic into final documents
* Delete create_base_documents
* Combine graph creation under create_base_entity_graph
* Delete collapsed workflows
* Migrate most graph internals to nx.Graph
* Fix None edge case
* Semver
* Remove comment typo
* Fix smoke tests
2024-10-15 13:58:58 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs (#1248)
...
* Remove genid
* Move snapshot_rows
* Move snapshot
* Delete spread_json
* Delete unzip
* Delete zip
* Move unpack_graph
* Move compute_edge_combined_degree
* Delete create_graph
* Delete concat
* Delete text replace
* Delete text_translate
* Move text_split
* Inline aggregate override
* Move cluster_graph
* Move merge_graphs
* Semver
* Move text_chunk
* Move layout_graph and fix some __init__s
* Move extract_covariates
* Rename text_split -> split_text
* Move extract_entities
* Move summarize_descriptions
* Rename text_chunk -> chunk_text
* Move community report creation
* Remove verb-level packing operators
* Streamline some naming
* Streamline param name/order
* Move mock LLM data to tests
* Fix missed rename
* Update some strategy refs
* Rename run_gi
* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Alonso Guevara
81b81cf60b
Initial Release
2024-07-01 15:25:30 -06:00