15 Commits

Author SHA1 Message Date
Nathan Evans
94f1e62e5c
Rework workflow architecture (#1311)
* Rename pipeline_storage file

* Add runtime storage option to context

* Fix import

* Switch to memory storage for runtime

* Infra for workflow runtime storage

* Migrate base_text_units to runtime storage

* Fix comment

* Semver

* Remove whitespace

* Remove subflow smoke tests and ignore transient artifacts

* Remove entity graph from transient list (not yet implemented)

* Increase smoke runtime allotment for create_base_entity_graph

* Revert format fix

* Remove noqa
2024-10-24 10:20:03 -07:00
Nathan Evans
1f70d42572
Empty workflow returns (#1291)
* Skip emitting empty dataframes

* Semver

* Better empty df check
2024-10-17 09:25:36 -07:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows (#1284)
* Copy base documents logic into final documents

* Delete create_base_documents

* Combine graph creation under create_base_entity_graph

* Delete collapsed workflows

* Migrate most graph internals to nx.Graph

* Fix None edge case

* Semver

* Remove comment typo

* Fix smoke tests
2024-10-15 13:58:58 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs (#1248)
* Remove genid

* Move snapshot_rows

* Move snapshot

* Delete spread_json

* Delete unzip

* Delete zip

* Move unpack_graph

* Move compute_edge_combined_degree

* Delete create_graph

* Delete concat

* Delete text replace

* Delete text_translate

* Move text_split

* Inline aggregate override

* Move cluster_graph

* Move merge_graphs

* Semver

* Move text_chunk

* Move layout_graph and fix some __init__s

* Move extract_covariates

* Rename text_split -> split_text

* Move extract_entities

* Move summarize_descriptions

* Rename text_chunk -> chunk_text

* Move community report creation

* Remove verb-level packing operators

* Streamline some naming

* Streamline param name/order

* Move mock LLM data to tests

* Fixed missed rename

* Update some strategy refs

* Rename run_gi

* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Nathan Evans
f5c5876dde
Reorganize flows (#1240)
* Extract base docs and entity graph

* Move extracted entities and text units

* Move communities and community reports

* Move covariates and final documents

* Move entities, nodes, relationships

* Move text_units and summarized entities

* Assert all snapshot null cases

* Remove disabled steps util

* Remove incorrect use of input "others"

* Convert text_embed_df to just return the embeddings, not update the df

* Convert snapshot functions to noops

* Semver

* Remove lingering covariates_enabled param

* Name consistency

* Syntax cleanup
2024-10-02 08:57:08 -07:00
Nathan Evans
5220bb7ecc
Collapse create base entity graph (#1233)
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
2024-09-30 15:39:42 -07:00
Nathan Evans
00d5e77568
Collapse create final community reports (#1227)
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
2024-09-30 10:46:07 -07:00
Nathan Evans
ce71bcf7fb
Collapse create final entities (#1220)
* Collapse create_final_entities

* Update smoke tests

* Semver

* Remove prints

* Update embedding assertions
2024-09-25 17:35:44 -07:00
Nathan Evans
73e709b686
Collapse create final covariates (#1215)
* Add covariate test

* Add detailed mock assertions

* Collapse create_final_covariates

* Delete unused doc_id field

* Semver

* Update smoke test

* Remove unused subject/object type columns
2024-09-25 16:30:22 -07:00
Nathan Evans
f518c8b80b
Collapse relationship embeddings (#1199)
* Merge text_embed into a single relationships subflow

* Update smoke tests

* Semver

* Spelling
2024-09-24 15:03:26 -07:00
Nathan Evans
1755afbdec
Collapse create base text units (#1178)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Setup initial test and config

* Collapse create_base_text_units

* Semver

* Spelling

* Fix smoke tests

* Address PR comments

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-23 16:55:53 -07:00
Nathan Evans
fbc483e4e5
Collapse create base documents (#1176)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Clean up some df/tests
2024-09-23 13:24:06 -07:00
Nathan Evans
aa5b426f1d
Collapse final communities workflow (#1150)
* Collapse create_final_communities

* Semver

* Spellcheck

* Clean up filtering

* Add space in title

* Format

* Cleanup imports and format

* Spruce up the tests

* Update dictionary.txt

* Spellcheck

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-17 17:04:42 -07:00
Nathan Evans
a473265580
Collapse verbs: create_final_text_units (#1143)
* Load default config in verb tests

* Load proper workflow config

* Collapse text unit pre-embedding steps

* Format

* Update smoke tests

* Semver

* Format

* Merge join* subflows into create_final_text_units

* Remove join_text_units_to_covariate_ids

* Format

* Remove join_text_units_to_entity_ids

* Remove join_text_units_to_relationship_ids

* Clean up merges and aggregations

* Remove unnecessary cast
2024-09-17 10:32:25 -07:00
Nathan Evans
2de302ff0d
Verb merge nre1 (#1140)
* Setup basic verb test runner

* Replace join_text_units_to_entity_ids with subflow

* Update comments

* Replace join_text_units_to_relationship_ids subflow

* Roll in final select

* Reuse assertion util

* Small fix + format

* Format/typing

* Semver

* Format/typing

* Semver

* Revert format changes

* Fix smoke test subworkflow count

* Edit subworkflows for another smoke test
2024-09-16 12:10:29 -07:00