Nathan Evans
94f1e62e5c
Rework workflow architecture (#1311)
...
* Rename pipeline_storage file
* Add runtime storage option to context
* Fix import
* Switch to memory storage for runtime
* Infra for workflow runtime storage
* Migrate base_text_units to runtime storage
* Fix comment
* Semver
* Remove whitespace
* Remove subflow smoke tests and ignore transient artifacts
* Remove entity graph from transient list (not yet implemented)
* Increase smoke runtime allotment for create_base_entity_graph
* Revert format fix
* Remove noqa
2024-10-24 10:20:03 -07:00
Nathan Evans
1f70d42572
Empty workflow returns (#1291)
...
* Skip emitting empty dataframes
* Semver
* Better empty df check
2024-10-17 09:25:36 -07:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows (#1284)
...
* Copy base documents logic into final documents
* Delete create_base_documents
* Combine graph creation under create_base_entity_graph
* Delete collapsed workflows
* Migrate most graph internals to nx.Graph
* Fix None edge case
* Semver
* Remove comment typo
* Fix smoke tests
2024-10-15 13:58:58 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs (#1248)
...
* Remove genid
* Move snapshot_rows
* Move snapshot
* Delete spread_json
* Delete unzip
* Delete zip
* Move unpack_graph
* Move compute_edge_combined_degree
* Delete create_graph
* Delete concat
* Delete text_replace
* Delete text_translate
* Move text_split
* Inline aggregate override
* Move cluster_graph
* Move merge_graphs
* Semver
* Move text_chunk
* Move layout_graph and fix some __init__s
* Move extract_covariates
* Rename text_split -> split_text
* Move extract_entities
* Move summarize_descriptions
* Rename text_chunk -> chunk_text
* Move community report creation
* Remove verb-level packing operators
* Streamline some naming
* Streamline param name/order
* Move mock LLM data to tests
* Fixed missed rename
* Update some strategy refs
* Rename run_gi
* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Nathan Evans
f5c5876dde
Reorganize flows (#1240)
...
* Extract base docs and entity graph
* Move extracted entities and text units
* Move communities and community reports
* Move covariates and final documents
* Move entities, nodes, relationships
* Move text_units and summarized entities
* Assert all snapshot null cases
* Remove disabled steps util
* Remove incorrect use of input "others"
* Convert text_embed_df to just return the embeddings, not update the df
* Convert snapshot functions to noops
* Semver
* Remove lingering covariates_enabled param
* Name consistency
* Syntax cleanup
2024-10-02 08:57:08 -07:00
Nathan Evans
5220bb7ecc
Collapse create base entity graph (#1233)
...
* Collapse create_base_entity_graph
* Format/typing
* Semver
* Fix smoke tests
* Simplify assignment
2024-09-30 15:39:42 -07:00
Nathan Evans
00d5e77568
Collapse create final community reports (#1227)
...
* Remove extraneous param
* Add community report mocking assertions
* Collapse primary report generation
* Collapse embeddings
* Format
* Semver
* Remove extraneous check
* Move option set
2024-09-30 10:46:07 -07:00
Nathan Evans
ce71bcf7fb
Collapse create final entities (#1220)
...
* Collapse create_final_entities
* Update smoke tests
* Semver
* Remove prints
* Update embedding assertions
2024-09-25 17:35:44 -07:00
Nathan Evans
73e709b686
Collapse create final covariates (#1215)
...
* Add covariate test
* Add detailed mock assertions
* Collapse create_final_covariates
* Delete unused doc_id field
* Semver
* Update smoke test
* Remove unused subject/object type columns
2024-09-25 16:30:22 -07:00
Nathan Evans
f518c8b80b
Collapse relationship embeddings (#1199)
...
* Merge text_embed into a single relationships subflow
* Update smoke tests
* Semver
* Spelling
2024-09-24 15:03:26 -07:00
Nathan Evans
1755afbdec
Collapse create base text units (#1178)
...
* Collapse non-attribute verbs
* Include document_column_attributes in collapse
* Remove merge_override verb
* Semver
* Setup initial test and config
* Collapse create_base_text_units
* Semver
* Spelling
* Fix smoke tests
* Address PR comments
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-23 16:55:53 -07:00
Nathan Evans
fbc483e4e5
Collapse create base documents (#1176)
...
* Collapse non-attribute verbs
* Include document_column_attributes in collapse
* Remove merge_override verb
* Semver
* Clean up some df/tests
2024-09-23 13:24:06 -07:00
Nathan Evans
aa5b426f1d
Collapse final communities workflow (#1150)
...
* Collapse create_final_communities
* Semver
* Spellcheck
* Clean up filtering
* Add space in title
* Format
* Cleanup imports and format
* Spruce up the tests
* Update dictionary.txt
* Spellcheck
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-17 17:04:42 -07:00
Nathan Evans
a473265580
Collapse verbs: create_final_text_units (#1143)
...
* Load default config in verb tests
* Load proper workflow config
* Collapse text unit pre-embedding steps
* Format
* Update smoke tests
* Semver
* Format
* Merge join* subflows into create_final_text_units
* Remove join_text_units_to_covariate_ids
* Format
* Remove join_text_units_to_entity_ids
* Remove join_text_units_to_relationship_ids
* Clean up merges and aggregations
* Remove unnecessary cast
2024-09-17 10:32:25 -07:00
Nathan Evans
2de302ff0d
Verb merge nre1 (#1140)
...
* Setup basic verb test runner
* Replace join_text_units_to_entity_ids with subflow
* Update comments
* Replace join_text_units_to_relationship_ids subflow
* Roll in final select
* Reuse assertion util
* Small fix + format
* Format/typing
* Semver
* Format/typing
* Semver
* Revert format changes
* Fix smoke test subworkflow count
* Edit subworkflows for another smoke test
2024-09-16 12:10:29 -07:00