Dayenne Souza
b94290ec2b
add option to add metadata into text chunks ( #1681 )
* add new options
* add metadata json into input document
* remove doc change
* add metadata column into text loader
* prepend_metadata
* run fix
* fix tests and patch
* fix test
* add warning for metadata tokens > config size
* fix typo and run fix
* fix test_integration
* fix test
* run check
* rename and fix chunking
* fix
* fix
* fix test verbs
* fix
* fix tests
* fix chunking
* fix index
* fix cosmos test
* fix vars
* fix after PR
* fix
2025-02-12 09:38:03 -08:00
Nathan Evans
c02ab0984a
Streamline workflows ( #1674 )
* Remove create_final_nodes
* Rename final entity output to "entities"
* Remove duplicate code from graph extraction
* Rename create_final_relationships output to "relationships"
* Rename create_final_communities output to "communities"
* Combine compute_communities and create_final_communities
* Rename create_final_covariates output to "covariates"
* Rename create_final_community_reports output to "community_reports"
* Rename create_final_text_units output to "text_units"
* Rename create_final_documents output to "documents"
* Remove transient snapshots config
* Move create_final_entities to finalize_entities operation
* Move create_final_relationships flow to finalize_relationships operation
* Reuse some community report functions
* Collapse most of graph and text unit-based report generation
* Unify schemas files
* Move community reports extractor
* Move NLP report prompt to prompts folder
* Fix a few pandas warnings
* Rename embeddings config to embed_text
* Rename claim_extraction config to extract_claims
* Remove nltk from standard graph extraction
* Fix verb tests
* Fix extract graph config naming
* Fix moved file reference
* Create v1-to-v2 migration notebook
* Semver
* Fix smoke test artifact count
* Raise tpm/rpm on smoke tests
* Update drift settings for smoke tests
* Reuse project directory var in api notebook
* Format
* Format
2025-02-07 11:11:03 -08:00
Dayenne Souza
ad5b5120ec
remove unused columns and rename document_attribute_columns ( #1672 )
* remove unused columns and change property document_attribute_columns to metadata
* format file
* fix 'metadata' column on output
* run check
* fix test on nltk
* remove docs changes
2025-02-03 14:37:06 -03:00