graphrag/tests/unit/utils/test_embeddings.py

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

import pytest

from graphrag.utils.embeddings import create_collection_name


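def test_create_collection_name():
    """A known embedding name is joined to the container, with dots turned into dashes."""
    collection = create_collection_name("default", "entity.title")
    assert collection == "default-entity-title"


def test_create_collection_name_invalid_embedding_throws():
    """With validation on (the default), an unrecognized embedding name raises KeyError."""
    with pytest.raises(KeyError):
        create_collection_name("default", "invalid.name")


def test_create_collection_name_invalid_embedding_does_not_throw():
    """With validate=False, the name is built without checking the embedding name."""
    collection = create_collection_name("default", "invalid.name", validate=False)
    assert collection == "default-invalid-name"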
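
The three tests above pin down the observable behavior of `create_collection_name`. As a rough illustration only, the sketch below shows a stand-in function that would satisfy them. It is not the graphrag implementation: the name `sketch_create_collection_name` and the `KNOWN_EMBEDDINGS` set are hypothetical, and the set contains only the one embedding name that appears in the tests.

```python
# Hypothetical stand-in, NOT the real graphrag code.
# KNOWN_EMBEDDINGS is an assumption: it lists only the name used in the tests.
KNOWN_EMBEDDINGS = {"entity.title"}


def sketch_create_collection_name(
    container_name: str, embedding_name: str, validate: bool = True
) -> str:
    """Build a collection name such as 'default-entity-title'."""
    # Reject unknown embedding names unless the caller opts out of validation.
    if validate and embedding_name not in KNOWN_EMBEDDINGS:
        raise KeyError(f"Invalid embedding name: {embedding_name}")
    # Join container and embedding, then replace dots with dashes.
    return f"{container_name}-{embedding_name}".replace(".", "-")
```

Under these assumptions, the default path validates the embedding name, while `validate=False` skips the check and only formats the string, which is the distinction the second and third tests exercise.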
Artifact cleanup (#1341), 2024-11-13 15:11:19 -08:00