graphiti/graphiti_core/helpers.py

"""
Copyright 2024, Zep Software, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import os
from datetime import datetime

import numpy as np
from dotenv import load_dotenv
from neo4j import time as neo4j_time

load_dotenv()

DEFAULT_DATABASE = os.getenv('DEFAULT_DATABASE', None)
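# bool() of any non-empty string is True, so 'false' or '0' would enable the flag;
# compare against explicit truthy values instead.
USE_PARALLEL_RUNTIME = os.getenv('USE_PARALLEL_RUNTIME', 'false').strip().lower() in ('1', 'true')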
MAX_REFLEXION_ITERATIONS = 2
DEFAULT_PAGE_LIMIT = 20


def parse_db_date(neo_date: neo4j_time.DateTime | None) -> datetime | None:
    """Convert a Neo4j DateTime to a native datetime, passing None through."""
    return neo_date.to_native() if neo_date else None


def lucene_sanitize(query: str) -> str:
    # Escape special characters from a query before passing into Lucene
    # + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
    escape_map = str.maketrans(
        {
            '+': r'\+',
            '-': r'\-',
            '&': r'\&',
            '|': r'\|',
            '!': r'\!',
            '(': r'\(',
            ')': r'\)',
            '{': r'\{',
            '}': r'\}',
            '[': r'\[',
            ']': r'\]',
            '^': r'\^',
            '"': r'\"',
            '~': r'\~',
            '*': r'\*',
            '?': r'\?',
            ':': r'\:',
            '\\': r'\\',
            '/': r'\/',
        }
    )

    sanitized = query.translate(escape_map)
    return sanitized
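

# Illustrative sketch, not part of the original module: lucene_sanitize relies on
# str.maketrans with a dict, which maps single characters to replacement strings,
# and str.translate, which applies every mapping in one pass over the input.
# The helper name `_demo_escape` and its two-entry map are hypothetical and only
# show the mechanism on a smaller scale.
def _demo_escape(text: str) -> str:
    demo_map = str.maketrans({'+': r'\+', ':': r'\:'})
    # Characters without an entry in the map are copied through unchanged.
    return text.translate(demo_map)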


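def normalize_l2(embedding: list[float]) -> list[float]:
    """L2-normalize a vector, or each row of a 2-D batch; zero vectors are unchanged."""
    embedding_array = np.array(embedding)
    if embedding_array.ndim == 1:
        norm = np.linalg.norm(embedding_array)
        if norm == 0:
            return embedding_array.tolist()
        return (embedding_array / norm).tolist()
    # Batch case: normalize each row independently.
    norm = np.linalg.norm(embedding_array, 2, axis=1, keepdims=True)
    # Substitute 1 for zero norms so zero rows stay as-is without a divide-by-zero warning.
    safe_norm = np.where(norm == 0, 1, norm)
    return (embedding_array / safe_norm).tolist()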
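

# Illustrative sketch, not part of the original module: once vectors are
# L2-normalized as in normalize_l2 above, cosine similarity reduces to a plain
# dot product. The helper name `_demo_cosine_similarity` is hypothetical, and it
# normalizes with NumPy directly so the snippet can be run on its own.
def _demo_cosine_similarity(a: list[float], b: list[float]) -> float:
    import numpy as np  # local import keeps the sketch self-contained

    a_arr = np.asarray(a, dtype=float)
    b_arr = np.asarray(b, dtype=float)
    a_norm = np.linalg.norm(a_arr)
    b_norm = np.linalg.norm(b_arr)
    # A zero vector has no direction; this sketch assumes a similarity of 0.0 for it.
    if a_norm == 0 or b_norm == 0:
        return 0.0
    return float(np.dot(a_arr / a_norm, b_arr / b_norm))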