--- title: "Llama.cpp" id: integrations-llama-cpp description: "Llama.cpp integration for Haystack" slug: "/integrations-llama-cpp" --- ## Module haystack\_integrations.components.generators.llama\_cpp.generator ### LlamaCppGenerator Provides an interface to generate text using LLM via llama.cpp. [llama.cpp](https://github.com/ggml-org/llama.cpp) is a project written in C/C++ for efficient inference of LLMs. It employs the quantized GGUF format, suitable for running these models on standard machines (even without GPUs). Usage example: ```python from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator generator = LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=512) print(generator.run("Who is the best American actor?", generation_kwargs={"max_tokens": 128})) # {'replies': ['John Cusack'], 'meta': [{"object": "text_completion", ...}]} ``` #### LlamaCppGenerator.\_\_init\_\_ ```python def __init__(model: str, n_ctx: Optional[int] = 0, n_batch: Optional[int] = 512, model_kwargs: Optional[Dict[str, Any]] = None, generation_kwargs: Optional[Dict[str, Any]] = None) ``` **Arguments**: - `model`: The path of a quantized model for text generation, for example, "zephyr-7b-beta.Q4_0.gguf". If the model path is also specified in the `model_kwargs`, this parameter will be ignored. - `n_ctx`: The number of tokens in the context. When set to 0, the context will be taken from the model. - `n_batch`: Prompt processing maximum batch size. - `model_kwargs`: Dictionary containing keyword arguments used to initialize the LLM for text generation. These keyword arguments provide fine-grained control over the model loading. In case of duplication, these kwargs override `model`, `n_ctx`, and `n_batch` init parameters. For more information on the available kwargs, see [llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/`llama_cpp.Llama.__init__`). - `generation_kwargs`: A dictionary containing keyword arguments to customize text generation. For more information on the available kwargs, see [llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/`llama_cpp.Llama.create_completion`). #### LlamaCppGenerator.run ```python @component.output_types(replies=List[str], meta=List[Dict[str, Any]]) def run( prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None ) -> Dict[str, Union[List[str], List[Dict[str, Any]]]] ``` Run the text generation model on the given prompt. **Arguments**: - `prompt`: the prompt to be sent to the generative model. - `generation_kwargs`: A dictionary containing keyword arguments to customize text generation. For more information on the available kwargs, see [llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/`llama_cpp.Llama.create_completion`). **Returns**: A dictionary with the following keys: - `replies`: the list of replies generated by the model. - `meta`: metadata about the request.