- The [llm-instruction-eval-openai.ipynb](llm-instruction-eval-openai.ipynb) notebook uses OpenAI's GPT-4 to evaluate responses generated by instruction finetuned models. It works with a JSON file in the following format:
```python
{
"instruction": "What is the atomic number of helium?",
"input": "",
"output": "The atomic number of helium is 2.", # <--Thetargetgiveninthetestset
"model 1 response": "\nThe atomic number of helium is 2.0.", # <--ResponsebyanLLM
"model 2 response": "\nThe atomic number of helium is 3." # <--Responsebya2ndLLM
## Evaluating Instruction Responses Locally Using Ollama
- The [llm-instruction-eval-ollama.ipynb](llm-instruction-eval-ollama.ipynb) notebook offers an alternative to the one above, utilizing a locally downloaded Llama 3 model via Ollama.