diff --git a/autogen/oai/client.py b/autogen/oai/client.py
index 56167f978..70251353f 100644
--- a/autogen/oai/client.py
+++ b/autogen/oai/client.py
@@ -77,6 +77,8 @@ class ModelClient(Protocol):
             class Message(Protocol):
                 content: Optional[str]

+            message: Message
+
         choices: List[Choice]
         model: str

diff --git a/notebook/agentchat_custom_model.ipynb b/notebook/agentchat_custom_model.ipynb
index b58b5d93a..a35753ad8 100644
--- a/notebook/agentchat_custom_model.ipynb
+++ b/notebook/agentchat_custom_model.ipynb
@@ -94,7 +94,9 @@
     "    class ModelClientResponseProtocol(Protocol):\n",
     "        class Choice(Protocol):\n",
     "            class Message(Protocol):\n",
-    "                content: str | None\n",
+    "                content: Optional[str]\n",
+    "\n",
+    "            message: Message\n",
     "\n",
     "        choices: List[Choice]\n",
     "        model: str\n",
diff --git a/website/blog/2024-01-26-Custom-Models/index.mdx b/website/blog/2024-01-26-Custom-Models/index.mdx
index 796b0c00d..81a9ad383 100644
--- a/website/blog/2024-01-26-Custom-Models/index.mdx
+++ b/website/blog/2024-01-26-Custom-Models/index.mdx
@@ -122,7 +122,9 @@ class ModelClient(Protocol):
     class ModelClientResponseProtocol(Protocol):
         class Choice(Protocol):
             class Message(Protocol):
-                content: str | None
+                content: Optional[str]
+
+            message: Message

         choices: List[Choice]
         model: str
diff --git a/website/docs/FAQ.md b/website/docs/FAQ.md
index e69c2f71d..c342d4619 100644
--- a/website/docs/FAQ.md
+++ b/website/docs/FAQ.md
@@ -89,7 +89,10 @@ In version >=1, OpenAI renamed their `api_base` parameter to `base_url`. So for

 ### Can I use non-OpenAI models?

-Yes. Autogen can work with any API endpoint which complies with OpenAI-compatible RESTful APIs - e.g. serving local LLM via FastChat or LM Studio. Please check https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs for an example.
+Yes. You currently have two options:
+
+- Autogen can work with any API endpoint which complies with OpenAI-compatible RESTful APIs - e.g. serving local LLM via FastChat or LM Studio. Please check https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs for an example.
+- You can supply your own custom model implementation and use it with Autogen. Please check https://microsoft.github.io/autogen/blog/2024/01/26/Custom-Models for more information.

 ## Handle Rate Limit Error and Timeout Error

diff --git a/website/docs/Use-Cases/enhanced_inference.md b/website/docs/Use-Cases/enhanced_inference.md
index f49c677bf..9f73a0efe 100644
--- a/website/docs/Use-Cases/enhanced_inference.md
+++ b/website/docs/Use-Cases/enhanced_inference.md
@@ -107,9 +107,6 @@ The tuned config can be used to perform inference.

 ## API unification

-
-
 `autogen.OpenAIWrapper.create()` can be used to create completions for both chat and non-chat models, and both OpenAI API and Azure OpenAI API.

 ```python
@@ -133,7 +130,7 @@ print(client.extract_text_or_completion_object(response))

 For local LLMs, one can spin up an endpoint using a package like [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs.

-
+For custom model clients, one can register the client with `autogen.OpenAIWrapper.register_model_client` and then use the same API to send a request. See [here](/blog/2024/01/26/Custom-Models) for examples on how to make inference with custom model clients.

 ## Usage Summary

@@ -166,6 +163,8 @@ Total cost: 0.00027
 * Model 'gpt-3.5-turbo': cost: 0.00027, prompt_tokens: 50, completion_tokens: 100, total_tokens: 150
 ```

+Note: if using a custom model client (see [here](/blog/2024/01/26/Custom-Models) for details) and if usage summary is not implemented, then the usage summary will not be available.
+
 ## Caching

 API call results are cached locally and reused when the same request is issued.
@@ -241,13 +240,6 @@ The differences between autogen's `cache_seed` and openai's `seed`:

 ### Runtime error

-
 One can pass a list of configurations of different models/endpoints to mitigate the rate limits and other runtime error. For example,

 ```python
@@ -268,12 +260,16 @@ client = OpenAIWrapper(
         {
             "model": "llama2-chat-7B",
             "base_url": "http://127.0.0.1:8080",
+        },
+        {
+            "model": "microsoft/phi-2",
+            "model_client_cls": "CustomModelClient"
         }
     ],
 )
 ```

-`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama2-chat-7B one by one,
+`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, a locally hosted llama2-chat-7B, and phi-2 using a custom model client class named `CustomModelClient`, one by one,
 until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck. An error will be raised if the last choice fails. So make sure the last choice in the list has the best availability.

 For convenience, we provide a number of utility functions to load config lists.
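
Taken together, the changes above document one workflow: a custom client returns responses matching `ModelClientResponseProtocol` (each `Choice` now carries a `message` whose `content` is `Optional[str]`), a config entry points at it via `"model_client_cls"`, and the class is registered with `autogen.OpenAIWrapper.register_model_client`. The sketch below is illustrative only and is not part of this diff: the method names `create`, `message_retrieval`, `cost`, and `get_usage` follow the Custom Models blog post the diff links to, `CustomModelClient` and its echo behavior are hypothetical, and the exact wiring may differ across autogen 0.2.x versions.

```python
# Minimal sketch, assuming the ModelClient protocol from the linked blog post.
from types import SimpleNamespace
from typing import Any, Dict, List

import autogen


class CustomModelClient:
    """Hypothetical client satisfying the response shape in the diff above:
    response.choices[i].message.content and response.model."""

    def __init__(self, config: Dict[str, Any], **kwargs):
        self.model = config["model"]

    def create(self, params: Dict[str, Any]) -> SimpleNamespace:
        # A real client would call the model here; this one echoes the last user message.
        last = params["messages"][-1]["content"]
        message = SimpleNamespace(content=f"[{self.model}] {last}")
        choice = SimpleNamespace(message=message)  # the `message: Message` attribute added by this diff
        return SimpleNamespace(choices=[choice], model=self.model)

    def message_retrieval(self, response) -> List[str]:
        return [choice.message.content for choice in response.choices]

    def cost(self, response) -> float:
        return 0.0

    @staticmethod
    def get_usage(response) -> Dict[str, Any]:
        # Keys mirror ModelClient.RESPONSE_USAGE_KEYS; all zeros means the usage
        # summary stays effectively empty, per the note added to enhanced_inference.md.
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "cost": 0.0, "model": response.model}


config_list = [{"model": "microsoft/phi-2", "model_client_cls": "CustomModelClient"}]
client = autogen.OpenAIWrapper(config_list=config_list)
client.register_model_client(model_client_cls=CustomModelClient)

response = client.create(messages=[{"role": "user", "content": "Hello"}])
print(client.extract_text_or_completion_object(response))
```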