---
title: Use AutoGen for Local LLMs
authors: jialeliu
tags: [LLM]
---

**TL;DR:**
We demonstrate how to use AutoGen for local LLM applications. As an example, we will initiate an endpoint using [FastChat](https://github.com/lm-sys/FastChat) and perform inference on [ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B).

## Preparations

### Clone FastChat

FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs. However, its code needs a minor modification in order to function properly.

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```

### Download checkpoint

ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. ChatGLM2-6B is its second-generation version.

Before downloading from the Hugging Face Hub, you need to have Git LFS [installed](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage).

```bash
git clone https://huggingface.co/THUDM/chatglm2-6b
```
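
If you want to verify that the download completed before wiring up FastChat, a quick tokenizer load is an optional sanity check (assuming the `transformers` library is installed); it will fail fast on a broken or partial checkpoint:

```python
# optional: confirm the local checkpoint is intact by loading its tokenizer;
# ChatGLM2 ships custom code, so trust_remote_code=True is required
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chatglm2-6b", trust_remote_code=True)
print(tokenizer("Hi"))
```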

## Initiate server

First, launch the controller:

```bash
python -m fastchat.serve.controller
```

Then, launch the model worker(s):

```bash
python -m fastchat.serve.model_worker --model-path chatglm2-6b
```

Finally, launch the RESTful API server:

```bash
python -m fastchat.serve.openai_api_server --host localhost --port 8000
```
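
Once all three processes are running, the endpoint speaks the OpenAI API, so you can also talk to it with the plain openai-python package. A minimal sanity check, assuming openai<1 is installed (matching the requirement below); "EMPTY" is just a placeholder key, since FastChat does not validate it:

```python
import openai

# point the openai-python (v0.x) client at the local FastChat endpoint
openai.api_key = "EMPTY"  # placeholder; FastChat ignores the key
openai.api_base = "http://localhost:8000/v1"

response = openai.ChatCompletion.create(
    model="chatglm2-6b",
    messages=[{"role": "user", "content": "Hi"}],
)
print(response.choices[0].message.content)
```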

Normally this will work. However, if you encounter an error like [this one](https://github.com/lm-sys/FastChat/issues/1641), commenting out all the lines containing `finish_reason` in `fastchat/protocol/api_protocol.py` and `fastchat/protocol/openai_api_protocol.py` will fix the problem. The modified code looks like:

```python
class CompletionResponseChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[int] = None
    # finish_reason: Optional[Literal["stop", "length"]]


class CompletionResponseStreamChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[float] = None
    # finish_reason: Optional[Literal["stop", "length"]] = None
```

## Interact with model using `oai.Completion` (requires openai<1)

Now the models can be directly accessed through the openai-python library, as well as `autogen.oai.Completion` and `autogen.oai.ChatCompletion`.

```python
from autogen import oai

# create a text completion request
response = oai.Completion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "base_url": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",  # just a placeholder
        }
    ],
    prompt="Hi",
)
print(response)

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "base_url": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```
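
If you only want the generated strings rather than the full response object, legacy `autogen` provides an `extract_text` helper on these classes; a small convenience sketch, assuming your installed version still exposes it:

```python
# pull just the generated text out of the responses above;
# extract_text handles both completion and chat-completion responses
print(oai.Completion.extract_text(response))
```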

If you would like to switch to different models, download their checkpoints and specify the model path when launching the model worker(s).

## Interacting with multiple local LLMs

If you would like to interact with multiple LLMs on your local machine, replace the `model_worker` step above with a multi-model variant:

```bash
python -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.3 \
    --model-names vicuna-7b-v1.3 \
    --model-path chatglm2-6b \
    --model-names chatglm2-6b
```

The inference code would be:

```python
from autogen import oai

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "base_url": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
        {
            "model": "vicuna-7b-v1.3",
            "base_url": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
    ],
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```
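
Note that `oai.ChatCompletion` treats `config_list` as an ordered list of fallbacks: it tries each endpoint in turn until one returns a valid response. If you want to target one specific model per request instead, filter the list before calling; a minimal sketch using plain Python (no special autogen helper assumed):

```python
config_list = [
    {"model": "chatglm2-6b", "base_url": "http://localhost:8000/v1", "api_type": "open_ai", "api_key": "NULL"},
    {"model": "vicuna-7b-v1.3", "base_url": "http://localhost:8000/v1", "api_type": "open_ai", "api_key": "NULL"},
]

# keep only the vicuna endpoint for this request
vicuna_only = [c for c in config_list if c["model"] == "vicuna-7b-v1.3"]

response = oai.ChatCompletion.create(
    config_list=vicuna_only,
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```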

## For Further Reading

* [Documentation](/docs/Getting-Started) about `autogen`.
* [Documentation](https://github.com/lm-sys/FastChat) about FastChat.