We demonstrate how to use AutoGen with a locally hosted LLM. As an example, we will launch an endpoint using [FastChat](https://github.com/lm-sys/FastChat) and perform inference on [ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B).
FastChat provides OpenAI-compatible APIs for the models it supports, so you can use FastChat as a local drop-in replacement for the OpenAI API. However, its code needs a minor modification to function properly, which we describe at the end of this section.
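To preview where this is headed: once the endpoint below is running, AutoGen can talk to it simply by pointing its client configuration at the local server. The following is a minimal sketch, assuming the model name and port used later in this post; depending on your AutoGen version, the URL key may be `api_base` rather than `base_url`.

```python
from autogen import AssistantAgent, UserProxyAgent

# A config that targets the local FastChat endpoint instead of api.openai.com.
# "chatglm2-6b" and port 8000 are assumptions matching the setup below.
config_list = [
    {
        "model": "chatglm2-6b",
        "base_url": "http://localhost:8000/v1",
        "api_key": "NULL",  # placeholder; FastChat does not check the key
    }
]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent("user", code_execution_config=False)
user.initiate_chat(assistant, message="Write a haiku about local LLMs.")
```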
```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```
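Then install FastChat from the cloned source. The commands below follow the FastChat README at the time of writing; check the repository if the install extras have changed.

```bash
pip3 install --upgrade pip  # enable PEP 660 editable installs
pip3 install -e ".[model_worker,webui]"
```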
### Download checkpoint
ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. ChatGLM2-6B is its second-generation version.
Before downloading from HuggingFace Hub, you need to have Git LFS [installed](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage).
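With Git LFS in place, clone the checkpoint from the Hugging Face Hub:

```bash
git clone https://huggingface.co/THUDM/chatglm2-6b
```

### Initiate server

FastChat serves models through three cooperating processes: a controller, a model worker that loads the checkpoint, and an OpenAI-compatible API server. The commands below follow FastChat's documentation; run each in a separate terminal, and adjust `--model-path` and the port to your setup.

```bash
# Launch the controller that coordinates model workers
python3 -m fastchat.serve.controller

# Launch a model worker and load the ChatGLM2-6B checkpoint cloned above
python3 -m fastchat.serve.model_worker --model-path chatglm2-6b

# Expose an OpenAI-compatible REST API on port 8000
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```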
Normally this will work. However, if you encounter an error like [this one](https://github.com/lm-sys/FastChat/issues/1641), commenting out all of the lines containing `finish_reason` in `fastchat/protocol/api_protocol.py` and `fastchat/protocol/openai_api_protocol.py` will fix the problem. The modified code looks like: