mirror of https://github.com/microsoft/autogen.git

change price ratio (#1130)
commit 7665f73e4b (parent eae65ac22b)
@@ -24,7 +24,7 @@ We will use FLAML to perform model selection and inference parameter tuning. The
We use FLAML to select between the following models with a target inference budget of $0.02 per instance:
- gpt-3.5-turbo, a relatively cheap model that powers the popular ChatGPT app
-- gpt-4, the state-of-the-art LLM that costs more than 100 times as much as gpt-3.5-turbo
+- gpt-4, the state-of-the-art LLM that costs more than 10 times as much as gpt-3.5-turbo
We adapt the models using 20 examples in the train set, using the problem statement as the input and generating the solution as the output. We use the following inference parameters:
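To make the selection step in this hunk concrete, the sketch below shows how such a tuning run could look with FLAML's `oai.Completion.tune` under a $0.02 per-instance inference budget. This is a minimal sketch, not the post's actual code: the tuning data, the `success_metric` function, the prompt template, and the optimization budget are illustrative placeholders; only the $0.02 budget and the two candidate models come from the post.

```python
# A minimal sketch of tuning model choice and inference parameters with FLAML,
# assuming the `flaml.oai.Completion.tune` API that this post builds on.
from flaml import oai, tune

# Placeholder tuning data; the post uses 20 examples from the train split,
# with the problem statement as input and the solution as output.
tune_data = [
    {"problem": "Write a function add(a, b) that returns a + b.",
     "solution": "def add(a, b):\n    return a + b"},
    # ... more examples
]

def success_metric(responses, solution, **_):
    # Hypothetical metric: count an instance as a success if the reference
    # solution appears in any of the generated responses.
    return {"success": any(solution in r for r in responses)}

config, analysis = oai.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=success_metric,
    inference_budget=0.02,      # target cost per instance, matching the post's budget
    optimization_budget=1,      # total tuning budget in dollars (assumed value)
    num_samples=-1,             # try as many configurations as the budget allows
    model=tune.choice(["gpt-3.5-turbo", "gpt-4"]),  # the two candidate models
    prompt="{problem}",         # assumed prompt template: the problem statement as input
)
print(config)  # the best model and inference parameters found under the budget
```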
@@ -10,7 +10,7 @@ tags: [LLM, GPT, research]
* **A case study using the HumanEval benchmark shows that an adaptive way of using multiple GPT models can achieve both much higher accuracy (from 68% to 90%) and lower inference cost (by 18%) than using GPT-4 for coding.**
-GPT-4 is a big upgrade in foundation model capability, e.g., in code and math, accompanied by a much higher (more than 100x) price per token than GPT-3.5-Turbo. On a code completion benchmark, [HumanEval](https://huggingface.co/datasets/openai_humaneval), developed by OpenAI, GPT-4 can successfully solve 68% of the tasks while GPT-3.5-Turbo solves 46%. It is possible to increase the success rate of GPT-4 further by generating multiple responses or making multiple calls. However, that will further increase the cost, which is already nearly 20 times that of GPT-3.5-Turbo, and comes with a more restricted API call rate limit. Can we achieve more with less?
+GPT-4 is a big upgrade in foundation model capability, e.g., in code and math, accompanied by a much higher (more than 10x) price per token than GPT-3.5-Turbo. On a code completion benchmark, [HumanEval](https://huggingface.co/datasets/openai_humaneval), developed by OpenAI, GPT-4 can successfully solve 68% of the tasks while GPT-3.5-Turbo solves 46%. It is possible to increase the success rate of GPT-4 further by generating multiple responses or making multiple calls. However, that will further increase the cost, which is already nearly 20 times that of GPT-3.5-Turbo, and comes with a more restricted API call rate limit. Can we achieve more with less?
In this blog post, we explore a creative, adaptive way of using GPT models, which leads to a big leap forward.
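To make the adaptive idea concrete, the sketch below illustrates one simple cascade: sample a few answers from the cheaper model, keep one only if it passes a check, and fall back to GPT-4 otherwise. This is a hand-rolled illustration of the concept, not the FLAML/autogen implementation used in the post; `passes_check`, the prompt, and the sample counts are assumptions.

```python
# Illustrative cascade: try the cheap model first, accept an answer only if it
# passes a validation check, and call GPT-4 only when the cheap model fails.
from openai import OpenAI

client = OpenAI()

def passes_check(code: str) -> bool:
    # Placeholder validation, e.g. run the benchmark's unit tests on `code`.
    return "def " in code

def adaptive_complete(problem: str) -> str:
    # Cheap model first with several samples, expensive model as a fallback.
    for model, n in [("gpt-3.5-turbo", 3), ("gpt-4", 1)]:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": problem}],
            n=n,
        )
        for choice in response.choices:
            text = choice.message.content
            if passes_check(text):
                return text  # accept the first passing answer, avoiding a GPT-4 call when possible
    return text  # fall back to the last GPT-4 answer if nothing passed
```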