Merge pull request #341 from charitarthchugh/charitarthchugh/vllm-defaults-speedup

Add chunked prefill and limit mm per prompt options
Jake Poznanski 2025-10-06 13:23:47 -07:00 committed by GitHub
commit 2b70b50312

@@ -636,6 +636,8 @@ async def vllm_server_task(model_name_or_path, args, semaphore, unknown_args=None):
str(args.tensor_parallel_size),
"--data-parallel-size",
str(args.data_parallel_size),
"--enable-chunked-prefill",
"--limit-mm-per-prompt",
'{"video": 0}',
]
if args.gpu_memory_utilization is not None:
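
The list above is an argv array, so each flag and its value must be separate elements; embedding shell-style quotes inside a single string (e.g. `"--limit-mm-per-prompt '{...}'"`) only works when the command is run through a shell. A minimal sketch of the launch shape, with an assumed model name and without actually spawning the server:

```python
import json

# Sketch only: mirrors the argv-list style used in vllm_server_task above.
model = "Qwen/Qwen2-VL-7B-Instruct"  # hypothetical model name, for illustration
cmd = [
    "vllm", "serve", model,
    "--enable-chunked-prefill",                          # boolean flag, no value
    "--limit-mm-per-prompt", json.dumps({"video": 0}),   # flag, then JSON value
]
# subprocess.Popen(cmd) would launch the server; omitted here.
print(cmd)
```

Using `json.dumps` keeps the JSON value well-formed without hand-escaping quotes, and passing it as its own list element avoids any dependence on shell parsing.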