vLLM supports generative and pooling models across various tasks. If a model supports more than one task, you can set the task via the --task argument.
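
For example, in offline inference the task can be passed directly when constructing the engine. A minimal sketch, assuming the illustrative model names below and that "embed" is among the --task choices for pooling models:

from vllm import LLM

# Illustrative model names; substitute any model from the supported lists below.
generator = LLM(model="facebook/opt-125m", task="generate")  # generative model
embedder = LLM(model="BAAI/bge-base-en-v1.5", task="embed")  # pooling model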

For each task, we list the model architectures that have been implemented in vLLM. Alongside each architecture, we include some popular models that use it.

Model Implementation

vLLM

If vLLM natively supports a model, its implementation can be found in vllm/model_executor/models.

These models are what we list in supported-text-models and supported-mm-models.

Transformers

vLLM also supports model implementations that are available in Transformers. This does not currently work for all models, but most decoder language models are supported, and vision language model support is planned!

To check if the modeling backend is Transformers, you can simply do this:

from vllm import LLM
llm = LLM(model=..., task="generate")  # Name or path of your model
llm.apply_model(lambda model: print(type(model)))  # Print the class of the loaded model implementation

If it is TransformersForCausalLM, then it's based on Transformers!

You can force the use of TransformersForCausalLM by setting model_impl="transformers" for offline-inference or --model-impl transformers for the openai-compatible-server.
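
For offline inference, a minimal sketch (the model name is only an illustrative placeholder):

from vllm import LLM

# Force the Transformers modeling backend instead of a native vLLM implementation.
llm = LLM(model="facebook/opt-125m", model_impl="transformers")
llm.apply_model(lambda model: print(type(model)))  # expect TransformersForCausalLM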

vLLM may not fully optimise the Transformers implementation, so you may see degraded performance when comparing a native model to a Transformers model in vLLM.

Custom models

If a model is natively supported by neither vLLM nor Transformers, it can still be used in vLLM!

For a model to be compatible with the Transformers backend for vLLM, it must:

  • be a Transformers-compatible custom model (see Transformers - Customizing models, and the minimal sketch after this list):
  •
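
As an illustration of the first requirement, here is a minimal sketch of a Transformers-compatible custom model in the style of Transformers - Customizing models. All class names are hypothetical, and it does not show any additional requirements the Transformers backend itself may impose:

import torch
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

class MyCustomConfig(PretrainedConfig):
    # Hypothetical config class; model_type identifies it to the Auto classes.
    model_type = "my_custom_model"

    def __init__(self, vocab_size=32000, hidden_size=256, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        super().__init__(**kwargs)

class MyCustomModel(PreTrainedModel):
    # Hypothetical model class tied to the config above.
    config_class = MyCustomConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    def forward(self, input_ids: torch.Tensor, **kwargs):
        return self.lm_head(self.embed_tokens(input_ids))

# Registering with the Auto classes lets the model be saved with an auto_map
# and loaded with trust_remote_code=True.
MyCustomConfig.register_for_auto_class()
MyCustomModel.register_for_auto_class("AutoModel")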