
Technical endpoints

Here we present the various technical endpoints: /v1/info, /metrics, /liveness, /readiness, /v1/models and /v1/launch_arguments.

/v1/info (GET)

This endpoint gives various pieces of information about the application and the model. The format of the output is as follows:

{
  "application": "happy_vllm",
  "version": "1.0.0",
  "vllm_version": "0.4.0.post1",
  "model_name": "The best LLM",
  "truncation_side": "right",
  "max_length": 32768
}
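
As a minimal sketch, the endpoint can be queried with Python's requests library. The base URL (host and port) and the /v1 prefix value of api_endpoint_prefix are assumptions; adjust them to your deployment:

import requests

# Hypothetical base URL; replace with the host/port of your happy_vllm instance
BASE_URL = "http://localhost:5000"

response = requests.get(f"{BASE_URL}/v1/info")
response.raise_for_status()
info = response.json()
print(info["model_name"], info["max_length"])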

/metrics (GET)

We remind you that this endpoint is the only one not prefixed by the api_endpoint_prefix. It is generated by a Prometheus client. For more details on which metrics are available, please refer to the vLLM documentation.
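
A quick way to scrape it, for instance, is to query the server root directly (the host and port are assumptions, as above):

import requests

# Metrics live at the server root, not under the api_endpoint_prefix
response = requests.get("http://localhost:5000/metrics")
response.raise_for_status()
# Prometheus exposition format: one metric per line, "#" lines are comments
for line in response.text.splitlines():
    if not line.startswith("#"):
        print(line)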

/liveness (GET)

Checks if the API is live. The format of the output is as follows:

{
  "alive": "ok"
}
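
A liveness probe can therefore be as simple as checking the status code and the payload. A sketch, assuming the same base URL as above:

import requests

def is_alive(base_url: str = "http://localhost:5000") -> bool:
    # The API is considered live if the endpoint answers with "alive": "ok"
    try:
        response = requests.get(f"{base_url}/liveness", timeout=2)
        return response.ok and response.json().get("alive") == "ok"
    except requests.RequestException:
        return False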

/readiness (GET)

Checks if the API is ready. The format of the output is as follows:

{
  "ready": "ok"
}

If the API is not ready, the value is "ko".
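
Since the value stays at "ko" while the model is still loading, a client can poll this endpoint before sending traffic. A minimal sketch (the base URL and timings are assumptions):

import time
import requests

def wait_until_ready(base_url: str = "http://localhost:5000", retries: int = 30) -> None:
    for _ in range(retries):
        try:
            response = requests.get(f"{base_url}/readiness", timeout=2)
            # "ok" once the model is loaded, "ko" otherwise
            if response.ok and response.json().get("ready") == "ok":
                return
        except requests.RequestException:
            pass  # the server may not accept connections yet
        time.sleep(2)
    raise TimeoutError("API did not become ready in time")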

/v1/models (GET)

The OpenAI-compatible endpoint used, for example, to get the name of the model. It mimics the vLLM implementation. Getting the name of the model is important since it is needed to use the OpenAI-compatible generating endpoints.
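
For instance, you can retrieve the model name and reuse it in a generation request. The response is assumed to follow the usual OpenAI list format, and the /v1/completions generating endpoint and base URL below are assumptions:

import requests

BASE_URL = "http://localhost:5000"

# OpenAI list format: {"object": "list", "data": [{"id": ...}, ...]}
models = requests.get(f"{BASE_URL}/v1/models").json()
model_name = models["data"][0]["id"]

# The retrieved name is passed as "model" to the generating endpoints
payload = {"model": model_name, "prompt": "Hello", "max_tokens": 16}
completion = requests.post(f"{BASE_URL}/v1/completions", json=payload).json()
print(completion)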

/v1/launch_arguments (GET)

If the argument --with-launch-arguments is given at launch, this route returns all the parameters passed when launching the application, in particular the arguments given to the vLLM engine. It is useful for the benchmarks done by benchmark_llm_serving.
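
A quick way to inspect these parameters, assuming the application was launched with --with-launch-arguments, the response is a flat JSON object, and the base URL is as above:

import requests

response = requests.get("http://localhost:5000/v1/launch_arguments")
response.raise_for_status()
# Assumed shape: a mapping of launch parameters, including those forwarded to the vLLM engine
for name, value in response.json().items():
    print(f"{name}: {value}")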