
Technical endpoints

Here we present the various technical endpoints: /v1/info, /metrics, /liveness, /readiness, /v1/models and /v1/launch_arguments.

/v1/info (GET)

This endpoint gives various pieces of information about the application and the model. The format of the output is as follows:

{
  "application": "happy_vllm",
  "version": "1.0.0",
  "vllm_version": "0.4.0.post1",
  "model_name": "The best LLM",
  "truncation_side": "right",
  "max_length": 32768
}
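
As a minimal sketch, the endpoint can be queried with Python's requests library. The base URL (host and port) and the /v1 prefix value of api_endpoint_prefix are assumptions; adjust them to your deployment:

import requests

# Hypothetical base URL; replace with the host/port of your happy_vllm instance
BASE_URL = "http://localhost:5000"

response = requests.get(f"{BASE_URL}/v1/info")
response.raise_for_status()
info = response.json()
print(info["model_name"], info["max_length"])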

/metrics (GET)

We remind you that this endpoint is the only one not prefixed by the api_endpoint_prefix. It is generated by a Prometheus client. For more details on which metrics are available, please refer to the vLLM documentation.
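
A quick way to scrape it, for instance, is to query the server root directly (the host and port are assumptions, as above):

import requests

# Metrics live at the server root, not under the api_endpoint_prefix
response = requests.get("http://localhost:5000/metrics")
response.raise_for_status()
# Prometheus exposition format: one metric per line, "#" lines are comments
for line in response.text.splitlines():
    if not line.startswith("#"):
        print(line)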

/liveness (GET)

Checks if the API is live. The format of the output is as follows:

{
  "alive": "ok"
}
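
A liveness probe can therefore be as simple as checking the status code and the payload. A sketch, assuming the same base URL as above:

import requests

def is_alive(base_url: str = "http://localhost:5000") -> bool:
    # The API is considered live if the endpoint answers with "alive": "ok"
    try:
        response = requests.get(f"{base_url}/liveness", timeout=2)
        return response.ok and response.json().get("alive") == "ok"
    except requests.RequestException:
        return False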

/readiness (GET)

Checks if the API is ready. The format of the output is as follows:

{
  "ready": "ok"
}

If the API is not ready, the value is "ko".
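
Since the value stays at "ko" while the model is still loading, a client can poll this endpoint before sending traffic. A minimal sketch (the base URL and timings are assumptions):

import time
import requests

def wait_until_ready(base_url: str = "http://localhost:5000", retries: int = 30) -> None:
    for _ in range(retries):
        try:
            response = requests.get(f"{base_url}/readiness", timeout=2)
            # "ok" once the model is loaded, "ko" otherwise
            if response.ok and response.json().get("ready") == "ok":
                return
        except requests.RequestException:
            pass  # the server may not accept connections yet
        time.sleep(2)
    raise TimeoutError("API did not become ready in time")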

/v1/models (GET)

The OpenAI-compatible endpoint used, for example, to get the name of the model. It mimics the vLLM implementation. Getting the name of the model is important since it is needed to use the OpenAI-compatible generating endpoints.
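
For instance, you can retrieve the model name and reuse it in a generation request. The response is assumed to follow the usual OpenAI list format, and the /v1/completions generating endpoint and base URL below are assumptions:

import requests

BASE_URL = "http://localhost:5000"

# OpenAI list format: {"object": "list", "data": [{"id": ...}, ...]}
models = requests.get(f"{BASE_URL}/v1/models").json()
model_name = models["data"][0]["id"]

# The retrieved name is passed as "model" to the generating endpoints
payload = {"model": model_name, "prompt": "Hello", "max_tokens": 16}
completion = requests.post(f"{BASE_URL}/v1/completions", json=payload).json()
print(completion)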

/v1/launch_arguments (GET)

If the argument --with-launch-arguments is given at launch, this route returns all the parameters passed when launching the application, in particular the arguments given to the vLLM engine. It is useful for the benchmarks done by benchmark_llm_serving.
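
A quick way to inspect these parameters, assuming the application was launched with --with-launch-arguments, the response is a flat JSON object, and the base URL is as above:

import requests

response = requests.get("http://localhost:5000/v1/launch_arguments")
response.raise_for_status()
# Assumed shape: a mapping of launch parameters, including those forwarded to the vLLM engine
for name, value in response.json().items():
    print(f"{name}: {value}")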