Skip to content

Endpoints

happy_vLLM provides several endpoints which cover most of the use cases. Feel free to open an issue or a PR if you would like to add an endpoint. All these endpoints (except for the /metrics endpoint) are prefixed by, well, a prefix which by default is absent.

Technical endpoints

/v1/info (GET)

Provides information on the API and the model (more details here)

/metrics (GET)

The technical metrics obtained for prometheus (more details here)

/liveness (GET)

The liveness endpoint (more details here)

/readiness (GET)

The readiness endpoint (more details here)

/v1/models (GET)

The Open AI compatible endpoint used, for example, to get the name of the model. Mimicks the vLLM implementation (more details here)

/v1/launch_arguments (GET)

Gives all the arguments used when launching the application. --with-launch-arguments must be activated (more details here)

Generating endpoints

/v1/completions and /v1/chat/completions (POST)

These two endpoints mimick the ones of vLLM. They follow the Open AI contract and you can find more details in the vLLM documentation

/v1/abort_request (POST)

Aborts a running request

Tokenizer endpoints

/v1/tokenizer (POST) ⚠ Deprecated

Used to tokenizer a text (more details here)

/v2/tokenizer (POST)

Used to tokenizer a text (more details here)

/v1/decode (POST) ⚠ Deprecated

Used to decode a list of token ids (more details here)

/v2/decode (POST)

Used to decode a list of token ids (more details here)

Data manipulation endpoints

/v1/metadata_text (POST)

Used to know which part of a prompt will be truncated (more details here)

/v1/split_text (POST)

Splits a text on some separators, for example to prepare for some RAG (more details here)

Embeddings endpoint

/v1/embeddings

Used to obtain the embeddings of a text (more details here)

Lora endpoints

/v1/load_lora_adapter (POST)

Load a specific Lora adapter (more details in vLLM documentation)

/v1/unload_lora_adapter (POST)

Unload a Lora adapter (more details in vLLM documentation)

Transcription endpoints

/v1/audio/transcriptions

Used to obtain the transcription of an audio file (more details here)