This notebook demonstrates the use of the logprobs parameter in the Chat Completions API. When logprobs is enabled, the API returns the log probabilities of each output token, along with a limited number of the most likely tokens at each token position and their log probabilities. The relevant request parameters are:
- `logprobs`: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. This option is currently not available on the `gpt-4-vision-preview` model.
- `top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used.
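As a concrete starting point, a request enabling both parameters might look like the following. This is a minimal sketch using the official `openai` Python SDK; the model name and prompt are illustrative, and the client assumes an `OPENAI_API_KEY` in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any chat model that supports logprobs
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    logprobs=True,   # return a logprob for each output token
    top_logprobs=2,  # also return the 2 most likely alternatives per position
)

# Each entry carries the chosen token, its logprob, and its top alternatives.
first_token = completion.choices[0].logprobs.content[0]
print(first_token.token, first_token.logprob)
for alt in first_token.top_logprobs:
    print("  candidate:", alt.token, alt.logprob)
```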
Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is log(p), where p = probability of a token occurring at a specific position based on the previous tokens in the context. Some key points about logprobs:
- Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.
- Logprob can be any negative number or `0.0`. `0.0` corresponds to 100% probability.
- Logprobs allow us to compute the joint probability of a sequence as the sum of the logprobs of the individual tokens. This is useful for scoring and ranking model outputs. Another common approach is to take the average per-token logprob of a sentence to choose the best generation.
- We can examine the `logprobs` assigned to different candidate tokens to understand what options the model considered plausible or implausible.
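To make the arithmetic concrete, here is a minimal sketch with made-up logprob values showing how to convert a logprob to a linear probability, sum logprobs into a joint probability, and average them per token:

```python
import math

# Made-up per-token logprobs for a short generated sequence
token_logprobs = [-0.01, -0.25, -1.20]

# A logprob is log(p), so p = exp(logprob); exp(0.0) == 1.0, i.e. 100%.
probs = [math.exp(lp) for lp in token_logprobs]

# Joint probability of the sequence: sum the logprobs, then exponentiate.
joint_logprob = sum(token_logprobs)
joint_prob = math.exp(joint_logprob)

# Average per-token logprob: a length-normalized score for ranking generations.
avg_logprob = joint_logprob / len(token_logprobs)

print(probs)        # per-token probabilities
print(joint_prob)   # probability of the whole sequence
print(avg_logprob)  # average per-token logprob
```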
While there are a wide array of use cases for logprobs, this notebook will focus on its use for:
- Classification tasks
  - Large Language Models excel at many classification tasks, but accurately measuring the model's confidence in its outputs can be challenging. `logprobs` provide a probability associated with each class prediction, enabling users to set their own classification or confidence thresholds (a minimal thresholding sketch follows this list).
- Retrieval (Q&A) evaluation
  - `logprobs` can assist with self-evaluation in retrieval applications. In the Q&A example, the model outputs a contrived `has_sufficient_context_for_answer` boolean, which can serve as a confidence score of whether the answer is contained in the retrieved content. Evaluations of this type can reduce retrieval-based hallucinations and enhance accuracy.
- Autocomplete
  - `logprobs` could help us decide how to suggest words as a user is typing.
- Token highlighting and outputting bytes
  - Users can easily create a token highlighter using the built-in tokenization that comes with enabling `logprobs`. Additionally, the `bytes` parameter includes the UTF-8 byte representation of each output token, which is particularly useful for reproducing emojis and special characters (see the byte-decoding sketch after this list).
- Calculating perplexity
  - `logprobs` can be used to assess the model's overall confidence in a result and to compare the confidence of results from different prompts (a short perplexity sketch closes this section).
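For the classification use case, a minimal thresholding sketch might look like the following. The `top_logprobs` values here are hypothetical, and the sketch assumes the prompt instructs the model to reply with a single-token category name:

```python
import math

# Hypothetical top_logprobs for the first output token of a classifier prompt
top_logprobs = [("Positive", -0.03), ("Negative", -3.60), ("Neutral", -5.10)]

predicted_class, logprob = top_logprobs[0]
confidence = math.exp(logprob)  # convert the logprob to a linear probability

# Accept the model's label only when its probability clears a chosen threshold.
THRESHOLD = 0.90
if confidence >= THRESHOLD:
    print(f"{predicted_class} ({confidence:.2%})")
else:
    print(f"Low confidence ({confidence:.2%}); route to a fallback or human review.")
```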
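For the bytes use case, this small sketch shows how the integer values returned in the `bytes` field can be joined and decoded to recover an emoji. The byte values are a hypothetical example of one emoji whose UTF-8 encoding is split across two tokens:

```python
# Hypothetical `bytes` values for two tokens that together encode one emoji:
# the UTF-8 bytes of "💙" (U+1F499) split across a token boundary.
token_bytes = [[240, 159], [146, 153]]

# Flatten the per-token byte lists and decode them as UTF-8.
flat = [b for tok in token_bytes for b in tok]
print(bytes(flat).decode("utf-8"))  # 💙
```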
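Finally, for perplexity, a common formulation is the exponential of the negative average per-token logprob. A minimal sketch with made-up values:

```python
import math

# Made-up per-token logprobs returned for one completion
token_logprobs = [-0.12, -0.05, -1.30, -0.40]

# Perplexity = exp(-average per-token logprob); lower means the model
# was more confident in the sequence it produced.
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(f"perplexity: {perplexity:.3f}")
```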