LLM API Parameters

May 17, 2024

When working with an LLM API, a handful of parameters are crucial for getting the best results for your use case.


Temperature

The temperature parameter controls the randomness of the generated text by scaling the weights of the candidate next tokens. The lower the value, the more deterministic the output, because the highest-probability token is almost always picked; the higher the value, the more diverse and creative the output.

For QA tasks, a lower temperature is recommended, while for creative tasks a higher temperature is recommended.
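As a minimal sketch of the mechanism (the logit values here are made up for illustration), temperature divides the raw logits before the softmax, so a low temperature sharpens the distribution and a high one flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaling by temperature.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform
```

With temperature 0.2 the top token takes almost all of the probability mass, which is why low temperatures give near-deterministic answers.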

Top P

Nucleus sampling, or top-p sampling, restricts the model's choices to the smallest set of tokens whose cumulative probability mass reaches top_p; only those tokens are considered for the response.

For factual answers, a lower top_p is recommended, while for creative tasks a higher top_p is recommended.

It's recommended to alter temperature or top_p, but not both.

Max Length

Defines the maximum number of tokens the model generates. This can help you prevent long responses and control costs.
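In an OpenAI-style chat completion request, this is typically a single field in the payload (parameter names vary between providers, and the model name here is only for illustration):

```python
# Hedged sketch of a typical request payload; exact field names
# (e.g. max_tokens) differ across LLM providers.
payload = {
    "model": "gpt-4o",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize this article."}],
    "max_tokens": 100,  # hard cap on generated tokens: bounds length and cost
}
```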

Stop Sequences

Stop sequences are strings that make the model stop generating as soon as they appear, giving you another way to control the length and structure of results.

For example, you can tell the model to generate lists that do not exceed a certain number of elements.

Frequency Penalty

The frequency penalty reduces the probability of tokens in proportion to how many times they have already appeared in the response and prompt: the more often a token has been used, the stronger the penalty. This discourages verbatim repetition.

Presence Penalty

Similar to the frequency penalty, but the penalty is the same for all repeated tokens. A token that appears twice and a token that appears 10 times are penalized the same.

It's recommended to alter frequency_penalty or presence_penalty, but not both.
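The difference between the two penalties can be sketched as a logit adjustment (following the OpenAI-style formulation, where the frequency penalty scales with the repeat count and the presence penalty is a flat one-time hit):

```python
def apply_penalties(logit, count, frequency_penalty, presence_penalty):
    """Adjust a token's logit based on how often it has appeared so far.
    frequency_penalty grows with the count; presence_penalty is flat for
    any token that has appeared at least once."""
    if count > 0:
        logit -= presence_penalty       # same hit whether 2 or 10 uses
    logit -= frequency_penalty * count  # grows with each repetition
    return logit

# A token seen twice vs. ten times, with only presence_penalty set:
apply_penalties(1.0, 2, 0.0, 0.5)   # -> 0.5
apply_penalties(1.0, 10, 0.0, 0.5)  # -> 0.5 (identical: flat penalty)
```

With only the frequency penalty set, the same two tokens would receive very different adjustments, which is the practical difference between the two parameters.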