Model Settings Guide

Below is an overview of the model settings. Let's go through them one by one.

Model

This option lets you choose which model to use for completion. Currently, DreamGen offers two models.

Max output tokens

Specifies the maximum number of output tokens the model is allowed to generate. Keep in mind that pricing is based on the number of output tokens. If the model cuts off in the middle of a sentence or a paragraph, it is most likely because it reached the maximum number of tokens. You can always hit the "Continue" button to generate more tokens.

Also keep in mind that each model is limited in the total number of input + output tokens it can handle (its context window).

In the case of Story writing, input tokens include the full story.

In the case of Role-play, input tokens include the full scenario description and as many messages as possible. Older messages not marked as "sticky" are automatically removed when they would exceed the token limit.
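
To make the shared budget concrete, here is a trivial Python sketch of the arithmetic, using a hypothetical 8,000-token context window and a "Max output tokens" setting of 400 (both numbers are made up for the example):

  # Input and output share a single token budget (hypothetical numbers).
  context_window = 8000      # total input + output limit of the model
  max_output_tokens = 400    # the "Max output tokens" setting
  max_input_tokens = context_window - max_output_tokens
  print(max_input_tokens)    # 7600 tokens left for the story / scenario and messages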

Temperature

Models work by trying to predict the next token in the sequence, given the input so far. Given an input, the model outputs a probability for every possible token. For example, given the input "My favorite fruit is", the model may output the following probabilities:

  • Apple: 0.3
  • Banana: 0.25
  • Orange: 0.15
  • etc.

When generating output, we need to decide which token to pick, using these probabilities. This is where temperature, min P, top P, and top K come in.

The base temperature is 1.0, which means tokens are picked according to the model's original probabilities.

Temperature > 1.0 means that the probability distribution becomes more "flat" / "spread-out", therefore the probability of the most likely token decreases, and the probability of the less likely tokens increases.

Temperature < 1.0 means that the probability distribution becomes more "peaky", therefore the probability of the most likely token increases, and the probability of the less likely tokens decreases.

The folk wisdom is that with higher temperature you may achieve more surprising and creative results.
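
To make this concrete, here is a minimal Python sketch of how temperature reshapes a toy distribution (the probabilities are made up; the transform is equivalent to dividing the model's logits by the temperature before the softmax):

  def apply_temperature(probs, temperature):
      # Raising each probability to the power 1/T and renormalizing
      # is equivalent to softmax(logits / T).
      scaled = {token: p ** (1.0 / temperature) for token, p in probs.items()}
      total = sum(scaled.values())
      return {token: s / total for token, s in scaled.items()}

  probs = {"Apple": 0.5, "Banana": 0.3, "Orange": 0.2}  # toy distribution
  print(apply_temperature(probs, 2.0))  # flatter:  ~{0.42, 0.32, 0.26}
  print(apply_temperature(probs, 0.5))  # peakier: ~{0.66, 0.24, 0.11}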

Min P

This option prevents tokens with low probabilities from being sampled. For example, if min_p = 0.05 (5%), and the top token probability is 0.9 (90%), then all tokens with probability < 0.05 * 0.9 = 0.045 (4.5%) will be ignored.
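
In other words, the cutoff scales with the probability of the most likely token. A minimal Python sketch of this rule, with toy numbers mirroring the example above:

  def min_p_filter(probs, min_p):
      # Drop every token whose probability is below min_p times the
      # probability of the most likely token.
      threshold = min_p * max(probs.values())
      return {token: p for token, p in probs.items() if p >= threshold}

  print(min_p_filter({"Apple": 0.9, "Banana": 0.06, "Orange": 0.03}, 0.05))
  # -> {'Apple': 0.9, 'Banana': 0.06}   (cutoff: 0.05 * 0.9 = 0.045)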

Top P

This option also prevents tokens with low probabilities from being sampled, by keeping only the most likely tokens whose cumulative probability fits within top_p. For example, if top_p = 0.9 (90%) and the top token probabilities are:

  • Apple: 0.7
  • Banana: 0.15
  • Orange: 0.1
  • etc.

Then only Apple and Banana will be considered, because adding Orange would bring the cumulative probability to 0.7 + 0.15 + 0.1 = 0.95, which exceeds 0.9; Orange and everything below it are ignored. This can often reduce the output diversity / creativity of the model significantly.
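
Here is a minimal Python sketch that matches the example above. Note that this is one possible formulation; implementations vary, and some keep the token that crosses the top_p threshold instead of cutting it:

  def top_p_filter(probs, top_p):
      # Keep the most likely tokens whose cumulative probability stays
      # within top_p; the token that would push past top_p is cut here.
      kept, cumulative = {}, 0.0
      for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
          if cumulative + p > top_p:
              break
          kept[token] = p
          cumulative += p
      if not kept and probs:
          # Real samplers always keep at least the single most likely token.
          best = max(probs, key=probs.get)
          kept[best] = probs[best]
      return kept

  print(top_p_filter({"Apple": 0.7, "Banana": 0.15, "Orange": 0.1}, 0.9))
  # -> {'Apple': 0.7, 'Banana': 0.15}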

Top K

This option prevents tokens with low probabilities from being sampled. For example, if top_k = 40, then only the top 40 tokens will be considered, and all other tokens will be ignored.
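
A minimal Python sketch:

  def top_k_filter(probs, top_k):
      # Keep only the top_k most likely tokens.
      ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
      return dict(ranked[:top_k])

  print(top_k_filter({"Apple": 0.7, "Banana": 0.15, "Orange": 0.1}, 2))
  # -> {'Apple': 0.7, 'Banana': 0.15}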

Presence penalty

The base value for this option is 0, in which case it has no effect.

Values > 0 will make the model less likely to repeat tokens already in the output (tokens in input are ignored).

Values < 0 will make the model more likely to repeat tokens already in the output.

Models tend to repeat themselves, especially when generating long outputs, and this option can help with that. At the same time, it can also hurt output quality, since certain tokens (such as common words and punctuation) naturally recur.
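
A minimal sketch of one common formulation of this penalty (the flat, OpenAI-style rule; DreamGen's exact implementation may differ). Here logits maps token ids to raw scores before the softmax:

  def apply_presence_penalty(logits, output_token_ids, presence_penalty):
      # Every token that appeared at least once in the output gets the
      # same flat penalty, no matter how often it appeared.
      for token_id in set(output_token_ids):
          logits[token_id] -= presence_penalty
      return logits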

Frequency penalty

Very similar to the presence penalty, but it takes into account the frequency of the token in the output text, not just its presence.
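
Under the same assumed formulation, the penalty now scales with how many times each token has appeared:

  from collections import Counter

  def apply_frequency_penalty(logits, output_token_ids, frequency_penalty):
      # A token that appeared three times is penalized three times as
      # much as a token that appeared once.
      for token_id, count in Counter(output_token_ids).items():
          logits[token_id] -= frequency_penalty * count
      return logits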

Repetition penalty

Similar to the presence and frequency penalties, but it also takes tokens in the input into account, not just the output.

The base value for this option is 1, in which case it has no effect.

Values > 1 will make the model less likely to repeat tokens already in the input or output.

Values < 1 will make the model more likely to repeat tokens already in the input or output.

Because this option also takes the input into account, it can help avoid repeating whatever has already occurred in the input so far.
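
A minimal sketch of the common multiplicative formulation (popularized by the CTRL paper; again, the exact implementation may differ). With a penalty above 1, dividing positive logits and multiplying negative ones both make previously seen tokens less likely:

  def apply_repetition_penalty(logits, seen_token_ids, penalty):
      # seen_token_ids covers tokens from both the input and the output.
      for token_id in set(seen_token_ids):
          if logits[token_id] > 0:
              logits[token_id] /= penalty
          else:
              logits[token_id] *= penalty
      return logits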

Stop sequences

This option allows you to specify sequences that will make the model stop generating output.

For example, in story writing, if you want to let the model generate the instructions as well, but tweak them before it continues (the model's instructions are usually spot-on), you could use </instruction> as a stop sequence.
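
A rough Python sketch of the effect (in practice generation simply stops once a stop sequence is produced, and whether the sequence itself is kept varies by implementation):

  def truncate_at_stop(text, stop_sequences):
      # Cut the text at the earliest occurrence of any stop sequence.
      cut = len(text)
      for seq in stop_sequences:
          idx = text.find(seq)
          if idx != -1:
              cut = min(cut, idx)
      return text[:cut]

  print(truncate_at_stop("She waves hello.</instruction>And then", ["</instruction>"]))
  # -> "She waves hello."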

For role-play, common stop-sequence settings are taken care of for you (see below).

Do not generate instructions (opus-v1 only)

Prevents the model from generating instructions. Note that instructions help the model with writing: they break the writing problem down into (1) what should happen and (2) how to write it.

Therefore, preventing the model from generating instructions without providing your own can have a negative impact on the quality of the output.

Keep only last instruction

This option removes all but the last <instruction> block from the input before sending it to the model.

In case of role-play, instructions that are marked as "sticky" are also preserved.

This can be useful, as it makes it less likely the model will generate instructions on its own, and also saves you some tokens.
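
A rough sketch of the idea, assuming instructions appear in the prompt as literal <instruction>...</instruction> blocks:

  import re

  def keep_last_instruction(prompt):
      # Find every <instruction>...</instruction> block and drop all but
      # the last, removing from the end so earlier offsets stay valid.
      pattern = re.compile(r"<instruction>.*?</instruction>\s*", re.DOTALL)
      blocks = list(pattern.finditer(prompt))
      for match in reversed(blocks[:-1]):
          prompt = prompt[:match.start()] + prompt[match.end():]
      return prompt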

Role-play specific settings

Prune interactions above limit

When selected, interactions will be automatically pruned (removed from the prompt sent to the model) if they exceed the token limit. Non-sticky interactions will be removed before sticky interactions, and oldest interactions will be removed first.
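
A minimal Python sketch of this pruning order, assuming each interaction is a (text, sticky) pair and count_tokens is a hypothetical tokenizer helper:

  def prune_interactions(interactions, max_tokens, count_tokens):
      # interactions: oldest-first list of (text, sticky) pairs.
      total = sum(count_tokens(text) for text, _ in interactions)
      keep = [True] * len(interactions)
      # First pass drops non-sticky interactions, second pass sticky ones;
      # both passes remove oldest first and stop once under the limit.
      for prune_sticky in (False, True):
          for idx, (text, sticky) in enumerate(interactions):
              if total <= max_tokens:
                  break
              if keep[idx] and sticky == prune_sticky:
                  keep[idx] = False
                  total -= count_tokens(text)
      return [pair for pair, kept in zip(interactions, keep) if kept]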

Max total tokens

Overrides the maximum token context window (input + output tokens) for the model. This can be a useful way to reduce costs. For example, if you are using a model with a 12,000-token context window, you can set the value to 4,000 if your use case does not require more than that.

Set to 0 to disable.

Values below the minimum token context window (4000) will be ignored.

Values above the model's default maximum token context window will be ignored.
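
Taken together, these rules amount to a clamp, roughly like this hypothetical helper:

  def effective_context_window(override, model_max, minimum=4000):
      # 0 disables the override; out-of-range values fall back to the
      # model's default maximum.
      if override == 0 or override < minimum or override > model_max:
          return model_max
      return override

  print(effective_context_window(4000, 12000))  # -> 4000  (override applied)
  print(effective_context_window(2000, 12000))  # -> 12000 (below minimum, ignored)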

Minimum message length (opus-v1 only)

Forces the model not to end the message before it reaches the specified length. It is best to combine this with instructions, for example:

Length: 100 words
Speaker: Aeryn
Plot: Aeryn introduces herself and provides a background on her character.

When the value is set "unnaturally" high given the setting and context of the role-play, the model might resort to "merging" multiple messages into one, leading to undesired impersonation.

Stop after N interactions

This option makes sure the model does not generate more than N interactions before yielding control back to you.

Stop before user message

This option makes sure the model does not generate a message in your name.

Reveal hidden interactions

When checked, this option will reveal interactions that were marked as "hidden".