Token counts play a significant role in shaping an LLM's memory and conversation history. Think of it as having a conversation with a friend who can only remember the last few minutes of your chat: the token count defines how much context is retained to keep the dialogue coherent. This limited memory has implications for user interactions, such as the need to repeat crucial information to keep it in context.
When a conversation grows longer than the token limit, the context window shifts forward, dropping content from earlier in the conversation. This can lead to incomplete or nonsensical responses, as the LLM loses vital context.
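As a concrete illustration, here is a minimal Python sketch of how such a sliding context window behaves. It assumes a rough heuristic of about four characters per token (a real tokenizer such as OpenAI's tiktoken counts differently), and the `estimate_tokens` helper and the 4096-token budget are illustrative choices, not any particular model's actual behavior.

```python
# Minimal sketch of a sliding context window.
# Assumption: ~4 characters per token; use a real tokenizer for accurate counts.

MAX_TOKENS = 4096  # illustrative budget, not a specific model's limit

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_to_window(messages: list[dict], budget: int = MAX_TOKENS) -> list[dict]:
    """Keep the most recent messages that fit in the token budget.

    Older messages are dropped first; this is the 'window shift' that
    makes the model lose early context.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-to-oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break  # everything older than this falls out of the window
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "My name is Ada and I prefer metric units."},
    # ... many turns later ...
    {"role": "user", "content": "What's my name?"},
]
window = fit_to_window(history)
# If the first message was dropped, the model can no longer answer correctly.
```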
To work around this limitation, users can employ techniques such as:
- Periodically repeat important information so it stays inside the active window.
- Truncate, omit, or rephrase text so each prompt fits within the limit.
- Wrap up your current conversation by creating a summary before you run into the limit, then start your next conversation with that summary (see the sketch after this list).
- Write one very long prompt and attempt a one-shot conversation: give the AI everything you know up front and have it produce a single response.
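The summarize-and-restart idea in the third bullet can be sketched as follows. The `chat` function here is a hypothetical stand-in for whatever LLM API you call, and the threshold and prompt wording are illustrative assumptions rather than a prescribed recipe.

```python
# Sketch of the summarize-and-restart technique. `chat(messages)` is a
# hypothetical wrapper around your LLM API that returns the assistant's reply.

SUMMARY_THRESHOLD = 3000  # assumption: summarize well before the real limit

def chat(messages: list[dict]) -> str:
    raise NotImplementedError("replace with a call to your LLM API")

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # same rough heuristic as the sketch above

def rollover_if_needed(messages: list[dict]) -> list[dict]:
    """When the history nears the limit, compress it into a summary
    and start a fresh conversation seeded with that summary."""
    if sum(estimate_tokens(m["content"]) for m in messages) < SUMMARY_THRESHOLD:
        return messages  # still plenty of room; keep the full history
    summary = chat(messages + [{
        "role": "user",
        "content": "Summarize our conversation so far, keeping all "
                   "facts, decisions, and open questions.",
    }])
    # The new conversation starts with only the summary in context.
    return [{"role": "system", "content": f"Summary of prior conversation: {summary}"}]
```

The design trade-off is that a summary preserves the gist at a fraction of the token cost, but any detail the summary omits is lost for good, which is why the prompt above explicitly asks to keep facts, decisions, and open questions.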