Conversational Summarization: Summarizing the conversation so far condenses its length while retaining the important points, allowing the model to stay within the token limit.
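A minimal sketch of this idea is below. It assumes the chat history is a list of role/content dicts and that the caller supplies a `summarize` callable (for example, a call to an LLM); the token estimate is a rough character-based heuristic, not a real tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def compress_history(messages, summarize, budget=3000, keep_recent=4):
    """Replace older messages with a summary once the history exceeds `budget` tokens."""
    total = sum(approx_tokens(m["content"]) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages

    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in older)
    summary = summarize(
        "Summarize the key facts, decisions, and open questions in this conversation:\n"
        + transcript
    )
    # The summary stands in for the older turns as a single system message.
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```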
Relevant Context Lookup: This involves using models to identify and retrieve the sections of the conversation that are relevant to the current turn. By representing text as vectors (embeddings), earlier questions and answers can be compared with the current query to find those most closely related to the present context.
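As a sketch of how such a lookup might work, the following assumes a caller-supplied `embed` function that maps a string to a fixed-length vector (for instance, a sentence-embedding model) and ranks past turns by cosine similarity to the current query.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def most_relevant_turns(history, query, embed, top_k=3):
    """Return the `top_k` past turns whose embeddings are closest to the query."""
    query_vec = embed(query)
    scored = [(cosine(embed(turn), query_vec), turn) for turn in history]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [turn for _, turn in scored[:top_k]]

# Usage: prepend the retrieved turns to the prompt before calling the model.
# relevant = most_relevant_turns(past_turns, user_question, embed)
```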
Clever Prompting: Prompts can be designed to "remind" the model of any important context that may influence its response, carrying relevant details forward throughout the conversation.
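One way this can look in practice is sketched below: key facts captured earlier (names, goals, constraints) are re-injected at the top of every prompt so the model does not have to recall them from distant turns. The field names and layout here are illustrative, not a fixed scheme.

```python
def build_prompt(key_facts: dict, recent_turns: list[str], user_message: str) -> str:
    # Pin the important facts at the top of every prompt as a reminder.
    reminder = "\n".join(f"- {name}: {value}" for name, value in key_facts.items())
    recent = "\n".join(recent_turns)
    return (
        "Important context to keep in mind:\n"
        f"{reminder}\n\n"
        "Recent conversation:\n"
        f"{recent}\n\n"
        f"User: {user_message}\nAssistant:"
    )

# Example:
# build_prompt({"user goal": "plan a 5-day Kyoto trip", "budget": "$1500"},
#              ["User: What about day 2?", "Assistant: Day 2 covers Arashiyama."],
#              "Can we swap day 2 and day 3?")
```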
Context Window Adjustment: Some models use a sliding window approach where, as the conversation continues, the window "slides" to always include the latest inputs and outputs at the expense of older ones.
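A simple sketch of such a sliding window follows, assuming turns are appended as plain strings and token counts are approximated with a character-based heuristic; a real implementation would use the model's tokenizer.

```python
from collections import deque

class SlidingWindow:
    def __init__(self, max_tokens=4000):
        self.max_tokens = max_tokens
        self.turns = deque()
        self.token_count = 0

    @staticmethod
    def _approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)   # rough heuristic: ~4 chars per token

    def add(self, text: str) -> None:
        self.turns.append(text)
        self.token_count += self._approx_tokens(text)
        # Slide: evict the oldest turns until the window is back under budget.
        while self.token_count > self.max_tokens and len(self.turns) > 1:
            removed = self.turns.popleft()
            self.token_count -= self._approx_tokens(removed)

    def context(self) -> str:
        return "\n".join(self.turns)
```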
External Memory Architectures: These are designed to allow models to "remember" and pull in information from earlier in the conversation, or even from other conversations or sources of knowledge.
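The sketch below illustrates the basic shape of such a memory under simplifying assumptions: facts are written to a store outside the context window and read back by embedding similarity when needed. The store here is an in-process list and `embed` is a caller-supplied text-to-vector function; a production system would more likely use a vector database.

```python
import numpy as np

class ExternalMemory:
    def __init__(self, embed):
        self.embed = embed          # caller-supplied text -> vector function
        self.entries = []           # list of (vector, text) pairs

    def write(self, text: str) -> None:
        self.entries.append((self.embed(text), text))

    def read(self, query: str, top_k: int = 3) -> list[str]:
        q = self.embed(query)
        def score(vec):
            return float(np.dot(vec, q) / (np.linalg.norm(vec) * np.linalg.norm(q) + 1e-9))
        ranked = sorted(self.entries, key=lambda e: score(e[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

# Retrieved memories can then be inserted into the prompt, e.g.:
# "Previously noted: " + "; ".join(memory.read(user_question))
```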
While these techniques have potential, they also introduce new complexities and challenges, such as deciding what information to keep or discard, maintaining coherence, and managing computational resources.