ChatGPT models such as gpt-3.5-turbo have a context window of 4,096 tokens per API call. This limit comes from the model architecture: the model is trained to attend to at most that many tokens in a single pass.
Both input and output tokens count toward this limit. If a conversation exceeds it, you'll need to truncate, omit, or otherwise shorten the text until it fits. Bear in mind, however, that any message removed from the messages array is no longer visible to the model.
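A minimal sketch of checking a conversation against the shared budget, using a rough heuristic of about four characters per token (for exact counts you would use a tokenizer such as OpenAI's tiktoken; the names and the reserve figure below are illustrative assumptions, not an official API):

```python
MAX_TOKENS = 4096          # context window shared by input and output
RESERVED_FOR_REPLY = 500   # assumed budget we leave for the model's answer

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(messages: list[dict]) -> bool:
    """True if the prompt leaves room for the reply within the limit."""
    used = sum(estimate_tokens(m["content"]) for m in messages)
    return used + RESERVED_FOR_REPLY <= MAX_TOKENS
```

Because the window is shared, reserving tokens for the reply up front avoids the situation where the prompt fits but the answer gets cut off.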
In a conversation, each message consumes tokens according to its content, plus a few extra tokens for formatting and special delimiters. Very long conversations may also receive incomplete replies: once the prompt alone reaches the model's maximum limit, there are no tokens left for a response.
To work within the token limit, manage your conversation history carefully so that important context is preserved and the whole exchange fits in the window. As the conversation grows, you may need to summarize or delete its less relevant parts.
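One common trimming strategy is to drop the oldest non-system messages until the history fits a budget, so the system prompt and recent turns survive. A sketch under the same ~4 chars/token assumption (`trim_history` and `budget` are hypothetical names, not part of any library):

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimate fits."""
    kept = list(messages)

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while total(kept) > budget:
        # Remove the oldest message that is not the system prompt.
        for i, m in enumerate(kept):
            if m["role"] != "system":
                del kept[i]
                break
        else:
            break  # only the system prompt remains; nothing left to drop
    return kept
```

An alternative to outright deletion is replacing the dropped turns with a short model-generated summary, which keeps some of their context at a fraction of the token cost.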