If we’re speaking of transformer models like ChatGPT, BERT or whatever: They don’t have memory at all.
The closest thing that resembles memory is the accepted length of the input sequence combined with the attention mechanism. (If left unmodified though, this will lead to a quadratic increase in computation time the longer that sequence becomes.) And since the attention weights are a learned property, it is in practise probable that earlier tokens of the input sequence get basically ignored the further they lie “in the past”, as they usually do not contribute much to the current context.
“In the past”: Transformers technically “see” the whole input sequence at once.
But they are equipped with positional encoding which incorporates spatial and/or temporal ordering into the input sequence (e.g., position of words in a sentence). That way they can model sequential relationships as those found in natural language (sentences), videos, movement trajectories and other kinds of contextually coherent sequences.
Book for new programmers: How to properly prompt ChatGPT to solve errors.
Or dont read the book and paste your question in chatGPT
Better yet, have your LLM of choice read the book first.
They don’t really have a long term memory, so it’s probably not going to help.
If we’re speaking of transformer models like ChatGPT, BERT or whatever: They don’t have memory at all.
The closest thing that resembles memory is the accepted length of the input sequence combined with the attention mechanism. (If left unmodified though, this will lead to a quadratic increase in computation time the longer that sequence becomes.) And since the attention weights are a learned property, it is in practise probable that earlier tokens of the input sequence get basically ignored the further they lie “in the past”, as they usually do not contribute much to the current context.
“In the past”: Transformers technically “see” the whole input sequence at once. But they are equipped with positional encoding which incorporates spatial and/or temporal ordering into the input sequence (e.g., position of words in a sentence). That way they can model sequential relationships as those found in natural language (sentences), videos, movement trajectories and other kinds of contextually coherent sequences.