The importance of memory.

Well, in a RAG chain there are always several important pieces:

  • the Embeddings Model, which translates texts into dense vectors, enabling Semantic Search
  • the Vector Store, where you safely store your texts and vectors, giving you a fast way to find all the relevant docs
  • a Reranker, which helps refine your search results
  • the Large Language Model (LLM)
At first, you might think that the LLM is only useful at the end of the chain, when all the retrieved docs are put into the context of the prompt, together with your request. There, it synthesizes the answer.
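To make the roles of these pieces concrete, here is a toy sketch of the four components wired together. Everything in it is an illustrative stand-in: the character-sum "embedding", the in-memory store, and the word-overlap "reranker" are not a real embeddings model, vector store, or reranker, and all the function names are invented for this example.

```python
def embed(text):
    """Embeddings Model stand-in: map text to a small dense vector.
    (A deterministic toy, not a real model.)"""
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % 8] += 1.0
    return vec

def cosine(a, b):
    """Similarity measure used for Semantic Search."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Vector Store stand-in: texts saved alongside their vectors.
corpus = [
    "Long COVID can cause fatigue and brain fog.",
    "Vaccines are updated every season.",
    "Semantic search uses dense vectors.",
]
store = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, k=2):
    """Semantic Search: rank stored docs by vector similarity."""
    q = embed(query)
    return sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)[:k]

def rerank(query, docs):
    """Reranker stand-in: refine the candidates by exact word overlap."""
    words = set(query.lower().split())
    return sorted(docs,
                  key=lambda item: len(words & set(item[0].lower().split())),
                  reverse=True)

def answer(query):
    """LLM stand-in: 'synthesize' an answer from the retrieved context."""
    top = rerank(query, retrieve(query))
    return f"Context used: {top[0][0]}"
```

In a real chain each stand-in is replaced by a model or service call, but the data flow (embed, retrieve, rerank, synthesize) stays exactly this shape.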

But much of the evolution we see today is based on ideas about how to use more and more of the incredible power of current LLMs.



Let us consider one important feature we want in a Knowledge Assistant: we want the assistant to keep a memory of all the previous questions and answers (the message history) and use it to enable a more natural kind of conversation.

For example, imagine that one of your questions is: "What is Long COVID?" (I'm working on a demo based on nice documentation from NIH).
Your next question could be: "What are the symptoms?" (and you don't repeat the subject).

Now, when you do the search inside the Vector Store, you cannot simply use "What are the symptoms?" The symptoms of what? Of every disease in your knowledge base?

You have to rephrase your question based on the message history.

One approach, well supported for example in LlamaIndex, is called "condense_plus_context". Here you use the LLM twice:
  • first, you take your last question plus the message history and ask the LLM to create a condensed, standalone question (for example: "What are the symptoms of Long COVID?")
  • then, you search the Vector Store using the condensed question
  • and, only at the end, you send the docs retrieved with the condensed question, plus the question itself, to the LLM to synthesize the answer
If you have built a chain using LlamaIndex or LangChain, adding memory and such an approach takes only a few lines of code.
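The three steps above can be sketched in plain Python. To keep the example self-contained and runnable, stub functions stand in for the two LLM calls and for the Vector Store search; the names (condense_question, vector_search, synthesize_answer) are hypothetical, not LlamaIndex or LangChain API.

```python
def condense_question(history, question):
    """First LLM call: rewrite the follow-up into a standalone question.
    A toy rule stands in for the LLM here."""
    if history and "symptoms" in question.lower():
        # Recover the subject from the last question in the history.
        subject = history[-1][0].replace("What is ", "").rstrip("?")
        return f"What are the symptoms of {subject}?"
    return question

def vector_search(query):
    """Vector Store stand-in: return docs with word overlap with the query."""
    docs = [
        "Long COVID symptoms include fatigue, brain fog, and shortness of breath.",
        "Influenza is a seasonal respiratory infection.",
    ]
    return [d for d in docs if any(w in d for w in query.split())]

def synthesize_answer(question, docs):
    """Second LLM call: answer using only the retrieved context."""
    return f"Based on {len(docs)} doc(s): {docs[0]}" if docs else "No relevant docs."

# Simulated conversation with memory.
history = [("What is Long COVID?", "A condition with symptoms lasting weeks...")]
followup = "What are the symptoms?"

standalone = condense_question(history, followup)   # LLM call 1: condense
retrieved = vector_search(standalone)               # search with the condensed question
answer = synthesize_answer(standalone, retrieved)   # LLM call 2: synthesize
```

Note how the Vector Store only ever sees the condensed question: searching with the raw follow-up "What are the symptoms?" would match symptoms of every disease in the knowledge base.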

If you want to see more details, have a look here: chat with memory

