About this Course
Retrieval Augmented Generation (RAG), introduced by Facebook AI Research in 2020, is an architecture used to optimize the output of an LLM with dynamic, domain-specific data without the need to retrain the model. It combines an information retrieval component with a response generator in a single end-to-end architecture. In this introduction we provide a starting point using components we at NVIDIA have used internally. This workflow will jumpstart you on your LLM and RAG journey.
What is RAG?
Retrieval Augmented Generation (RAG) is an architecture that fuses two powerful capabilities:
- Information retrieval (like a search engine)
- Text generation (using an LLM)
Instead of relying solely on a model’s pre-trained knowledge, RAG retrieves external, real-time or domain-specific information and injects it into the prompt. This results in:
- More accurate and up-to-date responses
- Customization to private/internal knowledge bases
- Better transparency and fact-grounding
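Concretely, a RAG pipeline runs three steps for every question: retrieve relevant documents, inject them into the prompt, and generate. The toy Python sketch below exists purely to show the shape of that loop; the knowledge base, the keyword retriever, and the stubbed `generate` function are all illustrative assumptions, not components of any particular system.

```python
# Minimal RAG loop: retrieve context, inject it into the prompt,
# and generate a grounded answer. All parts here are toy
# placeholders chosen for illustration only.

KNOWLEDGE_BASE = [
    "RAG combines a retriever with a text generator.",
    "FAISS is a library for efficient similarity search over vectors.",
    "Context windows limit how much text an LLM can read at once.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever; a real system would use vector search."""
    def overlap(doc: str) -> int:
        return sum(word.lower() in doc.lower() for word in query.split())
    return sorted(KNOWLEDGE_BASE, key=overlap, reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Inject the retrieved documents into the prompt as grounding context."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

def generate(prompt: str) -> str:
    """Stand-in for a real LLM call (API or local model)."""
    return f"[LLM answer grounded in a {len(prompt)}-character prompt]"

question = "What does FAISS do?"
print(generate(build_prompt(question, retrieve(question))))
```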
Learning Objectives
This course covers the following topics (an end-to-end sketch follows the list):
- What is Retrieval Augmented Generation?
- Why use it with LLMs?
- Separation of retrieval and generation
- Benefits over pure LLM prompting
- Creating vector embeddings
- Using FAISS or similar vector stores
- Semantic search vs keyword search
- Injecting retrieved documents into prompts
- Context window management
- Feeding augmented prompts into LLMs
- Generating responses with grounded context
- Index a document set
- Perform retrieval
- Generate RAG responses
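For the embedding, indexing, and retrieval objectives, the sketch below shows the pattern with common open-source stand-ins: `sentence-transformers` for creating vector embeddings (the `all-MiniLM-L6-v2` model is an assumed, popular default) and FAISS as the vector store. These are illustrative choices, not necessarily the components used in the course.

```python
# Semantic search over a small document set using vector embeddings
# and a FAISS index. Assumes: pip install faiss-cpu sentence-transformers
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    "RAG was introduced by Facebook AI Research in 2020.",
    "Vector embeddings map text to points in a high-dimensional space.",
    "Semantic search matches meaning; keyword search matches exact terms.",
    "The context window bounds how much retrieved text fits in a prompt.",
]

# 1. Create vector embeddings for every document.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vectors = model.encode(documents).astype("float32")

# 2. Index the embeddings in a FAISS vector store (exact L2 search).
index = faiss.IndexFlatL2(doc_vectors.shape[1])
index.add(doc_vectors)

# 3. Embed the query and retrieve the nearest documents by meaning.
query = "How does searching by meaning differ from exact term matching?"
query_vector = model.encode([query]).astype("float32")
distances, neighbor_ids = index.search(query_vector, 2)
for i in neighbor_ids[0]:
    print(documents[i])
```

Because retrieval happens in embedding space, the query above can surface the semantic-search document even though it shares few exact keywords with it; that gap is the practical difference between semantic and keyword search.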
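Retrieved documents rarely all fit in the model's context window, so the augmented prompt has to be budgeted. Below is one simple, hedged approach: greedily keep the top-ranked documents until an estimated token budget is spent, then inject the survivors into the prompt. The 4-characters-per-token heuristic and the 512-token budget are illustrative assumptions; production code would use the model's actual tokenizer and limits.

```python
# Context window management: greedily pack top-ranked retrieved
# documents into the prompt until an estimated token budget is spent.

def fit_to_context(ranked_docs: list[str], max_tokens: int = 512) -> list[str]:
    """Keep retrieved docs, best first, while they fit the budget."""
    selected, used = [], 0
    for doc in ranked_docs:
        est_tokens = len(doc) // 4  # rough heuristic: ~4 characters per token
        if used + est_tokens > max_tokens:
            break
        selected.append(doc)
        used += est_tokens
    return selected

def augment_prompt(question: str, ranked_docs: list[str]) -> str:
    """Inject the budget-fitted documents into a grounded prompt."""
    context = "\n".join(f"- {d}" for d in fit_to_context(ranked_docs))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The resulting string is the augmented prompt you feed to the LLM
# so that generation stays grounded in the retrieved context.
print(augment_prompt("What limits prompt size?",
                     ["Context windows bound prompt length."]))
```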

