RAG Engine overview

Vertex AI RAG Engine, a component of the Vertex AI Platform, facilitates Retrieval-Augmented Generation (RAG). RAG Engine is also a data framework for developing context-augmented large language model (LLM) applications. Context augmentation means applying an LLM to your own data, and RAG Engine does this by using retrieval-augmented generation.

A common problem with LLMs is that they don't know about private knowledge, that is, your organization's data. With RAG Engine, you can enrich the LLM context with this private information, which helps the model reduce hallucinations and answer questions more accurately.

Combining additional knowledge sources with the knowledge that the LLM already has produces a better context. The model receives this improved context together with the query, which improves the quality of its response.
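To make context augmentation concrete, the following is a minimal, library-free sketch of how retrieved private passages might be combined with a user query into a single prompt. The function name `build_augmented_prompt`, the instruction wording, and the sample passages are hypothetical illustrations, not part of the RAG Engine API; RAG Engine assembles its own context internally.

```python
def build_augmented_prompt(query: str, passages: list[str]) -> str:
    """Hypothetical helper: combine retrieved passages with the user's query."""
    # Number each passage so the response can refer back to a specific source.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Hypothetical private passages standing in for your organization's data.
    passages = [
        "Hotel stays are capped at 200 dollars per night for business travel.",
        "Flights must be booked through the internal travel portal.",
    ]
    print(build_augmented_prompt("What is the hotel cap per night?", passages))
```

The instruction to rely only on the supplied context is one common way to steer the model toward grounded answers; the exact wording is an assumption and would normally be tuned for the model in use.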

The following image illustrates the key concepts for understanding RAG Engine.

Vertex AI RAG key concepts

These concepts are listed in the order of the retrieval-augmented generation (RAG) process.

Data ingestion: Intake of data from different data sources, such as local files, Cloud Storage, and Google Drive.
Data transformation: Conversion of the data in preparation for indexing. For example, data is split into chunks.
Embedding: Numerical representations of words or pieces of text. These numbers capture the semantic meaning and context of the text. Similar or related words or text tend to have similar embeddings, which means they are closer together in the high-dimensional vector space.
Data indexing: RAG Engine creates an index called a corpus. The index structures the knowledge base so that it's optimized for searching, much like a detailed table of contents for a massive reference book.
Retrieval: When a user asks a question or provides a prompt, the retrieval component in RAG Engine searches through its knowledge base to find information that is relevant to the query.
Generation: The retrieved information is added as context to the original user query, guiding the generative AI model to produce factually grounded and relevant responses.
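The transformation, embedding, indexing, and retrieval steps above can be sketched as a toy, in-memory pipeline. This is a minimal illustration under stated assumptions, not how RAG Engine is implemented: `chunk_text`, `embed`, `cosine`, and `ToyCorpus` are hypothetical names; the word-based chunking parameters are arbitrary; and the bag-of-words "embedding" only matches shared words, whereas a real embedding model captures semantic meaning. The linear scan in `retrieve` stands in for the optimized search a real index provides. Generation is omitted because it requires a model call.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Data transformation: split text into overlapping word-based chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # The last chunk already reaches the end of the text.
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (NOT a semantic embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyCorpus:
    """Data indexing: stores (chunk, vector) pairs, standing in for a corpus."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def ingest(self, document: str) -> None:
        # Data ingestion + transformation + embedding for one document.
        for chunk in chunk_text(document):
            self.entries.append((chunk, embed(chunk)))

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        # Retrieval: rank every chunk by similarity to the query vector.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


if __name__ == "__main__":
    corpus = ToyCorpus()
    corpus.ingest("Hotel stays are capped at 200 dollars per night for all business travel.")
    corpus.ingest("Flights must be booked through the internal travel portal.")
    print(corpus.retrieve("Where must flights be booked?", top_k=1))
    # → ['Flights must be booked through the internal travel']
```

The overlap between neighboring chunks is a common design choice so that a sentence split across a chunk boundary still appears intact in at least one chunk; the retrieved chunks would then be added to the query as context in the generation step.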

What's next

To learn about the file size limits, see Supported document types.
To learn about quotas related to RAG Engine, see RAG Engine quotas.
To learn about customizing parameters, see Retrieval parameters.
To learn more about the RAG API, see RAG Engine API.
To learn more about grounding, see Grounding overview.
To learn more about the difference between grounding and RAG, see Ground responses using RAG.
To learn more about Generative AI on Vertex AI, see Overview of Generative AI on Vertex AI.
To learn more about RAG architecture, see the following reference architectures:
- Infrastructure for a RAG-capable generative AI application using Vertex AI and Vector Search
- Infrastructure for a RAG-capable generative AI application using Vertex AI and AlloyDB for PostgreSQL