Upgrade your site search: Contextual answers with generative AI

André Cipriani Bandarra

Generative AI refers to the use of artificial intelligence to create new content, like text, images, music, audio, and videos. Generative AI relies on a machine learning (ML) model to learn the patterns and relationships in a dataset of human-created content.

This technology has shown incredible capabilities, through applications like Gemini. You may be wondering: how do I integrate generative AI tools into my web product?

One common use case is to give users a better interface to ask questions about a website's content. With the help of machine learning, you can greatly improve your users' search results.

You could create an interface where users write their question, which is then sent to a large language model (LLM), such as Gemini, and then display the answers to your users.

Suppose such a feature existed on this site. A user wants to know which APIs are included in Interop 2024, and they input the following query:

What are the features included in Interop 24?

Unfortunately, the output will likely be incorrect, for a couple of reasons:

  • The user has given the LLM little context for the question, therefore the LLM is more prone to return wrong answers or hallucinations.
  • The LLM was likely trained before Interop 2024 was created, or its features decided, so it's unaware of that information.

While it's possible for LLMs to find more current information, LLM training datasets are inherently outdated. Maintaining fresh results can be incredibly time consuming and expensive.

Use prompt engineering

Prompt engineering is a set of techniques for getting the best output out of an LLM.

One technique is to provide additional context in the prompt, making the LLM more likely to output content that is related to the context.

Continuing with our Interop example, our first step is to provide the full contents of the article as context. Then add the question as the input for the LLM to answer. For example:

Context:
Following on from the success of Interop 2022 and Interop 2023, we
are excited about the opportunity to collaborate once again with
all key browser vendors and other relevant stakeholders...
(trimmed to fit in this article)

Input:
What are the features included in Interop 2024?

You can expect Gemini to output something like the following:

The features included in Interop 24 are Accessibility, CSS Nesting, Custom
Properties, Declarative Shadow DOM, font-size-adjust, HTTPS URLs for
WebSocket, IndexedDB, Layout, Pointer and Mouse Events, Popover, Relative
Color Syntax, requestVideoFrameCallback, Scrollbar Styling, @starting-style
and transition-behavior, Text Directionality, text-wrap: balance, URL

This answer is likely much better than the one from a prompt without context, because the answer is grounded in the provided context.
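In code, a context-augmented prompt can be as simple as concatenating the article text and the user's question. Here's a minimal sketch, assuming the @google/generative-ai Node SDK and an API key in an environment variable; `answerWithContext` is a hypothetical helper name:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Assumes an API key in the GEMINI_API_KEY environment variable.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Build a context-augmented prompt and ask the model to answer the
// question based on the supplied article text.
async function answerWithContext(
  articleText: string,
  question: string
): Promise<string> {
  const prompt = `Context:\n${articleText}\n\nInput:\n${question}`;
  const result = await model.generateContent(prompt);
  return result.response.text();
}
```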

Scale up with RAG

Now suppose that, instead of answering a question about a single article, we want the LLM to answer questions about all of web.dev, using any article as the additional context. While this may be possible for smaller sites, given Gemini 1.5's context window of 1 million tokens, larger prompts are slower and more expensive to run.

LLM input and output lengths are measured and charged in tokens, which are a way to represent a common sequence of characters found in a text input. The number of tokens will generally be larger than the number of words. For example, the input in the first example had 775 words, represented by 1,097 tokens. Different LLMs may calculate tokens differently, and most provide an API or an endpoint to calculate the number of tokens for text input.
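With the Gemini SDK, for instance, you can check a prompt's size before sending it. This is a sketch; other LLM providers expose similar token-counting endpoints:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Count the tokens in a prompt before sending it to the model.
const { totalTokens } = await model.countTokens("Context:\n...\n\nInput:\n...");
console.log(`Prompt size: ${totalTokens} tokens`);
```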

One solution is to provide the LLM with only the articles relevant to the prompt. This task has two parts:

  1. Search your content for the articles most related to "What are the features included in Interop 2024?".
  2. Add the content of the top articles as context when prompting the LLM.

We want Gemini to base its answer on the most relevant articles; in this case, three related Interop posts.

The input should look as follows:

Context:
Article 1:
Over the past two years... (trimmed)

Article 2:
At the end of last year Interop 2023 wrapped up. This effort... (trimmed)

Article 3:
Following on from the success of Interop 2022... (trimmed)

Input:
What are the features included in Interop 2024?

With this context, Gemini produces our expected output:

* Accessibility
* CSS Nesting
* Custom Properties
* Declarative Shadow DOM
* font-size-adjust
* HTTPS URLs for WebSocket
* IndexedDB
* Layout
* Pointer and Mouse Events
* Popover
* Relative Color Syntax
* requestVideoFrameCallback
* Scrollbar Styling
* @starting-style and transition-behavior
* Text Directionality
* text-wrap: balance
* URL

For those familiar with AI techniques, this approach uses Retrieval-Augmented Generation (RAG), a common practice to improve the likelihood of factual answers from generative AI tools.
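Put together, a RAG flow might look like the following sketch. Here, `searchArticles` is a hypothetical search function for your content (full-text or semantic), and `model` is the Gemini instance set up in the earlier sketch:

```typescript
// Hypothetical search helper: returns the articles most relevant to a query.
declare function searchArticles(
  query: string,
  options: { limit: number }
): Promise<Array<{ title: string; content: string }>>;

// Retrieve relevant articles, fold them into the prompt as context, then
// ask the model to answer based on that context.
async function answerWithRag(question: string): Promise<string> {
  const articles = await searchArticles(question, { limit: 3 });
  const context = articles
    .map((article, i) => `Article ${i + 1}:\n${article.content}`)
    .join("\n\n");
  const prompt = `Context:\n${context}\n\nInput:\n${question}`;
  const result = await model.generateContent(prompt);
  return result.response.text();
}
```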

While the RAG technique can work with regular full-text search, there are shortcomings to the approach:

  • Full-text search helps AI find exact keyword matches. However, LLMs are unable to determine the intended meaning behind a user's query. This can lead to outputs that are incomplete or incorrect.
  • There may be problems when words have multiple meanings or the queries use synonyms. For example, "bank" (financial institution versus riverbank) can lead to irrelevant results.
  • Full-text search may output results that happen to contain the keywords but don't align with the user's objective.

Semantic search is a technique to improve search accuracy by focusing on these key aspects:

  • Searcher's intent: It tries to understand the reason why a user is searching for something. What are they trying to find or accomplish?
  • Contextual meaning: It interprets words and phrases in relation to their surrounding text, as well as other factors like the user's location or search history.
  • Relationship between concepts: Semantic search uses knowledge graphs (large networks of related entities) and natural language processing to understand how words and ideas are connected.

As a result, when you build tools with semantic search, the search output relies on the overall purpose of the query, instead of keywords. This means a tool can determine relevant documents, even when the exact keyword is not present. It can also avoid results where the word is present, but has a different meaning.
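Under the hood, semantic search typically compares embeddings: numeric vectors that capture a text's meaning, so that texts with similar meanings produce nearby vectors. Here's a rough sketch of that ranking step, assuming Gemini's text-embedding-004 embedding model; `rankBySimilarity` is a hypothetical helper, and the managed tools below handle this step for you:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });

// Cosine similarity: close to 1 means the vectors point the same way
// (similar meaning), close to 0 means they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank articles by how close their meaning is to the query, rather than
// by keyword overlap. In production, you would precompute and store the
// article embeddings instead of embedding them on every query.
async function rankBySimilarity(
  query: string,
  articles: Array<{ title: string; content: string }>
) {
  const queryEmbedding = (await embedder.embedContent(query)).embedding.values;
  const scored = await Promise.all(
    articles.map(async (article) => {
      const { embedding } = await embedder.embedContent(article.content);
      return { article, score: cosineSimilarity(queryEmbedding, embedding.values) };
    })
  );
  return scored.sort((a, b) => b.score - a.score);
}
```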

Right now, you can implement two search tools that employ semantic search: Vertex AI Search and Algolia AI Search.

Draw answers from published content

You've learned how to use prompt engineering to enable an LLM to provide answers related to content it's never seen, by adding context to the prompt. And you've learned how to scale this approach from individual articles to an entire corpus of content using the RAG technique. Finally, you learned how semantic search can further improve results for user search queries, implementing RAG even more effectively in your product.

It's a known problem that generative AI tools can "hallucinate," which makes them, at best, sometimes unreliable and, at worst, actively harmful to a business. With these techniques, both users and developers can improve the reliability of these applications and, perhaps, build trust in their output.