Complete Guide for Deploying Production-Quality Databricks RAG Apps

Deploying production-quality Databricks RAG apps can be challenging, but this guide will walk you through the process step by step.

Are you struggling to optimize your internal operations? Retrieval-Augmented Generation (RAG) on Databricks might be your solution. This guide walks through what RAG is, the challenges of building high-quality RAG applications, and how to deploy them to production on Databricks.
We'll also introduce ChatBees's serverless LLM, a tool designed to help you streamline internal operations and enhance productivity. Ready to take your business to the next level? Let's dive in!

What Is Retrieval-Augmented Generation (RAG) & Its Importance

RAG is a technique that turns large language models into more robust question-answering and text-generation tools by combining them with information retrieval from external data sources. By grounding responses in authoritative knowledge sources, RAG improves the accuracy and relevance of generated answers compared to relying solely on a model's internal training data. This matters because large language models on their own can produce inaccurate, outdated, or generic responses.

The Typical Architecture of RAG Systems

RAG systems typically have three main components: retriever, reader, and generator. The retriever identifies relevant information from external sources, the reader comprehends and interprets the retrieved data, and the generator uses this information to produce more accurate and contextually appropriate responses. By incorporating these components, RAG systems can significantly enhance the quality of generated text by expanding the model's knowledge base beyond its original training data.
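The retriever-reader-generator flow above can be sketched in a few lines of Python. This is a minimal toy illustration, not a production system: the retriever here scores documents by keyword overlap, and the generator is a stand-in for an LLM call.

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retriever: rank documents by keyword overlap with the query."""
    q = _tokens(query)
    return sorted(corpus, key=lambda d: len(q & _tokens(d)), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    """Generator: in a real system this would be an LLM call with the context in the prompt."""
    return f"Answer to '{query}' using context: {' | '.join(context)}"

corpus = [
    "Databricks Vector Search indexes unstructured data.",
    "Feature and Function Serving handles structured data.",
    "Paris is the capital of France.",
]
docs = retrieve("How is structured data served?", corpus)
print(generate("How is structured data served?", docs))
```

A real retriever would use embeddings and a vector index rather than keyword overlap, but the control flow — retrieve, then condition generation on the retrieved context — is the same.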

Key Benefits of RAG Systems

RAG systems offer several key benefits that can revolutionize question-answering and text-generation capabilities. By integrating authoritative external data sources, RAG systems can improve the accuracy, coverage of niche topics, and timeliness of responses. This ensures that the information is more reliable and up-to-date, enhancing user trust and satisfaction. RAG systems can address common challenges associated with large language models, such as presenting false or outdated information, enabling organizations to deliver more precise and relevant responses to user queries.

3 Challenges When Creating High-Quality RAG Applications


1. Challenges with Serving Real-Time Data For Your RAG App

Databricks' latest release now supports serving and indexing data for online retrieval, which was a previously challenging aspect. With Vector Search for unstructured data and Feature and Function Serving for structured data, real-time data serving infrastructure is now more manageable and accessible. Unity Catalog also ensures data quality and security between online and offline datasets, making debugging and auditing much easier for enterprises.

2. Challenges with Comparing, Tuning, and Serving Foundation Models

Selecting the best base LLM model can be challenging due to the sheer number of options available and the varying dimensions across models. Databricks now offers a unified LLM development and evaluation environment, making it easier to compare and select the most suitable model for a specific application. The interactive AI Playground and integrated MLflow toolchain enable easy model comparison and tracking of key metrics to identify the best model candidate for each use case.

3. Challenges with Ensuring Quality and Safety In Production

Monitoring the quality and safety of LLM applications in production is complex, as there isn't a single correct answer or an obvious error condition. Databricks has observed that many customers hesitate to roll out RAG applications due to uncertainty about how well they will perform at scale. This highlights the importance of continuously monitoring and evaluating RAG application performance to ensure quality and safety for users.

Complete Guide for Deploying Production-Quality Databricks RAG Apps

Using Feature and Function Serving (AWS)(Azure) for structured data in coordination with Databricks Vector Search (AWS)(Azure) for unstructured data significantly simplifies productionization of Gen AI applications. Users can build and deploy these applications directly in Databricks and rely on existing data pipelines, governance, and other enterprise features. Databricks customers across various industries are using these technologies and open-source frameworks to build powerful Gen AI applications like the ones described below.

Retail

  • Product Recommendations / Search Ranking using user preferences, search history, location, etc.
  • Image and Metadata based Product Search
  • Inventory Management and Forecasting using sales data, seasonal trends, and market/competitive analysis

Education

  • Personalized learning plans based on past mistakes, historical trends, and cohorts
  • Automated Grading, Feedback, Follow-ups, and Progress Reporting
  • Content filtering for issued devices

Financial Services

  • Natural language apps for analysts and investors to correlate earning calls and reports with market intelligence and historical trends
  • Fraud and Risk Analysis
  • Personalized Wealth Management, Retirement Planning, what-if analysis, and next best actions

Travel and Hospitality

  • Chatbots for personalized customer interactions and tailored travel recommendations
  • Dynamic Route Planning using weather, live traffic patterns, and historical data
  • Dynamic Price Optimization using competitive analysis and demand-based pricing

Healthcare and Life Sciences

  • Patient/Member engagement and health summaries
  • Support apps for personalized care, clinical decisions, and care coordination
  • R&D report summarization, Clinical Trial Analysis, Drug Repurposing

Insurance

  • Risk assessment for mortgage underwriting using text and structured data about properties and neighborhoods
  • User chatbots for questions about policies, risk, and what-if analysis
  • Claim Processing Automation

Technology and Manufacturing

  • Prescriptive maintenance and diagnostics for equipment using guided instruction
  • Anomaly detection on live data stream against historical statistics
  • Automated analysis for daily production/shift analysis and future planning

Media and Entertainment

  • In-app content discovery and recommendations, personalized email and digital marketing
  • Content Localization
  • Personalized gaming experiences and game review

Serving structured data to RAG applications

To demonstrate how structured data can enhance the quality of a Gen AI application, consider the following example of a travel planning chatbot. It shows how user preferences (for example, "ocean view" or "family friendly") can be paired with unstructured information about hotels to search for matching hotels.

Dynamic Hotel Pricing and Budget Recommendations with Gen AI

Hotel prices typically change dynamically based on demand and seasonality. A price calculator built into the Gen AI application ensures the recommendations stay within the user's budget. The Gen AI application that powers the bot uses Databricks Vector Search and Databricks Feature and Function Serving as building blocks to serve personalized user preferences, budget constraints, and hotel information, orchestrated through LangChain's agents API.
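The budget-check step can be sketched as follows. The function names, hotel fields, and demand multiplier below are illustrative assumptions, not the notebook's actual code; the point is that a deterministic price calculator runs alongside retrieval, so the chatbot only recommends hotels the user can afford.

```python
# Illustrative budget filter for hotel candidates returned by vector search.
# Field names (name, nightly_rate) and the demand multiplier are assumptions.

def dynamic_price(nightly_rate: float, demand_multiplier: float) -> float:
    """Apply a seasonal/demand multiplier to the base nightly rate."""
    return round(nightly_rate * demand_multiplier, 2)

def within_budget(hotels: list[dict], budget_per_night: float,
                  demand_multiplier: float) -> list[dict]:
    """Keep only hotels whose dynamically priced rate fits the user's budget."""
    results = []
    for h in hotels:
        price = dynamic_price(h["nightly_rate"], demand_multiplier)
        if price <= budget_per_night:
            results.append({**h, "price": price})
    return results

# Candidates returned by a vector search for "ocean view, family friendly":
candidates = [
    {"name": "Seaside Inn", "nightly_rate": 120.0},
    {"name": "Grand Palms", "nightly_rate": 310.0},
]
print(within_budget(candidates, budget_per_night=200.0, demand_multiplier=1.25))
```

In the Databricks setup described above, the live rate and user budget would come from Feature and Function Serving rather than hard-coded values, and a LangChain agent would decide when to invoke the calculator.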

Deploying the RAG Chain Application for Chatbot Integration

The RAG Chain application shown above is available as a complete notebook. The application can be run locally within the notebook or deployed as an endpoint accessible by a chatbot user interface.

Access your data and functions as real-time endpoints

With Feature Engineering in Unity Catalog, you can already use any table with a primary key to serve features for training and serving. Databricks Model Serving supports using Python functions to compute features on demand. Built on the same technology that powers Databricks Model Serving, feature and function endpoints can be used to access any pre-computed feature or compute features on demand. With a simple syntax, you can define a feature spec function in Unity Catalog that encodes the directed acyclic graph of computations needed to compute and serve features as a REST endpoint.
from databricks.feature_engineering import (
    FeatureFunction,
    FeatureLookup,
    FeatureEngineeringClient,
)

features = [
    # Look up the `latitude` and `longitude` columns from the `restaurants`
    # table in UC, using the input `restaurant_id` as the key
    FeatureLookup(
        table_name="main.default.restaurants",
        lookup_key="restaurant_id",
        features=["latitude", "longitude"],
    ),
    # Calculate a new feature called `distance` using the restaurant's and
    # the user's current locations
    FeatureFunction(
        udf_name="main.default.distance",
        output_name="distance",
        # Bind the function parameters to inputs from other features or from the request
        input_bindings={
            "user_latitude": "user_latitude",
            "user_longitude": "user_longitude",
            "restaurant_latitude": "latitude",
            "restaurant_longitude": "longitude",
        },
    ),
]

fe = FeatureEngineeringClient()

# Create a feature spec with the features listed above.
# The FeatureSpec can be accessed in UC as a Function.
fe.create_feature_spec(
    name="main.default.restaurant_features",
    features=features,
)
This feature spec function can be served in real time as a REST endpoint. All endpoints are accessible from the Serving tab in the left navigation, including feature, function, custom trained model, and foundation model endpoints. Provision the endpoint using this API:
from databricks.feature_engineering.entities.feature_serving_endpoint import (
    ServedEntity,
    EndpointCoreConfig,
)

fe.create_feature_serving_endpoint(
    name="restaurant-features",
    config=EndpointCoreConfig(
        served_entities=ServedEntity(
            feature_spec_name="main.default.restaurant_features",
            workload_size="Small",
            scale_to_zero_enabled=True,
        )
    ),
)
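Once deployed, the endpoint can be queried over REST. The sketch below only constructs the request rather than sending it; the workspace hostname is a placeholder, and the `dataframe_records` payload shape is an assumption based on the common Databricks serving-endpoint convention, so verify it against the Feature Serving documentation for your workspace.

```python
import json

# Build an invocation request for the feature serving endpoint.
# The workspace URL is a placeholder; an Authorization: Bearer <token>
# header would also be required in a real call.
def build_request(workspace_url: str, endpoint_name: str, records: list[dict]) -> dict:
    return {
        "url": f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"dataframe_records": records}),
    }

req = build_request(
    "https://<workspace-host>",
    "restaurant-features",
    [{"restaurant_id": 42, "user_latitude": 37.77, "user_longitude": -122.42}],
)
print(req["url"])
```

The response would contain the looked-up `latitude` and `longitude` plus the on-demand `distance` feature computed by the `main.default.distance` UDF.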
To serve structured data to real-time AI applications, precomputed data needs to be deployed to operational databases. Users can already use external online stores as a source of precomputed features; for example, DynamoDB and Cosmos DB are commonly used to serve features in Databricks Model Serving. Databricks Online Tables (AWS)(Azure) adds new functionality that simplifies synchronization of precomputed features into a data format optimized for low-latency lookups. You can sync any table with a primary key as an online table, and the system will set up an automatic pipeline to ensure data freshness.
Any Unity Catalog table with primary keys can serve features in Gen AI applications using Databricks Online Tables.

Use ChatBees’ Serverless LLM to 10x Internal Operations

ChatBees' unique agentic framework optimizes Retrieval-Augmented Generation (RAG) for internal operations. By integrating RAG with customer support, employee support, and other workflows, ChatBees ensures the most precise responses.
This framework is beneficial as it simplifies integration, offers a low-code, no-code approach, and automatically selects the best strategy for enhancing response quality in various use cases. As a result, predictability and accuracy are improved, enabling operational teams to handle larger query volumes effectively.

Serverless RAG for Seamless Data Connectivity

ChatBees' Serverless RAG feature provides simple, secure, high-performing APIs connecting various data sources. These sources may include PDFs, CSVs, websites, Google Drive, Notion, and Confluence. The platform allows for immediate search, chat, and summarization within the knowledge base.
Notably, the deployment and maintenance of this service do not require DevOps support, offering a hassle-free experience to users. This feature is ideal for onboarding, sales enablement, customer support, and product & engineering teams. By enabling quick access to essential materials and resources, the platform promotes efficiency and collaboration among team members.
