Scalable Software, AI, and Career Mastery | LLMs and GenAI with Knowledge Graphs, Search and RAGs

Knowledge Graphs excel at capturing context. How can combining Knowledge Graphs with RAG, an emerging technique known as GraphRAG, give context to your RAG application and lead to more accurate and complete results, accelerated development, and explainable AI decisions? This talk goes deep on the why and how of GraphRAG, and where best to apply it.

GraphRAG (Graph Retrieval-Augmented Generation) combines the strengths of large language models with structured knowledge from graphs. Instead of relying solely on text or documents, GraphRAG retrieves relevant information from a graph database (such as a knowledge graph or ontology) and feeds that context to an LLM (GPT or similar) to produce richer, more accurate, context-aware responses. The technique is especially useful for applications that need up-to-date facts, complex reasoning, or connections between entities, which makes it a good fit for search, enterprise knowledge, and advanced chatbots.

GraphRAG bridges Airbnb’s rich structured data with the flexibility of LLMs, enabling smarter, context-aware, and more trustworthy automation across product, support, and operations. Even simple prototypes could offer measurable impact in personalization, efficiency, and guest/host satisfaction. I worked on the Airbnb Knowledge Graph in the Search team and came across many good ideas for how to use it. Cache-Augmented Generation (CAG), where knowledge such as KG facts is preloaded and cached for the model instead of being retrieved on every query, could be another option.


Layer	Technology	Purpose
Data Ingestion	ETL, APIs, Streams	Pull, clean, normalize, and map raw data
Knowledge Graph	JanusGraph, Elasticsearch	Rich, structured, connected data
Embeddings & Vectors	BERT, Hugging Face	Semantic search & similarity matching
Query API	GraphQL	Flexible query language for apps and orchestration
RAG Orchestration	Custom Logic	Combine KG and vector retrieval for LLM input
LLM Layer	OpenAI, Anthropic	Advanced reasoning, natural language generation
Applications	Custom Apps, Bots, Support Bots	End-user features: search, Q&A, support, analytics
Monitoring & Feedback	Logging, Metrics, Datadog, Grafana	Continuous improvement & retraining

1. Smart Search & Discovery

Problem:
Travelers often have complex, nuanced search needs (e.g., “find pet-friendly homes in Paris with a garden, near Michelin restaurants, and available next weekend”).
GraphRAG Solution:

Build a property knowledge graph connecting listings, amenities, locations, nearby attractions, reviews, and hosts.
Use LLMs with GraphRAG to interpret natural language, retrieve relevant nodes (e.g., listings near specific points of interest, matching amenity profiles), and generate concise, personalized recommendations or summaries.
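
As a rough illustration of the retrieval half of this use case, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the triples, the node ids (`listing:1`, `amenity:garden`, and so on), and the helper names `build_index`, `find_listings`, and `to_context` are hypothetical, and a real system would run the equivalent query against the graph database rather than a Python dict.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, relation, target) triples.
TRIPLES = [
    ("listing:1", "HAS_AMENITY", "amenity:garden"),
    ("listing:1", "HAS_AMENITY", "amenity:pet_friendly"),
    ("listing:1", "LOCATED_IN", "city:paris"),
    ("listing:1", "NEARBY", "poi:michelin_restaurant_a"),
    ("listing:2", "HAS_AMENITY", "amenity:garden"),
    ("listing:2", "LOCATED_IN", "city:paris"),
    ("listing:3", "HAS_AMENITY", "amenity:garden"),
    ("listing:3", "HAS_AMENITY", "amenity:pet_friendly"),
    ("listing:3", "LOCATED_IN", "city:lyon"),
]


def build_index(triples):
    """Index outgoing edges as node -> relation -> set of targets."""
    index = defaultdict(lambda: defaultdict(set))
    for source, relation, target in triples:
        index[source][relation].add(target)
    return index


def find_listings(index, city, amenities):
    """Return listings in `city` that have every amenity in `amenities`."""
    matches = []
    for node, edges in index.items():
        if not node.startswith("listing:"):
            continue
        in_city = city in edges.get("LOCATED_IN", set())
        has_all = amenities <= edges.get("HAS_AMENITY", set())
        if in_city and has_all:
            matches.append(node)
    return sorted(matches)


def to_context(index, listing):
    """Serialize a listing's edges as plain-text facts for an LLM prompt."""
    facts = []
    for relation in sorted(index[listing]):
        for target in sorted(index[listing][relation]):
            facts.append(f"{listing} {relation} {target}")
    return "\n".join(facts)


index = build_index(TRIPLES)
hits = find_listings(index, "city:paris", {"amenity:garden", "amenity:pet_friendly"})
context = "\n\n".join(to_context(index, hit) for hit in hits)
print(hits)     # ['listing:1']
print(context)  # facts that would be placed into the LLM prompt
```

The point of the sketch is that the graph does the filtering and the LLM only sees facts that already satisfy the structured part of the request.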

2. Personalized Guest and Host Support

Problem:
Support agents and self-serve bots need fast, accurate, and context-aware answers spanning policies, bookings, guest profiles, and host rules.
GraphRAG Solution:

Integrate user, booking, policy, and knowledge base graphs.
Use GraphRAG to pull facts (e.g., booking change rules, cancellation history, host preferences) and let an LLM generate personalized, authoritative responses—reducing agent lookup time and improving customer experience.

3. Automated Trip Planning

Problem:
Guests want tailored itineraries and local experiences, but piecing everything together is manual.
GraphRAG Solution:

Connect listings, local attractions, events, restaurants, and transportation in a single graph.
Given user input (“3 days in Kyoto, love food and culture, with kids”), GraphRAG can surface relevant nodes and have an LLM stitch together a multi-day itinerary, pulling accurate data and recommendations directly from Airbnb’s ecosystem.

4. Proactive Trust & Safety Monitoring

Problem:
Detecting risky behavior or fraud often requires connecting the dots across user activity, properties, communication, and payment signals.
GraphRAG Solution:

Link users, devices, transactions, property attributes, and reviews in a trust/safety graph.
LLM + GraphRAG can answer questions like “has this host had multiple complaints about safety in the last 6 months?” or “is there a pattern of suspicious bookings in this region?”
It can also generate actionable alerts with context for human review.
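
A question like "multiple safety complaints in the last 6 months" reduces to a filtered count over complaint edges, which can then feed an alert. The sketch below assumes a hypothetical list of complaint records and hypothetical helpers `recent_complaints` and `build_alert`; the 183-day window and the threshold of 2 are illustrative choices, not values from any real policy.

```python
from datetime import date, timedelta

# Hypothetical complaint edges: (host_id, category, date_filed).
COMPLAINTS = [
    ("host:42", "safety", date(2024, 5, 2)),
    ("host:42", "safety", date(2024, 3, 15)),
    ("host:42", "cleanliness", date(2024, 4, 1)),
    ("host:42", "safety", date(2023, 6, 30)),
    ("host:7", "safety", date(2024, 5, 20)),
]


def recent_complaints(complaints, host_id, category, today, window_days=183):
    """Return sorted dates of matching complaints filed within the window."""
    cutoff = today - timedelta(days=window_days)
    return sorted(
        filed
        for host, cat, filed in complaints
        if host == host_id and cat == category and filed >= cutoff
    )


def build_alert(host_id, category, dates, threshold=2):
    """Return an alert dict for human review, or None below the threshold."""
    if len(dates) < threshold:
        return None
    return {
        "host": host_id,
        "reason": f"{len(dates)} {category} complaints in window",
        "evidence": [d.isoformat() for d in dates],
    }


today = date(2024, 6, 1)  # fixed date so the example is reproducible
dates = recent_complaints(COMPLAINTS, "host:42", "safety", today)
alert = build_alert("host:42", "safety", dates)
print(alert)
```

Keeping the evidence dates in the alert is what gives the human reviewer (or the LLM summarizing the case) something concrete to cite.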

5. Enhanced Host Insights

Problem:
Hosts want to know how to improve, but feedback is scattered.
GraphRAG Solution:

Connect reviews, guest feedback, calendar performance, pricing, and local trends.
LLM + GraphRAG can generate actionable, customized suggestions for hosts (“Consider adding Wi-Fi speed to your listing; guests in your area often mention it”).

6. Richer Knowledge Management

Problem:
Internal teams have to find and connect insights from vast product docs, policies, A/B test results, and user feedback.
GraphRAG Solution:

An internal graph links docs, experiments, metrics, team owners, and code repositories.
GraphRAG lets LLMs answer complex, context-rich queries, trace feature ownership, or summarize impacts (“What were the top three guest complaints in the EU this quarter and which team owns those workflows?”).

7. Onboarding and Training

Problem:
New hires need to understand Airbnb’s ecosystem, policies, and tech quickly.
GraphRAG Solution:

Connect internal wikis, org charts, past tickets, and best practices.
LLM + GraphRAG can provide interactive onboarding Q&A, contextual document retrieval, or even custom learning paths.

Large language models do many things, and it's not clear from black-box interactions how they do them. We will discuss recent progress in mechanistic interpretability, an approach to understanding models based on decomposing them into pieces, understanding the role of the pieces, and then understanding behaviors based on how those pieces fit together.

Deep Architecture Overview

1. Data Ingestion & Processing Layer

ETL Pipelines (Batch/Streaming): Ingest data from Airbnb’s transactional DBs, logs, APIs, and external sources (e.g., partner content).
Preprocessing: Cleans, transforms, deduplicates, and normalizes data.
Entity Resolution: Identifies and merges entities (users, listings, locations) across sources.
Schema Mapping: Translates incoming data into the schema used by the open-source JanusGraph database (vertices, edges, properties).
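
To make entity resolution and schema mapping a little more concrete, here is a small sketch under strong simplifying assumptions: records are matched only on a normalized email address, and a "vertex" is just a Python dict with a label, an id, and properties. The record fields, the `match_key`, `resolve_entities`, and `to_vertex` helpers, and the `user:` id format are all hypothetical; production entity resolution typically uses several signals and fuzzy matching.

```python
# Hypothetical raw records from two sources describing overlapping users.
RECORDS = [
    {"source": "bookings_db", "email": "Ana@Example.com ", "name": "Ana Silva", "phone": ""},
    {"source": "support_api", "email": "ana@example.com", "name": "", "phone": "+33 1 23 45 67 89"},
    {"source": "bookings_db", "email": "bo@example.com", "name": "Bo Chen", "phone": ""},
]


def match_key(record):
    """Normalize the email so the same user matches across sources."""
    return record["email"].strip().lower()


def resolve_entities(records):
    """Merge records sharing a match key; the first non-empty value wins."""
    merged = {}
    for record in records:
        key = match_key(record)
        entity = merged.setdefault(key, {"email": key})
        for field in ("name", "phone"):
            value = record.get(field, "").strip()
            if value and field not in entity:
                entity[field] = value
    return merged


def to_vertex(entity):
    """Map a resolved entity onto a simple vertex shape (label, id, properties)."""
    return {
        "label": "User",
        "id": "user:" + entity["email"],
        "properties": dict(entity),
    }


vertices = [to_vertex(entity) for entity in resolve_entities(RECORDS).values()]
for vertex in vertices:
    print(vertex)
```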

2. Knowledge Graph Layer

JanusGraph: Distributed, scalable property graph database.
- Vertices: Users, Listings, Holidays, Interests, Hosts, Reviews, Amenities, Locations, Events, Experiences, and Services.
- Edges: Relationships such as BOOKED, REVIEWED, LOCATED_IN, NEARBY, and SIMILAR_TO, plus passive connections between people who booked together.
- Property Indexing: Elasticsearch/Lucene for fast attribute search.
- Backend Storage: Cassandra/HBase/DynamoDB/S3.
Schema Management: Versioned schema, dynamic type support.
Graph Enrichment: Additional graph analytics (e.g., PageRank, cosine similarity, clustering).
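
As one example of graph enrichment, PageRank can be computed over a set of edges and written back as a vertex property. The sketch below is a plain power-iteration version over a hypothetical list of SIMILAR_TO edges between listings; the damping factor 0.85 and 50 iterations are conventional defaults assumed here, and at real scale this would run in a graph analytics engine rather than in a Python loop.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical SIMILAR_TO edges between listings.
EDGES = [
    ("listing:1", "listing:2"),
    ("listing:3", "listing:2"),
    ("listing:4", "listing:2"),
    ("listing:2", "listing:1"),
]

ranks = pagerank(EDGES)
top_node = max(ranks, key=ranks.get)
print(top_node, round(ranks[top_node], 3))
```

The resulting score could be stored on each vertex and used later as one of the ranking signals during retrieval.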

3. Embeddings & Vector Database Layer

Embeddings Service:
- Uses pre-trained or fine-tuned transformer models (BERT, OpenAI, Hugging Face) to generate dense vector embeddings for nodes, text, or metadata.
- Syncs embeddings back to the knowledge graph (node/edge-level).
Vector Database (Pinecone, Weaviate, or OpenSearch Vector):
- Stores vector representations for rapid similarity search.
- Connects listing/user/review/amenity nodes to their embeddings for hybrid retrieval (text + structure).
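
The similarity search that a vector database performs can be sketched with cosine similarity over a handful of vectors. The three-dimensional vectors and node ids below are made up for illustration (real embeddings have hundreds of dimensions and come from a model), and `top_k` is a hypothetical brute-force helper; a vector database would use an approximate index instead of scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Brute-force search: return the k (node_id, score) pairs most similar to query."""
    scored = [(cosine_similarity(query, vector), node_id) for node_id, vector in store.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(node_id, score) for score, node_id in scored[:k]]


# Hypothetical embeddings keyed by graph node id, so results map back to the graph.
STORE = {
    "listing:1": [0.9, 0.1, 0.0],
    "listing:2": [0.1, 0.9, 0.2],
    "review:17": [0.8, 0.2, 0.1],
}

results = top_k([1.0, 0.0, 0.0], STORE, k=2)
for node_id, score in results:
    print(node_id, round(score, 3))
```

Keying the vectors by graph node id is what makes hybrid retrieval possible: a vector hit can be expanded into its graph neighborhood.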

4. Query & Retrieval Layer

GraphQL API:
- Unified, typed interface for querying the knowledge graph.
- Custom resolvers route queries to JanusGraph or the vector DB as needed.
- Supports complex joins, aggregations, and subgraph queries.
RAG Orchestrator:
- Coordinates between GraphQL (JanusGraph) and vector search (embeddings DB) for hybrid retrieval-augmented generation (GraphRAG).
- Aggregates, deduplicates, and scores retrieved nodes/subgraphs.
- Injects rich context into LLM prompts.
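
The orchestrator's merge, deduplicate, score, and inject steps could look roughly like the sketch below. It assumes both retrievers return hits as dicts with `id`, `text`, and a `score` in the range 0 to 1; the weights (0.6 for graph, 0.4 for vector), the helper names `merge_results` and `build_prompt`, and the prompt wording are illustrative assumptions rather than a recommended configuration.

```python
def merge_results(graph_hits, vector_hits, graph_weight=0.6, vector_weight=0.4):
    """Deduplicate hits by node id and combine scores with a weighted sum."""
    combined = {}
    for hits, weight in ((graph_hits, graph_weight), (vector_hits, vector_weight)):
        for hit in hits:
            entry = combined.setdefault(
                hit["id"], {"id": hit["id"], "text": hit["text"], "score": 0.0}
            )
            entry["score"] += weight * hit["score"]
    # Highest score first; ties broken by id for a stable order.
    return sorted(combined.values(), key=lambda entry: (-entry["score"], entry["id"]))


def build_prompt(question, ranked, max_items=3):
    """Inject the top-ranked facts into a prompt, tagging each with its node id."""
    facts = "\n".join(f"[{item['id']}] {item['text']}" for item in ranked[:max_items])
    return (
        "Answer using only the facts below and cite ids in brackets.\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical outputs of the two retrievers for the same question.
graph_hits = [
    {"id": "listing:1", "text": "Pet-friendly home with a garden in Paris.", "score": 1.0},
    {"id": "listing:2", "text": "Garden apartment in Paris.", "score": 0.5},
]
vector_hits = [
    {"id": "listing:1", "text": "Pet-friendly home with a garden in Paris.", "score": 0.9},
    {"id": "review:17", "text": "Our dog loved the garden.", "score": 0.8},
]

ranked = merge_results(graph_hits, vector_hits)
prompt = build_prompt("Which Paris listing suits a guest with a dog?", ranked)
print(prompt)
```

A node found by both retrievers accumulates both weighted scores, so agreement between structure and semantics naturally pushes it to the top.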

5. Application & LLM Layer

LLM Gateway:
- Connects to external LLM APIs (OpenAI, Anthropic, etc.) or open-source models.
- Handles prompt construction using graph and embedding context.
- Post-processes LLM output for grounding, explanation, and traceability.
Applications:
- Search & Recommendations
- Support Automation
- Trip Planning & Itinerary Generation
- Knowledge Exploration & Q&A
- Internal Tools (analytics, trust/safety)
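
One simple form of the LLM Gateway's post-processing for grounding and traceability is to check that every node id cited in the model's answer was actually part of the retrieved context. The sketch below assumes the prompt asked the model to cite ids in square brackets such as `[listing:1]`; that citation format, the regular expression, and the `check_grounding` helper are hypothetical, and this check only verifies that citations exist, not that the cited facts support the claim.

```python
import re

# Assumed citation format: [type:id], e.g. [listing:1] or [review:17].
CITATION = re.compile(r"\[([a-z_]+:[\w-]+)\]")


def check_grounding(answer, context_ids):
    """Report which cited ids are known and whether the answer is fully grounded."""
    cited = sorted(set(CITATION.findall(answer)))
    unknown = sorted(set(cited) - set(context_ids))
    return {
        "cited": cited,
        "unknown": unknown,
        # Grounded means at least one citation and none outside the context.
        "grounded": bool(cited) and not unknown,
    }


context_ids = ["listing:1", "review:17"]

good = check_grounding(
    "Listing 1 has a garden [listing:1] and guests with dogs liked it [review:17].",
    context_ids,
)
bad = check_grounding("It also has a pool [listing:9].", context_ids)
print(good)
print(bad)
```

An answer that fails the check could be retried, flagged, or returned with a warning, depending on the application.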

6. Monitoring, Logging, and Feedback

Observability: Logs all queries, model outputs, and user feedback, for example with Grafana, Datadog, or Prometheus.
Feedback Loop: Feeds usage data and user ratings back into model retraining and graph enrichment.
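
A feedback loop needs the retrieved node ids logged next to each user rating, so that poorly rated answers can be traced back to the graph data that produced them. The sketch below keeps JSON lines in an in-memory list for simplicity; the event fields, the 1 to 5 rating scale, and the helpers `log_event` and `average_rating_by_node` are assumptions for illustration, and a real setup would ship these events to a logging or metrics backend.

```python
import json
from collections import defaultdict


def log_event(log, query, node_ids, rating):
    """Append one structured event (as a JSON line) to the log."""
    log.append(json.dumps({"query": query, "node_ids": node_ids, "rating": rating}))


def average_rating_by_node(log):
    """Average the user rating over every answer in which a node was used."""
    totals = defaultdict(lambda: [0, 0])  # node_id -> [rating sum, event count]
    for line in log:
        event = json.loads(line)
        for node_id in event["node_ids"]:
            totals[node_id][0] += event["rating"]
            totals[node_id][1] += 1
    return {node_id: total / count for node_id, (total, count) in totals.items()}


log = []
log_event(log, "pet-friendly paris", ["listing:1", "review:17"], 5)
log_event(log, "garden paris", ["listing:1", "listing:2"], 2)

averages = average_rating_by_node(log)
print(averages)
```

Nodes with consistently low averages are candidates for review during graph enrichment, for example because their data is stale or incomplete.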