At Airbnb, making sense of mountains of data isn’t just a nice-to-have—it’s mission-critical. Whether you’re a guest searching for the perfect getaway or an employee launching a new feature, finding the right information quickly makes all the difference. That’s where Airbnb’s contextual knowledge graphs come in—a powerful, evolving web of connected data that’s transforming how we understand and retrieve knowledge.

Why Knowledge Graphs?
Traditional databases are great for storing facts, but not so great at connecting the dots. As Airbnb has grown, so has the complexity of our data. Knowledge graphs bridge these gaps by modeling relationships explicitly, so facts like “listing has amenity” and “host provides experience” become part of a single, searchable network.
What Does This Mean in Practice?
- Richer Search & Recommendations: Surfacing smarter, more relevant results (for guests and employees alike).
- Business Agility: New data sources and concepts can be integrated on the fly as Airbnb evolves.
- Context at Scale: Not just answering “what,” but “why” and “how” different data points relate.
Behind the Scenes: Engineering Highlights
- Graph Construction: Automated pipelines extract and connect data from across the company.
- Contextualization: The graph doesn’t just store facts; it keeps each node connected to its surrounding relationships, so every result carries the context behind it.
- Lightning-Fast Retrieval: Efficient algorithms make knowledge accessible in real time, powering both internal tools and guest features.
Why Taxonomy Is the Secret Sauce
Taxonomies give us a shared language. Imagine consistent labels for everything from “wifi” and “hot tub” to “villa” and “treehouse.” Airbnb’s taxonomy is baked directly into the knowledge graph, creating:
- Standardized Data: Consistency across listings, search, and analytics.
- Discovery Superpowers: Advanced filters so guests can find exactly what they’re dreaming of.
- Deeper Insights: A rock-solid foundation for analytics and trend-spotting.
In the Graph:
- Nodes = taxonomy concepts (“Hot Tub,” “Pet-Friendly”)
- Edges = relationships (“has_amenity”, “is_a_type_of”)
- Rich Metadata = multilingual labels, definitions, and context for every concept
This supports powerful, hierarchical queries like “Show me all listings with any wellness-related amenity”—making Airbnb’s data more useful than ever.
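A minimal Python sketch of how such a hierarchical query could work, assuming the taxonomy is stored as "is_a_type_of" edges and listings are attached through "has_amenity" edges. The concept hierarchy, the listing IDs, and both helper functions are hypothetical and are not Airbnb's actual schema.

```python
from collections import defaultdict

# Hypothetical taxonomy edges, read as: child is_a_type_of parent.
IS_A_TYPE_OF = [
    ("Hot Tub", "Wellness Amenity"),
    ("Sauna", "Wellness Amenity"),
    ("Wellness Amenity", "Amenity"),
    ("Wifi", "Amenity"),
]

# Hypothetical listing edges, read as: listing has_amenity concept.
HAS_AMENITY = [
    ("listing-1", "Hot Tub"),
    ("listing-2", "Wifi"),
    ("listing-3", "Sauna"),
    ("listing-3", "Wifi"),
]


def descendants(concept, edges):
    """Return the concept plus every concept below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def listings_with_any(concept, taxonomy_edges, amenity_edges):
    """Listings that have the concept itself or any of its descendants."""
    wanted = descendants(concept, taxonomy_edges)
    return sorted({listing for listing, amenity in amenity_edges if amenity in wanted})


print(listings_with_any("Wellness Amenity", IS_A_TYPE_OF, HAS_AMENITY))
# ['listing-1', 'listing-3']
```

The query never names "Hot Tub" or "Sauna" directly. It expands the parent concept through the hierarchy first, which is what lets a new wellness amenity show up in results as soon as it is added to the taxonomy.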
Keeping the Taxonomy Clean: Fighting Fragmentation
But here’s the twist: as teams and features multiply, so do the chances of duplication and confusion. (Is “Eco Lodging” the same as “Sustainable Stays”?)
A healthy taxonomy is non-negotiable. Airbnb tracks:
- Semantic Duplication Rate
- Orphaned Node Rate
- Legacy Rate
- Cross-Team Overlap
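The article does not define these metrics, so the following Python sketch only illustrates how two of them might be computed. The definitions (an orphan is a node with no edges; a cluster of k duplicates contributes k - 1 redundant nodes) and the sample data are assumptions.

```python
def orphaned_node_rate(nodes, edges):
    """Share of nodes that appear in no edge (assumed definition)."""
    connected = {node for edge in edges for node in edge}
    orphans = [node for node in nodes if node not in connected]
    return len(orphans) / len(nodes) if nodes else 0.0


def semantic_duplication_rate(nodes, duplicate_clusters):
    """Share of nodes that are redundant copies (assumed definition).

    A cluster of k equivalent nodes contributes k - 1 redundant nodes.
    """
    redundant = sum(len(cluster) - 1 for cluster in duplicate_clusters)
    return redundant / len(nodes) if nodes else 0.0


# Hypothetical taxonomy: 8 nodes, 4 (child, parent) edges, 1 duplicate cluster.
nodes = [
    "Amenity", "Hot Tub", "Sauna",
    "Stay Type", "Eco Lodging", "Sustainable Stays",
    "Fax Machine", "Pager",
]
edges = [
    ("Hot Tub", "Amenity"),
    ("Sauna", "Amenity"),
    ("Eco Lodging", "Stay Type"),
    ("Sustainable Stays", "Stay Type"),
]
clusters = [{"Eco Lodging", "Sustainable Stays"}]

print(orphaned_node_rate(nodes, edges))            # 0.25
print(semantic_duplication_rate(nodes, clusters))  # 0.125
```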
AI to the Rescue: Smarter Semantic Consolidation
To keep things tidy, Airbnb uses an AI-powered cleanup crew:
- Semantic Embeddings: Every node gets mapped in semantic space (using OpenAI Ada, Sentence-BERT, etc.).
- Clustering: Algorithms group together concepts that mean the same thing, even if named differently.
- LLM Validation: Large Language Models confirm duplicates and suggest canonical names.
Real Example:
- “Pet-Friendly Stays” & “Dog-Friendly Accommodations” → 0.85 similarity, 92% data overlap → likely merge.
- “Remote Work Retreats” vs. “Work from Anywhere” → Only 17% overlap → probably not duplicates.
Even better, Airbnb cross-checks the real-world data beneath each node: if 90% of listings overlap, merge away. If overlap is low, flag it for human review.
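The decision logic described above could be sketched in Python as follows. This is an illustration, not Airbnb's implementation: the overlap definition (shared listings divided by the smaller node's listing count), the 0.8 similarity threshold, and the function names are all assumptions. Only the 90% merge threshold and the 0.85 / 92% figures come from the text.

```python
def listing_overlap(a, b):
    """Share of the smaller node's listings that also sit under the other node.

    This overlap-coefficient definition is an assumption; the article does
    not say how "data overlap" is computed.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def consolidation_action(similarity, overlap, sim_threshold=0.8, merge_overlap=0.9):
    """Map the two signals to an action. The similarity threshold is illustrative."""
    if similarity < sim_threshold:
        return "keep_separate"   # the concepts do not look like duplicates
    if overlap >= merge_overlap:
        return "merge"           # similar meaning and largely shared listings
    return "human_review"        # similar meaning but different listings


# The Pet-Friendly pair from the example: 0.85 similarity, 92% overlap.
print(consolidation_action(0.85, 0.92))  # merge
```

Keeping the two signals separate makes the failure mode visible: a pair can look alike in embedding space and still describe different inventory, which is the case routed to human review.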
Engineering Highlights
- Graph Construction: Airbnb’s engineering team built automated pipelines to extract, transform, and link data from multiple sources into a unified graph.
- Contextualization: Instead of serving isolated facts, the system provides contextual knowledge—answering not just “what” but “why” and “how” different pieces of data relate.
- Retrieval at Scale: By using efficient retrieval algorithms, Airbnb can serve knowledge-based queries quickly, supporting both user-facing features and internal tools.
Impact
- For Employees: Improved internal knowledge retrieval accelerates product development and operational decision-making.
- For Guests: Contextual recommendations and richer search experiences lead to better trip planning and more personalized stays.
Why Taxonomy Matters
Taxonomies create a shared language and consistent structure for everything Airbnb offers, from amenities (like “pool” or “wifi”) to property types (“villa”, “apartment”), to experiences and beyond. By defining clear categories and relationships, Airbnb’s taxonomy allows the company to:
- Standardize Data: Ensures data consistency across listings, search, recommendations, and internal tools.
- Enable Discovery: Powers advanced search filters, helping guests find exactly what they want.
- Drive Product Insights: Provides a framework for analytics, trend spotting, and product development.
Taxonomy in the Knowledge Graph
Airbnb’s engineering team integrates the taxonomy directly into the knowledge graph:
- Nodes represent concepts in the taxonomy, such as “Hot Tub” (an amenity) or “Pet-Friendly”.
- Edges model relationships, like “has_amenity” or “is_a_type_of”, capturing how concepts connect.
- Contextual Metadata: Each taxonomy node can include multilingual labels, definitions, and usage context—making the data rich and adaptable.
This approach supports hierarchical queries (e.g., “Show me all listings with any wellness-related amenity”) and provides a backbone for contextualizing all Airbnb data.
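As a rough illustration of the contextual metadata a node can carry, here is a small Python sketch. The class, its field names, the node ID format, and the sample labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """A taxonomy concept with its metadata (field names are hypothetical)."""
    node_id: str
    labels: dict = field(default_factory=dict)  # language code -> display label
    definition: str = ""

    def label(self, lang, fallback="en"):
        """Return the label for `lang`, falling back to the default language,
        then to the node ID if no label exists at all."""
        return self.labels.get(lang) or self.labels.get(fallback, self.node_id)


hot_tub = TaxonomyNode(
    node_id="amenity/hot_tub",
    labels={"en": "Hot Tub", "fr": "Jacuzzi"},
    definition="A heated tub for soaking.",
)

print(hot_tub.label("fr"))  # Jacuzzi
print(hot_tub.label("de"))  # Hot Tub
```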
Scaling and Evolving
Maintaining and growing the taxonomy is a continuous process:
- Automated Pipelines ensure the graph stays up to date as new concepts are added.
- Versioning & Governance allow Airbnb to update or retire categories safely, minimizing downstream impact.
- Collaboration: Teams across Airbnb—from engineering to product to customer support—contribute to and benefit from a robust, shared taxonomy.
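One way to retire a category without breaking downstream references is to leave a redirect behind. The Python sketch below assumes that design; the `TaxonomyRegistry` class, its methods, and the version counter are hypothetical and are not described in the source.

```python
class TaxonomyRegistry:
    """Tracks active concepts and redirects for retired ones (hypothetical design)."""

    def __init__(self, concepts):
        self.active = set(concepts)
        self.redirects = {}  # retired concept -> its replacement
        self.version = 1

    def retire(self, old, replacement):
        """Retire `old` and point existing references at `replacement`."""
        if old == replacement:
            raise ValueError("a concept cannot replace itself")
        if old not in self.active or replacement not in self.active:
            raise ValueError("both concepts must be active")
        self.active.remove(old)
        self.redirects[old] = replacement
        self.version += 1  # every change to the taxonomy bumps the version

    def resolve(self, concept):
        """Follow redirects until an active concept is reached."""
        while concept in self.redirects:
            concept = self.redirects[concept]
        if concept not in self.active:
            raise KeyError(concept)
        return concept


registry = TaxonomyRegistry(["Pet-Friendly Stays", "Dog-Friendly Accommodations"])
registry.retire("Dog-Friendly Accommodations", "Pet-Friendly Stays")
print(registry.resolve("Dog-Friendly Accommodations"))  # Pet-Friendly Stays
```

Older data tagged with the retired name still resolves to a live concept, so consumers can migrate on their own schedule.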
Why Ontology Health Matters
A fragmented taxonomy leads to:
- Redundant or confusing search results (“Eco Lodging” vs. “Sustainable Stays”)
- Missed connections between related concepts
- Inconsistent product analytics
- Slower onboarding for new teams and projects
Quantifying the problem: Airbnb now tracks metrics like Semantic Duplication Rate, Orphaned Node Rate, Legacy Rate, and Cross-Team Overlap to understand and improve taxonomy health.
AI-Driven Semantic Consolidation
To combat fragmentation, Airbnb is piloting an AI-guided approach:
- Semantic Embeddings: Each taxonomy node is represented as a vector, capturing its meaning and context (using models like OpenAI Ada or Sentence-BERT).
- Clustering & Similarity: Unsupervised clustering algorithms group nodes with similar meanings—even when the names are different.
- LLM Validation: Large Language Models (LLMs) review candidate clusters, confirm true duplicates, and suggest canonical names or definitions.
Example:
- “Pet-Friendly Stays” and “Dog-Friendly Accommodations” have 0.85 embedding similarity and 92% data overlap (listings in both). These can be merged or aliased.
- “Remote Work Retreats” vs. “Work from Anywhere”: Only 17% data overlap, so these should probably stay separate.
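The embedding and clustering steps can be sketched in Python as follows. The source does not name a clustering algorithm, so this uses a simple threshold-based single-linkage grouping as a stand-in. The three-dimensional vectors are toy values, not real model embeddings, and the 0.8 threshold is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_by_similarity(embeddings, threshold=0.8):
    """Single-linkage grouping: link any two nodes whose cosine similarity
    meets the threshold, then return the groups with more than one member."""
    names = list(embeddings)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Toy 3-d vectors standing in for real embeddings of each node's label.
toy = {
    "Pet-Friendly Stays": [0.9, 0.1, 0.0],
    "Dog-Friendly Accommodations": [0.8, 0.3, 0.1],
    "Hot Tub": [0.0, 0.2, 0.9],
}

print(cluster_by_similarity(toy))
# [['Dog-Friendly Accommodations', 'Pet-Friendly Stays']]
```

Each resulting group is only a candidate set. In the workflow described here, it would still go through LLM validation and the data overlap check before anything is merged.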
Data-Level Validation
Beyond language, the team checks if nodes actually share downstream data (listings, experiences, tags). High overlap = strong merge candidate. Low overlap, even with similar names, signals potential differences needing human review: https://www.airbnb.com/release

Conclusion
Airbnb’s investment in knowledge graphs and a living taxonomy is helping the company scale not just data storage but real-world understanding, connecting people to the information they need, when they need it. The result? Faster decisions, smarter recommendations, and a seamless experience for guests and employees alike. In a world that’s only getting more complex, this is how Airbnb keeps knowledge accessible, actionable, and always evolving.