
The Global Search Problem Nobody Talks About (And How I Finally Solved It)

Enterprise knowledge is scattered everywhere: Confluence, Git repos, Google Docs, PDFs, random wikis. You need information, but good luck finding it. I got tired of this, so I built docsearch-mcp, an MCP server that turns any AI assistant into a search engine for all your docs. This article walks through why vector databases and semantic search matter, how chunking strategies affect your results, and why this is bigger than just search (think onboarding, architecture context, and making your LLM actually useful).

Introduction

Picture this: It’s 2 a.m., you’re debugging a production issue, and you need to find that one architecture decision document someone wrote six months ago. Was it in Confluence? Git? A PDF in Slack? A Google Doc? Who the hell knows.

You try Confluence search. Nothing. You grep through repos. Still nothing. You ping three people on Slack. Two are asleep, one sends you the wrong doc.

This isn’t a productivity problem. It’s a knowledge architecture problem. And every company has it.

Most places “solve” this by having better documentation discipline (lol) or by building some janky internal search tool that takes six months and dies the moment the engineer who built it leaves. I tried both. Neither worked.

So I did what any reasonable person would do: I built a tool that solves this once and for all. Not just for me, but for anyone who’s ever rage-typed “site:confluence.company.com” into Google.

That tool is docsearch-mcp.

The Messy Reality of Enterprise Knowledge

Here’s what your knowledge looks like in the real world:

Your product specs live in Confluence. Your engineering decisions are in markdown files scattered across five Git repos. Your onboarding docs are in Google Drive. Your compliance stuff is in PDFs that someone emailed around in 2019. Your API documentation is in Swagger, but half of it’s outdated. And your tribal knowledge? That’s in Slack threads that nobody bookmarked.

When a new engineer joins, they spend two weeks just figuring out where stuff is. When you’re trying to make a decision, you waste hours hunting down context that you know exists somewhere. And when you’re building features, you end up reinventing solutions that someone already documented, because finding that doc is harder than just doing it again.

The obvious answer is “just put everything in one place.” Cool. Which place? And who’s going to migrate 47 Confluence spaces, 300 Git repos, and 10,000 Google Docs? Also, good luck getting everyone to agree on one tool.

So you’re stuck. Different teams use different tools. Your knowledge graph looks like a spider web drawn by a drunk spider. And search? LOL. Confluence search finds everything except what you want. Git grep only works if you remember the exact phrase. Google Docs search is… let’s not even talk about it.

This is the global search problem. And it’s not going away by telling people to “organize better.”

The scale of this problem is real. Airbnb’s engineering team documented their struggles with “knowledge cacophony, where teams only read and trust research that they themselves created” as their organization grew.[¹] And it’s not just them. Research shows enterprise search only yields a 6% success rate in providing relevant results on the first try.[²]

RAG: The Theoretical Solution That’s a Pain to Build

If you know anything about modern AI, you’ve heard of RAG (Retrieval-Augmented Generation). The idea is simple: instead of relying on an LLM’s training data, you give it a search engine for your docs. The LLM retrieves relevant info, then generates an answer.

RAG solves the global search problem perfectly. In theory.

In practice? You need to build an entire pipeline. And RAG has exploded in popularity: more than 1,200 RAG-related papers appeared on arXiv in 2024 alone, compared to fewer than 100 the previous year.[³] That explosion reflects both the promise and the complexity of the approach.

Here’s what building RAG looks like:

You start by collecting all your data. That means writing connectors for Confluence, Git, Google Drive, your wiki, your internal APIs, and whatever else. Then you chunk it, which means splitting documents into smaller pieces so each embedding captures one focused idea instead of an entire document. Then you embed those chunks, which means running them through a model that turns text into vectors. Then you ingest those vectors into a vector database like Pinecone, Weaviate, or Qdrant. Then you build a query interface. Then you hook that up to an LLM. Then you tune your retrieval to balance precision and recall. Then you handle updates, because your docs change constantly.
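
To make those steps concrete, here is a minimal TypeScript sketch of the pieces you end up owning. Every name in it (Connector, Chunker, Embedder, VectorStore, ingest, answer) is illustrative rather than taken from any particular library:

// The moving parts you own when you build RAG yourself.
// All names here are illustrative, not from any particular library.
interface Connector { fetchDocuments(): Promise<string[]> }          // Confluence, Git, Drive, ...
interface Chunker { split(doc: string): string[] }                   // fixed-size, recursive, semantic, ...
interface Embedder { embed(texts: string[]): Promise<number[][]> }   // OpenAI, local model, ...
interface VectorStore {                                              // Pinecone, Weaviate, Qdrant, pgvector, ...
  upsert(chunks: { text: string; embedding: number[] }[]): Promise<void>;
  query(embedding: number[], topK: number): Promise<string[]>;
}

// Ingestion: pull, chunk, embed, store. Repeat for every source, forever.
async function ingest(connectors: Connector[], chunker: Chunker, embedder: Embedder, store: VectorStore) {
  for (const connector of connectors) {
    for (const doc of await connector.fetchDocuments()) {
      const chunks = chunker.split(doc);
      const embeddings = await embedder.embed(chunks);
      await store.upsert(chunks.map((text, i) => ({ text, embedding: embeddings[i] })));
    }
  }
}

// Query: embed the question, fetch the top chunks, hand them to the LLM.
async function answer(question: string, embedder: Embedder, store: VectorStore,
                      llm: (prompt: string) => Promise<string>) {
  const [queryEmbedding] = await embedder.embed([question]);
  const context = await store.query(queryEmbedding, 5);
  return llm(`Answer using only this context:\n${context.join("\n---\n")}\n\nQuestion: ${question}`);
}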

And that’s just the MVP. You haven’t even dealt with access control, deduplication, metadata filtering, or any of the hundred other things that make RAG actually useful in production.

Most teams look at this and say “nah, we’ll just keep using Confluence search.” And I get it. Building RAG from scratch is a multi-month project. It’s a whole system.

But the problem is real. And ignoring it doesn’t make it go away.

The Simpler Way: docsearch-mcp

So here’s what I did. Instead of building yet another internal tool that only works at one company, I built a tool that anyone can use: docsearch-mcp.

It’s an MCP server (Model Context Protocol, the open standard that lets AI assistants talk to external tools[⁴]). Think of MCP like USB-C for AI applications: instead of maintaining separate connectors for each data source, you build against a standard protocol. You point it at your docs, it ingests them into a local SQLite or PostgreSQL database with vector embeddings, and boom. Now Claude Code, Cursor, or any other MCP client can search your entire knowledge base.

You don’t write connectors. I already wrote them. Confluence, local files, PDFs, images (with AI-powered descriptions). You don’t build a vector database. It uses SQLite locally or PostgreSQL with the pgvector extension. You don’t write a search interface. The MCP server exposes it. You don’t integrate it with your LLM. MCP does that automatically.

Setup looks like this:

npm install -g docsearch-mcp
echo "OPENAI_API_KEY=your-key" > .env
echo "FILE_ROOTS=." >> .env

Add it to your Claude Code config:

{
  "mcpServers": {
    "docsearch": {
      "command": "npx",
      "args": ["docsearch-mcp", "start"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "FILE_ROOTS": ".,../other-project"
      }
    }
  }
}

Run the ingestion:

docsearch ingest all

And that’s it. Your AI assistant can now search everything.

No multi-month project. No dedicated team. No reinventing RAG. Just install, configure, done.

How It Actually Works: Vector Databases and Semantic Search

Let’s talk about what’s happening under the hood, because this is where it gets interesting.

Traditional search works with keywords. You type “authentication bug,” it finds docs with those exact words. But what if the doc says “login issue” instead? Or “auth failure”? You miss it.

Semantic search solves this by understanding meaning, not just words. It converts your query into a vector (a list of numbers representing the meaning), then finds documents with similar vectors. So “authentication bug” matches “login issue” because they mean the same thing, even though the words are different.
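
Here is a small sketch of that idea, assuming OpenAI’s embeddings API (the text-embedding-3-small model used later in this post) and plain cosine similarity. It is illustrative, not docsearch-mcp’s internals:

import OpenAI from "openai";

const openai = new OpenAI(); // expects OPENAI_API_KEY in the environment

// Plain cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function main() {
  // Embed three phrases in one API call.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: ["authentication bug", "login issue", "quarterly revenue report"],
  });
  const [bug, login, revenue] = data.map(d => d.embedding);
  console.log("authentication bug vs login issue:    ", cosine(bug, login).toFixed(3));
  console.log("authentication bug vs revenue report: ", cosine(bug, revenue).toFixed(3));
  // The first pair scores noticeably higher: same meaning, different words.
}

main().catch(console.error);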

Google has been incorporating semantic search into their systems since 2015 with innovations like RankBrain and neural matching,[⁵] and the technology has come a long way since then.

Here’s how docsearch-mcp does it:

First, it chunks your documents. A 50-page PDF gets split into smaller pieces, because embedding an entire doc produces a single unfocused vector that blurs many topics together. Each chunk is small enough to have a clear topic, but big enough to carry context.

Then it embeds each chunk using OpenAI’s text-embedding-3-small model (or any compatible embedding API). This creates a 1536-dimensional vector for each chunk. That vector captures the semantic meaning of the text.

Those vectors go into a database. The SQLite backend keeps the index in a local file; the PostgreSQL backend uses the pgvector extension, which gives you fast approximate nearest neighbor search through HNSW or IVFFlat indexes.

HNSW (Hierarchical Navigable Small World) offers logarithmic search time that scales well even with massive datasets,[⁶] while IVFFlat provides a more memory-efficient alternative with faster build times. For a 1M vector dataset, HNSW delivers 40.5 queries per second compared to IVFFlat’s 2.6, but HNSW uses 729MB of memory versus IVFFlat’s 257MB.[⁷]
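
For the PostgreSQL path, the storage and query side looks roughly like the sketch below. The schema, table name, and index choice are assumptions for illustration (using pgvector and node-postgres), not docsearch-mcp’s actual schema:

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* environment variables

// Illustrative schema: one row per chunk, with its source and 1536-dim embedding.
async function setup() {
  await pool.query("CREATE EXTENSION IF NOT EXISTS vector");
  await pool.query(`CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    source text,
    content text,
    embedding vector(1536)
  )`);
  // HNSW index: faster queries, more memory. Swap in ivfflat if memory is tight.
  await pool.query(
    "CREATE INDEX IF NOT EXISTS chunks_embedding_idx ON chunks USING hnsw (embedding vector_cosine_ops)"
  );
}

// Top-K nearest neighbors by cosine distance (pgvector's <=> operator).
async function topK(queryEmbedding: number[], k = 5) {
  const { rows } = await pool.query(
    `SELECT source, content, 1 - (embedding <=> $1) AS similarity
       FROM chunks
      ORDER BY embedding <=> $1
      LIMIT $2`,
    [JSON.stringify(queryEmbedding), k] // pgvector accepts the '[0.1, 0.2, ...]' text form
  );
  return rows;
}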

When you search, your query gets embedded the same way. The database finds the top K chunks with the most similar vectors. Those chunks get returned to the LLM, which uses them to generate an answer.

The magic is in the hybrid approach. docsearch-mcp doesn’t just do vector search. It combines full-text search (FTS) with vector similarity. This means you get exact keyword matches when they matter, and semantic matches when keywords fail. Best of both worlds.
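
One common way to merge a keyword result list and a vector result list is reciprocal rank fusion. Here is a small sketch of the idea; docsearch-mcp’s exact scoring may differ:

// Merge an FTS result list and a vector result list with reciprocal rank
// fusion: each document scores the sum over lists of 1 / (K + rank).
function reciprocalRankFusion(ftsIds: string[], vectorIds: string[], K = 60): string[] {
  const scores = new Map<string, number>();
  for (const ids of [ftsIds, vectorIds]) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (K + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

The constant K (conventionally around 60) keeps any single list from dominating the combined ranking.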

Chunking Strategies: Why Size Matters

Chunking is one of those things that seems trivial until you actually try it. Split your docs into pieces. Easy, right?

Not really. Chunk too small, and you lose context. Chunk too large, and your embeddings become too generic. Chunk inconsistently, and some queries work great while others fail mysteriously.

Here are the main strategies:

  • Fixed-size chunking is the simplest: You split every N tokens (say, 512). It’s fast and predictable, but it breaks in the middle of sentences, paragraphs, or ideas. Your chunks lose coherence.

  • Sentence-based chunking splits at sentence boundaries: Better than fixed-size, but sentences vary wildly in length. You might get a 5-word chunk next to a 200-word chunk. And short chunks often lack context.

  • Paragraph-based chunking uses natural document structure: Works great for articles and docs. Doesn’t work at all for code, logs, or unstructured text.

  • Recursive chunking splits intelligently: It tries to split at paragraphs, then sentences, then words, until it hits the target size. This is what LangChain does by default. It’s pretty good for mixed content.

  • Semantic chunking uses embeddings to detect topic shifts: When the meaning changes, you split. This creates chunks that are topically coherent, which improves retrieval quality. But it’s slower and more complex.

As Pinecone’s chunking strategies guide explains: “If the chunk of text makes sense without the surrounding context to a human, it will make sense to the language model as well.”[⁸] This principle guides all good chunking decisions.

docsearch-mcp uses a hybrid approach. For code, it chunks by functions and classes. For docs, it uses recursive splitting with paragraph awareness. For PDFs, it splits by pages but respects sentence boundaries. The goal is context preservation, not just uniform size.
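
To see what recursive splitting amounts to, here is a compact sketch: try coarse separators (paragraphs) first and fall back to finer ones only when a piece is still too big. It is illustrative, not the tool’s actual splitter:

// Recursive splitting: try the coarsest separator first (paragraphs), and only
// fall back to finer ones (lines, sentences, words) when a piece is still too big.
function recursiveSplit(text: string, maxChars = 1500, separators = ["\n\n", "\n", ". ", " "]): string[] {
  if (text.length <= maxChars) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // Nothing left to split on: hard-cut into fixed-size pieces.
    const out: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) out.push(text.slice(i, i + maxChars));
    return out;
  }
  const chunks: string[] = [];
  let current = "";
  for (const part of text.split(sep)) {
    const candidate = current ? current + sep + part : part;
    if (candidate.length <= maxChars) {
      current = candidate;                // keep packing parts into the current chunk
    } else {
      if (current) chunks.push(current);  // flush what we have
      if (part.length > maxChars) {
        chunks.push(...recursiveSplit(part, maxChars, rest)); // recurse with finer separators
        current = "";
      } else {
        current = part;                   // start a new chunk with this part
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}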

And here’s the thing: chunking affects everything downstream. Bad chunks mean bad embeddings, which mean bad search results, which mean your LLM gives you irrelevant answers. Getting this right is the difference between “this tool is amazing” and “why does it keep missing obvious stuff?”

Beyond Search: Real Use Cases That Actually Matter

Okay, so you have semantic search over all your docs. Cool. But why does this matter beyond saving a few minutes when you’re hunting for stuff?

Onboarding new engineers: Instead of handing them a 50-page onboarding doc (which they won’t read), they ask Claude: “How do we handle authentication?” Claude searches your architecture docs, your code, your runbooks, and your Confluence pages. It pulls together a coherent answer with links to the source material. The new engineer gets context, not just links. And they learn by asking, not by reading endless docs.

This mirrors what Airbnb built with their knowledge graph to “surface the relevant parts of information at the exact time you’re looking for it.”[⁹] Except you don’t need a dedicated team to build it.

Global search for existing employees: You’re designing a new feature. You ask Claude: “Have we built anything like this before?” It searches across old PRs, design docs, Confluence pages, and Slack exports (if you ingest those). Turns out, yeah, someone built something similar two years ago. You avoid reinventing the wheel. You learn from past decisions. You ship faster.

Local docs for your current project: You’re working on a microservice. You want to know how it handles retries. Instead of grepping through code or reading outdated comments, you ask Claude. It searches the local repo, finds the relevant functions, and explains how it works. It’s like having a senior engineer on call, except it’s instant and it never gets annoyed.

Giving your LLM context about best practices: You ask Claude to write a new API endpoint. Without context, it might use patterns that don’t match your codebase. But with docsearch-mcp, it searches your internal style guides, your architecture decision records (ADRs), your code examples. It generates code that actually fits your system. Less review churn, fewer bugs, faster velocity.

Architecture and decision history: Why did we choose PostgreSQL over DynamoDB? Why did we split this service? Why is this API designed this way? These answers are buried in docs, PRs, and Slack threads. docsearch-mcp finds them. You get institutional knowledge without having to track down the person who made the decision (who probably left six months ago).

The pattern here is clear: docsearch-mcp turns your scattered, siloed knowledge into something your AI assistant can actually use. And that makes the AI assistant way more useful. It’s not just autocomplete anymore. It’s a teammate with access to your entire knowledge base.

How It Fits Into Your Workflow (Without Breaking Anything)

The best tools are the ones you don’t have to think about. They just work. docsearch-mcp is designed to be invisible until you need it.

You’re coding in Claude Code or Cursor. You ask a question. Behind the scenes, the MCP server searches your indexed docs. It returns relevant chunks. The LLM uses those chunks to answer your question. You get an answer with sources. You keep working.

No context switching. No opening browser tabs. No hunting through Confluence. The knowledge is just… there.

And because it’s local-first (SQLite by default), the index never leaves your machine unless you explicitly point it at PostgreSQL on a remote server. Your API keys, your docs, and your search index stay on your laptop; the only data that goes out is the text sent to whichever embedding API you configure. No extra infrastructure to stand up. Just install and go.

For teams, you can run it with PostgreSQL and share the index. Everyone gets the same search results. When someone updates a doc, you re-ingest and everyone benefits. It scales to millions of documents without breaking a sweat.

And if you want to use it outside of Claude? There’s a CLI:

docsearch search "how do we handle retries" --output json

You get JSON with the top results, including content, metadata, and similarity scores. Pipe it into your own tools. Build your own workflows. It’s flexible.
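
As an example, a small Node script can shell out to the CLI and parse that JSON. The shape of the results (field names and so on) is an assumption here; check the README for the actual schema:

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Shell out to the docsearch CLI and parse its JSON output.
// The result fields (content, score, etc.) are assumed for illustration.
async function searchDocs(query: string): Promise<unknown[]> {
  const { stdout } = await run("docsearch", ["search", query, "--output", "json"]);
  return JSON.parse(stdout);
}

searchDocs("how do we handle retries").then(results => console.log(results));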

The Stuff I Learned Building This

Building docsearch-mcp taught me a bunch of things I wish I’d known earlier.

Embedding models matter. I tested three different models. text-embedding-3-small was the sweet spot: cheap, fast, good quality. text-embedding-3-large was overkill for most use cases. Older models like ada-002 were noticeably worse at semantic matching.

The numbers back this up: text-embedding-3-large improved MIRACL scores from 31.4% to 54.9% compared to ada-002,[¹⁰] a massive leap in multilingual semantic understanding. But for most use cases, the smaller model hits the right balance of performance and cost.

Hybrid search beats pure vector search. I started with just vector similarity. It worked great for abstract queries (“explain our auth system”), but terrible for exact matches (“find the config file”). Adding FTS fixed that. Now it handles both.

Chunking is harder than it looks. My first version used fixed-size chunks. It sucked. Context got lost. I switched to recursive splitting, and quality improved immediately. Then I added per-file-type logic (code chunks differently than docs), and it got even better.

Metadata is underrated. Returning just the chunk isn’t enough. You need file path, line numbers, page numbers, source URLs, last modified date. Without metadata, users can’t verify results or navigate to the source. With metadata, they trust it more.

Real-time updates are a pain but necessary. Docs change constantly. If your index is stale, people stop using it. I added file watching with automatic re-indexing. Now it stays up to date without manual intervention.
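
The general pattern is simple: watch the file tree and debounce re-index calls per file. Here is a rough sketch using chokidar, where reindex() is a stand-in for whatever ingestion step your pipeline exposes:

import { watch } from "chokidar";

// Watch a directory tree and debounce re-indexing per file, so a burst of
// saves triggers a single re-index. reindex() is a hypothetical hook.
function watchAndReindex(root: string, reindex: (path: string) => Promise<void>) {
  const pending = new Map<string, NodeJS.Timeout>();

  watch(root, { ignoreInitial: true }).on("all", (_event, filePath) => {
    clearTimeout(pending.get(filePath));
    pending.set(filePath, setTimeout(() => {
      pending.delete(filePath);
      reindex(filePath).catch(err => console.error("re-index failed:", err));
    }, 500));
  });
}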

Local-first is a feature, not a limitation. I almost started with a cloud-based design. Then I realized: engineers want to run this on their laptop. They want their data local. SQLite was perfect. PostgreSQL is there if you need scale, but most people don’t.

MCP is a game-changer. Before MCP, I would’ve built a custom Claude plugin or API. MCP gives me interop with any MCP client for free. It’s like writing to a standard interface instead of building for one vendor. Huge unlock.

Conclusion

The global search problem is real. Your knowledge is scattered. Your tools don’t talk to each other. And building a proper RAG pipeline is way harder than it should be.

docsearch-mcp fixes this. It’s a local-first, MCP-compatible search system that works with any AI assistant. It handles ingestion, chunking, embedding, indexing, and search. You get semantic search over all your docs without building RAG from scratch.

And it’s not just about search. It’s about making your AI assistant actually useful. Onboarding, architecture context, best practices, decision history. All of it becomes accessible, instantly, while you’re coding.

If you’ve ever spent 20 minutes hunting for a doc you know exists, or if you’ve ever wished your LLM knew about your internal systems, give it a shot. It’s open source, it’s easy to set up, and it just works.

Check it out at github.com/patrickkoss/docsearch-mcp.

Stop searching. Start finding.

References

[¹]: Sharma, C., & Overgoor, J. (2020). “Scaling Knowledge at Airbnb.” The Airbnb Tech Blog. https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091

[²]: Norton, A. (2021). “Knowledge Management and Enterprise Search are still largely unsolved problems.” Medium. https://medium.com/@angusnorton/knowledge-management-and-enterprise-search-are-still-largely-unsolved-problems-68d9c1e03698

[³]: “The Rise and Evolution of RAG in 2024: A Year in Review.” (2024). RAGFlow Blog. https://ragflow.io/blog/the-rise-and-evolution-of-rag-in-2024-a-year-in-review

[⁴]: Anthropic. (2024). “Introducing the Model Context Protocol.” Anthropic News. https://www.anthropic.com/news/model-context-protocol

[⁵]: Google Developers. (2024). “Improve Gen AI Search with Vertex AI Embeddings and Task Types.” Google Cloud Blog. https://cloud.google.com/blog/products/ai-machine-learning/improve-gen-ai-search-with-vertex-ai-embeddings-and-task-types

[⁶]: “Hierarchical navigable small world.” Wikipedia. https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world

[⁷]: Singh, B. (2024). “PGVector: HNSW vs IVFFlat — A Comprehensive Study.” Medium. https://medium.com/@bavalpreetsinghh/pgvector-hnsw-vs-ivfflat-a-comprehensive-study-21ce0aaab931

[⁸]: Pinecone. (2024). “Chunking Strategies for LLM Applications.” Pinecone Learning Center. https://www.pinecone.io/learn/chunking-strategies/

[⁹]: Chang, S. (2024). “Scaling Knowledge Access and Retrieval at Airbnb.” The Airbnb Tech Blog. https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95

[¹⁰]: Li, L. (2024). “Embedding Model Comparison: text-embedding-ada-002 vs. text-embedding-3-large Across Different Dimensions.” Medium. https://medium.com/@lilianli1922/embedding-model-comparison-text-embedding-ada-002-vs-a618116575a6
