What Are Embeddings and Vector Databases?
Embeddings are numerical representations of text (or images) that capture meaning. Similar content produces similar vectors — numbers that are close together in a high-dimensional space. That property enables semantic search: retrieving content by what it means, not just by which words it contains. Vector databases store these vectors and retrieve the nearest neighbors quickly. They are the storage layer for RAG, recommendations, and similarity search.
Think of embeddings as GPS coordinates for meaning. "Dog" and "puppy" are close; "dog" and "tax return" are far. The database finds what is near your query.
Embeddings: Turning Meaning Into Numbers
An embedding model converts text into a vector — a list of hundreds or thousands of numbers. The model is trained so that:
- Semantically similar text has similar vectors
- The distance between vectors reflects semantic distance
You do not need to understand the math. The important idea: meaning maps to position. "Find documents about project timelines" will match docs that discuss "deadlines," "schedules," and "milestones" even if those exact words do not appear, because their embeddings are close.
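A common way to measure how close two embeddings are is cosine similarity. A minimal sketch, using tiny 4-dimensional vectors with made-up values (real models output hundreds or thousands of dimensions):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; the values are illustrative, not model output.
dog   = [0.8, 0.3, 0.1, 0.0]
puppy = [0.7, 0.4, 0.2, 0.0]
tax   = [0.0, 0.1, 0.2, 0.9]

print(cosine_similarity(dog, puppy))  # high: similar meaning
print(cosine_similarity(dog, tax))    # low: unrelated
```

The exact numbers do not matter; what matters is that related concepts score higher than unrelated ones.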
Why This Matters
Traditional search matches keywords. If you search "car repair," you get documents containing "car" and "repair." You miss "automotive maintenance" or "vehicle servicing" unless the system has synonyms. Semantic search finds by meaning: it retrieves what is conceptually relevant, even with different wording.
That enables:
- RAG — Retrieve relevant document chunks for a question, then generate an answer
- Recommendations — "Find items similar to this one"
- Deduplication — Detect near-duplicate content
- Clustering — Group similar items
Vector Databases
A vector database is built for one job: store vectors and return the k nearest neighbors to a query vector, fast. Relational databases can do this with extensions (e.g., pgvector), but purpose-built systems (Pinecone, Weaviate, Qdrant, Chroma) are optimized for scale and speed.
How it works — You ingest documents, embed them, and insert vectors. At query time, you embed the query and ask for the top k similar vectors. The database returns the closest matches. You use those to fetch the original text or metadata.
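The insert-then-query loop can be sketched with an in-memory toy store. This is illustrative only: real vector databases use approximate-nearest-neighbor indexes rather than scanning every vector, and the document names and vectors below are invented.

```python
import heapq
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

class TinyVectorStore:
    """Brute-force store: insert (id, vector) pairs, query top-k by cosine."""

    def __init__(self):
        self.items = []

    def insert(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def query(self, vector, k=3):
        scored = [(cosine(vector, v), doc_id) for doc_id, v in self.items]
        return heapq.nlargest(k, scored)  # highest-similarity matches first

store = TinyVectorStore()
store.insert("deadlines.md", [0.9, 0.1, 0.0])
store.insert("recipes.md",   [0.0, 0.2, 0.9])
store.insert("schedule.md",  [0.8, 0.3, 0.1])
print(store.query([0.85, 0.2, 0.05], k=2))  # the two timeline-related docs
```

The returned IDs are then used to look up the original text or metadata, exactly as described above.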
Key features — Filtering (e.g., by date or category), hybrid search (combine vector and keyword), and metadata storage. Different databases trade off latency, scale, and features.
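Metadata filtering typically restricts the candidate set before ranking by similarity. A minimal sketch, assuming invented record shapes and field names (`id`, `year`):

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical records: (vector, metadata) — contents are illustrative.
records = [
    ([0.9, 0.1], {"id": "q3-plan", "year": 2024}),
    ([0.8, 0.2], {"id": "q1-plan", "year": 2023}),
    ([0.1, 0.9], {"id": "recipes", "year": 2024}),
]

def filtered_search(query_vec, k, predicate):
    """Apply the metadata filter first, then rank the survivors by similarity."""
    candidates = [(cosine(query_vec, v), m) for v, m in records if predicate(m)]
    return sorted(candidates, key=lambda t: t[0], reverse=True)[:k]

# Only 2024 documents are eligible, however similar the older ones are.
hits = filtered_search([1.0, 0.0], k=1, predicate=lambda m: m["year"] == 2024)
print(hits[0][1]["id"])  # "q3-plan"
```

Purpose-built databases push this filtering into the index itself; the sketch only shows the logical behavior.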
The RAG Connection
RAG depends on embeddings and vector databases. Documents are chunked, embedded, and stored. A user question is embedded and used to retrieve relevant chunks. Those chunks are passed to the LLM as context. Without embeddings, you would need keyword search — less accurate for natural language questions. Without a vector database, similarity search at scale would be slow.
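The chunk-embed-retrieve-prompt flow can be sketched end to end. The embedder below is a stand-in that counts words from a fixed vocabulary; a real pipeline would call an embedding model instead, and the chunks and vocabulary are invented for illustration.

```python
from math import sqrt

# Stub embedder: counts a fixed vocabulary. Replace with a real model call.
VOCAB = ["deadline", "schedule", "budget", "recipe"]

def embed(text):
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a)) or 1.0  # guard against zero vectors
    nb = sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# 1. Chunk and embed documents; store each vector alongside its text.
chunks = [
    "the project deadline moved to friday",
    "the schedule has three milestones",
    "a recipe for lentil soup",
]
index = [(embed(c), c) for c in chunks]

# 2. Embed the question, retrieve the best chunk, build the LLM context.
question = "when is the deadline"
q_vec = embed(question)
best = max(index, key=lambda item: cosine(q_vec, item[0]))
prompt = f"Context: {best[1]}\n\nQuestion: {question}"
print(prompt)
```

The `prompt` string is what gets sent to the LLM; in practice you would retrieve several chunks, not one, and concatenate them into the context.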
When You Need a Vector Database
You need one when:
- You run RAG over more than a few hundred documents
- You need semantic search or recommendations
- You have thousands or millions of vectors
- Latency matters (sub-100ms retrieval)
You might not need one when:
- Keyword search is sufficient
- Your corpus is tiny (embedding and searching in memory can work)
- You use a managed RAG product that includes storage
Tools in the Hokai Directory
The >Model Directory includes embedding providers (OpenAI, Cohere, open-source) and vector databases (Pinecone, Weaviate, Qdrant, Chroma). Filter by "embeddings," "vector database," or "RAG" to find options. Many AI platforms bundle embeddings and vector storage, so you may not need separate tools.
The Bottom Line
Embeddings turn text into vectors that capture meaning. Vector databases store and retrieve them for semantic search. Together they power RAG, recommendations, and similarity applications. If you are building a knowledge base or search system, embeddings and a vector store are usually core components.
Related Reading
- >What Is RAG? — How embeddings enable retrieval-augmented generation
- >Build a Knowledge Base with AI — End-to-end setup
- >What Is a Foundation Model? — Embedding models vs. LLMs