LlamaIndex#
A high-performance LlamaIndex integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LlamaIndex RAG applications.
Features#
🎯 LlamaIndex Native
Full VectorStore API compliance
Direct query interface for low-level control
Seamless VectorStoreIndex integration
Maximal Marginal Relevance (MMR) search
🚀 High Performance
Rust-powered vector database backend
Advanced HNSW indexing for sub-millisecond search
Product Quantization for 4x-256x memory compression
Concurrent search with automatic parallelization
🏢 Enterprise Ready
Complete persistence with state preservation
Advanced metadata filtering
Structured logging with performance monitoring
Async/await support for non-blocking operations
Quick Start#
Installation#
pip install llama-index-vector-stores-zeusdb
Getting Started#
This example uses OpenAIEmbeddings, which requires an OpenAI API key - Get your OpenAI API key here
If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.).
pip install llama-index-embeddings-openai llama-index-llms-openai
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
Basic Usage#
from llama_index.core import VectorStoreIndex, Document, StorageContext
from llama_index.vector_stores.zeusdb import ZeusDBVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
# Set up embedding model and LLM
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-5")
# Create ZeusDB vector store
vector_store = ZeusDBVectorStore(
dim=1536, # OpenAI embedding dimension
distance="cosine",
index_type="hnsw"
)
# Create storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Create documents
documents = [
Document(text="ZeusDB is a high-performance vector database."),
Document(text="LlamaIndex provides RAG capabilities."),
Document(text="Vector search enables semantic similarity.")
]
# Create index and store documents
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is ZeusDB?")
print(response)
Expected results:
ZeusDB is a high-performance vector database.
Advanced Features#
ZeusDB’s enterprise-grade capabilities are fully integrated into the LlamaIndex ecosystem, providing quantization, persistence, advanced search features and many other enterprise capabilities.
All examples below have been tested and verified to work as-is. Copy-paste and run them directly.
Direct Query Interface#
Execute low-level queries against the vector store:
from llama_index.core.vector_stores.types import VectorStoreQuery
# Create query
embed_model = Settings.embed_model
query_embedding = embed_model.get_text_embedding("machine learning")
query_obj = VectorStoreQuery(
query_embedding=query_embedding,
similarity_top_k=2
)
# Execute query
results = vector_store.query(query_obj)
# Results contain IDs and similarities
print(f"Found {len(results.ids or [])} results:")
for node_id, similarity in zip(results.ids or [], results.similarities or []):
print(f" ID: {node_id}, Similarity: {similarity:.4f}")
Expected results:
Found 2 results:
ID: d40de654-67b6-4621-a950-9b29a0742089, Similarity: 0.6730
ID: 984cf00d-8595-425e-b8a3-3aeaa96207f8, Similarity: 0.7777
MMR Search for Diversity#
MMR (Maximal Marginal Relevance) balances relevance and diversity, reducing near-duplicate results:
# MMR search via direct query
mmr_results = vector_store.query(
query_obj,
mmr=True,
fetch_k=10,
mmr_lambda=0.7 # 0.0=max diversity, 1.0=pure relevance
)
print(f"MMR Results: {len(mmr_results.ids or [])} items (with diversity)")
Expected results:
MMR Results: 2 items (with diversity)
Note: MMR re-ranking happens adapter-side after initial search. The adapter automatically enables return_vector=True when MMR is requested. Control with fetch_k (candidate pool size) and mmr_lambda (0.0=max diversity, 1.0=pure relevance).
Search with Metadata Filtering#
Filter results using document metadata:
from llama_index.core.vector_stores.types import (
MetadataFilters,
FilterOperator,
FilterCondition
)
# Create a fresh vector store for this example
vector_store = ZeusDBVectorStore(dim=1536, distance="cosine")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Create documents with metadata
documents_with_meta = [
Document(
text="Python is great for data science",
metadata={"category": "tech", "year": 2024}
),
Document(
text="JavaScript is for web development",
metadata={"category": "tech", "year": 2023}
),
Document(
text="Climate change impacts ecosystems",
metadata={"category": "environment", "year": 2024}
),
]
# Build index with metadata
index = VectorStoreIndex.from_documents(
documents_with_meta,
storage_context=storage_context
)
# Create metadata filter
filters = MetadataFilters.from_dicts([
{"key": "category", "value": "tech", "operator": FilterOperator.EQ},
{"key": "year", "value": 2024, "operator": FilterOperator.GTE}
], condition=FilterCondition.AND)
# Use the retriever with filters (recommended approach)
retriever = index.as_retriever(similarity_top_k=5, filters=filters)
filtered_results = retriever.retrieve("programming")
# Process results
for r in filtered_results:
print(f"- {r.node.get_content(metadata_mode='none')}")
print(f" Metadata: {r.node.metadata}\n")
Expected results:
- Python is great for data science
Metadata: {'category': 'tech', 'year': 2024}
Supported operators: EQ, NE, GT, GTE, LT, LTE, IN, NIN, ANY, ALL, CONTAINS, TEXT_MATCH, TEXT_MATCH_INSENSITIVE, IS_EMPTY
Note: Combine conditions with AND only. OR and NOT conditions are not yet supported by the ZeusDB backend.
Persistence#
Save and load indexes with complete state restoration:
# Save index
save_path = "my_index.zdb"
vector_store.save_index(save_path)
print(f"âś… Index saved to {save_path}")
print(f" Vector count: {vector_store.get_vector_count()}")
# Load index
loaded_store = ZeusDBVectorStore.load_index(save_path)
print(f"âś… Index loaded from {save_path}")
print(f" Vector count: {loaded_store.get_vector_count()}")
Expected results:
âś… Index saved to my_index.zdb
Vector count: 3
âś… Index loaded from my_index.zdb
Vector count: 3
What gets saved:
Vectors & IDs
Metadata
HNSW graph structure
Quantization config, centroids, and training state (if PQ is enabled)
Notes:
The path is a directory, not a single file. Ensure the target is writable.
Saved indexes are cross-platform and include format/version info for compatibility checks.
If you used PQ, both the compression model and state are preserved—no need to retrain after loading.
Memory-Efficient Setup with Quantization#
For large datasets, use Product Quantization to reduce memory usage:
# Create quantized vector store for memory efficiency
quantization_config = {
'type': 'pq',
'subvectors': 8,
'bits': 8,
'training_size': 1000,
'storage_mode': 'quantized_only'
}
vector_store = ZeusDBVectorStore(
dim=1536,
distance="cosine",
index_type="hnsw",
quantization_config=quantization_config
)
# Check quantization status
print(f"Is quantized: {vector_store.is_quantized()}")
print(f"Can use quantization: {vector_store.can_use_quantization()}")
print(f"Training progress: {vector_store.get_training_progress():.1f}%")
print(f"Storage mode: {vector_store.get_storage_mode()}")
Expected results:
Is quantized: False
Can use quantization: False
Training progress: 0.0%
Storage mode: raw_collecting_for_training
Please refer to our documentation for helpful configuration guidelines and recommendations for setting up quantization.
Delete Operations#
Delete vectors by node ID (standalone example):
from llama_index.core import VectorStoreIndex, Document, StorageContext
from llama_index.vector_stores.zeusdb import ZeusDBVectorStore
# Create a fresh vector store for this example
delete_vs = ZeusDBVectorStore(dim=1536, distance="cosine")
delete_sc = StorageContext.from_defaults(vector_store=delete_vs)
# Create documents
delete_docs = [
Document(text=f"Document {i}", metadata={"doc_id": i})
for i in range(5)
]
# Build index
delete_index = VectorStoreIndex.from_documents(
delete_docs,
storage_context=delete_sc
)
print(f"Before delete: {delete_vs.get_vector_count()} vectors")
# Get node IDs to delete
retriever = delete_index.as_retriever(similarity_top_k=10)
results = retriever.retrieve("document")
if results:
# Extract node IDs from results
node_ids_to_delete = [result.node.node_id for result in results[:2]]
print(f"Deleting node IDs: {node_ids_to_delete[0][:8]}...")
# Delete by node IDs
delete_vs.delete_nodes(node_ids=node_ids_to_delete)
print(f"After delete: {delete_vs.get_vector_count()} vectors")
print("âś… delete_nodes(node_ids=[...]) works!")
# Demonstrate unsupported delete by ref_doc_id
try:
delete_vs.delete(ref_doc_id="doc_1")
print("❌ Should have raised NotImplementedError")
except NotImplementedError as e:
print("❌ delete(ref_doc_id='...') raises NotImplementedError")
print(f" (This is expected - not supported by backend)")
Expected results:
Before delete: 5 vectors
Deleting node IDs: 3b1843da...
After delete: 3 vectors
âś… delete_nodes(node_ids=[...]) works!
❌ delete(ref_doc_id='...') raises NotImplementedError
(This is expected - not supported by backend)
Async Support#
ZeusDB supports asynchronous operations for non-blocking, concurrent vector operations.
When to use async: web servers (FastAPI/Starlette), agents/pipelines doing parallel searches, or notebooks where you want non-blocking/concurrent operations. For simple scripts, sync methods are fine.
Available async methods:
await vector_store.async_add(nodes)- Add nodes asynchronouslyawait vector_store.aquery(query_obj)- Query asynchronouslyawait vector_store.adelete_nodes(node_ids)- Delete by node IDs asynchronouslyawait vector_store.aclear()- Clear the index asynchronously
All async methods follow the LlamaIndex standard naming convention and are thread-offloaded via asyncio.to_thread() for non-blocking operations.
Example:
import asyncio
from llama_index.core.schema import TextNode
# In Jupyter, use nest_asyncio to handle event loops
try:
import nest_asyncio
nest_asyncio.apply()
except ImportError:
pass
async def async_operations():
# Create nodes
nodes = [
TextNode(text=f"Document {i}", metadata={"doc_id": i})
for i in range(10)
]
# Generate embeddings (required before adding)
embed_model = Settings.embed_model
for node in nodes:
node.embedding = embed_model.get_text_embedding(node.text)
# Add nodes asynchronously
node_ids = await vector_store.async_add(nodes)
print(f"Added {len(node_ids)} nodes")
# Query asynchronously
query_embedding = embed_model.get_text_embedding("document")
query_obj = VectorStoreQuery(
query_embedding=query_embedding,
similarity_top_k=3
)
results = await vector_store.aquery(query_obj)
print(f"Found {len(results.ids or [])} results")
# Delete asynchronously
await vector_store.adelete_nodes(node_ids=node_ids[:2])
print(f"Deleted 2 nodes, {vector_store.get_vector_count()} remaining")
# Run async function
await async_operations() # In Jupyter
# asyncio.run(async_operations()) # In regular Python scripts
Expected results:
Added 10 nodes
Found 3 results
Deleted 2 nodes, 8 remaining
Monitoring and Observability#
Performance Monitoring#
# Get index statistics
stats = vector_store.get_zeusdb_stats()
print(f"Key stats: vectors={stats.get('total_vectors')}, space={stats.get('space')}")
# Get vector count
count = vector_store.get_vector_count()
print(f"Vector count: {count}")
# Get detailed index info
info = vector_store.info()
print(f"Index info: {info}")
# Check quantization status
if vector_store.is_quantized():
progress = vector_store.get_training_progress()
quant_info = vector_store.get_quantization_info()
print(f"Quantization: {progress:.1f}% complete")
print(f"Compression: {quant_info['compression_ratio']:.1f}x")
else:
print("Index is not quantized")
Expected results:
Key stats: vectors=3, space=cosine
Vector count: 3
Index info: HNSWIndex(dim=1536, space=cosine, m=16, ef_construction=200, expected_size=10000, vectors=3, quantization=none)
Is quantized: False
Enterprise Logging#
ZeusDB includes enterprise-grade structured logging that works automatically with smart environment detection:
Environment-Aware Logging:
Development: Human-readable logs, WARNING level
Production: JSON structured logs, ERROR level
Testing: Minimal output, CRITICAL level
Jupyter: Clean readable logs, INFO level
Example log output from operations:
{"operation":"add_vectors","requested":10,"inserted":10,"duration_ms":45}
Control logging with environment variables:
ZEUSDB_LOG_LEVEL=debug ZEUSDB_LOG_FORMAT=json python your_app.py
To learn more about the full features of ZeusDB’s enterprise logging capabilities, see the documentation.
Configuration Options#
Vector Store Parameters#
vector_store = ZeusDBVectorStore(
dim=1536, # Vector dimension (required)
distance="cosine", # Distance metric: cosine, l2, l1
index_type="hnsw", # Index algorithm
m=16, # HNSW connectivity
ef_construction=200, # Build-time search width
expected_size=10000, # Expected number of vectors
quantization_config=None # Optional quantization
)
Query Parameters#
results = vector_store.query(
query_obj,
mmr=False, # Enable MMR
fetch_k=20, # MMR candidate pool size
mmr_lambda=0.7, # MMR diversity control
ef_search=None, # Runtime search width (auto if None)
auto_fallback=True # Retry with broader search if needed
)
Error Handling#
The integration includes comprehensive error handling:
try:
results = vector_store.query(query_obj)
print(results)
except Exception as e:
# Graceful degradation with logging
print(f"Query failed: {e}")
# Fallback logic here
Requirements#
Python: 3.10 or higher
ZeusDB: 0.0.8 or higher
LlamaIndex Core: 0.14.4 or higher
Installation from Source#
git clone https://github.com/zeusdb/llama-index-vector-stores-zeusdb.git
cd llama-index-vector-stores-zeusdb
pip install -e .
Compatibility#
Distance Metrics#
Cosine: Default, normalized similarity
Euclidean (L2): Geometric distance
Manhattan (L1): City-block distance
Embedding Models#
Compatible with any LlamaIndex embedding provider:
OpenAI (
text-embedding-3-small,text-embedding-3-large)Hugging Face Transformers
Cohere Embeddings
Custom embedding functions
Known Limitations#
Query Results: The
query()method returnsidsandsimilaritiesonly (withnodes=None). Use the returned IDs to retrieve full node content if needed.Delete Operations: Only deletion by node ID is supported (
delete_nodes(node_ids=[...])). Deletion byref_doc_idor metadata filters is not supported.
Making vector search fast, scalable, and developer-friendly.