Product Quantization#

Product Quantization (PQ) is a vector compression technique that significantly reduces memory usage while preserving high search accuracy. Commonly used in HNSW-based vector databases, PQ works by dividing each vector into subvectors and quantizing them independently. This enables compression ratios of 4× to 256×, making it ideal for large-scale, high-dimensional datasets.

ZeusDB Vector Database’s PQ implementation features:

✅ Intelligent Training – PQ model trains automatically at defined thresholds

✅ Efficient Memory Use – Store 4× to 256× more vectors in the same RAM footprint

✅ Fast Approximate Search – Uses Asymmetric Distance Computation (ADC) for high-speed search computation

✅ Seamless Operation – Index automatically switches from raw to quantized storage modes


Quantization Configuration Parameters

type : str, required

Quantization algorithm type. Currently only supports "pq" for Product Quantization.

subvectors : int, default 8

Number of vector subspaces (must divide dimension evenly). The vector is split into this many subvectors for independent quantization. Valid range: 1 to dimension.

bits : int, default 8

Bits per quantized code (controls centroids per subvector). Determines the number of centroids as 2^bits. Higher values provide better accuracy but use more memory. Valid range: 1-8.

training_size : int, default 1000

Minimum vectors needed for stable k-means clustering during quantization training. Must be ≥ 1000 for reliable centroid estimation.

max_training_vectors : int, default None

Maximum vectors used during training (optional limit). When specified, limits the number of vectors used for training even if more are available. Must be ≥ training_size.

storage_mode : str, default “quantized_only”

Storage strategy for vectors. Options:

  • "quantized_only" - Memory optimized, stores only quantized vectors

  • "quantized_with_raw" - Keep both quantized and raw vectors for exact reconstruction


🗜️ Usage Example 1

from zeusdb_vector_database import VectorDatabase
import numpy as np

# Create index with product quantization
vdb = VectorDatabase()

# Configure quantization for memory efficiency
quantization_config = {
    'type': 'pq',                  # `pq` for Product Quantization
    'subvectors': 8,               # Divide 1536-dim vectors into 8 subvectors of 192 dims each
    'bits': 8,                     # 256 centroids per subvector (2^8)
    'training_size': 10000,        # Train when 10k vectors are collected
    'max_training_vectors': 50000  # Use max 50k vectors for training
}

# Create index with quantization
# This will automatically handle training when enough vectors are added
index = vdb.create(
    index_type="hnsw",
    dim=1536,                                  # OpenAI `text-embedding-3-small` dimension
    quantization_config=quantization_config    # Add the compression configuration
)

# Add vectors - training triggers automatically at threshold
documents = [
    {
        "id": f"doc_{i}", 
        "values": np.random.rand(1536).astype(float).tolist(),
        "metadata": {"category": "tech", "year": 2026}
    }
    for i in range(15000) 
]

# Training will trigger automatically when 10k vectors are added
result = index.add(documents)
print(f"Added {result.total_inserted} vectors")

# Check quantization status
print(f"Training progress: {index.get_training_progress():.1f}%")
print(f"Storage mode: {index.get_storage_mode()}")
print(f"Is quantized: {index.is_quantized()}")

# Get compression statistics
quant_info = index.get_quantization_info()
if quant_info:
    print(f"Compression ratio: {quant_info['compression_ratio']:.1f}x")
    print(f"Memory usage: {quant_info['memory_mb']:.1f} MB")

# Search works seamlessly with quantized storage
query_vector = np.random.rand(1536).astype(float).tolist()
results = index.search(vector=query_vector, top_k=3)

# Simply print raw results
print(results)

Results

[
{'id': 'doc_9719', 'score': 0.5133496522903442, 'metadata': {'category': 'tech', 'year': 2026}},
{'id': 'doc_8148', 'score': 0.5139288306236267, 'metadata': {'category': 'tech', 'year': 2026}}, 
{'id': 'doc_7822', 'score': 0.5151920914649963, 'metadata': {'category': 'tech', 'year': 2026}}, 
]

🗜️Usage Example 2 - with explicit storage mode

from zeusdb_vector_database import VectorDatabase
import numpy as np

# Create index with product quantization
vdb = VectorDatabase()

# Configure quantization for memory efficiency
quantization_config = {
    'type': 'pq',                  # `pq` for Product Quantization
    'subvectors': 8,               # Divide 1536-dim vectors into 8 subvectors of 192 dims each
    'bits': 8,                     # 256 centroids per subvector (2^8)
    'training_size': 10000,        # Train when 10k vectors are collected
    'max_training_vectors': 50000,  # Use max 50k vectors for training
    'storage_mode': 'quantized_only'  # Explicitly set storage mode to only keep quantized values
}

# Create index with quantization
# This will automatically handle training when enough vectors are added
index = vdb.create(
    index_type="hnsw",
    dim=3072,                                  # OpenAI `text-embedding-3-large` dimension
    quantization_config=quantization_config    # Add the compression configuration
)


⚙️ Configuration Guidelines#

For Balanced Memory & Accuracy (Recommended to start with)

quantization_config = {
    'type': 'pq',
    'subvectors': 8,      # Balanced: moderate compression, good accuracy
    'bits': 8,            # 256 centroids per subvector (high precision)
    'training_size': 10000,  # Or higher for large datasets
    'storage_mode': 'quantized_only'  # Default, memory efficient
}
# Achieves ~16x–32x compression with strong recall for most applications

For Memory Optimization:

quantization_config = {
    'type': 'pq',
    'subvectors': 16,      # More subvectors = better compression
    'bits': 6,             # Fewer bits = less memory per centroid
    'training_size': 20000,
    'storage_mode': 'quantized_only'
}
# Achieves ~32x compression ratio

For Accuracy Optimization:

quantization_config = {
    'type': 'pq',
    'subvectors': 4,       # Fewer subvectors = better accuracy
    'bits': 8,             # More bits = more precise quantization
    'training_size': 50000 # More training data = better centroids
    'storage_mode': 'quantized_with_raw'  # Keep raw vectors for exact recall
}
# Achieves ~4x compression ratio with minimal accuracy loss

📊 Performance Characteristics#

  • Training: Occurs once when threshold is reached (typically 1-5 minutes for 50k vectors)

  • Memory Reduction: 4x-256x depending on configuration

  • Search Speed: Comparable or faster than raw vectors due to ADC optimization

  • Accuracy Impact: Typically 1-5% recall reduction with proper tuning

Quantization is ideal for production deployments with large vector datasets (100k+ vectors) where memory efficiency is critical.

"quantized_only" is recommended for most use cases and maximizes memory savings.

"quantized_with_raw" keeps both quantized and raw vectors for exact reconstruction, but uses more memory.