tercul-backend/docs/BLEVE_INTEGRATION.md
Damir Mukimov 0f25c8645c
Add Bleve search integration with hybrid search capabilities
- Add Bleve client for keyword search functionality
- Integrate Bleve service into application builder
- Add BleveIndexPath configuration
- Update domain mappings for proper indexing
- Add comprehensive documentation and tests
2025-11-27 03:40:48 +01:00


Bleve Search Integration

Overview

Bleve is an embedded full-text search library that provides keyword and exact-match search capabilities. It complements Weaviate's vector/semantic search with traditional text-based search.

Architecture

Package Structure

backend/
├── pkg/search/bleve/           # Bleve client wrapper
│   ├── bleveclient.go          # Core Bleve functionality
│   └── bleveclient_test.go     # Tests
├── internal/platform/search/   # Platform initialization
│   ├── bleve_client.go         # Bleve init/shutdown
│   └── weaviate_client.go      # Weaviate init
└── internal/app/search/        # Application services
    ├── bleve_service.go        # Translation search service
    └── service.go              # Weaviate indexing service

Configuration

Environment variable: BLEVE_INDEX_PATH (default: ./data/bleve_index)

Added to internal/platform/config/config.go:

BleveIndexPath string
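Here is a minimal sketch of how this field might be populated. It reads BLEVE_INDEX_PATH and falls back to the documented default. The function names bleveIndexPathFrom and BleveIndexPathFromEnv are hypothetical, and the real config.go may load settings differently. The environment lookup is passed in so the fallback can be tested without touching the process environment.

```go
package main

import "os"

// defaultBleveIndexPath mirrors the documented default for BLEVE_INDEX_PATH.
const defaultBleveIndexPath = "./data/bleve_index"

// bleveIndexPathFrom (hypothetical) resolves the index path using the given
// lookup function; an unset or empty variable falls back to the default.
func bleveIndexPathFrom(lookup func(string) (string, bool)) string {
	if v, ok := lookup("BLEVE_INDEX_PATH"); ok && v != "" {
		return v
	}
	return defaultBleveIndexPath
}

// BleveIndexPathFromEnv (hypothetical) reads the real process environment.
func BleveIndexPathFromEnv() string {
	return bleveIndexPathFrom(os.LookupEnv)
}
```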

Initialization Flow

  1. ApplicationBuilder.BuildBleve() - Called during app startup
  2. platform/search.InitBleve() - Creates/opens Bleve index
  3. Global platform/search.BleveClient available to services

Application Layer

Service: BleveSearchService in internal/app/search/bleve_service.go

Interface:

type BleveSearchService interface {
    IndexTranslation(ctx context.Context, translation domain.Translation) error
    IndexAllTranslations(ctx context.Context) error
    SearchTranslations(ctx context.Context, query string, filters map[string]string, limit int) ([]TranslationSearchResult, error)
}

Access: Available via Application.BleveSearch

Features

Indexing

  • Single Translation: IndexTranslation() - Index one translation
  • Bulk Indexing: IndexAllTranslations() - Index all translations from the database
  • Batch Processing: Automatically batches in chunks of 50,000 for performance
  • Multi-field indexing: Indexes title, content, description, language, status, etc.

Search

  • Full-text search: Fuzzy matching with configurable fuzziness (Levenshtein edit distance; default: 2)
  • Filtered search: Combine keyword search with field filters
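One way to express a fuzzy, filtered search is Bleve's query-string syntax. In that syntax, +field:value marks a required clause and term~N marks a fuzzy term within edit distance N. The sketch below assumes this approach. The actual service may instead compose query objects, for example a conjunction of match and term queries. buildQueryString and escapeQS are hypothetical names, and this version requires every keyword to match (AND semantics).

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// maxFuzziness: Bleve's fuzzy searcher rejects edit distances above 2.
const maxFuzziness = 2

// escapeQS backslash-escapes characters that Bleve's query-string syntax
// treats as operators, so user input is matched literally.
func escapeQS(s string) string {
	const special = `+-=&|><!(){}[]^"~*?:\/ `
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(special, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

// buildQueryString (hypothetical) turns filters and keywords into one query
// string. Filters become required "+field:value" clauses, and each keyword
// becomes a required fuzzy term "+word~N".
func buildQueryString(query string, filters map[string]string, fuzziness int) string {
	if fuzziness > maxFuzziness {
		fuzziness = maxFuzziness
	}
	keys := make([]string, 0, len(filters))
	for k := range filters {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output despite random map iteration order

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, "+"+k+":"+escapeQS(filters[k]))
	}
	for _, word := range strings.Fields(query) {
		term := "+" + escapeQS(word)
		if fuzziness > 0 {
			term += "~" + strconv.Itoa(fuzziness)
		}
		parts = append(parts, term)
	}
	return strings.Join(parts, " ")
}
```

For the filtered example in this document, the sketch yields `+language:en +status:published +shakespeare~2`. That string could then be wrapped with bleve.NewQueryStringQuery and passed to bleve.NewSearchRequestOptions(q, limit, 0, false).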

Indexed Fields

doc := map[string]interface{}{
    "id":                 translation.ID,
    "title":              translation.Title,
    "content":            translation.Content,
    "description":        translation.Description,
    "language":           translation.Language,
    "status":             translation.Status,
    "translatable_id":    translation.TranslatableID,
    "translatable_type":  translation.TranslatableType,
    "translator_id":      translation.TranslatorID,
}
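Bleve document IDs are strings, so the numeric translation ID has to be converted before indexing. This sketch assumes the service passes the map above to bleve.Index's `Index(id string, data interface{}) error` method. Translation is a hypothetical stand-in for domain.Translation, and the real field types (TranslatorID, for instance) may differ. indexTranslation and indexFunc are illustrative names.

```go
package main

import "strconv"

// Translation is a hypothetical stand-in for domain.Translation.
type Translation struct {
	ID               uint
	Title            string
	Content          string
	Description      string
	Language         string
	Status           string
	TranslatableID   uint
	TranslatableType string
	TranslatorID     uint
}

// documentIndexer is the one method of bleve.Index this sketch needs.
type documentIndexer interface {
	Index(id string, data interface{}) error
}

// indexFunc adapts a plain function to documentIndexer (handy in tests).
type indexFunc func(id string, data interface{}) error

func (f indexFunc) Index(id string, data interface{}) error { return f(id, data) }

// indexTranslation (hypothetical) builds the field map and indexes it under
// the decimal string form of the translation ID.
func indexTranslation(idx documentIndexer, t Translation) error {
	doc := map[string]interface{}{
		"id":                t.ID,
		"title":             t.Title,
		"content":           t.Content,
		"description":       t.Description,
		"language":          t.Language,
		"status":            t.Status,
		"translatable_id":   t.TranslatableID,
		"translatable_type": t.TranslatableType,
		"translator_id":     t.TranslatorID,
	}
	return idx.Index(strconv.FormatUint(uint64(t.ID), 10), doc)
}
```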

Usage Examples

Indexing a Translation

err := app.BleveSearch.IndexTranslation(ctx, translation)

Searching Translations

// Simple keyword search
results, err := app.BleveSearch.SearchTranslations(ctx, "poetry", nil, 10)

// Search with filters
filters := map[string]string{
    "language": "en",
    "status":   "published",
}
results, err = app.BleveSearch.SearchTranslations(ctx, "shakespeare", filters, 20)

Search Results

type TranslationSearchResult struct {
    ID               uint
    Score            float64  // Relevance score
    Title            string
    Content          string
    Language         string
    TranslatableID   uint
    TranslatableType string
}
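Bleve returns each hit as three parts: a document ID string, a score, and a map of stored field values. The field map is only populated when the search request lists fields to load (e.g. req.Fields = []string{"*"}), and numeric values in it arrive as float64. The sketch below shows how a hit could be mapped onto TranslationSearchResult. The hit struct is a hypothetical mirror of the ID, Score and Fields members of Bleve's search.DocumentMatch, and toResult is an illustrative name.

```go
package main

import (
	"fmt"
	"strconv"
)

type TranslationSearchResult struct {
	ID               uint
	Score            float64 // Relevance score
	Title            string
	Content          string
	Language         string
	TranslatableID   uint
	TranslatableType string
}

// hit mirrors the three members of Bleve's search.DocumentMatch used here.
type hit struct {
	ID     string
	Score  float64
	Fields map[string]interface{}
}

// toResult (hypothetical) parses the string document ID back into a uint and
// copies stored fields; missing or mistyped fields become zero values.
func toResult(h hit) (TranslationSearchResult, error) {
	id, err := strconv.ParseUint(h.ID, 10, 64)
	if err != nil {
		return TranslationSearchResult{}, fmt.Errorf("non-numeric document ID %q: %w", h.ID, err)
	}
	str := func(name string) string { s, _ := h.Fields[name].(string); return s }
	num := func(name string) uint { f, _ := h.Fields[name].(float64); return uint(f) }
	return TranslationSearchResult{
		ID:               uint(id),
		Score:            h.Score,
		Title:            str("title"),
		Content:          str("content"),
		Language:         str("language"),
		TranslatableID:   num("translatable_id"),
		TranslatableType: str("translatable_type"),
	}, nil
}
```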

Search Strategy: Bleve vs Weaviate

Use Bleve for:

  • Exact keyword matching - Find specific words or phrases
  • Language-filtered search - Search within specific language translations
  • Status-based queries - Filter by draft/published/reviewing status
  • Translator-specific search - Find translations by specific translator
  • High-precision queries - When exact text matching is required

Use Weaviate for:

  • Semantic search - Find conceptually similar content
  • Multilingual search - Cross-language semantic matching
  • Context-aware search - Understanding meaning beyond keywords
  • Recommendation systems - "More like this" functionality

Combine both for hybrid search:

  1. Use Bleve for initial keyword filtering
  2. Use Weaviate for semantic reranking
  3. Merge results based on use case
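The document leaves the merge step open. One common choice is reciprocal rank fusion (RRF), swapped in here as an illustration rather than taken from the codebase. Each ID scores the sum of 1/(k + rank) over the result lists it appears in. Items ranked well by both Bleve and Weaviate therefore rise to the top, and their incompatible raw scores never need to be reconciled. k = 60 is a commonly used default, and fuseRRF is a hypothetical name.

```go
package main

import "sort"

// fuseRRF merges ranked ID lists (best first) with reciprocal rank fusion:
// score(id) = sum over lists of 1/(k + rank), where rank starts at 1.
func fuseRRF(k float64, lists ...[]uint) []uint {
	scores := map[uint]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]uint, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // tie-break by ID for deterministic output
	})
	return ids
}
```

For example, fusing Bleve's [1, 2, 3] with Weaviate's [3, 1, 4] at k = 60 ranks 1 first and 3 second, because both engines returned them.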

Performance Considerations

Index Size

  • Embedded on-disk index (BBolt backend)
  • Auto-managed by Bleve
  • Location: ./data/bleve_index/ (configurable)

Batch Operations

  • Batch size: 50,000 translations per commit
  • Reduces I/O overhead during bulk indexing
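Here is a sketch of the chunking step, assuming the translations to index are already in a slice. With Bleve, each chunk would typically be written by filling index.NewBatch() with batch.Index(id, doc) calls and then committing it with index.Batch(batch). In this sketch the commit is a callback, so the splitting logic stands alone. forEachBatch is a hypothetical name.

```go
package main

import "errors"

// batchSize mirrors the documented 50,000 translations per commit.
const batchSize = 50000

// forEachBatch calls commit with consecutive slices of at most size items
// (the last one may be shorter) and stops at the first commit error.
func forEachBatch[T any](items []T, size int, commit func([]T) error) error {
	if size <= 0 {
		return errors.New("batch size must be positive")
	}
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		if err := commit(items[start:end]); err != nil {
			return err
		}
	}
	return nil
}
```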

Memory Usage

  • In-memory caching handled by Bleve
  • Minimal application memory footprint

Maintenance

Reindexing

# Delete existing index
rm -rf ./data/bleve_index

# Restart application - index auto-recreates
# Or call IndexAllTranslations() programmatically

Monitoring

  • Check logs for "Bleve search client initialized successfully"
  • Index stats available via Bleve's Index.Stats() API

Future Enhancements

Potential Additions

  1. GraphQL Integration - Add search query/mutation
  2. Incremental Updates - Auto-index on translation create/update
  3. Advanced Analyzers - Language-specific tokenization
  4. Highlighting - Return matched text snippets
  5. Faceted Search - Aggregate by language, status, translator
  6. Pagination - Add cursor-based pagination for large result sets

Performance Optimizations

  1. Index Optimization - Periodic index compaction
  2. Read Replicas - Multiple read-only index instances
  3. Custom Mapping - Fine-tune field analyzers per use case

Dependencies

  • github.com/blevesearch/bleve/v2 v2.5.5
  • 23 additional Bleve sub-packages (auto-managed)
  • go.etcd.io/bbolt v1.4.0 (storage backend)

Documentation