# Bleve Search Integration

## Overview

Bleve is an embedded full-text search library that provides keyword and exact-match search capabilities. It complements Weaviate's vector/semantic search with traditional text-based search.

## Architecture

### Package Structure

```
backend/
├── pkg/search/bleve/              # Bleve client wrapper
│   ├── bleveclient.go             # Core Bleve functionality
│   └── bleveclient_test.go        # Tests
├── internal/platform/search/      # Platform initialization
│   ├── bleve_client.go            # Bleve init/shutdown
│   └── weaviate_client.go         # Weaviate init
└── internal/app/search/           # Application services
    ├── bleve_service.go           # Translation search service
    └── service.go                 # Weaviate indexing service
```

### Configuration

Environment variable: `BLEVE_INDEX_PATH` (default: `./data/bleve_index`)

Added to `internal/platform/config/config.go`:

```go
BleveIndexPath string
```

### Initialization Flow

1. `ApplicationBuilder.BuildBleve()` - Called during app startup
2. `platform/search.InitBleve()` - Creates/opens Bleve index
3. Global `platform/search.BleveClient` available to services

### Application Layer

**Service**: `BleveSearchService` in `internal/app/search/bleve_service.go`

**Interface**:

```go
type BleveSearchService interface {
	IndexTranslation(ctx context.Context, translation domain.Translation) error
	IndexAllTranslations(ctx context.Context) error
	SearchTranslations(ctx context.Context, query string, filters map[string]string, limit int) ([]TranslationSearchResult, error)
}
```

**Access**: Available via `Application.BleveSearch`

## Features

### Indexing

- **Single Translation**: `IndexTranslation()` - Index one translation
- **Bulk Indexing**: `IndexAllTranslations()` - Index all translations from DB
- **Batch Processing**: Automatically batches in chunks of 50,000 for performance

### Search

- **Full-text search**: Fuzzy matching with configurable fuzziness (default: 2)
- **Filtered search**: Combine keyword search with field filters
- **Multi-field indexing**: Indexes title, content, description, language, status, etc.

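
To make the fuzziness setting concrete: Bleve's fuzzy queries match indexed terms that lie within a given Levenshtein edit distance of the query term. The sketch below uses only the standard library to show what a fuzziness of 2 allows. It is an illustration of the concept, not Bleve's implementation or this project's code, and the helper names `editDistance` and `fuzzyMatch` are hypothetical.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// editDistance returns the Levenshtein distance between a and b: the
// minimum number of single-rune insertions, deletions, or substitutions
// needed to turn a into b. (Hypothetical helper, for illustration only.)
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev[j] holds the distance between the processed prefix of a and rb[:j].
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether indexed is within fuzziness edits of term.
func fuzzyMatch(term, indexed string, fuzziness int) bool {
	return editDistance(term, indexed) <= fuzziness
}

func main() {
	const fuzziness = 2 // the default named in this document
	query := "shakespeare"
	for _, indexed := range []string{"shakespeare", "shakespear", "shakspere", "shelley"} {
		fmt.Printf("%-12s distance=%d match=%t\n",
			indexed, editDistance(query, indexed), fuzzyMatch(query, indexed, fuzziness))
	}
}
```

With a fuzziness of 2, common misspellings such as `shakspere` still match, while unrelated terms such as `shelley` do not. Lowering the value tightens precision at the cost of recall.
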
### Indexed Fields

```go
// Document shape sent to the index for each translation.
// The variable name and map type are illustrative.
doc := map[string]interface{}{
	"id":                translation.ID,
	"title":             translation.Title,
	"content":           translation.Content,
	"description":       translation.Description,
	"language":          translation.Language,
	"status":            translation.Status,
	"translatable_id":   translation.TranslatableID,
	"translatable_type": translation.TranslatableType,
	"translator_id":     translation.TranslatorID,
}
```

## Usage Examples

### Indexing a Translation

```go
err := app.BleveSearch.IndexTranslation(ctx, translation)
```

### Searching Translations

```go
// Simple keyword search
results, err := app.BleveSearch.SearchTranslations(ctx, "poetry", nil, 10)

// Search with filters
filters := map[string]string{
	"language": "en",
	"status":   "published",
}
// results and err are already declared above, so assign with "=" rather than ":="
results, err = app.BleveSearch.SearchTranslations(ctx, "shakespeare", filters, 20)
```

### Search Results

```go
type TranslationSearchResult struct {
	ID               uint
	Score            float64 // Relevance score
	Title            string
	Content          string
	Language         string
	TranslatableID   uint
	TranslatableType string
}
```

## Search Strategy: Bleve vs Weaviate

### Use Bleve for:

- **Exact keyword matching** - Find specific words or phrases
- **Language-filtered search** - Search within specific language translations
- **Status-based queries** - Filter by draft/published/reviewing status
- **Translator-specific search** - Find translations by specific translator
- **High-precision queries** - When exact text matching is required

### Use Weaviate for:

- **Semantic search** - Find conceptually similar content
- **Multilingual search** - Cross-language semantic matching
- **Context-aware search** - Understanding meaning beyond keywords
- **Recommendation systems** - "More like this" functionality

### Hybrid Search

Combine both for optimal results:

1. Use Bleve for initial keyword filtering
2. Use Weaviate for semantic reranking
3. Merge results based on use case

## Performance Considerations

### Index Size

- Embedded on-disk index (BBolt backend)
- Auto-managed by Bleve
- Location: `./data/bleve_index/` (configurable)

### Batch Operations

- Batch size: 50,000 translations per commit
- Reduces I/O overhead during bulk indexing

### Memory Usage

- In-memory caching handled by Bleve
- Minimal application memory footprint

## Maintenance

### Reindexing

```bash
# Delete existing index
rm -rf ./data/bleve_index

# Restart application - index auto-recreates
# Or call IndexAllTranslations() programmatically
```

### Monitoring

- Check logs for "Bleve search client initialized successfully"
- Index stats available via Bleve's `Index.Stats()` API

## Future Enhancements

### Potential Additions

1. **GraphQL Integration** - Add search query/mutation
2. **Incremental Updates** - Auto-index on translation create/update
3. **Advanced Analyzers** - Language-specific tokenization
4. **Highlighting** - Return matched text snippets
5. **Faceted Search** - Aggregate by language, status, translator
6. **Pagination** - Add cursor-based pagination for large result sets

### Performance Optimizations

1. **Index Optimization** - Periodic index compaction
2. **Read Replicas** - Multiple read-only index instances
3. **Custom Mapping** - Fine-tune field analyzers per use case

## Dependencies

- `github.com/blevesearch/bleve/v2` v2.5.5
- 23 additional Bleve sub-packages (auto-managed)
- `go.etcd.io/bbolt` v1.4.0 (storage backend)

## Documentation

- [Bleve Documentation](http://blevesearch.com/)
- [Bleve GitHub](https://github.com/blevesearch/bleve)
- Backend implementation: See source files above