tercul-backend/docs/BLEVE_INTEGRATION.md
Damir Mukimov 0f25c8645c
Add Bleve search integration with hybrid search capabilities
- Add Bleve client for keyword search functionality
- Integrate Bleve service into application builder
- Add BleveIndexPath configuration
- Update domain mappings for proper indexing
- Add comprehensive documentation and tests
2025-11-27 03:40:48 +01:00


# Bleve Search Integration
## Overview
Bleve is an embedded full-text search library that provides keyword and exact-match search capabilities. It complements Weaviate's vector/semantic search with traditional text-based search.
## Architecture
### Package Structure
```
backend/
├── pkg/search/bleve/              # Bleve client wrapper
│   ├── bleveclient.go             # Core Bleve functionality
│   └── bleveclient_test.go        # Tests
├── internal/platform/search/      # Platform initialization
│   ├── bleve_client.go            # Bleve init/shutdown
│   └── weaviate_client.go         # Weaviate init
└── internal/app/search/           # Application services
    ├── bleve_service.go           # Translation search service
    └── service.go                 # Weaviate indexing service
```
### Configuration
The index location is read from the `BLEVE_INDEX_PATH` environment variable (default: `./data/bleve_index`).

The corresponding field was added to `internal/platform/config/config.go`:
```go
BleveIndexPath string
```
### Initialization Flow
1. `ApplicationBuilder.BuildBleve()` - Called during app startup
2. `platform/search.InitBleve()` - Creates/opens Bleve index
3. Global `platform/search.BleveClient` available to services
### Application Layer
**Service**: `BleveSearchService` in `internal/app/search/bleve_service.go`
**Interface**:
```go
type BleveSearchService interface {
	IndexTranslation(ctx context.Context, translation domain.Translation) error
	IndexAllTranslations(ctx context.Context) error
	SearchTranslations(ctx context.Context, query string, filters map[string]string, limit int) ([]TranslationSearchResult, error)
}
```
**Access**: Available via `Application.BleveSearch`
## Features
### Indexing
- **Single Translation**: `IndexTranslation()` - Index one translation
- **Bulk Indexing**: `IndexAllTranslations()` - Index all translations from DB
- **Batch Processing**: Automatically batches in chunks of 50,000 for performance
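
The batching step can be sketched independently of Bleve. The generic `chunk` helper below is a hypothetical illustration of splitting a large slice into consecutive groups of at most 50,000 items; in the real service each group would presumably be committed to the index in one batch.

```go
package main

import "fmt"

// chunk splits items into consecutive slices of at most size elements.
// The returned slices share the backing array of items.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	// Stand-in for translation IDs loaded from the database.
	ids := make([]uint, 120_000)
	for i, batch := range chunk(ids, 50_000) {
		// A real indexer would hand each batch to the index here.
		fmt.Printf("batch %d: %d items\n", i, len(batch))
	}
}
```

With 120,000 items this yields three batches of 50,000, 50,000, and 20,000 items.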
### Search
- **Full-text search**: Fuzzy matching with configurable fuzziness (default: 2, the maximum edit distance allowed per term)
- **Filtered search**: Combine keyword search with field filters
- **Multi-field indexing**: Indexes title, content, description, language, status, etc.
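
To make the fuzziness setting concrete: fuzziness is a maximum Levenshtein edit distance between the query term and an indexed term. The sketch below computes that distance directly. It only illustrates the measure and is not how Bleve evaluates fuzzy queries internally; `levenshtein` and `withinFuzziness` are hypothetical helper names.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// withinFuzziness reports whether term is within the given edit
// distance of an indexed term.
func withinFuzziness(term, indexed string, fuzziness int) bool {
	return levenshtein(term, indexed) <= fuzziness
}

func main() {
	fmt.Println(withinFuzziness("shakespere", "shakespeare", 2)) // one missing letter
	fmt.Println(withinFuzziness("kitten", "sitting", 2))         // distance 3, too far
}
```

A misspelling such as "shakespere" is one edit away from "shakespeare" and would match at fuzziness 2, while "kitten" and "sitting" are three edits apart and would not.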
### Indexed Fields
```go
// Fields written to the index for each translation.
// The map type and variable name are illustrative.
doc := map[string]interface{}{
	"id":                translation.ID,
	"title":             translation.Title,
	"content":           translation.Content,
	"description":       translation.Description,
	"language":          translation.Language,
	"status":            translation.Status,
	"translatable_id":   translation.TranslatableID,
	"translatable_type": translation.TranslatableType,
	"translator_id":     translation.TranslatorID,
}
```
## Usage Examples
### Indexing a Translation
```go
err := app.BleveSearch.IndexTranslation(ctx, translation)
```
### Searching Translations
```go
// Simple keyword search
results, err := app.BleveSearch.SearchTranslations(ctx, "poetry", nil, 10)

// Search with filters (reuses the variables declared above, hence "=")
filters := map[string]string{
	"language": "en",
	"status":   "published",
}
results, err = app.BleveSearch.SearchTranslations(ctx, "shakespeare", filters, 20)
```
### Search Results
```go
type TranslationSearchResult struct {
	ID               uint
	Score            float64 // Relevance score
	Title            string
	Content          string
	Language         string
	TranslatableID   uint
	TranslatableType string
}
```
## Search Strategy: Bleve vs Weaviate
### Use Bleve for:
- **Exact keyword matching** - Find specific words or phrases
- **Language-filtered search** - Search within specific language translations
- **Status-based queries** - Filter by draft/published/reviewing status
- **Translator-specific search** - Find translations by specific translator
- **High-precision queries** - When exact text matching is required
### Use Weaviate for:
- **Semantic search** - Find conceptually similar content
- **Multilingual search** - Cross-language semantic matching
- **Context-aware search** - Understanding meaning beyond keywords
- **Recommendation systems** - "More like this" functionality
### Hybrid Search
Combine both for optimal results:
1. Use Bleve for initial keyword filtering
2. Use Weaviate for semantic reranking
3. Merge results based on use case
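
The merge step is left open above. One common option, shown here purely as an illustration and not as the project's chosen method, is reciprocal rank fusion (RRF): each ranked list contributes `1/(k + rank)` to a document's score, and documents are sorted by the summed score. The function name `fuseRRF`, the constant `k = 60`, and the sample IDs are assumptions for this sketch; each input list is assumed to contain unique IDs.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked ID lists with reciprocal rank fusion:
// each list contributes 1/(k+rank) to an ID's score, with rank
// starting at 1 for the best hit. Ties are broken by ascending ID.
func fuseRRF(k float64, lists ...[]uint) []uint {
	scores := make(map[uint]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]uint, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	keyword := []uint{7, 3, 9}   // hypothetical Bleve hits, best first
	semantic := []uint{3, 12, 7} // hypothetical Weaviate hits, best first
	fmt.Println(fuseRRF(60, keyword, semantic)) // prints [3 7 12 9]
}
```

IDs that appear in both lists (3 and 7 here) rise to the top, which matches the intent of combining keyword and semantic evidence. RRF uses only rank positions, so the differently scaled Bleve and Weaviate scores never need to be normalized against each other.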
## Performance Considerations
### Index Size
- Embedded on-disk index (BBolt backend)
- Auto-managed by Bleve
- Location: `./data/bleve_index/` (configurable)
### Batch Operations
- Batch size: 50,000 translations per commit
- Reduces I/O overhead during bulk indexing
### Memory Usage
- In-memory caching handled by Bleve
- Minimal application memory footprint
## Maintenance
### Reindexing
```bash
# Delete existing index
rm -rf ./data/bleve_index
# Restart application - index auto-recreates
# Or call IndexAllTranslations() programmatically
```
### Monitoring
- Check logs for "Bleve search client initialized successfully"
- Index stats available via Bleve's `Index.Stats()` API
## Future Enhancements
### Potential Additions
1. **GraphQL Integration** - Add search query/mutation
2. **Incremental Updates** - Auto-index on translation create/update
3. **Advanced Analyzers** - Language-specific tokenization
4. **Highlighting** - Return matched text snippets
5. **Faceted Search** - Aggregate by language, status, translator
6. **Pagination** - Add cursor-based pagination for large result sets
### Performance Optimizations
1. **Index Optimization** - Periodic index compaction
2. **Read Replicas** - Multiple read-only index instances
3. **Custom Mapping** - Fine-tune field analyzers per use case
## Dependencies
- `github.com/blevesearch/bleve/v2` v2.5.5
- 23 additional Bleve sub-packages (auto-managed)
- `go.etcd.io/bbolt` v1.4.0 (storage backend)
## Documentation
- [Bleve Documentation](http://blevesearch.com/)
- [Bleve GitHub](https://github.com/blevesearch/bleve)
- Backend implementation: See source files above