mirror of
https://github.com/SamyRai/tercul-backend.git
synced 2025-12-27 05:11:34 +00:00
- Add Bleve client for keyword search functionality - Integrate Bleve service into application builder - Add BleveIndexPath configuration - Update domain mappings for proper indexing - Add comprehensive documentation and tests
210 lines
5.8 KiB
Markdown
# Bleve Search Integration

## Overview

Bleve is an embedded full-text search library that provides keyword and exact-match search capabilities. It complements Weaviate's vector/semantic search with traditional text-based search.

## Architecture

### Package Structure

```
backend/
├── pkg/search/bleve/              # Bleve client wrapper
│   ├── bleveclient.go             # Core Bleve functionality
│   └── bleveclient_test.go        # Tests
├── internal/platform/search/      # Platform initialization
│   ├── bleve_client.go            # Bleve init/shutdown
│   └── weaviate_client.go         # Weaviate init
└── internal/app/search/           # Application services
    ├── bleve_service.go           # Translation search service
    └── service.go                 # Weaviate indexing service
```

### Configuration

Environment variable: `BLEVE_INDEX_PATH` (default: `./data/bleve_index`)

Added to `internal/platform/config/config.go`:

```go
BleveIndexPath string
```

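
As a rough illustration of how the environment variable and its default could be resolved, here is a minimal sketch. The `Config` struct shown is a hypothetical stand-in for the real one in `internal/platform/config`, and `loadBleveIndexPath` is an illustrative helper name, not a function from the project.

```go
package main

import (
	"fmt"
	"os"
)

// defaultBleveIndexPath mirrors the documented default.
const defaultBleveIndexPath = "./data/bleve_index"

// Config is a hypothetical stand-in for the project's config struct.
type Config struct {
	BleveIndexPath string
}

// loadBleveIndexPath returns the value of BLEVE_INDEX_PATH, or the default
// when the variable is unset or empty. The lookup function is injected so the
// logic can be exercised without touching the real environment.
func loadBleveIndexPath(lookup func(string) (string, bool)) string {
	if v, ok := lookup("BLEVE_INDEX_PATH"); ok && v != "" {
		return v
	}
	return defaultBleveIndexPath
}

func main() {
	cfg := Config{BleveIndexPath: loadBleveIndexPath(os.LookupEnv)}
	fmt.Println("Bleve index path:", cfg.BleveIndexPath)
}
```

Treating an empty value the same as an unset one is a design choice made for this sketch; the real loader may behave differently.
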
### Initialization Flow

1. `ApplicationBuilder.BuildBleve()` - Called during app startup
2. `platform/search.InitBleve()` - Creates/opens Bleve index
3. Global `platform/search.BleveClient` available to services

### Application Layer

**Service**: `BleveSearchService` in `internal/app/search/bleve_service.go`

**Interface**:

```go
type BleveSearchService interface {
	IndexTranslation(ctx context.Context, translation domain.Translation) error
	IndexAllTranslations(ctx context.Context) error
	SearchTranslations(ctx context.Context, query string, filters map[string]string, limit int) ([]TranslationSearchResult, error)
}
```

**Access**: Available via `Application.BleveSearch`

## Features

### Indexing

- **Single Translation**: `IndexTranslation()` - Index one translation
- **Bulk Indexing**: `IndexAllTranslations()` - Index all translations from DB
- **Batch Processing**: Automatically batches in chunks of 50,000 for performance

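
The chunking behind batch processing can be illustrated with a small, library-free sketch. `chunk` is a hypothetical helper, not the service's actual code; in a Bleve-backed implementation each chunk would typically be written through `index.NewBatch()` and `index.Batch()`.

```go
package main

import "fmt"

// chunk splits items into consecutive slices of at most size elements.
// It returns nil when size is not positive or items is empty.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	// Made-up translation IDs standing in for rows loaded from the database.
	ids := make([]uint, 120_000)
	for i := range ids {
		ids[i] = uint(i + 1)
	}
	for n, batch := range chunk(ids, 50_000) {
		// In a real indexer, each batch would be committed in one operation.
		fmt.Printf("batch %d: %d translations\n", n, len(batch))
	}
}
```
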
### Search

- **Full-text search**: Fuzzy matching with configurable fuzziness (default: 2)
- **Filtered search**: Combine keyword search with field filters
- **Multi-field indexing**: Indexes title, content, description, language, status, etc.

### Indexed Fields

```go
{
	"id":                translation.ID,
	"title":             translation.Title,
	"content":           translation.Content,
	"description":       translation.Description,
	"language":          translation.Language,
	"status":            translation.Status,
	"translatable_id":   translation.TranslatableID,
	"translatable_type": translation.TranslatableType,
	"translator_id":     translation.TranslatorID,
}
```

## Usage Examples

### Indexing a Translation

```go
err := app.BleveSearch.IndexTranslation(ctx, translation)
```

### Searching Translations

```go
// Simple keyword search
results, err := app.BleveSearch.SearchTranslations(ctx, "poetry", nil, 10)

// Search with filters
filters := map[string]string{
	"language": "en",
	"status":   "published",
}
// Reuse results and err with "=": a second ":=" in the same scope would not compile
results, err = app.BleveSearch.SearchTranslations(ctx, "shakespeare", filters, 20)
```

### Search Results

```go
type TranslationSearchResult struct {
	ID               uint
	Score            float64 // Relevance score
	Title            string
	Content          string
	Language         string
	TranslatableID   uint
	TranslatableType string
}
```

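
Turning a raw search hit into this struct takes some type care, because hit IDs are strings and numeric field values in a hit's field map commonly arrive as `float64`. The sketch below shows one defensive conversion. `resultFromHit` and `toUint` are hypothetical helpers, and it assumes documents are indexed under their decimal translation ID, which the source does not confirm.

```go
package main

import (
	"fmt"
	"strconv"
)

// TranslationSearchResult mirrors the struct documented above.
type TranslationSearchResult struct {
	ID               uint
	Score            float64
	Title            string
	Content          string
	Language         string
	TranslatableID   uint
	TranslatableType string
}

// toUint converts a field value to uint, accepting the numeric types most
// likely to appear in a decoded field map. Negative values are rejected.
func toUint(v interface{}) (uint, bool) {
	switch n := v.(type) {
	case float64:
		if n < 0 {
			return 0, false
		}
		return uint(n), true
	case int:
		if n < 0 {
			return 0, false
		}
		return uint(n), true
	case uint:
		return n, true
	}
	return 0, false
}

// resultFromHit builds a result from a hit's document ID, score, and fields.
// Missing or mistyped fields are left at their zero values.
func resultFromHit(docID string, score float64, fields map[string]interface{}) (TranslationSearchResult, error) {
	id, err := strconv.ParseUint(docID, 10, 64)
	if err != nil {
		return TranslationSearchResult{}, fmt.Errorf("invalid document ID %q: %w", docID, err)
	}
	r := TranslationSearchResult{ID: uint(id), Score: score}
	r.Title, _ = fields["title"].(string)
	r.Content, _ = fields["content"].(string)
	r.Language, _ = fields["language"].(string)
	r.TranslatableType, _ = fields["translatable_type"].(string)
	r.TranslatableID, _ = toUint(fields["translatable_id"])
	return r, nil
}

func main() {
	// Made-up hit data for illustration.
	fields := map[string]interface{}{
		"title":             "Sonnet 18",
		"language":          "en",
		"translatable_id":   float64(7),
		"translatable_type": "Work",
	}
	r, err := resultFromHit("42", 1.5, fields)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```
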
## Search Strategy: Bleve vs Weaviate

### Use Bleve for:

- **Exact keyword matching** - Find specific words or phrases
- **Language-filtered search** - Search within specific language translations
- **Status-based queries** - Filter by draft/published/reviewing status
- **Translator-specific search** - Find translations by specific translator
- **High-precision queries** - When exact text matching is required

### Use Weaviate for:

- **Semantic search** - Find conceptually similar content
- **Multilingual search** - Cross-language semantic matching
- **Context-aware search** - Understanding meaning beyond keywords
- **Recommendation systems** - "More like this" functionality

### Hybrid Search

Combine both for optimal results:

1. Use Bleve for initial keyword filtering
2. Use Weaviate for semantic reranking
3. Merge results based on use case

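
The three steps above can be sketched without either search engine by treating the keyword hits and the semantic scores as plain data. `Candidate` and `rerank` are hypothetical names, and the merge rule (semantic score first, keyword score as fallback and tie-breaker) is only one possible policy, not the project's.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a keyword hit, for example one returned by Bleve.
type Candidate struct {
	ID           uint
	KeywordScore float64
}

// rerank returns a copy of cands ordered by semantic score, highest first.
// Candidates without a semantic score come after those that have one, and
// remaining ties fall back to the keyword score.
func rerank(cands []Candidate, semantic map[uint]float64) []Candidate {
	out := append([]Candidate(nil), cands...)
	sort.SliceStable(out, func(i, j int) bool {
		si, oki := semantic[out[i].ID]
		sj, okj := semantic[out[j].ID]
		if oki != okj {
			return oki // the candidate with a semantic score wins
		}
		if oki && si != sj {
			return si > sj
		}
		return out[i].KeywordScore > out[j].KeywordScore
	})
	return out
}

func main() {
	// Made-up scores: keyword hits first, then semantic scores for a subset.
	keywordHits := []Candidate{{ID: 1, KeywordScore: 0.9}, {ID: 2, KeywordScore: 0.5}, {ID: 3, KeywordScore: 0.7}}
	semantic := map[uint]float64{2: 0.8, 1: 0.3}
	for _, c := range rerank(keywordHits, semantic) {
		fmt.Printf("id=%d keyword=%.1f\n", c.ID, c.KeywordScore)
	}
}
```
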
## Performance Considerations

### Index Size

- Embedded on-disk index (BBolt backend)
- Auto-managed by Bleve
- Location: `./data/bleve_index/` (configurable)

### Batch Operations

- Batch size: 50,000 translations per commit
- Reduces I/O overhead during bulk indexing

### Memory Usage

- In-memory caching handled by Bleve
- Minimal application memory footprint

## Maintenance

### Reindexing

```bash
# Delete existing index
rm -rf ./data/bleve_index

# Restart application - index auto-recreates
# Or call IndexAllTranslations() programmatically
```

### Monitoring

- Check logs for "Bleve search client initialized successfully"
- Index stats available via Bleve's `Index.Stats()` API

## Future Enhancements

### Potential Additions

1. **GraphQL Integration** - Add search query/mutation
2. **Incremental Updates** - Auto-index on translation create/update
3. **Advanced Analyzers** - Language-specific tokenization
4. **Highlighting** - Return matched text snippets
5. **Faceted Search** - Aggregate by language, status, translator
6. **Pagination** - Add cursor-based pagination for large result sets

### Performance Optimizations

1. **Index Optimization** - Periodic index compaction
2. **Read Replicas** - Multiple read-only index instances
3. **Custom Mapping** - Fine-tune field analyzers per use case

## Dependencies

- `github.com/blevesearch/bleve/v2` v2.5.5
- 23 additional Bleve sub-packages (auto-managed)
- `go.etcd.io/bbolt` v1.4.0 (storage backend)

## Documentation

- [Bleve Documentation](http://blevesearch.com/)
- [Bleve GitHub](https://github.com/blevesearch/bleve)
- Backend implementation: See source files above