mirror of
https://github.com/SamyRai/tercul-backend.git
synced 2025-12-27 02:51:34 +00:00
- Add Bleve client for keyword search functionality - Integrate Bleve service into application builder - Add BleveIndexPath configuration - Update domain mappings for proper indexing - Add comprehensive documentation and tests
5.8 KiB
Bleve Search Integration
Overview
Bleve is an embedded full-text search library that provides keyword and exact-match search capabilities. It complements Weaviate's vector/semantic search with traditional text-based search.
Architecture
Package Structure
backend/
├── pkg/search/bleve/            # Bleve client wrapper
│   ├── bleveclient.go           # Core Bleve functionality
│   └── bleveclient_test.go      # Tests
├── internal/platform/search/    # Platform initialization
│   ├── bleve_client.go          # Bleve init/shutdown
│   └── weaviate_client.go       # Weaviate init
└── internal/app/search/         # Application services
    ├── bleve_service.go         # Translation search service
    └── service.go               # Weaviate indexing service
Configuration
Environment variable: BLEVE_INDEX_PATH (default: ./data/bleve_index)
Added to internal/platform/config/config.go:
BleveIndexPath string
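As a minimal sketch of how the documented default could be applied, the helper below resolves the index path from the environment. The names `resolveBleveIndexPath` and `defaultBleveIndexPath` are hypothetical and not taken from `config.go`; only the variable name and default value come from this document.

```go
package main

import (
	"fmt"
	"os"
)

// defaultBleveIndexPath mirrors the default documented for BLEVE_INDEX_PATH.
const defaultBleveIndexPath = "./data/bleve_index"

// resolveBleveIndexPath returns the value of BLEVE_INDEX_PATH, or the
// documented default when the variable is unset or empty. The lookup function
// is injected so the helper can be exercised without touching the real
// environment. (Hypothetical helper, not the project's actual loader.)
func resolveBleveIndexPath(getenv func(string) string) string {
	if path := getenv("BLEVE_INDEX_PATH"); path != "" {
		return path
	}
	return defaultBleveIndexPath
}

func main() {
	fmt.Println("Bleve index path:", resolveBleveIndexPath(os.Getenv))
}
```

Passing `os.Getenv` at the call site keeps production behavior unchanged while letting tests supply a stub.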
Initialization Flow
1. ApplicationBuilder.BuildBleve() - Called during app startup
2. platform/search.InitBleve() - Creates/opens Bleve index
3. Global platform/search.BleveClient available to services
Application Layer
Service: BleveSearchService in internal/app/search/bleve_service.go
Interface:
type BleveSearchService interface {
    IndexTranslation(ctx context.Context, translation domain.Translation) error
    IndexAllTranslations(ctx context.Context) error
    SearchTranslations(ctx context.Context, query string, filters map[string]string, limit int) ([]TranslationSearchResult, error)
}
Access: Available via Application.BleveSearch
Features
Indexing
- Single Translation: IndexTranslation() - Index one translation
- Bulk Indexing: IndexAllTranslations() - Index all translations from DB
- Batch Processing: Automatically batches in chunks of 50,000 for performance
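The chunking behind bulk indexing can be sketched as a generic helper that hands consecutive slices of at most 50,000 items to a callback. This is an illustration only: `forEachBatch` is a hypothetical name, and the callback stands in for whatever the real service does per batch (presumably filling and committing a Bleve batch).

```go
package main

import "fmt"

// batchSize matches the chunk size quoted in this document.
const batchSize = 50000

// forEachBatch calls fn once per consecutive chunk of at most size items and
// stops at the first error. (Hypothetical helper for illustration.)
func forEachBatch[T any](items []T, size int, fn func(batch []T) error) error {
	if size <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", size)
	}
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		if err := fn(items[start:end]); err != nil {
			return fmt.Errorf("batch starting at item %d: %w", start, err)
		}
	}
	return nil
}

func main() {
	ids := make([]int, 120000) // placeholder for translations loaded from the DB
	var sizes []int
	err := forEachBatch(ids, batchSize, func(batch []int) error {
		// A real implementation would index and commit the batch here.
		sizes = append(sizes, len(batch))
		return nil
	})
	fmt.Println(sizes, err)
}
```

Committing once per chunk rather than once per document is what reduces I/O overhead; the final chunk is simply whatever remains.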
Search
- Full-text search: Fuzzy matching with configurable fuzziness (default: 2, so terms within an edit distance of 2 of a query term also match)
- Filtered search: Combine keyword search with field filters
- Multi-field indexing: Indexes title, content, description, language, status, etc.
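To make the fuzziness setting concrete, the sketch below computes the Levenshtein edit distance between a query term and an indexed term and checks it against a fuzziness limit. It illustrates the concept only and is not how Bleve evaluates fuzzy queries internally; `editDistance` and `withinFuzziness` are hypothetical names.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b, counted in
// runes: the minimum number of single-character insertions, deletions, and
// substitutions needed to turn a into b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

// withinFuzziness reports whether term is within the given edit distance of query.
func withinFuzziness(query, term string, fuzziness int) bool {
	return editDistance(query, term) <= fuzziness
}

func main() {
	for _, term := range []string{"poetri", "potery", "poem"} {
		fmt.Println(term, editDistance("poetry", term), withinFuzziness("poetry", term, 2))
	}
}
```

With a fuzziness of 2, a misspelling such as "poetri" still matches "poetry", while "poem" (three edits away) does not. Lower values trade recall for precision.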
Indexed Fields
{
    "id":                translation.ID,
    "title":             translation.Title,
    "content":           translation.Content,
    "description":       translation.Description,
    "language":          translation.Language,
    "status":            translation.Status,
    "translatable_id":   translation.TranslatableID,
    "translatable_type": translation.TranslatableType,
    "translator_id":     translation.TranslatorID,
}
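A minimal sketch of how such a document might be assembled before it is handed to the index. The `Translation` struct here is a simplified, hypothetical stand-in for `domain.Translation` (the real field types may differ), and `toIndexDocument` is an illustrative name. Bleve document IDs are strings, so the numeric ID is converted.

```go
package main

import (
	"fmt"
	"strconv"
)

// Translation is a simplified, hypothetical stand-in for domain.Translation;
// the field types are assumptions made for this example.
type Translation struct {
	ID               uint
	Title            string
	Content          string
	Description      string
	Language         string
	Status           string
	TranslatableID   uint
	TranslatableType string
	TranslatorID     uint
}

// toIndexDocument returns the string document ID and the field map listed in
// this document for a single translation.
func toIndexDocument(t Translation) (string, map[string]interface{}) {
	doc := map[string]interface{}{
		"id":                t.ID,
		"title":             t.Title,
		"content":           t.Content,
		"description":       t.Description,
		"language":          t.Language,
		"status":            t.Status,
		"translatable_id":   t.TranslatableID,
		"translatable_type": t.TranslatableType,
		"translator_id":     t.TranslatorID,
	}
	return strconv.FormatUint(uint64(t.ID), 10), doc
}

func main() {
	id, doc := toIndexDocument(Translation{ID: 42, Title: "Sonnet 18", Language: "en", Status: "published"})
	fmt.Println(id, doc["title"], doc["language"])
}
```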
Usage Examples
Indexing a Translation
err := app.BleveSearch.IndexTranslation(ctx, translation)
Searching Translations
// Simple keyword search
results, err := app.BleveSearch.SearchTranslations(ctx, "poetry", nil, 10)

// Search with filters (results and err are reused, so assign with = rather than :=)
filters := map[string]string{
    "language": "en",
    "status":   "published",
}
results, err = app.BleveSearch.SearchTranslations(ctx, "shakespeare", filters, 20)
Search Results
type TranslationSearchResult struct {
    ID               uint
    Score            float64 // Relevance score
    Title            string
    Content          string
    Language         string
    TranslatableID   uint
    TranslatableType string
}
Search Strategy: Bleve vs Weaviate
Use Bleve for:
- Exact keyword matching - Find specific words or phrases
- Language-filtered search - Search within specific language translations
- Status-based queries - Filter by draft/published/reviewing status
- Translator-specific search - Find translations by specific translator
- High-precision queries - When exact text matching is required
Use Weaviate for:
- Semantic search - Find conceptually similar content
- Multilingual search - Cross-language semantic matching
- Context-aware search - Understanding meaning beyond keywords
- Recommendation systems - "More like this" functionality
Hybrid Search
Combine both for optimal results:
- Use Bleve for initial keyword filtering
- Use Weaviate for semantic reranking
- Merge results based on use case
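This document does not specify a merge strategy. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's method. Each input is a list of translation IDs ordered best first, with no duplicates within a list, and an ID's fused score is the sum of `1/(k + rank)` over the lists it appears in. The names `fuseRanks`, `fusedResult`, and the constant `rrfK = 60` are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the smoothing constant commonly used with reciprocal rank fusion.
const rrfK = 60

// fusedResult pairs a translation ID with its combined RRF score.
type fusedResult struct {
	ID    uint
	Score float64
}

// fuseRanks merges ranked ID lists (best first). An ID's score is the sum of
// 1/(rrfK+rank) over every list it appears in, with rank starting at 1.
// Results are sorted by descending score, with ties broken by ascending ID.
func fuseRanks(lists ...[]uint) []fusedResult {
	scores := make(map[uint]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / float64(rrfK+i+1)
		}
	}
	out := make([]fusedResult, 0, len(scores))
	for id, score := range scores {
		out = append(out, fusedResult{ID: id, Score: score})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].ID < out[b].ID
	})
	return out
}

func main() {
	keywordHits := []uint{1, 2, 3}  // e.g. IDs ranked by Bleve
	semanticHits := []uint{3, 1, 4} // e.g. IDs ranked by Weaviate
	for _, r := range fuseRanks(keywordHits, semanticHits) {
		fmt.Printf("%d %.5f\n", r.ID, r.Score)
	}
}
```

RRF only needs ranks, so it sidesteps the fact that Bleve relevance scores and Weaviate similarity scores are not on a comparable scale. IDs found by both engines rise to the top.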
Performance Considerations
Index Size
- Embedded on-disk index (BBolt backend)
- Auto-managed by Bleve
- Location: ./data/bleve_index/ (configurable)
Batch Operations
- Batch size: 50,000 translations per commit
- Reduces I/O overhead during bulk indexing
Memory Usage
- In-memory caching handled by Bleve
- Minimal application memory footprint
Maintenance
Reindexing
# Delete existing index
rm -rf ./data/bleve_index
# Restart application - index auto-recreates
# Or call IndexAllTranslations() programmatically
Monitoring
- Check logs for "Bleve search client initialized successfully"
- Index stats available via Bleve's Index.Stats() API
Future Enhancements
Potential Additions
- GraphQL Integration - Add search query/mutation
- Incremental Updates - Auto-index on translation create/update
- Advanced Analyzers - Language-specific tokenization
- Highlighting - Return matched text snippets
- Faceted Search - Aggregate by language, status, translator
- Pagination - Add cursor-based pagination for large result sets
Performance Optimizations
- Index Optimization - Periodic index compaction
- Read Replicas - Multiple read-only index instances
- Custom Mapping - Fine-tune field analyzers per use case
Dependencies
- github.com/blevesearch/bleve/v2 v2.5.5
- 23 additional Bleve sub-packages (auto-managed)
- go.etcd.io/bbolt v1.4.0 (storage backend)
Documentation
- Bleve Documentation
- Bleve GitHub
- Backend implementation: See source files above