Mirror of https://github.com/SamyRai/tercul-backend.git (synced 2025-12-27 05:11:34 +00:00)
# Backend Production Readiness & Code Quality Improvements

## Overview
Implement critical production-ready features, refactor architectural issues, and improve code quality for the Tercul backend. The codebase uses Go 1.25, exposes a GraphQL API, and follows DDD/CQRS and clean architecture principles.

## Critical Issues to Resolve

### 1. Implement Full-Text Search Service (P0 - Critical)
**Problem**: The search service in `internal/app/search/service.go` is a stub that returns empty results. This is a core feature that users depend on.

**Current State**:
- `Search()` method returns empty results (lines 31-39)
- `IndexWork()` is partially implemented, but the search logic is missing
- A Weaviate client exists but is not used for search
- Search filters are defined but not applied

**Affected Files**:
- `internal/app/search/service.go` - Main search service (stub implementation)
- `internal/platform/search/weaviate_wrapper.go` - Weaviate client wrapper
- `internal/domain/search/search.go` - Search domain interfaces
- GraphQL resolvers that use the search service

**Solution**:
1. Implement the full Weaviate search query in the `Search()` method:
   - Query Weaviate for works, translations, and authors
   - Apply search filters (language, type, date range, tags, authors)
   - Support multi-language search (Russian, English, Tatar)
   - Implement relevance ranking
   - Add pagination support
   - Handle special characters and diacritics

2. Enhance indexing:
   - Index work titles, content, and metadata
   - Index translation content with language tags
   - Index author names and biographies
   - Add incremental indexing on create/update operations
   - Create a background job for bulk indexing of existing content

3. Add search result transformation:
   - Map Weaviate results to domain entities
   - Include relevance scores
   - Handle empty results gracefully
   - Add search analytics/metrics

**Acceptance Criteria**:
- Search returns relevant results ranked by relevance
- Supports filtering by language, category, tags, authors, and date ranges
- 95th-percentile search response time is under 200 ms
- Handles multi-language queries correctly
- All existing tests pass
- Integration tests run against a real Weaviate instance

### 2. Refactor Global Configuration Singleton (P1 - High Priority)
**Problem**: The application uses a global singleton `config.Cfg`, which violates dependency injection principles and makes testing difficult.

**Current State**:
- `internal/platform/config/config.go` has a global `var Cfg *Config`
- `config.Cfg` is accessed directly in multiple places:
  - `internal/platform/search/bleve_client.go` (line 13)
  - Various other packages

**Affected Files**:
- `internal/platform/config/config.go` - Global config singleton
- `internal/platform/search/bleve_client.go` - Uses `config.Cfg`
- `cmd/api/main.go` - Loads config but also sets the global
- `cmd/worker/main.go` - Similar pattern
- Any other files accessing `config.Cfg` directly

**Solution**:
1. Remove the global `Cfg` variable from the config package
2. Refactor `LoadConfig()` to return the config without setting a global
3. Pass `*config.Config` as a dependency to all constructors:
   - Update `NewBleveClient()` to accept a config parameter
   - Update all repository constructors to accept config
   - Update application service constructors
   - Update platform service constructors

4. Update main entry points:
   - `cmd/api/main.go` - Pass config to all dependencies
   - `cmd/worker/main.go` - Pass config to all dependencies
   - `cmd/tools/enrich/main.go` - Pass config to dependencies

5. Make configuration more flexible:
   - Make the migration path configurable (currently hardcoded)
   - Make the metrics server port configurable
   - Add validation for required config values
   - Add config struct tags for better documentation
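
A minimal sketch of the constructor-injection shape described above, under stated assumptions: the `Config` fields, the environment variable names (`DATABASE_URL`, `MIGRATION_PATH`, `METRICS_PORT`, `BLEVE_INDEX_PATH`), the default values, and the `BleveClient` placeholder are all hypothetical and would need to match the real struct and the existing variables to stay backward compatible.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config is a trimmed, hypothetical subset of the real configuration struct.
type Config struct {
	DatabaseURL    string
	MigrationPath  string
	MetricsPort    int
	BleveIndexPath string
}

// LoadConfig builds a Config from getenv (os.Getenv in production, a map
// lookup in tests) and returns it without writing to package-level state.
func LoadConfig(getenv func(string) string) (*Config, error) {
	cfg := &Config{
		DatabaseURL:    getenv("DATABASE_URL"),
		MigrationPath:  withDefault(getenv("MIGRATION_PATH"), "migrations"),
		MetricsPort:    9090, // assumed default
		BleveIndexPath: withDefault(getenv("BLEVE_INDEX_PATH"), "bleve.index"),
	}
	if raw := getenv("METRICS_PORT"); raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil {
			return nil, fmt.Errorf("config: METRICS_PORT: %w", err)
		}
		cfg.MetricsPort = port
	}
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}

// Validate rejects missing or out-of-range values at startup.
func (c *Config) Validate() error {
	if c.DatabaseURL == "" {
		return errors.New("config: DATABASE_URL is required")
	}
	if c.MetricsPort < 1 || c.MetricsPort > 65535 {
		return fmt.Errorf("config: metrics port %d out of range", c.MetricsPort)
	}
	return nil
}

func withDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

// BleveClient is a placeholder; the real client would open an index.
type BleveClient struct{ indexPath string }

// NewBleveClient receives its configuration explicitly instead of
// reading a global such as config.Cfg.
func NewBleveClient(cfg *Config) (*BleveClient, error) {
	if cfg == nil {
		return nil, errors.New("bleve: nil config")
	}
	return &BleveClient{indexPath: cfg.BleveIndexPath}, nil
}
```

Taking `getenv` as a parameter is one way to let tests inject different configurations without touching the process environment; entry points such as `cmd/api/main.go` would pass `os.Getenv`.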

**Acceptance Criteria**:
- No global `config.Cfg` usage anywhere in the codebase
- All dependencies receive config via constructor injection
- Tests can easily mock/inject different configs
- Configuration is validated on startup
- Backward compatible (the same environment variables work)

### 3. Enhance Observability: Distributed Tracing (P0 - Critical)
**Problem**: Tracing is implemented but only exports to stdout. Production needs an OTLP exporter and proper instrumentation.

**Current State**:
- `internal/observability/tracing.go` uses the `stdouttrace` exporter
- A basic tracer provider exists but is not production-ready
- Instrumentation is missing in many places

**Affected Files**:
- `internal/observability/tracing.go` - Only stdout exporter
- HTTP middleware - May need tracing instrumentation
- GraphQL resolvers - Need span creation
- Database queries - Need query tracing
- Application services - Need business logic spans

**Solution**:
1. Replace the stdout exporter with an OTLP exporter:
   - Add OTLP exporter configuration
   - Support both gRPC and HTTP OTLP endpoints
   - Add environment-based configuration (dev vs prod)
   - Add a trace sampling strategy (100% dev, 10% prod)

2. Enhance instrumentation:
   - Add automatic HTTP request tracing in middleware
   - Instrument all GraphQL resolvers with spans
   - Add database query spans via GORM callbacks
   - Create custom spans for slow operations (>100ms)
   - Add span attributes (user_id, work_id, etc.)

3. Add trace context propagation:
   - Ensure trace IDs propagate through all layers
   - Add trace ID to structured logs
   - Support distributed tracing across services

4. Configuration:
   ```go
   type TracingConfig struct {
       Enabled      bool    // turn tracing on or off
       ServiceName  string  // service name attached to exported spans
       OTLPEndpoint string  // OTLP collector endpoint (gRPC or HTTP)
       SamplingRate float64 // fraction of traces to sample, 0.0-1.0
       Environment  string  // deployment environment (dev vs prod)
   }
   ```
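
One way the environment-based sampling defaults (100% dev, 10% prod) could be derived is sketched below. This is a sketch under assumptions: the constructor name `NewTracingConfig`, the literal environment value `"production"`, and the idea of an optional string override are not in the original.

```go
package main

import (
	"fmt"
	"strconv"
)

// TracingConfig mirrors the struct proposed above.
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	OTLPEndpoint string
	SamplingRate float64
	Environment  string
}

// defaultSamplingRate samples 10% of traces in production and 100%
// everywhere else. The "production" literal is an assumption.
func defaultSamplingRate(environment string) float64 {
	if environment == "production" {
		return 0.1
	}
	return 1.0
}

// NewTracingConfig applies the per-environment default and then an optional
// override (rawRate), which must parse as a number in [0, 1].
// Tracing is treated as enabled only when an endpoint is configured.
func NewTracingConfig(serviceName, environment, endpoint, rawRate string) (TracingConfig, error) {
	cfg := TracingConfig{
		Enabled:      endpoint != "",
		ServiceName:  serviceName,
		OTLPEndpoint: endpoint,
		SamplingRate: defaultSamplingRate(environment),
		Environment:  environment,
	}
	if rawRate == "" {
		return cfg, nil
	}
	rate, err := strconv.ParseFloat(rawRate, 64)
	if err != nil {
		return TracingConfig{}, fmt.Errorf("tracing: sampling rate %q: %w", rawRate, err)
	}
	// Written as a negated range check so that NaN is rejected too.
	if !(rate >= 0 && rate <= 1) {
		return TracingConfig{}, fmt.Errorf("tracing: sampling rate %v outside [0, 1]", rate)
	}
	cfg.SamplingRate = rate
	return cfg, nil
}
```

The resulting `SamplingRate` could then be handed to a ratio-based sampler such as the OpenTelemetry Go SDK's `TraceIDRatioBased` when the tracer provider is built.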

**Acceptance Criteria**:
- Traces exported to OTLP collector (Jaeger/Tempo compatible)
- All HTTP requests have spans
- All GraphQL resolvers traced
- Database queries have spans
- Trace IDs in logs
- Sampling configurable per environment

### 4. Enhance Observability: Prometheus Metrics (P0 - Critical)
**Problem**: Basic metrics exist but need enhancement for production monitoring and alerting.

**Current State**:
- `internal/observability/metrics.go` has basic HTTP and DB metrics
- Missing business metrics, GraphQL-specific metrics
- No Grafana dashboards or alerting rules

**Affected Files**:
- `internal/observability/metrics.go` - Basic metrics
- GraphQL resolvers - Need resolver metrics
- Application services - Need business metrics
- Background jobs - Need job metrics

**Solution**:
1. Add GraphQL-specific metrics:
   - `graphql_resolver_duration_seconds{operation, resolver}`
   - `graphql_errors_total{operation, error_type}`
   - `graphql_operations_total{operation, status}`

2. Add business metrics:
   - `works_created_total{language}`
   - `searches_performed_total{type}`
   - `user_registrations_total`
   - `translations_created_total{language}`
   - `likes_total{entity_type}`

3. Enhance existing metrics:
   - Add more labels to HTTP metrics (status code as number)
   - Add query type labels to DB metrics
   - Add connection pool metrics
   - Add cache hit/miss metrics

4. Create observability package structure:
   - Move metrics to `internal/observability/metrics/`
   - Add metric collection helpers
   - Document metric naming conventions
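
To make the naming and label conventions above concrete without depending on the Prometheus client library, here is a deliberately small stdlib-only stand-in. `CounterVec`, `Inc`, and `Render` are hypothetical helpers written for this illustration, not the Prometheus client API, and the label escaping via `%q` is a simplification of the exposition format's rules.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// CounterVec is a toy labeled counter (not the Prometheus client type).
type CounterVec struct {
	name   string
	labels []string
	mu     sync.Mutex
	counts map[string]float64 // keyed by the rendered label set
}

// NewCounterVec declares a counter such as works_created_total{language}.
func NewCounterVec(name string, labels ...string) *CounterVec {
	return &CounterVec{name: name, labels: labels, counts: make(map[string]float64)}
}

// Inc adds one to the series identified by the given label values.
func (c *CounterVec) Inc(values ...string) error {
	if len(values) != len(c.labels) {
		return fmt.Errorf("%s: want %d label values, got %d", c.name, len(c.labels), len(values))
	}
	pairs := make([]string, len(values))
	for i, v := range values {
		pairs[i] = fmt.Sprintf("%s=%q", c.labels[i], v)
	}
	key := strings.Join(pairs, ",")
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
	return nil
}

// Render writes the series in a text form modeled on the Prometheus
// exposition format, sorted by label set for stable output.
func (c *CounterVec) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	fmt.Fprintf(&sb, "# TYPE %s counter\n", c.name)
	for _, k := range keys {
		fmt.Fprintf(&sb, "%s{%s} %g\n", c.name, k, c.counts[k])
	}
	return sb.String()
}
```

The sketch follows the conventions used in the list above: counters end in `_total`, and labels such as `language` are kept to a small, bounded set of values so the number of series stays manageable.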

**Acceptance Criteria**:
- All critical paths have metrics
- GraphQL operations fully instrumented
- Business metrics tracked
- Metrics exposed on `/metrics` endpoint
- Metric labels follow Prometheus best practices

### 5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)
**Problem**: Application queries return full domain entities, which is inefficient and leaks domain logic into the API layer.

**Current State**:
- Queries in `internal/app/*/queries.go` return domain entities
- GraphQL resolvers receive full entities with all fields
- No optimization for list vs detail views

**Affected Files**:
- `internal/app/work/queries.go` - Returns `domain.Work`
- `internal/app/translation/queries.go` - Returns `domain.Translation`
- `internal/app/author/queries.go` - Returns `domain.Author`
- GraphQL resolvers - Receive full entities

**Solution**:
1. Create DTO packages:
   - `internal/app/work/dto` - WorkListDTO, WorkDetailDTO
   - `internal/app/translation/dto` - TranslationListDTO, TranslationDetailDTO
   - `internal/app/author/dto` - AuthorListDTO, AuthorDetailDTO

2. Define optimized DTOs:
   ```go
   // WorkListDTO - For list views (minimal fields)
   type WorkListDTO struct {
       ID               uint
       Title            string
       AuthorName       string
       AuthorID         uint
       Language         string
       CreatedAt        time.Time
       ViewCount        int
       LikeCount        int
       TranslationCount int
   }

   // WorkDetailDTO - For single work view (all fields)
   type WorkDetailDTO struct {
       *WorkListDTO
       Content      string
       Description  string
       Tags         []string
       Translations []TranslationSummaryDTO
       Author       AuthorSummaryDTO
   }
   ```

3. Refactor queries to return DTOs:
   - Update query methods to use optimized SQL
   - Use joins to avoid N+1 queries
   - Map domain entities to DTOs
   - Update GraphQL resolvers to use DTOs

4. Add benchmarks comparing the old and new approaches

**Acceptance Criteria**:
- List queries return optimized DTOs
- Detail queries return full DTOs
- No N+1 query problems
- Payload size reduced by 30-50%
- Query response time improved by 20%
- No breaking changes to GraphQL schema

### 6. Improve Structured Logging (P1 - High Priority)
**Problem**: Logging exists but lacks request context, user IDs, and trace correlation.

**Current State**:
- `internal/platform/log` uses zerolog
- Basic logging but missing context
- No request ID propagation
- No user ID in logs
- No trace/span ID correlation

**Affected Files**:
- `internal/platform/log/logger.go` - Basic logger
- HTTP middleware - Needs request ID injection
- All application services - Need context logging

**Solution**:
1. Enhance HTTP middleware:
   - Generate request ID for each request
   - Inject request ID into context
   - Add user ID from JWT to context
   - Add trace/span IDs to context

2. Update logger to use context:
   - Extract request ID, user ID, trace ID from context
   - Add to all log entries automatically
   - Create helper: `log.FromContext(ctx).WithRequestID().WithUserID()`

3. Add structured logging fields:
   - Define field name constants
   - Ensure consistent field names across codebase
   - Add sensitive data redaction

4. Implement log sampling:
   - Sample high-volume endpoints (e.g., health checks)
   - Configurable sampling rates
   - Always log errors regardless of sampling

**Acceptance Criteria**:
- All logs include request ID
- Authenticated request logs include user ID
- All logs include trace/span IDs
- Consistent log format across codebase
- Sensitive data excluded from logs
- Log sampling for high-volume endpoints

### 7. Refactor Caching with Decorator Pattern (P1 - High Priority)
**Problem**: The current caching implementation uses bespoke cached repositories. It should use the decorator pattern for better maintainability.

**Current State**:
- `internal/data/cache` has custom caching logic
- Cached repositories are separate implementations
- The decorator pattern is not followed

**Affected Files**:
- `internal/data/cache/*` - Current caching implementation
- Repository interfaces - Need to support decorators

**Solution**:
1. Implement decorator pattern:
   - Create `CachedWorkRepository` decorator
   - Create `CachedAuthorRepository` decorator
   - Create `CachedTranslationRepository` decorator
   - Decorators wrap base repositories

2. Implement cache-aside pattern:
   - Check cache on read, populate on miss
   - Invalidate cache on write operations
   - Add cache key versioning strategy

3. Add cache configuration:
   - TTL per entity type
   - Cache size limits
   - Cache warming strategies

4. Add cache metrics:
   - Hit/miss rates
   - Cache size
   - Eviction counts

**Acceptance Criteria**:
- Decorator pattern implemented
- Cache hit rate > 70% for reads
- Automatic cache invalidation on updates
- Cache failures don't break the application
- Metrics for cache performance

### 8. Complete API Documentation (P1 - High Priority)
**Problem**: API documentation is incomplete. The GraphQL API needs comprehensive documentation.

**Current State**:
- GraphQL schema exists but lacks descriptions
- No example queries
- No API guide for consumers

**Affected Files**:
- GraphQL schema files - Need descriptions
- `api/README.md` - Needs comprehensive guide
- All resolver implementations - Need documentation

**Solution**:
1. Add descriptions to GraphQL schema:
   - Document all types, queries, mutations
   - Add field descriptions
   - Document input validation rules
   - Add deprecation notices where applicable

2. Create comprehensive API documentation:
   - `api/README.md` - Complete API guide
   - `api/EXAMPLES.md` - Query examples
   - Document authentication requirements
   - Document rate limiting
   - Document error responses

3. Enhance GraphQL Playground:
   - Pre-populate with example queries
   - Add query templates
   - Document schema changes

**Acceptance Criteria**:
- All 80+ GraphQL resolvers documented
- Example queries for each operation
- Input validation rules documented
- Error response examples
- Authentication requirements clear
- API changelog maintained

### 9. Refactor Testing Utilities (P2 - Medium Priority)
**Problem**: Tests depend on live database connections, making them slow and unreliable.

**Current State**:
- `internal/testutil/testutil.go` has database connection logic
- Integration tests require live database
- Tests are slow and may be flaky

**Affected Files**:
- `internal/testutil/testutil.go` - Database connection logic
- All integration tests - Depend on live DB

**Solution**:
1. Decouple tests from live database:
   - Remove database connection from testutil
   - Use test containers for integration tests
   - Use mocks for unit tests

2. Improve test utilities:
   - Create test data builders
   - Add fixtures for common scenarios
   - Improve test isolation

3. Add parallel test execution:
   - Enable parallel execution where safe (`t.Parallel()`, tuned with the `-parallel` flag)
   - Use test-specific database schemas
   - Clean up test data properly

**Acceptance Criteria**:
- Unit tests run without database
- Integration tests use test containers
- Tests run in parallel where possible
- Test execution time < 5 seconds for unit tests
- Clear separation between unit and integration tests

### 10. Implement Analytics Features (P2 - Medium Priority)
**Problem**: The analytics service exists, but some metrics, such as like, comment, and bookmark counts, are stubs.

**Current State**:
- `internal/jobs/linguistics/work_analysis_service.go` has TODO comments:
  - Line 184: ViewCount TODO
  - Line 185: LikeCount TODO
  - Line 186: CommentCount TODO
  - Line 187: BookmarkCount TODO
  - Line 188: TranslationCount TODO
  - Line 192: PopularTranslations TODO

**Affected Files**:
- `internal/jobs/linguistics/work_analysis_service.go` - Stub implementations
- `internal/app/analytics/*` - Analytics services

**Solution**:
1. Implement counting services:
   - Like counting service
   - Comment counting service
   - Bookmark counting service
   - Translation counting service
   - View counting service

2. Implement popular translations calculation:
   - Calculate based on likes, comments, bookmarks
   - Cache results for performance
   - Update periodically via background job

3. Add analytics to work analysis:
   - Integrate counting services
   - Update WorkAnalytics struct
   - Ensure data is accurate and up-to-date
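
The popular-translations calculation could be sketched as a weighted score over the three counters. The weights below (1 per like, 2 per comment, 3 per bookmark) are purely illustrative assumptions, as are the `TranslationStats` type and function names; the original does not specify a formula.

```go
package main

import "sort"

// TranslationStats is a hypothetical bundle of per-translation counters.
type TranslationStats struct {
	TranslationID uint
	Likes         int
	Comments      int
	Bookmarks     int
}

// Illustrative weights; the real weighting is a product decision.
const (
	likeWeight     = 1
	commentWeight  = 2
	bookmarkWeight = 3
)

// PopularityScore combines the counters into a single integer score.
func PopularityScore(s TranslationStats) int {
	return likeWeight*s.Likes + commentWeight*s.Comments + bookmarkWeight*s.Bookmarks
}

// PopularTranslations returns the IDs of the n highest-scoring
// translations. Ties are broken by ascending ID so the result is
// deterministic. The input slice is not modified.
func PopularTranslations(stats []TranslationStats, n int) []uint {
	ranked := make([]TranslationStats, len(stats))
	copy(ranked, stats)
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := PopularityScore(ranked[i]), PopularityScore(ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i].TranslationID < ranked[j].TranslationID
	})
	if n < 0 {
		n = 0
	}
	if n > len(ranked) {
		n = len(ranked)
	}
	ids := make([]uint, 0, n)
	for _, s := range ranked[:n] {
		ids = append(ids, s.TranslationID)
	}
	return ids
}
```

A background job could run this periodically over aggregated counters and cache the resulting ID list, in line with the caching and refresh bullets above.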

**Acceptance Criteria**:
- All analytics metrics implemented
- Popular translations calculated correctly
- Analytics updated in real-time or near-real-time
- Performance optimized (cached where appropriate)
- Tests for all analytics features

## Implementation Guidelines

1. **Architecture First**: Maintain clean architecture, DDD, and CQRS patterns
2. **Backward Compatibility**: Ensure API contracts remain consistent
3. **Code Quality**:
   - Follow Go best practices and idioms
   - Use interfaces for testability
   - Maintain separation of concerns
   - Add comprehensive error handling
4. **Testing**: Write tests for all new features and refactorings
5. **Documentation**: Add GoDoc comments for all public APIs
6. **Performance**: Optimize for production workloads
7. **Observability**: Instrument all critical paths

## Expected Outcome

- Production-ready search functionality
- Proper dependency injection (no globals)
- Full observability (tracing, metrics, logging)
- Optimized queries with DTOs
- Comprehensive API documentation
- Fast, reliable test suite
- Complete analytics features
- Improved code maintainability

## Files to Prioritize

1. `internal/app/search/service.go` - Core search implementation (P0)
2. `internal/platform/config/config.go` - Configuration refactoring (P1)
3. `internal/observability/*` - Observability enhancements (P0)
4. `internal/app/*/queries.go` - DTO implementation (P1)
5. `internal/platform/log/*` - Logging improvements (P1)
6. `api/README.md` - API documentation (P1)

## Notes

- Codebase uses Go 1.25
- Follows DDD/CQRS/Clean Architecture patterns
- GraphQL API with gqlgen
- PostgreSQL with GORM
- Weaviate for vector search
- Redis for caching and job queue
- Docker for local development
- Existing tests should continue to pass
- Follow existing code style and patterns