Backend Production Readiness & Code Quality Improvements
Overview
Implement critical production-ready features, refactor architectural issues, and improve code quality for the Tercul backend. The codebase uses Go 1.25, exposes a GraphQL API, and follows DDD/CQRS patterns and clean architecture principles.
Critical Issues to Resolve
1. Implement Full-Text Search Service (P0 - Critical)
Problem: The search service in internal/app/search/service.go is a stub that returns empty results. This is a core feature that users depend on.
Current State:
- Search() method returns empty results (lines 31-39)
- IndexWork() is partially implemented, but the search logic is missing
- Weaviate client exists but is not used for search
- Search filters are defined but not applied
Affected Files:
- internal/app/search/service.go - Main search service (stub implementation)
- internal/platform/search/weaviate_wrapper.go - Weaviate client wrapper
- internal/domain/search/search.go - Search domain interfaces
- GraphQL resolvers that use search service
Solution:
- Implement full Weaviate search query in Search() method:
  - Query Weaviate for works, translations, and authors
  - Apply search filters (language, type, date range, tags, authors)
  - Support multi-language search (Russian, English, Tatar)
  - Implement relevance ranking
  - Add pagination support
  - Handle special characters and diacritics
- Enhance indexing:
  - Index work titles, content, and metadata
  - Index translation content with language tags
  - Index author names and biographies
  - Add incremental indexing on create/update operations
  - Create background job for bulk indexing existing content
- Add search result transformation:
  - Map Weaviate results to domain entities
  - Include relevance scores
  - Handle empty results gracefully
  - Add search analytics/metrics
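The result-transformation step can be sketched without a Weaviate dependency. The example below is a minimal illustration, not the project's implementation: RawHit, Filters, Result, Page, and Transform are hypothetical names, and the filters are applied in memory only to keep the sketch self-contained (in practice they would more likely be pushed down into the Weaviate query).

```go
package main

import (
	"fmt"
	"sort"
)

// RawHit is a hypothetical stand-in for one object returned by the search backend.
type RawHit struct {
	ID, Kind, Title, Language string
	Score                     float64 // backend relevance; higher is better
}

// Filters is a hypothetical subset of the filters listed above.
// An empty slice means "do not filter on this field".
type Filters struct {
	Languages []string
	Kinds     []string
}

// Result is the domain-facing search result, including its relevance score.
type Result struct {
	ID, Kind, Title, Language string
	Score                     float64
}

// Page is one page of results plus the total number of matches.
type Page struct {
	Items []Result
	Total int
}

func (p Page) String() string {
	return fmt.Sprintf("%d of %d results", len(p.Items), p.Total)
}

// allows reports whether v passes a filter; an empty filter allows everything.
func allows(allowed []string, v string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, a := range allowed {
		if a == v {
			return true
		}
	}
	return false
}

// Transform filters hits, ranks them by descending score, and returns one page.
// A limit <= 0 means "no limit". The returned Items slice is never nil.
func Transform(hits []RawHit, f Filters, limit, offset int) Page {
	items := make([]Result, 0, len(hits))
	for _, h := range hits {
		if !allows(f.Languages, h.Language) || !allows(f.Kinds, h.Kind) {
			continue
		}
		items = append(items, Result{ID: h.ID, Kind: h.Kind, Title: h.Title, Language: h.Language, Score: h.Score})
	}
	sort.SliceStable(items, func(i, j int) bool { return items[i].Score > items[j].Score })

	total := len(items)
	if offset < 0 {
		offset = 0
	}
	if offset > total {
		offset = total
	}
	end := offset + limit
	if limit <= 0 || end > total {
		end = total
	}
	return Page{Items: items[offset:end], Total: total}
}

func main() {
	hits := []RawHit{
		{ID: "1", Kind: "work", Title: "First", Language: "en", Score: 0.2},
		{ID: "2", Kind: "work", Title: "Second", Language: "ru", Score: 0.9},
		{ID: "3", Kind: "author", Title: "Third", Language: "en", Score: 0.5},
	}
	page := Transform(hits, Filters{Languages: []string{"en"}}, 10, 0)
	fmt.Println(page)
	for _, r := range page.Items {
		fmt.Printf("%s %s %.2f\n", r.Kind, r.Title, r.Score)
	}
}
```

Returning an empty, non-nil slice for "no matches" lets resolvers serialize an empty list instead of null, which is one way to satisfy the "handle empty results gracefully" item.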
Acceptance Criteria:
- Search returns relevant results ranked by relevance
- Supports filtering by language, category, tags, authors, date ranges
- Search response time < 200ms for 95th percentile
- Handles multi-language queries correctly
- All existing tests pass
- Integration tests with real Weaviate instance
2. Refactor Global Configuration Singleton (P1 - High Priority)
Problem: The application uses a global singleton, config.Cfg, which violates dependency injection principles and makes testing difficult.
Current State:
- internal/platform/config/config.go has global var Cfg *Config
- config.Cfg is accessed directly in multiple places:
  - internal/platform/search/bleve_client.go (line 13)
  - Various other packages
Affected Files:
- internal/platform/config/config.go - Global config singleton
- internal/platform/search/bleve_client.go - Uses config.Cfg
- cmd/api/main.go - Loads config but also sets global
- cmd/worker/main.go - Similar pattern
- Any other files accessing config.Cfg directly
Solution:
- Remove global Cfg variable from config package
- Refactor LoadConfig() to return config without setting global
- Pass *config.Config as dependency to all constructors:
  - Update NewBleveClient() to accept config parameter
  - Update all repository constructors to accept config
  - Update application service constructors
  - Update platform service constructors
- Update main entry points:
  - cmd/api/main.go - Pass config to all dependencies
  - cmd/worker/main.go - Pass config to all dependencies
  - cmd/tools/enrich/main.go - Pass config to dependencies
- Make configuration more flexible:
  - Make migration path configurable (currently hardcoded)
  - Make metrics server port configurable
  - Add validation for required config values
  - Add config struct tags for better documentation
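The refactoring above might look roughly like the following. This is a sketch under assumptions: the Config fields, the environment variable names (BLEVE_INDEX_PATH, MIGRATION_PATH, METRICS_PORT), the default values, and the BleveClient struct are all hypothetical placeholders, since the real config struct is not shown here. Only the shape matters: LoadConfig returns a value instead of setting a global, and NewBleveClient receives that value as a parameter.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Hypothetical environment variable name; the real one may differ.
const envBleveIndexPath = "BLEVE_INDEX_PATH"

// Config holds settings that were previously read from the global config.Cfg.
// Field names are illustrative assumptions.
type Config struct {
	BleveIndexPath string
	MigrationPath  string
	MetricsPort    string
}

func orDefault(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

// LoadConfig builds a Config from a lookup function and returns it
// without touching any package-level state.
func LoadConfig(getenv func(string) string) (*Config, error) {
	cfg := &Config{
		BleveIndexPath: getenv(envBleveIndexPath),
		MigrationPath:  orDefault(getenv("MIGRATION_PATH"), "migrations"),
		MetricsPort:    orDefault(getenv("METRICS_PORT"), "9090"),
	}
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}

// LoadConfigFromEnv is the production entry point: it reads the process environment.
func LoadConfigFromEnv() (*Config, error) { return LoadConfig(os.Getenv) }

// Validate checks required values so misconfiguration fails at startup.
func (c *Config) Validate() error {
	if c.BleveIndexPath == "" {
		return fmt.Errorf("config: %s is required", envBleveIndexPath)
	}
	return nil
}

// BleveClient is a stand-in for the real client; only the constructor shape matters.
type BleveClient struct{ indexPath string }

// NewBleveClient receives its configuration explicitly instead of reading a global.
func NewBleveClient(cfg *Config) (*BleveClient, error) {
	if cfg == nil {
		return nil, errors.New("search: nil config")
	}
	return &BleveClient{indexPath: cfg.BleveIndexPath}, nil
}

func main() {
	cfg, err := LoadConfigFromEnv()
	if err != nil {
		fmt.Println("startup failed:", err)
		return
	}
	client, err := NewBleveClient(cfg)
	if err != nil {
		fmt.Println("startup failed:", err)
		return
	}
	fmt.Println("index path:", client.indexPath)
}
```

Taking the lookup function as a parameter is one possible design choice: tests can pass a map-backed function and never touch the process environment.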
Acceptance Criteria:
- No global config.Cfg usage anywhere in codebase
- All dependencies receive config via constructor injection
- Tests can easily mock/inject different configs
- Configuration validation on startup
- Backward compatible (same environment variables work)
3. Enhance Observability: Distributed Tracing (P0 - Critical)
Problem: Tracing is implemented but only exports to stdout. Need production-ready tracing with OTLP exporter and proper instrumentation.
Current State:
- internal/observability/tracing.go uses stdouttrace exporter
- Basic tracer provider exists but is not production-ready
- Missing instrumentation in many places
Affected Files:
- internal/observability/tracing.go - Only stdout exporter
- HTTP middleware - May need tracing instrumentation
- GraphQL resolvers - Need span creation
- Database queries - Need query tracing
- Application services - Need business logic spans
Solution:
- Replace stdout exporter with OTLP exporter:
  - Add OTLP exporter configuration
  - Support both gRPC and HTTP OTLP endpoints
  - Add environment-based configuration (dev vs prod)
  - Add trace sampling strategy (100% dev, 10% prod)
- Enhance instrumentation:
  - Add automatic HTTP request tracing in middleware
  - Instrument all GraphQL resolvers with spans
  - Add database query spans via GORM callbacks
  - Create custom spans for slow operations (>100ms)
  - Add span attributes (user_id, work_id, etc.)
- Add trace context propagation:
  - Ensure trace IDs propagate through all layers
  - Add trace ID to structured logs
  - Support distributed tracing across services
- Configuration:

    type TracingConfig struct {
        Enabled      bool
        ServiceName  string
        OTLPEndpoint string
        SamplingRate float64 // fraction of traces to keep, 0.0 to 1.0
        Environment  string
    }
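As an illustration of the environment-based sampling strategy (100% dev, 10% prod), the sketch below derives a TracingConfig and validates it using only the standard library. The environment name "production", the service name, and the helper names DefaultSamplingRate and NewTracingConfig are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// TracingConfig repeats the struct proposed above.
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	OTLPEndpoint string
	SamplingRate float64
	Environment  string
}

// DefaultSamplingRate returns 10% for production and 100% everywhere else.
// The literal "production" is an assumed environment name.
func DefaultSamplingRate(env string) float64 {
	if env == "production" {
		return 0.1
	}
	return 1.0
}

// NewTracingConfig builds a config; tracing is enabled only when an endpoint is set.
func NewTracingConfig(env, service, endpoint string) (TracingConfig, error) {
	cfg := TracingConfig{
		Enabled:      endpoint != "",
		ServiceName:  service,
		OTLPEndpoint: endpoint,
		SamplingRate: DefaultSamplingRate(env),
		Environment:  env,
	}
	return cfg, cfg.Validate()
}

// Validate rejects values that would silently misconfigure the exporter.
func (c TracingConfig) Validate() error {
	if c.SamplingRate < 0 || c.SamplingRate > 1 {
		return fmt.Errorf("tracing: sampling rate %v is outside [0, 1]", c.SamplingRate)
	}
	if c.Enabled && c.OTLPEndpoint == "" {
		return errors.New("tracing: enabled but no OTLP endpoint configured")
	}
	if c.Enabled && c.ServiceName == "" {
		return errors.New("tracing: enabled but no service name configured")
	}
	return nil
}

func main() {
	for _, env := range []string{"development", "production"} {
		cfg, err := NewTracingConfig(env, "tercul-api", "localhost:4317")
		if err != nil {
			fmt.Println("invalid tracing config:", err)
			continue
		}
		fmt.Printf("%s: enabled=%t sampling=%.2f\n", cfg.Environment, cfg.Enabled, cfg.SamplingRate)
	}
}
```

With the OpenTelemetry Go SDK, a ratio like SamplingRate would typically be handed to a ratio-based sampler such as sdktrace.TraceIDRatioBased; that wiring is left out here to keep the sketch dependency-free.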
Acceptance Criteria:
- Traces exported to OTLP collector (Jaeger/Tempo compatible)
- All HTTP requests have spans
- All GraphQL resolvers traced
- Database queries have spans
- Trace IDs in logs
- Sampling configurable per environment
4. Enhance Observability: Prometheus Metrics (P0 - Critical)
Problem: Basic metrics exist but need enhancement for production monitoring and alerting.
Current State:
- internal/observability/metrics.go has basic HTTP and DB metrics
- Missing business metrics, GraphQL-specific metrics
- No Grafana dashboards or alerting rules
Affected Files:
- internal/observability/metrics.go - Basic metrics
- GraphQL resolvers - Need resolver metrics
- Application services - Need business metrics
- Background jobs - Need job metrics
Solution:
- Add GraphQL-specific metrics:
  - graphql_resolver_duration_seconds{operation, resolver}
  - graphql_errors_total{operation, error_type}
  - graphql_operations_total{operation, status}
- Add business metrics:
  - works_created_total{language}
  - searches_performed_total{type}
  - user_registrations_total
  - translations_created_total{language}
  - likes_total{entity_type}
- Enhance existing metrics:
  - Add more labels to HTTP metrics (status code as number)
  - Add query type labels to DB metrics
  - Add connection pool metrics
  - Add cache hit/miss metrics
- Create observability package structure:
  - Move metrics to internal/observability/metrics/
  - Add metric collection helpers
  - Document metric naming conventions
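A metric collection helper for resolvers could take roughly the following shape. This is a dependency-free sketch: Recorder is a hypothetical interface that a Prometheus-backed implementation would satisfy, memRecorder is an in-memory stand-in used only for demonstration, and the status values "success" and "error" are assumed label values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Recorder is a hypothetical abstraction over the metrics backend.
// Its two methods correspond to graphql_resolver_duration_seconds{operation, resolver}
// and graphql_operations_total{operation, status}.
type Recorder interface {
	ObserveResolver(operation, resolver string, d time.Duration)
	CountOperation(operation, status string)
}

// Instrument runs fn, records its duration, and counts the outcome by status.
func Instrument(rec Recorder, operation, resolver string, fn func() error) error {
	start := time.Now()
	err := fn()
	rec.ObserveResolver(operation, resolver, time.Since(start))
	status := "success"
	if err != nil {
		status = "error"
	}
	rec.CountOperation(operation, status)
	return err
}

// memRecorder keeps observations in memory; it is for demonstration only.
type memRecorder struct {
	ops       map[string]int
	durations map[string][]time.Duration
}

func newMemRecorder() *memRecorder {
	return &memRecorder{ops: map[string]int{}, durations: map[string][]time.Duration{}}
}

func key(a, b string) string { return fmt.Sprintf("%s/%s", a, b) }

func (m *memRecorder) ObserveResolver(operation, resolver string, d time.Duration) {
	k := key(operation, resolver)
	m.durations[k] = append(m.durations[k], d)
}

func (m *memRecorder) CountOperation(operation, status string) {
	m.ops[key(operation, status)]++
}

// failing simulates a resolver that returns an error.
func failing() error { return errors.New("resolver failed") }

func main() {
	rec := newMemRecorder()
	_ = Instrument(rec, "query", "works", func() error { return nil })
	_ = Instrument(rec, "query", "works", failing)
	fmt.Println("operations:", rec.ops)
	fmt.Println("observations for query/works:", len(rec.durations[key("query", "works")]))
}
```

Keeping label values to a small fixed set (operation kind, resolver name, status) avoids the high-cardinality labels that Prometheus guidance warns against.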
Acceptance Criteria:
- All critical paths have metrics
- GraphQL operations fully instrumented
- Business metrics tracked
- Metrics exposed on /metrics endpoint
- Metric labels follow Prometheus best practices
5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)
Problem: Application queries return full domain entities, which is inefficient and leaks domain logic to API layer.
Current State:
- Queries in internal/app/*/queries.go return domain entities
- GraphQL resolvers receive full entities with all fields
- No optimization for list vs detail views
Affected Files:
- internal/app/work/queries.go - Returns domain.Work
- internal/app/translation/queries.go - Returns domain.Translation
- internal/app/author/queries.go - Returns domain.Author
- GraphQL resolvers - Receive full entities
Solution:
- Create DTO packages:
  - internal/app/work/dto - WorkListDTO, WorkDetailDTO
  - internal/app/translation/dto - TranslationListDTO, TranslationDetailDTO
  - internal/app/author/dto - AuthorListDTO, AuthorDetailDTO
- Define optimized DTOs:

    // WorkListDTO - For list views (minimal fields)
    type WorkListDTO struct {
        ID               uint
        Title            string
        AuthorName       string
        AuthorID         uint
        Language         string
        CreatedAt        time.Time
        ViewCount        int
        LikeCount        int
        TranslationCount int
    }

    // WorkDetailDTO - For single work view (all fields)
    type WorkDetailDTO struct {
        *WorkListDTO
        Content      string
        Description  string
        Tags         []string
        Translations []TranslationSummaryDTO
        Author       AuthorSummaryDTO
    }

- Refactor queries to return DTOs:
  - Update query methods to use optimized SQL
  - Use joins to avoid N+1 queries
  - Map domain entities to DTOs
  - Update GraphQL resolvers to use DTOs
- Add benchmarks comparing old vs new approach
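The "map domain entities to DTOs" step can be sketched as a pair of pure functions. The Work and Author types below are hypothetical, trimmed-down stand-ins for the real domain entities, and ToWorkListDTO / ToWorkListDTOs are illustrative names; the optimized SQL that would populate only the needed columns is not shown.

```go
package main

import (
	"fmt"
	"time"
)

// Author and Work are hypothetical, trimmed-down domain entities.
type Author struct {
	ID   uint
	Name string
}

type Work struct {
	ID               uint
	Title            string
	Content          string // large field that list views do not need
	Language         string
	Author           Author
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

// WorkListDTO mirrors the list-view DTO proposed above.
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

func (d WorkListDTO) String() string {
	return fmt.Sprintf("#%d %q by %s (%s)", d.ID, d.Title, d.AuthorName, d.Language)
}

// ToWorkListDTO copies only the fields a list view needs; Content is dropped.
func ToWorkListDTO(w Work) WorkListDTO {
	return WorkListDTO{
		ID:               w.ID,
		Title:            w.Title,
		AuthorName:       w.Author.Name,
		AuthorID:         w.Author.ID,
		Language:         w.Language,
		CreatedAt:        w.CreatedAt,
		ViewCount:        w.ViewCount,
		LikeCount:        w.LikeCount,
		TranslationCount: w.TranslationCount,
	}
}

// ToWorkListDTOs maps a slice and always returns a non-nil result.
func ToWorkListDTOs(works []Work) []WorkListDTO {
	out := make([]WorkListDTO, 0, len(works))
	for _, w := range works {
		out = append(out, ToWorkListDTO(w))
	}
	return out
}

func main() {
	works := []Work{{
		ID: 7, Title: "Example Work", Content: "a long body of text",
		Language: "tt", Author: Author{ID: 3, Name: "Example Author"},
		CreatedAt: time.Now(), LikeCount: 5,
	}}
	for _, dto := range ToWorkListDTOs(works) {
		fmt.Println(dto)
	}
}
```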
Acceptance Criteria:
- List queries return optimized DTOs
- Detail queries return full DTOs
- No N+1 query problems
- Payload size reduced by 30-50%
- Query response time improved by 20%
- No breaking changes to GraphQL schema
6. Improve Structured Logging (P1 - High Priority)
Problem: Logging exists but lacks request context, user IDs, and trace correlation.
Current State:
- internal/platform/log uses zerolog
- Basic logging but missing context
- No request ID propagation
- No user ID in logs
- No trace/span ID correlation
Affected Files:
- internal/platform/log/logger.go - Basic logger
- HTTP middleware - Needs request ID injection
- All application services - Need context logging
Solution:
- Enhance HTTP middleware:
  - Generate request ID for each request
  - Inject request ID into context
  - Add user ID from JWT to context
  - Add trace/span IDs to context
- Update logger to use context:
  - Extract request ID, user ID, trace ID from context
  - Add to all log entries automatically
  - Create helper: log.FromContext(ctx).WithRequestID().WithUserID()
- Add structured logging fields:
  - Define field name constants
  - Ensure consistent field names across codebase
  - Add sensitive data redaction
- Implement log sampling:
  - Sample high-volume endpoints (e.g., health checks)
  - Configurable sampling rates
  - Always log errors regardless of sampling
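The context-propagation part of this plan can be illustrated with the standard library alone. In the sketch below, the field name constants, WithField, FieldsFromContext, and LogLine are hypothetical helpers; the project itself uses zerolog, where the extracted fields would be attached to the logger rather than formatted into a string.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported type so these context keys cannot collide with other packages.
type ctxKey string

// Field name constants keep log keys consistent across the codebase.
// The exact names are assumptions for this sketch.
const (
	FieldRequestID = "request_id"
	FieldUserID    = "user_id"
	FieldTraceID   = "trace_id"
)

// fieldOrder fixes the order in which fields are written to a log line.
var fieldOrder = []string{FieldRequestID, FieldUserID, FieldTraceID}

// WithField stores one logging field in the context (e.g. from HTTP middleware).
func WithField(ctx context.Context, name, value string) context.Context {
	return context.WithValue(ctx, ctxKey(name), value)
}

// FieldsFromContext collects the known logging fields that are present and non-empty.
func FieldsFromContext(ctx context.Context) map[string]string {
	fields := make(map[string]string)
	for _, name := range fieldOrder {
		if v, ok := ctx.Value(ctxKey(name)).(string); ok && v != "" {
			fields[name] = v
		}
	}
	return fields
}

// LogLine renders a message followed by the context fields in a stable order.
func LogLine(ctx context.Context, msg string) string {
	line := msg
	fields := FieldsFromContext(ctx)
	for _, name := range fieldOrder {
		if v, ok := fields[name]; ok {
			line += fmt.Sprintf(" %s=%s", name, v)
		}
	}
	return line
}

// demoContext simulates what middleware would produce for an authenticated request.
func demoContext() context.Context {
	ctx := WithField(context.Background(), FieldRequestID, "req-123")
	return WithField(ctx, FieldUserID, "42")
}

func main() {
	fmt.Println(LogLine(demoContext(), "work created"))
	fmt.Println(LogLine(context.Background(), "health check"))
}
```

Fields that are absent from the context are simply omitted, so unauthenticated requests log without a user ID rather than with an empty one.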
Acceptance Criteria:
- All logs include request ID
- Authenticated request logs include user ID
- All logs include trace/span IDs
- Consistent log format across codebase
- Sensitive data excluded from logs
- Log sampling for high-volume endpoints
7. Refactor Caching with Decorator Pattern (P1 - High Priority)
Problem: Current caching implementation uses bespoke cached repositories. Should use decorator pattern for better maintainability.
Current State:
- internal/data/cache has custom caching logic
- Cached repositories are separate implementations
- Not following decorator pattern
Affected Files:
- internal/data/cache/* - Current caching implementation
- Repository interfaces - Need to support decorators
Solution:
- Implement decorator pattern:
  - Create CachedWorkRepository decorator
  - Create CachedAuthorRepository decorator
  - Create CachedTranslationRepository decorator
  - Decorators wrap base repositories
- Implement cache-aside pattern:
  - Check cache on read, populate on miss
  - Invalidate cache on write operations
  - Add cache key versioning strategy
- Add cache configuration:
  - TTL per entity type
  - Cache size limits
  - Cache warming strategies
- Add cache metrics:
  - Hit/miss rates
  - Cache size
  - Eviction counts
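A minimal sketch of the decorator with cache-aside reads and invalidate-on-write follows. It makes several simplifying assumptions: the WorkRepository interface has only two methods and omits context.Context, the cache is an in-memory map rather than Redis, there is no TTL or size limit, and memRepo is a hypothetical base repository used for demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Work is a hypothetical, trimmed-down entity.
type Work struct {
	ID    uint
	Title string
}

var ErrNotFound = errors.New("work not found")

// WorkRepository is an assumed, minimal repository interface.
type WorkRepository interface {
	GetByID(id uint) (*Work, error)
	Update(w *Work) error
}

// memRepo is a stand-in base repository that counts reads.
type memRepo struct {
	data  map[uint]Work
	reads int
}

func (r *memRepo) GetByID(id uint) (*Work, error) {
	r.reads++
	w, ok := r.data[id]
	if !ok {
		return nil, fmt.Errorf("work %d: %w", id, ErrNotFound)
	}
	return &w, nil
}

func (r *memRepo) Update(w *Work) error {
	r.data[w.ID] = *w
	return nil
}

// CachedWorkRepository decorates any WorkRepository with a cache-aside layer.
type CachedWorkRepository struct {
	next         WorkRepository
	mu           sync.Mutex
	cache        map[uint]Work
	hits, misses int
}

func NewCachedWorkRepository(next WorkRepository) *CachedWorkRepository {
	return &CachedWorkRepository{next: next, cache: make(map[uint]Work)}
}

// GetByID checks the cache first and populates it on a miss.
func (c *CachedWorkRepository) GetByID(id uint) (*Work, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if w, ok := c.cache[id]; ok {
		c.hits++
		return &w, nil
	}
	c.misses++
	w, err := c.next.GetByID(id)
	if err != nil {
		return nil, err
	}
	c.cache[id] = *w // store a copy so callers cannot mutate the cached value
	return w, nil
}

// Update writes through to the base repository, then invalidates the cached entry.
func (c *CachedWorkRepository) Update(w *Work) error {
	if err := c.next.Update(w); err != nil {
		return err
	}
	c.mu.Lock()
	delete(c.cache, w.ID)
	c.mu.Unlock()
	return nil
}

func main() {
	base := &memRepo{data: map[uint]Work{1: {ID: 1, Title: "First draft"}}}
	repo := NewCachedWorkRepository(base)
	_, _ = repo.GetByID(1) // miss: goes to the base repository
	_, _ = repo.GetByID(1) // hit: served from the cache
	_ = repo.Update(&Work{ID: 1, Title: "Second draft"})
	w, _ := repo.GetByID(1) // miss again after invalidation
	fmt.Printf("title=%q hits=%d misses=%d base reads=%d\n", w.Title, repo.hits, repo.misses, base.reads)
}
```

Because the decorator satisfies the same interface as the repository it wraps, callers need no changes. With a networked cache such as Redis, a cache error would typically be logged and treated as a miss so that cache failures do not break the application.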
Acceptance Criteria:
- Decorator pattern implemented
- Cache hit rate > 70% for reads
- Automatic cache invalidation on updates
- Cache failures don't break application
- Metrics for cache performance
8. Complete API Documentation (P1 - High Priority)
Problem: API documentation is incomplete. Need comprehensive GraphQL API documentation.
Current State:
- GraphQL schema exists but lacks descriptions
- No example queries
- No API guide for consumers
Affected Files:
- GraphQL schema files - Need descriptions
- api/README.md - Needs comprehensive guide
- All resolver implementations - Need documentation
Solution:
- Add descriptions to GraphQL schema:
  - Document all types, queries, mutations
  - Add field descriptions
  - Document input validation rules
  - Add deprecation notices where applicable
- Create comprehensive API documentation:
  - api/README.md - Complete API guide
  - api/EXAMPLES.md - Query examples
  - Document authentication requirements
  - Document rate limiting
  - Document error responses
- Enhance GraphQL Playground:
  - Pre-populate with example queries
  - Add query templates
  - Document schema changes
Acceptance Criteria:
- All 80+ GraphQL resolvers documented
- Example queries for each operation
- Input validation rules documented
- Error response examples
- Authentication requirements clear
- API changelog maintained
9. Refactor Testing Utilities (P2 - Medium Priority)
Problem: Tests depend on live database connections, making them slow and unreliable.
Current State:
- internal/testutil/testutil.go has database connection logic
- Integration tests require live database
- Tests are slow and may be flaky
Affected Files:
- internal/testutil/testutil.go - Database connection logic
- All integration tests - Depend on live DB
Solution:
- Decouple tests from live database:
  - Remove database connection from testutil
  - Use test containers for integration tests
  - Use mocks for unit tests
- Improve test utilities:
  - Create test data builders
  - Add fixtures for common scenarios
  - Improve test isolation
- Add parallel test execution:
  - Enable -parallel flag where safe
  - Use test-specific database schemas
  - Clean up test data properly
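A test data builder of the kind mentioned above might look like this. The Work type, its default values, and the WorkBuilder method names are hypothetical; the point of the sketch is that each Build call returns an independent copy, which helps keep parallel tests isolated from one another.

```go
package main

import "fmt"

// Work is a hypothetical, trimmed-down entity used only for this sketch.
type Work struct {
	ID       uint
	Title    string
	Language string
	Tags     []string
}

func (w Work) String() string {
	return fmt.Sprintf("#%d %q [%s] %v", w.ID, w.Title, w.Language, w.Tags)
}

// WorkBuilder assembles Work values for tests, starting from sensible defaults.
type WorkBuilder struct{ work Work }

// NewWorkBuilder returns a builder preloaded with assumed default values.
func NewWorkBuilder() *WorkBuilder {
	return &WorkBuilder{work: Work{ID: 1, Title: "Untitled", Language: "en"}}
}

func (b *WorkBuilder) WithID(id uint) *WorkBuilder {
	b.work.ID = id
	return b
}

func (b *WorkBuilder) WithTitle(title string) *WorkBuilder {
	b.work.Title = title
	return b
}

func (b *WorkBuilder) WithLanguage(lang string) *WorkBuilder {
	b.work.Language = lang
	return b
}

func (b *WorkBuilder) WithTags(tags ...string) *WorkBuilder {
	b.work.Tags = append([]string(nil), tags...)
	return b
}

// Build returns a copy, including a fresh Tags slice, so tests never share state.
func (b *WorkBuilder) Build() Work {
	w := b.work
	w.Tags = append([]string(nil), b.work.Tags...)
	return w
}

func main() {
	defaults := NewWorkBuilder().Build()
	custom := NewWorkBuilder().WithID(2).WithTitle("Poem").WithLanguage("ru").WithTags("lyric").Build()
	fmt.Println(defaults)
	fmt.Println(custom)
}
```

A test only overrides the fields it cares about, so adding a field to the entity later does not require touching every existing test.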
Acceptance Criteria:
- Unit tests run without database
- Integration tests use test containers
- Tests run in parallel where possible
- Test execution time < 5 seconds for unit tests
- Clear separation between unit and integration tests
10. Implement Analytics Features (P2 - Medium Priority)
Problem: Analytics service exists but some metrics are stubs (like, comment, bookmark counting).
Current State:
- internal/jobs/linguistics/work_analysis_service.go has TODO comments:
  - Line 184: ViewCount TODO
  - Line 185: LikeCount TODO
  - Line 186: CommentCount TODO
  - Line 187: BookmarkCount TODO
  - Line 188: TranslationCount TODO
  - Line 192: PopularTranslations TODO
Affected Files:
- internal/jobs/linguistics/work_analysis_service.go - Stub implementations
- internal/app/analytics/* - Analytics services
Solution:
- Implement counting services:
  - Like counting service
  - Comment counting service
  - Bookmark counting service
  - Translation counting service
  - View counting service
- Implement popular translations calculation:
  - Calculate based on likes, comments, bookmarks
  - Cache results for performance
  - Update periodically via background job
- Add analytics to work analysis:
  - Integrate counting services
  - Update WorkAnalytics struct
  - Ensure data is accurate and up-to-date
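One possible way to calculate popular translations from likes, comments, and bookmarks is a weighted score followed by a top-N cut. The weights, the TranslationStats type, and the tie-breaking rule (lower ID first) below are all assumptions chosen for illustration; the actual ranking formula is a product decision that is not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// TranslationStats holds the counters that the counting services would supply.
type TranslationStats struct {
	TranslationID uint
	Likes         int
	Comments      int
	Bookmarks     int
}

func (s TranslationStats) String() string {
	return fmt.Sprintf("translation %d (likes=%d comments=%d bookmarks=%d)",
		s.TranslationID, s.Likes, s.Comments, s.Bookmarks)
}

// Weights controls how much each signal contributes; the values are hypothetical.
type Weights struct {
	Like, Comment, Bookmark float64
}

// Score combines the counters into a single popularity value.
func Score(s TranslationStats, w Weights) float64 {
	return float64(s.Likes)*w.Like + float64(s.Comments)*w.Comment + float64(s.Bookmarks)*w.Bookmark
}

// PopularTranslations returns the n highest-scoring entries without modifying the input.
// Ties are broken by ascending TranslationID so the output is deterministic.
func PopularTranslations(stats []TranslationStats, w Weights, n int) []TranslationStats {
	ranked := append([]TranslationStats(nil), stats...)
	sort.SliceStable(ranked, func(i, j int) bool {
		si, sj := Score(ranked[i], w), Score(ranked[j], w)
		if si != sj {
			return si > sj
		}
		return ranked[i].TranslationID < ranked[j].TranslationID
	})
	if n < 0 {
		n = 0
	}
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}

func main() {
	stats := []TranslationStats{
		{TranslationID: 1, Likes: 10},
		{TranslationID: 2, Comments: 3, Bookmarks: 2},
		{TranslationID: 3, Likes: 1, Comments: 1, Bookmarks: 1},
	}
	weights := Weights{Like: 1, Comment: 2, Bookmark: 3}
	for _, s := range PopularTranslations(stats, weights, 2) {
		fmt.Printf("%v score=%.0f\n", s, Score(s, weights))
	}
}
```

A background job could run this calculation periodically and cache the resulting list, matching the caching and periodic-update items above.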
Acceptance Criteria:
- All analytics metrics implemented
- Popular translations calculated correctly
- Analytics updated in real-time or near-real-time
- Performance optimized (cached where appropriate)
- Tests for all analytics features
Implementation Guidelines
- Architecture First: Maintain clean architecture, DDD, and CQRS patterns
- Backward Compatibility: Ensure API contracts remain consistent
- Code Quality:
  - Follow Go best practices and idioms
  - Use interfaces for testability
  - Maintain separation of concerns
  - Add comprehensive error handling
- Testing: Write tests for all new features and refactorings
- Documentation: Add GoDoc comments for all public APIs
- Performance: Optimize for production workloads
- Observability: Instrument all critical paths
Expected Outcome
- Production-ready search functionality
- Proper dependency injection (no globals)
- Full observability (tracing, metrics, logging)
- Optimized queries with DTOs
- Comprehensive API documentation
- Fast, reliable test suite
- Complete analytics features
- Improved code maintainability
Files to Prioritize
- internal/app/search/service.go - Core search implementation (P0)
- internal/platform/config/config.go - Configuration refactoring (P1)
- internal/observability/* - Observability enhancements (P0)
- internal/app/*/queries.go - DTO implementation (P1)
- internal/platform/log/* - Logging improvements (P1)
- api/README.md - API documentation (P1)
Notes
- Codebase uses Go 1.25
- Follows DDD/CQRS/Clean Architecture patterns
- GraphQL API with gqlgen
- PostgreSQL with GORM
- Weaviate for vector search
- Redis for caching and job queue
- Docker for local development
- Existing tests should continue to pass
- Follow existing code style and patterns