tercul-backend/jules-task.md
2025-11-30 03:13:33 +01:00

Backend Production Readiness & Code Quality Improvements

Overview

Implement critical production-ready features, refactor architectural issues, and improve code quality for the Tercul backend. The codebase uses Go 1.25, exposes a GraphQL API, and follows DDD/CQRS and clean architecture principles.

Critical Issues to Resolve

1. Implement Full-Text Search Service (P0 - Critical)

Problem: The search service in internal/app/search/service.go is a stub that returns empty results. This is a core feature that users depend on.

Current State:

  • Search() method returns empty results (lines 31-39)
  • IndexWork() is partially implemented, but the search logic is missing
  • Weaviate client exists but not utilized for search
  • Search filters are defined but not applied

Affected Files:

  • internal/app/search/service.go - Main search service (stub implementation)
  • internal/platform/search/weaviate_wrapper.go - Weaviate client wrapper
  • internal/domain/search/search.go - Search domain interfaces
  • GraphQL resolvers that use search service

Solution:

  1. Implement full Weaviate search query in Search() method:

    • Query Weaviate for works, translations, and authors
    • Apply search filters (language, type, date range, tags, authors)
    • Support multi-language search (Russian, English, Tatar)
    • Implement relevance ranking
    • Add pagination support
    • Handle special characters and diacritics
  2. Enhance indexing:

    • Index work titles, content, and metadata
    • Index translation content with language tags
    • Index author names and biographies
    • Add incremental indexing on create/update operations
    • Create background job for bulk indexing existing content
  3. Add search result transformation:

    • Map Weaviate results to domain entities
    • Include relevance scores
    • Handle empty results gracefully
    • Add search analytics/metrics
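
The result-transformation step above can be sketched without committing to a particular Weaviate query. The following is a minimal sketch, assuming hits have already been decoded from the backend response; the Hit and Filters types, the RankAndPage function, and the default page size of 20 are all hypothetical names and values chosen for illustration. In the real service, filters would preferably be pushed down into the Weaviate query rather than applied in memory.

```go
package main

import (
	"sort"
	"strings"
)

// Hit is a hypothetical, already-decoded search result; the real service
// would build it from a Weaviate response.
type Hit struct {
	ID       uint
	Kind     string  // "work", "translation", or "author"
	Language string  // e.g. "ru", "en", "tt"
	Score    float64 // relevance score reported by the search backend
}

// Filters is a hypothetical subset of the search filters listed above.
type Filters struct {
	Languages []string // empty means "any language"
	Kinds     []string // empty means "any kind"
}

// allows reports whether list is empty (no restriction) or contains v,
// compared case-insensitively.
func allows(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, item := range list {
		if strings.EqualFold(item, v) {
			return true
		}
	}
	return false
}

// RankAndPage filters hits, orders them by descending score, and returns the
// requested page. It always returns a non-nil slice so that empty results
// serialize as an empty list.
func RankAndPage(hits []Hit, f Filters, limit, offset int) []Hit {
	if limit <= 0 {
		limit = 20 // assumed default page size
	}
	if offset < 0 {
		offset = 0
	}
	kept := make([]Hit, 0, len(hits))
	for _, h := range hits {
		if allows(f.Languages, h.Language) && allows(f.Kinds, h.Kind) {
			kept = append(kept, h)
		}
	}
	// A stable sort keeps the backend order for hits with equal scores.
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if offset >= len(kept) {
		return []Hit{}
	}
	end := offset + limit
	if end > len(kept) {
		end = len(kept)
	}
	return kept[offset:end]
}
```

Returning an empty slice instead of nil covers the "handle empty results gracefully" item, since callers can range over the result without a nil check.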

Acceptance Criteria:

  • Search returns relevant results ranked by relevance
  • Supports filtering by language, category, tags, authors, date ranges
  • Search response time under 200 ms at the 95th percentile
  • Handles multi-language queries correctly
  • All existing tests pass
  • Integration tests with real Weaviate instance

2. Refactor Global Configuration Singleton (P1 - High Priority)

Problem: The application uses a global singleton config.Cfg which violates dependency injection principles and makes testing difficult.

Current State:

  • internal/platform/config/config.go has global var Cfg *Config
  • config.Cfg is accessed directly in multiple places:
    • internal/platform/search/bleve_client.go (line 13)
    • Various other packages

Affected Files:

  • internal/platform/config/config.go - Global config singleton
  • internal/platform/search/bleve_client.go - Uses config.Cfg
  • cmd/api/main.go - Loads config but also sets global
  • cmd/worker/main.go - Similar pattern
  • Any other files accessing config.Cfg directly

Solution:

  1. Remove global Cfg variable from config package

  2. Refactor LoadConfig() to return config without setting global

  3. Pass *config.Config as dependency to all constructors:

    • Update NewBleveClient() to accept config parameter
    • Update all repository constructors to accept config
    • Update application service constructors
    • Update platform service constructors
  4. Update main entry points:

    • cmd/api/main.go - Pass config to all dependencies
    • cmd/worker/main.go - Pass config to all dependencies
    • cmd/tools/enrich/main.go - Pass config to dependencies
  5. Make configuration more flexible:

    • Make migration path configurable (currently hardcoded)
    • Make metrics server port configurable
    • Add validation for required config values
    • Add config struct tags for better documentation
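
A minimal sketch of the refactoring above, assuming a trimmed-down Config. The field set, the environment variable names (DATABASE_URL, MIGRATION_PATH, METRICS_PORT), the defaults, and the MetricsServer type are hypothetical and stand in for whatever the real config package defines. The point is only that Load returns a value, validates it, and leaves no package-level state behind.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config is a hypothetical, trimmed-down version of the real config struct.
type Config struct {
	DatabaseURL   string
	MigrationPath string
	MetricsPort   int
}

// Load builds a Config from getenv (os.Getenv in production, a fake in tests)
// and returns it instead of assigning a package-level variable.
func Load(getenv func(string) string) (*Config, error) {
	cfg := &Config{
		DatabaseURL:   getenv("DATABASE_URL"),
		MigrationPath: getenv("MIGRATION_PATH"),
		MetricsPort:   9090, // assumed default
	}
	if cfg.MigrationPath == "" {
		cfg.MigrationPath = "migrations" // assumed default
	}
	if raw := getenv("METRICS_PORT"); raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil {
			return nil, fmt.Errorf("METRICS_PORT: %w", err)
		}
		cfg.MetricsPort = port
	}
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}

// Validate reports missing or out-of-range values so that startup fails fast.
func (c *Config) Validate() error {
	if c.DatabaseURL == "" {
		return errors.New("DATABASE_URL is required")
	}
	if c.MetricsPort < 1 || c.MetricsPort > 65535 {
		return fmt.Errorf("METRICS_PORT %d is out of range", c.MetricsPort)
	}
	return nil
}

// MetricsServer stands in for any dependency that used to read config.Cfg.
type MetricsServer struct{ Addr string }

// NewMetricsServer receives the config explicitly through its constructor.
func NewMetricsServer(cfg *Config) *MetricsServer {
	return &MetricsServer{Addr: fmt.Sprintf(":%d", cfg.MetricsPort)}
}
```

Passing getenv as a parameter lets a test supply a map-backed function, so different configurations can be injected without touching the process environment.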

Acceptance Criteria:

  • No global config.Cfg usage anywhere in codebase
  • All dependencies receive config via constructor injection
  • Tests can easily mock/inject different configs
  • Configuration validation on startup
  • Backward compatible (same environment variables work)

3. Enhance Observability: Distributed Tracing (P0 - Critical)

Problem: Tracing is implemented but only exports to stdout. Production use requires an OTLP exporter and proper instrumentation.

Current State:

  • internal/observability/tracing.go uses stdouttrace exporter
  • Basic tracer provider exists but not production-ready
  • Missing instrumentation in many places

Affected Files:

  • internal/observability/tracing.go - Only stdout exporter
  • HTTP middleware - May need tracing instrumentation
  • GraphQL resolvers - Need span creation
  • Database queries - Need query tracing
  • Application services - Need business logic spans

Solution:

  1. Replace stdout exporter with OTLP exporter:

    • Add OTLP exporter configuration
    • Support both gRPC and HTTP OTLP endpoints
    • Add environment-based configuration (dev vs prod)
    • Add trace sampling strategy (100% dev, 10% prod)
  2. Enhance instrumentation:

    • Add automatic HTTP request tracing in middleware
    • Instrument all GraphQL resolvers with spans
    • Add database query spans via GORM callbacks
    • Create custom spans for slow operations (>100ms)
    • Add span attributes (user_id, work_id, etc.)
  3. Add trace context propagation:

    • Ensure trace IDs propagate through all layers
    • Add trace ID to structured logs
    • Support distributed tracing across services
  4. Configuration:

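    type TracingConfig struct {
        Enabled      bool    // turns tracing on or off
        ServiceName  string  // service name reported on every span
        OTLPEndpoint string  // OTLP collector address (gRPC or HTTP)
        SamplingRate float64 // fraction of traces sampled, 0.0 to 1.0
        Environment  string  // deployment environment, e.g. dev or prod
    }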
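
The sampling strategy above (100% in dev, 10% in prod) can be derived from this config. The sketch below repeats the struct so that it compiles on its own; the method names, the "production" environment string, and the convention that a zero SamplingRate means "unset" are assumptions. The resulting fraction would then be handed to whatever ratio-based sampler the tracing SDK provides.

```go
package main

import (
	"errors"
	"fmt"
)

// TracingConfig mirrors the struct above.
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	OTLPEndpoint string
	SamplingRate float64
	Environment  string
}

// Validate checks the fields that matter only when tracing is enabled.
func (c TracingConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if c.OTLPEndpoint == "" {
		return errors.New("tracing: OTLPEndpoint is required when enabled")
	}
	if c.SamplingRate < 0 || c.SamplingRate > 1 {
		return fmt.Errorf("tracing: SamplingRate %v must be within [0, 1]", c.SamplingRate)
	}
	return nil
}

// EffectiveSamplingRate returns the fraction of traces to sample.
// Assumption: a zero SamplingRate means "not configured", in which case the
// environment default applies (10% in production, 100% elsewhere).
func (c TracingConfig) EffectiveSamplingRate() float64 {
	if !c.Enabled {
		return 0
	}
	if c.SamplingRate > 0 {
		return c.SamplingRate
	}
	if c.Environment == "production" {
		return 0.10
	}
	return 1.0
}
```

Treating zero as "unset" means tracing cannot be reduced to 0% through SamplingRate alone; under this sketch that case is covered by Enabled = false.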
    

Acceptance Criteria:

  • Traces exported to OTLP collector (Jaeger/Tempo compatible)
  • All HTTP requests have spans
  • All GraphQL resolvers traced
  • Database queries have spans
  • Trace IDs in logs
  • Sampling configurable per environment

4. Enhance Observability: Prometheus Metrics (P0 - Critical)

Problem: Basic metrics exist but need enhancement for production monitoring and alerting.

Current State:

  • internal/observability/metrics.go has basic HTTP and DB metrics
  • Missing business metrics, GraphQL-specific metrics
  • No Grafana dashboards or alerting rules

Affected Files:

  • internal/observability/metrics.go - Basic metrics
  • GraphQL resolvers - Need resolver metrics
  • Application services - Need business metrics
  • Background jobs - Need job metrics

Solution:

  1. Add GraphQL-specific metrics:

    • graphql_resolver_duration_seconds{operation, resolver}
    • graphql_errors_total{operation, error_type}
    • graphql_operations_total{operation, status}
  2. Add business metrics:

    • works_created_total{language}
    • searches_performed_total{type}
    • user_registrations_total
    • translations_created_total{language}
    • likes_total{entity_type}
  3. Enhance existing metrics:

    • Add more labels to HTTP metrics (status code as number)
    • Add query type labels to DB metrics
    • Add connection pool metrics
    • Add cache hit/miss metrics
  4. Create observability package structure:

    • Move metrics to internal/observability/metrics/
    • Add metric collection helpers
    • Document metric naming conventions
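
One possible shape for the "metric collection helpers" item is a small wrapper that times a resolver and records its outcome. This is a sketch under stated assumptions: the Recorder interface, InstrumentResolver, ErrNotFound, and the error-type labels are hypothetical, and a real implementation would back Recorder with Prometheus collectors carrying the metric names listed above. An in-memory recorder is included so that the helper can be exercised without a metrics backend.

```go
package main

import (
	"errors"
	"time"
)

// Recorder is a hypothetical seam over the metrics backend. A Prometheus-backed
// implementation would forward to graphql_resolver_duration_seconds,
// graphql_operations_total, and graphql_errors_total.
type Recorder interface {
	ObserveResolver(operation, resolver string, seconds float64)
	CountOperation(operation, status string)
	CountError(operation, errorType string)
}

// ErrNotFound is an assumed sentinel error used only to show classification.
var ErrNotFound = errors.New("not found")

// classify maps an error to a small, fixed set of label values so that label
// cardinality stays bounded.
func classify(err error) string {
	if errors.Is(err, ErrNotFound) {
		return "not_found"
	}
	return "internal"
}

// InstrumentResolver runs fn, records its duration, and counts the outcome.
func InstrumentResolver(rec Recorder, operation, resolver string, fn func() error) error {
	start := time.Now()
	err := fn()
	rec.ObserveResolver(operation, resolver, time.Since(start).Seconds())
	if err != nil {
		rec.CountOperation(operation, "error")
		rec.CountError(operation, classify(err))
		return err
	}
	rec.CountOperation(operation, "success")
	return nil
}

// memoryRecorder is an in-memory Recorder intended for tests.
type memoryRecorder struct {
	durations  int
	operations map[string]int
	errs       map[string]int
}

func newMemoryRecorder() *memoryRecorder {
	return &memoryRecorder{operations: map[string]int{}, errs: map[string]int{}}
}

func (m *memoryRecorder) ObserveResolver(_, _ string, _ float64) { m.durations++ }

func (m *memoryRecorder) CountOperation(operation, status string) {
	m.operations[operation+"/"+status]++
}

func (m *memoryRecorder) CountError(operation, errorType string) {
	m.errs[operation+"/"+errorType]++
}
```

Mapping errors to a fixed vocabulary, rather than using the error message as a label value, is what keeps the error_type label within Prometheus cardinality guidance.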

Acceptance Criteria:

  • All critical paths have metrics
  • GraphQL operations fully instrumented
  • Business metrics tracked
  • Metrics exposed on /metrics endpoint
  • Metric labels follow Prometheus best practices

5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)

Problem: Application queries return full domain entities, which is inefficient and leaks domain logic into the API layer.

Current State:

  • Queries in internal/app/*/queries.go return domain entities
  • GraphQL resolvers receive full entities with all fields
  • No optimization for list vs detail views

Affected Files:

  • internal/app/work/queries.go - Returns domain.Work
  • internal/app/translation/queries.go - Returns domain.Translation
  • internal/app/author/queries.go - Returns domain.Author
  • GraphQL resolvers - Receive full entities

Solution:

  1. Create DTO packages:

    • internal/app/work/dto - WorkListDTO, WorkDetailDTO
    • internal/app/translation/dto - TranslationListDTO, TranslationDetailDTO
    • internal/app/author/dto - AuthorListDTO, AuthorDetailDTO
  2. Define optimized DTOs:

    // WorkListDTO carries the minimal fields needed for list views.
    type WorkListDTO struct {
        ID               uint
        Title            string
        AuthorName       string
        AuthorID         uint
        Language         string
        CreatedAt        time.Time
        ViewCount        int
        LikeCount        int
        TranslationCount int
    }

    // WorkDetailDTO carries all fields for a single-work view.
    // WorkListDTO is embedded by value so its promoted fields cannot hit a nil pointer.
    type WorkDetailDTO struct {
        WorkListDTO
        Content      string
        Description  string
        Tags         []string
        Translations []TranslationSummaryDTO
        Author       AuthorSummaryDTO
    }
    
  3. Refactor queries to return DTOs:

    • Update query methods to use optimized SQL
    • Use joins to avoid N+1 queries
    • Map domain entities to DTOs
    • Update GraphQL resolvers to use DTOs
  4. Add benchmarks comparing old vs new approach
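
The "map domain entities to DTOs" step can be sketched as a pair of pure functions. The Work, Author, and Translation types below are hypothetical, trimmed stand-ins for the real domain entities, and the DTO is reduced to the fields they can populate. Note that mapping alone does not reduce database load; that gain comes from the optimized SQL that selects only the columns a list view needs.

```go
package main

import "time"

// Author, Translation, and Work are hypothetical, trimmed domain entities.
type Author struct {
	ID   uint
	Name string
}

type Translation struct {
	ID       uint
	Language string
}

type Work struct {
	ID           uint
	Title        string
	Language     string
	Content      string // large field that list views should not carry
	CreatedAt    time.Time
	Author       Author
	Translations []Translation
}

// WorkListDTO is a reduced version of the list DTO shown above.
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	TranslationCount int
}

// ToWorkListDTO flattens a Work into the fields a list view needs.
func ToWorkListDTO(w Work) WorkListDTO {
	return WorkListDTO{
		ID:               w.ID,
		Title:            w.Title,
		AuthorName:       w.Author.Name,
		AuthorID:         w.Author.ID,
		Language:         w.Language,
		CreatedAt:        w.CreatedAt,
		TranslationCount: len(w.Translations),
	}
}

// ToWorkListDTOs maps a slice and always returns a non-nil result.
func ToWorkListDTOs(works []Work) []WorkListDTO {
	out := make([]WorkListDTO, 0, len(works))
	for _, w := range works {
		out = append(out, ToWorkListDTO(w))
	}
	return out
}
```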

Acceptance Criteria:

  • List queries return optimized DTOs
  • Detail queries return full DTOs
  • No N+1 query problems
  • Payload size reduced by 30-50% relative to the current entity-based responses
  • Query response time improved by 20% relative to the current entity-based queries
  • No breaking changes to GraphQL schema

6. Improve Structured Logging (P1 - High Priority)

Problem: Logging exists but lacks request context, user IDs, and trace correlation.

Current State:

  • internal/platform/log uses zerolog
  • Basic logging but missing context
  • No request ID propagation
  • No user ID in logs
  • No trace/span ID correlation

Affected Files:

  • internal/platform/log/logger.go - Basic logger
  • HTTP middleware - Needs request ID injection
  • All application services - Need context logging

Solution:

  1. Enhance HTTP middleware:

    • Generate request ID for each request
    • Inject request ID into context
    • Add user ID from JWT to context
    • Add trace/span IDs to context
  2. Update logger to use context:

    • Extract request ID, user ID, trace ID from context
    • Add to all log entries automatically
    • Create helper: log.FromContext(ctx).WithRequestID().WithUserID()
  3. Add structured logging fields:

    • Define field name constants
    • Ensure consistent field names across codebase
    • Add sensitive data redaction
  4. Implement log sampling:

    • Sample high-volume endpoints (e.g., health checks)
    • Configurable sampling rates
    • Always log errors regardless of sampling
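
The context-propagation idea above can be sketched with the standard library alone. This is a minimal sketch, not the zerolog integration itself: the field-name constants, WithField, Fields, and Line are hypothetical, and in the real logger the string rendering would be replaced by attaching the same fields to a zerolog event.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strings"
)

// ctxKey is unexported so other packages cannot collide with these keys.
type ctxKey string

// Field name constants keep log keys consistent across the codebase.
const (
	FieldRequestID = "request_id"
	FieldUserID    = "user_id"
	FieldTraceID   = "trace_id"
)

// WithField returns a context carrying one logging field.
func WithField(ctx context.Context, name, value string) context.Context {
	return context.WithValue(ctx, ctxKey(name), value)
}

// Fields collects the known logging fields that are present in ctx.
func Fields(ctx context.Context) map[string]string {
	out := map[string]string{}
	for _, name := range []string{FieldRequestID, FieldUserID, FieldTraceID} {
		if v, ok := ctx.Value(ctxKey(name)).(string); ok && v != "" {
			out[name] = v
		}
	}
	return out
}

// Line renders msg followed by the context fields as key=value pairs,
// sorted by key so the output is deterministic.
func Line(ctx context.Context, msg string) string {
	fields := Fields(ctx)
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(msg)
	for _, k := range keys {
		fmt.Fprintf(&b, " %s=%s", k, fields[k])
	}
	return b.String()
}

// demoLogLine shows the intended call pattern: middleware enriches the
// context once, and downstream code logs without repeating the fields.
func demoLogLine() string {
	ctx := WithField(context.Background(), FieldRequestID, "req-1")
	ctx = WithField(ctx, FieldUserID, "42")
	return Line(ctx, "work created")
}
```

Because the fields travel in the context, an unauthenticated request simply produces a line without user_id; no caller needs to branch on whether a user is present.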

Acceptance Criteria:

  • All logs include request ID
  • Authenticated request logs include user ID
  • All logs include trace/span IDs
  • Consistent log format across codebase
  • Sensitive data excluded from logs
  • Log sampling for high-volume endpoints

7. Refactor Caching with Decorator Pattern (P1 - High Priority)

Problem: The current caching implementation uses bespoke cached repositories. It should use the decorator pattern for better maintainability.

Current State:

  • internal/data/cache has custom caching logic
  • Cached repositories are separate implementations
  • Not following decorator pattern

Affected Files:

  • internal/data/cache/* - Current caching implementation
  • Repository interfaces - Need to support decorators

Solution:

  1. Implement decorator pattern:

    • Create CachedWorkRepository decorator
    • Create CachedAuthorRepository decorator
    • Create CachedTranslationRepository decorator
    • Decorators wrap base repositories
  2. Implement cache-aside pattern:

    • Check cache on read, populate on miss
    • Invalidate cache on write operations
    • Add cache key versioning strategy
  3. Add cache configuration:

    • TTL per entity type
    • Cache size limits
    • Cache warming strategies
  4. Add cache metrics:

    • Hit/miss rates
    • Cache size
    • Eviction counts
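
The decorator and cache-aside items above combine as follows. This is a minimal sketch: the WorkRepository and Cache interfaces, the "work:v1:<id>" key format, and the in-memory fakes are assumptions made for illustration, and the real repository interfaces will have more methods. A Redis-backed Cache is assumed to treat its own failures as misses, which is what lets cache outages degrade to slower reads instead of errors.

```go
package main

import (
	"context"
	"fmt"
)

// Work is a trimmed, hypothetical domain entity.
type Work struct {
	ID    uint
	Title string
}

// WorkRepository is an assumed subset of the real repository interface.
type WorkRepository interface {
	GetByID(ctx context.Context, id uint) (*Work, error)
	Update(ctx context.Context, w *Work) error
}

// Cache is a hypothetical seam over Redis; failures are reported as misses.
type Cache interface {
	Get(ctx context.Context, key string) (*Work, bool)
	Set(ctx context.Context, key string, w *Work)
	Delete(ctx context.Context, key string)
}

// CachedWorkRepository decorates any WorkRepository with cache-aside reads.
type CachedWorkRepository struct {
	next  WorkRepository
	cache Cache
}

// workKey embeds a version so a schema change can orphan old entries.
func workKey(id uint) string { return fmt.Sprintf("work:v1:%d", id) }

// GetByID serves from the cache and falls back to the wrapped repository.
func (r *CachedWorkRepository) GetByID(ctx context.Context, id uint) (*Work, error) {
	if w, ok := r.cache.Get(ctx, workKey(id)); ok {
		return w, nil
	}
	w, err := r.next.GetByID(ctx, id)
	if err != nil {
		return nil, err
	}
	r.cache.Set(ctx, workKey(id), w)
	return w, nil
}

// Update writes to the wrapped repository, then invalidates the cached entry.
func (r *CachedWorkRepository) Update(ctx context.Context, w *Work) error {
	if err := r.next.Update(ctx, w); err != nil {
		return err
	}
	r.cache.Delete(ctx, workKey(w.ID))
	return nil
}

// memoryCache is a test-only Cache; it is not safe for concurrent use.
type memoryCache map[string]*Work

func (c memoryCache) Get(_ context.Context, key string) (*Work, bool) {
	w, ok := c[key]
	return w, ok
}

func (c memoryCache) Set(_ context.Context, key string, w *Work) { c[key] = w }

func (c memoryCache) Delete(_ context.Context, key string) { delete(c, key) }

// countingRepo is a fake base repository that counts reads.
type countingRepo struct {
	reads int
	works map[uint]*Work
}

func (r *countingRepo) GetByID(_ context.Context, id uint) (*Work, error) {
	r.reads++
	if w, ok := r.works[id]; ok {
		return w, nil
	}
	return nil, fmt.Errorf("work %d not found", id)
}

func (r *countingRepo) Update(_ context.Context, w *Work) error {
	r.works[w.ID] = w
	return nil
}

// demoCacheAside walks through miss, hit, invalidate, and miss again.
func demoCacheAside() (baseReads int, title string) {
	ctx := context.Background()
	base := &countingRepo{works: map[uint]*Work{1: {ID: 1, Title: "old"}}}
	repo := &CachedWorkRepository{next: base, cache: memoryCache{}}
	repo.GetByID(ctx, 1)                         // miss: reads base, fills cache
	repo.GetByID(ctx, 1)                         // hit: base is not touched
	repo.Update(ctx, &Work{ID: 1, Title: "new"}) // write, then invalidate
	w, _ := repo.GetByID(ctx, 1)                 // miss again, sees the new title
	return base.reads, w.Title
}
```

Because CachedWorkRepository satisfies the same interface it wraps, callers are wired with either the plain or the cached repository without code changes.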

Acceptance Criteria:

  • Decorator pattern implemented
  • Cache hit rate > 70% for reads
  • Automatic cache invalidation on updates
  • Cache failures don't break application
  • Metrics for cache performance

8. Complete API Documentation (P1 - High Priority)

Problem: API documentation is incomplete. The GraphQL API needs comprehensive documentation for its consumers.

Current State:

  • GraphQL schema exists but lacks descriptions
  • No example queries
  • No API guide for consumers

Affected Files:

  • GraphQL schema files - Need descriptions
  • api/README.md - Needs comprehensive guide
  • All resolver implementations - Need documentation

Solution:

  1. Add descriptions to GraphQL schema:

    • Document all types, queries, mutations
    • Add field descriptions
    • Document input validation rules
    • Add deprecation notices where applicable
  2. Create comprehensive API documentation:

    • api/README.md - Complete API guide
    • api/EXAMPLES.md - Query examples
    • Document authentication requirements
    • Document rate limiting
    • Document error responses
  3. Enhance GraphQL Playground:

    • Pre-populate with example queries
    • Add query templates
    • Document schema changes

Acceptance Criteria:

  • All 80+ GraphQL resolvers documented
  • Example queries for each operation
  • Input validation rules documented
  • Error response examples
  • Authentication requirements clear
  • API changelog maintained

9. Refactor Testing Utilities (P2 - Medium Priority)

Problem: Tests depend on live database connections, making them slow and unreliable.

Current State:

  • internal/testutil/testutil.go has database connection logic
  • Integration tests require live database
  • Tests are slow and may be flaky

Affected Files:

  • internal/testutil/testutil.go - Database connection logic
  • All integration tests - Depend on live DB

Solution:

  1. Decouple tests from live database:

    • Remove database connection from testutil
    • Use test containers for integration tests
    • Use mocks for unit tests
  2. Improve test utilities:

    • Create test data builders
    • Add fixtures for common scenarios
    • Improve test isolation
  3. Add parallel test execution:

    • Enable -parallel flag where safe
    • Use test-specific database schemas
    • Clean up test data properly
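
For the "test data builders" item, one common shape is a fluent builder with sensible defaults. The sketch below is illustrative: the Work fields, the defaults, and the builder method names are hypothetical and would follow the real domain entity.

```go
package main

// Work is a trimmed, hypothetical domain entity used only by this sketch.
type Work struct {
	ID       uint
	Title    string
	Language string
	Tags     []string
}

// WorkBuilder assembles Work values for tests, starting from valid defaults
// so that each test states only the fields it cares about.
type WorkBuilder struct {
	work Work
}

// NewWorkBuilder returns a builder preloaded with assumed default values.
func NewWorkBuilder() *WorkBuilder {
	return &WorkBuilder{work: Work{ID: 1, Title: "Untitled", Language: "en"}}
}

// WithID overrides the default ID.
func (b *WorkBuilder) WithID(id uint) *WorkBuilder {
	b.work.ID = id
	return b
}

// WithTitle overrides the default title.
func (b *WorkBuilder) WithTitle(title string) *WorkBuilder {
	b.work.Title = title
	return b
}

// WithLanguage overrides the default language code.
func (b *WorkBuilder) WithLanguage(lang string) *WorkBuilder {
	b.work.Language = lang
	return b
}

// WithTags replaces the tag list with a copy of tags.
func (b *WorkBuilder) WithTags(tags ...string) *WorkBuilder {
	b.work.Tags = append([]string(nil), tags...)
	return b
}

// Build returns an independent copy, so values built from the same builder
// do not share the Tags backing array.
func (b *WorkBuilder) Build() Work {
	w := b.work
	w.Tags = append([]string(nil), b.work.Tags...)
	return w
}
```

A test would then read, for example, NewWorkBuilder().WithLanguage("ru").Build(), with every unspecified field already holding a valid value.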

Acceptance Criteria:

  • Unit tests run without database
  • Integration tests use test containers
  • Tests run in parallel where possible
  • Test execution time < 5 seconds for unit tests
  • Clear separation between unit and integration tests

10. Implement Analytics Features (P2 - Medium Priority)

Problem: The analytics service exists, but some metrics (like, comment, and bookmark counts) are still stubs.

Current State:

  • internal/jobs/linguistics/work_analysis_service.go has TODO comments:
    • Line 184: ViewCount TODO
    • Line 185: LikeCount TODO
    • Line 186: CommentCount TODO
    • Line 187: BookmarkCount TODO
    • Line 188: TranslationCount TODO
    • Line 192: PopularTranslations TODO

Affected Files:

  • internal/jobs/linguistics/work_analysis_service.go - Stub implementations
  • internal/app/analytics/* - Analytics services

Solution:

  1. Implement counting services:

    • Like counting service
    • Comment counting service
    • Bookmark counting service
    • Translation counting service
    • View counting service
  2. Implement popular translations calculation:

    • Calculate based on likes, comments, bookmarks
    • Cache results for performance
    • Update periodically via background job
  3. Add analytics to work analysis:

    • Integrate counting services
    • Update WorkAnalytics struct
    • Ensure data is accurate and up-to-date
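
The popular-translations calculation above leaves the scoring formula open. A minimal sketch, assuming a weighted sum: the TranslationStats type and the weights (1 per like, 2 per comment, 3 per bookmark) are hypothetical placeholders that the team would need to choose, and ties are broken by ascending ID so that results stay deterministic.

```go
package main

import "sort"

// TranslationStats is a hypothetical aggregate produced by the counting services.
type TranslationStats struct {
	TranslationID uint
	Likes         int
	Comments      int
	Bookmarks     int
}

// The weights are assumptions for illustration, not values from the codebase.
const (
	likeWeight     = 1
	commentWeight  = 2
	bookmarkWeight = 3
)

// popularityScore combines the three counters into a single integer score.
func popularityScore(s TranslationStats) int {
	return s.Likes*likeWeight + s.Comments*commentWeight + s.Bookmarks*bookmarkWeight
}

// PopularTranslations returns the IDs of the n highest-scoring translations,
// best first. The input slice is not modified.
func PopularTranslations(stats []TranslationStats, n int) []uint {
	ranked := append([]TranslationStats(nil), stats...)
	sort.SliceStable(ranked, func(i, j int) bool {
		si, sj := popularityScore(ranked[i]), popularityScore(ranked[j])
		if si != sj {
			return si > sj
		}
		// Equal scores: lower ID first, for deterministic output.
		return ranked[i].TranslationID < ranked[j].TranslationID
	})
	if n < 0 {
		n = 0
	}
	if n > len(ranked) {
		n = len(ranked)
	}
	ids := make([]uint, 0, n)
	for _, s := range ranked[:n] {
		ids = append(ids, s.TranslationID)
	}
	return ids
}
```

Since the function is pure, the background job can compute the ranking and cache the resulting ID list without the function knowing about either.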

Acceptance Criteria:

  • All analytics metrics implemented
  • Popular translations calculated correctly
  • Analytics updated in real-time or near-real-time
  • Performance optimized (cached where appropriate)
  • Tests for all analytics features

Implementation Guidelines

  1. Architecture First: Maintain clean architecture, DDD, and CQRS patterns
  2. Backward Compatibility: Ensure API contracts remain consistent
  3. Code Quality:
    • Follow Go best practices and idioms
    • Use interfaces for testability
    • Maintain separation of concerns
    • Add comprehensive error handling
  4. Testing: Write tests for all new features and refactorings
  5. Documentation: Add GoDoc comments for all public APIs
  6. Performance: Optimize for production workloads
  7. Observability: Instrument all critical paths

Expected Outcome

  • Production-ready search functionality
  • Proper dependency injection (no globals)
  • Full observability (tracing, metrics, logging)
  • Optimized queries with DTOs
  • Comprehensive API documentation
  • Fast, reliable test suite
  • Complete analytics features
  • Improved code maintainability

Files to Prioritize

  1. internal/app/search/service.go - Core search implementation (P0)
  2. internal/platform/config/config.go - Configuration refactoring (P1)
  3. internal/observability/* - Observability enhancements (P0)
  4. internal/app/*/queries.go - DTO implementation (P1)
  5. internal/platform/log/* - Logging improvements (P1)
  6. api/README.md - API documentation (P1)

Notes

  • Codebase uses Go 1.25
  • Follows DDD/CQRS/Clean Architecture patterns
  • GraphQL API with gqlgen
  • PostgreSQL with GORM
  • Weaviate for vector search
  • Redis for caching and job queue
  • Docker for local development
  • Existing tests should continue to pass
  • Follow existing code style and patterns