# Backend Production Readiness & Code Quality Improvements

## Overview
Implement critical production-ready features, refactor architectural issues, and improve code quality for the Tercul backend. The codebase uses Go 1.25, exposes a GraphQL API, and follows DDD/CQRS patterns and clean architecture principles.

## Critical Issues to Resolve

### 1. Implement Full-Text Search Service (P0 - Critical)
**Problem**: The search service in `internal/app/search/service.go` is a stub that returns empty results. This is a core feature that users depend on.

**Current State**:
- `Search()` method returns empty results (lines 31-39)
- `IndexWork()` is partially implemented, but the search logic is missing
- Weaviate client exists but is not used for search
- Search filters are defined but never applied

**Affected Files**:
- `internal/app/search/service.go` - Main search service (stub implementation)
- `internal/platform/search/weaviate_wrapper.go` - Weaviate client wrapper
- `internal/domain/search/search.go` - Search domain interfaces
- GraphQL resolvers that use search service

**Solution**:
1. Implement full Weaviate search query in `Search()` method:
   - Query Weaviate for works, translations, and authors
   - Apply search filters (language, type, date range, tags, authors)
   - Support multi-language search (Russian, English, Tatar)
   - Implement relevance ranking
   - Add pagination support
   - Handle special characters and diacritics

2. Enhance indexing:
   - Index work titles, content, and metadata
   - Index translation content with language tags
   - Index author names and biographies
   - Add incremental indexing on create/update operations
   - Create background job for bulk indexing existing content

3. Add search result transformation:
   - Map Weaviate results to domain entities
   - Include relevance scores
   - Handle empty results gracefully
   - Add search analytics/metrics

**Acceptance Criteria**:
- Search returns relevant results ranked by relevance
- Supports filtering by language, category, tags, authors, date ranges
- p95 search response time is under 200 ms
- Handles multi-language queries correctly
- All existing tests pass
- Integration tests run against a real Weaviate instance

### 2. Refactor Global Configuration Singleton (P1 - High Priority)
**Problem**: The application uses a global singleton `config.Cfg` which violates dependency injection principles and makes testing difficult.

**Current State**:
- `internal/platform/config/config.go` has global `var Cfg *Config`
- `config.Cfg` is accessed directly in multiple places:
  - `internal/platform/search/bleve_client.go` (line 13)
  - Various other packages

**Affected Files**:
- `internal/platform/config/config.go` - Global config singleton
- `internal/platform/search/bleve_client.go` - Uses `config.Cfg`
- `cmd/api/main.go` - Loads config but also sets global
- `cmd/worker/main.go` - Similar pattern
- Any other files accessing `config.Cfg` directly

**Solution**:
1. Remove global `Cfg` variable from config package
2. Refactor `LoadConfig()` to return config without setting global
3. Pass `*config.Config` as dependency to all constructors:
   - Update `NewBleveClient()` to accept config parameter
   - Update all repository constructors to accept config
   - Update application service constructors
   - Update platform service constructors

4. Update main entry points:
   - `cmd/api/main.go` - Pass config to all dependencies
   - `cmd/worker/main.go` - Pass config to all dependencies
   - `cmd/tools/enrich/main.go` - Pass config to dependencies

5. Make configuration more flexible:
   - Make migration path configurable (currently hardcoded)
   - Make metrics server port configurable
   - Add validation for required config values
   - Add config struct tags for better documentation
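
The constructor-injection steps above might look roughly like the following. It is a sketch under stated assumptions: the field names, the environment variable names (`DATABASE_URL`, `BLEVE_INDEX_PATH`, `MIGRATIONS_PATH`, `METRICS_PORT`), and the defaults are placeholders, and the real loader should keep whatever variable names the project already uses.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config is a trimmed-down stand-in for the real struct; field names are assumptions.
type Config struct {
	DatabaseURL    string
	BleveIndexPath string
	MigrationsPath string
	MetricsPort    int
}

// LoadConfig builds a Config from a lookup function (os.Getenv in production,
// a map-backed func in tests) and returns it instead of writing to a global.
func LoadConfig(getenv func(string) string) (*Config, error) {
	cfg := &Config{
		DatabaseURL:    getenv("DATABASE_URL"),
		BleveIndexPath: getenv("BLEVE_INDEX_PATH"),
		MigrationsPath: getenv("MIGRATIONS_PATH"),
		MetricsPort:    9090, // illustrative default
	}
	if cfg.MigrationsPath == "" {
		cfg.MigrationsPath = "./migrations" // illustrative default
	}
	if raw := getenv("METRICS_PORT"); raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil {
			return nil, fmt.Errorf("config: METRICS_PORT: %w", err)
		}
		cfg.MetricsPort = port
	}
	if err := cfg.Validate(); err != nil {
		return nil, fmt.Errorf("config: %w", err)
	}
	return cfg, nil
}

// Validate fails fast on startup when required values are missing or invalid.
func (c *Config) Validate() error {
	if c.DatabaseURL == "" {
		return errors.New("DATABASE_URL is required")
	}
	if c.MetricsPort < 1 || c.MetricsPort > 65535 {
		return errors.New("METRICS_PORT must be between 1 and 65535")
	}
	return nil
}

// BleveClient receives its configuration explicitly rather than reading config.Cfg.
type BleveClient struct{ indexPath string }

// NewBleveClient shows the constructor signature change; index opening is omitted.
func NewBleveClient(cfg *Config) (*BleveClient, error) {
	if cfg == nil || cfg.BleveIndexPath == "" {
		return nil, errors.New("bleve: index path is required")
	}
	return &BleveClient{indexPath: cfg.BleveIndexPath}, nil
}
```

In `cmd/api/main.go` the call would be `LoadConfig(os.Getenv)`, while a test can pass a closure over a map and never touch the process environment.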

**Acceptance Criteria**:
- No global `config.Cfg` usage anywhere in codebase
- All dependencies receive config via constructor injection
- Tests can easily mock/inject different configs
- Configuration validation on startup
- Backward compatible (same environment variables work)

### 3. Enhance Observability: Distributed Tracing (P0 - Critical)
**Problem**: Tracing is implemented but only exports to stdout. Need production-ready tracing with OTLP exporter and proper instrumentation.

**Current State**:
- `internal/observability/tracing.go` uses `stdouttrace` exporter
- Basic tracer provider exists but not production-ready
- Missing instrumentation in many places

**Affected Files**:
- `internal/observability/tracing.go` - Only stdout exporter
- HTTP middleware - May need tracing instrumentation
- GraphQL resolvers - Need span creation
- Database queries - Need query tracing
- Application services - Need business logic spans

**Solution**:
1. Replace stdout exporter with OTLP exporter:
   - Add OTLP exporter configuration
   - Support both gRPC and HTTP OTLP endpoints
   - Add environment-based configuration (dev vs prod)
   - Add trace sampling strategy (100% dev, 10% prod)

2. Enhance instrumentation:
   - Add automatic HTTP request tracing in middleware
   - Instrument all GraphQL resolvers with spans
   - Add database query spans via GORM callbacks
   - Add custom spans around operations expected to be slow (>100 ms)
   - Add span attributes (user_id, work_id, etc.)

3. Add trace context propagation:
   - Ensure trace IDs propagate through all layers
   - Add trace ID to structured logs
   - Support distributed tracing across services

4. Configuration:
   ```go
   type TracingConfig struct {
       Enabled       bool
       ServiceName   string
       OTLPEndpoint  string
       SamplingRate  float64
       Environment   string
   }
   ```
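
The struct above could be paired with startup validation and an environment-based default for the sampling rate. This is a sketch only: `DefaultSamplingRate`, `Validate`, and the environment name `"production"` are assumptions, not existing code.

```go
package main

import "fmt"

// TracingConfig mirrors the struct proposed above.
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	OTLPEndpoint string
	SamplingRate float64
	Environment  string
}

// DefaultSamplingRate returns the rates suggested in this document:
// 10% in production, 100% everywhere else. The environment name is an assumption.
func DefaultSamplingRate(env string) float64 {
	if env == "production" {
		return 0.1
	}
	return 1.0
}

// Validate rejects configurations that would fail later, when the exporter starts.
func (c TracingConfig) Validate() error {
	if !c.Enabled {
		return nil // nothing to check when tracing is off
	}
	if c.ServiceName == "" {
		return fmt.Errorf("tracing: service name is required")
	}
	if c.OTLPEndpoint == "" {
		return fmt.Errorf("tracing: OTLP endpoint is required")
	}
	if c.SamplingRate < 0 || c.SamplingRate > 1 {
		return fmt.Errorf("tracing: sampling rate %v is outside [0, 1]", c.SamplingRate)
	}
	return nil
}
```

The validated rate would then typically be handed to a ratio-based sampler when the tracer provider is constructed.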

**Acceptance Criteria**:
- Traces exported to OTLP collector (Jaeger/Tempo compatible)
- All HTTP requests have spans
- All GraphQL resolvers traced
- Database queries have spans
- Trace IDs in logs
- Sampling configurable per environment

### 4. Enhance Observability: Prometheus Metrics (P0 - Critical)
**Problem**: Basic metrics exist but need enhancement for production monitoring and alerting.

**Current State**:
- `internal/observability/metrics.go` has basic HTTP and DB metrics
- Missing business metrics, GraphQL-specific metrics
- No Grafana dashboards or alerting rules

**Affected Files**:
- `internal/observability/metrics.go` - Basic metrics
- GraphQL resolvers - Need resolver metrics
- Application services - Need business metrics
- Background jobs - Need job metrics

**Solution**:
1. Add GraphQL-specific metrics:
   - `graphql_resolver_duration_seconds{operation, resolver}`
   - `graphql_errors_total{operation, error_type}`
   - `graphql_operations_total{operation, status}`

2. Add business metrics:
   - `works_created_total{language}`
   - `searches_performed_total{type}`
   - `user_registrations_total`
   - `translations_created_total{language}`
   - `likes_total{entity_type}`

3. Enhance existing metrics:
   - Add more labels to HTTP metrics (e.g. the numeric status code as a label value)
   - Add query type labels to DB metrics
   - Add connection pool metrics
   - Add cache hit/miss metrics

4. Create observability package structure:
   - Move metrics to `internal/observability/metrics/`
   - Add metric collection helpers
   - Document metric naming conventions

**Acceptance Criteria**:
- All critical paths have metrics
- GraphQL operations fully instrumented
- Business metrics tracked
- Metrics exposed on `/metrics` endpoint
- Metric labels follow Prometheus best practices

### 5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)
**Problem**: Application queries return full domain entities, which is inefficient and leaks domain logic into the API layer.

**Current State**:
- Queries in `internal/app/*/queries.go` return domain entities
- GraphQL resolvers receive full entities with all fields
- No optimization for list vs detail views

**Affected Files**:
- `internal/app/work/queries.go` - Returns `domain.Work`
- `internal/app/translation/queries.go` - Returns `domain.Translation`
- `internal/app/author/queries.go` - Returns `domain.Author`
- GraphQL resolvers - Receive full entities

**Solution**:
1. Create DTO packages:
   - `internal/app/work/dto` - WorkListDTO, WorkDetailDTO
   - `internal/app/translation/dto` - TranslationListDTO, TranslationDetailDTO
   - `internal/app/author/dto` - AuthorListDTO, AuthorDetailDTO

2. Define optimized DTOs:
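   ```go
   import "time"

   // WorkListDTO - For list views (minimal fields)
   type WorkListDTO struct {
       ID               uint
       Title            string
       AuthorName       string
       AuthorID         uint
       Language         string
       CreatedAt        time.Time
       ViewCount        int
       LikeCount        int
       TranslationCount int
   }

   // WorkDetailDTO - For single work view (all fields).
   // WorkListDTO is embedded by value so that reading a promoted field on a
   // zero-value WorkDetailDTO cannot dereference a nil pointer.
   // TranslationSummaryDTO and AuthorSummaryDTO are assumed to be defined
   // elsewhere in the DTO packages.
   type WorkDetailDTO struct {
       WorkListDTO
       Content      string
       Description  string
       Tags         []string
       Translations []TranslationSummaryDTO
       Author       AuthorSummaryDTO
   }
   ```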

3. Refactor queries to return DTOs:
   - Update query methods to use optimized SQL
   - Use joins to avoid N+1 queries
   - Map domain entities to DTOs
   - Update GraphQL resolvers to use DTOs

4. Add benchmarks comparing old vs new approach

**Acceptance Criteria**:
- List queries return optimized DTOs
- Detail queries return full DTOs
- No N+1 query problems
- Payload size reduced by 30-50%
- Query response time improved by 20%
- No breaking changes to GraphQL schema

### 6. Improve Structured Logging (P1 - High Priority)
**Problem**: Logging exists but lacks request context, user IDs, and trace correlation.

**Current State**:
- `internal/platform/log` uses zerolog
- Basic logging but missing context
- No request ID propagation
- No user ID in logs
- No trace/span ID correlation

**Affected Files**:
- `internal/platform/log/logger.go` - Basic logger
- HTTP middleware - Needs request ID injection
- All application services - Need context logging

**Solution**:
1. Enhance HTTP middleware:
   - Generate request ID for each request
   - Inject request ID into context
   - Add user ID from JWT to context
   - Add trace/span IDs to context

2. Update logger to use context:
   - Extract request ID, user ID, trace ID from context
   - Add to all log entries automatically
   - Create helper: `log.FromContext(ctx).WithRequestID().WithUserID()`

3. Add structured logging fields:
   - Define field name constants
   - Ensure consistent field names across codebase
   - Add sensitive data redaction

4. Implement log sampling:
   - Sample high-volume endpoints (e.g., health checks)
   - Configurable sampling rates
   - Always log errors regardless of sampling

**Acceptance Criteria**:
- All logs include request ID
- Authenticated request logs include user ID
- All logs emitted within a traced request include trace/span IDs
- Consistent log format across codebase
- Sensitive data excluded from logs
- Log sampling for high-volume endpoints

### 7. Refactor Caching with Decorator Pattern (P1 - High Priority)
**Problem**: Current caching implementation uses bespoke cached repositories. Should use decorator pattern for better maintainability.

**Current State**:
- `internal/data/cache` has custom caching logic
- Cached repositories are separate implementations
- Not following decorator pattern

**Affected Files**:
- `internal/data/cache/*` - Current caching implementation
- Repository interfaces - Need to support decorators

**Solution**:
1. Implement decorator pattern:
   - Create `CachedWorkRepository` decorator
   - Create `CachedAuthorRepository` decorator
   - Create `CachedTranslationRepository` decorator
   - Decorators wrap base repositories

2. Implement cache-aside pattern:
   - Check cache on read, populate on miss
   - Invalidate cache on write operations
   - Add cache key versioning strategy

3. Add cache configuration:
   - TTL per entity type
   - Cache size limits
   - Cache warming strategies

4. Add cache metrics:
   - Hit/miss rates
   - Cache size
   - Eviction counts

**Acceptance Criteria**:
- Decorator pattern implemented
- Cache hit rate > 70% for reads
- Automatic cache invalidation on updates
- Cache failures don't break application
- Metrics for cache performance

### 8. Complete API Documentation (P1 - High Priority)
**Problem**: API documentation is incomplete. Need comprehensive GraphQL API documentation.

**Current State**:
- GraphQL schema exists but lacks descriptions
- No example queries
- No API guide for consumers

**Affected Files**:
- GraphQL schema files - Need descriptions
- `api/README.md` - Needs comprehensive guide
- All resolver implementations - Need documentation

**Solution**:
1. Add descriptions to GraphQL schema:
   - Document all types, queries, mutations
   - Add field descriptions
   - Document input validation rules
   - Add deprecation notices where applicable

2. Create comprehensive API documentation:
   - `api/README.md` - Complete API guide
   - `api/EXAMPLES.md` - Query examples
   - Document authentication requirements
   - Document rate limiting
   - Document error responses

3. Enhance GraphQL Playground:
   - Pre-populate with example queries
   - Add query templates
   - Document schema changes

**Acceptance Criteria**:
- All 80+ GraphQL resolvers documented
- Example queries for each operation
- Input validation rules documented
- Error response examples
- Authentication requirements clear
- API changelog maintained

### 9. Refactor Testing Utilities (P2 - Medium Priority)
**Problem**: Tests depend on live database connections, making them slow and unreliable.

**Current State**:
- `internal/testutil/testutil.go` has database connection logic
- Integration tests require live database
- Tests are slow and may be flaky

**Affected Files**:
- `internal/testutil/testutil.go` - Database connection logic
- All integration tests - Depend on live DB

**Solution**:
1. Decouple tests from live database:
   - Remove database connection from testutil
   - Use test containers for integration tests
   - Use mocks for unit tests

2. Improve test utilities:
   - Create test data builders
   - Add fixtures for common scenarios
   - Improve test isolation

3. Add parallel test execution:
   - Mark independent tests with `t.Parallel()` and tune concurrency with the `-parallel` flag where safe
   - Use test-specific database schemas
   - Clean up test data properly
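
The test data builders mentioned above might follow the shape below. This is a sketch: `Work` is a simplified stand-in for the domain entity, and `WorkBuilder` with its defaults is a hypothetical helper, not existing code in `internal/testutil`.

```go
package main

// Work is a simplified stand-in for the domain entity.
type Work struct {
	Title    string
	Language string
	Tags     []string
}

// WorkBuilder produces valid Work values with sensible defaults so each test
// only spells out the fields it actually cares about.
type WorkBuilder struct {
	work Work
}

// NewWorkBuilder starts from illustrative defaults.
func NewWorkBuilder() *WorkBuilder {
	return &WorkBuilder{work: Work{Title: "Untitled", Language: "en"}}
}

func (b *WorkBuilder) WithTitle(title string) *WorkBuilder {
	b.work.Title = title
	return b
}

func (b *WorkBuilder) WithLanguage(lang string) *WorkBuilder {
	b.work.Language = lang
	return b
}

func (b *WorkBuilder) WithTags(tags ...string) *WorkBuilder {
	b.work.Tags = append(b.work.Tags, tags...)
	return b
}

// Build returns a copy with its own Tags slice, so values built from the same
// builder do not share backing memory across parallel tests.
func (b *WorkBuilder) Build() Work {
	w := b.work
	w.Tags = append([]string(nil), b.work.Tags...)
	return w
}
```

A test would then write `NewWorkBuilder().WithLanguage("tt").Build()` instead of repeating a full struct literal, which keeps fixtures stable when the entity gains new required fields.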

**Acceptance Criteria**:
- Unit tests run without database
- Integration tests use test containers
- Tests run in parallel where possible
- Unit test suite runs in under 5 seconds
- Clear separation between unit and integration tests

### 10. Implement Analytics Features (P2 - Medium Priority)
**Problem**: Analytics service exists, but some metrics are stubs (like, comment, and bookmark counts).

**Current State**:
- `internal/jobs/linguistics/work_analysis_service.go` has TODO comments:
  - Line 184: ViewCount TODO
  - Line 185: LikeCount TODO
  - Line 186: CommentCount TODO
  - Line 187: BookmarkCount TODO
  - Line 188: TranslationCount TODO
  - Line 192: PopularTranslations TODO

**Affected Files**:
- `internal/jobs/linguistics/work_analysis_service.go` - Stub implementations
- `internal/app/analytics/*` - Analytics services

**Solution**:
1. Implement counting services:
   - Like counting service
   - Comment counting service
   - Bookmark counting service
   - Translation counting service
   - View counting service

2. Implement popular translations calculation:
   - Calculate based on likes, comments, bookmarks
   - Cache results for performance
   - Update periodically via background job

3. Add analytics to work analysis:
   - Integrate counting services
   - Update WorkAnalytics struct
   - Ensure data is accurate and up-to-date
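
The popular translations calculation above could start from a weighted score over the three counters. This document does not specify a formula, so the `Weights` type, the linear score, and all names below are assumptions made for illustration only.

```go
package main

import "sort"

// TranslationStats holds the raw counters for one translation.
type TranslationStats struct {
	TranslationID uint
	Likes         int
	Comments      int
	Bookmarks     int
}

// Weights controls how much each signal contributes; the values are a
// product decision and are not defined in this document.
type Weights struct {
	Like     float64
	Comment  float64
	Bookmark float64
}

// Score is a plain weighted sum of the three counters.
func (w Weights) Score(s TranslationStats) float64 {
	return w.Like*float64(s.Likes) +
		w.Comment*float64(s.Comments) +
		w.Bookmark*float64(s.Bookmarks)
}

// PopularTranslations returns the top n entries by score, highest first.
// The input slice is copied, and ties keep their original relative order.
func PopularTranslations(stats []TranslationStats, w Weights, n int) []TranslationStats {
	ranked := append([]TranslationStats(nil), stats...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return w.Score(ranked[i]) > w.Score(ranked[j])
	})
	if n < 0 {
		n = 0
	}
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}
```

The background job would compute this list periodically and store it in the cache, so request handlers only read the precomputed result.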

**Acceptance Criteria**:
- All analytics metrics implemented
- Popular translations calculated correctly
- Analytics updated in real-time or near-real-time
- Performance optimized (cached where appropriate)
- Tests for all analytics features

## Implementation Guidelines

1. **Architecture First**: Maintain clean architecture, DDD, and CQRS patterns
2. **Backward Compatibility**: Ensure API contracts remain consistent
3. **Code Quality**:
   - Follow Go best practices and idioms
   - Use interfaces for testability
   - Maintain separation of concerns
   - Add comprehensive error handling
4. **Testing**: Write tests for all new features and refactorings
5. **Documentation**: Add GoDoc comments for all public APIs
6. **Performance**: Optimize for production workloads
7. **Observability**: Instrument all critical paths

## Expected Outcome

- Production-ready search functionality
- Proper dependency injection (no globals)
- Full observability (tracing, metrics, logging)
- Optimized queries with DTOs
- Comprehensive API documentation
- Fast, reliable test suite
- Complete analytics features
- Improved code maintainability

## Files to Prioritize

1. `internal/app/search/service.go` - Core search implementation (P0)
2. `internal/platform/config/config.go` - Configuration refactoring (P1)
3. `internal/observability/*` - Observability enhancements (P0)
4. `internal/app/*/queries.go` - DTO implementation (P1)
5. `internal/platform/log/*` - Logging improvements (P1)
6. `api/README.md` - API documentation (P1)

## Notes

- Codebase uses Go 1.25
- Follows DDD/CQRS/Clean Architecture patterns
- GraphQL API with gqlgen
- PostgreSQL with GORM
- Weaviate for vector search
- Redis for caching and job queue
- Docker for local development
- Existing tests should continue to pass
- Follow existing code style and patterns