mukimovd/tercul-backend

mirror of https://github.com/SamyRai/tercul-backend.git synced 2025-12-26 22:21:33 +00:00

Damir Mukimov b5cd1761af

Update workflows and tasks documentation

2025-11-30 03:13:33 +01:00

Backend Production Readiness & Code Quality Improvements

Overview

Implement critical production-readiness features, resolve architectural issues, and improve code quality in the Tercul backend. The codebase uses Go 1.25, exposes a GraphQL API, and follows DDD/CQRS patterns and clean architecture principles.

Critical Issues to Resolve

1. Implement Full-Text Search Service (P0 - Critical)

Problem: The search service in internal/app/search/service.go is a stub that returns empty results. This is a core feature that users depend on.

Current State:

The Search() method returns empty results (lines 31-39)
IndexWork() is partially implemented, and the search logic itself is missing
The Weaviate client exists but is not used for search
Search filters are defined but not applied

Affected Files:

internal/app/search/service.go - Main search service (stub implementation)
internal/platform/search/weaviate_wrapper.go - Weaviate client wrapper
internal/domain/search/search.go - Search domain interfaces
GraphQL resolvers that use the search service

Solution:

Implement a full Weaviate search query in the Search() method:
- Query Weaviate for works, translations, and authors
- Apply search filters (language, type, date range, tags, authors)
- Support multi-language search (Russian, English, Tatar)
- Implement relevance ranking
- Add pagination support
- Handle special characters and diacritics
Enhance indexing:
- Index work titles, content, and metadata
- Index translation content with language tags
- Index author names and biographies
- Add incremental indexing on create/update operations
- Create background job for bulk indexing existing content
Add search result transformation:
- Map Weaviate results to domain entities
- Include relevance scores
- Handle empty results gracefully
- Add search analytics/metrics
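The result-transformation, ranking, and pagination steps above can be sketched without the Weaviate client. This is a hedged, in-memory illustration only. SearchHit, SearchFilters, and RankAndPaginate are hypothetical names. The hits are assumed to have already been fetched and mapped from a Weaviate query, and the page-size default and cap are assumptions. In the real service, the filters, limit, and offset would preferably be pushed down into the Weaviate query itself so that pagination stays correct over large result sets.

```go
package main

import (
	"fmt"
	"sort"
)

// SearchHit is a hypothetical, already-mapped search result carrying the
// entity kind, ID, language, and relevance score returned by the engine.
type SearchHit struct {
	Kind     string // "work", "translation", or "author"
	ID       uint
	Language string
	Score    float64
}

// SearchFilters is a hypothetical subset of the filters listed above.
type SearchFilters struct {
	Languages []string // empty means "any language"
}

// matches reports whether a hit passes the language filter.
func (f SearchFilters) matches(h SearchHit) bool {
	if len(f.Languages) == 0 {
		return true
	}
	for _, l := range f.Languages {
		if l == h.Language {
			return true
		}
	}
	return false
}

// RankAndPaginate filters hits, sorts them by descending score (ties broken
// by ID for stable paging), and returns one page plus the total match count.
// It always returns a non-nil slice, so empty results serialize as [].
func RankAndPaginate(hits []SearchHit, f SearchFilters, page, pageSize int) ([]SearchHit, int) {
	if page < 1 {
		page = 1
	}
	switch {
	case pageSize < 1:
		pageSize = 20 // assumed default
	case pageSize > 100:
		pageSize = 100 // assumed cap
	}
	filtered := make([]SearchHit, 0, len(hits))
	for _, h := range hits {
		if f.matches(h) {
			filtered = append(filtered, h)
		}
	}
	sort.Slice(filtered, func(i, j int) bool {
		if filtered[i].Score != filtered[j].Score {
			return filtered[i].Score > filtered[j].Score
		}
		return filtered[i].ID < filtered[j].ID
	})
	total := len(filtered)
	start := (page - 1) * pageSize
	if start >= total {
		return []SearchHit{}, total
	}
	end := start + pageSize
	if end > total {
		end = total
	}
	return filtered[start:end], total
}

func main() {
	hits := []SearchHit{
		{Kind: "work", ID: 1, Language: "ru", Score: 0.4},
		{Kind: "translation", ID: 2, Language: "en", Score: 0.9},
		{Kind: "author", ID: 3, Language: "tt", Score: 0.7},
	}
	page, total := RankAndPaginate(hits, SearchFilters{Languages: []string{"en", "tt"}}, 1, 10)
	fmt.Println(total, page)
}
```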

Acceptance Criteria:

Search returns relevant results ranked by relevance
Supports filtering by language, category, tags, authors, date ranges
Search response time < 200 ms at the 95th percentile
Handles multi-language queries correctly
All existing tests pass
Integration tests run against a real Weaviate instance

2. Refactor Global Configuration Singleton (P1 - High Priority)

Problem: The application uses a global singleton, config.Cfg, which violates dependency injection principles and makes testing difficult.

Current State:

internal/platform/config/config.go has global var Cfg *Config
config.Cfg is accessed directly in multiple places:
- internal/platform/search/bleve_client.go (line 13)
- Various other packages

Affected Files:

internal/platform/config/config.go - Global config singleton
internal/platform/search/bleve_client.go - Uses config.Cfg
cmd/api/main.go - Loads config but also sets the global
cmd/worker/main.go - Similar pattern
Any other files accessing config.Cfg directly

Solution:

Remove the global Cfg variable from the config package
Refactor LoadConfig() to return the config without setting a global
Pass *config.Config as a dependency to all constructors:
- Update NewBleveClient() to accept config parameter
- Update all repository constructors to accept config
- Update application service constructors
- Update platform service constructors
Update main entry points:
- cmd/api/main.go - Pass config to all dependencies
- cmd/worker/main.go - Pass config to all dependencies
- cmd/tools/enrich/main.go - Pass config to dependencies
Make configuration more flexible:
- Make migration path configurable (currently hardcoded)
- Make metrics server port configurable
- Add validation for required config values
- Add config struct tags for better documentation

Acceptance Criteria:

No global config.Cfg usage anywhere in the codebase
All dependencies receive config via constructor injection
Tests can easily mock or inject different configs
Configuration is validated on startup
Backward compatible (existing environment variables keep working)

3. Enhance Observability: Distributed Tracing (P0 - Critical)

Problem: Tracing is implemented but only exports to stdout. Production requires an OTLP exporter and proper instrumentation.

Current State:

internal/observability/tracing.go uses the stdouttrace exporter
A basic tracer provider exists but is not production-ready
Instrumentation is missing in many places

Affected Files:

internal/observability/tracing.go - Only stdout exporter
HTTP middleware - May need tracing instrumentation
GraphQL resolvers - Need span creation
Database queries - Need query tracing
Application services - Need business logic spans

Solution:

Replace the stdout exporter with an OTLP exporter:
- Add OTLP exporter configuration
- Support both gRPC and HTTP OTLP endpoints
- Add environment-based configuration (dev vs prod)
- Add trace sampling strategy (100% dev, 10% prod)
Enhance instrumentation:
- Add automatic HTTP request tracing in middleware
- Instrument all GraphQL resolvers with spans
- Add database query spans via GORM callbacks
- Add custom spans around operations that may be slow (>100ms)
- Add span attributes (user_id, work_id, etc.)
Add trace context propagation:
- Ensure trace IDs propagate through all layers
- Add trace ID to structured logs
- Support distributed tracing across services

Configuration:

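type TracingConfig struct {
    Enabled      bool    // turn tracing on or off entirely
    ServiceName  string  // service name reported to the tracing backend
    OTLPEndpoint string  // OTLP collector endpoint (gRPC or HTTP)
    SamplingRate float64 // fraction of traces sampled: 1.0 in dev, 0.1 in prod
    Environment  string  // e.g. "development" or "production"
}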

Acceptance Criteria:

Traces exported to an OTLP collector (Jaeger/Tempo compatible)
All HTTP requests have spans
All GraphQL resolvers traced
Database queries have spans
Trace IDs in logs
Sampling configurable per environment

4. Enhance Observability: Prometheus Metrics (P0 - Critical)

Problem: Basic metrics exist but need enhancement for production monitoring and alerting.

Current State:

internal/observability/metrics.go has basic HTTP and DB metrics
Business metrics and GraphQL-specific metrics are missing
No Grafana dashboards or alerting rules

Affected Files:

internal/observability/metrics.go - Basic metrics
GraphQL resolvers - Need resolver metrics
Application services - Need business metrics
Background jobs - Need job metrics

Solution:

Add GraphQL-specific metrics:
- graphql_resolver_duration_seconds{operation, resolver}
- graphql_errors_total{operation, error_type}
- graphql_operations_total{operation, status}
Add business metrics:
- works_created_total{language}
- searches_performed_total{type}
- user_registrations_total
- translations_created_total{language}
- likes_total{entity_type}
Enhance existing metrics:
- Add more labels to HTTP metrics (status code as number)
- Add query type labels to DB metrics
- Add connection pool metrics
- Add cache hit/miss metrics
Create observability package structure:
- Move metrics to internal/observability/metrics/
- Add metric collection helpers
- Document metric naming conventions

Acceptance Criteria:

All critical paths have metrics
GraphQL operations fully instrumented
Business metrics tracked
Metrics exposed on the /metrics endpoint
Metric labels follow Prometheus best practices

5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)

Problem: Application queries return full domain entities, which is inefficient and leaks domain logic into the API layer.

Current State:

Queries in internal/app/*/queries.go return domain entities
GraphQL resolvers receive full entities with all fields
No optimization for list vs detail views

Affected Files:

internal/app/work/queries.go - Returns domain.Work
internal/app/translation/queries.go - Returns domain.Translation
internal/app/author/queries.go - Returns domain.Author
GraphQL resolvers - Receive full entities

Solution:

Create DTO packages:
- internal/app/work/dto - WorkListDTO, WorkDetailDTO
- internal/app/translation/dto - TranslationListDTO, TranslationDetailDTO
- internal/app/author/dto - AuthorListDTO, AuthorDetailDTO

Define optimized DTOs:

// WorkListDTO is used for list views (minimal fields).
type WorkListDTO struct {
    ID               uint
    Title            string
    AuthorName       string
    AuthorID         uint
    Language         string
    CreatedAt        time.Time
    ViewCount        int
    LikeCount        int
    TranslationCount int
}

// WorkDetailDTO is used for the single-work view (all fields). It embeds
// WorkListDTO by value, so promoted fields can never hit a nil pointer.
// TranslationSummaryDTO and AuthorSummaryDTO are smaller summary DTOs (not shown).
type WorkDetailDTO struct {
    WorkListDTO
    Content      string
    Description  string
    Tags         []string
    Translations []TranslationSummaryDTO
    Author       AuthorSummaryDTO
}

Refactor queries to return DTOs:
- Update query methods to use optimized SQL
- Use joins to avoid N+1 queries
- Map domain entities to DTOs
- Update GraphQL resolvers to use DTOs
Add benchmarks comparing the old and new approaches
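The sketch below covers the mapping step only. It assumes a hypothetical, trimmed-down Work entity. It also assumes that author names and per-work counters have each been loaded by one batched query (for example a JOIN, or a WHERE id IN (...) query with GROUP BY work_id) rather than one query per work. That batching is what removes the N+1 pattern. The mapper itself is a pure function, which also makes it straightforward to benchmark. WorkStats and ToWorkListDTOs are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Work is a hypothetical, trimmed-down stand-in for domain.Work.
type Work struct {
	ID        uint
	Title     string
	AuthorID  uint
	Language  string
	CreatedAt time.Time
	Content   string // large field that list views should not carry
}

// WorkListDTO matches the list DTO proposed above.
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

// WorkStats holds counters fetched in bulk (hypothetical shape).
type WorkStats struct {
	Views, Likes, Translations int
}

// ToWorkListDTOs maps a page of works using lookup maps that were each
// filled by a single batched query, so no per-work queries happen here.
func ToWorkListDTOs(works []Work, authorNames map[uint]string, stats map[uint]WorkStats) []WorkListDTO {
	out := make([]WorkListDTO, 0, len(works))
	for _, w := range works {
		s := stats[w.ID] // zero value when a work has no recorded activity
		out = append(out, WorkListDTO{
			ID:               w.ID,
			Title:            w.Title,
			AuthorName:       authorNames[w.AuthorID],
			AuthorID:         w.AuthorID,
			Language:         w.Language,
			CreatedAt:        w.CreatedAt,
			ViewCount:        s.Views,
			LikeCount:        s.Likes,
			TranslationCount: s.Translations,
		})
	}
	return out
}

func main() {
	works := []Work{{ID: 1, Title: "Example Work", AuthorID: 7, Language: "tt", Content: "long text..."}}
	authors := map[uint]string{7: "Example Author"}                        // one batched query
	stats := map[uint]WorkStats{1: {Views: 10, Likes: 3, Translations: 2}} // one aggregate query
	fmt.Printf("%+v\n", ToWorkListDTOs(works, authors, stats)[0])
}
```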

Acceptance Criteria:

List queries return optimized DTOs
Detail queries return full DTOs
No N+1 query problems
Payload size reduced by 30-50%
Query response time improved by 20%
No breaking changes to GraphQL schema

6. Improve Structured Logging (P1 - High Priority)

Problem: Logging exists but lacks request context, user IDs, and trace correlation.

Current State:

internal/platform/log uses zerolog
Logging is basic and lacks request context
No request ID propagation
No user ID in logs
No trace/span ID correlation

Affected Files:

internal/platform/log/logger.go - Basic logger
HTTP middleware - Needs request ID injection
All application services - Need context logging

Solution:

Enhance HTTP middleware:
- Generate request ID for each request
- Inject request ID into context
- Add the user ID from the JWT to the context
- Add trace/span IDs to context
Update logger to use context:
- Extract request ID, user ID, trace ID from context
- Add to all log entries automatically
- Create helper: log.FromContext(ctx).WithRequestID().WithUserID()
Add structured logging fields:
- Define field name constants
- Ensure consistent field names across codebase
- Add sensitive data redaction
Implement log sampling:
- Sample high-volume endpoints (e.g., health checks)
- Configurable sampling rates
- Always log errors regardless of sampling

Acceptance Criteria:

All logs include request ID
Authenticated request logs include user ID
All logs include trace/span IDs
Consistent log format across codebase
Sensitive data excluded from logs
Log sampling for high-volume endpoints

7. Refactor Caching with Decorator Pattern (P1 - High Priority)

Problem: The current caching implementation uses bespoke cached repositories. It should use the decorator pattern for better maintainability.

Current State:

internal/data/cache has custom caching logic
Cached repositories are separate implementations
They do not follow the decorator pattern

Affected Files:

internal/data/cache/* - Current caching implementation
Repository interfaces - Need to support decorators

Solution:

Implement decorator pattern:
- Create CachedWorkRepository decorator
- Create CachedAuthorRepository decorator
- Create CachedTranslationRepository decorator
- Decorators wrap base repositories
Implement cache-aside pattern:
- Check cache on read, populate on miss
- Invalidate cache on write operations
- Add cache key versioning strategy
Add cache configuration:
- TTL per entity type
- Cache size limits
- Cache warming strategies
Add cache metrics:
- Hit/miss rates
- Cache size
- Eviction counts

Acceptance Criteria:

Decorator pattern implemented
Cache hit rate > 70% for reads
Automatic cache invalidation on updates
Cache failures don't break the application
Metrics for cache performance

8. Complete API Documentation (P1 - High Priority)

Problem: API documentation is incomplete; comprehensive GraphQL API documentation is needed.

Current State:

The GraphQL schema exists but lacks descriptions
No example queries
No API guide for consumers

Affected Files:

GraphQL schema files - Need descriptions
api/README.md - Needs comprehensive guide
All resolver implementations - Need documentation

Solution:

Add descriptions to GraphQL schema:
- Document all types, queries, mutations
- Add field descriptions
- Document input validation rules
- Add deprecation notices where applicable
Create comprehensive API documentation:
- api/README.md - Complete API guide
- api/EXAMPLES.md - Query examples
- Document authentication requirements
- Document rate limiting
- Document error responses
Enhance GraphQL Playground:
- Pre-populate with example queries
- Add query templates
- Document schema changes

Acceptance Criteria:

All 80+ GraphQL resolvers documented
Example queries for each operation
Input validation rules documented
Error response examples
Authentication requirements clear
API changelog maintained

9. Refactor Testing Utilities (P2 - Medium Priority)

Problem: Tests depend on live database connections, making them slow and unreliable.

Current State:

internal/testutil/testutil.go has database connection logic
Integration tests require live database
Tests are slow and may be flaky

Affected Files:

internal/testutil/testutil.go - Database connection logic
All integration tests - Depend on live DB

Solution:

Decouple tests from live database:
- Remove the database connection from testutil
- Use test containers for integration tests
- Use mocks for unit tests
Improve test utilities:
- Create test data builders
- Add fixtures for common scenarios
- Improve test isolation
Add parallel test execution:
- Call t.Parallel() in tests where it is safe (go test's -parallel flag only caps how many such tests run at once)
- Use test-specific database schemas
- Clean up test data properly
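Test data builders could take roughly the following shape. Work, WorkBuilder, and the default values are hypothetical. The defaults are assumed to satisfy domain validation, and CreatedAt is fixed so that tests stay deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// Work is a hypothetical, minimal entity used only to illustrate the builder.
type Work struct {
	ID        uint
	Title     string
	Language  string
	AuthorID  uint
	Tags      []string
	CreatedAt time.Time
}

// WorkBuilder yields valid Work values with defaults, so each test only
// spells out the fields it actually cares about.
type WorkBuilder struct {
	w Work
}

// NewWorkBuilder starts from defaults assumed to pass domain validation.
func NewWorkBuilder() *WorkBuilder {
	return &WorkBuilder{w: Work{
		ID:        1,
		Title:     "Test Work",
		Language:  "en",
		AuthorID:  1,
		CreatedAt: time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC), // fixed for determinism
	}}
}

func (b *WorkBuilder) WithID(id uint) *WorkBuilder {
	b.w.ID = id
	return b
}

func (b *WorkBuilder) WithTitle(title string) *WorkBuilder {
	b.w.Title = title
	return b
}

func (b *WorkBuilder) WithLanguage(lang string) *WorkBuilder {
	b.w.Language = lang
	return b
}

// WithTags copies its arguments so later changes by the caller don't leak in.
func (b *WorkBuilder) WithTags(tags ...string) *WorkBuilder {
	b.w.Tags = append([]string(nil), tags...)
	return b
}

// Build returns an independent copy, so one builder can seed several fixtures.
func (b *WorkBuilder) Build() Work {
	w := b.w
	w.Tags = append([]string(nil), b.w.Tags...)
	return w
}

func main() {
	ru := NewWorkBuilder().WithID(2).WithLanguage("ru").WithTags("poetry").Build()
	fmt.Printf("%+v\n", ru)
}
```

Each builder is created per test and shares no package-level state. Tests that use builders can therefore call t.Parallel() without interfering with each other.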

Acceptance Criteria:

Unit tests run without database
Integration tests use test containers
Tests run in parallel where possible
Unit test execution time < 5 seconds
Clear separation between unit and integration tests

10. Implement Analytics Features (P2 - Medium Priority)

Problem: An analytics service exists, but some metrics (like, comment, and bookmark counting) are still stubs.

Current State:

internal/jobs/linguistics/work_analysis_service.go has TODO comments:
- Line 184: ViewCount TODO
- Line 185: LikeCount TODO
- Line 186: CommentCount TODO
- Line 187: BookmarkCount TODO
- Line 188: TranslationCount TODO
- Line 192: PopularTranslations TODO

Affected Files:

internal/jobs/linguistics/work_analysis_service.go - Stub implementations
internal/app/analytics/* - Analytics services

Solution:

Implement counting services:
- Like counting service
- Comment counting service
- Bookmark counting service
- Translation counting service
- View counting service
Implement popular translations calculation:
- Calculate based on likes, comments, bookmarks
- Cache results for performance
- Update periodically via background job
Add analytics to work analysis:
- Integrate counting services
- Update WorkAnalytics struct
- Ensure data is accurate and up-to-date
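The popular-translations calculation could combine the counters into a weighted score and return the top N, as sketched below. The weights, TranslationStats, and PopularTranslations are assumptions, not tuned values from the codebase. In the proposed design, the background job would compute this result and cache it, rather than recomputing it on every request.

```go
package main

import (
	"fmt"
	"sort"
)

// TranslationStats holds per-translation counters, assumed to come from the
// like/comment/bookmark counting services described above.
type TranslationStats struct {
	TranslationID uint
	Likes         int
	Comments      int
	Bookmarks     int
}

// Illustrative weights: comments and bookmarks are treated as stronger
// signals of interest than likes.
const (
	likeWeight     = 1.0
	commentWeight  = 2.0
	bookmarkWeight = 3.0
)

// Score combines the counters into a single popularity value.
func (s TranslationStats) Score() float64 {
	return likeWeight*float64(s.Likes) + commentWeight*float64(s.Comments) + bookmarkWeight*float64(s.Bookmarks)
}

// PopularTranslations returns up to limit translations ordered by score
// (highest first), breaking ties by lower ID for a stable order.
func PopularTranslations(stats []TranslationStats, limit int) []TranslationStats {
	ranked := append([]TranslationStats(nil), stats...) // don't reorder the caller's slice
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := ranked[i].Score(), ranked[j].Score()
		if si != sj {
			return si > sj
		}
		return ranked[i].TranslationID < ranked[j].TranslationID
	})
	if limit >= 0 && len(ranked) > limit {
		ranked = ranked[:limit]
	}
	return ranked
}

func main() {
	stats := []TranslationStats{
		{TranslationID: 10, Likes: 5},
		{TranslationID: 11, Likes: 1, Bookmarks: 2},
		{TranslationID: 12, Comments: 3},
	}
	for _, s := range PopularTranslations(stats, 2) {
		fmt.Println(s.TranslationID, s.Score())
	}
}
```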

Acceptance Criteria:

All analytics metrics implemented
Popular translations calculated correctly
Analytics updated in real-time or near-real-time
Performance optimized (cached where appropriate)
Tests for all analytics features

Implementation Guidelines

Architecture First: Maintain clean architecture, DDD, and CQRS patterns
Backward Compatibility: Ensure API contracts remain consistent
Code Quality:
- Follow Go best practices and idioms
- Use interfaces for testability
- Maintain separation of concerns
- Add comprehensive error handling
Testing: Write tests for all new features and refactorings
Documentation: Add GoDoc comments for all public APIs
Performance: Optimize for production workloads
Observability: Instrument all critical paths

Expected Outcome

Production-ready search functionality
Proper dependency injection (no globals)
Full observability (tracing, metrics, logging)
Optimized queries with DTOs
Comprehensive API documentation
Fast, reliable test suite
Complete analytics features
Improved code maintainability

Files to Prioritize

internal/app/search/service.go - Core search implementation (P0)
internal/platform/config/config.go - Configuration refactoring (P1)
internal/observability/* - Observability enhancements (P0)
internal/app/*/queries.go - DTO implementation (P1)
internal/platform/log/* - Logging improvements (P1)
api/README.md - API documentation (P1)

Notes

Codebase uses Go 1.25
Follows DDD/CQRS/Clean Architecture patterns
GraphQL API with gqlgen
PostgreSQL with GORM
Weaviate for vector search
Redis for caching and job queue
Docker for local development
Existing tests should continue to pass
Follow existing code style and patterns
