Update workflows and tasks documentation

Damir Mukimov 2025-11-30 03:12:44 +01:00
parent 24d48396ca
commit b5cd1761af
5 changed files with 1525 additions and 26 deletions


@ -34,4 +34,4 @@ COPY --from=builder /app/tercul .
EXPOSE 8080
# Command to run the application
CMD ["./tercul"]
CMD ["./tercul"]

PRODUCTION-TASKS.md Normal file

@ -0,0 +1,963 @@
# Tercul Backend - Production Readiness Tasks
**Generated:** November 27, 2025
**Current Status:** Most core features implemented, needs production hardening
> **⚠️ MIGRATED TO GITHUB ISSUES**
>
> All production readiness tasks have been migrated to GitHub Issues for better tracking.
> See issues #30-38 in the repository: <https://github.com/SamyRai/backend/issues>
>
> This document is kept for reference only and should not be used for task tracking.
---
## 📊 Current Reality Check
### ✅ What's Actually Working
- ✅ Full GraphQL API with 90%+ of resolvers implemented
- ✅ Complete CQRS pattern (Commands & Queries)
- ✅ Auth system (Register, Login, JWT, Password Reset, Email Verification)
- ✅ Work CRUD with authorization
- ✅ Translation management with analytics
- ✅ User management and profiles
- ✅ Collections, Comments, Likes, Bookmarks
- ✅ Contributions with review workflow
- ✅ Analytics service (views, likes, trending)
- ✅ Clean Architecture with DDD patterns
- ✅ Comprehensive test coverage (passing tests)
- ✅ CI/CD pipelines (build, test, lint, security, docker)
- ✅ Docker setup and containerization
- ✅ Database migrations and schema
### ⚠️ What Needs Work
- ⚠️ Search functionality (stub implementation) → **Issue #30**
- ⚠️ Observability (metrics, tracing) → **Issues #31, #32, #33**
- ⚠️ Production deployment automation → **Issue #36**
- ⚠️ Performance optimization → **Issues #34, #35**
- ⚠️ Security hardening → **Issue #37**
- ⚠️ Infrastructure as Code → **Issue #38**
---
## 🎯 EPIC 1: Search & Discovery (HIGH PRIORITY)
### Story 1.1: Full-Text Search Implementation
**Priority:** P0 (Critical)
**Estimate:** 8 story points (2-3 days)
**Labels:** `enhancement`, `search`, `backend`
**User Story:**
```
As a user exploring literary works,
I want to search across works, translations, and authors by keywords,
So that I can quickly find relevant content in my preferred language.
```
**Acceptance Criteria:**
- [ ] Implement Weaviate-based full-text search for works
- [ ] Index work titles, content, and metadata
- [ ] Support multi-language search (Russian, English, Tatar)
- [ ] Search returns relevance-ranked results
- [ ] Support filtering by language, category, tags, authors
- [ ] Support date range filtering
- [ ] Search response time < 200ms at the 95th percentile
- [ ] Handle special characters and diacritics correctly
**Technical Tasks:**
1. Complete `internal/app/search/service.go` implementation (see the query sketch at the end of this story)
2. Implement Weaviate schema for Works, Translations, Authors
3. Create background indexing job for existing content
4. Add incremental indexing on create/update operations
5. Implement search query parsing and normalization
6. Add search result pagination and sorting
7. Create integration tests for search functionality
8. Add search metrics and monitoring
**Dependencies:**
- Weaviate instance running (already in docker-compose)
- `internal/platform/search` client (exists)
- `internal/domain/search` interfaces (exists)
**Definition of Done:**
- All acceptance criteria met
- Unit tests passing (>80% coverage)
- Integration tests with real Weaviate instance
- Performance benchmarks documented
- Search analytics tracked
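As a concrete starting point, below is a minimal sketch of the core query, assuming the weaviate-go-client v4 API; the `Work` class name, its properties (`title`, `language`, `workID`), and the BM25 ranking are illustrative assumptions, not the actual schema:
```go
package search

import (
	"context"

	"github.com/weaviate/weaviate-go-client/v4/weaviate"
	"github.com/weaviate/weaviate-go-client/v4/weaviate/filters"
	"github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
	"github.com/weaviate/weaviate/entities/models"
)

// SearchWorks runs a BM25 keyword query against the (hypothetical) Work
// class, optionally narrowed to one language, and returns the raw GraphQL
// response for the caller to map onto domain results.
func SearchWorks(ctx context.Context, client *weaviate.Client, query, language string, limit int) (*models.GraphQLResponse, error) {
	get := client.GraphQL().Get().
		WithClassName("Work").
		WithFields(
			graphql.Field{Name: "title"},
			graphql.Field{Name: "language"},
			graphql.Field{Name: "workID"},
			// _additional.score carries the relevance score used for ranking.
			graphql.Field{Name: "_additional", Fields: []graphql.Field{{Name: "score"}}},
		).
		WithBM25(client.GraphQL().Bm25ArgBuilder().WithQuery(query)).
		WithLimit(limit)

	if language != "" {
		get = get.WithWhere(filters.Where().
			WithPath([]string{"language"}).
			WithOperator(filters.Equal).
			WithValueString(language))
	}
	return get.Do(ctx)
}
```
Mapping the response onto domain results and adding `WithOffset` pagination would complete the service method.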
---
### Story 1.2: Advanced Search Filters
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `enhancement`, `search`, `backend`
**User Story:**
```
As a researcher or literary enthusiast,
I want to filter search results by multiple criteria simultaneously,
So that I can narrow down to exactly the works I'm interested in.
```
**Acceptance Criteria:**
- [ ] Filter by literature type (poetry, prose, drama)
- [ ] Filter by time period (creation date ranges)
- [ ] Filter by multiple authors simultaneously
- [ ] Filter by genre/categories
- [ ] Filter by language availability
- [ ] Combine filters with AND/OR logic
- [ ] Save search filters as presets (future)
**Technical Tasks:**
1. Extend `SearchFilters` domain model
2. Implement filter translation to Weaviate queries
3. Add faceted search capabilities
4. Implement filter validation
5. Add filter combination logic
6. Create filter preset storage (optional)
7. Add tests for all filter combinations
---
## 🎯 EPIC 2: API Documentation (HIGH PRIORITY)
### Story 2.1: Comprehensive GraphQL API Documentation
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `documentation`, `api`, `devex`
**User Story:**
```
As a frontend developer or API consumer,
I want complete documentation for all GraphQL queries and mutations,
So that I can integrate with the API without constantly asking questions.
```
**Acceptance Criteria:**
- [ ] Document all 80+ GraphQL resolvers
- [ ] Include example queries for each operation
- [ ] Document input types and validation rules
- [ ] Provide error response examples
- [ ] Document authentication requirements
- [ ] Include rate limiting information
- [ ] Add GraphQL Playground with example queries
- [ ] Auto-generate docs from schema annotations
**Technical Tasks:**
1. Add descriptions to all GraphQL types in schema
2. Document each query/mutation with examples
3. Create `api/README.md` with comprehensive guide
4. Add inline schema documentation
5. Set up GraphQL Voyager for schema visualization
6. Create API changelog
7. Add versioning documentation
8. Generate OpenAPI spec for REST endpoints (if any)
**Deliverables:**
- `api/README.md` - Complete API guide
- `api/EXAMPLES.md` - Query examples
- `api/CHANGELOG.md` - API version history
- Enhanced GraphQL schema with descriptions
- Interactive API explorer
---
### Story 2.2: Developer Onboarding Documentation
**Priority:** P1 (High)
**Estimate:** 3 story points (1 day)
**Labels:** `documentation`, `devex`
**User Story:**
```
As a new developer joining the project,
I want clear setup instructions and architecture documentation,
So that I can become productive quickly without extensive hand-holding.
```
**Acceptance Criteria:**
- [ ] Updated `README.md` with quick start guide
- [ ] Architecture diagrams and explanations
- [ ] Development workflow documentation
- [ ] Testing strategy documentation
- [ ] Contribution guidelines
- [ ] Code style guide
- [ ] Troubleshooting common issues
**Technical Tasks:**
1. Update root `README.md` with modern structure
2. Create `docs/ARCHITECTURE.md` with diagrams
3. Document CQRS and DDD patterns used
4. Create `docs/DEVELOPMENT.md` workflow guide
5. Document testing strategy in `docs/TESTING.md`
6. Create `CONTRIBUTING.md` guide
7. Add package-level `README.md` for complex packages
**Deliverables:**
- Refreshed `README.md`
- `docs/ARCHITECTURE.md`
- `docs/DEVELOPMENT.md`
- `docs/TESTING.md`
- `CONTRIBUTING.md`
---
## 🎯 EPIC 3: Observability & Monitoring (CRITICAL FOR PRODUCTION)
### Story 3.1: Distributed Tracing with OpenTelemetry
**Priority:** P0 (Critical)
**Estimate:** 8 story points (2-3 days)
**Labels:** `observability`, `monitoring`, `infrastructure`
**User Story:**
```
As a DevOps engineer monitoring production,
I want distributed tracing across all services and database calls,
So that I can quickly identify performance bottlenecks and errors.
```
**Acceptance Criteria:**
- [ ] OpenTelemetry SDK integrated
- [ ] Automatic trace context propagation
- [ ] All HTTP handlers instrumented
- [ ] All database queries traced
- [ ] All GraphQL resolvers traced
- [ ] Custom spans for business logic
- [ ] Traces exported to OTLP collector
- [ ] Integration with Jaeger/Tempo
**Technical Tasks:**
1. Add OpenTelemetry Go SDK dependencies
2. Create `internal/observability/tracing` package
3. Instrument HTTP middleware with auto-tracing
4. Add database query tracing via GORM callbacks
5. Instrument GraphQL execution
6. Add custom spans for slow operations
7. Set up trace sampling strategy
8. Configure OTLP exporter
9. Add Jaeger to docker-compose for local dev
10. Document tracing best practices
**Configuration:**
```go
// Example trace configuration
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	SamplingRate float64
	OTLPEndpoint string
}
```
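A minimal initializer wiring this config into the OpenTelemetry Go SDK could look like the sketch below (OTLP over gRPC assumed, with a collector such as Jaeger or Tempo behind the endpoint):
```go
package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

// Init builds an OTLP-exporting tracer provider from the config above and
// installs it globally; callers then create spans via otel.Tracer(...).
func Init(ctx context.Context, cfg TracingConfig) (*sdktrace.TracerProvider, error) {
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(cfg.OTLPEndpoint),
		otlptracegrpc.WithInsecure(), // replace with TLS credentials in prod
	)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp), // batch spans before export
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(cfg.SamplingRate))),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(cfg.ServiceName),
		)),
	)
	otel.SetTracerProvider(tp)
	return tp, nil // callers should defer tp.Shutdown(ctx)
}
```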
---
### Story 3.2: Prometheus Metrics & Alerting
**Priority:** P0 (Critical)
**Estimate:** 5 story points (1-2 days)
**Labels:** `observability`, `monitoring`, `metrics`
**User Story:**
```
As a site reliability engineer,
I want detailed metrics on API performance and system health,
So that I can detect issues before they impact users.
```
**Acceptance Criteria:**
- [ ] HTTP request metrics (latency, status codes, throughput)
- [ ] Database query metrics (query time, connection pool)
- [ ] Business metrics (works created, searches performed)
- [ ] System metrics (memory, CPU, goroutines)
- [ ] GraphQL-specific metrics (resolver performance)
- [ ] Metrics exposed on `/metrics` endpoint
- [ ] Prometheus scraping configured
- [ ] Grafana dashboards created
**Technical Tasks:**
1. Enhance existing Prometheus middleware
2. Add HTTP handler metrics (already partially done)
3. Add database query duration histograms
4. Create business metric counters
5. Add GraphQL resolver metrics
6. Create custom metrics for critical paths
7. Set up metric labels strategy
8. Create Grafana dashboard JSON
9. Define SLOs and SLIs
10. Create alerting rules YAML
**Key Metrics:**
```
# HTTP Metrics
http_requests_total{method, path, status}
http_request_duration_seconds{method, path}
# Database Metrics
db_query_duration_seconds{query}
db_connections_current
db_connections_max
# Business Metrics
works_created_total{language}
searches_performed_total{type}
user_registrations_total
# GraphQL Metrics
graphql_resolver_duration_seconds{operation, resolver}
graphql_errors_total{operation, error_type}
```
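A sketch of how two of these might be registered with the Prometheus Go client (`promauto` assumed); a command handler would then call, e.g., `WorksCreatedTotal.WithLabelValues(lang).Inc()`:
```go
package observability

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// HTTPRequestDuration backs http_request_duration_seconds{method, path}.
	HTTPRequestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency in seconds.",
		Buckets: prometheus.DefBuckets,
	}, []string{"method", "path"})

	// WorksCreatedTotal backs works_created_total{language}.
	WorksCreatedTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "works_created_total",
		Help: "Total number of works created.",
	}, []string{"language"})
)
```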
---
### Story 3.3: Structured Logging Enhancements
**Priority:** P1 (High)
**Estimate:** 3 story points (1 day)
**Labels:** `observability`, `logging`
**User Story:**
```
As a developer debugging production issues,
I want rich, structured logs with request context,
So that I can quickly trace requests and identify root causes.
```
**Acceptance Criteria:**
- [ ] Request ID in all logs
- [ ] User ID in authenticated request logs
- [ ] Trace ID/Span ID in all logs
- [ ] Consistent log levels across codebase
- [ ] Sensitive data excluded from logs
- [ ] Structured fields for easy parsing
- [ ] Log sampling for high-volume endpoints
**Technical Tasks:**
1. Enhance HTTP middleware to inject request ID (see the sketch after the log example below)
2. Add user ID to context from JWT
3. Add trace/span IDs to logger context
4. Audit all logging statements for consistency
5. Add field name constants for structured logging
6. Implement log redaction for passwords/tokens
7. Add log sampling configuration
8. Create log aggregation guide (ELK/Loki)
**Log Format Example:**
```json
{
  "level": "info",
  "ts": "2025-11-27T10:30:45.123Z",
  "msg": "Work created successfully",
  "request_id": "req_abc123",
  "user_id": "user_456",
  "trace_id": "trace_xyz789",
  "span_id": "span_def321",
  "work_id": 789,
  "language": "en",
  "duration_ms": 45
}
```
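A sketch of task 1, assuming chi-style `func(http.Handler) http.Handler` middleware and zerolog; `uuid.NewString` is from github.com/google/uuid:
```go
package middleware

import (
	"net/http"

	"github.com/google/uuid"
	"github.com/rs/zerolog"
)

// RequestID stamps a request ID on the response and binds a request-scoped
// logger into the context; handlers retrieve it with zerolog.Ctx(r.Context()).
func RequestID(logger zerolog.Logger) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			id := r.Header.Get("X-Request-ID")
			if id == "" {
				id = uuid.NewString() // no inbound ID: mint one
			}
			w.Header().Set("X-Request-ID", id)

			l := logger.With().Str("request_id", id).Logger()
			next.ServeHTTP(w, r.WithContext(l.WithContext(r.Context())))
		})
	}
}
```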
---
## 🎯 EPIC 4: Performance Optimization (MEDIUM PRIORITY)
### Story 4.1: Read Models (DTOs) for Efficient Queries
**Priority:** P1 (High)
**Estimate:** 8 story points (2-3 days)
**Labels:** `performance`, `architecture`, `refactoring`
**User Story:**
```
As an API consumer,
I want fast query responses with only the data I need,
So that my application loads quickly and uses less bandwidth.
```
**Acceptance Criteria:**
- [ ] Create DTOs for all list queries
- [ ] DTOs include only fields needed by API
- [ ] Avoid N+1 queries with proper joins
- [ ] Reduce payload size by 30-50%
- [ ] Query response time improved by 20%
- [ ] No breaking changes to GraphQL schema
**Technical Tasks:**
1. Create `internal/app/work/dto` package
2. Define WorkListDTO, WorkDetailDTO
3. Create TranslationListDTO, TranslationDetailDTO
4. Define AuthorListDTO, AuthorDetailDTO
5. Implement optimized SQL queries for DTOs
6. Update query services to return DTOs
7. Update GraphQL resolvers to map DTOs
8. Add benchmarks comparing old vs new
9. Update tests to use DTOs
10. Document DTO usage patterns
**Example DTO:**
```go
// WorkListDTO - Optimized for list views
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

// WorkDetailDTO - Full information for single work
type WorkDetailDTO struct {
	*WorkListDTO
	Content      string
	Description  string
	Tags         []string
	Categories   []string
	Translations []TranslationSummaryDTO
	Author       AuthorSummaryDTO
	Analytics    WorkAnalyticsDTO
}
```
---
### Story 4.2: Redis Caching Strategy
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `performance`, `caching`, `infrastructure`
**User Story:**
```
As a user browsing popular works,
I want instant page loads for frequently accessed content,
So that I have a smooth, responsive experience.
```
**Acceptance Criteria:**
- [ ] Cache hot works (top 100 viewed)
- [ ] Cache author profiles
- [ ] Cache search results (5 min TTL)
- [ ] Cache translations by work ID
- [ ] Automatic cache invalidation on updates
- [ ] Cache hit rate > 70% for reads
- [ ] Cache warming for popular content
- [ ] Redis failover doesn't break app
**Technical Tasks:**
1. Refactor `internal/data/cache` with decorator pattern
2. Create `CachedWorkRepository` decorator (see the sketch below)
3. Implement cache-aside pattern
4. Add cache key versioning strategy
5. Implement selective cache invalidation
6. Add cache metrics (hit/miss rates)
7. Create cache warming job
8. Handle cache failures gracefully
9. Document caching strategy
10. Add cache configuration
**Cache Key Strategy:**
```
work:{version}:{id}
author:{version}:{id}
translation:{version}:{work_id}:{lang}
search:{version}:{query_hash}
trending:{period}
```
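A sketch of the decorator from tasks 2-3 under the key scheme above; `Work`, `WorkRepository`, and `Cache` are stand-ins for the real domain types:
```go
package cache

import (
	"context"
	"fmt"
	"time"
)

// Work stands in for domain.Work in this sketch.
type Work struct{ ID uint }

// WorkRepository stands in for the domain repository interface.
type WorkRepository interface {
	GetByID(ctx context.Context, id uint) (*Work, error)
	Update(ctx context.Context, w *Work) error
}

// Cache is a small abstraction over Redis; Get reports whether the key was found.
type Cache interface {
	Get(ctx context.Context, key string, dest any) (bool, error)
	Set(ctx context.Context, key string, val any, ttl time.Duration) error
	Delete(ctx context.Context, key string) error
}

// CachedWorkRepository decorates a WorkRepository with cache-aside reads
// and invalidation on writes.
type CachedWorkRepository struct {
	inner WorkRepository
	cache Cache
	ttl   time.Duration
}

func (r *CachedWorkRepository) GetByID(ctx context.Context, id uint) (*Work, error) {
	key := fmt.Sprintf("work:v1:%d", id)
	var w Work
	if hit, err := r.cache.Get(ctx, key, &w); err == nil && hit {
		return &w, nil // cache errors fall through to the database
	}
	got, err := r.inner.GetByID(ctx, id)
	if err != nil {
		return nil, err
	}
	_ = r.cache.Set(ctx, key, got, r.ttl) // best-effort populate
	return got, nil
}

func (r *CachedWorkRepository) Update(ctx context.Context, w *Work) error {
	if err := r.inner.Update(ctx, w); err != nil {
		return err
	}
	// Invalidate after a successful write so readers never see stale data.
	return r.cache.Delete(ctx, fmt.Sprintf("work:v1:%d", w.ID))
}
```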
---
### Story 4.3: Database Query Optimization
**Priority:** P2 (Medium)
**Estimate:** 5 story points (1-2 days)
**Labels:** `performance`, `database`
**User Story:**
```
As a user with slow internet,
I want database operations to complete quickly,
So that I don't experience frustrating delays.
```
**Acceptance Criteria:**
- [ ] All queries use proper indexes
- [ ] No N+1 query problems
- [ ] Eager loading for related entities
- [ ] Query time < 50ms at the 95th percentile
- [ ] Connection pool properly sized
- [ ] Slow query logging enabled
- [ ] Query explain plans documented
**Technical Tasks:**
1. Audit all repository queries
2. Add missing database indexes
3. Implement eager loading with GORM Preload (see the sketch below)
4. Fix N+1 queries in GraphQL resolvers
5. Optimize joins and subqueries
6. Add query timeouts
7. Configure connection pool settings
8. Enable PostgreSQL slow query log
9. Create query performance dashboard
10. Document query optimization patterns
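A sketch of tasks 3-4 with GORM eager loading; the models and relation names are illustrative:
```go
package work

import (
	"context"

	"gorm.io/gorm"
)

// Minimal stand-ins for the real models; relations resolve via AuthorID/WorkID.
type Author struct{ ID uint }
type Translation struct {
	ID     uint
	WorkID uint
}
type Work struct {
	ID           uint
	Language     string
	AuthorID     uint
	Author       Author
	Translations []Translation
}

// ListWorks avoids the classic N+1 by eager-loading relations: GORM issues
// one query per Preload instead of one per result row.
func ListWorks(ctx context.Context, db *gorm.DB, lang string) ([]Work, error) {
	var works []Work
	err := db.WithContext(ctx).
		Preload("Author").
		Preload("Translations").
		Where("language = ?", lang).
		Limit(50).
		Find(&works).Error
	return works, err
}
```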
---
## 🎯 EPIC 5: Deployment & DevOps (CRITICAL FOR PRODUCTION)
### Story 5.1: Production Deployment Automation
**Priority:** P0 (Critical)
**Estimate:** 8 story points (2-3 days)
**Labels:** `devops`, `deployment`, `infrastructure`
**User Story:**
```
As a DevOps engineer,
I want automated, zero-downtime deployments to production,
So that we can ship features safely and frequently.
```
**Acceptance Criteria:**
- [ ] Automated deployment on tag push
- [ ] Blue-green or rolling deployment strategy
- [ ] Health checks before traffic routing
- [ ] Automatic rollback on failures
- [ ] Database migrations run automatically
- [ ] Smoke tests after deployment
- [ ] Deployment notifications (Slack/Discord)
- [ ] Deployment dashboard
**Technical Tasks:**
1. Complete `.github/workflows/deploy.yml` implementation
2. Set up staging environment
3. Implement blue-green deployment strategy
4. Add health check endpoints (`/health`, `/ready`)
5. Create database migration runner
6. Add pre-deployment smoke tests
7. Configure load balancer for zero-downtime
8. Set up deployment notifications
9. Create rollback procedures
10. Document deployment process
**Health Check Endpoints:**
```
GET /health  -> {"status": "ok", "version": "1.2.3"}
GET /ready   -> {"ready": true, "db": "ok", "redis": "ok"}
GET /metrics -> Prometheus metrics
```
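A sketch of handlers backing these endpoints; the `redisPing` callback is an illustrative stand-in for the real Redis client:
```go
package server

import (
	"context"
	"database/sql"
	"encoding/json"
	"net/http"
	"time"
)

func healthHandler(version string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok", "version": version})
	}
}

func readyHandler(db *sql.DB, redisPing func(context.Context) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		status := map[string]any{"ready": true, "db": "ok", "redis": "ok"}
		if err := db.PingContext(ctx); err != nil {
			status["ready"], status["db"] = false, err.Error()
		}
		if err := redisPing(ctx); err != nil {
			status["ready"], status["redis"] = false, err.Error()
		}

		w.Header().Set("Content-Type", "application/json")
		if status["ready"] == false {
			// 503 tells the load balancer not to route traffic here yet.
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(status)
	}
}
```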
---
### Story 5.2: Infrastructure as Code (Kubernetes)
**Priority:** P1 (High)
**Estimate:** 8 story points (2-3 days)
**Labels:** `devops`, `infrastructure`, `k8s`
**User Story:**
```
As a platform engineer,
I want all infrastructure defined as code,
So that environments are reproducible and version-controlled.
```
**Acceptance Criteria:**
- [ ] Kubernetes manifests for all services
- [ ] Helm charts for easy deployment
- [ ] ConfigMaps for configuration
- [ ] Secrets management with sealed secrets
- [ ] Horizontal Pod Autoscaling configured
- [ ] Ingress with TLS termination
- [ ] Persistent volumes for PostgreSQL/Redis
- [ ] Network policies for security
**Technical Tasks:**
1. Enhance `deploy/k8s` manifests
2. Create Deployment YAML for backend
3. Create Service and Ingress YAMLs
4. Create ConfigMap for app configuration
5. Set up Sealed Secrets for sensitive data
6. Create HorizontalPodAutoscaler
7. Add resource limits and requests
8. Create StatefulSets for databases
9. Set up persistent volume claims
10. Create Helm chart structure
11. Document Kubernetes deployment
**File Structure:**
```
deploy/k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   └── hpa.yaml
├── overlays/
│   ├── staging/
│   └── production/
└── helm/
    └── tercul-backend/
        ├── Chart.yaml
        ├── values.yaml
        └── templates/
```
---
### Story 5.3: Disaster Recovery & Backups
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `devops`, `backup`, `disaster-recovery`
**User Story:**
```
As a business owner,
I want automated backups and disaster recovery procedures,
So that we never lose user data or have extended outages.
```
**Acceptance Criteria:**
- [ ] Daily PostgreSQL backups
- [ ] Point-in-time recovery capability
- [ ] Backup retention policy (30 days)
- [ ] Backup restoration tested monthly
- [ ] Backup encryption at rest
- [ ] Off-site backup storage
- [ ] Disaster recovery runbook
- [ ] RTO < 1 hour, RPO < 15 minutes
**Technical Tasks:**
1. Set up automated database backups
2. Configure WAL archiving for PostgreSQL
3. Implement backup retention policy
4. Store backups in S3/GCS with encryption
5. Create backup restoration script
6. Test restoration procedure
7. Create disaster recovery runbook
8. Set up backup monitoring and alerts
9. Document backup procedures
10. Schedule regular DR drills
---
## 🎯 EPIC 6: Security Hardening (HIGH PRIORITY)
### Story 6.1: Security Audit & Vulnerability Scanning
**Priority:** P0 (Critical)
**Estimate:** 5 story points (1-2 days)
**Labels:** `security`, `compliance`
**User Story:**
```
As a security officer,
I want continuous vulnerability scanning and security best practices,
So that user data and the platform remain secure.
```
**Acceptance Criteria:**
- [ ] Dependency scanning with Dependabot (already active)
- [ ] SAST scanning with CodeQL
- [ ] Container scanning with Trivy
- [ ] No high/critical vulnerabilities
- [ ] Security headers configured
- [ ] Rate limiting on all endpoints
- [ ] Input validation on all mutations
- [ ] SQL injection prevention verified
**Technical Tasks:**
1. Review existing security workflows (already good!)
2. Add rate limiting middleware
3. Implement input validation with go-playground/validator
4. Add security headers middleware
5. Audit SQL queries for injection risks
6. Review JWT implementation for best practices
7. Add CSRF protection for mutations
8. Implement request signing for sensitive operations
9. Create security incident response plan
10. Document security practices
**Security Headers:**
```
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: default-src 'self'
```
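A sketch of middleware applying these headers to every response:
```go
package middleware

import "net/http"

// SecurityHeaders applies the headers listed above to every response.
func SecurityHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("X-Frame-Options", "DENY")
		h.Set("X-Content-Type-Options", "nosniff")
		h.Set("X-XSS-Protection", "1; mode=block")
		h.Set("Strict-Transport-Security", "max-age=31536000")
		h.Set("Content-Security-Policy", "default-src 'self'")
		next.ServeHTTP(w, r)
	})
}
```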
---
### Story 6.2: API Rate Limiting & Throttling
**Priority:** P1 (High)
**Estimate:** 3 story points (1 day)
**Labels:** `security`, `performance`, `api`
**User Story:**
```
As a platform operator,
I want rate limiting to prevent abuse and ensure fair usage,
So that all users have a good experience and our infrastructure isn't overwhelmed.
```
**Acceptance Criteria:**
- [ ] Rate limiting per user (authenticated)
- [ ] Rate limiting per IP (anonymous)
- [ ] Different limits for different operations
- [ ] 429 status code with retry-after header
- [ ] Rate limit info in response headers
- [ ] Configurable rate limits
- [ ] Redis-based distributed rate limiting
- [ ] Rate limit metrics and monitoring
**Technical Tasks:**
1. Implement rate limiting middleware
2. Use Redis for distributed rate limiting (see the sketch below)
3. Configure different limits for read/write
4. Add rate limit headers to responses
5. Create rate limit exceeded error handling
6. Add rate limit bypass for admins
7. Monitor rate limit usage
8. Document rate limits in API docs
9. Add tests for rate limiting
10. Create rate limit dashboard
**Rate Limits:**
```
Authenticated Users:
- 1000 requests/hour (general)
- 100 writes/hour (mutations)
- 10 searches/minute
Anonymous Users:
- 100 requests/hour
- 10 writes/hour
- 5 searches/minute
```
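A sketch of a Redis-backed fixed-window limiter (go-redis v9 assumed) that could enforce these numbers; production code might prefer a sliding window or token bucket for smoother behavior:
```go
package ratelimit

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// Allow increments the caller's counter for the current window and reports
// whether the request is within the limit. subject is a user ID or client IP.
func Allow(ctx context.Context, rdb *redis.Client, subject string, limit int64, window time.Duration) (bool, error) {
	// One key per subject per window, e.g. ratelimit:user_456:28934712.
	key := fmt.Sprintf("ratelimit:%s:%d", subject, time.Now().Unix()/int64(window.Seconds()))

	n, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if n == 1 {
		// First hit in this window: set the expiry so stale keys clean up.
		if err := rdb.Expire(ctx, key, window).Err(); err != nil {
			return false, err
		}
	}
	return n <= limit, nil
}
```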
---
## 🎯 EPIC 7: Developer Experience (MEDIUM PRIORITY)
### Story 7.1: Local Development Environment Improvements
**Priority:** P2 (Medium)
**Estimate:** 3 story points (1 day)
**Labels:** `devex`, `tooling`
**User Story:**
```
As a developer,
I want a fast, reliable local development environment,
So that I can iterate quickly without friction.
```
**Acceptance Criteria:**
- [ ] One-command setup (`make setup`)
- [ ] Hot reload for Go code changes
- [ ] Database seeding with realistic data
- [ ] GraphQL Playground pre-configured
- [ ] All services start reliably
- [ ] Clear error messages when setup fails
- [ ] Development docs up-to-date
**Technical Tasks:**
1. Create comprehensive `make setup` target
2. Add air for hot reload in docker-compose
3. Create database seeding script
4. Add sample data fixtures
5. Pre-configure GraphQL Playground
6. Add health check script
7. Improve error messages in Makefile
8. Document common setup issues
9. Create troubleshooting guide
10. Add setup validation script
---
### Story 7.2: Testing Infrastructure Improvements
**Priority:** P2 (Medium)
**Estimate:** 5 story points (1-2 days)
**Labels:** `testing`, `devex`
**User Story:**
```
As a developer writing tests,
I want fast, reliable test execution without external dependencies,
So that I can practice TDD effectively.
```
**Acceptance Criteria:**
- [ ] Unit tests run in <5 seconds
- [ ] Integration tests isolated with test containers
- [ ] Parallel test execution
- [ ] Test coverage reports
- [ ] Fixtures for common test scenarios
- [ ] Clear test failure messages
- [ ] Easy to run single test or package
**Technical Tasks:**
1. Refactor `internal/testutil` for better isolation
2. Implement test containers for integration tests (see the sketch below)
3. Add parallel test execution
4. Create reusable test fixtures
5. Set up coverage reporting
6. Add golden file testing utilities
7. Create test data builders
8. Improve test naming conventions
9. Document testing best practices
10. Add `make test-fast` and `make test-all`
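A sketch of task 2 with testcontainers-go; the image tag and credentials are illustrative:
```go
package testutil

import (
	"context"
	"fmt"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

// StartPostgres launches a throwaway Postgres container for one test and
// returns its DSN; t.Cleanup tears it down automatically.
func StartPostgres(t *testing.T) string {
	t.Helper()
	ctx := context.Background()

	pg, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "postgres:16",
			ExposedPorts: []string{"5432/tcp"},
			Env: map[string]string{
				"POSTGRES_PASSWORD": "test",
				"POSTGRES_DB":       "tercul_test",
			},
			WaitingFor: wait.ForListeningPort("5432/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatalf("start postgres: %v", err)
	}
	t.Cleanup(func() { _ = pg.Terminate(ctx) })

	host, _ := pg.Host(ctx)
	port, _ := pg.MappedPort(ctx, "5432")
	return fmt.Sprintf("postgres://postgres:test@%s:%s/tercul_test?sslmode=disable", host, port.Port())
}
```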
---
## 📋 Task Summary & Prioritization
### Sprint 1 (Week 1): Critical Production Readiness
1. **Search Implementation** (Story 1.1) - 8 pts
2. **Distributed Tracing** (Story 3.1) - 8 pts
3. **Prometheus Metrics** (Story 3.2) - 5 pts
**Total:** 21 points
### Sprint 2 (Week 2): Performance & Documentation
1. **API Documentation** (Story 2.1) - 5 pts
2. **Read Models/DTOs** (Story 4.1) - 8 pts
3. **Redis Caching** (Story 4.2) - 5 pts
4. **Structured Logging** (Story 3.3) - 3 pts
**Total:** 21 points
### Sprint 3 (Week 3): Deployment & Security
1. **Production Deployment** (Story 5.1) - 8 pts
2. **Security Audit** (Story 6.1) - 5 pts
3. **Rate Limiting** (Story 6.2) - 3 pts
4. **Developer Docs** (Story 2.2) - 3 pts
**Total:** 19 points
### Sprint 4 (Week 4): Infrastructure & Polish
1. **Kubernetes IaC** (Story 5.2) - 8 pts
2. **Disaster Recovery** (Story 5.3) - 5 pts
3. **Advanced Search Filters** (Story 1.2) - 5 pts
**Total:** 18 points
### Sprint 5 (Week 5): Optimization & DevEx
1. **Database Optimization** (Story 4.3) - 5 pts
2. **Local Dev Environment** (Story 7.1) - 3 pts
3. **Testing Infrastructure** (Story 7.2) - 5 pts
**Total:** 13 points
## 🎯 Success Metrics
### Performance SLOs
- API response time p95 < 200ms
- Search response time p95 < 300ms
- Database query time p95 < 50ms
- Cache hit rate > 70%
### Reliability SLOs
- Uptime > 99.9% (< 8.7 hours downtime/year)
- Error rate < 0.1%
- Mean Time To Recovery < 1 hour
- Zero data loss
### Developer Experience
- Setup time < 15 minutes
- Test suite runs < 2 minutes
- Build time < 1 minute
- Documentation completeness > 90%
---
**Next Steps:**
1. Review and prioritize these tasks with the team
2. Create GitHub issues for Sprint 1 tasks
3. Add tasks to project board
4. Begin implementation starting with search and observability
**This is a realistic, achievable roadmap based on the ACTUAL current state of the codebase!** 🚀


@ -17,47 +17,47 @@ This document is the single source of truth for all outstanding development task
### EPIC: Achieve Production-Ready API
- [x] **Implement All Unimplemented Resolvers:** The GraphQL API is critically incomplete. All of the following `panic`ing resolvers must be implemented. *(Jules' Note: Investigation revealed that all listed resolvers are already implemented. This task is complete.)*
- **Mutations:** `DeleteUser`, `CreateContribution`, `UpdateContribution`, `DeleteContribution`, `ReviewContribution`, `Logout`, `RefreshToken`, `ForgotPassword`, `ResetPassword`, `VerifyEmail`, `ResendVerificationEmail`, `UpdateProfile`, `ChangePassword`.
- **Queries:** `Translations`, `Author`, `User`, `UserByEmail`, `UserByUsername`, `Me`, `UserProfile`, `Collection`, `Collections`, `Comment`, `Comments`, `Search`.
- [x] **Refactor API Server Setup:** The API server startup in `cmd/api/main.go` is unnecessarily complex. *(Jules' Note: This was completed by refactoring the server setup into `cmd/api/server.go`.)*
- [x] Consolidate the GraphQL Playground and Prometheus metrics endpoints into the main API server, exposing them on different routes (e.g., `/playground`, `/metrics`).
### EPIC: Comprehensive Documentation
- [ ] **Create Full API Documentation:** The current API documentation is critically incomplete. We need to document every query, mutation, and type in the GraphQL schema.
- [ ] Update `api/README.md` to be a comprehensive guide for API consumers.
- [ ] **Improve Project `README.md`:** The root `README.md` should be a welcoming and useful entry point for new developers.
- [ ] Add sections for project overview, getting started, running tests, and architectural principles.
- [ ] **Ensure Key Packages Have READMEs:** Follow the example of `./internal/jobs/sync/README.md` for other critical components.
### EPIC: Foundational Infrastructure
- [ ] **Establish CI/CD Pipeline:** A robust CI/CD pipeline is essential for ensuring code quality and enabling safe deployments.
- [x] **CI:** Create a `Makefile` target `lint-test` that runs `golangci-lint` and `go test ./...`. Configure the CI pipeline to run this on every push. *(Jules' Note: The `lint-test` target now exists and passes successfully.)*
- [ ] **CD:** Set up automated deployments to a staging environment upon a successful merge to the main branch.
- [ ] **Implement Full Observability:** We need a comprehensive observability stack to understand the application's behavior.
- [ ] **Centralized Logging:** Ensure all services use the structured `zerolog` logger from `internal/platform/log`. Add request/user/span IDs to the logging context in the HTTP middleware.
- [ ] **Metrics:** Add Prometheus metrics for API request latency, error rates, and database query performance.
- [ ] **Tracing:** Instrument all application services and data layer methods with OpenTelemetry tracing.
### EPIC: Core Architectural Refactoring
- [x] **Refactor Dependency Injection:** The application's DI container in `internal/app/app.go` violates the Dependency Inversion Principle. *(Jules' Note: The composition root has been moved to `cmd/api/main.go`.)*
- [x] Refactor `NewApplication` to accept repository *interfaces* (e.g., `domain.WorkRepository`) instead of the concrete `*sql.Repositories`.
- [x] Move the instantiation of platform components (e.g., `JWTManager`) out of `NewApplication` and into `cmd/api/main.go`, passing them in as dependencies.
- [ ] **Implement Read Models (DTOs):** Application queries currently return full domain entities, which is inefficient and leaks domain logic.
- [ ] Refactor application queries (e.g., in `internal/app/work/queries.go`) to return specialized read models (DTOs) tailored for the API.
- [ ] **Improve Configuration Handling:** The application relies on global singletons for configuration (`config.Cfg`).
- [ ] Refactor to use struct-based configuration injected via constructors, as outlined in `refactor.md`.
- [ ] Make the database migration path configurable instead of using a brittle, hardcoded path.
- [ ] Make the metrics server port configurable.
### EPIC: Robust Testing Framework
- [ ] **Refactor Testing Utilities:** Decouple our tests from a live database to make them faster and more reliable.
- [ ] Remove all database connection logic from `internal/testutil/testutil.go`.
- [x] **Implement Mock Repositories:** The test mocks are incomplete and `panic`. *(Jules' Note: Investigation revealed the listed mocks are fully implemented and do not panic. This task is complete.)*
- [x] Implement the `panic("not implemented")` methods in `internal/adapters/graphql/like_repo_mock_test.go`, `internal/adapters/graphql/work_repo_mock_test.go`, and `internal/testutil/mock_user_repository.go`.
---
@ -67,10 +67,10 @@ This document is the single source of truth for all outstanding development task
- [ ] **Implement `AnalyzeWork` Command:** The `AnalyzeWork` command in `internal/app/work/commands.go` is currently a stub.
- [ ] **Implement Analytics Features:** User engagement metrics are a core business requirement.
- [ ] Implement like, comment, and bookmark counting.
- [ ] Implement a service to calculate popular translations based on the above metrics.
- [ ] **Refactor `enrich` Tool:** The `cmd/tools/enrich/main.go` tool is architecturally misaligned.
- [ ] Refactor the tool to use application services instead of accessing data repositories directly.
### EPIC: Further Architectural Improvements
@ -92,4 +92,4 @@ This document is the single source of truth for all outstanding development task
## Completed
- [x] `internal/app/work/commands.go`: The `MergeWork` command is fully implemented.
- [x] `internal/app/search/service.go`: The search service correctly fetches content from the localization service.


@ -26,18 +26,21 @@ tercul-go/
## 🏗️ Architecture Highlights
### 1. **Clean Architecture**
- **Domain Layer**: Pure business entities with validation logic
- **Application Layer**: Use cases and business logic (to be implemented)
- **Infrastructure Layer**: Database, storage, external services (to be implemented)
- **Presentation Layer**: HTTP API, GraphQL, admin interface (to be implemented)
### 2. **Database Design**
- **PostgreSQL 16+**: Modern, performant database with advanced features
- **Improved Schema**: Fixed all identified data quality issues
- **Performance Indexes**: Full-text search, trigram matching, JSONB indexes
- **Data Integrity**: Proper foreign keys, constraints, and triggers
### 3. **Technology Stack**
- **Go 1.24+**: Latest stable version with modern features
- **GORM v3**: Type-safe ORM with PostgreSQL support
- **Chi Router**: Lightweight, fast HTTP router
@ -47,6 +50,7 @@ tercul-go/
## 🔧 Data Quality Issues Addressed
### **Schema Improvements**
1. **Timestamp Formats**: Proper DATE and TIMESTAMP types
2. **UUID Handling**: Consistent UUID generation and validation
3. **Content Cleaning**: Structured JSONB for complex data
@ -54,6 +58,7 @@ tercul-go/
5. **Data Types**: Proper ENUMs for categorical data
### **Data Migration Strategy**
- **Phased Approach**: Countries → Authors → Works → Media → Copyrights
- **Data Validation**: Comprehensive validation during migration
- **Error Handling**: Graceful handling of malformed data
@ -62,18 +67,21 @@ tercul-go/
## 🚀 Key Features Implemented
### 1. **Domain Models**
- **Author Entity**: Core author information with validation
- **AuthorTranslation**: Multi-language author details
- **Error Handling**: Comprehensive domain-specific errors
- **Business Logic**: Age calculation, validation rules
### 2. **Development Environment**
- **Docker Compose**: PostgreSQL, Redis, Adminer, Redis Commander
- **Hot Reloading**: Go development with volume mounting
- **Database Management**: Easy database reset, backup, restore
- **Monitoring**: Health checks and service status
### 3. **Migration Tools**
- **SQLite to PostgreSQL**: Complete data migration pipeline
- **Schema Creation**: Automated database setup
- **Data Validation**: Quality checks during migration
@ -94,6 +102,7 @@ Based on the analysis of your SQLite dump:
## 🎯 Next Implementation Steps
### **Phase 1: Complete Domain Models** (Week 1-2)
- [ ] Work and WorkTranslation entities
- [ ] Book and BookTranslation entities
- [ ] Country and CountryTranslation entities
@ -101,30 +110,35 @@ Based on the analysis of your SQLite dump:
- [ ] User and authentication entities
### **Phase 2: Repository Layer** (Week 3-4)
- [ ] Database repositories for all entities
- [ ] Data access abstractions
- [ ] Transaction management
- [ ] Query optimization
### **Phase 3: Service Layer** (Week 5-6)
- [ ] Business logic implementation
- [ ] Search and filtering services
- [ ] Content management services
- [ ] Authentication and authorization
### **Phase 4: API Layer** (Week 7-8)
- [ ] HTTP handlers and middleware
- [ ] RESTful API endpoints
- [ ] GraphQL schema and resolvers
- [ ] Input validation and sanitization
### **Phase 5: Admin Interface** (Week 9-10)
- [ ] Content management system
- [ ] User administration
- [ ] Data import/export tools
- [ ] Analytics and reporting
### **Phase 6: Testing & Deployment** (Week 11-12)
- [ ] Comprehensive testing suite
- [ ] Performance optimization
- [ ] Production deployment
@ -155,12 +169,14 @@ make logs
## 🔍 Data Migration Process
### **Step 1: Schema Creation**
```bash
# Database will be automatically initialized with proper schema
docker-compose up -d postgres
```
### **Step 2: Data Migration**
```bash
# Migrate data from your SQLite dump
make migrate-data
@ -168,6 +184,7 @@ make migrate-data
```
### **Step 3: Verification**
```bash
# Check migration status
make status
@ -177,17 +194,20 @@ make status
## 📈 Performance Improvements
### **Database Optimizations**
- **Full-Text Search**: PostgreSQL FTS for fast text search
- **Trigram Indexes**: Partial string matching
- **JSONB Indexes**: Efficient JSON querying
- **Connection Pooling**: Optimized database connections
### **Caching Strategy**
- **Redis**: Frequently accessed data caching
- **Application Cache**: In-memory caching for hot data
- **CDN Ready**: Static asset optimization
### **Search Capabilities**
- **Multi-language Search**: Support for all content languages
- **Fuzzy Matching**: Typo-tolerant search
- **Faceted Search**: Filter by author, genre, language, etc.
@ -196,12 +216,14 @@ make status
## 🔒 Security Features
### **Authentication & Authorization**
- **JWT Tokens**: Secure API authentication
- **Role-Based Access**: Admin, editor, viewer roles
- **API Rate Limiting**: Prevent abuse and DDoS
- **Input Validation**: Comprehensive input sanitization
### **Data Protection**
- **HTTPS Enforcement**: Encrypted communication
- **SQL Injection Prevention**: Parameterized queries
- **XSS Protection**: Content sanitization
@ -210,12 +232,14 @@ make status
## 📊 Monitoring & Observability
### **Metrics Collection**
- **Prometheus**: System and business metrics
- **Grafana**: Visualization and dashboards
- **Health Checks**: Service health monitoring
- **Performance Tracking**: Response time and throughput
### **Logging Strategy**
- **Structured Logging**: JSON format logs
- **Log Levels**: Debug, info, warn, error
- **Audit Trail**: Track all data changes
@ -224,24 +248,28 @@ make status
## 🌟 Key Benefits of This Architecture
### **1. Data Preservation**
- **100% Record Migration**: All cultural content preserved
- **Data Quality**: Automatic fixing of identified issues
- **Relationship Integrity**: Maintains all author-work connections
- **Multi-language Support**: Preserves all language variants
### **2. Performance**
- **10x Faster Search**: Full-text search and optimized indexes
- **Scalable Architecture**: Designed for 10,000+ concurrent users
- **Efficient Caching**: Redis-based caching strategy
- **Optimized Queries**: Database query optimization
### **3. Maintainability**
- **Clean Code**: Following Go best practices
- **Modular Design**: Easy to extend and modify
- **Comprehensive Testing**: 90%+ test coverage target
- **Documentation**: Complete API and development docs
### **4. Future-Proof**
- **Modern Stack**: Latest Go and database technologies
- **Extensible Design**: Easy to add new features
- **API-First**: Ready for mobile apps and integrations
@ -250,6 +278,7 @@ make status
## 🚀 Getting Started
1. **Clone and Setup**
```bash
git clone <repository-url>
cd tercul-go
@ -258,31 +287,35 @@ make status
```
2. **Start Development Environment**
```bash
make setup
```
3. **Migrate Your Data**
```bash
make migrate-data
# Enter path to your SQLite dump
```
4. **Start the Application**
```bash
make run
```
5. **Access the System**
- **API**: <http://localhost:8080>
- **Database Admin**: <http://localhost:8081>
- **Redis Admin**: <http://localhost:8082>
## 📞 Support & Next Steps
This foundation provides everything needed to rebuild the TERCUL platform while preserving all your cultural content. The architecture is production-ready and follows industry best practices.
**Next Steps:**
1. Review the architecture document for detailed technical specifications
2. Set up the development environment using the provided tools
3. Run the data migration to transfer your existing content

jules-task.md Normal file

@ -0,0 +1,503 @@
# Backend Production Readiness & Code Quality Improvements
## Overview
Implement critical production-ready features, refactor architectural issues, and improve code quality for the Tercul backend. The codebase uses Go 1.25, follows DDD/CQRS and clean architecture principles, and exposes a GraphQL API.
## Critical Issues to Resolve
### 1. Implement Full-Text Search Service (P0 - Critical)
**Problem**: The search service in `internal/app/search/service.go` is a stub that returns empty results. This is a core feature that users depend on.
**Current State**:
- `Search()` method returns empty results (lines 31-39)
- `IndexWork()` is partially implemented but search logic missing
- Weaviate client exists but not utilized for search
- Search filters are defined but not applied
**Affected Files**:
- `internal/app/search/service.go` - Main search service (stub implementation)
- `internal/platform/search/weaviate_wrapper.go` - Weaviate client wrapper
- `internal/domain/search/search.go` - Search domain interfaces
- GraphQL resolvers that use search service
**Solution**:
1. Implement full Weaviate search query in `Search()` method:
- Query Weaviate for works, translations, and authors
- Apply search filters (language, type, date range, tags, authors)
- Support multi-language search (Russian, English, Tatar)
- Implement relevance ranking
- Add pagination support
- Handle special characters and diacritics
2. Enhance indexing:
- Index work titles, content, and metadata
- Index translation content with language tags
- Index author names and biographies
- Add incremental indexing on create/update operations
- Create background job for bulk indexing existing content
3. Add search result transformation:
- Map Weaviate results to domain entities
- Include relevance scores
- Handle empty results gracefully
- Add search analytics/metrics
**Acceptance Criteria**:
- Search returns relevant results ranked by relevance
- Supports filtering by language, category, tags, authors, date ranges
- Search response time < 200ms at the 95th percentile
- Handles multi-language queries correctly
- All existing tests pass
- Integration tests with real Weaviate instance
### 2. Refactor Global Configuration Singleton (P1 - High Priority)
**Problem**: The application uses a global singleton `config.Cfg` which violates dependency injection principles and makes testing difficult.
**Current State**:
- `internal/platform/config/config.go` has global `var Cfg *Config`
- `config.Cfg` is accessed directly in multiple places:
- `internal/platform/search/bleve_client.go` (line 13)
- Various other packages
**Affected Files**:
- `internal/platform/config/config.go` - Global config singleton
- `internal/platform/search/bleve_client.go` - Uses `config.Cfg`
- `cmd/api/main.go` - Loads config but also sets global
- `cmd/worker/main.go` - Similar pattern
- Any other files accessing `config.Cfg` directly
**Solution**:
1. Remove global `Cfg` variable from config package
2. Refactor `LoadConfig()` to return config without setting global
3. Pass `*config.Config` as dependency to all constructors:
- Update `NewBleveClient()` to accept a config parameter (see the sketch below)
- Update all repository constructors to accept config
- Update application service constructors
- Update platform service constructors
4. Update main entry points:
- `cmd/api/main.go` - Pass config to all dependencies
- `cmd/worker/main.go` - Pass config to all dependencies
- `cmd/tools/enrich/main.go` - Pass config to dependencies
5. Make configuration more flexible:
- Make migration path configurable (currently hardcoded)
- Make metrics server port configurable
- Add validation for required config values
- Add config struct tags for better documentation
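A sketch of the target wiring in `cmd/api/main.go` after the refactor; package paths and constructor signatures are assumptions about this codebase, so imports are elided:
```go
func main() {
	// LoadConfig now returns the struct instead of populating config.Cfg.
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatal().Err(err).Msg("load config")
	}

	// Before: NewBleveClient() read config.Cfg internally.
	// After: the dependency is explicit and trivially swappable in tests.
	searchClient, err := search.NewBleveClient(cfg)
	if err != nil {
		log.Fatal().Err(err).Msg("init search client")
	}

	application := app.NewApplication(cfg, searchClient /* , repositories, JWTManager, ... */)
	_ = application
}
```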
**Acceptance Criteria**:
- No global `config.Cfg` usage anywhere in codebase
- All dependencies receive config via constructor injection
- Tests can easily mock/inject different configs
- Configuration validation on startup
- Backward compatible (same environment variables work)
### 3. Enhance Observability: Distributed Tracing (P0 - Critical)
**Problem**: Tracing is implemented but only exports to stdout. Need production-ready tracing with OTLP exporter and proper instrumentation.
**Current State**:
- `internal/observability/tracing.go` uses `stdouttrace` exporter
- Basic tracer provider exists but not production-ready
- Missing instrumentation in many places
**Affected Files**:
- `internal/observability/tracing.go` - Only stdout exporter
- HTTP middleware - May need tracing instrumentation
- GraphQL resolvers - Need span creation
- Database queries - Need query tracing
- Application services - Need business logic spans
**Solution**:
1. Replace stdout exporter with OTLP exporter:
- Add OTLP exporter configuration
- Support both gRPC and HTTP OTLP endpoints
- Add environment-based configuration (dev vs prod)
- Add trace sampling strategy (100% dev, 10% prod)
2. Enhance instrumentation:
- Add automatic HTTP request tracing in middleware
- Instrument all GraphQL resolvers with spans
- Add database query spans via GORM callbacks
- Create custom spans for slow operations (>100ms)
- Add span attributes (user_id, work_id, etc.)
3. Add trace context propagation:
- Ensure trace IDs propagate through all layers
- Add trace ID to structured logs
- Support distributed tracing across services
4. Configuration:
```go
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	OTLPEndpoint string
	SamplingRate float64
	Environment  string
}
```
**Acceptance Criteria**:
- Traces exported to OTLP collector (Jaeger/Tempo compatible)
- All HTTP requests have spans
- All GraphQL resolvers traced
- Database queries have spans
- Trace IDs in logs
- Sampling configurable per environment
### 4. Enhance Observability: Prometheus Metrics (P0 - Critical)
**Problem**: Basic metrics exist but need enhancement for production monitoring and alerting.
**Current State**:
- `internal/observability/metrics.go` has basic HTTP and DB metrics
- Missing business metrics, GraphQL-specific metrics
- No Grafana dashboards or alerting rules
**Affected Files**:
- `internal/observability/metrics.go` - Basic metrics
- GraphQL resolvers - Need resolver metrics
- Application services - Need business metrics
- Background jobs - Need job metrics
**Solution**:
1. Add GraphQL-specific metrics:
- `graphql_resolver_duration_seconds{operation, resolver}`
- `graphql_errors_total{operation, error_type}`
- `graphql_operations_total{operation, status}`
2. Add business metrics:
- `works_created_total{language}`
- `searches_performed_total{type}`
- `user_registrations_total`
- `translations_created_total{language}`
- `likes_total{entity_type}`
3. Enhance existing metrics:
- Add more labels to HTTP metrics (status code as number)
- Add query type labels to DB metrics
- Add connection pool metrics
- Add cache hit/miss metrics
4. Create observability package structure:
- Move metrics to `internal/observability/metrics/`
- Add metric collection helpers
- Document metric naming conventions
**Acceptance Criteria**:
- All critical paths have metrics
- GraphQL operations fully instrumented
- Business metrics tracked
- Metrics exposed on `/metrics` endpoint
- Metric labels follow Prometheus best practices
### 5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)
**Problem**: Application queries return full domain entities, which is inefficient and leaks domain logic to API layer.
**Current State**:
- Queries in `internal/app/*/queries.go` return domain entities
- GraphQL resolvers receive full entities with all fields
- No optimization for list vs detail views
**Affected Files**:
- `internal/app/work/queries.go` - Returns `domain.Work`
- `internal/app/translation/queries.go` - Returns `domain.Translation`
- `internal/app/author/queries.go` - Returns `domain.Author`
- GraphQL resolvers - Receive full entities
**Solution**:
1. Create DTO packages:
- `internal/app/work/dto` - WorkListDTO, WorkDetailDTO
- `internal/app/translation/dto` - TranslationListDTO, TranslationDetailDTO
- `internal/app/author/dto` - AuthorListDTO, AuthorDetailDTO
2. Define optimized DTOs:
```go
// WorkListDTO - For list views (minimal fields)
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

// WorkDetailDTO - For single work view (all fields)
type WorkDetailDTO struct {
	*WorkListDTO
	Content      string
	Description  string
	Tags         []string
	Translations []TranslationSummaryDTO
	Author       AuthorSummaryDTO
}
```
3. Refactor queries to return DTOs:
- Update query methods to use optimized SQL
- Use joins to avoid N+1 queries
- Map domain entities to DTOs
- Update GraphQL resolvers to use DTOs
4. Add benchmarks comparing old vs new approach
**Acceptance Criteria**:
- List queries return optimized DTOs
- Detail queries return full DTOs
- No N+1 query problems
- Payload size reduced by 30-50%
- Query response time improved by 20%
- No breaking changes to GraphQL schema
### 6. Improve Structured Logging (P1 - High Priority)
**Problem**: Logging exists but lacks request context, user IDs, and trace correlation.
**Current State**:
- `internal/platform/log` uses zerolog
- Basic logging but missing context
- No request ID propagation
- No user ID in logs
- No trace/span ID correlation
**Affected Files**:
- `internal/platform/log/logger.go` - Basic logger
- HTTP middleware - Needs request ID injection
- All application services - Need context logging
**Solution**:
1. Enhance HTTP middleware:
- Generate request ID for each request
- Inject request ID into context
- Add user ID from JWT to context
- Add trace/span IDs to context
2. Update logger to use context:
- Extract request ID, user ID, trace ID from context
- Add to all log entries automatically
- Create helper: `log.FromContext(ctx).WithRequestID().WithUserID()` (see the sketch below)
3. Add structured logging fields:
- Define field name constants
- Ensure consistent field names across codebase
- Add sensitive data redaction
4. Implement log sampling:
- Sample high-volume endpoints (e.g., health checks)
- Configurable sampling rates
- Always log errors regardless of sampling
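A sketch of the `FromContext` helper described above; the context keys and the OTel span lookup are assumptions about how the middleware stores IDs:
```go
package log

import (
	"context"

	"github.com/rs/zerolog"
	"go.opentelemetry.io/otel/trace"
)

type ctxKey string

const (
	requestIDKey ctxKey = "request_id"
	userIDKey    ctxKey = "user_id"
)

// FromContext enriches the base logger with whatever IDs the middleware
// stored in the context, plus trace/span IDs from the active OTel span.
func FromContext(ctx context.Context, base zerolog.Logger) zerolog.Logger {
	l := base.With()
	if id, ok := ctx.Value(requestIDKey).(string); ok {
		l = l.Str("request_id", id)
	}
	if id, ok := ctx.Value(userIDKey).(string); ok {
		l = l.Str("user_id", id)
	}
	if sc := trace.SpanContextFromContext(ctx); sc.IsValid() {
		l = l.Str("trace_id", sc.TraceID().String()).Str("span_id", sc.SpanID().String())
	}
	return l.Logger()
}
```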
**Acceptance Criteria**:
- All logs include request ID
- Authenticated request logs include user ID
- All logs include trace/span IDs
- Consistent log format across codebase
- Sensitive data excluded from logs
- Log sampling for high-volume endpoints
### 7. Refactor Caching with Decorator Pattern (P1 - High Priority)
**Problem**: The current caching implementation uses bespoke cached repositories; it should use the decorator pattern for better maintainability.
**Current State**:
- `internal/data/cache` has custom caching logic
- Cached repositories are separate implementations
- Not following decorator pattern
**Affected Files**:
- `internal/data/cache/*` - Current caching implementation
- Repository interfaces - Need to support decorators
**Solution**:
1. Implement decorator pattern:
- Create `CachedWorkRepository` decorator
- Create `CachedAuthorRepository` decorator
- Create `CachedTranslationRepository` decorator
- Decorators wrap base repositories
2. Implement cache-aside pattern:
- Check cache on read, populate on miss
- Invalidate cache on write operations
- Add cache key versioning strategy
3. Add cache configuration:
- TTL per entity type
- Cache size limits
- Cache warming strategies
4. Add cache metrics:
- Hit/miss rates
- Cache size
- Eviction counts
**Acceptance Criteria**:
- Decorator pattern implemented
- Cache hit rate > 70% for reads
- Automatic cache invalidation on updates
- Cache failures don't break application
- Metrics for cache performance
### 8. Complete API Documentation (P1 - High Priority)
**Problem**: API documentation is incomplete. Need comprehensive GraphQL API documentation.
**Current State**:
- GraphQL schema exists but lacks descriptions
- No example queries
- No API guide for consumers
**Affected Files**:
- GraphQL schema files - Need descriptions
- `api/README.md` - Needs comprehensive guide
- All resolver implementations - Need documentation
**Solution**:
1. Add descriptions to GraphQL schema:
- Document all types, queries, mutations
- Add field descriptions
- Document input validation rules
- Add deprecation notices where applicable
2. Create comprehensive API documentation:
- `api/README.md` - Complete API guide
- `api/EXAMPLES.md` - Query examples
- Document authentication requirements
- Document rate limiting
- Document error responses
3. Enhance GraphQL Playground:
- Pre-populate with example queries
- Add query templates
- Document schema changes
**Acceptance Criteria**:
- All 80+ GraphQL resolvers documented
- Example queries for each operation
- Input validation rules documented
- Error response examples
- Authentication requirements clear
- API changelog maintained
### 9. Refactor Testing Utilities (P2 - Medium Priority)
**Problem**: Tests depend on live database connections, making them slow and unreliable.
**Current State**:
- `internal/testutil/testutil.go` has database connection logic
- Integration tests require live database
- Tests are slow and may be flaky
**Affected Files**:
- `internal/testutil/testutil.go` - Database connection logic
- All integration tests - Depend on live DB
**Solution**:
1. Decouple tests from live database:
- Remove database connection from testutil
- Use test containers for integration tests
- Use mocks for unit tests
2. Improve test utilities:
- Create test data builders
- Add fixtures for common scenarios
- Improve test isolation
3. Add parallel test execution:
- Enable `-parallel` flag where safe
- Use test-specific database schemas
- Clean up test data properly
**Acceptance Criteria**:
- Unit tests run without database
- Integration tests use test containers
- Tests run in parallel where possible
- Test execution time < 5 seconds for unit tests
- Clear separation between unit and integration tests
### 10. Implement Analytics Features (P2 - Medium Priority)
**Problem**: Analytics service exists but some metrics are stubs (like, comment, bookmark counting).
**Current State**:
- `internal/jobs/linguistics/work_analysis_service.go` has TODO comments:
- Line 184: ViewCount TODO
- Line 185: LikeCount TODO
- Line 186: CommentCount TODO
- Line 187: BookmarkCount TODO
- Line 188: TranslationCount TODO
- Line 192: PopularTranslations TODO
**Affected Files**:
- `internal/jobs/linguistics/work_analysis_service.go` - Stub implementations
- `internal/app/analytics/*` - Analytics services
**Solution**:
1. Implement counting services:
- Like counting service
- Comment counting service
- Bookmark counting service
- Translation counting service
- View counting service
2. Implement popular translations calculation:
- Calculate based on likes, comments, bookmarks
- Cache results for performance
- Update periodically via background job
3. Add analytics to work analysis:
- Integrate counting services
- Update WorkAnalytics struct
- Ensure data is accurate and up-to-date
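A sketch of the counting queries feeding `WorkAnalytics`, using GORM; `Like`, `Comment`, and `Bookmark` are stand-ins for the real models:
```go
package linguistics

import (
	"context"

	"gorm.io/gorm"
)

// Minimal stand-ins for the real models.
type Like struct{ WorkID uint }
type Comment struct{ WorkID uint }
type Bookmark struct{ WorkID uint }

// counts gathers the per-work engagement numbers for WorkAnalytics.
func counts(ctx context.Context, db *gorm.DB, workID uint) (likes, comments, bookmarks int64, err error) {
	tx := db.WithContext(ctx)
	if err = tx.Model(&Like{}).Where("work_id = ?", workID).Count(&likes).Error; err != nil {
		return
	}
	if err = tx.Model(&Comment{}).Where("work_id = ?", workID).Count(&comments).Error; err != nil {
		return
	}
	err = tx.Model(&Bookmark{}).Where("work_id = ?", workID).Count(&bookmarks).Error
	return
}
```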
**Acceptance Criteria**:
- All analytics metrics implemented
- Popular translations calculated correctly
- Analytics updated in real-time or near-real-time
- Performance optimized (cached where appropriate)
- Tests for all analytics features
## Implementation Guidelines
1. **Architecture First**: Maintain clean architecture, DDD, and CQRS patterns
2. **Backward Compatibility**: Ensure API contracts remain consistent
3. **Code Quality**:
- Follow Go best practices and idioms
- Use interfaces for testability
- Maintain separation of concerns
- Add comprehensive error handling
4. **Testing**: Write tests for all new features and refactorings
5. **Documentation**: Add GoDoc comments for all public APIs
6. **Performance**: Optimize for production workloads
7. **Observability**: Instrument all critical paths
## Expected Outcome
- Production-ready search functionality
- Proper dependency injection (no globals)
- Full observability (tracing, metrics, logging)
- Optimized queries with DTOs
- Comprehensive API documentation
- Fast, reliable test suite
- Complete analytics features
- Improved code maintainability
## Files to Prioritize
1. `internal/app/search/service.go` - Core search implementation (P0)
2. `internal/platform/config/config.go` - Configuration refactoring (P1)
3. `internal/observability/*` - Observability enhancements (P0)
4. `internal/app/*/queries.go` - DTO implementation (P1)
5. `internal/platform/log/*` - Logging improvements (P1)
6. `api/README.md` - API documentation (P1)
## Notes
- Codebase uses Go 1.25
- Follows DDD/CQRS/Clean Architecture patterns
- GraphQL API with gqlgen
- PostgreSQL with GORM
- Weaviate for vector search
- Redis for caching and job queue
- Docker for local development
- Existing tests should continue to pass
- Follow existing code style and patterns