Update workflows and tasks documentation

Damir Mukimov 2025-11-30 03:12:44 +01:00
parent 24d48396ca
commit b5cd1761af
5 changed files with 1525 additions and 26 deletions


@ -34,4 +34,4 @@ COPY --from=builder /app/tercul .
EXPOSE 8080
# Command to run the application
CMD ["./tercul"]
CMD ["./tercul"]

PRODUCTION-TASKS.md Normal file

@ -0,0 +1,963 @@
# Tercul Backend - Production Readiness Tasks
**Generated:** November 27, 2025
**Current Status:** Most core features implemented, needs production hardening
> **⚠️ MIGRATED TO GITHUB ISSUES**
>
> All production readiness tasks have been migrated to GitHub Issues for better tracking.
> See issues #30-38 in the repository: <https://github.com/SamyRai/backend/issues>
>
> This document is kept for reference only and should not be used for task tracking.
---
## 📊 Current Reality Check
### ✅ What's Actually Working
- ✅ Full GraphQL API with 90%+ of resolvers implemented
- ✅ Complete CQRS pattern (Commands & Queries)
- ✅ Auth system (Register, Login, JWT, Password Reset, Email Verification)
- ✅ Work CRUD with authorization
- ✅ Translation management with analytics
- ✅ User management and profiles
- ✅ Collections, Comments, Likes, Bookmarks
- ✅ Contributions with review workflow
- ✅ Analytics service (views, likes, trending)
- ✅ Clean Architecture with DDD patterns
- ✅ Comprehensive test coverage (passing tests)
- ✅ CI/CD pipelines (build, test, lint, security, docker)
- ✅ Docker setup and containerization
- ✅ Database migrations and schema
### ⚠️ What Needs Work
- ⚠️ Search functionality (stub implementation) → **Issue #30**
- ⚠️ Observability (metrics, tracing) → **Issues #31, #32, #33**
- ⚠️ Production deployment automation → **Issue #36**
- ⚠️ Performance optimization → **Issues #34, #35**
- ⚠️ Security hardening → **Issue #37**
- ⚠️ Infrastructure as Code → **Issue #38**
---
## 🎯 EPIC 1: Search & Discovery (HIGH PRIORITY)
### Story 1.1: Full-Text Search Implementation
**Priority:** P0 (Critical)
**Estimate:** 8 story points (2-3 days)
**Labels:** `enhancement`, `search`, `backend`
**User Story:**
```
As a user exploring literary works,
I want to search across works, translations, and authors by keywords,
So that I can quickly find relevant content in my preferred language.
```
**Acceptance Criteria:**
- [ ] Implement Weaviate-based full-text search for works
- [ ] Index work titles, content, and metadata
- [ ] Support multi-language search (Russian, English, Tatar)
- [ ] Search returns relevance-ranked results
- [ ] Support filtering by language, category, tags, authors
- [ ] Support date range filtering
- [ ] Search response time < 200ms at the 95th percentile
- [ ] Handle special characters and diacritics correctly
**Technical Tasks:**
1. Complete `internal/app/search/service.go` implementation (see the query sketch at the end of this story)
2. Implement Weaviate schema for Works, Translations, Authors
3. Create background indexing job for existing content
4. Add incremental indexing on create/update operations
5. Implement search query parsing and normalization
6. Add search result pagination and sorting
7. Create integration tests for search functionality
8. Add search metrics and monitoring
**Dependencies:**
- Weaviate instance running (already in docker-compose)
- `internal/platform/search` client (exists)
- `internal/domain/search` interfaces (exists)
**Definition of Done:**
- All acceptance criteria met
- Unit tests passing (>80% coverage)
- Integration tests with real Weaviate instance
- Performance benchmarks documented
- Search analytics tracked
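As a concrete starting point, below is a minimal sketch of the core query, assuming the weaviate-go-client v4 API; the `Work` class name, its properties (`title`, `language`, `workID`), and the BM25 ranking are illustrative assumptions, not the actual schema:
```go
package search

import (
	"context"

	"github.com/weaviate/weaviate-go-client/v4/weaviate"
	"github.com/weaviate/weaviate-go-client/v4/weaviate/filters"
	"github.com/weaviate/weaviate-go-client/v4/weaviate/graphql"
	"github.com/weaviate/weaviate/entities/models"
)

// SearchWorks runs a BM25 keyword query against the (hypothetical) Work
// class, optionally narrowed to one language, and returns the raw GraphQL
// response for the caller to map onto domain results.
func SearchWorks(ctx context.Context, client *weaviate.Client, query, language string, limit int) (*models.GraphQLResponse, error) {
	get := client.GraphQL().Get().
		WithClassName("Work").
		WithFields(
			graphql.Field{Name: "title"},
			graphql.Field{Name: "language"},
			graphql.Field{Name: "workID"},
			// _additional.score carries the relevance score used for ranking.
			graphql.Field{Name: "_additional", Fields: []graphql.Field{{Name: "score"}}},
		).
		WithBM25(client.GraphQL().Bm25ArgBuilder().WithQuery(query)).
		WithLimit(limit)

	if language != "" {
		get = get.WithWhere(filters.Where().
			WithPath([]string{"language"}).
			WithOperator(filters.Equal).
			WithValueString(language))
	}
	return get.Do(ctx)
}
```
Mapping the response onto domain results and adding `WithOffset` pagination would complete the service method.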
---
### Story 1.2: Advanced Search Filters
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `enhancement`, `search`, `backend`
**User Story:**
```
As a researcher or literary enthusiast,
I want to filter search results by multiple criteria simultaneously,
So that I can narrow down to exactly the works I'm interested in.
```
**Acceptance Criteria:**
- [ ] Filter by literature type (poetry, prose, drama)
- [ ] Filter by time period (creation date ranges)
- [ ] Filter by multiple authors simultaneously
- [ ] Filter by genre/categories
- [ ] Filter by language availability
- [ ] Combine filters with AND/OR logic
- [ ] Save search filters as presets (future)
**Technical Tasks:**
1. Extend `SearchFilters` domain model
2. Implement filter translation to Weaviate queries
3. Add faceted search capabilities
4. Implement filter validation
5. Add filter combination logic
6. Create filter preset storage (optional)
7. Add tests for all filter combinations
---
## 🎯 EPIC 2: API Documentation (HIGH PRIORITY)
### Story 2.1: Comprehensive GraphQL API Documentation
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `documentation`, `api`, `devex`
**User Story:**
```
As a frontend developer or API consumer,
I want complete documentation for all GraphQL queries and mutations,
So that I can integrate with the API without constantly asking questions.
```
**Acceptance Criteria:**
- [ ] Document all 80+ GraphQL resolvers
- [ ] Include example queries for each operation
- [ ] Document input types and validation rules
- [ ] Provide error response examples
- [ ] Document authentication requirements
- [ ] Include rate limiting information
- [ ] Add GraphQL Playground with example queries
- [ ] Auto-generate docs from schema annotations
**Technical Tasks:**
1. Add descriptions to all GraphQL types in schema
2. Document each query/mutation with examples
3. Create `api/README.md` with comprehensive guide
4. Add inline schema documentation
5. Set up GraphQL Voyager for schema visualization
6. Create API changelog
7. Add versioning documentation
8. Generate OpenAPI spec for REST endpoints (if any)
**Deliverables:**
- `api/README.md` - Complete API guide
- `api/EXAMPLES.md` - Query examples
- `api/CHANGELOG.md` - API version history
- Enhanced GraphQL schema with descriptions
- Interactive API explorer
---
### Story 2.2: Developer Onboarding Documentation
**Priority:** P1 (High)
**Estimate:** 3 story points (1 day)
**Labels:** `documentation`, `devex`
**User Story:**
```
As a new developer joining the project,
I want clear setup instructions and architecture documentation,
So that I can become productive quickly without extensive hand-holding.
```
**Acceptance Criteria:**
- [ ] Updated `README.md` with quick start guide
- [ ] Architecture diagrams and explanations
- [ ] Development workflow documentation
- [ ] Testing strategy documentation
- [ ] Contribution guidelines
- [ ] Code style guide
- [ ] Troubleshooting common issues
**Technical Tasks:**
1. Update root `README.md` with modern structure
2. Create `docs/ARCHITECTURE.md` with diagrams
3. Document CQRS and DDD patterns used
4. Create `docs/DEVELOPMENT.md` workflow guide
5. Document testing strategy in `docs/TESTING.md`
6. Create `CONTRIBUTING.md` guide
7. Add package-level `README.md` for complex packages
**Deliverables:**
- Refreshed `README.md`
- `docs/ARCHITECTURE.md`
- `docs/DEVELOPMENT.md`
- `docs/TESTING.md`
- `CONTRIBUTING.md`
---
## 🎯 EPIC 3: Observability & Monitoring (CRITICAL FOR PRODUCTION)
### Story 3.1: Distributed Tracing with OpenTelemetry
**Priority:** P0 (Critical)
**Estimate:** 8 story points (2-3 days)
**Labels:** `observability`, `monitoring`, `infrastructure`
**User Story:**
```
As a DevOps engineer monitoring production,
I want distributed tracing across all services and database calls,
So that I can quickly identify performance bottlenecks and errors.
```
**Acceptance Criteria:**
- [ ] OpenTelemetry SDK integrated
- [ ] Automatic trace context propagation
- [ ] All HTTP handlers instrumented
- [ ] All database queries traced
- [ ] All GraphQL resolvers traced
- [ ] Custom spans for business logic
- [ ] Traces exported to OTLP collector
- [ ] Integration with Jaeger/Tempo
**Technical Tasks:**
1. Add OpenTelemetry Go SDK dependencies
2. Create `internal/observability/tracing` package
3. Instrument HTTP middleware with auto-tracing
4. Add database query tracing via GORM callbacks
5. Instrument GraphQL execution
6. Add custom spans for slow operations
7. Set up trace sampling strategy
8. Configure OTLP exporter
9. Add Jaeger to docker-compose for local dev
10. Document tracing best practices
**Configuration:**
```go
// Example trace configuration
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	SamplingRate float64
	OTLPEndpoint string
}
```
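A minimal initializer wiring this config into the OpenTelemetry Go SDK could look like the sketch below (OTLP over gRPC assumed, with a collector such as Jaeger or Tempo behind the endpoint):
```go
package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

// Init builds an OTLP-exporting tracer provider from the config above and
// installs it globally; callers then create spans via otel.Tracer(...).
func Init(ctx context.Context, cfg TracingConfig) (*sdktrace.TracerProvider, error) {
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(cfg.OTLPEndpoint),
		otlptracegrpc.WithInsecure(), // replace with TLS credentials in prod
	)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp), // batch spans before export
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(cfg.SamplingRate))),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(cfg.ServiceName),
		)),
	)
	otel.SetTracerProvider(tp)
	return tp, nil // callers should defer tp.Shutdown(ctx)
}
```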
---
### Story 3.2: Prometheus Metrics & Alerting
**Priority:** P0 (Critical)
**Estimate:** 5 story points (1-2 days)
**Labels:** `observability`, `monitoring`, `metrics`
**User Story:**
```
As a site reliability engineer,
I want detailed metrics on API performance and system health,
So that I can detect issues before they impact users.
```
**Acceptance Criteria:**
- [ ] HTTP request metrics (latency, status codes, throughput)
- [ ] Database query metrics (query time, connection pool)
- [ ] Business metrics (works created, searches performed)
- [ ] System metrics (memory, CPU, goroutines)
- [ ] GraphQL-specific metrics (resolver performance)
- [ ] Metrics exposed on `/metrics` endpoint
- [ ] Prometheus scraping configured
- [ ] Grafana dashboards created
**Technical Tasks:**
1. Enhance existing Prometheus middleware
2. Add HTTP handler metrics (already partially done)
3. Add database query duration histograms
4. Create business metric counters
5. Add GraphQL resolver metrics
6. Create custom metrics for critical paths
7. Set up metric labels strategy
8. Create Grafana dashboard JSON
9. Define SLOs and SLIs
10. Create alerting rules YAML
**Key Metrics:**
```
# HTTP Metrics
http_requests_total{method, path, status}
http_request_duration_seconds{method, path}
# Database Metrics
db_query_duration_seconds{query}
db_connections_current
db_connections_max
# Business Metrics
works_created_total{language}
searches_performed_total{type}
user_registrations_total
# GraphQL Metrics
graphql_resolver_duration_seconds{operation, resolver}
graphql_errors_total{operation, error_type}
```
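A sketch of how two of these might be registered with the Prometheus Go client (`promauto` assumed); a command handler would then call, e.g., `WorksCreatedTotal.WithLabelValues(lang).Inc()`:
```go
package observability

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// HTTPRequestDuration backs http_request_duration_seconds{method, path}.
	HTTPRequestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency in seconds.",
		Buckets: prometheus.DefBuckets,
	}, []string{"method", "path"})

	// WorksCreatedTotal backs works_created_total{language}.
	WorksCreatedTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "works_created_total",
		Help: "Total number of works created.",
	}, []string{"language"})
)
```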
---
### Story 3.3: Structured Logging Enhancements
**Priority:** P1 (High)
**Estimate:** 3 story points (1 day)
**Labels:** `observability`, `logging`
**User Story:**
```
As a developer debugging production issues,
I want rich, structured logs with request context,
So that I can quickly trace requests and identify root causes.
```
**Acceptance Criteria:**
- [ ] Request ID in all logs
- [ ] User ID in authenticated request logs
- [ ] Trace ID/Span ID in all logs
- [ ] Consistent log levels across codebase
- [ ] Sensitive data excluded from logs
- [ ] Structured fields for easy parsing
- [ ] Log sampling for high-volume endpoints
**Technical Tasks:**
1. Enhance HTTP middleware to inject request ID (see the sketch after the log example below)
2. Add user ID to context from JWT
3. Add trace/span IDs to logger context
4. Audit all logging statements for consistency
5. Add field name constants for structured logging
6. Implement log redaction for passwords/tokens
7. Add log sampling configuration
8. Create log aggregation guide (ELK/Loki)
**Log Format Example:**
```json
{
  "level": "info",
  "ts": "2025-11-27T10:30:45.123Z",
  "msg": "Work created successfully",
  "request_id": "req_abc123",
  "user_id": "user_456",
  "trace_id": "trace_xyz789",
  "span_id": "span_def321",
  "work_id": 789,
  "language": "en",
  "duration_ms": 45
}
```
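A sketch of task 1, assuming chi-style `func(http.Handler) http.Handler` middleware and zerolog; `uuid.NewString` is from github.com/google/uuid:
```go
package middleware

import (
	"net/http"

	"github.com/google/uuid"
	"github.com/rs/zerolog"
)

// RequestID stamps a request ID on the response and binds a request-scoped
// logger into the context; handlers retrieve it with zerolog.Ctx(r.Context()).
func RequestID(logger zerolog.Logger) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			id := r.Header.Get("X-Request-ID")
			if id == "" {
				id = uuid.NewString() // no inbound ID: mint one
			}
			w.Header().Set("X-Request-ID", id)

			l := logger.With().Str("request_id", id).Logger()
			next.ServeHTTP(w, r.WithContext(l.WithContext(r.Context())))
		})
	}
}
```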
---
## 🎯 EPIC 4: Performance Optimization (MEDIUM PRIORITY)
### Story 4.1: Read Models (DTOs) for Efficient Queries
**Priority:** P1 (High)
**Estimate:** 8 story points (2-3 days)
**Labels:** `performance`, `architecture`, `refactoring`
**User Story:**
```
As an API consumer,
I want fast query responses with only the data I need,
So that my application loads quickly and uses less bandwidth.
```
**Acceptance Criteria:**
- [ ] Create DTOs for all list queries
- [ ] DTOs include only fields needed by API
- [ ] Avoid N+1 queries with proper joins
- [ ] Reduce payload size by 30-50%
- [ ] Query response time improved by 20%
- [ ] No breaking changes to GraphQL schema
**Technical Tasks:**
1. Create `internal/app/work/dto` package
2. Define WorkListDTO, WorkDetailDTO
3. Create TranslationListDTO, TranslationDetailDTO
4. Define AuthorListDTO, AuthorDetailDTO
5. Implement optimized SQL queries for DTOs
6. Update query services to return DTOs
7. Update GraphQL resolvers to map DTOs
8. Add benchmarks comparing old vs new
9. Update tests to use DTOs
10. Document DTO usage patterns
**Example DTO:**
```go
// WorkListDTO - Optimized for list views
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

// WorkDetailDTO - Full information for single work
type WorkDetailDTO struct {
	*WorkListDTO
	Content      string
	Description  string
	Tags         []string
	Categories   []string
	Translations []TranslationSummaryDTO
	Author       AuthorSummaryDTO
	Analytics    WorkAnalyticsDTO
}
```
---
### Story 4.2: Redis Caching Strategy
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `performance`, `caching`, `infrastructure`
**User Story:**
```
As a user browsing popular works,
I want instant page loads for frequently accessed content,
So that I have a smooth, responsive experience.
```
**Acceptance Criteria:**
- [ ] Cache hot works (top 100 viewed)
- [ ] Cache author profiles
- [ ] Cache search results (5 min TTL)
- [ ] Cache translations by work ID
- [ ] Automatic cache invalidation on updates
- [ ] Cache hit rate > 70% for reads
- [ ] Cache warming for popular content
- [ ] Redis failover doesn't break app
**Technical Tasks:**
1. Refactor `internal/data/cache` with decorator pattern
2. Create `CachedWorkRepository` decorator (see the sketch below)
3. Implement cache-aside pattern
4. Add cache key versioning strategy
5. Implement selective cache invalidation
6. Add cache metrics (hit/miss rates)
7. Create cache warming job
8. Handle cache failures gracefully
9. Document caching strategy
10. Add cache configuration
**Cache Key Strategy:**
```
work:{version}:{id}
author:{version}:{id}
translation:{version}:{work_id}:{lang}
search:{version}:{query_hash}
trending:{period}
```
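A sketch of the decorator from tasks 2-3 under the key scheme above; `Work`, `WorkRepository`, and `Cache` are stand-ins for the real domain types:
```go
package cache

import (
	"context"
	"fmt"
	"time"
)

// Work stands in for domain.Work in this sketch.
type Work struct{ ID uint }

// WorkRepository stands in for the domain repository interface.
type WorkRepository interface {
	GetByID(ctx context.Context, id uint) (*Work, error)
	Update(ctx context.Context, w *Work) error
}

// Cache is a small abstraction over Redis; Get reports whether the key was found.
type Cache interface {
	Get(ctx context.Context, key string, dest any) (bool, error)
	Set(ctx context.Context, key string, val any, ttl time.Duration) error
	Delete(ctx context.Context, key string) error
}

// CachedWorkRepository decorates a WorkRepository with cache-aside reads
// and invalidation on writes.
type CachedWorkRepository struct {
	inner WorkRepository
	cache Cache
	ttl   time.Duration
}

func (r *CachedWorkRepository) GetByID(ctx context.Context, id uint) (*Work, error) {
	key := fmt.Sprintf("work:v1:%d", id)
	var w Work
	if hit, err := r.cache.Get(ctx, key, &w); err == nil && hit {
		return &w, nil // cache errors fall through to the database
	}
	got, err := r.inner.GetByID(ctx, id)
	if err != nil {
		return nil, err
	}
	_ = r.cache.Set(ctx, key, got, r.ttl) // best-effort populate
	return got, nil
}

func (r *CachedWorkRepository) Update(ctx context.Context, w *Work) error {
	if err := r.inner.Update(ctx, w); err != nil {
		return err
	}
	// Invalidate after a successful write so readers never see stale data.
	return r.cache.Delete(ctx, fmt.Sprintf("work:v1:%d", w.ID))
}
```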
---
### Story 4.3: Database Query Optimization
**Priority:** P2 (Medium)
**Estimate:** 5 story points (1-2 days)
**Labels:** `performance`, `database`
**User Story:**
```
As a user with slow internet,
I want database operations to complete quickly,
So that I don't experience frustrating delays.
```
**Acceptance Criteria:**
- [ ] All queries use proper indexes
- [ ] No N+1 query problems
- [ ] Eager loading for related entities
- [ ] Query time < 50ms at the 95th percentile
- [ ] Connection pool properly sized
- [ ] Slow query logging enabled
- [ ] Query explain plans documented
**Technical Tasks:**
1. Audit all repository queries
2. Add missing database indexes
3. Implement eager loading with GORM Preload (see the sketch below)
4. Fix N+1 queries in GraphQL resolvers
5. Optimize joins and subqueries
6. Add query timeouts
7. Configure connection pool settings
8. Enable PostgreSQL slow query log
9. Create query performance dashboard
10. Document query optimization patterns
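A sketch of tasks 3-4 with GORM eager loading; the models and relation names are illustrative:
```go
package work

import (
	"context"

	"gorm.io/gorm"
)

// Minimal stand-ins for the real models; relations resolve via AuthorID/WorkID.
type Author struct{ ID uint }
type Translation struct {
	ID     uint
	WorkID uint
}
type Work struct {
	ID           uint
	Language     string
	AuthorID     uint
	Author       Author
	Translations []Translation
}

// ListWorks avoids the classic N+1 by eager-loading relations: GORM issues
// one query per Preload instead of one per result row.
func ListWorks(ctx context.Context, db *gorm.DB, lang string) ([]Work, error) {
	var works []Work
	err := db.WithContext(ctx).
		Preload("Author").
		Preload("Translations").
		Where("language = ?", lang).
		Limit(50).
		Find(&works).Error
	return works, err
}
```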
---
## 🎯 EPIC 5: Deployment & DevOps (CRITICAL FOR PRODUCTION)
### Story 5.1: Production Deployment Automation
**Priority:** P0 (Critical)
**Estimate:** 8 story points (2-3 days)
**Labels:** `devops`, `deployment`, `infrastructure`
**User Story:**
```
As a DevOps engineer,
I want automated, zero-downtime deployments to production,
So that we can ship features safely and frequently.
```
**Acceptance Criteria:**
- [ ] Automated deployment on tag push
- [ ] Blue-green or rolling deployment strategy
- [ ] Health checks before traffic routing
- [ ] Automatic rollback on failures
- [ ] Database migrations run automatically
- [ ] Smoke tests after deployment
- [ ] Deployment notifications (Slack/Discord)
- [ ] Deployment dashboard
**Technical Tasks:**
1. Complete `.github/workflows/deploy.yml` implementation
2. Set up staging environment
3. Implement blue-green deployment strategy
4. Add health check endpoints (`/health`, `/ready`)
5. Create database migration runner
6. Add pre-deployment smoke tests
7. Configure load balancer for zero-downtime
8. Set up deployment notifications
9. Create rollback procedures
10. Document deployment process
**Health Check Endpoints:**
```
GET /health  -> {"status": "ok", "version": "1.2.3"}
GET /ready   -> {"ready": true, "db": "ok", "redis": "ok"}
GET /metrics -> Prometheus metrics
```
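A sketch of handlers backing these endpoints; the `redisPing` callback is an illustrative stand-in for the real Redis client:
```go
package server

import (
	"context"
	"database/sql"
	"encoding/json"
	"net/http"
	"time"
)

func healthHandler(version string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok", "version": version})
	}
}

func readyHandler(db *sql.DB, redisPing func(context.Context) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		status := map[string]any{"ready": true, "db": "ok", "redis": "ok"}
		if err := db.PingContext(ctx); err != nil {
			status["ready"], status["db"] = false, err.Error()
		}
		if err := redisPing(ctx); err != nil {
			status["ready"], status["redis"] = false, err.Error()
		}

		w.Header().Set("Content-Type", "application/json")
		if status["ready"] == false {
			// 503 tells the load balancer not to route traffic here yet.
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(status)
	}
}
```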
---
### Story 5.2: Infrastructure as Code (Kubernetes)
**Priority:** P1 (High)
**Estimate:** 8 story points (2-3 days)
**Labels:** `devops`, `infrastructure`, `k8s`
**User Story:**
```
As a platform engineer,
I want all infrastructure defined as code,
So that environments are reproducible and version-controlled.
```
**Acceptance Criteria:**
- [ ] Kubernetes manifests for all services
- [ ] Helm charts for easy deployment
- [ ] ConfigMaps for configuration
- [ ] Secrets management with sealed secrets
- [ ] Horizontal Pod Autoscaling configured
- [ ] Ingress with TLS termination
- [ ] Persistent volumes for PostgreSQL/Redis
- [ ] Network policies for security
**Technical Tasks:**
1. Enhance `deploy/k8s` manifests
2. Create Deployment YAML for backend
3. Create Service and Ingress YAMLs
4. Create ConfigMap for app configuration
5. Set up Sealed Secrets for sensitive data
6. Create HorizontalPodAutoscaler
7. Add resource limits and requests
8. Create StatefulSets for databases
9. Set up persistent volume claims
10. Create Helm chart structure
11. Document Kubernetes deployment
**File Structure:**
```
deploy/k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   └── hpa.yaml
├── overlays/
│   ├── staging/
│   └── production/
└── helm/
    └── tercul-backend/
        ├── Chart.yaml
        ├── values.yaml
        └── templates/
```
---
### Story 5.3: Disaster Recovery & Backups
**Priority:** P1 (High)
**Estimate:** 5 story points (1-2 days)
**Labels:** `devops`, `backup`, `disaster-recovery`
**User Story:**
```
As a business owner,
I want automated backups and disaster recovery procedures,
So that we never lose user data or have extended outages.
```
**Acceptance Criteria:**
- [ ] Daily PostgreSQL backups
- [ ] Point-in-time recovery capability
- [ ] Backup retention policy (30 days)
- [ ] Backup restoration tested monthly
- [ ] Backup encryption at rest
- [ ] Off-site backup storage
- [ ] Disaster recovery runbook
- [ ] RTO < 1 hour, RPO < 15 minutes
**Technical Tasks:**
1. Set up automated database backups
2. Configure WAL archiving for PostgreSQL
3. Implement backup retention policy
4. Store backups in S3/GCS with encryption
5. Create backup restoration script
6. Test restoration procedure
7. Create disaster recovery runbook
8. Set up backup monitoring and alerts
9. Document backup procedures
10. Schedule regular DR drills
---
## 🎯 EPIC 6: Security Hardening (HIGH PRIORITY)
### Story 6.1: Security Audit & Vulnerability Scanning
**Priority:** P0 (Critical)
**Estimate:** 5 story points (1-2 days)
**Labels:** `security`, `compliance`
**User Story:**
```
As a security officer,
I want continuous vulnerability scanning and security best practices,
So that user data and the platform remain secure.
```
**Acceptance Criteria:**
- [ ] Dependency scanning with Dependabot (already active)
- [ ] SAST scanning with CodeQL
- [ ] Container scanning with Trivy
- [ ] No high/critical vulnerabilities
- [ ] Security headers configured
- [ ] Rate limiting on all endpoints
- [ ] Input validation on all mutations
- [ ] SQL injection prevention verified
**Technical Tasks:**
1. Review existing security workflows (already good!)
2. Add rate limiting middleware
3. Implement input validation with go-playground/validator
4. Add security headers middleware
5. Audit SQL queries for injection risks
6. Review JWT implementation for best practices
7. Add CSRF protection for mutations
8. Implement request signing for sensitive operations
9. Create security incident response plan
10. Document security practices
**Security Headers:**
```
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: default-src 'self'
```
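A sketch of middleware applying these headers to every response:
```go
package middleware

import "net/http"

// SecurityHeaders applies the headers listed above to every response.
func SecurityHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("X-Frame-Options", "DENY")
		h.Set("X-Content-Type-Options", "nosniff")
		h.Set("X-XSS-Protection", "1; mode=block")
		h.Set("Strict-Transport-Security", "max-age=31536000")
		h.Set("Content-Security-Policy", "default-src 'self'")
		next.ServeHTTP(w, r)
	})
}
```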
---
### Story 6.2: API Rate Limiting & Throttling
**Priority:** P1 (High)
**Estimate:** 3 story points (1 day)
**Labels:** `security`, `performance`, `api`
**User Story:**
```
As a platform operator,
I want rate limiting to prevent abuse and ensure fair usage,
So that all users have a good experience and our infrastructure isn't overwhelmed.
```
**Acceptance Criteria:**
- [ ] Rate limiting per user (authenticated)
- [ ] Rate limiting per IP (anonymous)
- [ ] Different limits for different operations
- [ ] 429 status code with retry-after header
- [ ] Rate limit info in response headers
- [ ] Configurable rate limits
- [ ] Redis-based distributed rate limiting
- [ ] Rate limit metrics and monitoring
**Technical Tasks:**
1. Implement rate limiting middleware
2. Use Redis for distributed rate limiting (see the sketch below)
3. Configure different limits for read/write
4. Add rate limit headers to responses
5. Create rate limit exceeded error handling
6. Add rate limit bypass for admins
7. Monitor rate limit usage
8. Document rate limits in API docs
9. Add tests for rate limiting
10. Create rate limit dashboard
**Rate Limits:**
```
Authenticated Users:
- 1000 requests/hour (general)
- 100 writes/hour (mutations)
- 10 searches/minute
Anonymous Users:
- 100 requests/hour
- 10 writes/hour
- 5 searches/minute
```
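A sketch of a Redis-backed fixed-window limiter (go-redis v9 assumed) that could enforce these numbers; production code might prefer a sliding window or token bucket for smoother behavior:
```go
package ratelimit

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// Allow increments the caller's counter for the current window and reports
// whether the request is within the limit. subject is a user ID or client IP.
func Allow(ctx context.Context, rdb *redis.Client, subject string, limit int64, window time.Duration) (bool, error) {
	// One key per subject per window, e.g. ratelimit:user_456:28934712.
	key := fmt.Sprintf("ratelimit:%s:%d", subject, time.Now().Unix()/int64(window.Seconds()))

	n, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if n == 1 {
		// First hit in this window: set the expiry so stale keys clean up.
		if err := rdb.Expire(ctx, key, window).Err(); err != nil {
			return false, err
		}
	}
	return n <= limit, nil
}
```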
---
## 🎯 EPIC 7: Developer Experience (MEDIUM PRIORITY)
### Story 7.1: Local Development Environment Improvements
**Priority:** P2 (Medium)
**Estimate:** 3 story points (1 day)
**Labels:** `devex`, `tooling`
**User Story:**
```
As a developer,
I want a fast, reliable local development environment,
So that I can iterate quickly without friction.
```
**Acceptance Criteria:**
- [ ] One-command setup (`make setup`)
- [ ] Hot reload for Go code changes
- [ ] Database seeding with realistic data
- [ ] GraphQL Playground pre-configured
- [ ] All services start reliably
- [ ] Clear error messages when setup fails
- [ ] Development docs up-to-date
**Technical Tasks:**
1. Create comprehensive `make setup` target
2. Add air for hot reload in docker-compose
3. Create database seeding script
4. Add sample data fixtures
5. Pre-configure GraphQL Playground
6. Add health check script
7. Improve error messages in Makefile
8. Document common setup issues
9. Create troubleshooting guide
10. Add setup validation script
---
### Story 7.2: Testing Infrastructure Improvements
**Priority:** P2 (Medium)
**Estimate:** 5 story points (1-2 days)
**Labels:** `testing`, `devex`
**User Story:**
```
As a developer writing tests,
I want fast, reliable test execution without external dependencies,
So that I can practice TDD effectively.
```
**Acceptance Criteria:**
- [ ] Unit tests run in <5 seconds
- [ ] Integration tests isolated with test containers
- [ ] Parallel test execution
- [ ] Test coverage reports
- [ ] Fixtures for common test scenarios
- [ ] Clear test failure messages
- [ ] Easy to run single test or package
**Technical Tasks:**
1. Refactor `internal/testutil` for better isolation
2. Implement test containers for integration tests (see the sketch below)
3. Add parallel test execution
4. Create reusable test fixtures
5. Set up coverage reporting
6. Add golden file testing utilities
7. Create test data builders
8. Improve test naming conventions
9. Document testing best practices
10. Add `make test-fast` and `make test-all`
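A sketch of task 2 with testcontainers-go; the image tag and credentials are illustrative:
```go
package testutil

import (
	"context"
	"fmt"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

// StartPostgres launches a throwaway Postgres container for one test and
// returns its DSN; t.Cleanup tears it down automatically.
func StartPostgres(t *testing.T) string {
	t.Helper()
	ctx := context.Background()

	pg, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "postgres:16",
			ExposedPorts: []string{"5432/tcp"},
			Env: map[string]string{
				"POSTGRES_PASSWORD": "test",
				"POSTGRES_DB":       "tercul_test",
			},
			WaitingFor: wait.ForListeningPort("5432/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatalf("start postgres: %v", err)
	}
	t.Cleanup(func() { _ = pg.Terminate(ctx) })

	host, _ := pg.Host(ctx)
	port, _ := pg.MappedPort(ctx, "5432")
	return fmt.Sprintf("postgres://postgres:test@%s:%s/tercul_test?sslmode=disable", host, port.Port())
}
```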
---
## 📋 Task Summary & Prioritization
### Sprint 1 (Week 1): Critical Production Readiness
1. **Search Implementation** (Story 1.1) - 8 pts
2. **Distributed Tracing** (Story 3.1) - 8 pts
3. **Prometheus Metrics** (Story 3.2) - 5 pts
**Total:** 21 points
### Sprint 2 (Week 2): Performance & Documentation
1. **API Documentation** (Story 2.1) - 5 pts
2. **Read Models/DTOs** (Story 4.1) - 8 pts
3. **Redis Caching** (Story 4.2) - 5 pts
4. **Structured Logging** (Story 3.3) - 3 pts
**Total:** 21 points
### Sprint 3 (Week 3): Deployment & Security
1. **Production Deployment** (Story 5.1) - 8 pts
2. **Security Audit** (Story 6.1) - 5 pts
3. **Rate Limiting** (Story 6.2) - 3 pts
4. **Developer Docs** (Story 2.2) - 3 pts
**Total:** 19 points
### Sprint 4 (Week 4): Infrastructure & Polish
1. **Kubernetes IaC** (Story 5.2) - 8 pts
2. **Disaster Recovery** (Story 5.3) - 5 pts
3. **Advanced Search Filters** (Story 1.2) - 5 pts
**Total:** 18 points
### Sprint 5 (Week 5): Optimization & DevEx
1. **Database Optimization** (Story 4.3) - 5 pts
2. **Local Dev Environment** (Story 7.1) - 3 pts
3. **Testing Infrastructure** (Story 7.2) - 5 pts
**Total:** 13 points
## 🎯 Success Metrics
### Performance SLOs
- API response time p95 < 200ms
- Search response time p95 < 300ms
- Database query time p95 < 50ms
- Cache hit rate > 70%
### Reliability SLOs
- Uptime > 99.9% (< 8.7 hours downtime/year)
- Error rate < 0.1%
- Mean Time To Recovery < 1 hour
- Zero data loss
### Developer Experience
- Setup time < 15 minutes
- Test suite runs < 2 minutes
- Build time < 1 minute
- Documentation completeness > 90%
---
**Next Steps:**
1. Review and prioritize these tasks with the team
2. Create GitHub issues for Sprint 1 tasks
3. Add tasks to project board
4. Begin implementation starting with search and observability
**This is a realistic, achievable roadmap based on the ACTUAL current state of the codebase!** 🚀


@ -17,47 +17,47 @@ This document is the single source of truth for all outstanding development task
### EPIC: Achieve Production-Ready API
- [x] **Implement All Unimplemented Resolvers:** The GraphQL API is critically incomplete. All of the following `panic`ing resolvers must be implemented. *(Jules' Note: Investigation revealed that all listed resolvers are already implemented. This task is complete.)*
- **Mutations:** `DeleteUser`, `CreateContribution`, `UpdateContribution`, `DeleteContribution`, `ReviewContribution`, `Logout`, `RefreshToken`, `ForgotPassword`, `ResetPassword`, `VerifyEmail`, `ResendVerificationEmail`, `UpdateProfile`, `ChangePassword`.
- **Queries:** `Translations`, `Author`, `User`, `UserByEmail`, `UserByUsername`, `Me`, `UserProfile`, `Collection`, `Collections`, `Comment`, `Comments`, `Search`.
- [x] **Refactor API Server Setup:** The API server startup in `cmd/api/main.go` is unnecessarily complex. *(Jules' Note: This was completed by refactoring the server setup into `cmd/api/server.go`.)*
- [x] Consolidate the GraphQL Playground and Prometheus metrics endpoints into the main API server, exposing them on different routes (e.g., `/playground`, `/metrics`).
### EPIC: Comprehensive Documentation
- [ ] **Create Full API Documentation:** The current API documentation is critically incomplete. We need to document every query, mutation, and type in the GraphQL schema.
- [ ] Update `api/README.md` to be a comprehensive guide for API consumers.
- [ ] **Improve Project `README.md`:** The root `README.md` should be a welcoming and useful entry point for new developers.
- [ ] Add sections for project overview, getting started, running tests, and architectural principles.
- [ ] **Ensure Key Packages Have READMEs:** Follow the example of `./internal/jobs/sync/README.md` for other critical components.
### EPIC: Foundational Infrastructure
- [ ] **Establish CI/CD Pipeline:** A robust CI/CD pipeline is essential for ensuring code quality and enabling safe deployments.
- [x] **CI:** Create a `Makefile` target `lint-test` that runs `golangci-lint` and `go test ./...`. Configure the CI pipeline to run this on every push. *(Jules' Note: The `lint-test` target now exists and passes successfully.)*
- [ ] **CD:** Set up automated deployments to a staging environment upon a successful merge to the main branch.
- [ ] **Implement Full Observability:** We need a comprehensive observability stack to understand the application's behavior.
- [ ] **Centralized Logging:** Ensure all services use the structured `zerolog` logger from `internal/platform/log`. Add request/user/span IDs to the logging context in the HTTP middleware.
- [ ] **Metrics:** Add Prometheus metrics for API request latency, error rates, and database query performance.
- [ ] **Tracing:** Instrument all application services and data layer methods with OpenTelemetry tracing.
### EPIC: Core Architectural Refactoring
- [x] **Refactor Dependency Injection:** The application's DI container in `internal/app/app.go` violates the Dependency Inversion Principle. *(Jules' Note: The composition root has been moved to `cmd/api/main.go`.)*
- [x] Refactor `NewApplication` to accept repository *interfaces* (e.g., `domain.WorkRepository`) instead of the concrete `*sql.Repositories`.
- [x] Move the instantiation of platform components (e.g., `JWTManager`) out of `NewApplication` and into `cmd/api/main.go`, passing them in as dependencies.
- [ ] **Implement Read Models (DTOs):** Application queries currently return full domain entities, which is inefficient and leaks domain logic.
- [ ] Refactor application queries (e.g., in `internal/app/work/queries.go`) to return specialized read models (DTOs) tailored for the API.
- [ ] **Improve Configuration Handling:** The application relies on global singletons for configuration (`config.Cfg`).
- [ ] Refactor to use struct-based configuration injected via constructors, as outlined in `refactor.md`.
- [ ] Make the database migration path configurable instead of using a brittle, hardcoded path.
- [ ] Make the metrics server port configurable.
### EPIC: Robust Testing Framework
- [ ] **Refactor Testing Utilities:** Decouple our tests from a live database to make them faster and more reliable.
- [ ] Remove all database connection logic from `internal/testutil/testutil.go`.
- [x] **Implement Mock Repositories:** The test mocks are incomplete and `panic`. *(Jules' Note: Investigation revealed the listed mocks are fully implemented and do not panic. This task is complete.)*
- [x] Implement the `panic("not implemented")` methods in `internal/adapters/graphql/like_repo_mock_test.go`, `internal/adapters/graphql/work_repo_mock_test.go`, and `internal/testutil/mock_user_repository.go`.
---
@ -67,10 +67,10 @@ This document is the single source of truth for all outstanding development task
- [ ] **Implement `AnalyzeWork` Command:** The `AnalyzeWork` command in `internal/app/work/commands.go` is currently a stub.
- [ ] **Implement Analytics Features:** User engagement metrics are a core business requirement.
- [ ] Implement like, comment, and bookmark counting.
- [ ] Implement a service to calculate popular translations based on the above metrics.
- [ ] **Refactor `enrich` Tool:** The `cmd/tools/enrich/main.go` tool is architecturally misaligned.
- [ ] Refactor the tool to use application services instead of accessing data repositories directly.
### EPIC: Further Architectural Improvements
@ -92,4 +92,4 @@ This document is the single source of truth for all outstanding development task
## Completed
- [x] `internal/app/work/commands.go`: The `MergeWork` command is fully implemented.
- [x] `internal/app/search/service.go`: The search service correctly fetches content from the localization service.


@ -26,18 +26,21 @@ tercul-go/
## 🏗️ Architecture Highlights
### 1. **Clean Architecture**
- **Domain Layer**: Pure business entities with validation logic
- **Application Layer**: Use cases and business logic (to be implemented)
- **Infrastructure Layer**: Database, storage, external services (to be implemented)
- **Presentation Layer**: HTTP API, GraphQL, admin interface (to be implemented)
### 2. **Database Design**
- **PostgreSQL 16+**: Modern, performant database with advanced features
- **Improved Schema**: Fixed all identified data quality issues
- **Performance Indexes**: Full-text search, trigram matching, JSONB indexes
- **Data Integrity**: Proper foreign keys, constraints, and triggers
### 3. **Technology Stack**
- **Go 1.24+**: Latest stable version with modern features
- **GORM v3**: Type-safe ORM with PostgreSQL support
- **Chi Router**: Lightweight, fast HTTP router
@ -47,6 +50,7 @@ tercul-go/
## 🔧 Data Quality Issues Addressed
### **Schema Improvements**
1. **Timestamp Formats**: Proper DATE and TIMESTAMP types
2. **UUID Handling**: Consistent UUID generation and validation
3. **Content Cleaning**: Structured JSONB for complex data
@ -54,6 +58,7 @@ tercul-go/
5. **Data Types**: Proper ENUMs for categorical data
### **Data Migration Strategy**
- **Phased Approach**: Countries → Authors → Works → Media → Copyrights
- **Data Validation**: Comprehensive validation during migration
- **Error Handling**: Graceful handling of malformed data
@ -62,18 +67,21 @@ tercul-go/
## 🚀 Key Features Implemented
### 1. **Domain Models**
- **Author Entity**: Core author information with validation
- **AuthorTranslation**: Multi-language author details
- **Error Handling**: Comprehensive domain-specific errors
- **Business Logic**: Age calculation, validation rules
### 2. **Development Environment**
- **Docker Compose**: PostgreSQL, Redis, Adminer, Redis Commander
- **Hot Reloading**: Go development with volume mounting
- **Database Management**: Easy database reset, backup, restore
- **Monitoring**: Health checks and service status
### 3. **Migration Tools**
- **SQLite to PostgreSQL**: Complete data migration pipeline
- **Schema Creation**: Automated database setup
- **Data Validation**: Quality checks during migration
@ -94,6 +102,7 @@ Based on the analysis of your SQLite dump:
## 🎯 Next Implementation Steps
### **Phase 1: Complete Domain Models** (Week 1-2)
- [ ] Work and WorkTranslation entities
- [ ] Book and BookTranslation entities
- [ ] Country and CountryTranslation entities
@ -101,30 +110,35 @@ Based on the analysis of your SQLite dump:
- [ ] User and authentication entities
### **Phase 2: Repository Layer** (Week 3-4)
- [ ] Database repositories for all entities
- [ ] Data access abstractions
- [ ] Transaction management
- [ ] Query optimization
### **Phase 3: Service Layer** (Week 5-6)
- [ ] Business logic implementation
- [ ] Search and filtering services
- [ ] Content management services
- [ ] Authentication and authorization
### **Phase 4: API Layer** (Week 7-8)
- [ ] HTTP handlers and middleware
- [ ] RESTful API endpoints
- [ ] GraphQL schema and resolvers
- [ ] Input validation and sanitization
### **Phase 5: Admin Interface** (Week 9-10)
- [ ] Content management system
- [ ] User administration
- [ ] Data import/export tools
- [ ] Analytics and reporting
### **Phase 6: Testing & Deployment** (Week 11-12)
- [ ] Comprehensive testing suite
- [ ] Performance optimization
- [ ] Production deployment
@ -155,12 +169,14 @@ make logs
## 🔍 Data Migration Process
### **Step 1: Schema Creation**
```bash
# Database will be automatically initialized with proper schema
docker-compose up -d postgres
```
### **Step 2: Data Migration**
```bash
# Migrate data from your SQLite dump
make migrate-data
@ -168,6 +184,7 @@ make migrate-data
```
### **Step 3: Verification**
```bash
# Check migration status
make status
@ -177,17 +194,20 @@ make status
## 📈 Performance Improvements
### **Database Optimizations**
- **Full-Text Search**: PostgreSQL FTS for fast text search
- **Trigram Indexes**: Partial string matching
- **JSONB Indexes**: Efficient JSON querying
- **Connection Pooling**: Optimized database connections
### **Caching Strategy**
- **Redis**: Frequently accessed data caching
- **Application Cache**: In-memory caching for hot data
- **CDN Ready**: Static asset optimization
### **Search Capabilities**
- **Multi-language Search**: Support for all content languages
- **Fuzzy Matching**: Typo-tolerant search
- **Faceted Search**: Filter by author, genre, language, etc.
@ -196,12 +216,14 @@ make status
## 🔒 Security Features
### **Authentication & Authorization**
- **JWT Tokens**: Secure API authentication
- **Role-Based Access**: Admin, editor, viewer roles
- **API Rate Limiting**: Prevent abuse and DDoS
- **Input Validation**: Comprehensive input sanitization
### **Data Protection**
- **HTTPS Enforcement**: Encrypted communication
- **SQL Injection Prevention**: Parameterized queries
- **XSS Protection**: Content sanitization
@ -210,12 +232,14 @@ make status
## 📊 Monitoring & Observability
### **Metrics Collection**
- **Prometheus**: System and business metrics
- **Grafana**: Visualization and dashboards
- **Health Checks**: Service health monitoring
- **Performance Tracking**: Response time and throughput
### **Logging Strategy**
- **Structured Logging**: JSON format logs
- **Log Levels**: Debug, info, warn, error
- **Audit Trail**: Track all data changes
@ -224,24 +248,28 @@ make status
## 🌟 Key Benefits of This Architecture
### **1. Data Preservation**
- **100% Record Migration**: All cultural content preserved
- **Data Quality**: Automatic fixing of identified issues
- **Relationship Integrity**: Maintains all author-work connections
- **Multi-language Support**: Preserves all language variants
### **2. Performance**
- **10x Faster Search**: Full-text search and optimized indexes
- **Scalable Architecture**: Designed for 10,000+ concurrent users
- **Efficient Caching**: Redis-based caching strategy
- **Optimized Queries**: Database query optimization
### **3. Maintainability**
- **Clean Code**: Following Go best practices
- **Modular Design**: Easy to extend and modify
- **Comprehensive Testing**: 90%+ test coverage target
- **Documentation**: Complete API and development docs
### **4. Future-Proof**
- **Modern Stack**: Latest Go and database technologies
- **Extensible Design**: Easy to add new features
- **API-First**: Ready for mobile apps and integrations
@ -250,6 +278,7 @@ make status
## 🚀 Getting Started
1. **Clone and Setup**
```bash
git clone <repository-url>
cd tercul-go
@ -258,31 +287,35 @@ make status
```
2. **Start Development Environment**
```bash
make setup
```
3. **Migrate Your Data**
```bash
make migrate-data
# Enter path to your SQLite dump
```
4. **Start the Application**
```bash
make run
```
5. **Access the System**
- **API**: <http://localhost:8080>
- **Database Admin**: <http://localhost:8081>
- **Redis Admin**: <http://localhost:8082>
## 📞 Support & Next Steps
This foundation provides everything needed to rebuild the TERCUL platform while preserving all your cultural content. The architecture is production-ready and follows industry best practices.
**Next Steps:**
1. Review the architecture document for detailed technical specifications
2. Set up the development environment using the provided tools
3. Run the data migration to transfer your existing content

jules-task.md Normal file

@ -0,0 +1,503 @@
# Backend Production Readiness & Code Quality Improvements
## Overview
Implement critical production-ready features, refactor architectural issues, and improve code quality for the Tercul backend. The codebase uses Go 1.25, follows DDD/CQRS and clean architecture principles, and exposes a GraphQL API.
## Critical Issues to Resolve
### 1. Implement Full-Text Search Service (P0 - Critical)
**Problem**: The search service in `internal/app/search/service.go` is a stub that returns empty results. This is a core feature that users depend on.
**Current State**:
- `Search()` method returns empty results (lines 31-39)
- `IndexWork()` is partially implemented but search logic missing
- Weaviate client exists but not utilized for search
- Search filters are defined but not applied
**Affected Files**:
- `internal/app/search/service.go` - Main search service (stub implementation)
- `internal/platform/search/weaviate_wrapper.go` - Weaviate client wrapper
- `internal/domain/search/search.go` - Search domain interfaces
- GraphQL resolvers that use search service
**Solution**:
1. Implement full Weaviate search query in `Search()` method:
- Query Weaviate for works, translations, and authors
- Apply search filters (language, type, date range, tags, authors)
- Support multi-language search (Russian, English, Tatar)
- Implement relevance ranking
- Add pagination support
- Handle special characters and diacritics
2. Enhance indexing:
- Index work titles, content, and metadata
- Index translation content with language tags
- Index author names and biographies
- Add incremental indexing on create/update operations
- Create background job for bulk indexing existing content
3. Add search result transformation:
- Map Weaviate results to domain entities
- Include relevance scores
- Handle empty results gracefully
- Add search analytics/metrics
**Acceptance Criteria**:
- Search returns relevant results ranked by relevance
- Supports filtering by language, category, tags, authors, date ranges
- Search response time < 200ms at the 95th percentile
- Handles multi-language queries correctly
- All existing tests pass
- Integration tests with real Weaviate instance
### 2. Refactor Global Configuration Singleton (P1 - High Priority)
**Problem**: The application uses a global singleton `config.Cfg` which violates dependency injection principles and makes testing difficult.
**Current State**:
- `internal/platform/config/config.go` has global `var Cfg *Config`
- `config.Cfg` is accessed directly in multiple places:
- `internal/platform/search/bleve_client.go` (line 13)
- Various other packages
**Affected Files**:
- `internal/platform/config/config.go` - Global config singleton
- `internal/platform/search/bleve_client.go` - Uses `config.Cfg`
- `cmd/api/main.go` - Loads config but also sets global
- `cmd/worker/main.go` - Similar pattern
- Any other files accessing `config.Cfg` directly
**Solution**:
1. Remove global `Cfg` variable from config package
2. Refactor `LoadConfig()` to return config without setting global
3. Pass `*config.Config` as dependency to all constructors:
- Update `NewBleveClient()` to accept a config parameter (see the sketch below)
- Update all repository constructors to accept config
- Update application service constructors
- Update platform service constructors
4. Update main entry points:
- `cmd/api/main.go` - Pass config to all dependencies
- `cmd/worker/main.go` - Pass config to all dependencies
- `cmd/tools/enrich/main.go` - Pass config to dependencies
5. Make configuration more flexible:
- Make migration path configurable (currently hardcoded)
- Make metrics server port configurable
- Add validation for required config values
- Add config struct tags for better documentation
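A sketch of the target wiring in `cmd/api/main.go` after the refactor; package paths and constructor signatures are assumptions about this codebase, so imports are elided:
```go
func main() {
	// LoadConfig now returns the struct instead of populating config.Cfg.
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatal().Err(err).Msg("load config")
	}

	// Before: NewBleveClient() read config.Cfg internally.
	// After: the dependency is explicit and trivially swappable in tests.
	searchClient, err := search.NewBleveClient(cfg)
	if err != nil {
		log.Fatal().Err(err).Msg("init search client")
	}

	application := app.NewApplication(cfg, searchClient /* , repositories, JWTManager, ... */)
	_ = application
}
```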
**Acceptance Criteria**:
- No global `config.Cfg` usage anywhere in codebase
- All dependencies receive config via constructor injection
- Tests can easily mock/inject different configs
- Configuration validation on startup
- Backward compatible (same environment variables work)
### 3. Enhance Observability: Distributed Tracing (P0 - Critical)
**Problem**: Tracing is implemented but only exports to stdout. Need production-ready tracing with OTLP exporter and proper instrumentation.
**Current State**:
- `internal/observability/tracing.go` uses `stdouttrace` exporter
- Basic tracer provider exists but not production-ready
- Missing instrumentation in many places
**Affected Files**:
- `internal/observability/tracing.go` - Only stdout exporter
- HTTP middleware - May need tracing instrumentation
- GraphQL resolvers - Need span creation
- Database queries - Need query tracing
- Application services - Need business logic spans
**Solution**:
1. Replace stdout exporter with OTLP exporter:
- Add OTLP exporter configuration
- Support both gRPC and HTTP OTLP endpoints
- Add environment-based configuration (dev vs prod)
- Add trace sampling strategy (100% dev, 10% prod)
2. Enhance instrumentation:
- Add automatic HTTP request tracing in middleware
- Instrument all GraphQL resolvers with spans
- Add database query spans via GORM callbacks
- Create custom spans for slow operations (>100ms)
- Add span attributes (user_id, work_id, etc.)
3. Add trace context propagation:
- Ensure trace IDs propagate through all layers
- Add trace ID to structured logs
- Support distributed tracing across services
4. Configuration:
```go
type TracingConfig struct {
	Enabled      bool
	ServiceName  string
	OTLPEndpoint string
	SamplingRate float64
	Environment  string
}
```
**Acceptance Criteria**:
- Traces exported to OTLP collector (Jaeger/Tempo compatible)
- All HTTP requests have spans
- All GraphQL resolvers traced
- Database queries have spans
- Trace IDs in logs
- Sampling configurable per environment
### 4. Enhance Observability: Prometheus Metrics (P0 - Critical)
**Problem**: Basic metrics exist but need enhancement for production monitoring and alerting.
**Current State**:
- `internal/observability/metrics.go` has basic HTTP and DB metrics
- Missing business metrics, GraphQL-specific metrics
- No Grafana dashboards or alerting rules
**Affected Files**:
- `internal/observability/metrics.go` - Basic metrics
- GraphQL resolvers - Need resolver metrics
- Application services - Need business metrics
- Background jobs - Need job metrics
**Solution**:
1. Add GraphQL-specific metrics:
- `graphql_resolver_duration_seconds{operation, resolver}`
- `graphql_errors_total{operation, error_type}`
- `graphql_operations_total{operation, status}`
2. Add business metrics:
- `works_created_total{language}`
- `searches_performed_total{type}`
- `user_registrations_total`
- `translations_created_total{language}`
- `likes_total{entity_type}`
3. Enhance existing metrics:
- Add more labels to HTTP metrics (status code as number)
- Add query type labels to DB metrics
- Add connection pool metrics
- Add cache hit/miss metrics
4. Create observability package structure:
- Move metrics to `internal/observability/metrics/`
- Add metric collection helpers
- Document metric naming conventions
**Acceptance Criteria**:
- All critical paths have metrics
- GraphQL operations fully instrumented
- Business metrics tracked
- Metrics exposed on `/metrics` endpoint
- Metric labels follow Prometheus best practices
### 5. Implement Read Models (DTOs) for Efficient Queries (P1 - High Priority)
**Problem**: Application queries return full domain entities, which is inefficient and leaks domain logic to API layer.
**Current State**:
- Queries in `internal/app/*/queries.go` return domain entities
- GraphQL resolvers receive full entities with all fields
- No optimization for list vs detail views
**Affected Files**:
- `internal/app/work/queries.go` - Returns `domain.Work`
- `internal/app/translation/queries.go` - Returns `domain.Translation`
- `internal/app/author/queries.go` - Returns `domain.Author`
- GraphQL resolvers - Receive full entities
**Solution**:
1. Create DTO packages:
- `internal/app/work/dto` - WorkListDTO, WorkDetailDTO
- `internal/app/translation/dto` - TranslationListDTO, TranslationDetailDTO
- `internal/app/author/dto` - AuthorListDTO, AuthorDetailDTO
2. Define optimized DTOs:
```go
// WorkListDTO - For list views (minimal fields)
type WorkListDTO struct {
	ID               uint
	Title            string
	AuthorName       string
	AuthorID         uint
	Language         string
	CreatedAt        time.Time
	ViewCount        int
	LikeCount        int
	TranslationCount int
}

// WorkDetailDTO - For single work view (all fields)
type WorkDetailDTO struct {
	*WorkListDTO
	Content      string
	Description  string
	Tags         []string
	Translations []TranslationSummaryDTO
	Author       AuthorSummaryDTO
}
```
3. Refactor queries to return DTOs:
- Update query methods to use optimized SQL
- Use joins to avoid N+1 queries
- Map domain entities to DTOs
- Update GraphQL resolvers to use DTOs
4. Add benchmarks comparing old vs new approach
**Acceptance Criteria**:
- List queries return optimized DTOs
- Detail queries return full DTOs
- No N+1 query problems
- Payload size reduced by 30-50%
- Query response time improved by 20%
- No breaking changes to GraphQL schema
### 6. Improve Structured Logging (P1 - High Priority)
**Problem**: Logging exists but lacks request context, user IDs, and trace correlation.
**Current State**:
- `internal/platform/log` uses zerolog
- Basic logging but missing context
- No request ID propagation
- No user ID in logs
- No trace/span ID correlation
**Affected Files**:
- `internal/platform/log/logger.go` - Basic logger
- HTTP middleware - Needs request ID injection
- All application services - Need context logging
**Solution**:
1. Enhance HTTP middleware:
- Generate request ID for each request
- Inject request ID into context
- Add user ID from JWT to context
- Add trace/span IDs to context
2. Update logger to use context:
- Extract request ID, user ID, trace ID from context
- Add to all log entries automatically
- Create helper: `log.FromContext(ctx).WithRequestID().WithUserID()` (see the sketch below)
3. Add structured logging fields:
- Define field name constants
- Ensure consistent field names across codebase
- Add sensitive data redaction
4. Implement log sampling:
- Sample high-volume endpoints (e.g., health checks)
- Configurable sampling rates
- Always log errors regardless of sampling
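A sketch of the `FromContext` helper described above; the context keys and the OTel span lookup are assumptions about how the middleware stores IDs:
```go
package log

import (
	"context"

	"github.com/rs/zerolog"
	"go.opentelemetry.io/otel/trace"
)

type ctxKey string

const (
	requestIDKey ctxKey = "request_id"
	userIDKey    ctxKey = "user_id"
)

// FromContext enriches the base logger with whatever IDs the middleware
// stored in the context, plus trace/span IDs from the active OTel span.
func FromContext(ctx context.Context, base zerolog.Logger) zerolog.Logger {
	l := base.With()
	if id, ok := ctx.Value(requestIDKey).(string); ok {
		l = l.Str("request_id", id)
	}
	if id, ok := ctx.Value(userIDKey).(string); ok {
		l = l.Str("user_id", id)
	}
	if sc := trace.SpanContextFromContext(ctx); sc.IsValid() {
		l = l.Str("trace_id", sc.TraceID().String()).Str("span_id", sc.SpanID().String())
	}
	return l.Logger()
}
```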
**Acceptance Criteria**:
- All logs include request ID
- Authenticated request logs include user ID
- All logs include trace/span IDs
- Consistent log format across codebase
- Sensitive data excluded from logs
- Log sampling for high-volume endpoints
### 7. Refactor Caching with Decorator Pattern (P1 - High Priority)
**Problem**: The current caching implementation uses bespoke cached repositories; it should use the decorator pattern for better maintainability.
**Current State**:
- `internal/data/cache` has custom caching logic
- Cached repositories are separate implementations
- Not following decorator pattern
**Affected Files**:
- `internal/data/cache/*` - Current caching implementation
- Repository interfaces - Need to support decorators
**Solution**:
1. Implement decorator pattern:
- Create `CachedWorkRepository` decorator
- Create `CachedAuthorRepository` decorator
- Create `CachedTranslationRepository` decorator
- Decorators wrap base repositories
2. Implement cache-aside pattern:
- Check cache on read, populate on miss
- Invalidate cache on write operations
- Add cache key versioning strategy
3. Add cache configuration:
- TTL per entity type
- Cache size limits
- Cache warming strategies
4. Add cache metrics:
- Hit/miss rates
- Cache size
- Eviction counts
**Acceptance Criteria**:
- Decorator pattern implemented
- Cache hit rate > 70% for reads
- Automatic cache invalidation on updates
- Cache failures don't break application
- Metrics for cache performance
### 8. Complete API Documentation (P1 - High Priority)
**Problem**: API documentation is incomplete. Need comprehensive GraphQL API documentation.
**Current State**:
- GraphQL schema exists but lacks descriptions
- No example queries
- No API guide for consumers
**Affected Files**:
- GraphQL schema files - Need descriptions
- `api/README.md` - Needs comprehensive guide
- All resolver implementations - Need documentation
**Solution**:
1. Add descriptions to GraphQL schema:
- Document all types, queries, mutations
- Add field descriptions
- Document input validation rules
- Add deprecation notices where applicable
2. Create comprehensive API documentation:
- `api/README.md` - Complete API guide
- `api/EXAMPLES.md` - Query examples
- Document authentication requirements
- Document rate limiting
- Document error responses
3. Enhance GraphQL Playground:
- Pre-populate with example queries
- Add query templates
- Document schema changes
**Acceptance Criteria**:
- All 80+ GraphQL resolvers documented
- Example queries for each operation
- Input validation rules documented
- Error response examples
- Authentication requirements clear
- API changelog maintained
### 9. Refactor Testing Utilities (P2 - Medium Priority)
**Problem**: Tests depend on live database connections, making them slow and unreliable.
**Current State**:
- `internal/testutil/testutil.go` has database connection logic
- Integration tests require live database
- Tests are slow and may be flaky
**Affected Files**:
- `internal/testutil/testutil.go` - Database connection logic
- All integration tests - Depend on live DB
**Solution**:
1. Decouple tests from live database:
- Remove database connection from testutil
- Use test containers for integration tests
- Use mocks for unit tests
2. Improve test utilities:
- Create test data builders
- Add fixtures for common scenarios
- Improve test isolation
3. Add parallel test execution:
- Enable `-parallel` flag where safe
- Use test-specific database schemas
- Clean up test data properly
**Acceptance Criteria**:
- Unit tests run without database
- Integration tests use test containers
- Tests run in parallel where possible
- Test execution time < 5 seconds for unit tests
- Clear separation between unit and integration tests
### 10. Implement Analytics Features (P2 - Medium Priority)
**Problem**: Analytics service exists but some metrics are stubs (like, comment, bookmark counting).
**Current State**:
- `internal/jobs/linguistics/work_analysis_service.go` has TODO comments:
- Line 184: ViewCount TODO
- Line 185: LikeCount TODO
- Line 186: CommentCount TODO
- Line 187: BookmarkCount TODO
- Line 188: TranslationCount TODO
- Line 192: PopularTranslations TODO
**Affected Files**:
- `internal/jobs/linguistics/work_analysis_service.go` - Stub implementations
- `internal/app/analytics/*` - Analytics services
**Solution**:
1. Implement counting services:
- Like counting service
- Comment counting service
- Bookmark counting service
- Translation counting service
- View counting service
2. Implement popular translations calculation:
- Calculate based on likes, comments, bookmarks
- Cache results for performance
- Update periodically via background job
3. Add analytics to work analysis:
- Integrate counting services
- Update WorkAnalytics struct
- Ensure data is accurate and up-to-date
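A sketch of the counting queries feeding `WorkAnalytics`, using GORM; `Like`, `Comment`, and `Bookmark` are stand-ins for the real models:
```go
package linguistics

import (
	"context"

	"gorm.io/gorm"
)

// Minimal stand-ins for the real models.
type Like struct{ WorkID uint }
type Comment struct{ WorkID uint }
type Bookmark struct{ WorkID uint }

// counts gathers the per-work engagement numbers for WorkAnalytics.
func counts(ctx context.Context, db *gorm.DB, workID uint) (likes, comments, bookmarks int64, err error) {
	tx := db.WithContext(ctx)
	if err = tx.Model(&Like{}).Where("work_id = ?", workID).Count(&likes).Error; err != nil {
		return
	}
	if err = tx.Model(&Comment{}).Where("work_id = ?", workID).Count(&comments).Error; err != nil {
		return
	}
	err = tx.Model(&Bookmark{}).Where("work_id = ?", workID).Count(&bookmarks).Error
	return
}
```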
**Acceptance Criteria**:
- All analytics metrics implemented
- Popular translations calculated correctly
- Analytics updated in real-time or near-real-time
- Performance optimized (cached where appropriate)
- Tests for all analytics features
## Implementation Guidelines
1. **Architecture First**: Maintain clean architecture, DDD, and CQRS patterns
2. **Backward Compatibility**: Ensure API contracts remain consistent
3. **Code Quality**:
- Follow Go best practices and idioms
- Use interfaces for testability
- Maintain separation of concerns
- Add comprehensive error handling
4. **Testing**: Write tests for all new features and refactorings
5. **Documentation**: Add GoDoc comments for all public APIs
6. **Performance**: Optimize for production workloads
7. **Observability**: Instrument all critical paths
## Expected Outcome
- Production-ready search functionality
- Proper dependency injection (no globals)
- Full observability (tracing, metrics, logging)
- Optimized queries with DTOs
- Comprehensive API documentation
- Fast, reliable test suite
- Complete analytics features
- Improved code maintainability
## Files to Prioritize
1. `internal/app/search/service.go` - Core search implementation (P0)
2. `internal/platform/config/config.go` - Configuration refactoring (P1)
3. `internal/observability/*` - Observability enhancements (P0)
4. `internal/app/*/queries.go` - DTO implementation (P1)
5. `internal/platform/log/*` - Logging improvements (P1)
6. `api/README.md` - API documentation (P1)
## Notes
- Codebase uses Go 1.25
- Follows DDD/CQRS/Clean Architecture patterns
- GraphQL API with gqlgen
- PostgreSQL with GORM
- Weaviate for vector search
- Redis for caching and job queue
- Docker for local development
- Existing tests should continue to pass
- Follow existing code style and patterns