
# TODO List for Tercul Go Application

---

## [ ] Performance Improvements

- [x] **COMPLETED: Add pagination to all repository list operations** (High, 2d)
  - [x] /works: Add limit/offset support to repository and resolver
  - [x] /translations: Add limit/offset support to repository and resolver
  - [x] /authors: Add limit/offset support to repository and resolver
  - [x] /users: Add limit/offset support to repository and resolver
  - [x] /collections: Add limit/offset support to repository and resolver
  - [x] /tags: Add limit/offset support to repository and resolver
  - [x] /categories: Add limit/offset support to repository and resolver
  - [x] /comments: Add limit/offset support to repository and resolver
  - [x] /search: Add limit/offset support to repository and resolver
  - [x] Validate all endpoints for correct pagination and total count
  - [x] Add unit tests for paginated list operations
  - [x] Document pagination parameters in API docs
- [x] **COMPLETED: Refactor raw SQL queries to use GORM structured methods** (High, 1d)
  - [x] Identify all usages of raw SQL queries in repositories and sync jobs
  - [x] Refactor syncEntities in syncjob/entities_sync.go to use GORM methods
  - [x] Refactor any string-concatenated queries to parameterized GORM queries
  - [x] Validate correctness and performance of refactored queries
  - [x] Add unit tests for refactored query logic
  - [x] Document query changes and migration steps
- [ ] Implement batching for Weaviate operations (Medium, 2d)
- [x] **COMPLETED: Optimize linguistic analysis algorithms** (Medium, 2d)
  - [x] Introduced clean NLP ports/adapters (`LanguageDetector`, `SentimentProvider`, `KeywordProvider`)
  - [x] Integrated lingua-go (language detection) and GoVADER (sentiment) behind adapters
  - [x] Added TF-IDF-based keyword provider (lightweight, state-free)
  - [x] Bounded in-memory cache via LRU with config-driven capacity
  - [x] Switched text cache keys to SHA-256 content hashes
  - [x] Concurrent analysis: provider-aware and context-cancellable
  - [x] Config toggles for providers and cache TTL
- [x] **COMPLETED: Add database indexes for frequently queried fields** (Medium, 1d)
  - [x] Foreign key indexes for all relationships
  - [x] Unique indexes for constraint enforcement
  - [x] Timestamp indexes for sorting and filtering
  - [x] Composite indexes for complex queries
  - [x] Linguistic analysis indexes for performance
- [x] **COMPLETED: Implement Redis caching for hot data** (Medium, 2d)
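
The limit/offset handling behind the pagination items above could look roughly like the following. This is a minimal sketch and not the actual repository code: `Page`, `NormalizePage`, `Paginate`, and the default and maximum page sizes are all illustrative assumptions, and the in-memory slice stands in for a database query.

```go
package main

import "fmt"

// Page holds normalized limit/offset values. The type name and the bounds
// below are illustrative assumptions, not taken from the Tercul codebase.
type Page struct {
	Limit  int
	Offset int
}

const (
	defaultLimit = 20  // assumed default page size
	maxLimit     = 100 // assumed upper bound to protect the database
)

// NormalizePage clamps caller-supplied values into a safe range.
func NormalizePage(limit, offset int) Page {
	if limit <= 0 {
		limit = defaultLimit
	}
	if limit > maxLimit {
		limit = maxLimit
	}
	if offset < 0 {
		offset = 0
	}
	return Page{Limit: limit, Offset: offset}
}

// Paginate applies a Page to an in-memory slice and returns the window
// together with the total count, mirroring what a list endpoint reports.
func Paginate[T any](items []T, p Page) ([]T, int) {
	total := len(items)
	if p.Offset >= total {
		return []T{}, total
	}
	end := p.Offset + p.Limit
	if end > total {
		end = total
	}
	return items[p.Offset:end], total
}

func main() {
	works := []string{"a", "b", "c", "d", "e"}
	page, total := Paginate(works, NormalizePage(2, 3))
	fmt.Println(page, total) // [d e] 5
}
```

Clamping in one place keeps every list endpoint consistent and makes the "correct pagination and total count" check easy to unit test.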

## [ ] Security Enhancements

- [x] **COMPLETED: Implement password hashing in User model** (Critical, 1d)
  - [x] bcrypt password hashing in BeforeSave hook
  - [x] CheckPassword method for password verification
  - [x] Automatic password hashing on model save
- [x] **COMPLETED: Move hardcoded credentials to environment variables/config** (Critical, 1d)
  - [x] Fixed internal/cmd/enrich/main.go to use config package
  - [x] Fixed internal/testutil/testutil.go to use config package
  - [x] All database connections now use environment variables
- [ ] Add comprehensive input validation for all GraphQL mutations (High, 2d)
- [x] **COMPLETED: Implement rate limiting for API and background jobs** (High, 2d)
  - [x] Rate limiting middleware implemented
  - [x] Configuration for rate limits in config package
- [x] **COMPLETED: Replace raw SQL with safe query builders to prevent SQL injection** (Critical, 1d)
  - [x] All repositories use GORM structured methods
  - [x] No raw SQL queries in production code

## [ ] Code Quality & Architecture

- [x] **REFACTORED: Split linguistics/analyzer.go into focused components** (Completed)
  - [x] Created `linguistics/text_analyzer.go` - Pure text analysis logic
  - [x] Created `linguistics/analysis_cache.go` - Caching logic with multiple strategies
  - [x] Created `linguistics/analysis_repository.go` - Database operations
  - [x] Created `linguistics/work_analysis_service.go` - Work-specific analysis coordination
  - [x] Created `linguistics/types.go` - Shared data structures
  - [x] Created `linguistics/text_utils.go` - Text processing utilities
  - [x] Created `linguistics/factory.go` - Component factory with dependency injection
- [x] **COMPLETED: Clean NLP infrastructure and factory wiring**
  - [x] Ports for NLP capabilities with SRP/DRY boundaries
  - [x] Adapters for lingua-go and GoVADER with fallbacks
  - [x] Factory respects config toggles and wires providers
  - [x] Repository no longer leaks GORM into services; added methods for fetching work and analysis data
- [x] **REFACTORED: Split main.go into focused components** (Completed)
  - [x] Created `internal/app/application_builder.go` - Application initialization
  - [x] Created `internal/app/server_factory.go` - Server creation and configuration
  - [x] Refactored `main.go` to use dependency injection and builders
- [x] **REFACTORED: Standardize repository implementation** (Completed)
  - [x] Improved BaseRepository with comprehensive error handling, validation, logging, and transaction support
  - [x] Removed GenericRepository wrapper (unnecessary duplication)
  - [x] Updated CachedRepository to use BaseRepository interface
  - [x] Refactored WorkRepository and UserRepository to use BaseRepository pattern
  - [x] Updated WorkService to use context in all repository calls
  - [x] Fixed GraphQL resolvers to use context for WorkRepository calls
  - [x] **REFACTORED: All repositories completed!** (Author, Tag, Category, Translation, Comment, Like, Bookmark, Collection, Book, Publisher, Country, Place, City, Source, Edition, UserProfile, UserSession, EmailVerification, PasswordReset, Contribution, Copyright, CopyrightClaim, Monetization, Edge)
  - [x] **COMPLETED: Updated mock repositories for testing**
  - [x] **COMPLETED: Updated services to use context in repository calls**
  - [x] **COMPLETED: Updated GraphQL resolvers to use context and handle pagination**
  - [x] **COMPLETED: Fixed linguistics package model field mismatches**
  - [x] **COMPLETED: Fixed application builder CopyrightRepository initialization**
  - [x] **COMPLETED: Fixed server factory configuration and interface issues**
  - [x] **COMPLETED: Removed all legacy code and interfaces**
  - [x] **COMPLETED: Project builds successfully!**
- [x] **COMPLETED: Add a service layer for business logic and validation** (High, 2d)
  - [x] Comprehensive validation in all service methods
  - [x] Business logic separation from repositories
  - [x] Input validation for all service operations
- [x] **COMPLETED: Refactor duplicate code in sync jobs** (Medium, 1d)
- [x] **COMPLETED: Improve error handling with custom error types and propagation** (High, 2d)
  - [x] Custom error types defined in BaseRepository
  - [x] Error wrapping and propagation throughout codebase
  - [x] Standardized error handling patterns
- [ ] Expand Weaviate client to support all models (Medium, 2d)
- [ ] Add code documentation and API docs (Medium, 2d)

## [ ] Testing

- [ ] Add unit tests for all models, repositories, and services (High, 3d)
  - [x] Added unit tests for linguistics adapters (lingua-go, GoVADER) and utilities
- [ ] Add integration tests for GraphQL API and background jobs (High, 3d)
- [ ] Add performance benchmarks for critical paths (Medium, 2d)
  - [ ] Add benchmarks for text analysis (sequential vs concurrent) and cache hit/miss rates

## [ ] Monitoring & Logging

- [x] **COMPLETED: Integrate a structured logging framework** (Medium, 1d)
  - [x] Structured logging implemented throughout codebase
  - [x] Performance timing and debug logging in repositories
  - [x] Error logging with context and structured fields
- [ ] Add monitoring for background jobs and API endpoints (Medium, 2d)
  - [ ] Add metrics for linguistics: analysis duration, cache hit/miss, provider usage

---

## Next Objective Proposal

- [ ] Stabilize non-linguistics tests and interfaces (High, 2d)
  - [ ] Fix `graph` mocks to accept context in service interfaces
  - [ ] Update `repositories` tests (missing `TestModel`) and align with new repository interfaces
  - [ ] Update `services` tests to pass context and implement missing repo methods in mocks
- [ ] Add performance benchmarks and metrics for linguistics (Medium, 2d)
  - [ ] Benchmarks for AnalyzeText (provider on/off, concurrency levels)
  - [ ] Export metrics and dashboards for analysis duration and cache effectiveness
- [ ] Documentation (Medium, 1d)
  - [ ] Document NLP provider toggles and defaults in README/config docs
  - [ ] Describe SRP/DRY design and extension points for new providers

## [x] Security & Auth

- [x] **COMPLETED: Implement JWT authentication and role-based authorization** (High, 2d)
  - [x] JWT token generation and validation with proper error handling
  - [x] Role-based authorization with hierarchy (reader < contributor < reviewer < editor < admin)
  - [x] Authentication middleware for GraphQL and HTTP with context validation
  - [x] Login and registration mutations with comprehensive input validation
  - [x] Password hashing with bcrypt (already implemented in User model)
  - [x] Environment variable configuration for JWT with secure defaults
  - [x] Comprehensive authentication service following SRP and clean code principles
  - [x] Structured logging with proper error context and performance timing
  - [x] Input sanitization and validation using govalidator
  - [x] Context validation and proper error propagation
  - [x] Integration with existing rate limiting system
  - [x] GraphQL schema alignment with Go models
  - [x] Comprehensive test coverage for authentication components
  - [x] Production-ready error handling and security practices
- [x] **COMPLETED: Add rate limiting middleware** (High, 1d)
  - [x] Rate limiting middleware implemented and tested
  - [x] Configuration-driven rate limits
- [x] **COMPLETED: Use environment variables for all sensitive config** (Critical, 1d)
  - [x] All database credentials use environment variables
  - [x] Redis configuration uses environment variables
  - [x] Centralized configuration management
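
The role hierarchy listed above (reader < contributor < reviewer < editor < admin) maps naturally onto ordered constants. The following is a minimal sketch of such a check, not the project's authorization code: `Role`, `ParseRole`, and `HasAtLeast` are hypothetical names, and unknown role strings are assumed to be denied.

```go
package main

import "fmt"

// Role is ordered so that a higher value implies more privileges.
type Role int

const (
	RoleReader Role = iota
	RoleContributor
	RoleReviewer
	RoleEditor
	RoleAdmin
)

// roleByName maps the role names from the hierarchy to their rank.
var roleByName = map[string]Role{
	"reader":      RoleReader,
	"contributor": RoleContributor,
	"reviewer":    RoleReviewer,
	"editor":      RoleEditor,
	"admin":       RoleAdmin,
}

// ParseRole converts a role name into a Role; ok is false for unknown names.
func ParseRole(name string) (Role, bool) {
	r, ok := roleByName[name]
	return r, ok
}

// HasAtLeast reports whether userRole meets or exceeds required.
// Unknown roles on either side are rejected (an assumed fail-closed policy).
func HasAtLeast(userRole, required string) bool {
	u, ok := ParseRole(userRole)
	if !ok {
		return false
	}
	r, ok := ParseRole(required)
	if !ok {
		return false
	}
	return u >= r
}

func main() {
	fmt.Println(HasAtLeast("editor", "reviewer"))    // true
	fmt.Println(HasAtLeast("contributor", "editor")) // false
	fmt.Println(HasAtLeast("superuser", "reader"))   // false
}
```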

---

> TODO items include context, priority, and estimated effort. Update this list after each milestone.