tercul-backend/TODO.md
Damir Mukimov 4957117cb6 Initial commit: Tercul Go project with comprehensive architecture
- Core Go application with GraphQL API using gqlgen
- Comprehensive data models for literary works, authors, translations
- Repository pattern with caching layer
- Authentication and authorization system
- Linguistics analysis capabilities with multiple adapters
- Vector search integration with Weaviate
- Docker containerization support
- Python data migration and analysis scripts
- Clean architecture with proper separation of concerns
- Production-ready configuration and middleware
- Proper .gitignore excluding vendor/, database files, and build artifacts
2025-08-13 07:42:32 +02:00

# TODO List for Tercul Go Application
---
## [ ] Performance Improvements
- [x] **COMPLETED: Add pagination to all repository list operations** (High, 2d)
  - [x] /works: Add limit/offset support to repository and resolver
  - [x] /translations: Add limit/offset support to repository and resolver
  - [x] /authors: Add limit/offset support to repository and resolver
  - [x] /users: Add limit/offset support to repository and resolver
  - [x] /collections: Add limit/offset support to repository and resolver
  - [x] /tags: Add limit/offset support to repository and resolver
  - [x] /categories: Add limit/offset support to repository and resolver
  - [x] /comments: Add limit/offset support to repository and resolver
  - [x] /search: Add limit/offset support to repository and resolver
  - [x] Validate all endpoints for correct pagination and total count
  - [x] Add unit tests for paginated list operations
  - [x] Document pagination parameters in API docs
- [x] **COMPLETED: Refactor raw SQL queries to use GORM structured methods** (High, 1d)
  - [x] Identify all usages of raw SQL queries in repositories and sync jobs
  - [x] Refactor syncEntities in syncjob/entities_sync.go to use GORM methods
  - [x] Refactor any string-concatenated queries to parameterized GORM queries
  - [x] Validate correctness and performance of refactored queries
  - [x] Add unit tests for refactored query logic
  - [x] Document query changes and migration steps
- [ ] Implement batching for Weaviate operations (Medium, 2d)
- [x] **COMPLETED: Optimize linguistic analysis algorithms** (Medium, 2d)
  - [x] Introduced clean NLP ports/adapters (`LanguageDetector`, `SentimentProvider`, `KeywordProvider`)
  - [x] Integrated lingua-go (language detection) and GoVADER (sentiment) behind adapters
  - [x] Added TF-IDF-based keyword provider (lightweight, state-free)
  - [x] Bounded in-memory cache via LRU with config-driven capacity
  - [x] Switched text cache keys to SHA-256 content hashes
  - [x] Concurrent analysis: provider-aware and context-cancellable
  - [x] Config toggles for providers and cache TTL
- [x] **COMPLETED: Add database indexes for frequently queried fields** (Medium, 1d)
  - [x] Foreign key indexes for all relationships
  - [x] Unique indexes for constraint enforcement
  - [x] Timestamp indexes for sorting and filtering
  - [x] Composite indexes for complex queries
  - [x] Linguistic analysis indexes for performance
- [x] **COMPLETED: Implement Redis caching for hot data** (Medium, 2d)
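
The limit/offset pagination items above could be backed by a small normalization helper like the one below. This is a minimal sketch, not the project's actual code: the `Page` type, `NormalizePage`, and the `defaultLimit`/`maxLimit` values are all hypothetical names and numbers chosen for illustration.

```go
package main

import "fmt"

// Hypothetical bounds; real values would likely come from the config package.
const (
	defaultLimit = 20
	maxLimit     = 100
)

// Page holds normalized limit/offset values (illustrative type).
type Page struct {
	Limit  int
	Offset int
}

// NormalizePage clamps client-supplied values into a safe range so that
// a missing or oversized limit never reaches the database layer.
func NormalizePage(limit, offset int) Page {
	if limit <= 0 {
		limit = defaultLimit
	}
	if limit > maxLimit {
		limit = maxLimit
	}
	if offset < 0 {
		offset = 0
	}
	return Page{Limit: limit, Offset: offset}
}

// HasNext reports whether rows exist beyond this page, given the total count.
func (p Page) HasNext(total int) bool {
	return p.Offset+p.Limit < total
}

func main() {
	p := NormalizePage(500, -3)
	fmt.Println(p.Limit, p.Offset, p.HasNext(250)) // 100 0 true
}
```

In a GORM-backed repository, the normalized values would typically be passed to the query's `Limit` and `Offset` methods, with the total count fetched separately.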
## [ ] Security Enhancements
- [x] **COMPLETED: Implement password hashing in User model** (Critical, 1d)
  - [x] bcrypt password hashing in BeforeSave hook
  - [x] CheckPassword method for password verification
  - [x] Automatic password hashing on model save
- [x] **COMPLETED: Move hardcoded credentials to environment variables/config** (Critical, 1d)
  - [x] Fixed internal/cmd/enrich/main.go to use config package
  - [x] Fixed internal/testutil/testutil.go to use config package
  - [x] All database connections now use environment variables
- [ ] Add comprehensive input validation for all GraphQL mutations (High, 2d)
- [x] **COMPLETED: Implement rate limiting for API and background jobs** (High, 2d)
  - [x] Rate limiting middleware implemented
  - [x] Configuration for rate limits in config package
- [x] **COMPLETED: Replace raw SQL with safe query builders to prevent SQL injection** (Critical, 1d)
  - [x] All repositories use GORM structured methods
  - [x] No raw SQL queries in production code
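
For the open input-validation item above, one possible shape is a `Validate` method on each mutation input. The sketch below uses only the standard library; `RegisterInput`, its fields, and the length limits are assumptions for illustration and do not reflect the project's schema or its validation library.

```go
package main

import (
	"errors"
	"fmt"
	"net/mail"
	"strings"
	"unicode/utf8"
)

// RegisterInput is a hypothetical mutation input; field names are assumptions.
type RegisterInput struct {
	Username string
	Email    string
	Password string
}

// Validate checks every rule and returns one joined error, or nil if all pass.
func (in RegisterInput) Validate() error {
	var errs []error
	if n := utf8.RuneCountInString(strings.TrimSpace(in.Username)); n < 3 || n > 32 {
		errs = append(errs, errors.New("username must be 3-32 characters"))
	}
	// ParseAddress also accepts "Name <addr>" forms, so require an exact match.
	if addr, err := mail.ParseAddress(in.Email); err != nil || addr.Address != in.Email {
		errs = append(errs, errors.New("email is not a valid address"))
	}
	if utf8.RuneCountInString(in.Password) < 8 {
		errs = append(errs, errors.New("password must be at least 8 characters"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	ok := RegisterInput{Username: "alice", Email: "alice@example.com", Password: "s3cretpass"}
	bad := RegisterInput{Username: "a", Email: "not-an-email", Password: "short"}
	fmt.Println(ok.Validate() == nil) // true
	fmt.Println(bad.Validate())       // prints the three failed rules, one per line
}
```

Collecting all failures instead of returning on the first one lets a resolver report every problem to the client in a single response.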
## [ ] Code Quality & Architecture
- [x] **REFACTORED: Split linguistics/analyzer.go into focused components** (Completed)
  - [x] **COMPLETED: Clean NLP infrastructure and factory wiring**
    - [x] Ports for NLP capabilities with SRP/DRY boundaries
    - [x] Adapters for lingua-go and GoVADER with fallbacks
    - [x] Factory respects config toggles and wires providers
    - [x] Repository no longer leaks GORM into services; added methods for fetching work and analysis data
  - [x] Created `linguistics/text_analyzer.go` - Pure text analysis logic
  - [x] Created `linguistics/analysis_cache.go` - Caching logic with multiple strategies
  - [x] Created `linguistics/analysis_repository.go` - Database operations
  - [x] Created `linguistics/work_analysis_service.go` - Work-specific analysis coordination
  - [x] Created `linguistics/types.go` - Shared data structures
  - [x] Created `linguistics/text_utils.go` - Text processing utilities
  - [x] Created `linguistics/factory.go` - Component factory with dependency injection
- [x] **REFACTORED: Split main.go into focused components** (Completed)
  - [x] Created `internal/app/application_builder.go` - Application initialization
  - [x] Created `internal/app/server_factory.go` - Server creation and configuration
  - [x] Refactored `main.go` to use dependency injection and builders
- [x] **REFACTORED: Standardize repository implementation** (Completed)
  - [x] Improved BaseRepository with comprehensive error handling, validation, logging, and transaction support
  - [x] Removed GenericRepository wrapper (unnecessary duplication)
  - [x] Updated CachedRepository to use BaseRepository interface
  - [x] Refactored WorkRepository and UserRepository to use BaseRepository pattern
  - [x] Updated WorkService to use context in all repository calls
  - [x] Fixed GraphQL resolvers to use context for WorkRepository calls
  - [x] **REFACTORED: All repositories completed!** (Author, Tag, Category, Translation, Comment, Like, Bookmark, Collection, Book, Publisher, Country, Place, City, Source, Edition, UserProfile, UserSession, EmailVerification, PasswordReset, Contribution, Copyright, CopyrightClaim, Monetization, Edge)
  - [x] **COMPLETED: Updated mock repositories for testing**
  - [x] **COMPLETED: Updated services to use context in repository calls**
  - [x] **COMPLETED: Updated GraphQL resolvers to use context and handle pagination**
  - [x] **COMPLETED: Fixed linguistics package model field mismatches**
  - [x] **COMPLETED: Fixed application builder CopyrightRepository initialization**
  - [x] **COMPLETED: Fixed server factory configuration and interface issues**
  - [x] **COMPLETED: Removed all legacy code and interfaces**
  - [x] **COMPLETED: Project builds successfully!**
- [x] **COMPLETED: Add a service layer for business logic and validation** (High, 2d)
  - [x] Comprehensive validation in all service methods
  - [x] Business logic separation from repositories
  - [x] Input validation for all service operations
- [x] Refactor duplicate code in sync jobs (Medium, 1d)
- [x] **COMPLETED: Improve error handling with custom error types and propagation** (High, 2d)
  - [x] Custom error types defined in BaseRepository
  - [x] Error wrapping and propagation throughout codebase
  - [x] Standardized error handling patterns
- [ ] Expand Weaviate client to support all models (Medium, 2d)
- [ ] Add code documentation and API docs (Medium, 2d)
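
The custom error types and wrapping mentioned above might follow the standard `errors.Is`/`errors.As` pattern sketched here. The names `ErrNotFound`, `ErrInvalidInput`, `RepoError`, `findWork`, and `loadWork` are illustrative assumptions, not the identifiers actually defined in BaseRepository.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors; names are illustrative only.
var (
	ErrNotFound     = errors.New("entity not found")
	ErrInvalidInput = errors.New("invalid input")
)

// RepoError adds operation context while preserving the underlying cause.
type RepoError struct {
	Op     string
	Entity string
	Err    error
}

func (e *RepoError) Error() string { return fmt.Sprintf("%s %s: %v", e.Op, e.Entity, e.Err) }

// Unwrap lets errors.Is and errors.As see through RepoError to the cause.
func (e *RepoError) Unwrap() error { return e.Err }

// IsNotFound reports whether any error in the chain is ErrNotFound.
func IsNotFound(err error) bool { return errors.Is(err, ErrNotFound) }

// findWork stands in for a repository call; this stub always fails.
func findWork(id uint) error {
	if id == 0 {
		return &RepoError{Op: "GetByID", Entity: "work", Err: ErrInvalidInput}
	}
	return &RepoError{Op: "GetByID", Entity: "work", Err: ErrNotFound}
}

// loadWork stands in for a service method that wraps the repository error.
func loadWork(id uint) error {
	if err := findWork(id); err != nil {
		return fmt.Errorf("load work %d: %w", id, err)
	}
	return nil
}

func main() {
	err := loadWork(42)
	fmt.Println(err)             // load work 42: GetByID work: entity not found
	fmt.Println(IsNotFound(err)) // true
	var re *RepoError
	if errors.As(err, &re) {
		fmt.Println(re.Op, re.Entity) // GetByID work
	}
}
```

Wrapping with `%w` keeps the sentinel reachable, so a resolver can map `ErrNotFound` to a client-facing "not found" response without parsing message strings.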
## [ ] Testing
- [ ] Add unit tests for all models, repositories, and services (High, 3d)
- [ ] Add integration tests for GraphQL API and background jobs (High, 3d)
- [ ] Add performance benchmarks for critical paths (Medium, 2d)
- [x] Added unit tests for linguistics adapters (lingua-go, GoVADER) and utilities
- [ ] Add benchmarks for text analysis (sequential vs concurrent) and cache hit/miss rates
## [ ] Monitoring & Logging
- [x] **COMPLETED: Integrate a structured logging framework** (Medium, 1d)
  - [x] Structured logging implemented throughout codebase
  - [x] Performance timing and debug logging in repositories
  - [x] Error logging with context and structured fields
- [ ] Add monitoring for background jobs and API endpoints (Medium, 2d)
  - [ ] Add metrics for linguistics: analysis duration, cache hit/miss, provider usage
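
As a starting point for the cache hit/miss metric above, a pair of atomic counters may be enough before a full metrics backend is chosen. `CacheMetrics` and its methods are hypothetical names for this sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CacheMetrics is a hypothetical counter pair for analysis-cache lookups.
type CacheMetrics struct {
	hits   atomic.Int64
	misses atomic.Int64
}

// Record counts one lookup; it is safe for concurrent use.
func (m *CacheMetrics) Record(hit bool) {
	if hit {
		m.hits.Add(1)
		return
	}
	m.misses.Add(1)
}

// HitRatio returns hits / (hits + misses), or 0 before any lookup.
func (m *CacheMetrics) HitRatio() float64 {
	h, mi := m.hits.Load(), m.misses.Load()
	if h+mi == 0 {
		return 0
	}
	return float64(h) / float64(h+mi)
}

func main() {
	var m CacheMetrics
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Record(i%4 != 0) // lookups 0 and 4 are counted as misses
		}(i)
	}
	wg.Wait()
	fmt.Printf("hits=%d misses=%d ratio=%.2f\n", m.hits.Load(), m.misses.Load(), m.HitRatio())
	// hits=6 misses=2 ratio=0.75
}
```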
---
## Next Objective Proposal
- [ ] Stabilize non-linguistics tests and interfaces (High, 2d)
  - [ ] Fix `graph` mocks to accept context in service interfaces
  - [ ] Update `repositories` tests (missing `TestModel`) and align with new repository interfaces
  - [ ] Update `services` tests to pass context and implement missing repo methods in mocks
- [ ] Add performance benchmarks and metrics for linguistics (Medium, 2d)
  - [ ] Benchmarks for AnalyzeText (provider on/off, concurrency levels)
  - [ ] Export metrics and dashboards for analysis duration and cache effectiveness
- [ ] Documentation (Medium, 1d)
  - [ ] Document NLP provider toggles and defaults in README/config docs
  - [ ] Describe SRP/DRY design and extension points for new providers
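
When documenting the extension points for new providers, a small port/adapter example may help. In the sketch below only the `LanguageDetector` name comes from this list; its method set, the toy adapters, and the `Config.EnableLanguageDetection` toggle are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// LanguageDetector is the port; the method signature here is an assumption.
type LanguageDetector interface {
	DetectLanguage(text string) (lang string, ok bool)
}

// asciiFallbackDetector is a toy adapter: it guesses "en" for ASCII-only text.
// A real adapter would delegate to a detection library instead.
type asciiFallbackDetector struct{}

func (asciiFallbackDetector) DetectLanguage(text string) (string, bool) {
	if strings.TrimSpace(text) == "" {
		return "", false
	}
	for _, r := range text {
		if r > 127 {
			return "", false
		}
	}
	return "en", true
}

// noopDetector is wired in when the provider is disabled by config.
type noopDetector struct{}

func (noopDetector) DetectLanguage(string) (string, bool) { return "", false }

// Config and its field are hypothetical toggle names.
type Config struct {
	EnableLanguageDetection bool
}

// NewLanguageDetector picks an adapter based on the toggle, so callers
// depend only on the port and never on a concrete provider.
func NewLanguageDetector(cfg Config) LanguageDetector {
	if cfg.EnableLanguageDetection {
		return asciiFallbackDetector{}
	}
	return noopDetector{}
}

func main() {
	on := NewLanguageDetector(Config{EnableLanguageDetection: true})
	off := NewLanguageDetector(Config{})
	fmt.Println(on.DetectLanguage("hello world"))  // en true
	fmt.Println(off.DetectLanguage("hello world")) // "" false (prints an empty string, then false)
}
```

Adding a new provider under this layout means writing one more type that satisfies the port and one more branch in the factory.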
## [x] Security & Auth
- [x] **COMPLETED: Implement JWT authentication and role-based authorization** (High, 2d)
  - [x] JWT token generation and validation with proper error handling
  - [x] Role-based authorization with hierarchy (reader < contributor < reviewer < editor < admin)
  - [x] Authentication middleware for GraphQL and HTTP with context validation
  - [x] Login and registration mutations with comprehensive input validation
  - [x] Password hashing with bcrypt (already implemented in User model)
  - [x] Environment variable configuration for JWT with secure defaults
  - [x] Comprehensive authentication service following SRP and clean code principles
  - [x] Structured logging with proper error context and performance timing
  - [x] Input sanitization and validation using govalidator
  - [x] Context validation and proper error propagation
  - [x] Integration with existing rate limiting system
  - [x] GraphQL schema alignment with Go models
  - [x] Comprehensive test coverage for authentication components
  - [x] Production-ready error handling and security practices
- [x] **COMPLETED: Add rate limiting middleware** (High, 1d)
  - [x] Rate limiting middleware implemented and tested
  - [x] Configuration-driven rate limits
- [x] **COMPLETED: Use environment variables for all sensitive config** (Critical, 1d)
  - [x] All database credentials use environment variables
  - [x] Redis configuration uses environment variables
  - [x] Centralized configuration management
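
The role hierarchy listed above (reader < contributor < reviewer < editor < admin) can be modeled as an ordered integer type, as in this minimal sketch. The `Role` type, `ParseRole`, and `Allows` are hypothetical names and may differ from the project's authorization code.

```go
package main

import "fmt"

// Role values are ordered so that a higher value implies more privilege.
type Role int

const (
	Reader Role = iota
	Contributor
	Reviewer
	Editor
	Admin
)

// roleNames maps stored role strings to their rank.
var roleNames = map[string]Role{
	"reader":      Reader,
	"contributor": Contributor,
	"reviewer":    Reviewer,
	"editor":      Editor,
	"admin":       Admin,
}

// ParseRole looks up a role by name. Callers must check ok, because the
// zero value returned for unknown names is Reader.
func ParseRole(s string) (role Role, ok bool) {
	role, ok = roleNames[s]
	return role, ok
}

// Allows reports whether r meets or exceeds the required role.
func (r Role) Allows(required Role) bool {
	return r >= required
}

func main() {
	role, ok := ParseRole("editor")
	fmt.Println(ok, role.Allows(Reviewer), role.Allows(Admin)) // true true false
}
```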
---
> TODO items include context, priority, and estimated effort. Update this list after each milestone.