TERCUL Go Service Architecture & Migration Plan
Executive Summary
This document outlines the architecture and implementation plan for rebuilding the TERCUL cultural exchange platform using Go. The current system contains over 1 million records across 19 tables, including authors, works, translations, and multimedia content in multiple languages. This plan ensures data preservation while modernizing the architecture for better performance, maintainability, and scalability.
Current System Analysis
Data Volume & Structure
- Total Records: 1,031,288
- Tables: 19
- Data Quality Issues: 106 identified issues
- Languages: Multi-language support (Russian, English, Tatar, etc.)
Core Entities
- Authors (4,810 records) - Core creators with biographical information
- Works (52,759 records) - Literary pieces (poetry, prose, etc.)
- Work Translations (53,133 records) - Multi-language versions of works
- Books (12 records) - Published collections
- Countries (243 records) - Geographic information
- Copyrights (130 records) - Rights management
- Media Assets (3,627 records) - Images and files
Data Quality Issues Identified
- Invalid timestamp formats in metadata fields
- UUID format inconsistencies
- HTML content in text fields
- YAML/Ruby format contamination in content
- Very long content fields (up to 19,913 characters)
Target Architecture
1. Technology Stack
Backend
- Language: Go 1.25+ (latest stable)
- Framework: Chi or Echo (minimal, fast HTTP routing)
- Database: PostgreSQL 16+ (migrated from SQLite)
- ORM: GORM v2 or sqlc (type-safe SQL)
- Authentication: JWT + OAuth2
- API Documentation: OpenAPI 3.0 + Swagger
Infrastructure
- Containerization: Docker + Docker Compose
- Database Migration: golang-migrate
- Configuration: Viper + environment variables (sketched below)
- Logging: Zerolog (structured, fast)
- Monitoring: Prometheus + Grafana
- Testing: Testify + GoConvey
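A minimal sketch of how configuration loading with Viper might look; the key names, defaults, and file layout here are illustrative, not part of the plan:

```go
package config

import "github.com/spf13/viper"

// Config holds application settings; field and key names are illustrative.
type Config struct {
	ServerPort int    `mapstructure:"server_port"`
	DBHost     string `mapstructure:"db_host"`
	DBName     string `mapstructure:"db_name"`
	RedisHost  string `mapstructure:"redis_host"`
}

// Load reads config.yaml if present and lets environment variables
// (e.g. SERVER_PORT, DB_HOST) override file values.
func Load() (*Config, error) {
	viper.SetConfigName("config")
	viper.SetConfigType("yaml")
	viper.AddConfigPath(".")
	viper.AutomaticEnv()

	// Defaults also register the keys so env-only values are picked up.
	viper.SetDefault("server_port", 8080)
	viper.SetDefault("db_host", "localhost")
	viper.SetDefault("db_name", "tercul")
	viper.SetDefault("redis_host", "localhost")

	// A missing config file is fine; env vars and defaults still apply.
	if err := viper.ReadInConfig(); err != nil {
		if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
			return nil, err
		}
	}

	var cfg Config
	if err := viper.Unmarshal(&cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```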
2. System Architecture
Layered Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     Presentation Layer                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  HTTP API   │  │   GraphQL   │  │   Admin Interface   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                       Business Layer                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Services   │  │  Handlers   │  │     Middleware      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                         Data Layer                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │Repositories │  │   Models    │  │     Migrations      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                    Infrastructure Layer                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ PostgreSQL  │  │    Redis    │  │    File Storage     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
Domain-Driven Design Structure
```
internal/
├── domain/          # Core business entities
│   ├── author/
│   ├── work/
│   ├── translation/
│   ├── book/
│   ├── country/
│   └── copyright/
├── application/     # Use cases & business logic
│   ├── services/
│   ├── handlers/
│   └── middleware/
├── infrastructure/  # External concerns
│   ├── database/
│   ├── storage/
│   ├── cache/
│   └── external/
└── presentation/    # API & interfaces
    ├── http/
    ├── graphql/
    └── admin/
```
3. Database Design
Improved Schema (PostgreSQL)
Core Tables
```sql
-- Authors table with improved structure
CREATE TABLE authors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    slug VARCHAR(255) UNIQUE NOT NULL,
    date_of_birth DATE,
    date_of_death DATE,
    birth_precision VARCHAR(20) CHECK (birth_precision IN ('year', 'month', 'day', 'custom')),
    death_precision VARCHAR(20) CHECK (death_precision IN ('year', 'month', 'day', 'custom')),
    custom_birth_date VARCHAR(255),
    custom_death_date VARCHAR(255),
    is_top BOOLEAN DEFAULT FALSE,
    is_draft BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Author translations with proper language support
CREATE TABLE author_translations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    author_id UUID NOT NULL REFERENCES authors(id) ON DELETE CASCADE,
    language_code VARCHAR(10) NOT NULL,
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    full_name VARCHAR(500),
    place_of_birth VARCHAR(500),
    place_of_death VARCHAR(500),
    biography TEXT,
    pen_names JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(author_id, language_code)
);

-- Works with improved categorization
CREATE TABLE works (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    author_id UUID NOT NULL REFERENCES authors(id) ON DELETE CASCADE,
    slug VARCHAR(255) UNIQUE NOT NULL,
    literature_type VARCHAR(50) NOT NULL,
    date_created DATE,
    date_created_precision VARCHAR(20),
    age_restrictions VARCHAR(100),
    genres JSONB,
    is_top BOOLEAN DEFAULT FALSE,
    is_draft BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Work translations with content management
CREATE TABLE work_translations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    work_id UUID NOT NULL REFERENCES works(id) ON DELETE CASCADE,
    language_code VARCHAR(10) NOT NULL,
    title VARCHAR(500) NOT NULL,
    alternative_title VARCHAR(500),
    body TEXT,
    translator VARCHAR(255),
    date_translated DATE,
    is_original_language BOOLEAN DEFAULT FALSE,
    audio_url VARCHAR(1000),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(work_id, language_code)
);

-- Countries with translations
CREATE TABLE countries (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    iso_code VARCHAR(3) UNIQUE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE country_translations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    country_id UUID NOT NULL REFERENCES countries(id) ON DELETE CASCADE,
    language_code VARCHAR(10) NOT NULL,
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(country_id, language_code)
);

-- Copyrights with proper metadata
CREATE TABLE copyrights (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    identifier VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE copyright_translations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    copyright_id UUID NOT NULL REFERENCES copyrights(id) ON DELETE CASCADE,
    language_code VARCHAR(10) NOT NULL,
    message TEXT,
    description TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(copyright_id, language_code)
);

-- Media assets with improved metadata
CREATE TABLE media_assets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    filename VARCHAR(500) NOT NULL,
    original_filename VARCHAR(500),
    content_type VARCHAR(100) NOT NULL,
    byte_size BIGINT NOT NULL,
    checksum VARCHAR(255) NOT NULL,
    storage_path VARCHAR(1000) NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE media_attachments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    media_asset_id UUID NOT NULL REFERENCES media_assets(id) ON DELETE CASCADE,
    record_type VARCHAR(100) NOT NULL,
    record_id UUID NOT NULL,
    attachment_type VARCHAR(50) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
Indexes for Performance
```sql
-- Performance indexes
CREATE INDEX idx_authors_slug ON authors(slug);
CREATE INDEX idx_authors_top ON authors(is_top) WHERE is_top = true;
CREATE INDEX idx_works_author_id ON works(author_id);
CREATE INDEX idx_works_literature_type ON works(literature_type);
CREATE INDEX idx_work_translations_work_id ON work_translations(work_id);
CREATE INDEX idx_work_translations_language ON work_translations(language_code);
CREATE INDEX idx_author_translations_author_id ON author_translations(author_id);
CREATE INDEX idx_author_translations_language ON author_translations(language_code);
CREATE INDEX idx_media_attachments_record ON media_attachments(record_type, record_id);

-- Full-text search indexes
CREATE INDEX idx_work_translations_title_fts ON work_translations USING gin(to_tsvector('english', title));
CREATE INDEX idx_author_translations_name_fts ON author_translations USING gin(to_tsvector('english', full_name));
```
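As a usage sketch, the Go function below exercises the title index with a parameterized query. Note that the `to_tsvector('english', ...)` expression must match the index definition for the GIN index to be used, and the multi-language content (Russian, Tatar) may eventually warrant per-language configurations:

```go
package search

import "database/sql"

// SearchWorkTitles returns work translation IDs whose titles match the query.
// A minimal sketch: the to_tsvector expression mirrors the index above.
func SearchWorkTitles(db *sql.DB, query string) ([]string, error) {
	rows, err := db.Query(`
		SELECT id
		FROM work_translations
		WHERE to_tsvector('english', title) @@ plainto_tsquery('english', $1)`,
		query)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```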
4. API Design
RESTful Endpoints
```
GET /api/v1/authors                    # List authors with pagination
GET /api/v1/authors/{slug}             # Get author by slug
GET /api/v1/authors/{slug}/works       # Get author's works
GET /api/v1/works                      # List works with filters
GET /api/v1/works/{slug}               # Get work by slug
GET /api/v1/works/{slug}/translations  # Get work translations
GET /api/v1/translations               # Search translations
GET /api/v1/countries                  # List countries
GET /api/v1/search                     # Global search endpoint
```
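A routing sketch for these endpoints, assuming Chi is the router picked from the stack above; the handlers are placeholders:

```go
package api

import (
	"net/http"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
)

// stub stands in for handlers not fleshed out in this sketch.
func stub(w http.ResponseWriter, r *http.Request) {
	http.Error(w, "not implemented", http.StatusNotImplemented)
}

// getAuthor shows how the {slug} path parameter is read with Chi.
func getAuthor(w http.ResponseWriter, r *http.Request) {
	slug := chi.URLParam(r, "slug")
	_ = slug // look the author up by slug in the service layer
	stub(w, r)
}

// NewRouter wires the v1 read endpoints listed above.
func NewRouter() http.Handler {
	r := chi.NewRouter()
	r.Use(middleware.RequestID, middleware.Logger, middleware.Recoverer)

	r.Route("/api/v1", func(r chi.Router) {
		r.Get("/authors", stub) // list with ?page=&limit= pagination
		r.Get("/authors/{slug}", getAuthor)
		r.Get("/authors/{slug}/works", stub)
		r.Get("/works", stub)
		r.Get("/works/{slug}", stub)
		r.Get("/works/{slug}/translations", stub)
		r.Get("/translations", stub)
		r.Get("/countries", stub)
		r.Get("/search", stub)
	})
	return r
}
```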
GraphQL Schema
```graphql
scalar Date

type Author {
  id: ID!
  slug: String!
  dateOfBirth: Date
  dateOfDeath: Date
  translations(language: String): [AuthorTranslation!]!
  works: [Work!]!
  countries: [Country!]!
}

type Work {
  id: ID!
  slug: String!
  literatureType: String!
  author: Author!
  translations(language: String): [WorkTranslation!]!
  genres: [String!]
}

type Query {
  author(slug: String!): Author
  authors(filter: AuthorFilter, pagination: Pagination): AuthorConnection!
  work(slug: String!): Work
  works(filter: WorkFilter, pagination: Pagination): WorkConnection!
  search(query: String!, language: String): SearchResult!
}
```
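If a generator such as gqlgen were used to serve this schema (the plan does not mandate one), a resolver would take roughly this shape; the types below stand in for gqlgen's generated bindings:

```go
package graph

import "context"

// Author mirrors the GraphQL Author type; in a gqlgen project this would be
// a generated model rather than a hand-written struct.
type Author struct {
	ID   string
	Slug string
}

// AuthorService is an assumed service-layer interface, not part of the plan.
type AuthorService interface {
	BySlug(ctx context.Context, slug string) (*Author, error)
}

type Resolver struct {
	authors AuthorService
}

// Author resolves the `author(slug: String!): Author` query field.
func (r *Resolver) Author(ctx context.Context, slug string) (*Author, error) {
	return r.authors.BySlug(ctx, slug)
}
```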
5. Migration Strategy
Phase 1: Data Extraction & Validation
- Data Export: Extract all data from SQLite dump
- Data Cleaning: Fix identified quality issues
- Schema Mapping: Map old schema to new structure
- Data Validation: Verify data integrity
Phase 2: Database Migration
- PostgreSQL Setup: Initialize new database
- Schema Creation: Create new tables with constraints
- Data Import: Import cleaned data
- Index Creation: Build performance indexes
- Data Verification: Cross-check record counts
Phase 3: Application Development
- Core Models: Implement domain entities
- Repository Layer: Data access abstraction
- Service Layer: Business logic implementation
- API Layer: HTTP handlers and middleware
- Admin Interface: Content management system
Phase 4: Testing & Deployment
- Unit Tests: Test individual components
- Integration Tests: Test database interactions
- API Tests: Test HTTP endpoints
- Performance Tests: Load testing
- Staging Deployment: Production-like environment
6. Data Migration Scripts
Go Migration Tool
```go
// cmd/migrate/main.go
package main

import (
	"database/sql"

	"github.com/google/uuid"
	_ "github.com/lib/pq" // PostgreSQL driver
	// A SQLite driver (e.g. modernc.org/sqlite or mattn/go-sqlite3) still
	// needs to be wired in for sourceDB; the plan lists this as pending.
)

// Author mirrors the columns read from the legacy SQLite authors table.
type Author struct {
	ID          string
	DateOfBirth sql.NullString
	DateOfDeath sql.NullString
	IsTop       bool
	IsDraft     bool
	Slug        string
}

type MigrationService struct {
	sourceDB *sql.DB // legacy SQLite dump
	targetDB *sql.DB // new PostgreSQL database
}

func (m *MigrationService) MigrateAuthors() error {
	// Extract from SQLite.
	rows, err := m.sourceDB.Query(`
		SELECT id, date_of_birth, date_of_death, is_top, is_draft, slug
		FROM authors
	`)
	if err != nil {
		return err
	}
	defer rows.Close()

	// Insert into PostgreSQL with proper UUID conversion.
	for rows.Next() {
		var author Author
		if err := rows.Scan(&author.ID, &author.DateOfBirth, &author.DateOfDeath,
			&author.IsTop, &author.IsDraft, &author.Slug); err != nil {
			return err
		}

		// Generate a UUID when the legacy row has none.
		if author.ID == "" {
			author.ID = uuid.New().String()
		}

		if err := m.insertAuthor(author); err != nil {
			return err
		}
	}
	return rows.Err()
}

func (m *MigrationService) insertAuthor(a Author) error {
	_, err := m.targetDB.Exec(`
		INSERT INTO authors (id, date_of_birth, date_of_death, is_top, is_draft, slug)
		VALUES ($1, $2, $3, $4, $5, $6)`,
		a.ID, a.DateOfBirth, a.DateOfDeath, a.IsTop, a.IsDraft, a.Slug)
	return err
}

func (m *MigrationService) MigrateAuthorTranslations() error {
	// Similar migration for author translations:
	// handle language codes, text cleaning, etc.
	return nil
}
```
7. Performance Optimizations
Caching Strategy
- Redis: Cache frequently accessed data (cache-aside sketch below)
- Application Cache: In-memory caching for hot data
- CDN: Static asset delivery
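A cache-aside sketch with go-redis, assuming the v9 client; the key layout, TTL, and loader callback are illustrative:

```go
package cache

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

// GetAuthorCached reads Redis first and falls back to the loader (e.g. a
// repository call) on a miss, then populates the cache.
func GetAuthorCached(ctx context.Context, rdb *redis.Client, slug string,
	load func(ctx context.Context, slug string) (any, error)) (any, error) {

	key := "author:" + slug

	// Cache hit: return the stored JSON payload.
	if data, err := rdb.Get(ctx, key).Bytes(); err == nil {
		var v any
		if err := json.Unmarshal(data, &v); err == nil {
			return v, nil
		}
	} else if err != redis.Nil {
		return nil, err // real Redis error, not just a miss
	}

	// Cache miss: load from the database, then populate the cache.
	v, err := load(ctx, slug)
	if err != nil {
		return nil, err
	}
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	if err := rdb.Set(ctx, key, data, 10*time.Minute).Err(); err != nil {
		return nil, err
	}
	return v, nil
}
```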
Database Optimizations
- Connection Pooling: Efficient database connections (example settings below)
- Query Optimization: Optimized SQL queries
- Read Replicas: Scale read operations
- Partitioning: Large table partitioning
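Connection pooling with database/sql needs only a few settings; the numbers below are starting points to tune under load, not recommendations:

```go
package database

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // PostgreSQL driver
)

// Open configures the standard library connection pool.
func Open(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(25)                  // cap concurrent connections
	db.SetMaxIdleConns(10)                  // keep some warm for reuse
	db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
	return db, db.Ping()
}
```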
Search Optimization
- Full-Text Search: PostgreSQL FTS for text search
- Vector Search: Embedding-based similarity search
- Elasticsearch: Advanced search capabilities
8. Security Considerations
Authentication & Authorization
- JWT Tokens: Secure API authentication (middleware sketch below)
- Role-Based Access: Admin, editor, viewer roles
- API Rate Limiting: Prevent abuse
- Input Validation: Sanitize all inputs
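A hedged sketch of JWT verification middleware using github.com/golang-jwt/jwt/v5; production code would additionally inspect claims to enforce the role-based rules above:

```go
package middleware

import (
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

// RequireJWT validates a Bearer token signed with the given HMAC secret.
func RequireJWT(secret []byte) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			raw := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
			token, err := jwt.Parse(raw, func(t *jwt.Token) (any, error) {
				// Reject tokens signed with an unexpected algorithm.
				if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
					return nil, jwt.ErrSignatureInvalid
				}
				return secret, nil
			})
			if err != nil || !token.Valid {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}
```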
Data Protection
- HTTPS Only: Encrypted communication
- SQL Injection Prevention: Parameterized queries
- XSS Protection: Content sanitization
- CORS Configuration: Controlled cross-origin access
9. Monitoring & Observability
Metrics Collection
- Prometheus: System and business metrics (counter example below)
- Grafana: Visualization and dashboards
- Health Checks: Service health monitoring
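A minimal Prometheus counter using the standard client_golang library; the metric name is illustrative:

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// RequestsTotal counts HTTP requests by path and status, the kind of
// system metric the plan calls for.
var RequestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "tercul_http_requests_total",
		Help: "HTTP requests processed, labeled by path and status.",
	},
	[]string{"path", "status"},
)

func init() {
	prometheus.MustRegister(RequestsTotal)
}

// Expose serves the /metrics endpoint for Prometheus to scrape.
func Expose() {
	http.Handle("/metrics", promhttp.Handler())
}
```

Handlers would then call `RequestsTotal.WithLabelValues(path, status).Inc()` once per request.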
Logging Strategy
- Structured Logging: JSON format logs (zerolog example below)
- Log Levels: Debug, info, warn, error
- Log Aggregation: Centralized log management
- Audit Trail: Track all data changes
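A zerolog sketch showing the structured JSON output and level control described above:

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	// Levels can be tightened in production (debug/info/warn/error).
	zerolog.SetGlobalLevel(zerolog.InfoLevel)

	// Structured JSON logs with timestamps, per the logging strategy.
	logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

	logger.Info().
		Str("entity", "author").
		Int("records", 4810).
		Msg("migration batch complete")
}
```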
10. Deployment Architecture
Container Strategy
```yaml
# docker-compose.yml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - DB_HOST=postgres
      - REDIS_HOST=redis
    depends_on:
      - postgres
      - redis

  postgres:
    image: postgres:16
    environment:
      - POSTGRES_DB=tercul
      - POSTGRES_USER=tercul
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl

# Named volumes referenced above must be declared here.
volumes:
  postgres_data:
  redis_data:
```
Production Considerations
- Load Balancing: Multiple app instances
- Auto-scaling: Kubernetes or similar
- Backup Strategy: Automated database backups
- Disaster Recovery: Multi-region deployment
11. Development Workflow
Code Organization
```
cmd/
├── migrate/         # Migration tool
├── seed/            # Data seeding
└── server/          # Main application
internal/
├── domain/          # Business entities
├── application/     # Use cases
├── infrastructure/  # External concerns
└── presentation/    # API layer
pkg/                 # Shared packages
├── database/
├── validation/
└── utils/
scripts/             # Build and deployment
├── build.sh
├── deploy.sh
└── migrate.sh
docs/                # Documentation
├── api/
├── deployment/
└── development/
```
Testing Strategy
- Unit Tests: 90%+ coverage target (sample test below)
- Integration Tests: Database and API testing
- Performance Tests: Load and stress testing
- Security Tests: Vulnerability scanning
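As a flavor of the unit-test style, a Testify-based test against a stand-in age function; the real logic would live on the Author entity in internal/domain/author:

```go
package author

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// Age is a stand-in for the Author age-calculation business logic.
func Age(born, died time.Time) int {
	years := died.Year() - born.Year()
	// Decrement if the birthday had not yet passed in the final year.
	if died.YearDay() < born.YearDay() {
		years--
	}
	return years
}

func TestAge(t *testing.T) {
	born, err := time.Parse("2006-01-02", "1886-04-26")
	require.NoError(t, err)
	died, err := time.Parse("2006-01-02", "1938-03-15")
	require.NoError(t, err)

	assert.Equal(t, 51, Age(born, died))
}
```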
12. Timeline & Milestones
Week 1-2: Foundation ✅ COMPLETED
- Project setup and structure
- Database schema design
- Basic Go modules setup
- Docker development environment
- Build system and Makefile
- Code quality tools (go vet, go fmt)
Week 3-4: Data Migration 🚧 IN PROGRESS
- Migration tool development (basic structure)
- PostgreSQL schema initialization script
- Database connection setup
- SQLite driver integration
- Data extraction and cleaning
- PostgreSQL setup and import
Week 5-6: Core Models 🚧 IN PROGRESS
- Author domain entity implementation
- AuthorTranslation domain entity
- Domain error handling
- Business logic and validation
- Work and WorkTranslation entities
- Book and BookTranslation entities
- Country and CountryTranslation entities
- Copyright and Media entities
- Repository layer
- Basic CRUD operations
Week 7-8: API Development 📋 PLANNED
- Basic HTTP server structure
- Health check endpoints
- HTTP handlers
- Middleware implementation
- RESTful API endpoints
- GraphQL schema and resolvers
Week 9-10: Advanced Features 📋 PLANNED
- Search functionality
- Admin interface
- Authentication system
- Role-based access control
Week 11-12: Testing & Deployment 📋 PLANNED
- Comprehensive testing
- Performance optimization
- Production deployment
- Monitoring and alerting
13. Risk Mitigation
Technical Risks
- Data Loss: Multiple backup strategies
- Performance Issues: Load testing and optimization
- Integration Problems: Comprehensive testing
Business Risks
- Downtime: Blue-green deployment
- Data Corruption: Validation and verification
- User Experience: Gradual rollout
14. Success Metrics
Technical Metrics
- Response Time: < 200ms for 95% of requests
- Uptime: 99.9% availability
- Error Rate: < 0.1% error rate
Business Metrics
- Data Preservation: 100% record migration
- Performance: 10x improvement in search speed
- Maintainability: Reduced development time
15. Future Enhancements
Phase 2 Features
- Machine Learning: Content recommendations
- Advanced Search: Semantic search capabilities
- Mobile App: Native mobile applications
- API Marketplace: Third-party integrations
Scalability Plans
- Microservices: Service decomposition
- Event Sourcing: Event-driven architecture
- Multi-tenancy: Support for multiple organizations
Current Implementation Status
✅ Completed Components
Project Foundation
- Go Module Setup: Go 1.25+ with all dependencies resolved
- Project Structure: Clean architecture with proper directory organization
- Build System: Makefile with comprehensive development commands
- Docker Environment: PostgreSQL 16+, Redis 7+, Adminer, Redis Commander
- Code Quality: Passes go build, go vet, go fmt, go mod verify
Domain Models
- Author Entity: Complete with validation, business logic, and GORM tags
- AuthorTranslation Entity: Multi-language support with JSONB handling
- Error Handling: Comprehensive domain-specific error definitions
- Business Logic: Age calculation, validation rules, data integrity
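For illustration, a sketch of the shape such an entity takes; the committed implementation may differ, and the GORM tags here simply mirror the schema defined earlier:

```go
package author

import (
	"errors"
	"time"

	"github.com/google/uuid"
)

var ErrMissingSlug = errors.New("author: slug is required")

// Author is an illustrative sketch of the domain entity.
type Author struct {
	ID          uuid.UUID `gorm:"type:uuid;primaryKey;default:gen_random_uuid()"`
	Slug        string    `gorm:"uniqueIndex;not null"`
	DateOfBirth *time.Time
	DateOfDeath *time.Time
	IsTop       bool
	IsDraft     bool
}

// Validate enforces the invariants the plan assigns to the domain layer.
func (a *Author) Validate() error {
	if a.Slug == "" {
		return ErrMissingSlug
	}
	if a.DateOfBirth != nil && a.DateOfDeath != nil &&
		a.DateOfDeath.Before(*a.DateOfBirth) {
		return errors.New("author: date of death precedes date of birth")
	}
	return nil
}

// AgeAt returns the author's age at t, when a birth date is known.
func (a *Author) AgeAt(t time.Time) (int, bool) {
	if a.DateOfBirth == nil {
		return 0, false
	}
	years := t.Year() - a.DateOfBirth.Year()
	if t.YearDay() < a.DateOfBirth.YearDay() {
		years--
	}
	return years, true
}
```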
Infrastructure
- Database Schema: Complete PostgreSQL initialization script
- Migration Tools: Basic structure for data migration pipeline
- HTTP Server: Basic server with health check endpoints
- Configuration: Environment-based configuration with Viper
🚧 In Progress
Data Migration Pipeline
- Migration Service: Basic structure implemented
- PostgreSQL Connection: Working database connection
- Schema Creation: Ready for database initialization
- Next Steps: SQLite driver integration and actual data migration
Domain Model Expansion
- Core Entities: Author domain complete, ready for expansion
- Next Steps: Work, Book, Country, Copyright, and Media entities
📋 Next Priority Items
1. Complete Domain Models (Week 1-2)
   - Implement Work and WorkTranslation entities
   - Implement Book and BookTranslation entities
   - Implement Country and CountryTranslation entities
   - Implement Copyright and Media entities
2. Repository Layer (Week 3-4)
   - Database repositories for all entities
   - Data access abstractions
   - Transaction management
3. Data Migration (Week 3-4)
   - SQLite driver integration
   - Data extraction and cleaning
   - Migration testing and validation
🎯 Immediate Next Steps
1. Start the Docker environment:
   ```bash
   make docker-up
   ```
2. Initialize the database:
   ```bash
   # Database will auto-initialize with the schema
   docker-compose up -d postgres
   ```
3. Implement the next domain models:
   - Work entity with literature types
   - Country entity with translations
   - Book entity with publication data
4. Test the data migration:
   - Connect to the existing SQLite dump
   - Validate data extraction
   - Test the PostgreSQL import
Conclusion
This architecture provides a solid foundation for rebuilding the TERCUL platform in Go while ensuring data preservation and improving performance, maintainability, and scalability. The phased approach minimizes risk and allows for iterative development and testing.
Current Status: Foundation complete, ready for domain model expansion and data migration implementation.
The new system will be more robust, faster, and easier to maintain while preserving all existing cultural content and relationships. The modern technology stack ensures long-term sustainability and provides a foundation for future enhancements.