mirror of https://github.com/SamyRai/tercul-backend.git synced 2025-12-26 22:21:33 +00:00

google-labs-jules[bot] c4b4319ae8 This commit updates the TODO.md and refactor.md files to reflect the latest architectural changes. It also removes several temporary and one-off script files to clean up the repository.

Key changes:
- Marked the "Adopt migrations tool" and "Resolvers call application services only" tasks as complete in `TODO.md`.
- Updated the "Unify GraphQL" and "Migrations" sections in `refactor.md` to reflect the completed work.
- Removed the following temporary files:
  - `create_repo_interfaces.go`
  - `fix_domain_repos.go`
  - `fix_sql_imports.go`
  - `report.md`
  - `validate.py`

2025-10-03 03:02:46 +00:00


Short, sharp audit. You’ve got good bones but too many cross-cutting seams: duplicated GraphQL layers, mixed Python ops scripts with runtime code, domain spread across “models/ + repositories/ + services/” without clear aggregate boundaries, and infra (cache/db/auth) bleeding into app layer. Here’s a tighter, execution-ready structure and the reasoning behind each cut.

1) Target repo layout (Go standards + DDD-lite)

.
├── cmd/
│   ├── api/                 # main GraphQL/HTTP server
│   │   └── main.go
│   ├── worker/              # background jobs (sync, enrichment)
│   │   └── main.go
│   └── tools/               # one-off CLIs (e.g., enrich)
│       └── enrich/
│           └── main.go
├── internal/
│   ├── platform/            # cross-cutting infra (private)
│   │   ├── config/          # config load/validate
│   │   ├── db/              # connection pool, migrations runner, uow/tx helpers
│   │   ├── cache/           # redis client + cache abstractions
│   │   ├── auth/            # jwt, middleware, authn/z policies
│   │   ├── http/            # router, middleware (rate limit, recovery, observability)
│   │   ├── log/             # logger facade
│   │   └── search/          # weaviate client, schema mgmt
│   ├── domain/              # business concepts & interfaces only
│   │   ├── work/
│   │   │   ├── entity.go        # Work, Value Objects, invariants
│   │   │   ├── repo.go          # interface WorkRepository
│   │   │   └── service.go       # domain service interfaces (pure)
│   │   ├── author/
│   │   ├── user/
│   │   └── ... (countries, tags, etc.)
│   ├── data/                # data access (implement domain repos)
│   │   ├── sql/             # sqlc or squirrel; concrete repos
│   │   ├── cache/           # cached repos/decorators (per-aggregate)
│   │   └── migrations/      # *.sql, versioned
│   ├── app/                 # application services (orchestrate use cases)
│   │   ├── work/
│   │   │   ├── commands.go  # Create/Update ops
│   │   │   └── queries.go   # Read models, listings
│   │   └── ...              # other aggregates
│   ├── adapters/
│   │   ├── graphql/         # gqlgen resolvers map → app layer (one place!)
│   │   │   ├── schema.graphqls
│   │   │   ├── generated.go
│   │   │   └── resolvers.go
│   │   └── http/            # (optional) REST handlers if any
│   ├── jobs/                # background jobs, queues, schedulers
│   │   ├── sync/            # edges/entities sync
│   │   └── linguistics/     # text analysis pipelines
│   └── observability/
│       ├── metrics.go
│       └── tracing.go
├── pkg/                     # public reusable libs (if truly reusable)
│   └── linguistics/         # only if you intend external reuse; else keep in internal/
├── api/                     # GraphQL docs & examples; schema copies for consumers
│   └── README.md
├── deploy/
│   ├── docker/              # Dockerfile(s), compose for dev
│   └── k8s/                 # manifests/helm (if/when)
├── ops/                     # data migration & analysis (Python lives here)
│   ├── migration/
│   │   ├── scripts/*.py
│   │   ├── reports/*.md|.json
│   │   └── inputs/outputs/  # authors.json, works.json, etc.
│   └── analysis/
│       └── notebooks|scripts
├── test/
│   ├── integration/         # black-box tests; spins containers
│   ├── fixtures/            # testdata
│   └── e2e/
├── Makefile
├── go.mod
└── README.md

Why this wins

One GraphQL layer: you currently have both /graph and /graphql. Kill one. Put schema+resolvers under internal/adapters/graphql. Adapters call application services, not repos directly.
Domain isolation: internal/domain/* holds entities/value objects and interfaces only. No SQL or Redis here.
Data layer as a replaceable detail: internal/data/sql implements domain repositories (and adds caching as decorators in internal/data/cache).
Background jobs are first-class: move syncjob, linguistics processing into internal/jobs/* and run them via cmd/worker.
Python is ops-only: all migration/one-off analysis goes to /ops. Don’t ship Python into the runtime container.
Infra cohesion: auth, cache, db pools, http middleware under internal/platform/. You had them scattered across auth/, middleware/, db/, cache/.

2) Specific refactors (high ROI)

Unify GraphQL [COMPLETED]

Delete one of: /graph or /graphql. Keep gqlgen in internal/adapters/graphql. [COMPLETED]
Put schema.graphqls there. Configure gqlgen.yml to output generated code in the same package. [COMPLETED]
Resolvers should call internal/app/* use-cases (not repos), returning read models tailored for GraphQL. [COMPLETED]

Introduce Unit-of-Work (UoW) + Transaction boundaries

In internal/platform/db, add WithTx(ctx, func(ctx context.Context) error) that injects transactional repos into the app layer.
Repos get created from a factory bound to *sql.DB or *sql.Tx.
This makes transaction boundaries explicit and removes the class of bugs where a multi-repo write silently runs partly outside a transaction.

Split Write vs Read paths (lightweight CQRS)

In internal/app/work/commands.go, keep strict invariants (create/update/merge).
In internal/app/work/queries.go, return view models optimized for UI/GraphQL (joins, denormalized fields), leveraging read-only query helpers.
Keep read models cacheable independently (Redis).

Cache as decorators, not bespoke repos

Replace cached_*_repository.go proliferation with decorator pattern:
- type CachedWorkRepo struct { inner WorkRepository; cache Cache }
- Only decorate reads. Writes invalidate keys deterministically.
- Move all cache code to internal/data/cache.

Models package explosion → domain aggregates

Current models/*.go mixes everything. Group by aggregate (work, author, user, …). Co-locate value objects and invariants. Keep constructors that validate invariants (no anemic structs).

Migrations [COMPLETED]

Move raw SQL to internal/data/migrations (or /migrations at repo root) and adopt a tool (goose, atlas, migrate). Delete the hand-rolled migrations.go. [COMPLETED]
Version generated tercul_schema.sql as snapshots in /ops/migration/outputs/ instead of in runtime code. [COMPLETED]

Observability

Centralize logging (internal/platform/log), add request IDs, user IDs (if any), and span IDs.
Add Prometheus metrics and OpenTelemetry tracing (internal/observability). Wire to router and DB.

Config

Replace ad-hoc config/config.go with strict struct + env parsing + validation (envconfig or koanf). No globals; inject via constructors.
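A sketch of what "strict struct + env parsing + validation" could look like using only the standard library. The field set and the variable names (DATABASE_URL, REDIS_ADDR, JWT_SECRET, HTTP_PORT) are assumptions; envconfig or koanf would replace the manual lookups but keep the same shape.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config is passed to constructors explicitly; there is no package-level global.
type Config struct {
	DatabaseURL string
	RedisAddr   string
	JWTSecret   string
	HTTPPort    int
}

// LoadConfig reads settings through lookup (os.LookupEnv in production)
// and reports every problem at once instead of failing on the first.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	var errs []error
	required := func(key string) string {
		v, ok := lookup(key)
		if !ok || v == "" {
			errs = append(errs, fmt.Errorf("%s is required", key))
		}
		return v
	}

	cfg := Config{
		DatabaseURL: required("DATABASE_URL"),
		RedisAddr:   required("REDIS_ADDR"),
		JWTSecret:   required("JWT_SECRET"),
		HTTPPort:    8080, // default when HTTP_PORT is unset
	}

	if raw, ok := lookup("HTTP_PORT"); ok && raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil || port < 1 || port > 65535 {
			errs = append(errs, fmt.Errorf("HTTP_PORT %q is not a valid port", raw))
		} else {
			cfg.HTTPPort = port
		}
	}

	if err := errors.Join(errs...); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid configuration:")
		fmt.Println(err)
		return
	}
	fmt.Println("listening on port", cfg.HTTPPort)
}
```

Taking the lookup function as a parameter keeps the loader testable without touching the process environment.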

Security

Move JWT + middleware under internal/platform/auth. Add authz policy functions (e.g., CanEditWork(user, work)).
Make resolvers fetch user from context once.

Weaviate

Put client + schema code in internal/platform/search. Provide an interface in internal/domain/search only if you truly need to swap engines.

Testing

test/integration: spin up Postgres/Redis via docker-compose; seed minimal fixtures.
Run them through a make test-integration target.
Favor table-driven tests at app layer. Cut duplicated repo tests; test behavior via app services + a fake repo.

Delete dead duplication

graph/ vs graphql/ → one.
repositories/*_repository.go vs internal/store → one place: internal/data/sql.
services/work_service.go vs resolvers doing business logic → all business logic in internal/app/*.

3) gqlgen wiring (clean, dependency-safe)

internal/adapters/graphql/resolvers.go should accept a single Application façade:

type Application struct {
    Works   app.WorkService
    Authors app.AuthorService
    // ...
}

Construct Application in cmd/api/main.go by wiring platform/db, repos, caches, and services.
Resolvers never import platform/* or data/*.

4) Background jobs: make them boring & reliable

cmd/worker/main.go loads the same DI container, then registers jobs:
- jobs/linguistics.Pipeline (tokenizer → POS → lemmas → phonetic → analysis repo)
- jobs/sync.Entities/Edges
Use asynq or a simple cron (robfig/cron) depending on needs. Each job is idempotent and holds a lease while it runs, so overlapping runs are skipped.

5) Python: isolate and containerize for ops

Move data_extractor.py, postgres_to_sqlite_converter.py, etc., into /ops/migration.
Give them their own Dockerfile.ops if needed.
Outputs (*.json, *.md) should live under /ops/migration/outputs/. Do not commit giant blobs into root.

6) Incremental migration plan (so you don’t freeze dev)

Week 1

Create new skeleton folders (cmd, internal/platform, internal/domain, internal/app, internal/data, internal/adapters/graphql, internal/jobs).
Move config/log/db/cache/auth into internal/platform/*. Add DI wiring in cmd/api/main.go.
Pick and migrate one aggregate end-to-end (e.g., work): domain entity → repo interface → sql repo → app service (commands/queries) → GraphQL resolvers. Ship.

Week 2

Kill duplicate GraphQL folder. Point gqlgen to the adapters path. Move remaining resolvers to call app services.
Introduce UoW helper and convert multi-repo write flows.
Replace cached_* repos with decorators.

Week 3

Move background jobs to cmd/worker + internal/jobs/*.
Migrations: adopt goose/atlas; relocate SQL; remove migrations.go.
Observability and authz policy pass.

Week 4

Sweep: delete dead packages (store, duplicate repositories), move Python to /ops.
Add integration tests; lock CI with make lint test test-integration.

7) A few code-level nits to hunt down

Context: ensure every repo method accepts context.Context and respects timeouts.
Errors: wrap with %w and define sentinel errors (e.g., ErrNotFound). Map to GraphQL errors centrally.
Caching keys: namespace per aggregate + version (e.g., work:v2:{id}) so you can invalidate by bumping version.
GraphQL N+1: use dataloaders per aggregate, scoped to request context. Put loader wiring in internal/adapters/graphql.
Pagination: choose offset vs cursor (prefer cursor) and make it consistent across queries.
ID semantics: unify UUID vs int64 across domains; add ID value object to eliminate accidental mixing.
Config for dev/prod: two Dockerfiles are fine; just move them under /deploy/docker and keep configuration env-driven.
