tercul-backend/refactor.md

Short, sharp audit. You’ve got good bones but too many cross-cutting seams: duplicated GraphQL layers, Python ops scripts mixed in with runtime code, domain spread across “models/ + repositories/ + services/” without clear aggregate boundaries, and infra (cache/db/auth) bleeding into the app layer. Here’s a tighter, execution-ready structure and the reasoning behind each cut.

# 1) Target repo layout (Go standards + DDD-lite)

```
.
├── cmd/
│   ├── api/                 # main GraphQL/HTTP server
│   │   └── main.go
│   ├── worker/              # background jobs (sync, enrichment)
│   │   └── main.go
│   └── tools/               # one-off CLIs (e.g., enrich)
│       └── enrich/
│           └── main.go
├── internal/
│   ├── platform/            # cross-cutting infra (private)
│   │   ├── config/          # config load/validate
│   │   ├── db/              # connection pool, migrations runner, uow/tx helpers
│   │   ├── cache/           # redis client + cache abstractions
│   │   ├── auth/            # jwt, middleware, authn/z policies
│   │   ├── http/            # router, middleware (rate limit, recovery, observability)
│   │   ├── log/             # logger facade
│   │   └── search/          # weaviate client, schema mgmt
│   ├── domain/              # business concepts & interfaces only
│   │   ├── work/
│   │   │   ├── entity.go        # Work, Value Objects, invariants
│   │   │   ├── repo.go          # interface WorkRepository
│   │   │   └── service.go       # domain service interfaces (pure)
│   │   ├── author/
│   │   ├── user/
│   │   └── ... (countries, tags, etc.)
│   ├── data/                # data access (implement domain repos)
│   │   ├── sql/             # sqlc or squirrel; concrete repos
│   │   ├── cache/           # cached repos/decorators (per-aggregate)
│   │   └── migrations/      # *.sql, versioned
│   ├── app/                 # application services (orchestrate use cases)
│   │   ├── work/
│   │   │   ├── commands.go  # Create/Update ops
│   │   │   └── queries.go   # Read models, listings
│   │   └── ...              # other aggregates
│   ├── adapters/
│   │   ├── graphql/         # gqlgen resolvers map → app layer (one place!)
│   │   │   ├── schema.graphqls
│   │   │   ├── generated.go
│   │   │   └── resolvers.go
│   │   └── http/            # (optional) REST handlers if any
│   ├── jobs/                # background jobs, queues, schedulers
│   │   ├── sync/            # edges/entities sync
│   │   └── linguistics/     # text analysis pipelines
│   └── observability/
│       ├── metrics.go
│       └── tracing.go
├── pkg/                     # public reusable libs (if truly reusable)
│   └── linguistics/         # only if you intend external reuse; else keep in internal/
├── api/                     # GraphQL docs & examples; schema copies for consumers
│   └── README.md
├── deploy/
│   ├── docker/              # Dockerfile(s), compose for dev
│   └── k8s/                 # manifests/helm (if/when)
├── ops/                     # data migration & analysis (Python lives here)
│   ├── migration/
│   │   ├── scripts/*.py
│   │   ├── reports/*.md|.json
│   │   ├── inputs/          # source dumps the scripts read
│   │   └── outputs/         # authors.json, works.json, schema snapshots, etc.
│   └── analysis/
│       └── notebooks|scripts
├── test/
│   ├── integration/         # black-box tests; spins containers
│   ├── fixtures/            # testdata
│   └── e2e/
├── Makefile
├── go.mod
└── README.md
```

### Why this wins

* **One GraphQL layer**: you currently have both `/graph` and `/graphql`. Kill one. Put schema+resolvers under `internal/adapters/graphql`. Adapters call **application services**, not repos directly.
* **Domain isolation**: `internal/domain/*` holds entities/value objects and interfaces only. No SQL or Redis here.
* **Data layer as a replaceable detail**: `internal/data/sql` implements domain repositories (and adds caching as decorators in `internal/data/cache`).
* **Background jobs are first-class**: move `syncjob`, `linguistics` processing into `internal/jobs/*` and run them via `cmd/worker`.
* **Python is ops-only**: all migration/one-off analysis goes to `/ops`. Don’t ship Python into the runtime container.
* **Infra cohesion**: auth, cache, db pools, http middleware under `internal/platform/`. You had them scattered across `auth/`, `middleware/`, `db/`, `cache/`.

# 2) Specific refactors (high ROI)

1. **Unify GraphQL**

* Delete one of: `/graph` or `/graphql`. Keep **gqlgen** in `internal/adapters/graphql`.
* Put `schema.graphqls` there. Configure `gqlgen.yml` to output generated code in the same package.
* Resolvers should call `internal/app/*` use-cases (not repos), returning **read models** tailored for GraphQL.

2. **Introduce Unit-of-Work (UoW) + Transaction boundaries**

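* In `internal/platform/db`, add `WithTx(ctx, func(ctx context.Context) error)`: it begins a transaction, runs the callback with a context that carries it (so every repo the callback uses is transactional), commits when the callback returns nil, and rolls back otherwise.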
* Repos get created from a factory bound to `*sql.DB` or `*sql.Tx`.
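* This makes transaction boundaries explicit in the app layer: a multi-repo write either commits as a whole or not at all, instead of each repo silently running on its own connection.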

3. **Split Write vs Read paths (lightweight CQRS)**

* In `internal/app/work/commands.go`, keep strict invariants (create/update/merge).
* In `internal/app/work/queries.go`, return view models optimized for UI/GraphQL (joins, denormalized fields), leveraging read-only query helpers.
* Keep read models cacheable independently (Redis).

4. **Cache as decorators, not bespoke repos**

* Replace `cached_*_repository.go` proliferation with **decorator pattern**:

  * `type CachedWorkRepo struct { inner WorkRepository; cache Cache }`
  * Decorate only **reads**. Writes go straight to the inner repo, then invalidate the affected keys deterministically.
  * Move all cache code to `internal/data/cache`.

5. **Models package explosion → domain aggregates**

* Current `models/*.go` mixes everything. Group by aggregate (`work`, `author`, `user`, …). Co-locate value objects and invariants. Keep **constructors** that validate invariants (no anemic structs).

6. **Migrations**

* Move raw SQL to `internal/data/migrations` (or `/migrations` at repo root) and adopt a tool (goose, atlas, migrate). Delete the hand-rolled `migrations.go` runner.
* Version generated `tercul_schema.sql` as **snapshots** in `/ops/migration/outputs/` instead of in runtime code.

7. **Observability**

* Centralize logging (`internal/platform/log`) and attach request IDs, user IDs (if any), and span IDs to every log line.
* Add Prometheus metrics and OpenTelemetry tracing (`internal/observability`). Wire to router and DB.

8. **Config**

* Replace ad-hoc `config/config.go` with strict struct + env parsing + validation (envconfig or koanf). No globals; inject via constructors.

9. **Security**

* Move JWT + middleware under `internal/platform/auth`. Add **authz policy functions** (e.g., `CanEditWork(user, work)`).
* Make resolvers fetch `user` from context once.

10. **Weaviate**

* Put client + schema code in `internal/platform/search`. Provide an interface in `internal/domain/search` only if you truly need to swap engines.

11. **Testing**

* `test/integration`: spin Postgres/Redis via docker-compose; seed minimal fixtures.
* Use `make test-integration` target.
* Favor **table-driven** tests at app layer. Cut duplicated repo tests; test behavior via app services + a `fake` repo.

12. **Delete dead duplication**

* `graph/` vs `graphql/` → one.
* `repositories/*_repository.go` vs `internal/store` → one place: `internal/data/sql`.
* `services/work_service.go` vs resolvers doing business logic → all business logic in `internal/app/*`.

# 3) gqlgen wiring (clean, dependency-safe)

* `internal/adapters/graphql/resolvers.go` should accept a single `Application` façade:

  ```go
  type Application struct {
      Works   app.WorkService
      Authors app.AuthorService
      // ...
  }
  ```
* Construct `Application` in `cmd/api/main.go` by wiring `platform/db`, repos, caches, and services.
* Resolvers never import `platform/*` or `data/*`.

# 4) Background jobs: make them boring & reliable

* `cmd/worker/main.go` loads the same DI container, then registers jobs:

  * `jobs/linguistics.Pipeline` (tokenizer → POS → lemmas → phonetic → analysis repo)
  * `jobs/sync.Entities/Edges`
* Use asynq or a simple cron (robfig/cron), depending on needs. Each job is idempotent and holds a **lease** (to prevent overlapping runs).

# 5) Python: isolate and containerize for ops

* Move `data_extractor.py`, `postgres_to_sqlite_converter.py`, etc., into `/ops/migration`.
* Give them their own `Dockerfile.ops` if needed.
* Outputs (`*.json`, `*.md`) should live under `/ops/migration/outputs/`. Do not commit giant blobs into root.

# 6) Incremental migration plan (so you don’t freeze dev)

**Week 1**

* Create new skeleton folders (`cmd`, `internal/platform`, `internal/domain`, `internal/app`, `internal/data`, `internal/adapters/graphql`, `internal/jobs`).
* Move config/log/db/cache/auth into `internal/platform/*`. Add DI wiring in `cmd/api/main.go`.
* Pick and migrate **one aggregate** end-to-end (e.g., `work`): domain entity → repo interface → sql repo → app service (commands/queries) → GraphQL resolvers. Ship.

**Week 2**

* Kill duplicate GraphQL folder. Point gqlgen to the adapters path. Move remaining resolvers to call app services.
* Introduce UoW helper and convert multi-repo write flows.
* Replace `cached_*` repos with decorators.

**Week 3**

* Move background jobs to `cmd/worker` + `internal/jobs/*`.
* Migrations: adopt goose/atlas; relocate SQL; remove `migrations.go`.
* Observability and authz policy pass.

**Week 4**

* Sweep: delete dead packages (`store`, duplicate `repositories`), move Python to `/ops`.
* Add integration tests; gate CI on `make lint test test-integration`.

# 7) A few code-level nits to hunt down

* **Context**: ensure every repo method accepts `context.Context` and respects timeouts.
* **Errors**: wrap with `%w` and define sentinel errors (e.g., `ErrNotFound`). Map to GraphQL errors centrally.
* **Caching keys**: namespace per aggregate + version (e.g., `work:v2:{id}`) so you can invalidate by bumping version.
* **GraphQL N+1**: use dataloaders per aggregate, scoped to request context. Put loader wiring in `internal/adapters/graphql`.
* **Pagination**: choose offset vs cursor (prefer cursor) and make it consistent across queries.
* **ID semantics**: unify UUID vs int64 across domains; add `ID` value object to eliminate accidental mixing.
* **Config for dev/prod**: the two existing Dockerfiles are fine; just move them under `/deploy/docker` and keep config env-driven.