mirror of
https://github.com/SamyRai/tercul-backend.git
synced 2025-12-27 04:01:34 +00:00
- Core Go application with GraphQL API using gqlgen
- Comprehensive data models for literary works, authors, translations
- Repository pattern with caching layer
- Authentication and authorization system
- Linguistics analysis capabilities with multiple adapters
- Vector search integration with Weaviate
- Docker containerization support
- Python data migration and analysis scripts
- Clean architecture with proper separation of concerns
- Production-ready configuration and middleware
- Proper .gitignore excluding vendor/, database files, and build artifacts
214 lines
11 KiB
Markdown
Short, sharp audit. You’ve got good bones but too many cross-cutting seams: duplicated GraphQL layers, mixed Python ops scripts with runtime code, domain spread across “models/ + repositories/ + services/” without clear aggregate boundaries, and infra (cache/db/auth) bleeding into app layer. Here’s a tighter, execution-ready structure and the reasoning behind each cut.

# 1) Target repo layout (Go standards + DDD-lite)

```
.
├── cmd/
│   ├── api/                  # main GraphQL/HTTP server
│   │   └── main.go
│   ├── worker/               # background jobs (sync, enrichment)
│   │   └── main.go
│   └── tools/                # one-off CLIs (e.g., enrich)
│       └── enrich/
│           └── main.go
├── internal/
│   ├── platform/             # cross-cutting infra (private)
│   │   ├── config/           # config load/validate
│   │   ├── db/               # connection pool, migrations runner, uow/tx helpers
│   │   ├── cache/            # redis client + cache abstractions
│   │   ├── auth/             # jwt, middleware, authn/z policies
│   │   ├── http/             # router, middleware (rate limit, recovery, observability)
│   │   ├── log/              # logger facade
│   │   └── search/           # weaviate client, schema mgmt
│   ├── domain/               # business concepts & interfaces only
│   │   ├── work/
│   │   │   ├── entity.go     # Work, Value Objects, invariants
│   │   │   ├── repo.go       # interface WorkRepository
│   │   │   └── service.go    # domain service interfaces (pure)
│   │   ├── author/
│   │   ├── user/
│   │   └── ... (countries, tags, etc.)
│   ├── data/                 # data access (implement domain repos)
│   │   ├── sql/              # sqlc or squirrel; concrete repos
│   │   ├── cache/            # cached repos/decorators (per-aggregate)
│   │   └── migrations/       # *.sql, versioned
│   ├── app/                  # application services (orchestrate use cases)
│   │   ├── work/
│   │   │   ├── commands.go   # Create/Update ops
│   │   │   └── queries.go    # Read models, listings
│   │   └── ...               # other aggregates
│   ├── adapters/
│   │   ├── graphql/          # gqlgen resolvers map → app layer (one place!)
│   │   │   ├── schema.graphqls
│   │   │   ├── generated.go
│   │   │   └── resolvers.go
│   │   └── http/             # (optional) REST handlers if any
│   ├── jobs/                 # background jobs, queues, schedulers
│   │   ├── sync/             # edges/entities sync
│   │   └── linguistics/      # text analysis pipelines
│   └── observability/
│       ├── metrics.go
│       └── tracing.go
├── pkg/                      # public reusable libs (if truly reusable)
│   └── linguistics/          # only if you intend external reuse; else keep in internal/
├── api/                      # GraphQL docs & examples; schema copies for consumers
│   └── README.md
├── deploy/
│   ├── docker/               # Dockerfile(s), compose for dev
│   └── k8s/                  # manifests/helm (if/when)
├── ops/                      # data migration & analysis (Python lives here)
│   ├── migration/
│   │   ├── scripts/*.py
│   │   ├── reports/*.md|.json
│   │   └── inputs/outputs/   # authors.json, works.json, etc.
│   └── analysis/
│       └── notebooks|scripts
├── test/
│   ├── integration/          # black-box tests; spins containers
│   ├── fixtures/             # testdata
│   └── e2e/
├── Makefile
├── go.mod
└── README.md
```

### Why this wins

* **One GraphQL layer**: you currently have both `/graph` and `/graphql`. Kill one. Put schema+resolvers under `internal/adapters/graphql`. Adapters call **application services**, not repos directly.
* **Domain isolation**: `internal/domain/*` holds entities/value objects and interfaces only. No SQL or Redis here.
* **Data layer as a replaceable detail**: `internal/data/sql` implements domain repositories (and adds caching as decorators in `internal/data/cache`).
* **Background jobs are first-class**: move `syncjob`, `linguistics` processing into `internal/jobs/*` and run them via `cmd/worker`.
* **Python is ops-only**: all migration/one-off analysis goes to `/ops`. Don’t ship Python into the runtime container.
* **Infra cohesion**: auth, cache, db pools, http middleware under `internal/platform/`. You had them scattered across `auth/`, `middleware/`, `db/`, `cache/`.

# 2) Specific refactors (high ROI)

1. **Unify GraphQL**

* Delete one of: `/graph` or `/graphql`. Keep **gqlgen** in `internal/adapters/graphql`.
* Put `schema.graphqls` there. Configure `gqlgen.yml` to output generated code in the same package.
* Resolvers should call `internal/app/*` use-cases (not repos), returning **read models** tailored for GraphQL.

2. **Introduce Unit-of-Work (UoW) + Transaction boundaries**

* In `internal/platform/db`, add `WithTx(ctx, func(ctx context.Context) error)` that injects transactional repos into the app layer.
* Repos get created from a factory bound to `*sql.DB` or `*sql.Tx`.
* This eliminates hidden transaction bugs across services.

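
A minimal sketch of the `WithTx` helper from item 2, under stated assumptions: the `Tx` and `Beginner` interfaces, `TxFrom`, and the fakes are hypothetical names introduced here so the example runs without a database driver. In real code a thin adapter over `(*sql.DB).BeginTx` would implement `Beginner`, and `*sql.Tx` already has the `Commit`/`Rollback` methods that `Tx` asks for. The helper also takes the `Beginner` as an explicit argument, which is one possible reading of the signature above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Tx is the minimal transaction surface this sketch needs (hypothetical interface).
type Tx interface {
	Commit() error
	Rollback() error
}

// Beginner starts a transaction (hypothetical; an adapter over *sql.DB could implement it).
type Beginner interface {
	Begin(ctx context.Context) (Tx, error)
}

type txKey struct{}

// TxFrom lets repositories pick up the ambient transaction, if any.
func TxFrom(ctx context.Context) (Tx, bool) {
	tx, ok := ctx.Value(txKey{}).(Tx)
	return tx, ok
}

// WithTx runs fn inside one transaction: commit on nil, rollback on error or panic.
func WithTx(ctx context.Context, db Beginner, fn func(ctx context.Context) error) (err error) {
	tx, err := db.Begin(ctx)
	if err != nil {
		return fmt.Errorf("begin tx: %w", err)
	}
	defer func() {
		if p := recover(); p != nil {
			_ = tx.Rollback()
			panic(p) // re-raise after cleaning up
		}
		if err != nil {
			if rbErr := tx.Rollback(); rbErr != nil {
				err = errors.Join(err, rbErr)
			}
			return
		}
		err = tx.Commit()
	}()
	return fn(context.WithValue(ctx, txKey{}, tx))
}

// fakeDB and fakeTx record calls so the demo runs without a real database.
type fakeDB struct{ log []string }

type fakeTx struct{ log *[]string }

func (t fakeTx) Commit() error   { *t.log = append(*t.log, "commit"); return nil }
func (t fakeTx) Rollback() error { *t.log = append(*t.log, "rollback"); return nil }

func (d *fakeDB) Begin(context.Context) (Tx, error) {
	d.log = append(d.log, "begin")
	return fakeTx{log: &d.log}, nil
}

// runDemo returns the recorded calls for a succeeding or failing unit of work.
func runDemo(fail bool) []string {
	db := &fakeDB{}
	_ = WithTx(context.Background(), db, func(ctx context.Context) error {
		if _, ok := TxFrom(ctx); !ok {
			return errors.New("no tx in context")
		}
		if fail {
			return errors.New("boom")
		}
		return nil
	})
	return db.log
}

func main() {
	fmt.Println(runDemo(false)) // [begin commit]
	fmt.Println(runDemo(true))  // [begin rollback]
}
```
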
3. **Split Write vs Read paths (lightweight CQRS)**

* In `internal/app/work/commands.go`, keep strict invariants (create/update/merge).
* In `internal/app/work/queries.go`, return view models optimized for UI/GraphQL (joins, denormalized fields), leveraging read-only query helpers.
* Keep read models cacheable independently (Redis).

4. **Cache as decorators, not bespoke repos**

* Replace `cached_*_repository.go` proliferation with **decorator pattern**:

  * `type CachedWorkRepo struct { inner WorkRepository; cache Cache }`
  * Only decorate **reads**. Writes invalidate keys deterministically.
* Move all cache code to `internal/data/cache`.

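
The decorator from item 4 could look roughly like this. Treat it as a sketch, not the repo's actual code: the two-method `WorkRepository`, the `Cache` interface, `workKey`, and the in-memory `mapCache`/`memRepo` are all assumptions made so the example is self-contained. A Redis-backed type would replace `mapCache`.

```go
package main

import (
	"context"
	"fmt"
)

type Work struct {
	ID    string
	Title string
}

// WorkRepository is the domain-facing port (hypothetical, trimmed to two methods).
type WorkRepository interface {
	Get(ctx context.Context, id string) (Work, error)
	Save(ctx context.Context, w Work) error
}

// Cache is a deliberately tiny abstraction (assumption, not a real client API).
type Cache interface {
	Get(key string) (Work, bool)
	Set(key string, w Work)
	Delete(key string)
}

// CachedWorkRepo decorates any WorkRepository with read-through caching.
type CachedWorkRepo struct {
	inner WorkRepository
	cache Cache
}

func workKey(id string) string { return "work:v1:" + id }

// Get serves from cache when possible and fills the cache on a miss.
func (r CachedWorkRepo) Get(ctx context.Context, id string) (Work, error) {
	if w, ok := r.cache.Get(workKey(id)); ok {
		return w, nil
	}
	w, err := r.inner.Get(ctx, id)
	if err != nil {
		return Work{}, err
	}
	r.cache.Set(workKey(id), w)
	return w, nil
}

// Save writes through to the inner repo, then invalidates the one affected key.
func (r CachedWorkRepo) Save(ctx context.Context, w Work) error {
	if err := r.inner.Save(ctx, w); err != nil {
		return err
	}
	r.cache.Delete(workKey(w.ID))
	return nil
}

// In-memory stand-ins so the sketch runs on its own.
type mapCache map[string]Work

func (c mapCache) Get(k string) (Work, bool) { w, ok := c[k]; return w, ok }
func (c mapCache) Set(k string, w Work)      { c[k] = w }
func (c mapCache) Delete(k string)           { delete(c, k) }

type memRepo struct {
	data  map[string]Work
	reads int // counts how often the "database" is actually hit
}

func (m *memRepo) Get(_ context.Context, id string) (Work, error) {
	m.reads++
	w, ok := m.data[id]
	if !ok {
		return Work{}, fmt.Errorf("work %q not found", id)
	}
	return w, nil
}

func (m *memRepo) Save(_ context.Context, w Work) error { m.data[w.ID] = w; return nil }

// demo reads twice, writes once, reads again, and reports inner reads and the final title.
func demo() (int, string) {
	ctx := context.Background()
	inner := &memRepo{data: map[string]Work{"1": {ID: "1", Title: "Old"}}}
	var repo WorkRepository = CachedWorkRepo{inner: inner, cache: mapCache{}}
	_, _ = repo.Get(ctx, "1")                         // miss: hits inner
	_, _ = repo.Get(ctx, "1")                         // hit: served from cache
	_ = repo.Save(ctx, Work{ID: "1", Title: "New"})   // invalidates the key
	w, _ := repo.Get(ctx, "1")                        // miss again: hits inner
	return inner.reads, w.Title
}

func main() {
	reads, title := demo()
	fmt.Println(reads, title) // 2 New
}
```
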
5. **Models package explosion → domain aggregates**

* Current `models/*.go` mixes everything. Group by aggregate (`work`, `author`, `user`, …). Co-locate value objects and invariants. Keep **constructors** that validate invariants (no anemic structs).

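
To make "constructors that validate invariants" concrete, here is a small sketch. The fields (`title`, `language`) and the rules (non-empty title, two-letter language code) are illustrative assumptions, not the real `Work` model; the point is only that an invalid `Work` cannot be constructed from outside the package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrEmptyTitle  = errors.New("work: title must not be empty")
	ErrBadLanguage = errors.New("work: language must be a 2-letter code")
)

// Language is a value object: once constructed it is known to be valid.
type Language struct{ code string }

// NewLanguage normalizes the code and rejects anything that is not two ASCII letters.
func NewLanguage(code string) (Language, error) {
	code = strings.ToLower(strings.TrimSpace(code))
	if len(code) != 2 {
		return Language{}, ErrBadLanguage
	}
	for _, r := range code {
		if r < 'a' || r > 'z' {
			return Language{}, ErrBadLanguage
		}
	}
	return Language{code: code}, nil
}

func (l Language) String() string { return l.code }

// Work keeps its fields unexported so callers must go through NewWork.
type Work struct {
	title    string
	language Language
}

// NewWork is the only way to obtain a Work, so every Work satisfies the invariants.
func NewWork(title, lang string) (*Work, error) {
	title = strings.TrimSpace(title)
	if title == "" {
		return nil, ErrEmptyTitle
	}
	l, err := NewLanguage(lang)
	if err != nil {
		return nil, err
	}
	return &Work{title: title, language: l}, nil
}

func (w *Work) Title() string      { return w.title }
func (w *Work) Language() Language { return w.language }

func main() {
	w, err := NewWork("  Example Title ", "EN")
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	fmt.Println(w.Title(), w.Language()) // Example Title en

	_, err = NewWork("", "en")
	fmt.Println(errors.Is(err, ErrEmptyTitle)) // true
}
```
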
6. **Migrations**

* Move raw SQL to `internal/data/migrations` (or `/migrations` at repo root) and adopt a tool (goose, atlas, migrate). Delete `migrations.go` hand-rollers.
* Version generated `tercul_schema.sql` as **snapshots** in `/ops/migration/outputs/` instead of in runtime code.

7. **Observability**

* Centralize logging (`internal/platform/log`), add request IDs, user IDs (if any), and span IDs.
* Add Prometheus metrics and OpenTelemetry tracing (`internal/observability`). Wire to router and DB.

8. **Config**

* Replace ad-hoc `config/config.go` with strict struct + env parsing + validation (envconfig or koanf). No globals; inject via constructors.

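
A dependency-free sketch of the "strict struct + env parsing + validation" idea from item 8. It uses only the standard library as a stand-in for envconfig or koanf, so it does not show either library's API. The field names and environment variable names (`DATABASE_URL`, `HTTP_PORT`, `CACHE_TTL`) and the defaults are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config is passed to constructors explicitly; there is no package-level global.
type Config struct {
	DatabaseURL string
	HTTPPort    int
	CacheTTL    time.Duration
}

// Load reads settings through lookup (os.LookupEnv in production) and reports
// every problem at once instead of failing on the first one.
func Load(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{HTTPPort: 8080, CacheTTL: 5 * time.Minute} // assumed defaults
	var errs []error

	if v, ok := lookup("DATABASE_URL"); ok && v != "" {
		cfg.DatabaseURL = v
	} else {
		errs = append(errs, errors.New("DATABASE_URL is required"))
	}

	if v, ok := lookup("HTTP_PORT"); ok {
		p, err := strconv.Atoi(v)
		if err != nil || p < 1 || p > 65535 {
			errs = append(errs, fmt.Errorf("HTTP_PORT %q is not a valid port", v))
		} else {
			cfg.HTTPPort = p
		}
	}

	if v, ok := lookup("CACHE_TTL"); ok {
		d, err := time.ParseDuration(v)
		if err != nil || d <= 0 {
			errs = append(errs, fmt.Errorf("CACHE_TTL %q is not a positive duration", v))
		} else {
			cfg.CacheTTL = d
		}
	}

	if err := errors.Join(errs...); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

// fromMap adapts a map to the lookup signature, which keeps Load easy to test.
func fromMap(m map[string]string) func(string) (string, bool) {
	return func(k string) (string, bool) {
		v, ok := m[k]
		return v, ok
	}
}

func main() {
	cfg, err := Load(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```
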
9. **Security**

* Move JWT + middleware under `internal/platform/auth`. Add **authz policy functions** (e.g., `CanEditWork(user, work)`).
* Make resolvers fetch `user` from context once.

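
A sketch of a policy function plus "fetch the user from context once", as described in item 9. The roles, the `OwnerID` field, and the rule itself (admins edit anything, editors edit only their own works) are assumptions chosen for the example; only the name `CanEditWork` comes from the text above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Role string

const (
	RoleReader Role = "reader"
	RoleEditor Role = "editor"
	RoleAdmin  Role = "admin"
)

type User struct {
	ID   string
	Role Role
}

type Work struct {
	ID      string
	OwnerID string
}

type userKey struct{}

// WithUser is what the auth middleware would call after validating the JWT.
func WithUser(ctx context.Context, u User) context.Context {
	return context.WithValue(ctx, userKey{}, u)
}

// UserFrom is the single place resolvers read the authenticated user from.
func UserFrom(ctx context.Context) (User, bool) {
	u, ok := ctx.Value(userKey{}).(User)
	return u, ok
}

// CanEditWork is a pure policy function: no I/O, trivially testable.
// Assumed rule: admins edit anything, editors edit only works they own.
func CanEditWork(u User, w Work) bool {
	switch u.Role {
	case RoleAdmin:
		return true
	case RoleEditor:
		return u.ID == w.OwnerID
	default:
		return false
	}
}

// authorizeEdit combines both pieces the way an app service might.
func authorizeEdit(ctx context.Context, w Work) error {
	u, ok := UserFrom(ctx)
	if !ok {
		return errors.New("unauthenticated")
	}
	if !CanEditWork(u, w) {
		return fmt.Errorf("user %s may not edit work %s", u.ID, w.ID)
	}
	return nil
}

func main() {
	w := Work{ID: "w1", OwnerID: "u1"}
	owner := WithUser(context.Background(), User{ID: "u1", Role: RoleEditor})
	other := WithUser(context.Background(), User{ID: "u2", Role: RoleEditor})

	fmt.Println(authorizeEdit(owner, w))                // <nil>
	fmt.Println(authorizeEdit(other, w))                // user u2 may not edit work w1
	fmt.Println(authorizeEdit(context.Background(), w)) // unauthenticated
}
```
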
10. **Weaviate**

* Put client + schema code in `internal/platform/search`. Provide an interface in `internal/domain/search` only if you truly need to swap engines.

11. **Testing**

* `test/integration`: spin Postgres/Redis via docker-compose; seed minimal fixtures.
* Use `make test-integration` target.
* Favor **table-driven** tests at app layer. Cut duplicated repo tests; test behavior via app services + a `fake` repo.

12. **Delete dead duplication**

* `graph/` vs `graphql/` → one.
* `repositories/*_repository.go` vs `internal/store` → one place: `internal/data/sql`.
* `services/work_service.go` vs resolvers doing business logic → all business logic in `internal/app/*`.

# 3) gqlgen wiring (clean, dependency-safe)

* `internal/adapters/graphql/resolvers.go` should accept a single `Application` façade:

```go
type Application struct {
	Works   app.WorkService
	Authors app.AuthorService
	// ...
}
```

* Construct `Application` in `cmd/api/main.go` by wiring `platform/db`, repos, caches, and services.
* Resolvers never import `platform/*` or `data/*`.

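
As a rough illustration of that dependency direction, the sketch below shows a resolver that only knows the `Application` façade. It does not use gqlgen itself: `Resolver.Work` stands in for a generated query resolver method, and `WorkView`, `GetWork`, and `stubWorks` are hypothetical names invented for the example.

```go
package main

import (
	"context"
	"fmt"
)

// WorkView is a read model returned by the app layer (hypothetical shape).
type WorkView struct {
	ID         string
	Title      string
	AuthorName string
}

// WorkService is the slice of the app layer that resolvers are allowed to see.
type WorkService interface {
	GetWork(ctx context.Context, id string) (WorkView, error)
}

// Application is the façade handed to the GraphQL adapter.
type Application struct {
	Works WorkService
}

// Resolver depends on Application only: no db, cache, or repo imports.
type Resolver struct {
	App Application
}

// Work stands in for a generated query resolver; it delegates and maps, nothing more.
func (r *Resolver) Work(ctx context.Context, id string) (*WorkView, error) {
	v, err := r.App.Works.GetWork(ctx, id)
	if err != nil {
		return nil, err
	}
	return &v, nil
}

// stubWorks plays the role of the service that cmd/api/main.go would wire in.
type stubWorks struct{}

func (stubWorks) GetWork(_ context.Context, id string) (WorkView, error) {
	return WorkView{ID: id, Title: "Example", AuthorName: "Anon"}, nil
}

// lookupTitle builds a resolver the way main.go would and resolves one work.
func lookupTitle(id string) (string, error) {
	r := &Resolver{App: Application{Works: stubWorks{}}}
	w, err := r.Work(context.Background(), id)
	if err != nil {
		return "", err
	}
	return w.Title, nil
}

func main() {
	title, err := lookupTitle("42")
	fmt.Println(title, err) // Example <nil>
}
```

Because the resolver sees only an interface, a table-driven test can swap `stubWorks` for any fake without touching Postgres or Redis.
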
# 4) Background jobs: make them boring & reliable

* `cmd/worker/main.go` loads the same DI container, then registers jobs:

  * `jobs/linguistics.Pipeline` (tokenizer → POS → lemmas → phonetic → analysis repo)
  * `jobs/sync.Entities/Edges`
* Use asynq or a simple cron (robfig/cron) depending on needs. Each job is idempotent and has a **lease** (prevent overlaps).

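
The lease idea can be sketched in a few lines. This version is in-memory and single-process, which is an assumption made to keep it runnable; with several worker replicas the same `TryAcquire` contract would need a shared store (for example Redis or a database lock). `LeaseStore`, `RunExclusive`, and the job name are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// LeaseStore grants at most one holder per job name until the lease expires.
type LeaseStore struct {
	mu      sync.Mutex
	expires map[string]time.Time
}

func NewLeaseStore() *LeaseStore {
	return &LeaseStore{expires: map[string]time.Time{}}
}

// TryAcquire succeeds only if no unexpired lease exists for name.
// The TTL bounds how long a crashed holder can block later runs.
func (s *LeaseStore) TryAcquire(name string, ttl time.Duration, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if exp, ok := s.expires[name]; ok && now.Before(exp) {
		return false
	}
	s.expires[name] = now.Add(ttl)
	return true
}

// Release frees the lease early once the job has finished.
func (s *LeaseStore) Release(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.expires, name)
}

// RunExclusive runs job only when the lease is free; a skipped run is not an error.
func RunExclusive(s *LeaseStore, name string, ttl time.Duration, job func() error) (ran bool, err error) {
	if !s.TryAcquire(name, ttl, time.Now()) {
		return false, nil
	}
	defer s.Release(name)
	return true, job()
}

// demo simulates a second scheduler tick arriving while the first run is in progress.
func demo() (outer, inner, after bool) {
	s := NewLeaseStore()
	outer, _ = RunExclusive(s, "sync.entities", time.Minute, func() error {
		inner, _ = RunExclusive(s, "sync.entities", time.Minute, func() error { return nil })
		return nil
	})
	// Once the first run has released the lease, the next tick runs normally.
	after, _ = RunExclusive(s, "sync.entities", time.Minute, func() error { return nil })
	return outer, inner, after
}

func main() {
	outer, inner, after := demo()
	fmt.Println(outer, inner, after) // true false true
}
```

The lease only prevents overlap. Idempotency is a separate property of the job body itself (for example, upserts keyed on a stable ID), which this sketch does not attempt to show.
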
# 5) Python: isolate and containerize for ops

* Move `data_extractor.py`, `postgres_to_sqlite_converter.py`, etc., into `/ops/migration`.
* Give them their own `Dockerfile.ops` if needed.
* Outputs (`*.json`, `*.md`) should live under `/ops/migration/outputs/`. Do not commit giant blobs into root.

# 6) Incremental migration plan (so you don’t freeze dev)

**Week 1**

* Create new skeleton folders (`cmd`, `internal/platform`, `internal/domain`, `internal/app`, `internal/data`, `internal/adapters/graphql`, `internal/jobs`).
* Move config/log/db/cache/auth into `internal/platform/*`. Add DI wiring in `cmd/api/main.go`.
* Pick and migrate **one aggregate** end-to-end (e.g., `work`): domain entity → repo interface → sql repo → app service (commands/queries) → GraphQL resolvers. Ship.

**Week 2**

* Kill duplicate GraphQL folder. Point gqlgen to the adapters path. Move remaining resolvers to call app services.
* Introduce UoW helper and convert multi-repo write flows.
* Replace `cached_*` repos with decorators.

**Week 3**

* Move background jobs to `cmd/worker` + `internal/jobs/*`.
* Migrations: adopt goose/atlas; relocate SQL; remove `migrations.go`.
* Observability and authz policy pass.

**Week 4**

* Sweep: delete dead packages (`store`, duplicate `repositories`), move Python to `/ops`.
* Add integration tests; lock CI with `make lint test test-integration`.

# 7) A few code-level nits to hunt down

* **Context**: ensure every repo method accepts `context.Context` and respects timeouts.
* **Errors**: wrap with `%w` and define sentinel errors (e.g., `ErrNotFound`). Map to GraphQL errors centrally.
* **Caching keys**: namespace per aggregate + version (e.g., `work:v2:{id}`) so you can invalidate by bumping version.
* **GraphQL N+1**: use dataloaders per aggregate, scoped to request context. Put loader wiring in `internal/adapters/graphql`.
* **Pagination**: choose offset vs cursor (prefer cursor) and make it consistent across queries.
* **ID semantics**: unify UUID vs int64 across domains; add `ID` value object to eliminate accidental mixing.
* **Config for dev/prod**: two Dockerfiles were fine; just move them under `/deploy/docker` and keep env-driven config.
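
Three of the nits above (sentinel errors with `%w`, versioned cache keys, and typed IDs) fit in one short sketch. The names `WorkID`, `AuthorID`, `WorkCacheKey`, `findWork`, and the `NOT_FOUND`/`INTERNAL` codes are illustrative assumptions, not existing identifiers in the repo.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a sentinel that repos wrap and adapters match on.
var ErrNotFound = errors.New("not found")

// WorkID and AuthorID are distinct types, so passing one where the
// other is expected fails at compile time instead of at runtime.
type WorkID string
type AuthorID string

// Bumping workKeyVersion orphans every old key, which invalidates them in bulk.
const workKeyVersion = 2

// WorkCacheKey namespaces the key per aggregate and version.
func WorkCacheKey(id WorkID) string {
	return fmt.Sprintf("work:v%d:%s", workKeyVersion, id)
}

// findWork simulates a repo miss; %w keeps ErrNotFound matchable by callers.
func findWork(id WorkID) error {
	return fmt.Errorf("find work %s: %w", id, ErrNotFound)
}

// toGraphQLCode maps domain errors to client-facing codes in one place.
func toGraphQLCode(err error) string {
	switch {
	case err == nil:
		return "OK"
	case errors.Is(err, ErrNotFound):
		return "NOT_FOUND"
	default:
		return "INTERNAL"
	}
}

func main() {
	err := findWork("42")
	fmt.Println(err)                // find work 42: not found
	fmt.Println(toGraphQLCode(err)) // NOT_FOUND
	fmt.Println(WorkCacheKey("42")) // work:v2:42
}
```
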