turash/docs/concept/09_graph_database_design.md
Damir Mukimov 000eab4740
Major repository reorganization and missing backend endpoints implementation
Repository Structure:
- Move files from cluttered root directory into organized structure
- Create archive/ for archived data and scraper results
- Create bugulma/ for the complete application (frontend + backend)
- Create data/ for sample datasets and reference materials
- Create docs/ for comprehensive documentation structure
- Create scripts/ for utility scripts and API tools

Backend Implementation:
- Implement 3 missing backend endpoints identified in gap analysis:
  * GET /api/v1/organizations/{id}/matching/direct - Direct symbiosis matches
  * GET /api/v1/users/me/organizations - User organizations
  * POST /api/v1/proposals/{id}/status - Update proposal status
- Add complete proposal domain model, repository, and service layers
- Create database migration for proposals table
- Fix CLI server command registration issue

API Documentation:
- Add comprehensive proposals.md API documentation
- Update README.md with Users and Proposals API sections
- Document all request/response formats, error codes, and business rules

Code Quality:
- Follow existing Go backend architecture patterns
- Add proper error handling and validation
- Match frontend expected response schemas
- Maintain clean separation of concerns (handler -> service -> repository)
2025-11-25 06:01:16 +01:00

115 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## 7. Graph Database Design
### Base Technology
**Graph Database Selection**: Start with **Neo4j** for MVP (best documentation, largest ecosystem), plan migration path to **TigerGraph** if scale exceeds 10B nodes.
**Decision Criteria**:
1. **Scalability**:
- Neo4j: Strong until ~50B nodes, then requires clustering
- ArangoDB: Better horizontal scaling
- TigerGraph: Designed for very large graphs (100B+ nodes)
- Memgraph: Fast but less mature ecosystem
2. **Geospatial Support**:
- Neo4j: Requires APOC library + PostGIS integration
- ArangoDB: Built-in geospatial indexes
- TigerGraph: Requires external PostGIS
3. **Query Performance**: Benchmark common queries (5km radius, temporal overlap, quality matching)
4. **Ecosystem**: Community size, cloud managed options, integration with existing stack
5. **Cost**: Licensing, cloud costs, operational complexity
### Relationships
```
(Business)-[:OPERATES_AT]->(Site)
(Site)-[:HOSTS]->(ResourceFlow)
(ResourceFlow)-[:MATCHABLE_TO {efficiency, distance, savings}]->(ResourceFlow)
(Site)-[:HOSTS]->(SharedAsset)
(Business)-[:OFFERS]->(Service)
(Business)-[:SELLS]->(Product)
```
### Hybrid Architecture for Geospatial Queries
**Architecture**:
- **Neo4j**: Stores graph structure, relationships, quality/temporal properties
- **PostgreSQL+PostGIS**: Stores detailed geospatial data, handles complex distance calculations, spatial joins
- **Synchronization**: Event-driven sync (Site created/updated → sync to PostGIS)
**Query Pattern**:
```
1. PostGIS: Find all sites within 5km radius (fast spatial index)
2. Neo4j: Filter by ResourceFlow types, quality, temporal overlap (graph traversal)
3. Join results in application layer or use Neo4j spatial plugin
```
**Alternative**: Use Neo4j APOC spatial procedures if graph is primary store.
### Zone-First Architecture for Data Sovereignty
**Problem**: Global graph vs local adoption conflict - EU-wide matching requires unified schema, but local clusters need low-latency, sovereign data control.
**Solution**: **Zone-first graph architecture** where each geographic/regulatory zone operates semi-autonomously:
**Zone Types**:
- **City Zones**: Municipal boundaries, operated by city governments
- **Industrial Park Zones**: Single park operators, private industrial clusters
- **Regional Zones**: County/state level, cross-municipality coordination
- **Country Zones**: National regulatory compliance, standardized schemas
**Architecture Pattern**:
```
Zone Database (Local Neo4j/PostgreSQL)
├── Local Graph: Sites, flows, businesses within zone
├── Local Rules: Zone-specific matching logic, regulations
├── Selective Publishing: Choose what to expose globally
└── Data Sovereignty: Zone operator controls data visibility
Global Federation Layer
├── Cross-zone matching requests
├── Federated queries (zone A requests zone B data)
├── Anonymized global analytics
└── Selective data sharing agreements
```
**Key Benefits**:
- **Data Sovereignty**: Cities/utilities control their data, GDPR compliance
- **Low Latency**: Local queries stay within zone boundaries
- **Regulatory Flexibility**: Each zone adapts to local waste/energy rules
- **Scalable Adoption**: Start with single zones, federate gradually
- **Trust Building**: Local operators maintain control while enabling cross-zone matches
**Implementation**:
- **Zone Registry**: Global catalog of active zones with API endpoints
- **Federation Protocol**: Standardized cross-zone query interface
- **Data Contracts**: Per-zone agreements on what data is shared globally
- **Migration Path**: Start mono-zone, add federation as network grows
### Indexing Strategy
**Required Indexes**:
- **Spatial Index**: Site locations (latitude, longitude)
- **Temporal Index**: ResourceFlow availability windows, seasonality
- **Composite Indexes**:
- (ResourceFlow.type, ResourceFlow.direction, Site.location)
- (ResourceFlow.quality.temperature_celsius, ResourceFlow.type)
- **Full-Text Search**: Business names, NACE codes, service domains
**Index Maintenance**:
- Monitor query performance and index usage
- Use Neo4j's EXPLAIN PROFILE for query optimization
- Consider partitioning large graphs by geographic regions
### Why Graph DB
Queries like:
"find all output nodes within 5 km producing heat 3560 °C that matches any input nodes needing heat 3055 °C, ΔT ≤ 10 K, availability overlap ≥ 70 %, and net savings > €0.02/kWh."
That's a multi-criteria graph traversal — perfect fit.
---