Repository Structure:
- Move files from cluttered root directory into organized structure
- Create archive/ for archived data and scraper results
- Create bugulma/ for the complete application (frontend + backend)
- Create data/ for sample datasets and reference materials
- Create docs/ for comprehensive documentation structure
- Create scripts/ for utility scripts and API tools
Backend Implementation:
- Implement 3 missing backend endpoints identified in gap analysis:
* GET /api/v1/organizations/{id}/matching/direct - Direct symbiosis matches
* GET /api/v1/users/me/organizations - User organizations
* POST /api/v1/proposals/{id}/status - Update proposal status
- Add complete proposal domain model, repository, and service layers
- Create database migration for proposals table
- Fix CLI server command registration issue
API Documentation:
- Add comprehensive proposals.md API documentation
- Update README.md with Users and Proposals API sections
- Document all request/response formats, error codes, and business rules
Code Quality:
- Follow existing Go backend architecture patterns
- Add proper error handling and validation
- Match frontend expected response schemas
- Maintain clean separation of concerns (handler -> service -> repository)
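As an illustration of that layering, a minimal Go sketch of the proposal status update path (package, type, and function names here are hypothetical, not the actual implementation):
// Hypothetical sketch of the proposal status flow (handler -> service -> repository).
// Names and signatures are illustrative; real code follows the existing backend's conventions.
package proposals
import (
	"context"
	"errors"
	"net/http"
)
type Status string
const (
	StatusPending  Status = "pending"
	StatusAccepted Status = "accepted"
	StatusRejected Status = "rejected"
)
var ErrInvalidStatus = errors.New("invalid proposal status")
// Repository abstracts persistence (backed by the proposals table from the migration).
type Repository interface {
	UpdateStatus(ctx context.Context, id string, status Status) error
}
// Service enforces business rules before touching storage.
type Service struct{ repo Repository }
func (s *Service) UpdateStatus(ctx context.Context, id string, status Status) error {
	switch status {
	case StatusPending, StatusAccepted, StatusRejected:
		return s.repo.UpdateStatus(ctx, id, status)
	default:
		return ErrInvalidStatus
	}
}
// Handler translates HTTP to service calls; routing and JSON decoding omitted for brevity.
type Handler struct{ svc *Service }
func (h *Handler) UpdateProposalStatus(w http.ResponseWriter, r *http.Request, id string, status Status) {
	if err := h.svc.UpdateStatus(r.Context(), id, status); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}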
7. Graph Database Design
Base Technology
Graph Database Selection: Start with Neo4j for the MVP (best documentation, largest ecosystem), with a planned migration path to TigerGraph if scale exceeds 10B nodes.
Decision Criteria:
- Scalability:
  - Neo4j: Strong until ~50B nodes, then requires clustering
  - ArangoDB: Better horizontal scaling
  - TigerGraph: Designed for very large graphs (100B+ nodes)
  - Memgraph: Fast but less mature ecosystem
- Geospatial Support:
  - Neo4j: Requires APOC library + PostGIS integration
  - ArangoDB: Built-in geospatial indexes
  - TigerGraph: Requires external PostGIS
- Query Performance: Benchmark common queries (5 km radius, temporal overlap, quality matching)
- Ecosystem: Community size, cloud managed options, integration with existing stack
- Cost: Licensing, cloud costs, operational complexity
Relationships
(Business)-[:OPERATES_AT]->(Site)
(Site)-[:HOSTS]->(ResourceFlow)
(ResourceFlow)-[:MATCHABLE_TO {efficiency, distance, savings}]->(ResourceFlow)
(Site)-[:HOSTS]->(SharedAsset)
(Business)-[:OFFERS]->(Service)
(Business)-[:SELLS]->(Product)
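How MATCHABLE_TO edges get materialized is not specified above; one plausible shape is a Cypher MERGE that upserts the edge between two flows and sets the computed properties, shown here as a Go constant. The property keys mirror the relationship definition above; the parameter names are assumptions.
// Illustrative only: writes a MATCHABLE_TO edge with the properties listed above.
package graphdb
const upsertMatchEdge = `
MATCH (out:ResourceFlow {id: $outId}), (in:ResourceFlow {id: $inId})
MERGE (out)-[m:MATCHABLE_TO]->(in)
SET m.efficiency = $efficiency, m.distance = $distance, m.savings = $savings
`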
Hybrid Architecture for Geospatial Queries
Architecture:
- Neo4j: Stores graph structure, relationships, quality/temporal properties
- PostgreSQL+PostGIS: Stores detailed geospatial data, handles complex distance calculations, spatial joins
- Synchronization: Event-driven sync (Site created/updated → sync to PostGIS)
Query Pattern:
1. PostGIS: Find all sites within 5km radius (fast spatial index)
2. Neo4j: Filter by ResourceFlow types, quality, temporal overlap (graph traversal)
3. Join results in application layer or use Neo4j spatial plugin
Alternative: Use Neo4j APOC spatial procedures if graph is primary store.
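A sketch of the query pattern above in Go, assuming the neo4j-go-driver v5 and a database/sql connection to PostGIS. Table, label, and property names (sites, geom, Site.id, ResourceFlow.type) are assumptions for illustration, not the final schema.
// Sketch of the hybrid pattern: PostGIS narrows by radius, Neo4j filters by graph criteria,
// and the results are joined in the application layer.
package matching
import (
	"context"
	"database/sql"
	"github.com/neo4j/neo4j-go-driver/v5/neo4j"
)
// Step 1: PostGIS spatial index finds candidate sites within radiusMeters of (lat, lon).
func nearbySiteIDs(ctx context.Context, db *sql.DB, lat, lon, radiusMeters float64) ([]string, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT id FROM sites
		WHERE ST_DWithin(geom::geography, ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography, $3)`,
		lon, lat, radiusMeters)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
// Step 2: Neo4j traverses flows hosted at those sites and applies type/quality filters.
func heatMatches(ctx context.Context, driver neo4j.DriverWithContext, siteIDs []string) (map[string]float64, error) {
	session := driver.NewSession(ctx, neo4j.SessionConfig{AccessMode: neo4j.AccessModeRead})
	defer session.Close(ctx)
	result, err := session.Run(ctx, `
		MATCH (s:Site)-[:HOSTS]->(out:ResourceFlow {type: 'heat', direction: 'output'})
		      -[m:MATCHABLE_TO]->(in:ResourceFlow {direction: 'input'})<-[:HOSTS]-(t:Site)
		WHERE s.id IN $siteIds
		RETURN s.id AS source, t.id AS target, m.savings AS savings`,
		map[string]any{"siteIds": siteIDs})
	if err != nil {
		return nil, err
	}
	savingsByPair := make(map[string]float64)
	for result.Next(ctx) {
		rec := result.Record()
		src, _ := rec.Get("source")
		tgt, _ := rec.Get("target")
		sav, _ := rec.Get("savings")
		// Sketch-level type assertions; production code would validate these values.
		savingsByPair[src.(string)+"->"+tgt.(string)] = sav.(float64)
	}
	return savingsByPair, result.Err()
}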
Zone-First Architecture for Data Sovereignty
Problem: EU-wide matching requires a unified schema, but local clusters need low-latency queries and sovereign control of their data (the global graph vs. local adoption conflict).
Solution: Zone-first graph architecture where each geographic/regulatory zone operates semi-autonomously:
Zone Types:
- City Zones: Municipal boundaries, operated by city governments
- Industrial Park Zones: Single park operators, private industrial clusters
- Regional Zones: County/state level, cross-municipality coordination
- Country Zones: National regulatory compliance, standardized schemas
Architecture Pattern:
Zone Database (Local Neo4j/PostgreSQL)
├── Local Graph: Sites, flows, businesses within zone
├── Local Rules: Zone-specific matching logic, regulations
├── Selective Publishing: Choose what to expose globally
└── Data Sovereignty: Zone operator controls data visibility
Global Federation Layer
├── Cross-zone matching requests
├── Federated queries (zone A requests zone B data)
├── Anonymized global analytics
└── Selective data sharing agreements
Key Benefits:
- Data Sovereignty: Cities/utilities control their data, GDPR compliance
- Low Latency: Local queries stay within zone boundaries
- Regulatory Flexibility: Each zone adapts to local waste/energy rules
- Scalable Adoption: Start with single zones, federate gradually
- Trust Building: Local operators maintain control while enabling cross-zone matches
Implementation:
- Zone Registry: Global catalog of active zones with API endpoints
- Federation Protocol: Standardized cross-zone query interface
- Data Contracts: Per-zone agreements on what data is shared globally
- Migration Path: Start mono-zone, add federation as network grows
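A minimal sketch of what the zone registry entry and the cross-zone query interface could look like in Go; every name here is illustrative, since the federation protocol itself is still to be designed.
// Illustrative types for the zone registry and federation protocol; nothing here is final.
package federation
import (
	"context"
	"time"
)
type ZoneType string
const (
	ZoneCity           ZoneType = "city"
	ZoneIndustrialPark ZoneType = "industrial_park"
	ZoneRegional       ZoneType = "regional"
	ZoneCountry        ZoneType = "country"
)
// ZoneRecord is one entry in the global zone registry.
type ZoneRecord struct {
	ID           string
	Type         ZoneType
	Operator     string
	APIEndpoint  string   // base URL for federated queries
	SharedKinds  []string // resource-flow types the zone agrees to expose (data contract)
	RegisteredAt time.Time
}
// MatchRequest is what zone A sends to zone B when asking for cross-zone candidates.
type MatchRequest struct {
	RequestingZone string
	FlowType       string  // e.g. "heat"
	Direction      string  // "input" or "output"
	RadiusMeters   float64 // search radius around the requesting site
	Lat, Lon       float64
}
// MatchCandidate is intentionally coarse: only what the target zone chose to publish.
type MatchCandidate struct {
	ZoneID     string
	FlowType   string
	DistanceKm float64
	// No site identity unless the zone's data contract allows it.
}
// Federation is the standardized cross-zone query interface each zone implements.
type Federation interface {
	ListZones(ctx context.Context) ([]ZoneRecord, error)
	RequestMatches(ctx context.Context, zoneID string, req MatchRequest) ([]MatchCandidate, error)
}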
Indexing Strategy
Required Indexes:
- Spatial Index: Site locations (latitude, longitude)
- Temporal Index: ResourceFlow availability windows, seasonality
- Composite Indexes:
  - (ResourceFlow.type, ResourceFlow.direction, Site.location)
  - (ResourceFlow.quality.temperature_celsius, ResourceFlow.type)
- Full-Text Search: Business names, NACE codes, service domains
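Sketched as Neo4j 5 index DDL wrapped in a Go string slice. Label and property names are assumptions, e.g. a single point-typed Site.location property rather than separate latitude/longitude fields, and a separate Site index because a single Neo4j index cannot span two labels.
// Assumed index DDL matching the list above; run once at startup or in a migration.
package graphdb
var indexStatements = []string{
	// Spatial index on site locations (Neo4j point indexes need a point-typed property)
	`CREATE POINT INDEX site_location IF NOT EXISTS FOR (s:Site) ON (s.location)`,
	// Temporal index on flow availability windows
	`CREATE RANGE INDEX flow_availability IF NOT EXISTS FOR (f:ResourceFlow) ON (f.available_from, f.available_to)`,
	// Composite indexes (Site.location lives on Site, so it is indexed separately above)
	`CREATE RANGE INDEX flow_type_direction IF NOT EXISTS FOR (f:ResourceFlow) ON (f.type, f.direction)`,
	`CREATE RANGE INDEX flow_temp_type IF NOT EXISTS FOR (f:ResourceFlow) ON (f.temperature_celsius, f.type)`,
	// Full-text search over business names, NACE codes, service domains
	`CREATE FULLTEXT INDEX business_search IF NOT EXISTS FOR (b:Business) ON EACH [b.name, b.nace_code, b.service_domain]`,
}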
Index Maintenance:
- Monitor query performance and index usage
- Use Neo4j's EXPLAIN and PROFILE for query optimization
- Consider partitioning large graphs by geographic regions
Why Graph DB
Queries like: "find all output nodes within 5 km producing heat at 35–60 °C that match any input node needing heat at 30–55 °C, with ΔT ≤ 10 K, availability overlap ≥ 70 %, and net savings > €0.02/kWh."
That's a multi-criteria graph traversal, a natural fit for a graph database.
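One way that example could be expressed against the model above, as a Cypher query held in a Go constant. Assumptions beyond the documented schema: flow temperatures stored as temperature_celsius, MATCHABLE_TO.distance in metres, MATCHABLE_TO.savings in EUR/kWh, and an additional precomputed availability_overlap property in the range 0 to 1.
// Sketch of the multi-criteria matching traversal described above.
package graphdb
const heatMatchQuery = `
MATCH (a:Site)-[:HOSTS]->(out:ResourceFlow {type: 'heat', direction: 'output'})
      -[m:MATCHABLE_TO]->(in:ResourceFlow {type: 'heat', direction: 'input'})<-[:HOSTS]-(b:Site)
WHERE m.distance <= 5000
  AND out.temperature_celsius >= 35 AND out.temperature_celsius <= 60
  AND in.temperature_celsius >= 30 AND in.temperature_celsius <= 55
  AND out.temperature_celsius - in.temperature_celsius <= 10
  AND m.availability_overlap >= 0.7
  AND m.savings > 0.02
RETURN a.id AS source_site, b.id AS target_site, m.savings AS savings_eur_per_kwh
ORDER BY savings_eur_per_kwh DESC
`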