turash/docs/GEOGRAPHICAL_DATA_MIGRATION_PLAN.md
Damir Mukimov 0df4812c82
feat: Complete geographical features implementation with full test coverage
- Add comprehensive geographical data models (GeographicalFeature, TransportMode, TransportProfile, TransportOption)
- Implement geographical feature repository with PostGIS support and spatial queries
- Create transportation service for cost calculation and route optimization
- Build spatial resource matcher for geographical resource matching
- Develop environmental impact service for site environmental scoring
- Implement facility location optimizer with multi-criteria analysis
- Add geographical data migration service for SQLite to PostgreSQL migration
- Create database migrations for geographical features and site footprints
- Update geospatial service integration and server initialization
- Add CLI command for geographical data synchronization
- Implement complete test coverage for all geographical components (28 test cases)
- Update test infrastructure for geographical table creation and PostGIS handling

This implements advanced geospatial capabilities including transportation cost modeling, environmental impact assessment, and facility location optimization for the Turash platform.
2025-11-25 06:42:18 +01:00

295 lines
9.5 KiB
Markdown

# Geographical Data Migration Plan: SQLite OSM Features → PostgreSQL PostGIS
## Executive Summary
This document outlines a strategic plan to migrate valuable geographical and geospatial data from the SQLite database (`bugulma_city_data.db`) to the PostgreSQL PostGIS database for enhanced industrial symbiosis analysis.
## Current Data Landscape
### SQLite Database (`bugulma_city_data.db`)
- **Total OSM Features**: 11,058 records
- **Data Volume**: 32MB with geospatial geometries
- **Geometry Format**: GeoJSON (ready for PostGIS conversion)
#### Feature Breakdown:
| Feature Type | Count | Geometry Type | Avg Size (chars) | Potential Value |
|-------------|-------|---------------|------------------|----------------|
| Buildings | 9,128 | Polygon | 193 | High - facility footprints |
| Roads | 1,644 | LineString/Point | 162 | High - transportation network |
| POIs | 274 | Point | 155 | Medium - service locations |
| Green Spaces | 12 | Polygon | 397 | High - environmental zones |
### PostgreSQL Database (`turash`)
- **Current Sites**: 9,144 (all points only)
- **PostGIS Enabled**: Yes, with spatial indexes
- **Industrial Symbiosis Data**: Organizations, resource flows, trust metrics
- **Missing**: Polygon geometries, road networks, green spaces
## Strategic Value Assessment
### High Priority Features (Immediate Migration)
#### 1. Building Polygons
**Why Migrate:**
- Current sites are only centroids (lat/lng points)
- Polygon footprints enable:
- Accurate area calculations for industrial facilities
- Spatial analysis of facility layouts
- Proximity analysis between buildings
- Land use optimization studies
**Implementation:**
- Upgrade existing 9,144 sites with polygon geometries
- Add building metadata (height, levels, construction type)
- Enable spatial joins with resource flows
#### 2. Road Network (LineStrings)
**Why Migrate:**
- Transportation infrastructure is critical for:
- Logistics optimization
- Supply chain analysis
- Transportation cost modeling
- Emergency response planning
**Data Quality:**
- 1,408 LineString geometries (high-quality road segments)
- 236 Point geometries (road intersections/nodes)
#### 3. Green Spaces (Polygons)
**Why Migrate:**
- Environmental considerations in industrial symbiosis:
- Carbon sequestration analysis
- Biodiversity impact assessment
- Green infrastructure planning
- Recreational space identification
### Medium Priority Features (Phase 2)
#### 4. Points of Interest (POIs)
**Why Migrate:**
- Service location data for:
- Supplier identification
- Service provider networks
- Community resource mapping
- Infrastructure gap analysis
## Technical Implementation Plan
### Phase 1: Core Infrastructure Migration
#### Database Schema Extensions
**New Tables:**
```sql
-- Road network table
CREATE TABLE geographical_features (
id TEXT PRIMARY KEY,
feature_type VARCHAR(50) NOT NULL, -- 'road', 'green_space', 'poi'
name TEXT,
geometry GEOMETRY(GEOGRAPHY, 4326),
properties JSONB,
osm_type TEXT,
osm_id TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Indexes
CREATE INDEX idx_geographical_geometry ON geographical_features USING GIST (geometry);
CREATE INDEX idx_geographical_type ON geographical_features (feature_type);
CREATE INDEX idx_geographical_properties ON geographical_features USING GIN (properties);
```
**Sites Table Enhancement:**
```sql
-- Add polygon geometry to existing sites
ALTER TABLE sites ADD COLUMN IF NOT EXISTS footprint_geometry GEOMETRY(POLYGON, 4326);
CREATE INDEX idx_sites_footprint ON sites USING GIST (footprint_geometry);
-- Add building metadata
ALTER TABLE sites ADD COLUMN IF NOT EXISTS building_height REAL;
ALTER TABLE sites ADD COLUMN IF NOT EXISTS building_levels INTEGER;
ALTER TABLE sites ADD COLUMN IF NOT EXISTS construction_year TEXT;
ALTER TABLE sites ADD COLUMN IF NOT EXISTS roof_type TEXT;
```
#### Data Migration Scripts
**Priority 1: Building Polygons**
```sql
-- Migrate building polygons to sites table
UPDATE sites
SET footprint_geometry = ST_GeomFromGeoJSON(osm_features.geometry)::GEOMETRY(Polygon, 4326)
FROM osm_features
WHERE sites.id = osm_features.id
AND osm_features.feature_type = 'building'
AND ST_IsValid(ST_GeomFromGeoJSON(osm_features.geometry)::GEOMETRY(Polygon, 4326));
```
**Priority 2: Road Network**
```sql
-- Insert road LineStrings
INSERT INTO geographical_features (id, feature_type, geometry, osm_type, osm_id, properties)
SELECT
'road_' || osm_id,
'road',
ST_GeomFromGeoJSON(geometry)::GEOMETRY(LineString, 4326),
osm_type,
osm_id,
properties::JSONB
FROM osm_features
WHERE feature_type = 'road'
AND geometry LIKE '%LineString%';
```
**Priority 3: Green Spaces**
```sql
-- Insert green space polygons
INSERT INTO geographical_features (id, feature_type, geometry, properties)
SELECT
'green_' || ROW_NUMBER() OVER (),
'green_space',
ST_GeomFromGeoJSON(geometry)::GEOMETRY(Polygon, 4326),
properties::JSONB
FROM osm_features
WHERE feature_type = 'green_space';
```
### Phase 2: Advanced Features
#### Spatial Analysis Functions
```sql
-- Building proximity analysis
CREATE OR REPLACE FUNCTION find_nearby_buildings(
target_geom GEOMETRY,
radius_meters REAL DEFAULT 1000
)
RETURNS TABLE (
site_id TEXT,
distance_meters REAL,
site_name TEXT
)
AS $$
BEGIN
RETURN QUERY
SELECT
s.id,
ST_Distance(target_geom::GEOGRAPHY, s.location_geometry::GEOGRAPHY) as distance,
s.name
FROM sites s
WHERE ST_DWithin(target_geom::GEOGRAPHY, s.location_geometry::GEOGRAPHY, radius_meters)
ORDER BY distance;
END;
$$ LANGUAGE plpgsql;
-- Road network analysis
CREATE OR REPLACE FUNCTION calculate_transport_cost(
from_geom GEOMETRY,
to_geom GEOMETRY
)
RETURNS REAL
AS $$
DECLARE
road_distance REAL;
BEGIN
-- Find shortest path along road network
SELECT ST_Length(ST_ShortestLine(from_geom, to_geom)::GEOGRAPHY)
INTO road_distance;
RETURN road_distance * 0.1; -- Example: €0.10 per meter transport cost
END;
$$ LANGUAGE plpgsql;
```
### Phase 3: Integration with Industrial Symbiosis Platform
#### Neo4j Graph Synchronization
- Extend sync service to include geographical features
- Add spatial relationships to graph database
- Enable geospatial queries in Cypher
#### API Enhancements
```go
// New geographical service endpoints
type GeographicalService interface {
FindNearbyFacilities(ctx context.Context, lat, lng float64, radiusMeters float64) ([]Facility, error)
CalculateTransportRoutes(ctx context.Context, from, to Location) (Route, error)
AnalyzeGreenSpaceCoverage(ctx context.Context, area Geometry) (CoverageReport, error)
}
```
## Data Quality Considerations
### Validation Rules
- **Geometry Validity**: ST_IsValid() checks on all imported geometries
- **Coordinate System**: Ensure all data uses WGS84 (SRID 4326)
- **Topology**: Validate polygon orientations and ring ordering
- **Duplicates**: Handle OSM feature duplicates and versioning
### Performance Optimization
- **Spatial Indexes**: GIST indexes on all geometry columns
- **Partitioning**: Consider partitioning large tables by geography
- **Caching**: Cache frequently accessed spatial queries
## Risk Assessment & Mitigation
### Technical Risks
| Risk | Impact | Mitigation |
|------|--------|------------|
| Invalid geometries | Data corruption | Pre-validation before import |
| Performance degradation | Slow queries | Proper indexing strategy |
| Coordinate system mismatch | Incorrect analysis | SRID validation |
| Data volume | Storage issues | Incremental migration approach |
### Business Risks
| Risk | Impact | Mitigation |
|------|--------|------------|
| Incomplete migration | Missing analysis capabilities | Phased approach with testing |
| Data quality issues | Incorrect business decisions | Quality assurance pipeline |
| Integration complexity | System instability | Isolated testing environment |
## Implementation Timeline
### Phase 1 (Weeks 1-2): Foundation
- [ ] Create geographical_features table schema
- [ ] Implement building polygon migration
- [ ] Add spatial indexes and constraints
- [ ] Basic validation and testing
### Phase 2 (Weeks 3-4): Expansion
- [ ] Migrate road network data
- [ ] Add green space polygons
- [ ] Implement spatial analysis functions
- [ ] Performance optimization
### Phase 3 (Weeks 5-6): Integration
- [ ] Extend Neo4j synchronization
- [ ] Add API endpoints
- [ ] Update frontend components
- [ ] Documentation and training
## Success Metrics
### Quantitative Metrics
- **Data Completeness**: 95% of OSM features successfully migrated
- **Geometry Validity**: 99% of imported geometries pass validation
- **Query Performance**: Spatial queries < 500ms average response time
- **Storage Efficiency**: < 20% increase in database size
### Qualitative Metrics
- **Analysis Capabilities**: Enable 5+ new spatial analysis features
- **User Experience**: Improved map visualization and interaction
- **Business Value**: Support for location-based industrial symbiosis decisions
## Conclusion
Migrating geographical data from SQLite to PostgreSQL PostGIS will significantly enhance the Turash platform's analytical capabilities. The focus on building polygons, road networks, and green spaces will enable sophisticated spatial analysis for industrial symbiosis optimization.
**Recommended Approach**: Start with building polygons (highest impact, lowest risk), then expand to road networks and environmental features.
---
*Document Version: 1.0*
*Last Updated: November 2025*
*Author: AI Assistant*