Overview
The Spacedrive semantic tagging system is a graph-based tagging architecture that replaces traditional flat tagging with a semantic structure for content organization. Unlike simple label-based systems, semantic tags support polymorphic naming, context-aware disambiguation, hierarchical relationships, and conflict resolution during synchronization. The system implements the semantic tagging architecture described in the Spacedrive whitepaper, enabling enterprise-grade knowledge management while maintaining an intuitive user experience.
Core Architecture
Design Principles
- Graph-Based DAG Structure - Tags form a directed acyclic graph with closure table optimization
- Polymorphic Naming - Multiple tags can share the same name in different contexts
- Semantic Variants - Each tag supports formal names, abbreviations, and aliases
- Context Resolution - Intelligent disambiguation based on existing tag relationships
- Union Merge Conflicts - Sync conflicts resolved by combining tags (additive approach)
- AI-Native Integration - Built-in confidence scoring and pattern recognition
- Privacy-Aware - Tags support visibility controls and search filtering
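The polymorphic naming and semantic variant principles above can be illustrated with a small lookup index. This is a minimal sketch, not the actual SemanticTag model: the Tag struct, its field names, and the build_name_index helper are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified tag record; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub id: u32,
    pub canonical_name: String,
    /// Namespace that distinguishes tags sharing a canonical name.
    pub namespace: Option<String>,
    pub abbreviation: Option<String>,
    pub aliases: Vec<String>,
}

impl Tag {
    /// Every name this tag can be reached by, lowercased for matching.
    pub fn access_points(&self) -> Vec<String> {
        let mut names = vec![self.canonical_name.to_lowercase()];
        if let Some(abbr) = &self.abbreviation {
            names.push(abbr.to_lowercase());
        }
        for alias in &self.aliases {
            names.push(alias.to_lowercase());
        }
        names
    }
}

/// Maps each variant name to the ids of all tags that use it.
/// A name that maps to more than one id is ambiguous and needs context.
pub fn build_name_index(tags: &[Tag]) -> HashMap<String, Vec<u32>> {
    let mut index: HashMap<String, Vec<u32>> = HashMap::new();
    for tag in tags {
        for name in tag.access_points() {
            let ids = index.entry(name).or_default();
            if !ids.contains(&tag.id) {
                ids.push(tag.id);
            }
        }
    }
    index
}
```

Two tags named "Phoenix" in different namespaces both land under the same key, which is the situation the context resolver is meant to handle.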
Core Components
- SemanticTag - Enhanced tag entity with variants and relationships
- TagRelationship - Typed relationships between tags (parent/child, synonym, related)
- TagClosure - Closure table for efficient hierarchical queries
- TagApplication - Context-aware association of tags with content
- TagUsagePattern - Co-occurrence tracking for intelligent suggestions
- TagContextResolver - Disambiguation engine for ambiguous tag names
Data Models
SemanticTag
The core tag entity with advanced semantic capabilities.
TagType Enum
PrivacyLevel Enum
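As a rough sketch of how a privacy level could drive the search filtering mentioned under Design Principles, assume three hypothetical levels. The variant names (Normal, Archive, Hidden) and both helper functions are illustrative assumptions, not the actual definitions.

```rust
/// Hypothetical privacy levels; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivacyLevel {
    /// Always visible in search results.
    Normal,
    /// Visible only when the caller explicitly includes archived tags.
    Archive,
    /// Never surfaced by search.
    Hidden,
}

impl PrivacyLevel {
    /// Whether a tag at this level should appear in search results.
    pub fn visible_in_search(self, include_archived: bool) -> bool {
        match self {
            PrivacyLevel::Normal => true,
            PrivacyLevel::Archive => include_archived,
            PrivacyLevel::Hidden => false,
        }
    }
}

/// Returns the names of the tags that pass the visibility filter, in input order.
pub fn searchable<'a>(
    tags: &[(&'a str, PrivacyLevel)],
    include_archived: bool,
) -> Vec<&'a str> {
    let mut visible = Vec::new();
    for (name, level) in tags {
        if level.visible_in_search(include_archived) {
            visible.push(*name);
        }
    }
    visible
}
```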
TagRelationship
Defines relationships between tags in the semantic graph.
TagApplication
Context-aware association of tags with user metadata.
Database Schema
Tables Overview
Closure Table Pattern
The closure table answers hierarchical queries with a single indexed lookup (no recursive traversal) by pre-computing all ancestor-descendant relationships.
Key Features
1. Polymorphic Naming
Multiple tags can share the same canonical name when differentiated by namespace.
2. Semantic Variants
Each tag supports multiple access points for flexible user interaction.
3. Context-Aware Resolution
When users type ambiguous tag names, the system resolves them based on existing context:
- Namespace compatibility with existing tags
- Usage patterns from historical co-occurrence
- Hierarchical relationships between tags
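The three signals above could be combined into a simple additive score. The sketch below is one possible reading, not the TagContextResolver implementation: the Candidate and Context types, the integer weights (5, up to 5, and 3), and the rule that a zero score means "still ambiguous" are all assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// A tag that matches the typed name (hypothetical shape).
pub struct Candidate {
    pub id: u32,
    pub namespace: Option<String>,
}

/// What is already known about the content being tagged (hypothetical shape).
pub struct Context {
    /// Namespaces of tags already applied.
    pub namespaces: HashSet<String>,
    /// Ids of tags already applied.
    pub existing: Vec<u32>,
    /// Co-occurrence counts keyed by (smaller id, larger id).
    pub co_occurrence: HashMap<(u32, u32), u32>,
    /// (ancestor, descendant) pairs taken from the closure table.
    pub closure: HashSet<(u32, u32)>,
}

/// Additive score; the weights are arbitrary illustrative choices.
pub fn score(candidate: &Candidate, ctx: &Context) -> u32 {
    let mut total = 0;
    // Signal 1: namespace compatibility with existing tags.
    if let Some(ns) = &candidate.namespace {
        if ctx.namespaces.contains(ns.as_str()) {
            total += 5;
        }
    }
    for &other in &ctx.existing {
        // Signal 2: historical co-occurrence, capped so one pair cannot dominate.
        let key = (candidate.id.min(other), candidate.id.max(other));
        total += ctx.co_occurrence.get(&key).copied().unwrap_or(0).min(5);
        // Signal 3: an ancestor or descendant relationship in either direction.
        if ctx.closure.contains(&(candidate.id, other))
            || ctx.closure.contains(&(other, candidate.id))
        {
            total += 3;
        }
    }
    total
}

/// Picks the best-scoring candidate, or None when no signal applies.
pub fn resolve<'c>(candidates: &'c [Candidate], ctx: &Context) -> Option<&'c Candidate> {
    candidates
        .iter()
        .max_by_key(|c| score(c, ctx))
        .filter(|c| score(c, ctx) > 0)
}
```

Returning None instead of guessing leaves room for the caller to prompt the user when context gives no hint.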
4. Hierarchical Organization
Tags form a directed acyclic graph (DAG) structure supporting:
- Implicit Classification: Tagging with “React” automatically inherits “Frontend”, “Web Development”, etc.
- Semantic Discovery: Searching “Technology” surfaces all descendant content
- Emergent Patterns: System reveals organizational connections users didn’t explicitly create
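Implicit classification and semantic discovery both reduce to lookups against pre-computed ancestor rows. The following sketch assumes a hypothetical in-memory ClosureRow type standing in for rows of the closure table; the function names are illustrative only.

```rust
use std::collections::BTreeSet;

/// One closure row: `ancestor` is `depth` steps above `descendant`.
/// Hypothetical in-memory stand-in for a row of the closure table.
pub struct ClosureRow {
    pub ancestor: u32,
    pub descendant: u32,
    pub depth: u32,
}

/// Implicit classification: the applied tags plus every ancestor of each.
pub fn effective_tags(applied: &[u32], closure: &[ClosureRow]) -> BTreeSet<u32> {
    let mut tags: BTreeSet<u32> = applied.iter().copied().collect();
    for row in closure {
        if applied.contains(&row.descendant) {
            tags.insert(row.ancestor);
        }
    }
    tags
}

/// Semantic discovery: content matches a search for `tag` if it carries
/// that tag directly or carries any of its descendants.
pub fn matches_search(tag: u32, applied: &[u32], closure: &[ClosureRow]) -> bool {
    effective_tags(applied, closure).contains(&tag)
}
```

With Technology (1) above Web Development (2), Frontend (3), and React (4), content tagged only with React matches a search for Technology.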
5. AI Integration
The system supports AI-powered tagging with confidence scoring:
- Confidence Scoring: 0.0-1.0 confidence levels for AI suggestions
- User Review: Low confidence tags require user approval
- Learning Loop: User corrections improve future AI suggestions
- Privacy Options: Local models (Ollama) or cloud APIs with user control
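The review rule above can be sketched as a threshold check on the confidence value. The Suggestion and Decision types and the 0.8 cutoff are assumptions chosen for illustration; the source does not state the actual threshold.

```rust
/// What to do with an AI-suggested tag (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Decision {
    AutoApply,
    NeedsReview,
}

/// An AI suggestion with its confidence score (hypothetical type).
pub struct Suggestion {
    pub tag: String,
    pub confidence: f32,
}

/// Assumed cutoff below which a suggestion is queued for user approval.
pub const REVIEW_THRESHOLD: f32 = 0.8;

/// Validates the 0.0-1.0 range, then routes the suggestion.
pub fn triage(suggestion: &Suggestion) -> Result<Decision, String> {
    // NaN fails the range check as well, so it is rejected here.
    if !(0.0_f32..=1.0).contains(&suggestion.confidence) {
        return Err(format!(
            "confidence for '{}' must be within 0.0-1.0",
            suggestion.tag
        ));
    }
    if suggestion.confidence >= REVIEW_THRESHOLD {
        Ok(Decision::AutoApply)
    } else {
        Ok(Decision::NeedsReview)
    }
}
```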
6. Union Merge Conflict Resolution
During synchronization, tag conflicts are resolved using an additive approach.
Manager Layer
TagManager
Core manager providing high-level tag operations. Located in ops/tags/manager.rs.
TagContextResolver
Handles intelligent disambiguation of ambiguous tag names.
TagUsageAnalyzer
Tracks usage patterns and discovers emergent organizational structures.
UserMetadataManager
Manages user metadata including semantic tag applications. Located in ops/metadata/manager.rs.
Usage Examples
Basic Tag Creation
Building Hierarchies
Applying Tags to Content
Context Resolution
Pattern Discovery
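One way to picture pattern discovery is as pair counting over the tag sets applied to each item. This is a minimal sketch assuming tags are plain integer ids; co_occurrence and frequent_pairs are hypothetical helpers and do not reflect the TagUsageAnalyzer API.

```rust
use std::collections::HashMap;

/// Counts how often each pair of tags appears on the same item.
/// Keys are ordered as (smaller id, larger id).
pub fn co_occurrence(tagged_items: &[Vec<u32>]) -> HashMap<(u32, u32), u32> {
    let mut counts = HashMap::new();
    for tags in tagged_items {
        // Sort and dedup so each pair is counted once per item.
        let mut sorted = tags.clone();
        sorted.sort_unstable();
        sorted.dedup();
        for (i, &a) in sorted.iter().enumerate() {
            for &b in &sorted[i + 1..] {
                *counts.entry((a, b)).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// Pairs seen at least `min_count` times, most frequent first.
pub fn frequent_pairs(
    counts: &HashMap<(u32, u32), u32>,
    min_count: u32,
) -> Vec<((u32, u32), u32)> {
    let mut pairs = Vec::new();
    for (&pair, &count) in counts {
        if count >= min_count {
            pairs.push((pair, count));
        }
    }
    // Sort by descending count, then by pair for a stable order.
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    pairs
}
```

Frequent pairs are natural candidates for suggestions: when one tag of a pair is applied, the other can be offered.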
Integration with Core Systems
Entry-Centric Metadata
Every Entry has immediate metadata capability through the metadata_id field.
Action System Integration
The semantic tagging system integrates with Spacedrive’s Action System for validation, audit logging, and transactional operations:
- Instant Tagging: Files can be tagged immediately upon discovery
- Rich Context: Each tag application includes confidence, source, and attributes
- Sync Integration: Tag applications sync with conflict resolution
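The conflict resolution referred to here is the additive union merge described earlier: tags from both sides are kept rather than one side overwriting the other. A minimal sketch follows, assuming a simplified TagApplication with only three fields; keeping the higher-confidence copy when both devices applied the same tag is an assumed tie-break, not a documented rule.

```rust
use std::collections::BTreeMap;

/// Simplified tag application (hypothetical subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct TagApplication {
    pub tag_id: u32,
    pub confidence: f32,
    pub source: String,
}

/// Union merge: every tag applied on either device survives the sync.
/// When both sides applied the same tag, the higher-confidence copy is kept
/// (an assumed tie-break); on equal confidence the first one seen wins.
pub fn union_merge(local: &[TagApplication], remote: &[TagApplication]) -> Vec<TagApplication> {
    let mut merged: BTreeMap<u32, TagApplication> = BTreeMap::new();
    for app in local.iter().chain(remote.iter()) {
        let keep_existing = merged
            .get(&app.tag_id)
            .map_or(false, |existing| existing.confidence >= app.confidence);
        if !keep_existing {
            merged.insert(app.tag_id, app.clone());
        }
    }
    // BTreeMap iteration yields results ordered by tag id.
    merged.into_values().collect()
}
```

Because the merge only ever adds, applying it in either direction yields the same set of tag ids on both devices.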
Indexing System Integration
The indexing system can trigger automatic tagging during the Intelligence Queueing Phase.
Search Integration
The Temporal-Semantic Search system leverages semantic tags for enhanced discovery.
Sync System Integration
Semantic tags integrate with Library Sync using union merge resolution.
Performance Considerations
Closure Table Benefits
The closure table pattern answers hierarchical queries with a single indexed lookup, with no recursive traversal:
- Ancestor Queries: SELECT * FROM tag_closure WHERE descendant_id = ?
- Descendant Queries: SELECT * FROM tag_closure WHERE ancestor_id = ?
- Path Queries: SELECT * FROM tag_closure WHERE ancestor_id = ? AND descendant_id = ?
- Depth Queries: SELECT * FROM tag_closure WHERE depth = ?
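These queries stay cheap because the cost is paid when a relationship is written. The sketch below shows the usual closure-table maintenance step in memory: adding a parent-child edge records a row for every (ancestor of parent, descendant of child) pair. The TagClosure struct here is a hypothetical in-memory stand-in, and the cycle check reflects the DAG requirement rather than any confirmed implementation detail.

```rust
use std::collections::BTreeMap;

/// In-memory closure table: (ancestor, descendant) -> shortest depth.
/// Hypothetical stand-in for the tag_closure table.
#[derive(Default)]
pub struct TagClosure {
    rows: BTreeMap<(u32, u32), u32>,
}

impl TagClosure {
    /// Registers a tag with its self-referencing row at depth 0.
    pub fn add_tag(&mut self, id: u32) {
        self.rows.insert((id, id), 0);
    }

    /// Adds a parent -> child edge and every implied ancestor-descendant row.
    pub fn add_edge(&mut self, parent: u32, child: u32) -> Result<(), String> {
        if !self.rows.contains_key(&(parent, parent)) || !self.rows.contains_key(&(child, child)) {
            return Err("both tags must be registered first".to_string());
        }
        // If child already reaches parent, the new edge would close a cycle.
        if self.rows.contains_key(&(child, parent)) {
            return Err("edge would create a cycle".to_string());
        }
        // All ancestors of parent (including itself) with their depths.
        let ups: Vec<(u32, u32)> = self
            .rows
            .iter()
            .filter(|(key, _)| key.1 == parent)
            .map(|(key, depth)| (key.0, *depth))
            .collect();
        // All descendants of child (including itself) with their depths.
        let downs: Vec<(u32, u32)> = self
            .rows
            .iter()
            .filter(|(key, _)| key.0 == child)
            .map(|(key, depth)| (key.1, *depth))
            .collect();
        for &(ancestor, up) in &ups {
            for &(descendant, down) in &downs {
                let depth = up + down + 1;
                let entry = self.rows.entry((ancestor, descendant)).or_insert(depth);
                if depth < *entry {
                    *entry = depth; // keep the shortest path in a DAG
                }
            }
        }
        Ok(())
    }

    /// Strict ancestors of `id`, read directly from the stored rows.
    pub fn ancestors(&self, id: u32) -> Vec<u32> {
        self.rows
            .iter()
            .filter(|(key, depth)| key.1 == id && **depth > 0)
            .map(|(key, _)| key.0)
            .collect()
    }

    /// Depth between two tags, if a path exists.
    pub fn depth(&self, ancestor: u32, descendant: u32) -> Option<u32> {
        self.rows.get(&(ancestor, descendant)).copied()
    }
}
```

The trade-off is more rows and more work per write in exchange for reads that never walk the graph.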
Indexing Strategy
Key database indexes for performance.
Full-Text Search
SQLite FTS5 provides efficient text search across all tag variants.
File Organization
The semantic tagging system is organized in the ops/ directory following Spacedrive’s architectural patterns.
Migration Strategy
Since this is a development codebase with no existing users, the semantic tagging system completely replaces the old simple tag system:
- Database Migration: m20250115_000001_semantic_tags.rs creates all new tables
- Clean Implementation: No data migration or backward compatibility needed
- Feature Complete: All whitepaper features available from day one
- Performance Optimized: Built with proper indexing and closure table
- Action Integration: Full integration with Spacedrive’s Action System
Future Enhancements
Planned advanced features building on this foundation:
Enterprise RBAC Integration
Advanced AI Features
- Semantic Similarity: Vector embeddings for content-based tag suggestions
- Temporal Patterns: Time-based usage analysis for lifecycle tagging
- Cross-Library Learning: Federated learning across user libraries (privacy-preserving)
Enhanced Sync Features
- Selective Sync: Choose which tag namespaces to sync across devices
- Conflict Policies: User-configurable resolution strategies
- Audit Trail: Complete history of tag operations across all devices
