Spacedrive’s data model powers a Virtual Distributed File System (VDFS) that unifies files across all your devices. It enables instant organization, content deduplication, and powerful semantic search while maintaining performance at scale.
It is critical to understand the distinction between two data modeling layers in Spacedrive:
Domain Models: These are the rich objects used throughout the application’s business logic. They contain computed fields and methods that provide a high-level interface to the underlying data. For example, the domain::File structure combines data from several database models, such as entities::entry, entities::content_identity, and entities::user_metadata.
Database Entity Models: These are simpler structs that map directly to the database tables (e.g., entities::entry). They represent the raw, persisted state of the data and are optimized for storage and query performance.
The code examples in this document generally refer to the database entity models to accurately represent what is stored on disk. The domain models provide a convenient abstraction over this raw data.
The SdPath enum is the universal addressing system for files across all storage backends:
```rust
pub enum SdPath {
    /// A direct pointer to a file on a specific local device
    Physical {
        device_slug: String, // The device slug (e.g., "jamies-macbook")
        path: PathBuf,       // The local filesystem path
    },
    /// A cloud storage path within a cloud volume
    Cloud {
        service: CloudServiceType, // The cloud service type (S3, GoogleDrive, etc.)
        identifier: String,        // The cloud identifier (bucket name, drive name, etc.)
        path: String,              // The cloud-native path (e.g., "photos/vacation.jpg")
    },
    /// An abstract, location-independent handle via content ID
    Content {
        content_id: Uuid, // The unique content identifier
    },
    /// A derivative data file (thumbnail, OCR text, embedding, etc.)
    Sidecar {
        content_id: Uuid,        // The content this sidecar is derived from
        kind: SidecarKind,       // The type of sidecar (thumb, ocr, embeddings, etc.)
        variant: SidecarVariant, // The specific variant (e.g., "grid@2x", "1080p")
        format: SidecarFormat,   // The storage format (webp, json, msgpack, etc.)
    },
}
```
This enum enables transparent operations across local filesystems, cloud storage, content-addressed files, and derivative data. The Physical variant handles traditional filesystem paths, Cloud manages cloud storage locations, Content enables deduplication-aware operations by referencing files by their content, and Sidecar addresses generated derivative data like thumbnails and embeddings.
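As a rough illustration of how calling code might branch on these variants, the sketch below uses a simplified stand-in for SdPath: IDs and sidecar fields are plain strings so that it compiles with the standard library alone. The AddressKind enum and the address_kind function are hypothetical names introduced for this example and are not part of Spacedrive’s API.

```rust
use std::path::PathBuf;

/// Simplified stand-in for `SdPath`. IDs and sidecar fields are plain
/// strings here (an assumption) so the example needs no external crates.
#[derive(Debug, Clone, PartialEq)]
pub enum SdPath {
    Physical { device_slug: String, path: PathBuf },
    Cloud { service: String, identifier: String, path: String },
    Content { content_id: String },
    Sidecar { content_id: String, kind: String, variant: String, format: String },
}

/// Hypothetical classification: does the path name a concrete storage
/// location, or is it keyed by a content ID that must be resolved first?
#[derive(Debug, PartialEq)]
pub enum AddressKind {
    Location,
    ContentAddressed,
}

pub fn address_kind(path: &SdPath) -> AddressKind {
    match path {
        // Physical and Cloud paths point at a specific place.
        SdPath::Physical { .. } | SdPath::Cloud { .. } => AddressKind::Location,
        // Content and Sidecar paths are both keyed by `content_id`.
        SdPath::Content { .. } | SdPath::Sidecar { .. } => AddressKind::ContentAddressed,
    }
}
```

Because the match is exhaustive, adding a new variant to the enum forces every such function to decide how to handle it at compile time.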
The Entry is the core entity representing a file or directory. The database entity (entities::entry::Model) stores the fundamental hierarchy and metadata.
```rust
pub struct Entry {
    pub id: i32,                   // Database primary key
    pub uuid: Option<Uuid>,        // Global identifier (assigned immediately during indexing)
    pub name: String,              // File or directory name
    pub kind: i32,                 // 0=File, 1=Directory, 2=Symlink
    pub extension: Option<String>, // File extension (without dot)

    // Relationships
    pub parent_id: Option<i32>,   // Parent directory (self-referential)
    pub metadata_id: Option<i32>, // User metadata (when present)
    pub content_id: Option<i32>,  // Content identity (for deduplication)
    pub volume_id: Option<i32>,   // Volume this entry resides on (determines sync ownership)

    // Size and hierarchy
    pub size: i64,           // File size in bytes
    pub aggregate_size: i64, // Total size including children
    pub child_count: i32,    // Direct children count
    pub file_count: i32,     // Total files in subtree

    // Filesystem metadata
    pub permissions: Option<String>, // Unix-style permissions
    pub inode: Option<i64>,          // Platform-specific identifier

    // Timestamps
    pub created_at: DateTime<Utc>,
    pub modified_at: DateTime<Utc>,
    pub accessed_at: Option<DateTime<Utc>>,
    pub indexed_at: Option<DateTime<Utc>>, // When this entry was indexed, used for sync
}
```
Entries inherit sync ownership from their volume, not directly from a device. When you plug a portable drive into a different machine, updating the volume’s device reference instantly transfers ownership of all entries on that volume. No bulk updates are needed. This design enables portable storage to move seamlessly between devices while maintaining correct sync behavior.
All entries receive UUIDs immediately during indexing for UI caching compatibility. However, sync readiness is determined separately:
Directories - Sync ready immediately (no content to identify)
Empty files - Sync ready immediately (size = 0)
Regular files - Sync ready only after content identification (content_id present)
This ensures files sync only after proper content identification, while allowing the UI to cache and track all entries from the moment they’re discovered.
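The three rules above can be sketched as a single predicate. This is a minimal illustration, not Spacedrive’s actual implementation: EntryRow is a cut-down stand-in for the entry model, is_sync_ready is a hypothetical helper name, and the treatment of symlinks is an assumption because the rules above do not mention them.

```rust
/// Value of the `kind` column for directories (0=File, 1=Directory, 2=Symlink).
const KIND_DIRECTORY: i32 = 1;

/// Cut-down stand-in for the entry model with only the fields the check needs.
pub struct EntryRow {
    pub kind: i32,
    pub size: i64,
    pub content_id: Option<i32>,
}

/// Hypothetical helper: returns true when the entry may be synced.
pub fn is_sync_ready(entry: &EntryRow) -> bool {
    // Directories have no content to identify.
    if entry.kind == KIND_DIRECTORY {
        return true;
    }
    // Empty files are ready immediately; other files need a content identity.
    // Assumption: symlinks are treated like regular files in this sketch.
    entry.size == 0 || entry.content_id.is_some()
}
```

Note that the entry’s uuid plays no part in the check: it is assigned at discovery time and says nothing about whether content identification has finished.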
Volumes serve as the ownership anchor for entries. The device_id field determines which device owns all entries on this volume. When a portable drive moves between machines, updating this single field transfers ownership of the entire volume’s contents. See Library Sync for details on portable volume handling.
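To make the indirection concrete, here is a small sketch of resolving an entry’s owner through its volume. The struct shapes are reduced to the fields involved, the integer type of device_id is an assumption, and owning_device is a hypothetical helper rather than a function from the Spacedrive codebase.

```rust
use std::collections::HashMap;

/// Reduced volume model; the `i32` type of `device_id` is an assumption.
pub struct Volume {
    pub id: i32,
    pub device_id: i32,
}

/// Reduced entry model with only the volume reference.
pub struct EntryRow {
    pub id: i32,
    pub volume_id: Option<i32>,
}

/// Hypothetical helper: look up the owning device by following
/// entry -> volume -> device. Returns None if the entry has no volume
/// or the volume is unknown.
pub fn owning_device(entry: &EntryRow, volumes: &HashMap<i32, Volume>) -> Option<i32> {
    let volume = volumes.get(&entry.volume_id?)?;
    Some(volume.device_id)
}
```

Changing device_id on one Volume value changes the result of owning_device for every entry that references it, which is why no entry rows have to be rewritten when a drive moves.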
```sql
-- Photos extension creates:
CREATE TABLE ext_photos_person (
    id BLOB PRIMARY KEY,
    name TEXT NOT NULL,
    birth_date TEXT,
    metadata_id INTEGER NOT NULL,
    FOREIGN KEY (metadata_id) REFERENCES user_metadata(id)
);

CREATE TABLE ext_photos_album (
    id BLOB PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    created_date TEXT,
    metadata_id INTEGER NOT NULL,
    FOREIGN KEY (metadata_id) REFERENCES user_metadata(id)
);
```
Entities: Device, Volume, Location, Entry
Ownership flows through volumes. A device owns its volumes. Locations and entries reference a volume, inheriting ownership from the volume’s device. This indirection enables portable storage: when a drive moves between machines, updating the volume’s device reference transfers ownership of all associated entries instantly.
Only the owning device can modify these resources. Last state wins.
Entities: Tag, UserMetadata, TagRelationship, ContentIdentity
Any device can modify shared resources. Changes are ordered using Hybrid Logical Clocks for consistency across devices.
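For readers unfamiliar with Hybrid Logical Clocks, the following is a generic sketch of the standard algorithm, not Spacedrive’s own implementation. The names Hlc, HlcTimestamp, tick, and observe are illustrative assumptions, and physical time is passed in as a parameter so the behavior is deterministic.

```rust
/// A hybrid timestamp: wall-clock milliseconds plus a logical counter.
/// Derived ordering compares `wall_ms` first, then `counter`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HlcTimestamp {
    pub wall_ms: u64,
    pub counter: u32,
}

/// Per-device clock state (illustrative, not Spacedrive's type).
pub struct Hlc {
    last: HlcTimestamp,
}

impl Hlc {
    pub fn new() -> Self {
        Hlc { last: HlcTimestamp { wall_ms: 0, counter: 0 } }
    }

    /// Stamp a local change. `now_ms` is the device's physical clock.
    pub fn tick(&mut self, now_ms: u64) -> HlcTimestamp {
        if now_ms > self.last.wall_ms {
            self.last = HlcTimestamp { wall_ms: now_ms, counter: 0 };
        } else {
            // Physical clock did not advance: bump the logical counter.
            self.last.counter += 1;
        }
        self.last
    }

    /// Merge a timestamp received from another device, so that later
    /// local changes are ordered after the remote change.
    pub fn observe(&mut self, remote: HlcTimestamp, now_ms: u64) -> HlcTimestamp {
        let wall = now_ms.max(self.last.wall_ms).max(remote.wall_ms);
        let counter = if wall == self.last.wall_ms && wall == remote.wall_ms {
            self.last.counter.max(remote.counter) + 1
        } else if wall == self.last.wall_ms {
            self.last.counter + 1
        } else if wall == remote.wall_ms {
            remote.counter + 1
        } else {
            0
        };
        self.last = HlcTimestamp { wall_ms: wall, counter };
        self.last
    }
}
```

The useful property is that a device whose physical clock lags behind still produces timestamps that sort after any change it has already seen, so causally related edits keep their order across devices.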
Directory Path Table - The full path for every directory is stored in a dedicated directory_paths table. This is the source of truth for directory paths and avoids storing redundant path information on every file entry, making path-based updates significantly more efficient.
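The sketch below illustrates why this layout makes path updates cheap. It models the directory_paths table as an in-memory map from a directory entry’s id to its full path; that column layout, the EntryRow shape, and the full_path helper are assumptions made for the example. It also assumes name already holds the complete file name, whereas the real schema keeps the extension in a separate column.

```rust
use std::collections::HashMap;

/// Reduced entry model. Assumption: `name` is the complete file name.
pub struct EntryRow {
    pub name: String,
    pub parent_id: Option<i32>,
}

/// Hypothetical helper: build a file's full path from its parent
/// directory's stored path. `directory_paths` stands in for the
/// `directory_paths` table (directory entry id -> full path).
pub fn full_path(entry: &EntryRow, directory_paths: &HashMap<i32, String>) -> Option<String> {
    let parent = directory_paths.get(&entry.parent_id?)?;
    Some(format!("{}/{}", parent.trim_end_matches('/'), entry.name))
}
```

Renaming or moving a directory then only touches the rows for that directory and its subdirectories in the map; file entries keep pointing at the same parent_id and pick up the new path the next time it is resolved.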