Announcing Spiral: Data 3.0 with Leading Backing

Date: September 11, 2025
Author: Will Manning
Category: Company

---

Overview

Spiral is introduced as a next-generation data system designed specifically for the "Third Age" of data, which is characterized by AI workloads and machine-scale data consumption. Legacy systems fall short in this new era, which calls for a fresh architectural approach.

---

Three Eras of Data Systems

- First Age: Human-scale inputs and outputs. Databases like Postgres were built for discrete data entered and read by people.
- Second Age: The "Big Data" era, with machine-scale inputs but human-scale outputs. This caused a split between data lakes and data warehouses, and later led to the hybrid Lakehouse model that merges both.
- Third Age: Machine consumers need both machine-scale inputs and machine-scale outputs: not just dashboards or summaries, but everything at massive scale.

---

What Machines Want

- GPUs like the NVIDIA H100 consume up to 4 million 100 KiB images per second.
- Machine workloads require fast scans, lookups, and searches at petabyte to exabyte scale.
- Current tools (like Parquet files on S3) are inefficient, resulting in high latency and low GPU utilization (e.g., 55 hours of network overhead to feed 1 second of GPU compute).
- AI systems demand direct and efficient data access, especially for complex data like vector embeddings, images, and documents.

---

Symptoms of Challenges

- Price-performance issues: Complex, multi-step data loading pipelines waste costly GPU and engineer time.
- Security risks: Existing methods force insecure workarounds, such as granting excessive permissions or leaking sensitive data.
- Performance and security are deeply intertwined; shortcuts taken today add heavy technical debt and limit future features like multi-tenancy.

---

Current Solutions and Their Limits

Lakehouses are a step forward but are still a patchwork of multiple systems with inconsistent APIs and permissions.
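
To put the limits of current tools in numbers, the figures quoted under "What Machines Want" can be worked through as simple arithmetic. The sketch below is only a back-of-the-envelope check: the three input values come from the text, while the reading of the 55-hour figure (one second's worth of images fetched one request at a time) is an assumption, not something the source states.

```python
KIB = 1024  # bytes per KiB

# Figures taken from the text.
images_per_second = 4_000_000   # H100 consumption rate
image_size_bytes = 100 * KIB    # 100 KiB per image
overhead_hours = 55             # network overhead per 1 second of GPU compute

# Data rate needed to keep one GPU fed.
bytes_per_second = images_per_second * image_size_bytes
gb_per_second = bytes_per_second / 1e9

# ASSUMPTION: the 55 hours is the time to fetch one second's worth of
# images sequentially. Under that reading, the implied cost per request is:
overhead_seconds = overhead_hours * 3600
implied_ms_per_image = overhead_seconds / images_per_second * 1000

print(f"Required feed rate: {gb_per_second:.1f} GB/s")
print(f"Implied per-request overhead: {implied_ms_per_image:.1f} ms")
```

Under these assumptions, a single GPU needs roughly 400 GB/s of input, and the quoted overhead corresponds to about 50 ms per request when requests are issued one after another.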
WebDataset and similar solutions work only for simple use cases and lack the performance and governance needed for production AI workloads. AI leaders like OpenAI build custom infrastructure because legacy platforms cannot meet their needs. This problem requires a ground-up rethink, not incremental fixes.

---

Spiral’s Approach: Building for the Future

Spiral developed Vortex, a state-of-the-art columnar file format that has been donated to the Linux Foundation and is endorsed by companies like Microsoft, Snowflake, and Palantir. Vortex offers:

- Compression like Parquet, but with 10-20x faster scans and 5-10x faster writes
- 100-200x faster random access reads (e.g., 1.5 ms vs. 200 ms for Parquet)
- The capability to decode data directly from S3 to GPU, eliminating CPU bottlenecks

Spiral is an object-store-native database using Vortex at its core, delivering:

- Unified governance and security (“fearless permissioning”)
- True machine-scale throughput that saturates GPUs
- One API that handles everything from tiny embeddings to huge video files with no compromise

Spiral also addresses the "uncanny valley" of 1KB–25MB data sizes by smartly storing or batching data as appropriate.

---

What Spiral Delivers in Practice

- Fully saturates GPUs like the H100 with massive data feeds.
- Provides secure data sharing with time-bound, audited, granular permissions.
- Reduces complex multi-step data loading to a single query.
- Frees AI engineers to focus on AI, not infrastructure.

---

The Future is Machine Scale

Spiral focuses on handling complex data (e.g., multimodal AI, robotics) at machine scale, with terabit-per-second throughput and millions of concurrent reads. The gap between AI leaders and laggards is widening; those who solve data infrastructure challenges early will gain a commanding advantage.

Spiral is actively partnering with industry leaders and invites teams spending