# Why ML Needs a New Programming Language

Season 3, Episode 10 | September 3rd, 2025
With Chris Lattner, creator of LLVM, Swift, and now Mojo

---

## Overview

Chris Lattner discusses the challenges of programming modern heterogeneous computing hardware, especially GPUs, and introduces Mojo, a new programming language designed to unlock the full power of AI accelerators while maintaining usability and portability.

---

## Background

- Chris Lattner's career: creator of LLVM, architect of Swift, and contributor to foundational compiler technologies.
- Motivation: existing tools like CUDA are dated, fragmented, and tied closely to specific hardware vendors, making it hard to write performant, portable AI code.
- New challenge: modern AI hardware (e.g., GPUs with tensor cores, TPUs) is complex and diverse, requiring new software approaches.

---

## Structural Problem in AI Compute

- The ecosystem is fragmented: each hardware vendor (Nvidia, AMD, Google) builds its own stack.
- The lack of a unified, vendor-neutral platform limits portability and programmer productivity.
- AI software stacks evolve quickly but remain incompatible with one another, creating complexity and maintenance burdens.
- Modular aims to build a unified platform that competes with vendor-specific tools even on their own hardware.

---

## Modular and Mojo

- Modular: the company founded to solve fragmentation in AI compute software stacks.
- Mojo: a new programming language designed for high-performance, portable heterogeneous computing.
- Goals of Mojo:
  - Full control and performance on hardware, including GPUs and accelerators.
  - Easy integration into Python ecosystems (Python-like syntax).
  - Metaprogramming and type-safe abstractions for domain-specific optimization.
  - The power, control, and predictability often lacking in Python or CUDA-based approaches.

---

## Mojo Language Design Highlights

- Pythonic syntax: familiar to AI researchers and data scientists.
- Static typing and traits: improve safety, make performance guarantees easier, and support generic programming.
- Powerful metaprogramming: compile-time execution of code enables specialization and optimization for diverse hardware platforms.
- Type-safe abstractions: traits, akin to Rust or Swift protocols, enable modular and composable designs.
- Portable intermediate representation: Mojo packages compile to portable code that can be specialized later for each target device (similar in spirit to Java bytecode, but different in design).

---

## Advantages Over Existing Tools

- Unlike CUDA or OpenCL, Mojo aims to be safe, expressive, high-performance, and portable without sacrificing control.
- Avoids the "magic compiler" approach that attempts fully automatic optimization but often breaks unpredictably.
- Lets expert developers encode hardware details directly, without hidden complexity.
- Supports seamless interoperability with Python, allowing gradual migration of slow Python code to Mojo with large speed-ups (10x, 100x, or more).

---

## GPU and Hardware Context

- Modern GPUs have complex architectures, with "warps," "streaming multiprocessors," and tensor cores specialized for the matrix operations central to AI.
- Hardware evolves quickly and breaks backward compatibility (e.g., Nvidia's Blackwell GPUs are incompatible with kernels written for Hopper).
- Other compute devices, such as TPUs and FPGAs, add further complexity with distinct programming models.
- Mojo is designed from the ground up with these hardware realities in mind.

---

## Metaprogramming and Performance

- Metaprogramming merges compile-time and runtime code, allowing generation of highly specialized, zero-overhead GPU kernels.
- One program can adapt to varying hardware layouts, precision formats, and threading models.
- This addresses the high combinatorial complexity of AI kernels and hardware configurations.

---

## Managing Complexity and
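The kernel-specialization idea from the "Metaprogramming and Performance" section can be illustrated with a small sketch. This is plain Python, not Mojo: Mojo resolves such parameters at compile time, whereas here the specialization happens once when the kernel is constructed. The names (`make_axpy`, `unroll`) are illustrative, not APIs from Mojo or Modular.

```python
# Sketch: one generic kernel definition, many specialized instances.
# Mojo would fix `n` and `unroll` at compile time; we emulate that by
# deciding the loop structure once, when the kernel function is built,
# so the returned function carries no per-call configuration checks.

def make_axpy(n: int, unroll: int = 4):
    """Build an axpy kernel (y[i] += a * x[i]) specialized for length n."""
    main = n - (n % unroll)  # trip count for the unrolled portion

    def axpy(a, x, y):
        i = 0
        while i < main:              # unrolled main loop
            for j in range(unroll):
                y[i + j] += a * x[i + j]
            i += unroll
        for k in range(main, n):     # scalar remainder loop
            y[k] += a * x[k]
        return y

    return axpy

# Each configuration yields its own specialized kernel:
kernel_10 = make_axpy(10)
result = kernel_10(2.0, [1.0] * 10, [0.0] * 10)  # every element becomes 2.0
```

The same pattern generalizes to the combinatorial space the notes describe: tile sizes, precision formats, and thread layouts become parameters of the generator rather than branches inside the hot loop.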