From PyTorch to JAX: The Evolution of Ergonomics in Scalable Programming
Two Stanford Grads Transpile Messenger Chats into a Substack Essay using AI
The era of free sequential speedups from Moore’s Law is over: single-core performance has plateaued, forcing software to be explicitly parallelized and scaled across multiple processors. Imperative code that relies on ever-faster sequential CPUs no longer suffices.
Functional programming (FP) has become essential for performant computing post-Moore. By emphasizing immutability, purity, and the avoidance of side effects, FP naturally lends itself to parallelization and scaling. Languages like Haskell bake these concepts in, guiding developers toward scalable code.
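To make this concrete, here is a minimal Python sketch (the score function and the input values are our own illustration, not from the essay): because a pure function touches no shared state, its calls can be farmed out across worker processes without locks or coordination.

```python
from concurrent.futures import ProcessPoolExecutor

def score(x: int) -> int:
    """Pure: the result depends only on the input; nothing outside is read or written."""
    return x * x + 1

if __name__ == "__main__":
    xs = range(1_000)
    # Since `score` has no side effects, its calls can run in any order on any
    # worker process without races, which is the essence of why purity scales.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(score, xs))
    print(results[:5])  # [1, 2, 5, 10, 17]
```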
However, FP is often seen as more difficult than imperative programming. The paradigm feels unnatural to anyone accustomed to mutating state in for loops, and FP ecosystems often lack the mature tooling, such as step-through debuggers, that imperative developers lean on.
Interestingly, Python has drifted towards a more functional style despite its imperative roots. With compute tasks offloaded to performant C libraries, Python code now focuses more on declaratively describing operations on arrays and data. Libraries like NumPy were directly inspired by array programming languages that embodied FP concepts.
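As a small illustration (the temperature conversion is our own toy example, not from the essay): the array-style version declares what should happen to the whole array, and the looping happens inside NumPy's compiled C backend.

```python
import numpy as np

temps_c = np.array([12.0, 18.5, 21.0, 15.2])

# Imperative style: mutate an accumulator one element at a time.
temps_f_loop = []
for t in temps_c:
    temps_f_loop.append(t * 9 / 5 + 32)

# Array style: declare the transformation over the whole array; NumPy's
# C backend performs the loop, so the Python code just describes the operation.
temps_f = temps_c * 9 / 5 + 32
```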
PyTorch exemplifies this trend. Its eager execution model retained Python’s interactivity for fast prototyping. But by building on optimized C++/CUDA libraries, PyTorch code scales seamlessly to GPUs and clusters. This combination made PyTorch the dominant framework for ML research.
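A minimal sketch of what that feels like in practice (the shapes and variable names are ours): eager code runs line by line for prototyping, and moving tensors to a GPU, when one is available, hands the same expression to PyTorch's C++/CUDA kernels.

```python
import torch

# Eager execution: each line runs immediately, so tensors can be printed and
# inspected interactively like ordinary Python values.
x = torch.randn(1024, 1024)
w = torch.randn(1024, 1024)
y = x @ w
print(y.shape)  # torch.Size([1024, 1024])

# The same expression scales to a GPU just by moving the data; the optimized
# C++/CUDA kernels underneath do the heavy lifting.
if torch.cuda.is_available():
    y_gpu = x.cuda() @ w.cuda()
```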
Newer Python ML libraries are pushing further into FP: JAX, and libraries built on it such as Equinox, offer immutable arrays, function transformations like grad, jit, and vmap, and compiled execution. By layering FP abstractions on performant numerical backends, they retain Python’s ease of use while boosting performance and scalability.
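For instance, a short JAX sketch (the loss function is our own toy example) shows the ingredients: a pure function, function transformations such as jax.grad and jax.jit, and immutable arrays that are "updated" by returning new values.

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    """A pure function: no in-place mutation, no hidden state."""
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)       # a functional operator: takes a function, returns a function
fast_grad = jax.jit(grad_loss)   # compiled execution via XLA

w = jnp.ones((3, 3))
x = jnp.ones((5, 3))
g = fast_grad(w, x)

# JAX arrays are immutable: "updates" return a new array rather than mutating.
w2 = w.at[0, 0].set(0.0)
```

Because the loss is pure, transformations like jax.vmap can batch it without worrying about hidden state, which is exactly the property that makes this style scale.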
This shift is driven by hardware changes, not an inherent preference for FP style. Imperative code matched the Moore’s Law era of fast sequential CPUs. But FP, with its emphasis on immutability and explicitly expressed parallelism, maps better to today’s parallel hardware, letting complex parallel algorithms be expressed more simply and scaled more easily.
An interesting analogy can be made between programming languages and linear algebra bases. Just as bases provide alternative data representations, languages enable different ways to model and transform code. Adopting a new language or library is like changing bases — it enables new abstractions to simplify algorithms and data structures.
For example, switching from Python to Haskell is akin to changing from the standard coordinate basis to an eigenbasis. Concepts that are tangled when expressed imperatively become elegant functionally, just as an eigendecomposition simplifies repeated coordinate transformations. This basis view explains the allure of new languages and libraries: they reduce complexity by providing better-suited abstractions.
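To ground the analogy, a quick NumPy check (the matrix here is arbitrary, chosen only for illustration): once a symmetric matrix is expressed in its eigenbasis, applying it ten times reduces to raising its eigenvalues to the tenth power.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Change of basis: A = Q @ diag(eigvals) @ Q.T for a symmetric A.
eigvals, Q = np.linalg.eigh(A)

# In the eigenbasis, applying A ten times is just eigvals**10 on the diagonal.
A_pow10_eig = Q @ np.diag(eigvals ** 10) @ Q.T
A_pow10_direct = np.linalg.matrix_power(A, 10)
assert np.allclose(A_pow10_eig, A_pow10_direct)
```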
Our approach to scalability has changed over time. In the past, scaling a system incurred major overhead. Technologies from assembly lines to computers made scaling through parallelization progressively easier, and today’s cloud platforms commoditize low-level scaling, letting developers focus on readable code. In the future, AI systems like GPT may automatically optimize and parallelize code, enabling simple code that scales on its own.
Debugging FP code can be challenging compared to imperative code. The layered, step-by-step development style common in imperative languages is hard to replicate with pure FP, and FP libraries tend to present only polished final-form code, hiding the intermediate steps that produced it and giving a misleading picture of how such code is actually developed. Tools like Debux, which trace intermediate values, help debug complex pipelines. Integrating FP libraries into imperative codebases requires care to bridge these differences in development style.
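As a sketch of the idea behind such tracing tools, here is a plain-Python tap helper (our own toy, not Debux's actual API) that passes values through unchanged while printing them, making the intermediate stages of a pipeline visible.

```python
from functools import reduce

def tap(label):
    """Return an identity function that prints whatever flows through it."""
    def _tap(value):
        print(f"{label}: {value}")
        return value
    return _tap

def pipeline(xs):
    steps = [
        lambda v: [x * 2 for x in v],
        tap("after doubling"),            # inspect the intermediate result
        lambda v: [x for x in v if x > 4],
        tap("after filtering"),
        sum,
    ]
    return reduce(lambda acc, step: step(acc), steps, xs)

print(pipeline([1, 2, 3, 4]))  # prints the traced stages, then 14
```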
The long-term impact of FP remains uncertain. Full adoption of languages like Haskell has been slow. But FP features like immutability and purity are becoming mainstream through libraries like NumPy and PyTorch. As parallelism becomes mandatory, FP will likely become the preferred way to write scalable code in a post-Moore world.
This essay is a unique experiment in content generation. Both authors (@yashsavani_, @eating_entropy) have roots in Stanford's computer science program and have since diverged into AI research (a Ph.D. at CMU) and Large Language Model UX; we sought to harness the potential of AI in reimagining the nature of our casual discourse. Drawing from our extensive conversations over Facebook Messenger, we converted our raw chat logs into this essay. The process began by refining the logs with regular expressions to weed out minor inconsistencies. For the transmutation of the chat logs into a cohesive narrative, we employed Claude 2, an AI model known for handling long context lengths. Our choice was driven by necessity: while OpenAI's models cap at 8192 tokens, our chat logs surpassed this limit, nudging us toward Claude 2's extended capabilities. What follows is an AI-assisted distillation of our musings, discussions, and philosophical ponderings, offering readers a glimpse into our thought processes, amplified and restructured by the powers of machine learning. (This paragraph was generated by GPT-4, meaning in this entire piece, we hand-typed only this sentence.)