HackerSqueeze — AI‑curated tech & startup insights

Jane Street’s 2025 intern edition, written by Yaron Minsky (Aug 27, 2025; ~14 min read), recaps a bumper crop of intern projects and three deeper dives into tooling and systems work.

Intern projects (highlights)

- Annie Hu explored neural net sequence models, various compilation toolchains, and optimization strategies.
- Aster Oransoy added build-priority controls to shared action execution services to prevent low-priority builds from delaying high-priority ones.
- Allen Pei built a QuickCheck-like system for testing trading logic by generating randomized event sequences, including shrinking of failing test cases.
- Evan Thompson created an LSP for an OCaml-based CSS extension, including a CSS validator that found many invalid CSS instances.
- Zhibo Chen extended OCaml with a generic form of optional arguments using alternative representations for options.
- Conor Kennedy added predicate pushdown to the internal data warehouse and even sketched a mini query planner for narrowing key ranges.
- Joe Cutler experimented with JIT techniques to speed up the HardCaml simulator toward Verilator-level performance with faster startup times.

Three projects are explored in more depth:

1) Faster (J)SQL evaluation (Leo Gagnon)

- Built a selection-based approach to exploit indexed data structures (like Map.t and Hashtbl.t) when evaluating a JSQL WHERE clause.
- The system determines relevant keys, uses logarithmic lookups, and then filters within the smaller result set, achieving large speedups over linear scans.
- Benchmark results show dramatic gains (e.g., ~700x speedup for a first query targeting MSFT trades, and ~30x for a more complex range-query scenario).
- Involves designing a selection type, extraction/optimization logic, and specialized execution paths for different storage backends; also supports multi-index structures and tuning via heuristics.

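
The selection idea described above can be sketched in a few lines. This is a minimal illustration in Python, not the intern project's OCaml code: the predicate classes (`Eq`, `Between`, `And`), the `Index` class, the `extract_selection` and `intersect` helpers, and the sample trades are all hypothetical. A `dict` plus a sorted key list stands in for `Hashtbl.t` and `Map.t`. The sketch extracts an over-approximation of the relevant keys from the WHERE clause, looks up only those keys, and then re-checks the full predicate on the smaller candidate set.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


# A tiny, hypothetical WHERE-clause AST (not JSQL's real one).
@dataclass(frozen=True)
class Eq:
    field: str
    value: object


@dataclass(frozen=True)
class Between:  # inclusive on both ends
    field: str
    lo: object
    hi: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def matches(pred, row):
    """Evaluate the predicate against one row (a dict)."""
    if isinstance(pred, Eq):
        return row[pred.field] == pred.value
    if isinstance(pred, Between):
        return pred.lo <= row[pred.field] <= pred.hi
    return matches(pred.left, row) and matches(pred.right, row)


# Selections: None = all keys, a frozenset = exact keys, (lo, hi) = key range.
def intersect(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return a & b
    if isinstance(a, frozenset):  # exact keys narrowed by a range
        return frozenset(k for k in a if b[0] <= k <= b[1])
    if isinstance(b, frozenset):
        return intersect(b, a)
    return (max(a[0], b[0]), min(a[1], b[1]))  # two ranges (result may be empty)


def extract_selection(pred, key_field):
    """Over-approximate the set of index keys that can satisfy `pred`."""
    if isinstance(pred, Eq) and pred.field == key_field:
        return frozenset([pred.value])
    if isinstance(pred, Between) and pred.field == key_field:
        return (pred.lo, pred.hi)
    if isinstance(pred, And):
        return intersect(extract_selection(pred.left, key_field),
                         extract_selection(pred.right, key_field))
    return None  # constraint on some other field: no narrowing possible


class Index:
    """Rows grouped by key, with a sorted key list for range lookups."""

    def __init__(self, rows, key_field):
        self.key_field = key_field
        self.keys_visited = 0
        self.by_key = {}
        for row in rows:
            self.by_key.setdefault(row[key_field], []).append(row)
        self.sorted_keys = sorted(self.by_key)

    def query(self, pred):
        sel = extract_selection(pred, self.key_field)
        if sel is None:
            keys = self.sorted_keys                      # fall back to a full scan
        elif isinstance(sel, frozenset):
            keys = [k for k in sel if k in self.by_key]  # point lookups
        else:
            lo, hi = sel                                 # binary search for the range
            start = bisect_left(self.sorted_keys, lo)
            stop = bisect_right(self.sorted_keys, hi)
            keys = self.sorted_keys[start:stop]
        self.keys_visited = len(keys)
        # The selection only over-approximates, so re-check each candidate row.
        return [row for k in keys for row in self.by_key[k] if matches(pred, row)]


# Made-up sample data, for illustration only.
trades = [
    {"symbol": "MSFT", "price": 410, "size": 100},
    {"symbol": "AAPL", "price": 230, "size": 50},
    {"symbol": "MSFT", "price": 412, "size": 20},
    {"symbol": "GOOG", "price": 190, "size": 75},
]
by_symbol = Index(trades, "symbol")
hits = by_symbol.query(And(Eq("symbol", "MSFT"), Between("price", 411, 500)))
```

Here the query touches a single key (`MSFT`) instead of scanning every trade, and the final `matches` pass applies the price constraint that the index on `symbol` cannot express. A real implementation would also need the per-backend execution paths, multi-index support, and heuristics mentioned above, which this sketch leaves out.
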

2) Better Torch bindings (Aryan Khatri)

- Addresses the mismatch between OCaml’s tracing GC and Python’s refcounting GC when driving PyTorch from OCaml.
- Introduces a safe, efficient memory-management model using OxCaml, including withrcscope to mark allocation scopes for tensors and ensure timely deallocation.
- Uses OxCaml’s local mode to prevent tensors from escaping scopes, with examples showing safe, scoped tensor usage and disciplined memory management to avoid expensive GC pauses.
- The effort aims at a practical API upgrade to the open-source Torch bindings, with plans to release it on the project page.

3) Ref-counted objects in shared memory (Anthony Li)

- Seeks to avoid serializing complex shared data by passing pointers across processes via shared memory.
- Requires careful management of object lifetimes since OCaml’s GC is per-process; uses reference counting to control object lifetimes across processes.
- Employs a mode-based approach (including read/write/read_only visibility) to avoid data races and to allow safe sharing of mutable data.
- Introduces a framework of handles (Handle.t) with uniq