Indiana University Bloomington

School of Informatics, Computing, and Engineering

Technical Report TR741:
LoCal: A Language for Programs Operating on Serialized Data

Michael Vollmer, Chaitanya Koparkar, Mike Rainey, Laith Sakka, Milind Kulkarni, Ryan R. Newton
(Apr 2019), 34 pages
In a typical data-processing program, the representation of data in memory is distinct from its representation in a serialized form on disk. The former has pointers and ar- bitrary, sparse layout, facilitating easy manipulation by a program, while the latter is packed contiguously, facilitating easy I/O. We propose a language, LoCal, to unify in-memory and serialized formats. LoCal extends a region calculus into a location calculus, employing a type system that tracks the byte-addressed layout of all heap values. We formalize LoCal and prove type safety, and show how LoCal programs can be inferred from unannotated source terms. We transform the existing Gibbon compiler to use LoCal as an intermediate language, with the goal of achieving a balance between code speed and data compactness by intro- ducing just enough indirection into heap layouts, preserving the asymptotic complexity of traditional representations, but working with mostly or completely serialized data. We show that our approach yields significant performance improve- ment over prior approaches to operating on packed data, without abandoning idiomatic programming with recursive functions.

Available as: