Learning to Predict User-Defined Types

Reviewed by Greg Wilson / 2023-03-01
Keywords: Machine Learning, Types

There has been a lot of excitement (and hype) recently about using large language models (LLMs) to write software. While most attention has focused on the general case, narrower uses are equally interesting. This paper looks at one of those: using a model to predict type annotations that refer to user-defined classes and interfaces in TypeScript. As tools like this go into production, I wonder if the next generation of programming languages will be consciously designed to be easier for models like this to understand…

Kevin Jesse, Premkumar Devanbu, and Anand Ashok Sawant. Learning to predict user-defined types. IEEE Transactions on Software Engineering, 2022. doi:10.1109/tse.2022.3178945 (PDF).

TypeScript is a widely adopted gradually typed language where developers can optionally type variables, functions, parameters and more. Probabilistic type inference approaches with ML (machine learning) work well, especially for commonly occurring types such as boolean, number, and string. TypeScript permits a wide range of types including developer-defined class names and type interfaces. These developer-defined types, termed user-defined types, can be written within the realm of language naming conventions. The set of user-defined types is boundless and existing bounded type guessing approaches are an imperfect solution. Existing works either underperform on user-defined types or ignore user-defined types altogether. This work leverages a BERT-style pre-trained model, with multi-task learning objectives, to learn how to type user-defined classes and interfaces. Thus we present DIVERSETYPER, a solution that explores the diverse set of user-defined types by uniquely aligning class and interface declarations to the places in which they are used. DIVERSETYPER surpasses all existing works including those that model user-defined types.
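
To make the distinction in the abstract concrete, here is a minimal TypeScript sketch of the three situations it describes: a built-in type annotation, a gradually typed (`any`) parameter, and a user-defined type annotation. It is not taken from the paper; `Point`, `scale`, `normUntyped`, and `normTyped` are hypothetical names chosen for illustration.

```typescript
// Hypothetical user-defined interface. The name `Point` is invented by the
// developer, so it cannot belong to any fixed, pre-built vocabulary of types.
interface Point {
  x: number;
  y: number;
}

// Built-in annotations: `number` occurs in nearly every code base, which is
// why such types are comparatively easy for a learned model to guess.
function scale(value: number, factor: number): number {
  return value * factor;
}

// Gradual typing: the parameter is left as `any`, so the compiler checks
// nothing about how `p` is used inside the function.
function normUntyped(p: any): number {
  return Math.sqrt(p.x * p.x + p.y * p.y);
}

// The same function with the user-defined annotation filled in. Recovering
// `Point` here requires linking this use site back to the declaration above.
function normTyped(p: Point): number {
  return Math.sqrt(p.x * p.x + p.y * p.y);
}

console.log(scale(2, 3));
console.log(normUntyped({ x: 3, y: 4 }));
console.log(normTyped({ x: 3, y: 4 }));
```

As I read the abstract, going from `normUntyped` to `normTyped` is roughly the kind of prediction the paper targets: the right answer is a name that exists only in this project, so a model has to relate the declaration of `Point` to the place where it is used.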