Syntactic foundations for machine learning

Bhat, Sooraj
Gray, Alexander G.
Isbell, Charles L.
Machine learning has risen in importance across science, engineering, and business in recent years. Domain experts have begun to understand how their data analysis problems can be solved in a principled and efficient manner using methods from machine learning, with its simultaneous focus on statistical and computational concerns. Moreover, the data in many of these application domains has exploded in availability and scale, further underscoring the need for algorithms that find patterns and trends quickly and correctly. However, most people actually analyzing data today operate far from the expert level. Available statistical libraries and even textbooks contain only a finite sample of the possibilities afforded by the underlying mathematical principles. Ideally, practitioners should be able to do what machine learning experts can do: employ the fundamental principles to experiment with the practically infinite number of possible customized statistical models as well as alternative algorithms for solving them, including advanced techniques for handling massive datasets. This would lead to more accurate models, the ability in some cases to analyze data that was previously intractable, and, if the experimentation can be greatly accelerated, huge gains in human productivity. Fixing this state of affairs involves mechanizing and automating these statistical and algorithmic principles. This task has received little attention because we lack a suitable syntactic representation that is capable of specifying machine learning problems and solutions, so there is no way to encode the principles in question, which are themselves a mapping between problem and solution. This work focuses on providing the foundational layer for enabling this vision, with the thesis that such a representation is possible.
We demonstrate the thesis by defining a syntactic representation of machine learning that is expressive, promotes correctness, and enables the mechanization of a wide variety of useful solution principles.