Which programming error messages are the most common? We
investigate this question, motivated by writing error
explanations for novices. We consider large data sets in Python
and Java that include both syntax and run-time errors. In both
data sets, after grouping essentially identical messages, the
error message frequencies empirically resemble Zipf-Mandelbrot
distributions. We use a maximum-likelihood approach to fit the
distribution parameters. This gives one possible way to contrast
languages or compilers quantitatively.
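To make the fitting step concrete, here is a minimal sketch of a maximum-likelihood fit for a Zipf-Mandelbrot law. It is not the paper's procedure: it assumes the distribution is truncated to the number of observed ranks, with p(k) proportional to (k + q)^(-s), and it searches a coarse grid rather than using a numerical optimizer. The function names, the grids, and the synthetic counts are all hypothetical, chosen only for illustration.

```python
import math


def zm_log_likelihood(counts, s, q):
    """Log-likelihood of rank-ordered counts under a Zipf-Mandelbrot law
    truncated to len(counts) ranks, with p(k) proportional to (k + q) ** -s."""
    ranks = range(1, len(counts) + 1)
    # Log of the normalizing constant over the observed ranks (an assumption:
    # the sketch ignores any unobserved tail).
    log_norm = math.log(sum((k + q) ** -s for k in ranks))
    weighted = sum(c * -s * math.log(k + q) for k, c in zip(ranks, counts))
    return weighted - sum(counts) * log_norm


def fit_zipf_mandelbrot(counts, s_grid, q_grid):
    """Return the (s, q) pair on the grid that maximizes the likelihood."""
    ordered = sorted(counts, reverse=True)  # rank 1 = most frequent message
    return max(
        ((s, q) for s in s_grid for q in q_grid),
        key=lambda sq: zm_log_likelihood(ordered, sq[0], sq[1]),
    )


# Synthetic check: counts generated from known parameters, not real data.
synthetic = [1e6 * (k + 2.0) ** -1.5 for k in range(1, 51)]
s_grid = [1.0 + 0.1 * i for i in range(11)]  # 1.0 .. 2.0
q_grid = [0.5 * i for i in range(9)]         # 0.0 .. 4.0
s_hat, q_hat = fit_zipf_mandelbrot(synthetic, s_grid, q_grid)
print(s_hat, q_hat)
```

A grid search keeps the sketch dependency-free. With real message counts one would more likely pass the negative log-likelihood to a numerical optimizer and fit over the full list of grouped messages, not just the top few.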

IndentationError: unindent does not match any outer indentation level

and the 5 most common in Java are:

Count   Message
702102  cannot find symbol - variable NAME
407776  ';' expected
280874  cannot find symbol - method NAME
197213  cannot find symbol - class NAME
183908  incompatible types

What's more, the message frequencies follow a power-law (or "long tail")
distribution, which suggests that improving the reporting of just a handful
of errors would have a disproportionate effect on usability.
But my favorite part of this paper comes toward the end of Section 3.2:

Can this [relationship] be plausible: is the total number of
possible errors infinite? We will accept this as a reasonable
hypothesis...