Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker:
"Hey, You Have Given Me Too Many Knobs! Understanding and Dealing with Over-Designed Configuration in System Software".
ESEC/FSE'15, August 2015, http://dx.doi.org/10.1145/2786805.2786852.
This paper makes a first step in understanding a fundamental
question of configuration design: "do users really need so many
knobs?" To provide the quantitatively answer, we study the
configuration settings of real-world users, including thousands of
customers of a commercial storage system (Storage-A), and hundreds
of users of two widely-used open-source system software projects.
Our study reveals a series of interesting findings to motivate
software architects and developers to be more cautious and
disciplined in configuration design. Motivated by these findings,
we provide a few concrete, practical guidelines which can
significantly reduce the configuration space. Take Storage-A as an
example, the guidelines can remove 51.9% of its parameters and
simplify 19.7% of the remaining ones with little impact on
existing users. Also, we study the existing configuration
navigation methods in the context of "too many knobs" to
understand their effectiveness in dealing with the over-designed
configuration, and to provide practices for building navigation
support in system software.
I can't write a better summary of the paper than the authors have themselves:
Only a small percentage (6.1%-16.7%) of configuration parameters
are set by the majority of users; a significant percentage (up to
54.1%) of parameters are rarely set by any user.
A small percentage (1.8%-7.8%) of parameters are configured by more than 90% of the users.
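To make the two statistics above concrete, here is a minimal sketch of how a per-parameter "set rate" could be computed from a collection of user configurations. It is not the authors' tooling: the parameter names, the toy data, and the 1% cutoff for "rarely set" are all assumptions chosen for illustration.

```python
from collections import Counter

def set_rates(user_configs, all_params):
    """Return {param: fraction of users who explicitly set it}."""
    known = set(all_params)
    counts = Counter()
    for cfg in user_configs:
        # Count each parameter at most once per user.
        counts.update(set(cfg) & known)
    n = len(user_configs)
    return {p: counts[p] / n for p in all_params}

def summarize(rates, majority=0.5, rare=0.01):
    """Fraction of parameters set by a majority of users, and fraction rarely set.

    The thresholds are assumptions, not values taken from the paper.
    """
    total = len(rates)
    by_majority = sum(1 for r in rates.values() if r > majority)
    rarely = sum(1 for r in rates.values() if r <= rare)
    return by_majority / total, rarely / total

# Hypothetical data: four users, five known parameters.
user_configs = [
    {"port": 8080, "log_level": "info"},
    {"port": 80},
    {"port": 443, "cache_mb": 64},
    {"port": 8080, "log_level": "debug"},
]
all_params = ["port", "log_level", "cache_mb", "gc_threads", "io_sched"]

rates = set_rates(user_configs, all_params)
majority_share, rare_share = summarize(rates)
print(rates)
print(majority_share, rare_share)
```

In this toy data set only one of the five parameters is set by more than half of the users, and two are never set at all, which is the same shape of result the findings describe at much larger scale.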
Software developers often choose more "flexible" data types for
configuration parameters to give users more flexibility of
settings (e.g., using numeric types instead of the simple Boolean
or enumerative ones). However, users seem not to take full
advantage of such flexibility. A significant percentage (up to
47.4%) of numeric parameters have no more than five distinct
settings among all the users' settings.
Similarly, for enumerative parameters with many options, typically
only two to three of the options are actually used by the users,
indicating once again the over-designed flexibility.
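The check behind the last two findings can be sketched as counting how many distinct values each parameter actually takes across users. The code below is an illustrative assumption, not the paper's analysis: the parameter names and data are hypothetical, and the default limit of five simply mirrors the "no more than five distinct settings" wording above.

```python
from collections import defaultdict

def distinct_settings(user_configs):
    """Return {param: number of distinct values seen across all users}."""
    values = defaultdict(set)
    for cfg in user_configs:
        for param, value in cfg.items():
            values[param].add(value)
    return {p: len(v) for p, v in values.items()}

def simplification_candidates(user_configs, numeric_params, max_distinct=5):
    """Numeric parameters that users set to only a handful of values.

    Such parameters might be expressible as a Boolean or a small enumeration.
    Parameters that nobody sets are skipped; they are a separate question.
    """
    counts = distinct_settings(user_configs)
    return sorted(
        p for p in numeric_params if 0 < counts.get(p, 0) <= max_distinct
    )

# Hypothetical data: seven users, two numeric parameters.
configs = [
    {"timeout_s": 30 if i % 2 else 60, "buffer_kb": 64 * (i + 1)}
    for i in range(7)
]

counts = distinct_settings(configs)
candidates = simplification_candidates(
    configs, ["timeout_s", "buffer_kb", "never_set"]
)
print(counts)
print(candidates)
```

Here `timeout_s` only ever takes two values, so it is flagged, while `buffer_kb` takes seven and is left alone.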
Too many knobs do come with a cost: users encounter tremendous
difficulties in knowing which parameters should be set among the
large configuration space. This is reflected by the following two
facts: (1) a significant percentage (up to 48.5%) of configuration
issues are about users' difficulties in finding or setting the
parameters to obtain the intended system behavior; (2) a
significant percentage (up to 53.3%) of configuration errors are
introduced due to users' staying with default values incorrectly.
Configuration parameters with explicit semantics, visible external
impact are set by more users, in comparison to parameters that are
specific to internal system implementation.
The configuration of the studied software can be significantly
simplified by reducing the configuration space both vertically and
horizontally. For Storage-A, 51.9% of the original parameters can
be hidden or removed, and 19.7% of the remaining ones can be
further converted into simpler types, with the impact on fewer
than 1% of the users. The similar reduction rates are also
observed in the other two open-source software.
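Because the second percentage applies to the parameters that remain after the first reduction, it is easy to misread the two figures as additive. A short worked calculation, using only the two Storage-A percentages quoted above, shows the combined effect as a share of the original parameter set.

```python
# Figures quoted above for Storage-A.
removed = 0.519                  # share of original parameters hidden or removed
simplified_of_remaining = 0.197  # share of the *remaining* ones simplified

remaining = 1 - removed                           # still exposed to users
simplified = remaining * simplified_of_remaining  # as a share of the original
untouched = remaining - simplified                # exposed and unchanged

print(f"remaining:  {remaining:.1%}")
print(f"simplified: {simplified:.1%}")
print(f"untouched:  {untouched:.1%}")
```

So about 48.1% of the original parameters stay visible, about 9.5% of the original set is converted to a simpler type, and roughly 38.6% is left exactly as it was.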
Searching user manuals by keywords is not efficient to help users
identify the target parameter(s).
Google search can provide useful information for 46.1%-80.0% of
the historical configuration navigation issues. However, it is
less efficient in navigation parameters of less popular software
or new issues. The majority of resources on the Web that host
useful information for navigation are the contents contributed by
users, such as Q&A forums and blog articles.
Well-engineered NLP-based navigation can return the target
configuration parameter for more than 60% of the historical
navigation issues. Boosting the results with the statistics of
users' configuration settings in the field can significantly
improve the performance of NLP-based navigation.
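The idea of boosting text-based navigation with field statistics can be sketched as follows. This is not the paper's NLP system: plain token overlap (Jaccard similarity) stands in for the NLP component, the multiplicative boost is an assumed scoring rule, and every parameter name, description, and set rate is hypothetical.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank(query, descriptions, set_rate, boost=1.0):
    """Rank parameters for a query; more widely set parameters score higher.

    score = jaccard(query, name + description) * (1 + boost * set_rate)
    """
    q = tokens(query)
    scored = []
    for param, desc in descriptions.items():
        d = tokens(desc) | tokens(param)
        union = q | d
        overlap = len(q & d) / len(union) if union else 0.0
        score = overlap * (1.0 + boost * set_rate.get(param, 0.0))
        scored.append((score, param))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda sp: (-sp[0], sp[1]))
    return [param for score, param in scored if score > 0]

# Hypothetical manual entries and field statistics.
descriptions = {
    "cache.size": "maximum size of the read cache",
    "cache.internal.slab": "size of internal cache slab",
    "log.level": "verbosity of log output",
}
set_rate = {"cache.size": 0.6, "cache.internal.slab": 0.01, "log.level": 0.3}

text_only = rank("increase cache size", descriptions, set_rate, boost=0.0)
boosted = rank("increase cache size", descriptions, set_rate, boost=1.0)
print(text_only)
print(boosted)
```

With text matching alone, the internal parameter ranks first because its description is shorter; once the set rates are factored in, the parameter that users actually configure moves to the top.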
There's lots more in here: discussion of the experimental method
used, a table of recommendations for simplifying configuration whose
points are all grounded in findings, and pointers to related work
(much of which I hadn't seen before). What's more, the supporting
material is in a GitHub repository for those who wish to examine it.