July 2007

Lonely? Dr. Max Diff to the Rescue! Benefits of the MaxDiff Approach Over Standard Rating Exercises

By Shiv Raman, M.A., M.B.A.,
Senior Vice President and Chief Marketing Scientist, GfK V2


I have a terrific idea for an online business in the brave new world of Web 2.0. Let’s find out what busy, career-driven singles really want in a partner, what’s really important to them and create compatibility metrics to match couples. People would pay anything for such a service, right? It’s foolproof!

OK. Reality check. According to Wikipedia, there were at least 844 lifestyle and dating services at the end of 2004. That number is likely closer to a thousand now. So, not a great business idea in the summer of 2007, but I bring this up for another reason. A quick run through some of the major dating sites reveals that they all require prospective singles to fill out an online questionnaire. The questionnaire is presumably used to develop profiles of the respondents and thereby identify the kind of people they are likely to be compatible with. Many of these questions measure the importance of traits such as intelligence, looks, sense of humor and so on. But, just how useful are such measures of importance? Would anyone rate any of these attributes as being less important relative to the others?

My contrived example illustrates two well-known issues in the measurement of stated attribute importance. First is the obvious point that if our attributes are all of the “mom and apple pie” variety, we are unlikely to detect variation in responses. This is the reason online dating sites want to know whether your idea of a good time involves dodging kinkajous and agoutis in the rainforest or sipping mojitos at a candle-lit table. Second, even if attributes cover a wide spectrum of issues, are such ratings-based exercises capable of identifying the top few attributes that truly matter? This latter point is the premise of this column: typical, scale-based questions to measure stated importance are seldom a source of great insight. There are many reasons for the poor information content of such tasks, not the least of which are a) variations in how people interpret and use scales (think Grumpy Gordon with his fours and Sunny Sally with her tens) and b) a general unwillingness on our part to admit that some things really are unimportant (Bigger! Better! Faster! More!).

So what’s a poor researcher to do? She can’t just stop measuring attribute importance – if for nothing else, out of compassion for the legions of research agencies that churn out glossy quadrant maps by the dozen, plotting stated and derived importance with circles and arrows and a paragraph under each one. No, the solution must lie in thwarting the twin evils that plague ratings-based importance measurement, namely, scale usage bias and lack of differentiation.

Such a solution does exist in the form of a technique known as MaxDiff. In the MaxDiff exercise, respondents do not rate attributes on a scale; instead, they evaluate attributes in smaller groups (typically four to six at a time) and pick the most and the least important in each group. This is an “information-rich” task, since for any subset of four attributes, identifying best and worst allows us to infer the ordering for five out of the six possible comparisons; that is, if attribute A is best and D is worst, we automatically know that A>B, A>C, A>D, B>D and C>D. Only the comparison between B and C remains unknown. An experimental design, known as the Balanced Incomplete Block Design, is utilized to ensure that all attributes appear in every position the same number of times to avoid bias arising from context and order effects.
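For readers who like to see the arithmetic, here is a minimal Python sketch of the "five out of six" claim. The function name `implied_comparisons` and the attribute labels are hypothetical, chosen only for illustration; this is not code from any MaxDiff package.

```python
from itertools import combinations


def implied_comparisons(items, best, worst):
    """Return the (winner, loser) pairs implied by one best/worst pick."""
    if best == worst or best not in items or worst not in items:
        raise ValueError("best and worst must be distinct members of items")
    pairs = set()
    for item in items:
        if item != best:
            pairs.add((best, item))    # the best item beats every other item
        if item != best and item != worst:
            pairs.add((item, worst))   # every middle item beats the worst item
    return pairs


items = ["A", "B", "C", "D"]
known = implied_comparisons(items, best="A", worst="D")

# Any pair not covered in either direction is still unknown.
all_pairs = list(combinations(items, 2))
unresolved = [p for p in all_pairs if p not in known and p[::-1] not in known]

print(sorted(known))
print(f"{len(known)} of {len(all_pairs)} comparisons known; unresolved: {unresolved}")
```

With four attributes, a single best/worst pick settles every comparison except the one between the two middle items.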

Sample Input
The respondent evaluates several such subsets. For example, with 15 attributes, a reasonably optimal design is possible with 15 subsets of five attributes each. The resulting data is analogous to discrete choice or conjoint data and is analyzed using a special form of the Multinomial Logit Model. The end result is a complete scoring of the relative value (or importance) of each of the attributes.
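To make the 15-attribute example concrete, here is a rough Python sketch of one way such a design could be built. It uses a simple cyclic construction, which is an assumption on my part and not necessarily how any commercial design software works; the starting subset `BASE_BLOCK` is likewise chosen purely for illustration. A perfectly pair-balanced design is not possible with these numbers, because each pair of attributes would have to appear together 5 × 4 / 14 times, which is not a whole number. That is why "reasonably optimal" is the right goal.

```python
from collections import Counter
from itertools import combinations

N_ATTRIBUTES = 15
BASE_BLOCK = (0, 1, 3, 7, 12)  # assumed starting subset, for illustration only


def cyclic_design(n_attributes, base_block):
    """Build n_attributes subsets by shifting base_block by 0, 1, 2, ... (mod n)."""
    return [tuple((attr + shift) % n_attributes for attr in base_block)
            for shift in range(n_attributes)]


design = cyclic_design(N_ATTRIBUTES, BASE_BLOCK)

# How often does each attribute appear in each position within a subset?
position_counts = Counter(
    (pos, attr) for block in design for pos, attr in enumerate(block)
)

# How often does each pair of attributes appear together in a subset?
pair_counts = Counter(
    frozenset(pair) for block in design for pair in combinations(block, 2)
)

print(len(design), "subsets of", len(BASE_BLOCK), "attributes each")
print("appearances per attribute per position:", set(position_counts.values()))
print("co-occurrences per pair:", sorted(set(pair_counts.values())))
```

In this sketch every attribute shows up five times, once in each position, and every pair of attributes meets either once or twice. A production design would typically also randomize attribute order and subset order across respondents.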

The charts below illustrate stated importance measurements from a traditional ratings-based exercise and those from a MaxDiff exercise. In the ratings-based results, there is little differentiation among the top attributes, and the results are likely to jump around from study to study (i.e., attribute 5, the most important attribute, may or may not be the winner in a subsequent study). In contrast, the MaxDiff output demonstrates a clearly differentiated ordering of the attributes and provides solid evidence for attribute 5’s preeminent role in influencing decisions.


MaxDiff provides unambiguous results as to the attributes that respondents feel are truly important. This is the nature of the exercise: By forcing respondents to identify the most and least important attributes, we are able to avoid the “everything is important” syndrome that afflicts the typical rating task. Another advantage is that the stability of these model-based estimates of importance makes them amenable to tracking over time.

So, is there anything at all that mars this rosy picture? As with most things, the power and value of this approach are not free. First, though the exercise is simple, MaxDiff takes longer to complete than a rating exercise. Second, the analysis is more challenging and requires specialized design and model estimation. Typically, however, this additional complexity is on the scale of the “value-add” analytics that add richness to standard landscaping or ATU studies; in short, a small investment that pays rich dividends immediately and over time.

Helping people find their perfect partners may be beyond the scope of any statistical tool. However, if you simply want to maximize the value of information from your next market research study, count on Dr. Max Diff!

