By Paul Teta, Ph.D.,
Executive Vice President, GfK V2
There is a dirty little secret behind sample size decisions: They’re
subjective. You could spend a lifetime developing theorems, doing power
analyses and programming Monte Carlo simulations, but it will never
remove the fundamental subjectivity that underlies determining the “right”
sample size for a given project.
I mean no offense to my colleagues, but few of us yearned in our youth
to become sampling statisticians. The field is necessarily steeped in
complex mathematical formulae, the results of which are difficult for
the practitioner to apply when evaluating the fundamental question of
how much is enough. As a result, we often…far too often…rely
on project budgets to determine how many respondents to include in research.
Some marketing research suppliers fuel this reliance by providing multiple
sampling alternatives in their proposals without an explanation of the
trade-offs associated with each. Let’s face it, providing a menu
of sample sizes is more often motivated by a desire to hedge bets on
winning work than by a desire to help clients make informed research decisions.
Absent the detail on margin of error, the decision maker has nothing
to go on but price. Why would anyone pay more for the “optimum
n” alternative when the “minimum n” will suffice?
Mind you, budget is a perfectly legitimate part of the equation, but
it must be considered within the context of other design factors, including
the size of the population of interest, the variability of what you
wish to measure and the importance of the business decisions riding
on the research outcomes.
Despite statisticians’ impenetrable lexicon, the logic underlying
sampling is simple. The essential concept is given by the “principle
of aggregation.” Every piece of data that you collect in primary
marketing research consists of some part “truth” and some
part “error.” Error arises from a number of causes. Poor
recollection, respondent fatigue, overstating prescribing intent and
misentering responses are all common examples of measurement error.
Some errors will be positive (for example, the respondent overestimates
use of your product) and some will be negative, as when the
respondent underestimates use of your product. In theory, if the errors are
random rather than systematic and you have an infinite number of respondents,
positive and negative errors will cancel each other out and you will be left
with pure truth. The implication of this principle in practice is that larger
sample sizes will contain less random error than smaller ones. All else being
equal, random error shrinks with the square root of the sample size, so an n
of 50 will carry roughly three times the random error of an n of 500.
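For readers who like to see the principle of aggregation at work, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than real survey data: the "truth" of 10, the error spread of 4, and the function name `mean_abs_error` are all made up for the example.

```python
import random
import statistics

def mean_abs_error(n, truth=10.0, noise_sd=4.0, trials=1000, seed=1):
    """Average distance between the sample mean and the truth for samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each response is the truth plus a random error that is equally
        # likely to be positive or negative (hypothetical values).
        sample = [truth + rng.gauss(0.0, noise_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - truth))
    return statistics.fmean(errors)

if __name__ == "__main__":
    for n in (50, 500):
        print(f"n={n}: average error of the sample mean = {mean_abs_error(n):.3f}")
```

Under these assumptions the average error at n = 500 should come out at roughly a third of the error at n = 50. The sketch only models random error; a systematic bias, such as everyone overstating prescribing intent, would not cancel out no matter how large the sample.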
The principle of aggregation is operationalized through the central
limit theorem (CLT), which many regard as the heart of probability theory.
Statisticians use the CLT to justify mathematically the assumption of
normality that underlies most statistical testing. The normality
assumption is critical because it provides the probabilities (p-values)
that allow us to infer statistical significance. The CLT says that if
we have a sufficient number of independent observations, the averages we
compute from them will be approximately normally distributed, even when
the individual responses are not.
So what number is sufficient for the CLT? Over nearly 300 years
of studying the theorem, statisticians have settled on a rule of thumb
that a sample size of about 30 is usually enough for the normal approximation
to hold (okay, not all statisticians, but that’s a topic for another article). But
common sense dictates that simply because 30 respondents are enough
to “make the math work,” it does not mean that we can get
away with this minimum in applied research. Would you use a novel cardiovascular
agent if it were tested in clinical trials with just 30 subjects?
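The CLT is easy to watch in a simulation. The sketch below is an illustration under stated assumptions, not a proof: it draws deliberately lopsided responses (an exponential distribution, chosen only because it is strongly skewed) and measures how lopsided the sample averages remain as n grows. The helper names `skewness` and `skew_of_sample_means` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: near 0 for a symmetric shape, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def skew_of_sample_means(n, trials=3000, seed=7):
    """Skewness of the means of `trials` samples, each made of n skewed responses."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Exponential responses are an assumed stand-in for skewed survey data.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return skewness(means)

if __name__ == "__main__":
    for n in (1, 5, 30, 200):
        print(f"n={n}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```

For this particular distribution the skewness of the averages falls in proportion to one over the square root of n, so at n = 30 the averages are much closer to a bell curve than the raw responses but are not yet perfectly symmetric. That is consistent with treating 30 as a floor for the math, not a target for the research.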
Sample size decisions must be informed by a power analysis.
Statistical power is broadly defined as our ability to detect an effect,
given that the effect actually exists. For example, if a new pharmaceutical
product is truly superior in efficacy to the existing gold standard,
a power analysis will guide how many patients we need in the clinical
trials to detect this effect.
There are several elements that drive a power analysis. First, consider
the anticipated variability in response, because higher variability
will require larger samples. If you want to understand the prescribing
dynamics in Alzheimer’s disease, you would need to talk to far
fewer physicians than if you want to understand the same dynamics in
hyperlipidemia. There are fewer approved agents in Alzheimer’s
disease than there are in hyperlipidemia, so there will be less variability
in prescribing decisions. Similarly, generalists tend to be more heterogeneous
than specialists, so you should sample more heavily from
primary care physicians.
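Next, decide on the level of confidence with which you are comfortable.
Level of confidence refers to the amount of risk you are willing to
tolerate in your study. For example, a risk of 5 percent (95 percent
confidence) means there is a 1-in-20 chance you will declare an effect
that does not actually exist. The companion risk, failing to find an effect
that does exist, is governed by the power you choose: 80 percent power
means a 1-in-5 chance of missing a real effect.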
Finally, estimate the effect size. The more subtle the effect, the more
difficult it will be to detect. Isolating small anticipated effects
will require more respondents.
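The three elements above (variability, confidence and effect size) can be combined in a short calculation. What follows is a sketch under simplifying assumptions, not a substitute for a statistician: it uses the textbook normal-approximation formula for comparing the means of two equally sized groups with a two-sided test, and the function name `n_per_group` and its default values are illustrative choices.

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Respondents needed per group to detect a difference in means of `effect`.

    Normal-approximation formula: n = 2 * ((z_alpha + z_power) * sd / effect)^2.
    `sd` is the anticipated variability, `alpha` the accepted risk of a false
    positive, and `power` the chance of detecting an effect that really exists.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

if __name__ == "__main__":
    print(n_per_group(effect=0.5, sd=1.0))  # moderate effect
    print(n_per_group(effect=0.2, sd=1.0))  # subtle effect
    print(n_per_group(effect=0.5, sd=2.0))  # same effect, twice the variability
```

With these inputs a moderate effect needs 63 respondents per group, a subtle one needs 393, and doubling the variability roughly quadruples the requirement. Exact methods based on the t-distribution give slightly larger numbers.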
The power analysis results in estimates of the margin of error associated
with various sample sizes. Pollsters frequently report margin of error
to communicate the statistical reliability of their findings, e.g.,
“this poll has a margin of error of plus or minus 4 percent.”
This means that if the poll were conducted 100 times, about 95 of the
results would fall within four points above or below the true
population percentage.
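For a simple percentage, the margin of error can be approximated in a few lines. This is a sketch that assumes a simple random sample and the usual normal approximation; the function name `margin_of_error`, the worst-case default of p = 0.5 and the optional finite population correction are illustrative assumptions, not a description of how any particular pollster computes the figure.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95, population=None):
    """Approximate margin of error for a reported proportion p from n respondents."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: matters when the sample is a sizable
        # share of a small universe, such as a narrow specialty.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

if __name__ == "__main__":
    for n in (50, 500, 600):
        print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
    print(f"n=500 of 2,000: +/- {margin_of_error(500, population=2000) * 100:.1f} points")
```

Under these assumptions a margin of plus or minus 4 points corresponds to roughly 600 respondents, and cutting the margin in half takes about four times the sample.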
Once you have the results of the power analysis, the resultant margin
of error will help you with the most important question: What is riding
on this project? If the research is exploratory, then your tolerance
for risk will be high, so a higher margin of error is acceptable. If,
on the other hand, critical business decisions are on the line, you
are well advised to demand a low margin of error and budget accordingly.