A. C. Brett acbrett@uvic.ca Department of Linguistics
University of Victoria
Clearihue C139
Last updated: 24 December 2005
Frequency and Probability
EVENT: A specified value or collection of values that might be observed or
measured on an identified variable or a collection of variables.
If the population sampled is discrete, an event can consist of one or more discrete values.
If the population is continuous, an event consists of one or more intervals (ranges of
values).
INDEPENDENT EVENTS: Events are independent if the occurrence or observation of
one event does not determine or enable one to predict another event.
If the events of interest are the values on a specific variable, and if the values are
independent, then the measurement of a particular value does not determine or enable one
to anticipate another value on the variable.
FREQUENCY (of an event): The number of times an event is observed (measured)
in a sample. If the sample consists of n observations of a discrete population, then n(i)
can denote the number of times the i-th value is observed or measured in the sample.
(Note that the index i in this context serves only to identify an event consisting of an
observed value.
The index i is not necessarily the value itself.
The index i may also identify a collection of values comprising an event.
These values may be measured on a variable with a nominal, ordinal, interval, or ratio
level of measurement.
Note further that, if the sample is from a continuous population, then the index i may
identify an interval or range of data values.)
RELATIVE FREQUENCY (of an event): If n(i) denotes the frequency observed for the
i-th event (consisting of a measured value, collection, or range of values) in a sample of
size n, then the relative frequency of the i-th event is n(i)/n, which may be denoted by
f(i); that is, f(i) = n(i)/n.
Relative frequency is sometimes treated as a proportion and is represented as a percentage,
which is calculated by multiplying f(i) by 100.
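As a minimal sketch of these two definitions, the Python below counts the frequency n(i) of each value in a small sample and divides by the sample size n to obtain f(i). The sample of word classes and the function name relative_frequencies are invented for illustration and are not part of the original notes.

```python
from collections import Counter

def relative_frequencies(sample):
    """Return (n_i, f_i): frequency and relative frequency of each distinct value."""
    n = len(sample)                       # sample size n
    n_i = Counter(sample)                 # n(i): number of times each value is observed
    f_i = {value: count / n for value, count in n_i.items()}  # f(i) = n(i)/n
    return n_i, f_i

# Hypothetical sample: 10 observations on a nominal variable (word class).
sample = ["noun", "verb", "noun", "adj", "noun",
          "verb", "noun", "adj", "verb", "noun"]

n_i, f_i = relative_frequencies(sample)
for value in sorted(n_i):
    # Print the value, its frequency, its relative frequency, and the percentage.
    print(value, n_i[value], f_i[value], f"{100 * f_i[value]:.0f}%")
```

Here the keys of the dictionaries play the role of the index i: they identify the events without needing to be numbers themselves.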
CUMULATIVE FREQUENCY: If a variable has at least an ordinal level of measurement,
it is possible to accumulate (add up, or compute a "running total" of) the relative
frequencies determined from the observations on the variable for a sample.
This accumulation (running total) begins with the smallest value (the first of the values
in their order according to the scale for an ordinal variable).
If the smallest value is indexed 1, and its relative frequency is f(1), then the
cumulative (relative) frequency is F(1) = f(1).
If the next smallest value is indexed 2, and its relative frequency is f(2), then the
cumulative (relative) frequency for the value is F(2) = f(1) + f(2).
The cumulative (relative) frequency for the third value is F(3) = f(1) + f(2) + f(3).
This procedure for computing a cumulative frequency is continued for all the values
observed on the scale up to the last. If the scale has k distinct values (k is the number
of values, not the sample size n), then F(k) = f(1) + f(2) + f(3) + ... + f(k).
Since the sum of the relative frequencies over all the values on the scale is one,
F(k) = 1.
(The cumulative frequency is sometimes treated as a proportion and is represented as a
percentage, which is computed by multiplying F(i) by 100, where the index i ranges over
all the values on the scale from 1 to k.)
If the sampled population is continuous, then the index identifies intervals or ranges of
values.
If the observations comprise independent events, then the cumulative (relative)
frequency F(i) can be treated as (an estimate of) the probability of observing a value that
is less than or equal to the value indexed i.
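The running total described above can be sketched in Python as follows. The counts for a four-point ordinal scale and the function name cumulative_relative_frequencies are assumptions made for the example; the counts must be listed in scale order, smallest value first.

```python
from itertools import accumulate

def cumulative_relative_frequencies(counts):
    """Given frequencies n(1), n(2), ... in scale order, return (f, F).

    f[i] is the relative frequency of the (i+1)-th value on the scale;
    F[i] is the running total f(1) + ... + f(i+1).
    """
    n = sum(counts)                  # sample size
    f = [c / n for c in counts]      # relative frequencies f(i) = n(i)/n
    F = list(accumulate(f))          # cumulative relative frequencies F(i)
    return f, F

# Hypothetical frequencies for ratings 1 to 4 on an ordinal scale (20 observations).
counts = [2, 5, 8, 5]
f, F = cumulative_relative_frequencies(counts)

for rating, (fi, Fi) in enumerate(zip(f, F), start=1):
    print(rating, round(fi, 2), round(Fi, 2))
```

The last entry of F equals one (up to floating-point rounding), and each F(i) can be read as an estimate of the probability of observing a rating less than or equal to i.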
PROBABILITY (of an event): A number, ranging between zero and one, that may be
estimated (inferred) using the relative frequency of an event determined from a sample.
For an event i, corresponding to a specified value or collection of values, the probability
of the event might be denoted p(i).
If the sample is measured again (on a particular variable), then the probability p(i) of
observing a given value or range of values comprising an event i is equal to the relative
frequency f(i) that was determined from the sample for the value or range of values
comprising the event i; that is, p(i) = f(i).
"Chance" and "likelihood" are common synonyms for the term "probability."
The probability or chance p(i) of observing a specified event i in a population might be
viewed as something like a parameter of the population, in that the more accurately a
sample represents the population from which it is drawn, the better the relative frequency
f(i) determined from the sample estimates the probability p(i) of the event i in the
population; that is, f(i) → p(i).
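One way to see the relative frequency acting as an estimate of a probability is a small simulation. The sketch below assumes a population in which the event "heads" has a known probability p = 0.5 and draws samples of increasing size; the function name, the seed, and the sample sizes are all illustrative choices.

```python
import random

def relative_frequency_of_heads(n, p=0.5, seed=0):
    """Simulate n independent observations and return f = n(heads)/n.

    p is the assumed population probability of heads; seed fixes the
    pseudo-random sequence so that the run is repeatable.
    """
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

# Larger samples typically give relative frequencies closer to p.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Any single run can stray from p, especially for small n, so the printed values should be read as estimates rather than exact probabilities.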
PROBABILITY (of independent events): If events i and j are mutually exclusive (they
cannot both occur in a single observation, as with two distinct values on one variable),
with probabilities p(i) and p(j), respectively, then the probability p(i ∨ j) of observing
either i or j is the sum of the probabilities of i and j; that is, p(i ∨ j) = p(i) + p(j).
Note that "∨" in p(i ∨ j) is read as "or."
(Independence alone does not justify this sum: for independent events that can occur
together, p(i ∨ j) = p(i) + p(j) - p(i) × p(j).)
If n events, indexed 1, 2, ..., n, are mutually exclusive with probabilities
p(1), p(2), ..., p(n), then p(1 ∨ 2 ∨ ... ∨ n) = p(1) + p(2) + ... + p(n),
where p(1 ∨ 2 ∨ ... ∨ n) denotes the probability of observing one of the events, either
event 1, or event 2, or one of the events up to the n-th.
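The addition rule can be checked on a fair six-sided die, whose faces cannot occur together in a single roll. The die, the chosen faces, and the helper name p_or are assumptions made for this sketch; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Assumed population: one roll of a fair die, each face with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
p = {face: Fraction(1, 6) for face in faces}

def p_or(*events):
    """Probability of observing any one of several distinct faces.

    The faces must be distinct, so that the events cannot occur together.
    """
    assert len(set(events)) == len(events), "events must be distinct faces"
    return sum(p[e] for e in events)

p_2_or_5 = p_or(2, 5)      # p(2 or 5) = 1/6 + 1/6
p_any_face = p_or(*faces)  # p(1 or 2 or ... or 6)

print(p_2_or_5)
print(p_any_face)
```

Summing over all six faces gives one, which matches the earlier observation that the relative frequencies over all values on a scale sum to one.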
If events i and j are independent, with probabilities p(i) and p(j), respectively, then
the probability p(i & j) of observing both i and j is the product of the probabilities
of i and j; that is, p(i & j) = p(i) × p(j).
Note that "&" in p(i & j) is read as "and." If n events, indexed 1, 2, ..., n, are
independent with probabilities p(1), p(2), ..., p(n), then p(1 & 2 & ... & n)
= p(1) × p(2) × ... × p(n), where p(1 & 2 & ... & n) denotes
the probability of observing all of the events, including event 1, and event 2, and the
events up to and including the n-th.
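The product rule can be illustrated by enumerating every outcome of two fair dice rolled separately, so that the result on one die does not help predict the other. The dice, the two events chosen, and the helper name prob are assumptions for this sketch.

```python
from fractions import Fraction
from itertools import product

# Assumed population: all 36 equally likely (first die, second die) outcomes.
faces = range(1, 7)
outcomes = list(product(faces, faces))

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_i = prob(lambda o: o[0] == 6)           # event i: first die shows 6
p_j = prob(lambda o: o[1] % 2 == 0)       # event j: second die shows an even face
p_i_and_j = prob(lambda o: o[0] == 6 and o[1] % 2 == 0)  # both i and j

print(p_i, p_j, p_i_and_j)
print(p_i_and_j == p_i * p_j)
```

Counting outcomes directly and multiplying the separate probabilities give the same value here because the two dice do not influence each other; for events that are not independent the two calculations generally differ.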