“Olympic Legacies” Conference, Oxford, March 2008.
Searching for the Greatest Olympic Performances, Using a Complete Summer Olympics Database
By Charles Davis
A complete
digital database has been developed for all performances in the Olympic summer
games from 1896 to 2004, using historical sources and some online material. The
database covers all athletes in all events in all sports, and includes ranks
and performances at all stages including heats, qualification rounds and
preliminary rounds. There are more than 105,000 athletes listed, and more than
275,000 lines of data.
Such
complete digital data allows new forms of analysis. The performances and the
competitiveness of entire fields, not just medallists or finalists, can be
analysed, and changes tracked over time. Relative standard deviation (rsd) of
results measures how closely together results cluster. The lower the rsd, the
more competitive the event; in the 100m sprint in athletics, the rsd for men is
2-3%, while for women it is 3-4%. The rsd for the men’s long jump has declined
from about 6% in 1912-1924 to about 3% in recent Games. The negative impact of boycotts
in 1980 and 1984 can also be seen. Performances for individual champions can
now be measured and compared statistically as z-scores, showing how far above
their contemporaries they stood. Extraordinary z-scores were recorded by Bob
Beamon in the 1968 long jump, Bob Hayes in the 1964 100m, and Wilma Rudolph in
the 1960 100m. The search for exceptional performances is ongoing.
The Database
A complete
digital database has been developed for all performances in the Olympic summer
games from 1896 to 2004. The data has been drawn from printed sources and
some online material. The Maritchev Volumes (Who’s Who in the Summer Olympics 1896-1992) and official reports of Olympic games have been
instrumental in this endeavour. There are more than 105,000 athletes listed,
and more than 275,000 lines of data.
The database
is in a spreadsheet format, with one line per performance in most sports. Each
line names an athlete and gives his or her performance in a specific race or
contest. For a complete event, there are separate entries for each stage:
heats, semi-finals, finals etc.
Each line
names an athlete along with some limited information about that athlete. Date
of birth is given where known, and alternative spellings of names (including
name changes), where they have been recorded, are offered. Names are limited to
the English alphabet. Where athletes have appeared in more than one sport, or
for more than one country, this is recorded. A numbering system has been
developed to simplify the identification of athletes, especially those who
change names or countries.
Tracking
down changes of names and movement of athletes has been a major task in
preparing the database.
The
performances of the athletes in individual events are recorded in fields
covering the year, sport, event, and stage, with results recorded in terms of
ranking and measured performance (times etc). Additional information is also
available, particularly for recent games, including intermediate times in
races. Results for the decathlon, for example, include a breakdown of results
in each event.
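As a rough illustration of the one-line-per-performance layout described above, a single row might be modelled as follows. This is only a sketch: the class name, the field names and the sample rows are assumptions made for illustration, not the database's actual column headings or contents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerformanceRow:
    """One line of the database: one athlete at one stage of one event.

    All field names are hypothetical, chosen to mirror the fields the
    text describes (year, sport, event, stage, rank, performance).
    """
    athlete_id: int                # identifier that survives name or country changes
    name: str
    country: str
    year: int
    sport: str
    event: str
    stage: str                     # e.g. "Heat", "Semi-final", "Final"
    rank: Optional[int]            # placing within the stage, if known
    performance: Optional[float]   # measured result (seconds, metres, ...)


def rows_for_event(rows, year, event):
    """Return every stage entry recorded for one event at one Games."""
    return [r for r in rows if r.year == year and r.event == event]


# Made-up sample rows, for illustration only.
sample = [
    PerformanceRow(1, "Athlete A", "AAA", 2000, "Athletics", "100m", "Heat", 1, 10.20),
    PerformanceRow(1, "Athlete A", "AAA", 2000, "Athletics", "100m", "Final", 2, 10.05),
    PerformanceRow(2, "Athlete B", "BBB", 1996, "Athletics", "100m", "Final", 1, 9.95),
]
print(len(rows_for_event(sample, 2000, "100m")))
```

Keeping one row per stage, rather than one row per athlete per event, is what allows heats and qualification rounds to be analysed alongside finals.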
For
one-on-one contest-based sports, such as boxing, judo and wrestling, each bout
gets two entries, one from the perspective of each contestant. For “doubles” events
such as in tennis and badminton, the team mate of the athlete is given, along with the
identities of opponents.
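The two-entries-per-bout convention for one-on-one sports can be sketched as below. The function name, the dictionary keys and the sample bout are hypothetical, used only to show the idea of recording the same contest once from each side.

```python
def bout_entries(year, event, stage, winner, loser):
    """Expand one bout into two rows, one per contestant.

    The dictionary keys are hypothetical, not the database's real columns.
    """
    common = {"year": year, "event": event, "stage": stage}
    return [
        {**common, "athlete": winner, "opponent": loser, "result": "won"},
        {**common, "athlete": loser, "opponent": winner, "result": "lost"},
    ]


# Made-up bout, for illustration only.
entries = bout_entries(2000, "Boxing: Heavyweight", "Final", "Boxer A", "Boxer B")
for entry in entries:
    print(entry["athlete"], entry["result"], "against", entry["opponent"])
```

Storing both perspectives means that every bout an athlete fought can be retrieved by filtering on a single athlete column.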
While individual entry style is maintained for
team events such as relay and rowing events, team sports are treated a little
differently. Each player in a team gets an individual entry, but this does not
contain every team result, although the final placing of the team is given.
Where available, the number of goals, points, etc contributed by the player in
the tournament is given. There are separate records in the database for the
team as a whole, giving all results.
The results
include all “Demonstration” sports. Where a Demonstration sport has gone on to
become an official sport, the status of the sport in specific years is
identified.
Single Games Studies
The
availability of comprehensive data in a single database greatly facilitates
study of statistics of whole fields in events. This opens up many potential
areas for study; one of these is comparison of male and female events.
As an
example, at the Sydney 2000 Games, average performances for men’s and women’s
fields were calculated for a range of events in athletics and swimming. The
differences in performances for track events are summarised in Table 1.
Table 1. Sydney 2000 Track Events: Male and Female Average Performances

| Event | Athletes (M) | Athletes (F) | Average, s (M) | Average, s (F) | Difference (Gold) | Difference (Field) |
|---|---|---|---|---|---|---|
| 100m | 108 | 65 | 10.43 | 11.54 | 8.2% | 9.6% |
| 200m | 55 | 48 | 20.75 | 23.20 | 8.0% | 10.6% |
| 400m | 62 | 52 | 45.97 | 52.52 | 10.7% | 12.5% |
| 800m | 55 | 34 | 107.43 | 122.35 | 10.3% | 12.2% |
| 1500m | 39 | 37 | 219.73 | 250.49 | 13.5% | 12.3% |
| 5000m | 33 | 48 | 819 | 931 | 8.9% | 12.1% |
| 10,000m | 33 | 27 | 1684 | 1912 | 9.9% | 11.9% |
| Marathon | 75 | 40 | 8480 | 9180 | 9.1% | 7.6% |

Note: In most events, some
non-competitive athletes, who did not meet qualifying standards, have been
excluded.
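The paper does not state how the percentage differences in Table 1 are calculated. The field figures appear consistent with expressing the gap as a percentage of the female average, and the sketch below works on that assumption. The function name is hypothetical; the averages are copied from Table 1.

```python
def percent_difference(male_avg, female_avg):
    """Gap between field averages, as a percentage of the female average.

    Assumption: difference = (F - M) / F. This rule is inferred from the
    published figures, not stated in the text.
    """
    return (female_avg - male_avg) / female_avg * 100.0


# Field averages in seconds (male, female), taken from Table 1.
field_averages = {
    "100m": (10.43, 11.54),
    "400m": (45.97, 52.52),
    "Marathon": (8480.0, 9180.0),
}

for event, (m, f) in field_averages.items():
    print(f"{event}: {percent_difference(m, f):.1f}%")
```

Under this assumption the three events come out at 9.6%, 12.5% and 7.6%, matching the Field column of Table 1.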
In most
events, the difference between male and female gold medal winners is less than
the difference between the average performers, suggesting that male events on
the whole are more competitive. This can be better measured by calculating the
relative standard deviation (rsd, a measure of how tightly the performances of
all athletes are clustered together) for each event. A “Competitiveness
Index” (CI), which takes into account both rsd and the number of competitors,
can also be used (Table 2):
CI = (No of competitors)/(rsd x 100)
An event
with a low rsd (performances of most athletes are in a narrow range), and with
a high number of competitors, will have a very high Competitiveness Index.
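A minimal sketch of the rsd and CI calculations defined above is given here. It assumes the sample standard deviation is used (the text does not say whether sample or population), and both fields of 100m times are made up for illustration.

```python
from statistics import mean, stdev


def relative_std_dev(results):
    """rsd as a fraction: standard deviation divided by the mean.

    Assumption: sample standard deviation (statistics.stdev).
    """
    return stdev(results) / mean(results)


def competitiveness_index(results):
    """CI = (number of competitors) / (rsd x 100), as defined in the text."""
    return len(results) / (relative_std_dev(results) * 100.0)


# Made-up 100m times in seconds, for illustration only.
tight_field = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
loose_field = [10.0, 10.4, 10.8, 11.2, 11.6, 12.0]

for label, field in (("tight", tight_field), ("loose", loose_field)):
    print(label, f"rsd={relative_std_dev(field):.3%}",
          f"CI={competitiveness_index(field):.1f}")
```

Because rsd is multiplied by 100, the index is simply the number of competitors divided by the rsd in percent: a field of 108 with an rsd of about 2% gives a CI of about 54.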
Table 2. Competitiveness Indices for selected male and female track events.

| Event | M | F |
|---|---|---|
| 100m | 54.5 | 21.9 |
| 200m | 31.9 | 20.1 |
| 400m | 36.5 | 15.3 |
| Marathon | 17.2 | 10.2 |

It is
interesting that the difference between the fields’ performances is relatively low in
the marathon (7.6%). In fact, there is considerable overlap between the weaker
men’s times and the strongest women’s: the women’s winner, Naoko Takahashi,
would have beaten 31 male competitors home. Such overlap is not seen in most
events; in some, such as the pole vault, there are significant gaps
between the weakest men and strongest women. The reduced difference between men
and women in endurance events is paralleled in swimming, where the
differences between the fields in the 400m freestyle and the medleys are 8.0-8.5%,
while the 50m and 100m freestyle had differences of 12.2% and 10.6%
respectively.
Analysis of
whole fields can also allow comparison between events, and a quantitative
measure of the greatest performances in these sports. By measuring the number
of standard deviations from the average performance achieved by gold
medallists, we produce a score (a type of z-score) that shows us how far above
his or her peers a particular athlete stood. At Sydney, the highest z-scores recorded
in athletics and swimming are given in Table 3.
Table 3. Highest “Z-Scores” in Athletics and Swimming, Sydney 2000

| Event | z-score | Athlete | Country |
|---|---|---|---|
| Men: 400m | 2.72 | Johnson, Michael | United States |
| Men: 400m Freestyle | 2.47 | Thorpe, Ian | Australia |
| Women: 200m | 2.47 | Jones, Marion | United States |
| Men: 100m | 2.40 | Greene, Maurice | United States |
| Men: Long Jump | 2.33 | Pedroso, Ivan | Cuba |
| Women: Triple Jump | 2.30 | Marinova, Tereza | Bulgaria |
| Women: 100m | 2.30 | Jones, Marion | United States |
| Men: 100m Freestyle | 2.23 | Van Den Hoogenband, Pieter | Netherlands |
| Women: 100m Butterfly | 2.20 | De Bruijn, Inge | Netherlands |
| Women: Discus | 2.20 | Zvereva, Ellina | Belarus |

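A z-score of the kind listed in Table 3 could be computed along the following lines. This is a sketch under stated assumptions: the winner is included in the field when the mean and standard deviation are taken, the sample standard deviation is used, and the sign is flipped for timed events so that a better performance always scores higher. The function name and the data are made up.

```python
from statistics import mean, stdev


def winner_z_score(results, lower_is_better=False):
    """Standard deviations between the best result and the field mean.

    Positive values mean the winner was better than the field average.
    Set lower_is_better=True for timed events, where the smallest value wins.
    """
    best = min(results) if lower_is_better else max(results)
    z = (best - mean(results)) / stdev(results)
    return -z if lower_is_better else z


# Made-up results, for illustration only.
jumps = [8.90, 8.10, 8.00, 7.90, 7.80, 7.70, 7.60]   # metres
times = [9.90, 10.10, 10.20, 10.30]                   # seconds

print(f"long jump winner: z = {winner_z_score(jumps):.2f}")
print(f"sprint winner:    z = {winner_z_score(times, lower_is_better=True):.2f}")
```

Because the score is expressed in units of the field's own spread, results from events measured in seconds and in metres can be placed on one scale.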
Multiple Games Studies
Where
results for all competitors are available, we can track changes in performance
patterns over time. Note that there are many earlier Games (especially prior to
1952) for which time or performance data are not comprehensive, because results
were recorded for winners or placegetters only.
The long
jump is an interesting example because complete data for the whole field can be
found back to 1912, earlier than for many other events. (However, data is not
complete for 1932, 1936, 1948, and 1956.) A comparison of winning performance,
average of Top 10, and median performance of the entire field through this
history (Figure 1) highlights the improvements in performance that have occurred.
It is perhaps surprising that there is little or no upward trend from 1968 to
2004 for either the winners or the Top Ten. In 1972, the Top Ten averaged 8.06
metres; it was exactly the same 28 years later. There was a somewhat greater
increase in the Median for the field, which increased from 7.68 metres to 7.86
metres.
Figure 1. Men’s Long
Jump History: Best, Top 10, and Median
This means
there is some “compression” in the performances, with the average competitor
getting closer to the elite over successive Games. While not absolutely
clear-cut, this is supporting evidence for Stephen Jay Gould’s theory of sports
performance, which argues that as sports develop, the performances of the very
best competitors change little as they approach physiological limits, but the
number of elite competitors does increase.
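The three series plotted in Figure 1 (best, Top 10 average and median) can be derived from a full field in a few lines. The sketch below uses a made-up field of twelve jumps. The function name is hypothetical, and it assumes the Top 10 average is a simple mean of the ten longest jumps.

```python
from statistics import mean, median


def field_summary(distances, top_n=10):
    """Summarise one Games: winning mark, mean of the top N, and field median."""
    ranked = sorted(distances, reverse=True)   # longest jump first
    return {
        "best": ranked[0],
        "top_n_mean": mean(ranked[:top_n]),
        "median": median(ranked),
    }


# Made-up long jump field in metres, for illustration only.
field = [8.50, 8.30, 8.20, 8.10, 8.00, 7.90,
         7.80, 7.70, 7.60, 7.50, 7.40, 7.30]
summary = field_summary(field)
print(summary)
```

Running this for each Games and comparing how the median moves relative to the Top 10 mean is one way to look for the compression described above.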
By
incorporating the data from all competitors in a given event, the
competitiveness of the event can be tracked over time. A fall in rsd, or an
increase in CI, over succeeding Games is evidence of increasing
competitiveness.
This can be
seen a little more clearly in the rsd. Changes in rsd for the men’s long jump
are tracked in Table 4.
Table 4. Men’s Olympic Long Jump. Relative Standard Deviations

| Year | rsd |
|---|---|
| 1912 | 6.57% |
| 1920 | 5.79% |
| 1924 | 7.12% |
| 1928 | 4.92% |
| 1952 | 3.85% |
| 1960 | 5.33% |
| 1964 | 4.20% |
| 1968 | 5.88% |
| 1972 | 4.13% |
| 1976 | 4.57% |
| 1980 | 5.76% |
| 1984 | 6.69% |
| 1988 | 4.82% |
| 1992 | 3.96% |
| 1996 | 2.96% |
| 2000 | 3.06% |
| 2004 | 3.21% |

The rsd for
the men’s long jump has declined from about 6% in 1912-1924 to about 3% in recent Games.
The negative impact of boycotts in 1980 and 1984 can be seen. The three most
recent Games appear to be clearly the most competitive in this event.
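The era comparison can be checked directly from the Table 4 figures. In the sketch below the rsd values are copied from Table 4, while the helper name and the choice of a simple mean over each period are assumptions.

```python
from statistics import mean

# rsd values (%) for the men's long jump, copied from Table 4.
rsd_by_year = {
    1912: 6.57, 1920: 5.79, 1924: 7.12, 1928: 4.92, 1952: 3.85,
    1960: 5.33, 1964: 4.20, 1968: 5.88, 1972: 4.13, 1976: 4.57,
    1980: 5.76, 1984: 6.69, 1988: 4.82, 1992: 3.96, 1996: 2.96,
    2000: 3.06, 2004: 3.21,
}


def mean_rsd(first_year, last_year):
    """Average rsd over the Games held from first_year to last_year inclusive."""
    values = [v for y, v in rsd_by_year.items() if first_year <= y <= last_year]
    return mean(values)


early = mean_rsd(1912, 1924)
recent = mean_rsd(1996, 2004)
lowest_three = sorted(rsd_by_year, key=rsd_by_year.get)[:3]

print(f"1912-1924: {early:.2f}%  1996-2004: {recent:.2f}%")
print("lowest rsd:", lowest_three)
```

The means come out at 6.49% for 1912-1924 and 3.08% for 1996-2004, and the three lowest values belong to 1996, 2000 and 2004.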
The 100m Sprint
Complete
data for all competitors is available from 1952 onwards in both the men’s and
women’s events. There have been changes in numbers of competitors in these
events, so the Competitiveness Index shows the clearest result (Figure 2).
Figure 2. 100m Sprint -
Men and Women Competitiveness
Both men’s
and women’s events have increased in CI over the years, with the men’s event
maintaining a clear margin, the CI for men being almost double the CI for
women. There is some scatter in the results, and it is notable that in spite of
the positive trend, the men’s 100m in 1952 appears, by this measure, to have
been almost as competitive as the 2004 Games. The effect of the boycott is seen
clearly in the 1980 Games, but less clearly in 1984.
100m Freestyle
Data since
1924 (Figure 3) shows competitiveness increasing at a faster rate than for the
100m sprint. This suggests that swimming has undergone a steeper development
curve than sprinting. However, there is scatter in the data; it may well be
that development in the sport has been slower since 1952 than it was before. Improvement
in performances has also been faster than in the 100m sprint or long jump; men’s performances in the
last 50 years have improved about 5% in the 100m sprint (similar for women) and
10% in the long jump, but 16% in the 100m freestyle (19% in the women’s 100m
freestyle).
A striking
feature is that competitiveness in the women’s event is much closer to the
men’s than in the case of the track race. In fact, at a few Games the women’s
event has been more competitive than the men’s. This includes early Games in
1924 and 1932, a surprising finding.
The boycott
of 1980 appears to have severely damaged the competitiveness of both men’s and
women’s events at that Games, more so than in the athletics events studied.
Figure 3. 100m
Freestyle Swimming - Men and Women Competitiveness
The Measure of Greatness
The search
for the most extreme individual z-scores can be extended across all Games for
which complete performance data is available. The highest z-scores tend to be
recorded by men, perhaps because they have historically outnumbered women
competitors, often by a large margin. The search for extreme performances will
be an ongoing effort; for now, this analysis will conclude with the highest
z-scores recorded in the small number of events studied. As it happens, each of
these performances has already earned lasting fame in the annals of the Olympic
Games.
| Event (Games covered) | z-score | Athlete and Games |
|---|---|---|
| Men’s long jump (1912-2004) | 3.05 | Bob Beamon, Mexico City 1968 |
| Men’s 100m Sprint (1952-2004) | 2.65 | Bob Hayes, Tokyo 1964 |
| Women’s 100m Sprint (1952-2004) | 2.42 | Wilma Rudolph, Rome 1960 |
| Men’s 100m freestyle (1924-2004) | 2.48 | Jim Montgomery, Montreal 1976 |
| Women’s 100m freestyle (1924-2004) | 2.52 | Dawn Fraser, Tokyo 1964 |