Word Processing


What words should I teach? What words do students actually need to know? I don't know either, and intuition is always a lousy guide, but here's one approach to finding out.

The latest Oxford English Dictionary contains about 290,500 entries. The Concise OED has 65,795, and I'm using these as my starting point.

I can discard 13,479 entries because they're phrases instead of individual words, plus I can lose 2,727 entries because they're hyphenated terms. That leaves 45,495.
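
For the curious, that filtering step might look something like this in Python. It's a sketch rather than the script actually used: the function name `single_words` and the sample entries are made up for illustration, and lowercasing the headwords is an assumption (the lists further down show names like 'tynwald' in lower case).

```python
def single_words(entries):
    """Keep only headwords that are a single, unhyphenated word."""
    kept = []
    for entry in entries:
        headword = entry.strip()
        if not headword:
            continue              # skip blank lines
        if " " in headword:
            continue              # a phrase, not an individual word
        if "-" in headword:
            continue              # a hyphenated term
        kept.append(headword.lower())
    return kept


# Hypothetical sample, not real dictionary data.
sample = ["menhir", "ad hoc", "well-being", "Tynwald", "usufruct"]
print(single_words(sample))
```

Only the one-word entries survive; the phrase and the hyphenated term are dropped.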

But which ones are absolutely essential, which are kind-of useful, and which are in there to make it look 'comprehensive' or because the compilers just liked them?

I have the subtitles of 20,749 BBC programmes broadcast over the last two years - in effect, transcriptions. By ditching the shortest 749, then filtering out formatting data and punctuation, I've got a pretty large corpus of reasonably authentic utterances.
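
A rough sketch of that clean-up, in Python. The subtitle file format isn't described here, so this assumes the formatting data is angle-bracket markup, which may not match the real files; `drop_shortest` and `tokenise` are illustrative names.

```python
import re

# Assumption: formatting data looks like <tag ...> markup.
TAG = re.compile(r"<[^>]+>")
# A word is a run of letters, optionally with internal apostrophes (don't, that's).
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def drop_shortest(transcripts, n):
    """Discard the n shortest transcripts, measured in characters."""
    return sorted(transcripts, key=len)[n:]


def tokenise(subtitle_text):
    """Strip markup, digits and punctuation; return lowercase words."""
    text = TAG.sub(" ", subtitle_text)
    return [word.lower() for word in WORD.findall(text)]


print(tokenise("<font color='white'>Well, that's LOVELY!</font> 10 out of 10."))
```

Numbers and punctuation fall away, and everything is lowercased so that 'LOVELY' and 'lovely' count as the same word.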

So, what words from the COED occur with what frequency in the BBC transcriptions? And what words don't occur at all?
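
Counting is the easy part. Here's a minimal sketch using Python's `collections.Counter`, which returns 0 for anything it hasn't seen. The function name and the toy data are hypothetical, and it matches exact forms only, so 'going' and 'go' are counted separately; whether the real count did the same is an assumption.

```python
from collections import Counter


def dictionary_frequencies(headwords, tokens):
    """Map each dictionary headword to its number of occurrences in the corpus."""
    counts = Counter(tokens)
    # Counter gives 0 for missing keys, so unseen headwords get a zero.
    return {word: counts[word] for word in headwords}


# Toy data for illustration only.
headwords = ["lovely", "menhir", "get"]
tokens = "get the lovely one and get out".split()

freq = dictionary_frequencies(headwords, tokens)
unseen = [word for word, n in freq.items() if n == 0]
print(freq)
print(unseen)
```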

Well, here's a selection from the 14,633 individual words which occur exactly zero times in two years' worth of BBC TV. I know what ten of them mean.

Word            Occurrences
backgrounder    0
bouclé          0
chametz         0
contumacious    0
delist          0
dyspepsia       0
externalism     0
gambado         0
headquarter     0
inamorato       0
kaffeeklatsch   0
linstock        0
menhir          0
mutuel          0
orangeman       0
pemphigoid      0
portière        0
raja            0
sandinista      0
siksika         0
stumer          0
tetrastich      0
tynwald         0
usufruct        0
yaar            0

That means 29,140 words occur at least once. Here are 25 of the 19,381 which occur fewer than ten times. I know the meanings of 14 of them - what about you?

Word            Occurrences
shirty          9
maraschino      8
cortisone       7
serried         7
ganglion        6
turbocharger    6
gilet           5
som             5
convulsion      4
nimbus          4
unlistenable    4
divestment      3
miscast         3
spousal         3
bioactive       2
epsilon         2
leafhopper      2
prelate         2
tambourin       2
angiography     1
chinkara        1
eclampsia       1
honeyguide      1
minuteman       1
piscina         1

9,339 occur a hundred times or more. The following occur more than ten but fewer than 1,000 times.

Word            Occurrences
honourable      704
underwear       529
vain            416
troop           327
muffin          264
max             215
cam             177
blip            150
uncanny         129
gland           111
aerospace       96
detonate        83
mangle          72
yam             63
ringside        55
chamomile       47
embryonic       41
poncho          36
uneducated      32
bawl            27
gunfight        24
permissible     21
convection      18
lucre           16
morass          14

A more manageable quantity of 520 occur 10,000 or more times. Here are some of those between 1,000 and 10,000:

Word            Occurrences
summer          8831
offer           7861
including       6985
hopefully       6285
fruit           5693
showing         5224
closed          4761
sight           4342
location        4031
countryside     3784
product         3474
lack            3211
arrive          3001
transport       2793
shower          2633
iron            2484
breathe         2297
panic           2162
twist           2031
cave            1921
purple          1824
innocent        1718
fraud           1641
virtually       1560
assume          1480


Here's a selection between 10,000 and 100,000. Do any surprise you as being more or less common than you thought?

Word            Occurrences
home            63526
course          45194
lovely          34349
such            28509
front           24414
while           20752
easy            18113
hold            16047
dad             13846
hour            12535
cost            11389
beat            10660

A mere 76 occur 100,000 or more times:

Word            Occurrences
the             3812984
to              2404952
a               2098428
of              1699308
and             1640805
it              1432806
in              1209482
that            1173021
for             730257
on              699870
have            633194
this            568081
be              564405
are             532264
with            490772
not             401645
at              396377
he              374352
do              368329
me              350617
all             346828
what            338545
there           337154
as              331112
but             321033
like            319404
just            309660
up              305068
can             297541
about           290147
out             285530
so              285324
going           283807
think           281064
from            275761
get             273618
will            269378
know            267222
here            252640
go              239097
an              220785
very            212718
them            212592
see             201120
time            196690
now             187122
if              187114
right           182193
by              182007
more            179865
really          176809
good            169427
people          168640
or              163126
back            160195
some            159211
she             157138
want            152556
no              148444
then            137351
into            134491
down            130689
how             130228
look            124413
come            124007
way             123631
make            122129
over            119391
well            117864
say             117728
much            117311
need            115195
bit             113799
off             107582
little          103122
take            101002
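
The frequency bands used for these lists can be sketched in a few lines of Python. The power-of-ten boundaries are a tidy-up of the cut-offs mentioned in the text, and the function name `band` is just for illustration; the sample counts are taken from the lists above.

```python
from collections import Counter

# Lower bound of each band, highest first.
BANDS = [
    (100_000, "100,000+"),
    (10_000, "10,000-99,999"),
    (1_000, "1,000-9,999"),
    (100, "100-999"),
    (10, "10-99"),
    (1, "1-9"),
]


def band(count):
    """Return the label of the frequency band a count falls into."""
    for floor, label in BANDS:
        if count >= floor:
            return label
    return "0"


# A few counts from the lists above.
sample = {"menhir": 0, "shirty": 9, "morass": 14, "muffin": 264,
          "summer": 8831, "home": 63526, "the": 3812984}

band_sizes = Counter(band(n) for n in sample.values())
for word, n in sample.items():
    print(word, n, band(n))
```

Run over the full word list, `band_sizes` would give the number of words in each band.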

So, here's one difficulty in learning a language. Once you've got the major meanings of the top 100-200 words, you've got tens of thousands of others to learn, and the additional benefit of knowing each of them - their usefulness - is pretty damn small.
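
One way to put a number on that diminishing return is to ask what share of all the words in the corpus is accounted for by the N most frequent. A minimal Python sketch follows; the name `coverage` is hypothetical, and the six counts are just examples quoted in this post, so the result from such a tiny sample wildly overstates the real figure.

```python
def coverage(counts, top_n):
    """Fraction of all occurrences accounted for by the top_n most frequent words."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Counts quoted in this post: the, get, spicy, compulsory, colonel, manky.
counts = [3812984, 273618, 975, 370, 182, 85]

share = coverage(counts, 2)
print(f"Top 2 of {len(counts)} words cover {share:.2%} of occurrences")
```

Fed the full list of counts, the same function would show how quickly the curve flattens after the first couple of hundred words.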

How often do you need to describe something as 'spicy' (position 3,000, 975 occurrences)? Or 'compulsory' (position 5,000, 370 occurrences)? Or describe someone as a 'colonel' (position 7,000, 182 occurrences)?

I may have had the occasional 'manky' cheese sandwich (position 10,000, 85 occurrences) - but I'm not sure I've ever used the word in conversation.

The Arabic method of learning languages is "memorise the dictionary". It doesn't work, for obvious reasons. But it's...interesting that they've taken only the most difficult, least interesting and least rewarding part of the process, missing out all the easy, fun and useful parts.

The Arabs are almost British in their ability to miss the point.
