I really should write about my first day of paid work in months, but it was on the one hand so complex, and on the other so tedious, that I don't have the will.
However, I've been working on an algorithm for a rhyming dictionary. One that works the way songwriters work - allowing for loosening or tightening of the rules for what constitutes a rhyme.
Take the word 'Game'. There are plenty of strict rhymes for it: Blame, Came, Dame, Fame, Flame, Frame, Maim, Lame, Same etc. But there are also some 'half-rhymes' - Bane, Again, Drain, Insane. What happens if the rules are relaxed still further? Queen, Say, Aging etc.
The rules for a strict rhyme look something like this: Take a word, let's say 'Gridlock', and convert it to phonemic transcription - /grIdlok/. Now break the word into segments - /g rId l ok/.
This is not exactly breaking the word into syllables. Starting from the end, ask whether the final sound is a vowel, or semi-vowel+vowel pair. If it is, that is the final segment. Hence:
Frappe - /f rap ei:/
Greywacke - /g rei: wak i:/
Mondo - /m on d @u:/
In the case of 'Gridlock', there is no terminal vowel. So, identify the final vowel sound and the final consonant, treat them as a unit, and work backwards in the same way.
Gridlock - /g rId l ok/
Telephone - /t el ef @u:n/
Cassette - /k @s et/
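The backwards walk described above can be sketched in Python. This is only a rough sketch under assumptions of my own: the word arrives already split into phoneme tokens, and the VOWELS and SEMIVOWELS sets are small illustrative inventories (treating 'w' and 'j' as semivowels alongside 'r' is a guess), not a full SAMPA table.

```python
# Illustrative inventories only; a real version would need the full phoneme set.
VOWELS = {"a", "e", "I", "o", "@", "A", "i:", "ei:", "@u:", "O:"}
SEMIVOWELS = {"r", "w", "j"}

def segment(phonemes):
    """Split a list of phoneme tokens into rhyme segments, working from the end."""
    segments = []
    i = len(phonemes) - 1
    while i >= 0:
        start = i
        # A consonant is attached to the vowel directly before it, if there is one.
        if phonemes[i] not in VOWELS and i > 0 and phonemes[i - 1] in VOWELS:
            start = i - 1
        # A semivowel directly before that vowel joins the same segment.
        if phonemes[start] in VOWELS and start > 0 and phonemes[start - 1] in SEMIVOWELS:
            start -= 1
        segments.append("".join(phonemes[start:i + 1]))
        i = start - 1
    segments.reverse()
    return segments

print(segment(["t", "e", "l", "e", "f", "@u:", "n"]))   # Telephone
print(segment(["g", "r", "ei:", "w", "a", "k", "i:"]))  # Greywacke
```

Any consonant that does not directly follow a vowel ends up as a segment on its own, which is where the isolated consonants at the start of words come from.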
The phonemic stream is divided into vowel-consonant pairs, with some isolated consonants, especially at the beginning. 'r' is treated as a semivowel that only occurs before other vowels. 'h', although it is strictly speaking vocalic, is treated as a consonant. This system of segmentation is eccentric, but (I hope) optimised for finding rhymes.
A strict rhyme should occur on the stressed syllable of a word, so stress is indicated thus:
Frappe - /f ~rap ei:/
Greywacke - /g ~rei: wak i:/
Mondo - /m ~on d @u:/
Gridlock - /g ~rId l ok/
Telephone - /t ~el ef @u:n/
Cassette - /k ~@s et/
So, we now know that a strict rhyme for 'Mondo' must end in /~on d @u:/, so 'Condo' /k ~on d @u:/ is acceptable, though 'Condor' /k ~on d O:/ is not. A rhyme for 'Salient' must end in /~ei:l i: @n t/, if such a word exists.
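As a minimal sketch of that test, assuming each word is already a list of segments with '~' on the stressed one (the function names here are invented for illustration):

```python
STRESS = "~"

def rhyme_tail(segments):
    """Return the segments from the stressed one to the end of the word."""
    for index, seg in enumerate(segments):
        if seg.startswith(STRESS):
            return segments[index:]
    raise ValueError("no stressed segment marked")

def is_strict_rhyme(a, b):
    """Two words rhyme strictly when their stressed tails are identical."""
    return rhyme_tail(a) == rhyme_tail(b)

mondo = ["m", "~on", "d", "@u:"]
condo = ["k", "~on", "d", "@u:"]
condor = ["k", "~on", "d", "O:"]

print(is_strict_rhyme(mondo, condo))   # True
print(is_strict_rhyme(mondo, condor))  # False
```

Note that this sketch would also call a word a rhyme for itself; a real dictionary would presumably want to filter that out.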
What about looser rhymes? Well, we can relax the rule about stressed segments by ignoring stress marking, so rhymes for 'Peregrination' (/p e reg rIn ei:S @n/) need only end in /@n/ or /rIn ei:S @n/, for instance.
More usefully, some distinctions between consonants can be ignored. In particular, distinctions between nasals. If the three-way distinction between /m/, /n/ and /N/ is relaxed, the following words rhyme:
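One way to sketch this looser test is to drop the stress marks and count how many segments two words share at the end, leaving it to the user to decide how many are enough. The function names, the stress placement in 'Peregrination' and the transcription of 'Station' are my own assumptions for illustration.

```python
def strip_stress(segment):
    """Remove a leading '~' stress mark, if present."""
    return segment.lstrip("~")

def shared_tail(a, b):
    """Count the matching segments at the ends of two words, ignoring stress."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if strip_stress(x) != strip_stress(y):
            break
        count += 1
    return count

peregrination = ["p", "e", "reg", "rIn", "~ei:S", "@n"]
station = ["s", "t", "~ei:S", "@n"]   # hypothetical transcription
gridlock = ["g", "~rId", "l", "ok"]

print(shared_tail(peregrination, station))   # 2
print(shared_tail(peregrination, gridlock))  # 0
```

A higher count means a tighter rhyme, so the same function could serve as the dial for loosening or tightening the rules.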
Pan /p An/
Sam /s Am/
Sang /s AN/
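A small sketch of that relaxation: fold /m/ and /n/ into /N/ before comparing. For brevity it compares only the final segment, which is enough for one-syllable words like these; the function names and the 'Pat' example are assumptions added for illustration.

```python
# Map /m/ and /n/ onto /N/ so that all three nasals compare as equal.
NASAL_MERGE = str.maketrans({"m": "N", "n": "N"})

def merge_nasals(segments):
    """Return the segments with every nasal replaced by /N/."""
    return [seg.translate(NASAL_MERGE) for seg in segments]

def rhymes_ignoring_nasals(a, b):
    """Compare the final segments of two words with nasals merged."""
    return merge_nasals(a)[-1] == merge_nasals(b)[-1]

pan = ["p", "An"]
sam = ["s", "Am"]
sang = ["s", "AN"]
pat = ["p", "At"]   # hypothetical non-rhyme for contrast

print(rhymes_ignoring_nasals(pan, sam))   # True
print(rhymes_ignoring_nasals(sam, sang))  # True
print(rhymes_ignoring_nasals(pan, pat))   # False
```

The same translate-then-compare trick should extend to other relaxations, such as mapping long vowels onto their short counterparts.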
Distinctions between long and short vowels can be ignored, or between vowels preceded by a glide and those not, etc etc.
This is what I worked out during a quiet half hour at work. Implementation is the hard part. I'll need to brush up on my ASCII phonemics (most likely the SAMPA system), and refamiliarise myself with a programming language (BASIC is probably adequate, though not elegant).