"I before E except after C" is an unreliable heuristic
Craig Turner, 12 August 2020
--
# i_before_e_except_after_c.py
PATH_DICT = '/usr/share/dict/words'


def main():
    lst_tracks = []
    lst_breaks = []
    with open(PATH_DICT) as f_ptr:
        # Strip first, then drop blank lines, so word[0] below is always valid
        lines = [l.strip() for l in f_ptr if l.strip()]
    for word in lines:
        # Stick to lower-case words (first letter a-z)
        if not 'a' <= word[0] <= 'z':
            continue
        # "ei" is only allowed after "c"
        if 'cei' in word:
            lst_tracks.append(word)
        elif 'ei' in word:
            lst_breaks.append(word)
        # "ie" is not allowed after "c"
        if 'cie' in word:
            lst_breaks.append(word)
        elif 'ie' in word:
            lst_tracks.append(word)
    print('Tracks the rule: %s' % len(lst_tracks))
    print('Breaks the rule: %s' % len(lst_breaks))


if __name__ == '__main__':
    main()
$ python3 i_before_e_except_after_c.py
Tracks the rule: 3790
Breaks the rule: 717
$
About 16% of the counted words (717 out of 3790 + 717 = 4507) break the rule.
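The counting logic in main() can be pulled out into a small function so that individual words can be checked by hand. This is a minimal sketch assuming the same substring tests as the script; the name classify is hypothetical. Like the script, it lets a word land in both buckets if it contains both an "ei" and an "ie".

```python
def classify(word):
    """Return the verdicts ('tracks' or 'breaks') for one lower-case word,
    using the same substring tests as main()."""
    verdicts = []
    # The "ei" half: "ei" is only allowed after "c".
    if 'cei' in word:
        verdicts.append('tracks')
    elif 'ei' in word:
        verdicts.append('breaks')
    # The "ie" half: "ie" is not allowed after "c".
    if 'cie' in word:
        verdicts.append('breaks')
    elif 'ie' in word:
        verdicts.append('tracks')
    return verdicts


# Share of counted words that break the rule, using the totals from the run above.
tracks, breaks = 3790, 717
share = breaks / (tracks + breaks)
print('%.1f%%' % (share * 100))  # 15.9%
```

For example, classify('receive') and classify('believe') both give ['tracks'], while classify('science') and classify('weird') give ['breaks'].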
At school, we were taught that words like "eight" are an exception to the
rule. These include freight and weight. Fine.
Another set of exceptions is words with "re" as a prefix. OK, I can see how
that would happen.
Some plural words fail the test, such as vacancies. Makes sense.
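The three exception categories above can be approximated with simple string tests. This is a rough sketch, not the method used for the counts above: the patterns ("eigh", a leading "rei", a trailing "cies") and the function names are hypothetical heuristics chosen for illustration. They will misfire on some words; reign and reindeer start with "rei" but are not "re" + a stem.

```python
def is_eight_like(word):
    # Hypothetical heuristic: "eigh" as in eight, freight, weight.
    return 'eigh' in word


def is_re_prefixed(word):
    # Hypothetical heuristic: "re" followed by a stem starting with "i",
    # as in reinvent. Crude: it also matches reign and reindeer.
    return word.startswith('rei')


def is_cy_plural(word):
    # Hypothetical heuristic: plural of a "-cy" noun, as in vacancy -> vacancies.
    return word.endswith('cies')


def explained_exception(word):
    """True if a rule-breaking word falls into one of the taught exceptions."""
    return is_eight_like(word) or is_re_prefixed(word) or is_cy_plural(word)


for w in ['freight', 'weight', 'reinvent', 'vacancies', 'science']:
    print(w, explained_exception(w))
```

Running lst_breaks through a filter like explained_exception would give a rough idea of how many of the 717 breaks the taught exceptions account for.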
But here are some words that break the rule despite not fitting any of the
clear exception cases above: vein, veil, heinous, weir, seize, glacier, feint,
feisty, science, deficient.
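A quick check on the words listed above, splitting them by which half of the rule they break. The "covered" filter uses hypothetical stand-in patterns for the taught exceptions ("eigh" for eight-like words, a leading "rei" for the "re" prefix, a trailing "cies" for plurals); it is a rough sketch to confirm that none of these words is caught by them.

```python
words = ['vein', 'veil', 'heinous', 'weir', 'seize',
         'glacier', 'feint', 'feisty', 'science', 'deficient']

# "I before E" fails: "ei" appears without a "c" in front of it.
ei_not_after_c = [w for w in words if 'ei' in w and 'cei' not in w]

# "except after C" fails: "ie" appears right after a "c".
ie_after_c = [w for w in words if 'cie' in w]

# Hypothetical stand-ins for the taught exceptions (eight-like, re- prefix, -cy plural).
covered = [w for w in words
           if 'eigh' in w or w.startswith('rei') or w.endswith('cies')]

print(ei_not_after_c)  # ['vein', 'veil', 'heinous', 'weir', 'seize', 'feint', 'feisty']
print(ie_after_c)      # ['glacier', 'science', 'deficient']
print(covered)         # []
```

Seven of the ten fail on the "ei" side and three on the "cie" side.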
@20250203 1144
Related post on Hacker News: https://news.ycombinator.com/item?id=40324534