"I before E except after C" is an unreliable heuristic
Craig Turner, 12 August 2020
--

PATH_DICT = '/usr/share/dict/words'

def main():
    lst_tracks = []
    lst_breaks = []

    with open(PATH_DICT) as f_ptr:
        # Strip whitespace and drop blank lines, so word[0] below is safe
        lines = [l.strip() for l in f_ptr if l.strip()]

    for word in lines:
        # Stick to words that start with a lower-case ASCII letter
        if not 'a' <= word[0] <= 'z':
            continue

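        # 'ei' follows the rule only when it comes after 'c'
        if 'cei' in word:
            lst_tracks.append(word)
        elif 'ei' in word:
            lst_breaks.append(word)

        # 'ie' breaks the rule only when it comes after 'c'. This check is
        # independent of the one above, so a word containing both 'ei' and
        # 'ie' is counted twice.
        if 'cie' in word:
            lst_breaks.append(word)
        elif 'ie' in word:
            lst_tracks.append(word)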

    print('Tracks the rule: %d' % len(lst_tracks))
    print('Breaks the rule: %d' % len(lst_breaks))

if __name__ == '__main__':
    main()



$ python3 i_before_e_except_after_c.py 
Tracks the rule: 3790
Breaks the rule: 717
$ 

About 16% of the matches (717 of 4,507) break the rule.
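
As a quick check on that figure, here is a minimal sketch of the arithmetic.
The counts are copied by hand from the run above, and the function name
break_rate is made up for this example; it is not part of the script.

```python
def break_rate(tracks, breaks):
    """Return the share of matches that break the rule, as a percentage."""
    total = tracks + breaks
    return 100.0 * breaks / total

# Counts copied from the output of the run above.
print('%.1f%%' % break_rate(3790, 717))
```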

At school, we were taught that words spelled like "eight" are an exception
to the rule. These include freight and weight. Fine.

Another set of exceptions is words with "re" as a prefix, such as reinforce.
OK, I can see how that would happen.

Some plural words fail the test, such as vacancies. Makes sense.

But here are some words that break the rule without fitting any of the
exception cases above: vein, veil, heinous, weir, seize, glacier, feint,
feisty, science, deficient.
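
To see which rule-breakers are left once the three exception families are set
aside, a rough sketch follows. The helper names (breaks_rule,
exception_family) and the matching patterns ('eigh' anywhere, a leading 'rei',
a trailing 'cies') are assumptions made for this illustration, not part of the
script above. They are crude: a leading 'rei' would also match a word such as
reign, which has no "re" prefix. The word reinforce is added here only as a
sample "re" word.

```python
def breaks_rule(word):
    """True if the word has 'ei' with no 'cei', or has 'cie'.

    This mirrors the two conditions that feed lst_breaks in the script.
    """
    return ('ei' in word and 'cei' not in word) or 'cie' in word

def exception_family(word):
    """Label a word with the exception family it fits, or None.

    The patterns are rough guesses at each family, not linguistic rules.
    """
    if 'eigh' in word:
        return 'eight-like'
    if word.startswith('rei'):
        return 're prefix'
    if word.endswith('cies'):
        return 'plural'
    return None

# Sample words from the text, plus 'reinforce' (chosen for this sketch).
samples = ['freight', 'weight', 'reinforce', 'vacancies',
           'vein', 'veil', 'heinous', 'weir', 'seize',
           'glacier', 'feint', 'feisty', 'science', 'deficient']

# Rule-breakers that none of the three exception families accounts for.
unexplained = [w for w in samples
               if breaks_rule(w) and exception_family(w) is None]
print(unexplained)
```

On this sample list, the four words from the exception families are filtered
out and the ten words listed above remain. Running the same filter over
lst_breaks from the script would show how many dictionary words these three
families actually account for.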


@20250203 1144
Related post on Hacker News: https://news.ycombinator.com/item?id=40324534