20200812, "I before E except after C" is a low-value heuristic
.

PATH_DICT = '/usr/share/dict/words'

def main():
    lst_tracks = []
    lst_breaks = []

    with open(PATH_DICT) as f_ptr:
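        # Strip first, then filter: a raw line always contains its newline,
        # so blank lines would otherwise survive and crash on word[0] below.
        lines = [line.strip() for line in f_ptr if line.strip()]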

    for word in lines:
        # Stick to words that start with a lower-case ASCII letter (a-z);
        # this skips proper nouns and anything starting with a non-letter
        if not ('a' <= word[0] <= 'z'):
            continue

        if 'cei' in word:
            lst_tracks.append(word)
        elif 'ei' in word:
            lst_breaks.append(word)

        if 'cie' in word:
            lst_breaks.append(word)
        elif 'ie' in word:
            lst_tracks.append(word)

    print('Tracks the rule: %s'%(len(lst_tracks)))

    print('Breaks the rule: %s'%(len(lst_breaks)))

if __name__ == '__main__':
    main()



$ python3 i_before_e_except_after_c.py 
Tracks the rule: 3790
Breaks the rule: 717
$ 

About 16% of the matches (717 of 4507) break the rule.
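
The percentage is the number of rule-breaking matches divided by all matches. A quick check, using the two counts printed above (the function name break_share is just illustrative, not part of the script):

```python
def break_share(tracks, breaks):
    """Fraction of all counted matches that break the rule."""
    return breaks / (tracks + breaks)

# Counts taken from the run shown above
share = break_share(3790, 717)
print('Share breaking the rule: {:.1%}'.format(share))
```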

A bunch of them fail due to plural words - vacancies. Then there are the
eight-like words - freight, weight. Then there are words with "re" as a prefix.
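
A rough sketch of how the rule-breakers could be bucketed into those three groups. The tests here are my own guesses at each group (ends in "cies", contains "eigh", starts with "rei"), not something the script above does. The "rei" test is crude: it would also catch words like "rein" where "re" is not a prefix. The word "reinforce" is only an illustrative example.

```python
def classify_break(word):
    """Return a rough label for why a word breaks the rule.

    The three checks are assumed heuristics, not exact definitions.
    """
    if word.endswith('cies'):
        return 'plural'        # e.g. vacancy -> vacancies
    if 'eigh' in word:
        return 'eight-like'    # e.g. freight, weight
    if word.startswith('rei'):
        return 're-prefix'     # crude: also matches "rein"
    return 'unexplained'

for w in ['vacancies', 'freight', 'weight', 'reinforce', 'seize']:
    print('%s: %s' % (w, classify_break(w)))
```

Running something like this over lst_breaks would leave the 'unexplained' bucket as the list of words with no clear exception case.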

But here are some that fail the rule despite no clear exception case: vein,
veil, heinous, weir, seize, glacier, feint, feisty, science, deficient.