20200812, "I before E except after C" is a low-value heuristic.

    PATH_DICT = '/usr/share/dict/words'

    def main():
        lst_tracks = []
        lst_breaks = []
        with open(PATH_DICT) as f_ptr:
            # Strip before testing for emptiness so blank lines are skipped
            lines = [l.strip() for l in f_ptr if l.strip()]
        for word in lines:
            # Stick to lower-case words
            if not 'a' <= word[0] <= 'z':
                continue
            # 'ei' tracks the rule only after 'c'
            if 'cei' in word:
                lst_tracks.append(word)
            elif 'ei' in word:
                lst_breaks.append(word)
            # 'ie' tracks the rule unless it follows 'c'.
            # A word containing both 'ei' and 'ie' is counted once per pattern.
            if 'cie' in word:
                lst_breaks.append(word)
            elif 'ie' in word:
                lst_tracks.append(word)
        print('Tracks the rule: %s'%(len(lst_tracks)))
        print('Breaks the rule: %s'%(len(lst_breaks)))

    if __name__ == '__main__':
        main()

    $ python3 i_before_e_except_after_c.py
    Tracks the rule: 3790
    Breaks the rule: 717
    $

About 16% of the counted matches (717 of 4507) break the rule. A bunch fail because they are plurals: vacancies. Then there are the eight-like words: freight, weight. Then there are words with "re" as a prefix. But here are some that break the rule without fitting any clear exception case: vein, veil, heinous, weir, seize, glacier, feint, feisty, science, deficient.
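The exception groups above can be sketched as a rough classifier. This is only an illustration: the function names (breaks_rule, classify_break) and the category labels are made up here, and the pattern checks are crude guesses at each group (for example, any word starting with "rei" is treated as a "re" prefix, which would also catch "rein").

```python
from collections import Counter

def breaks_rule(word):
    """True if the word has 'cie', or has 'ei' with no 'cei' anywhere."""
    return 'cie' in word or ('ei' in word and 'cei' not in word)

def classify_break(word):
    """Return a rough exception category, or None if the word does not break the rule."""
    if not breaks_rule(word):
        return None
    if word.endswith('cies'):
        return 'plural'       # vacancies
    if 'eigh' in word:
        return 'eigh'         # freight, weight
    if word.startswith('rei'):
        return 're-prefix'    # crude check, assumes "re" + a word starting with "i"
    return 'other'            # no clear exception case

if __name__ == '__main__':
    # Sample words taken from the text above
    sample = ['vacancies', 'freight', 'weight', 'vein', 'veil', 'heinous',
              'weir', 'seize', 'glacier', 'feint', 'feisty', 'science',
              'deficient']
    print(Counter(classify_break(w) for w in sample))
```

Running classify_break over lst_breaks from the script would show how much of the 717 each group accounts for, though the buckets are only as good as these string checks.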