freq in the Japanese WordNet
According to http://ja.wikipedia.org/wiki/WordNet , WordNet apparently includes data on how polysemous each word is.
My guess is that this is sense.freq, so let's take a look.
sqlite> select sense.freq, word.lemma, word.wordid,sense.synset from sense, word where sense.freq>0 and sense.wordid = word.wordid order by sense.freq desc limit 10;
10742|be|150936|02604760-v
6833|person|55670|00007846-n
3019|be|150936|02616386-v
1861|say|150344|01009240-v
1837|not|122211|00024073-r
1345|group|23353|00031264-n
1202|have|152975|02203362-v
992|location|68437|00027167-n
901|be|150936|02655135-v
749|man|47012|10287213-n
sqlite>
be is at the top, but several rows share its wordid while differing in synset, so each of those rows is a separate sense of be.
sqlite> select * from synset_def where synset in('02604760-v','02616386-v','02655135-v');
02604760-v|eng| have the quality of being; (copula, used with an adjective or a predicate noun); "John is rich"; "This is not a good answer"|0
02616386-v|eng| be identical to; be someone or something; "The president of the company is John Smith"; "This is my house"|0
02655135-v|eng| occupy a certain position or area; be somewhere; "Where is my umbrella?" "The toolshed is in the back"; "What is behind this behavior?"|0
sqlite>
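The two lookups above can probably be combined into one join, so that each sense of a lemma comes back with its freq and its definition together. The following is only a minimal Python sketch under stated assumptions: it runs against a small in-memory database filled with rows copied from the output above (definitions abbreviated), not against the real database file; the synset_def column names (synset, lang, def, sid) are guessed from the four fields in the output; and the helper names senses_with_definitions and synset_count are hypothetical.

```python
import sqlite3

# In-memory stand-in for the Japanese WordNet database. Only the columns used
# in the queries above are created; the synset_def column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (wordid INTEGER, lemma TEXT);
CREATE TABLE sense (synset TEXT, wordid INTEGER, freq INTEGER);
CREATE TABLE synset_def (synset TEXT, lang TEXT, def TEXT, sid INTEGER);
""")

# Rows copied from the sqlite output shown earlier.
conn.executemany("INSERT INTO word VALUES (?, ?)",
                 [(150936, "be"), (55670, "person")])
conn.executemany("INSERT INTO sense VALUES (?, ?, ?)", [
    ("02604760-v", 150936, 10742),
    ("00007846-n", 55670, 6833),
    ("02616386-v", 150936, 3019),
    ("02655135-v", 150936, 901),
])
# Definitions are abbreviated here for readability.
conn.executemany("INSERT INTO synset_def VALUES (?, ?, ?, ?)", [
    ("02604760-v", "eng", "have the quality of being", 0),
    ("02616386-v", "eng", "be identical to; be someone or something", 0),
    ("02655135-v", "eng", "occupy a certain position or area; be somewhere", 0),
])


def senses_with_definitions(conn, lemma):
    """Return (freq, synset, definition) for each sense of lemma, highest freq first."""
    return conn.execute(
        """
        SELECT sense.freq, sense.synset, synset_def.def
        FROM sense
        JOIN word ON sense.wordid = word.wordid
        JOIN synset_def ON synset_def.synset = sense.synset
        WHERE word.lemma = ? AND synset_def.lang = 'eng'
        ORDER BY sense.freq DESC
        """,
        (lemma,),
    ).fetchall()


def synset_count(conn, lemma):
    """Count the distinct synsets attached to lemma."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT sense.synset)
        FROM sense
        JOIN word ON sense.wordid = word.wordid
        WHERE word.lemma = ?
        """,
        (lemma,),
    ).fetchone()[0]


for freq, synset, definition in senses_with_definitions(conn, "be"):
    print(freq, synset, definition)
print("synsets for be:", synset_count(conn, "be"))
```

Counting distinct synsets per lemma is one possible way to look at polysemy directly, separate from whatever sense.freq turns out to measure. To try this on the real data, replace ":memory:" with the path to the database file and drop the CREATE and INSERT statements.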