freq in the Japanese WordNet

According to http://ja.wikipedia.org/wiki/WordNet , WordNet apparently includes data on the degree of polysemy of each word.
My guess is that this is sense.freq, so let's query it.

sqlite> select sense.freq, word.lemma, word.wordid,sense.synset
 from sense, word
 where sense.freq>0 and sense.wordid = word.wordid
 order by sense.freq desc limit 10;


10742|be|150936|02604760-v
6833|person|55670|00007846-n
3019|be|150936|02616386-v
1861|say|150344|01009240-v
1837|not|122211|00024073-r
1345|group|23353|00031264-n
1202|have|152975|02203362-v
992|location|68437|00027167-n
901|be|150936|02655135-v
749|man|47012|10287213-n
sqlite> 

The top entry is be, but several rows share the same wordid while pointing to different synsets.
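
To check how many synsets each word has among the top rows, the result can be grouped by wordid. The following is only a rough sketch using Python's sqlite3 module: it builds a tiny in-memory stand-in for the database from the ten rows printed above, and the reduced table layout (only the columns used in the query) and the helper name senses_per_word are assumptions for illustration, not the real schema.

```python
import sqlite3

# Hypothetical miniature of the database: only the columns used in the
# query above, filled with the ten rows from its output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word  (wordid INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE sense (synset TEXT, wordid INTEGER, freq INTEGER);
""")
rows = [
    (10742, "be", 150936, "02604760-v"),
    (6833, "person", 55670, "00007846-n"),
    (3019, "be", 150936, "02616386-v"),
    (1861, "say", 150344, "01009240-v"),
    (1837, "not", 122211, "00024073-r"),
    (1345, "group", 23353, "00031264-n"),
    (1202, "have", 152975, "02203362-v"),
    (992, "location", 68437, "00027167-n"),
    (901, "be", 150936, "02655135-v"),
    (749, "man", 47012, "10287213-n"),
]
for freq, lemma, wordid, synset in rows:
    # The same word can appear several times, so ignore repeated wordids.
    conn.execute("INSERT OR IGNORE INTO word VALUES (?, ?)", (wordid, lemma))
    conn.execute("INSERT INTO sense VALUES (?, ?, ?)", (synset, wordid, freq))


def senses_per_word(conn):
    """Return (lemma, synset count, summed freq) per word, most synsets first."""
    return conn.execute("""
        SELECT word.lemma,
               COUNT(DISTINCT sense.synset) AS n_synsets,
               SUM(sense.freq)              AS total_freq
          FROM sense JOIN word ON sense.wordid = word.wordid
         WHERE sense.freq > 0
         GROUP BY word.wordid
         ORDER BY n_synsets DESC, total_freq DESC
    """).fetchall()


result = senses_per_word(conn)
for lemma, n_synsets, total_freq in result:
    print(lemma, n_synsets, total_freq)
```

On this ten-row sample, be is the only word with more than one synset. Run against the full database, the same GROUP BY would count every sense with freq > 0, not just those in the top ten.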

sqlite> select * from synset_def
 where synset in('02604760-v','02616386-v','02655135-v');

02604760-v|eng|
have the quality of being;
 (copula, used with an adjective or a predicate noun);
 "John is rich";
 "This is not a good answer"|0

02616386-v|eng|
be identical to;
 be someone or something;
 "The president of the company is John Smith";
 "This is my house"|0

02655135-v|eng|
occupy a certain position or area; be somewhere;
 "Where is my umbrella?" "The toolshed is in the back";
 "What is behind this behavior?"|0
sqlite>
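
The two lookups above can also be combined into a single join, so that each freq value appears next to its definition. This is a minimal sketch under assumptions: the synset_def column names (synset, lang, def, sid) are guessed from the four fields in the output, the definitions are shortened to their first clause, and senses_with_definitions is a hypothetical helper.

```python
import sqlite3

# Hypothetical in-memory stand-in holding only the three "be" senses shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sense      (synset TEXT, wordid INTEGER, freq INTEGER);
    CREATE TABLE synset_def (synset TEXT, lang TEXT, def TEXT, sid INTEGER);
""")
conn.executemany("INSERT INTO sense VALUES (?, ?, ?)", [
    ("02604760-v", 150936, 10742),
    ("02616386-v", 150936, 3019),
    ("02655135-v", 150936, 901),
])
# Definitions are truncated to their first clause for brevity.
conn.executemany("INSERT INTO synset_def VALUES (?, ?, ?, ?)", [
    ("02604760-v", "eng", "have the quality of being", 0),
    ("02616386-v", "eng", "be identical to", 0),
    ("02655135-v", "eng", "occupy a certain position or area", 0),
])


def senses_with_definitions(conn, wordid, lang="eng"):
    """Return (freq, synset, definition) for one word, highest freq first."""
    return conn.execute("""
        SELECT sense.freq, sense.synset, synset_def.def
          FROM sense JOIN synset_def ON sense.synset = synset_def.synset
         WHERE sense.wordid = ? AND synset_def.lang = ?
         ORDER BY sense.freq DESC
    """, (wordid, lang)).fetchall()


be_senses = senses_with_definitions(conn, 150936)
for freq, synset, definition in be_senses:
    print(freq, synset, definition)
```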