I have successfully retrieved the synsets connected to a base synset through other semantic relations, as follows:
wn.synset('good.a.01').also_sees()
Out[63]:
[Synset('best.a.01'),
 Synset('better.a.01'),
 Synset('favorable.a.01'),
 Synset('good.a.03'),
 Synset('obedient.a.01'),
 Synset('respectable.a.01')]
wn.synset('good.a.01').similar_tos()
Out[64]:
[Synset('bang-up.s.01'),
 Synset('good_enough.s.01'),
 Synset('goodish.s.01'),
 Synset('hot.s.15'),
 Synset('redeeming.s.02'),
 Synset('satisfactory.s.02'),
 Synset('solid.s.01'),
 Synset('superb.s.02'),
 Synset('well-behaved.s.01')]
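However, the antonym relation seems to be different. I managed to retrieve the lemma connected to my base synset, but could not retrieve the actual synset, as follows: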
wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]
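How can I get the synset, rather than the lemma, that is connected to my base synset wn.synset('good.a.01') through antonymy? TIA
Answer:
For some reason, WordNet indexes antonymy relations at the lemma level rather than the synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have a many-to-many or a one-to-one relationship.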
In the case of an ambiguous word, that is, one word with many meanings, we have a one-to-many relationship from String to Synset, e.g.
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
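In the case of one meaning/concept with multiple representations, we have a one-to-many relationship from Synset to String (where String refers to the lemma name):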
>>> dog = wn.synset('dog.n.01')
>>> dog.definition()
'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
['dog', 'domestic_dog', 'Canis_familiaris']
Note: so far we have been comparing the relationship between String and Synset, not between Lemma and Synset.
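The two one-to-many relationships above can be seen side by side with a small helper. This is only a sketch: senses_and_names is a hypothetical name, not an NLTK function, and the reader argument is assumed to be an object such as nltk.corpus.wordnet that provides synsets(), with synsets that provide name() and lemma_names().

```python
def senses_and_names(reader, word):
    """Map each synset name found for `word` to that synset's lemma names.

    Hypothetical helper, not part of NLTK. The keys show String -> Synset
    (one word, many senses); each value shows Synset -> String (one sense,
    many lemma names).
    """
    return {ss.name(): list(ss.lemma_names()) for ss in reader.synsets(word)}


def demo():
    # Assumes NLTK is installed and the WordNet corpus has been downloaded
    # (nltk.download('wordnet')); nothing here runs unless demo() is called.
    from nltk.corpus import wordnet as wn
    return senses_and_names(wn, 'dog')
```

With the WordNet data used in the transcripts above, the entry for 'dog.n.01' in demo() would be expected to hold the three lemma names listed by dog.lemma_names().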
The "cute" thing is that Lemma and String have a one-to-one relationship:
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
'dog'
The _name attribute of a Lemma object holds a single unicode string, not a list. See the code at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444
It also looks like a Lemma has a one-to-one relationship with its Synset. From the docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:
Lemma attributes, accessible via methods with the same name::
- name: The canonical name of this lemma.
- synset: The synset that this lemma belongs to.
- syntactic_marker: For adjectives, the WordNet string identifying the
syntactic position relative modified noun. See:
07004
For all other parts of speech, this attribute is None.
- count: The frequency of this lemma in wordnet.
So we can do the following, knowing that each Lemma object will return exactly one synset:
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')
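Applied to the synset from the question, the same chain of calls (synset to lemmas, lemma to antonym lemmas, antonym lemma back to its synset) can be wrapped in a small helper. This is a minimal sketch: antonym_synsets is a hypothetical name, not an NLTK function, and it relies only on Synset.lemmas(), Lemma.antonyms() and Lemma.synset().

```python
def antonym_synsets(synset):
    """Return the synsets reached from `synset` through lemma-level antonymy.

    Hypothetical helper, not part of NLTK. Order of first appearance is
    kept and duplicates are dropped.
    """
    found = []
    for lemma in synset.lemmas():          # every lemma of the base synset
        for antonym in lemma.antonyms():   # antonymy is stored on lemmas
            target = antonym.synset()      # a lemma belongs to exactly one synset
            if target not in found:
                found.append(target)
    return found


def demo():
    # Assumes NLTK is installed and the WordNet corpus has been downloaded
    # (nltk.download('wordnet')); nothing here runs unless demo() is called.
    from nltk.corpus import wordnet as wn
    return antonym_synsets(wn.synset('good.a.01'))
```

Since the question shows Lemma('bad.a.01.bad') as the antonym lemma of good.a.01, demo() would be expected to return the synset that lemma belongs to.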
Suppose you are doing some sentiment analysis and need the antonyms of every adjective in WordNet; you can easily collect the synsets of the antonyms:
>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     # Antonymy is stored on lemmas, so map each antonym lemma back to its synset.
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print(ss, ':', get_antonyms(ss))
...
Synset('able.a.01') : {Synset('unable.a.01')}
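For sentiment analysis it is often handier to keep the result as a lookup table than to print it. The sketch below is one way to do that; antonym_table is a hypothetical name, not an NLTK function, and it assumes synset objects that provide name() and lemmas(), with lemmas that provide antonyms() and synset(). Synsets without any antonym lemma are skipped.

```python
from itertools import chain


def antonym_table(synsets):
    """Map each synset name to the sorted names of its antonym synsets.

    Hypothetical helper, not part of NLTK. Synsets whose lemmas have no
    antonyms are left out of the table.
    """
    table = {}
    for ss in synsets:
        # Collect antonym lemmas from every lemma, then hop back to synsets.
        antonym_lemmas = chain.from_iterable(l.antonyms() for l in ss.lemmas())
        names = {a.synset().name() for a in antonym_lemmas}
        if names:
            table[ss.name()] = sorted(names)
    return table


def demo():
    # Assumes NLTK is installed and the WordNet corpus has been downloaded
    # (nltk.download('wordnet')); nothing here runs unless demo() is called.
    from nltk.corpus import wordnet as wn
    return antonym_table(wn.all_synsets(pos='a'))
```

Storing names rather than Synset objects keeps the table easy to serialize, at the cost of another wn.synset(name) lookup when the object itself is needed.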