First, the code. The English version of TextBlob:

Last login: Tue Sep  4 17:44:20 on ttys000
linhuideMBP:~ linhui$ cd /Users/linhui/anaconda2a/lib/python2.7/site-packages
linhuideMBP:site-packages linhui$ python
Python 2.7.14 |Anaconda, Inc.| (default, Dec  7 2017, 11:07:58)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> text = open('Book2.txt')
>>> text = text.read()
>>> text = text[0:200]
>>> blob = TextBlob(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'TextBlob' is not defined
>>> from textblob import Textblob
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name Textblob
>>> from textblob import TextBlob
>>> text = open('Book2.txt')
>>> text = text.read()
>>> text = text[0:200]
>>> blob = TextBlob(text)
>>> print(blob.words)
['It', 'was', 'a', 'bright', 'cold', 'day', 'in', 'April', 'and', 'the', 'clocks', 'were', 'striking', 'thirteen', 'Winston', 'Smith', 'his', 'chin', 'nuzzled', 'into', 'his', 'breast', 'in', 'an', 'effort', 'to', 'escape', 'the', 'vile', 'wind', 'slipped', 'quickly', 'through', 'the', 'glass', 'doors']
>>>

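Book2.txt here is the article to be processed. It is stored in Python's site-packages directory, which is the working directory of the session above. Note that the class name is spelled TextBlob with a capital B, which is why the first import attempt failed.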
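The file does not have to live in site-packages; any path that open() can reach will work. Below is a minimal sketch of the same "read the first 200 characters" step written as a reusable function. The helper names read_excerpt and excerpt_words, and the UTF-8 encoding, are assumptions made for this illustration and are not part of the original session.

```python
import io


def read_excerpt(path, n_chars=200, encoding="utf-8"):
    """Return the first n_chars characters of the text file at path."""
    # io.open works on both Python 2 and 3 and returns decoded text.
    # The UTF-8 default is an assumption; adjust it to match the file.
    with io.open(path, encoding=encoding) as f:
        return f.read(n_chars)


def excerpt_words(path, n_chars=200):
    """Tokenize the excerpt with TextBlob (requires the textblob package)."""
    # Imported lazily so that read_excerpt stays usable without textblob.
    from textblob import TextBlob
    return TextBlob(read_excerpt(path, n_chars)).words
```

Calling excerpt_words('Book2.txt') from the directory that holds the file should give the same kind of word list as the session above, and the with block closes the file handle once the excerpt has been read.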

The French version of TextBlob follows:

>>> from textblob import TextBlob
>>> from textblob_fr import PatternTagger, PatternAnalyzer
>>> text = u"Quelle belle matinée"
>>> blob = TextBlob(text, pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())
>>> blob.tags
[(u'Quelle', u'DT'), (u'belle', u'JJ'), (u'matin\xe9e', u'NN')]
>>> blob.sentiment
(0.8, 0.8) # "belle" is a clearly positive word, so it is easy to analyze
>>>  text = u“Je suis le roi” 
  File "<stdin>", line 1
    text = u“Je suis le roi” # mind the Chinese/English input mode; it is easy to forget to switch
    ^
IndentationError: unexpected indent
>>> text = u"Je suis le roi"
>>> blob = TextBlob(text, pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())
>>> blob.tags
[(u'Je', u'PRP'), (u'suis', u'VB'), (u'le', u'DT'), (u'roi', u'NN')]
>>> blob.sentiment
(0.0, 0.0) # in my opinion, this model falls clearly short on deeper semantic understanding
>>> 

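As shown, TextBlob still falls short on deeper semantic understanding: "Je suis le roi" ("I am the king") gets a sentiment of (0.0, 0.0), where the tuple is (polarity, subjectivity). Its part-of-speech tagging, however, is correct for these sentences, e.g. 'PRP', 'VB', 'DT', 'NN'.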
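One plausible reading of these two results is that the analyzer scores a sentence from a word lexicon, so a sentence without any listed sentiment word comes out neutral. The toy sketch below illustrates that idea only. It is not the implementation used by textblob-fr, and the lexicon entries and their values are hypothetical, chosen to mirror the outputs above.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# The values are made up for illustration; they are not textblob-fr's data.
TOY_LEXICON = {
    u"belle": (0.8, 0.8),
    u"horrible": (-0.9, 1.0),
}


def toy_sentiment(words, lexicon=TOY_LEXICON):
    """Average the scores of the words that appear in the lexicon."""
    hits = [lexicon[w.lower()] for w in words if w.lower() in lexicon]
    if not hits:
        # No known sentiment-bearing word: report a neutral score.
        return (0.0, 0.0)
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return (polarity, subjectivity)


print(toy_sentiment([u"Quelle", u"belle", u"matin\xe9e"]))  # (0.8, 0.8)
print(toy_sentiment([u"Je", u"suis", u"le", u"roi"]))       # (0.0, 0.0)
```

Under this reading, (0.0, 0.0) for "Je suis le roi" means that none of its words carry a sentiment score, not that the sentence was understood and judged neutral.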

The next post will cover the German version of TextBlob.
