Author: Steven Loria
Development Status: 4 - Beta
Installation
TextBlob’s only external dependency is PyYAML. A vendorized version of NLTK is bundled internally.
If you have pip:
pip install textblob
Or (if you must):
easy_install textblob
IMPORTANT: TextBlob depends on some NLTK corpora to work. The easiest way to get these is to run this command:
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
You can also download the script here.
Then run:
python download_corpora.py
Usage
Simple.
Create a TextBlob
from text.blob import TextBlob
wikitext = '''
Python is a widely used general-purpose, high-level programming language.
Its design philosophy emphasizes code readability, and its syntax allows
programmers to express concepts in fewer lines of code than would be
possible in languages such as C.
'''
wiki = TextBlob(wikitext)
Part-of-speech tags and noun phrases…
...are just properties.
wiki.pos_tags # [(Word('Python'), 'NNP'), (Word('is'), 'VBZ'),
# (Word('widely'), 'RB')...]
wiki.noun_phrases # WordList(['python', 'design philosophy', 'code readability'])
Sentiment analysis
The sentiment property returns a tuple of the form (polarity, subjectivity), where polarity ranges from -1.0 to 1.0 and subjectivity ranges from 0.0 to 1.0.
blob.sentiment # (0.20, 0.58)
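As a rough illustration of how the tuple might be consumed, the sketch below unpacks a (polarity, subjectivity) pair and maps the polarity to a coarse label. The helper label_polarity and its 0.1 threshold are hypothetical and not part of TextBlob; the input is the example value shown in the comment above.

```python
def label_polarity(sentiment, threshold=0.1):
    """Map a (polarity, subjectivity) tuple to a coarse label.

    Hypothetical helper, not part of TextBlob; the threshold is arbitrary.
    """
    polarity, subjectivity = sentiment
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Example value from the README comment, standing in for blob.sentiment.
example = (0.20, 0.58)
label = label_polarity(example)
print(label)
```

Because the property is a plain tuple, it can be unpacked directly without any extra accessor.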
Tokenization
zen = TextBlob("Beautiful is better than ugly. "
"Explicit is better than implicit. "
"Simple is better than complex.")
zen.words # WordList(['Beautiful', 'is', 'better'...])
zen.sentences # [Sentence('Beautiful is better than ugly.'),
# Sentence('Explicit is better than implicit.'),
# ...]
Words and inflection
Each word in TextBlob.words or Sentence.words is a Word object (a subclass of unicode) with useful methods, e.g. for word inflection.
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words
# OUT: WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
sentence.words[2].singularize()
# OUT: 'space'
sentence.words[-1].pluralize()
# OUT: 'levels'
Get word and noun phrase frequencies
wiki.word_counts['its'] # 2 (not case-sensitive by default)
wiki.words.count('its') # Same thing
wiki.words.count('its', case_sensitive=True) # 1
wiki.noun_phrases.count('code readability') # 1
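To make the case-sensitivity difference concrete, here is a standard-library sketch that reproduces the two counts for 'its' on the same text. It uses a naive regular-expression tokenizer, which is an assumption for illustration and not TextBlob's own tokenizer.

```python
import re
from collections import Counter

wikitext = (
    "Python is a widely used general-purpose, high-level programming language. "
    "Its design philosophy emphasizes code readability, and its syntax allows "
    "programmers to express concepts in fewer lines of code than would be "
    "possible in languages such as C."
)

# Naive tokenizer (assumption): runs of word characters only.
tokens = re.findall(r"\w+", wikitext)

# Lowercasing every token first gives a case-insensitive count.
case_insensitive = Counter(token.lower() for token in tokens)
# Counting the raw tokens keeps "Its" and "its" apart.
case_sensitive = Counter(tokens)

print(case_insensitive["its"])  # 2: matches both "Its" and "its"
print(case_sensitive["its"])    # 1: matches only the lowercase "its"
```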
TextBlobs are like Python strings!
zen[0:19] # TextBlob("Beautiful is better")
zen.upper() # TextBlob("BEAUTIFUL IS BETTER THAN UGLY...")
zen.find("Simple") # 65
apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
apple_blob < banana_blob # True
apple_blob + ' and ' + banana_blob # TextBlob('apples and bananas')
"{0} and {1}".format(apple_blob, banana_blob) # 'apples and bananas'
Get start and end indices of sentences
Use sentence.start and sentence.end. This can be useful for sentence highlighting, for example.
for sentence in zen.sentences:
    print(sentence)  # Beautiful is better than ugly
    print("---- Starts at index {}, Ends at index {}"
          .format(sentence.start, sentence.end))  # 0, 30
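The highlighting use case can be sketched with plain string slicing. The example below does not use TextBlob: sentence_spans is a hypothetical, period-delimited splitter that stands in for sentence.start and sentence.end, and highlight is a hypothetical helper that wraps a span in markers.

```python
import re

zen_text = (
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)


def sentence_spans(text):
    """Return (start, end) index pairs; naive, splits on periods only."""
    return [match.span() for match in re.finditer(r"\S[^.]*\.", text)]


def highlight(text, start, end, left="[", right="]"):
    """Wrap text[start:end] in marker strings."""
    return text[:start] + left + text[start:end] + right + text[end:]


spans = sentence_spans(zen_text)
first_start, first_end = spans[0]
highlighted = highlight(zen_text, first_start, first_end)
print(spans)
print(highlighted)
```

The end index is exclusive, as with Python slices, so text[start:end] yields the sentence itself.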
Get a JSON-serialized version of the blob
zen.json  # '[{"sentiment": [0.2166666666666667, 0.8333333333333334],
          #    "stripped": "beautiful is better than ugly",
          #    "noun_phrases": ["beautiful"], "raw": "Beautiful is better than ugly. ",
          #    "end_index": 30, "start_index": 0}
          #   ...]'
Overriding the noun phrase extractor
TextBlob currently has two noun phrase chunker implementations: text.np_extractor.FastNPExtractor (default, based on Shlomi Babluki’s implementation from this blog post) and text.np_extractor.ConllExtractor (currently works on Python 2 only).
You can change the chunker implementation (or even use your own) by overriding TextBlob.np_extractor:
from text.np_extractor import ConllExtractor
extractor = ConllExtractor()
blob = TextBlob("Python is a widely used general-purpose, high-level programming language.")
blob.np_extractor = extractor
blob.noun_phrases # This will use the Conll2000 noun phrase extractor
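For the "use your own" case, a custom extractor might look like the sketch below. It assumes that TextBlob calls an extract(text) method on whatever object is assigned to np_extractor and expects a list of phrase strings; this README does not confirm that interface, so check the text.np_extractor source before relying on it. HyphenatedExtractor is a hypothetical toy class and runs without TextBlob installed.

```python
import re


class HyphenatedExtractor(object):
    """Toy extractor returning hyphenated compounds, for illustration only.

    Assumption: TextBlob calls extract(text) on the object assigned to
    np_extractor and expects a list of phrase strings.
    """

    def extract(self, text):
        # Match words joined by one or more hyphens, e.g. "high-level".
        return re.findall(r"\b\w+(?:-\w+)+\b", text)


extractor = HyphenatedExtractor()
phrases = extractor.extract(
    "Python is a widely used general-purpose, high-level programming language."
)
print(phrases)

# With TextBlob installed, the assignment would mirror the example above:
# blob.np_extractor = extractor
```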
Testing
Run nosetests to run all tests.
License
TextBlob is licensed under the MIT license. See the bundled LICENSE file for more details.
Changelog for textblob
0.3.7 (2013-07-29)
Every word in a Blob or Sentence is a Word instance which has methods for inflection, e.g. word.pluralize() and word.singularize().
Updated the np_extractor module. It now has a new implementation, ConllExtractor, that uses the Conll2000 chunking corpus. Only works on Py2.