Author: Steven Loria
Development Status: 4 - Beta
Installation
If you don’t have pip (you should), run this first:
curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python
Option 1
Choose this option if you:
Want a quick install.
Don’t have nltk currently installed, or don’t mind if your current installation is overridden by the latest version on the master branch. NOTE: You can avoid overriding your current installation by using textblob in a virtualenv.
pip install -U textblob
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
This will install textblob and download the necessary NLTK corpora.
Option 2
Choose this option if you:
Don’t want your local nltk installation to be overridden.
Want to keep your nltk on the bleeding edge of development.
pip install -U git+https://github.com/nltk/nltk
pip install -U git+https://github.com/sloria/TextBlob.git@no-bundle
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
This will install the latest NLTK from the master branch, install textblob from the no-bundle branch, and download the necessary corpora.
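To verify the installation, you can try a quick one-liner from the shell (a minimal sketch; the import path is the same one used in the Usage section below):
python -c "from text.blob import TextBlob; print(TextBlob('TextBlob is installed.').words)"
If this raises an error about missing corpora, re-run the download_corpora.py step above.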
Usage
Simple.
Create a TextBlob
from text.blob import TextBlob
wikitext = '''
Python is a widely used general-purpose, high-level programming language.
Its design philosophy emphasizes code readability, and its syntax allows
programmers to express concepts in fewer lines of code than would be
possible in languages such as C.
'''
wiki = TextBlob(wikitext)
Part-of-speech tags and noun phrases…
...are just properties.
wiki.pos_tags # [(Word('Python'), 'NNP'), (Word('is'), 'VBZ'),
# (Word('a'), 'DT'), (Word('widely'), 'RB')...]
wiki.noun_phrases # WordList(['python', 'design philosophy', 'code readability'])
Note: The first call to noun_phrases might take a few seconds because the noun phrase chunker needs to be trained. Subsequent calls to noun_phrases will be quick, however, since all TextBlobs share the same instance of a noun phrase chunker.
Sentiment analysis
The sentiment property returns a tuple of the form (polarity, subjectivity), where polarity ranges from -1.0 to 1.0 and subjectivity ranges from 0.0 to 1.0.
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment # (0.4583333333333333, 0.4357142857142857)
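Because sentiment is a plain tuple, it can be unpacked directly; a small sketch building on the example above:
polarity, subjectivity = testimonial.sentiment
if polarity > 0:
    print("Sounds positive")  # polarity is about 0.46 here, so this prints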
Tokenization
zen = TextBlob("Beautiful is better than ugly. "
"Explicit is better than implicit. "
"Simple is better than complex.")
zen.words # WordList(['Beautiful', 'is', 'better'...])
zen.sentences # [Sentence('Beautiful is better than ugly.'),
# Sentence('Explicit is better than implicit.'),
# ...]
for sentence in zen.sentences:
    print(sentence.sentiment)
Words and inflection
Each word in TextBlob.words or Sentence.words is a Word object (a subclass of unicode) with useful methods, e.g. for word inflection.
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words
# OUT: WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
sentence.words[2].singularize()
# OUT: 'space'
sentence.words[-1].pluralize()
# OUT: 'levels'
Get word and noun phrase frequencies
wiki.word_counts['its'] # 2 (not case-sensitive by default)
wiki.words.count('its') # Same thing
wiki.words.count('its', case_sensitive=True) # 1
wiki.noun_phrases.count('code readability') # 1
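Since word_counts is indexed by word, you can also ask for the most frequent word; a sketch that assumes word_counts supports standard dict-style iteration and .get():
most_frequent = max(wiki.word_counts, key=wiki.word_counts.get)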
TextBlobs are like Python strings!
zen[0:19] # TextBlob("Beautiful is better")
zen.upper() # TextBlob("BEAUTIFUL IS BETTER THAN UGLY...")
zen.find("Simple") # 65
apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
apple_blob < banana_blob # True
apple_blob + ' and ' + banana_blob # TextBlob('apples and bananas')
"{0} and {1}".format(apple_blob, banana_blob) # 'apples and bananas'
Get start and end indices of sentences
Use sentence.start and sentence.end. This can be useful for sentence highlighting, for example.
for sentence in zen.sentences:
    print(sentence)  # Beautiful is better than ugly
    print("---- Starts at index {}, Ends at index {}"
          .format(sentence.start, sentence.end))  # 0, 30
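For example, the start and end indices can be used to slice each sentence back out of the raw text (a small sketch; it assumes start and end are character offsets into the original string, as the output above suggests):
raw = str(zen)
for sentence in zen.sentences:
    print(raw[sentence.start:sentence.end])  # prints each sentence's text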
Get a JSON-serialized version of the blob
zen.json  # '[{"sentiment": [0.2166666666666667, 0.8333333333333334],
          #   "stripped": "beautiful is better than ugly",
          #   "noun_phrases": ["beautiful"],
          #   "raw": "Beautiful is better than ugly. ",
          #   "end_index": 30, "start_index": 0},
          #   ...]'
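Since json is an ordinary JSON string, it can be parsed back with the standard library; a minimal sketch:
import json
data = json.loads(zen.json)
data[0]["noun_phrases"]  # ["beautiful"], per the serialized output above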
Advanced usage
Noun Phrase Chunkers
TextBlob currently has two noun phrase chunker implementations: text.np_extractors.FastNPExtractor (the default, based on Shlomi Babluki’s implementation from this blog post) and text.np_extractors.ConllExtractor, which uses the CoNLL 2000 corpus to train a tagger.
You can change the chunker implementation (or even use your own) by explicitly passing an instance of a noun phrase extractor to a TextBlob’s constructor.
from text.blob import TextBlob
from text.np_extractors import ConllExtractor
extractor = ConllExtractor()
blob = TextBlob("Extract my noun phrases.", np_extractor=extractor)
blob.noun_phrases # This will use the Conll2000 noun phrase extractor
POS Taggers
TextBlob currently has two POS tagger implementations, located in text.taggers. The default is PatternTagger, which uses the same implementation as the excellent pattern library.
The second implementation is NLTKTagger, which uses NLTK’s TreeBank tagger. It requires numpy and only works on Python 2.
Similar to the noun phrase chunkers, you can explicitly specify which POS tagger to use by passing a tagger instance to the constructor.
from text.blob import TextBlob
from text.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob = TextBlob("Tag! You're It!", pos_tagger=nltk_tagger)
blob.pos_tags
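The default PatternTagger can be passed explicitly in the same way (a sketch; PatternTagger lives in text.taggers, as noted above):
from text.blob import TextBlob
from text.taggers import PatternTagger
pattern_tagger = PatternTagger()
blob = TextBlob("Tag! You're It!", pos_tagger=pattern_tagger)
blob.pos_tags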
Testing
Run python run_tests.py to run all tests.
License
TextBlob is licensed under the MIT license. See the bundled LICENSE file for more details.
Changelog
0.3.9 (2013-07-31)
Updated nltk.
ConllExtractor is now Python 3-compatible.
Improved sentiment analysis.
Blobs are equal (with ==) to their string counterparts.
Added instructions to install textblob without nltk bundled.
Dropped official support for Python 3.1 and 3.2.
0.3.8 (2013-07-30)
Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).
Add text.taggers module, which allows the user to change which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
NPExtractor and Tagger objects can be passed to TextBlob’s constructor.
Fix bug with POS-tagger not tagging one-letter words.
Rename text/np_extractor.py -> text/np_extractors.py
Add run_tests.py script.
0.3.7 (2013-07-28)
Every word in a Blob or Sentence is a Word instance, which has methods for inflection, e.g. word.pluralize() and word.singularize().
Updated the np_extractor module. It now has a new implementation, ConllExtractor, which uses the Conll2000 chunking corpus. Only works on Py2.