I am using nltk and want to create my own custom texts, just like the default ones in nltk.book. So far, I have only got as far as a token list like:
my_text = ['This', 'is', 'my', 'text']
I'd like to find a way to input my text as a plain string instead:
my_text = "This is my text, this is a nice way to input text."
Which method, from Python itself or from nltk, allows me to do this? And, more importantly, how can I discard punctuation symbols?
As @PavelAnossov answered, the canonical answer is to use the word_tokenize function in nltk:
If your sentence is truly simple enough:
Using the string.punctuation set, remove the punctuation, then split on the whitespace delimiter:
This is actually on the main page of nltk.org:
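A minimal sketch of both approaches described above, not a definitive implementation. The function names simple_tokens, nltk_tokens, and as_nltk_text are made up for illustration. The two NLTK-based functions assume NLTK is installed and that its tokenizer data has been fetched with nltk.download (the resource is named 'punkt' or 'punkt_tab', depending on the NLTK version).

```python
import string


def simple_tokens(text):
    """Remove ASCII punctuation, then split on whitespace (no NLTK needed)."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).split()


def nltk_tokens(text):
    """Tokenize with nltk.word_tokenize, keeping only tokens that contain a letter or digit."""
    # Imported here so simple_tokens above still works without NLTK installed.
    from nltk import word_tokenize
    return [tok for tok in word_tokenize(text) if any(ch.isalnum() for ch in tok)]


def as_nltk_text(tokens):
    """Wrap a token list in nltk.Text, the class used for the texts in nltk.book."""
    from nltk import Text
    return Text(tokens)


my_text = "This is my text, this is a nice way to input text."
print(simple_tokens(my_text))
# ['This', 'is', 'my', 'text', 'this', 'is', 'a', 'nice', 'way', 'to', 'input', 'text']
```

With NLTK available, as_nltk_text(nltk_tokens(my_text)) should give an object that supports the same methods as the built-in texts, such as concordance. Note that simple_tokens also strips apostrophes and hyphens inside words (for example, "don't" becomes "dont"), so it only suits truly simple sentences.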