Answer a question

I have just started to learn Python. I want to write a program with NLTK that breaks a text into unigrams and bigrams. For example, if the input text is...

"I am feeling sad and disappointed due to errors"

... my function should generate text like:

I am-->am feeling-->feeling sad-->sad and-->and disappointed-->disappointed due-->due to-->to errors

I have written code to input text into the program. Here's the function I'm trying:

def gen_bigrams(text):
    token = nltk.word_tokenize(review)
    bigrams = ngrams(token, 2)
    #print Counter(bigrams)
    bigram_list = ""
    for x in range(0, len(bigrams)):
        words = bigrams[x]
        bigram_list = bigram_list + words[0]+ " " + words[1]+"-->"
    return bigram_list

The error I'm getting is...

for x in range(0, len(bigrams)):

TypeError: object of type 'generator' has no len()

Since the ngrams function returns a generator, I tried using len(list(bigrams)), but it returns 0 and I'm still getting the same error. I have looked at other questions on StackExchange but still can't figure out how to resolve this. Any workaround or suggestion?

Answers

Building a string from values joined by a separator is best done with str.join:

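import nltk

def gen_bigrams(text):
    # word_tokenize needs the NLTK tokenizer models to be installed (see nltk.download)
    token = nltk.word_tokenize(text)
    bigrams = nltk.ngrams(token, 2)
    # " ".join turns each (w1, w2) tuple into "w1 w2"; "{} {}".format would also work,
    # but only via itertools.starmap, because map passes each tuple as a single argument
    return "-->".join(map(" ".join, bigrams))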

Note that there will be no trailing "-->", so append one if you need it. This way you don't even have to think about the length of the iterable you're using. In Python that is almost always the case: if you want to iterate through an iterable, use for x in iterable:. If you do need the indexes, use enumerate:

for i, x in enumerate(iterable):
    ...
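
As for len(list(bigrams)) giving 0: one likely cause, assuming the generator had already been consumed earlier (for example by something like the commented-out Counter(bigrams) line), is that a generator can only be iterated once. The sketch below illustrates this without NLTK. The pairs function is a hypothetical stand-in for nltk.ngrams(tokens, 2), and str.split is used in place of nltk.word_tokenize, so treat it as an illustration rather than the library's actual behaviour in every detail.

```python
def pairs(tokens):
    """Hypothetical stand-in for nltk.ngrams(tokens, 2): lazily yields adjacent pairs."""
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)


# Plain whitespace split; an assumption standing in for nltk.word_tokenize.
tokens = "I am feeling sad".split()

gen = pairs(tokens)
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing is left, so this list is empty

# Materialize once if len() or indexing is really needed.
bigrams = list(pairs(tokens))
joined = "-->".join(" ".join(pair) for pair in bigrams)

# enumerate supplies the index alongside each pair, no range(len(...)) required.
numbered = ["{}: {}".format(i, " ".join(pair)) for i, pair in enumerate(bigrams)]

print(joined)
print(numbered)
```

If you only need to walk over the bigrams once, as in the str.join version above, there is no need to build the list at all.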