blot.summarize

class blot.summarize.Summary(url, article_html, title, summaries)

Bases: object

blot.summarize.compare_sents(sent1, sent2)

Compare two word-tokenized sentences for shared words
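The exact formula is not given here. As a minimal sketch, assuming the score is a Jaccard-style overlap (shared words divided by all distinct words), which is at least consistent with a sentence scoring 1 against itself, the comparison might look like this. The name compare_sents_sketch is hypothetical and is not part of blot.

```python
def compare_sents_sketch(sent1, sent2):
    """Overlap of two word-tokenized sentences (assumed Jaccard formula)."""
    words1, words2 = set(sent1), set(sent2)
    union = words1 | words2
    if not union:
        # Two empty sentences share nothing worth scoring.
        return 0.0
    # float() keeps true division on Python 2 as well as Python 3.
    return len(words1 & words2) / float(len(union))


same = compare_sents_sketch(["the", "cat"], ["the", "cat"])
partial = compare_sents_sketch(["a", "b"], ["b", "c"])
disjoint = compare_sents_sketch(["a"], ["b"])
```

Under this assumption, identical sentences score 1.0 and sentences with no words in common score 0.0.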

blot.summarize.compare_sents_bounded(sent1, sent2)

Like compare_sents, but returns 0 when the result is not between LOWER_BOUND and UPPER_BOUND, so outliers don’t skew the sum

blot.summarize.compute_score(sent, sents)

Compute the average score of sent against the other sentences in sents (sent compared with itself isn’t counted, because that result is 1, which is above UPPER_BOUND)
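To illustrate how the bounds and the averaging could fit together, here is a rough sketch. The bound values, the overlap formula, the inclusive range check, and the choice to divide by the total number of sentences are all assumptions; every name ending in _sketch, and the overlap helper, are hypothetical stand-ins rather than blot's own code.

```python
LOWER_BOUND = 0.2  # assumed value, not taken from blot
UPPER_BOUND = 0.9  # assumed value, not taken from blot


def overlap(sent1, sent2):
    """Stand-in comparison: shared words over all distinct words."""
    words1, words2 = set(sent1), set(sent2)
    union = words1 | words2
    return len(words1 & words2) / float(len(union)) if union else 0.0


def bounded_sketch(sent1, sent2):
    """Return the overlap, or 0 when it falls outside the assumed bounds."""
    score = overlap(sent1, sent2)
    return score if LOWER_BOUND <= score <= UPPER_BOUND else 0


def compute_score_sketch(sent, sents):
    """Average bounded score of sent against every sentence in sents."""
    if not sents:
        return 0.0
    # sent vs. itself scores 1, which exceeds UPPER_BOUND and so adds 0.
    total = sum(bounded_sketch(sent, other) for other in sents)
    return total / float(len(sents))


sents = [["a", "b", "c"], ["b", "c", "d"], ["x", "y"]]
scores = [compute_score_sketch(s, sents) for s in sents]
```

Here the first two sentences overlap by 0.5 and so score the same, while the third shares no words with the others and scores 0.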

blot.summarize.find_likely_body(b)

Find the tag with the most directly-descended <p> tags
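The type of b is not documented; it is presumably a parsed HTML document. The sketch below swaps in the standard-library html.parser (Python 3) so that it runs without dependencies, and takes a raw HTML string instead. It assumes reasonably well-formed markup, and the names DirectParagraphCounter and find_likely_body_sketch are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DirectParagraphCounter(HTMLParser):
    """Count, for each element, how many <p> tags are its direct children."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.elements = []  # [tag_name, attrs_dict, direct_p_count]
        self.stack = []     # indices into self.elements for open elements

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.stack:
            # Credit the innermost open element, i.e. the direct parent.
            self.elements[self.stack[-1]][2] += 1
        if tag not in VOID_TAGS:
            self.elements.append([tag, dict(attrs), 0])
            self.stack.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Close back to the most recent open element with this name.
        for pos in range(len(self.stack) - 1, -1, -1):
            if self.elements[self.stack[pos]][0] == tag:
                del self.stack[pos:]
                break


def find_likely_body_sketch(html):
    """Return (tag_name, attrs) of the element with the most direct <p>s."""
    parser = DirectParagraphCounter()
    parser.feed(html)
    parser.close()
    if not parser.elements:
        return None
    best = max(parser.elements, key=lambda element: element[2])
    return (best[0], best[1]) if best[2] else None


page = (
    '<html><body><div id="nav"><p>Home</p></div>'
    '<div id="story"><p>One.</p><p>Two.</p>'
    '<blockquote><p>Quoted.</p></blockquote><p>Three.</p></div>'
    '</body></html>'
)
likely_body = find_likely_body_sketch(page)
```

Only direct children are counted, so the paragraph nested inside the blockquote is credited to the blockquote and not to the surrounding div.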

blot.summarize.is_unimportant(word)

Decides if a word is ok to toss out for the sentence comparisons

blot.summarize.only_important(sent)

Just a little wrapper to filter on is_unimportant
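The criteria for an unimportant word are not listed here. A minimal sketch, assuming a small stopword list plus punctuation-only tokens, could look like the following. STOPWORDS and both _sketch names are hypothetical, and the real wrapper may return an iterator where this one returns a list.

```python
import string

# Hypothetical stopword list; blot's actual criteria are not documented here.
STOPWORDS = frozenset(["a", "an", "and", "in", "is", "it",
                       "of", "on", "the", "to"])


def is_unimportant_sketch(word):
    """True for stopwords and for tokens made up only of punctuation."""
    if word.lower() in STOPWORDS:
        return True
    return all(ch in string.punctuation for ch in word)


def only_important_sketch(sent):
    """Keep the words of a tokenized sentence that pass the filter."""
    return [word for word in sent if not is_unimportant_sketch(word)]


kept = only_important_sketch(["The", "cat", "sat", "on", "the", "mat", "."])
```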

blot.summarize.summarize_block(block)

Return the sentence that best summarizes block

blot.summarize.summarize_blocks(blocks)
blot.summarize.summarize_page(url)
blot.summarize.summarize_text(text, block_sep='\n\n', url=None, title=None)
blot.summarize.u(s)

Ensure our string is unicode independent of Python version, since Python 3 versions < 3.3 do not support the u"..." prefix
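One common way to write such a helper is sketched below. It assumes byte strings are UTF-8 encoded, which is not stated here, and the name u_sketch is hypothetical.

```python
import sys


def u_sketch(s):
    """Return s as a text (unicode) string on both Python 2 and Python 3."""
    if sys.version_info[0] >= 3:
        text_type = str
    else:
        text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    if isinstance(s, text_type):
        return s
    # Assumption: byte strings are UTF-8 encoded.
    return s.decode("utf-8")


already_text = u_sketch("abc")
decoded = u_sketch(b"caf\xc3\xa9")
```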