UnicodeDecodeError #3

@edoost

Hi,

When I try to extract an article from varzesh3.com (for example, https://www.varzesh3.com/news/1554055/), I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'your_url' is not defined
>>> your_url = 'https://www.varzesh3.com/news/1554055/'
>>> extractor = Extractor(extractor='ArticleExtractor', url=your_url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/boilerpipe/extract/__init__.py", line 46, in __init__
    self.data = str(self.data, encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I solved this by replacing line 46 of boilerpipe/extract/__init__.py with:
self.data = self.data.decode(encoding, "ignore")
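
A possible alternative, offered only as a sketch: byte 0x8b at position 1 matches the second byte of the gzip magic number (0x1f 0x8b), so the server may have returned a gzip-compressed body. If that is the case, decoding with "ignore" would avoid the exception but might leave unreadable text. The helper below, `decode_body`, is hypothetical and not part of boilerpipe. It assumes the raw response bytes are available, decompresses them when they look like gzip, and then decodes them leniently.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper (not part of boilerpipe).

    Decompress the payload if it starts with the gzip magic number,
    then decode it, dropping any bytes that are invalid in `encoding`.
    """
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode(encoding, "ignore")


if __name__ == "__main__":
    # Simulate a gzip-compressed response body.
    compressed = gzip.compress("<html>سلام</html>".encode("utf-8"))
    print(decode_body(compressed))
```

Whether this applies here depends on what the server actually sent. Checking the response's Content-Encoding header would confirm it.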
