Description
Hi,
When I try to extract an article from varzesh3.com (for example, https://www.varzesh3.com/news/1554055/), I get this error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'your_url' is not defined
>>> your_url = 'https://www.varzesh3.com/news/1554055/'
>>> extractor = Extractor(extractor='ArticleExtractor', url=your_url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/boilerpipe/extract/__init__.py", line 46, in __init__
    self.data = str(self.data, encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```
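The byte in the error message may be a clue: 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which suggests the server sent a gzip-compressed body that was decoded without being decompressed first. This standalone sketch (no boilerpipe involved; the HTML string is made up) reproduces the same error by strictly decoding gzip data, the same way `str(self.data, encoding)` does:

```python
import gzip

# Made-up page content, compressed the way a server using
# "Content-Encoding: gzip" would send it.
html = "<html><body>news</body></html>"
raw = gzip.compress(html.encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
magic = raw[:2]

try:
    # str(bytes, encoding) decodes strictly, like line 46 does.
    str(raw, "utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(magic)          # b'\x1f\x8b'
print(error_message)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```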
I solved this by replacing line 46 of `boilerpipe/extract/__init__.py` with:

```python
self.data = self.data.decode(encoding, "ignore")
```
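One caveat with `"ignore"`: it makes the exception go away by silently dropping undecodable bytes. If the body really is gzip-compressed (its first two bytes would be 0x1f 0x8b), the extractor would then receive compressed bytes rather than HTML. Below is a minimal sketch of an alternative, assuming the response is plain gzip: check for the magic bytes, decompress, then decode strictly. `decode_body` and `GZIP_MAGIC` are hypothetical names, not part of boilerpipe:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def decode_body(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: undo gzip compression if present, then decode."""
    if data[:2] == GZIP_MAGIC:
        # urllib does not decompress Content-Encoding: gzip on its own.
        data = gzip.decompress(data)
    # Strict decoding still raises if the charset is genuinely wrong,
    # instead of silently dropping bytes as "ignore" would.
    return data.decode(encoding)


html = "<html><body>news</body></html>"  # made-up page content
raw = gzip.compress(html.encode("utf-8"))

fixed = decode_body(raw)
ignored = raw.decode("utf-8", "ignore")  # what the line-46 patch does

print(fixed == html)    # True
print(ignored == html)  # False: "ignore" keeps the compressed bytes
```

The decoded text could then be passed to `Extractor(extractor='ArticleExtractor', html=text)` instead of `url=`, assuming your installed version of the package accepts `html=` as an alternative to `url=`.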