packtpub-crawler: Error attempting to claim book from newsletter
~ $ python script/spider.py --config config/prod.cfg --notify ifttt --claimOnly
__ __ __ __
____ ____ ______/ /__/ /_____ __ __/ /_ ______________ __ __/ /__ _____
/ __ \/ __ `/ ___/ //_/ __/ __ \/ / / / __ \______/ ___/ ___/ __ `/ | /| / / / _ \/ ___/
/ /_/ / /_/ / /__/ ,< / /_/ /_/ / /_/ / /_/ /_____/ /__/ / / /_/ /| |/ |/ / / __/ /
/ .___/\__,_/\___/_/|_|\__/ .___/\__,_/_.___/ \___/_/ \__,_/ |__/|__/_/\___/_/
/_/ /_/
Download FREE eBook every day from www.packtpub.com
@see github.com/niqdev/packtpub-crawler
[*] 2017-01-31 10:30 - fetching today's eBooks
[*] configuration file: /app/config/prod.cfg
[*] getting daily free eBook
[*] fetching url... 200 | https://www.packtpub.com/packt/offers/free-learning
[*] fetching url... 200 | https://www.packtpub.com/packt/offers/free-learning
[*] fetching url... 200 | https://www.packtpub.com/account/my-ebooks
[+] book successfully claimed
[+] notification sent to IFTTT
[*] getting free eBook from newsletter
[*] fetching url... 200 | https://www.packtpub.com/packt/free-ebook/practical-data-analysis
[-] <type 'exceptions.IndexError'> list index out of range | spider.py@123
Traceback (most recent call last):
File "script/spider.py", line 123, in main
packtpub.runNewsletter(currentNewsletterUrl)
File "/app/script/packtpub.py", line 160, in runNewsletter
self.__parseNewsletterBookInfo(soup)
File "/app/script/packtpub.py", line 98, in __parseNewsletterBookInfo
title = urlWithTitle.split('/')[4].replace('-', ' ').title()
IndexError: list index out of range
[+] error notification sent to IFTTT
[*] done
~ $
The script has already successfully claimed the book from the newsletter once, but on subsequent days I'm getting the error above, and it sends an IFTTT notification for the second (newsletter) book 😦
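For context, line 98 of packtpub.py derives the title purely from the path segments of the scraped link, so a link with fewer segments than expected (for example a relative URL picked up after the page markup changed) pushes index 4 off the end of the list. A minimal sketch that reproduces the crash; the short link below is a hypothetical example, not the actual value scraped from the page:

# Hypothetical value for urlWithTitle, e.g. a relative link scraped from a renamed div
urlWithTitle = '/packt/free-ebook'
parts = urlWithTitle.split('/')             # ['', 'packt', 'free-ebook'] -- only 3 elements
title = parts[4].replace('-', ' ').title()  # IndexError: list index out of range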
Hi guys, I'm writing a Google script that parses PacktPub tweets (it builds on @juzim's Google script). I'm not sure, but there's a chance that all the newsletter books will also be published on their Twitter, so there's no need to fix this 😃 joking. It's not finished yet: it still needs to exclude duplicates and check whether each link is still available. If you have time, please take a look at the output and tell me whether it's usable for the crawler: https://goo.gl/AXtAC8
Looks like some of the divs have been renamed on the newsletter's landing page. I compared the page for an older book with the current one and came up with this hotfix: https://github.com/niqdev/packtpub-crawler/compare/master...mkarpiarz:fix_newsletter_divs I haven't tested email notifications yet, so I'm not sure what the description will look like, but claiming a newsletter ebook seems to work now. Happy to submit a PR if @juzim hasn't started working on this yet.
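Until the fix is merged, this is the general shape of the change: look the title up in the page markup and only fall back to slicing the URL when a slug is actually present. This is a sketch under assumed markup, not the code from the branch above, and the selector is a placeholder:

# Sketch of a more defensive newsletter title parser (placeholder selector, not the actual hotfix)
from bs4 import BeautifulSoup

def parse_newsletter_title(html):
    soup = BeautifulSoup(html, 'html.parser')
    heading = soup.select_one('div.book-top-block-info h1')  # placeholder selector
    if heading is not None:
        return heading.get_text(strip=True)
    # Fall back to the URL slug only when the canonical link actually has one
    canonical = soup.find('link', rel='canonical')
    if canonical and canonical.get('href'):
        slug = canonical['href'].rstrip('/').split('/')[-1]
        if slug:
            return slug.replace('-', ' ').title()
    raise ValueError('could not find the newsletter book title')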
That’s it?! I’ll try to fix it soon but it might take till next week, sorry.
niqdev notifications@github.com wrote on Sun, 2 Apr 2017, 11:10:
The script just claims the book; you can then download it manually later, or run it with a "downloadAll" parameter that only syncs the archive with the local folder. Notifications etc. are handled on claim, not on download.