python - Scrapy ImagesPipeline WARNING: File (unknown-error): Error downloading image from <GET -
I am learning Python and Scrapy, and I am trying to learn how to download images with it. I am stuck right now and can't figure out what the real problem is.
I am getting this error message when I run the spider:
    <None>: Unsupported URL scheme '': no handler available for that scheme
and
    [imageflip] WARNING: File (unknown-error): Error downloading image from <GET
Please see pipelines.py here:

    import scrapy
    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.exceptions import DropItem


    class PriceoflipkartPipeline(object):
        def process_item(self, item, spider):
            return item


    class MyImagesPipeline(ImagesPipeline):
        def get_media_requests(self, item, info):
            for image_url in item['image_urls']:
                yield scrapy.Request(image_url)

        def item_completed(self, results, item, info):
            image_paths = [x['path'] for ok, x in results if ok]
            if not image_paths:
                raise DropItem("Item contains no images")
            item['image_paths'] = image_paths
            return item
Please see settings.py here:

    SPIDER_MODULES = ['priceoflipkart.spiders']
    NEWSPIDER_MODULE = 'priceoflipkart.spiders'
    ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
    IMAGES_STORE = 'd:\priceoflipkart\images'
    IMAGES_EXPIRES = 90
Please see the spider here:

    import scrapy
    from priceoflipkart.items import PriceoflipkartItem


    class FlipkartSpider(scrapy.Spider):
        name = "imageflip"
        allowed_domains = ["flipkart.com"]
        start_urls = [
            "http://www.flipkart.com/moto-g-2nd-gen/p/itme5z8n9mt77ajr?pid=mobdygz6shnb7rfc&srno=b_1&ref=06f4e48c-9548-45fa-b3ac-fa5fdf0e0d22"
        ]

        def parse(self, response):
            for sel in response.xpath('//body'):
                item = PriceoflipkartItem()
                item['image_urls'] = sel.select('//img[@class="productimage current"]').extract()
                yield item
And in items.py I have added the following code:

    image_urls = scrapy.Field()
    images = scrapy.Field()
Please advise me on how to configure this correctly so that the image gets downloaded. I am on a Windows 8 machine. Thanks in advance.
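The XPath used to extract the image URLs is not correct. It should include /@src at the end so that it extracts the URL of the image. Without it, extract() returns the markup of the whole <img> element, which is not a URL and has no scheme, hence the "Unsupported URL scheme ''" error. Make it like: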
    item['image_urls'] = sel.select('//img[@class="productimage current"]/@src').extract()
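
To see why the empty scheme shows up in the error, here is a minimal sketch using only the Python 3 standard library (not Scrapy itself). The image URL and the `url_scheme` helper are hypothetical and only for illustration; the idea is simply that the markup of an <img> element does not parse as a URL with a scheme, while the value of its src attribute does.

```python
from urllib.parse import urlparse

# Hypothetical values, for illustration only: roughly what the selector
# returns without /@src (element markup) and with /@src (attribute value).
element_html = '<img class="productimage current" src="http://example.com/moto-g.jpg">'
src_value = "http://example.com/moto-g.jpg"


def url_scheme(candidate):
    """Return the URL scheme of a string, or '' if it has none."""
    return urlparse(candidate).scheme


print(repr(url_scheme(element_html)))  # ''
print(repr(url_scheme(src_value)))     # 'http'
```

A similar check on the entries of item['image_urls'] can help confirm that the spider is yielding real URLs before the images pipeline tries to download them.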