python - Scrapy ImagesPipeline WARNING: File (unknown-error): Error downloading image from <GET -


I am learning Python and Scrapy, and I am trying to learn how to download images with it. I am kind of stuck right now and can't figure out what the real problem is.

I am getting this error message when I run the spider:

<None>: Unsupported URL scheme '': no handler available for that scheme

and

[imageflip] WARNING: File (unknown-error): Error downloading image from <GET

Please see my pipelines.py here:

    import scrapy
    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.exceptions import DropItem


    class PriceoflipkartPipeline(object):
        def process_item(self, item, spider):
            return item


    class MyImagesPipeline(ImagesPipeline):

        def get_media_requests(self, item, info):
            # One download request per URL stored on the item
            for image_url in item['image_urls']:
                yield scrapy.Request(image_url)

        def item_completed(self, results, item, info):
            # Keep only the paths of images that downloaded successfully
            image_paths = [x['path'] for ok, x in results if ok]
            if not image_paths:
                raise DropItem("Item contains no images")
            item['image_paths'] = image_paths
            return item

Please see my settings.py here:

    SPIDER_MODULES = ['priceoflipkart.spiders']
    NEWSPIDER_MODULE = 'priceoflipkart.spiders'
    ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
    IMAGES_STORE = 'd:\priceoflipkart\images'
    IMAGES_EXPIRES = 90

Please see my spider here:

    import scrapy
    from priceoflipkart.items import PriceoflipkartItem


    class FlipkartSpider(scrapy.Spider):
        name = "imageflip"
        allowed_domains = ["flipkart.com"]
        start_urls = [
            "http://www.flipkart.com/moto-g-2nd-gen/p/itme5z8n9mt77ajr?pid=mobdygz6shnb7rfc&srno=b_1&ref=06f4e48c-9548-45fa-b3ac-fa5fdf0e0d22"
        ]

        def parse(self, response):
            for sel in response.xpath('//body'):
                item = PriceoflipkartItem()
                item['image_urls'] = sel.select('//img[@class="productimage  current"]').extract()
                yield item

And in items.py I have added the following code:

    image_urls = scrapy.Field()
    images = scrapy.Field()

Please advise me on how to configure this correctly so that the image gets downloaded. I am on a Windows 8 machine. Thanks in advance.

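The XPath used to extract the image URLs is not correct. It should include /@src at the end so that it extracts the URL of the image. Without it, extract() returns the whole <img> element as markup rather than a URL, which fits the "Unsupported URL scheme ''" error. Make it like this: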

    item['image_urls'] = sel.select(
        '//img[@class="productimage  current"]/@src').extract()
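
To see why the missing /@src plausibly leads to the empty-scheme error, here is a small standard-library sketch. It does not use Scrapy at all. The markup string and the img.example.com URL are made-up samples, and url_scheme is a hypothetical helper that only mimics the idea of choosing a download handler by URL scheme.

```python
from urllib.parse import urlparse

# Hypothetical sample of what extract() yields WITHOUT /@src:
# the serialized <img> element, not a URL.
element_html = '<img class="productimage  current" src="http://img.example.com/moto-g.jpeg">'

# Hypothetical sample of what extract() yields WITH /@src:
# only the attribute value.
src_value = "http://img.example.com/moto-g.jpeg"


def url_scheme(value):
    """Return the scheme part of value, or '' when it has none."""
    return urlparse(value).scheme


print(repr(url_scheme(element_html)))  # → ''
print(repr(url_scheme(src_value)))     # → 'http'
```

A string that starts with "<img" has no scheme, so there is nothing for a downloader to dispatch on, while the bare src value carries "http".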
