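Invoking the shell from spiders to inspect responses

Sometimes you want to inspect the response being processed at a certain point in your spider, if only to check that the response you expect is getting there.

This can be achieved by using the scrapy.shell.inspect_response function.

Here is an example of how you would call it from your spider:

import scrapy
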
class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]
    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)
        # Rest of parsing code.
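
The substring test `".org" in response.url` in the spider above matches anywhere in the URL, including the path or query string. A minimal sketch of a stricter check, assuming a hypothetical helper named `should_inspect` (not part of Scrapy) that compares against the hostname only:

```python
from urllib.parse import urlparse


def should_inspect(url: str, suffix: str = ".org") -> bool:
    """Return True when the URL's hostname ends with the given suffix.

    Hypothetical helper for illustration; it is not a Scrapy API.
    """
    # urlparse(...).hostname is None for URLs without a network location,
    # so fall back to an empty string before testing the suffix.
    hostname = urlparse(url).hostname or ""
    return hostname.endswith(suffix)


# Inside the spider's parse() method this could replace the substring test:
#     if should_inspect(response.url):
#         inspect_response(response, self)
```

With this check, a URL such as `http://example.com/archive.org/index.html` would not open the shell, while `http://example.org` still would.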
When you run the spider, you will get something similar to this:

2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
...
>>> response.url
'http://example.org'
Then, you can check whether the extraction code is working:

>>> response.xpath('//h1[@class="fn"]')
[]
Nope, it isn't. So you can open the response in your web browser and see whether it is the response you were expecting:

>>> view(response)
True
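
One possible reason for an empty result like the one above is that `@class="fn"` compares the whole attribute value, so an element carrying more than one class does not match. The sketch below illustrates this with the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors. The markup is invented for the example and is not the actual content of the page:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the heading carries two classes, not just "fn".
page = ET.fromstring(
    '<html><body><h1 class="fn org">Example Domain</h1></body></html>'
)

# Exact attribute match: the whole value "fn org" is compared with "fn",
# so nothing is found.
exact = page.findall(".//h1[@class='fn']")

# Broader query: collect every h1 and test the individual class tokens.
by_token = [
    h1 for h1 in page.iter("h1")
    if "fn" in h1.get("class", "").split()
]

print(exact)                          # no exact matches
print([h1.text for h1 in by_token])   # the heading is found by class token
```

In the Scrapy shell itself, a CSS selector such as `response.css("h1.fn")` matches by class token and may be worth trying when an exact-match XPath comes back empty.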
Finally, press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume crawling:

>>> ^D
2014-01-23 17:50:03-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
...
Note that you cannot use …