
Scrapy shell

Published by: 笨鸟自学网

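The Scrapy shell is an interactive shell where you can quickly try out and debug your scraping code without having to run the spider. It is meant for testing data extraction code, but you can use it to test any kind of code, since it is also a regular Python shell.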

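The shell is used for testing XPath or CSS expressions, to see how they work and what data they extract from the web pages you are trying to scrape. It lets you test your expressions interactively while you write your spider, without having to run the spider to test every change.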

Once you get familiar with the Scrapy shell, you will find it an invaluable tool for developing and debugging your spiders.

Configuring the shell

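If you have IPython installed, the Scrapy shell will use it (instead of the standard Python console). The IPython console is much more powerful and provides smart auto-completion and colorized output, among other things.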

We highly recommend that you install IPython, especially if you are working on Unix systems (where IPython excels). See the IPython installation guide for more information.

Scrapy also supports bpython, and will try to use it when IPython is unavailable.

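Through Scrapy's settings you can configure it to use any one of ipython, bpython or the standard python shell, regardless of which are installed. This is done by setting the SCRAPY_PYTHON_SHELL environment variable, or by defining it in your scrapy.cfg: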

[settings]
shell = bpython
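How the two configuration sources might be combined can be sketched with the standard library alone. This is an illustrative sketch, not Scrapy's own implementation: the function name `preferred_shells`, the `DEFAULT_SHELLS` list, and the rule that the environment variable takes precedence over scrapy.cfg are assumptions made for the example.

```python
import configparser
import os

# Assumed fallback order when nothing is configured (hypothetical constant).
DEFAULT_SHELLS = ["ipython", "bpython", "python"]


def preferred_shells(cfg_text, environ=None):
    """Return the shell names to try, most preferred first."""
    environ = os.environ if environ is None else environ

    # 1. Assumption: the environment variable wins when set and non-empty.
    env_value = environ.get("SCRAPY_PYTHON_SHELL", "").strip()
    if env_value:
        return [env_value.lower()]

    # 2. Otherwise look for `shell = ...` under [settings] in scrapy.cfg.
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option("settings", "shell"):
        return [parser.get("settings", "shell").strip().lower()]

    # 3. Nothing configured: fall back to the default order.
    return list(DEFAULT_SHELLS)


cfg = "[settings]\nshell = bpython\n"
print(preferred_shells(cfg, environ={}))
print(preferred_shells(cfg, environ={"SCRAPY_PYTHON_SHELL": "ipython"}))
print(preferred_shells("", environ={}))
```

Passing `environ` explicitly keeps the sketch easy to try without touching the real environment.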

Launching the shell

To launch the Scrapy shell, use the shell command like this:

scrapy shell <url>

Where <url> is the URL you want to scrape.

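The shell also works for local files. This can be handy if you want to play around with a local copy of a web page. The shell understands the following syntaxes for local files: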

# UNIX-style
scrapy shell ./path/to/file.html
scrapy shell ../other/path/to/file.html
scrapy shell /absolute/path/to/file.html

# File URI
scrapy shell file:///absolute/path/to/file.html

Note

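When using relative file paths, be explicit and prepend them with ./ (or ../ when relevant). scrapy shell index.html will not work as one might expect (and this is by design, not a bug).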

Because the shell favors HTTP URLs over file URIs, and index.html is syntactically similar to example.com, the shell will treat index.html as a domain name and trigger a DNS lookup error:

$ scrapy shell index.html
[ ... scrapy shell starts ... ]
[ ... traceback ... ]
twisted.internet.error.DNSLookupError: DNS lookup failed:
address 'index.html' not found: [Errno -5] No address associated with hostname.

The shell will not test beforehand whether a file called index.html exists in the current directory. Again, be explicit.
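The behavior described in this note can be mimicked with a small, simplified sketch. The helper `guess_target` is hypothetical and only approximates the rule stated above (explicit paths become file URIs, everything else is assumed to be a host name); it is not the code Scrapy actually runs.

```python
from pathlib import Path


def guess_target(arg):
    """Turn a command-line argument into a URL (simplified, hypothetical helper)."""
    # Anything that already carries a scheme is used unchanged.
    if "://" in arg:
        return arg
    # Explicit relative or absolute paths are treated as local files.
    if arg.startswith(("./", "../", "/")):
        return Path(arg).resolve().as_uri()
    # Everything else is assumed to be a host name, so HTTP is prepended.
    # Note that the file system is never consulted for this case.
    return "http://" + arg


for arg in ["index.html", "./index.html", "example.com",
            "file:///absolute/path/to/file.html"]:
    print(arg, "->", guess_target(arg))
```

With this rule, `index.html` ends up as an HTTP URL pointing at a host called `index.html`, which is why the DNS lookup fails, while `./index.html` resolves to a file URI for the file in the current directory.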
