如果tag中包含多个字符串 [2] ,可以使用 for string in soup.strings:
print(repr(string))
# u"The Dormouse's story"
# u'\n\n'
# u"The Dormouse's story"
# u'\n\n'
# u'Once upon a time there were three little sisters; and their names were\n'
# u'Elsie'
# u',\n'
# u'Lacie'
# u' and\n'
# u'Tillie'
# u';\nand they lived at the bottom of a well.'
# u'\n\n'
# u'...'
# u'\n'
输出的字符串中可能包含了很多空格或空行,使用 for string in soup.stripped_strings:
print(repr(string))
# u"The Dormouse's story"
# u"The Dormouse's story"
# u'Once upon a time there were three little sisters; and their names were'
# u'Elsie'
# u','
# u'Lacie'
# u'and'
# u'Tillie'
# u';\nand they lived at the bottom of a well.'
# u'...'
全部是空格的行会被忽略掉,段首和段末的空白会被删除 父节点继续分析文档树,每个tag或字符串都有父节点:被包含在某个tag中 .parent通过 title_tag = soup.title
title_tag
# <title>The Dormouse's story</title>
title_tag.parent
# <head><title>The Dormouse's story</title></head>
文档title的字符串也有父节点:<title>标签 title_tag.string.parent
# <title>The Dormouse's story</title>
文档的顶层节点比如<html>的父节点是 html_tag = soup.html
type(html_tag.parent)
# <class 'bs4.BeautifulSoup'>
print(soup.parent)
# None
.parents通过元素的 link = soup.a
link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
for parent in link.parents:
if parent is None:
print(parent)
else:
print(parent.name)
# p
# body
# html
# [document]
# None
|
Archiver|手机版|笨鸟自学网 ( 粤ICP备20019910号 )
GMT+8, 2024-10-5 16:40 , Processed in 0.015794 second(s), 17 queries .