使用Unicode时,Beautiful Soup还会智能的把引号 [10] 转换成HTML或XML中的特殊字符: markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"
UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="html").unicode_markup
# u'<p>I just “love” Microsoft Word’s smart quotes</p>'
UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="xml").unicode_markup
# u'<p>I just “love” Microsoft Word’s smart quotes</p>'
也可以把引号转换为ASCII码: UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="ascii").unicode_markup
# u'<p>I just "love" Microsoft Word\'s smart quotes</p>'
很有用的功能,但是Beautiful Soup没有使用这种方式.默认情况下,Beautiful Soup把引号转换成Unicode: UnicodeDammit(markup, ["windows-1252"]).unicode_markup
# u'<p>I just \u201clove\u201d Microsoft Word\u2019s smart quotes</p>' |
Archiver|手机版|笨鸟自学网 ( 粤ICP备20019910号 )
GMT+8, 2024-10-5 18:33 , Processed in 0.014737 second(s), 17 queries .