How Does a Python Crawler Traverse the Document Tree? A One-Stop Guide (4)


Output

<generator object descendants at 0x00519AB0>

<title>The Dormouse's story</title>

The Dormouse's story
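The code that produced this output belongs to the previous part and is not shown here. A minimal sketch that yields the same three lines might look like the following, assuming the earlier code printed soup.head.descendants and then looped over it. The shortened html string and the use of the built-in "html.parser" (instead of lxml) are assumptions made so the sketch runs without extra packages; the memory address in the first printed line will differ on each run.

```python
from bs4 import BeautifulSoup

# Shortened document (an assumption; the tutorial uses a longer one)
html = "<html><head><title>The Dormouse's story</title></head><body></body></html>"

# "html.parser" is swapped in for lxml so no extra install is needed
soup = BeautifulSoup(html, "html.parser")

# .descendants is lazy: printing it shows a generator object, not the nodes
descendants = soup.head.descendants
print(descendants)

# Iterating walks every node below <head>: first the <title> tag,
# then the text node inside it
collected = [str(node) for node in descendants]
for item in collected:
    print(item)
```

Because .descendants is a generator, it has to be iterated (or passed to list()) before the individual nodes become visible.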

3. Node content: the .string attribute

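If a Tag has exactly one child and that child is a NavigableString, .string returns that child. If a Tag has exactly one child and that child is another Tag, .string works as well: it returns the same value as the .string of that only child.

Put simply: if a tag contains no further tags, .string returns the text inside it. If a tag contains exactly one tag and nothing else, .string also returns the text inside that inner tag. If a tag has more than one child, .string cannot tell which child is meant and returns None. For example: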

#!/usr/bin/python3
# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, using the lxml parser
soup = BeautifulSoup(html, "lxml")

# <head> has a single child, <title>, so this returns the title text
print(soup.head.string)

# <title> has a single text child, so this returns the same text
print(soup.head.title.string)
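Both print calls above give The Dormouse's story. The remaining cases can be checked the same way. The sketch below is a rough illustration, not part of the original tutorial: it uses a shortened document and the built-in "html.parser" (both assumptions, chosen so it runs without lxml), and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Shortened version of the tutorial document (an assumption for brevity)
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
"""

# "html.parser" is swapped in for lxml so no extra install is needed
soup = BeautifulSoup(html, "html.parser")

# Case 1: <title> holds a single text node
single_text = soup.title.string
print(single_text)       # The Dormouse's story

# Case 2: <head> holds a single tag, so .string is delegated to <title>
nested = soup.head.string
print(nested)            # The Dormouse's story

# Case 3: this <p> holds text and several <a> tags, so .string is None
multiple = soup.find("p", class_="story").string
print(multiple)          # None

# Case 4: the only child of the first <a> is a comment; .string returns
# a Comment object, which prints without the <!-- --> markers
comment = soup.find("a", id="link1").string
print(type(comment).__name__, repr(str(comment)))  # Comment ' Elsie '
```

Case 4 is worth watching for in a crawler: a Comment prints like ordinary text, so checking isinstance(value, Comment) is one way to avoid treating a comment as page content.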
