ASP.net tutorial: batch-scraping images from an MM site in C# with the Ivony.Html.AIO plugin (Part 2)

 

 3. All the pagination pages have now been collected, so the next step is to get every image on each page. Open a page and view its source:

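The source shows that every image sits inside the div with class=img, so all the images can be downloaded directly from each pagination page. The code is as follows: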

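```csharp
// Runs inside the loop over the pagination links (href = the current page's link)
// Load the document model of each pagination page
IHtmlDocument htm2 = new JumonyParser().LoadDocument($"{address}{href}", System.Text.Encoding.GetEncoding("utf-8"));

// Select the img tags under the div with class=img
var aLink = htm2.Find(".img img");

foreach (var link in aLink)
{
    var imgsrc = link.Attribute("src").Value();
    Console.WriteLine("Found image path: " + imgsrc);
    Console.WriteLine($"Starting download of image {imgsrc} >>>>>>>");
    DownLoadImg(new Image { Address = address + imgsrc, Title = url });
}
} // closes the loop over the pagination links
```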
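The loop builds each download address as `address + imgsrc`. That only produces a valid URL when `src` happens to be a path relative to `address`. It fails when `src` is absolute (`http://...`) or relative to the page itself (`b.jpg`, `../c.jpg`). One option is to resolve `src` against the page URL with `System.Uri`. The sketch below assumes the page URL is absolute. `ImageUrlHelper` and `ResolveImageUrl` are hypothetical names, not part of Jumony or the original code.

```csharp
using System;

// Hypothetical helper, not part of Jumony or the original crawler.
public static class ImageUrlHelper
{
    // Resolves an <img src> value against the URL of the page it came from,
    // using standard relative-reference resolution: "b.jpg" and "../c.jpg"
    // are resolved relative to the page; an absolute src is returned unchanged.
    public static string ResolveImageUrl(string pageUrl, string src)
    {
        var pageUri = new Uri(pageUrl);          // must be an absolute URL
        var resolved = new Uri(pageUri, src);    // Uri(Uri baseUri, string relativeUri)
        return resolved.AbsoluteUri;
    }
}
```

In the loop above, the call could then become `DownLoadImg(new Image { Address = ImageUrlHelper.ResolveImageUrl($"{address}{href}", imgsrc), Title = url });`.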

 The image download method is shown below. To keep downloads from blocking the main thread, it downloads asynchronously:

 

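```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

/// <summary>
/// Downloads an image asynchronously.
/// </summary>
/// <param name="image">Image to download (Address = image URL, Title = address of the page it came from).
/// Image is the article's own model class, not System.Drawing.Image.</param>
// Returns Task rather than void so callers can await it; calling it without await still starts the download.
static async Task DownLoadImg(Image image)
{
    using (WebClient client = new WebClient())
    {
        try
        {
            // File name = last segment of the image URL, without any query string
            string path = image.Address.Split('?')[0];
            string fileName = path.Substring(path.LastIndexOf('/') + 1);

            // Use the page address as the image folder name
            string folder = image.Title.Replace("/", "-").Replace("html", "");
            // ':' '?' and similar characters are also illegal in Windows folder names
            foreach (char c in Path.GetInvalidFileNameChars())
            {
                folder = folder.Replace(c, '-');
            }
            string directory = "c:/images/" + folder + "/";
            Directory.CreateDirectory(directory); // does nothing if the folder already exists

            await client.DownloadFileTaskAsync(new Uri(image.Address), directory + fileName);
            // Report success only when the download actually completed
            Console.WriteLine($"{image.Address} downloaded successfully");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"{image.Address} download failed");
            // File.AppendText only opens a StreamWriter; AppendAllText writes the line and closes the file
            File.AppendAllText(@"c:/log.txt", $"{image.Address} download failed: {ex.Message}{Environment.NewLine}");
        }
    }
}
```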
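The page loop calls `DownLoadImg` without awaiting it, so each call returns at its first `await` and the loop moves on. In a console app this has two side effects. `Main` can return, ending the process, while downloads are still pending. Also, there is no limit on how many images are requested at once. The sketch below addresses both. It assumes a Task-returning download delegate, caps concurrency with `SemaphoreSlim`, and waits for every download with `Task.WhenAll`. `ThrottledDownloader`, `DownloadAllAsync` and `maxConcurrency` are hypothetical names, not part of the original code or of Jumony.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original crawler or of Jumony):
// runs downloadOne for every URL, at most maxConcurrency at a time,
// and completes only after every download has finished or failed.
public static class ThrottledDownloader
{
    public static async Task<int> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task> downloadOne, int maxConcurrency)
    {
        if (maxConcurrency < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        int succeeded = 0;
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() starts one task per URL immediately; each waits for a free slot.
            List<Task> tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    await downloadOne(url);
                    Interlocked.Increment(ref succeeded);
                }
                catch (Exception ex)
                {
                    // One failed image should not stop the others.
                    Console.WriteLine($"{url} download failed: {ex.Message}");
                }
                finally
                {
                    gate.Release(); // hand the slot to the next waiting URL
                }
            }).ToList();

            await Task.WhenAll(tasks); // do not return until every download is done
        }
        return succeeded; // number of URLs whose delegate completed without throwing
    }
}
```

One possible wiring is to collect the image addresses first, then pass a delegate that wraps a Task-returning `DownLoadImg`. `Main` would then block on `DownloadAllAsync(...).GetAwaiter().GetResult()`, or await it in an `async Main`, so the process stays alive until the last image is saved. The limit (for example 4) is a politeness choice toward the target site, not a value from the original article.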