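Review: parsing with regular expressions in Python
Requirement: scrape every image in the funny-pictures (糗图) section of Qiushibaike (糗事百科)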
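import os
import re

import requests

if __name__ == "__main__":
    # Create the output directory if it does not exist yet
    if not os.path.exists('./img'):
        os.mkdir('./img')

    url = 'https://www.sitapix.com/search/%E5%B1%B1'
    # Send a browser-like User-Agent so the request is not rejected as a bot
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
    }

    # Fetch the page source as text
    page_text = requests.get(url=url, headers=headers).text

    # Capture the src attribute of the image inside each <div class="thumb"> block
    ex = '<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'
    # re.S lets '.' match newlines, because each block spans several lines
    img_src_list = re.findall(ex, page_text, re.S)

    for src in img_src_list:
        # This code assumes protocol-relative src values (//host/path), so it prepends the scheme
        src = 'https:' + src
        # Request the image itself; .content returns the raw bytes
        img_data = requests.get(url=src, headers=headers).content
        # Use the last path segment of the URL as the file name
        img_name = src.split('/')[-1]
        imgPath = './img/' + img_name
        with open(imgPath, 'wb') as fp:
            fp.write(img_data)
        print(img_name, 'downloaded successfully!')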
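To see what the pattern in the script above captures without touching the network, here is a minimal sketch that runs the same expression on a hand-written HTML fragment. The fragment, the `pic.example.com` host, and the variable names `sample_html`, `with_dotall`, and `without_dotall` are made up for illustration; the real page markup may differ.

```python
import re

# Hypothetical HTML fragment, written by hand to resemble a page with
# <div class="thumb"> blocks; it is not copied from any real site.
sample_html = """
<div class="thumb">
<a href="/article/1" target="_blank">
<img src="//pic.example.com/a.jpg" alt="first">
</a>
</div>
<div class="thumb">
<a href="/article/2" target="_blank">
<img src="//pic.example.com/b.jpg" alt="second">
</a>
</div>
"""

# Same pattern as in the script: the lazy .*? stops at the first possible
# match, and the single capture group holds the src attribute.
ex = '<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'

# With re.S, '.' also matches newlines, so the pattern can span lines.
with_dotall = re.findall(ex, sample_html, re.S)

# Without re.S, '.' stops at each newline, so nothing matches here.
without_dotall = re.findall(ex, sample_html)

print(with_dotall)
print(without_dotall)
```

Because the pattern has exactly one capture group, `re.findall` returns a flat list of the captured `src` strings rather than the full matched text. On this fragment the second call returns an empty list, which shows why the `re.S` flag matters for multi-line markup.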