复习:UA检测
UA:User-Agent(请求载体的身份标识).
门户网站的服务器会检测对应请求的载体身份标识,如果检测到请求的载体身份标识为某一款浏览器,说明该请求是一个正常的请求。但是,如果检测到请求的载体身份标识不是基于某一款浏览器的,则表示该请求为不正常的请求(爬虫),则服务器端就很有可能拒绝该次请求。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import requests
if __name__ == "__main__": headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36' } url = 'https://www.sogou.com/web' kw = input('enter a word:') param = { 'query': kw } response = requests.get(url=url, params=param, headers=headers)
page_text = response.text fileName = kw + '.html' with open(fileName, 'w', encoding='utf-8') as fp: fp.write(page_text) print(fileName, '保存成功!!!')
|
Tips:
Please indicate the source and original author when reprinting or quoting this article.