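Scraping LOL Skin Images
1. Open the official LOL website, then go to "Game Info" (游戏资料) and open the "Library" (资料库).
2. Press F12 to inspect the page, and find hero_list.js in the Network tab.
The hero list is fetched through an Ajax request.
3. Click a few hero portraits and a pattern in the image URLs becomes apparent:
https://game.gtimg.cn/images/lol/act/img/skin/big7004.jpg is one of LeBlanc's skins,
https://game.gtimg.cn/images/lol/act/img/skin/big64005.jpg is one of Lee Sin's skins.
The number after "big" is the hero ID followed by the skin number, zero-padded to three digits (7 + 004, 64 + 005).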
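4. The hero ID can be found in hero_list.js. For the skin number, we can set an upper limit and try every value in a loop; a hero generally has fewer than 25 skins (chroma skins are not counted here).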
5. Join the parts into a full URL, and the image can be downloaded.
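The URL pattern from steps 3 to 5 can be sketched on its own, without any network access. This is a minimal sketch that assumes the skin number is always zero-padded to three digits, as in the two sample URLs; build_skin_url and candidate_urls are hypothetical helper names used only for illustration.

```python
SKIN_URL = 'https://game.gtimg.cn/images/lol/act/img/skin/big{}.jpg'


def build_skin_url(hero_id, skin_number):
    """Join the hero ID and a three-digit skin number into an image URL."""
    return SKIN_URL.format(f'{hero_id}{skin_number:03d}')


def candidate_urls(hero_id, limit=25):
    """List every URL worth probing for one hero (skin numbers 0..limit-1)."""
    return [build_skin_url(hero_id, n) for n in range(limit)]


print(build_skin_url(7, 4))     # the LeBlanc sample URL from step 3
print(build_skin_url('64', 5))  # the Lee Sin sample URL from step 3
print(len(candidate_urls(64)))  # number of URLs that will be tried
```

Not every candidate URL points at a real image, so the downloader still has to check the response status before saving anything.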
The source code is below:

import json
import os

import requests
import vthread


@vthread.pool(10)  # run each call on a pool of 10 threads
def get_heroSkin(hero):
    # Skin image URL template; the ID part is filled in below
    skin_url = 'https://game.gtimg.cn/images/lol/act/img/skin/big{}.jpg'
    heroId = hero['heroId']
    name = hero['name']
    alias = hero['alias']
    title = hero['title']
    folder = f'./lol/{name}-{alias}-{title}'
    # exist_ok avoids an error if the folder already exists
    os.makedirs(folder, exist_ok=True)
    # Build each candidate URL: hero ID + three-digit skin number
    for i in range(25):
        skin = f'{i:03d}'
        number_url = heroId + skin
        response = requests.get(skin_url.format(number_url))
        # Only save skin numbers that actually exist
        if response.status_code == 200:
            with open(f'{folder}/{skin}.jpg', 'wb') as f:
                f.write(response.content)


def get_heroId(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/79.0.3945.130 Safari/537.36 '
    }
    response = requests.get(url, headers=headers)
    # Decode the UTF-8 response bytes into a str
    # (alternative: set response.encoding = 'utf-8' and read response.text)
    html = response.content.decode('utf-8')
    html = json.loads(html)
    heros = html['hero']
    for hero in heros:
        get_heroSkin(hero)


def main():
    # URL of the hero list, which contains the hero IDs
    url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
    if not os.path.exists('./lol'):
        os.mkdir('./lol')
    get_heroId(url)


if __name__ == '__main__':
    main()
The import vthread line brings in the vthread library, which runs the downloads on multiple threads to speed them up.
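If installing the third-party vthread package is not an option, the standard library's concurrent.futures provides a comparable thread pool. The following is a sketch of that alternative, not the method used in the script above; download_all and fake_download are hypothetical names, the sample records are made up, and the fake worker stands in for the real download function so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(heroes, worker, max_workers=10):
    """Run worker(hero) for every hero on a pool of threads.

    Results come back in the same order as the input list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, heroes))


def fake_download(hero):
    # Stand-in for the real download function; only returns the folder name.
    return f"{hero['name']}-{hero['alias']}-{hero['title']}"


# Made-up records with the same keys the script reads from hero_list.js.
sample = [
    {'heroId': '7', 'name': 'name-a', 'alias': 'alias-a', 'title': 'title-a'},
    {'heroId': '64', 'name': 'name-b', 'alias': 'alias-b', 'title': 'title-b'},
]
results = download_all(sample, fake_download)
print(results)
```

Leaving the with block waits for every submitted task to finish, so the program does not exit while downloads are still running.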
Here are the results: