Python web scraping: getting the contents of a script tag on a page
The page contains a script block, and the value I want to extract from it is the number 00914. A regular expression is enough for this.
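As a rough sketch of the regex idea, the snippet below runs against a made-up script body. The sample text, the value assigned to `_url`, and the helper name `extract_js_string` are assumptions for illustration only; the real page's script will look different.

```python
import re

# Hypothetical script body; the real page's contents are not shown here.
SAMPLE_SCRIPT = """
var _title = 'demo';
var _url = '/stock/00914.html';
"""


def extract_js_string(script_text, var_name):
    """Return the single-quoted string assigned to `var_name`, or None."""
    pattern = re.compile(r"var\s+" + re.escape(var_name) + r"\s*=\s*'(.*?)';")
    match = pattern.search(script_text)
    return match.group(1) if match else None


url_value = extract_js_string(SAMPLE_SCRIPT, "_url")

# Pull the first run of digits out of the captured string.
digits = re.search(r"\d+", url_value).group(0)
print(url_value)
print(digits)
```

Reading the capture group directly avoids depending on how many quote characters appear earlier in the script.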
Result:
Source code:
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "URL of the page you want to parse"  # placeholder
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <script> element inside <body>
scripts = soup.select("body script")

# On this page the target value sits in the third <script> element.
script_text = scripts[2].string
print(script_text)

# Check that this script defines `var _url = '...';`
pattern = re.compile(r"var _url = '(.*?)';$", re.MULTILINE | re.DOTALL)
match = pattern.search(script_text)

# match.string is the full text that was searched, so the split runs over
# the whole script; on this page the wanted value is piece 11.
s = match.string
print(s.split("'")[11])
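The code above picks the third script by position and then splits on quote characters, both of which depend on the exact page layout. The sketch below shows one possible alternative that selects the script by its content and reads the regex capture group instead. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages, and it uses a made-up page: `ScriptCollector`, `SAMPLE_HTML`, and the `_url` value are hypothetical.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical page; the real page's markup is not shown in this post.
SAMPLE_HTML = """<html><body>
<script>var a = 1;</script>
<script>var b = 2;</script>
<script>
var _url = '/stock/00914.html';
</script>
</body></html>"""

collector = ScriptCollector()
collector.feed(SAMPLE_HTML)
collector.close()

# Keep only the scripts where the pattern matches, then read group 1.
pattern = re.compile(r"var _url = '(.*?)';$", re.MULTILINE)
matches = [m for m in map(pattern.search, collector.scripts) if m]
value = matches[0].group(1) if matches else None
print(value)
```

Matching on content keeps working if the site adds or removes other script tags, at the cost of assuming the variable name stays the same.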
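Copyright notice: This is an original article by mm20, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.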