"An electrical-automation grunt worker's journey into IT"
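Disclaimer: this post simply records how to download documents from the Siemens technical support site with Python. The downloaded documents are solely for my own study, not for commercial use. The post uses the "S7-200 SMART" manuals as its example and does not apply to other projects.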
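Because the Siemens technical support site is rendered with JavaScript, the usual requests + Beautiful Soup combination cannot retrieve the resources needed. Capturing and analysing the page's requests shows that every search result is served through an API that returns JSON, and that the JSON contains the download links of the manuals. The implementation therefore fetches the API response with a GET request, then extracts the download links and downloads the files.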
1. Getting the API information
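All requests made by the page can be captured with a packet-capture tool such as Fiddler Classic, or directly with the browser's developer tools (press F12 in Edge). Fiddler Classic is the more intuitive of the two; the request to look for is the one issued when searching for S7-200 SMART manuals. Once the API information is known, the Python program can be written.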
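As a small illustration of what "API information" means here, the sketch below splits a captured request URL into host, path and query parameters. The URL is a hypothetical example assembled from the host and path used later in this post, not a verbatim capture.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical captured URL (assumption), built from the host and API path in this post
captured = (
    "https://support.industry.siemens.com"
    "/webbackend/api/ProductSupport/ProductSupportSearch"
    "?language=zh&region=cn&documentType=Manual"
    "&$search=%27s7-200%20smart%27&$top=100&$skip=0"
)

parts = urlsplit(captured)
captured_host = f"{parts.scheme}://{parts.netloc}"
captured_api = parts.path
# parse_qs decodes percent-escapes and returns a list of values per key
captured_params = {k: v[0] for k, v in parse_qs(parts.query).items()}

print(captured_host)
print(captured_api)
print(captured_params)
```

The three pieces map directly onto the host, API and payload variables of the program in section 2.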
2. Python program design
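The program uses requests_html to fetch the API data and then processes the returned JSON. The requests_html library has to be installed; the json library is part of the Python standard library and only needs to be imported.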
2.1 Install the requests_html library:
pip install requests-html
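2.2 The json library:
json ships with the Python standard library, so nothing needs to be installed with pip; import json is enough.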
2.3 With the library installed, the program can be written: request the API, then convert the returned JSON into a Python dictionary. The code is as follows:
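import json
import urllib.parse

import requests_html

session = requests_html.HTMLSession()
host = "https://support.industry.siemens.com"
API = "/webbackend/api/ProductSupport/ProductSupportSearch"
url = host + API

# skpitem is the paging offset passed as $skip. It is not defined in the
# original listing; 0 (the first page) is assumed here.
skpitem = 0

payload = {
    'language': 'zh',
    'region': 'cn',
    'networks': 'Internet',
    'documentType': 'Manual',
    'suppressedResource': 'productNodePath',
    '$search': "'s7-1200'",  # search keyword; change it to the product you need
    '$orderby': 'DefaultRankingDesc',
    '$top': '100',
    '$skip': f'{skpitem}',
    '$inlinecount': 'allpages',
}

try:
    # quote_via=quote encodes spaces as %20; safe='$' keeps the '$' in the keys
    query = urllib.parse.urlencode(payload, quote_via=urllib.parse.quote, safe='$')
    content = session.get(url, params=query)
    print("API request completed")
    data = json.loads(content.text)
except Exception as exc:
    print(f"API request failed: {exc}")
    data = {'Documents': []}  # keep the later loop from raising a NameError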
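The encoding step in the request is easy to get wrong, so here is a minimal offline sketch of what it produces, together with one possible way to compute the $skip offsets for paging. The helper names build_query and page_offsets are hypothetical and do not come from the original program. The total number of results must be supplied by the caller, since the post does not show which response field carries it.

```python
from urllib.parse import urlencode, quote

def build_query(search, skip, top=100):
    """Return the encoded query string for one page of results (hypothetical helper)."""
    payload = {
        "documentType": "Manual",
        "$search": f"'{search}'",
        "$top": str(top),
        "$skip": str(skip),
    }
    # quote_via=quote turns spaces into %20 (the default quote_plus would emit '+');
    # safe="$" leaves the '$' prefix of the keys unescaped.
    return urlencode(payload, quote_via=quote, safe="$")

def page_offsets(total, top=100):
    """Offsets to pass as $skip so that `total` results are covered (hypothetical helper)."""
    return list(range(0, total, top))

print(build_query("s7-200 smart", 0))
print(page_offsets(250))
```

Looping over page_offsets(...) and passing each value as skpitem would fetch the results page by page, assuming the API honours $skip and $top the way the captured request suggests.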
Response (example):
{
  "AlternateLanguageTitle": "en",
  "AlternateLanguageCount": 13,
  "Documents": [
    {
      "ForProductsText": "6ES7288-2DR32-0AA0, 6ES7288-2DT16-0AA0,...",
      "ShowMoreProductsLink": true,
      "HasReleaseVersions": false,
      "Level1Id": "gen_1318291",
      "Output": "",
      "HasAttachment": true,
      "HasAttachmentsHits": true,
      "HasHint": false,
      "MlfbDruckForm": null,
      "PdfLink": "/cs/attachments/109745610/s7-200_SMART_system_manual_zh-CHS.pdf",
      "AvailableLanguages": [
        { "LanguageTitle": "en", "DocumentTitle": "S7-200 SMART System manual " },
        { "LanguageTitle": "zh", "DocumentTitle": "S7-200 SMART 系统手册 " }
      ],
      "SlkNavigationNodeId": null,
      "BusinessUnitId": 4224,
      "Url": null,
      "Id": 1318292,
      "Title": "S7-200 SMART 系统手册",
      "Description": "系统手册",
      "Type": "Manual",
      "Network": "Intranet, Internet",
      "DocumentDate": "2021-07-15T00:00:00",
      "DocumentActuality": "None",
      "Rating": 4.540984,
      "RatingCount": 122,
      "LocaleGroupId": 109745610,
      "LanguageId": 6,
      "IsSipsManual": false,
      "SipsSummary": ""
    }
  ]
}
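To make the structure concrete, the sketch below parses a trimmed copy of the sample response and builds the full download URL from PdfLink. It runs offline; only the three fields needed here are kept, and the constant name HOST is an assumption for this example.

```python
import json

HOST = "https://support.industry.siemens.com"

# Trimmed copy of the sample response above (only the fields used here)
sample = '''
{
  "Documents": [
    {
      "PdfLink": "/cs/attachments/109745610/s7-200_SMART_system_manual_zh-CHS.pdf",
      "Title": "S7-200 SMART 系统手册",
      "DocumentDate": "2021-07-15T00:00:00"
    }
  ]
}
'''

parsed = json.loads(sample)
links = [
    (doc["Title"], HOST + doc["PdfLink"])
    for doc in parsed.get("Documents", [])
    if doc.get("PdfLink")  # skip entries without a PDF attachment
]

for doc_title, doc_url in links:
    print(doc_title, doc_url)
```

Using doc.get("PdfLink") rather than a plain key test also skips entries where the field is present but null.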
2.4 Once the response has been converted, the download links can be extracted and the files saved. The code is as follows:
import os

# path (the target folder) and FileName() (turns a title into a valid file name)
# are used below but are not defined in this post; both must exist before the loop runs.
for doc in data['Documents']:
    # Keep only entries that carry a PDF link and are dated 2020 or later
    if 'PdfLink' in doc and doc["DocumentDate"].startswith('202'):
        title = FileName(doc['Title'])
        link = doc['PdfLink']
        downloadlink = host + link
        if os.path.exists(path + f"/{title}.pdf"):
            print(f"{title}.pdf already exists")
            continue
        res = session.get(downloadlink)
        # Log file named after the search keyword and document type
        logname = FileName(f"{payload['$search']} {payload['documentType']} link.txt")
        with open(path + f"/{title}.pdf", 'wb') as f1, \
                open(path + "/" + logname, 'a', encoding='utf-8') as f2:
            f1.write(res.content)
            f2.write(f"File name: {title}; link: {downloadlink}\n")
        print(f"Title: {title}, link: {link}")
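The download loop calls FileName(), whose definition is not shown in the post. The sketch below is a hypothetical stand-in, assuming its job is to turn a document title into a string that is safe to use as a file name on Windows. The author's actual helper may behave differently.

```python
import re

def FileName(text):
    """Hypothetical stand-in for the post's FileName() helper (assumption)."""
    # Replace characters that Windows forbids in file names, plus control characters
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", text)
    # Surrounding whitespace and trailing dots are also problematic on Windows
    return cleaned.strip().rstrip(". ")

print(FileName("S7-200 SMART: 系统手册 "))
print(FileName("a/b:c?.pdf"))
```

The path variable used by the loop would likewise need to be set to an existing folder, for example with os.makedirs(path, exist_ok=True) before the first download.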
That completes the program. It is intended for testing only, and it will normally take several runs to shake out the bugs.