XML是实现不一样语言或程序之间进行数据交换的协议,XML文件格式以下python
读xml文件web
<data> <country name="Liechtenstein"> <rank updated="yes">2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Singapore"> <rank updated="yes">5</rank> <year>2026</year> <gdppc>59900</gdppc> <neighbor direction="N" name="Malaysia" /> </country> <country name="Panama"> <rank updated="yes">69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data>
from xml.etree import ElementTree #导入xml处理模块app
XML()模块函数dom
功能:解析字符串形式的xml,返回的xml的最外层标签节点,也就是一级标签节点【有参】ide
使用方法:模块名称.XML(xml字符串变量)
函数
格式如:a = ElementTree.XML(neir)oop
text模块关键字ui
功能:获取标签里的字符串this
使用方法:要获取字符串的标签节点变量.text
编码
格式如:b = a.text
http请求处理xmlQQ在线状态
#!/usr/bin/env python # -*- coding:utf8 -*- """http请求处理xmlQQ在线状态""" import requests #导入http请求模块 from xml.etree import ElementTree #导入xml处理模块 http =requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=729088188") #发送http请求 http.encoding = "utf-8" #http请求编码 neir = http.text #获取http请求的xml字符串代码 print(neir) #打印获取到http请求的xml字符串代码 print("\n") a = ElementTree.XML(neir) #解析xml,返回的xml的最外层标签节点,也就是一级标签节点 print(a) #打印xml的最外层标签节点,也就是一级标签节点 print("\n") b = a.text #获取节点标签里的字符串 print(b) #能够根据这个标签的字符串来判断QQ是否在线 #输出 # <?xml version="1.0" encoding="utf-8"?> # <string xmlns="http://WebXml.com.cn/">V</string> # <Element '{http://WebXml.com.cn/}string' at 0x0000005692820548> # V
注意: 返回以Element开头的为标签节点 如:<Element '{http://WebXml.com.cn/}DataSet' at 0x0000008C179C0548>
iter()模块函数
功能:获取一级标签节点下的,多个同名同级的标签节点,可跨级的获取节点,返回迭代节点,须要for循环出标签【有参】
使用方法:解析xml节点变量.iter("要获取的标签名称")
格式如:b = a.iter("TrainDetailInfo")
#!/usr/bin/env python # -*- coding:utf8 -*- """http请求处理列车时刻表""" import requests #导入http请求模块 from xml.etree import ElementTree #导入xml处理模块 http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=") #发送http请求 http.encoding = "utf-8" #http请求编码 neir = http.text #获取http请求的xml字符串代码 a = ElementTree.XML(neir) #解析xml,返回的xml的最外层标签节点,也就是一级标签节点 print(a) #打印xml的最外层标签节点,也就是一级标签节点 print("\n") b = a.iter("TrainDetailInfo") #获取一级标签下的,多个同名同级的标签节点,返回迭代节点,须要for循环出标签节点 for i in b: print(i) #循环出全部的TrainDetailInfo标签节点 # 输出 # <Element '{http://WebXml.com.cn/}DataSet' at 0x000000B52FC7F548> # # # <Element 'TrainDetailInfo' at 0x000000B52FC7FB88> # <Element 'TrainDetailInfo' at 0x000000B52FC7FD18> # <Element 'TrainDetailInfo' at 0x000000B52FC7FEA8> # <Element 'TrainDetailInfo' at 0x000000B52FC98098> # <Element 'TrainDetailInfo' at 0x000000B52FC98228> # <Element 'TrainDetailInfo' at 0x000000B52FC983B8> # <Element 'TrainDetailInfo' at 0x000000B52FC98548> # <Element 'TrainDetailInfo' at 0x000000B52FC986D8> # <Element 'TrainDetailInfo' at 0x000000B52FC98868> # <Element 'TrainDetailInfo' at 0x000000B52FC989F8> # <Element 'TrainDetailInfo' at 0x000000B52FC98B88> # <Element 'TrainDetailInfo' at 0x000000B52FC98D18> # <Element 'TrainDetailInfo' at 0x000000B52FC98EA8> # <Element 'TrainDetailInfo' at 0x000000B52FC9B098> # <Element 'TrainDetailInfo' at 0x000000B52FC9B228> # <Element 'TrainDetailInfo' at 0x000000B52FC9B3B8> # <Element 'TrainDetailInfo' at 0x000000B52FC9B548> # <Element 'TrainDetailInfo' at 0x000000B52FC9B6D8>
tag模块关键字
功能:获取标签的名称,返回标签名称
使用方法:标签节点变量.tag
格式如:i.tag
attrib模块关键字
功能:获取标签的属性,以字典形式返回标签属性
使用方法:标签节点变量.attrib
格式如:i.attrib
#!/usr/bin/env python # -*- coding:utf8 -*- """http请求处理列车时刻表""" import requests #导入http请求模块 from xml.etree import ElementTree #导入xml处理模块 http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=") #发送http请求 http.encoding = "utf-8" #http请求编码 neir = http.text #获取http请求的xml字符串代码 a = ElementTree.XML(neir) #解析xml,返回的xml的最外层标签节点,也就是一级标签节点 print(a) #打印xml的最外层标签节点,也就是一级标签节点 print("\n") b = a.iter("TrainDetailInfo") #获取一级标签下的,多个同名同级的标签节点,返回迭代节点,须要for循环出标签节点 for i in b: print(i.tag,i.attrib) #tag获取标签的名称,attrib获取标签的属性 # 输出 # <Element '{http://WebXml.com.cn/}DataSet' at 0x0000008C179C0548> # # # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '0', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo1', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '1', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo2', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '2', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo3', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '3', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo4', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '4', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo5', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '5', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo6', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '6', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo7', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '7', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo8', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '8', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo9', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'} # TrainDetailInfo {'{urn:schemas-microsoft-com:xml-msdata}rowOrder': '9', '{urn:schemas-microsoft-com:xml-diffgram-v1}id': 'TrainDetailInfo10', '{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges': 'inserted'}
find()模块函数
功能:查找一个标签节点下的子标签节点,返回子标签节点【有参】
使用方法:父标签节点变量.find("要查找的子标签名称")
格式如:i.find("TrainStation")
#!/usr/bin/env python # -*- coding:utf8 -*- """http请求处理xmll列出时刻表""" import requests #导入http请求模块 from xml.etree import ElementTree #导入xml处理模块 http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=") #发送http请求 http.encoding = "utf-8" #http请求编码 neir = http.text #获取http请求的xml字符串代码 a = ElementTree.XML(neir) #解析xml,返回的xml的最外层标签节点,也就是一级标签节点 print(a) #打印xml的最外层标签节点,也就是一级标签节点 print("\n") b = a.iter("TrainDetailInfo") #获取一级标签下的,多个同名同级的标签节点,返回迭代句柄,须要for循环出标签节点 for i in b: print(i.find("TrainStation")) #find()查找一个标签节点下的子标签节点 # 输出 # <Element '{http://WebXml.com.cn/}DataSet' at 0x000000BF7CFC0548> # # # <Element 'TrainStation' at 0x000000BF7CFC0BD8> # <Element 'TrainStation' at 0x000000BF7CFC0D68> # <Element 'TrainStation' at 0x000000BF7CFC0EF8> # <Element 'TrainStation' at 0x000000BF7CFD90E8> # <Element 'TrainStation' at 0x000000BF7CFD9278> # <Element 'TrainStation' at 0x000000BF7CFD9408> # <Element 'TrainStation' at 0x000000BF7CFD9598> # <Element 'TrainStation' at 0x000000BF7CFD9728> # <Element 'TrainStation' at 0x000000BF7CFD98B8> # <Element 'TrainStation' at 0x000000BF7CFD9A48> # <Element 'TrainStation' at 0x000000BF7CFD9BD8> # <Element 'TrainStation' at 0x000000BF7CFD9D68> # <Element 'TrainStation' at 0x000000BF7CFD9EF8> # <Element 'TrainStation' at 0x000000BF7CFDC0E8> # <Element 'TrainStation' at 0x000000BF7CFDC278> # <Element 'TrainStation' at 0x000000BF7CFDC408> # <Element 'TrainStation' at 0x000000BF7CFDC598> # <Element 'TrainStation' at 0x000000BF7CFDC728>
拿出标签里须要的数据
#!/usr/bin/env python # -*- coding:utf8 -*- """http请求处理xmlQQ在线状态""" import requests #导入http请求模块 from xml.etree import ElementTree #导入xml处理模块 http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k567&UserID=") #发送http请求 http.encoding = "utf-8" #http请求编码 neir = http.text #获取http请求的xml字符串代码 a = ElementTree.XML(neir) #解析xml,返回的xml的最外层标签节点,也就是一级标签节点 print(a) #打印xml的最外层标签节点,也就是一级标签节点 print("\n") b = a.iter("TrainDetailInfo") #获取一级标签下的,多个同名同级的标签节点,返回迭代句柄,须要for循环出标签节点 for i in b: print(i.find("TrainStation").text,i.find("ArriveTime").text,i.find("StartTime").text,i.find("KM").text) #获取标签里的字符串 # 输出 # <Element '{http://WebXml.com.cn/}DataSet' at 0x000000B9C3775408> # # # 天津(车次:k567) None 15:30:00 0 # 唐山 17:06:00 17:11:00 114 # 昌黎 18:19:00 18:22:00 226 # 北戴河 18:43:00 18:46:00 254 # 秦皇岛 19:04:00 19:09:00 275 # 山海关 19:34:00 19:50:00 292 # 绥中 20:54:00 20:58:00 357 # 兴城 21:34:00 21:38:00 405 # 葫芦岛 21:58:00 22:02:00 426 # 锦州 22:47:00 22:53:00 476 # 沟帮子 23:42:00 23:46:00 540 # 沈阳 02:19:00 02:31:00 718 # 四平 05:16:00 05:19:00 906 # 八面城 05:40:00 05:42:00 934 # 双辽 06:28:00 06:33:00 996 # 保康 07:28:00 07:30:00 1076 # 太平川 07:56:00 08:00:00 1110 # 开通 08:33:00 08:36:00 1159 # 洮南 09:21:00 09:24:00 1227 # 白城 09:52:00 10:04:00 1259 # 镇赉 10:31:00 10:34:00 1297 # 泰来 11:22:00 11:24:00 1361 # 江桥 12:02:00 12:06:00 1409 # 三间房 12:51:00 12:53:00 1447 # 齐齐哈尔 13:26:00 None 1477
注意:xml经过XML()解析获得一级节点后,能够用for循环节点,和嵌套循环节点来获得想要的节点【重点】
以下列:
#!/usr/bin/env python # -*- coding:utf8 -*- """ 注意:xml经过ElementTree.XML()解析获得一级节点后,能够用for循环节点,和嵌套循环节点来获得想要的节点 以下列: <data> <country name="Liechtenstein"> <rank updated="yes">2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Singapore"> <rank updated="yes">5</rank> <year>2026</year> <gdppc>59900</gdppc> <neighbor direction="N" name="Malaysia" /> </country> <country name="Panama"> <rank updated="yes">69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ a = open("xml.xml","r",encoding="utf-8") #本地打开一个xml文件 b = a.read() #读出文件内容 from xml.etree import ElementTree #导入xml处理模块 c = ElementTree.XML(b) #解析xml获得,第一个节点,也就是一级节点 for i in c: #循环一级节点里的节点,获得二级节点 for s in i: #循环二级节点里的节点,获得三级节点下的节点 print(s) #打印出三级节点 # 输出 # <Element 'rank' at 0x000000A4C3B083B8> # <Element 'year' at 0x000000A4C3B08408> # <Element 'gdppc' at 0x000000A4C3B08458> # <Element 'neighbor' at 0x000000A4C3B084A8> # <Element 'neighbor' at 0x000000A4C3B084F8> # <Element 'rank' at 0x000000A4C3B08598> # <Element 'year' at 0x000000A4C3B085E8> # <Element 'gdppc' at 0x000000A4C3B08638> # <Element 'neighbor' at 0x000000A4C3B08688> # <Element 'rank' at 0x000000A4C3B08728> # <Element 'year' at 0x000000A4C3B08778> # <Element 'gdppc' at 0x000000A4C3B087C8> # <Element 'neighbor' at 0x000000A4C3B08818> # <Element 'neighbor' at 0x000000A4C3B08868>
重点:推荐iter()跨级获取父点,如iter()没法知足需求考虑用for循环节在配合使用。:for循环节点,配合iter()跨级获取父点,配合find()获取子字典,能够获取到全部你想须要的节点。
解析XML
解析xml有两种方式,一种是字符串方式解析XML(),一种是xml文件直接解析parse()
字符串方式解析XML()
#!/usr/bin/env python # -*- coding:utf8 -*- a = open("xml.xml","r",encoding="utf-8") #以utf-8编码只读模式打开 b = a.read() #读出文件里的字符串 a.close() from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.XML(b) #解析字符串形式的xml,返回的xml的最外层标签节点,也就是一级标签节点 print(c) # 输出 # <Element 'data' at 0x000000D2756B3228>
xml文件直接解析parse() 注意:parse()方式解析xml是便可读,也可对xml文件写入的,包括,增,删,改
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 print(b) # 输出 # <Element 'data' at 0x0000006C681C3228>
写xml文件包括(增,删,改)
parse()模块函数
功能:打开xml文件解析,直接打开一个xml文件,parse()方式解析xml是便可读,也可对xml文件写入的,包括,增,删,改【有参】
使用方法:模块名称.parse("要打开解析的xml文件路径或名称")
格式如:c = ElementTree.parse("xml.xml")
getroot()模块函数
功能:获取parse()方式解析的xml节点,返回的一级节点,也就是最外层节点
使用方法:打开xml文件解析变量.getroot()
格式如:b = c.getroot()
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 print(b) # 输出 # <Element 'data' at 0x000000F9AF0D3228>
注意:parse()解析xml文件能够读写,只是解析方式不一样,其余的获取里面的标签节点,和获取标签里的字符串的方式,都和XML()解析是同样的
write()模块函数
功能:写入保存修改的xml文件【有参】
使用方法:parse()解析变量.write("保存文件名称",encoding='字符编码', xml_declaration=True, short_empty_elements=False)
参数说明
encoding="字符编码"
xml_declaration=True :写入xml文件时是否写入xml版本信息和字符编码如:<?xml version='1.0' encoding='utf-8'?> True 写入 False 不写入
short_empty_elements=False :是否自动将没有值的空标签保存为单标签 True 是 False 否
格式如:c.write("xml.xml", encoding='utf-8')
修改一个标签里的字符串
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 print(b,"\n") #打印获取xml文件的一级节点 d = b.iter("rank") #获取一级节点下的,全部名称为rank的节点 for i in d: #循环出全部rank节点 f1 = int(i.text) + 1 #获取rank节点标签里的字符串,而后转换成数字类型加1,赋值给f1 i.text = str(f1) #而后将rank节点里的字符串,修改为f1的值 c.write("xml.xml", encoding='utf-8',xml_declaration=True, short_empty_elements=False) #最后将写入保存
set()模块函数
功能:为节点标签添加属性【有参】
使用方法:节点变量.set("标签属性名称","标签属性值")
格式如:i.set("linguixiou2","yes2")
为标签添加属性
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 print(b,"\n") #打印获取xml文件的一级节点 d = b.iter("year") #获取一级节点下的,全部名称为rank的节点 for i in d: #循环出全部rank节点 i.set("linguixiou2","yes2") #为当前节点标签添加属性 i.set("linguixiou","yes") #为当前节点标签添加属性 c.write("xml.xml", encoding='utf-8') #最后将写入保存
del 模块关键字
功能:删除标签属性
使用方法:del 当前标签节点.attrib["要删除的标签属性名称"]
格式如:del i.attrib["linguixiou"]
删除标签属性
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 print(b,"\n") #打印获取xml文件的一级节点 d = b.iter("year") #获取一级节点下的,全部名称为rank的节点 for i in d: #循环出全部rank节点 i.set("linguixiou2","yes2") #为当前节点标签添加属性 i.set("linguixiou","yes") #为当前节点标签添加属性 del i.attrib["linguixiou"] #(删除标签的属性)del删除,i当前节点,attrib获取标签名称,["linguixiou"]标签名称里的属性名称 c.write("xml.xml", encoding='utf-8') #最后将写入保存
findall()模块函数
功能:获取一级节点,下的多个同名同级的节点,返回成一个列表,每一个元素是一个节点
使用方法:一级节点变量.findall("要获取的节点标签名称")
格式如:d = b.findall("country")
查找多个同名同级的节点的某一个节点
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree #导入xml解析模块 c = ElementTree.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 print(b,"\n") #打印获取xml文件的一级节点 d = b.findall("country") #获取一级节点,下的多个同名同级的节点,返回成一个列表,每一个元素是一个节点 print(d,"\n") e = d[0].find("rank") #索引列表里的第0个元素节点,查找0个元素节点下的rank子节点 print(e) # 输出 # <Element 'data' at 0x000000278AB63228> # # [<Element 'country' at 0x000000278AD12B88>, <Element 'country' at 0x000000278AD27548>, <Element 'country' at 0x000000278AD276D8>] # # <Element 'rank' at 0x000000278AD273B8>
remove()模块函数
功能:删除父节点下的子节点
使用方法:父节点变量.remove(从父节点找到子节点)
格式如:e.remove(e.find("rank"))
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree as xml #导入xml解析模块 c = xml.parse("xml.xml") #打开一个xml文件 b = c.getroot() #获取xml文件的一级节点 e = b.find("country") #获取根节点下的country节点 e.remove(e.find("rank")) #删除country节点下的gdppc节点 c.write("xml.xml", encoding='utf-8',xml_declaration=True, short_empty_elements=False) #最后将写入保存
Element()模块函数
功能:建立标签节点,注意只是建立了节点,须要结合append()追加到父节点下
使用方法:要建立的父级节点变量.Element("标签名称",attrib={'键值对字典形式的标签属性})
格式如:xml.Element('er', attrib={'name': '2'})
append()模块函数
功能:追加标签节点,将一个建立的节点,追加到父级节点下
使用方法:父级节点变量.append("建立节点变量")
格式如:geng.append(b1)
ElementTree()模块函数
功能1:建立一个ElementTree对象,生成xml,追加等操做后生成xml
注意:XML()解析的Element对象,使用ElementTree()建立一个ElementTree对象,也能够修改保存的
使用方法:模块名称.ElementTree(根节点)
格式如:tree = xml.ElementTree(geng)
新建一个xml文件1,注意这个方式生成的xml文件是没自动缩进的
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree as xml #as将模块名称重命名为xml geng = xml.Element("geng") #建立一级节点,根节点 b1 = xml.Element('er', attrib={'name': '2'}) #建立二级节点 b2 = xml.Element('er', attrib={'name': '2'})#建立二级节点 geng.append(b1) #将二级节点添加到根节点下 geng.append(b2) #将二级节点添加到根节点下 c1 = xml.Element('san', attrib={'name': '3'})#建立三级节 c2 = xml.Element('san', attrib={'name': '3'})#建立三级节 b1.append(c1) #将三级节点添加到二级节点下 b2.append(c2) #将三级节点添加到二级节点下 tree = xml.ElementTree(geng) #生成xml tree.write('oooo.xml',encoding='utf-8',xml_declaration=True, short_empty_elements=False)
SubElement()【推荐】
功能:建立节点追加节点,而且能够定义标签名称,标签属性,以及标签的text文本值
使用方法:定义变量 = 模块名称.SubElement(要追加的父级变量,"标签名称",attrib={"字典形式标签属性"})
定义变量.text = "标签的text文本值"
格式如:ch1 = xml.SubElement(geng,"er",attrib={"name":"zhi"})
ch1.text = "二级标签"
新建xml文件2,注意这个方式生成的xml文件是没自动缩进的
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree as xml #as将模块名称重命名为xml geng = xml.Element("geng") #建立一级节点,根节点 ch1 = xml.SubElement(geng,"er",attrib={"name":"zhi"}) ch1.text = "二级标签" ch2 = xml.SubElement(ch1,"er3",attrib={"name":"zhi3"}) ch2.text = "三级标签" tree = xml.ElementTree(geng) #生成xml tree.write('oooo.xml',encoding='utf-8',xml_declaration=True, short_empty_elements=False)
新建xml文件3【推荐】,自动缩进
缩进须要引入 xml文件夹下dom文件夹里的 minidom 模块文件
from xml.etree import ElementTree as ET 引入ElementTree模块
from xml.dom import minidom 引入模块
建立节点等操做用ElementTree模块,minidom只作缩进处理
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree as ET from xml.dom import minidom def prettify(elem): """将节点转换成字符串,并添加缩进。""" rough_string = ET.tostring(elem, 'utf-8') reparsed = minidom.parseString(rough_string) return reparsed.toprettyxml(indent="\t") # 建立根节点 root = ET.Element("famliy") #建立一级节点 a = ET.SubElement(root,"erji",attrib={"mna":"zhi"}) #建立二级节点 b1 = ET.SubElement(a,"mingzi") #建立三级节点 b1.text = "林贵秀" b2 = ET.SubElement(a,"xingbie") #建立三级节点 b2.text = "男" b3 = ET.SubElement(a,"nianl") #建立三级节点 b3.text = "32" raw_str = prettify(root) #将整个根节点传入函数里缩进处理 f = open("xxxoo.xml",'w',encoding='utf-8') #打开xxxoo.xml文件 f.write(raw_str) #将缩进处理好的整个xml写入文件 f.close() #关闭打开的文件
XML 命名空间提供避免元素命名冲突的方法。
在 XML 中,元素名称是由开发者定义的,当两个不一样的文档使用相同的元素名时,就会发生命名冲突。
这个 XML 文档携带着某个表格中的信息:
<table> <tr> <td>Apples</td> <td>Bananas</td> </tr> </table>
这个 XML 文档携带有关桌子的信息(一件家具):
<table> <name>African Coffee Table</name> <width>80</width> <length>120</length> </table>
假如这两个 XML 文档被一块儿使用,因为两个文档都包含带有不一样内容和定义的 <table> 元素,就会发生命名冲突。
XML 解析器没法肯定如何处理这类冲突。
此文档带有某个表格中的信息:
<h:table> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr> </h:table>
此 XML 文档携带着有关一件家具的信息:
<f:table> <f:name>African Coffee Table</f:name> <f:width>80</f:width> <f:length>120</f:length> </f:table>
如今,命名冲突不存在了,这是因为两个文档都使用了不一样的名称来命名它们的 <table> 元素 (<h:table> 和 <f:table>)。
经过使用前缀,咱们建立了两种不一样类型的 <table> 元素。
register_namespace()模块函数
功能:建立命名空间
使用方法:模块名称.register_namespace("命名名称","命名名称值")
格式如:ET.register_namespace('com',"http://www.company.com")
#!/usr/bin/env python # -*- coding:utf8 -*- from xml.etree import ElementTree as ET ET.register_namespace('com',"http://www.company.com") #register_namespace("命名名称","命名名称值")建立命名空间 """ 建立命名空间后,后面建立节点的时候,定义节点标签,和定义节点标签属性的时候,在里面加入命名名称值 如【"{命名名称值}节点标签名称"】,这样会自动将命名名称值转换成命名名称 简单的理解就是,给标签,标签属性,加上一个标示,防止名称冲突 """ root = ET.Element("{http://www.company.com}STUFF") body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF", attrib={"{http://www.company.com}hhh": "123"}) body.text = "STUFF EVERYWHERE!" tree = ET.ElementTree(root) tree.write("page.xml",xml_declaration=True,encoding='utf-8',method="xml") #生成以下 # <?xml version='1.0' encoding='utf-8'?> # <com:STUFF xmlns:com="http://www.company.com"> # <com:MORE_STUFF com:hhh="123">STUFF EVERYWHERE!</com:MORE_STUFF> # </com:STUFF>
重点总结
一.解析xml文件
有两种方式
字符串方式:XML()解析,返回的是Element对象,直接获得根节点
文件方式:parse()解析,返回的是ElementTree对象,要经过getroot()来获得Element对象,获得根节点
重点:ElementTree对象才能够写入文件,Element没法写入文件,因此若是是Element解析的,须要ElementTree(根节点变量)函数来建立ElementTree对象,后就能够写入了
二.ElementTree对象
1.ElementTree 类建立,能够经过ElementTree(xxxx)建立对象,parse()底层仍是调用ElementTree()建立的
2.ElementTree 对象,经过getroot()获取根节点
ElementTree() 建立一个ElementTree对象,生成xml
write() 内存中的xml写入文件
三.Element对象
iter() 获取一级标签节点下的,多个同名同级的标签节点,可跨级的获取节点,返回迭代节点,须要for循环出标签【有参】
findall() 获取一级节点,下的多个同名同级的节点,返回成一个列表,每一个元素是一个节点
find() 查找一个标签节点下的子标签节点,返回子标签节点【有参】
set() 为节点标签添加属性【有参】
remove() 删除父节点下的子节点
text 获取标签里的文本字符串
tag 获取标签的名称,返回标签名称
attrib 获取标签的属性,以字典形式返回标签属性
del 删除标签属性
Element() 建立标签节点,注意只是建立了节点,须要结合append()追加到父节点下
append() 追加标签节点,将一个建立的节点,追加到父级节点下
SubElement() 建立节点追加节点,而且能够定义标签名称,标签属性,以及标签的text文本值
register_namespace() 建立命名空间
Element,操做详细源码
class Element: """An XML element. This class is the reference implementation of the Element interface. An element's length is its number of subelements. That means if you want to check if an element is truly empty, you should check BOTH its length AND its text attribute. The element tag, attribute names, and attribute values can be either bytes or strings. *tag* is the element name. *attrib* is an optional dictionary containing element attributes. *extra* are additional element attributes given as keyword arguments. Example form: <tag attrib>text<child/>...</tag>tail """ 当前节点的标签名 tag = None """The element's name.""" 当前节点的属性 attrib = None """Dictionary of the element's attributes.""" 当前节点的内容 text = None """ Text before first subelement. This is either a string or the value None. Note that if there is no text, this attribute may be either None or the empty string, depending on the parser. """ tail = None """ Text after this element's end tag, but before the next sibling element's start tag. This is either a string or the value None. Note that if there was no text, this attribute may be either None or an empty string, depending on the parser. """ def __init__(self, tag, attrib={}, **extra): if not isinstance(attrib, dict): raise TypeError("attrib must be dict, not %s" % ( attrib.__class__.__name__,)) attrib = attrib.copy() attrib.update(extra) self.tag = tag self.attrib = attrib self._children = [] def __repr__(self): return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self)) def makeelement(self, tag, attrib): 建立一个新节点 """Create a new element with the same type. *tag* is a string containing the element name. *attrib* is a dictionary containing the element attributes. Do not call this method, use the SubElement factory function instead. """ return self.__class__(tag, attrib) def copy(self): """Return copy of current element. This creates a shallow copy. Subelements will be shared with the original tree. """ elem = self.makeelement(self.tag, self.attrib) elem.text = self.text elem.tail = self.tail elem[:] = self return elem def __len__(self): return len(self._children) def __bool__(self): warnings.warn( "The behavior of this method will change in future versions. " "Use specific 'len(elem)' or 'elem is not None' test instead.", FutureWarning, stacklevel=2 ) return len(self._children) != 0 # emulate old behaviour, for now def __getitem__(self, index): return self._children[index] def __setitem__(self, index, element): # if isinstance(index, slice): # for elt in element: # assert iselement(elt) # else: # assert iselement(element) self._children[index] = element def __delitem__(self, index): del self._children[index] def append(self, subelement): 为当前节点追加一个子节点 """Add *subelement* to the end of this element. The new element will appear in document order after the last existing subelement (or directly after the text, if it's the first subelement), but before the end tag for this element. """ self._assert_is_element(subelement) self._children.append(subelement) def extend(self, elements): 为当前节点扩展 n 个子节点 """Append subelements from a sequence. *elements* is a sequence with zero or more elements. """ for element in elements: self._assert_is_element(element) self._children.extend(elements) def insert(self, index, subelement): 在当前节点的子节点中插入某个节点,即:为当前节点建立子节点,而后插入指定位置 """Insert *subelement* at position *index*.""" self._assert_is_element(subelement) self._children.insert(index, subelement) def _assert_is_element(self, e): # Need to refer to the actual Python implementation, not the # shadowing C implementation. if not isinstance(e, _Element_Py): raise TypeError('expected an Element, not %s' % type(e).__name__) def remove(self, subelement): 在当前节点在子节点中删除某个节点 """Remove matching subelement. Unlike the find methods, this method compares elements based on identity, NOT ON tag value or contents. To remove subelements by other means, the easiest way is to use a list comprehension to select what elements to keep, and then use slice assignment to update the parent element. ValueError is raised if a matching element could not be found. """ # assert iselement(element) self._children.remove(subelement) def getchildren(self): 获取全部的子节点(废弃) """(Deprecated) Return all subelements. Elements are returned in document order. """ warnings.warn( "This method will be removed in future versions. " "Use 'list(elem)' or iteration over elem instead.", DeprecationWarning, stacklevel=2 ) return self._children def find(self, path, namespaces=None): 获取第一个寻找到的子节点 """Find first matching element by tag name or path. *path* is a string having either an element tag or an XPath, *namespaces* is an optional mapping from namespace prefix to full name. Return the first matching element, or None if no element was found. """ return ElementPath.find(self, path, namespaces) def findtext(self, path, default=None, namespaces=None): 获取第一个寻找到的子节点的内容 """Find text for first matching element by tag name or path. *path* is a string having either an element tag or an XPath, *default* is the value to return if the element was not found, *namespaces* is an optional mapping from namespace prefix to full name. Return text content of first matching element, or default value if none was found. Note that if an element is found having no text content, the empty string is returned. """ return ElementPath.findtext(self, path, default, namespaces) def findall(self, path, namespaces=None): 获取全部的子节点 """Find all matching subelements by tag name or path. *path* is a string having either an element tag or an XPath, *namespaces* is an optional mapping from namespace prefix to full name. Returns list containing all matching elements in document order. """ return ElementPath.findall(self, path, namespaces) def iterfind(self, path, namespaces=None): 获取全部指定的节点,并建立一个迭代器(能够被for循环) """Find all matching subelements by tag name or path. *path* is a string having either an element tag or an XPath, *namespaces* is an optional mapping from namespace prefix to full name. Return an iterable yielding all matching elements in document order. """ return ElementPath.iterfind(self, path, namespaces) def clear(self): 清空节点 """Reset element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None. """ self.attrib.clear() self._children = [] self.text = self.tail = None def get(self, key, default=None): 获取当前节点的属性值 """Get element attribute. Equivalent to attrib.get, but some implementations may handle this a bit more efficiently. *key* is what attribute to look for, and *default* is what to return if the attribute was not found. Returns a string containing the attribute value, or the default if attribute was not found. """ return self.attrib.get(key, default) def set(self, key, value): 为当前节点设置属性值 """Set element attribute. Equivalent to attrib[key] = value, but some implementations may handle this a bit more efficiently. *key* is what attribute to set, and *value* is the attribute value to set it to. """ self.attrib[key] = value def keys(self): 获取当前节点的全部属性的 key """Get list of attribute names. Names are returned in an arbitrary order, just like an ordinary Python dict. Equivalent to attrib.keys() """ return self.attrib.keys() def items(self): 获取当前节点的全部属性值,每一个属性都是一个键值对 """Get element attributes as a sequence. The attributes are returned in arbitrary order. Equivalent to attrib.items(). Return a list of (name, value) tuples. """ return self.attrib.items() def iter(self, tag=None): 在当前节点的子孙中根据节点名称寻找全部指定的节点,并返回一个迭代器(能够被for循环)。 """Create tree iterator. The iterator loops over the element and all subelements in document order, returning all elements with a matching tag. If the tree structure is modified during iteration, new or removed elements may or may not be included. To get a stable set, use the list() function on the iterator, and loop over the resulting list. *tag* is what tags to look for (default is to return all elements) Return an iterator containing all the matching elements. """ if tag == "*": tag = None if tag is None or self.tag == tag: yield self for e in self._children: yield from e.iter(tag) # compatibility def getiterator(self, tag=None): # Change for a DeprecationWarning in 1.4 warnings.warn( "This method will be removed in future versions. " "Use 'elem.iter()' or 'list(elem.iter())' instead.", PendingDeprecationWarning, stacklevel=2 ) return list(self.iter(tag)) def itertext(self): 在当前节点的子孙中根据节点名称寻找全部指定的节点的内容,并返回一个迭代器(能够被for循环)。 """Create text iterator. The iterator loops over the element and all subelements in document order, returning all inner text. """ tag = self.tag if not isinstance(tag, str) and tag is not None: return if self.text: yield self.text for e in self: yield from e.itertext() if e.tail: yield e.tail 复制代码