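I recently needed to process a simple XML file with Python. Its formatting was fairly messy, and I happened to want to try out BeautifulSoup anyway, so I searched Baidu for tutorials. Most of the articles I found were about parsing HTML, so I went through the documentation and took these rough notes. The code does what I need, but it still has quite a few problems, which I will deal with later.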
Test XML:
<?xml version="1.0" encoding="utf-8"?>
<web-app>
    <context-param>
        <param-name>地址</param-name>
        <param-value>北京西街</param-value>
    </context-param>
    <listener>
        <listener-class>
            寡妇墙.....
        </listener-class>
    </listener>
    <servlet>
        <servlet-name>姓名</servlet-name>
        <servlet-class>小强</servlet-class>
        <init-param>
            <param-name>动物</param-name>
            <param-value>人类</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
</web-app>
Test Python code:
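# coding=utf-8
'''
A simple test of parsing XML with BeautifulSoup (Python 3 syntax).
'''
import re

from bs4 import BeautifulSoup

# Open test.xml and parse it as XML. The 'xml' parser requires lxml to be installed.
# Binary mode lets the parser honor the encoding declared in the XML header.
with open('test.xml', 'rb') as f:
    soup = BeautifulSoup(f, 'xml')

# Pretty-print the XML
print(soup.prettify())

# Find all tags named param-value
print("\n" + "*"*20 + "\n")
print(soup.find_all('param-value'))
print("\n" + "*"*20 + "\n")

# Print the text content of every matching tag
for tag in soup.find_all('param-value'):
    print(tag.text.strip())
print("\n" + "*"*20 + "\n")

# Find tags with a regular expression (matches any tag whose name contains 'param-value')
for tag1 in soup.find_all(re.compile('param-value')):
    print(tag1.text.strip())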
Output:
<?xml version="1.0" encoding="utf-8"?>
<web-app>
 <context-param>
  <param-name>
   地址
  </param-name>
  <param-value>
   北京西街
  </param-value>
 </context-param>
 <listener>
  <listener-class>
   寡妇墙.....
  </listener-class>
 </listener>
 <servlet>
  <servlet-name>
   姓名
  </servlet-name>
  <servlet-class>
   小强
  </servlet-class>
  <init-param>
   <param-name>
    动物
   </param-name>
   <param-value>
    人类
   </param-value>
  </init-param>
  <load-on-startup>
   1
  </load-on-startup>
 </servlet>
</web-app>

********************

[<param-value>北京西街</param-value>, <param-value>人类</param-value>]

********************

北京西街
人类

********************

北京西街
人类
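The lookups above return each param-value on its own, without the param-name it belongs to. As a rough sketch (not part of the original test), the snippet below pairs every param-name with the param-value that follows it, and shows that a regular expression passed to find_all is matched against the tag name. It assumes Python 3 with bs4 and lxml installed. The inline XML string is a trimmed copy of the test file, and the variable names `pairs` and `names` are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the test XML, inlined so the sketch is self-contained.
XML = """<?xml version="1.0" encoding="utf-8"?>
<web-app>
  <context-param>
    <param-name>地址</param-name>
    <param-value>北京西街</param-value>
  </context-param>
  <servlet>
    <init-param>
      <param-name>动物</param-name>
      <param-value>人类</param-value>
    </init-param>
  </servlet>
</web-app>
"""

# Pass bytes so the parser reads the encoding from the XML declaration.
soup = BeautifulSoup(XML.encode("utf-8"), "xml")

# Pair each param-name with the next sibling tag named param-value.
pairs = []
for name_tag in soup.find_all("param-name"):
    value_tag = name_tag.find_next_sibling("param-value")
    if value_tag is not None:
        pairs.append((name_tag.get_text(strip=True),
                      value_tag.get_text(strip=True)))

# A compiled pattern is applied to tag names with search(), so anchoring it
# with ^ keeps context-param and init-param out of the result.
names = [tag.name for tag in soup.find_all(re.compile(r"^param-"))]

print(pairs)
print(names)
```

Since the pattern is applied with search(), the unanchored `re.compile('param-value')` from the test code would also match any longer tag name that merely contains that string. Anchoring the pattern is one way to avoid this.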