The two common programming interfaces for XML are DOM and SAX. They process XML files in different ways, so each suits different situations.
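Differences between DOM and SAX
DOM (Document Object Model): parses the XML data into a tree in memory; you work with the XML by operating on that tree.
SAX (Simple API for XML): uses an event-driven model. While it parses the XML, the parser fires a sequence of events and calls user-defined callback functions to handle them.
Because DOM has to map the whole XML document onto a tree in memory, it is relatively slow and memory-hungry. SAX reads the XML file as a stream, so it is faster and uses less memory, but the user has to implement the callback functions (the handler).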
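To make the contrast concrete, here is a small sketch (not part of the original article) that reads the same XML both ways with the standard library. `xml.dom.minidom` builds the whole tree first and then lets you query it; `xml.sax` only reports events as it reads, so the handler has to keep whatever state it needs. The sample data and the `NameCollector` class are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, shaped like the students.xml example below
DATA = b"""<students>
  <student id="1"><name>zhangsan</name></student>
  <student id="2"><name>lisi</name></student>
</students>"""


def names_with_dom(data):
    # DOM: parse everything into an in-memory tree, then navigate it freely
    doc = minidom.parseString(data)
    return [node.firstChild.data for node in doc.getElementsByTagName("name")]


class NameCollector(xml.sax.ContentHandler):
    # SAX: no tree is built; we track our position ourselves across callbacks
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buf = []

    def characters(self, content):
        if self._in_name:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buf))
            self._in_name = False


def names_with_sax(data):
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names


print(names_with_dom(DATA))  # ['zhangsan', 'lisi']
print(names_with_sax(DATA))  # ['zhangsan', 'lisi']
```

Both return the same list, but only the DOM version keeps the whole document in memory; the SAX version holds just the names it has collected so far.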
When to use SAX and when to use DOM
Use DOM when:
• read-write access to the document is required
• the processing requires random access to the document
Use SAX when:
• dealing with big documents (>1MB)
• looking for a specific piece of information in the document
• instantiating custom objects from the document
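For the "specific piece of information" case, SAX lets you stop as soon as you have what you need instead of processing the rest of a large file. One common pattern, sketched below as an assumption rather than an official `xml.sax` feature, is to raise a private exception from the handler and catch it around the parse call. `FoundIt`, `AgeFinder`, `find_age` and the sample data are hypothetical names.

```python
import xml.sax


class FoundIt(Exception):
    """Hypothetical exception used only to abort parsing early."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


class AgeFinder(xml.sax.ContentHandler):
    """Find the <age> of the student whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.in_target = False   # inside the matching <student>?
        self.in_age = False      # inside its <age>?
        self.buf = []
        self.elements_seen = 0   # how many start tags were reported

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == "student":
            self.in_target = attrs.get("id") == self.target_id
        elif name == "age" and self.in_target:
            self.in_age = True
            self.buf = []

    def characters(self, content):
        if self.in_age:
            self.buf.append(content)

    def endElement(self, name):
        if name == "age" and self.in_age:
            # Abort parsing: the parser reports no further events
            raise FoundIt("".join(self.buf))


def find_age(data, student_id):
    handler = AgeFinder(student_id)
    try:
        xml.sax.parseString(data, handler)
    except FoundIt as hit:
        return hit.value, handler.elements_seen
    return None, handler.elements_seen  # not found: whole document was read


DATA = b"""<students>
  <student id="1"><name>zhangsan</name><age>20</age></student>
  <student id="2"><name>lisi</name><age>21</age></student>
</students>"""

print(find_age(DATA, "1"))  # ('20', 4): stopped inside the first <student>
```

The exception propagates out of the parser unchanged, so it does not need to be a `SAXException` subclass.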
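Components of a SAX program
readers: the XML reader; while reading the file it generates a series of events and passes them to the handler;
handlers: the event handler, in which the user defines the event-handling callbacks;
xmlfiles: the XML files to be processed;
exceptions: SAX defines four exception classes: SAXException, SAXParseException, SAXNotRecognizedException and SAXNotSupportedException.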
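In Python's standard `xml.sax` module these pieces map roughly as follows: `xml.sax.make_parser()` returns a reader, `setContentHandler()` attaches the handler, and the four exception classes all derive from `SAXException`. The sketch below wires them together; the `TagCounter` handler, the `count_tags` helper and the inputs are illustrative, not from the original article.

```python
import io
import xml.sax
from xml.sax import (SAXException, SAXNotRecognizedException,
                     SAXNotSupportedException, SAXParseException)


class TagCounter(xml.sax.ContentHandler):
    """Handler: counts start-element events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_tags(data):
    reader = xml.sax.make_parser()      # reader: produces the events
    handler = TagCounter()
    reader.setContentHandler(handler)   # handler: receives the events
    reader.parse(io.BytesIO(data))      # XML input (an in-memory file here)
    return handler.count


# The three specialised exceptions share SAXException as their base class
for exc in (SAXParseException, SAXNotRecognizedException, SAXNotSupportedException):
    assert issubclass(exc, SAXException)

print(count_tags(b"<a><b/><b/></a>"))   # 3

try:
    count_tags(b"<a><b></a>")           # mismatched closing tag
except SAXParseException as e:
    print("parse error at line", e.getLineNumber(), "column", e.getColumnNumber())

try:
    # Hypothetical feature URI that the parser does not know about
    xml.sax.make_parser().getFeature("http://example.com/no-such-feature")
except SAXNotRecognizedException as e:
    print("unrecognized feature:", e.getMessage())
```

Malformed input surfaces as `SAXParseException`, which carries the line and column of the error; asking the parser about an unknown feature raises `SAXNotRecognizedException`.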
A simple example
First, create an XML file, students.xml:
<?xml version="1.0" encoding="UTF-8"?>
<students>
    <student id="1">
        <name>zhangsan</name>
        <age>20</age>
        <dob>1990.11.22</dob>
    </student>
    <student id="2">
        <name>lisi</name>
        <age>21</age>
        <dob>1992.06.15</dob>
    </student>
</students>
Create the Python program sax_demo.py:
from xml.sax import parse, handler, SAXException


class MyGeneralHandler(handler.ContentHandler):
    """User-defined event handler."""

    # Document start event
    def startDocument(self):
        print('Document Start...')

    # Document end event
    def endDocument(self):
        print('Document End...')

    # Element start event (attrs holds the element's attributes)
    def startElement(self, name, attrs):
        print('encounter element(%s)' % name)

    # Element end event
    def endElement(self, name):
        print('leave element(%s)' % name)

    # Character data event
    def characters(self, content):
        # Skip whitespace-only text (the newlines and indentation between tags)
        if content.isspace():
            return
        print('characters:' + content)


try:
    parse('students.xml', MyGeneralHandler())
except SAXException as e:
    # Malformed XML: the message includes the file, line and column of the error
    print(e)
except Exception as e:
    # Anything else, e.g. students.xml not found
    print(type(e).__name__, e)
Output:
horen@heart> python sax_demo.py
Document Start...
encounter element(students)
encounter element(student)
encounter element(name)
characters:zhangsan
leave element(name)
encounter element(age)
characters:20
leave element(age)
encounter element(dob)
characters:1990.11.22
leave element(dob)
leave element(student)
encounter element(student)
encounter element(name)
characters:lisi
leave element(name)
encounter element(age)
characters:21
leave element(age)
encounter element(dob)
characters:1992.06.15
leave element(dob)
leave element(student)
leave element(students)
Document End...
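The output shows that each piece of text is reported between the matching `startElement` and `endElement` calls. Building on that, a handler can turn the events into Python objects, the "instantiating custom objects" use case listed earlier. The sketch below assumes a hypothetical `Student` dataclass and embeds the same data as students.xml so it runs on its own. Because `characters()` may deliver one text node in several chunks, the text is buffered and only used in `endElement`.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class Student:
    """Hypothetical target type for one <student> element."""
    id: str
    name: str
    age: int
    dob: str


class StudentHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.students = []
        self._fields = {}   # fields of the <student> currently being read
        self._buf = []      # text chunks of the current element

    def startElement(self, name, attrs):
        if name == "student":
            self._fields = {"id": attrs.get("id")}
        self._buf = []

    def characters(self, content):
        self._buf.append(content)

    def endElement(self, name):
        text = "".join(self._buf).strip()
        if name in ("name", "age", "dob"):
            self._fields[name] = text
        elif name == "student":
            f = self._fields
            self.students.append(
                Student(f["id"], f["name"], int(f["age"]), f["dob"]))
        self._buf = []


DATA = b"""<?xml version="1.0" encoding="UTF-8"?>
<students>
  <student id="1"><name>zhangsan</name><age>20</age><dob>1990.11.22</dob></student>
  <student id="2"><name>lisi</name><age>21</age><dob>1992.06.15</dob></student>
</students>"""

handler = StudentHandler()
xml.sax.parseString(DATA, handler)
for s in handler.students:
    print(s)
```

To read the file from disk instead, `xml.sax.parse('students.xml', handler)` takes the place of `parseString`.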
References:
Official documentation: http://docs.python.org/library/xml.sax.html?highlight=sax#xml.sax
Source of this article's example: http://gaodayue.com/2012/03/xml-parse-in-python-using-sax/