A few pitfalls with XmlReader and XmlWriter, and how to fix them

 

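When loading an XML file larger than 100 MB (probably not a common case), XmlDocument, which loads the whole document into memory, stops being friendly: it is slow and uses a lot of memory.

Switching to XmlReader at that point feels like trading a bicycle for a supercar, but I hit a few pitfalls along the way, so I am writing them down.

First the source code, covering reading and writing in both DOM and SAX modes.

DOM mode: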

    // Needs: using System; using System.IO; using System.Text; using System.Xml;

    /// <summary>
    /// Create an XML file in DOM mode
    /// </summary>
    /// <param name="path"></param>
    public void CreateXml_Dom(string path)
    {
        XmlDocument xmlDocw = new XmlDocument();
        // XML declaration
        var xmldecl = xmlDocw.CreateXmlDeclaration("1.0", "utf-8", null);
        var root = xmlDocw.CreateElement("root");
        root.SetAttribute("Name", "李四");
        var test = xmlDocw.CreateElement("test");
        root.AppendChild(test);

        xmlDocw.AppendChild(xmldecl);
        xmlDocw.AppendChild(root);
        xmlDocw.Save(path);

        // A node can also be built from data read by an XmlReader
        //var node = xmlDocw.ReadNode(rdr);
        //root.AppendChild(node);
        // or read the outer XML and assign it as InnerXml
        //string str = rdr.ReadOuterXml();
        //root.InnerXml = str;
    }

    /// <summary>
    /// Read XML in DOM mode
    /// </summary>
    /// <param name="path"></param>
    public void ReadXml_Dom(string path)
    {
        XmlDocument xmlDocr = new XmlDocument();
        xmlDocr.Load(path);
        var root = xmlDocr.DocumentElement;
        string str = root.GetAttribute("Name");
        Console.WriteLine(str);
    }

 

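SAX (Simple API for XML) mode; the mistakes I ran into are marked in the comments. (Strictly speaking, XmlReader is a forward-only pull parser rather than a callback-based SAX parser, but it streams the document in the same spirit.)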

    /// <summary>
    /// Create an XML file with XmlWriter
    /// </summary>
    /// <param name="path"></param>
    public void CreateXml_Sax(string path)
    {
        // FileStream works fine
        //FileStream stream = new FileStream(path, FileMode.Create);
        // With a StringBuilder the declared encoding is always utf-16
        //StringBuilder stream = new StringBuilder();
        MemoryStream stream = new MemoryStream();
        XmlWriterSettings settings = new XmlWriterSettings();
        // Encoding.UTF8 breaks LoadXml below because it emits a byte order mark (BOM)
        settings.Encoding = new UTF8Encoding(false);
        XmlWriter xw = XmlWriter.Create(stream, settings);
        //XmlTextWriter xw = new XmlTextWriter(stream, new UTF8Encoding(false));

        // Write the XML declaration
        xw.WriteStartDocument();

        xw.WriteStartElement("root");
        xw.WriteAttributeString("Name", "张三");
        // Data read by an XmlReader can be written straight through
        //xw.WriteNode(rdr, true);
        xw.WriteStartElement("test");
        xw.WriteEndElement();

        xw.WriteEndElement();

        xw.WriteEndDocument();
        xw.Close();

        string xmlstr = Encoding.UTF8.GetString(stream.ToArray());
        stream.Close();
        XmlDocument xmlDocw = new XmlDocument();
        xmlDocw.LoadXml(xmlstr);
        xmlDocw.Save(path);
    }

    /// <summary>
    /// Read XML with XmlReader
    /// </summary>
    /// <param name="path"></param>
    public void ReadXml_Sax(string path)
    {
        XmlDocument xmlDocw = new XmlDocument();
        XmlReaderSettings rsettings = new XmlReaderSettings();
        rsettings.IgnoreComments = true;
        rsettings.IgnoreWhitespace = false;
        rsettings.CheckCharacters = false;
        // A reader from XmlReader.Create does not preserve \r\n in content
        //using (XmlReader rdr = XmlReader.Create(path, rsettings))
        using (XmlTextReader rdr = new XmlTextReader(path))
        {
            rdr.WhitespaceHandling = WhitespaceHandling.Significant;
            string eleName = "";
            while (rdr.Read())
            {
                if (rdr.NodeType == XmlNodeType.Element)
                {
                    // Element name
                    eleName = rdr.Name;
                    // Element depth
                    int dp = rdr.Depth;
                    // True for an empty element, i.e. <element/> rather than <element></element>
                    bool needend = rdr.IsEmptyElement;
                    for (int i = 0; i < rdr.AttributeCount; i++)
                    {
                        rdr.MoveToAttribute(i);
                        Console.WriteLine(rdr.Name + ":" + rdr.Value);
                    }
                    // Return to the owning element after walking its attributes
                    rdr.MoveToElement();
                    // The whole element can be read in one go here (ReadNode works too).
                    // ReadOuterXml already advances past the element, so check rdr.EOF
                    // instead of calling Read() again, otherwise a node gets skipped.
                    //rdr.ReadOuterXml();
                }
                else if (rdr.NodeType == XmlNodeType.EndElement)
                {
                    eleName = rdr.Name;
                }
            }
        }
    }

Combining XmlReader with XmlDocument (or XmlWriter) to read a large XML file piece by piece works very well.
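As a rough illustration of that combination, here is a minimal sketch that streams through a document with XmlReader and materialises one element at a time through XmlDocument.ReadNode. The class name `XmlSplitDemo`, the method `ForEachElement`, and the idea of selecting records by element name are assumptions made for the example, not something from the original code.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlSplitDemo
{
    // Streams through the input and passes every element named elementName
    // to the callback as a small in-memory XmlElement. Returns the match count.
    public static int ForEachElement(TextReader input, string elementName, Action<XmlElement> handle)
    {
        int count = 0;
        var doc = new XmlDocument();
        using (XmlReader rdr = XmlReader.Create(input))
        {
            rdr.MoveToContent();
            while (!rdr.EOF)
            {
                if (rdr.NodeType == XmlNodeType.Element && rdr.Name == elementName)
                {
                    // ReadNode consumes the whole element and leaves the reader on
                    // the node that follows it, so Read() must not be called here.
                    var element = (XmlElement)doc.ReadNode(rdr);
                    handle(element);
                    count++;
                }
                else
                {
                    rdr.Read();
                }
            }
        }
        return count;
    }
}
```

For a real file, something like `XmlSplitDemo.ForEachElement(File.OpenText(path), "item", e => Console.WriteLine(e.OuterXml))` would be the usage, with "item" standing in for whatever the repeated record element is called. Only one record is held as a DOM fragment at a time, which is what keeps memory flat.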

 

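Here are the problems I ran into:

1. After writing with XmlWriter, the XML declaration always says utf-16

This only happens when the writer targets a StringBuilder: a .NET string is UTF-16, so settings.Encoding is ignored there. Switching to a FileStream, MemoryStream, or similar fixes it.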
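The difference can be seen side by side in a small sketch. The class and method names below (`DeclaredEncodingDemo`, `WriteToStringBuilder`, `WriteToMemoryStream`) are made up for the illustration; the same encoding setting is passed in both cases, and only the output target differs.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class DeclaredEncodingDemo
{
    // Writes <root/> into a StringBuilder. settings.Encoding is set,
    // but the declaration still reports the encoding of the string target.
    public static string WriteToStringBuilder()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (XmlWriter xw = XmlWriter.Create(sb, settings))
        {
            xw.WriteStartDocument();
            xw.WriteStartElement("root");
            xw.WriteEndElement();
            xw.WriteEndDocument();
        }
        return sb.ToString();
    }

    // Writes the same document into a MemoryStream, where settings.Encoding
    // is honoured, then decodes the bytes for inspection.
    public static string WriteToMemoryStream()
    {
        var encoding = new UTF8Encoding(false);
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var ms = new MemoryStream())
        {
            using (XmlWriter xw = XmlWriter.Create(ms, settings))
            {
                xw.WriteStartDocument();
                xw.WriteStartElement("root");
                xw.WriteEndElement();
                xw.WriteEndDocument();
            }
            return encoding.GetString(ms.ToArray());
        }
    }
}
```

Printing both return values should show the utf-16 declaration only in the StringBuilder case.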

 

2. (UTF-8) After switching to MemoryStream, the resulting XML string makes XmlDocument.LoadXml throw

XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;

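It turned out that the default Encoding.UTF8 emits a byte order mark (BOM); use new UTF8Encoding(false) instead.

In the debugger's watch window, xmlstr[0] is 65279 (U+FEFF, the BOM); after the change it is 60, i.e. '<'.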
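To reproduce that observation outside the debugger, a sketch along these lines can be used. `BomDemo`, `WriteAndDecode`, and `TryLoad` are hypothetical names for the example; the write-then-decode steps mirror CreateXml_Sax above.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Writes <root/> to a MemoryStream with the given encoding, then decodes
    // the raw bytes the same way CreateXml_Sax does.
    public static string WriteAndDecode(Encoding writerEncoding)
    {
        var settings = new XmlWriterSettings { Encoding = writerEncoding };
        using (var ms = new MemoryStream())
        {
            using (XmlWriter xw = XmlWriter.Create(ms, settings))
            {
                xw.WriteStartDocument();
                xw.WriteStartElement("root");
                xw.WriteEndElement();
                xw.WriteEndDocument();
            }
            // GetString does not strip a leading BOM; it becomes the char U+FEFF.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Returns false when LoadXml rejects the string.
    public static bool TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Comparing `(int)BomDemo.WriteAndDecode(Encoding.UTF8)[0]` with `(int)BomDemo.WriteAndDecode(new UTF8Encoding(false))[0]` should match the two values seen in the watch window, and `TryLoad` shows which of the two strings LoadXml accepts.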

 


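3. By default XmlReader does not preserve carriage return/line feed (\r\n) in content; it came in as a space. (The Stack Overflow question linked in note 1 describes the same symptom for text content as \n arriving instead of \r\n.)

The second one, a bare line break, simply was not read in: XmlDocument reads both, but XmlReader does not.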

 

I kept hunting for a setting, such as IgnoreWhitespace, but none of them helped; the line breaks still were not read.

 XmlReaderSettings rsettings = new XmlReaderSettings();
    rsettings.IgnoreWhitespace = false;

In the end I found the answer on Stack Overflow (note 1): don't use XmlReader rdr = XmlReader.Create(path); use XmlTextReader instead.
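A minimal sketch of the difference for element text, assuming an input such as `<a>line1\r\nline2</a>` held in a string. `NewlineDemo` and its two methods are names invented for this example; `Normalization` is the XmlTextReader property the linked answer refers to.

```csharp
using System.IO;
using System.Xml;

public static class NewlineDemo
{
    // Reads the root element's text with a reader from XmlReader.Create,
    // which applies newline normalization.
    public static string ReadTextWithCreate(string xml)
    {
        using (XmlReader rdr = XmlReader.Create(new StringReader(xml)))
        {
            rdr.MoveToContent();
            return rdr.ReadElementContentAsString();
        }
    }

    // Reads the same text with XmlTextReader, whose Normalization
    // property is false unless it is switched on.
    public static string ReadTextWithXmlTextReader(string xml)
    {
        using (var rdr = new XmlTextReader(new StringReader(xml)))
        {
            rdr.Normalization = false; // already the default, set here for clarity
            rdr.MoveToContent();
            return rdr.ReadElementContentAsString();
        }
    }
}
```

Checking each result with `Contains("\r\n")` is a quick way to see which reader kept the original line ending.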

 

 

 

Note 1: the line-break issue: https://stackoverflow.com/questions/1793908/xmlreader-newline-n-instead-of-r-n

This is because the XmlTextReader has a normalization setting defaulted to false unlike XmlReader.Create which always normalizes newlines no matter what. 

This is an original article; when reposting, please credit: https://www.cnblogs.com/zhanglb163/
