Core idea:
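Create a list that stores parent-node information. Whenever a line starting with Tabs + name is read, store that line's parent-node information in prefixList[number of Tabs]; that is, prefixList[i] holds the parent node whose Tab count is i.
When a line starting with Tabs + ptr is read, a child (leaf) node has been reached. Its parent nodes (the prefix) must be prefixList[0] + ... + prefixList[number of Tabs - 1], so the final result is: prefix + current child-node information.
When a line starting with Tabs + name is read again, one of the parent nodes has changed for the child nodes that follow. It is enough to overwrite the corresponding prefixList[number of Tabs], because no later node needs the old value of that slot.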
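The idea above can be sketched as one small, self-contained function. This is only an illustration under stated assumptions, not the original program: the name flatten_trace, the dictionary prefix_by_depth, and the in-memory sample lines are all hypothetical, and the sketch assumes that the depth of a line equals its number of leading Tab characters.

```python
def flatten_trace(lines):
    """Turn a Tab-indented name/ptr trace into dotted leaf paths.

    Hypothetical helper for illustration only. Assumes the depth of a
    line is the number of Tab characters at its start.
    """
    prefix_by_depth = {}  # prefix_by_depth[i] = parent name last seen at depth i
    result = []
    for line in lines:
        depth = len(line) - len(line.lstrip('\t'))
        text = line.strip()
        if text.startswith('name'):
            # A parent node: overwrite the slot for this depth only.
            prefix_by_depth[depth] = text[text.index('[') + 1:text.index(']')]
        elif text.startswith('ptr'):
            # A leaf: its prefix is every parent at depth 0 .. depth - 1.
            leaf = text[text.index('>') + 1:text.index('-->')]
            value = text[text.index('[') + 1:text.index(']')]
            parents = [prefix_by_depth[i] for i in range(depth)]
            result.append('%s[%s]' % ('.'.join(parents + [leaf]), value))
    return result


sample = [
    'name: [1]',
    '\tname:[11]',
    '\t\tptr->111-->[value0]',
    '\tname:[12]',
    '\t\tptr->121-->[value1]',
]
print(flatten_trace(sample))
```

Stale entries at deeper levels are never cleared here; as described above, they are always overwritten by a new name line before any leaf can use them, provided the input is well nested.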
Implementation:
To simulate a debug trace, create a text file named 1.txt with the following content:
service[hi]
name: [1]
{
	name:[11]
	{
		name: [111]
		{
			ptr->1111-->[value0]
			ptr->1112-->[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr->111211-->[value2]
			}
		}
	}
	name:[12]
	{
		ptr->121-->[value3]
	}
	name:[13]
	{
		ptr->131-->[value4]
	}
}
service[Jeff]
name: [1]
{
	name:[11]
	{
		name: [111]
		{
			ptr->1111-->[value0]
			ptr->1112-->[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr->111211-->[value2]
			}
		}
	}
	name:[12]
	{
		ptr->121-->[value3]
	}
	name:[13]
	{
		ptr->131-->[value4]
	}
}
The parsing program is as follows:
1. common.py
'''
Created on 2012-5-28

@author: Jeff_Yu
'''


def getValue(string, key1, key2):
    """
    get the value between key1 and key2 in string
    """
    index1 = string.find(key1)
    # start after the whole of key1, and look for key2 only after it
    start = index1 + len(key1)
    index2 = string.find(key2, start)
    return string[start:index2]


def getFiledNum(string, key, begin):
    """
    get the number of key in string from begin position
    """
    keyNum = 0
    start = begin
    while True:
        index = string.find(key, start)
        if index == -1:
            break
        keyNum = keyNum + 1
        start = index + 1
    return keyNum
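For comparison, the same two lookups can be written with built-in string methods. This is a hedged sketch rather than part of the original common.py: the names leading_tabs and between are hypothetical, and leading_tabs counts only the Tabs at the start of the line, which differs from getFiledNum whenever a Tab also appears later in the line.

```python
def leading_tabs(line):
    """Count only the Tab characters at the start of the line (hypothetical helper)."""
    return len(line) - len(line.lstrip('\t'))


def between(text, key1, key2):
    """Return the text between the first key1 and the next key2 (hypothetical helper)."""
    _, _, rest = text.partition(key1)
    value, _, _ = rest.partition(key2)
    return value


line = '\t\tptr->1112-->[value1]\n'
print(leading_tabs(line))         # 2
print(line.count('\t'))           # 2, same as counting every Tab in the line
print(between(line, '>', '-->'))  # 1112
print(between(line, '[', ']'))    # value1
```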
2. main.py
'''
Created on 2012-6-1

@author: Jeff_Yu
'''
import common

fileNameRead = "1.txt"
fileNameWrite = '%s%s' % ("Result_", fileNameRead)

writeList = []
# prefixList[i] holds the parent name found at Tab depth i;
# the first name of a service always starts with 0 Tabs
prefixList = [""] * 30

with open(fileNameRead, 'r') as fr:
    for data in fr:
        # find the service name
        if data.startswith("service"):
            # reset the parent information for each service
            prefixList = [""] * 30
            writeList.append('%s\n' % data.rstrip('\n'))
            continue

        # find name: overwrite the parent stored at this Tab depth
        if data.find("name") != -1:
            tabNumOfData = common.getFiledNum(data, '\t', 0)
            value = common.getValue(data, '[', ']')
            prefixList[tabNumOfData] = value + "."

        # find ptr: a leaf, so output prefix + leaf information
        if data.find("ptr") != -1:
            tabNumOfLeaf = common.getFiledNum(data, '\t', 0)
            valueOfLeaf = common.getValue(data, '[', ']')
            nameOfLeaf = common.getValue(data, '>', '-->')
            leafPartString = '%s[%s]' % (nameOfLeaf, valueOfLeaf)
            prefixString = "".join(prefixList[:tabNumOfLeaf])
            # append the finished line to writeList
            writeList.append('%s%s\n' % (prefixString, leafPartString))

# write writeList to the result file in one call
with open(fileNameWrite, 'w') as fw:
    fw.writelines(writeList)
Parsing result, Result_1.txt:
service[hi]
1.11.111.1111[value0]
1.11.111.1112[value1]
1.11.112.1121.111211[value2]
1.12.121[value3]
1.13.131[value4]
service[Jeff]
1.11.111.1111[value0]
1.11.111.1112[value1]
1.11.112.1121.111211[value2]
1.12.121[value3]
1.13.131[value4]
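The real trace file is more complex than this one. Because it involves company information, the actual implementation is not posted here, but the core idea is the same as above.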
This version is far more efficient: parsing a 5 MB file used to take more than two minutes and now takes only about one second.
This version makes the following optimizations:
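1. String concatenation is rewritten in the form all = '%s%s%s%s' % (str0, str1, str2, str3).
2. The content to be written is collected in a list and written in one go at the end with f.writelines(list).
3. The algorithm reduces the number of file reads: useful information is saved as soon as it is read, so the file never has to be read backwards.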
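The first two points can be sketched as follows. This is a minimal illustration, not the original program: the records list is made-up sample data, and io.StringIO stands in for a real output file so that the example runs without touching the disk. It shows only that both styles produce the same lines; it makes no timing claim.

```python
import io

# Made-up sample data: (prefix, leaf name, leaf value)
records = [('1.11.', '111', 'value0'), ('1.12.', '121', 'value1')]

# Style A: build each line with repeated '+' concatenation.
lines_plus = []
for prefix, leaf, value in records:
    lines_plus.append(prefix + leaf + '[' + value + ']' + '\n')

# Style B: build each line with a single '%' formatting operation.
lines_fmt = ['%s%s[%s]\n' % record for record in records]

# Collect all lines first, then write them with one writelines() call.
out = io.StringIO()  # stand-in for open('Result_1.txt', 'w')
out.writelines(lines_fmt)
print(out.getvalue())
```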