Chapter 4, continued (2)

Date: 2019-11-08

4.9 The xml module

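XML is a format for exchanging data between different languages or programs. It serves much the same purpose as JSON, although JSON is simpler to use. In the old days, before JSON existed, XML was the only choice, and many systems at traditional companies, for example in the finance industry, still expose mainly XML interfaces.

An XML document looks like the following; the data structure is expressed through nodes written as <> tags: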

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

XML is supported in every major language. In Python you can work with it using the following module:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# Walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag,i.text)

# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag,node.text)
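
As a rough sketch of the same traversal without needing a file on disk, the snippet below parses a shortened copy of the sample from a string with ET.fromstring. The names XML_TEXT, ranks and years are illustrative and do not appear in the original example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document, kept in a string for illustration.
XML_TEXT = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
    </country>
</data>"""

root = ET.fromstring(XML_TEXT)  # returns the root Element directly

# Map each country's name attribute to its rank, read from the child node's text.
ranks = {c.get("name"): int(c.find("rank").text) for c in root.findall("country")}
print(ranks)

# iter() walks the whole subtree, so it finds every <year> at any depth.
years = [node.text for node in root.iter("year")]
print(years)
```

Attributes are read with get() (or the attrib dictionary), while the text between the tags is always a string and has to be converted explicitly.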

Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify: add one to every year and mark the node as updated
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated","yes")

tree.write("xmltest.xml")
# Delete nodes: drop every country whose rank is above 50
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')
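
The same modify and delete steps can be tried on an in-memory tree, which avoids overwriting xmltest.xml while experimenting. This is only a sketch; the two-country document and the name remaining are made up for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='A'><rank>2</rank><year>2008</year></country>"
    "<country name='B'><rank>69</rank><year>2011</year></country>"
    "</data>"
)

# Modify: bump every year by one and flag the node as updated.
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)
    node.set("updated", "yes")

# Delete: findall() returns a list, so removing from root while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
print(remaining)
print(ET.tostring(root, encoding="unicode"))
```

Nothing is persisted until the tree is written out, so the changes stay in memory unless write() is called.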

Creating an XML document yourself

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml,"name",attrib={"enrolled":"yes"})
age = ET.SubElement(name,"age",attrib={"checked":"no"})
sex = ET.SubElement(name,"sex")
sex.text = '33'
name2 = ET.SubElement(new_xml,"name",attrib={"enrolled":"no"})
age = ET.SubElement(name2,"age")
age.text = '19'

et = ET.ElementTree(new_xml)  # wrap the root element in a document object
et.write("test.xml", encoding="utf-8",xml_declaration=True)

ET.dump(new_xml)  # print the generated XML
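
To inspect the result without writing a file, the tree can also be serialized to a string. The sketch below builds a smaller tree than the one above; person, xml_text and parsed are illustrative names.

```python
import xml.etree.ElementTree as ET

root = ET.Element("namelist")
person = ET.SubElement(root, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(person, "age", attrib={"checked": "no"})
age.text = "33"

# encoding="unicode" makes tostring() return a str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Parsing the text again gives back an equivalent tree.
parsed = ET.fromstring(xml_text)
print(parsed.find("name/age").text)
```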

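4.10 The configparser module

This module is used to generate and modify common configuration files. It was named ConfigParser in Python 2 and was renamed to configparser in Python 3.x.

A configuration file format shared by many programs looks like this: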

```cnf
[DEFAULT]
ServerAliveInterval = 45   
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
```

Parsing the configuration file

```py
>>> import configparser # import the module
>>> config = configparser.ConfigParser()  # instantiate (create the parser object)
>>> config.sections()  # call sections(); nothing has been read yet
[]
>>> config.read('example.ini')  # read the config file (mind the file path)
['example.ini']
>>> config.sections() # call sections() again (DEFAULT is not listed)
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config # test whether a section exists
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User'] # read values dictionary-style
'hg'
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key) # loop over the keys of the bitbucket.org section; keys inherited from DEFAULT are included
...
user
serveraliveinterval
compression
compressionlevel
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'
```
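
All values come back as strings, so a short sketch of the typed accessors may help. It loads a trimmed copy of the sample from a string with read_string instead of a file; SAMPLE and the Nickname key are illustrative assumptions.

```python
import configparser

# Trimmed copy of the sample file above, held in a string (illustrative).
SAMPLE = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

section = config["topsecret.server.com"]
port = section.getint("Port")                      # '50022' -> 50022
forward = section.getboolean("ForwardX11")         # 'no' -> False
interval = section.getint("ServerAliveInterval")   # inherited from DEFAULT
missing = section.get("Nickname", fallback="n/a")  # key absent everywhere
print(port, forward, interval, missing)
```

Keys are matched case-insensitively, which is why Port and port refer to the same option.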

Other syntax for reading, adding, modifying and deleting

```python
# Contents of i.cfg (two delimiters are supported: "=" and ":")
# [group1]
# k1 = v1
# k2:v2
#
# [group2]
# k1 = v1

import configparser  # named ConfigParser in Python 2

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## Read ##########
#secs = config.sections()
#print(secs)
#options = config.options('group2') # keys of the given section
#print(options)

#item_list = config.items('group2') # keys and values of the given section, as a list of (key, value) tuples
#print(item_list)

#val = config.get('group1','k1') # value of the given key
#val = config.getint('group1','k1') # same, converted to int (raises ValueError if the value is not an integer)

# ########## Modify ##########
#sec = config.remove_section('group1') # remove a section; returns True if it existed, otherwise False
#config.write(open('i.cfg', "w")) # a change only takes effect once it is written back to the file

#sec = config.has_section('wupeiqi')
#sec = config.add_section('wupeiqi')
#config.write(open('i.cfg', "w"))


#config.set('group2','k1','11111') # values must be strings in Python 3
#config.write(open('i.cfg', "w"))

#config.remove_option('group2','age')
#config.write(open('i.cfg', "w"))
```
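
The operations listed above can be exercised end to end without touching a real file. The following is a minimal sketch: the configuration is loaded from a string, and an io.StringIO buffer stands in for open('i.cfg', 'w'). The group3 section is an illustrative addition.

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("[group1]\nk1 = v1\nk2:v2\n\n[group2]\nk1 = v1\n")

config.remove_section("group1")        # delete a whole section
config.add_section("group3")           # add an empty section
config.set("group3", "age", "18")      # values must be strings
config.set("group2", "k1", "11111")    # overwrite an existing value
config.remove_option("group3", "age")  # delete a single key

# write() accepts any text file object; StringIO stands in for a real file here.
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

When writing to a real file, a with open(...) block is the safer choice because it closes the file even if write() fails.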

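4.11 The hashlib module

A hash function (in Chinese usually translated as 散列, or transliterated as 哈希) takes an input of arbitrary length, also called the pre-image, and transforms it through a hashing algorithm into a fixed-length output, the hash value. The transformation is a compressing map: the space of hash values is usually far smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value.

Put simply, it is a function that compresses a message of arbitrary length into a message digest of fixed length.

Hashing is used mainly in information security. A hash algorithm turns messages of varying length into a scrambled, fixed-length code called the hash value (128 bits in the case of MD5). Hashing can also be seen as a mapping between the content of a piece of data and the address where it is stored.

What is MD5?

The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value and is used to check that information was transmitted intact. Its predecessors are MD2, MD3 and MD4.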

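What MD5 does

Input of any length is processed into a 128-bit output (a digital fingerprint);
Different inputs produce different results in practice (uniqueness), although collisions must exist in principle because the output space is finite;

Properties of the MD5 algorithm

Compression: for data of any length, the computed MD5 value always has the same fixed length
Easy to compute: calculating the MD5 value from the original data is cheap
Sensitivity to modification: any change to the original data, even to a single byte, produces a very different MD5 value
Collision resistance (the design goal): given the original data and its MD5 value, finding other data with the same MD5 value (that is, forged data) is meant to be very hard. Practical collision attacks on MD5 have since been published, so this property can no longer be relied on against a deliberate attacker.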
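
The first three properties are easy to observe with hashlib. The snippet below is a small illustration, not a proof; the variable names and sample inputs are made up.

```python
import hashlib

# Compression: a 1-byte input and a 100,000-byte input give digests of equal length.
short = hashlib.md5(b"a").hexdigest()
long_ = hashlib.md5(b"a" * 100_000).hexdigest()
print(len(short), len(long_))

# Sensitivity to modification: changing one character changes the digest widely.
d1 = hashlib.md5(b"hello world").hexdigest()
d2 = hashlib.md5(b"hello worle").hexdigest()
differing = sum(1 for x, y in zip(d1, d2) if x != y)
print(d1)
print(d2)
print(differing, "of 32 hex digits differ")
```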

Is MD5 reversible?

MD5 is not reversible because it is a hash function: part of the information in the original input is lost during the computation.

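Uses of MD5

Tamper detection:
- Suppose I send an electronic document. Before sending it I compute its MD5 value a. After receiving the document, the recipient computes its MD5 value b. If a and b are equal, the document was not modified in transit.
- Suppose I offer a file for download. To stop an attacker from slipping a trojan into the installer, I can publish the MD5 value of the installer file on the website.
- SVN also uses MD5 to detect whether a file has been modified after checkout.
Keeping plaintext out of sight:
- Many websites store the MD5 value of a user's password in the database instead of the password itself, so that an attacker who obtains the stored values does not directly learn the passwords. (On UNIX systems, for example, user passwords are stored in the file system after being hashed with MD5 or a similar algorithm. When a user logs in, the system hashes the password that was typed and compares the result with the stored value to decide whether the password is correct. In this way the system can validate a login without knowing the plaintext password. This keeps the password hidden even from users with administrator privileges and, to some extent, makes it harder to crack. Plain unsalted MD5 is now considered too weak for this job; a salted, deliberately slow password hash is the current practice.)
Non-repudiation (digital signatures):
- This requires a third-party certification authority. For example, A writes a file, and the authority computes an MD5 digest of the file and records it. If A later claims not to have written the file, the authority recomputes the digest and compares it with the recorded one; if they match, the file is shown to be A's. This is the basic idea behind a "digital signature".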
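
The download check described above can be sketched as follows. The helper digest_of_file and the temporary file are assumptions made for the illustration; the file is read in chunks so that large downloads do not have to fit in memory.

```python
import hashlib
import os
import tempfile

def digest_of_file(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks and return the hexadecimal digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# A throwaway file stands in for a downloaded installer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"installer bytes")
    path = tmp.name

published = digest_of_file(path)   # the digest the publisher would post
with open(path, "ab") as f:        # simulate tampering in transit
    f.write(b"trojan")
received = digest_of_file(path)
os.remove(path)

print("unchanged" if published == received else "file was modified")
```

Passing algorithm="sha256" to the same helper gives a check that does not depend on MD5.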

SHA-1

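The Secure Hash Algorithm (SHA) was designed mainly for the Digital Signature Algorithm (DSA) defined in the Digital Signature Standard (DSS). For a message shorter than 2^64 bits, SHA-1 produces a 160-bit message digest. When the message is received, this digest can be used to verify the integrity of the data.

SHA is a family of cryptographic hash functions designed by the US National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST).

After Professor Wang Xiaoyun of Shandong University and her co-authors published collision attacks on MD5 and SHA-1 around 2004 and 2005, usage shifted to SHA-224, SHA-256, SHA-384 and SHA-512. The longer the digest, the harder it is to attack, although computing it can also take more time. The most widely used hash algorithm today is SHA-256.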

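MD5 compared with SHA-1

Because MD5 and SHA-1 were both developed from MD4, they are similar in structure, strength and other properties. The biggest difference is that the SHA-1 digest is 32 bits longer than the MD5 digest. Against a brute-force attack, producing any message whose digest equals a given digest takes on the order of 2^128 operations for MD5 and 2^160 for SHA-1. Producing two messages with the same digest takes on the order of 2^64 operations for MD5 and 2^80 for SHA-1. SHA-1 is therefore stronger against brute force. However, SHA-1 has more steps than MD5 (80 versus 64) and a larger buffer to process (160 bits versus 128 bits), so it runs more slowly.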
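
The digest sizes behind these figures can be read from hashlib directly. In the sketch below the printed exponents are only the generic brute-force estimates from the paragraph above (the full digest length for a preimage search, half of it for a birthday collision search), not the cost of the best known attacks.

```python
import hashlib

md5_bits = hashlib.md5().digest_size * 8    # digest_size is given in bytes
sha1_bits = hashlib.sha1().digest_size * 8

for name in ("md5", "sha1", "sha256"):
    bits = hashlib.new(name).digest_size * 8
    # Generic estimates: preimage ~2^bits, birthday collision ~2^(bits/2).
    print(f"{name}: {bits}-bit digest, preimage ~2^{bits}, collision ~2^{bits // 2}")
```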

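The relevant module in Python

hashlib is used for hash-related operations. In Python 3.x it replaces the old md5 and sha modules, and it mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.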

import hashlib

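m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")  # successive update() calls hash the concatenation of the data
print(m.digest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())  # digest as raw bytes
print(len(m.hexdigest()))  # length of the hexadecimal digest (32 characters for MD5)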
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# ######## md5 ########

h = hashlib.md5()
h.update(b'admin')  # update() requires bytes in Python 3, not str
print(h.hexdigest())

# ######## sha1 ########

h = hashlib.sha1()
h.update(b'admin')
print(h.hexdigest())

# ######## sha256 ########

h = hashlib.sha256()
h.update(b'admin')
print(h.hexdigest())


# ######## sha384 ########

h = hashlib.sha384()
h.update(b'admin')
print(h.hexdigest())

# ######## sha512 ########

h = hashlib.sha512()
h.update(b'admin')
print(h.hexdigest())
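
Two details are worth a short sketch: text must be encoded to bytes before hashing, and feeding data in pieces gives the same digest as hashing it in one call. The variable names below are illustrative.

```python
import hashlib

password = "admin"
# hashlib works on bytes, so a str has to be encoded first.
one_shot = hashlib.md5(password.encode("utf-8")).hexdigest()

# Feeding the data in pieces gives the same digest as hashing it at once.
incremental = hashlib.md5()
incremental.update(b"ad")
incremental.update(b"min")

print(one_shot)
print(incremental.hexdigest() == one_shot)

# hashlib.new() selects the algorithm by name at run time.
for name in ("sha1", "sha256", "sha512"):
    print(name, len(hashlib.new(name, b"admin").hexdigest()))
```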

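4.12 The subprocess module

We often need to run a system command or script from Python. A shell command runs independently of your Python process: every command you execute starts a new process. The long-standing way to call a system command or script from Python, already available in Python 2, is os.system, for example:

>>> import os
>>> os.system('uname -a')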
Darwin Alexs-MacBook-Pro.local 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun  4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64
0

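How does this command work under the hood? (Covered in the video, which explains inter-process communication...)

Besides os.system, the commands and popen2 modules could also call system commands, which was rather messy. The official subprocess module was therefore introduced to provide a single, unified way of calling system commands and scripts.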

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several older modules and functions:

os.system
os.spawn*

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

The run() function was added in Python 3.5; if you need to retain compatibility with older versions, see the Older high-level API section.

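Three ways to run a command

subprocess.run(*popenargs, input=None, timeout=None, check=False, **kwargs) # officially recommended
subprocess.call(*popenargs, timeout=None, **kwargs) # does much the same as run(); an older way of writing it
subprocess.Popen() # the low-level interface that the functions above are built on

The run() function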

Run command with arguments and return a CompletedProcess instance. The returned instance will have attributes args, returncode, stdout and stderr. By default, stdout and stderr are not captured, and those attributes will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

If check is True and the exit code was non-zero, it raises a CalledProcessError. The CalledProcessError object will have the return code in the returncode attribute, and output & stderr attributes if those streams were captured.

If timeout is given, and the process takes too long, a TimeoutExpired exception will be raised.

The other arguments are the same as for the Popen constructor.

Standard usage

subprocess.run(['df','-h'],stderr=subprocess.PIPE,stdout=subprocess.PIPE,check=True)
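
The check and timeout behaviour described above can be sketched as follows. To stay portable, the current Python interpreter (sys.executable) is used as a stand-in for a command such as df; the one-line child programs are assumptions made for the illustration.

```python
import subprocess
import sys

# Capture output and raise if the child fails.
ok = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True,
)
print(ok.returncode, ok.stdout)

try:
    # The child exits with status 3, so check=True raises CalledProcessError.
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_code = 0
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
print(failed_code)

try:
    # The child would sleep for 5 seconds, but we only allow half a second.
    subprocess.run([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
print(timed_out)
```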

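Commands that involve a pipe (|) need to be written like this:

subprocess.run('df -h|grep disk1',shell=True) # shell=True hands the command line to the system shell to execute, so Python does not parse it itself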
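
A pipeline can also be built without shell=True by connecting two Popen objects, which avoids passing a command string through the shell. In this sketch two Python one-liners stand in for df -h and grep disk1; they are illustrative, not part of the original example.

```python
import subprocess
import sys

# Stand-ins for `df -h` (prints three lines) and `grep disk1` (filters them).
producer = [sys.executable, "-c", "print('disk0'); print('disk1'); print('disk2')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(l for l in sys.stdin if 'disk1' in l)"]

p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # so p1 is notified if p2 exits before reading everything
output = p2.communicate()[0]
p1.wait()
print(output)
```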

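The call() function

# Run a command and return its exit status: 0 or non-zero
>>> retcode = subprocess.call(["ls", "-l"])

# Run a command; return normally if the exit status is 0, otherwise raise an exception
>>> subprocess.check_call(["ls", "-l"])
0

# Takes the command as a string and returns a tuple: the first element is the exit status, the second is the command output
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')

# Takes the command as a string and returns its output
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'

# Run a command and return its output. Note that the output is returned, not printed; here it is assigned to res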
>>> res=subprocess.check_output(['ls','-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 alex staff 408 Nov 2 11:05 OldBoyCRM\n'
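
The difference between these helpers shows up when a command fails. The sketch below again uses sys.executable with tiny child programs as hypothetical stand-ins for ls.

```python
import subprocess
import sys

cmd_ok = [sys.executable, "-c", "print('done')"]
cmd_bad = [sys.executable, "-c", "import sys; sys.exit(2)"]

status = subprocess.call(cmd_bad)         # returns the exit status, never raises for it
output = subprocess.check_output(cmd_ok)  # returns captured stdout as bytes
print(status, output)

try:
    subprocess.check_call(cmd_bad)        # non-zero exit status raises an exception
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    print("failed with", exc.returncode)
```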

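The Popen() class

Common parameters:

args: the shell command, either a string or a sequence type (such as a list or tuple)
stdin, stdout, stderr: the program's standard input, standard output and standard error handles
preexec_fn: only valid on Unix platforms; specifies a callable object that will be called in the child process just before the child is executed
shell: same as above (hand the command to the system shell)
cwd: sets the child process's current directory
env: specifies the child process's environment variables. If env = None, the child inherits the environment of the parent process.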
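
A small sketch of cwd and env in use follows. The GREETING variable, the temporary directory and the child program are all made up for the illustration; text=True requires Python 3.7 or later.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
# Copy the parent's environment and add one variable for the child.
child_env = dict(os.environ, GREETING="hello")

child_code = "import os; print(os.environ['GREETING']); print(os.getcwd())"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE, cwd=workdir, env=child_env, text=True,
)
out, _ = proc.communicate()
greeting, child_cwd = out.splitlines()
print(greeting, child_cwd)
os.rmdir(workdir)
```

Starting from a copy of os.environ rather than an empty dictionary keeps variables such as PATH available to the child.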

What is the difference between running these two statements?

a=subprocess.run('sleep 10',shell=True,stdout=subprocess.PIPE)
a=subprocess.Popen('sleep 10',shell=True,stdout=subprocess.PIPE)

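The difference is that Popen returns immediately after launching the command instead of waiting for its result. What is the benefit of that?

If the command or script you call takes 10 minutes to run, your main program does not have to sit blocked for 10 minutes. It can carry on with other work and check from time to time whether the command has finished.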
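
The periodic check can be sketched with poll(), which returns None while the child is still running. Here a child that sleeps for one second stands in for the long-running command; the names first_poll and ticks are illustrative.

```python
import subprocess
import sys
import time

# A child that sleeps for one second stands in for a long-running command.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])

first_poll = proc.poll()   # None means the child is still running
ticks = 0
while proc.poll() is None:
    ticks += 1             # the main program is free to do other work here
    time.sleep(0.2)

print("first poll:", first_poll)
print("exit status:", proc.returncode, "after", ticks, "checks")
```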

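Calling Popen returns an object through which you can obtain the command's result or status. The object has the following methods and attributes: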

poll()

Check if child process has terminated. Returns returncode, or None while the child is still running.

wait()

Wait for child process to terminate. Returns returncode attribute.

terminate(): terminates the started process with SIGTERM

kill(): kills the started process with SIGKILL

communicate(): interacts with the started process by sending data to stdin and reading the output from stdout, then waits for the process to finish

>>> a = subprocess.Popen('python3 guess_age.py',stdout=subprocess.PIPE,stderr=subprocess.PIPE,stdin=subprocess.PIPE,shell=True)

>>> a.communicate(b'22')

(b'your guess:try bigger\n', b'')

send_signal(signal.xxx): sends a system signal to the process

pid: the process ID of the started process
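
Since guess_age.py is not shown here, the sketch below uses a hypothetical inline child program in its place to demonstrate communicate(), pid and terminate(). The threshold of 30 and the messages are assumptions.

```python
import subprocess
import sys

# Hypothetical stand-in for guess_age.py: read a guess from stdin and answer.
child_code = (
    "guess = int(input('your guess:'))\n"
    "print('try bigger' if guess < 30 else 'ok')\n"
)
a = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
out, err = a.communicate(b"22\n")  # send input, collect output, wait for exit
print(out, a.returncode)

# terminate() stops a child that would otherwise keep running.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print("pid:", sleeper.pid)
sleeper.terminate()
sleeper.wait()
print("return code after terminate:", sleeper.returncode)
```

On POSIX systems a process ended by a signal reports a negative return code, so a non-zero value here is expected.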