Sharing Python Basics

Date: 2019-11-30


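Introduction

I had skimmed the basics of Python before, but my understanding never felt solid, so I went through the fundamentals again carefully and wrote down what seemed important. At the end of the article I share a PDF of the book 《Python爬虫开发与项目实战》; it is a high-quality copy with a table of contents, for anyone who needs it.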

Tuples

The elements of a tuple cannot be modified or deleted.

Python expression	Result	Description
('Hi!',) * 4	('Hi!', 'Hi!', 'Hi!', 'Hi!')	Repetition
3 in (1, 2, 3)	True	Membership test

Any comma-separated sequence of objects written without enclosing brackets is treated as a tuple by default. Example: x, y = 1, 2

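Creating a single-element tuple

A trailing comma is required; without it the parentheses only group the expression and the result is not a tuple.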

tuple = ("apple",)
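A minimal sketch of why the comma matters. It is written for Python 3 (the snippets in this article are Python 2), and the variable names are illustrative only:

```python
# Parentheses alone only group an expression; the comma is what builds a tuple.
not_a_tuple = ("apple")    # just the string "apple"
one_item = ("apple",)      # a tuple with one element
also_a_tuple = "apple",    # the parentheses are optional

print(type(not_a_tuple).__name__)   # str
print(type(one_item).__name__)      # tuple
print(one_item == also_a_tuple)     # True
```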

Swapping values with a tuple

def test2():
    x = 2
    y = 3
    x, y = y, x  # the right-hand side is packed into a tuple, then unpacked
    print x,y

Viewing the help documentation

Use the built-in help() function:

help(list)

Dictionaries

d = {}
d["x"] = "value"

If the key x is not already in the dictionary d, a new entry is added; otherwise the existing value is updated.
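The insert-or-update behavior of key assignment can be sketched as follows. This is a small Python 3 illustration; the dictionary name and keys are made up for the example:

```python
# The same assignment syntax either inserts a new key or overwrites an existing one.
settings = {"x": "old"}

settings["x"] = "new"      # key exists: the value is replaced
settings["y"] = "added"    # key is missing: a new entry is created

print(settings)
```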

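The set() built-in function

set() creates an unordered collection of unique elements. It supports relationship tests, removes duplicates, and can compute intersections, differences, and unions.

x = set(["1","2"])
y = set(["1","3","4"])
print x&y # intersection
print x|y # union
print x-y # difference
zip(x) # wraps each element in a 1-tuple, e.g. [('1',), ('2',)] (order is not guaranteed)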

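The zip() built-in function

zip() takes iterables as arguments, packs their corresponding elements into tuples, and returns a list of those tuples.

If the iterables have different lengths, the result is as long as the shortest one. Combined with the * operator, zip() reverses the packing: zip(*zipped) regroups the tuples into the original sequences.

a = [1,2,3]
b = [4,5,6]
c = [4,5,6,7,8]
zipped = zip(a,b)     # packs into a list of tuples: [(1, 4), (2, 5), (3, 6)]
zip(a,c)              # length matches the shortest list: [(1, 4), (2, 5), (3, 6)]
zip(*zipped)          # the inverse of zip ("unzipping"): [(1, 2, 3), (4, 5, 6)]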
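One caveat for readers on Python 3: there zip() returns a lazy iterator rather than a list, so the results have to be materialized with list() before printing or indexing. A short sketch, reusing the same sample lists:

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = [4, 5, 6, 7, 8]

pairs = list(zip(a, b))        # pack corresponding elements into tuples
truncated = list(zip(a, c))    # stops at the shortest input
unzipped = list(zip(*pairs))   # the * operator reverses the packing

print(pairs)       # [(1, 4), (2, 5), (3, 6)]
print(truncated)   # [(1, 4), (2, 5), (3, 6)]
print(unzipped)    # [(1, 2, 3), (4, 5, 6)]
```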

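Variable arguments

Prefixing a function parameter with "*" lets the function accept a variable number of arguments. "*" collects the extra positional arguments into a tuple; "**" collects the keyword arguments into a dictionary.

def search(*t,**d):
    keys = d.keys()
    for arg in t:
        for key in keys:
            if arg == key:
                print ("find:",d[key])

search("a","two",a="1",b="2") # call: t is ("a", "two"), d is {"a": "1", "b": "2"}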

Converting between times and strings

To convert a time to a string, use the strftime() function from the time module.

import time

print time.strftime("%Y-%m-%d",time.localtime())

To convert a string to a time, use strptime() from the time module together with the datetime() constructor from the datetime module.

import time
import datetime

t = time.strptime("2018-3-8", "%Y-%m-%d")
y, m, d = t[0:3]

print datetime.datetime(y,m,d)
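As an alternative to the two-step approach above, the datetime class has its own strptime() and strftime() methods, so the round trip can stay in one module. A minimal Python 3 sketch using the same sample date:

```python
from datetime import datetime

# String -> datetime: strptime parses text according to a format.
parsed = datetime.strptime("2018-3-8", "%Y-%m-%d")

# datetime -> string: strftime formats it back (with zero padding).
formatted = parsed.strftime("%Y-%m-%d")

print(parsed)      # 2018-03-08 00:00:00
print(formatted)   # 2018-03-08
```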

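File and directory operations

For example, renaming, deleting, and searching for files.

os library: renaming files, listing all the files under a path, and so on. The os.path module works on paths, file names, and the like.

import os

files = os.listdir(".")
print type(os.path)
for filename in files:
    print os.path.splitext(filename) # split the file name from its extension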

shutil library: copying, moving, and similar file operations
glob library: glob.glob("*.txt") finds all files with the .txt extension in the current directory
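The shutil and glob calls mentioned above might be combined as in the following sketch. It is written for Python 3, and the temporary directory and file names are assumptions chosen so that the example does not touch real files:

```python
import glob
import os
import shutil
import tempfile

# Work inside a throwaway directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "notes.txt")
with open(source, "w") as f:
    f.write("hello")

# shutil.copy duplicates a file; shutil.move renames or relocates it.
shutil.copy(source, os.path.join(workdir, "copy.txt"))
shutil.move(source, os.path.join(workdir, "moved.txt"))

# glob.glob returns the paths that match a shell-style pattern.
matches = glob.glob(os.path.join(workdir, "*.txt"))
txt_files = sorted(os.path.basename(p) for p in matches)
print(txt_files)   # ['copy.txt', 'moved.txt']

shutil.rmtree(workdir)   # clean up the temporary directory
```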

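Reading configuration files

Use the configparser library (named ConfigParser in 2.x and configparser in 3.x) to read, modify, and add to configuration files.

import ConfigParser

config = ConfigParser.ConfigParser()
config.add_section("System")
config.set("System", "SystemName", "iOS")
f = open("Sys.ini", "a+")
config.write(f)
f.close()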
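The snippet above only writes a file. Reading a value back could look like the sketch below, which uses the Python 3 module name configparser and an in-memory buffer instead of Sys.ini; the section and option names are placeholders:

```python
import configparser
import io

config = configparser.ConfigParser()
config.add_section("system")
config.set("system", "name", "iOS")

# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
config.write(buffer)

# Read the same text back with a fresh parser.
loaded = configparser.ConfigParser()
loaded.read_string(buffer.getvalue())
system_name = loaded.get("system", "name")

print(loaded.sections())   # ['system']
print(system_name)         # iOS
```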

Regular expressions

The re module provides regular-expression matching, searching, and related operations.
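A few of the most common re calls, as a rough Python 3 sketch. The sample text and the deliberately simplistic e-mail pattern are assumptions made for the example, not a robust validator:

```python
import re

text = "Contact: alice@example.com, bob@example.org"
pattern = r"[\w.]+@[\w.]+"

# re.search returns the first match (or None); re.findall returns every match.
first = re.search(pattern, text)
everyone = re.findall(pattern, text)

# re.sub replaces each match with the given string.
masked = re.sub(pattern, "<email>", text)

print(first.group())   # alice@example.com
print(everyone)        # ['alice@example.com', 'bob@example.org']
print(masked)          # Contact: <email>, <email>
```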

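Classes

Attributes

Prefix the name of a private attribute with "__".

class Fruits:
    price = 0               # class attribute, shared by all instances; readable through the class or an instance, but it should be modified through the class

    def __init__(self):
        self.color = "red"  # instance attribute, accessible only through an object
        zone = "China"      # local variable
        self.__weight = "12" # private attribute; not directly accessible, but reachable as _classname__attribute


if __name__ == "__main__":
    apple = Fruits()
    print (apple._Fruits__weight) # access the private attribute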
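The remark that a class attribute should be modified through the class can be made concrete with a small Python 3 sketch (the instance names are illustrative). Assigning through an instance does not change the shared value; it creates a separate instance attribute that shadows it:

```python
class Fruits:
    price = 0    # class attribute

first = Fruits()
second = Fruits()

Fruits.price = 5    # changes the value shared by the class and all instances
first.price = 9     # creates an instance attribute on `first` only

values = (Fruits.price, first.price, second.price)
print(values)   # (5, 9, 5)
```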

Methods

Static methods

    @staticmethod
    def getPrice():
        print (Fruits.price)

Private methods

    def __getWeight(self):
        print self.__weight

Class methods

    @classmethod
    def getPrice2(cls):
        print (cls.price)
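The three fragments above belong inside a class body. Put together, they might look like the following Python 3 sketch; the method names are adapted and the public weight() wrapper is an addition made for the example:

```python
class Fruits:
    price = 3

    def __init__(self):
        self.__weight = 12

    @staticmethod
    def get_price():
        # No self or cls parameter: refers to the class by name.
        return Fruits.price

    @classmethod
    def get_price2(cls):
        # Receives the class itself as cls.
        return cls.price

    def __get_weight(self):
        # Name-mangled to _Fruits__get_weight outside the class.
        return self.__weight

    def weight(self):
        # Inside the class the private method is called by its plain name.
        return self.__get_weight()

print(Fruits.get_price(), Fruits.get_price2(), Fruits().weight())   # 3 3 12
```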

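Adding methods dynamically

Python is a dynamic scripting language, so programs written in it are highly dynamic as well: a method can be attached to a class at runtime by assignment.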

class_name.method_name = function_name
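A small illustration of the assignment pattern above, written for Python 3. The class Fruit and the function describe are hypothetical names:

```python
class Fruit:
    def __init__(self, name):
        self.name = name

def describe(self):
    # A plain function; once attached to the class it behaves like a method.
    return "fruit: " + self.name

# class_name.method_name = function_name
Fruit.describe = describe

apple = Fruit("apple")
description = apple.describe()
print(description)   # fruit: apple
```

Because the function is attached to the class rather than to one object, every existing and future instance gains the method.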

Class inheritance

Multiple inheritance is also supported.

Format:

class class_name(super_class1,super_class2):
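To show how the format above behaves when both parents define the same method, here is a Python 3 sketch with made-up classes. Attribute lookup follows the method resolution order (MRO), which checks the parents in the order they are listed:

```python
class Flyer:
    def move(self):
        return "fly"

class Swimmer:
    def move(self):
        return "swim"

    def dive(self):
        return "dive"

# Duck inherits from both parents; Flyer is listed first, so its move() wins.
class Duck(Flyer, Swimmer):
    pass

duck = Duck()
mro_names = [cls.__name__ for cls in Duck.__mro__]

print(duck.move())   # fly
print(duck.dive())   # dive
print(mro_names)     # ['Duck', 'Flyer', 'Swimmer', 'object']
```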

Abstract methods

    @abstractmethod  # from the abc module; the enclosing class must use ABCMeta
    def grow(self):
        pass
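The fragment above needs an enclosing abstract class to have any effect. A complete Python 3 sketch, with the hypothetical classes Plant and Apple:

```python
from abc import ABC, abstractmethod

class Plant(ABC):
    @abstractmethod
    def grow(self):
        """Subclasses must provide an implementation."""

class Apple(Plant):
    def grow(self):
        return "apple is growing"

# Instantiating a class that still has abstract methods raises TypeError.
try:
    Plant()
    abstract_error = None
except TypeError as exc:
    abstract_error = exc

print(Apple().grow())               # apple is growing
print(abstract_error is not None)   # True
```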

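Operator overloading

Python ties operators to special methods of a class; each operator corresponds to one method. For example, __add__() implements the addition operator and __gt__() implements the greater-than operator.

By overloading operators we can add, subtract, or compare objects.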
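A minimal Python 3 sketch of the two special methods named above. The Money class is a made-up example:

```python
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # Called for: self + other
        return Money(self.amount + other.amount)

    def __gt__(self, other):
        # Called for: self > other
        return self.amount > other.amount

total = Money(3) + Money(4)
is_greater = Money(5) > Money(2)

print(total.amount)   # 7
print(is_greater)     # True
```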

Exceptions

Catching exceptions

try: except: finally:

Raising exceptions

Use the raise statement to raise an exception.

Assertions

assert len(t)==1
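The three constructs above can be seen together in one short Python 3 sketch. The function parse_age and its error message are hypothetical:

```python
def parse_age(text):
    # raise signals an error to the caller.
    if not text.isdigit():
        raise ValueError("age must be a non-negative integer: %r" % text)
    return int(text)

events = []
try:
    parse_age("abc")
except ValueError as exc:
    events.append("caught: %s" % exc)
finally:
    # finally runs whether or not an exception was raised.
    events.append("cleanup")

# assert raises AssertionError when its condition is false.
t = (1,)
assert len(t) == 1

print(events)
```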

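File persistence

Creating a local database with `shelve`

The shelve module provides a way to persist data locally.

import shelve

addresses = shelve.open("addresses") # created if it does not exist locally
addresses["city"] = "Beijing"
addresses["pro"] = "Guangdong"
addresses.close()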
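Reading the data back after reopening the shelf might look like this Python 3 sketch. Storing the database under a temporary directory is an assumption made so the example leaves no files behind in the working directory:

```python
import os
import shelve
import tempfile

# Keep the database file(s) in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "addresses")

with shelve.open(path) as db:     # created on first use
    db["city"] = "Beijing"

with shelve.open(path) as db:     # reopened later: the data is still there
    city = db["city"]
    keys = sorted(db.keys())

print(city, keys)   # Beijing ['city']
```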

cPickle serialization

The cPickle and pickle modules both implement serialization; the former is written in C and is faster.

Serialization:

import cPickle as pickle
text = "I need to be serialized"
f = open("serial.txt", "wb")
pickle.dump(text, f)
f.close()

Deserialization:

f = open("serial.txt","rb")
text = pickle.load(f)
f.close()

Storing JSON files

Python has a built-in json module for working with JSON data.

Serializing to a local file

import json
new_str = [{'a': 1}, {'b': 2}]
f = open('json.txt', 'w')
json.dump(new_str, f,ensure_ascii=False)
f.close()

Reading from a local file

import json
f = open('json.txt', 'r')
data = json.load(f)
print data
f.close()

Threads

The threading module

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)
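A minimal Python 3 sketch of the target, args, and name parameters from the signature above. The worker function add and the shared results list are illustrative assumptions:

```python
import threading

results = []
lock = threading.Lock()

def add(a, b):
    # The lock keeps appends from different threads from interleaving.
    with lock:
        results.append(a + b)

threads = [
    threading.Thread(target=add, args=(i, i), name="worker-%d" % i)
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()    # wait for every worker to finish

print(sorted(results))   # [0, 2, 4]
```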

Threads and queue

# -*- coding:UTF-8 -*-

import threading
import Queue

class MyJob(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self, name="aa")

    def run(self):
        print threading.currentThread()

        while not q.empty():
            a = q.get()
            print("my item %d"%a)
            print "my thread"
            q.task_done()


def job(a, b):
    print a+b
    print threading.activeCount()
    print "multithreading"


thread = threading.Thread(target=job, args=(2, 4), name="mythread")
q = Queue.Queue()
if __name__ == "__main__":
    myjob = MyJob()
    for i in range(100):
        q.put(i)
    myjob.start()
    q.join() # every finished task must call task_done(), otherwise the main thread blocks here

Processes

In multiprocessing, Process creates a process, and the Pool process pool manages a group of processes.

from multiprocessing import Process
import os

def run_pro(name):
    print 'process %s(%s)' % (os.getpid(),name)

if __name__ == "__main__":
    print 'parent process %s' % os.getpid()
    for i in range(5):
        p = Process(target=run_pro, args=(str(i),))  # args must be a tuple, hence the trailing comma
        p.start()

Web crawling

Fetching data

urllib2 and urllib are built into Python; they are commonly used and are enough to implement a crawler.

import urllib2
response = urllib2.urlopen('http://www.baidu.com')
html = response.read()
print html

try:
    request = urllib2.Request('http://www.google.com')
    response = urllib2.urlopen(request,timeout=5)
    html = response.read()
    print html
except urllib2.URLError as e:
    if hasattr(e, 'code'):
        print 'error code:',e.code
    print e

Requests: a third-party library with a more user-friendly API

import requests
r = requests.get('http://www.baidu.com')
print r.content
print r.url
print r.headers

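Parsing the fetched data

Use BeautifulSoup to parse HTML data. The parser from the Python standard library (html.parser) is less tolerant of malformed markup, so the third-party lxml parser is generally used instead; its performance and error tolerance are better.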

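Hash algorithm library

Introduction to hashlib

hashlib is a Python standard library module that provides several popular hash algorithms, including md5, sha1, sha224, sha256, sha384, and sha512. In addition, the new(name, string='') function defined in the module constructs a hash object for any hash algorithm supported by the system, selected by name.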
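A brief Python 3 sketch of the calls described above. The input bytes are arbitrary, and note that in Python 3 the data passed to hashlib must be bytes rather than a text string:

```python
import hashlib

data = b"hello"

md5_hex = hashlib.md5(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()

# hashlib.new builds the same kind of object from an algorithm name.
via_new = hashlib.new("sha256", data).hexdigest()

# Data can also be fed in pieces with update().
h = hashlib.sha256()
h.update(b"hel")
h.update(b"lo")
incremental = h.hexdigest()

print(md5_hex)
print(sha256_hex == via_new == incremental)   # True
```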
