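When Python has to interact with Java, .NET, or PHP web platforms, it is best to communicate over web protocols rather than use Jython or IronPython. This keeps the programs modular and loosely coupled.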
>>> print(r'''Hello,
... Lisa!''')
Hello,
Lisa!
>>> print('''line1
... line2
... line3''')
line1
line2
line3
print(r'\\\t\\')  # prints \\\t\\
Basic data types: integers, floats, strings, and booleans (True, False).
There are also list (similar to an array) and dict (similar to a JS object literal).
Constants: PI
Two kinds of division:
/ : always produces a float, for example 10/3 = 3.3333333333333335 and 9/3 = 3.0
// : floor division, 10//3 = 3
% : remainder, 10%3 = 1
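The three operators above can be checked directly. The following is a small sketch; the negative operands and the use of divmod() are additions for illustration and are not part of the original notes.

```python
# True division always yields a float, even when the result is whole.
true_div = 9 / 3

# Floor division rounds toward negative infinity, not toward zero.
floor_pos = 10 // 3
floor_neg = -10 // 3

# The remainder takes the sign of the divisor.
rem_pos = 10 % 3
rem_neg = -10 % 3

# divmod returns (quotient, remainder) in one call.
quotient, remainder = divmod(10, 3)

print(true_div, floor_pos, floor_neg, rem_pos, rem_neg, quotient, remainder)
```

Note that -10 // 3 is -4 rather than -3, which is a common surprise for readers coming from C or Java.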
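Note:
Python supports many data types. Internally, every piece of data can be treated as an "object", and a variable is a name in the program that points to one of these objects. Assigning to a variable simply binds the name to the data.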
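A minimal sketch of what "binding a name to an object" means in practice; the variable names are illustrative only.

```python
a = 'ABC'            # the name a is bound to the string object 'ABC'
b = a                # the name b is bound to the same object
same_object = a is b

a = 'XYZ'            # a is rebound to a new object; b still points to 'ABC'

print(same_object)   # True
print(a, b)          # XYZ ABC
```

Rebinding a does not change b, because assignment moves the name, not the data.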
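Python integers have no size limit.
Commonly used functions for Python string encoding:
ord('x') returns the Unicode code point of the character x; chr(code) takes an integer code point and returns the corresponding character.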
>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
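Python's string type is str, which is held in memory as Unicode; once encoded, one character corresponds to one or more bytes. To send a string over the network or save it to disk, the str has to be converted to bytes, a type whose unit is the byte.
Python writes bytes literals with a b prefix in front of the single or double quotes: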
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
Conversely, a UTF-8 byte stream read from the network or from disk must be decoded into a str before it can be used as text in code. This is done with the decode method:
>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
>>> len('abc')
3
>>> len('中')
1
>>> len('中文'.encode('utf-8'))
6
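The round trip between str and bytes, and what happens when a codec cannot handle the data, can be sketched as follows. The error-handling part (UnicodeEncodeError and errors='ignore') is an addition for illustration.

```python
text = '中文'
data = text.encode('utf-8')           # str -> bytes
round_trip = data.decode('utf-8')     # bytes -> str

# Encoding fails when the target codec cannot represent a character.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding a truncated byte sequence: errors='ignore' drops the incomplete tail.
partial = data[:4].decode('utf-8', errors='ignore')

print(len(text), len(data), round_trip, ascii_ok, partial)
```

Here len(text) counts characters while len(data) counts bytes, which is why the two numbers differ.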
So that the Python interpreter reads the source file as UTF-8, we usually put these two lines at the top of the file:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
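In Python 2, every string meant for display should be defined in the form u"this is a unicode string". In Python 3 every str is already Unicode, so the u prefix is not needed.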
>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'
A list is similar to a JS array: an ordered collection whose elements can be added and removed at any time.
>>> classmates = ['Michael', 'Bob', 'Tracy']
>>> classmates
['Michael', 'Bob', 'Tracy']
>>> len(classmates)
3
>>> classmates[0]
'Michael'
>>> classmates[1]
'Bob'
>>> classmates[2]
'Tracy'
>>> classmates[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> classmates[-1]
'Tracy'
>>> classmates[-2]
'Bob'
>>> classmates[-3]
'Michael'
>>> classmates[-4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
list also has these commonly used methods: append, insert, pop
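A short sketch of the three methods named above; the extra names ('Adam', 'Jack', 'Sarah') are made up for the example.

```python
classmates = ['Michael', 'Bob', 'Tracy']

classmates.append('Adam')       # add to the end
classmates.insert(1, 'Jack')    # insert before index 1
last = classmates.pop()         # remove and return the last element
second = classmates.pop(1)      # remove and return the element at index 1
classmates[1] = 'Sarah'         # replace an element in place

print(classmates, last, second)
```

All of these calls modify the list in place instead of returning a new list.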
L = ['Hello', 'World', 18, 'Apple', None]
print([s.lower() if isinstance(s, str) else s for s in L])
# ['hello', 'world', 18, 'apple', None]
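In scientific computing, when a range holds millions of elements there is no need to build the whole list in memory first; the elements can be produced only when they are needed. This is what a generator does. A generator stores the algorithm rather than the data, and it is itself iterable, so it can be used in a for loop.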
>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>
>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16
>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
...
0
1
4
9
16
25
36
49
64
81
If the generator logic is complex, it can be defined with a function-like syntax instead.
gougu = {z: (x, y) for z in range(1, 27) for y in range(1, z) for x in range(1, y) if x*x + y*y == z*z}
gougu
Out[17]:
{5: (3, 4), 10: (6, 8), 13: (5, 12), 15: (9, 12), 17: (8, 15), 20: (12, 16), 25: (7, 24), 26: (10, 24)}

gougu = [[x, y, z] for z in range(1, 27) for y in range(1, z) for x in range(1, y) if x*x + y*y == z*z]
gougu
Out[19]:
[[3, 4, 5], [6, 8, 10], [5, 12, 13], [9, 12, 15], ...

# pyt is a generator here; note the outermost parentheses!
# It can then be consumed with a for loop or a comprehension.
pyt = ((x, y, z) for z in range(1, 27) for y in range(1, z) for x in range(1, y) if x*x + y*y == z*z)
print([m for m in pyt])
[(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17), (12, 16, 20), (15, 20, 25), (7, 24, 25), (10, 24, 26)]
import jieba

documents = [u'我来到北京清华大学',
             u'假如当前的单词表有10个不一样的单词',
             u'我是中华人民共和国的公民,来自上海,老家是湖北襄阳']

# Segment each sentence into words, then join the words with spaces.
documents_after = [[w for w in jieba.cut(s)] for s in documents]
documents_after2 = [' '.join(s) for s in documents_after]
print(documents_after)
print(documents_after2)

[['我', '来到', '北京', '清华大学'], ['假如', '当前', '的', '单词表', '有', '10', '个', '不一样', '的', '单词'], ['我', '是', '中华人民共和国', '的', '公民', ',', '来自', '上海', ',', '老家', '是', '湖北', '襄阳']]
['我 来到 北京 清华大学', '假如 当前 的 单词表 有 10 个 不一样 的 单词', '我 是 中华人民共和国 的 公民 , 来自 上海 , 老家 是 湖北 襄阳']
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

f = fib(6)
for n in fib(6):
    print(n)
# Output:
# 1
# 1
# 2
# 3
# 5
# 8
A generator is a function that produces a sequence of results (note: not a function that produces just one value!).
def countdown(n):
    print("counting down from ", n)
    while n > 0:
        yield n
        n -= 1

x = countdown(10)
print(x)
# Note that "counting down from 10" has NOT been printed yet:
# <generator object countdown at 0x0000026385694468>
print(x.__next__())
# counting down from  10
# 10
print(x.__next__())
# 9
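A generator function behaves completely differently from a normal function. Calling a generator function creates a generator object, but note that the function body is not executed at that point!
When the generator returns, iteration stops.
Calling __next__() runs the function until it yields a value. Execution then stops there and the function is suspended, keeping its state, until the next next() call resumes it.
A generator behaves much like any other iterable such as a list, with one difference: a generator is a one-time operation. Once its values have been consumed it cannot be iterated again.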
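The one-time nature of a generator can be seen by trying to consume it twice. This is a small sketch with illustrative names.

```python
squares = (x * x for x in range(4))
first_pass = list(squares)     # consumes every value the generator can produce
second_pass = list(squares)    # the generator is exhausted, so nothing is left

# A list, by contrast, can be iterated any number of times.
numbers = [0, 1, 4, 9]
list_passes = (list(numbers), list(numbers))

print(first_pass, second_pass)
```

To iterate again, build a new generator, for example by calling the generator function a second time.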
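Generators have another major advantage: a generator does not load the whole sequence into memory, process it, and then return it. It loads, processes, and returns one item at a time, so a generator can handle a file of any size.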
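A minimal sketch of this idea, assuming a text file with one integer per line. The helper read_numbers and the temporary sample file are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile

def read_numbers(path):
    """Yield one integer per line without loading the whole file."""
    with open(path, encoding='utf-8') as handle:
        for line in handle:          # file objects are themselves lazy iterators
            line = line.strip()
            if line:
                yield int(line)

# Build a small stand-in for a "large" file (hypothetical data).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('\n'.join(str(n) for n in range(1, 1001)))
    sample_path = tmp.name

total = sum(read_numbers(sample_path))   # values are produced one line at a time
os.remove(sample_path)
print(total)
```

Because sum() pulls values one at a time, memory use stays flat no matter how many lines the file has.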
a = [1, 2, 3, 4]
b = (2*x for x in a)
b
Out[19]: <generator object <genexpr> at 0x0000023EDA2C6CA8>
for i in b:
    print(i)
2
4
6
8
Generator expression syntax:
(expression for i in s if condition)
# is equivalent to
for i in s:
    if condition:
        yield expression
Note: when a generator expression is the sole argument of a function call, its own parentheses can be omitted:
a = [1, 2, 3, 4]
sum(x*x for x in a)
Out[21]: 30
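The data that can be iterated over in a for loop includes collection types such as list, tuple, dict, set, and str, as well as generators (including generator functions that use yield). All of these are called iterable types (Iterable), and isinstance() can be used to check for them: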
>>> from collections.abc import Iterable   # collections.Iterable was removed in Python 3.10
>>> isinstance([], Iterable)
True
>>> isinstance({}, Iterable)
True
>>> isinstance('abc', Iterable)
True
>>> isinstance((x for x in range(10)), Iterable)
True
>>> isinstance(100, Iterable)
False
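A generator can be used not only in a for loop but also with the next() function, which returns the next value each time until a StopIteration exception is raised.
Any object that can be passed to next() to keep returning the next value is called an iterator (Iterator).
isinstance() can likewise be used to check whether an object is an Iterator: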
>>> from collections.abc import Iterator   # collections.Iterator was removed in Python 3.10
>>> isinstance((x for x in range(10)), Iterator)
True
>>> isinstance([], Iterator)
False
>>> isinstance({}, Iterator)
False
>>> isinstance('abc', Iterator)
False
As shown above, list, dict, set, and str are Iterable but not Iterator, whereas a generator is an Iterator.
However, the iter() function turns an iterable such as a dict or list into an iterator, for example:
>>> isinstance(iter([]), Iterator)
True
>>> isinstance(iter('abc'), Iterator)
True
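Every object that can be used in a for loop is of type Iterable.
Every object that can be passed to next() is of type Iterator; such objects represent a lazily evaluated sequence.
Collection types such as list, dict, and str are Iterable but not Iterator, although iter() can be used to obtain an Iterator from them.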
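The behaviour of next() summarized above, including the StopIteration signal, can be sketched as follows. The default-value form next(it, None) is an addition for illustration.

```python
it = iter(['a', 'b'])
first = next(it)
second = next(it)

# A third call has nothing left to return and raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default to next() returns it instead of raising.
fallback = next(it, None)

print(first, second, exhausted, fallback)
```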
Python's for loop is essentially implemented by calling next() repeatedly, for example:
for x in [1, 2, 3, 4, 5]:
    pass

# is exactly equivalent to:

# First obtain an Iterator object:
it = iter([1, 2, 3, 4, 5])
# Loop:
while True:
    try:
        # Get the next value:
        x = next(it)
    except StopIteration:
        # Leave the loop when StopIteration is raised
        break
tuple:
>>> classmates = ('Michael', 'Bob', 'Tracy')
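A tuple with a single element must be written with a trailing comma to avoid ambiguity. Otherwise the parentheses are treated as plain grouping and the result is the element itself, not a one-element tuple.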
>>> t = (1,)
>>> t
(1,)
https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation
a[start:end]       # items start through end-1
a[start:]          # items start through the rest of the array
a[:end]            # items from the beginning through end-1
a[:]               # a copy of the whole array
a[start:end:step]  # start through not past end, by step
a[-1]              # last item in the array
a[-2:]             # last two items in the array
a[:-2]             # everything except the last two items
a[::-1]            # all items in the array, reversed
a[1::-1]           # the first two items, reversed
a[:-3:-1]          # the last two items, reversed
a[-3::-1]          # everything except the last two items, reversed
https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
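An ndarray can be indexed and sliced with the standard Python $x[obj]$ syntax, where $x$ is the array and $obj$ is the selection expression. ndarray supports three kinds of indexing: field access, basic slicing, and advanced indexing. Which one applies depends on $obj$.
Note:
$x[(exp1, exp2, ..., expN)]$ is equivalent to $x[exp1, exp2, ..., expN]$
ndarray's basic slicing extends Python's basic indexing and slicing, which apply only to one-dimensional sequences, to N dimensions. Basic slicing occurs when obj in $x[obj]$ is a slice object (the $start:stop:step$ notation), an integer, or a tuple of slice objects and integers. The standard slicing rules are applied to each dimension separately.
All arrays produced by basic slicing are views of the original array; the data itself is not copied.
The basic slicing rules are summarized as follows:
The basic slice syntax is $i:j:k$, where $i$ is the start index, $j$ the end index, and $k$ the step. If $i$ or $j$ is negative, it is interpreted as $n+i$ or $n+j$, where $n$ is the number of elements in the corresponding dimension. If $k<0$, the slice steps toward smaller indices.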
>>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[1:7:2]
array([1, 3, 5])
>>> x[-2:10]
array([8, 9])
>>> x[-3:3:-1]
array([7, 6, 5, 4])
>>> x[5:]
array([5, 6, 7, 8, 9])
>>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x.shape
(2, 3, 1)
>>> x[1:2]
array([[[4],
        [5],
        [6]]])
>>> x[...,0]
array([[1, 2, 3],
       [4, 5, 6]])
>>> x[:,np.newaxis,:,:].shape
(2, 1, 3, 1)
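The rule that a negative index i stands for n + i also holds for plain Python sequences, so it can be checked without NumPy. The sketch below uses a list with the same ten values as the first array above; the variable names are illustrative.

```python
x = list(range(10))
n = len(x)

tail_negative = x[-2:10]            # negative start
tail_explicit = x[n - 2:10]         # the same slice with the index written out

# With a negative step the slice walks toward smaller indices;
# the stop index 3 itself is excluded.
backward = x[-3:3:-1]
backward_explicit = x[n - 3:3:-1]

print(tail_negative, backward)
```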
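Advanced indexing is applied when the selection object obj is a non-tuple sequence object, an ndarray whose values are int or bool, or a tuple containing at least one sequence object or int/bool ndarray. There are two modes: integer and boolean.
Advanced indexing always returns a copy of the data (basic slicing only returns a view and makes no copy!).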
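The view-versus-copy distinction can be observed by writing through each result. This is a small sketch that assumes NumPy is installed; np.shares_memory is used only to make the difference visible.

```python
import numpy as np

x = np.arange(10)

view = x[2:5]          # basic slicing: shares memory with x
view[0] = 100          # the write goes through to x

copy = x[[2, 3, 4]]    # advanced (integer-array) indexing: independent data
copy[1] = -1           # x is not affected

print(x)
print(np.shares_memory(x, view), np.shares_memory(x, copy))
```

When an independent array is needed from a basic slice, calling .copy() on the slice makes the intent explicit.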
Note:
>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> rows = np.array([[0, 0],
...                  [3, 3]], dtype=np.intp)
>>> columns = np.array([[0, 2],
...                     [0, 2]], dtype=np.intp)
>>> x[rows, columns]
array([[ 0,  2],
       [ 9, 11]])

>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
If obj is an array of boolean values, boolean indexing is used.
>>> x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
>>> x[~np.isnan(x)]
array([ 1.,  2.,  3.])
>>> x = np.array([1., -1., -2., 3])
>>> x[x < 0] += 20
>>> x
array([  1.,  19.,  18.,   3.])
>>> x = np.array([[0, 1], [1, 1], [2, 2]])
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2, :]
array([[0, 1],
       [1, 1]])
>>> rowsum = x.sum(-1, keepdims=True)
>>> rowsum.shape
(3, 1)
>>> x[rowsum <= 2, :]    # fails
IndexError: too many indices
>>> x[rowsum <= 2]       # output from an older NumPy; recent versions raise IndexError here too, because the (3, 1) mask does not match the shape of x
array([0, 1])
>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> rows = (x.sum(-1) % 2) == 0
>>> rows
array([False,  True, False,  True])
>>> columns = [0, 2]
>>> x[np.ix_(rows, columns)]
array([[ 3,  5],
       [ 9, 11]])
>>> rows = rows.nonzero()[0]
>>> x[rows[:, np.newaxis], columns]
array([[ 3,  5],
       [ 9, 11]])
https://pandas.pydata.org/pandas-docs/stable/indexing.html
Suppose we have the following data set. We will use it to practice selecting and slicing data with pandas:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out country column as Pandas Series
print(cars['country'])
In [4]: cars['country']
Out[4]:
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

# Print out country column as Pandas DataFrame
print(cars[['country']])
In [5]: cars[['country']]
Out[5]:
           country
US   United States
AUS      Australia
JAP          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt

# Print out DataFrame with country and drives_right columns
print(cars[['country','drives_right']])
In [6]: cars[['country','drives_right']]
Out[6]:
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

# Print out first 3 observations
print(cars[0:3])
# Print out fourth, fifth and sixth observation
print(cars[3:6])

In [14]: cars
Out[14]:
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

In [15]: cars.loc['RU']
Out[15]:
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [16]: cars.iloc[4]
Out[16]:
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [17]: cars.loc[['RU']]
Out[17]:
    cars_per_cap country  drives_right
RU           200  Russia          True

In [18]: cars.iloc[[4]]
Out[18]:
    cars_per_cap country  drives_right
RU           200  Russia          True

In [19]: cars.loc[['RU','AUS']]
Out[19]:
     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False

In [20]: cars.iloc[[4,1]]
Out[20]:
     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False

In [3]: cars.loc['IN','cars_per_cap']
Out[3]: 18

In [4]: cars.iloc[3,0]
Out[4]: 18

In [5]: cars.loc[['IN','RU'],'cars_per_cap']
Out[5]:
IN     18
RU    200
Name: cars_per_cap, dtype: int64

In [6]: cars.iloc[[3,4],0]
Out[6]:
IN     18
RU    200
Name: cars_per_cap, dtype: int64

In [7]: cars.loc[['IN','RU'],['cars_per_cap','country']]
Out[7]:
    cars_per_cap country
IN            18   India
RU           200  Russia

In [8]: cars.iloc[[3,4],[0,1]]
Out[8]:
    cars_per_cap country
IN            18   India
RU           200  Russia

print(cars.loc['MOR','drives_right'])
True

In [1]: cars.loc[:,'country']
Out[1]:
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

In [2]: cars.iloc[:,1]
Out[2]:
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

In [3]: cars.loc[:,['country','drives_right']]
Out[3]:
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

In [4]: cars.iloc[:,[1,2]]
Out[4]:
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
if statement:
age = 3
if age >= 18:
    print('adult')
elif age >= 6:
    print('teenager')
else:
    print('kid')
Loops:
names = ['Michael', 'Bob', 'Tracy']
for name in names:
    print(name)

>>> list(range(5))
[0, 1, 2, 3, 4]

total = 0
for x in range(101):
    total = total + x
print(total)
A dict is similar to a JavaScript object: it is defined by key-value pairs.
>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95
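A set is similar to a dict, but it stores only keys and no values, and is written as a literal such as {1, 2, 3, 'a', 'b'}. It can be regarded as a collection in the mathematical sense, unordered and without duplicate elements, and it supports intersection, union, and other set operations. A set can be created by passing a list to the set() function.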
>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}
str, int, and None are immutable objects, whereas list and dict are mutable objects.
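The practical difference between the two groups can be sketched as follows; the variable names are illustrative.

```python
s = 'abc'
upper = s.upper()        # returns a new string; s itself is unchanged

try:
    s[0] = 'X'           # str does not support item assignment
    str_mutated = True
except TypeError:
    str_mutated = False

items = ['c', 'b', 'a']
alias = items
items.sort()             # sorts in place, so alias sees the change too

print(s, upper, str_mutated, alias)
```

Methods on immutable objects return new objects, while methods such as list.sort() change the existing object and return None.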
Help and reference resources:
https://docs.python.org/3/library/functions.html#abs
Functions are defined with def and can return multiple values.
import math

def move(x, y, step, angle=0):
    nx = x + step * math.cos(angle)
    ny = y - step * math.sin(angle)
    return nx, ny

>>> x, y = move(100, 100, 60, math.pi / 6)
>>> print(x, y)
151.96152422706632 70.0
>>> r = move(100, 100, 60, math.pi / 6)
>>> print(r)   # the function actually returns one tuple, whose elements are assigned to the variables on the left
(151.96152422706632, 70.0)
Functions support default parameters:
def enroll(name, gender, age=6, city='Beijing'):
    print('name:', name)
    print('gender:', gender)
    print('age:', age)
    print('city:', city)

enroll('Bob', 'M', 7)
enroll('Adam', 'M', city='Tianjin')
Variable-length arguments:
def calc(*numbers):
    # numbers is a tuple here: type(numbers) is <class 'tuple'>
    total = 0
    for n in numbers:
        total = total + n * n
    return total

>>> nums = [1, 2, 3]
>>> calc(*nums)   # the * unpacks a list or tuple: *nums passes every element of nums as a separate positional argument
14
Keyword arguments:
def person(name, age, **kw):
    # kw is a dict here: type(kw) is <class 'dict'>
    print('name:', name, 'age:', age, 'other:', kw)

>>> person('Michael', 30)
name: Michael age: 30 other: {}
>>> person('Bob', 35, city='Beijing')
name: Bob age: 35 other: {'city': 'Beijing'}
>>> person('Adam', 45, gender='M', job='Engineer')
name: Adam age: 45 other: {'gender': 'M', 'job': 'Engineer'}
>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, **extra)
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}
**extra passes every key-value pair of the dict extra as keyword arguments into the function's **kw parameter. kw receives a dict, but note that this dict is a copy of extra: changes made to kw do not affect extra outside the function.
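The claim that kw is a copy can be checked by modifying it inside the function. In this sketch the 'checked' key and the return value are additions for illustration; the signature follows the person function above.

```python
def person(name, age, **kw):
    kw['checked'] = True     # modifies only the function's own dict
    return kw

extra = {'city': 'Beijing', 'job': 'Engineer'}
received = person('Jack', 24, **extra)

print(extra)       # the caller's dict has no 'checked' key
print(received)
```

The copy is shallow: the dict itself is new, but mutable values stored inside it would still be shared with extra.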
Named keyword arguments:
def person(name, age, *, city='Beijing', job):   # a named keyword argument with a default value: city defaults to 'Beijing'
    print(name, age, city, job)

>>> person('Jack', 24, city='Beijing', job='Engineer')
Jack 24 Beijing Engineer
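What the bare * enforces can be seen by calling the function in ways it rejects. This sketch returns a tuple instead of printing so that the result is easy to inspect; that change is for illustration only.

```python
def person(name, age, *, city='Beijing', job):
    return (name, age, city, job)

ok = person('Jack', 24, job='Engineer')      # city falls back to its default

# Passing city and job positionally is rejected.
try:
    person('Jack', 24, 'Beijing', 'Engineer')
    positional_allowed = True
except TypeError:
    positional_allowed = False

# job has no default, so leaving it out is also an error.
try:
    person('Jack', 24)
    job_optional = True
except TypeError:
    job_optional = False

print(ok, positional_allowed, job_optional)
```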
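What are keyword arguments good for? They extend what a function can accept. In the person function, name and age are guaranteed to be received, but if the caller is willing to supply more parameters, those are received as well. Imagine implementing user registration: apart from the user name and age, which are required, everything else is optional. Defining the function with keyword arguments satisfies this requirement.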
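Many high-level languages offer a facility like map(): it applies the same function to every element of a list and returns an iterator, from which list() can then build a new list.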
from collections.abc import Iterator

def f(x):
    return x * x

r = map(f, [1, 2, 3, 4, 5])
print(r)
print(isinstance(r, Iterator))  # True
print(list(r))
# Output:
# <map object at 0x000000000072B9B0>   the result is an Iterator, so list() is needed to build a list from it
# True
# [1, 4, 9, 16, 25]
https://pypi.python.org/pypi/mysql-connector-python/2.0.4
Image module code example:
from PIL import Image

im = Image.open(r'C:\Users\Administrator\Desktop\jj.png')
print(im.format, im.size, im.mode)
im.thumbnail((100, 50))        # shrink in place, keeping the aspect ratio
im.save('thumb.png', 'png')    # save the thumbnail as PNG
Python network service programming
Server:
import socket
import threading
import time

def tcplink(sock, addr):
    print('Accept new connection from %s:%s...' % addr)
    sock.send(b'Welcome, client!')
    while True:
        data = sock.recv(1024)
        time.sleep(1)
        if not data or data.decode('utf-8') == 'exit':
            break
        # data is bytes, so %s formats its repr (b'...');
        # use data.decode('utf-8') here to reply with plain text instead
        sock.send(('Hello, %s!' % data).encode('utf-8'))
    sock.close()
    print('Connection from %s:%s closed.' % addr)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('127.0.0.1', 9999))
s.listen(5)
print('waiting for connection coming on server...')
while True:
    # Handle each client in its own thread
    sock, addr = s.accept()
    t = threading.Thread(target=tcplink, args=(sock, addr))
    t.start()
# Output printed on the server side:
waiting for connection coming on server...
Accept new connection from 127.0.0.1:64891...
Connection from 127.0.0.1:64891 closed.
Accept new connection from 127.0.0.1:65304...
Connection from 127.0.0.1:65304 closed.
Accept new connection from 127.0.0.1:65408...
Connection from 127.0.0.1:65408 closed.
Accept new connection from 127.0.0.1:65435...
Connection from 127.0.0.1:65435 closed.
Accept new connection from 127.0.0.1:65505...
Connection from 127.0.0.1:65505 closed.
Test client
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 9999))
print(s.recv(1024).decode('utf-8'))
for data in [b'Michael', b'Tracy', b'Sarah']:
    s.send(data)
    print(s.recv(1024).decode('utf-8'))
s.send(b'exit')
s.close()
# Output printed on the client side:
Welcome, client!
Hello, b'Michael'!
Hello, b'Tracy'!
Hello, b'Sarah'!
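The b'...' wrappers in the client output come from formatting a bytes object with %s, which inserts its repr. A minimal sketch of the difference, using a literal in place of the value returned by sock.recv():

```python
data = b'Michael'            # stands in for what sock.recv() returns

as_repr = 'Hello, %s!' % data                     # formats repr(data)
as_text = 'Hello, %s!' % data.decode('utf-8')     # formats the decoded text

print(as_repr)
print(as_text)
```

Decoding the received bytes before formatting would make the server reply with Hello, Michael! instead.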
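The evolution from IPython notebook to Jupyter notebooks
Broadly, the architecture is split into an interface level and a kernel level. The interface level can be notebooks, the IPython console, or the Qt console. Each of them talks to the kernel level through a message queue over sockets, which carries the Python code to be executed and the data returned once execution finishes.
Jupyter extends this notebook model to multiple languages such as R and bash by adding a kernel component for each language at the kernel level, responsible for executing that language and returning the results.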
https://plot.ly/python/ipython-vs-python/
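IPython is a Python console environment with enhanced interactivity that offers many useful features.
Compared with the standard Python console, it provides: tab completion; object exploration, for example object_name? lists the details of an object; magic functions, for example %timeit is often used to measure how fast a piece of code runs, and %run executes any Python script and loads all of its data directly into the interactive namespace; and system shell commands, for example !ping www.xxx.com, whose output can also be captured:
files = !ls
!grep -rF $pattern ipython/*
The second command passes the Python variable pattern into the grep system command.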
http://ipython.org/ipython-doc/dev/interactive/tutorial.html#magic-functions
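To make IPython's prompts and output look like the standard >>> interpreter, for example when working with doctest-style code, run the following command in IPython: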
%doctest_mode
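Jupyter notebook is very handy in at least the following two scenarios:
1. You want to experiment further with an existing notebook, or simply learn from it;
2. You want to develop a notebook of your own to support teaching or to produce an academic article.
In both scenarios you will probably want to run Jupyter notebook in a specific directory:
cd into that directory and run the following command:
jupyter notebook
This opens the notebook and lists all files in that directory: http://localhost:8888/tree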
Some Python debugging and study tips:
y = [x*x for x in range(1, 11)]
print(dir(y))
# Output:
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
%who Series   # list all variables of type Series
s    temp_diffs    temps1    temps2

%who          # list all globals
DataFrame    Series    dates    np    pd    plt    s    temp_diffs    temps1    temps2

%whos         # list all globals with their detailed types:
Variable     Type             Data/Info
---------------------------------------
DataFrame    type             <class 'pandas.core.frame.DataFrame'>
Series       type             <class 'pandas.core.series.Series'>
dates        DatetimeIndex    DatetimeIndex(['2014-07-0<...>atetime64[ns]', freq='D')
my_func      function         <function my_func at 0x00000211211B7C80>
np           module           <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>
pd           module           <module 'pandas' from 'C:<...>es\\pandas\\__init__.py'>
plt          module           <module 'matplotlib.pyplo<...>\\matplotlib\\pyplot.py'>
s            Series           a 1\nb 2\nc 3\nd 4\ndtype: int64
temp_diffs   Series           2014-07-01 10\n2014-07<...>10\nFreq: D, dtype: int64
temps1       Series           2014-07-01 80\n2014-07<...>87\nFreq: D, dtype: int64
temps2       Series           2014-07-01 70\n2014-07<...>77\nFreq: D, dtype: int64
import pandas as pd
print(dir(pd))
help(pd.Series)   # help() prints the documentation itself, so no print() is needed

['Categorical', 'CategoricalIndex', 'DataFrame', 'DateOffset', 'DatetimeIndex', 'ExcelFile', 'ExcelWriter', 'Expr', 'Float64Index', 'Grouper', 'HDFStore', 'Index', 'IndexSlice', 'Int64Index', 'MultiIndex', 'NaT', 'Panel', 'Panel4D', 'Period', 'PeriodIndex', 'RangeIndex', 'Series', 'SparseArray', 'SparseDataFrame', 'SparseList', 'SparsePanel', 'SparseSeries', 'SparseTimeSeries', 'Term', 'TimeGrouper', 'TimeSeries', 'Timedelta', 'TimedeltaIndex', 'Timestamp', 'WidePanel', '__builtins__', '__cached__', '__doc__', '__docformat__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_np_version_under1p10', '_np_version_under1p11', '_np_version_under1p12', '_np_version_under1p8', '_np_version_under1p9', '_period', '_sparse', '_testing', '_version', 'algos', 'bdate_range', 'compat', 'computation', 'concat', 'core', 'crosstab', 'cut', 'date_range', 'datetime', 'datetools', 'dependency', 'describe_option', 'eval', 'ewma', 'ewmcorr', 'ewmcov', 'ewmstd', 'ewmvar', 'ewmvol', 'expanding_apply', 'expanding_corr', 'expanding_count', 'expanding_cov', 'expanding_kurt', 'expanding_max', 'expanding_mean', 'expanding_median', 'expanding_min', 'expanding_quantile', 'expanding_skew', 'expanding_std', 'expanding_sum', 'expanding_var', 'factorize', 'fama_macbeth', 'formats', 'get_dummies', 'get_option', 'get_store', 'groupby', 'hard_dependencies', 'hashtable', 'index', 'indexes', 'infer_freq', 'info', 'io', 'isnull', 'json', 'lib', 'lreshape', 'match', 'melt', 'merge', 'missing_dependencies', 'msgpack', 'notnull', 'np', 'offsets', 'ols', 'option_context', 'options', 'ordered_merge', 'pandas', 'parser', 'period_range', 'pivot', 'pivot_table', 'plot_params', 'pnow', 'qcut', 'read_clipboard', 'read_csv', 'read_excel', 'read_fwf', 'read_gbq', 'read_hdf', 'read_html', 'read_json', 'read_msgpack', 'read_pickle', 'read_sas', 'read_sql', 'read_sql_query', 'read_sql_table', 'read_stata', 'read_table', 'reset_option', 'rolling_apply', 'rolling_corr', 'rolling_count', 'rolling_cov', 'rolling_kurt', 'rolling_max', 'rolling_mean', 'rolling_median', 'rolling_min', 'rolling_quantile', 'rolling_skew', 'rolling_std', 'rolling_sum', 'rolling_var', 'rolling_window', 'scatter_matrix', 'set_eng_float_format', 'set_option', 'show_versions', 'sparse', 'stats', 'test', 'timedelta_range', 'to_datetime', 'to_msgpack', 'to_numeric', 'to_pickle', 'to_timedelta', 'tools', 'tseries', 'tslib', 'types', 'unique', 'util', 'value_counts', 'wide_to_long']

Help on class Series in module pandas.core.series:

class Series(pandas.core.base.IndexOpsMixin, pandas.core.strings.StringAccessorMixin, pandas.core.generic.NDFrame)
 |  One-dimensional ndarray with axis labels (including time series).
 |
 |  Labels need not be unique but must be any hashable type. The object
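Why introduce NumPy?
A standard Python list stores pointers to objects, so reaching an element requires a second dereference. This is inefficient and wastes space.
In addition, the standard Python list and array have no native two-dimensional array, nor do they provide functions for the more complex numerical operations on array data.
To improve performance and to support complex operations on multi-dimensional arrays, NumPy implements its core in C and exposes it to Python as Python objects.
At its core it implements two things: ndarray, the N-dimensional array object, and ufunc, functions that operate element-wise on arrays.
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)  # y is sin applied to every element of x
plt.plot(x, y, 'r-', linewidth=3, label='sin function')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()
The code above first creates an evenly spaced array from 0 to $2\pi$ and then passes it to np.sin(), which computes the sine of each element. Because np.sin() is a ufunc, it loops internally over every element of the array x, computes each sine value, and returns the results as a new array.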
https://www.jianshu.com/p/3c3f7da88516
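See "Python for Data Analysis, 2nd Edition".
Why pandas is needed
A NumPy 2-D array can emulate much of what pandas provides, but a native NumPy 2-D array must use a single data type for all elements, whereas real data analysis tasks often mix different types.
pandas builds on NumPy to provide SQL-like data processing, with two data types: Series and DataFrame. Each Series actually contains two ndarrays, index and values. index holds the index information passed in when the Series is created, and values is the ndarray holding the corresponding values. NumPy ufuncs operate on this values array.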
http://www.tetraph.com/blog/machine-learning/jupyter-notebook-keyboard-shortcut-command-mode-edit-mode/
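dataframe.loc/iloc vs the [] index operator
.loc and .iloc select rows first, whereas [] selects columns by default (a slice such as df[0:3] is the exception and selects rows). A column always has a name, so column selection with [] is always label based.
df.loc[:,['Name','cost']]  # returns the Name and cost values of all stores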
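The difference between the three selectors can be sketched on a tiny DataFrame. The data below is hypothetical (it is neither the cars.csv file nor the store data set referred to above), and the sketch assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data, used only for this illustration.
df = pd.DataFrame(
    {'Name': ['Chris', 'Kevyn'], 'cost': [22.5, 2.5]},
    index=['Store 1', 'Store 2'],
)

col_by_brackets = df['cost']                 # [] selects a column by label
row_by_label = df.loc['Store 2']             # .loc selects a row by index label
row_by_position = df.iloc[1]                 # .iloc selects a row by integer position
sub_frame = df.loc[:, ['Name', 'cost']]      # all rows, two columns by label

print(row_by_label.equals(row_by_position))
print(sub_frame.shape)
```

Here .loc['Store 2'] and .iloc[1] return the same row, once by label and once by position.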
shoplist = ['apple', 'mango', 'carrot', 'banana']
mylist = shoplist            # mylist is just another name for the same list
del shoplist[0]
print('shoplist is:', shoplist)
print('mylist is:', mylist)
# Both lines print the same list

print('copy via slice and assignment')
mycopiedlist = shoplist[:]   # make a copy by doing a full slice
del mycopiedlist[0]
print('shoplist is: ', shoplist)
print('mycopiedlist is:', mycopiedlist)

list('ABCD')  # returns ['A', 'B', 'C', 'D']
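Sometimes, while doing exploratory programming in the IPython shell, a function has already been defined and run, and later you want to look at its code again before calling it. In that case you need a way to "recover" the function's source code.
The way to do it is the inspect module:
import inspect
import pandas

source_DF = inspect.getsource(pandas.DataFrame)
print(type(source_DF))
print(source_DF[:200])  # print the source code
source_file_DF = inspect.getsourcefile(pandas.DataFrame)
print(source_file_DF)
# D:\Users\dengdong\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[:]
# The two ids differ because the full slice creates a new list
print(id(a))  # 54749320
print(id(b))  # 54749340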
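A full slice creates a new outer list, but the copy is shallow. The sketch below uses a nested list, which is an assumption made for illustration, to show what is and is not shared.

```python
a = [[1, 2], [3, 4]]
alias = a            # same object under a second name
shallow = a[:]       # new outer list that holds the same inner lists

print(alias is a)            # True
print(shallow is a)          # False
print(shallow[0] is a[0])    # True: the inner lists are shared

shallow.append([5, 6])       # affects only the copy
shallow[0].append(99)        # visible through a as well
print(a)
print(shallow)
```

When the inner objects must be independent too, copy.deepcopy() from the standard library is the usual tool.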