在结核菌中,以12SNP为阈值判断菌株间是否为近期传播。利用生物信息学方法从原始测序结果获得各菌株的SNP文件,经过互相比较获得两个文件:(1)各菌株间相互比较的状况,格式为:菌株AVS菌株B SNP距离,总行数为n*(n-1)/2 (2)利用文件1获得近期传播的菌株列表接下来要把这些菌株串起来,造成传播链,核心思想是:链内的任一菌株与至少1个链内其它菌株的SNP距离<=12;链内的任一菌株与链外全部菌株的距离都>12所以,核心思想也就是“迭代”,并不断删除数据,减小计算量import sysimport ref1=open(sys.argv[1]).readlines() #strain distancef2=open(sys.argv[2]).readlines() #distance <= 12 strain listf3=open("rough_transmission_chain","a")sum_chain_list=[i.rstrip().split()[0] for i in f1 if int(i.rstrip().split()[1])<=12]def find_transmission_strain(strain): cycle_strain = [] match_strain = [] True_strain = [i for i in sum_chain_list if i.startswith(strain + "VS") or i.endswith("VS" + strain)] for item in True_strain: temp = re.sub("VS", "", item) sum_chain_list.remove(item) if temp[:len(strain)] == strain: match_strain.append(temp[len(strain):]) cycle_strain.append(temp[len(strain):]) else: match_strain.append(temp[0:len(temp) - len(strain)]) cycle_strain.append(temp[0:len(temp) - len(strain)]) if len(cycle_strain) > 0: for a in range(len(cycle_strain)): for b in range(len(cycle_strain)): try: sum_chain_list.remove(cycle_strain[a] + "VS" + cycle_strain[b]) except ValueError as e: pass for i in cycle_strain: match_strain.extend(find_transmission_strain(i)) return set(match_strain)temp = []for i in f2: i = i.rstrip()if i not in temp: a = list(find_transmission_strain(i)) a.append(i) temp.extend(a) for strain in a: f3.write(strain + "\t") f3.write("\n")else: pass