Algorithm Collection (5)

Date: 2019-12-17


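Finding the cut vertices (articulation points) of a connected graph

Problem: find the cut vertices of a connected graph. A vertex is a cut vertex if removing it, together with its incident edges, leaves the graph disconnected. Describe an algorithm.

Analysis:

1. The simplest and most direct approach is to delete one vertex at a time and test connectivity: if the graph is no longer connected after the deletion, the vertex is a cut vertex; otherwise it is not. (Connectivity is usually tested with a depth-first search, checking whether a single search reaches every vertex.)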
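The brute-force idea in item 1 can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the graph is given as adjacency lists over vertices 0..n-1, and the names `reachableWithout` and `cutVerticesBruteForce` are hypothetical. Each vertex costs one traversal, so the total is O(n(n + e)).

```cpp
#include <cassert>
#include <vector>

// Count the vertices reachable from `start` when vertex `removed` is deleted.
static int reachableWithout(const std::vector<std::vector<int>>& adj,
                            int start, int removed) {
    assert(start != removed);
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> stack{start};
    seen[start] = true;
    int reached = 0;
    while (!stack.empty()) {
        int u = stack.back();
        stack.pop_back();
        ++reached;
        for (int w : adj[u]) {
            if (w != removed && !seen[w]) {
                seen[w] = true;
                stack.push_back(w);
            }
        }
    }
    return reached;
}

// v is a cut vertex if, after deleting it, one search no longer
// reaches all of the remaining n-1 vertices.
std::vector<int> cutVerticesBruteForce(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> result;
    if (n < 3) return result;  // with fewer than 3 vertices nothing can be cut
    for (int v = 0; v < n; ++v) {
        int start = (v == 0) ? 1 : 0;  // any vertex other than v
        if (reachableWithout(adj, start, v) < n - 1) result.push_back(v);
    }
    return result;
}
```

On the path 0 - 1 - 2 only the middle vertex is reported; on a triangle the result is empty.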

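2. Use the depth-first spanning tree. A depth-first traversal from any vertex yields a depth-first spanning tree; for any vertex V in the tree, its children are among its adjacent vertices. The tree reveals two kinds of cut vertices:

(1) If the root of the spanning tree has two or more subtrees, the root must be a cut vertex. No edge of the graph joins vertices in different subtrees, so deleting the root turns the tree into a forest.

(2) If V is a non-leaf vertex other than the root, and one of its subtrees has no back edge, from the subtree's root or from any other vertex of that subtree, to an ancestor of V, then V is a cut vertex: deleting V separates that subtree from the rest of the graph.

We still use depth-first search, but define visited[v] as the order number in which vertex v is visited during the traversal, and define low[v] = Min{visited[v], low[w], visited[k]}, where w is a child of v in the depth-first spanning tree and k is an ancestor of v in the tree joined to v by a back edge.

Cut-vertex test: if a vertex v (other than the root, which is covered by rule (1)) has a child w with low[w] >= visited[v], then v is an articulation point. When w is a child of v, low[w] >= visited[v] means that neither w nor any of its descendants has a back edge to an ancestor of v, so deleting v separates w's subtree from the other vertices and creates a new connected component.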

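#include <iostream>
#include <string>
using namespace std;

#define MAX_VERTEX_NUM 13

// Adjacency-list storage
typedef struct ArcNode {
    int adjvex;
    ArcNode *nextarc;
} ArcNode;

typedef struct VNode {
    string data;
    ArcNode *firstarc;
} VNode, AdjList[MAX_VERTEX_NUM];

typedef struct {
    AdjList vertices;
    int vexnum, arcnum;
} ALGraph;

// Return the index of vertex u in the graph, or -1 if it is not present
int LocateVex(const ALGraph &G, const string &u)
{
    for (int i = 0; i < G.vexnum; i++)
        if (G.vertices[i].data == u)
            return i;
    return -1;
}

// Build the graph from standard input (edges are stored in both directions)
void CreateDG(ALGraph &G)
{
    string v1, v2;
    int i, j, k;
    cout << "Enter the number of vertices and edges: ";
    cin >> G.vexnum >> G.arcnum;
    cout << "Enter the vertices: ";
    for (i = 0; i < G.vexnum; i++) {
        cin >> G.vertices[i].data;
        G.vertices[i].firstarc = NULL;
    }
    cout << "Enter the edges:" << endl;
    for (k = 0; k < G.arcnum; k++) {
        cin >> v1 >> v2;
        i = LocateVex(G, v1);
        j = LocateVex(G, v2);
        if (i < 0 || j < 0)   // skip edges that name an unknown vertex
            continue;
        // Undirected graph: insert the arc in both adjacency lists
        ArcNode *arc = new ArcNode;
        arc->adjvex = j;
        arc->nextarc = G.vertices[i].firstarc;
        G.vertices[i].firstarc = arc;
        arc = new ArcNode;
        arc->adjvex = i;
        arc->nextarc = G.vertices[j].firstarc;
        G.vertices[j].firstarc = arc;
    }
}

// Articulation-point search.
// visitCount was named count in the original; renamed so that it cannot
// clash with std::count under "using namespace std".
int visitCount;
int visited[MAX_VERTEX_NUM];
int low[MAX_VERTEX_NUM];

// Depth-first search from vertex v0; prints the articulation points it finds.
// A vertex is printed once per child that satisfies the test.
void DFSArticul(const ALGraph &G, int v0)
{
    int lowest, w;
    // v0 is the visitCount-th vertex visited; lowest starts as visited[v0]
    visited[v0] = lowest = ++visitCount;
    for (ArcNode *p = G.vertices[v0].firstarc; p; p = p->nextarc) {
        w = p->adjvex;
        if (visited[w] == 0) {          // w not yet visited: w is a child of v0
            DFSArticul(G, w);           // low[w] is known when the call returns
            if (low[w] < lowest)        // w's subtree reaches an ancestor of v0
                lowest = low[w];
            if (low[w] >= visited[v0])  // w's subtree reaches no ancestor of v0
                cout << G.vertices[v0].data << " ";
        } else if (visited[w] < lowest) // w already visited: w is an ancestor of v0
            lowest = visited[w];
    }
    low[v0] = lowest;                   // minimum of the three candidates
}

void FindArticul(const ALGraph &G)
{
    int i, v;
    visitCount = 1;
    visited[0] = 1;                     // vertex 0 is the root of the spanning tree
    for (i = 1; i < G.vexnum; i++)
        visited[i] = 0;
    ArcNode *p = G.vertices[0].firstarc;
    if (p == NULL)                      // the root has no neighbours
        return;
    v = p->adjvex;
    DFSArticul(G, v);                   // search the root's first subtree
    if (visitCount < G.vexnum) {        // root has at least two subtrees: it is a cut vertex
        cout << G.vertices[0].data << " ";
        while (p->nextarc) {
            p = p->nextarc;
            v = p->adjvex;
            if (visited[v] == 0)
                DFSArticul(G, v);
        }
    }
}

int main()
{
    ALGraph g;
    CreateDG(g);
    cout << "The cut vertices are: " << endl;
    FindArticul(g);
    cout << endl;
    return 0;
}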

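Finding the k smallest numbers

Problem statement

Given n integers, output the k smallest.

Analysis and solutions

Solution 1

The habitual way to find the k smallest numbers of a sequence is to sort the sequence in ascending order and then output its first k elements.

As for which sort to use, quicksort probably comes to mind first (its average running time is n*logn); afterwards, traverse the first k elements and output them. The total time complexity is O(n * log n) + O(k) = O(n * log n).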
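A minimal sketch of Solution 1, assuming the input is a `std::vector<int>`; the function name `smallestKBySorting` is made up for illustration, and `std::sort` stands in for the quicksort mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort a copy of the input, then keep only its first k elements.
std::vector<int> smallestKBySorting(std::vector<int> values, std::size_t k) {
    assert(k <= values.size());
    std::sort(values.begin(), values.end());  // O(n log n)
    values.resize(k);                         // O(k) to hand back the prefix
    return values;
}
```

The result comes back in ascending order, which is more than the problem asks for; the later solutions exploit exactly that slack.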

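Solution 2

Thinking a little further: the problem does not require the k smallest numbers to be in order, nor the remaining n-k numbers. There is therefore no need to sort everything. This suggests a selection-style or exchange-style approach:

1. Traverse the n numbers and store the first k encountered in an array of size k, assuming for now that they are the k smallest.
2. Scan these k numbers to find their maximum, kmax (the scan takes O(k) time).
3. Continue through the remaining n-k numbers. For each new element x, compare x with kmax: if x < kmax, replace kmax with x and return to step 2 to find the new maximum kmax' of the array; if x >= kmax, move on without updating the array.

Each element therefore costs O(k) when the array is updated and O(1) when it is not, so the whole pass takes n*O(k) = O(n*k) in the worst case.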
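The three steps of Solution 2 might look like this in C++. It is a sketch under the assumption that the input is a `std::vector<int>`; `smallestKByScan` is an illustrative name, and `std::max_element` plays the role of the O(k) scan for kmax.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

std::vector<int> smallestKByScan(const std::vector<int>& values, std::size_t k) {
    assert(k <= values.size());
    // Step 1: take the first k values as the provisional answer.
    std::vector<int> best(values.begin(), values.begin() + k);
    if (k == 0) return best;
    // Step 2: locate the current maximum kmax in O(k).
    auto kmax = std::max_element(best.begin(), best.end());
    // Step 3: every smaller newcomer replaces kmax, then kmax is found again.
    for (std::size_t i = k; i < values.size(); ++i) {
        if (values[i] < *kmax) {
            *kmax = values[i];
            kmax = std::max_element(best.begin(), best.end());
        }
    }
    return best;  // the k smallest values, in no particular order
}
```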

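Solution 3

A better approach is to maintain a max-heap of capacity k. The principle is the same as in Solution 2:

1. Store the first k numbers encountered in a max-heap of capacity k, again assuming they are the k smallest.
2. The heap keeps its largest element at the top. Write the stored elements as k1 <= k2 <= ... <= kmax, where kmax is the top of the max-heap (the heap itself is only partially ordered).
3. Traverse the remaining n-k numbers. For each new element x, compare x with the top element kmax: if x < kmax, replace kmax with x and restore the heap (O(log k) time); otherwise leave the heap unchanged.

The total time complexity is O(k + (n-k)*logk) = O(n*logk). The gain comes from the heap: reading the maximum is O(1) and replacing it costs O(logk), whereas Solution 2 needs O(k) to find the maximum in a plain array.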
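A short sketch of the heap approach using `std::priority_queue`, which is a max-heap by default. The function name `smallestKByHeap` is an assumption for illustration. The first k values are pushed one at a time here, which costs O(k log k) instead of the O(k) bulk heap construction, but the overall bound stays O(n log k).

```cpp
#include <cassert>
#include <queue>
#include <vector>

std::vector<int> smallestKByHeap(const std::vector<int>& values, std::size_t k) {
    assert(k <= values.size());
    std::priority_queue<int> heap;  // top() is the largest stored value
    for (int x : values) {
        if (heap.size() < k) {
            heap.push(x);                      // still filling the first k slots
        } else if (k > 0 && x < heap.top()) {
            heap.pop();                        // drop the current kmax
            heap.push(x);                      // O(log k) to restore the heap
        }
    }
    std::vector<int> result;
    while (!heap.empty()) {                    // drain: largest first
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Because the heap holds only k values, this version also works when the n inputs arrive as a stream that does not fit in memory.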

Solution 4

Section 7.7.6 of Chapter 7 of Data Structures and Algorithm Analysis in C describes quickselect, a selection algorithm whose average running time is O(N). The book puts it as follows:

Since we can sort the file in O(n log n) time, one might expect to obtain a better time bound for selection. The algorithm we present to find the kth smallest element in a set S is almost identical to quicksort. In fact, the first three steps are the same. We will call this algorithm quickselect. Let |Si| denote the number of elements in Si. The steps of quickselect are:

1. If |S| = 1, then k = 1 and return the elements in S as the answer. If a cutoff for small files is being used and |S| <= CUTOFF, then sort S and return the kth smallest element.

2. Pick a pivot element, v ∈ S.

3. Partition S - {v} into S1 and S2, as was done with quicksort.

4. If k <= |S1|, then the kth smallest element must be in S1. In this case, return quickselect (S1, k). If k = 1 + |S1|, then the pivot is the kth smallest element and we can return it as the answer. Otherwise, the kth smallest element lies in S2, and it is the (k - |S1| - 1)st smallest element in S2. We make a recursive call and return quickselect (S2, k - |S1| - 1).

In contrast to quicksort, quickselect makes only one recursive call instead of two. The worst case of quickselect is identical to that of quicksort and is O(n^2). Intuitively, this is because quicksort's worst case is when one of S1 and S2 is empty; thus, quickselect is not really saving a recursive call. The average running time, however, is O(n). The analysis is similar to quicksort's and is left as an exercise.

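Sample code:

// QuickSelect places the k-th smallest element in a[k-1]
// (k counts from 1, the array is indexed from 0).
// cutoff, median3, swap, and InsertSort are the helpers of the book's
// quicksort and are not shown in the original; median3 is expected to
// leave the pivot in a[right-1].
void QuickSelect( int a[], int k, int left, int right )
{
    int i, j;
    int pivot;

    if( left + cutoff <= right )
    {
        // Taking the median of three as the pivot largely avoids the worst case
        pivot = median3( a, left, right );
        i = left; j = right - 1;
        for( ; ; )
        {
            while( a[ ++i ] < pivot ){ }
            while( a[ --j ] > pivot ){ }
            if( i < j )
                swap( &a[ i ], &a[ j ] );
            else
                break;
        }
        // Put the pivot back in its final position
        swap( &a[ i ], &a[ right - 1 ] );

        if( k <= i )
            QuickSelect( a, k, left, i - 1 );
        else if( k > i + 1 )
            QuickSelect( a, k, i + 1, right );
    }
    else
        InsertSort( a + left, right - left + 1 );
}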

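This quickselect (SELECT) algorithm partitions the data the way quicksort does. The N numbers are stored in an array S and a pivot X is chosen from it (the code above takes the median of three; the "median of medians" is another choice). The array is then split into two parts Sa and Sb with Sa <= X <= Sb. If k is smaller than the number of elements in Sa, return the k smallest elements of Sa; otherwise return all of Sa plus the k-|Sa| smallest elements of Sb. On average this solution achieves O(N) complexity.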
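In C++ the same partition-based selection is available in the standard library as `std::nth_element`, which rearranges a range so that the element at a chosen position is the one that would be there after sorting and everything before it is no greater. The wrapper below is only a sketch; `smallestKBySelection` is an illustrative name, and the library's internal pivot strategy is implementation-defined rather than the median-of-three shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

std::vector<int> smallestKBySelection(std::vector<int> values, std::size_t k) {
    assert(k <= values.size());
    if (k < values.size()) {
        // Afterwards values[0..k) hold the k smallest values, in no particular order.
        std::nth_element(values.begin(), values.begin() + k, values.end());
    }
    values.resize(k);
    return values;
}
```

The standard specifies linear complexity on average for `std::nth_element`, matching the O(N) average bound quoted for quickselect.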

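Solution 5

Introduction to Algorithms describes RANDOMIZED-SELECT, a selection algorithm that picks its pivot at random. At each step it chooses a random element of the sequence as the pivot, and it finds the kth smallest element in expected O(n) time; one more pass then outputs the k smallest elements. The expected total time is O(n+k) = O(n), since k <= n.

As we know, basic quicksort uses a fixed element, the first or the last, as the pivot; its partitions are generally uneven, and its average time complexity is O(n*logn). RANDOMIZED-SELECT differs from ordinary quicksort in that every recursive step picks the pivot at random from the first through the last element of the current subarray.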

The complete pseudocode of RANDOMIZED-SELECT(A, p, r, i) follows:

PARTITION(A, p, r)                      // partition step; p is the first index, r the last
    x ← A[r]                            // use the last element as the pivot
    i ← p - 1
    for j ← p to r - 1
        do if A[j] ≤ x
              then i ← i + 1
                   exchange A[i] <-> A[j]
    exchange A[i + 1] <-> A[r]
    return i + 1

RANDOMIZED-PARTITION(A, p, r)           // partition step of randomized quicksort
    i ← RANDOM(p, r)                    // i is a random index between p and r
    exchange A[r] <-> A[i]              // use the randomly chosen A[i] as the pivot
    return PARTITION(A, p, r)           // call the original PARTITION above

RANDOMIZED-SELECT(A, p, r, i)           // returns the i-th smallest element of A[p..r]
    if p = r                            // the subarray has a single element
        then return A[p]
    q ← RANDOMIZED-PARTITION(A, p, r)   // A[q] is the randomly chosen pivot
    k ← q - p + 1                       // number of elements in A[p..q]: the low side plus the pivot
    if i = k                            // the pivot is the i-th smallest element
        then return A[q]
    elseif i < k                        // the answer lies in the low side A[p..q-1]
        then return RANDOMIZED-SELECT(A, p, q - 1, i)
    else return RANDOMIZED-SELECT(A, q + 1, r, i - k)   // the answer lies in the high side A[q+1..r]
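The pseudocode translates almost line for line into C++. The version below is a sketch rather than the book's code: it uses 0-based array indices with a 1-based rank i, the function names are illustrative, and `std::mt19937` with `std::uniform_int_distribution` stands in for RANDOM(p, r).

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// PARTITION: split A[p..r] around the pivot A[r]; returns the pivot's final index.
static int lomutoPartition(std::vector<int>& A, int p, int r) {
    int x = A[r];
    int i = p - 1;
    for (int j = p; j < r; ++j)
        if (A[j] <= x) std::swap(A[++i], A[j]);
    std::swap(A[i + 1], A[r]);
    return i + 1;
}

// RANDOMIZED-PARTITION: move a uniformly random element to A[r] first.
static int randomizedPartition(std::vector<int>& A, int p, int r, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(p, r);  // inclusive on both ends
    std::swap(A[r], A[pick(rng)]);
    return lomutoPartition(A, p, r);
}

// RANDOMIZED-SELECT: the i-th smallest (1-based) element of A[p..r].
int randomizedSelect(std::vector<int>& A, int p, int r, int i, std::mt19937& rng) {
    assert(p <= r && 1 <= i && i <= r - p + 1);
    if (p == r) return A[p];
    int q = randomizedPartition(A, p, r, rng);
    int k = q - p + 1;  // size of A[p..q]: low side plus the pivot
    if (i == k) return A[q];
    if (i < k) return randomizedSelect(A, p, q - 1, i, rng);
    return randomizedSelect(A, q + 1, r, i - k, rng);
}
```

The array is permuted in place, so a caller that wants the k smallest elements can read them from A[0..k-1] after selecting rank k over the whole array.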

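What follows is the argument from Introduction to Algorithms that RANDOMIZED-SELECT runs in expected O(n) time.

The worst-case running time of RANDOMIZED-SELECT is Θ(n^2), even when selecting the minimum, because we could be extremely unlucky and always partition around the largest remaining element, and partitioning takes O(n) time.

The algorithm performs very well in the average case, however, and because it is randomized, no particular input elicits the worst-case behavior.

The book's proof that the average running time of RANDOMIZED-SELECT is O(n) is quoted below in the hope that it is instructive. (I originally meant to quote the Chinese translation of the second edition, but comparing it with the English text showed the translation to be poor, so I retyped and corrected it by hand.) The proof has four steps: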

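1. When RANDOMIZED-SELECT runs on an input array A[p..r] of n elements, its running time is a random variable, denoted T(n). We obtain an upper bound on the expectation E[T(n)] as follows. RANDOMIZED-PARTITION is equally likely to return any element of the array as the pivot. Therefore, for each k with 1 ≤ k ≤ n, the subarray A[p..q] has k elements (all less than or equal to the pivot) with probability 1/n. For k = 1, 2, ..., n, define the indicator random variable

$$X_k = I\{\text{the subarray } A[p..q] \text{ has exactly } k \text{ elements}\}.$$

Assuming that the elements are distinct, we have

$$E[X_k] = 1/n.$$

When we call RANDOMIZED-SELECT and choose A[q] as the pivot, we do not know in advance whether we will find the desired ith smallest element immediately, or whether we will have to recurse on the subarray A[p..q-1] or on A[q+1..r]. Which subarray we recurse on depends on where the ith smallest element falls relative to A[q].

2. Assuming that T(n) is monotonically increasing, we can bound the time needed for the recursive call by the time needed for the recursive call on the largest possible input. In other words, to obtain an upper bound we assume that the ith smallest element is always on the larger side of the partition. For a given call of RANDOMIZED-SELECT, the indicator X_k has the value 1 for exactly one value of k and 0 for all other k. When X_k = 1, the two subarrays on which we might recurse have sizes k-1 and n-k. Hence we have the recurrence

$$T(n) \le \sum_{k=1}^{n} X_k \cdot \bigl(T(\max(k-1,\,n-k)) + O(n)\bigr) = \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\,n-k)) + O(n).$$

Taking expected values gives

$$
\begin{aligned}
E[T(n)] &\le E\Bigl[\sum_{k=1}^{n} X_k \cdot T(\max(k-1,\,n-k)) + O(n)\Bigr] \\
&= \sum_{k=1}^{n} E\bigl[X_k \cdot T(\max(k-1,\,n-k))\bigr] + O(n) \\
&= \sum_{k=1}^{n} E[X_k] \cdot E\bigl[T(\max(k-1,\,n-k))\bigr] + O(n) \qquad \text{(by equation (C.23))} \\
&= \sum_{k=1}^{n} \frac{1}{n} \cdot E\bigl[T(\max(k-1,\,n-k))\bigr] + O(n).
\end{aligned}
$$

To apply equation (C.23), we rely on X_k and T(max(k-1, n-k)) being independent random variables (this can be proved; the proof is omitted here).

3. Next, consider the expression max(k-1, n-k). We have

$$
\max(k-1,\,n-k) =
\begin{cases}
k-1 & \text{if } k > \lceil n/2 \rceil, \\
n-k & \text{if } k \le \lceil n/2 \rceil.
\end{cases}
$$

If n is even, each term from T(⌈n/2⌉) up to T(n-1) appears exactly twice in the summation; if n is odd, all these terms appear twice and T(⌊n/2⌋) appears once. Thus

$$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n).$$

4. We solve this recurrence by substitution. Assume that E[T(n)] ≤ cn for some constant c that satisfies the initial conditions of the recurrence. We assume that T(n) = O(1) for n less than some constant (how to pick this constant is explained shortly). We also pick a constant a such that the function described by the O(n) term above (which describes the non-recursive part of the running time) is bounded from above by an for all n > 0. Using this inductive hypothesis, we obtain

$$
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
&= \frac{2c}{n} \left( \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right) + an \\
&= \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \\
&\le \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(n/2-2)(n/2-1)}{2} \right) + an \\
&= \frac{c}{n} \left( \frac{3n^2}{4} + \frac{n}{2} - 2 \right) + an \\
&= c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an \\
&\le \frac{3cn}{4} + \frac{c}{2} + an \\
&= cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right).
\end{aligned}
$$

To complete the proof, we must show that for sufficiently large n this last expression is at most cn, that is, that cn/4 - c/2 - an ≥ 0. Adding c/2 to both sides and factoring out n gives n(c/4 - a) ≥ c/2. As long as we choose the constant c so that c/4 - a > 0, i.e., c > 4a, we can divide both sides by c/4 - a, which gives

$$n \ge \frac{c/2}{c/4 - a} = \frac{2c}{c - 4a}.$$

Thus, if we assume that T(n) = O(1) for n < 2c/(c - 4a), we have E[T(n)] = O(n). We conclude that any order statistic, and in particular the median, can be determined on average in linear time.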

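Conclusion: RANDOMIZED-SELECT runs in expected O(N) time, but its worst-case time complexity is O(N^2).

Solution 6

Section 9.3 of Chapter 9 of Introduction to Algorithms presents a selection algorithm that runs in linear time in the worst case: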

9.3 Selection in worst-case linear time

We now examine a selection algorithm whose running time is O(n) in the worst case. Like RANDOMIZED-SELECT, the algorithm SELECT finds the desired element by recursively partitioning the input array. The idea behind the algorithm, however, is to guarantee a good split when the array is partitioned. SELECT uses the deterministic partitioning algorithm PARTITION from quicksort (see Section 7.1), modified to take the element to partition around as an input parameter.

The SELECT algorithm determines the ith smallest of an input array of n > 1 elements by executing the following steps. (If n = 1, then SELECT merely returns its only input value as the ith smallest.)

1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group made up of the remaining n mod 5 elements.

2. Find the median of each of the ⌈n/5⌉ groups by first insertion sorting the elements of each group (of which there are at most 5) and then picking the median from the sorted list of group elements.

3. Use SELECT recursively to find the median x of the ⌈n/5⌉ medians found in step 2. (If there are an even number of medians, then by our convention, x is the lower median.)

4. Partition the input array around the median-of-medians x using the modified version of PARTITION. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n-k elements on the high side of the partition.

5. If i = k, then return x. Otherwise, use SELECT recursively to find the ith smallest element on the low side if i < k, or the (i-k)th smallest element on the high side if i > k.

(These five steps are the "median of medians of groups of five" method mentioned at the end of Solution 4 above.)
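The five steps can be sketched in C++ as below. This is an illustration under stated simplifications, not the book's in-place algorithm: it copies elements into separate low and high vectors instead of calling a modified PARTITION, it sorts each group with `std::sort` rather than insertion sort, and it counts elements equal to x separately so that duplicate values are handled. The name `selectMoM` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the i-th smallest (1-based) element of `values`.
int selectMoM(std::vector<int> values, std::size_t i) {
    assert(1 <= i && i <= values.size());
    if (values.size() <= 5) {  // small input: sort directly
        std::sort(values.begin(), values.end());
        return values[i - 1];
    }
    // Steps 1-2: split into groups of (at most) 5 and take each group's lower median.
    std::vector<int> medians;
    for (std::size_t g = 0; g < values.size(); g += 5) {
        std::size_t end = std::min(g + 5, values.size());
        std::sort(values.begin() + g, values.begin() + end);
        medians.push_back(values[g + (end - g - 1) / 2]);
    }
    // Step 3: recursively find the lower median x of the medians.
    int x = selectMoM(medians, (medians.size() + 1) / 2);
    // Step 4: partition around x.
    std::vector<int> low, high;
    std::size_t equal = 0;
    for (int v : values) {
        if (v < x) low.push_back(v);
        else if (v > x) high.push_back(v);
        else ++equal;
    }
    // Step 5: return x or recurse into the side that contains rank i.
    if (i <= low.size()) return selectMoM(low, i);
    if (i <= low.size() + equal) return x;
    return selectMoM(high, i - low.size() - equal);
}
```

Since x itself is never copied into `low` or `high`, every recursive call works on a strictly smaller input, which guarantees termination.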

To analyze the running time of SELECT, we first determine a lower bound on the number of elements that are greater than the partitioning element x. Figure 9.1 is helpful in visualizing this bookkeeping. At least half of the medians found in step 2 are greater than the median-of-medians x. Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater than x, except for the one group that has fewer than 5 elements if 5 does not divide n exactly, and the one group containing x itself. Discounting these two groups, it follows that the number of elements greater than x is at least

$$3\left(\left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6.$$

(Figure 9.1, analysis of the algorithm SELECT: the n elements are drawn as small circles, and each group occupies one column. The medians of the groups are shown in white, and the median-of-medians x is labeled. (When finding the median of an even number of elements, the lower median is used.) Arrows point from larger elements to smaller ones, which shows that 3 elements of every full group of 5 to the right of x are greater than x, and 3 elements of every group of 5 to the left of x are less than x. The elements greater than x are drawn on a shaded background.)

Similarly, the number of elements that are less than x is at least 3n/10 - 6. Thus, in the worst case, SELECT is called recursively on at most 7n/10 + 6 elements in step 5.

We can now develop a recurrence for the worst-case running time T(n) of the algorithm SELECT. Steps 1, 2, and 4 take O(n) time. (Step 2 consists of O(n) calls of insertion sort on sets of size O(1).) Step 3 takes time T(⌈n/5⌉), and step 5 takes time at most T(7n/10 + 6), assuming that T is monotonically increasing. We make the assumption, which seems unmotivated at first, that any input of 140 or fewer elements requires O(1) time; the origin of the magic constant 140 will be clear shortly. We can therefore obtain the recurrence

$$
T(n) \le
\begin{cases}
O(1) & \text{if } n \le 140, \\
T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) & \text{if } n > 140.
\end{cases}
$$

We show that the running time is linear by substitution. More specifically, we will show that T(n) ≤ cn for some suitably large constant c and all n > 0. We begin by assuming that T(n) ≤ cn for some suitably large constant c and all n ≤ 140; this assumption holds if c is large enough. We also pick a constant a such that the function described by the O(n) term above (which describes the non-recursive component of the running time of the algorithm) is bounded above by an for all n > 0. Substituting this inductive hypothesis into the right-hand side of the recurrence yields

$$
\begin{aligned}
T(n) &\le c\lceil n/5 \rceil + c(7n/10 + 6) + an \\
&\le cn/5 + c + 7cn/10 + 6c + an \\
&= 9cn/10 + 7c + an \\
&= cn + (-cn/10 + 7c + an),
\end{aligned}
$$

which is at most cn if

$$-cn/10 + 7c + an \le 0. \qquad (9.2)$$

Inequality (9.2) is equivalent to the inequality c ≥ 10a(n/(n - 70)) when n > 70. Because we assume that n ≥ 140, we have n/(n - 70) ≤ 2, and so choosing c ≥ 20a will satisfy inequality (9.2). (Note that there is nothing special about the constant 140; we could replace it by any integer strictly greater than 70 and then choose c accordingly.) The worst-case running time of SELECT is therefore linear.

As in a comparison sort (see Section 8.1), SELECT and RANDOMIZED-SELECT determine information about the relative order of elements only by comparing elements. Recall from Chapter 8 that sorting requires Ω(n lg n) time in the comparison model, even on average (see Problem 8-1). The linear-time sorting algorithms in Chapter 8 make assumptions about the input. In contrast, the linear-time selection algorithms in this chapter do not require any assumptions about the input. They are not subject to the Ω(n lg n) lower bound because they manage to solve the selection problem without sorting.

(Note that last sentence: it states the essence of these algorithms.)

Thus, the running time is linear because these algorithms do not sort; the linear-time behavior is not a result of assumptions about the input, as was the case for the sorting algorithms in Chapter 8. Sorting requires Ω(n lg n) time in the comparison model, even on average (see Problem 8-1), and thus the method of sorting and indexing presented in the introduction to this chapter is asymptotically inefficient.

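Related problems

1. Google interview question: the input is two integer arrays, and the sums of every pair of numbers, one taken from each array, form another array. How do you find the k smallest of these sums?

Analysis:

"Suppose the two integer arrays are A and B, each with N elements; the array C of all pairwise sums has N^2 elements.
   With B sorted in ascending order, these sums can be viewed as N sorted sequences:
          A[1]+B[1] <= A[1]+B[2] <= A[1]+B[3] <=…
          A[2]+B[1] <= A[2]+B[2] <= A[2]+B[3] <=…
          …
         A[N]+B[1] <= A[N]+B[2] <= A[N]+B[3] <=…
    The problem becomes: find the k smallest elements among these N sorted sequences (N^2 elements in total)."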
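One common way to exploit the N sorted rows is a k-way merge driven by a min-heap: the heap holds the current head of each row, and every pop is followed by pushing the next sum from the same row. The sketch below is an assumption about how the analysis could be carried out, not part of the quoted answer; `kSmallestSums` is an illustrative name. After sorting both arrays, only the first min(k, N) rows need to be seeded.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

std::vector<int> kSmallestSums(std::vector<int> A, std::vector<int> B, std::size_t k) {
    assert(k <= A.size() * B.size());
    std::sort(A.begin(), A.end());
    std::sort(B.begin(), B.end());
    // Heap entry: (sum, row index into A, column index into B); smallest sum on top.
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < A.size() && i < k; ++i)
        heap.emplace(A[i] + B[0], i, std::size_t{0});  // head of row i
    std::vector<int> result;
    while (result.size() < k) {
        Entry top = heap.top();
        heap.pop();
        result.push_back(std::get<0>(top));
        std::size_t i = std::get<1>(top);
        std::size_t j = std::get<2>(top);
        if (j + 1 < B.size())
            heap.emplace(A[i] + B[j + 1], i, j + 1);   // next sum in the same row
    }
    return result;  // the k smallest sums in ascending order
}
```

The heap never holds more than min(k, N) entries, so after the two sorts the merge costs O(k log min(k, N)).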

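2. Given two sequences A = (a1, a2, ..., ak) and B = (b1, b2, ..., bk), both sorted in ascending order, find the k smallest values of (ai + bj) for 1 <= i, j <= k. The algorithm should be as efficient as possible.

3. Given a sequence a1, a2, a3, ..., an and m queries, each expressed as a triple (i, j, k), output for every query the kth number in the ascending ordering of ai, ai+1, ..., aj.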