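Greedy algorithms are usually treated as a prerequisite for dynamic programming.
Optimization problems on graphs also rely heavily on the greedy method.
Greedy is also one of the so-called five common algorithm design methods:
1. Divide and conquer
2. Dynamic programming
3. Greedy algorithms
4. Backtracking
5. Branch and bound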
Given two sequences of letters A and B, find if B is a subsequence of A in the
sense that one can delete some letters from A and obtain the sequence B.
The idea: greedy stays ahead (always stay ahead).
Ref: https://www.geeksforgeeks.org/given-two-strings-find-first-string-subsequence-second/
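A pointer on A searches for the first char of B; as soon as the two match, both pointers advance together.
On a mismatch, only the pointer on A moves. Characters of B that have already been matched stay matched, so no earlier comparison is wasted.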
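The two-pointer scan described above can be sketched as follows. The function name `is_subsequence` is chosen here for illustration and is not from the original notes.

```python
def is_subsequence(b: str, a: str) -> bool:
    """Return True if b can be obtained from a by deleting letters."""
    j = 0  # pointer into b: number of letters of b matched so far
    for ch in a:  # the pointer into a always advances
        if j < len(b) and ch == b[j]:
            j += 1  # match: advance the pointer into b as well
    return j == len(b)


print(is_subsequence("ace", "abcde"))
print(is_subsequence("aec", "abcde"))
```

Matching each letter of B at its earliest possible position in A never hurts later matches, which is the "stay ahead" argument in miniature.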
There is a line of 111 stalls, some of which need to be covered with boards.
You can use up to 11 boards, each of which may cover any number of
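consecutive stalls. Cover all the necessary stalls, while covering as few total stalls as possible.
Start with one big board spanning the first to the last stall that needs covering, then keep cutting out the largest gaps until the board has been split into the 11 allowed pieces.
Essence: sort the gaps and eliminate the largest gaps first (this is where greedy shows up).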
Ref: https://projectalgorithm.wordpress.com/2011/04/25/greedier-than-you/
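A minimal sketch of the gap-removal idea, assuming the stalls that need covering are given as a list of integer positions. The name `min_covered_stalls` and the sample data are illustrative only.

```python
def min_covered_stalls(stalls, max_boards):
    """Fewest stalls covered when at most max_boards boards are used."""
    if not stalls:
        return 0
    s = sorted(stalls)
    # One big board from the first to the last stall that needs covering.
    total = s[-1] - s[0] + 1
    # Empty stalls between consecutive stalls that need covering.
    gaps = sorted((b - a - 1 for a, b in zip(s, s[1:])), reverse=True)
    # Each extra board lets us cut out one gap; cut the largest ones.
    return total - sum(gaps[:max_boards - 1])


stalls = [3, 4, 6, 8, 14, 15, 16, 17, 21, 25, 26, 27, 30, 31, 40, 41, 42, 43]
print(min_covered_stalls(stalls, 4))  # 25
```

Sorting the gaps dominates the running time, so the sketch runs in O(n log n) for n stalls.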
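Given an interval of length m and the start and end points of n segments (closed intervals here),
find the minimum number of segments needed to cover the whole interval completely.
Example: interval length 8, available segments [2,6],[1,4],[3,6],[3,7],[6,8],[2,4],[3,5]
Interval covering problem (complete interval cover)
First sort by start point: [1,4],[2,4],[2,6],[3,5],[3,6],[3,7],[6,8]
Prove:
To cover with the fewest segments, every chosen segment should reach as far as possible; the part that is already covered no longer matters.
So the greedy strategy is: among the segments whose head lies at or before the previous tail, pick the one whose tail reaches furthest to the right.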
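The strategy above can be sketched as follows. This is a sketch under two assumptions not stated in the notes: the target interval is [1, 8], and coverage is continuous, so the next segment must start at or before the point already covered. `min_cover` is an illustrative name.

```python
def min_cover(segments, lo, hi):
    """Fewest closed segments covering [lo, hi], or None if impossible."""
    segs = sorted(segments)  # sort by start point
    chosen, covered, i = [], lo, 0
    while covered < hi:
        best = None
        # Among segments starting at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(segs) and segs[i][0] <= covered:
            if best is None or segs[i][1] > best[1]:
                best = segs[i]
            i += 1
        if best is None or best[1] <= covered:
            return None  # a hole that no segment can bridge
        chosen.append(best)
        covered = best[1]
    return chosen


segments = [(2, 6), (1, 4), (3, 6), (3, 7), (6, 8), (2, 4), (3, 5)]
print(min_cover(segments, 1, 8))  # [(1, 4), (3, 7), (6, 8)]
```

Each segment is inspected once after sorting, so the scan itself is linear.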
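Given an interval of length m and the start and end points of n segments (open and closed intervals are handled differently; open intervals are used here),
select as many segments as possible such that every chosen segment is independent, i.e. it does not intersect any other chosen segment.
For example:
interval length 8, available segments [2,6],[1,4],[3,6],[3,7],[6,8],[2,4],[3,5]; the chosen ones must not intersect!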
Ref: https://blog.csdn.net/chenguolinblog/article/details/7882316
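Sort the segments by right endpoint in ascending order. Among the following segments that share the same right endpoint (there may be only one), take the one with the largest left endpoint; if adding it creates no overlap with the segments already chosen, add it, otherwise go on to check the segments after it.
Sorting the segments by right endpoint in increasing order gives [1,4],[2,4],[3,5],[2,6],[3,6],[3,7],[6,8]
The first step picks [2,4]; after that only [6,8] can be added, so the number of segments is 2.
Because we want as many independent segments as possible, every chosen segment should be as short as possible!
For the same right endpoint, the larger the left endpoint, the shorter the segment.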
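A short sketch of this selection rule, treating the segments as open intervals so that two segments touching at an endpoint do not intersect. `max_disjoint` is an illustrative name.

```python
def max_disjoint(segments):
    """Largest set of pairwise non-intersecting open segments."""
    # Ascending right endpoint; for equal right endpoints the larger
    # left endpoint (the shorter segment) comes first.
    ordered = sorted(segments, key=lambda s: (s[1], -s[0]))
    chosen = []
    last_end = float("-inf")
    for start, end in ordered:
        if start >= last_end:  # no overlap with what is already chosen
            chosen.append((start, end))
            last_end = end
    return chosen


segments = [(2, 6), (1, 4), (3, 6), (3, 7), (6, 8), (2, 4), (3, 5)]
print(max_disjoint(segments))  # [(2, 4), (6, 8)]
```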
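Pierce all the wooden strips with the fewest nails.
The question of what to be greedy about:
leave as much room as possible for whatever comes later.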
Find a maximum size subset of compatible activities.
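Find the maximum number of activities that fit into a period of time (maximum set of disjoint intervals).
Pick the activity with the earliest finish time first: this keeps the most free time, which is what makes more activities possible. This is the greedy choice.
Transform any optimal solution into the greedy solution with an equal number of
activities,
which proves that the greedy solution is also optimal.
Proof:
greedy exchange, i.e. show that the result obtained by greedy is never worse.
Extended:
If the goal changes from "maximum number of activities" to "maximum total activity time" and the activities have different durations, greedy fails and dynamic programming is needed.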
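For the extended variant, here is a sketch of the standard dynamic programme for weighted interval scheduling, using each activity's duration as its weight. The name `max_total_time` and the sample activities are assumptions made for illustration; touching activities are treated as compatible.

```python
from bisect import bisect_right


def max_total_time(activities):
    """Maximum total duration of pairwise compatible (start, end) activities."""
    acts = sorted(activities, key=lambda a: a[1])  # by finish time
    ends = [end for _, end in acts]
    dp = [0] * (len(acts) + 1)  # dp[i]: best total using the first i activities
    for i, (start, end) in enumerate(acts, 1):
        # p = how many of the first i-1 activities finish by `start`
        p = bisect_right(ends, start, 0, i - 1)
        dp[i] = max(dp[i - 1], dp[p] + (end - start))
    return dp[-1]


# Earliest-finish greedy would pick (0, 3) and (9, 12): total time 6.
print(max_total_time([(0, 3), (2, 10), (9, 12)]))  # 8
```

The example shows why greedy fails here: the single long activity beats the two short ones that the earliest-finish rule would choose.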
There are N robbers who have stolen N items. You would like to distribute the items
amongst the robbers (one item per robber). You know the precise value of each item.
Each robber has a particular range of values they want their item to be worth (too cheap and they will not have made money, too expensive and they will draw a lot of attention).
Devise an algorithm that can distribute the items so each robber is happy or determines that there is no such distribution.
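From: 从零开始学贪心算法 (Learning Greedy Algorithms from Scratch)
A variant of the "maximum number of activities" problem.
If we always choose the activity with the earliest start time, we do not get an optimal solution. For example, one long activity that starts first can block several short ones.
If we always choose the activity with the shortest duration, we do not get an optimal solution either. For example, one short activity can overlap two longer ones that are compatible with each other.
(Where greedy shows up)
It can be proved by induction that the greedy strategy should be to pick the activity with the earliest finish time every time.
This is also easy to understand intuitively: choosing compatible activities this way leaves as much time as possible for the activities not yet scheduled. It is also the reason the activities are sorted in increasing order of finish time.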
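Sol:
Sort the item values v in ascending order. Then, for each item value v in turn:
The robbers j whose minimum bound is in range, i.e. to the left of the current v, and whose maximum bound is at least v (taken for granted here; if it fails, no valid distribution exists), form a candidate set.
Give the item to the robber in the set whose maximum bound is the smallest.
For example, when assigning v3 with candidates j3 and j4: j4 has the larger tail, so v3 goes to j3 and j4 is kept for a later item.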
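The assignment rule above can be sketched with a min-heap keyed on each robber's maximum bound. The name `distribute` and the (low, high) tuple layout are assumptions made for this example.

```python
import heapq


def distribute(items, ranges):
    """ranges[j] = (low, high) acceptable to robber j.

    Returns {robber index: item value}, or None if nobody can be made happy.
    """
    if len(items) != len(ranges):
        return None
    by_low = sorted(range(len(ranges)), key=lambda j: ranges[j][0])
    heap, nxt, assignment = [], 0, {}
    for v in sorted(items):
        # Robbers whose minimum bound is at or below v become candidates.
        while nxt < len(by_low) and ranges[by_low[nxt]][0] <= v:
            j = by_low[nxt]
            heapq.heappush(heap, (ranges[j][1], j))
            nxt += 1
        if not heap:
            return None  # nobody accepts an item this cheap
        high, j = heapq.heappop(heap)  # smallest maximum bound first
        if high < v:
            return None  # this robber can no longer be satisfied
        assignment[j] = v
    return assignment


print(distribute([6, 2, 4], [(1, 5), (2, 3), (4, 9)]))  # {1: 2, 0: 4, 2: 6}
```

Because items are processed in ascending order, a robber whose maximum bound is already below the current value cannot accept any later item, which is why the sketch stops there.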
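The underlying reasoning:
Prove:
"Cut-and-paste" arguments.
Fixing one inversion never makes things worse. Removing an inversion, e.g. putting j2 back in front of j1, is safe because j2 has the later deadline (the larger upper bound), so it can only be easier to satisfy.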
Schedule all the jobs so that the lateness of the job with the largest lateness is minimised.
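Minimising the maximum lateness
Only the deadlines matter (earliest deadline first); this is the greedy choice.
Proof (exchange argument):
The key step is to prove that removing one inversion from a schedule does not make the maximum lateness worse.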
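A minimal earliest-deadline-first sketch. Jobs are assumed to be (duration, deadline) pairs that all become available at time 0; the sample data is made up for illustration.

```python
def earliest_deadline_first(jobs):
    """Return (order, max lateness) for jobs given as (duration, deadline)."""
    order = sorted(jobs, key=lambda job: job[1])  # only deadlines matter
    now, worst = 0, 0
    for duration, deadline in order:
        now += duration  # jobs run back to back with no idle time
        worst = max(worst, now - deadline)
    return order, worst


jobs = [(3, 6), (2, 8), (1, 9), (4, 9), (3, 14), (2, 15)]
print(earliest_deadline_first(jobs)[1])  # 1
```

In this sketch a job that finishes early counts as lateness 0, so the returned value is never negative.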
Along the long, straight road from Loololong to Goolagong houses are
scattered quite sparsely, sometimes with long gaps between two
consecutive houses. Telstra must provide mobile phone service to people
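who live alongside the road, and the range of Telstra's cell base station is
5km. Design an algorithm for placing the minimal number of base stations alongside the road, that is sufficient to cover all houses.
One thought: going left to right or right to left, both are greedy and the minimum number is the same, but the positions of the stations differ.
This is somewhat like the earlier exercise of covering stalls with boards, and also like the nail-piercing problem. Going left to right, focus on covering the leftmost uncovered house; this is the greedy choice.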
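The left-to-right rule can be sketched as follows, with houses given as positions in km along the road. `place_stations` and the sample positions are illustrative.

```python
def place_stations(houses, reach=5):
    """Station positions covering every house, scanning left to right."""
    stations = []
    for h in sorted(houses):
        # Leftmost house not yet covered: put a station as far right
        # as possible while still reaching it.
        if not stations or h > stations[-1] + reach:
            stations.append(h + reach)
    return stations


houses = [0, 3, 10, 11, 25]
left_to_right = place_stations(houses)
# The right-to-left scan is the same procedure on the mirrored road.
right_to_left = [-s for s in place_stations([-h for h in houses])]
print(left_to_right)  # [5, 16, 30]
print(right_to_left)  # [20, 6, -5]
```

Both scans use three stations on this input, at different positions, which matches the observation above.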
Assume you are given n sorted arrays of different sizes. You are allowed
to merge any two arrays into a single new sorted array and proceed in
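this manner until only one array is left. Design an algorithm that achieves this task and uses minimal total number of moves of elements of
the arrays. Give an informal justification why your algorithm is optimal.
This resembles the merge step of merge sort, and follows the same principle as Huffman coding.
Merge the smaller blocks first. (Merging two sorted arrays interleaves them, so every element is moved.)
A large merged block that takes part in further merges is moved again each time, so the earlier a block becomes large, the more of a burden it is in the later moves.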
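A sketch of the smallest-first rule with a min-heap, assuming that merging arrays of sizes a and b costs a + b element moves. `min_merge_moves` is an illustrative name.

```python
import heapq


def min_merge_moves(sizes):
    """Total element moves when always merging the two smallest arrays."""
    heap = list(sizes)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # every element of both arrays is moved once
        heapq.heappush(heap, a + b)
    return total


print(min_merge_moves([10, 20, 30]))  # 90
```

Merging 20 and 30 first would cost 50 + 60 = 110 moves instead of 30 + 60 = 90, which illustrates the burden of creating a large block early.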
A list of n files fi of lengths li which have to be stored on a tape.
Each file is equally likely to be needed. To retrieve a file, one must start from
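the beginning of the tape and scan it until the file is found and read. Order the files on the tape so that the average (expected) retrieval time
is minimised.
If the access probabilities p are no longer uniform, compare the ratios p/l instead and place files in decreasing order of p/l.
Putting the shortest files first minimises the average retrieval time; this is the greedy choice.
In the non-uniform case, suppose two adjacent files are swapped and compare the expected retrieval time E before and after the swap.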
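The swap comparison can be written out as follows. This is a reconstruction under stated assumptions, not the formula from the original figure: files i and j are adjacent on the tape with i first, S is the total length of the files stored before them, and p_i, l_i are the access probability and length of file i.

```latex
\begin{align*}
E_{\text{before}} &= \dots + p_i\,(S + l_i) + p_j\,(S + l_i + l_j) + \dots \\
E_{\text{after}}  &= \dots + p_j\,(S + l_j) + p_i\,(S + l_j + l_i) + \dots \\
E_{\text{after}} - E_{\text{before}} &= p_i\, l_j - p_j\, l_i
\end{align*}
```

All other terms are unchanged by the swap, so keeping i in front is no worse exactly when p_i l_j >= p_j l_i, that is, when p_i / l_i >= p_j / l_j. With uniform probabilities this reduces to l_i <= l_j, the shortest-first rule.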
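There is a single machine that executes, one at a time, the tasks someone has handed over; the tasks differ in importance.
Minimise the total of "importance × completion time".
Greedy compares "importance density": Schedule jobs in decreasing order of the ratio ri = wi / ti (importance / job duration)
Prove:
Assume there is a better schedule, then remove its inversions one by one and watch how the total changes.
Similar to the tape storage problem.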
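A small sketch of the ratio rule, with jobs given as (importance w, duration t) pairs. The function name and the sample jobs are assumptions for illustration.

```python
def schedule_by_density(jobs):
    """Order (w, t) jobs by decreasing w / t; return (order, total cost)."""
    order = sorted(jobs, key=lambda job: job[0] / job[1], reverse=True)
    now, cost = 0, 0
    for w, t in order:
        now += t         # completion time of this job
        cost += w * now  # importance times completion time
    return order, cost


print(schedule_by_density([(3, 1), (1, 2), (4, 4)]))
# ([(3, 1), (4, 4), (1, 2)], 30)
```

For these three jobs the other five orderings cost 34, 38, 39, 43 and 47, so 30 is indeed the minimum.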
You are given a connected graph with weighted edges. Find a spanning tree
such that the largest weight of all of its edges is as small as possible.
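Find the minimum spanning tree of the weighted connected graph.
Why is it optimal? Suppose the heaviest edge e of the MST were heavier than the heaviest edge of some other spanning tree T'. Removing e splits the MST into two parts, and T' must contain an edge joining them that is lighter than e; swapping it in would give a lighter spanning tree, a contradiction.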
Ref: http://www.javashuo.com/article/p-pdhqlavg-mt.html
Goto: [Algorithm] Graph
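Design an algorithm which produces a minimum spanning tree T' for the
new graph containing the additional vertex v_{n+1} and which runs in time O(n log n).
Sort the edges between the new vertex and the other n vertices and pick the smallest; this is the greedy part. More completely, run Kruskal on the n - 1 edges of the old tree together with the new vertex's edges: by the cycle property no other old edge can enter the new MST, so at most 2n - 1 edges need sorting, which takes O(n log n).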
There are n radio towers for broadcasting tsunami warnings. You are given the coordinates of each tower and its radius of range.
When a tower is activated, all towers within the radius of range of the tower will also activate, and those can cause other towers to activate and so on.
You need to equip some of these towers with seismic sensors so that when these sensors activate the towers where these sensors are located all
towers will eventually get activated and send a tsunami warning.
The goal is to design an algorithm which finds the fewest number of towers you must equip with seismic sensors.
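There is always a tower that activates the most other towers through the chain reaction. Put a sensor on it, then repeat with the towers that are still inactive.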
Partition the vertices of G into k disjoint subsets so that the minimal distance between two points belonging to different sets of the partition is as large as possible.
Thus, we want a partition into k disjoint sets which are as far apart as possible.
Sort the edges in increasing order and start performing the usual
Kruskal’s algorithm for building a minimal spanning tree,
but stop when you obtain k trees, rather than a single spanning tree.
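While the minimum spanning tree is being built, the red edge (the edge separating two clusters) is certainly no lighter than any blue edge (an edge already added inside a cluster); this is the greedy part.
Time complexity:
With n points there are N = O(n^2) edges, so sorting them takes O(N log N) = O(n^2 log n).
With the union-find data structure,
we make at most 2n^2 calls of the Find operation and at most n calls of the Union operation.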
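A sketch of the truncated Kruskal procedure with a small union-find. It assumes vertices are numbered 0..n-1, edges are (weight, u, v) tuples, and k <= n. The name `k_clustering` and the sample points are illustrative.

```python
def k_clustering(n, edges, k):
    """Run Kruskal until k trees remain.

    Returns (spacing, labels): spacing is the lightest edge between two
    different clusters, labels[x] identifies the cluster of vertex x.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # both ends already in the same cluster
        if components == k:
            return w, [find(x) for x in range(n)]
        parent[ru] = rv  # Union
        components -= 1
    return None, [find(x) for x in range(n)]


# Points on a line; the complete graph uses |xi - xj| as edge weight.
xs = [0, 1, 2, 10, 11, 20]
edges = [(abs(xs[i] - xs[j]), i, j)
         for i in range(len(xs)) for j in range(i + 1, len(xs))]
spacing, labels = k_clustering(len(xs), edges, 3)
print(spacing)  # 8
```

Here the three clusters are {0, 1, 2}, {10, 11} and {20}, and the spacing 8 is the distance between the points 2 and 10.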
Assume that a weighted (undirected) graph G = (V, E) has all weights of edges
distinct and that its set of vertices V has been partitioned into two disjoint subsets,
X and V \X and assume that an edge e = (u, v) is the smallest weight edge whose
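one end belongs to X and the other end to V\X (Y). Prove that every minimum spanning tree
must contain edge e.
Proof (by contradiction):
If the tree goes through e, then e splits the graph into the two sides X and Y.
Suppose you claim the tree does not go through e. Then it must cross from X to Y through some other edge. Add e to your tree without deleting anything yet: doesn't that form a cycle?
That cycle crosses between X and Y at least twice, so it contains your other crossing edge. Among the crossing edges, which is the smallest? e, of course!
With the smaller e available, why keep your edge? Swapping it for e gives a lighter spanning tree, so your assumption is contradictory. QED.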
Extended:
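Let G = (V, E) be a weighted (undirected) graph with all weights
of edges distinct, let C be a cycle in G, and let e be the highest weight edge in C.
Prove that e cannot belong to the minimum spanning tree.
Prove that the maximum edge of a cycle cannot belong to the MST (minimum spanning tree).
Proof by contradiction:
If e were in the MST, removing it would split the tree into two parts. The rest of the cycle still connects those two parts, so some other edge of the cycle joins them; suppose another solution uses that edge instead of the maximum edge e.
That edge is lighter than e, so the other solution is a lighter spanning tree.
Hence we are better off without the maximum edge e.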
Extended:
Assume that you are given a weighted (undirected) graph G = (V, E) with all weights of edges distinct and its minimum spanning tree T.
Assume now that you add a new edge e to G. Design a linear time algorithm which produces the minimum spanning tree for the new graph with the additional edge.
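A new edge e is added to a graph whose MST is already known; how do we update the MST (minimum spanning tree)?
Sol:
(1) Adding the new edge to T creates exactly one new cycle;
(2) then delete the maximum edge on that cycle.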
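The two steps can be sketched as follows. The sketch assumes the tree is a list of (u, v, weight) edges over vertices 0..n-1. `update_mst` is an illustrative name, and the cycle is found by walking the tree path between the endpoints of the new edge.

```python
def update_mst(n, tree_edges, new_edge):
    """Return the MST edge list after adding new_edge = (u, v, w)."""
    u, v, w = new_edge
    adj = {x: [] for x in range(n)}
    for a, b, wt in tree_edges:
        adj[a].append((b, wt))
        adj[b].append((a, wt))
    # Depth-first search from u, remembering how each vertex was reached.
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y, wt in adj[x]:
            if y not in parent:
                parent[y] = (x, wt)
                stack.append(y)
    # Walk back from v to u: this path plus new_edge is the new cycle.
    heaviest, x = None, v
    while parent[x] is not None:
        p, wt = parent[x]
        if heaviest is None or wt > heaviest[2]:
            heaviest = (p, x, wt)
        x = p
    if heaviest is None or w >= heaviest[2]:
        return list(tree_edges)  # the new edge is the maximum on the cycle
    a, b, wt = heaviest
    kept = [e for e in tree_edges
            if not ({e[0], e[1]} == {a, b} and e[2] == wt)]
    return kept + [new_edge]


tree = [(0, 1, 1), (1, 2, 5), (2, 3, 2)]
print(sorted(update_mst(4, tree, (0, 2, 3))))
# [(0, 1, 1), (0, 2, 3), (2, 3, 2)]
```

The search and the walk back each touch every tree edge at most a constant number of times, so the update is linear in n.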
Scheduling unit jobs with penalties and deadlines.
The problem of scheduling unit-time tasks with deadlines and penalties for a single processor has the following inputs:
a set S = {1, 2, . . . , n} of n unit-time tasks;
a set of n integer deadlines d1, d2, . . . , dn, such that each di satisfies 1 ≤ di ≤ n and task i is supposed to finish by time di; and
a set of n nonnegative weights or penalties w1,w2, . . . , wn, such that a penalty wi is incurred if task i is not finished by time di and no penalty is incurred if a task finishes by its deadline.
We are asked to find a schedule for S that minimizes the total penalty incurred for missed deadlines.
Ref: 一个任务调度问题 (A Task-Scheduling Problem), Introduction to Algorithms
Theory:
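Finding the optimal schedule mainly relies on the matroid idea behind greedy algorithms.
If S is a set of unit-time tasks with deadlines and I is the set of all independent sets of tasks, then the system M = (S, I) is a matroid, i.e. it satisfies the hereditary and exchange properties.
The principle of solving the task scheduling problem with a matroid is:
turn the problem of minimising the total penalty of the late tasks into the problem of maximising the total penalty of the early tasks,
which means that when scheduling we always prefer the task with the largest penalty among the remaining tasks (this is the greedy choice).
Here, let A be a set of tasks. If there exists a schedule for the tasks in A in which no task is late, the set A is called independent.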
Prove:
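(1) First prove that it is a matroid.
(2) Then the greedy algorithm that maximises the total penalty of the early tasks can be applied.
Extended:
Each of the O(n) independence checks takes O(n) time. How can this be optimised?
Union-find (disjoint sets).
Experiment
n = 7, the task deadlines are 4, 2, 4, 3, 1, 4, 6 and the corresponding penalties are 70, 60, 50, 40, 30, 20, 10.
Tasks a5 and a6 are given up (they end up late).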
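The experiment can be reproduced with the sketch below. It takes tasks in order of decreasing penalty and places each one in the latest free time slot at or before its deadline, using a small union-find to locate that slot quickly. This slot-based formulation is one common way to implement the independence check, not necessarily the one used in the original post; tasks are 0-indexed and all names are illustrative.

```python
def schedule_unit_tasks(deadlines, penalties):
    """Return (slot_of, late, total_penalty) for unit-time tasks."""
    n = len(deadlines)
    parent = list(range(n + 1))  # parent[s] leads to the latest free slot <= s

    def latest_free(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s  # 0 means no free slot is left

    slot_of, late = {}, []
    for i in sorted(range(n), key=lambda i: -penalties[i]):
        s = latest_free(min(deadlines[i], n))
        if s == 0:
            late.append(i)  # cannot be on time: accept its penalty
        else:
            slot_of[i] = s
            parent[s] = s - 1  # slot s is now taken
    return slot_of, late, sum(penalties[i] for i in late)


deadlines = [4, 2, 4, 3, 1, 4, 6]
penalties = [70, 60, 50, 40, 30, 20, 10]
print(schedule_unit_tasks(deadlines, penalties))
# ({0: 4, 1: 2, 2: 3, 3: 1, 6: 6}, [4, 5], 50)
```

Indices 4 and 5 are the tasks a5 and a6, so the total penalty is 30 + 20 = 50.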
Assume you have $2, $1, 50c, 20c, 10c and 5c coins to pay for your lunch.
Design an algorithm that, given the amount that is a multiple of 5c, pays it with a minimal number of coins.
Obviously we start from the largest denomination. This is like paying with RMB: making change fits the greedy algorithm.
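A minimal sketch of the largest-first rule for exactly these denominations, with all amounts in cents. `pay` is an illustrative name.

```python
COINS = (200, 100, 50, 20, 10, 5)  # $2, $1, 50c, 20c, 10c, 5c


def pay(amount_cents):
    """Coins used to pay amount_cents, largest denomination first."""
    if amount_cents < 0 or amount_cents % 5 != 0:
        raise ValueError("amount must be a non-negative multiple of 5c")
    used = []
    for coin in COINS:
        count, amount_cents = divmod(amount_cents, coin)
        used.extend([coin] * count)
    return used


print(pay(95))  # [50, 20, 20, 5]
```

Largest-first is not optimal for every coin system, so the sketch should not be reused with arbitrary denominations without a proof like the one discussed here.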
Prove:
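The problem satisfies the "optimal substructure property" and the "greedy choice property".
Optimal substructure: an optimal solution to a problem contains optimal solutions to its subproblems.
For example, 95c = 50c + 20c + 20c + 5c. This greedy result is an optimal solution and it satisfies optimal substructure.
Remove one 20c, and what is left should be an optimal solution for 75c.
Could 75c have a better solution, say "two coins are enough"?
Then, under that assumption, adding one 20c back would give "an optimal solution for 95c with only 3 coins", which contradicts the original fact!
Hence the greedy result satisfies the optimal substructure property.
Greedy choice: a globally optimal solution can be reached through a series of locally optimal choices. (Each coin is already as large as it can be.)
The greedy result guarantees that the coin picked at each step cannot be any larger, i.e. it is already the maximum. That is what greedy means.
With the total fixed, using fewer coins would mean replacing some smaller coins with coins of a larger denomination to make up the difference.
But the greedy solution already uses the largest possible denomination at every step (the essence of greedy), which is a contradiction.
Hence the original problem satisfies the greedy choice property.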
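Greedy by value per unit weight (including why it cannot solve the 0-1 knapsack problem), goto: 0-1背包问题、贪心算法、动态规划 (0-1 Knapsack, Greedy Algorithms, Dynamic Programming)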
Prove:
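Fact 1 (for denominations that are powers of c):
In an optimal solution, if coins of denomination c^i are used, at most c-1 of them can be used. Using c of them gives c*c^i = c^(i+1), which means one coin of the next larger denomination could replace the many smaller ones.
By Fact 1,
if a non-greedy solution were optimal, the number of coins it uses of every denomination c^i would have to be less than c. Otherwise it is not optimal!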
Suppose you have n video streams that need to be sent, one after another, over a communication link.
Stream i consists of a total of bi bits that need to be sent, at a constant rate, over a period of ti seconds.
You cannot send two streams at the same time, so you need to determine a schedule for the streams:
an order in which to send them. Whichever order you choose,
there cannot be any delays between the end of one stream and the start of the next.
Suppose your schedule starts at time 0. We assume that all the
values bi and ti are positive integers. Now, because you're just one user, the link does
not want you taking up too much bandwidth, so it imposes the following constraint,
using a fixed parameter r: (*) For each natural number t > 0, the total number of bits you send over the time interval from 0 to t cannot exceed rt.
Note that this constraint is only imposed for time intervals that start at 0, not for
time intervals that start at any other value. We say that a schedule is valid if it
satisfies the constraint. Example.
Suppose we have n = 3 streams, with (b1, t1) = (2000, 1), (b2, t2) = (6000, 2), (b3, t3) = (2000, 1), and suppose the link’s parameter is r = 5000.
Then the schedule that runs the streams in the order 1, 2, 3, is valid, since the constraint (*) is satisfied:
- t = 1: the whole first stream has been sent, and 2000 < 5000 · 1
- t = 2 : half the second stream has also been sent, and 2000 + 3000 < 5000 · 2.
Similar calculations hold for t = 3 and t = 4.
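Different streams send a different number of bits per unit time. (Think of it as different compression efficiency.)
The rt on the right-hand side is a dynamic limitation; think of it as a bandwidth cap.
Here we only decide whether a schedule satisfying this condition exists.
The greedy idea is naturally:
build up as much slack against this limitation as early as possible.
The most promising schedule is the one that stays as far below the limitation as it can.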
Design:
(a) In O(n log n), order the streams in increasing order of bi/ti (the bit rate, i.e. the compression ratio), and check if this schedule has the desired property.
(b) To get an ordering in O(n) time define si = r*ti − bi and schedule streams so that you
start with all streams for which si is non-negative, in any order, followed by those for which si is negative, also in any order.
(It is enough to put the streams with bi below r*ti first and those above it afterwards; the sorting step really is redundant.)
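Design (b) can be sketched as follows, with streams given as (bits, seconds) pairs. The names `is_valid` and `schedule_streams` are assumptions for this example. Since both the bits sent and the limit rt grow linearly within a stream, the sketch only checks constraint (*) at the moment each stream ends.

```python
def is_valid(streams, order, r):
    """Check the constraint at the end of every stream in the given order."""
    sent = elapsed = 0
    for i in order:
        bits, seconds = streams[i]
        sent += bits
        elapsed += seconds
        if sent > r * elapsed:
            return False
    return True


def schedule_streams(streams, r):
    """Return a valid order of stream indices, or None if none exists."""
    slack = [i for i, (b, t) in enumerate(streams) if r * t - b >= 0]
    over = [i for i, (b, t) in enumerate(streams) if r * t - b < 0]
    order = slack + over  # streams at or below the rate r go first
    return order if is_valid(streams, order, r) else None


streams = [(2000, 1), (6000, 2), (2000, 1)]
print(schedule_streams(streams, 5000))  # [0, 1, 2]
```

Both passes are linear in n. If the returned order fails the check, the total number of bits exceeds r times the total duration, so no other order can succeed.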
Prove:
Exchange argument.
End.