Dynamic Time Warping Algorithm

http://www.cnblogs.com/luxiaoxun/archive/2013/05/09/3069036.html

Dynamic Time Warping (DTW) is a method for measuring the similarity between two time series. It is used mainly in speech recognition to decide whether two utterances represent the same word.

1. Principle of the DTW method

When working with time series, the two series whose similarity we need to compare may not have the same length; in speech recognition this shows up as different people speaking at different speeds. Moreover, even within a single word the phonemes may be produced at different speeds: some people drag out the sound "A", or make the sound "i" very short. In addition, two time series may differ only by a shift along the time axis, that is, once the shift is removed the two series are identical. In these complicated situations, the traditional Euclidean distance cannot effectively measure the distance (or similarity) between the two time series.

DTW computes the similarity between two time series by stretching and shrinking them along the time axis:

As shown in the figure above, the two solid lines represent the two time series, and the dashed lines between them connect similar points of the two series. DTW uses the sum of the distances between all of these matched points, called the warp path distance, to measure the similarity between the two time series.

2. Computing DTW:

Let the two time series whose similarity is to be computed be X and Y, with lengths |X| and |Y|.

Warp Path

A warp path has the form W = w1, w2, ..., wK, where max(|X|, |Y|) <= K <= |X| + |Y|.

Each element wk has the form (i, j), where i is a coordinate (index) in X and j is a coordinate (index) in Y.

The warp path W must start at w1 = (1, 1) and end at wK = (|X|, |Y|), which guarantees that every coordinate of X and of Y appears in W.

In addition, the i and j of the elements w(i, j) in W must increase monotonically, which guarantees that the dashed lines in Figure 1 do not cross. Monotonically increasing here means that if wk = (i, j) and wk+1 = (i', j'), then i <= i' <= i + 1 and j <= j' <= j + 1.

The warp path we finally want is the one with the shortest distance, i.e. the W that minimizes the warp path distance Dist(W) = Dist(w1) + Dist(w2) + ... + Dist(wK), where Dist(wk) is the distance between the pair of data points matched by wk = (i, j).

The warp path distance obtained in the end is D(|X|, |Y|), which is computed by dynamic programming with the recurrence D(i, j) = Dist(i, j) + min( D(i-1, j), D(i, j-1), D(i-1, j-1) ), where Dist(i, j) is the distance between the i-th point of X and the j-th point of Y.

The figure above shows the cost matrix (Cost Matrix) D, in which D(i, j) is the warp path distance between the first i points of X and the first j points of Y.
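
To make the recurrence concrete, here is a minimal Python sketch (an illustration added here, not part of the original post); the function name dtw_distance and the use of the absolute difference as the point-wise distance are choices made for this example:

import numpy as np

def dtw_distance(x, y):
    # Fill the cost matrix with the recurrence
    # D(i, j) = dist(i, j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)).
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])          # point-wise distance
            D[i, j] = cost + min(D[i - 1, j],        # x advances, y repeats a point
                                 D[i, j - 1],        # y advances, x repeats a point
                                 D[i - 1, j - 1])    # both advance
    return D[n, m]

# Two series with the same shape but different speeds are considered identical:
print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))   # 0.0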

3. DTW implementation:

MATLAB code:

function dist = dtw(t,r)
n = size(t,1);
m = size(r,1);
% frame-matching distance matrix
d = zeros(n,m);
for i = 1:n
    for j = 1:m
        d(i,j) = sum((t(i,:)-r(j,:)).^2);
    end
end
% accumulated distance matrix
D = ones(n,m) * realmax;
D(1,1) = d(1,1);
% dynamic programming
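% Note: this recursion always advances i by one frame while j may stay put,
% advance by one, or advance by two (predecessors D(i-1,j), D(i-1,j-1), D(i-1,j-2));
% this is a slope-constrained step pattern often used for frame-by-frame template
% matching, rather than the symmetric steps D(i-1,j), D(i,j-1), D(i-1,j-1).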
for i = 2:n
    for j = 1:m
        D1 = D(i-1,j);
        if j>1
            D2 = D(i-1,j-1);
        else
            D2 = realmax;
        end
        if j>2
            D3 = D(i-1,j-2);
        else
            D3 = realmax;
        end
        D(i,j) = d(i,j) + min([D1,D2,D3]);
    end
end
dist = D(n,m);
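
Note that dtw expects t and r to be matrices with one feature frame per row (for example, n test frames versus m template frames of MFCC features), since the local distance d(i,j) is the squared Euclidean distance between row i of t and row j of r; dist = dtw(t, r) then returns the accumulated matching distance D(n,m).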

C++ implementation:

dtwrecoge.h and dtwrecoge.cpp (see the download link below)

C++ code download: DTW算法.rar

 

http://blog.csdn.net/vanezuo/article/details/5586727

Dynamic Time Warping (DTW)
Dynamic time warping (DTW) was once a mainstream method in speech recognition.
The idea is as follows. A speech signal has a large amount of randomness: even when the same speaker pronounces the same word, the result differs from one utterance to the next, and the utterances never have exactly the same duration. Therefore, when matching an unknown word against a stored model, the time axis of the unknown word has to be warped or bent non-uniformly so that its features line up with those of the template. Time warping used for this alignment is a very powerful tool and is very effective at improving recognition accuracy.
DTW is a classic optimization problem: it describes the temporal correspondence between an input template and a reference template with a time-warping function W(n) that satisfies certain conditions, and it solves for the warping function corresponding to the minimum accumulated distance when the two templates are matched.
 
• It combines time warping with a distance measure and uses dynamic programming to compare two patterns of different sizes, solving the problem of variable speaking rates in speech recognition;
• It is a non-linear time-warping pattern-matching algorithm.
 
DTW (Dynamic Time Warping) can be translated as "dynamic time warping". It is a family of methods based on dynamic programming (DP) that can greatly reduce the time needed for the matching search.
The goal of DTW is to find the shortest distance between two vectors. In general, for two vectors x and y in an n-dimensional space, the distance between them can be defined as the straight-line distance between the two points, the Euclidean distance:
dist(x, y) = |x - y|
However, if the two vectors do not have the same length, their distance cannot be computed with the formula above. In general the element positions of the two vectors represent time, and because deviations along the time axis must be tolerated, we do not know in advance how the elements of the two vectors correspond to each other; an effective procedure is therefore needed to find the best correspondence.
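
For example, X = (1, 2, 3) and Y = (1, 2, 2, 3) have different lengths, so the formula above does not apply, yet the warp path (1,1), (2,2), (2,3), (3,4) matches every point of X to an equal value in Y and gives a DTW distance of 0.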
 
 
The general idea of dynamic programming
The basic idea of dynamic programming is to decompose the problem to be solved into a number of subproblems.
However, the subproblems obtained by this decomposition are usually not independent of one another, while the number of distinct subproblems is often only polynomial. When solving naively, some subproblems are computed over and over again.
If the answers to subproblems that have already been solved can be stored and simply looked up when needed again, a large amount of repeated computation is avoided, giving a polynomial-time algorithm.
 
Basic steps of dynamic programming
• Characterize the structure of an optimal solution.
• Recursively define the value of an optimal solution.
• Compute the optimal value in a bottom-up fashion.
• Construct an optimal solution from the information recorded while computing the optimal value, as the warp-path sketch below illustrates for DTW.
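
For DTW in particular, the last step amounts to backtracking through the accumulated cost matrix to recover the warp path itself. The following Python sketch illustrates this (the names dtw_matrix and warp_path are chosen for this example and do not come from the quoted sources); it fills the matrix with the standard recurrence and then walks back from (|X|, |Y|) to (1, 1):

import numpy as np

def dtw_matrix(x, y):
    # Accumulated cost matrix: D[i, j] = dist(i, j) + min of the three predecessors.
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D

def warp_path(D):
    # Reconstruct an optimal warp path by walking from (n, m) back to (1, 1),
    # always stepping to the predecessor with the smallest accumulated cost.
    i, j = D.shape[0] - 1, D.shape[1] - 1
    path = [(i, j)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)], key=lambda p: D[p])
        path.append((i, j))
    return path[::-1]

D = dtw_matrix([1, 2, 3, 4], [1, 2, 2, 3, 4])
print(warp_path(D))   # [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]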
 
https://en.wikipedia.org/wiki/Dynamic_time_warping

Dynamic time warping

From Wikipedia, the free encyclopedia
 
 
 

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences which may vary in time or speed. For instance, similarities in walking patterns could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data; indeed, any data which can be turned into a linear sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.

In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn't guarantee the triangle inequality to hold.

 

 

Implementation

This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g. d(x, y) = |x - y|.

int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]

    for i := 1 to n
        DTW[i, 0] := infinity
    for i := 1 to m
        DTW[0, i] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := 1 to m
            cost:= d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}

We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then | i - j | is no larger than w, a window parameter.

We can easily modify the above algorithm to add a locality constraint (the differences are the adapted window size, the initialization of the whole matrix to infinity, and the bounds of the inner loop over j). However, this modification works only if | n - m | is no larger than w, i.e. the end point is within the window length of the diagonal. To make the algorithm work in general, the window parameter w must be adapted so that | n - m | ≤ w (see the line marked with (*) in the code).

int DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]
 
    w := max(w, abs(n-m)) // adapt window size (*)
 
    for i := 0 to n
     for j:= 0 to m
     DTW[i, j] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i, j-1],    // deletion
                                        DTW[i-1, j-1])    // match
 
    return DTW[n, m]
}
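
As a concrete counterpart to the pseudocode above, here is a Python sketch of the windowed variant (the function name dtw_windowed is chosen for this example); it follows the same window adaptation and recurrence:

import numpy as np

def dtw_windowed(s, t, w):
    # DTW with a locality constraint: s[i] may only be matched with t[j]
    # when |i - j| <= w.
    n, m = len(s), len(t)
    w = max(w, abs(n - m))                         # adapt window size (*)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

print(dtw_windowed([0, 1, 1, 2, 3, 2, 1], [0, 1, 2, 3, 2, 1], w=1))

Only cells inside the band are ever filled, so the running time drops from O(n*m) to roughly O(n*w).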

Fast computation

Computing DTW requires O(N^2) time in general. Fast techniques for computing DTW include SparseDTW[1] and FastDTW.[2] A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh[3] or LB_Improved.[4] In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than with LB_Keogh, and found that other techniques were inefficient.[5]
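
As an illustration of the lower-bounding idea, the following sketch computes an LB_Keogh-style bound, roughly following Keogh and Ratanamahatana's definition:[3] an upper and lower envelope is built around one sequence using a band of half-width r, and only the parts of the other sequence that fall outside the envelope contribute to the bound. The function and parameter names are chosen for this example, and it assumes equal-length sequences and squared point-wise costs:

import numpy as np

def lb_keogh(q, c, r):
    # Lower bound on the band-constrained DTW distance between q and c:
    # accumulate the squared distance from each q[i] to the envelope of c
    # over the window [i - r, i + r] whenever q[i] falls outside it.
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    total = 0.0
    for i, qi in enumerate(q):
        lo, hi = max(0, i - r), min(len(c), i + r + 1)
        upper, lower = c[lo:hi].max(), c[lo:hi].min()
        if qi > upper:
            total += (qi - upper) ** 2
        elif qi < lower:
            total += (qi - lower) ** 2
    return total

# In a similarity search, a candidate whose bound already exceeds the best
# DTW distance found so far can be discarded without running the full DTW.
print(lb_keogh([0, 1, 2, 3, 2, 1], [0, 0, 1, 2, 3, 2], r=1))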

Average sequence

Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. The average sequence is the sequence that minimizes the sum of the squared distances to the sequences in the set. NLAAF[6] is an exact method for two sequences. For more than two sequences, the problem is related to multiple alignment and requires heuristics. DBA[7] is currently the reference method for averaging a set of sequences consistently with DTW. COMASA[8] efficiently randomizes the search for the average sequence, using DBA as a local optimization process.

Supervised learning

A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure.[9]
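
A minimal sketch of such a classifier, assuming a DTW distance function like the dtw_distance sketch shown earlier (the name nn_dtw_classify is chosen for this example): the query simply receives the label of its DTW-nearest training sequence.

def nn_dtw_classify(query, train_sequences, train_labels, dtw_distance):
    # 1-nearest-neighbour classification with DTW as the distance measure.
    best_label, best_dist = None, float("inf")
    for seq, label in zip(train_sequences, train_labels):
        d = dtw_distance(query, seq)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Example: label = nn_dtw_classify(x_new, train_X, train_y, dtw_distance)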

Alternative approach

An alternative technique for DTW is based on functional data analysis, in which the time series are regarded as discretizations of smooth (differentiable) functions of time and therefore continuous mathematics is applied.[10] Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements.[11][12]

Open source software

  • The lbimproved C++ library implements Fast Nearest-Neighbor Retrieval algorithms under the GNU General Public License (GPL). It also provides a C++ implementation of Dynamic Time Warping as well as various lower bounds.
  • The FastDTW library is a Java implementation of DTW and a FastDTW implementation that provides optimal or near-optimal alignments with an O(N) time and memory complexity, in contrast to the O(N^2) requirement for the standard DTW algorithm. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution.
  • FastDTW fork (Java) published to Maven Central
  • The R package dtw implements most known variants of the DTW algorithm family, including a variety of recursion rules (also called step patterns), constraints, and substring matching.
  • The mlpy Python library implements DTW.
  • The pydtw C++/Python library implements the Manhattan and Euclidean flavoured DTW measures including the LB_Keogh lower bounds.
  • The cudadtw C++/CUDA library implements subsequence alignment of Euclidean-flavoured DTW and z-normalized Euclidean Distance similar to the popular UCR-Suite on CUDA-enabled accelerators.
  • The JavaML machine learning library implements DTW.
  • The ndtw C# library implements DTW with various options.
  • Sketch-a-Char uses Greedy DTW (implemented in JavaScript) as part of LaTeX symbol classifier program.
  • The MatchBox implements DTW to match Mel-Frequency Cepstral Coefficients of audio signals.
  • Sequence averaging: a GPL Java implementation of DBA.[7]
  • A C/Python library implements DTW with some variations (distance functions, step patterns and windows).

Applications

Spoken word recognition

Due to different speaking rates, a non-linear fluctuation occurs in the speech pattern versus the time axis, which needs to be eliminated.[13] DP-matching, the pattern-matching algorithm discussed in the paper "Dynamic Programming Algorithm Optimization for Spoken Word Recognition" by Hiroaki Sakoe and Seibi Chiba, uses a time-normalization effect in which the fluctuations in the time axis are modeled with a non-linear time-warping function. Given any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that the maximum coincidence with the other is attained. Moreover, if the warping function is allowed to take any possible value, very little distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the slope of the warping function.

References

  1. Al-Naymat, G.; Chawla, S.; Taheri, J. (2012). "SparseDTW: A Novel Approach to Speed up Dynamic Time Warping".
  2. Salvador, Stan; Chan, Philip (2004). "FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space". KDD Workshop on Mining Temporal and Sequential Data, pp. 70–80.
  3. Keogh, E.; Ratanamahatana, C. A. (2005). "Exact indexing of dynamic time warping". Knowledge and Information Systems 7 (3): 358–386. doi:10.1007/s10115-004-0154-9.
  4. Lemire, D. (2009). "Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound". Pattern Recognition 42 (9): 2169–2180. doi:10.1016/j.patcog.2008.11.030.
  5. Wang, Xiaoyue; et al. (2010). "Experimental comparison of representation methods and distance measures for time series data". Data Mining and Knowledge Discovery: 1–35.
  6. Gupta, L.; Molfese, D. L.; Tammana, R.; Simos, P. G. (1996). "Nonlinear alignment and averaging for estimating the evoked potential". IEEE Transactions on Biomedical Engineering 43 (4): 348–356. doi:10.1109/10.486255. PMID 8626184.
  7. Petitjean, F.; Ketterlin, A.; Gançarski, P. (2011). "A global averaging method for dynamic time warping, with applications to clustering". Pattern Recognition 44 (3): 678. doi:10.1016/j.patcog.2010.09.013.
  8. Petitjean, F.; Gançarski, P. (2012). "Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment". Theoretical Computer Science 414: 76. doi:10.1016/j.tcs.2011.09.029.
  9. Ding, Hui; Trajcevski, Goce; Scheuermann, Peter; Wang, Xiaoyue; Keogh, Eamonn (2008). "Querying and mining of time series data: experimental comparison of representations and distance measures". Proc. VLDB Endow. 1 (2): 1542–1552. doi:10.14778/1454159.1454226.
  10. Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997). "On the Registration of Time and the Patterning of Speech Movements". Journal of Speech, Language, and Hearing Research 40: 1111–1117.
  11. Howell, P.; Anderson, A.; Lucero, J. C. (2010). "Speech motor timing and fluency". In Maassen, B.; van Lieshout, P. Speech Motor Control: New Developments in Basic and Applied Research. Oxford University Press. pp. 215–225. ISBN 978-0199235797.
  12. Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth (2008). "Speech production variability in fricatives of children and adults: Results of functional data analysis". The Journal of the Acoustical Society of America 124 (5): 3158–3170. doi:10.1121/1.2981639. ISSN 0001-4966. PMC 2677351. PMID 19045800.
  13. Sakoe, Hiroaki; Chiba, Seibi (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech and Signal Processing 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.

Further reading

  • Vintsyuk, T.K. (1968). "Speech discrimination by dynamic programming". Kibernetika 4: 81–88.
  • Sakoe, H.; Chiba (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech and Signal Processing 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.
  • C. S. Myers and L. R. Rabiner.
    A comparative study of several dynamic time-warping algorithms for connected word recognition.
    The Bell System Technical Journal, 60(7):1389-1409, September 1981.
  • L. R. Rabiner and B. Juang.
    Fundamentals of speech recognition.
    Prentice-Hall, Inc., 1993 (Chapter 4)
  • Muller, M., Information Retrieval for Music and Motion, Ch. 4 (available online at http://www.springer.com/cda/content/document/cda_downloaddocument/9783540740476-1.pdf?SGWID=0-0-45-452103-p173751818), Springer, 2007, ISBN 978-3-540-74047-6
  • Rakthanmanon, Thanawin (September 2013). "Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping". ACM Transactions on Knowledge Discovery from Data 7 (3): 10:1–10:31. doi:10.1145/2510000/2500489.
