How STL sort Is Implemented

Date: 2019-12-15

Tags: stl, sort, function implementation

Note: this article draws on the reference article "STL::sort实现".

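Sorting is a staple algorithm topic in interviews. You will rarely hand-write these algorithms in production code, but understanding the simple ones helps with more complex, more practical algorithms, and interviews can only go so deep, which makes these basics important. We call sort on sequences all the time, but do you know how it is implemented?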

Function declarations

#include <algorithm>

template <class RandomAccessIterator>
  void sort (RandomAccessIterator first, RandomAccessIterator last);

template <class RandomAccessIterator, class Compare>
  void sort (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

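Source: sort - C++ Reference. The STL offers two overloads: one compares elements with the default < operator, the other accepts a user-supplied comparison function. But why is it usually so much faster than a sort we write ourselves?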
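As a quick illustration of the two overloads, here is a minimal sketch. The helper names sorted_ascending and sorted_descending are made up for this example and are not part of the STL; only std::sort and std::greater come from the standard library.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper: sort ascending with the default operator< overload.
inline std::vector<int> sorted_ascending(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Hypothetical helper: sort descending by passing a comparison function object.
inline std::vector<int> sorted_descending(std::vector<int> v) {
    std::sort(v.begin(), v.end(), std::greater<int>());
    return v;
}
```

Any callable that takes two elements and returns true when the first should come before the second can be passed as the third argument.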

How it works

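The sort in the STL is not a plain quicksort. Besides optimizing ordinary quicksort, it also combines insertion sort and heapsort, picking the appropriate method according to the size of the data and the situation at hand. For large ranges it uses quicksort, recursing on the partitions. Once a partition shrinks below a threshold, it switches to insertion sort to avoid the overhead of further recursive calls; and if the recursion gets too deep, which suggests worst-case behavior, it switches to heapsort.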

Plain quicksort

See my other post, 十大排序算法 (Top Ten Sorting Algorithms), which analyzes each sorting algorithm. It describes quicksort as follows:

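- Choose a pivot from the sequence;
- Rearrange the sequence so that every element smaller than the pivot ends up to its left and every element larger than the pivot ends up to its right, splitting the sequence into a left and a right subsequence. This step is called partitioning;
- Recursively quicksort the left and right subsequences.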

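Partitioning is usually done with two iterators, head and tail. head moves from the front toward the back and tail moves from the back toward the front. head stops when it meets an element greater than or equal to the pivot, and tail stops when it meets an element less than or equal to the pivot. If head is still before tail, meaning the two have not crossed, the elements are swapped and both iterators keep moving toward the middle in the same way. The partition step ends when the two iterators cross.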
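The two-iterator scheme can be sketched as follows. This is a bounds-checked illustration on std::vector<int> using indices, not the libstdc++ code; the name hoare_partition and its signature are assumptions made for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: partition v[lo, hi) around the value `pivot`.
// Returns an index `cut` such that every element in [lo, cut) is <= pivot
// and every element in [cut, hi) is >= pivot.
inline std::size_t hoare_partition(std::vector<int>& v, std::size_t lo,
                                   std::size_t hi, int pivot) {
    std::size_t head = lo;   // first element not yet classified
    std::size_t tail = hi;   // one past the last element not yet classified
    while (true) {
        // head stops at the first element that is >= pivot.
        while (head < tail && v[head] < pivot) ++head;
        // tail stops once v[tail - 1] is <= pivot.
        while (head < tail && pivot < v[tail - 1]) --tail;
        // Zero or one unclassified element left: the iterators have met.
        if (head + 1 >= tail) return head;
        std::swap(v[head], v[tail - 1]);
        ++head;
        --tail;
    }
}
```

Unlike this sketch, the library version shown later omits the `head < tail` checks inside the inner loops, which is where the "unguarded" in its name comes from.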

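The most critical part of quicksort is the choice of pivot. The worst case occurs when a partition step produces an empty sub-range, so that the step accomplishes nothing. The STL uses an approach called median-of-three: it takes the elements at the front, the back, and the middle of the range and uses their median as the pivot.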
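A minimal sketch of median-of-three on plain int values is shown below. The function names median_of_three and pick_pivot are hypothetical; the branch structure simply mirrors the usual three-comparison decision tree.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: return the median of three values using only operator<.
inline int median_of_three(int a, int b, int c) {
    if (a < b) {
        if (b < c) return b;        // a < b < c
        return (a < c) ? c : a;     // b is the largest; median is max(a, c)
    }
    if (a < c) return a;            // b <= a < c
    return (b < c) ? c : b;         // a is the largest; median is max(b, c)
}

// Hypothetical helper: pick a pivot from the front, middle, and back of v.
inline int pick_pivot(const std::vector<int>& v) {
    assert(!v.empty());
    return median_of_three(v.front(), v[v.size() / 2], v.back());
}
```

For an already sorted range this picks the true middle element, which is exactly the input where a "first element as pivot" rule behaves worst.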

Introspective sort (Introsort)

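A poor pivot choice leads to poor partitions and can degrade quicksort to O(n^2). In 1996 David R. Musser proposed a hybrid sorting algorithm, Introspective Sorting, or IntroSort for short. For the most part it behaves exactly like the median-of-three quicksort described above, but when partitioning shows signs of degrading toward quadratic behavior, it detects this itself and switches to heapsort. That keeps the running time at heapsort's O(n log n) while still doing better than a sort that uses heapsort from the start.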

Code walkthrough

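The sort function ultimately calls __sort (the listings in this article are from GCC's libstdc++). The implementation of __sort is shown below; when no comparator is supplied, the < operator is used.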

template<typename _RandomAccessIterator, typename _Compare>
    inline void
    __sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
	   _Compare __comp)
    {
      if (__first != __last)
	{
	  std::__introsort_loop(__first, __last,
				std::__lg(__last - __first) * 2,
				__comp);
	  std::__final_insertion_sort(__first, __last, __comp);
	}
    }

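std::__introsort_loop is the introsort described above. Its third argument calls __lg(), which bounds how far partitioning is allowed to degrade: it computes floor(log2(n)), so quicksort is allowed at most 2*floor(log2(n)) levels of recursion.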
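To make the depth budget concrete, here is a small sketch of the same arithmetic. floor_log2 and depth_limit are hypothetical names for this example; std::__lg itself is an internal libstdc++ helper and is not called here.

```cpp
#include <cstddef>

// Hypothetical helper: floor(log2(n)) for n >= 1, computed by counting how
// many times n can be halved before it reaches 1. Returns 0 for n == 0 too.
inline int floor_log2(std::size_t n) {
    int k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

// Hypothetical helper: depth budget in the spirit of
// `std::__lg(__last - __first) * 2`.
inline int depth_limit(std::size_t n) {
    return 2 * floor_log2(n);
}
```

Under these definitions a range of one million elements gets a budget of 38 partitioning levels before the heapsort fallback kicks in.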

1. Introsort: __introsort_loop

__sort first runs the introsort loop. __introsort_loop is implemented as follows:

/// This is a helper function for the sort routine.
  template<typename _RandomAccessIterator, typename _Size, typename _Compare>
    void
    __introsort_loop(_RandomAccessIterator __first,
		     _RandomAccessIterator __last,
		     _Size __depth_limit, _Compare __comp)
    {
      while (__last - __first > int(_S_threshold))
	{
	  if (__depth_limit == 0)
	    {
	      std::__partial_sort(__first, __last, __last, __comp);
	      return;
	    }
	  --__depth_limit;
	  _RandomAccessIterator __cut =
	    std::__unguarded_partition_pivot(__first, __last, __comp);
	  std::__introsort_loop(__cut, __last, __depth_limit, __comp);
	  __last = __cut;
	}
    }

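The loop first checks whether the range is larger than the threshold _S_threshold. _S_threshold is a compile-time integer constant (libstdc++ declares it as an enumerator) with the value 16. If the range holds 16 elements or fewer, the introsort loop ends and control returns to __sort, which finishes with the insertion sort __final_insertion_sort.
If the range is larger than _S_threshold, the loop checks whether the recursion depth budget has been used up. If it has, the range is sorted with heapsort instead; __partial_sort in the code is implemented with a heap.
If the depth budget is not yet exhausted, __unguarded_partition_pivot performs one quicksort partition pass over the current range and returns the cut position.
After partitioning, the function calls itself recursively on the right part, then goes back to the top of the while loop to handle the left part. The source is written differently from the usual two-recursive-call version, but the principle is the same: the second recursive call has been replaced by a loop, a hand-written tail-recursion elimination that is worth noticing.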
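The control flow just described can be sketched with public standard-library calls only. This is not the libstdc++ code: the namespace sketch and its names are hypothetical, the partition step is swapped for a three-way split built from two std::partition calls (so equal elements cannot stall progress), and the final pass is a simple insertion sort written with std::upper_bound and std::rotate.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical names, for illustration only

typedef std::vector<int>::iterator Iter;
const std::ptrdiff_t kThreshold = 16;  // plays the role of _S_threshold

// Quicksort phase with a depth budget. Leaves runs of at most kThreshold
// elements unsorted for the final insertion-sort pass.
inline void introsort_loop(Iter first, Iter last, int depth_limit) {
    while (last - first > kThreshold) {
        if (depth_limit == 0) {
            // Budget exhausted: heapsort the remaining range.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_limit;
        // Median of the first, middle, and last elements.
        const int a = *first;
        const int b = *(first + (last - first) / 2);
        const int c = *(last - 1);
        const int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
        // Three-way split: [first, lt) < pivot, [lt, gt) == pivot, [gt, last) > pivot.
        Iter lt = std::partition(first, last, [pivot](int x) { return x < pivot; });
        Iter gt = std::partition(lt, last, [pivot](int x) { return !(pivot < x); });
        introsort_loop(gt, last, depth_limit);  // recurse on the right part
        last = lt;                              // loop on the left part
    }
}

inline void intro_sort(std::vector<int>& v) {
    if (v.empty()) return;
    int depth_limit = 0;  // ends up as 2 * floor(log2(n))
    for (std::size_t n = v.size(); n > 1; n >>= 1) depth_limit += 2;
    introsort_loop(v.begin(), v.end(), depth_limit);
    // Final pass: insertion sort over the nearly sorted range.
    for (Iter it = v.begin(); it != v.end(); ++it)
        std::rotate(std::upper_bound(v.begin(), it, *it), it, it + 1);
}

}  // namespace sketch
```

Note how only the right part is handled by a recursive call, while the left part is handled by shrinking `last` and looping, the same shape as the library listing.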

2. Quicksort: __unguarded_partition_pivot

The code of the quicksort partition function __unguarded_partition_pivot is as follows:

/// This is a helper function...
  template<typename _RandomAccessIterator, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
			  _RandomAccessIterator __last,
			  _RandomAccessIterator __pivot, _Compare __comp)
    {
      while (true)
	{
	  while (__comp(__first, __pivot))
	    ++__first;
	  --__last;
	  while (__comp(__pivot, __last))
	    --__last;
	  if (!(__first < __last))
	    return __first;
	  std::iter_swap(__first, __last);
	  ++__first;
	}
    }

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Compare>
    inline _RandomAccessIterator
    __unguarded_partition_pivot(_RandomAccessIterator __first,
				_RandomAccessIterator __last, _Compare __comp)
    {
      _RandomAccessIterator __mid = __first + (__last - __first) / 2;
      std::__move_median_to_first(__first, __first + 1, __mid, __last - 1,
				  __comp);
      return std::__unguarded_partition(__first + 1, __last, __first, __comp);
    }

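This code is fairly easy to follow: it performs one quicksort partition pass and returns the cut position. __unguarded_partition() uses the two-iterator method described above to split the range into a left and a right part. Also note the call to __move_median_to_first, which is the median-of-three mentioned earlier. In this code the three candidates are the second element (__first + 1), the middle element (__mid), and the last element (__last - 1); their median is chosen as the pivot and swapped into *__first. The implementation is as follows: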

/// Swaps the median value of *__a, *__b and *__c under __comp to *__result
  template<typename _Iterator, typename _Compare>
    void
    __move_median_to_first(_Iterator __result,_Iterator __a, _Iterator __b,
			   _Iterator __c, _Compare __comp)
    {
      if (__comp(__a, __b))
	{
	  if (__comp(__b, __c))
	    std::iter_swap(__result, __b);
	  else if (__comp(__a, __c))
	    std::iter_swap(__result, __c);
	  else
	    std::iter_swap(__result, __a);
	}
      else if (__comp(__a, __c))
	std::iter_swap(__result, __a);
      else if (__comp(__b, __c))
	std::iter_swap(__result, __c);
      else
	std::iter_swap(__result, __b);
    }

3. Heapsort: __partial_sort

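As seen in __introsort_loop, once the recursion depth budget is used up the algorithm switches to heapsort, which is implemented by __partial_sort. Because __middle equals __last in that call, the whole range ends up heap-sorted. Part of the implementation is shown below (the full heap code is quite long):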

/// This is a helper function for the sort routines.
  template<typename _RandomAccessIterator, typename _Compare>
    void
    __heap_select(_RandomAccessIterator __first,
		  _RandomAccessIterator __middle,
		  _RandomAccessIterator __last, _Compare __comp)
    {
      std::__make_heap(__first, __middle, __comp);
      for (_RandomAccessIterator __i = __middle; __i < __last; ++__i)
	if (__comp(__i, __first))
	  std::__pop_heap(__first, __middle, __i, __comp);
    }

template<typename _RandomAccessIterator, typename _Compare>
    void
    __sort_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
		_Compare __comp)
    {
      while (__last - __first > 1)
	{
	  --__last;
	  std::__pop_heap(__first, __last, __last, __comp);
	}
    }

template<typename _RandomAccessIterator, typename _Compare>
    inline void
    __partial_sort(_RandomAccessIterator __first,
		   _RandomAccessIterator __middle,
		   _RandomAccessIterator __last,
		   _Compare __comp)
    {
      std::__heap_select(__first, __middle, __last, __comp);
      std::__sort_heap(__first, __middle, __comp);
    }
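The same make-heap-then-pop pattern can be tried out with the public heap algorithms. The sketch below is an illustration on std::vector<int>, not the internal code; the function name heap_sort is an assumption for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: heapsort v in ascending order using the public
// std::make_heap and std::pop_heap algorithms.
inline void heap_sort(std::vector<int>& v) {
    // Build a max-heap over the whole range: the largest element is at the front.
    std::make_heap(v.begin(), v.end());
    // Repeatedly move the current maximum to the end of the shrinking heap,
    // the same loop shape as __sort_heap.
    for (std::vector<int>::iterator last = v.end(); last - v.begin() > 1; --last)
        std::pop_heap(v.begin(), last);  // max of [begin, last) goes to last - 1
}
```

The loop could also be replaced by a single call to std::sort_heap, which does the same job.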

4. Insertion sort: __final_insertion_sort

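After __introsort_loop returns, every stretch of the range that is still unsorted holds at most _S_threshold elements, and the stretches are already in the right order relative to one another. Control then returns to __sort, which finishes with the insertion sort __final_insertion_sort. For longer ranges only the first _S_threshold elements are sorted with boundary checks; by then the smallest element of the whole range sits at the front, so __unguarded_insertion_sort can process the rest without checking the left boundary. The implementation is as follows: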

/// This is a helper function for the sort routine.
  template<typename _RandomAccessIterator, typename _Compare>
    void
    __final_insertion_sort(_RandomAccessIterator __first,
			   _RandomAccessIterator __last, _Compare __comp)
    {
      if (__last - __first > int(_S_threshold))
	{
	  std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
	  std::__unguarded_insertion_sort(__first + int(_S_threshold), __last,
					  __comp);
	}
      else
	std::__insertion_sort(__first, __last, __comp);
    }
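The difference between the guarded and the unguarded pass can be sketched as below. This is an index-based illustration on std::vector<int>; all three function names are hypothetical, and the unguarded pass is only correct under the stated precondition that the minimum of the whole vector lies within the first 16 elements (which is what the introsort phase provides).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: guarded insertion sort on v[0, n). The inner loop
// checks the left boundary (j > 0) on every step.
inline void insertion_sort(std::vector<int>& v, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        const int value = v[i];
        std::size_t j = i;
        while (j > 0 && value < v[j - 1]) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = value;
    }
}

// Hypothetical helper: unguarded insertion sort on v[from, v.size()).
// Precondition (assumed, not checked): v[0, from) is sorted and v[0] is
// <= every element of v, so the inner loop always stops before index 0.
inline void unguarded_insertion_sort(std::vector<int>& v, std::size_t from) {
    assert(from > 0);
    for (std::size_t i = from; i < v.size(); ++i) {
        const int value = v[i];
        std::size_t j = i;
        while (value < v[j - 1]) {  // no boundary check: v[0] acts as a sentinel
            v[j] = v[j - 1];
            --j;
        }
        v[j] = value;
    }
}

// Hypothetical helper mirroring the shape of __final_insertion_sort.
inline void final_insertion_sort(std::vector<int>& v) {
    const std::size_t kThreshold = 16;  // plays the role of _S_threshold
    if (v.size() > kThreshold) {
        insertion_sort(v, kThreshold);
        unguarded_insertion_sort(v, kThreshold);
    } else {
        insertion_sort(v, v.size());
    }
}
```

Dropping the boundary check removes one comparison per inner-loop step, which is presumably why the library bothers with two variants.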

Closing remarks

That's all for today. I hope this has given you a working understanding of STL sort. If you spot any mistakes, corrections are welcome; let's discuss!

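Copyright of this article belongs to the author AlvinZH and 博客园 (cnblogs). Reposting and commercial use are welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be displayed prominently on the page; otherwise the author reserves the right to pursue legal liability.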