Task Sleep and Wakeup

Date: 2019-11-17

Tags: task, sleep, wakeup

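1. The problem

At the most basic level a task is either runnable or not runnable. Runnable tasks do the actual work of the kernel, while tasks that are not runnable are how synchronization between tasks is implemented. Moving a task between these two states is therefore one of the basic jobs of the kernel scheduler.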

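2. When and how the state is set

I. Deactivating a task

In the scheduler code, the two most basic operations for making a thread active or inactive are activate_task and deactivate_task. These two functions perform the real move of a thread onto or off the runqueue. This real move is different from the purely nominal flag change made by set_current_state. For example, after a thread uses set_current_state to mark itself TASK_INTERRUPTIBLE, it keeps running until it calls schedule.

Now suppose a task decides that it cannot continue until some condition or resource becomes available. It can simply call set_current_state to put itself in a non-RUNNING state and then call schedule. schedule gives special treatment to the state of the task that called it and decides what that state really implies. This is the most important point at which a state set by set_current_state takes effect.

Here is how schedule checks the state of the calling thread: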

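switch_count = &prev->nivcsw;
/*
 * TASK_RUNNING is defined as 0. Any non-zero state means the task is not
 * runnable, so it may have to be really removed from the runqueue.
 * PREEMPT_ACTIVE means this is a kernel-mode preemption: the preempted task
 * did not call schedule itself, the scheduler was entered on its behalf when
 * an exception or interrupt occurred. The test also feeds the statistics:
 * a switch that is not a preemption is counted as voluntary.
 */
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
    switch_count = &prev->nvcsw;
    /*
     * If the task can be woken by a signal and a signal is already pending,
     * wake it up at once, otherwise the signal could be lost. This can happen
     * on SMP: a task on another CPU sends this task a signal, sees that it is
     * still running and therefore does not perform a wakeup. This task then
     * marks itself interruptible and gets ready to sleep, so the check has to
     * be repeated here, or the signal would never wake it. (On a single CPU it
     * is less clear when this can happen. Possibly an interrupt or exception
     * handler sends a signal such as SIGSEGV to the current task, although
     * that should not occur in a healthy kernel.)
     */
    if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
        unlikely(signal_pending(prev))))
        prev->state = TASK_RUNNING; /* the task is not put to sleep, it keeps running */
    else {
        if (prev->state == TASK_UNINTERRUPTIBLE)
            rq->nr_uninterruptible++;
        deactivate_task(prev, rq); /* the real removal from the scheduler's runqueue */
    }
}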
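
The check above can be modelled in user space. The following is a minimal sketch, not kernel code: struct model_task, enum sched_outcome and model_schedule_check are hypothetical names introduced only for illustration, and the model always counts a switch, whereas the kernel counts one only when a different task is actually picked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State values as in the 2.6 kernels discussed here; TASK_RUNNING must be 0. */
#define TASK_RUNNING         0
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2

/* Hypothetical result type, used only by this model. */
enum sched_outcome {
    STAYS_ON_RUNQUEUE,  /* preempted or already runnable: nothing to do */
    WOKEN_BY_SIGNAL,    /* state reset to TASK_RUNNING, task keeps its slot */
    DEACTIVATED         /* taken off the runqueue: the task really sleeps */
};

/* Hypothetical task record holding only the fields the check reads. */
struct model_task {
    long state;
    bool signal_pending;
    unsigned long nvcsw;   /* voluntary context switches */
    unsigned long nivcsw;  /* involuntary context switches */
};

static enum sched_outcome model_schedule_check(struct model_task *prev,
                                               bool preempt_active)
{
    unsigned long *switch_count;
    enum sched_outcome outcome = STAYS_ON_RUNQUEUE;

    assert(prev != NULL);
    switch_count = &prev->nivcsw;

    if (prev->state != TASK_RUNNING && !preempt_active) {
        switch_count = &prev->nvcsw;
        if ((prev->state & TASK_INTERRUPTIBLE) && prev->signal_pending) {
            prev->state = TASK_RUNNING;   /* do not lose the signal */
            outcome = WOKEN_BY_SIGNAL;
        } else {
            outcome = DEACTIVATED;        /* stands in for deactivate_task */
        }
    }
    ++*switch_count;  /* simplification: the kernel only counts real switches */
    return outcome;
}
```

A task that set itself TASK_INTERRUPTIBLE with no signal pending is deactivated and counted as a voluntary switch. The same task with a pending signal is put back to TASK_RUNNING instead.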

deactivate_task ---> dequeue_task

static void dequeue_task(struct task_struct *p, struct prio_array *array)
{
    array->nr_active--;
    /*
     * This is the real removal. All task selection goes through this
     * structure. The call takes task p off the run_list it is linked on,
     * so a later walk of that list will not find it.
     */
    list_del(&p->run_list);
    if (list_empty(array->queue + p->prio))
        __clear_bit(p->prio, array->bitmap);
}

While we are here, a look at the priority array:

struct prio_array {
    unsigned int nr_active;
    DECLARE_BITMAP(bitmap, MAX_PRIO+1); /* include 1 bit for delimiter */
    struct list_head queue[MAX_PRIO];
};

#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO MAX_USER_RT_PRIO

#define MAX_PRIO (MAX_RT_PRIO + 40)

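That is, the system has 140 priority levels in total. The first 100 belong to real-time tasks. In kernels with CFS, non-real-time tasks are usually managed through a red-black tree and so need no priority queue, but in this structure they still have list heads for their own priorities.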

II. Activating a task

A task is activated through the activate_task interface, which in the end calls enqueue_task:

static void enqueue_task(struct task_struct *p, struct prio_array *array)
{
    sched_info_queued(p);
    /* every task on the runqueue is linked in through its run_list member */
    list_add_tail(&p->run_list, array->queue + p->prio);
    __set_bit(p->prio, array->bitmap); /* mark this priority as non-empty in the bitmap */
    array->nr_active++;
    /* p->array is assigned here; deactivate_task uses it later: dequeue_task(p, p->array); */
    p->array = array;
}
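
To make the interplay of the per-priority lists, the bitmap and nr_active concrete, here is a minimal user-space sketch of the same bookkeeping. It is not the kernel implementation: struct model_task, struct model_array and the model_* functions are hypothetical names, the bitmap is an array of bool instead of packed bits, and a hand-written circular list stands in for struct list_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PRIO 140  /* same number of levels as MAX_PRIO in the text */

/* Hypothetical task record: a circular list node plus a priority. */
struct model_task {
    struct model_task *prev, *next;
    int prio;                                 /* 0 is the highest priority */
};

/* Hypothetical stand-in for struct prio_array. */
struct model_array {
    unsigned int nr_active;
    bool bitmap[MODEL_MAX_PRIO];              /* true = that queue is non-empty */
    struct model_task queue[MODEL_MAX_PRIO];  /* list heads, one per priority */
};

static void model_array_init(struct model_array *a)
{
    a->nr_active = 0;
    for (int i = 0; i < MODEL_MAX_PRIO; i++) {
        a->bitmap[i] = false;
        a->queue[i].prev = a->queue[i].next = &a->queue[i];
        a->queue[i].prio = i;
    }
}

/* Mirrors enqueue_task: append at the tail, set the bit, bump the count. */
static void model_enqueue(struct model_array *a, struct model_task *p)
{
    assert(p->prio >= 0 && p->prio < MODEL_MAX_PRIO);
    struct model_task *head = &a->queue[p->prio];

    p->prev = head->prev;
    p->next = head;
    head->prev->next = p;
    head->prev = p;
    a->bitmap[p->prio] = true;
    a->nr_active++;
}

/* Mirrors dequeue_task: unlink, and clear the bit if the queue became empty. */
static void model_dequeue(struct model_array *a, struct model_task *p)
{
    struct model_task *head = &a->queue[p->prio];

    assert(a->nr_active > 0);
    a->nr_active--;
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->prev = p->next = p;
    if (head->next == head)
        a->bitmap[p->prio] = false;
}

/* First task of the highest non-empty priority, or NULL if the array is empty. */
static struct model_task *model_first(const struct model_array *a)
{
    for (int i = 0; i < MODEL_MAX_PRIO; i++)
        if (a->bitmap[i])
            return a->queue[i].next;
    return NULL;
}
```

Once a task is dequeued it can no longer be reached from any list head, which is why removal from the list is the step that really puts a task to sleep.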

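Most activations happen in try_to_wake_up. The meaning of its sync parameter is not obvious from the code. According to the comments, sync set to 1 indicates that the newly woken thread should not preempt the current thread, and Understanding the Linux Kernel describes it the same way:

A flag (sync) that forbids the awakened process to preempt the process currently running on the local CPU

Since sync is 0 in most cases, a newly woken task normally goes through the preemption check: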

if (!sync || cpu != this_cpu) {
    if (TASK_PREEMPTS_CURR(p, rq))
        resched_task(rq->curr);
}

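resched_task does very little real work. It only sets a flag saying that a preemption is needed at some later point. When exactly does that preemption happen? This is not fixed either. Mostly it happens on return from an interrupt or exception. If it did not happen there, preemption was most likely disabled with preempt_disable. But every disable is paired with an enable, and the enable checks the flag again and calls the scheduler if the thread flag says a reschedule is needed: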

#define preempt_enable() \
do { \
    preempt_enable_no_resched(); \
    barrier(); \
    preempt_check_resched(); \
} while (0)
#define preempt_check_resched() \
do { \
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
        preempt_schedule(); \
} while (0)
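
The deferred effect of the flag can be shown with a small user-space model. This is only a sketch under simplifying assumptions: struct model_thread and the model_* functions are hypothetical, and the check that the preempt count has dropped to zero, which the kernel makes inside preempt_schedule, is folded into the enable step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state for the model. */
struct model_thread {
    int preempt_count;       /* > 0 means preemption is disabled */
    bool need_resched;       /* stands in for TIF_NEED_RESCHED */
    unsigned int schedules;  /* how many times the model "scheduled" */
};

/* Stands in for schedule(): consume the request and count it. */
static void model_schedule(struct model_thread *t)
{
    t->need_resched = false;
    t->schedules++;
}

/* Stands in for resched_task(): only records that a reschedule is wanted. */
static void model_resched_task(struct model_thread *t)
{
    t->need_resched = true;
}

static void model_preempt_disable(struct model_thread *t)
{
    t->preempt_count++;
}

/* Drop one nesting level; act on the flag only at the outermost enable. */
static void model_preempt_enable(struct model_thread *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
    if (t->preempt_count == 0 && t->need_resched)
        model_schedule(t);
}
```

With two nested disables, a reschedule requested in between is not acted on at the inner enable. It takes effect at the outer one.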

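If no interrupt or exception has occurred in the meantime, remember that entering the kernel from user mode is itself an exception or interrupt, so the same check is made on the way back to user mode and the reschedule happens there. From linux-2.6.21\arch\i386\kernel\entry.S: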

ENTRY(resume_userspace)
  DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
     # setting need_resched or sigpending
     # between sampling and the iret
movl TI_flags(%ebp), %ecx
andl $_TIF_WORK_MASK, %ecx # is there any work to be done on
     # int/exception return?
jne work_pending
jmp restore_all
END(ret_from_exception)

/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
(0x0000FFFF & ~(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SECCOMP | _TIF_SYSCALL_EMU))

In other words, apart from the flags listed above, any other flag set in the low 16 bits causes a jump to work_pending before the return to user mode:

work_pending:
testb $_TIF_NEED_RESCHED, %cl
jz work_notifysig
work_resched:
call schedule
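
The mask test and the branch in work_pending can be written out in C. This is a sketch for illustration only: the MODEL_TIF_* bit positions are assumptions (the real values are in the architecture's thread_info.h), and enum model_exit_path and model_resume_userspace are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, chosen for illustration only. */
#define MODEL_TIF_SYSCALL_TRACE (1u << 0)
#define MODEL_TIF_SIGPENDING    (1u << 2)
#define MODEL_TIF_NEED_RESCHED  (1u << 3)
#define MODEL_TIF_SYSCALL_EMU   (1u << 6)
#define MODEL_TIF_SYSCALL_AUDIT (1u << 7)
#define MODEL_TIF_SECCOMP       (1u << 8)

/* Same shape as _TIF_WORK_MASK: the low 16 bits minus the syscall-entry flags. */
#define MODEL_TIF_WORK_MASK \
    (0x0000FFFFu & ~(MODEL_TIF_SYSCALL_TRACE | MODEL_TIF_SYSCALL_AUDIT | \
                     MODEL_TIF_SECCOMP | MODEL_TIF_SYSCALL_EMU))

/* A reschedule request must count as pending work, or it would be ignored. */
static_assert((MODEL_TIF_WORK_MASK & MODEL_TIF_NEED_RESCHED) != 0,
              "need_resched must be covered by the work mask");

/* Hypothetical labels for the three paths taken by the assembly. */
enum model_exit_path { RESTORE_ALL, WORK_RESCHED, WORK_NOTIFYSIG };

static enum model_exit_path model_resume_userspace(uint32_t ti_flags)
{
    if ((ti_flags & MODEL_TIF_WORK_MASK) == 0)
        return RESTORE_ALL;       /* jmp restore_all */
    if (ti_flags & MODEL_TIF_NEED_RESCHED)
        return WORK_RESCHED;      /* call schedule */
    return WORK_NOTIFYSIG;        /* jz work_notifysig */
}
```

A thread with only a syscall-tracing flag set returns straight to user mode, while one with the reschedule flag set goes through schedule first.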

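This is where the reschedule takes place. So once a reschedule has been requested, it may happen after an exception or interrupt inside the kernel, or at a preempt_enable in the kernel, or at the latest on return to user mode.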

III. Choosing the next task

array = rq->active;
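/*
 * If the active array is empty, swap the active and expired arrays.
 * This is what provides time sharing: tasks that have used up their
 * time slice wait in the expired array until the active array has drained.
 */
if (unlikely(!array->nr_active)) {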
  /*
   * Switch the active and expired arrays.
   */
  schedstat_inc(rq, sched_switch);
  rq->active = rq->expired;
  rq->expired = array;
  array = rq->active;
  rq->expired_timestamp = 0;
  rq->best_expired_prio = MAX_PRIO;
}

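idx = sched_find_first_bit(array->bitmap); /* the highest priority that has a queued task */
queue = array->queue + idx; /* head of that priority's queue */
next = list_entry(queue->next, struct task_struct, run_list); /* first element of the queue; the list is linked through run_list */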
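
The array switch and the search for the first non-empty priority can be sketched in user space as follows. This is a simplified model, not the kernel code: it uses only 8 priority levels, keeps a per-priority counter instead of a list and a bitmap, and all names (model_array, model_rq, model_add, model_pick_prio) are hypothetical.

```c
#include <assert.h>

#define MODEL_NPRIO 8  /* hypothetical, kept small for illustration */

/* Hypothetical stand-in for struct prio_array. */
struct model_array {
    unsigned int nr_active;
    unsigned int count[MODEL_NPRIO];  /* tasks queued at each priority */
};

/* Hypothetical runqueue with the two arrays and the two roles. */
struct model_rq {
    struct model_array arrays[2];
    struct model_array *active, *expired;
};

static void model_add(struct model_array *a, int prio)
{
    assert(prio >= 0 && prio < MODEL_NPRIO);
    a->count[prio]++;
    a->nr_active++;
}

/*
 * Returns the priority that would be served next, or -1 if both arrays are
 * empty (the kernel handles that case earlier by picking the idle task).
 */
static int model_pick_prio(struct model_rq *rq)
{
    struct model_array *array = rq->active;

    if (array->nr_active == 0) {
        /* Swap roles: the expired tasks become runnable again. */
        rq->active = rq->expired;
        rq->expired = array;
        array = rq->active;
    }
    for (int idx = 0; idx < MODEL_NPRIO; idx++)
        if (array->count[idx] > 0)
            return idx;  /* lowest index = highest priority */
    return -1;
}
```

The swap only exchanges two pointers, so it costs the same no matter how many tasks are waiting in the expired array.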

IV. The scheduler in 2.6.37

/*
 * Pick up the highest-prio task:
 */
static inline struct task_struct *
pick_next_task(struct rq *rq)
{
    const struct sched_class *class;
    struct task_struct *p;

    /*
     * Optimization: we know that if all tasks are in
     * the fair class we can call that function directly:
     */
    /* A simple optimization: if every runnable task is a CFS task, call
     * fair_sched_class directly. This is the common case on desktop systems. */
    if (likely(rq->nr_running == rq->cfs.nr_running)) {
        p = fair_sched_class.pick_next_task(rq);
        if (likely(p))
            return p;
    }

    /* Otherwise ask each scheduling class in turn, which guarantees that
     * real-time tasks are always scheduled before fair tasks. */
    for_each_class(class) {
        p = class->pick_next_task(rq);
        if (p)
            return p;
    }

    BUG(); /* the idle class will always have a runnable task */
}

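This is how the scheduling classes are walked in priority order. Note that it is a loop: if the first class returns nothing, the next class is asked, so a higher-priority class never needs to call a lower-priority class itself.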

#define sched_class_highest (&stop_sched_class)
#define for_each_class(class) \
for (class = sched_class_highest; class; class = class->next)
/* so stop_sched_class is the highest-priority scheduling class */

static const struct sched_class stop_sched_class = {
.next = &rt_sched_class,

static const struct sched_class rt_sched_class = {
.next = &fair_sched_class,

static const struct sched_class fair_sched_class = {
.next = &idle_sched_class,

static const struct sched_class idle_sched_class = {
/* .next is NULL */
/* no enqueue/yield_task for idle tasks */

This is a statically defined linked list.
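
The chain and the walk over it can be reproduced in a few lines of user-space C. This is a minimal sketch under simplifying assumptions: struct model_rq holds only per-class counters, each class answers whether it has a runnable task instead of returning one, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runqueue: just the number of runnable tasks per class. */
struct model_rq {
    unsigned int nr_stop, nr_rt, nr_fair;
};

/* Hypothetical stand-in for struct sched_class. */
struct model_class {
    const char *name;
    const struct model_class *next;  /* next lower-priority class */
    int (*has_task)(const struct model_rq *rq);
};

static int stop_has_task(const struct model_rq *rq) { return rq->nr_stop > 0; }
static int rt_has_task(const struct model_rq *rq)   { return rq->nr_rt > 0; }
static int fair_has_task(const struct model_rq *rq) { return rq->nr_fair > 0; }
static int idle_has_task(const struct model_rq *rq) { (void)rq; return 1; }

/* Defined lowest priority first so that each .next can point at its successor. */
static const struct model_class model_idle = { "idle", NULL,        idle_has_task };
static const struct model_class model_fair = { "fair", &model_idle, fair_has_task };
static const struct model_class model_rt   = { "rt",   &model_fair, rt_has_task };
static const struct model_class model_stop = { "stop", &model_rt,   stop_has_task };

/* Walk from the highest class down; the first class with work wins. */
static const struct model_class *model_pick_class(const struct model_rq *rq)
{
    for (const struct model_class *c = &model_stop; c != NULL; c = c->next)
        if (c->has_task(rq))
            return c;
    assert(!"the idle class always has a runnable task");
    return NULL;
}
```

Because the order lives entirely in the .next pointers, no class needs to know which classes rank below it.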

V. Time slices and real-time tasks

static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
{
    update_curr_rt(rq);

    watchdog(rq, p);

    /*
     * RR tasks need a special form of timeslice management.
     * FIFO tasks have no timeslices.
     */
    if (p->policy != SCHED_RR)
        return;

    if (--p->rt.time_slice)
        return;

    p->rt.time_slice = DEF_TIMESLICE;

    /*
     * Requeue to the end of queue if we are not the only element
     * on the queue:
     */
    if (p->rt.run_list.prev != p->rt.run_list.next) {
        requeue_task_rt(rq, p, 0);
        set_tsk_need_resched(p);
    }
}

Because the fair class allocates the CPU by time, its check at each timer interrupt is made in units of time:

static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
    unsigned long ideal_runtime, delta_exec;

    ideal_runtime = sched_slice(cfs_rq, curr);
    /* the real time the task has run since it was last picked */
    delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
    if (delta_exec > ideal_runtime) {
        resched_task(rq_of(cfs_rq)->curr);
        /*
         * The current task ran long enough, ensure it doesn't get
         * re-elected due to buddy favours.
         */
        clear_buddies(cfs_rq, curr);
        return;
    }
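
The time-based test can be tried out with concrete numbers. The sketch below is a simplification, not the kernel code: model_slice_ns assumes the ideal runtime is the task's weight-proportional share of one scheduling period, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed simplification of sched_slice: a task's ideal runtime is its
 * weight-proportional share of one scheduling period.
 */
static uint64_t model_slice_ns(uint64_t period_ns, uint64_t weight,
                               uint64_t total_weight)
{
    assert(total_weight > 0 && weight <= total_weight);
    return period_ns * weight / total_weight;
}

/* True when the task has run longer than its ideal runtime since being picked. */
static bool model_check_preempt_tick(uint64_t sum_exec_ns,
                                     uint64_t prev_sum_exec_ns,
                                     uint64_t ideal_ns)
{
    uint64_t delta_exec = sum_exec_ns - prev_sum_exec_ns;

    return delta_exec > ideal_ns;
}
```

With a 6 ms period shared by three tasks of equal weight, each task's ideal runtime is 2 ms, so a task that has run 2.5 ms since it was picked is marked for rescheduling and one that has run 1.5 ms is not.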

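A real-time task, by contrast, cannot be preempted by fair-class tasks, so its round robin is driven by counting timer ticks, as task_tick_rt shows: each timer interrupt uses up one tick of the time slice. This round robin applies only among tasks of the same priority. Only when an RR thread has used up its time slice does it hand the CPU to other real-time tasks at that priority, including non-RR (SCHED_FIFO) tasks of the same priority.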

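/*
 * default timeslice is 100 msecs (used only for SCHED_RR tasks).
 * Timeslices get refilled after they expire.
 */
#define DEF_TIMESLICE (100 * HZ / 1000)

The value is a count of timer ticks, not a frequency: with HZ = 1000 it is 100 ticks and with HZ = 250 it is 25 ticks, which is 100 ms in both cases.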
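
The arithmetic, and the way an RR task burns through its slice one tick at a time, can be checked with a short user-space sketch. The names model_def_timeslice, struct model_rr_task and model_rr_tick are hypothetical, and the requeue step is reduced to a return value.

```c
#include <assert.h>

/* Same formula as DEF_TIMESLICE, with HZ passed in as a parameter. */
static unsigned int model_def_timeslice(unsigned int hz)
{
    return 100 * hz / 1000;
}

/* Hypothetical RR task: only the remaining slice, counted in ticks. */
struct model_rr_task {
    unsigned int time_slice;
};

/*
 * One timer tick for a SCHED_RR task. Returns 1 when the slice ran out and
 * was refilled (the point where the kernel would requeue the task if it is
 * not alone on its queue), and 0 otherwise.
 */
static int model_rr_tick(struct model_rr_task *p, unsigned int hz)
{
    assert(p->time_slice > 0);
    if (--p->time_slice)
        return 0;
    p->time_slice = model_def_timeslice(hz);
    return 1;
}
```

With HZ = 100 the slice is 10 ticks, so the task gives way on every tenth timer interrupt.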