Orange Extension Widget Development (Part 4): Channels and Tokens

Channels and Tokens

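The data sampler widget from the previous part used its data channels in a simple, direct way: it was designed to receive data from one widget, process it, and send a token out through another channel, as in the following figure: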

_images/schemawithdatasamplerB.png

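There is more to the management of channels and tokens than this. Here we give an overview of the more complex cases; understanding them will help you build more complex widgets that handle multiple inputs and multiple outputs.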

Multi-Input Channels

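Simply put, a "multi-input" channel is one that can be connected to the output channels of several widgets. Data from several sources can then be fed into a single widget for processing, much like a function that takes several arguments.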

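Say we want to build a widget that takes a data set and tests several predictive models on it. The widget must have an input data channel, which we already know how to handle. The difference is that we also want to connect several learner widgets to it, as in the schema below: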

_images/learningcurve.png

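We will look at how to define the channels of the learning curve widget and how to manage its multiple input tokens. First, a brief note: the learning curve widget tests machine learning algorithms to determine how well they perform on a given training set. To do so, one draws a subset of the data, trains a classifier on it, and tests it on the remaining data. To do this in a fair way (by Salzberg, 1997), we perform k-fold cross validation but use only a proportion of the data for training. The output of the widget looks like this: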

_images/learningcurve-output.png
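The procedure described above (k-fold cross validation in which only a proportion of the training data is used) can be sketched in a few lines of plain Python. This is a minimal illustration, not the widget's implementation: `majority_learner` and `learning_curve` are hypothetical names, and the "learner" is a trivial stand-in that always predicts the most common training label.

```python
import random
from collections import Counter


def majority_learner(labels):
    """Hypothetical stand-in learner: always predicts the most common label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda _x: most_common


def learning_curve(xs, ys, proportions, k=5, seed=0):
    """Return the mean k-fold accuracy for each training proportion."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    curve = []
    for p in proportions:
        accuracies = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            # Train on only a proportion p of the data outside the test fold.
            n_train = max(1, int(round(p * len(train_idx))))
            model = majority_learner([ys[j] for j in train_idx[:n_train]])
            correct = sum(model(xs[j]) == ys[j] for j in test_idx)
            accuracies.append(correct / len(test_idx))
        curve.append(sum(accuracies) / k)
    return curve


xs = list(range(40))
ys = ["a"] * 30 + ["b"] * 10
curve = learning_curve(xs, ys, proportions=[0.25, 0.5, 1.0])
```

Each entry of `curve` corresponds to one point of the learning curve for a single learner; the widget shows one such column per connected learner.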

Now, back to channels and tokens. The channels of our widget are defined like this:

    inputs = [("Data", Orange.data.Table, "set_dataset"),
              ("Learner", Orange.classification.Learner, "set_learner",
               widget.Multiple + widget.Default)]

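You may have noticed that this is mostly the same as for the widgets defined earlier, except for widget.Multiple + widget.Default (from the Orange.widgets.widget namespace), which appears as the last item in the definition of the "Learner" channel. It says that this is a multi-input channel and that it is the default input for its type. If the flag is not specified, widget.Single is used by default, which means that the channel can receive input from only one widget and is not the default channel for its type (more on default channels later).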

Note: the Default flag is used here for illustration. Since "Learner" is the only channel of the Orange.classification.Learner type, it is also the default.

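In Orange, a token is sent together with the id of the widget that sends it. Declaring a multi-input channel simply tells Orange to pass the sending widget's id along with the token; these are the two arguments with which the receiving function is called. For our "Learner" channel the receiving function is set_learner(), which looks like this: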

    def set_learner(self, learner, id):
        """Set the input learner for channel id."""
        if id in self.learners:
            if learner is None:
                # remove a learner and corresponding results
                del self.learners[id]
                del self.results[id]
                del self.curves[id]
            else:
                # update/replace a learner on a previously connected link
                self.learners[id] = learner
                # invalidate the cross-validation results and curve scores
                # (will be computed/updated in `_update`)
                self.results[id] = None
                self.curves[id] = None
        else:
            if learner is not None:
                self.learners[id] = learner
                # initialize the cross-validation results and curve scores
                # (will be computed/updated in `_update`)
                self.results[id] = None
                self.curves[id] = None

        if len(self.learners):
            self.infob.setText("%d learners on input." % len(self.learners))
        else:
            self.infob.setText("No learners.")

        # enable the commit button only when at least one learner is connected
        self.commitBtn.setEnabled(bool(self.learners))

OK, this looks a bit long and complicated. But be patient! Learning curve is not the simplest of widgets.

The function contains some extra code to manage the information it handles. To understand the signals, you only need to understand the following mechanism. We store the learners (objects that learn from data) in an OrderedDict, self.learners, which maps each input id to its input value (the input learner itself). It is an OrderedDict because the order of the input learners matters: we want to maintain a consistent column order in the table view of the learning curve point scores.

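The function above first checks whether the channel id is already in self.learners. If it is, the function either deletes the corresponding entries when learner is None (remember: receiving a None value means that the link was removed or closed), or replaces the learner and invalidates the cross validation results and curve points for that channel id, marking them for update in handleNewSignals(). Receiving a learner for a new channel id is handled in a similar way.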

In other words, there are three cases. An empty learner (None) means that the link with the sending widget was removed, so we remove that learner and its results from our records. A non-empty learner is either a new learner (say, from a widget we have just linked to our learning curve widget) or an updated version of a previously sent learner. In the latter case its id is already in self.learners, and we replace the stored learner and invalidate its results. For a new learner the case is somewhat simpler: we just add the learner, with placeholders for its results and learning curve, to the corresponding variables.
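The same bookkeeping can be tried outside Orange with a small stand-in class. This is a condensed sketch, not the widget's code: `LearnerInputs` is a hypothetical name, strings stand in for learner objects, and the GUI updates are left out.

```python
from collections import OrderedDict


class LearnerInputs:
    """Hypothetical stand-in for the widget state touched by set_learner()."""

    def __init__(self):
        self.learners = OrderedDict()  # input id -> learner
        self.results = OrderedDict()   # input id -> results (None = not computed)

    def set_learner(self, learner, id):
        if learner is None:
            # Link removed: forget everything stored for this id.
            self.learners.pop(id, None)
            self.results.pop(id, None)
        else:
            # New or replaced learner: store it and mark results as stale.
            self.learners[id] = learner
            self.results[id] = None


inputs = LearnerInputs()
inputs.set_learner("tree", id=1)
inputs.set_learner("bayes", id=2)
inputs.set_learner("tree-v2", id=1)  # an update keeps the original position
order_after_update = list(inputs.learners.items())
inputs.set_learner(None, id=2)       # the link with id 2 was removed
```

Assigning to an existing key of an OrderedDict keeps its position, which is what keeps the column order in the score table stable when a learner is updated.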

The function that handles learners, shown above, is the most complicated function in our learning curve widget. The rest of the widget does some simple GUI management and calls learning curve routines from the testing module and performance scoring functions from the evaluation module.

Note that in this widget the evaluation (k-fold cross validation) is carried out just once for a given learner, data set and evaluation parameters, and the scores are then derived from the class probability estimates obtained from the evaluation procedure. This means that switching from one scoring function to another (and displaying the result in the table) takes only a split second. To see the rest of the widget, check out its code.
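Why switching the scoring function is cheap can be shown with a small sketch: once the class probability estimates are stored, every score is just a function of those stored values. The data and the names `accuracy`, `brier_score` and `scorers` below are made up for illustration and are not part of Orange.

```python
# Hypothetical cached output of one cross validation run:
# true class index and predicted class probabilities per test instance.
actual = [0, 1, 1, 0]
probabilities = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.9, 0.1]]


def accuracy(actual, probabilities):
    """Share of instances whose most probable class is the true class."""
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    return sum(a == b for a, b in zip(actual, predicted)) / len(actual)


def brier_score(actual, probabilities):
    """Mean squared difference between probabilities and the true class."""
    total = 0.0
    for a, p in zip(actual, probabilities):
        total += sum((p_c - (1.0 if c == a else 0.0)) ** 2
                     for c, p_c in enumerate(p))
    return total / len(actual)


# Switching scorers only re-reads the cached estimates; nothing is retrained.
scorers = {"accuracy": accuracy, "brier": brier_score}
scores = {name: fn(actual, probabilities) for name, fn in scorers.items()}
```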

Using Several Output Channels

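There is nothing new here; we just need a widget with several output channels to demonstrate default channels (used below). For this purpose, we modify the data sampler widget built earlier so that the sampled data goes out through one channel and the rest of the data through another. The corresponding channel definition is: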

    outputs = [("Sampled Data", Orange.data.Table),
               ("Other Data", Orange.data.Table)]

We use a third variant of the data sampler widget. The changes are mainly in the functions selection() and commit():

    def selection(self):
        if self.dataset is None:
            return

        n_selected = int(numpy.ceil(len(self.dataset) * self.proportion / 100.))
        indices = numpy.random.permutation(len(self.dataset))
        indices_sample = indices[:n_selected]
        indices_other = indices[n_selected:]
        self.sample = self.dataset[indices_sample]
        self.otherdata = self.dataset[indices_other]
        self.infob.setText('%d sampled instances' % len(self.sample))

    def commit(self):
        self.send("Sampled Data", self.sample)
        self.send("Other Data", self.otherdata)
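The split performed in selection() can be checked in isolation. The sketch below mirrors its logic with the standard library only (the widget itself uses numpy and an Orange table); `split_sample` is a hypothetical helper and a plain list stands in for the data set.

```python
import math
import random


def split_sample(rows, proportion, seed=None):
    """Split rows into (sample, other), with `proportion` percent sampled."""
    n_selected = int(math.ceil(len(rows) * proportion / 100.0))
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # random permutation of row indices
    sample = [rows[i] for i in indices[:n_selected]]
    other = [rows[i] for i in indices[n_selected:]]
    return sample, other


sample, other = split_sample(list(range(10)), proportion=30, seed=1)
```

The two parts are disjoint and together cover the whole data set, which is what the "Sampled Data" and "Other Data" channels send out.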

If a widget has several channels of the same type, Orange Canvas opens a window asking the user which channel to connect to. So if we connect the Data Sampler (C) widget to a Data Table widget, as follows:

_images/datasampler-totable.png

we get the following window, which asks the user which channels to connect:

_images/datasampler-channelquerry.png

Default Channels (When Using Input Channels of the Same Type)

Now, let's say we want to extend our learning curve widget such that it does the learning the same way as before, but can, provided that such a data set is defined, always test the learners on the same external data set. That is, besides the training data set, we need another channel of the same type, used for the testing data set. Notice, however, that most often we will only provide the training data set, so we would not like to be bothered (in Orange Canvas) with a dialog asking which channel to connect to; the training data set channel will be the default one.

When listing input channels of the same type, the default channel carries a special flag in the channel specification list. So for our new learning curve widget, the channel specification is

    inputs = [("Data", Orange.data.Table, "set_dataset", widget.Default),
              ("Test Data", Orange.data.Table, "set_testdataset"),
              ("Learner", Orange.classification.Learner, "set_learner",
               widget.Multiple + widget.Default)]

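The training data channel ("Data" in the specification above) is a single-token channel and is the default one for its type (the widget.Default flag in the fourth position). Note that flags can be added (or OR-ed) together, so Default + Multiple is a valid flag. To test that this works, connect a File widget to the learning curve widget and, well, nothing special happens: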

_images/file-to-learningcurveb.png

That is, no window asking which channel to connect to opens, because the default training data channel was selected.
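How combined flags and a default channel could be resolved can be modelled with a few lines of Python. This is only an illustrative model under stated assumptions: `ChannelFlag`, its bit values and `pick_input` are hypothetical and are not taken from Orange's source, and the real resolution logic in Orange Canvas may differ.

```python
from enum import IntFlag


class ChannelFlag(IntFlag):
    # Illustrative bit values only; not taken from Orange's source.
    SINGLE = 1
    MULTIPLE = 2
    DEFAULT = 4


def pick_input(inputs, data_type):
    """Return the channel to connect to, or None if the user must choose."""
    candidates = [(name, flags) for name, kind, flags in inputs
                  if kind == data_type]
    if len(candidates) == 1:
        return candidates[0][0]
    defaults = [name for name, flags in candidates
                if flags & ChannelFlag.DEFAULT]
    return defaults[0] if len(defaults) == 1 else None


inputs = [
    ("Data", "Table", ChannelFlag.SINGLE | ChannelFlag.DEFAULT),
    ("Test Data", "Table", ChannelFlag.SINGLE),
    ("Learner", "Learner", ChannelFlag.MULTIPLE | ChannelFlag.DEFAULT),
]
chosen = pick_input(inputs, "Table")
```

Because each flag occupies its own bit, adding two distinct flags gives the same value as OR-ing them, which is why Default + Multiple is a valid combination.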
