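Learning to use WEKA in your Java code
Two pieces of code are covered here: (1) filtering a dataset and (2) classifying with a J48 decision tree. The example further down does not split the dataset; it uses the training set itself as the test set, which goes against standard data mining practice. The code is meant only as an exercise in using WEKA from a Java application.
The study material comes from: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code
Part 1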
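A filter has two different properties: it is either supervised or unsupervised (whether or not it takes the class attribute into account), and either attribute-based or instance-based (for example, removing a certain attribute, or removing instances that meet a certain condition).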
Most filters implement the OptionHandler interface, which means you can set the options via a String array, rather than setting them each manually via set-methods.
For example, if you want to remove the first attribute of a dataset, you need this filter
weka.filters.unsupervised.attribute.Remove
with this option
-R 1
If you have an Instances object, called data, you can create and apply the filter like this:
    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;
    ...
    String[] options = new String[2];
    options[0] = "-R";                                    // "range"
    options[1] = "1";                                     // first attribute
    Remove remove = new Remove();                         // new instance of filter
    remove.setOptions(options);                           // set options
    remove.setInputFormat(data);                          // inform filter about dataset **AFTER** setting options
    Instances newData = Filter.useFilter(data, remove);   // apply filter
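The same filter can also be configured through its set-methods rather than an option array. The following is a minimal sketch, assuming WEKA 3.7 or later (for DenseInstance and the List-based Instances constructor); the class name, the toy dataset, and the attribute names a, b, c are made up for illustration so the example runs without any file.

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveFirstAttributeSketch {

    // Builds a tiny in-memory dataset with three numeric attributes (hypothetical names).
    static Instances buildToyData() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("a"));
        attrs.add(new Attribute("b"));
        attrs.add(new Attribute("c"));
        Instances data = new Instances("toy", attrs, 2);
        data.add(new DenseInstance(1.0, new double[] {1.0, 2.0, 3.0}));
        data.add(new DenseInstance(1.0, new double[] {4.0, 5.0, 6.0}));
        return data;
    }

    // Removes the first attribute; setAttributeIndices("1") plays the role of "-R 1".
    static Instances removeFirst(Instances data) throws Exception {
        Remove remove = new Remove();
        remove.setAttributeIndices("1");
        remove.setInputFormat(data);   // still called after the filter is configured
        return Filter.useFilter(data, remove);
    }

    public static void main(String[] args) throws Exception {
        Instances newData = removeFirst(buildToyData());
        System.out.println(newData.numAttributes());     // 2 attributes remain
        System.out.println(newData.attribute(0).name()); // the first one is now "b"
    }
}
```

Whether the option array or the set-methods read better is largely a matter of taste; the option array has the advantage of matching what the WEKA GUI and command line show.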
Part 2
In case you have a dedicated test set, you can train the classifier and then evaluate it on this test set. In the following example, a J48 is instantiated, trained and then evaluated. Some statistics are printed to stdout:
    import weka.core.Instances;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    ...
    Instances train = ...   // from somewhere
    Instances test = ...    // from somewhere
    // train classifier
    Classifier cls = new J48();
    cls.buildClassifier(train);
    // evaluate classifier and print some statistics
    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(cls, test);
    System.out.println(eval.toSummaryString("\nResults\n======\n", false));
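When only one dataset is available, train and test can be obtained from a random percentage split. The sketch below is one possible way to do that, assuming WEKA 3.7 or later; the class name, the split helper, the 66% fraction, the seed, and the toy dataset (one numeric attribute x with a nominal label) are all illustrative assumptions, not part of the original tutorial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class HoldoutSplitSketch {

    // Hypothetical toy data: label is "low" when x < 50 and "high" otherwise.
    static Instances buildToyData() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("x"));
        attrs.add(new Attribute("label", Arrays.asList("low", "high")));
        Instances data = new Instances("toy", attrs, 100);
        data.setClassIndex(1);
        for (int i = 0; i < 100; i++) {
            data.add(new DenseInstance(1.0, new double[] {i, i < 50 ? 0 : 1}));
        }
        return data;
    }

    // Shuffles a copy of the data and returns {train, test}.
    static Instances[] split(Instances data, double trainFraction, long seed) {
        Instances shuffled = new Instances(data);
        shuffled.randomize(new Random(seed));
        int trainSize = (int) Math.round(shuffled.numInstances() * trainFraction);
        int testSize = shuffled.numInstances() - trainSize;
        Instances train = new Instances(shuffled, 0, trainSize);
        Instances test = new Instances(shuffled, trainSize, testSize);
        return new Instances[] {train, test};
    }

    public static void main(String[] args) throws Exception {
        Instances[] parts = split(buildToyData(), 0.66, 1L);
        Classifier cls = new J48();
        cls.buildClassifier(parts[0]);            // train on the training part only
        Evaluation eval = new Evaluation(parts[0]);
        eval.evaluateModel(cls, parts[1]);        // evaluate on the held-out part
        System.out.println(eval.toSummaryString("\nResults\n======\n", false));
    }
}
```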
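Below is a small example of classification with WEKA, followed by the Java code that implements the same process.
Design a simple, low-cost sensor that can distinguish red wine from white wine.
Requirements:
The sensor must correctly classify at least 95% of the red and white wine samples. The sample dataset contains 6497 instances.
Dataset: download from www.technologyforge.net/Datasets
Experiment steps:
1. Data preprocessing: remove the attribute quality. Wine quality is not needed in this experiment; only the accuracy of classifying white versus red wine matters.
Select quality, then click Remove.
2. Run the J48 classifier with default settings to get a classification result that uses all attribute values.
As the figure below shows, the classification accuracy reaches 99.5998%, which is very high.
3. To meet the low-cost requirement, we want to reach the 95% classification accuracy using as few attributes as possible. This requires repeated experiments. Trial and error can proceed in either direction (adding attributes or removing them), and the process is fairly simple.
Attribute selection: the plots show how different attributes affect the classification result. Comparing them shows that the following two attributes are the most representative for separating white wine from red wine.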
[Figure: classification performance]
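In code, step 3 amounts to keeping a chosen subset of attributes before training. One way this might be sketched is with the Remove filter and its selection inverted, so the listed indices are kept instead of removed. The class name, the keepOnly helper, the toy dataset, and the attribute names a to d are hypothetical, and the indices are placeholders rather than the two attributes chosen in the experiment. WEKA 3.7 or later is assumed.

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class KeepAttributesSketch {

    // Hypothetical toy dataset with four numeric attributes named a, b, c, d.
    static Instances buildToyData() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        for (String name : new String[] {"a", "b", "c", "d"}) {
            attrs.add(new Attribute(name));
        }
        Instances data = new Instances("toy", attrs, 1);
        data.add(new DenseInstance(1.0, new double[] {1.0, 2.0, 3.0, 4.0}));
        return data;
    }

    // Keeps only the attributes named in the 1-based range string, e.g. "2,last".
    static Instances keepOnly(Instances data, String indicesToKeep) throws Exception {
        Remove keep = new Remove();
        keep.setAttributeIndices(indicesToKeep);
        keep.setInvertSelection(true);   // invert: keep the listed attributes, drop the rest
        keep.setInputFormat(data);
        return Filter.useFilter(data, keep);
    }

    public static void main(String[] args) throws Exception {
        Instances reduced = keepOnly(buildToyData(), "2,last");
        for (int i = 0; i < reduced.numAttributes(); i++) {
            System.out.println(reduced.attribute(i).name());
        }
    }
}
```

For a classification experiment, the class attribute would need to be among the kept indices; otherwise there is nothing left to predict.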
Repeat the above experiment in Java.
The Java code is as follows:
    import java.io.BufferedReader;
    import java.io.FileReader;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class RWClassifier {

        // Loads an ARFF file and removes the last attribute of the dataset.
        public static Instances getFileInstances(String filename) throws Exception {
            Instances data;
            try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
                data = new Instances(reader);
            }
            String[] options = new String[2];
            options[0] = "-R";                                    // "range"
            options[1] = Integer.toString(data.numAttributes());  // 1-based index of the last attribute
            Remove remove = new Remove();
            remove.setOptions(options);
            remove.setInputFormat(data);                          // after setting options
            return Filter.useFilter(data, remove);
        }

        public static void main(String[] args) throws Exception {
            // location of the data file
            Instances instances = getFileInstances("D://Weka_tutorial//WineQuality//RedWhiteWine.arff");
            // the class is the last remaining attribute
            instances.setClassIndex(instances.numAttributes() - 1);
            J48 j48 = new J48();
            j48.buildClassifier(instances);
            // evaluated on the training data itself, as noted at the top
            Evaluation eval = new Evaluation(instances);
            eval.evaluateModel(j48, instances);
            System.out.println(eval.toSummaryString("\nResults\n====\n", false));
        }
    }
Classification result using the full attribute set (identical to the result of the WEKA run):
Results
====
Correctly Classified Instances 6471 99.5998 %
Incorrectly Classified Instances 26 0.4002 %
Kappa statistic 0.9892
Mean absolute error 0.0076
Root mean squared error 0.0617
Relative absolute error 2.0491 %
Root relative squared error 14.3154 %
Total Number of Instances 6497
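Because the figures above are measured on the training data itself, they are optimistic. A less biased estimate could be obtained with cross-validation; the following is a sketch only, assuming WEKA 3.7 or later. The class name, the crossValidate helper, the fold count, the seed, and the toy dataset are illustrative assumptions, and no results for the wine data are claimed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class CrossValidationSketch {

    // Hypothetical toy data: label is "low" when x < 50 and "high" otherwise.
    static Instances buildToyData() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("x"));
        attrs.add(new Attribute("label", Arrays.asList("low", "high")));
        Instances data = new Instances("toy", attrs, 100);
        data.setClassIndex(1);
        for (int i = 0; i < 100; i++) {
            data.add(new DenseInstance(1.0, new double[] {i, i < 50 ? 0 : 1}));
        }
        return data;
    }

    // Runs k-fold cross-validation with an untrained J48; the class index must already be set.
    static Evaluation crossValidate(Instances data, int folds, long seed) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, folds, new Random(seed));
        return eval;
    }

    public static void main(String[] args) throws Exception {
        Evaluation eval = crossValidate(buildToyData(), 10, 1L);
        System.out.println(eval.toSummaryString("\nResults\n====\n", false));
    }
}
```

To apply this to the wine data, the Instances returned by RWClassifier.getFileInstances (with the class index set as in main) could be passed to crossValidate in place of the toy data.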