Datasets (Part 1)

Reposted from: http://www.cnblogs.com/bobomouse/archive/2007/05/26/760513.html

 

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Several useful sites for downloading test datasets

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at: http://www.research.att.com/~lewis/reuters21578.html

The following page lists datasets of various kinds:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, one more dataset is available, namely the one distributed with rainbow:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

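3. I have collected quite a few test datasets. Anyone writing a paper will surely need them, at the very least to check how well an algorithm performs.
Some of the links may no longer be reachable, but some of them should still work: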

Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html

A site on data mining for investment funds
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/ui

Reuters dataset
http://www.research.att.com/~lewis/reuters21578.html

Datasets of various kinds:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/ 
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification and Web data
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html


Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/

Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html

Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html


Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
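
The WEKA downloads above are jarfiles, and a jarfile is a ZIP archive, so its contents can be inspected without Java. Below is a minimal Python sketch, assuming the archive has already been saved locally and that the datasets inside are stored as `.arff` files. The function name `list_arff_members` is hypothetical and chosen only for illustration.

```python
import zipfile


def list_arff_members(jar_path):
    """Return the sorted names of the .arff entries inside a jar (ZIP) archive.

    jar_path may be a filesystem path or a binary file-like object.
    The .arff extension is an assumption about how the datasets are stored.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.lower().endswith(".arff")
        )
```

For example, `list_arff_members("datasets-UCI.jar")` would list the dataset files in a locally saved copy of the first archive. A single entry can then be read with `zipfile.ZipFile.open`.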

Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

 

Links provided by another person:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
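The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html

The following page lists datasets of various kinds:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, one more dataset is available, namely the one distributed with rainbow: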
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html


Download the Financial Data (~17.5M zipped file, ~67M unzipped data) 
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm


Dataset links from KDnuggets:
http://www.kdnuggets.com/datasets/index.html

Another very good resource is http://kdd.ics.uci.edu/, which hosts the following datasets (grouped by application area):

Direct Marketing 
  KDD CUP 1998 Data 

GIS 
  Forest CoverType 

Indexing 
  Corel Image Features 
  Pseudo Periodic Synthetic Time Series 

Intrusion Detection 
  KDD CUP 1999 Data 

Process Control 
  Synthetic Control Chart Time Series 

Recommendation Systems 
  Entree Chicago Recommendation Data 

Robots 
  Pioneer-1 Mobile Robot Data 
  Robot Execution Failures 

Sign Language Recognition 
  Australian Sign Language Data 
  High-quality Australian Sign Language Data 

Text Categorization 
  20 Newsgroups Data 
  Reuters-21578 Text Categorization Collection 
  NSF Research Awards Abstracts 1990-2003 

World Wide Web 
  Microsoft Anonymous Web Data 
  MSNBC Anonymous Web Data 
  Syskill Webert Web Data 

Here is one more, found on a foreign blog (the day before Children's Day):
http://www.fs.fed.us/fire/fuelman/
