|
Machine Learning and Data Mining - Datasets
|
Version |
Size / MD5 |
Description |
Download |
v1.0 Matlab V6 (04/06/2006) |
4.3Mb (8c68b1c84edb8d28ec80a6824c6b06f1) |
This is the Yale Face Database B in a Matlab-friendly format.
Please check with the original authors about what papers to cite before using this data.
It contains all the images scaled down to 30x40 pixels (we used this for clustering).
You might need Rar to unpack it.
Also included are the indices of the images that were used in the random 90/10 splits.
To try out the data and display image 999, type into Matlab:
load yale_facedatabase_B.mat
image(reshape(bigMatrix(999,:),30,40))
colormap(gray);
|
Download |
v1.0 Matlab V6 (03/05/2007) |
2.5Mb (21d58a4f7e63564b3e7c52ae2974458a) |
This archive contains all the datasets we used for our ICML 2005 paper
"Clustering through ranking on Manifolds" ready for use in Matlab.
Please ensure you cite the sources of the data (e.g. UCI control, USPS, 20 Newsgroups, the face-database).
Note that the uncompressed data exceeds 250 MB.
|
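The random 90/10 splits mentioned for the Yale Face Database B archive can be illustrated with a few lines of Matlab. This is only a sketch, not the code that produced the indices shipped in the archive. It assumes that each row of the data matrix holds one image (as with bigMatrix); the names X, trainIdx and testIdx are illustrative, and X is filled with stand-in random data so the snippet runs on its own.

```matlab
% Sketch: draw one random 90/10 split over the rows of a data matrix.
% Assumption: each row of X is one image; replace X with bigMatrix
% after loading the archive. All variable names here are illustrative.
X = rand(200, 30*40);             % stand-in data: 200 "images" of 30x40 pixels

n = size(X, 1);                   % number of images
perm = randperm(n);               % random ordering of the row indices
nTrain = round(0.9 * n);          % 90% of the rows go to the first part

trainIdx = perm(1:nTrain);        % indices of the 90% part
testIdx  = perm(nTrain+1:end);    % indices of the remaining 10%

Xtrain = X(trainIdx, :);
Xtest  = X(testIdx, :);
```

Calling randperm again gives a fresh split; to compare against published results, use the indices included in the archive instead of drawing new ones.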
All data are provided "AS IS" and for your personal research use only. All other rights reserved. No warranties
of any kind. Insert Disclaimer here.
Links
- Gunnar Raetsch's Benchmark Datasets
Various benchmark datasets prepared for Matlab (V6 and V7). Includes BreastCancer, Cards, chess, Circle, credit, Heart1, hepatitis, HouseVotes84, Ionosphere, liver, monks3, musk, PimaIndiansDiabetes, promotergene, ringnorm, Sonar, Spirals, threenorm, tictactoe, titanic and twonorm.
These are the benchmark data sets used in [RaeOnoMue01] and [MikRaeWesSchMue99]. They are well suited to classification tasks.
[RaeOnoMue01 Mirror] [MikRaeWesSchMue99 Mirror]
- Data from "Benchmarking Support Vector Machines" [MeyerLeischHornik02]. Useful for comparing your
classifier or regression algorithm against other algorithms (SVM, KNN, neural nets, bagging, boosting, random forests and others). Includes many data sets such as
liver, hepatitis, credit, monks3, HouseVotes84, Sonar, tictactoe, ringnorm,
musk, Spirals, threenorm, Ionosphere, BreastCancer, Circle, titanic, Heart1, chess, PimaIndiansDiabetes, promotergene, twonorm, Cards.
The data is stored as R images (.RData files). To export the training set from each of the 100 files
to a text file, you can use the following R command:
for (i in 1:100) {
  load(sprintf("%i.RData", i))                           # loads the objects stored in the image, including train
  write.table(train, file = sprintf("%itrain.txt", i))   # writes 1train.txt, 2train.txt, ...
}
- UCI Machine Learning Repository - Many useful datasets
- DMOZ - Data sets for machine learning
- A dataset for path-finding in images (Field Robotics)
- LETOR - package of benchmark data sets for LEarning TO Rank
- Delve Datasets
- KIN40K regression data set
- Clustering Data Sets (Mammals, Birth/Death Rates, New Haven Schools, Nutrients)
- UCI and UCI KDD data sets for classification and regression in Weka ARFF format. More
ARFF datasets, such as protein and biomedical data, drug design, Reuters21578 (ModApte split), and
various agricultural data sets, can be found here.
- Clustering data sets
- Fundamental Clustering Problem Suite (FCPS). Includes
clustering problems such as Hepta, Lsun, Tetra, Chainlink, Atom, EngyTime,
Target, TwoDiamonds, Wingnut and Golfball.
- RCV1 Text Categorization Test Collection
|