Machine Learning and Data Mining - Datasets
Version: v1.0 Matlab V6 (04/06/2006)
Size / MD5: 4.3 MB (8c68b1c84edb8d28ec80a6824c6b06f1)
[Download]
This is the Yale Face Database B in a Matlab-friendly format, with all images scaled down to 30x40 pixels (we used this version for clustering). Please check with the original authors which papers to cite before using this data.
The archive also includes the indices of the images used in the random 90/10 splits. You might need RAR to unpack it.
To try out the data and display image 999, type into Matlab:
load yale_facedatabase_B.mat               % loads bigMatrix, one image per row
image(reshape(bigMatrix(999,:),30,40))     % reshape row 999 into a 30x40 image
colormap(gray);
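If you work in Python rather than Matlab, the Matlab snippet for displaying image 999 can be approximated with NumPy and SciPy. This is a minimal sketch. It assumes that SciPy's `scipy.io.loadmat` is used, which reads Matlab V6 files, and that `bigMatrix` stores one image per row, as the Matlab snippet implies. The helper names `load_big_matrix` and `matlab_row_to_image` are hypothetical.

Two details matter when porting the snippet:
- Matlab indices start at 1, so image 999 is row 998 in NumPy.
- Matlab's `reshape` fills the result column by column, which corresponds to NumPy's `order="F"`.

```python
import numpy as np


def load_big_matrix(path="yale_facedatabase_B.mat"):
    """Load the bigMatrix variable from the Matlab V6 file (requires SciPy)."""
    # Imported here so the indexing helper below works without SciPy installed.
    from scipy.io import loadmat

    # loadmat returns a dict mapping Matlab variable names to NumPy arrays.
    return loadmat(path)["bigMatrix"]


def matlab_row_to_image(big_matrix, matlab_row, rows=30, cols=40):
    """Return image `matlab_row` (1-based, as in Matlab) as a rows x cols array.

    Mirrors Matlab's reshape(bigMatrix(matlab_row,:), rows, cols): Matlab
    reshapes in column-major order, hence order="F" here.
    """
    flat = np.asarray(big_matrix)[matlab_row - 1]  # convert 1-based to 0-based
    return flat.reshape((rows, cols), order="F")
```

Example: `img = matlab_row_to_image(load_big_matrix(), 999)` followed by matplotlib's `plt.imshow(img, cmap="gray")`. By default `imshow` rescales intensities to the data range, so it behaves more like Matlab's `imagesc` than `image`.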
Version: v1.0 Matlab V6 (03/05/2007)
Size / MD5: 2.5 MB (21d58a4f7e63564b3e7c52ae2974458a)
[Download]
This archive contains all the datasets we used for our ICML 2005 paper "Clustering through ranking on Manifolds", ready for use in Matlab. Please make sure to cite the original sources of the data (e.g. UCI control, USPS, 20 Newsgroups, the face database). Note that the uncompressed data is larger than 250 MB.
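The Size / MD5 entries above let you check that an archive downloaded intact. Below is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical, and the file is read in chunks so large archives do not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Compare a file's MD5 with a checksum copied from the listing."""
    # Normalize whitespace and case of the copied checksum before comparing.
    return md5_of_file(path) == expected_hex.strip().lower()
```

Example, with a hypothetical archive name: `matches_published_md5("yale_facedatabase_B.rar", "8c68b1c84edb8d28ec80a6824c6b06f1")`. On the command line, `md5sum` (Linux) or `md5` (macOS) gives the same digest.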
All data are provided "AS IS" for your personal research use only. All other rights reserved. No warranties of any kind.
Links
- Gunnar Raetsch's Benchmark Datasets
Various benchmark datasets prepared for Matlab (V6 and V7). Includes BreastCancer, Cards, chess, Circle, credit, Heart1, hepatitis, HouseVotes84, Ionosphere, liver, monks3, musk, PimaIndiansDiabetes, promotergene, ringnorm, Sonar, Spirals, threenorm, tictactoe, titanic and twonorm.
These are the benchmark data sets used in [RaeOnoMue01] and [MikRaeWesSchMue99], and they are well suited to evaluating classifiers.
[RaeOnoMue01 Mirror] [MikRaeWesSchMue99 Mirror]
- Data from "Benchmarking Support Vector Machines" [MeyerLeischHornik02]. Useful for comparing your
classifier or regression algorithm against other methods (SVM, KNN, neural nets, bagging, boosting, random forests and others). Includes data sets such as
liver, hepatitis, credit, monks3, HouseVotes84, Sonar, tictactoe, ringnorm,
musk, Spirals, threenorm, Ionosphere, BreastCancer, Circle, titanic, Heart1, chess, PimaIndiansDiabetes, promotergene, twonorm, Cards.
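The data is stored as R workspace images (.RData files). To write the training set from each of the 100 files to a text file, run in R:
for (i in 1:100) {
  load(sprintf("%i.RData", i))                         # restores the objects saved in i.RData, including train
  write.table(train, file = sprintf("%itrain.txt", i))  # writes 1train.txt, 2train.txt, ...
}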
- UCI Machine Learning Repository - Many useful datasets
- DMOZ - Data sets for machine learning
- Clustering data sets
- A dataset for path-finding in images (Field Robotics)
- LETOR - package of benchmark data sets for LEarning TO Rank
- Delve Datasets
- KIN40K regression data set
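The R command in the benchmarking entry writes text files in R's default `write.table` layout:
- values are separated by spaces;
- the first line holds the column names;
- every data line starts with a quoted row name.

Below is a minimal sketch that reads such a file with Python's standard `csv` module. It assumes the `write.table` defaults were left unchanged. It returns every value as a string and does not handle backslash-escaped quotes inside fields. The function name `read_r_write_table` is hypothetical.

```python
import csv


def read_r_write_table(path):
    """Read a file written by R's write.table with default settings.

    Returns a list of dicts mapping column name -> value (as a string).
    The leading row names ("1", "2", ...) are dropped.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter=" ", quotechar='"')
        header = next(reader)  # column names only; no entry for row names
        rows = []
        for fields in reader:
            values = fields[1:]  # skip the row name in the first field
            rows.append(dict(zip(header, values)))
    return rows
```

Converting the values to numbers is left to the caller, because some columns (for example class labels) are factors that R writes as quoted strings.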