Dbscan
Latest revision as of 12:05, 3 January 2015
Purpose
Density-based automatic sample clustering.
Synopsis
- [cls,eps] = dbscan(data,minpts,eps)
Description
DBSCAN automatically identifies clusters in data (or scores) using a density-based algorithm. Samples that lie within an "acceptable" distance (eps) of one another are agglomerated into a single class. Samples that are too far from any cluster and do not have the minimum number (minpts) of unassigned neighbors are labeled as "noise", although such a sample may later be reassigned to a class if one is identified acceptably close by.
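The procedure described above can be illustrated with a minimal Python sketch of the classical DBSCAN algorithm. This is not the toolbox's implementation: the function names `dbscan_sketch` and `region_query` are hypothetical, Euclidean distance is assumed, a point's neighborhood is assumed to include the point itself, and eps must be supplied explicitly because the toolbox's rule for choosing a default eps is not reproduced here.

```python
import math

NOISE = -1        # noise label, matching the cls convention described in Outputs
UNVISITED = None  # internal marker for samples not yet examined


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan_sketch(points, min_pts=2, eps=1.0):
    """Label each point with a cluster number (1, 2, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Too few neighbors: provisionally noise (may be reassigned later).
            labels[i] = NOISE
            continue
        # Point i is dense enough to start a new cluster.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                # Former noise point close to this cluster: reassign it.
                labels[j] = cluster
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself dense, so the cluster grows through it.
                queue.extend(j_neighbors)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
    print(dbscan_sketch(pts, min_pts=2, eps=1.5))
```

In this sketch the isolated sample at (50, 50) has no neighbor within eps and therefore stays labeled as noise, while the two tight groups each receive their own class number.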
M. Ester, H.-P. Kriegel, J. Sander, X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in E. Simoudis, J. Han, U. Fayyad, Eds., Proceedings of the 2nd Int'l Conf. on Knowledge Discovery and Data Mining (KDD-96), AAAI Press, Portland, OR, USA, 226-231, 1996. (Available at: http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/KDD-96.final.frame.pdf)
M. Daszykowski, B. Walczak, D. L. Massart, "Looking for natural patterns in data: Part 1. Density-based approach," Chemom Intell Lab Sys, 56, 83-92 (2001).
M. Daszykowski, B. Walczak, D. L. Massart, "Representative subset selection," Analytica Chimica Acta, 468, 91-103 (2002).
http://en.wikipedia.org/wiki/DBSCAN
Inputs
- data = A double or dataset object.
Optional Inputs
- minpts = The minimum number of unclassed points needed to form a "class" (default = 2).
- eps = The largest distance between samples considered to be related (can also be considered the minimum distance between unrelated classes). Default: determined by the range and number of data points available.
Outputs
- cls = Numerical classes for each of the m samples in the original data. Samples identified as noise have a cls value of -1. Samples excluded in the original dataset are always returned as class 0 (zero, unknown).
- eps = The eps value used (useful if no eps value was supplied by user.)
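As an illustration of how the cls output might be interpreted, the following Python sketch tallies a class vector using the conventions stated above (-1 for noise, 0 for excluded samples). The helper name `summarize_classes` is hypothetical, and the sketch assumes that every other value in cls is a cluster number; the source does not specify how cluster numbers are assigned.

```python
from collections import Counter


def summarize_classes(cls):
    """Count clusters, noise samples (-1), and excluded samples (0) in a class vector."""
    counts = Counter(cls)
    noise = counts.pop(-1, 0)     # samples identified as noise
    excluded = counts.pop(0, 0)   # samples excluded in the original dataset
    return {
        "n_clusters": len(counts),            # remaining labels are assumed to be clusters
        "noise": noise,
        "excluded": excluded,
        "sizes": dict(sorted(counts.items())),
    }


if __name__ == "__main__":
    print(summarize_classes([1, 1, 2, -1, 0, 2, 2]))
```

Separating the -1 and 0 labels before counting keeps noise and excluded samples from being mistaken for clusters of their own.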