Artificial Intelligence

(23.02.28)Python ํ”„๋กœ๊ทธ๋ž˜๋ฐ: ๋น„์ง€๋„ ํ•™์Šต - DBSCAN(Density-based spatial clustering of applications with noise)(์—ฐ์†ํ˜•+ ๋ฒ”์ฃผํ˜•)

ํ”„๋กœ๊ทธ๋ž˜๋จธ ์˜ค์›” 2023. 3. 6.

DBSCAN(Density-based spatial clustering of applications with noise)   :   ๋ฐ€๋„๊ธฐ๋ฐ˜ clustering algorithm

 

 

https://machinelearninggeek.com/dbscan-clustering/

 

DBSCAN Clustering – Machine Learning Geek

Cluster Analysis comprises of many different methods, of which one is the Density-based Clustering Method. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. For a given set of data points, the DBSCAN algorithm clusters together

machinelearninggeek.com

 

 

core point
์ฃผ์–ด์ง„ ๋ฐ˜๊ฒฝ ๋‚ด์— minPts ๊ฐœ ์ด์ƒ์˜ ํฌ์ธํŠธ๋ฅผ ๊ฐ€์ง„ ์ 
epsilon
๋ฐ˜๊ฒฝ
minPts
core point๊ฐ€ ๋˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ฐ˜๊ฒฝ ๋‚ด์— (core point ์™ธ์—) ์ตœ์†Œํ•œ minPts ์ด์ƒ์˜ ์ ์ด ์š”๊ตฌ๋จ

๋ฐ˜๊ฒฝ ์•ˆ์— ์žˆ๋Š” ์ ์€ '์ง์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์ '

โ€‹

core point A์˜ ๋ฐ˜๊ฒฝ ๋ฐ–์— ์žˆ์ง€๋งŒ ์ด ๋ฐ˜๊ฒฝ ์•ˆ์˜ ๋‹ค๋ฅธ ์  B๊ฐ€  core point์ผ ๋•Œ ๊ทธ ๋ฐ˜๊ฒฝ ์•ˆ์— ๋“ค์–ด์˜ค๋Š” ์ ์€ 

'๊ฐ„์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์ '

โ€‹

์ง์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์  & ๊ฐ„์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์ ๋“ค์„ ํ•œ cluster๋กœ ๋ฌถ๋Š”๋‹ค.

โ€‹

์–ด๋–ค cluster์—๋„ ์†ํ•˜์ง€ ๋ชปํ•˜๋Š” ์ ๋“ค์€ 'outliers' outliers๋Š” '-1'๋กœ ํ‘œ์‹œ๋œ๋‹ค.

โ€‹

 

 

 

์ถœ์ฒ˜: https://www.reneshbedre.com/blog/dbscan-python.html

 

์œ„ ๊ทธ๋ฆผ์—์„œ  y๋Š” x์— ์˜ํ•ด '์ง์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์ '

x๋Š” p์— ์˜ํ•ด '์ง์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์ '

๋”ฐ๋ผ์„œ y๋Š” p์— ์˜ํ•ด '๊ฐ„์ ‘ ์ ‘๊ทผ ๊ฐ€๋Šฅํ•œ ์ '

 

 

 


 

 

์‹ค์Šต๊ณผ์ •์—์„  ์œ„์˜ ๊ฐœ๋…์˜ DBSCAN์„ ์“ฐ๊ธฐ๋ณด๋‹ค ์ด๋ฏธ ๊ฐ€๊ณต๋œ DF/Series๋ฅผ ๋ฐ์ดํ„ฐ๋กœ ์ค˜์„œ DBSCAN์ด Gower ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•ด์„œ ๋น„์œ ์‚ฌ๋„ matrix๋ฅผ ๋งŒ๋“ค๊ณ  clusteringํ•˜๋„๋ก ํ•  ๊ฒƒ์ด๋‹ค.

โ€‹

DBSCAN์€ ํ–‰ ๊ฐ„์˜ ๋น„์œ ์‚ฌ๋„๋ฅผ ์ฃผ๋ฉด ๊ทธ๊ฒƒ์„ ๊ฐ€์ง€๊ณ  clustering์„ ํ•ด์ค€๋‹ค.

โ€‹

Gower: ์—ฐ์†ํ˜•, ๋ฒ”์ฃผํ˜• ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ๋‘ ๋‹ค๋ฃฌ๋‹ค

Gower ์‚ฌ์šฉํ•˜์—ฌ ํ–‰๊ฐ„์˜ ๋น„์œ ์‚ฌ๋„ ๊ณ„์‚ฐ : create & return ๋น„์œ ์‚ฌ๋„ matrix

DBSCAN์— ๋น„์œ ์‚ฌ๋„ matrix ์ „๋‹ฌ & clustering ์š”์ฒญ: cluster data

 

 

 

๋Œ“๊ธ€