ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • [Binary classification : Tabular data] / 3rd level / ์ง€๋„ํ•™์Šต
    Knowledge for Data Analysis/ํ†ต๊ณ„ ๊ณต๋ถ€ 2021. 8. 3. 20:49

     

    Kaggle Study

    Binary Classification: Tabular data

     

     

    3rd level. Home Credit Default Risk

     

    ๐Ÿ’กTabular Data๋ž€?

    : ํ‘œ๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ์˜ ๊ฐ€์žฅ ์ผ๋ฐ˜์ ์ธ ํ˜•ํƒœ๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

     

     

    1๏ธโƒฃ ๋ฌธ์ œ ์„ค๋ช…

    • ๋ถ„์„ ๋ชฉํ‘œ: ๊ณผ๊ฑฐ ๋Œ€์ถœ ์‹ ์ฒญ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์„ํ•˜์—ฌ ์‹ ์ฒญ์ž๊ฐ€ ๋ฏธ๋ž˜์— ๋Œ€์ถœ์„ ์ƒํ™˜ํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์˜ˆ์ธกํ•œ๋‹ค.
    • ํ•™์Šต ๋ฐฉ๋ฒ•: ์ง€๋„ ํ•™์Šต์˜ ๋ถ„๋ฅ˜ ๋ฌธ์ œ
    • ๋ถ„๋ฅ˜ ๋ฐฉ๋ฒ•: 0(๋Œ€์ถœ ์ƒํ™˜ ๊ฐ€๋Šฅ), 1(๋Œ€์ถœ ์ƒํ™˜ ์–ด๋ ค์›€)

     

    2๏ธโƒฃ ๋ฐ์ดํ„ฐ ์„ค๋ช…

    • ์‚ฌ์šฉ ๋ฐ์ดํ„ฐ: application_train/application_test
    • ๋ฐ์ดํ„ฐ ์‹๋ณ„์ž ์ปฌ๋Ÿผ: SK_ID_CURR (๊ณ ๊ฐ๋งˆ๋‹ค ๊ฐ€์ง€๋Š” ๊ณ ์œ  ID๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋จ)
    • ๋ฐ์ดํ„ฐ ์ •๋‹ต ์ปฌ๋Ÿผ: TARGET (0: ๋Œ€์ถœ ์ƒํ™˜ํ–ˆ์Œ 1: ์ƒํ™˜ ๋ชปํ•จ)

     

    *๋Œ€์ถœ ์ƒํ™˜: ๋Œ€์ถœ๋ฐ›์€ ์›๊ธˆ๊ณผ ์ด์ž๋ฅผ ๊ฐš๋Š” ๋ฐฉ์‹

     

     


    ์ง€๋„ํ•™์Šต๊ณผ ๋น„์ง€๋„ ํ•™์Šต


     

    ๐Ÿ’ก์ง€๋„ ํ•™์Šต(Supervised Learning)๊ณผ ๋น„์ง€๋„ ํ•™์Šต(Unsupervised Learning)์ด๋ž€?

    1๏ธโƒฃ ์ง€๋„ ํ•™์Šต

    • ์ •๋‹ต์„ ์•Œ๋ ค์ฃผ๊ณ  ๋ถ„์„์„ ์ง„ํ–‰ํ•œ๋‹ค.
    • ์˜ˆ์ธก์ด๋‚˜ ๋ถ„๋ฅ˜๋ฅผ ํ†ตํ•ด ์–ผ๋งˆ๋‚˜ ์ •๋‹ต์„ ์ž˜ ๋งž์ท„๋Š”์ง€๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.
    • ์˜ˆ์ธก: Linear Regression
    • ๋ถ„๋ฅ˜: Decision Tree, Logistic Regression

     

    2๏ธโƒฃ ๋น„์ง€๋„ ํ•™์Šต

    • ์ •๋‹ต์ด ์ฃผ์–ด์ง€์ง€ ์•Š๋Š”๋‹ค.
    • ์œ ์‚ฌํ•œ ๋ฐ์ดํ„ฐ๋“ค์ด ๊ตฐ์ง‘์œผ๋กœ ๋‚˜๋ˆ ์ง€๊ฒŒ ๋งŒ๋“ค์–ด์ค€๋‹ค.
    • ๊ตฐ์ง‘ํ™”: K-Means Clustering, Text Mining

     

     

    ๋Œ“๊ธ€

Designed by Tistory.