Problem definition

Let's build a model that classifies iris flowers into species using the iris dataset.

Data collection

We use the iris dataset that ships with sklearn.

In [1]:

from sklearn.datasets import load_iris
iris = load_iris()

Inspecting the data

In [3]:

iris.keys()

Out[3]:

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])
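
The keys above are easier to interpret once a few of them are printed. A minimal sketch, assuming the same `load_iris()` object as above, that shows the shape of the feature matrix and the names behind the columns and integer labels:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: one row per flower, one column per measurement
print(iris.data.shape)

# Names of the four measurement columns
print(iris.feature_names)

# The integer labels 0, 1, 2 in iris.target index into these species names
print(iris.target_names)
```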

Data preprocessing

Converting to a DataFrame

In [4]:

import pandas as pd

In [5]:

iris_pd = pd.DataFrame(iris.data, columns = iris.feature_names)
iris_pd.head()

Out[5]:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

Checking for outliers and missing values

In [6]:

iris_pd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB

In [7]:

iris_pd.describe()

Out[7]:

       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
count         150.000000        150.000000         150.000000        150.000000
mean            5.843333          3.057333           3.758000          1.199333
std             0.828066          0.435866           1.765298          0.762238
min             4.300000          2.000000           1.000000          0.100000
25%             5.100000          2.800000           1.600000          0.300000
50%             5.800000          3.000000           4.350000          1.300000
75%             6.400000          3.300000           5.100000          1.800000
max             7.900000          4.400000           6.900000          2.500000
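
`info()` and `describe()` only hint at missing values and outliers, so an explicit check can help. The sketch below counts nulls per column and flags values outside 1.5 times the interquartile range. The IQR rule is one common convention chosen here for illustration, not something the original notebook uses, and the names `missing_counts` and `outlier_counts` are introduced only for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_pd = pd.DataFrame(iris.data, columns=iris.feature_names)

# Number of missing values in each column (all zeros means nothing is missing)
missing_counts = iris_pd.isnull().sum()
print(missing_counts)

# IQR rule (an assumption for this sketch): a value is flagged when it lies
# more than 1.5 * IQR below the first quartile or above the third quartile
q1 = iris_pd.quantile(0.25)
q3 = iris_pd.quantile(0.75)
iqr = q3 - q1
outlier_mask = (iris_pd < q1 - 1.5 * iqr) | (iris_pd > q3 + 1.5 * iqr)

# Number of flagged values in each column
outlier_counts = outlier_mask.sum()
print(outlier_counts)
```

A flagged value is not necessarily an error. With only 150 measured flowers, it is usually reasonable to keep such rows and simply be aware of them.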

Exploratory data analysis
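
The original notebook leaves this step empty. As one possible starting point, the sketch below compares the mean of each measurement per species. The `species` column and the `class_means` name are introduced here for illustration and do not appear in the original.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_pd = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer labels to species names ('species' is a hypothetical column name)
iris_pd["species"] = iris.target_names[iris.target]

# Mean of each measurement per species; large gaps between rows suggest
# features that separate the classes well
class_means = iris_pd.groupby("species").mean()
print(class_means)
```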

Model selection and hyperparameter tuning

Splitting the data

In [8]:

from sklearn.model_selection import train_test_split

In [10]:

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size = 0.3, 
                                                   random_state = 6)

In [12]:

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Out[12]:

((105, 4), (45, 4), (105,), (45,))

In [13]:

y_train

Out[13]:

array([0, 1, 0, 0, 1, 2, 2, 2, 0, 2, 0, 0, 0, 1, 2, 1, 1, 1, 2, 1, 1, 0,
       2, 0, 0, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 0, 2, 2, 0, 0, 2, 2, 1,
       2, 1, 0, 1, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 2, 0, 2, 1, 0, 2, 1,
       2, 1, 1, 0, 1, 2, 1, 0, 1, 0, 0, 1, 0, 1, 2, 1, 2, 2, 2, 1, 0, 2,
       0, 2, 0, 1, 2, 0, 1, 1, 0, 0, 1, 1, 2, 1, 2, 2, 2])
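
Printing `y_train` shows that the labels are shuffled, but not how many samples of each class ended up in the training set. A minimal sketch, assuming the same split parameters as above, that counts the classes and, for comparison, repeats the split with the `stratify` option of `train_test_split` (not used in the original) so that class proportions are preserved:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same split as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=6
)
# Number of training samples per class (index = class label)
print(np.bincount(y_train))

# Variant for comparison: stratify keeps the class ratio of iris.target
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=6, stratify=iris.target
)
print(np.bincount(ys_train))
```

The stratified variant produces a different split, so scores obtained with it will not match the numbers reported in this post.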

Loading the model

In [22]:

from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors = 5)

Training

In [23]:

knn_model.fit(X_train, y_train)

Out[23]:

KNeighborsClassifier()

Evaluation

In [24]:

knn_model.score(X_train, y_train)

Out[24]:

0.9809523809523809

In [25]:

knn_model.score(X_test, y_test)

Out[25]:

0.9777777777777777
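
For a classifier, `score` returns the accuracy: the fraction of samples whose predicted label matches the true label. A test score of 0.9777... corresponds to 44 of 45 test samples classified correctly, and a train score of 0.9809... to 103 of 105. The sketch below recomputes the test accuracy by hand to make that explicit; `y_pred` and `manual_accuracy` are names introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=6
)

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

# Predict a label for every test sample
y_pred = knn_model.predict(X_test)

# Accuracy = share of predictions that equal the true label
manual_accuracy = np.mean(y_pred == y_test)
print(manual_accuracy, knn_model.score(X_test, y_test))
```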

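Hyperparameter tuning and viewing the results

Steps 5, 6, and 7 (model selection, training, and evaluation) are done together in one loop.
Vary n_neighbors, record the train and test scores for each value, and plot them.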

In [26]:

# Lists that store the score for each n_neighbors value
train_list = []
test_list = []
n_neighbors_setting = range(1, 11)
for n_n in n_neighbors_setting:
    knn_model = KNeighborsClassifier(n_neighbors=n_n)

    # Train
    knn_model.fit(X_train, y_train)
    # Evaluate
    train_score = knn_model.score(X_train, y_train)
    test_score = knn_model.score(X_test, y_test)
    # Store
    train_list.append(train_score)
    test_list.append(test_score)
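
Choosing n_neighbors from the test score means the test set is no longer an unbiased estimate of performance. One common alternative, sketched below as an assumption and not as the method used in this post, is to compare the candidates with cross-validation on the training data only. `cv_means` and `best_k` are names introduced for this example, and 5 folds is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=6
)

n_neighbors_setting = range(1, 11)
cv_means = []
for n_n in n_neighbors_setting:
    model = KNeighborsClassifier(n_neighbors=n_n)
    # 5-fold cross-validation on the training data; the test set is not touched
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_means.append(scores.mean())

# n_neighbors with the highest mean cross-validation accuracy
# (ties resolve to the smallest value)
best_k = n_neighbors_setting[int(np.argmax(cv_means))]
print(best_k, max(cv_means))
```

After `best_k` is chosen this way, the test set can be scored once at the end with a model refit on the full training data.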

Checking the results with a plot

In [27]:

import matplotlib.pyplot as plt

In [28]:

plt.plot(n_neighbors_setting, train_list, label = 'Train')
plt.plot(n_neighbors_setting, test_list, label = 'Test')
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

In [1]:

# Widen the notebook container to 90% of the browser window
from IPython.display import display, HTML

display(HTML("<style>.container { width:90% !important; }</style>"))
