Deep Learning Study

chacha001 2024. 10. 31. 22:46

Let's go!

Keep at it!

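What is a hidden layer in deep learning?

In deep learning, a hidden layer is a layer that sits between the input and the output and helps the model learn complex patterns and relationships.

The role of hidden layers

Hidden layers process the data that arrives at the input layer over several stages and extract **features** from it.
Each hidden layer receives information from the previous layer, transforms it, and passes the result to the next layer.
A deeper network can represent more complex patterns, which can lead to more accurate predictions, although deeper models also need more data and are more prone to overfitting.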
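The layer-to-layer flow described above can be sketched in a few lines of plain Python. This is only an illustration: the weights, biases, and the helper names `dense` and `forward` are made up for the example and are not taken from the model trained later in this post.

```python
def relu(values):
    """Apply ReLU to every value in a list."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the incoming weights of node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the inputs through each layer in turn and keep every intermediate output."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations


# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 2 hidden nodes
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]], [0.0, 0.0]),
]

activations = forward([2.0, 1.0], layers)
for depth, values in enumerate(activations):
    print(f"layer {depth}: {values}")
```

Each printed line is the output that one layer hands to the next, which is the "receive, transform, pass on" step in miniature.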

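Activation functions

An **activation function** is a mathematical function that decides how strongly a **neuron (node)** in a deep learning model is activated. It transforms the node's input into the output value that is sent to the next layer. Activation functions are what allow a deep learning model to learn complex patterns and capture nonlinear relationships.

Why do we need activation functions?

Without an activation function, each layer only multiplies its input by weights and adds a bias. However many such layers are stacked, the result is still a single linear transformation, so the input and the output can only be connected by a straight-line (linear) relationship.

With an activation function, the data is transformed nonlinearly at every layer, so the model can learn complex data too!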
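To make the "stacked linear layers are still linear" point concrete, here is a minimal sketch with one input and one node per layer. The weights `w1`, `b1`, `w2`, `b2` are arbitrary values chosen for illustration.

```python
def linear(x, w, b):
    """A one-node layer without an activation function."""
    return w * x + b


def relu(x):
    return max(0.0, x)


# Arbitrary example weights for two layers
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    """Two layers stacked with no activation in between."""
    return linear(linear(x, w1, b1), w2, b2)


def collapsed(x):
    """The single linear layer that the two layers above are equivalent to."""
    return linear(x, w2 * w1, w2 * b1 + b2)


def with_relu(x):
    """The same two layers with ReLU applied after the first one."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


xs = [-2.0, 0.0, 2.0]
stacked = [two_linear_layers(x) for x in xs]
single = [collapsed(x) for x in xs]
nonlinear = [with_relu(x) for x in xs]

print(stacked, single)   # identical: the extra layer added nothing
print(nonlinear)         # no longer lies on one straight line
```

The first two lists match exactly, while the ReLU version bends at the point where the first layer's output crosses zero.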

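Activation function summary

Function	Output range	Key characteristic	Typical use
ReLU	0 or greater	Negative inputs become 0; positive inputs pass through unchanged	Most hidden layers
Sigmoid	0 to 1	Output can be read as a probability	Output layer for binary classification
Tanh	-1 to 1	Output is centered on 0	Hidden layers, e.g. recurrent networks in natural language processing

By transforming the data nonlinearly at each layer, activation functions play a key role in what the model is able to learn.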
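The three functions in the table are short enough to write out by hand. The sketch below uses only the standard `math` module; the two-branch form of `sigmoid` is a choice made here to avoid overflow for large negative inputs, not something the frameworks require you to write yourself.

```python
import math


def relu(x):
    """Negative inputs become 0, positive inputs pass through unchanged."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids computing exp() of a large positive number
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Squash any real number into the range -1 to 1, centered on 0."""
    return math.tanh(x)


for value in (-3.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 4), round(tanh(value), 4))
```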

Binary classification practice

Let's analyze the columns and predict whether an employee will leave (attrition).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import MinMaxScaler

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.backend import clear_session
from keras.optimizers import Adam

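# Helper that plots the learning curve
def dl_history_plot(history):
    """Plot training and validation loss per epoch.

    `history` is the dict found in the `.history` attribute of the object returned by `model.fit`.
    """
    plt.figure(figsize=(10,6))
    plt.plot(history['loss'], label='train_err', marker = '.')
    plt.plot(history['val_loss'], label='val_err', marker = '.')

    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend()
    plt.grid()
    plt.show()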

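This function draws the **learning curve** of a deep learning training run.

A learning curve visualizes how the model's **loss** changes as the **epochs** progress, which makes it useful for judging how training is going. It also helps spot overfitting: if the training loss is low but the validation loss stays high or starts rising, the model is probably overfitting.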
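The same check can be done numerically instead of by eye. The sketch below is a hypothetical helper (`summarize_history` is not part of Keras) that takes a history dict with the same `'loss'` and `'val_loss'` keys used by the plotting function. The numbers in `toy_history` are invented for illustration and are not results from this post.

```python
def summarize_history(history):
    """Report the epoch with the lowest validation loss and the final train/validation gap."""
    val_loss = history['val_loss']
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_epoch': best_index + 1,            # epochs are counted from 1
        'best_val_loss': val_loss[best_index],
        'final_gap': val_loss[-1] - history['loss'][-1],
    }


# Invented values: training loss keeps falling while validation loss turns upward
toy_history = {
    'loss':     [0.70, 0.50, 0.40, 0.30, 0.20],
    'val_loss': [0.72, 0.55, 0.50, 0.56, 0.65],
}

summary = summarize_history(toy_history)
print(summary)
```

A best epoch well before the last one, together with a large positive gap, is the numeric version of the "training loss low, validation loss high" pattern.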

# Load the data
path = "/Attrition_train_validation.CSV"
data = pd.read_csv(path)
# Encode the target as 1 (attrition 'Yes') or 0 (anything else)
data['Attrition'] = np.where(data['Attrition']=='Yes', 1, 0)
data.head(10)

# Data preprocessing
target = 'Attrition'
# Drop the unneeded variable
data.drop('EmployeeNumber', axis = 1, inplace = True)
x = data.drop(target, axis = 1)
y = data.loc[:, target]

# Create dummy variables (one-hot encoding); drop_first removes one redundant column per variable
dum_cols = ['BusinessTravel','Department','Education','EducationField','EnvironmentSatisfaction','Gender',
            'JobRole', 'JobInvolvement', 'JobSatisfaction', 'MaritalStatus', 'OverTime', 'RelationshipSatisfaction',
            'StockOptionLevel','WorkLifeBalance' ]

x = pd.get_dummies(x, columns = dum_cols ,drop_first = True)
print(x.head())


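# Split the data; an integer test_size holds out exactly 200 rows for validation
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size = 200, random_state = 2022)

# Min-max scaling: fit on the training set only, then apply the same transform to the validation set
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)

# Class proportions of the target in the training set
y_train.value_counts() / len(y_train)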

Modeling code (adjusting the hidden layers and the number of nodes)

n = x_train.shape[1]
n

clear_session()

model = Sequential([Input(shape = (n,)),
                    Dense( 16, activation = 'relu' ),
                    Dense( 8 , activation = 'relu' ),
                    Dense( 4 , activation = 'relu'),
                    Dense( 1 , activation= 'sigmoid')])
model.summary()

model.compile( optimizer = Adam(learning_rate = 0.001), loss = 'binary_crossentropy')
hist = model.fit( x_train, y_train, epochs = 50, validation_split= .2 ).history

Model 1

dl_history_plot(hist)

pred = model.predict(x_val)
pred = np.where( pred >= 0.5 , 1 , 0)

print(confusion_matrix(y_val, pred))
print(classification_report(y_val, pred))

Model 1 evaluation

n = x_train.shape[1]
n

clear_session()

model = Sequential([Input(shape = (n,)),
                    Dense( 16, activation = 'relu' ),
                    Dense( 8 ,  activation = 'relu' ),
                    Dense( 4 ,  activation = 'relu'),
                    Dense( 1 , activation= 'sigmoid')])
model.summary()

model.compile( optimizer = Adam(learning_rate = 0.001), loss = 'binary_crossentropy')
hist = model.fit( x_train, y_train, epochs = 50, validation_split= .2 ).history

Model 2

dl_history_plot(hist)

pred = model.predict(x_val)
pred = np.where( pred >= 0.5 , 1 , 0)

print(confusion_matrix(y_val, pred))
print(classification_report(y_val, pred))

Model 2 evaluation

from imblearn.over_sampling import RandomOverSampler

# Randomly duplicate minority-class rows until both classes are the same size
ros = RandomOverSampler()
x_train_ros, y_train_ros = ros.fit_resample(x_train, y_train)

n = x_train.shape[1]
n

clear_session()

model2 = Sequential([Input(shape = (n,)),
                     Dense( 16, activation = 'relu' ),
                     Dense( 8 , activation = 'relu' ),
                     Dense( 4 , activation = 'relu'),
                     Dense( 1 , activation= 'sigmoid')])
model2.summary()

model2.compile( optimizer = Adam(learning_rate = 0.001), loss = 'binary_crossentropy')
hist = model2.fit( x_train_ros, y_train_ros, epochs = 150, validation_split= .2 ).history

dl_history_plot(hist)

pred = model2.predict(x_val)
pred = np.where( pred >= 0.5 , 1 , 0)

print(confusion_matrix(y_val, pred))
print(classification_report(y_val, pred))

Model 3 (resampling): training and evaluation code

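Comparison summary of the three models

Model	Architecture	Training data	Epochs	Notes
Model 1	16-8-4 hidden layers	Original data	50	Baseline settings
Model 2	Same as Model 1	Original data	50	Duplicate of the Model 1 code
Model 3	16-8-4 hidden layers	Oversampled data	150	Oversampling used to address the class imbalance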

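Key differences

Models 1 and 2 use the same architecture and data and train for 50 epochs; neither addresses the class imbalance.
Model 3 balances the classes with RandomOverSampler and trains for 150 epochs, so better performance can be expected, particularly on the minority (attrition) class. Because both the training data and the epoch count change, any improvement cannot be attributed to oversampling alone.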
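For intuition about what random oversampling does, here is a simplified plain-Python sketch of the idea. It is not the imbalanced-learn implementation; `random_oversample` and the toy rows are hypothetical names and data used only for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    new_rows, new_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            new_rows.append(rng.choice(pool))   # sampled with replacement
            new_labels.append(label)
    return new_rows, new_labels


# Toy data: four rows of class 0 and one row of class 1
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]

res_rows, res_labels = random_oversample(rows, labels)
print(Counter(res_labels))
print(res_rows)
```

No new information is created: the minority rows are simply repeated, so the model sees them more often during training. One thing worth checking in the Model 3 code: Keras documents that `validation_split` takes the last fraction of the samples before shuffling. If the oversampler appends its duplicates at the end, as this sketch does, that held-out slice would consist largely of duplicated minority rows, so the `val_loss` curve may not be comparable with Models 1 and 2. The separate evaluation on `x_val` is not affected by this.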

The end~~

