[2] Data Augmentation, Pre-training, and Self-training

"Deep learning is always hungry"

Data Augmentation

-> Implemented in PyTorch as well, and OpenCV or NumPy can also be used

Affine Transformation (Shear Transformation)

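You can think of it as turning the rectangular image into a slight parallelogram and rotating it

The criteria for this are somewhat ambiguous, so it is implemented by mapping three points in the source image to three points in the output image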

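import cv2
import numpy as np

# image: an H x W x C array, e.g. one loaded with cv2.imread
rows, cols, ch = image.shape
# three source points and the three points they should be mapped to
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])
M = cv2.getAffineTransform(pts1, pts2)  # 2x3 affine matrix
shear_img = cv2.warpAffine(image, M, (cols, rows))  # output size is (width, height)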

Here, warping means geometrically distorting or rotating an image

CutMix

A patch is cut out of one image and pasted onto another image

The most important point is that "the labels must also be mixed in proportion to the area"

CutMix : https://arxiv.org/abs/1905.04899

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

This lets the model learn to localize objects more accurately and improves its ability to tell classes apart
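As a minimal sketch of the idea, the NumPy function below pastes a random box from one image into another and mixes the labels by the pasted area. The function name `cutmix`, the `beta` parameter, and the one-hot label arrays are assumptions made for illustration; sampling the mixing ratio from a Beta distribution follows the scheme described in the CutMix paper.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, rng, beta=1.0):
    """Paste a random box of img_b into img_a and mix the one-hot labels."""
    h, w = img_a.shape[:2]
    lam = rng.beta(beta, beta)                 # target share kept from img_a
    cut_h = int(h * np.sqrt(1.0 - lam))        # box size so that area ~ (1 - lam)
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(h), rng.integers(w)  # random box center
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)

    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]

    # Recompute the ratio from the actual (clipped) box so labels match the pixels.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label

rng = np.random.default_rng(0)
img_a, img_b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
label_a, label_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed, mixed_label = cutmix(img_a, label_a, img_b, label_b, rng)
```

Because the box can be clipped at the image border, the label ratio is taken from the area that was actually pasted rather than from the sampled value.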

RandAugment

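There are a great many augmentation methods, and the result also depends on the order in which they are applied, so apply them at random and use whatever performs best

(It combines the augmentation methods for you)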

RandAugment https://arxiv.org/abs/1909.13719

RandAugment: Practical automated data augmentation with a reduced search space

Besides the combinations above, there are two important hyperparameters

N: the number of augmentations to use (the paper picks N out of 14)
M: how strongly to apply them (0 to 10 recommended)

The figure below shows N = 2 with M = 9, 17, 28

The code can be implemented as follows
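A minimal NumPy sketch, not the paper's reference code: it uses a small hypothetical pool of four operations instead of the 14 in the paper, and assumes a magnitude scale of 0 to `MAX_M` = 10. The names `OPS`, `MAX_M`, and `rand_augment` are made up for this example. Sampling N operations uniformly and applying each with the same magnitude M is the core of the method.

```python
import random
import numpy as np

MAX_M = 10  # assumed upper end of the magnitude scale

def _to_uint8(x):
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)

def identity(img, m):
    return img

def brightness(img, m):
    # Add up to +128 at full magnitude, clipping at 255.
    return _to_uint8(img.astype(np.float32) + 128.0 * m / MAX_M)

def contrast(img, m):
    # Stretch pixel values around the image mean by a factor of 1 to 2.
    mean = img.mean()
    return _to_uint8((img.astype(np.float32) - mean) * (1.0 + m / MAX_M) + mean)

def translate_x(img, m):
    # Shift right by up to 30% of the width, filling the gap with zeros.
    width = img.shape[1]
    shift = int(0.3 * width * m / MAX_M)
    out = np.zeros_like(img)
    out[:, shift:] = img[:, :width - shift]
    return out

OPS = [identity, brightness, contrast, translate_x]

def rand_augment(img, n, m, rng=random):
    """Apply n randomly chosen operations, each with magnitude m."""
    m = max(0, min(m, MAX_M))             # keep m inside the assumed scale
    for op in rng.choices(OPS, k=n):      # uniform sampling with replacement
        img = op(img, m)
    return img

img = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = rand_augment(img, n=2, m=9, rng=random.Random(0))
```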

This reportedly gave good performance on most tasks

Others

Brightness adjustment
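- Brightness augmentation is needed because if the training data contains only bright images, the model breaks down when a dark image comes in
- Implemented simply by adding a value with NumPy (values that go above 255 must be handled)
Rotate, flipping
- cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)  # rotated 90 degrees clockwise
- cv2.rotate(image, cv2.ROTATE_180)  # rotated 180 degrees, i.e. flipped both vertically and horizontally
Crop
- Simple, and performs surprisingly well
  - The model learns the important part strongly
  - Implemented simply by indexing the NumPy array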
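The brightness and crop items above can be sketched in NumPy as follows. The helper names `adjust_brightness` and `crop` are hypothetical; clipping to [0, 255] is one way to handle overflow, assumed here because a plain `uint8` addition would wrap around instead.

```python
import numpy as np

def adjust_brightness(image, delta):
    # Widen the dtype before adding so uint8 does not wrap around, then clip.
    return np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def crop(image, top, left, height, width):
    # Cropping is plain array slicing on the row and column axes.
    return image[top:top + height, left:left + width]

image = np.full((100, 100, 3), 200, dtype=np.uint8)
brighter = adjust_brightness(image, 100)   # 200 + 100 is clipped to 255
darker = adjust_brightness(image, -250)    # 200 - 250 is clipped to 0
patch = crop(image, 10, 20, 50, 60)        # 50 rows by 60 columns
```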

Leveraging pre-trained information

Use information learned from other data

Transfer learning

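When the new dataset is very small
- Keep the pre-trained conv layers as they are (freeze) and replace and retrain only the FC layer

When there is a moderate amount of new data (fine-tuning)
- Give the conv layers a small learning rate and the FC layer a large learning rate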
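Both cases can be sketched in PyTorch as below. The tiny `backbone` is only a hypothetical stand-in for pre-trained conv layers (in practice the weights would be loaded from a real pre-trained model), and the learning rates and `num_classes=5` are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

def build_model(num_classes):
    # Hypothetical stand-in for pre-trained conv layers.
    backbone = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    head = nn.Linear(8, num_classes)  # new FC layer for the new task
    return backbone, head

# Case 1: very little data. Freeze the conv layers and train only the FC layer.
backbone, head = build_model(num_classes=5)
for param in backbone.parameters():
    param.requires_grad = False
frozen_optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)

# Case 2: a moderate amount of data. Small lr for conv layers, larger lr for FC.
ft_backbone, ft_head = build_model(num_classes=5)
finetune_optimizer = torch.optim.SGD(
    [
        {"params": ft_backbone.parameters(), "lr": 1e-4},
        {"params": ft_head.parameters(), "lr": 1e-2},
    ],
    lr=1e-2,
)

logits = nn.Sequential(backbone, head)(torch.zeros(2, 3, 16, 16))
```

Per-parameter-group learning rates keep the pre-trained features close to their starting point while the new FC layer adapts quickly.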

Knowledge Distillation (can also serve as a form of model compression)

A method of acquiring knowledge by using a pre-trained model

Teacher-student network

Teacher: pre-trained model // Student: untrained model (using a model smaller than the teacher is the unwritten rule)

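1. Unlabeled data (unsupervised learning)

Feed new data to the teacher and the student and have each produce an output
Apply a KL divergence loss on these outputs to the student only (so that it becomes similar to the teacher)

2. Labeled data (supervised learning)

Use soft labels rather than hard labels (so that the probabilities of the wrong classes are learned as well)
- One-hot encoding gives hard labels, and an ordinary softmax (which comes out like 0.1, 0.15, 0.75) is also too peaked
- So a softmax with temperature is used
- Distillation assumes that the individual output values together express an overall tendency, and wants the student to learn that tendency
A distillation loss and a student loss are used (the weighted sum of these two losses)
This loss is applied to the student model only
- Distillation loss
  - The difference between the teacher's output and the student's output
  - KL divergence between the soft label and the soft prediction
    - KL divergence measures the distance between two distributions, giving a loss that pulls them closer together
- Student loss
  - The difference between the true label and the student's output
  - Uses the usual cross-entropy, with (hard label, soft prediction)
    - Because the ground truth is given as a hard label in the first place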
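The two losses above can be sketched in NumPy as follows. The temperature of 4.0 and the weight `alpha` of 0.5 are arbitrary assumptions, and the function names are made up for this example. Some formulations also scale the distillation term by the squared temperature, which is left out here for simplicity.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Dividing by a temperature > 1 flattens the distribution.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def cross_entropy(probs, labels, eps=1e-12):
    return -np.log(probs[np.arange(len(labels)), labels] + eps)

def distillation_total_loss(student_logits, teacher_logits, labels,
                            temperature=4.0, alpha=0.5):
    soft_label = softmax(teacher_logits, temperature)   # from the teacher
    soft_pred = softmax(student_logits, temperature)    # from the student
    distill = kl_divergence(soft_label, soft_pred).mean()
    student = cross_entropy(softmax(student_logits), labels).mean()
    return alpha * distill + (1.0 - alpha) * student    # weighted sum

teacher_logits = np.array([[1.0, 2.0, 5.0]])
student_logits = np.array([[0.5, 1.0, 2.0]])
labels = np.array([2])
sharp = softmax(teacher_logits)               # ordinary softmax
soft = softmax(teacher_logits, temperature=4.0)
loss = distillation_total_loss(student_logits, teacher_logits, labels)
```

Only the student's parameters would be updated with this loss; the teacher's logits are treated as fixed targets.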

Leveraging Unlabeled Dataset

Semi-supervised Learning

There are too few labeled samples, so this proposes a way to make use of unlabeled data

1. Train a model on the labeled dataset

2. Use the trained model to generate labels for the unlabeled data (pseudo-labeling)

3. Retrain on the original labeled data combined with the pseudo-labeled data
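The three steps above can be sketched on toy data as follows. A nearest-centroid classifier stands in for the model purely to keep the example short; the helper names and the synthetic two-class data are assumptions made for illustration.

```python
import numpy as np

def fit_centroids(x, y, n_classes):
    # "Training" here just means computing one mean vector per class.
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, x):
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
centers = np.array([[-3.0, -3.0], [3.0, 3.0]])
x_lab = np.concatenate([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])
y_lab = np.repeat([0, 1], 5)
x_unlab = np.concatenate([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

model = fit_centroids(x_lab, y_lab, n_classes=2)   # 1. train on labeled data
pseudo = predict(model, x_unlab)                   # 2. pseudo-label unlabeled data
x_all = np.concatenate([x_lab, x_unlab])           # 3. merge and retrain
y_all = np.concatenate([y_lab, pseudo])
model = fit_centroids(x_all, y_all, n_classes=2)
```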

Self-learning

A remarkable method that mixes data augmentation, teacher-student networks, and semi-supervised learning

This was reportedly SOTA on ImageNet in 201[?]

Self-training with noisy student

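It follows the steps below (here the student model is smaller only at the beginning)

- The student model is made progressively larger and then takes over as the teacher, so the student model ends up being the larger one

- This is where it differs from knowledge distillation (<- there the student is always smaller)

1. Train a teacher model on the 1M labeled ImageNet images
2. Pseudo-label 300M unlabeled images
3. Train a student model on the 301M images obtained by combining the ImageNet data and the pseudo-labeled data (RandAugment is used here)
4. Replace the existing teacher model with the student model trained on the 301M RandAugment data
5. Define a new student model (larger than the previous student model)
6. Repeat steps 1 to 5 above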
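The loop above can be sketched on toy data as follows. Everything here is a stand-in chosen for illustration: `RandomFeatureClassifier` is a hypothetical toy model whose `width` plays the role of model size, Gaussian input noise replaces RandAugment, and the synthetic two-class data replaces ImageNet and the unlabeled set.

```python
import numpy as np

class RandomFeatureClassifier:
    """Toy stand-in for a network; `width` plays the role of model size."""

    def __init__(self, width, seed=0):
        self.width = width
        self.rng = np.random.default_rng(seed)

    def _features(self, x):
        # Random ReLU features of [x, 1], plus a constant column as readout bias.
        ones = np.ones((len(x), 1))
        hidden = np.maximum(np.hstack([x, ones]) @ self.proj, 0.0)
        return np.hstack([hidden, ones])

    def fit(self, x, y):
        self.proj = self.rng.normal(size=(x.shape[1] + 1, self.width))
        targets = np.eye(int(y.max()) + 1)[y]   # one-hot labels
        self.readout, *_ = np.linalg.lstsq(self._features(x), targets, rcond=None)
        return self

    def predict(self, x):
        return (self._features(x) @ self.readout).argmax(axis=1)

def noisy_student(x_lab, y_lab, x_unlab, widths, noise_fn):
    teacher = RandomFeatureClassifier(widths[0]).fit(x_lab, y_lab)  # train teacher
    for width in widths[1:]:
        pseudo = teacher.predict(x_unlab)                 # pseudo-label
        x_all = np.concatenate([x_lab, x_unlab])
        y_all = np.concatenate([y_lab, pseudo])
        student = RandomFeatureClassifier(width)          # larger student
        student.fit(noise_fn(x_all), y_all)               # train with noise
        teacher = student                                 # student becomes teacher
    return teacher

rng = np.random.default_rng(0)
centers = np.array([[-3.0, -3.0], [3.0, 3.0]])

def sample(n_per_class):
    x = np.concatenate([c + rng.normal(scale=0.5, size=(n_per_class, 2)) for c in centers])
    return x, np.repeat([0, 1], n_per_class)

x_lab, y_lab = sample(50)
x_unlab, _ = sample(500)
x_test, y_test = sample(50)
add_noise = lambda x: x + rng.normal(scale=0.3, size=x.shape)
final_model = noisy_student(x_lab, y_lab, x_unlab, widths=(8, 16, 32), noise_fn=add_noise)
accuracy = (final_model.predict(x_test) == y_test).mean()
```

The noise is applied only while training the student; the teacher labels clean inputs, which is what makes the student's task harder than the teacher's.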

The process above is summarized below

Related paper: https://arxiv.org/abs/1911.04252

Self-training with Noisy Student improves ImageNet classification
