Convolutional Neural Networks - Week 3 Labs
There have been a lot of fun assignments lately, so I'm writing a lot of lab posts.
So much fun!
1. Car detection with YOLO
2. Image Segmentation with U-Net
Car detection with YOLO
Let's build a YOLO model that draws bounding boxes like the ones above.
I'll skip the explanation of YOLO itself and build it function by function.
As you can see from the tensor shapes, it uses 5 anchor boxes and processes the image as a 19 x 19 grid.
First, a function that filters the boxes:
it finds the class with the max score, and drops the box if that score doesn't pass the threshold.
def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=.6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    boxes -- tensor of shape (19, 19, 5, 4)
    box_confidence -- tensor of shape (19, 19, 5, 1)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold],
                 then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    # Step 1: Compute box scores
    box_scores = box_confidence * box_class_probs
    # Step 2: Find the box_classes using the max box_scores, keep track of the corresponding score
    # IMPORTANT: set axis to -1 (we need the max within each anchor box)
    box_classes = tf.math.argmax(box_scores, -1)
    box_class_scores = tf.math.reduce_max(box_scores, -1)
    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    filtering_mask = box_class_scores >= threshold
    # Step 4: Apply the mask to box_class_scores, boxes and box_classes
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    return scores, boxes, classes
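A quick sanity check I'd run (a minimal sketch, assuming TF 2.x eager mode; the random tensors just stand in for real YOLO output, so only the shapes matter):

import tensorflow as tf

tf.random.set_seed(10)
box_confidence = tf.random.normal([19, 19, 5, 1], mean=1, stddev=4)
boxes = tf.random.normal([19, 19, 5, 4], mean=1, stddev=4)
box_class_probs = tf.random.normal([19, 19, 5, 80], mean=1, stddev=4)
scores, boxes, classes = yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=0.5)
# How many boxes survive depends on the random values; the shapes are
# (N,), (N, 4) and (N,) for some N <= 19*19*5.
print(scores.shape, boxes.shape, classes.shape)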
Next comes non-max suppression.
Let's implement IoU first.
def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, list object with coordinates (box1_x1, box1_y1, box1_x2, box1_y2)
    box2 -- second box, list object with coordinates (box2_x1, box2_y1, box2_x2, box2_y2)
    """
    (box1_x1, box1_y1, box1_x2, box1_y2) = box1
    (box2_x1, box2_y1, box2_x2, box2_y2) = box2
    # Calculate the (xi1, yi1, xi2, yi2) coordinates of the intersection of box1 and box2. Calculate its Area.
    xi1 = max(box1_x1, box2_x1)
    yi1 = max(box1_y1, box2_y1)
    xi2 = min(box1_x2, box2_x2)
    yi2 = min(box1_y2, box2_y2)
    inter_width = max(0, xi2 - xi1)
    inter_height = max(0, yi2 - yi1)
    inter_area = inter_width * inter_height
    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    box1_area = (box1_x2 - box1_x1) * (box1_y2 - box1_y1)
    box2_area = (box2_x2 - box2_x1) * (box2_y2 - box2_y1)
    union_area = box1_area + box2_area - inter_area
    # compute the IoU
    iou = inter_area / union_area
    return iou
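For example, with two hand-picked boxes in (x1, y1, x2, y2) format:

box1 = (2, 1, 4, 3)   # 2 x 2 box, area 4
box2 = (1, 2, 3, 4)   # 2 x 2 box, area 4
# They overlap in the 1 x 1 square (2, 2)-(3, 3), so
# IoU = 1 / (4 + 4 - 1) = 1/7
print(iou(box1, box2))  # 0.14285714285714285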
Given two boxes, it's just careful arithmetic.
In practice, though, TensorFlow already provides tf.image.non_max_suppression, so we don't actually need our own implementation.
We wrap the NMS step as follows.
The tf functions felt a bit unfamiliar; something to study properly later.
def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None, ), predicted class for each box

    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """
    max_boxes_tensor = tf.Variable(max_boxes, dtype='int32')  # tensor to be used in tf.image.non_max_suppression()
    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    # Use tf.gather() to select only nms_indices from scores, boxes and classes
    scores = tf.gather(scores, nms_indices)
    boxes = tf.gather(boxes, nms_indices)
    classes = tf.gather(classes, nms_indices)
    return scores, boxes, classes
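Again a quick shape check with random stand-in tensors (a sketch, not real detections):

tf.random.set_seed(10)
scores = tf.random.normal([54], mean=1, stddev=4)
boxes = tf.random.normal([54, 4], mean=1, stddev=4)
classes = tf.random.normal([54], mean=1, stddev=4)
scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
print(scores.shape, boxes.shape, classes.shape)  # at most max_boxes = 10 boxes survive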
Next is just a helper that converts YOLO's box predictions into corner coordinates.
Take it at face value.
def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert YOLO box predictions to bounding box corners."""
    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)
    return tf.keras.backend.concatenate([
        box_mins[..., 1:2],   # y_min
        box_mins[..., 0:1],   # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]   # x_max
    ])
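A tiny numeric check of the conversion (center/size in, corners out):

box_xy = tf.constant([[0.5, 0.5]])  # box centered at (0.5, 0.5)
box_wh = tf.constant([[0.2, 0.4]])  # width 0.2, height 0.4
print(yolo_boxes_to_corners(box_xy, box_wh).numpy())
# [[0.3 0.4 0.7 0.6]] -> (y_min, x_min, y_max, x_max)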
Now let's put together everything we've implemented so far.
This function collects the conv net's output and runs the full YOLO post-processing.
def yolo_eval(yolo_outputs, image_shape=(720, 1280), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """
    # Retrieve outputs of the YOLO model
    box_xy, box_wh, box_confidence, box_class_probs = yolo_outputs
    # Convert boxes to be ready for filtering functions (convert boxes box_xy and box_wh to corner coordinates)
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold
    scores, boxes, classes = yolo_filter_boxes(boxes, box_confidence, box_class_probs, score_threshold)
    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)
    # Use one of the functions you've implemented to perform Non-max suppression with
    # maximum number of boxes set to max_boxes and a threshold of iou_threshold
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
    return scores, boxes, classes
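An end-to-end shape check with random stand-in tensors (this assumes the notebook's scale_boxes helper is in scope):

tf.random.set_seed(10)
yolo_outputs = (tf.random.normal([1, 19, 19, 5, 2], mean=1, stddev=4),
                tf.random.normal([1, 19, 19, 5, 2], mean=1, stddev=4),
                tf.random.normal([1, 19, 19, 5, 1], mean=1, stddev=4),
                tf.random.normal([1, 19, 19, 5, 80], mean=1, stddev=4))
scores, boxes, classes = yolo_eval(yolo_outputs)
print(scores.shape, boxes.shape, classes.shape)  # at most 10 boxes after NMS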
That's the implementation done.
Let's see it applied.
def predict(image_file):
    """
    Runs the graph to predict boxes for "image_file". Prints and plots the predictions.

    Arguments:
    image_file -- name of an image stored in the "images" folder.

    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes

    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes.
    """
    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size=(608, 608))
    yolo_model_outputs = yolo_model(image_data)
    yolo_outputs = yolo_head(yolo_model_outputs, anchors, len(class_names))
    out_scores, out_boxes, out_classes = yolo_eval(yolo_outputs, [image.size[1], image.size[0]], 10, 0.3, 0.5)
    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), "images/" + image_file))
    # Generate colors for drawing bounding boxes.
    colors = get_colors_for_classes(len(class_names))
    # Draw bounding boxes on the image file
    draw_boxes(image, out_boxes, out_classes, class_names, out_scores)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=100)
    # Display the results in the notebook
    output_image = Image.open(os.path.join("out", image_file))
    imshow(output_image)
    return out_scores, out_boxes, out_classes
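Calling it is a one-liner (the filename here is just an example; any image in the images folder works):

out_scores, out_boxes, out_classes = predict("test.jpg")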
Build the prediction function properly, run it, and...
it works nicely.
Image Segmentation with U-Net
Next, let's implement U-Net, which labels ("paints") every pixel.
The dataset itself also needs masked images as labels.
The full model looks like the figure below.
Splitting it in half, let's implement one Encoder block and one Decoder block, one step at a time.
We have to save the skip layers that the Decoder, the upward path, will need.
Notice that the number of filters doubles at each step (we'll handle that in the model).
I just implemented the convs as shown, plus optional dropout and pooling on top.
def conv_block(inputs=None, n_filters=32, dropout_prob=0, max_pooling=True):
    """
    Convolutional downsampling block

    Arguments:
    inputs -- Input tensor
    n_filters -- Number of filters for the convolutional layers
    dropout_prob -- Dropout probability
    max_pooling -- Use MaxPooling2D to reduce the spatial dimensions of the output volume

    Returns:
    next_layer, skip_connection -- Next layer and skip connection outputs
    """
    conv = Conv2D(n_filters,  # Number of filters
                  3,          # Kernel size
                  activation='relu',
                  padding='same',
                  kernel_initializer='he_normal')(inputs)
    conv = Conv2D(n_filters,  # Number of filters
                  3,          # Kernel size
                  activation='relu',
                  padding='same',
                  kernel_initializer='he_normal')(conv)
    # if dropout_prob > 0 add a dropout layer, with the variable dropout_prob as parameter
    if dropout_prob > 0:
        conv = Dropout(dropout_prob)(conv)
    # if max_pooling is True add a MaxPooling2D with 2x2 pool_size
    if max_pooling:
        next_layer = MaxPooling2D(2, strides=2)(conv)
    else:
        next_layer = conv
    skip_connection = conv
    return next_layer, skip_connection
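A quick shape check (assuming the Keras layers are imported as in the notebook, e.g. from tensorflow.keras.layers):

x = Input((96, 128, 3))
next_layer, skip = conv_block(x, n_filters=32)
print(next_layer.shape)  # (None, 48, 64, 32) -- halved by the 2x2 max pooling
print(skip.shape)        # (None, 96, 128, 32) -- saved before pooling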
Next is the Decoder.
It's not much different, but we do have to implement merging in the skip layer.
def upsampling_block(expansive_input, contractive_input, n_filters=32):
    """
    Convolutional upsampling block

    Arguments:
    expansive_input -- Input tensor from previous layer
    contractive_input -- Input tensor from previous skip layer
    n_filters -- Number of filters for the convolutional layers

    Returns:
    conv -- Tensor output
    """
    up = Conv2DTranspose(
        n_filters,  # number of filters
        3,          # Kernel size
        strides=2,
        padding='same')(expansive_input)
    # Merge the previous output and the contractive_input
    merge = concatenate([up, contractive_input], axis=3)
    conv = Conv2D(n_filters,  # Number of filters
                  3,          # Kernel size
                  activation='relu',
                  padding='same',
                  kernel_initializer='he_normal')(merge)
    conv = Conv2D(n_filters,  # Number of filters
                  3,          # Kernel size
                  activation='relu',
                  padding='same',
                  kernel_initializer='he_normal')(conv)
    return conv
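Same kind of shape check, with made-up bottleneck/skip shapes:

expansive = Input((12, 16, 256))    # e.g. a bottleneck output
contractive = Input((24, 32, 128))  # the matching skip connection
out = upsampling_block(expansive, contractive, n_filters=128)
print(out.shape)  # (None, 24, 32, 128) -- upsampled 2x, merged, then convolved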
Now that all the blocks are built, let's combine them into the model (really, you just wire them up as in the diagram).
At the very end, a 1 x 1 conv is used to adjust the number of channels to n_classes.
def unet_model(input_size=(96, 128, 3), n_filters=32, n_classes=23):
    """
    Unet model

    Arguments:
    input_size -- Input shape
    n_filters -- Number of filters for the convolutional layers
    n_classes -- Number of output classes

    Returns:
    model -- tf.keras.Model
    """
    inputs = Input(input_size)
    # Contracting Path (encoding)
    # Add a conv_block with the inputs of the unet_model and n_filters
    cblock1 = conv_block(inputs, n_filters)
    # Chain the first element of the output of each block to be the input of the next conv_block.
    # Double the number of filters at each new step
    cblock2 = conv_block(cblock1[0], n_filters * 2)
    cblock3 = conv_block(cblock2[0], n_filters * 4)
    cblock4 = conv_block(cblock3[0], n_filters * 8, dropout_prob=0.3)  # Include a dropout_prob of 0.3 for this layer
    # Include a dropout_prob of 0.3 for this layer, and avoid the max_pooling layer
    cblock5 = conv_block(cblock4[0], n_filters * 16, dropout_prob=0.3, max_pooling=False)
    # Expanding Path (decoding)
    # Add the first upsampling_block.
    # Use cblock5[0] as expansive_input and cblock4[1] as contractive_input, with n_filters * 8
    ublock6 = upsampling_block(cblock5[0], cblock4[1], n_filters * 8)
    # Chain the output of the previous block as expansive_input and the corresponding contractive block output.
    # Note that you must use the second element of the contractive block, i.e. before the maxpooling layer.
    # At each step, use half the number of filters of the previous block
    ublock7 = upsampling_block(ublock6, cblock3[1], n_filters * 4)
    ublock8 = upsampling_block(ublock7, cblock2[1], n_filters * 2)
    ublock9 = upsampling_block(ublock8, cblock1[1], n_filters)
    conv9 = Conv2D(n_filters,
                   3,
                   activation='relu',
                   padding='same',
                   kernel_initializer='he_normal')(ublock9)
    # Add a Conv2D layer with n_classes filters, kernel size of 1 and 'same' padding
    conv10 = Conv2D(n_classes, 1, padding='same')(conv9)
    model = tf.keras.Model(inputs=inputs, outputs=conv10)
    return model
Build the model and compile it with a loss; a sketch of how I'd do it is below.
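A minimal sketch of compiling and training, assuming the (image, mask) train_dataset built earlier in the notebook (the dataset name, buffer size, batch size, and epoch count are placeholders):

unet = unet_model((96, 128, 3), n_filters=32, n_classes=23)
unet.compile(optimizer='adam',
             # from_logits=True because the final 1x1 conv outputs raw logits
             loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
             metrics=['accuracy'])
# train_dataset is assumed to yield (image, mask) pairs
model_history = unet.fit(train_dataset.cache().shuffle(500).batch(32), epochs=40)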
Now let's look at the results.
Around epoch 17 or so there's a stretch where the accuracy drops sharply, and I'm not sure why.
Still, it works well.