Day 07: Writing Your First CNN Program -- Comparing Handwritten Digit Recognition Accuracy - iT 邦幫忙

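We again tackle handwritten digit (Arabic numeral) recognition, this time to compare a CNN with the simple neural network used earlier. The program comes from https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py ; I added comments to it (see here), and the file is named cnn.py.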

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
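# Mini-batch size for gradient descent
batch_size = 128
# Number of classes (digits 0-9)
num_classes = 10
# Number of training epochs
epochs = 12
# Image height (rows) and width (columns) in pixels
img_rows, img_cols = 28, 28
# Load the MNIST data, already split into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Keep the original integer labels for the crosstab (confusion matrix) later
y_test_org = y_test
# channels_first: the channel (depth) axis is the 2nd dimension; height and width are the 3rd and 4th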
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:  # channels_last: the channel (depth) axis is the 4th dimension; height and width are the 2nd and 3rd
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
# Scale pixel values from 0-255 to 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# Convert the labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# Build a Sequential model (a linear stack of layers)
model = Sequential()
# Convolutional layer: 32 filters (the depth of the output), 3x3 kernel, ReLU activation
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
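# Convolutional layer: 64 filters (the depth of the output), 3x3 kernel, ReLU activation
model.add(Conv2D(64, (3, 3), activation='relu'))
# Max pooling layer: takes the maximum over each 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))
# Dropout layer: randomly drops 25% of its inputs during training to reduce overfitting
model.add(Dropout(0.25))
# Flatten layer: flattens the multi-dimensional input to 1-D; used between the convolutional and fully connected layers
model.add(Flatten())
# Fully connected layer with 128 outputs
model.add(Dense(128, activation='relu'))
# Dropout layer: randomly drops 50% of its inputs during training to reduce overfitting
model.add(Dropout(0.5))
# Output layer: softmax turns the 10 outputs into class probabilities
model.add(Dense(num_classes, activation='softmax'))
# Compile: choose the loss function, the optimizer, and the evaluation metric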
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
# Train the model; the training history is stored in train_history
train_history = model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
# Evaluate on the test set and print the loss and accuracy
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
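The preprocessing steps in the listing (adding a channel axis, scaling to 0-1, one-hot encoding) can be tried on a tiny fake batch without Keras. The following is a minimal NumPy sketch, assuming channels_last. The arrays fake_images and fake_labels and the helper one_hot are made up for illustration; one_hot is a stand-in for keras.utils.to_categorical, not the library's implementation.

```python
import numpy as np

num_classes = 10
img_rows, img_cols = 28, 28

# Hypothetical stand-in for MNIST: 3 random grayscale images with labels 7, 0, 3
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, img_rows, img_cols), dtype=np.uint8)
fake_labels = np.array([7, 0, 3])

# channels_last: append a single grayscale channel as the 4th dimension
x = fake_images.reshape(fake_images.shape[0], img_rows, img_cols, 1)

# Scale pixel values from 0-255 to 0-1
x = x.astype('float32') / 255


def one_hot(labels, n):
    """Illustrative helper: row i is all zeros except a 1 at index labels[i]."""
    return np.eye(n, dtype='float32')[labels]


y = one_hot(fake_labels, num_classes)

print(x.shape)  # (3, 28, 28, 1)
print(y[0])     # a 1 at index 7, zeros elsewhere
```

Each label becomes a row of length num_classes, which is the shape the softmax output layer and categorical_crossentropy loss work with.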
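Training takes a while. It makes me want to buy a better GPU card and install the GPU build of TensorFlow to cut down on the tea and coffee breaks. Once training finishes, save the model right away with the code below so that you don't have to retrain next time.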
# Save the model architecture as JSON
from keras.models import model_from_json
json_string = model.to_json()
with open("cnn.config", "w") as text_file:
    text_file.write(json_string)
# Save the trained weights
model.save_weights("cnn.weight")
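The trained model reaches 99.11% accuracy, well above the simple neural network, at the cost of a longer run time. Since the model is saved, however, it only needs to be trained once; afterwards we can load the architecture and weights and predict directly. Readers who don't want to wait can also get cnn.config and cnn.weight here and load them directly.
In addition, you can run cnn_1.py to test the digits drawn with draw.exe from the third article and see what the model predicts.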

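Here is one more tip: compute a confusion matrix, which tabulates the counts of correct and incorrect classifications on the test set. The diagonal from top left to bottom right holds the correct counts and every other cell holds misclassifications, so you can see which digit is most often mistaken for which other digit and then strengthen the training data to reduce those errors.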
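# Compute the confusion matrix: counts of correct and incorrect classifications on the test set
import numpy as np
import pandas as pd
# predict_classes() was removed in newer Keras releases; argmax over the softmax output gives the same class indices
predictions = np.argmax(model.predict(x_test), axis=-1)
pd.crosstab(y_test_org, predictions, rownames=['Actual'], colnames=['Predicted'])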
Figure. Confusion Matrix
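To make the diagonal and off-diagonal reading concrete, here is a small NumPy sketch that builds a confusion matrix by hand and derives the overall accuracy from its diagonal. The label arrays are invented toy data, not MNIST results, and confusion_matrix is an illustrative helper, not a Keras or pandas function.

```python
import numpy as np


def confusion_matrix(actual, predicted, n_classes):
    """Illustrative helper: cell [i, j] counts samples of class i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm


# Toy labels for a 3-class problem (made up for illustration)
actual = np.array([0, 0, 1, 1, 2, 2, 2, 1])
predicted = np.array([0, 1, 1, 1, 2, 0, 2, 1])

cm = confusion_matrix(actual, predicted, 3)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = trace / total
accuracy = np.trace(cm) / cm.sum()
print(accuracy)

# The largest off-diagonal cell is the most frequent confusion
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
worst = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(worst)  # (actual class, predicted class)
```

The same accuracy should come out of (actual == predicted).mean(), which gives a quick consistency check between the matrix and a directly computed score.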
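The overall program structure is much the same as in the second article; the main difference is the layer design (the model.add(...) calls, lines 42-59 in the original listing). Note that for a ConvxD convolutional layer (Conv1D, Conv2D, Conv3D), the first argument, the number of filters, is not the size of the output but its depth. The output width and height depend on the other settings and are given by ((W-F+2P)/S)+1, where:
W: the width (or height) of the input.
F: the filter (kernel) size, e.g. 3 for a 3x3 kernel. This is not the number of filters.
P: the zero padding, i.e. the number of zeros added on each side of the input. When the NxN sliding window would extend past the border, the layer either skips that position (no padding, P = 0) or pads with zeros; for a 3x3 kernel, padding that preserves the input size means P = 1.
S: the stride, i.e. how many cells the sliding window moves at each step.
With this formula we can compute the output width or height. For example, for the first Conv2D layer (line 42) the output width and height are ((28-3+0)/1)+1 = 26. Running model.summary() confirms this: the output shape is (None, 26, 26, 32).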
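The formula can be checked with a few lines of Python. This is a minimal sketch, not part of cnn.py: conv_output_size is a hypothetical helper, f is the kernel size (3 for a 3x3 kernel), and floor division is an assumption that matches how unpadded convolutions round down when the stride does not divide evenly.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width (or height) of a convolution: ((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


size = 28
size = conv_output_size(size, f=3)  # first Conv2D, 3x3 kernel, no padding -> 26
size = conv_output_size(size, f=3)  # second Conv2D, 3x3 kernel, no padding -> 24
size = size // 2                    # 2x2 max pooling halves each side -> 12
flattened = size * size * 64        # Flatten over the 64 channels of the second Conv2D

print(size, flattened)  # 12 9216
```

Tracing the sizes this way shows that the Dense(128) layer in this model receives 9216 inputs per image.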

Figure. Structure of the CNN example program
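Using a CNN for digit recognition is something of an overkill: digits are simple shapes made only of strokes, whereas a CNN's strength is extracting features automatically and recognizing complex shapes built from lines, surfaces, and corners. We will therefore look at more application examples to show what it can do. Before that, the next article summarizes the functions and parameters used in this example program.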
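I have a small question, though.

For my score = model.evaluate(x_test, y_test, verbose=0),

score[1], i.e. the accuracy, does not match the accuracy I compute myself from the confusion matrix. Do you know what might cause this?

I am doing binary classification, i.e. every sample is either A or B.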

image1 = io.imread(uploaded_file, as_gray=True)
image_resized = resize(image1, (28, 28), anti_aliasing=True)    
if K.image_data_format() == 'channels_first':
    X1 = image_resized.reshape(1,28,28) #/ 255
else:
    X1 = image_resized.reshape(28,28,1) #/ 255
X1 = np.abs(1-X1)
predictions = model.predict_classes(X1)
image_resized = resize(image1, (28, 28), anti_aliasing=True)

This raises the error

name 'resize' is not defined

Am I missing a package?
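A likely cause, assuming the snippet is meant to use scikit-image (the as_gray and anti_aliasing arguments match that library): resize lives in skimage.transform and has to be imported explicitly. The sketch below uses a random array in place of the uploaded file so that it runs on its own.

```python
import numpy as np
from skimage import io                # provides io.imread(..., as_gray=True)
from skimage.transform import resize  # the name the error message says is missing

# Stand-in for io.imread(uploaded_file, as_gray=True): a random 56x56 grayscale image
image1 = np.random.rand(56, 56)

image_resized = resize(image1, (28, 28), anti_aliasing=True)
print(image_resized.shape)  # (28, 28)
```

If scikit-image is not installed, the package name on PyPI is scikit-image, not skimage.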
if K.image_data_format() == 'channels_first':
    X1 = image_resized.reshape(1,28,28) #/ 255
else:
    X1 = image_resized.reshape(28,28,1) #/ 255

needs to be changed to

if K.image_data_format() == 'channels_first':
    X1 = image_resized.reshape(1,1,28,28) #/ 255
else:
    X1 = image_resized.reshape(1,28,28,1) #/ 255

Otherwise you get

Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (28, 28, 1)
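The reason for the extra leading 1 is that a Keras model expects a batch of samples, so even a single image needs a batch axis in front. A minimal NumPy sketch, assuming channels_last and using a zero array as a placeholder for image_resized:

```python
import numpy as np

image_resized = np.zeros((28, 28))           # placeholder for the resized image

single = image_resized.reshape(28, 28, 1)    # one sample, no batch axis: 3 dimensions
batch = image_resized.reshape(1, 28, 28, 1)  # batch of one sample: 4 dimensions

# np.expand_dims adds the batch axis to an already-shaped sample
also_batch = np.expand_dims(single, axis=0)

print(single.ndim, batch.shape, also_batch.shape)  # 3 (1, 28, 28, 1) (1, 28, 28, 1)
```

Either form gives the 4-dimensional input that the first Conv2D layer checks for.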
