Deep Learning Studio only handling 5 images per second


I have a simple neural net running on Keras, with Deep Learning Studio as an interface. The problem I'm seeing is that about 80% of the time, the Python script is only using one core (and maxing it out at that). You can see this in the screenshot below.

[Screenshot: CPU utilization, with a single core pinned at 100% while the rest sit idle]

All settings are at their defaults, and Keras is not utilizing my GPU. I suspect that some operations may only be able to run on a single core. My training network is pretty average, except that it is relatively simple.

[Screenshot: the network as laid out in Deep Learning Studio]
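
For context, when I say the settings are at their defaults, the only knobs I'm aware of are the TensorFlow session's thread-pool settings, which Keras would pick up like this. This is only a sketch assuming a TensorFlow 1.x backend; the thread counts are placeholder numbers, and I have not actually changed any of this:

    # Sketch only (TensorFlow 1.x backend assumed): where the thread pools that
    # govern CPU parallelism live. The numbers below are placeholders, not
    # settings I am actually using.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto(
        intra_op_parallelism_threads=8,  # threads a single op (e.g. a convolution) may use
        inter_op_parallelism_threads=8   # independent ops that may run concurrently
    )
    K.set_session(tf.Session(config=config))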

The Keras code exported for this network is as follows:

    import keras
    from keras.layers.convolutional import Convolution2D
    from keras.layers.pooling import MaxPooling2D
    from keras.layers.convolutional import ZeroPadding2D
    from keras.layers.core import Flatten
    from keras.layers.core import Dropout
    from keras.layers.core import Dense
    from keras.layers import Input
    from keras.models import Model
    from keras.regularizers import *


    def get_model():
        aliases = {}
        Input_39 = Input(shape=(3, 480, 648), name='Input_39')
        Convolution2D_1 = Convolution2D(name='Convolution2D_1', nb_col=5, nb_row=5, nb_filter=64)(Input_39)
        MaxPooling2D_1 = MaxPooling2D(name='MaxPooling2D_1', pool_size=(4, 4))(Convolution2D_1)
        ZeroPadding2D_3 = ZeroPadding2D(name='ZeroPadding2D_3')(MaxPooling2D_1)
        Convolution2D_2 = Convolution2D(name='Convolution2D_2', nb_col=3, nb_row=3, nb_filter=64)(ZeroPadding2D_3)
        ZeroPadding2D_4 = ZeroPadding2D(name='ZeroPadding2D_4')(Convolution2D_2)
        Convolution2D_3 = Convolution2D(name='Convolution2D_3', nb_col=3, nb_row=3, nb_filter=64)(ZeroPadding2D_4)
        Flatten_1 = Flatten(name='Flatten_1')(Convolution2D_3)
        Dropout_1 = Dropout(name='Dropout_1', p=0.25)(Flatten_1)
        Dense_2 = Dense(name='Dense_2', output_dim=256, activation='relu')(Dropout_1)
        Dropout_2 = Dropout(name='Dropout_2', p=0.25)(Dense_2)
        Dense_3 = Dense(name='Dense_3', output_dim=3)(Dropout_2)

        model = Model([Input_39], [Dense_3])
        return aliases, model


    from keras.optimizers import *

    def get_optimizer():
        return Adadelta()

    def is_custom_loss_function():
        return False

    def get_loss_function():
        return 'categorical_crossentropy'

    def get_batch_size():
        return 256

    def get_num_epoch():
        return 12

    def get_data_config():
        return '{"kfold": 1, "datasetLoadOption": "full", "shuffle": false, "numPorts": 1, "mapping": {"Category": {"port": "OutputPort0", "options": {}, "type": "Categorical", "shape": ""}, "Image": {"port": "InputPort0", "options": {"Width": "240", "shear_range": 0, "width_shift_range": 0, "Height": "120", "Normalization": false, "vertical_flip": true, "horizontal_flip": true, "pretrained": "None", "rotation_range": "180", "Resize": false, "Scaling": 1, "height_shift_range": 0, "Augmentation": true}, "type": "Image", "shape": ""}}, "samples": {"validation": 97, "split": 3, "test": 97, "training": 457}, "dataset": {"name": "Training_greyscale", "type": "private", "samples": 653}}'

With this, I decided to analyze the threads, and I found the one that is using most of the compute time. It seems to be a function which is only there to set attributes? Here is the analysis:

[Screenshot: per-thread profile, with most of the compute time spent in a function that appears to only set attributes]
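
In case it helps anyone reproduce the thread breakdown, the per-thread stacks can also be dumped in-process with only the standard library. This is just a minimal sketch, not the profiler that produced the screenshot above:

    # Sketch: dump the current stack of every Python thread to see which one
    # is burning CPU. Standard library only.
    import sys
    import traceback

    for thread_id, frame in sys._current_frames().items():
        print('--- thread %d ---' % thread_id)
        traceback.print_stack(frame)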

Things I've ruled out:

- R/W stalls: the data sits on an NVMe disk and only takes up 90 MB, so reads and writes should not be a concern.
- Anything pertaining to memory: the model has more than enough memory to operate as it stands.

With all of this, the run averages about 5 samples per second, and the program has a miserable core-utilization ratio. What can I do to speed this process up, aside from throwing additional hardware at it?

If additional details are needed, please don't downvote; I'm happy to provide them in a timely manner.

tuskiomi

Posted 2019-06-21T03:35:43.627

