Train High Definition images with Tensorflow and i

I'm looking to do some image classification on PDF documents that I convert to images. I'm using tensorflow inception v3 pre trained model and trying to retrain the last layer with my own categories following the tensorflow tuto. I have ~1000 training images per category and only 4 categories. With 200k iterations I can reach up to 90% of successful classifications, which is not bad but still need some work:

The issue here is this pre-trained model takes only 300*300p images for input. Obviously it messes up a lot with the characters involved in the features I try to recognize in the documents.

Would it be possible to alter the model input layer so I can give him images with better resolution ?

Would I get better results with a home made and way simpler model ?

If so, where should I start to build a model for such image classification ?

If you want to use a different image resolution than the pre-trained model uses , you should use only the convolution blocks and have a set of fully connected blocks with respect to the new size. Using a higher level library like Keras will make it a lot easier. Below is an example on how to do that in Keras.

import keras
from keras.layers import Flatten,Dense,GlobalAveragePooling2D
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3

base_model = InceptionV3(include_top=False,input_shape=(600,600,3),weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024,activation='relu')(x)
#Add as many dense layers / Fully Connected layers required
pred = Dense(10,activation='softmax')(x)
model = Model(base_model.input,pred)

for l in model.layers[:-3]:
    l.trainable=False

The input_top = False will give you only the convolution blocks. You can use the input_shape=(600,600,3) to set the required shape you want. And you can add a couple of dense blocks/Fully connected blocks/layers to the model.The last layer should contain the required number of categories .10 represent the number of classes.By this approach you use all the weights associated with the convolution layers of the pre trained model and train only the last dense layers.