Run quantized tensorflow model on FPGA / pure python











I have a simple Keras model trained on the MNIST dataset.

I want to rewrite this model so that it runs on an FPGA device. To do that, I need to fully understand how a quantized model works.

First, I converted the model to .tflite format with UINT8 precision using post-training quantization (https://www.tensorflow.org/lite/performance/post_training_quantization).

The quantized model reaches about 90% accuracy.

Now I am trying to extract the weights from the quantized model and implement inference in pure Python. I use Netron to visualize the model and get its weights: https://github.com/lutzroeder/netron.

A simple Python implementation (matrix multiplication, bias add, and ReLU) works with the original weights, but the same code with the quantized weights does not.

So my question is: how do I write the feed-forward pass for the quantized model using NumPy?

My Keras model looks like this:



model = Sequential()
model.add(Dense(512, input_shape=input_shape))
model.add(Activation(tf.nn.relu))
model.add(Dense(100))
model.add(Activation(tf.nn.relu))
model.add(Dense(num_classes))
model.add(Activation(tf.nn.softmax))
model.compile(
    optimizer=Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)


I converted it with TocoConverter, and the converted model works in TensorFlow.

Then I tried to write the feed-forward pass in pure Python:



for img, label in zip(x_test, y_test):
    img = img.astype('uint8')
    total_seen += 1
    label = tf.keras.utils.to_categorical(label, num_classes=num_classes)
    X = img.reshape(1, 784)
    z1 = np.dot(X, W0.T) + b0
    a1 = relu(z1)
    z2 = np.dot(a1, W1.T) + b1
    a2 = relu(z2)
    z3 = np.dot(a2, W2.T) + b2
    prediction = np.argmax(z3)
    label = np.argmax(label)
    if prediction == label:
        num_correct += 1


But this implementation reaches only about 10% accuracy, so something is wrong.
How can I correct it?

Thanks in advance.

Edit:
I have read the paper on quantization in TensorFlow:
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf

I understand almost everything: I know what the S (scale) and Z (zero-point) values are for the activations and kernels. But after the matrix multiplication, the result should be multiplied by the factor M := S1*S2/S3.
I don't know what the S3 scale is or how to get it, because I can't see anything related to it in the Netron graph. Any suggestions?
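
For reference, the relation from the cited paper can be sketched in NumPy as below. This is only an illustration under stated assumptions: S3 and Z3 are taken to be the scale and zero point of the layer's output (activation) tensor, the bias is assumed to be an integer quantized with scale S1*S2 and zero point 0, and the function name `quantized_dense` and all numeric values are made up for the example. M is applied here as a plain floating-point multiply rather than the fixed-point multiply an FPGA implementation would likely use.

```python
import numpy as np

def quantized_dense(q1, z1, s1, q2, z2, s2, z3, s3, bias_q=None):
    """uint8 input q1 and uint8 weights q2 -> uint8 output, using M = s1 * s2 / s3."""
    # Subtract the zero points in int32 so products and sums cannot wrap around in uint8.
    acc = (q1.astype(np.int32) - z1) @ (q2.astype(np.int32) - z2).T
    if bias_q is not None:
        acc = acc + bias_q  # assumption: bias has scale s1 * s2 and zero point 0
    m = s1 * s2 / s3  # the multiplier M from the paper
    q3 = z3 + np.round(m * acc)
    return np.clip(q3, 0, 255).astype(np.uint8)  # saturate to the uint8 range

# Toy values, chosen only so the arithmetic is easy to follow.
q1 = np.array([[10, 20]], dtype=np.uint8)                  # real input:   [5, 10]
q2 = np.array([[130, 126], [128, 138]], dtype=np.uint8)    # real weights: [[1, -1], [0, 5]]
q3 = quantized_dense(q1, z1=0, s1=0.5, q2=q2, z2=128, s2=0.5, z3=20, s3=0.25)
real_out = (q3.astype(np.float32) - 20) * 0.25             # dequantized output
print(q3, real_out)
```

Because the real value 0 maps to Z3, a ReLU in this representation amounts to `np.maximum(q3, z3)`.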










  • Please add the weight code you tried. Even better, add some simple examples so that people can see where the problem lies.
    – E.Coms
    Nov 21 at 22:24















python tensorflow deep-learning tensorflow-lite quantization






edited Nov 26 at 21:51
asked Nov 21 at 21:54
Damian
113

1 Answer
There are two steps you'll need to do:

  1. Dequantize the input, weights, and bias back into full precision (or the integer equivalent):

    (w - w_offset) * w_scale

  2. After the ReLU, quantize the activations back into integers:

    a / a_scale + a_offset

    You can probably skip step 2, the quantize-dequantize of the activations, at a minor risk of getting results that differ from the TFLite model. This is because ReLU has no upper bound, but TFLite saturates it to a maximum value.
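
The two steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the TFLite implementation: the helper names (`dequantize`, `quantize`, `dense_relu`) and all numeric values are made up for the example, the bias is assumed to be already dequantized, and the uint8 range [0, 255] is assumed for the activations.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Step 1: real value r = (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale

def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Step 2: q = round(r / scale) + zero_point, saturated to [qmin, qmax]."""
    q = np.round(np.asarray(r, dtype=np.float32) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dense_relu(x_real, w_q, w_scale, w_offset, b_real, a_scale=None, a_offset=0):
    """Dense layer + ReLU on dequantized weights, optionally re-quantizing the activations."""
    w_real = dequantize(w_q, w_scale, w_offset)
    a = np.maximum(x_real @ w_real.T + b_real, 0.0)
    if a_scale is not None:
        # Quantize, then dequantize again so the next layer still receives real values.
        a = dequantize(quantize(a, a_scale, a_offset), a_scale, a_offset)
    return a

# Toy values, for illustration only.
x = np.array([[1.0, 2.0]], dtype=np.float32)
w_q = np.array([[130, 126], [128, 138]], dtype=np.uint8)   # shape (units, inputs)
b = np.array([0.5, -1.0], dtype=np.float32)

out = dense_relu(x, w_q, w_scale=0.5, w_offset=128, b_real=b)                # step 2 skipped
sat = dense_relu(x, w_q, w_scale=0.5, w_offset=128, b_real=b, a_scale=0.02)  # step 2 applied
print(out, sat)
```

With these toy numbers the second call saturates: the activation 9.0 exceeds the largest value representable at `a_scale=0.02` (255 * 0.02 = 5.1), which is the kind of difference that skipping step 2 can introduce.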




You can check out my TFLite tutorials on my GitHub, where I have introduced the concept and training and am about to write about inference.






answered 2 days ago
SoonYau
512





























