Run quantized tensorflow model on FPGA / pure python
I have a simple model trained in Keras on the MNIST dataset. What I am trying to do is rewrite this model and run it on an FPGA device. In order to do this I want to fully understand how a quantized model works.
First I converted the model to the .tflite format with UINT8 precision using post-training quantization (https://www.tensorflow.org/lite/performance/post_training_quantization). The quantized model's accuracy is about 90%.
Now I am trying to get the weights out of the quantized model and implement the forward pass in pure Python. I use Netron to visualize the model and to read out its weights: https://github.com/lutzroeder/netron.
The plain Python code (matrix multiplication, bias add, ReLU) works with the float weights, but with the quantized weights it does not.
So my question is: how do I write the feed-forward pass using numpy?
My model in Keras looks like this:
model = Sequential()
model.add(Dense(512, input_shape=input_shape))
model.add(Activation(tf.nn.relu))
model.add(Dense(100))
model.add(Activation(tf.nn.relu))
model.add(Dense(num_classes))
model.add(Activation(tf.nn.softmax))
model.compile(
    optimizer=Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
I converted it with TocoConverter, and the converted model works when run with the TensorFlow Lite interpreter.
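For reference, the conversion was roughly like the sketch below. This is only a minimal sketch assuming the TF 1.x-era converter API and weight-only post-training quantization; the file names are placeholders, and the exact class name (TocoConverter, TFLiteConverter, or tf.contrib.lite.*) depends on the TensorFlow version.
import tensorflow as tf

# Minimal conversion sketch (TF 1.x-era API; class name varies by version:
# tf.lite.TFLiteConverter, tf.lite.TocoConverter or tf.contrib.lite.TocoConverter).
# "mnist_dense.h5" is a placeholder for the saved Keras model file.
converter = tf.lite.TFLiteConverter.from_keras_model_file("mnist_dense.h5")
converter.post_training_quantize = True   # post-training (weight) quantization
tflite_model = converter.convert()

with open("mnist_dense_quant.tflite", "wb") as f:
    f.write(tflite_model)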
Then I tried to write the feed-forward pass in pure Python:
for img, label in zip(x_test, y_test):
    img = img.astype('uint8')
    total_seen += 1
    label = tf.keras.utils.to_categorical(label, num_classes=num_classes)
    X = img.reshape(1, 784)
    z1 = np.dot(X, W0.T) + b0
    a1 = relu(z1)
    z2 = np.dot(a1, W1.T) + b1
    a2 = relu(z2)
    z3 = np.dot(a2, W2.T) + b2
    prediction = np.argmax(z3)
    label = np.argmax(label)
    if prediction == label:
        num_correct += 1
But this implementation only reaches about 10% accuracy, so something is going wrong.
How can I correct this model?
Thanks in advance.
Edit:
I've read the paper about quantization in TensorFlow:
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
I now understand most of it: I know what the S (scale) and Z (zero-point) values are for the activations and the kernels. But after the matrix multiplication the result should be multiplied by the factor M := (S1 * S2) / S3, and I don't know what the S3 scale is or how to get it, because I can't see anything related to it in the Netron graph. Any suggestions?
python tensorflow deep-learning tensorflow-lite quantization
edited Nov 26 at 21:51
asked Nov 21 at 21:54
Damian
113
Please add the weight-extraction code you tried. Even better, add some simple examples so that people can see where the problem lies.
– E.Coms
Nov 21 at 22:24
1 Answer
There are two steps you need to do:
1. Dequantize the input, weights and bias back into full precision (or an integer equivalent): (w - w_offset) * w_scale
2. After the ReLU, quantize the activations back into integers: a / a_scale + a_offset
You can probably skip step 2 (quantizing and dequantizing the activations) with only a minor risk of getting a different result than the TFLite model. This is because ReLU has no upper bound in float, whereas TFLite saturates the quantized activation at its maximum representable value.
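The per-tensor scales and zero points (including the S3/Z3 of each layer output that the question asks about) can also be read directly from the .tflite file with the TF Lite Python interpreter. A minimal sketch, assuming the tf.lite.Interpreter API and a placeholder file name:
import tensorflow as tf

# List every tensor's quantization parameters from the converted model.
# "mnist_dense_quant.tflite" is a placeholder for the converted model file.
interpreter = tf.lite.Interpreter(model_path="mnist_dense_quant.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    scale, zero_point = detail["quantization"]   # the (S, Z) pair of this tensor
    print(detail["name"], detail["dtype"], scale, zero_point)

# The output tensor of each Dense/ReLU layer carries the S3/Z3 used to
# requantize that layer's activation.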
You can check out my tutorials on TFLite on my GitHub, where I introduce the concept and the training; a post about inference is coming soon.
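In the meantime, here is a minimal numpy sketch of the dequantize / compute / requantize flow described above. The helper names are hypothetical, and the bias handling (int32 with zero point 0 and scale = input_scale * weight_scale) is an assumption about the usual TFLite full-integer layout:
import numpy as np

def dequantize(q, scale, zero_point):
    # real value: r = S * (q - Z)
    return scale * (q.astype(np.float32) - zero_point)

def quantize(r, scale, zero_point, dtype=np.uint8):
    # quantized value: q = r / S + Z, saturated to the dtype range
    q = np.round(r / scale + zero_point)
    info = np.iinfo(dtype)
    return np.clip(q, info.min, info.max).astype(dtype)

def relu(x):
    return np.maximum(x, 0.0)

def dense_layer(x_q, x_scale, x_zp, W_q, W_scale, W_zp, b_q, b_scale,
                out_scale, out_zp, apply_relu=True):
    # Step 1: dequantize input, weights and bias into float.
    # (Biases are typically stored as int32 with zero point 0 and
    #  scale = x_scale * W_scale.)
    x = dequantize(x_q, x_scale, x_zp)
    W = dequantize(W_q, W_scale, W_zp)
    b = b_q.astype(np.float32) * b_scale
    z = x @ W.T + b
    if apply_relu:
        z = relu(z)
    # Step 2: requantize with the *output* tensor's scale/zero point
    # (this is the S3/Z3 from the paper).
    return quantize(z, out_scale, out_zp)

# For the last layer, softmax/argmax can be taken on the float logits directly,
# so the final requantization (step 2) can be skipped there.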
answered 2 days ago
SoonYau
512