Shape of pytorch model.parameter is inconsistent with how it's defined in the model
I'm attempting to extract the weights and biases from a simple network built in PyTorch. My entire network is composed of nn.Linear layers. When I create a layer with nn.Linear(in_dim, out_dim), I expect the parameters returned by model.parameters() to have shape (in_dim, out_dim) for the weight and (out_dim,) for the bias. However, the weights that come out of model.parameters() instead have shape (out_dim, in_dim).
I want to perform the forward pass using only NumPy matrix multiplication, without any PyTorch. Because of the shape inconsistency, the matrix multiplications throw an error. How can I fix this?
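To make the error concrete, here is a minimal sketch of the NumPy multiplication I'm attempting (dummy data, using the model defined below):

import numpy as np

# Pull the parameters out of the model (defined below) as NumPy arrays.
params = [p.detach().numpy() for p in model.parameters()]
W1, b1 = params[0], params[1]   # I expect (12, 100) and (100,), but W1 is actually (100, 12)

x = np.random.rand(5, 12)       # dummy batch of 5 inputs
out = x @ W1 + b1               # raises a shape-mismatch error: (5, 12) @ (100, 12)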
Here is my exact code:
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, dim_input, dim_recurrent, dim_output):
        super(RNN, self).__init__()
        self.dim_input = dim_input
        self.dim_recurrent = dim_recurrent
        self.dim_output = dim_output
        self.dense1 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense2 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense3 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense4 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense5 = nn.Linear(self.dim_recurrent, self.dim_output)
    # There is a defined forward pass

model = RNN(12, 100, 6)
for i in model.parameters():
    print(i.shape)
The output is:
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([6, 100])
torch.Size([6])
The output should, if I'm correct, be:
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 6])
torch.Size([6])
What is my issue?
python machine-learning pytorch
asked Nov 24 '18 at 21:25, edited Nov 24 '18 at 21:34 – Samuel Carpenter

Please share the relevant code and highlight the exact issue there – desertnaut, Nov 24 '18 at 21:29
1 Answer
What you see is not wrong: (out_dim, in_dim) is simply how nn.Linear stores its weight matrix. When you call print(model) you can see that the input and output features are correct:
RNN(
(dense1): Linear(in_features=12, out_features=100, bias=True)
(dense2): Linear(in_features=100, out_features=100, bias=False)
(dense3): Linear(in_features=12, out_features=100, bias=True)
(dense4): Linear(in_features=100, out_features=100, bias=False)
(dense5): Linear(in_features=100, out_features=6, bias=True)
)
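If it helps, a quick sketch for mapping each shape to its layer (this just uses the standard named_parameters() API on the model from the question):

for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# dense1.weight (100, 12)
# dense1.bias (100,)
# dense2.weight (100, 100)
# ...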
You can check the source code to see that the weights are actually transposed before calling matmul.

nn.Linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear

Its forward looks like this:

def forward(self, input):
    return F.linear(input, self.weight, self.bias)

F.linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/functional.html

The relevant line that multiplies the weights is:

output = input.matmul(weight.t())
As mentioned above, the weights are transposed before matmul is applied, so nn.Linear effectively computes output = input @ weight.t() + bias, where weight has shape (out_features, in_features). That is why the shape of the weights is different from what you expected.

So if you want to do the matrix multiplication manually, you transpose the weight yourself:

# dummy input: a batch of 5 vectors of length 12
input = torch.rand(5, 12)
# apply layer dense1 (without bias; for the bias just add + model.dense1.bias)
output_first_layer = input.matmul(model.dense1.weight.t())
print(output_first_layer.shape)
Just as you would expect from your dense1, it returns:
torch.Size([5, 100])
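Since your goal was to do this in plain NumPy rather than PyTorch, here is a minimal sketch of the same idea (assuming you first convert the parameters to NumPy arrays with .detach().numpy(); the variable names are just placeholders):

import numpy as np

W1 = model.dense1.weight.detach().numpy()   # shape (100, 12), i.e. (out_dim, in_dim)
b1 = model.dense1.bias.detach().numpy()     # shape (100,)

x = np.random.rand(5, 12)                   # dummy batch of 5 inputs

# Either transpose the weight ...
out = x @ W1.T + b1                         # shape (5, 100)
# ... or swap the operand order, which is equivalent for this 2-D case:
out_alt = (W1 @ x.T).T + b1

print(out.shape)   # (5, 100)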
I hope this explains your observations with the shape :)
answered Nov 24 '18 at 22:16, edited Nov 24 '18 at 22:26 – blue-phoenox
Is there a reason that the weights are transposed? – Samuel Carpenter, Nov 24 '18 at 23:06

@SamuelCarpenter actually I don't know :) I asked a question here: stackoverflow.com/questions/53465608/… – blue-phoenox, Nov 25 '18 at 7:48

@SamuelCarpenter found the answer for this, you can check it out on the link I posted. – blue-phoenox, Nov 25 '18 at 9:26