I have two loss variables, A and B.
When I do
A.backward()
everything works fine, but when I do
B.backward()
I get the following error:
TypeError: backward() takes 2 positional arguments but 3 were given
I suspect the difference lies in the history of the variables, but I cannot find it.
Any ideas or directions will be appreciated.
I have an architecture that consists largely of regular Modules and Functions, but it also uses two custom Functions, each associated with a different loss (their code is below).
The loss functions themselves are also non-trivial, but they are all a series of torch functions and are probably fine, so for brevity I will only include the code of the custom Functions.
The custom Function for loss A simply adds some noise to the input:
@staticmethod
def forward(ctx, input, stdev):
    # stdev sets the standard deviation of the injected noise
    normal_sample = torch.normal(torch.zeros(input.size()), torch.zeros(input.size()) + stdev).cuda()
    # re-center the means to the input
    output = input + normal_sample
    ctx.stdev = stdev
    ctx.mark_dirty(input)
    ctx.save_for_backward(input, output)
    return output
@staticmethod
def backward(ctx, grad_output):
    input, output = ctx.saved_variables
    stdev = ctx.stdev
    tensor_output = output.data
    tensor_input = input.data
    tensor_output.normal_(0, stdev[0][0])
    # re-center the means to the input (in-place, so the result is kept)
    tensor_output.add_(tensor_input)
    del ctx.stdev
    return Variable(tensor_output), None
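For reference, since forward is a @staticmethod, the Function is invoked through .apply. A rough sketch of how that would look, assuming the two methods above live in a Function subclass called NoiseFunction (the layer, the stdev value, and the loss here are placeholders, not my actual code):
import torch
from torch.autograd import Variable

fc = torch.nn.Linear(8, 8).cuda()
x = fc(Variable(torch.randn(4, 8).cuda()))       # a non-leaf Variable from an earlier layer
stdev = torch.zeros(x.size()) + 0.1              # 2-D, matching the stdev[0][0] indexing in backward
noisy = NoiseFunction.apply(x, stdev)
A = noisy.sum()                                  # stand-in for the real loss A
A.backward()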
The custom Function for loss B uses its input to weight a multinomial sample and returns the resulting index (along with a one-hot mask):
@staticmethod
def forward(ctx, input):
    one = torch.ones(1).cuda()
    if len(input.size()) == 1:
        input = torch.unsqueeze(input, 0)
    output = torch.zeros(input.size())
    if input.is_cuda:
        output = output.cuda()
    # sample from categorical with p = input
    _index = torch.multinomial(input + constants.epsilon, 1, False)
    output.scatter_(1, _index, torch.unsqueeze(one.repeat(_index.size()[0]), 1))
    ctx.mark_dirty(input)
    ctx.save_for_backward(input, output)
    return _index.float(), output
@staticmethod
def backward(ctx, grad_output):
    input, output = ctx.saved_variables
    gradInput = torch.zeros(input.size()).cuda()
    gradInput.copy_(output)
    gradInput.div_(input)
    gradInput.mul_(grad_output)
    return gradInput
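This one is applied the same way, except that it returns two outputs, the sampled index and the one-hot mask. A rough sketch, again with placeholder names (SampleFunction, the input, and the loss are not my actual code):
import torch
from torch.autograd import Variable

weights = Variable(torch.rand(4, 8).cuda(), requires_grad=True)
probs = weights / weights.sum(1, keepdim=True)   # non-leaf Variable, rows sum to 1
index, one_hot = SampleFunction.apply(probs)
B = one_hot.sum()                                # stand-in for the real loss B
B.backward()                                     # this is the call that raises the TypeError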
This might be a dumb question, but I have a similar scenario and want to return multiple outputs from my custom forward function, similar to what mordith was doing. One of these needs to be differentiated, but the others do not. For example:
def forward(ctx, input):
    return differentiated_output, non_differentiated_output_for_diagnostics, another_one
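A minimal sketch of the pattern I mean, assuming a 2-D input and assuming ctx.mark_non_differentiable is the right tool for the extra outputs (all names here are made up):
import torch
from torch.autograd import Function

class SplitOutputs(Function):
    @staticmethod
    def forward(ctx, input):
        out = input * 2                       # the output I want gradients for
        diag = input.max(1)[1]                # argmax indices, purely diagnostic
        ctx.mark_non_differentiable(diag)     # tell autograd not to require grad for it
        return out, diag

    @staticmethod
    def backward(ctx, grad_out, grad_diag):
        # backward receives one gradient per forward output;
        # the one for the non-differentiable output can be ignored
        return grad_out * 2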