swift
[AutoDiff] Fix `autodiff_function_extract` operand ownership kind and memory leaks.
#27199
Merged

[AutoDiff] Fix `autodiff_function_extract` operand ownership kind and memory leaks. #27199

rxwei
rxwei5 years ago

The autodiff_function_extract instruction behaves like tuple_extract, where it extracts some element from an aggregate. Its operand should have the same ownership kind as that of tuple_extract. That is, it should be defined as CONSTANT_OR_TRIVIAL_OWNERSHIP_INST(Guaranteed, AutoDiffFunctionInst) in ValueOwnershipKindClassifier.

However, this is currently defined wrongly as FORWARDING_OWNERSHIP_INST(AutoDiffFunctionExtract), which caused a bug in the differentiation transform to be uncaught: VJPEmitter and JVPEmitter in the differentiation transform is performing autodiff_function_extract on an @owned @differentiable function, which caused associated functions that are not extracted to be not released:

%f = autodiff_function %original
%f_vjp = autodiff_function_extract [vjp] %f
... // %f is not released, and not caught by ownership verification!

After we fix the operand ownership kind for autodiff_function_extract, all these cases are now caught by ownership verification. The reproducer in TF-795 and most differentiation tests are failing to compile because ownership verification caught the bug in AD-generated code. The existing AD test suite is serving as good test cases for this ownership error.

To fix this, VJPEmitter and JVPEmitter are now changed to emit borrows of @differentiable functions and copies of associated functions and property destroying the @differentiable function:

%f = autodiff_function %original
%f_borrowed = begin_borrow %f
%f_vjp_extracted = autodiff_function_extract [vjp] %f_borrowed
%f_vjp = copy_value %f_vjp_extracted
end_borrow %f_borrowed
destroy_value %f

Fixes TF-795.

rxwei [AutoDiff] Fix `autodiff_function_extract` value ownership kind.
42df376d
rxwei rxwei added tensorflow
rxwei rxwei requested a review from dan-zheng dan-zheng 5 years ago
rxwei
rxwei5 years ago

@swift-ci please test tensorflow

dan-zheng
dan-zheng approved these changes on 2019-09-16
rxwei rxwei merged 0641a22b into tensorflow 5 years ago
rxwei rxwei deleted the adfuncextract_ownership branch 5 years ago
saeta
saeta5 years ago

Can you please add a unit test?

rxwei
rxwei5 years ago

@saeta As explained in the PR description, failures are now caught by ownership verification after operand ownership kind is fixed, so any leak would cause all AD tests to fail to compile. Since this is a closure heap allocation leak, a refcount-tracked type will not work on the TF-795 reproducer so it's not trivially unit-testable. It can only be caught by more general leak-checking-enabled CI. The substantial fix here is to define the right operand ownership kind.

Login to write a write a comment.

Login via GitHub

Reviewers
Assignees
No one assigned
Labels
Milestone