Partial fix to allow partitions to have boundary temporaries of unknown size.
Previously, we fell back to full-model CPU execution at compilation
time; now we get ordinary partitioned compilation and execution.
Limitations:
- Needs more testing, and more tests need to be written.
- The initial guess for the size of a boundary temporary is a single
element. Perhaps it would be useful to remember the actual size from
a previous execution.
- Fenced execution punts to unfenced execution (at the NDK API level)
when the plan contains subgraph outputs of unknown size.
- Operands of unknown size at control flow construct boundaries still
fall back to full-model CPU execution.
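
For illustration only, here is a minimal sketch of how a single-element
initial guess for a boundary temporary could be combined with
re-execution when the guess turns out to be too small. This is not the
code in this change: StepResult, Step, runWithDynamicTemporary, the
doubling policy, and the attempt limit are all hypothetical names and
assumptions chosen for the example.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result of running one partition step; not an NNAPI type.
struct StepResult {
    bool sufficient;       // false if the boundary buffer was too small
    std::size_t required;  // element count reported by the step, or 0 if unknown
};

// Hypothetical step: writes its output into the boundary temporary.
using Step = std::function<StepResult(std::vector<float>&)>;

// Runs `step`, growing the boundary temporary until the step succeeds.
// Returns the number of attempts made, or -1 if maxAttempts was reached.
int runWithDynamicTemporary(const Step& step, std::vector<float>& buffer,
                            int maxAttempts = 8) {
    buffer.assign(1, 0.0f);  // initial guess: a single element
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        StepResult r = step(buffer);
        if (r.sufficient) return attempt;
        // Assumed policy: use the reported size if it is larger than the
        // current buffer, otherwise double the current size.
        std::size_t next =
                r.required > buffer.size() ? r.required : buffer.size() * 2;
        buffer.assign(next, 0.0f);
    }
    return -1;
}
```

In this sketch, a step that reports its required size succeeds on the
second attempt, while a step that cannot report a size is retried with
a doubled buffer each time. Remembering the size from a previous
execution, as suggested above, would amount to replacing the
single-element initial guess with the last successful size.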
Also adds some diagnostic logging.
Test: NeuralNetworksTest_static
Bug: 132458982
Merged-In: I52e7179ff9783d184fd6bfc1c9fefc55972e942a
Change-Id: I52e7179ff9783d184fd6bfc1c9fefc55972e942a
(cherry picked from commit d6183c8db7feb5e2bdf0d2907af01418e7da809e)
diff --git a/runtime/NeuralNetworks.cpp b/runtime/NeuralNetworks.cpp
index 5d3dae4..f5206c8 100644
--- a/runtime/NeuralNetworks.cpp
+++ b/runtime/NeuralNetworks.cpp
@@ -1543,6 +1543,26 @@
waitForList.push_back(syncFenceFd);
}
}
+
+ if (r->getCompilation()->hasDynamicTemporaries()) {
+ // The current implementation of fenced execution does not support
+ // dynamic temporaries. Fall back to non fenced execution.
+ LOG(INFO) << "ANeuralNetworksExecution_startComputeWithDependencies falling back"
+ << " to ANeuralNetworksExecution_startCompute"
+ << " because of boundary operands of unknown size";
+ for (int syncFenceFd : waitForList) {
+ if (syncFenceFd > 0) {
+ auto w = syncWait(syncFenceFd, -1);
+ if (w != FenceState::SIGNALED) {
+ VLOG(EXECUTION) << "syncWait failed, fd: " << syncFenceFd;
+ *event = nullptr;
+ return ANEURALNETWORKS_OP_FAILED;
+ }
+ }
+ }
+ return ANeuralNetworksExecution_startCompute(execution, event);
+ }
+
int syncFenceToSignal = -1;
int n = r->computeFenced(waitForList, duration, &syncFenceToSignal);
std::unique_ptr<SyncFenceEvent> e =