[autotest] fix a race condition in master-ssh startup
This fixes a race condition in master-ssh connection startup, between
the establishment of the connection (and creation of socket file) and
the first use of the connection. It accomplishes the fix by waiting up
to 5s for the socket file to appear, after launching the master-ssh
process, before returning from the master-ssh start function. If the
connection fails to be established within this time, a warning is logged
but no other action is taken, allowing the test to attempt to continue
to run.
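
The wait described above can be sketched in isolation as follows. This is
a minimal illustration, not the code in this change: the helper name
wait_for_path and its poll_interval parameter are hypothetical, and it
uses time.monotonic() rather than the time.time() call used in the patch.

```python
import logging
import os
import time


def wait_for_path(path, timeout=5.0, poll_interval=0.2):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Hypothetical helper for illustration only. Returns True if the path
    appeared in time; otherwise logs a warning and returns False, so the
    caller can still attempt to continue.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    # One last check, so a path that appeared during the final sleep
    # (or a zero timeout) is not reported as a timeout.
    if os.path.exists(path):
        return True
    logging.warning('Timed out waiting for %s to appear.', path)
    return False
```

A caller would invoke this right after launching the background process,
for example wait_for_path(os.path.join(tempdir, 'socket')), where tempdir
stands in for the directory holding the control socket.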
BUG=chromium:305641
TEST=Ran some local test_that runs, which prior to this change had been
exhibiting a lot of "lost socket file" events in the logs. Saw these
events eliminated.
Change-Id: Ie8a45edf20adf1320a65201451ab7b64ac837dbb
Reviewed-on: https://chromium-review.googlesource.com/173560
Reviewed-by: Aviv Keshet <[email protected]>
Tested-by: Aviv Keshet <[email protected]>
Commit-Queue: David James <[email protected]>
diff --git a/server/hosts/abstract_ssh.py b/server/hosts/abstract_ssh.py
index 3f13104..8c9ca86 100644
--- a/server/hosts/abstract_ssh.py
+++ b/server/hosts/abstract_ssh.py
@@ -600,13 +600,17 @@
self.master_ssh_option = ''
- def start_master_ssh(self):
+ def start_master_ssh(self, timeout=5):
"""
Called whenever a slave SSH connection needs to be initiated (e.g., by
run, rsync, scp). If master SSH support is enabled and a master SSH
connection is not active already, start a new one in the background.
Also, cleanup any zombie master SSH connections (e.g., dead due to
reboot).
+
+ timeout: timeout in seconds (default 5) to wait for master ssh
+ connection to be established. If timeout is reached, a
+ warning message is logged, but no other action is taken.
"""
if not enable_master_ssh:
return
@@ -654,6 +658,19 @@
self.master_ssh_job = utils.BgJob(master_cmd,
nickname='master-ssh',
no_pipes=True)
+ # To prevent a race between the master ssh connection startup
+ # and its first attempted use, wait for socket file to exist before
+ # returning.
+ end_time = time.time() + timeout
+ socket_file_path = os.path.join(self.master_ssh_tempdir.name,
+ 'socket')
+ while time.time() < end_time:
+ if os.path.exists(socket_file_path):
+ break
+ time.sleep(.2)
+ else:
+ logging.warn('Timed out waiting for master-ssh connection '
+ 'to be established.')
def clear_known_hosts(self):