Discussion:
behaviour of ssh/scp over flakey links - timeout/retry?
c***@bl.echidna.id.au
2001-12-13 01:19:16 UTC
I'm using OpenSSH's ssh and scp to back up some remote
machines, roughly as follows:

ssh remote-host "tar up a few dirs"
scp remote-host:tarfile local-repository

On the whole, as I'd expect, this works just fine.
But... sometimes the link is a bit dodgy (for lack of
a more explicit term, this being a polite list :) )

Can anyone tell me whether ssh and scp time out and retry during a
session, and if so, how? (I know about timeout and retry during
connection establishment; that's documented in the man page.) The job's
scripted, and I need my script to deal with a lost link, at a minimum
by alerting me that it's happened. At the moment ssh seems to just hang
when it loses the link. I could use expect, I guess, and catch timeouts
that way, but I'd prefer it if ssh said "timeout on session, giving up"
or something.

FWIW, we're using OpenSSH_2.9p2, SSH protocols 1.5/2.0, OpenSSL 0x0090600f
on Red Hat 7.1 and 6.2.

I'll be updating to 3.0 "soon", once I've had a chance to test it in
our environment.

Thanks :)

Carl
Dan Kaminsky
2001-12-13 02:10:58 UTC
Carl:

I've honestly never had the best of luck with scp. In my experience,
forwarding a tar command over ssh is the fastest, most cross-platform (it
even works on Windows, using the Cygwin OpenSSH daemon) method of moving files.

Try this:

# For unpacked files on the backup host
alicehost$ ssh ***@bobhost "tar -cf - /path" | tar -xf -
# To get the tarball itself
alicehost$ ssh ***@bobhost "tar -cf - /path" > /path/bobhost.tar
# Slight variant -- send a tarball somewhere else
bobhost$ tar -cf - /path | ssh ***@alicehost "cat > /path/bobhost.tar"
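
If your script needs to know the copy actually completed, check both ends
of the pipe. A minimal sketch, assuming bash (PIPESTATUS is a bash-ism;
the user, host, and paths are placeholders):

#!/bin/bash
# Pull /path from bobhost as a tar stream and unpack it locally.
# Fail loudly if either side of the pipe exits non-zero: ssh exits
# 255 on a connection failure, otherwise it returns the remote
# command's exit status. user, bobhost, and /path are placeholders.
ssh user@bobhost "tar -cf - /path" | tar -xf -
status=("${PIPESTATUS[@]}")
if [ "${status[0]}" -ne 0 ] || [ "${status[1]}" -ne 0 ]; then
    echo "transfer from bobhost failed (ssh=${status[0]}, tar=${status[1]})" >&2
    exit 1
fi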

Now, that being said, it sounds like the real problem is that the TCP
session dies on occasion. There are a couple of solutions to this, some of
which I'm still working on:

1) Use a transfer system that can handle incremental (and thus resumable)
updates. Rsync comes to mind. Make sure keepalives are enabled ("ssh -o
KeepAlive yes", or set KeepAlive in /etc/ssh/ssh_config) so timeouts will
happen more quickly, then have your script go back and rsync again if rsync
returns an error code. (It won't upon a successful sync; there's a retry
sketch after the commands below.) Do something like:

alicehost$ rsync -a -e "ssh -o KeepAlive yes" ***@bobhost:/path /path/bobhost/
or
bobhost$ rsync -a -e "ssh -o KeepAlive yes" /path ***@alicehost:/path/bobhost
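
To script the retry, a simple loop is enough. A sketch, with the user,
host, paths, attempt limit, and delay all placeholders:

#!/bin/sh
# Re-run rsync until it exits 0 (sync complete) or we give up.
# user, bobhost, the paths, and the limits are placeholders.
tries=0
until rsync -a -e "ssh -o KeepAlive yes" user@bobhost:/path /path/bobhost/
do
    tries=`expr $tries + 1`
    if [ $tries -ge 5 ]; then
        echo "rsync from bobhost still failing after $tries attempts" >&2
        exit 1
    fi
    sleep 60    # give the link a moment to come back
done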

2) Add a TCP reliability layer. ROCKS, available at
http://www.cs.wisc.edu/~zandy/rocks/ , handles links that are... ah...
"less than stable". Quoting the description: "Rock-enabled programs
continue to run after any of these events; their broken connections recover
automatically, without loss of in-flight data, when connectivity returns.
Rocks work transparently with most applications, including SSH clients,
X-windows applications, and network service daemons." You'll almost
certainly need to disable SSH keepalives for this to work, but the
reliability layer should then handle even extended network outages.

I haven't tested ROCKS at *all*, but I'll be doing so shortly.

You might find yourself missing, well, the status updates that you get
with scp. cpipe, available at http://wsd.iitb.fhg.de/~kir/cpipehome/ , is a
nice general-purpose tool for monitoring the speed of a pipe.
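
For instance, to watch throughput on a tar-over-ssh stream (this assumes
cpipe's -vt switch for throughput reporting; user, hosts, and paths are
the usual placeholders):

bobhost$ tar -cf - /path | cpipe -vt | ssh user@alicehost "cat > /path/bobhost.tar"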

Lemme know if any of this helps; I'm working on stuff related to this
right now.

Yours Truly,

Dan Kaminsky
DoxPara Research
http://www.doxpara.com
c***@bl.echidna.id.au
2001-12-13 02:20:49 UTC
Post by Dan Kaminsky
I've honestly never had the best of luck with scp. In my experience,
forwarding a tar command over ssh is the fastest, most cross-platform (it
even works on Windows, using the Cygwin OpenSSH daemon) method of moving
files.
Now, that being said, it sounds like the real problem is that the TCP
session dies on occasion.
Yep, the connection is to a few IDS probes that are behind flakey
switches and overloaded LANs. We get a lot of dropped sessions.
Post by Dan Kaminsky
1) Use a transfer system that can handle incremental (and thus resumable)
updates. Rsync comes to mind. Make sure keepalives are enabled ("ssh -o
KeepAlive yes", or set KeepAlive in /etc/ssh/ssh_config) so timeouts will
happen more quickly, then have your script go back and rsync again if rsync
returns an error code.
Ok, I like that option.
Post by Dan Kaminsky
2) Add a TCP reliability layer. ROCKS, available at
http://www.cs.wisc.edu/~zandy/rocks/ , handles links that are... ah...
"less than stable". Quoting the description: "Rock-enabled programs
continue to run after any of these events; their broken connections recover
automatically, without loss of in-flight data, when connectivity returns.
Rocks work transparently with most applications, including SSH clients,
X-windows applications, and network service daemons." You'll almost
certainly need to disable SSH keepalives for this to work, but the
reliability layer should then handle even extended network outages.
I haven't tested ROCKS at *all*, but I'll be doing so shortly.
Interesting.
Post by Dan Kaminsky
You might find yourself missing, well, the status updates that you get
with scp. cpipe, available at http://wsd.iitb.fhg.de/~kir/cpipehome/ , is a
nice general-purpose tool for monitoring the speed of a pipe.
No, don't care. Just want the copy to complete, or tell me it didn't.
Post by Dan Kaminsky
Lemme know if any of this helps; I'm working on stuff related to this
right now.
It sure does, thank you.

Carl
Dan Kaminsky
2001-12-13 02:34:25 UTC
Post by c***@bl.echidna.id.au
Yep, the connection is to a few IDS probes that are behind flakey
switches and overloaded LANs. We get a lot of dropped sessions.
Hmmm. There's an interesting third option that attempts to stream all files
directly to a backend aggregator, using the file system only as a cache for
when (if ever) connectivity is lost.

What's the data rate of each of your probes, worst case?

--Dan
c***@bl.echidna.id.au
2001-12-13 02:49:48 UTC
Post by Dan Kaminsky
Post by c***@bl.echidna.id.au
Yep, the connection is to a few IDS probes that are behind flakey
switches and overloaded LANs. We get a lot of dropped sessions.
Hmmm. There's an interesting third option that attempts to stream all files
directly to a backend aggregator, using the file system only as a cache for
when (if ever) connectivity is lost.
What's the data rate of each of your probes, worst case?
I'm not sure; I'm not much involved in that part of the
system.
