Discussion:
SSH connection hanging on logout
Daniel David Benson
2001-05-04 20:52:20 UTC
I am running OpenSSH 2.9p1 on SunOS 5.7 w/4-24-2001 patch cluster.
Like many other users I am seeing the hanging session on logout
with background processes. This is a huge problem for me as
I centrally manage 50+ machines with rdist across ssh.
Instead of just complaining about the problem I thought I would
put my CS degree to use and try to track down the problem myself.
For starters, though, can someone point me in the right direction?
Also, is there a code roadmap for OpenSSH?

Thanks!

-Dan
Damien Miller
2001-05-05 05:53:56 UTC
Post by Daniel David Benson
I am running OpenSSH 2.9p1 on SunOS 5.7 w/4-24-2001 patch cluster.
Like many other users I am seeing the hanging session on logout
with background processes. This is a huge problem for me as
I centrally manage 50+ machines with rdist across ssh.
Instead of just complaining about the problem I thought I would
put my CS degree to use and try to track down the problem myself.
For starters, though, can someone point me in the right direction?
This is the best description of the problem, pinched from Red Hat:

About the hang-on-exit bug: this is the TODO item which shows up when you
run "ssh server 'sleep 20 & exit'".

* The shell starts up, and starts its own session. As a side-effect, it
gets its own process group.
* The child forks off sleep, and because it's in the background, puts it
into its own process group. The sleep command inherits a copy of the
shell's descriptor for the tty as its stdout.
* The shell exits, but doesn't SIGHUP all of its child PIDs like it probably
should.
* The sshd server attempts to read from the master side of the pty, and
while there are still processes with the pty open, no EOF is produced.
* The sleep command exits, closes its descriptor, sshd detects the EOF, and
the connection gets closed.
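
The sequence above can be reproduced locally without sshd. This is only a
sketch: a pipe stands in for the pty (an assumption made for brevity), and
eof_delay is a hypothetical helper name, not anything from OpenSSH. It times
how long a reader waits for EOF when a background job has inherited the
write side as its stdout, and again when the job's output is redirected.

```sh
#!/bin/sh
# eof_delay CMD: run CMD in a child shell whose stdout is a pipe and print
# how many whole seconds the reader (cat) waits before it sees EOF.
eof_delay() {
    start=$(date +%s)
    sh -c "$1" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# The background sleep inherits the pipe as stdout, so EOF arrives only
# once sleep itself has exited.
held=$(eof_delay 'sleep 2 & exit')

# Same job with stdout and stderr redirected away from the pipe: EOF
# arrives as soon as the child shell exits.
released=$(eof_delay 'sleep 2 >/dev/null 2>&1 & exit')

echo "inherited stdout: ${held}s, redirected: ${released}s"
```

The first figure should be about the length of the sleep and the second
close to zero, which is the same asymmetry sshd sees on the pty master.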

Attempts at fixing this in sshd, and why they don't work:
* SIGHUP the sshd's process group.
- The shell is in its own process group.
* Track process group IDs of all children before we reap them (via an extra
field in Session structures which holds the pgid for each child pid), and
SIGHUP the pgid when we reap.
- Background commands are in yet another process group.
* Close the connection when the child dies.
- Background commands may need to write data to the connection. Also
prematurely truncates output from some commands (scp server, the
famous "dd if=/dev/zero bs=1000 count=100" case).

Known-good workarounds:
* bash: shopt -s huponexit
* tcsh: none
* zsh: ?
* pdksh: ?

This appears to affect rsh as well: it behaves the same with 'sleep 20 & exit'.
--
| Damien Miller <***@mindrot.org> \ ``E-mail attachments are the poor man's
| http://www.mindrot.org / distributed filesystem'' - Dan Geer
Jason Stone
2001-05-05 11:54:00 UTC
Post by Damien Miller
About the hang-on-exit bug: this is the TODO item which shows up when you
run "ssh server 'sleep 20 & exit'".
* The shell starts up, and starts its own session. As a side-effect, it
gets its own process group.
* The sshd server attempts to read from the master side of the pty, and
while there are still processes with the pty open, no EOF is produced.
* The sleep command exits, closes its descriptor, sshd detects the EOF, and
the connection gets closed.
Or, put another way, this is a feature, not a bug - sshd has no way of
knowing that "sleep 20" isn't going to eventually produce some output that
you'll want to see, so it stays alive until the background command exits.

The real "bug" is users trying to use the shell's '&' builtin to run
daemon processes. If you want a command to really be backgrounded (ie, to
daemonize), use something other than '&', something that will make the
command close the pty and either start its own process group or else
become a child of init. Eg:

perl -e 'fork && exit; close STDIN; close STDOUT; close STDERR; \
setpgrp(0,$$); exec "sleep 20";'

(Watch out for the quoting if you try this on the commandline....)
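
For those without perl handy, roughly the same effect can be sketched in
plain sh. The helper name detach is hypothetical. Unlike the perl version it
does not call setpgrp; it only makes sure the command holds no descriptor on
the pty and ignores SIGHUP, which is an assumption about what is sufficient
on a given platform.

```sh
#!/bin/sh
# detach CMD [ARGS...]: hypothetical helper, not part of OpenSSH.
# Start CMD with stdin, stdout and stderr pointed at /dev/null so that it
# holds no descriptor on the pty (or pipe) that sshd is reading from.
# nohup makes CMD ignore the SIGHUP that a shell with huponexit/HUP set
# would send on exit.
detach() {
    nohup "$@" </dev/null >/dev/null 2>&1 &
}
```

With the function defined in the remote shell's startup file one would run
"detach sleep 20"; the one-off equivalent is
ssh server 'nohup sleep 20 </dev/null >/dev/null 2>&1 &'.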
Post by Damien Miller
* bash: shopt -s huponexit
* tcsh: none
* zsh: setopt HUP
(this is usually the default)


If you use zsh, you might also try something like this in your .zshrc:

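daemonize(){
# Join the arguments into one string for perl's exec.
# Note: this breaks if the command itself contains double quotes.
COMMAND="$*"
perl -e 'fork && exit; close STDIN; close STDOUT; close STDERR;
setpgrp(0,$$); exec "'"$COMMAND"'";'
}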

You would then run "daemonize sleep 20" and the sleep 20 would be run in
the background and not hang the sshd when you exit.

This will almost certainly work in other bourne-compatible shells as well.


-Jason
John Bowman
2001-05-08 23:52:24 UTC
As is well known, current versions of openssh hang upon exit when
background processes exist.

If these processes do not produce output to stdout or stderr they should be
allowed to continue to run silently. (If they do try to produce output,
they will be killed by the shell.) This would be consistent with the
behaviour of rsh, ssh, rlogin, telnet, csh, and bash. In no case should
openssh wait around for them indefinitely.

Ssh is supposed to be a secure implementation of rsh and openssh is
supposed to be an open source version of ssh, so despite a few suggestions
to the contrary, this *really* is a bug.

The following patch to openssh-2.9p1 fixes the problem. This patch has now
been thoroughly tested and is believed not to break ssh or scp, unlike
previous related attempts.

I hope this patch is helpful,

-- John Bowman
University of Alberta
http://www.math.ualberta.ca/~bowman

diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c Wed May 2 16:21:16 2001
@@ -440,9 +440,13 @@
len = read(connection_in, buf, sizeof(buf));
if (len == 0) {
/* Received EOF. The remote host has closed the connection. */
- snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
- host);
- buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ * snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ * host);
+ * buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
quit_pending = 1;
return;
}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c Wed May 2 16:19:11 2001
@@ -56,7 +56,7 @@

/* helper */
static void chan_shutdown_write(Channel *c);
-static void chan_shutdown_read(Channel *c);
+void chan_shutdown_read(Channel *c);

/*
* SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
c->wfd = -1;
}
}
-static void
+void
chan_shutdown_read(Channel *c)
{
if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h Wed May 2 16:19:11 2001
@@ -88,4 +88,5 @@

void chan_init_iostates(Channel * c);
void chan_init(void);
+void chan_shutdown_read(Channel *c);
#endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c Wed May 2 16:20:04 2001
@@ -1960,6 +1960,8 @@
*/
if (c->ostate != CHAN_OUTPUT_CLOSED)
chan_write_failed(c);
+ if (c->istate != CHAN_INPUT_CLOSED)
+ chan_shutdown_read(c);
s->chanid = -1;
}
Markus Friedl
2001-05-09 22:55:57 UTC
hi,

i think this patch can lead to data loss.

please tell me if you experience this.

-m
Rachit Siamwalla
2001-05-10 01:36:07 UTC
If this is a feature, not a bug, then, my (stupid?) question(s) are this:

1. Telnet doesn't have the same problem. (yes, telnet isn't exactly the same
thing, but... this is related to what John Bowman's patch does)
2. F-secure SSH doesn't have the same problem.

Also, I believe a workaround for this problem was attempted sometime in the
2.3.0p1 timeframe: when the connection was closed, ssh would close and exit
immediately (don't quote me on this; the info was gleaned through
observation, not from reading the actual code). However, this triggered the
unfortunate bug that:

ssh myserver echo 0

would not actually print anything, because the close and exit came too
soon. I am not a pty expert, but I wonder how F-Secure SSH managed to get
around this issue (it has neither problem).

-rchit

Damien Miller
2001-05-10 02:33:51 UTC
Post by John Bowman
As is well known, current versions of openssh hang upon exit when
background processes exist.
If these processes do not produce output to stdout or stderr they should be
allowed to continue to run silently. (If they do try to produce output,
they will be killed by the shell.) This would be consistent with the
behaviour of rsh, ssh, rlogin, telnet, csh, and bash. In no case should
openssh wait around for them indefinitely.
Ssh is supposed to be a secure implementation of rsh and openssh is
supposed to be an open source version of ssh, so despite a few suggestions
to the contrary, this *really* is a bug.
The following patch to openssh-2.9p1 fixes the problem. This patch has now
been thoroughly tested and is believed not to break ssh or scp, unlike
previous related attempts.
The patch does not work for protocol 1:

while [ 1 ] ; do ssh -p 2222 -o Protocol=1 -oForwardX11=no ***@localhost dd if=/dev/zero bs=1024 count=100 | wc -c ; done

Write failed flushing stdout buffer.
100+0 records in
100+0 records out
16384
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
20480
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
20480
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
20480

etc
--
| Damien Miller <***@mindrot.org> \ ``E-mail attachments are the poor man's
| http://www.mindrot.org / distributed filesystem'' - Dan Geer
John Bowman
2001-05-10 04:05:34 UTC
Post by Damien Miller
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
16384
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
20480
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
20480
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
20480
Interesting. First of all, this reminds me to point out that the patch only
fixes the hang-on-exit bug for Protocol 2 anyway. Which OS and which
platform did this occur on? On RedHat Linux 6.2 no data loss appears to
occur under Protocol 1. But I haven't tested it extensively since under
this protocol it doesn't fix the bug anyway.

while [ 1 ] ; do ssh -o Protocol=1 -oForwardX11=no localhost dd if=/dev/zero bs=1024 count=100 | wc -c ; done
100+0 records in
100+0 records out
102400
100+0 records in
100+0 records out
102400
100+0 records in
100+0 records out
102400
100+0 records in
100+0 records out
102400
100+0 records in
100+0 records out
102400
100+0 records in
100+0 records out
102400
100+0 records in
100+0 records out
102400

In any case, given what you have found, I agree that users who are still
supporting Protocol 1 should either not apply the patch at all or else modify
it to call chan_shutdown_read only under Protocol 2.
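
The dd | wc -c loops above can be wrapped so that truncated runs are counted
rather than eyeballed. A minimal sketch; count_short is a hypothetical
helper name, and the control run below is local, with no ssh involved.

```sh
#!/bin/sh
# count_short RUNS EXPECTED CMD [ARGS...]: run CMD RUNS times and print how
# many runs delivered a byte count on stdout other than EXPECTED.
count_short() {
    runs=$1
    expected=$2
    shift 2
    short=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Arithmetic expansion strips any padding some wc versions emit.
        got=$(( $("$@" 2>/dev/null | wc -c) ))
        [ "$got" -eq "$expected" ] || short=$((short + 1))
        i=$((i + 1))
    done
    echo "$short"
}

# Local control: 100 blocks of 1024 bytes should always arrive as 102400.
count_short 5 102400 dd if=/dev/zero bs=1024 count=100
```

To repeat the tests from this thread, put the ssh invocation in front of
dd, e.g. count_short 20 102400 ssh -o Protocol=1 -oForwardX11=no localhost
dd if=/dev/zero bs=1024 count=100. Any non-zero result indicates lost data.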

-- John Bowman
http://www.math.ualberta.ca/~bowman
m***@etoh.eviladmin.org
2001-05-10 04:16:43 UTC
Post by Damien Miller
Post by John Bowman
As is well known, current versions of openssh hang upon exit when
background processes exist.
If these processes do not produce output to stdout or stderr they should be
allowed to continue to run silently. (If they do try to produce output,
they will be killed by the shell.) This would be consistent with the
behaviour of rsh, ssh, rlogin, telnet, csh, and bash. In no case should
openssh wait around for them indefinitely.
Ssh is supposed to be a secure implementation of rsh and openssh is
supposed to be an open source version of ssh, so despite a few suggestions
to the contrary, this *really* is a bug.
The following patch to openssh-2.9p1 fixes the problem. This patch has now
been thoroughly tested and is believed not to break ssh or scp, unlike
previous related attempts.
Write failed flushing stdout buffer.
100+0 records in
100+0 records out
16384
Try against the latest CVS snapshot. Consider even applying the patch
Markus put out for turning blocking I/O back on. This is kind of a
separate issue. I can show this problem exists independent of his patch.

But you're right. This does not solve protocol 1 hang on exit. It just
solves protocol 2 which is a hint that it's the wrong solution.

I must point out that this 'work around' is only required for a LIMITED
number of platforms. Which I believe is HP/UX and Linux at this point.
Which leads me to believe there is something unique to those platforms.
So it may cause failure on platforms that don't require this work around.

My fear is by putting this into the CVS tree even the portable version
only that we will end up with another 2.3.0pX fiasco. Where we suddenly
learned what the downfall of the patch is months after the patch is
applied and almost forgotten about.

I'm still not sure this is the correct solution to the problem. It may look
like it's working, but so did the last hack.

- Ben
Damien Miller
2001-05-10 05:00:38 UTC
Post by m***@etoh.eviladmin.org
Try against the latest CVS snapshot. Consider even applying the patch
Markus put out for turning blocking I/O back on. This is kind of a
separate issue. I can show this problem exists independent of his patch.
This was the latest CVS.
Post by m***@etoh.eviladmin.org
But you're right. This does not solve protocol 1 hang on exit. It just
solves protocol 2 which is a hint that it's the wrong solution.
I must point out that this 'work around' is only required for a LIMITED
number of platforms. Which I believe is HP/UX and Linux at this point.
Which leads me to believe there is something unique to those platforms.
So it may cause failure on platforms that don't require this work around.
I suspect that the problem may be with the Linux kernel itself and how it
handles file descriptors shared between processes. OpenBSD and Solaris
don't exhibit the problem, the sshd child's fds to the shell get properly
closed when it exits.
Post by m***@etoh.eviladmin.org
My fear is by putting this into the CVS tree even the portable version
only that we will end up with another 2.3.0pX fiasco. Where we suddenly
learned what the downfall of the patch is months after the patch is
applied and almost forgotten about.
yeah - I would much prefer an (avoidable) hang on logout to a potential
data loss.

-d
--
| Damien Miller <***@mindrot.org> \ ``E-mail attachments are the poor man's
| http://www.mindrot.org / distributed filesystem'' - Dan Geer
John Bowman
2001-05-10 13:41:37 UTC
Post by Damien Miller
Post by m***@etoh.eviladmin.org
Try against the latest CVS snapshot. Consider even applying the patch
Markus put out for turning blocking I/O back on. This is kind of a
separate issue. I can show this problem exists independent of his patch.
This was the latest CVS.
Does protocol 1 still break (I assume you are using OpenBSD?) when my
hang-on-exit patch is applied to openssh-2.9?

Let's not make the issue any murkier than it already is by applying the
patch to CVS snapshots, which are subject to continual change. In other
words, let's use 2.9 and 2.9p1 as controls for these tests and vary only
one thing at a time (the patch).
Post by Damien Miller
Post by m***@etoh.eviladmin.org
But you're right. This does not solve protocol 1 hang on exit. It just
solves protocol 2 which is a hint that it's the wrong solution.
I must point out that this 'work around' is only required for a LIMITED
number of platforms. Which I believe is HP/UX and Linux at this point.
Which leads me to believe there is something unique to those platforms.
So it may cause failure on platforms that don't require this work around.
I suspect that the problem may be with the Linux kernel itself and how it
handles file descriptors shared between processes. OpenBSD and Solaris
don't exhibit the problem, the sshd child's fds to the shell get properly
closed when it exits.
Good. At least we have now established beyond any doubt that this really
*is* a bug under HP-UX and Linux (whether one wants to attribute it to the
OS or to openssh is irrelevant to me; it still needs a workaround either way).

If the hanging behaviour were actually the "correct" behaviour, openssh
would hang on other platforms too, right?
Post by Damien Miller
Post by m***@etoh.eviladmin.org
My fear is by putting this into the CVS tree even the portable version
only that we will end up with another 2.3.0pX fiasco. Where we suddenly
learned what the downfall of the patch is months after the patch is
applied and almost forgotten about.
I provided the patch only to be helpful to the openssh community. We and
others have been using it on (RedHat and SuSe) Linux production machines
for over a week without problems. For us, the alternative was to switch
back to using ssh.

Linux is the only environment where the patch (restricted to Protocol 2)
has been subject to extensive testing. But of course, with a code this
complex, it is extremely difficult to analyze all possible scenarios.
Post by Damien Miller
yeah - I would much prefer an (avoidable) hang on logout to a potential
data loss.
At the very least, the patch may provide an important clue to solving this
bug. In particular, the fact that workarounds for unusual return values
under HP-UX and Linux (according to the above the only two OS's where the
bug manifests itself) appear in chan_shutdown_read may be relevant.

I'm afraid I can't invest any more time on this patch. However, I can
provide a few questions that perhaps the openssh community can address, in
order to resolve the issues that have been raised here.

QUERIES:

1. Does sleep 20&;exit hang on any OS's other than HP-UX and Linux?

2. Does Protocol 1 lead to data loss when the patch is applied to
openssh-2.9 on BSD?

3. Does chan_shutdown_read really get called under Protocol 1?

When I insert a debug statement at the beginning of chan_shutdown_read
and run with sshd -d,
ssh -v -o Protocol=1 -oForwardX11=no wizard dd if=/dev/zero bs=1024 count=100 | wc -c
does not seem to even call chan_shutdown_read at all under Linux!
This explains why the patch neither fixes the hang-on-exit bug nor leads to
data loss with Protocol 1 under Linux.

4. Has anyone seen a case where Protocol 1 leads to data loss when the
patch is applied to openssh-2.9p1 on Linux?

5. Has anyone seen a case where Protocol 1 leads to data loss when the
patch is applied to openssh-2.9p1 on HP-UX?

6. Has anyone seen a case where Protocol 2 leads to data loss when the
patch is applied to openssh-2.9p1 on Linux?

7. Has anyone seen a case where Protocol 2 leads to data loss when the
patch is applied to openssh-2.9p1 on HP-UX?

8. Has anyone seen a case where Protocol 2 leads to data loss on any OS?
This is the most crucial question.

-- John Bowman
University of Alberta
http://www.math.ualberta.ca/~bowman
David Bronder
2001-05-10 05:26:08 UTC
Post by Damien Miller
Post by m***@etoh.eviladmin.org
But you're right. This does not solve protocol 1 hang on exit. It just
solves protocol 2 which is a hint that it's the wrong solution.
I must point out that this 'work around' is only required for a LIMITED
number of platforms. Which I believe is HP/UX and Linux at this point.
Which leads me to believe there is something unique to those platforms.
So it may cause failure on platforms that don't require this work around.
I suspect that the problem may be with the Linux kernel itself and how it
handles file descriptors shared between processes. OpenBSD and Solaris
don't exhibit the problem, the sshd child's fds to the shell get properly
closed when it exits.
I can confirm that the hang-on-exit problem also occurs under AIX 4.3.3.
Drove me nuts, too, until I realized that it was newmail (from elm)
that was holding open the pty. (See below.)
Post by Damien Miller
Post by m***@etoh.eviladmin.org
My fear is by putting this into the CVS tree even the portable version
only that we will end up with another 2.3.0pX fiasco. Where we suddenly
learned what the downfall of the patch is months after the patch is
applied and almost forgotten about.
yeah - I would much prefer an (avoidable) hang on logout to a potential
data loss.
Yes, but I agree with others that the hang on logout is not exactly the
correct behavior, either. As has been pointed out, other remote login
services seem to correctly handle this situation, but OpenSSH does not.

In addition, the "avoidable" hang can sometimes be very non-intuitive to
track down. My problem with newmail threw me because newmail isn't run
as a background process; it does its own fork/exec and ends up with init
as its parent, but it still hangs OpenSSH for up to a minute or longer.

=Dave
--
Hello World. David Bronder - Systems Admin
Segmentation Fault ITS-SPA, Univ. of Iowa
Core dumped, disk trashed, quota filled, soda warm. david-***@uiowa.edu
Jason Stone
2001-05-10 22:24:22 UTC
Post by John Bowman
1. Does sleep 20&;exit hang on any OS's other than HP-UX and Linux?
It also hangs on FreeBSD.

hermione/home/jason-629: cat /kern/version
FreeBSD 4.3-STABLE #0: Fri May 4 20:22:15 PDT 2001
***@hermione.pas.lab:/usr/src/sys/compile/JKERN

hermione/home/jason-634: date ; ssh localhost 'sleep 20& exit' ; date
Thu May 10 15:01:54 PDT 2001
Thu May 10 15:02:14 PDT 2001

hermione/home/jason-636: date ; ssh localhost 'perl -e '"'"'fork &&
exit; close STDIN ; close STDOUT ; close STDERR ; exec "sleep
20";'"'" ; date
Thu May 10 15:03:45 PDT 2001
Thu May 10 15:03:45 PDT 2001


-Jason

---------------------------
If the Revolution comes to grief, it will be because you and those you
lead have become alarmed at your own brutality. --John Gardner
m***@etoh.eviladmin.org
2001-05-10 22:29:40 UTC
Post by Jason Stone
Post by John Bowman
1. Does sleep 20&;exit hang on any OS's other than HP-UX and Linux?
It also hangs on FreeBSD.
hermione/home/jason-629: cat /kern/version
FreeBSD 4.3-STABLE #0: Fri May 4 20:22:15 PDT 2001
hermione/home/jason-634: date ; ssh localhost 'sleep 20& exit' ; date
Thu May 10 15:01:54 PDT 2001
Thu May 10 15:02:14 PDT 2001
Is this really a valid test? This hangs for 20 seconds under OpenBSD
also, but this is not what we are referring to. SSH into your FreeBSD box
using an interactive shell then do: sleep 20&exit

- Ben
Markus Friedl
2001-05-10 22:52:45 UTC
Post by m***@etoh.eviladmin.org
Post by Jason Stone
hermione/home/jason-634: date ; ssh localhost 'sleep 20& exit' ; date
Thu May 10 15:01:54 PDT 2001
Thu May 10 15:02:14 PDT 2001
Is this really a valid test?
no
Post by m***@etoh.eviladmin.org
This hangs for 20 seconds under OpenBSD
also, but this is not what we are referring to. SSH into your FreeBSD box
using an interactive shell then do: sleep 20&exit
please, could someone with a system where:
$ ssh -t host
% sleep 1234 &
% exit
$
hangs with openssh try whether rlogin hangs, too?

if rlogin does not hang, could you please check the source of rlogin
and try to figure out how it handles the file descriptors that connect
rlogind to the shell. what happens with the file descriptors after the
shell dies?

thanks, -m
John Bowman
2001-05-13 18:44:53 UTC
Although still no instances of data loss have been reported with the patch
I posted to this list on 2001-05-08, I have now noticed one inconsistency
with the handling of X connections when the patch is applied to openssh-2.9p1
that I thought I should report:

Without the patch the following will hang (just as any other process will):

ssh host
xclock &
exit

With the patch the ssh connection closes immediately, without waiting for the
X application to terminate. This does not seem to be desirable; suppose the
process had been an emacs or netscape session.

Is it possible to modify the patch so that it will wait for unclosed X
sessions to terminate (but not hang on other processes), just as the commercial
version of SSH does?

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Markus Friedl
2001-05-16 22:06:12 UTC
Post by John Bowman
Although still no instances of data loss have been reported with the patch
you should check this:

ssh localhost -2 -v -v -v -p 1234 dd if=/bsd bs=65536 count=2 | \
(sleep 10; md5sum)

on my machine the remote command dies, but sshd
still calls read 3 more times on rfd.

this should not lead to data corruption, i.e. the checksums
must match

dd if=/bsd bs=65536 count=2 | md5sum
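
The comparison can be scripted so that a mismatch shows up as an exit
status. This is a local sketch only: slow_reader_matches is a hypothetical
helper name, /bin/sh stands in for /bsd, and POSIX cksum is swapped in for
md5sum so the script does not depend on which md5 tool is installed.

```sh
#!/bin/sh
# slow_reader_matches FILE [DELAY]: send up to 128 KiB of FILE through a
# pipe twice, once to a reader that starts immediately and once to a reader
# that sleeps DELAY seconds first, and succeed only if both checksums agree.
slow_reader_matches() {
    file=$1
    delay=${2:-1}
    direct=$(dd if="$file" bs=65536 count=2 2>/dev/null | cksum)
    delayed=$(dd if="$file" bs=65536 count=2 2>/dev/null | ( sleep "$delay"; cksum ) )
    [ "$direct" = "$delayed" ]
}

slow_reader_matches /bin/sh && echo "checksums match"
```

To exercise sshd instead of a local pipe, the second dd would be run through
ssh (for example ssh localhost -2 dd ...), keeping the delayed reader on the
client side as in the command above.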

-m

use this patch if you want to trace the reads from rfd.

Index: channels.c
===================================================================
RCS file: /home/markus/cvs/ssh/channels.c,v
retrieving revision 1.115
diff -u -r1.115 channels.c
--- channels.c 2001/05/09 22:51:57 1.115
+++ channels.c 2001/05/16 21:52:30
@@ -920,6 +920,7 @@
chan_read_failed(c);
}
} else {
+ debug3("channel %d: read rfd %d len %d", c->self, c->rfd, len);
buffer_append(&c->input, buf, len);
}
}
@@ -1029,9 +1031,10 @@
packet_put_int(c->remote_id);
packet_put_int(c->local_consumed);
packet_send();
- debug2("channel %d: window %d sent adjust %d",
+ debug2("channel %d: window %d sent adjust %d (obuf %d)",
c->self, c->local_window,
- c->local_consumed);
+ c->local_consumed,
+ buffer_len(&c->output));
c->local_window += c->local_consumed;
c->local_consumed = 0;
}
@@ -1270,6 +1273,7 @@
}
}
if (len > 0) {
+ debug3("channel %d: channel data: %d", c->self, len);
packet_start(compat20 ?
SSH2_MSG_CHANNEL_DATA : SSH_MSG_CHANNEL_DATA);
packet_put_int(c->remote_id);
John Bowman
2001-05-16 22:14:57 UTC
Under linux there is no data corruption and the checksums match:

[wizard: ~] ssh localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -
[wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Markus Friedl
2001-05-17 14:27:57 UTC
Post by John Bowman
[wizard: ~] ssh localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -
[wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -
with my debugging patch,
you should see something like this on the sshd side:

debug3: channel 0: channel data: 16384
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 15907
debug2: channel 0: rcvd adjust 16861
debug3: channel 0: channel data: 477
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug2: channel 0: rcvd adjust 65536
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug1: Received SIGCHLD.
^^ shell dies
debug1: session_by_pid: pid 29873
debug1: session_exit_message: session 0 channel 0 pid 29873
debug1: session_exit_message: release channel 0
debug1: channel 0: write failed
debug1: channel 0: output open -> closed
debug1: channel 0: close_write
debug1: session_free: session 0 pid 29873
debug3: channel 0: read rfd 10 len 16384
^^ more reads from the shell.

if you shutdown at the SIGCHLD, you can no longer read
at this point!

debug2: channel 0: read 84 from efd 12
debug3: channel 0: channel data: 16384
debug2: channel 0: rwin 16384 elen 84 euse 1
debug2: channel 0: sent ext data 84
debug1: channel 0: read<=0 rfd 10 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
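
The point about reads after SIGCHLD has a simple local analogue, sketched
here with a pipe in place of the pty (an assumption for brevity): the writer
has exited well before the reader looks at the descriptor, yet its output is
still buffered and is delivered ahead of EOF. Closing the read side when the
child dies would throw that output away, which is the "ssh myserver echo 0"
failure mentioned earlier in the thread.

```sh
#!/bin/sh
# The writer (echo) exits long before the reader starts reading, yet the
# data it wrote is still buffered in the pipe and is delivered before EOF.
late=$(sh -c 'echo 0' | ( sleep 1; cat ))
echo "read after writer exit: $late"
```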
John Bowman
2001-05-17 15:35:18 UTC
Post by Markus Friedl
Post by John Bowman
[wizard: ~] ssh localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -
[wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -
with my debugging patch,
debug3: channel 0: channel data: 16384
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 15907
debug2: channel 0: rcvd adjust 16861
debug3: channel 0: channel data: 477
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug2: channel 0: rcvd adjust 65536
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug1: Received SIGCHLD.
^^ shell dies
debug1: session_by_pid: pid 29873
debug1: session_exit_message: session 0 channel 0 pid 29873
debug1: session_exit_message: release channel 0
debug1: channel 0: write failed
debug1: channel 0: output open -> closed
debug1: channel 0: close_write
debug1: session_free: session 0 pid 29873
debug3: channel 0: read rfd 10 len 16384
^^ more reads from the shell.
if you shutdown at the SIGCHLD, you can no longer read
at this point!
debug2: channel 0: read 84 from efd 12
debug3: channel 0: channel data: 16384
debug2: channel 0: rwin 16384 elen 84 euse 1
debug2: channel 0: sent ext data 84
debug1: channel 0: read<=0 rfd 10 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
Here is what I get with the latest patch and your debug patch
installed. There is a SIGCHLD, but only after the very beginning:

ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
...
debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 20480 sent adjust 4096 (obuf 40960)
debug2: channel 0: window 24576 sent adjust 4096 (obuf 36864)
debug2: channel 0: window 28672 sent adjust 4096 (obuf 32768)
debug2: channel 0: window 20480 sent adjust 36864 (obuf 8192)
debug2: channel 0: rcvd ext data 31
debug2: channel 0: window 24545 sent adjust 28672 (obuf 12288)
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug2: channel 0: active efd: 6 len 31 type write
2+0 records in
2+0 records out
debug2: channel 0: written 31 to efd 6
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)

debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 6.1 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
86d34e869a31df51922ad2bb9bd202bc -


[wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -




With 10 counts and a short sleep it looks like this:

ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=10 | ( sleep 5 ; md5sum )
...

debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 20480 sent adjust 4096 (obuf 40960)
debug2: channel 0: window 24576 sent adjust 4096 (obuf 36864)
debug2: channel 0: window 28672 sent adjust 4096 (obuf 32768)
debug2: channel 0: window 20480 sent adjust 36864 (obuf 8192)
debug2: channel 0: window 24576 sent adjust 28672 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 21136 sent adjust 32768 (obuf 11632)
debug2: channel 0: rcvd ext data 15
4+1 records in
debug2: channel 0: written 15 to efd 6
debug2: channel 0: rcvd ext data 16
4+1 records out
debug2: channel 0: written 16 to efd 6
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)

debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 4.5 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
6c80ab2560a5f7b9b778b5498a93ece8 -

[wizard: ~] dd if=/bin/bash bs=65536 count=10 | ( sleep 5 ; md5sum )
4+1 records in
4+1 records out
6c80ab2560a5f7b9b778b5498a93ece8 -


Looks ok to me.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Markus Friedl
2001-05-17 21:38:23 UTC
Post by John Bowman
Post by Markus Friedl
Here is what I get with the latest patch and your debug patch
ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
i need the server side LOG message!
John Bowman
2001-05-18 05:18:22 UTC
Post by John Bowman
Post by Markus Friedl
Here is what I get with the latest patch and your debug patch
ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
i need the server side LOG message!
Sorry, I realized after sending it that I forgot to include the server side
output and I didn't get a chance to get back to it until now.

Here is the debugging output, with my latest patch (which fixes the -N
problem you pointed out today) and your debug patch applied to 2.9p1:


ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )

SSH:
debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 20480 sent adjust 4096 (obuf 40960)
debug2: channel 0: window 24576 sent adjust 4096 (obuf 36864)
debug2: channel 0: window 28672 sent adjust 4096 (obuf 32768)
debug2: channel 0: window 20480 sent adjust 36864 (obuf 8192)
debug2: channel 0: rcvd ext data 31
2+0 records in
2+0 records out
debug2: channel 0: written 31 to efd 6
debug2: channel 0: window 24545 sent adjust 28703 (obuf 12288)
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)

debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 7.3 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
86d34e869a31df51922ad2bb9bd202bc -

dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc -

SSHD:
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 0 win 65536 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: init
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 channel 0 request x11-req reply 0
debug1: Received request for X11 forwarding with auth spoofing.
debug1: x11_create_display_inet: Socket family 10 not supported
debug1: fd 3 setting O_NONBLOCK
debug1: fd 3 IS O_NONBLOCK
debug1: channel 1: new [X11 inet listener]
debug1: temporarily_use_uid: 9062/2501 (e=0)
debug1: restore_uid
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 channel 0 request exec reply 0
debug1: fd 8 setting O_NONBLOCK
debug1: fd 8 IS O_NONBLOCK
debug1: fd 10 setting O_NONBLOCK
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 9448
debug1: session_exit_message: session 0 channel 0 pid 9448
debug1: session_exit_message: release channel 0
debug1: channel 0: write failed
debug1: channel 0: output open -> closed
debug1: channel 0: close_write
debug1: channel 0: close_read
debug1: session_free: session 0 pid 9448
debug1: channel 0: read<=0 rfd 8 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
debug1: channel 0: send close
debug1: channel 0: rcvd close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 server-session (t4 r0 i8/0 o128/0 fd 8/8)

Connection closed by remote host.
debug1: channel_free: channel 1: status: The following connections are open:

debug1: xauthfile_cleanup_proc called
Closing connection to 127.0.0.1

==============================================================================

ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=10 | ( sleep 5 ; md5sum )

SSH:
debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 20480 sent adjust 4096 (obuf 40960)
debug2: channel 0: window 24576 sent adjust 4096 (obuf 36864)
debug2: channel 0: window 28672 sent adjust 4096 (obuf 32768)
debug2: channel 0: window 20480 sent adjust 36864 (obuf 8192)
debug2: channel 0: window 24576 sent adjust 20480 (obuf 20480)
debug2: channel 0: window 28672 sent adjust 8192 (obuf 28672)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 21136 sent adjust 32768 (obuf 11632)
debug2: channel 0: rcvd ext data 15
4+1 records in
debug2: channel 0: written 15 to efd 6
debug2: channel 0: rcvd ext data 16
4+1 records out
debug2: channel 0: written 16 to efd 6
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)

debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 2.8 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
6c80ab2560a5f7b9b778b5498a93ece8 -

dd if=/bin/bash bs=65536 count=10 | ( sleep 5 ; md5sum )
4+1 records in
4+1 records out
6c80ab2560a5f7b9b778b5498a93ece8 -

SSHD:
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 0 win 65536 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: init
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 channel 0 request x11-req reply 0
debug1: Received request for X11 forwarding with auth spoofing.
debug1: x11_create_display_inet: Socket family 10 not supported
debug1: fd 3 setting O_NONBLOCK
debug1: fd 3 IS O_NONBLOCK
debug1: channel 1: new [X11 inet listener]
debug1: temporarily_use_uid: 9062/2501 (e=0)
debug1: restore_uid
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 channel 0 request exec reply 0
debug1: fd 8 setting O_NONBLOCK
debug1: fd 8 IS O_NONBLOCK
debug1: fd 10 setting O_NONBLOCK
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 9462
debug1: session_exit_message: session 0 channel 0 pid 9462
debug1: session_exit_message: release channel 0
debug1: channel 0: write failed
debug1: channel 0: output open -> closed
debug1: channel 0: close_write
debug1: channel 0: close_read
debug1: session_free: session 0 pid 9462
debug1: channel 0: read<=0 rfd 8 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
debug1: channel 0: send close
debug1: channel 0: rcvd close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 server-session (t4 r0 i8/0 o128/0 fd 8/8)

Connection closed by remote host.
debug1: channel_free: channel 1: status: The following connections are open:

debug1: xauthfile_cleanup_proc called
Closing connection to 127.0.0.1

==============================================================================

ssh -v -v -v localhost dd if=/bin/bash bs=655360 count=2 | ( sleep 10 ; md5sum )

SSH:
debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: rcvd ext data 31
debug2: channel 0: window 8161 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49121)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 8161 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49121)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 8161 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49121)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 8161 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49121)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53217)
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
0+1 records in
0+1 records out
debug2: channel 0: written 31 to efd 6
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)

debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 8.3 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
6c80ab2560a5f7b9b778b5498a93ece8 -

dd if=/bin/bash bs=655360 count=2 | ( sleep 10 ; md5sum )
0+1 records in
0+1 records out
6c80ab2560a5f7b9b778b5498a93ece8 -

SSHD:
debug1: Entering interactive session for SSH2.
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 0 win 65536 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: init
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 channel 0 request x11-req reply 0
debug1: Received request for X11 forwarding with auth spoofing.
debug1: x11_create_display_inet: Socket family 10 not supported
debug1: fd 3 setting O_NONBLOCK
debug1: fd 3 IS O_NONBLOCK
debug1: channel 1: new [X11 inet listener]
debug1: temporarily_use_uid: 9062/2501 (e=0)
debug1: restore_uid
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 channel 0 request exec reply 0
debug1: fd 8 setting O_NONBLOCK
debug1: fd 8 IS O_NONBLOCK
debug1: fd 10 setting O_NONBLOCK
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 9496
debug1: session_exit_message: session 0 channel 0 pid 9496
debug1: session_exit_message: release channel 0
debug1: channel 0: write failed
debug1: channel 0: output open -> closed
debug1: channel 0: close_write
debug1: channel 0: close_read
debug1: session_free: session 0 pid 9496
debug1: channel 0: read<=0 rfd 8 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
debug1: channel 0: send close
debug1: channel 0: rcvd close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
#0 server-session (t4 r0 i8/0 o128/0 fd 8/8)

Connection closed by remote host.
debug1: channel_free: channel 1: status: The following connections are open:

debug1: xauthfile_cleanup_proc called
Closing connection to 127.0.0.1
Markus Friedl
2001-05-18 12:50:15 UTC
Post by John Bowman
Post by Markus Friedl
Here is the debugging output, with my latest patch (which fixes the -N
these are still not the traces i'm looking for.
you need to make sure that SSHD still does
reads after the SIGCHLD:

debug1: Received SIGCHLD.
...
debug3: channel 0: read rfd 10 len 16384
John Bowman
2001-05-18 14:48:56 UTC
Post by Markus Friedl
these are still not the traces i'm looking for.
you need to make sure that SSHD still does
debug1: Received SIGCHLD.
...
debug3: channel 0: read rfd 10 len 16384
Sorry, I've tested many different cases, such as these ones

ssh -v -v -v localhost dd if=/usr/local/netscape/netscape bs=1300000 count=10 | ( sleep 5 ; md5sum )

ssh -v -v -v localhost dd if=/usr/local/netscape/netscape bs=1300000 count=10 | ( sleep 50 ; md5sum )

ssh -v -v -v localhost dd if=/usr/local/netscape/netscape bs=1300000 count=1 | ( md5sum )

and many others, and I've never seen this happen. The checksums are always
correct. If you have a specific test you want me to try, please let me know.

Like I said, under Linux at least, by the time shutdown is called, all of
the data has been read in.
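
One way to check that claim directly, rather than inferring it from matching
checksums, is to ask the kernel how many bytes are still queued on the
descriptor just before it is shut down. The following is only a sketch:
pending_bytes() and pending_after_write() are hypothetical helper names, not
OpenSSH functions, and it assumes a system (Linux, the BSDs) where FIONREAD
works on pipes and is declared by <sys/ioctl.h>.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of OpenSSH: report how many bytes are
 * still queued for reading on fd, or -1 if the query fails.
 */
int
pending_bytes(int fd)
{
	int n = 0;

	assert(fd >= 0);
	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}

/* Self-check: bytes written into a pipe are reported as pending. */
int
pending_after_write(void)
{
	int fds[2], n;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], "hello", 5) != 5)
		n = -1;
	else
		n = pending_bytes(fds[0]);
	close(fds[0]);
	close(fds[1]);
	return n;
}
```

A nonzero result at the point where the read side is about to be shut down
would mean that data is about to be thrown away.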

You are most welcome to try to show otherwise. I'll give you an account on
a Linux system; please contact me about this. Until then, I will continue
to use my patch; it makes OpenSSH a practical alternative to SSH on Linux
systems.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Markus Friedl
2001-05-18 20:05:42 UTC
Post by John Bowman
Post by Markus Friedl
these are still not the traces i'm looking for.
you need to make sure that SSHD still does
debug1: Received SIGCHLD.
...
debug3: channel 0: read rfd 10 len 16384
Sorry, I've tested many different cases, such as these ones
ssh -v -v -v localhost dd if=/usr/local/netscape/netscape bs=1300000 count=10 | ( sleep 5 ; md5sum )
ssh -v -v -v localhost dd if=/usr/local/netscape/netscape bs=1300000 count=10 | ( sleep 50 ; md5sum )
ssh -v -v -v localhost dd if=/usr/local/netscape/netscape bs=1300000 count=1 | ( md5sum )
ok, so just fyi:
dd if=/bsd bs=65536 count=2
gets truncated on my openbsd development system.

you have to get into this situation:

shell writes last block into pipe to sshd process.
shell dies
not all data has been read from the pipe.

i can trigger this with
dd if=/bsd bs=65536 count=2

the figures should be different for other systems, but i think
all systems will show this problem.
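
The situation described above can be reproduced outside of sshd. This is a
minimal sketch, not OpenSSH code: bytes_readable_after_exit() is a
hypothetical name, the child stands in for the shell and the parent for sshd,
and it assumes the pipe buffer can hold at least 4096 bytes so the child's
write never blocks.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child that writes n bytes into a pipe and
 * exits, reap it first, and only then read the pipe.  Returns the number
 * of bytes read after the child was reaped, or -1 on error.
 */
long
bytes_readable_after_exit(size_t n)
{
	int fds[2];
	char buf[4096];
	long total = 0;
	ssize_t r;
	pid_t pid;

	assert(n <= sizeof(buf));
	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* child: plays the shell writing its last block */
		close(fds[0]);
		memset(buf, 'x', n);
		if (write(fds[1], buf, n) != (ssize_t)n)
			_exit(1);
		_exit(0);
	}
	/* parent: plays sshd */
	close(fds[1]);
	/* the point where SIGCHLD would arrive: the child is gone */
	if (waitpid(pid, NULL, 0) < 0) {
		close(fds[0]);
		return -1;
	}
	/* the data is still in the pipe; closing fds[0] here would lose it */
	while ((r = read(fds[0], buf, sizeof(buf))) > 0)
		total += r;
	close(fds[0]);
	return r < 0 ? -1 : total;
}
```

Everything the child wrote is still readable after it has been reaped, so a
shutdown of the read side at SIGCHLD time discards whatever has not been read
yet.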
John Bowman
2001-05-15 02:55:37 UTC
Here is a new version of the hang-on-exit patch, which:

1. fixes the hang-on-exit bug (without data loss);
2. does not exit if there are unterminated X applications;
3. exits the session when all X applications have closed.

Of these three tests, OpenSSH 2.9p1 passes only the second one. The
third one is another type of hanging bug in OpenSSH, as demonstrated by
the following test:

ssh host
xterm -e sleep 20 &
exit

Even after the xsession terminates, the ssh session is left hanging forever.
The correct behaviour is to wait 20 seconds for the X application to close
and then exit.
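
The rule behind points 2 and 3 can be sketched on its own: the session may
end only when no channel is left that could still carry traffic, and an idle
X11 listener does not count as such a channel. This is a simplified
illustration, not the OpenSSH code; the enum values and session_inactive()
are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified channel states; not the OpenSSH definitions. */
enum chan_type {
	CHAN_FREE,		/* unused slot */
	CHAN_CLOSED,		/* fully closed */
	CHAN_X11_LISTENER,	/* waiting for X clients, none connected */
	CHAN_X11_OPEN,		/* an X application is connected */
	CHAN_SESSION_OPEN	/* the shell/command channel is still open */
};

/*
 * Return 1 when at least one slot exists and every slot is free, closed,
 * or an idle X11 listener; return 0 otherwise.
 */
int
session_inactive(const enum chan_type *chans, size_t n)
{
	size_t i;

	assert(chans != NULL || n == 0);
	if (n == 0)
		return 0;
	for (i = 0; i < n; i++) {
		switch (chans[i]) {
		case CHAN_FREE:
		case CHAN_CLOSED:
		case CHAN_X11_LISTENER:
			break;		/* does not keep the session alive */
		default:
			return 0;	/* something is still open */
		}
	}
	return 1;
}
```

With the xterm test above, the slot for the X connection stays in the open
state for 20 seconds, so the predicate stays false until the xterm exits.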

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman


diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Mon May 14 20:51:14 2001
@@ -1137,6 +1137,10 @@
 			continue;
 		if (ftab[c->type] == NULL)
 			continue;
+		if(c->type == SSH_CHANNEL_OPEN && c->rfd == -1) {
+			c->type = SSH_CHANNEL_FREE;
+			continue;
+		}
 		(*ftab[c->type])(c, readset, writeset);
 		if (chan_is_dead(c)) {
 			/*
@@ -1639,6 +1643,47 @@
 	for (i = 0; i < channels_alloc; i++)
 		if (channels[i].type != SSH_CHANNEL_FREE)
 			channel_close_fds(&channels[i]);
+}
+
+/* Returns true if session is inactive. */
+
+int
+channel_inactive_session()
+{
+	u_int i;
+	if(channels_alloc == 0) return 0;
+
+	for (i = 0; i < channels_alloc; i++) {
+		switch (channels[i].type) {
+		case SSH_CHANNEL_FREE:
+		case SSH_CHANNEL_X11_LISTENER:
+		case SSH_CHANNEL_CLOSED:
+			break;
+		case SSH_CHANNEL_PORT_LISTENER:
+		case SSH_CHANNEL_RPORT_LISTENER:
+		case SSH_CHANNEL_AUTH_SOCKET:
+		case SSH_CHANNEL_DYNAMIC:
+		case SSH_CHANNEL_CONNECTING: /* XXX ??? */
+			return 0;
+		case SSH_CHANNEL_LARVAL:
+			if (!compat20)
+				fatal("cannot happen: SSH_CHANNEL_LARVAL");
+			return 0;
+		case SSH_CHANNEL_OPENING:
+		case SSH_CHANNEL_OPEN:
+		case SSH_CHANNEL_X11_OPEN:
+			return 0;
+		case SSH_CHANNEL_INPUT_DRAINING:
+		case SSH_CHANNEL_OUTPUT_DRAINING:
+			if (!compat13)
+				fatal("cannot happen: OUT_DRAIN");
+			return 0;
+		default:
+			fatal("channel_inactive_session: bad channel type %d", channels[i].type);
+			/* NOTREACHED */
+		}
+	}
+	return 1;
 }

 /* Returns true if any channel is still open. */
diff -ur openssh-2.9p1/channels.h openssh-2.9p1J/channels.h
--- openssh-2.9p1/channels.h Fri Apr 13 17:28:02 2001
+++ openssh-2.9p1J/channels.h Mon May 14 20:51:14 2001
@@ -197,6 +197,9 @@
*/
void channel_close_all(void);

+/* Returns true if session is inactive. */
+int channel_inactive_session();
+
/* Returns true if there is still an open channel over the connection. */
int channel_still_open(void);

diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c Mon May 14 20:51:14 2001
@@ -440,9 +440,13 @@
 		len = read(connection_in, buf, sizeof(buf));
 		if (len == 0) {
 			/* Received EOF. The remote host has closed the connection. */
-			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
-				 host);
-			buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ *			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ *				 host);
+ *			buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
 			quit_pending = 1;
 			return;
 		}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c Mon May 14 20:51:14 2001
@@ -56,7 +56,7 @@

/* helper */
static void chan_shutdown_write(Channel *c);
-static void chan_shutdown_read(Channel *c);
+void chan_shutdown_read(Channel *c);

/*
* SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
c->wfd = -1;
}
}
-static void
+void
chan_shutdown_read(Channel *c)
{
if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h Mon May 14 20:51:14 2001
@@ -88,4 +88,5 @@

void chan_init_iostates(Channel * c);
void chan_init(void);
+void chan_shutdown_read(Channel *c);
#endif
diff -ur openssh-2.9p1/serverloop.c openssh-2.9p1J/serverloop.c
--- openssh-2.9p1/serverloop.c Fri Apr 13 17:28:03 2001
+++ openssh-2.9p1J/serverloop.c Mon May 14 20:51:14 2001
@@ -726,7 +726,7 @@
if (!rekeying)
channel_after_select(readset, writeset);
process_input(readset);
- if (connection_closed)
+ if (connection_closed || channel_inactive_session())
break;
process_output(writeset);
}
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c Mon May 14 20:51:14 2001
@@ -1960,6 +1960,9 @@
*/
if (c->ostate != CHAN_OUTPUT_CLOSED)
chan_write_failed(c);
+ if (c->istate == CHAN_INPUT_OPEN && compat20) {
+ chan_shutdown_read(c);
+ }
s->chanid = -1;
}
John Bowman
2001-05-15 04:23:28 UTC
Disregard my previous message...that isn't the right patch....I'm still
testing a new one...

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
John Bowman
2001-05-16 08:18:03 UTC
The following is a CORRECTION, with a REVISED PATCH, to my message posted
to this list on 2001-05-15 2:55:37.

Here is a new version of the hang-on-exit patch (2001-05-08 23:52:24), which:

1. fixes the hang-on-exit bug under Protocol 2 (without data loss);
2. does not exit if there are unterminated X applications;
3. exits the session when all X applications have closed.

Of these three tests, OpenSSH-2.9p1 under Protocol 2 passes only the second
one. The third item is another type of hanging bug in OpenSSH, as is
demonstrated by the following test:

ssh -2 host
xterm -e sleep 20 &
exit

Even after the xsession terminates, the ssh session is left hanging forever.
The correct behaviour is to wait 20 seconds for the X application to close
and then exit.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman

P.S. Since the hang-on-exit patch is only effective under Protocol 2,
a conditional to the call to chan_shutdown_read() has been added.



diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Wed May 16 01:22:16 2001
@@ -333,6 +333,9 @@
xfree(c->remote_name);
c->remote_name = NULL;
}
+
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(), SHUT_RDWR);
}

/*
@@ -1137,6 +1140,15 @@
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
+ c->type=type;
+ continue;
+ }
(*ftab[c->type])(c, readset, writeset);
if (chan_is_dead(c)) {
/*
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c Wed May 16 01:22:16 2001
@@ -440,9 +440,13 @@
len = read(connection_in, buf, sizeof(buf));
if (len == 0) {
/* Received EOF. The remote host has closed the connection. */
- snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
- host);
- buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ * snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ * host);
+ * buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
quit_pending = 1;
return;
}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c Wed May 16 01:22:16 2001
@@ -56,7 +56,7 @@

/* helper */
static void chan_shutdown_write(Channel *c);
-static void chan_shutdown_read(Channel *c);
+void chan_shutdown_read(Channel *c);

/*
* SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
c->wfd = -1;
}
}
-static void
+void
chan_shutdown_read(Channel *c)
{
if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h Wed May 16 01:22:16 2001
@@ -88,4 +88,5 @@

void chan_init_iostates(Channel * c);
void chan_init(void);
+void chan_shutdown_read(Channel *c);
#endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c Wed May 16 02:05:12 2001
@@ -1960,6 +1960,9 @@
*/
if (c->ostate != CHAN_OUTPUT_CLOSED)
chan_write_failed(c);
+ if (c->istate != CHAN_INPUT_CLOSED && compat20) {
+ chan_shutdown_read(c);
+ }
s->chanid = -1;
}
Markus Friedl
2001-05-16 20:36:27 UTC
Post by John Bowman
The third item is another type of hanging bug in OpenSSH, as is
ssh -2 host
xterm -e sleep 20 &
exit
Even after the xsession terminates, the ssh session is left hanging forever.
The correct behaviour is to wait 20 seconds for the X application to close
and then exit.
this is a client bug. try this:

Index: clientloop.c
===================================================================
RCS file: /home/markus/cvs/ssh/clientloop.c,v
retrieving revision 1.70
diff -u -r1.70 clientloop.c
--- clientloop.c 2001/05/11 14:59:55 1.70
+++ clientloop.c 2001/05/16 20:31:44
@@ -346,7 +346,13 @@
if (buffer_len(&stderr_buffer) > 0)
FD_SET(fileno(stderr), *writesetp);
} else {
- FD_SET(connection_in, *readsetp);
+ /* channel_prepare_select could have closed the last channel */
+ if (session_closed && !channel_still_open()) {
+ if (!packet_have_data_to_write())
+ return;
+ } else {
+ FD_SET(connection_in, *readsetp);
+ }
}

/* Select server connection if have data to write to the server. */
John Bowman
2001-05-17 01:44:50 UTC
Post by Markus Friedl
Post by John Bowman
The third item is another type of hanging bug in OpenSSH, as is
ssh -2 host
xterm -e sleep 20 &
exit
Even after the xsession terminates, the ssh session is left hanging forever.
The correct behaviour is to wait 20 seconds for the X application to close
and then exit.
Index: clientloop.c
===================================================================
RCS file: /home/markus/cvs/ssh/clientloop.c,v
retrieving revision 1.70
diff -u -r1.70 clientloop.c
--- clientloop.c 2001/05/11 14:59:55 1.70
+++ clientloop.c 2001/05/16 20:31:44
@@ -346,7 +346,13 @@
if (buffer_len(&stderr_buffer) > 0)
FD_SET(fileno(stderr), *writesetp);
} else {
- FD_SET(connection_in, *readsetp);
+ /* channel_prepare_select could have closed the last channel */
+ if (session_closed && !channel_still_open()) {
+ if (!packet_have_data_to_write())
+ return;
+ } else {
+ FD_SET(connection_in, *readsetp);
+ }
}
/* Select server connection if have data to write to the server. */
Yes, this patch fixes the X hanging bug (test under Protocol 2 on RedHat
6.2 linux systems). Thanks!

I've incorporated it into this latest version of the hang-on-exit patch
(the latest patch will always be available from
http://www.math.ualberta.ca/imaging/snfs)

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman


diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Wed May 16 16:42:53 2001
@@ -1137,6 +1137,15 @@
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
+ c->type=type;
+ continue;
+ }
(*ftab[c->type])(c, readset, writeset);
if (chan_is_dead(c)) {
/*
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c Wed May 16 18:55:58 2001
@@ -346,7 +346,13 @@
if (buffer_len(&stderr_buffer) > 0)
FD_SET(fileno(stderr), *writesetp);
} else {
- FD_SET(connection_in, *readsetp);
+ /* channel_prepare_select could have closed the last channel */
+ if (session_closed && !channel_still_open()) {
+ if (!packet_have_data_to_write())
+ return;
+ } else {
+ FD_SET(connection_in, *readsetp);
+ }
}

/* Select server connection if have data to write to the server. */
@@ -440,9 +446,13 @@
len = read(connection_in, buf, sizeof(buf));
if (len == 0) {
/* Received EOF. The remote host has closed the connection. */
- snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
- host);
- buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ * snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ * host);
+ * buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
quit_pending = 1;
return;
}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c Wed May 16 11:29:36 2001
@@ -56,7 +56,7 @@

/* helper */
static void chan_shutdown_write(Channel *c);
-static void chan_shutdown_read(Channel *c);
+void chan_shutdown_read(Channel *c);

/*
* SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
c->wfd = -1;
}
}
-static void
+void
chan_shutdown_read(Channel *c)
{
if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h Wed May 16 11:29:36 2001
@@ -88,4 +88,5 @@

void chan_init_iostates(Channel * c);
void chan_init(void);
+void chan_shutdown_read(Channel *c);
#endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c Wed May 16 18:57:49 2001
@@ -1960,6 +1960,9 @@
*/
if (c->ostate != CHAN_OUTPUT_CLOSED)
chan_write_failed(c);
+ if (c->istate == CHAN_INPUT_OPEN && compat20) {
+ chan_shutdown_read(c);
+ }
s->chanid = -1;
}
Markus Friedl
2001-05-17 21:41:44 UTC
Post by John Bowman
Yes, this patch fixes the X hanging bug (test under Protocol 2 on RedHat
6.2 linux systems). Thanks!
I've incorporated it into this latest version of the hang-on-exit patch
(the latest patch will always be available from
http://www.math.ualberta.ca/imaging/snfs)
-- John Bowman
University of Alberta
http://www.math.ualberta.ca/~bowman
diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Wed May 16 16:42:53 2001
@@ -1137,6 +1137,15 @@
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
^^^^^^^^
this cannot be correct. you may _not_ shutdown
the TCP connection. this breaks
ssh -N -L 1234:hostb:5678 hosta
Post by John Bowman
+ c->type=type;
+ continue;
+ }
John Bowman
2001-05-18 00:35:43 UTC
Post by Markus Friedl
Post by John Bowman
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
^^^^^^^^
this cannot be correct. you may _not_ shutdown
the TCP connection. this breaks
ssh -N -L 1234:hostb:5678 hosta
In what way does this "break" tunnelling? It works just fine for me, under
Linux. Try it out on a Linux machine. The hang-on-exit patch has recently been
reported not to work under HP-UX, so Linux is the only OS where this
patch should be applied anyway (at least, until Markus comes up with a
better alternative :-).

While we are on the topic of tunnelling...I have another patch that I am
about to submit to this list that implements a handy enhancement to tunneling
that was proposed by John Hardin of Apropos Retail Management Systems, Inc.
One can now request that the connection not die after the first TCP
connection is closed (via -N) or after a fixed number of seconds (via a sleep
command), but rather that it stays around n seconds after the most recent TCP
connection is closed. More details to follow...

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Markus Friedl
2001-05-18 12:34:46 UTC
Post by John Bowman
One can now request that the connection not die after the first TCP
connection is closed (via -N) or after a fixed number of seconds (via a sleep
command), but rather that it stays around n seconds after the most recent TCP
connection is closed. More details to follow...
currently the connection is _not_ closed after the first connection
John Bowman
2001-05-18 04:52:38 UTC
Post by John Bowman
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
^^^^^^^^
this cannot be correct. you may _not_ shutdown
the TCP connection. this breaks
ssh -N -L 1234:hostb:5678 hosta
Post by John Bowman
+ c->type=type;
+ continue;
+ }
Ah yes, now I see what you mean...

The OpenSSH -N extension to SSH is supposed to hold the connection open
indefinitely (BTW, the man page doesn't make this explicit). That certainly
seems useful. Thanks for pointing this out, Markus; I didn't understand
at first what you meant by "break".

Making use of the sleep (-S) option from the patch submitted earlier today,
this feature turned out to be easy to add to the hang-on-exit patch for Linux.
All I had to do was add the line

if(no_tty_flag && options.sleep < 0) options.sleep=0;

after the options are read in ssh.c.
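
The way the exit delay interacts with the session state can be sketched roughly as follows. This is only an illustration distilled from the patch below, not the patch itself: the function names `exit_delay_timeout` and `effective_sleep` and the enum spelling are hypothetical, and the convention assumed is the one in the patch (-1 means the option is unset, 0 means wait forever, a positive value is a delay in seconds).

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-ins for the patch's SessionOpen/SessionClose/SessionWait. */
enum session_status { SESSION_OPEN, SESSION_CLOSE, SESSION_WAIT };

/*
 * Pick the timeout to hand to select().
 * sleep_secs: -1 = option unset, 0 = wait forever, >0 = exit delay in seconds.
 * Returns NULL (block indefinitely) unless the session is in the waiting
 * state and a positive delay was requested.
 */
struct timeval *
exit_delay_timeout(enum session_status st, int sleep_secs, struct timeval *tv)
{
    if (st == SESSION_WAIT && sleep_secs > 0) {
        tv->tv_sec = sleep_secs;
        tv->tv_usec = 0;
        return tv;
    }
    return NULL;
}

/*
 * Mirrors the one-line ssh.c addition: when no tty is requested and no
 * delay was given, fall back to 0, i.e. hold the connection open.
 */
int
effective_sleep(int no_tty_flag, int sleep_secs)
{
    if (no_tty_flag && sleep_secs < 0)
        return 0;
    return sleep_secs;
}
```

With a delay of 0 the function returns NULL, so select() never times out and the session never moves from the waiting state to the closed state on its own, which is what keeps a forwarding-only connection alive.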

Here is the complete patch to 2.9p1 to fix all known hanging problems and
restore the desired behaviour with -N, without data loss, on Linux systems.
This includes my sleep patch and Markus' X-hang patch:

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman



diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Thu May 17 22:21:05 2001
@@ -1137,6 +1137,15 @@
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
+ c->type=type;
+ continue;
+ }
(*ftab[c->type])(c, readset, writeset);
if (chan_is_dead(c)) {
/*
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c Thu May 17 22:25:45 2001
@@ -121,8 +121,8 @@
static int connection_in; /* Connection to server (input). */
static int connection_out; /* Connection to server (output). */
static int need_rekeying; /* Set to non-zero if rekeying is requested. */
-static int session_closed = 0; /* In SSH2: login session closed. */
-
+enum SessionStatus {SessionOpen, SessionClose, SessionWait};
+static int session_status = SessionOpen; /* In SSH2: login session closed. */
void client_init_dispatch(void);
int session_ident = -1;

@@ -324,6 +324,10 @@
client_wait_until_can_do_something(fd_set **readsetp, fd_set **writesetp,
int *maxfdp, int rekeying)
{
+ struct timeval timer;
+ struct timeval *timerp;
+ int rc;
+
/* Add any selections by the channel mechanism. */
channel_prepare_select(readsetp, writesetp, maxfdp, rekeying);

@@ -346,7 +350,14 @@
if (buffer_len(&stderr_buffer) > 0)
FD_SET(fileno(stderr), *writesetp);
} else {
- FD_SET(connection_in, *readsetp);
+ /* channel_prepare_select could have closed the last channel */
+ if ((session_status == SessionClose)
+ && !channel_still_open()) {
+ if (!packet_have_data_to_write())
+ return;
+ } else {
+ FD_SET(connection_in, *readsetp);
+ }
}

/* Select server connection if have data to write to the server. */
@@ -362,7 +374,16 @@
* SSH_MSG_IGNORE packet when the timeout expires.
*/

- if (select((*maxfdp)+1, *readsetp, *writesetp, NULL, NULL) < 0) {
+ if(session_status == SessionWait && options.sleep > 0) {
+ timer.tv_sec=options.sleep;
+ timer.tv_usec=0;
+ timerp=&timer;
+ } else {
+ timerp=NULL;
+ }
+
+ rc=select((*maxfdp)+1, *readsetp, *writesetp, NULL, timerp);
+ if (rc < 0) {
char buf[100];

/*
@@ -379,7 +400,8 @@
snprintf(buf, sizeof buf, "select: %s\r\n", strerror(errno));
buffer_append(&stderr_buffer, buf, strlen(buf));
quit_pending = 1;
- }
+ } else if (rc == 0 && session_status == SessionWait)
+ session_status=SessionClose;
}

void
@@ -440,9 +462,13 @@
len = read(connection_in, buf, sizeof(buf));
if (len == 0) {
/* Received EOF. The remote host has closed the connection. */
- snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
- host);
- buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ * snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ * host);
+ * buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
quit_pending = 1;
return;
}
@@ -751,7 +777,7 @@
if (id != session_ident)
error("client_channel_closed: id %d != session_ident %d",
id, session_ident);
- session_closed = 1;
+ session_status = (options.sleep >= 0) ? SessionWait : SessionClose;
if (in_raw_mode())
leave_raw_mode();
}
@@ -776,6 +802,7 @@
start_time = get_current_time();

/* Initialize variables. */
+ if(!have_pty) session_status=SessionWait;
escape_pending = 0;
last_was_cr = 1;
exit_status = -1;
@@ -840,7 +867,8 @@
/* Process buffered packets sent by the server. */
client_process_buffered_input_packets();

- if (compat20 && session_closed && !channel_still_open())
+ if (compat20 && (session_status == SessionClose)
+ && !channel_still_open())
break;

rekeying = (xxx_kex != NULL && !xxx_kex->done);
Only in openssh-2.9p1J: clientloop.c.orig
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c Thu May 17 22:21:05 2001
@@ -56,7 +56,7 @@

/* helper */
static void chan_shutdown_write(Channel *c);
-static void chan_shutdown_read(Channel *c);
+void chan_shutdown_read(Channel *c);

/*
* SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
c->wfd = -1;
}
}
-static void
+void
chan_shutdown_read(Channel *c)
{
if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h Thu May 17 22:21:05 2001
@@ -88,4 +88,5 @@

void chan_init_iostates(Channel * c);
void chan_init(void);
+void chan_shutdown_read(Channel *c);
#endif
diff -ur openssh-2.9p1/readconf.c openssh-2.9p1J/readconf.c
--- openssh-2.9p1/readconf.c Tue Apr 17 12:11:37 2001
+++ openssh-2.9p1J/readconf.c Thu May 17 22:21:05 2001
@@ -111,7 +111,7 @@
oGlobalKnownHostsFile2, oUserKnownHostsFile2, oPubkeyAuthentication,
oKbdInteractiveAuthentication, oKbdInteractiveDevices, oHostKeyAlias,
oDynamicForward, oPreferredAuthentications, oHostbasedAuthentication,
- oHostKeyAlgorithms
+ oHostKeyAlgorithms, oSleep
} OpCodes;

/* Textual representations of the tokens. */
@@ -177,6 +177,7 @@
{ "dynamicforward", oDynamicForward },
{ "preferredauthentications", oPreferredAuthentications },
{ "hostkeyalgorithms", oHostKeyAlgorithms },
+ { "sleep", oSleep },
{ NULL, 0 }
};

@@ -494,6 +495,10 @@
intptr = &options->connection_attempts;
goto parse_int;

+ case oSleep:
+ intptr = &options->sleep;
+ goto parse_int;
+
case oCipher:
intptr = &options->cipher;
arg = strdelim(&s);
@@ -761,6 +766,7 @@
options->num_remote_forwards = 0;
options->log_level = (LogLevel) - 1;
options->preferred_authentications = NULL;
+ options->sleep = -1;
}

/*
diff -ur openssh-2.9p1/readconf.h openssh-2.9p1J/readconf.h
--- openssh-2.9p1/readconf.h Tue Apr 17 12:11:37 2001
+++ openssh-2.9p1J/readconf.h Thu May 17 22:21:05 2001
@@ -97,6 +97,7 @@
/* Remote TCP/IP forward requests. */
int num_remote_forwards;
Forward remote_forwards[SSH_MAX_FORWARDS_PER_DIRECTION];
+ int sleep; /* Exit delay in seconds */
} Options;


diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c Thu May 17 22:21:05 2001
@@ -1960,6 +1960,9 @@
*/
if (c->ostate != CHAN_OUTPUT_CLOSED)
chan_write_failed(c);
+ if (c->istate == CHAN_INPUT_OPEN && compat20) {
+ chan_shutdown_read(c);
+ }
s->chanid = -1;
}

diff -ur openssh-2.9p1/ssh.c openssh-2.9p1J/ssh.c
--- openssh-2.9p1/ssh.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/ssh.c Thu May 17 22:21:05 2001
@@ -182,6 +182,7 @@
fprintf(stderr, " -R listen-port:host:port Forward remote port to local address\n");
fprintf(stderr, " These cause %s to listen for connections on a port, and\n", __progname);
fprintf(stderr, " forward them to the other side by connecting to host:port.\n");
+ fprintf(stderr, " -S delay Set exit delay (in seconds; 0 means wait forever).\n");
fprintf(stderr, " -C Enable compression.\n");
fprintf(stderr, " -N Do not execute a shell or command.\n");
fprintf(stderr, " -g Allow remote hosts to connect to forwarded ports.\n");
@@ -318,7 +319,7 @@
opt = av[optind][1];
if (!opt)
usage();
- if (strchr("eilcmpLRDo", opt)) { /* options with arguments */
+ if (strchr("eilcmpLRSDo", opt)) { /* options with arguments */
optarg = av[optind] + 2;
if (strcmp(optarg, "") == 0) {
if (optind >= ac - 1)
@@ -488,7 +489,13 @@
}
add_local_forward(&options, fwd_port, buf, fwd_host_port);
break;
-
+ case 'S':
+ options.sleep = atoi(optarg);
+ if (options.sleep < 0) {
+ fprintf(stderr, "Bad delay value '%s'\n", optarg);
+ exit(1);
+ }
+ break;
case 'D':
fwd_port = a2port(optarg);
if (fwd_port == 0) {
@@ -526,6 +533,8 @@
if (!host)
usage();

+ if(no_tty_flag && options.sleep < 0) options.sleep=0;
+
SSLeay_add_all_algorithms();
ERR_load_crypto_strings();
Markus Friedl
2001-05-16 21:46:46 UTC
Post by John Bowman
The following is a CORRECTION, with a REVISED PATCH, to my message posted
to this list on 2001-05-15 2:55:37.
1. fixes the hang-on-exit bug under Protocol 2 (without data loss);
2. does not exit if there are unterminated X applications;
3. exits the session when all X applications have closed.
Of these three tests, OpenSSH-2.9p1 under Protocol 2 passes only the second
one. The third item is another type of hanging bug in OpenSSH, as is
ssh -2 host
xterm -e sleep 20 &
exit
Even after the xsession terminates, the ssh session is left hanging forever.
The correct behaviour is to wait 20 seconds for the X application to close
and then exit.
-- John Bowman
University of Alberta
http://www.math.ualberta.ca/~bowman
P.S. Since the hang-on-exit patch is only effective under Protocol 2,
a conditional to the call to chan_shutdown_read() has been added.
diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Wed May 16 01:22:16 2001
@@ -333,6 +333,9 @@
xfree(c->remote_name);
c->remote_name = NULL;
}
+
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(), SHUT_RDWR);
continue;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
imho, this is wrong. you are not allowed to shut down the TCP
connection to the peer. the peer can still request a second shell
session.
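
The objection can be illustrated with a toy model of channels multiplexed over one transport. This is a sketch under assumptions, not OpenSSH code: `struct connection`, `open_channel`, `close_channel` and `find_open_channel` are hypothetical names. It shows that "no channel is open right now" does not by itself mean the connection is finished, because a new channel can be opened as long as the transport is still up.

```c
#include <stddef.h>

#define MAX_CHANNELS 4

/* Toy model: one transport carrying up to MAX_CHANNELS channels. */
struct connection {
    int transport_open;              /* 1 while the TCP connection is up */
    int channel_open[MAX_CHANNELS];  /* 1 for each channel slot in use */
};

/* Returns the index of some open channel, or -1 if none is open. */
int
find_open_channel(const struct connection *c)
{
    int i;

    for (i = 0; i < MAX_CHANNELS; i++)
        if (c->channel_open[i])
            return i;
    return -1;
}

/* Opens a channel on an established transport; returns its id or -1. */
int
open_channel(struct connection *c)
{
    int i;

    if (!c->transport_open)
        return -1;
    for (i = 0; i < MAX_CHANNELS; i++) {
        if (!c->channel_open[i]) {
            c->channel_open[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Closes one channel; the transport itself is left untouched. */
void
close_channel(struct connection *c, int id)
{
    if (id >= 0 && id < MAX_CHANNELS)
        c->channel_open[id] = 0;
}
```

In this model, shutting down the transport as soon as `find_open_channel()` returns -1 would make every later `open_channel()` call fail, which is the behaviour being objected to.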
John Bowman
2001-05-16 17:42:18 UTC
Here is perhaps a slightly more robust version (in case of internal
errors; see chan_read_failed_12) of the hang-on-exit patch.

In session.c, I've changed the line
if (c->istate != CHAN_INPUT_CLOSED && compat20) {
to
if (c->istate == CHAN_INPUT_OPEN && compat20) {

In practice this shouldn't make any difference, since c->istate
should always equal either CHAN_INPUT_CLOSED or CHAN_INPUT_OPEN
within session_exit_message.
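
To make the difference between the two conditions concrete, here is a small sketch. The state names are the ones mentioned in this thread; the numeric values and the function names `shutdown_read_old` and `shutdown_read_new` are assumptions made for illustration only.

```c
/* Illustrative input states; names as used in this thread, values arbitrary. */
enum chan_input_state {
    CHAN_INPUT_OPEN,
    CHAN_INPUT_WAIT_DRAIN,
    CHAN_INPUT_CLOSED
};

/* Earlier condition: shut down the read side unless it is already closed. */
int
shutdown_read_old(enum chan_input_state istate, int compat20)
{
    return istate != CHAN_INPUT_CLOSED && compat20;
}

/* Revised condition: shut down the read side only if it is still open. */
int
shutdown_read_new(enum chan_input_state istate, int compat20)
{
    return istate == CHAN_INPUT_OPEN && compat20;
}
```

The two predicates agree for the open and closed states and differ only for an intermediate state such as CHAN_INPUT_WAIT_DRAIN, where the revised test leaves the channel alone.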

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman


diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c Wed May 16 01:22:16 2001
@@ -333,6 +333,9 @@
xfree(c->remote_name);
c->remote_name = NULL;
}
+
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(), SHUT_RDWR);
}

/*
@@ -1137,6 +1140,15 @@
continue;
if (ftab[c->type] == NULL)
continue;
+ if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+ int type=c->type;
+ c->type=SSH_CHANNEL_CLOSED;
+ if(channel_find_open() == -1)
+ shutdown(packet_get_connection_out(),
+ SHUT_RDWR);
+ c->type=type;
+ continue;
+ }
(*ftab[c->type])(c, readset, writeset);
if (chan_is_dead(c)) {
/*
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c Wed May 16 01:22:16 2001
@@ -440,9 +440,13 @@
len = read(connection_in, buf, sizeof(buf));
if (len == 0) {
/* Received EOF. The remote host has closed the connection. */
- snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
- host);
- buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ * snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ * host);
+ * buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
quit_pending = 1;
return;
}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c Wed May 16 01:22:16 2001
@@ -56,7 +56,7 @@

/* helper */
static void chan_shutdown_write(Channel *c);
-static void chan_shutdown_read(Channel *c);
+void chan_shutdown_read(Channel *c);

/*
* SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
c->wfd = -1;
}
}
-static void
+void
chan_shutdown_read(Channel *c)
{
if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h Wed May 16 01:22:16 2001
@@ -88,4 +88,5 @@

void chan_init_iostates(Channel * c);
void chan_init(void);
+void chan_shutdown_read(Channel *c);
#endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c Wed May 16 11:25:17 2001
@@ -1960,6 +1960,9 @@
*/
if (c->ostate != CHAN_OUTPUT_CLOSED)
chan_write_failed(c);
+ if (c->istate == CHAN_INPUT_OPEN && compat20) {
+ chan_shutdown_read(c);
+ }
s->chanid = -1;
}
Markus Friedl
2001-05-16 17:59:11 UTC
Post by John Bowman
Here is perhaps a slightly more robust version (in case of internal
errors; see chan_read_failed_12) of the hang-on-exit patch.
In session.c, I've changed the line
if (c->istate != CHAN_INPUT_CLOSED && compat20) {
to
if (c->istate == CHAN_INPUT_OPEN && compat20) {
In practice this shouldn't make any difference, since c->istate
should always equal either CHAN_INPUT_CLOSED or CHAN_INPUT_OPEN
within session_exit_message.
i think that shutdown should only be allowed if c->istate ==
CHAN_INPUT_CLOSED;

moreover, i'm still waiting for feedback on what rlogind does on
these systems.
John Bowman
2001-05-16 20:32:14 UTC
Post by Markus Friedl
i think that shutdown should only be allowed if c->istate ==
CHAN_INPUT_CLOSED;
No, this will cause openssh to hang under interactive use. When the shell
exits, no more data should be read. Even if you execute a background process
noninteractively, like in Damien's example

ssh host "(sleep 20 ; dd if=/dev/zero bs=1024 count=100 | wc -c)&"

it seems reasonable to me that the sleep 20 should be started in the
background and then the shell should immediately exit. The output from dd
will be discarded. (If you don't like this then don't put the dd in the
background.) This is the same behaviour as in the commercial version
of ssh, which OpenSSH is trying to emulate. See the previous postings about
background processes and interactive shells.

For noninteractive usage, like in the above example, rsh actually *does* keep
the connection open until the backgrounded process finishes. If you really
want OpenSSH to emulate rsh and not SSH then I suppose one could modify the
patch so that it only applies to interactive sessions. Note that under
interactive usage neither rsh nor SSH hangs:

ssh host
sleep 20&
exit

The only exception, for both SSH and (the patched) OpenSSH-2.9p1, should
be for X applications. In this case, the ssh connection has to be kept open
because it is needed to forward the X traffic.

So a call to shutdown is really needed. But the data that has already been
read should be sent on across the channel in order to avoid data loss
from a command like

while [ 1 ]; do ssh host "dd if=/dev/zero bs=8192 count=10" | wc -c; done

Actually, chan_read_failed_12 should have done this, but it sets
c->istate = CHAN_INPUT_WAIT_DRAIN and there appears to be a bug in the
handling of this state that causes the previously read data not to be sent
across the channel. My patch works around this and also implements a proper
handling of X applications.
Post by Markus Friedl
moreover, i'm still waiting for feedback on what rlogind does on
these systems.
See above.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Markus Friedl
2001-05-16 20:49:45 UTC
Post by John Bowman
Post by Markus Friedl
i think that shutdown should only be allowed if c->istate ==
CHAN_INPUT_CLOSED;
No, this will cause openssh to hang under interactive use.
but the transition from CHAN_INPUT_CLOSED to state!=CHAN_INPUT_CLOSED
should be caused by shutdown. see nchan2.ms
John Bowman
2001-05-16 20:53:29 UTC
Post by Markus Friedl
Post by John Bowman
Post by Markus Friedl
i think that shutdown should only be allowed if c->istate ==
CHAN_INPUT_CLOSED;
No, this will cause openssh to hang under interactive use.
but the transition from CHAN_INPUT_CLOSED to state!=CHAN_INPUT_CLOSED
should be caused by shutdown. see nchan2.ms
If c->istate is CHAN_INPUT_CLOSED, then shutdown won't be called in any case.
Karl M
2001-05-18 05:38:03 UTC
Hi All...

I ran into a hanging problem with 2.9p1 in the cygwin environment. I found
that

ssh -f localhost sleep 30

hangs on both 2.9p1 and 2.5p2.

ssh -f -L 5901:localhost:5900 localhost sleep 30

works fine with 2.5.2p2 but hangs with 2.9p1.

I tried all but the most recent patches you had on this thread, with no
effect. Is this an example of one of the known types of hanging? If not, can
you reproduce this case? In any event,

ssh -f -L 5901:localhost:5900 localhost sleep 30

is what I am trying to use with 2.9p1

Thanks,

...Karl
John Bowman
2001-05-21 00:47:48 UTC
Post by Markus Friedl
dd if=/bsd bs=65536 count=2
gets truncated on my openbsd development system.
shell writes last block into pipe to sshd process.
shell dies
not all data has been read from the pipe.
i can trigger this with
dd if=/bsd bs=65536 count=2
the figures should be different for other systems, but i think
all systems will show this problem.
It depends on how pipes are implemented. The scenario you describe
doesn't happen under Linux; the shell doesn't exit until all of the data
has been read from the pipe. I suspect these differences in the way the shell
and pipes interact are the underlying reason why you don't see the
hang-on-exit bug at all on OpenBSD.

Tweaking the parameters in your test doesn't make any difference on Linux,
as demonstrated by the output of the script below. Changing localhost
to another host (be sure to compare identical files) or working under
different load average conditions does not affect the results.

The patch has been subjected to exhaustive testing. Unless someone reports
a case where it fails before the next release, please go ahead and include
it in the next Linux version of OpenSSH. (If you don't like the -S option
for some reason, you can always remove it and the sleep config option).

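#!/bin/sh
# Arguments: size incr count delay
# Compare the checksum of data fetched through ssh (read after a delay)
# with the checksum of the same data read locally; stop at the first mismatch.
size=$1     # initial dd block size in bytes
incr=$2     # amount added to the block size on each pass
count=$3    # number of blocks dd copies
delay=$4    # seconds the reader sleeps before draining the pipe
file=/usr/local/netscape/netscape
checksum=
answer=
while [ "$checksum" = "$answer" ]
do
    checksum=$(ssh localhost dd if=$file bs=$size count=$count | ( sleep $delay ; md5sum ))
    answer=$(dd if=$file bs=$size count=$count | md5sum)
    echo $size $count $delay $checksum $answer
    size=$((size + incr))   # POSIX arithmetic instead of the bash-only $[ ]
done
echo CHECKSUM MISMATCH!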
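
For a baseline with ssh taken out of the picture, the same delayed-reader idea can be tried on a plain local pipe. This is only a sketch: delayed_read is a made-up helper name, /dev/zero stands in for the test file, and 4096 bytes is an assumed size small enough to fit in the pipe buffer, so the writer has already exited before the reader wakes up.

```shell
#!/bin/sh
# Hypothetical helper, not part of the test script above.
# Push $1 bytes through an ordinary pipe, make the reader wait $2 seconds,
# then print how many bytes the reader actually received.
delayed_read() {
    bytes=$1
    delay=$2
    dd if=/dev/zero bs="$bytes" count=1 2>/dev/null |
        ( sleep "$delay"; wc -c ) |
        tr -d ' '    # some wc implementations pad the count with spaces
}

# Assumed size: small enough that dd finishes and exits before the
# reader starts draining the pipe.
delayed_read 4096 1
```

If the count printed matches the count requested, the pipe itself kept the data after the writer exited, so any truncation seen through ssh would have to come from somewhere other than the pipe.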


The output of the tests is available at

http://www.math.ualberta.ca/imaging/snfs/hang-on-exit.test

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Chris Seawood
2001-06-27 00:46:35 UTC
Post by John Bowman
The patch has been subjected to exhaustive testing. Unless someone reports
a case where it fails before the next release, please go ahead and include
it in the next Linux version of OpenSSH. (If you don't like the -S option
for some reason, you can always remove it and the sleep config option).
John,

Thanks for providing the patch. I've been using it for a couple of
weeks now without any problems. Well, with one tiny problem that just
seemed to pop up today. If I use agent forwarding, then the patch
doesn't prevent the hang problem. Have you run across this problem?

- cls

Tom Holroyd
2001-05-21 01:41:07 UTC
Post by John Bowman
The patch has been subjected to exhaustive testing. Unless someone reports
a case where it fails before the next release, please go ahead and include
it in the next Linux version of OpenSSH. (If you don't like the -S option
for some reason, you can always remove it and the sleep config option).
Just another data point: I've been starting netscape via

ssh -X -n -o 'batchmode yes' ansgimachine /usr/bin/X11/netscape &

from the desktop menubar (Linux/Alpha -> SGI Irix) and when I exit
netscape, ssh (2.9p1) usually fails to exit on the Linux side. With your
patch installed it exits every time. I'm not using -S, just as above.

Dr. Tom
vsync
2001-05-21 05:45:06 UTC
Post by Tom Holroyd
Just another data point: I've been starting netscape via
ssh -X -n -o 'batchmode yes' ansgimachine /usr/bin/X11/netscape &
from the desktop menubar (Linux/Alpha -> SGI Irix) and when I exit
netscape, ssh (2.9p1) usually fails to exit on the Linux side. With your
patch installed it exits every time. I'm not using -S, just as above.
Just out of total random curiosity, why not run a local netscape?
--
vsync
http://quadium.net/ - last updated Tue May 15 15:02:08 PDT 2001
(cons (cons (car (cons 'c 'r)) (cdr (cons 'a 'o))) ; Orjner
(cons (cons (car (cons 'n 'c)) (cdr (cons nil 's))) nil))
Tim Rice
2001-06-07 03:52:26 UTC
Post by Markus Friedl
Post by m***@etoh.eviladmin.org
Post by Jason Stone
hermione/home/jason-634: date ; ssh localhost 'sleep 20& exit' ; date
Thu May 10 15:01:54 PDT 2001
Thu May 10 15:02:14 PDT 2001
Is this really a valid test?
no
Post by m***@etoh.eviladmin.org
This hangs for 20 seconds under OpenBSD
also, but this is not what we are referring to. SSH into your FreeBSD box
using an interactive shell then do: sleep 20&exit
$ ssh -t host
% sleep 1234 &
% exit
$
hangs with openssh. can you try whether rlogin hangs, too?
Platform             SSH hangs  Rlogin hangs
Solaris 7            Y          N
UnixWare 2.03        Y          N
UnixWare 2.13        Y          N
UnixWare 7.1.0       Y          N
SCO Open Server 3    Y          N
SCO Open Server 5    Y          N
Redhat 6.2           Y          N
Caldera eServer 2.3  Y          N

Hmm, I've got source to Linux.
Post by Markus Friedl
if rlogin does not hang, could you please check the source of rlogin
and try to figure out how it handles the file descriptors that connect
rlogind to the shell. what happens with the file descriptors after the
shell dies?
Nothing is jumping out at me. If you'd like to see for yourself,
ftp://ftp.multitalents.net/pub/netkit-rsh-0.10.tgz (~50K)
has rlogin/rlogind source in it.
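
For anyone who wants to see the descriptor effect without a remote host, here is a rough local sketch. It is not taken from rlogind or sshd: it uses a plain pipe rather than a pty, and eof_delay and its inherit/detach modes are names invented for the illustration. The subshell plays the login shell, sleep is the background job, and cat plays the daemon waiting for EOF. It also assumes a date that understands +%s.

```shell
#!/bin/sh
# Hypothetical stand-in for the remote side, using a pipe instead of a pty.
# $1 = seconds the background job runs
# $2 = "inherit" to leave the job's stdout on the pipe,
#      anything else to point its stdout/stderr at /dev/null
# Prints the number of seconds the reader waited before it saw EOF.
eof_delay() {
    secs=$1
    mode=$2
    start=$(date +%s)
    if [ "$mode" = "inherit" ]; then
        # The job keeps a copy of the write end, so cat sees no EOF
        # until the job itself exits.
        ( sleep "$secs" & exit 0 ) | cat
    else
        # The job holds no copy of the pipe, so EOF arrives as soon as
        # the "shell" exits.
        ( sleep "$secs" >/dev/null 2>&1 & exit 0 ) | cat
    fi
    end=$(date +%s)
    echo $((end - start))
}

eof_delay 2 inherit
eof_delay 2 detach
```

The first call should report roughly the job's run time and the second should return almost immediately. That only shows the inherited-descriptor half of the story; how a pty master behaves once the shell dies is platform specific, which is what the table above is probing.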
Post by Markus Friedl
thanks, -m
--
Tim Rice Multitalents (707) 887-1469
***@multitalents.net
Markus Friedl
2001-06-07 07:56:34 UTC
Post by Tim Rice
Post by Markus Friedl
$ ssh -t host
% sleep 1234 &
% exit
$
hangs with openssh. can you try whether rlogin hangs, too?
Platform             SSH hangs  Rlogin hangs
Solaris 7            Y          N
UnixWare 2.03        Y          N
UnixWare 2.13        Y          N
UnixWare 7.1.0       Y          N
SCO Open Server 3    Y          N
SCO Open Server 5    Y          N
Redhat 6.2           Y          N
Caldera eServer 2.3  Y          N
Hmm, I've got source to Linux.
Post by Markus Friedl
if rlogin does not hang, could you please check the source of rlogin
and try to figure out how it handles the file descriptors that connect
rlogind to the shell. what happens with the file descriptors after the
shell dies?
Nothing is jumping out at me. If you'd like to see for yourself,
ftp://ftp.multitalents.net/pub/netkit-rsh-0.10.tgz (~50K)
has rlogin/rlogind source in it.
sorry, i don't have the time to check linux/sco/solaris.
(check behaviour on SIGCLD, pty handling, etc).

however:

Platform  SSH hangs  Rlogin hangs
OpenBSD   N          N
BSD/OS    N          N

this is all i can do right now.

-m