OpenSSH 7.6 and 7.7 on Oracle Linux 7 (compiled from source) don't start correctly with systemd

Discussion:

kevin martin

2018-08-21 17:27:43 UTC

The latest Oracle Linux (7.5) bundles openssh 7.4 as an rpm, and this runs
fine. If I download openssh 7.6 or 7.7, compile it with the flags
--with-pam and --with-pid-dir=/var/run, install it to /usr/local, and modify
the sshd.service file to point to /usr/local/sbin/sshd, the start hangs.
The sshd.service file looks like this:

[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=notify
PIDFile=/var/run/sshd.pid
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/local/sbin/sshd $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target

OPTIONS gets picked up from /etc/sysconfig/sshd and contains "-D -4".
systemctl start sshd hangs, and it seems like openssh isn't notifying
systemd that it has started. For a while it seemed like sshd wasn't
writing its pid file, but changing the flags to --with-pid-dir seems to
have it writing its pid file. The funny thing is that sshd *does* get
started, but systemd never recognizes it and so ends up killing it and
restarting it over and over. Running sshd under strace from the command
line of course works fine: sshd starts, you can log in, etc., so it's
some interaction with systemd that I need to figure out.

Let me know what more I can add to this that would help please.

---

Regards,

Kevin Martin

Peter Stuge

2018-08-22 11:23:11 UTC

Post by kevin martin
The latest Oracle Linux (7.5) bundles openssh 7.4 as an rpm, and this runs
fine. If I download openssh 7.6 or 7.7, compile it with the flags
--with-pam and --with-pid-dir=/var/run, install it to /usr/local, and modify
the sshd.service file to point to /usr/local/sbin/sshd, the start hangs.

Post by kevin martin
[Service]
Type=notify

Post by kevin martin
it seems like openssh isn't notifying systemd that it's started.

I don't think the portable OpenSSH source has any systemd integration,
so that is what you should expect.

--8<-- systemd.service(5)
OPTIONS
..
Type=
..
Behavior of notify is similar to simple; however, it is expected
that the daemon sends a notification message via sd_notify(3) or an
equivalent call when it has finished starting up.
-->8--

I guess that Oracle has patched sshd to call sd_notify() and thus
introduced a dependency on the systemd libraries for sshd. I don't
think that's a good idea at all.

To run upstream OpenSSH-portable, set Type=simple and be done with it.

//Peter

Stephen Harris

2018-08-22 13:37:49 UTC

Post by Peter Stuge
I guess that Oracle has patched sshd to call sd_notify() and thus

Well, Red Hat.

Post by Peter Stuge
introduced dependency on the systemd libraries for sshd. I don't

Yup
% ldd /usr/sbin/sshd | grep syst
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007f0e5b715000)

--
rgds
Stephen

kevin martin

2018-08-22 14:02:14 UTC

Simple seems to have fixed it. I was also trying with "forking" as the
type and that was failing as well.

Thanks.

---

Regards,

Kevin Martin

_______________________________________________
openssh-unix-dev mailing list
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev

Jakub Jelen

2018-08-22 15:45:29 UTC

Post by kevin martin
Simple seems to have fixed it. I was also trying with "forking" as the
type and that was failing as well.

It is not as simple as that -- we lived with "simple" for a long time,
but it did not cover some corner cases, so we ended up using
sd_notify, since that was the only reliable way for systemd to know the
service is working.

For others interested in this topic, there was a long discussion in bug
#2641, unfortunately without an upstream solution:

https://bugzilla.mindrot.org/show_bug.cgi?id=2641

Regards,

--
Jakub Jelen
Software Engineer
Security Technologies
Red Hat, Inc.

kevin martin

2018-08-22 15:53:35 UTC

Yep, that race condition is exactly what I was experiencing. I'm not sure
why having the systemd notify code in openssh as a configure time option
would be such a bad thing.

---

Regards,

Kevin Martin

Peter Stuge

2018-08-22 21:32:58 UTC

Post by kevin martin
not sure why having the systemd notify code in openssh as a
configure time option would be such a bad thing.

At the very least it introduces a dependency on libsystemd into sshd,
which is undesirable for reasons of security and convenience. The
principle of "you are done when you can not remove any more" confirms
that it is unwise to add dependencies without very careful consideration.

I've read through the Debian and Red Hat bug reports.

There are two different but related problems here:

1. For systemctl [re]start, when a .service file has Type=simple,
systemd assumes that service startup can never fail, and immediately
considers this service successfully started when the exec() of sshd
has succeeded.

That's debatable design within systemd, but it's hard for systemd to
know when a given service has actually started successfully, and
services which fit that assumption do exist.

So when sshd detects an error on startup and exits with an error code
shortly after being started, systemd considers the service to first
have started successfully and then to have exited with an error, so
it then restarts the service. Repeat.

When service limits are exhausted the service ends up in a failed state.

Meanwhile, the systemctl [re]start command doesn't report any error
to the administrator, because systemd considers the service to have
[re]started successfully once. This is "error messages are lost".

2. For systemctl reload, systemd can and arguably should send SIGHUP
to sshd. More uncertainty and assumptions within systemd follow:
sshd re-execs itself, meaning that the PID stays the same, so systemd
doesn't receive SIGCHLD. So even if 1. is fixed, systemd will not
understand that an error during startup of the new sshd is to be
considered a failed reload. I.e. the above problems apply here
again. The systemctl reload sshd command is always immediately
successful, even if the re-exec'ed sshd detects an error in the config
file.

In both these cases, systemctl reports no error, while sshd isn't running.

So what to do?

A workaround for [re]start is to add sshd -t linting via ExecStartPre,
but that doesn't help at all with reload.

It would be good to have sshd integrate with systemd here, but we
need to avoid the libsystemd dependency.

Fortunately, sd_notify() doesn't need to do all that much; almost
everything it needs is already used in the OpenSSH codebase, so it's easy
enough to add local code for it. It's a sendmsg() with SCM_CREDENTIALS
to the AF_UNIX SOCK_DGRAM socket named in $NOTIFY_SOCKET.

The file descriptor passing code in monitor_fdpass.c sends other
messages with ancillary data.
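
As a rough illustration of how small the notification is, here is a
minimal sketch in Python (chosen only for brevity; it is not the C code
attached to this message). The function name notify_service_manager is
hypothetical. The sketch sends only the datagram payload and leaves out
the explicit SCM_CREDENTIALS ancillary data described above, and it
assumes that a leading '@' in $NOTIFY_SOCKET denotes a Linux
abstract-namespace socket.

```python
import os
import socket


def notify_service_manager(state: str) -> bool:
    """Send one datagram containing `state` to the socket in $NOTIFY_SOCKET.

    Returns False without doing anything when NOTIFY_SOCKET is unset, so the
    daemon behaves the same when it is not run under a notify-aware manager.
    """
    target = os.environ.get("NOTIFY_SOCKET")
    if not target:
        return False
    if target.startswith("@"):
        # '@' marks a Linux abstract-namespace socket; its real address
        # begins with a NUL byte instead.
        address = b"\0" + os.fsencode(target[1:])
    else:
        address = os.fsencode(target)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), address)
    return True
```

A daemon would call notify_service_manager("READY=1") once its startup
is complete; outside a service manager the call is a no-op.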

Damien, how do you feel about adding the notification without the
dependency, maybe conditioned on a configure.ac check for (Linux-only)
SCM_CREDENTIALS?

I think the minimum viable product would be to emit READY=1 once
startup is complete and RELOADING=1 on SIGHUP receipt.

STOPPING=1 would also make sense in sshd exit paths if something
could end up blocking along the way, but at least the SIGTERM case
in server_accept_loop() doesn't seem to need that.

STATUS= and ERRNO= could be nice-to-haves for error messages.
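
The messages named above are plain text. A small sketch of composing
them in Python, assuming (as in the sd_notify(3) interface) one
VARIABLE=value assignment per line; the helper name build_state and its
parameters are hypothetical and only illustrate the payload format.

```python
from typing import Optional

# Boolean-style states discussed above; each is sent as NAME=1.
FLAGS = ("READY", "RELOADING", "STOPPING")


def build_state(flag: str, status: Optional[str] = None,
                errno_value: Optional[int] = None) -> str:
    """Compose a notification payload, one VARIABLE=value per line."""
    if flag not in FLAGS:
        raise ValueError(f"unknown state flag: {flag!r}")
    lines = [f"{flag}=1"]
    if status is not None:
        if "\n" in status:
            # A newline would start a new assignment and corrupt the payload.
            raise ValueError("status must be a single line")
        lines.append(f"STATUS={status}")
    if errno_value is not None:
        lines.append(f"ERRNO={errno_value}")
    return "\n".join(lines)
```

For example, build_state("READY") yields the minimum viable message,
while a failing exit path could add a STATUS= line and an ERRNO= value
to the same datagram.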

So I wrote a simple sd_notify() and am attaching it here, but the
address part and a connect() may need to be outside the function
with privilege separation. Thoughts on this idea?

//Peter

Damien Miller

2018-08-23 04:01:15 UTC

Post by Peter Stuge

not sure why having the systemd notify code in openssh as a
configure time option would be such a bad thing.

Thanks for the detailed write up, Peter.

I agree: what is happening here seems to be mostly bad assumptions and
inflexibility inside systemd.

I'm surprised that systemd made these design decisions, because sshd is
not doing anything historically unique with regards to startup or reload
behaviour and "works with existing daemons" seems to be requirement #0
if you're writing an init system.

Maybe the other daemon vendors didn't push back against this, but I'm
willing to.

-d

kevin martin

2018-08-23 14:53:27 UTC

I'm not sure I agree with Peter in respect to his comment about "building a
dependency to systemd". The only time a "dependency" would be created is
when the end-user would configure it to be there with a configure time flag
of --with-systemd. Just having the code available and dormant without that
flag being provided builds in no dependency whatsoever and gives the
end-user their option to choose.

---

Regards,

Kevin Martin

Emmanuel Deloget

2018-08-23 16:21:16 UTC

Hello,

Post by kevin martin
I'm not sure I agree with Peter in respect to his comment about "building a
dependency to systemd". The only time a "dependency" would be created is
when the end-user would configure it to be there with a configure time flag
of --with-systemd. Just having the code available and dormant without that
flag being provided builds in no dependency whatsoever and gives the
end-user their option to choose.

Not sure I should step in, but the code to deal with the user
selection and to notify systemd is a dependency - even if it's
compiled out. The fact is that you still have to maintain it and to
test it regularly.

The problem looks like a systemd configuration error. systemd allows
you to start a non-systemd-aware daemon. You need to look at [Service]
/ Type (notify is used for systemd-aware daemons).

BR,

-- Emmanuel Deloget

kevin martin

2018-08-23 16:48:46 UTC

While I appreciate the need to code it and test it regularly, Peter wrote a
bit of notify code and provided it to Damien to do essentially what the
systemd API code already seems to do, which seems like remaking the wheel
to me and would still require ongoing maintenance and testing. The
systemd API is developed and maintained external to openssh and is there
specifically to make it easier for apps that want to become daemons to be
used effectively in the systemd environment.

I hated the fact that most flavors of Linux moved from the init system to
systemd, but it's what we, the end users (companies with hundreds of
thousands of Linux instances running), have to live with. Having Red Hat
(and other vendors that don't necessarily take their codebase from Red
Hat) make changes to *your* code to include systemd enhancements could, I
would think, lead to ongoing issues like this one. If *you*, the
developers, included the API access as a configurable option, then *we*,
the consumers, could move to your newer codebase products sooner and get
the enhancements that you folks work so diligently to make in your
application, which is a win-win for all of us.

---

Regards,

Kevin Martin

David Newall

2018-08-24 00:42:19 UTC

I'm old school and think systemd is an overly complicated abomination.
Don't support it. The more projects that do support it, the more
legitimacy it is lent. Refuse all systemd-related patches.

Damien Miller

2018-08-23 23:50:55 UTC

If it's in the code that we maintain then it's a dependency. I don't
think any other definition makes sense.

Peter Stuge

2018-08-23 17:49:54 UTC

Post by Damien Miller
I agree: what is happening here seems to be mostly bad assumptions and
inflexibility inside systemd.

I didn't say that, and I don't agree with it; to me it's welcome
ambition rather than bad assumptions.

Consider this:

How could systemd determine whether startup of a foreground daemon
completed successfully or failed?

Other than explicit notification (like an AF_UNIX message) systemd
could only use time; it could wait for the daemon to exit(EXIT_FAILURE)
after exec() - but how long is long enough? Every answer is incorrect.

Since systemd can't know when sshd has successfully started I find it
really reasonable to assume "immediately" in the Type=simple case.

Post by Damien Miller
I'm surprised that systemd made these design decisions, because sshd is
not doing anything historically unique with regards to startup or reload
behaviour and "works with existing daemons" seems to be requirement #0
if you're writing an init system.

That's not fair.

systemd works with sshd just as well as if I would add sshd to my inittab
on a SysV init system, but that's not so useful.

systemd works well with sshd using Type=forking, but if the config
file breaks and a reload is issued (and sshd exits, because bad config)
then systemd detects that sshd exited, but it can't know why, so it
can't output a status message.

systemd is indeed more ambitious than e.g. SysV init, and for service
management I consider that a leap in the right direction. (For many of the
other things systemd wants to do, not so much; I don't use those.)

Post by Damien Miller
Maybe the other daemon vendors didn't push back against this, but I'm
willing to.

Please don't push back just for the sake of it.

Did you look at the code I sent?

Would you take a patch with essentially that code, without any
libsystemd dependency, to make sshd work as a Type=notify service,
enabling maximum usability with systemd?

//Peter

Jochen Bern

2018-08-24 12:04:13 UTC

Post by Peter Stuge
How could systemd determine whether startup of a foreground daemon
completed successfully or failed?
Other than explicit notification (like an AF_UNIX message) systemd
could only use time; it could wait for the daemon to exit(EXIT_FAILURE)
after exec() - but how long is long enough? Every answer is incorrect.

If we can agree that neither systemd nor "legacy" methods(*) of getting
feedback from daemon processes will cease to exist just because the
other side wishes them to hard enough, then complementing either side
(but preferably systemd) with a (general, configurable, contrib/ subdir
based) wrapper to translate as needed would seem a pragmatic solution.
</€.02>

(*) PID file, lookup in the process table, check for a LISTEN, pattern
match in a logfile, running a dedicated *client* executable / Nagios
plugin / ${DAEMON}ctl tool for a test, throwing the daemon a
SIGAREYOUWELL/shmem/semaphore/... request, you name it
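
One of the checks listed above, "check for a LISTEN", could be sketched
as a polling probe along the following lines. This is only an
illustration in Python under stated assumptions: the name
wait_for_listen and its timeout and interval parameters are
hypothetical, and a real wrapper would still have to translate the
result into whatever its service manager expects.

```python
import socket
import time


def wait_for_listen(host: str, port: int, timeout: float = 10.0,
                    interval: float = 0.2) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as one connection attempt succeeds, or False if
    none succeeded before `timeout` seconds elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the daemon has reached listen().
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # not listening yet (or unreachable); retry until deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The choice of timeout remains arbitrary, which is the weakness of any
time-based check compared with an explicit notification.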

Regards,

--
Jochen Bern
Systemingenieur

Binect GmbH

Colin Watson

2018-08-24 17:19:09 UTC

Post by Jochen Bern

If we can agree that neither systemd nor "legacy" methods(*) of getting
feedback from daemon processes will cease to exist just because the
other side wishes them to hard enough, then complementing either side
(but preferably systemd) with a (general, configurable, contrib/ subdir
based) wrapper to translate as needed would seem a pragmatic solution.
</€.02>
(*) PID file, lookup in the process table, check for a LISTEN, pattern
match in a logfile, running a dedicated *client* executable / Nagios
plugin / ${DAEMON}ctl tool for a test, throwing the daemon a
SIGAREYOUWELL/shmem/semaphore/... request, you name it

I doubt that anyone using OpenSSH with systemd would want to use a
polling-based (and thus inefficient) hack like that when they could just
apply the tiny patch to slot in an sd_notify call between listen and
accept. (And I definitely see the logic behind notifying the service
manager at that point; I've dealt with complex services built on top of
OpenSSH that needed to arrange the boot sequence so that they started
only once sshd was actually ready to accept connections, and without
this kind of approach they had to settle for arbitrary delays and race
conditions.)
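
The ordering described here, notifying between listen and accept, can
be sketched as follows. This is an illustrative Python outline rather
than the actual downstream patch; listen_then_notify is a hypothetical
name, and the notify callback stands in for whatever delivers the
READY=1 message to the service manager.

```python
import socket
from typing import Callable


def listen_then_notify(host: str, port: int,
                       notify: Callable[[str], object]) -> socket.socket:
    """Bind and listen, report readiness, and return the listening socket.

    `notify` is called exactly once, after listen() has succeeded and before
    the caller gets a chance to accept().
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(16)
    except OSError:
        listener.close()
        raise  # startup failed: no readiness message is ever sent
    # From here on the kernel queues incoming connections, so services
    # ordered after this one can safely start connecting.
    notify("READY=1")
    return listener
```

Because readiness is reported only after listen() succeeds, a service
started afterwards does not need an arbitrary delay before its first
connection attempt.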

systemd has its structural problems, but this is one thing it gets
right. To my mind, the reasons for avoiding linking against libsystemd
with a configure-time switch are essentially political; if you're
running on a systemd-based system then it's paged in anyway so the
runtime cost is negligible, if you're not then sd_notify is already
careful to do nothing and do so cheaply, and in general I think it makes
more sense to use common code to notify the service manager than to
duplicate it. (I still have a soft spot for the hacky "SIGSTOP yourself
and have init send you SIGCONT when it notices" approach to this problem
that we took in upstart, but I can understand why systemd preferred to
do something else.)

Obviously it's better to get patches upstream wherever possible. But
honestly, speaking as a downstream who maintains a patch that calls
sd_notify in the right place, I'd rather have to maintain that patch
indefinitely than have a worse hack upstream that I'd then have to undo
or otherwise work around.

--
Colin Watson [***@debian.org]

Peter Stuge

2018-08-24 18:06:35 UTC

Post by Damien Miller
If it's in the code that we maintain then it's a dependency. I don't
think any other definition makes sense.

I define a dependency as an external library or component, required at
compile- and runtime.

Post by Emmanuel Deloget
Not sure I should step in, but the code to deal with the user selection

What user selection?

Post by Emmanuel Deloget
and to notify systemd is a dependency - even if it's compiled out. The
fact is that you still have to maintain it and to test it regularly.

Did you read the code I sent? It does six system calls in about 85 lines,
three of which can't fail. I wrote it exactly because something so short
is sufficient.

The existing code in OpenSSH which sends messages with ancillary data
was last touched in 2010, 8 years ago. This isn't complicated, making
a compile- and runtime dependency on libsystemd doubly undesirable.

Post by Emmanuel Deloget
The problem looks like a systemd configuration error. systemd allows
you to start a non-systemd-aware daemon. You need to look at [Service]
/ Type (notify is used for systemd-aware daemons).

The discussion isn't about the basic functionality known from sysvinit,
using init scripts or even inittab directly. sshd of course already runs
fine on many systemd systems, and it runs fine also without Debian's
patch to depend on libsystemd, when only considering the bare minimum
of a service manager.

But that ignores the additional functionality offered by systemd to its
users. I think it makes sense for sshd to support that functionality if
the cost of doing so is low. I suggest that my proposed code is low cost.

The implemented API (AF_UNIX message with ancillary data) is documented
and there's no technical reason for it to change, so the maintenance burden
will likely be similar to monitor_fdpass.c - little change in 8 years.

Post by kevin martin
seems like remaking the wheel to me

It is, but I think avoiding the library dependency is a good reason to
do so, especially considering how little code is needed.

Post by kevin martin
would still require ongoing maintenance and testing

Did you read the code?

Post by kevin martin
If *you as the developers included the API access as a configurable option
then *we the consumer could move to your newer codebase products sooner and
get the enhancements that you folks work so diligently to make in your
application which is a win-win for all of us.

You imply an obligation for developers to enable consumers to "move
to newer codebase products" in their systems - please re-read the
software license, and please remember that no such obligation exists.

Over the years I've learned that open source software only works if you
take responsibility for it yourself. If you fail to do so then you will
inevitably have a bad experience.

One way is to get a support contract (like Red Hat in this case) and
then they are of course obliged to honor that. You'll get something,
(a dependency on libsystemd) but since they too optimize for cost you
can be pretty sure that it's not the best solution - that would come
from upstream.

Post by David Newall
I'm old school and think systemd is an overly complicated abomination.

I agree completely that the systemd implementation and overall ambition
is overly complicated.

But the unit data model is very very good, and I think systemd is a great
improvement in service management compared to anything on Linux before.

While we argue, Windows has had proper service management since the 1990s.

Post by David Newall
The more projects that do support it, the more legitimacy it is lent.

That ship has sailed. Please think more about the role of Red Hat in
the Linux ecosystem.

Post by Jochen Bern
wrapper to translate as needed would seem a pragmatic solution

I consider that wrapper to be Type=forking in the service file, but
anything short of explicit notification from daemon to service manager
leaves the service manager without complete information about the state
of the daemon. That's not great.

Post by Jochen Bern
(*) PID file, lookup in the process table, check for a LISTEN, pattern
match in a logfile, running a dedicated *client* executable / Nagios
plugin / ${DAEMON}ctl tool for a test, throwing the daemon a
SIGAREYOUWELL/shmem/semaphore/... request, you name it

None of those is both explicit and generic (across services).

I think the notify socket is a good, simple solution, and one that is not
tied to systemd; I think it is worth supporting.

//Peter