Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1317843 - `virt-builder --update` fails with: "dnf -y --best upgrade: command exited with an error"
Summary: `virt-builder --update` fails with: "dnf -y --best upgrade: command exited wi...
Keywords:
Status: CLOSED DUPLICATE of bug 1264198
Alias: None
Product: Fedora
Classification: Fedora
Component: dnf
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1320754 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-15 10:48 UTC by Kashyap Chamarthy
Modified: 2016-03-30 10:35 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-29 22:13:28 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Not truncated debugging plus tracing output (virt-builder -v -x [...]) (304.97 KB, text/plain)
2016-03-15 10:48 UTC, Kashyap Chamarthy
no flags Details

Description Kashyap Chamarthy 2016-03-15 10:48:30 UTC
Created attachment 1136536 [details]
Not truncated debugging plus tracing output (virt-builder -v -x [...])

Description of problem
-----------------------

Running this:

    $ virt-builder fedora-23 -o devstack1.qcow2 --format qcow2 \
        --root-password password:123456 --update --selinux-relabel \
        --size 80G

results in (not truncated verbose plus tracing output is attached):

    [...]
    No '/dev/log' or 'logger' included for syslog logging
    No '/dev/log' or 'logger' included for syslog logging
    virt-builder: error: dnf -y --best upgrade: command exited with an error
    
    If reporting bugs, run virt-builder with debugging enabled and include the 
    complete output:
    
      virt-builder -v -x [...]

However, the above command returns _success_

    $ echo $?
    0


Version
-------

libguestfs-tools-c-1.32.3-2.fc23.x86_64
qemu-system-x86-2.4.1-7.fc23.x86_64
libvirt-daemon-kvm-1.2.18.2-2.fc23.x86_64


How reproducible: Consistently.


Steps to Reproduce
------------------

    $ virt-builder fedora-23 -o devstack1.qcow2 --format qcow2 \
        --root-password password:123456 --update --selinux-relabel \
        --size 80G

Actual results
--------------

`virt-builder` fails to successfully build and update a Fedora-23
machine.

(Also, refer the attached complete debugging output.)


Expected results
----------------

`virt-builder` fails to successfully update a Fedora-23 machine.

Comment 1 Kashyap Chamarthy 2016-03-15 10:50:17 UTC
(In reply to Kashyap Chamarthy from comment #0)

[...]

> Expected results
> ----------------
> 
> `virt-builder` fails to successfully update a Fedora-23 machine.

Copy+paste error above :-)

Of course I meant: `virt-builder` *successfully* creates and updates a Fedora-23 VM

Comment 2 Kashyap Chamarthy 2016-03-15 11:00:20 UTC
If I remove "--update" flag, the command succeeds:

$ virt-builder fedora-23 -o devstack1.qcow2 --format qcow2 --root-password password:fedora  --selinux-relabel --size 100G                                    
[   1.2] Downloading: http://libguestfs.org/download/builder/fedora-23.xz
[   2.0] Planning how to build this image
[   2.0] Uncompressing
[   5.5] Resizing (using virt-resize) to expand the disk to 100.0G
[  34.9] Opening the new disk
[  46.1] Setting a random seed
[  46.1] Setting passwords
[  47.2] SELinux relabelling
[  47.3] Finishing off
                   Output file: devstack1.qcow2
                   Output size: 100.0G
                 Output format: qcow2
            Total usable space: 99.4G
                    Free space: 98.6G (99%)

Comment 3 Richard W.M. Jones 2016-03-24 09:16:59 UTC
*** Bug 1320754 has been marked as a duplicate of this bug. ***

Comment 4 Richard W.M. Jones 2016-03-24 12:14:06 UTC
This is brute-force and slow, but it does find the true cause of the
error (eventually):

$ LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_BACKEND=direct \
  virt-builder fedora-23 --arch x86_64 \
    -o disk_f23_minimal_x86_64.img --size 20G \
    --no-delete-on-failure \
    --install strace \
    --run-command "strace -o /tmp/log -f -s 1500 dnf -y -v --best update"

$ virt-cat disk_f23_minimal_x86_64.img /tmp/log | less

I believe it is caused by dnf not setting up a SIGPIPE handler,
and therefore dying (in librpm, but that's a coincidence) after
a SIGPIPE signal is received.

The log file is absolutely massive, so I'll upload it to my website
shortly.

Comment 5 Adam Williamson 2016-03-24 12:33:05 UTC
Thanks a lot for looking into it, Richard!

Comment 6 Richard W.M. Jones 2016-03-24 12:35:58 UTC
http://oirase.annexia.org/tmp/bz1317843/

I've just uploaded the last 100000 lines of the log file now.  It's
still compressing the whole log file, and I'll upload that when done.

Comment 7 Richard W.M. Jones 2016-03-24 13:11:56 UTC
The log is 648MB.  Try not to kill my webserver :-/

sha256 of the compressed log:

5a7a96b944d14cb55c42627dce16c1591f737c5fd176606fd472e451ed004e40  /tmp/log.xz

Comment 8 Richard W.M. Jones 2016-03-29 21:37:35 UTC
I tried to capture a stack trace from gdb on exit.  It wasn't
completely successful, but it does still point to something to
do with SIGPIPE.

-------------------
[...]
  Cleanup     : glibc-2.22-3.fc23.x86_64                                283/285Detaching after fork from child process 1193.
 
  Cleanup     : tzdata-2015g-1.fc23.noarch                              284/285 
  Cleanup     : libgcc-5.1.1-4.fc23.x86_64                              285/285Detaching after fork from child process 1194.
Detaching after fork from child process 1195.
Detaching after fork from child process 1196.
Detaching after fork from child process 1199.
 
1039 blocks
Detaching after fork from child process 1588.
No '/dev/log' or 'logger' included for syslog logging
No '/dev/log' or 'logger' included for syslog logging
Detaching after fork from child process 26818.
Detaching after fork from child process 26822.
Detaching after fork from child process 26823.
Detaching after fork from child process 26825.
Detaching after fork from child process 26826.
Detaching after fork from child process 1040.

Program received signal SIGPIPE, Broken pipe.
0x00007ffff6d56d5d in write () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7fee700 (LWP 628)):
#0  0x00007ffff6d56d5d in write () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.
A debugging session is active.

	Inferior 1 [process 628] will be killed.

Quit anyway? (y or n) [answered Y; input not from terminal]
-------------------

The reproducer command you can try for yourself is this (all on
a single line):

LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_BACKEND=direct virt-builder fedora-23 --arch x86_64 -o disk_f23_minimal_x86_64.img --size 20G --no-delete-on-failure --install gdb --run-command 'gdb --batch -ex "break _exit" -ex "handle SIGPIPE pass" -ex "run" -ex "t a a bt full" -ex "quit" --args python3 /usr/bin/dnf -y -v --best update; exit 1'

Comment 9 Richard W.M. Jones 2016-03-29 22:10:36 UTC
(Refer to the log.tail.xz file in http://oirase.annexia.org/tmp/bz1317843/ )

The failure happens in librpm's runExtScript() function.

The dnf parent process writes a script called /var/tmp/rpm-tmp.PDQj6U.
This script contains:

----------------------
/usr/bin/mandb -q

# update cache
----------------------

Then dnf creates a pipe to write to this script.  It then forks and
execs the script.  The shell is PID 1004.  mandb is PID 1005.  Into
the pipe (connected to stdin of the above script) dnf writes:

/usr/share/man/aa
/usr/share/man/aa/man0p
/usr/share/man/aa/man1
/usr/share/man/aa/man1p
/usr/share/man/aa/man1x
... many many more lines of the same thing

However as far as I can tell mandb never reads its stdin at all.
Also mandb does NOT exit with any error.  It appears that mandb
is not expecting any input (also the mandb man page indicates that
it is not expecting to read from stdin).

Eventually when control is returned to dnf's write() syscall, it returns
EPIPE and dnf gets SIGPIPE and dnf then crashes.

Comment 10 Richard W.M. Jones 2016-03-29 22:13:28 UTC
OK I see.  In the latest librpm, this commit was added:

commit 90d8cc16486479441477e89c2e09bd4f9f7604bb
Author: Lubos Kardos <lkardos>
Date:   Fri Oct 30 14:42:32 2015 +0100

    Ignore SIGPIPE signals during execucton of scriptlets (rhbz:1264198)

It looks as if this is a duplicate of bug 1264198.

*** This bug has been marked as a duplicate of bug 1264198 ***

Comment 11 Richard W.M. Jones 2016-03-30 10:35:03 UTC
There is an updated virt-builder fedora-23 image which
fixes this problem.


Note You need to log in before you can comment on or make changes to this bug.