Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1566422 - stap ERROR: Build-id mismatch with (re)inserted/removed kernel module probe
Summary: stap ERROR: Build-id mismatch with (re)inserted/removed kernel module probe
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemtap
Version: 7.4
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: rc
: ---
Assignee: Frank Ch. Eigler
QA Contact: Martin Cermak
URL:
Whiteboard:
Depends On:
Blocks: 1571845
TreeView+ depends on / blocked
 
Reported: 2018-04-12 09:50 UTC by Mark Wielaard
Modified: 2018-10-30 10:47 UTC (History)
7 users (show)

Fixed In Version: systemtap-3.2-8.el7.src.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1571845 (view as bug list)
Environment:
Last Closed: 2018-10-30 10:46:00 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
suggested patch (1.19 KB, patch)
2018-04-16 19:19 UTC, Serguei Makarov
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:3168 0 None None None 2018-10-30 10:47:19 UTC

Description Mark Wielaard 2018-04-12 09:50:22 UTC
Description of problem:

Setting a probe on an loaded kernel module and then reinserting it (to trigger a init or exit probe) will ERROR out because of a Build-id mismatch.

Version-Release number of selected component (if applicable):

On 7.4:
systemtap-3.1-5.el7_4.x86_64
kernel-3.10.0-693.21.1.el7.x86_64
kernel-debuginfo-3.10.0-693.21.1.el7.x86_64

And also replicated against 7.5:

systemtap-3.2-4.el7.x86_64
kernel-3.10.0-862.el7.x86_64
kernel-debuginfo-3.10.0-862.el7.x86_64

How reproducible:

Always, if module was already inserted.
Never, if module was not inserted yet.

Steps to Reproduce:
1. sudo modprobe igb
2. stap -e 'probe module("igb").function("igb_*_module") { printf("%s: %s.\n", ctime(gettimeofday_s()), ppfunc()); }
3. sudo rmmod igb
4. sudo modprobe igb

Actual results:

Tue Apr 10 09:34:20 2018: igb_exit_module.
ERROR: Build-id mismatch [man error::buildid]: "/usr/lib/modules/3.10.0-693.21.1.el7.x86_64/kernel/drivers/net/ethernet/intel/igb/igb.ko.xz" byte 0 (0x9c vs 0x00) address 0xffffffffc0a97010 rc -14
Tue Apr 10 09:34:22 2018: igb_init_module.

Note that the probe still hits... But the error triggers and takes out
the stap script.

Expected results:

When switching step 1 and 2. Or explicitly doing a sudo rmmod igb before the stap command to make sure the igb kernel module isn't loaded before the stap script starts.

> sudo modprobe igb
> Tue Apr 10 09:32:34 2018: igb_init_module.
> sudo rmmod igb
> Tue Apr 10 09:32:37 2018: igb_exit_module.
> sudo modprobe igb
> Tue Apr 10 09:32:39 2018: igb_init_module.
> sudo rmmod igb
> Tue Apr 10 09:32:41 2018: igb_exit_module.

Additional info:

So for some reason if the kernel module was already inserted and get reinserted when the stap script runs a (bogus) build-id mismatch error occurs.

Comment 3 Serguei Makarov 2018-04-16 17:29:36 UTC
Sequence of events appears to be as follows:

When module has already been loaded:
- modprobe igb
- systemtap module loaded
- notes_sect calculated to be at 0xffffffffc0792000 by _stp_kmodule_update_address (called via _stp_do_relocation)
- _stp_module_check is called and correctly reads notes_sect, 'theory' and 'practice' build-ids match
- rmmod igb; modprobe igb
- second time, _stp_kmodule_update_address is not called before _stp_kmodule_check
- notes_sect is still recorded as being at 0xffffffffc0792000
- _stp_kmodule_check performs invalid memory access to that location, sets 'practice' to 0x00, comparison vs. 'theory' (0x8f) fails

When module has not been loaded:
- systemtap module loaded
- since igb is not loaded, notes_sect for igb is set to 0x0
- modprobe igb
- _stp_kmodule_update_address is not called
- _stp_[k]module_check are called (via if(MODULE_STATE_COMING) branch in _stp_module_notifier), but since notes_sect is 0x0, _stp_build_id_check is never called (not sure if this behaviour is correct or not)
- can continue to load and unload igb without incident, notes_sect remains set to 0x0

Comment 4 Serguei Makarov 2018-04-16 19:19:03 UTC
Created attachment 1422702 [details]
suggested patch

Suggested patch for fixing this problem. Will push upstream once I'm more certain that clearing the section addresses in _stp_module_notifier is the right thing to do (since these addresses were originally populated by a call from _stp_do_relocation/_stp_ctl_write_cmd rather than by _stp_module_notifier).

Comment 5 Mark Wielaard 2018-04-18 15:46:27 UTC
This patch looks good. It makes everything work out of the box for me.
I have added it to the fedora 28 systemtap build:
http://koji.fedoraproject.org/koji/buildinfo?buildID=1073377

There is also a copr build I did for another unwinding bug:
https://copr.fedorainfracloud.org/coprs/mjw/systemtap-fallback-unwind/

With EPEL yum-plugin-copr it can be enabled and installed with:
$ yum install yum-plugin-copr
$ yum copr enable mjw/systemtap-fallback-unwind
$ yum upgrade systemtap

The result is of course totally unsupported, untested and might totally break your system. But it might be a simple way to test this fix.

Comment 6 Fedora Update System 2018-04-20 15:18:34 UTC
systemtap-3.2-11.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-8b5ad21c11

Comment 7 Fedora Update System 2018-04-20 16:29:15 UTC
systemtap-3.2-11.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ed279f75

Comment 8 Fedora Update System 2018-04-21 05:03:35 UTC
systemtap-3.2-11.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ed279f75

Comment 9 Fedora Update System 2018-04-21 18:38:51 UTC
systemtap-3.2-11.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-8b5ad21c11

Comment 16 errata-xmlrpc 2018-10-30 10:46:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3168


Note You need to log in before you can comment on or make changes to this bug.