Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1647390 - pcp consuming too much log space causing /var to fill
Summary: pcp consuming too much log space causing /var to fill
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: 28
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
Assignee: Mark Goodwin
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-11-07 11:45 UTC by Terry Bowling
Modified: 2019-05-03 03:42 UTC (History)
6 users (show)

Fixed In Version: pcp-4.3.2-1.fc30 pcp-4.3.2-1.fc28 pcp-4.3.2-1.fc29
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-03 00:58:53 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
/etc/pcp/pmlogger/control.d/local (700 bytes, text/plain)
2018-11-07 11:45 UTC, Terry Bowling
no flags Details
pmlogger log files (940 bytes, application/x-xz)
2018-11-08 12:31 UTC, Terry Bowling
no flags Details
updated pmlogger logs after nov 4 & 5 logs restored (8.24 KB, application/x-xz)
2018-11-08 13:03 UTC, Terry Bowling
no flags Details
patch for whole day semantics for $PCP_COMPRESSAFTER (647 bytes, patch)
2018-12-04 04:42 UTC, Mark Goodwin
no flags Details | Diff
patch (v2) for whole day semantics for $PCP_COMPRESSAFTER (732 bytes, patch)
2018-12-04 21:29 UTC, Mark Goodwin
no flags Details | Diff
reduce per-process logging filesystem space requirements (1.44 KB, text/plain)
2019-01-23 03:26 UTC, Mark Goodwin
no flags Details

Description Terry Bowling 2018-11-07 11:45:43 UTC
Created attachment 1502935 [details]
/etc/pcp/pmlogger/control.d/local

Description of problem:

PCP is consuming 9.3 GB of logs space for < 2 days of logging.

Version-Release number of selected component (if applicable):

$ rpm -qa |grep pcp
pcp-gui-4.1.3-1.fc28.x86_64
pcp-zeroconf-4.1.3-1.fc28.x86_64
pcp-pmda-nfsclient-4.1.3-1.fc28.x86_64
pcp-doc-4.1.3-1.fc28.noarch
pcp-4.1.3-1.fc28.x86_64
python3-pcp-4.1.3-1.fc28.x86_64
pcp-pmda-dm-4.1.3-1.fc28.x86_64
pcp-conf-4.1.3-1.fc28.x86_64
python2-pcp-4.1.3-1.fc28.x86_64
cockpit-pcp-180-1.fc28.x86_64
pcp-system-tools-4.1.3-1.fc28.x86_64
pcp-selinux-4.1.3-1.fc28.x86_64
pcp-libs-4.1.3-1.fc28.x86_64


How reproducible:

I have set $PCP_COMPRESSAFTER=1 in both of these config files (because I can never remember which config file to use).

   /etc/pcp/pmlogger/control
   /etc/pcp/pmlogger/control.d/local  # attached to BZ

This same problem happened to me 2 days ago while I had $PCP_COMPRESSAFTER=0.  It was not consolidating logs, nor cleaning up old logs, some of which were older than 30 days.  So I manually deleted all old logs for Nov, Oct, and Sept.


Steps to Reproduce:
1.
2.
3.

Actual results:

$ du -h
9.3G	.

$ du -h * | sort -h
4.0K	Latest
36K	pmlogger.log
36K	pmlogger.log.prior
48K	20181107.00.10.index
176K	20181101.index
176K	20181106.index
180K	20181103.index
180K	20181105.index
184K	20181104.index
220K	20181102.index
336K	20181101.meta.xz
468K	20181102.meta.xz
472K	20181103.meta.xz
4.9M	20181101.22.xz
9.3M	20181102.25.xz
11M	20181101.0.xz
11M	20181101.10.xz
11M	20181101.11.xz
11M	20181101.12.xz
11M	20181101.13.xz
11M	20181101.14.xz
11M	20181101.15.xz
11M	20181101.16.xz
11M	20181101.17.xz
11M	20181101.18.xz
11M	20181101.19.xz
11M	20181101.1.xz
11M	20181101.21.xz
11M	20181101.2.xz
11M	20181101.3.xz
11M	20181101.4.xz
11M	20181101.5.xz
11M	20181101.6.xz
11M	20181101.7.xz
11M	20181101.8.xz
11M	20181101.9.xz
11M	20181102.0.xz
11M	20181102.10.xz
11M	20181102.11.xz
11M	20181102.12.xz
11M	20181102.1.xz
11M	20181102.2.xz
11M	20181102.3.xz
11M	20181102.4.xz
11M	20181102.5.xz
11M	20181102.6.xz
11M	20181102.7.xz
11M	20181102.8.xz
11M	20181102.9.xz
11M	20181103.24.xz
11M	20181103.7.xz
12M	20181101.20.xz
12M	20181102.13.xz
12M	20181102.14.xz
12M	20181102.15.xz
12M	20181102.16.xz
12M	20181102.17.xz
12M	20181102.18.xz
12M	20181102.19.xz
12M	20181102.20.xz
12M	20181102.21.xz
12M	20181102.22.xz
12M	20181102.23.xz
12M	20181102.24.xz
12M	20181103.0.xz
12M	20181103.10.xz
12M	20181103.11.xz
12M	20181103.12.xz
12M	20181103.13.xz
12M	20181103.14.xz
12M	20181103.15.xz
12M	20181103.16.xz
12M	20181103.17.xz
12M	20181103.18.xz
12M	20181103.19.xz
12M	20181103.1.xz
12M	20181103.20.xz
12M	20181103.21.xz
12M	20181103.22.xz
12M	20181103.23.xz
12M	20181103.2.xz
12M	20181103.3.xz
12M	20181103.4.xz
12M	20181103.5.xz
12M	20181103.6.xz
12M	20181103.8.xz
12M	20181103.9.xz
15M	20181107.00.10.meta
33M	20181106.26
54M	20181106.meta
59M	20181104.meta
63M	20181105.meta
100M	20181107.00.10.6
101M	20181106.0
101M	20181106.1
101M	20181106.10
101M	20181106.11
101M	20181106.12
101M	20181106.13
101M	20181106.14
101M	20181106.15
101M	20181106.16
101M	20181106.17
101M	20181106.18
101M	20181106.19
101M	20181106.2
101M	20181106.20
101M	20181106.21
101M	20181106.22
101M	20181106.23
101M	20181106.24
101M	20181106.25
101M	20181106.3
101M	20181106.4
101M	20181106.5
101M	20181106.6
101M	20181106.7
101M	20181106.8
101M	20181106.9
101M	20181107.00.10.0
101M	20181107.00.10.1
101M	20181107.00.10.2
101M	20181107.00.10.3
101M	20181107.00.10.4
101M	20181107.00.10.5
484M	20181105.1
521M	20181104.1
2.0G	20181104.0
2.0G	20181105.0

Expected results:

All of the logs prior to today (Nov 7th) should be consolidated and compressed.

Additional info:

Comment 1 Terry Bowling 2018-11-07 12:13:22 UTC
grep -ie pcp -e pmlogg -e pmie /var/log/messages

Nov  7 06:58:01 tbowling systemd[1]: Created slice User Slice of pcp.
Nov  7 06:58:01 tbowling systemd[1]: Started Session 1022 of user pcp.
Nov  7 06:58:01 tbowling audit[32250]: USER_ACCT pid=32250 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov  7 06:58:01 tbowling audit[32250]: USER_START pid=32250 uid=0 auid=985 ses=1023 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov  7 06:58:01 tbowling audit[32247]: USER_START pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov  7 06:58:01 tbowling audit[32247]: CRED_REFR pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov  7 06:58:01 tbowling audit[32247]: CRED_DISP pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov  7 06:58:01 tbowling audit[32247]: USER_END pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov  7 06:58:01 tbowling systemd[1]: Removed slice User Slice of pcp.


# systemctl -l status pm*
● pmcd.service - Performance Metrics Collector Daemon
   Loaded: loaded (/usr/lib/systemd/system/pmcd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-11-07 06:49:56 EST; 21min ago
     Docs: man:pmcd(8)
  Process: 24051 ExecStop=/usr/share/pcp/lib/pmcd stop (code=exited, status=0/SUCCESS)
  Process: 24687 ExecStart=/usr/share/pcp/lib/pmcd start (code=exited, status=0/SUCCESS)
 Main PID: 24789 (pmcd)
    Tasks: 9 (limit: 4915)
   Memory: 24.3M
   CGroup: /system.slice/pmcd.service
           ├─24789 /usr/libexec/pcp/bin/pmcd
           ├─24792 /var/lib/pcp/pmdas/root/pmdaroot
           ├─24793 /var/lib/pcp/pmdas/proc/pmdaproc -d 3
           ├─24794 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
           ├─24795 /var/lib/pcp/pmdas/linux/pmdalinux
           ├─24796 python3 /var/lib/pcp/pmdas/nfsclient/pmdanfsclient.python
           ├─24800 /var/lib/pcp/pmdas/kvm/pmdakvm -d 95
           └─24801 /var/lib/pcp/pmdas/dm/pmdadm -d 129

Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Performance Metrics Collector Daemon...
Nov 07 06:49:56 tbowling.localdomain pmcd[24687]: Starting pmcd ...
Nov 07 06:49:56 tbowling.localdomain systemd[1]: Started Performance Metrics Collector Daemon.

● pmproxy.service - Proxy for Performance Metrics Collector Daemon
   Loaded: loaded (/usr/lib/systemd/system/pmproxy.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-11-07 06:49:56 EST; 21min ago
     Docs: man:pmproxy(8)
  Process: 24052 ExecStop=/usr/share/pcp/lib/pmproxy stop (code=exited, status=0/SUCCESS)
  Process: 24814 ExecStart=/usr/share/pcp/lib/pmproxy start (code=exited, status=0/SUCCESS)
 Main PID: 24963 (pmproxy)
    Tasks: 2 (limit: 4915)
   Memory: 2.7M
   CGroup: /system.slice/pmproxy.service
           └─24963 /usr/libexec/pcp/bin/pmproxy

Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Nov 07 06:49:56 tbowling.localdomain pmproxy[24814]: Starting pmproxy ...
Nov 07 06:49:56 tbowling.localdomain systemd[1]: Started Proxy for Performance Metrics Collector Daemon.

● pmie.service - Performance Metrics Inference Engine
   Loaded: loaded (/usr/lib/systemd/system/pmie.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2018-11-07 06:49:57 EST; 21min ago
     Docs: man:pmie(1)
  Process: 24053 ExecStop=/usr/share/pcp/lib/pmie stop (code=exited, status=0/SUCCESS)
  Process: 24815 ExecStart=/usr/share/pcp/lib/pmie start (code=exited, status=0/SUCCESS)
 Main PID: 24815 (code=exited, status=0/SUCCESS)
    Tasks: 1 (limit: 4915)
   Memory: 3.2M
   CGroup: /system.slice/pmie.service
           └─25282 /usr/bin/pmie -b -h local: -l /var/log/pcp/pmie/tbowling.localdomain/pmie.log -c config.default

Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Performance Metrics Inference Engine...
Nov 07 06:49:57 tbowling.localdomain pmie[24815]: Starting pmie ...
Nov 07 06:49:57 tbowling.localdomain systemd[1]: Started Performance Metrics Inference Engine.

● pmlogger.service - Performance Metrics Archive Logger
   Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2018-11-07 06:49:56 EST; 21min ago
     Docs: man:pmlogger(1)
  Process: 24054 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
  Process: 24813 ExecStart=/usr/share/pcp/lib/pmlogger start (code=exited, status=0/SUCCESS)
 Main PID: 24813 (code=exited, status=0/SUCCESS)
    Tasks: 1 (limit: 4915)
   Memory: 52.8M
   CGroup: /system.slice/pmlogger.service
           └─30382 /usr/libexec/pcp/bin/pmlogger -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check 20181107.06.49

Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Performance Metrics Archive Logger...
Nov 07 06:49:56 tbowling.localdomain pmlogger[24813]: Starting pmlogger ...
Nov 07 06:49:56 tbowling.localdomain systemd[1]: Started Performance Metrics Archive Logger.

Comment 2 Terry Bowling 2018-11-07 12:52:40 UTC
manually executing this command is not helping either:

/usr/libexec/pcp/bin/pmie_daily -V -c /etc/pcp/pmie/control.d/local -x 1

Comment 3 Mark Goodwin 2018-11-07 21:33:57 UTC
Terry, can you please attach /var/log/pcp/pmlogger/*.{log,prev}
That's where the daily cron driven pmlogger_daily logs are written.

Looks like something prevented log rotation and compression on (and after) Nov 4th 2018, but we'll need to see the messages. It's possible you've hit the following corner case identified and fixed by Kenj for pcp-4.2.0 :

commit ccafcd170d31e0ee56d7c9a3841ee9714457607a
Author: Ken McDonell <kenj.au>
Date:   Sun Oct 14 15:51:46 2018 +1100

    pmlogger_daily: change workflow to cull early
    
    Change the workflow to move culling of old files earlier in the
    pipeline.  So if there is a problem with log rewriting or merging that
    is unattended for a log time, we'll eventually cull the offending
    archives rather than having the rewriting or mering failure block
    the culling and (in extreme cases) lead to full flesystems
    
    Also add qa/686 (new) to check all of this out.


Cheers

Comment 4 Terry Bowling 2018-11-08 12:31:10 UTC
Created attachment 1503308 [details]
pmlogger log files

attached a tarball of pmlogger log files.

I'm not sure how helpful it will be.  I had already deleted everything before Nov 3, and then I had moved Nov 4 & 5 logs to my /home dir to save space.  So when it ran this morning they were not there.

I moved the Nov 4 & 5 logs back to the proper location and ran the daily cron job.  Omitting the -N flag did not seem to have an effect and outputs no verbose messaging.  It just immediately returns.

[root@tbowling tbowling.localdomain]#  /usr/libexec/pcp/bin/pmlogger_daily -VN -c /etc/pcp/pmlogger/control.d/local 
+ date-and-timestamp 2018-11-08 07:20:29 1541679629
# $version=1.1    
+ export version; version=1.1    
# $PCP_COMPRESSAFTER=1    
+ export PCP_COMPRESSAFTER; PCP_COMPRESSAFTER=1    
+ cd /var/log/pcp/pmlogger/tbowling.localdomain

=== daily maintenance of PCP archives for host local: ===

+ get mutex lock
pmlogger_daily: [control.d/local:26]
Error: no pmlogger instance running for host "local:"
... logging for host "local:" unchanged
pmlogger_daily: [control.d/local:26]
Warning: skipping pmnewlog because we don't know which pmlogger to stop
pmlogger_daily: Warning: no archives found to merge
pmlogger_daily: [control.d/local:26]
Warning: current volume of current pmlogger not known, compression skipped

Comment 5 Terry Bowling 2018-11-08 13:03:21 UTC
Created attachment 1503310 [details]
updated pmlogger logs after nov 4 & 5 logs restored

update - after restoring the Nov 4 & 5 logs, it looks like the regular cron jobs ran and compressed them.

However Nov 6th logs are still not combined and compressed.  Here's the updated cron logs.

# du -h * |sort -h; du -h
4.0K	Latest
8.0K	20181108.07.25.index
32K	pmlogger.log
36K	pmlogger.log.prior
56K	20181108.00.10.index
128K	20181107.index
176K	20181106.index
180K	20181105.index
184K	20181104.index
464K	20181104.meta.xz
572K	20181105.meta.xz
2.0M	20181108.07.25.meta
11M	20181106.0.xz
11M	20181106.1.xz
11M	20181106.2.xz
11M	20181106.3.xz
11M	20181106.4.xz
11M	20181106.5.xz
11M	20181106.6.xz
12M	20181106.7.xz
14M	20181108.00.10.meta
23M	20181108.00.10.7
33M	20181106.26
37M	20181107.meta
53M	20181105.1.xz
54M	20181106.meta
59M	20181104.1.xz
64M	20181108.07.25.0
101M	20181106.10
101M	20181106.11
101M	20181106.12
101M	20181106.13
101M	20181106.14
101M	20181106.15
101M	20181106.16
101M	20181106.17
101M	20181106.18
101M	20181106.19
101M	20181106.20
101M	20181106.21
101M	20181106.22
101M	20181106.23
101M	20181106.24
101M	20181106.25
101M	20181106.8
101M	20181106.9
101M	20181108.00.10.0
101M	20181108.00.10.1
101M	20181108.00.10.2
101M	20181108.00.10.3
101M	20181108.00.10.4
101M	20181108.00.10.5
101M	20181108.00.10.6
228M	20181105.0.xz
232M	20181104.0.xz
1.7G	20181107.0
5.0G	.

Comment 6 Mark Goodwin 2018-11-09 04:54:30 UTC
pmlogger_daily.log says :

=== daily maintenance of PCP archives for host local: ===

pmlogger_daily: [control.d/local:26]
Error: no pmlogger instance running for host "local:"
... logging for host "local:" unchanged
pmlogger_daily: [control.d/local:26]
Warning: skipping pmnewlog because we don't know which pmlogger to stop
pmlogger_daily: Warning: no archives found to merge
pmlogger_daily: [control.d/local:26]
Warning: current volume of current pmlogger not known, compression skipped

So compression and culling is being skipped. The fix mentioned in Comment #3 should avoid that problem. But I'm not sure why it can't find the pid for the pmlogger for host "local:" - when that happens, it skips compression. That would also be avoided by the patch in Comment #3.

/var/lib/pcp/config/pmlogger is mode 775 pcp/pcp, but strangely, /var/lib/pcp/config/pmlogger/config.default is mode 644 root/root. My system is the same ... this seems wrong and means any subsequent logging config change can't be changed by pmlogconf when driven from a crontab script (they all run as user pcp).

THis is backed up by Terry's pmlogger_check.log.prev, which says :

cp: cannot create regular file '/var/lib/pcp/config/pmlogger/config.default.tmp': Permission denied
pmlogger_check [/etc/pcp/pmlogger/control.d/local:26]
Warning: pmlogconf failed to reconfigure "/var/lib/pcp/config/pmlogger/config.default"
pmlogconf: Error: existing config file "/var/lib/pcp/config/pmlogger/config.default" is not writeable

So there are a few unexplained anomalies that only seem to be affecting Terry!
Can you paste your /etc/pcp/pmlogger/control.d/local here please?

thanks

Comment 7 Mark Goodwin 2018-11-20 02:14:30 UTC
Terry, please update your F28 system to pcp-4.2.0-1. It's currently in F28 updates-testing repo, so use something like: 

# dnf --enablerepo=updates-testing update pcp

We are very interested if this resolves the original issue as reported - as previously noted, there are new patches in v4.2 related to this issue.

Thanks
-- Mark

Comment 8 Terry Bowling 2018-11-23 14:57:59 UTC
Upgraded to F29 and latest pcp-4.2 from fedora testing.  

Still has not compressed yesterday's logs.  Is this only me?  Is there something wrong with my system or do you see the same on yours?  Or is this just the selinux errors?

I see some selinux errors during the upgrading of the packages, as well as tons of errors related to the pcp cron tasks after
Nov 21 13:41:20 tbowling pmcd[9406]: Starting pmcd ...
Nov 21 13:41:20 tbowling pmcd[9406]: Installing dm PMDA ...
Nov 21 13:41:20 tbowling audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmcd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:41:24 tbowling audit[9892]: AVC avc:  denied  { search } for  pid=9892 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmlogger_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0
Nov 21 13:41:24 tbowling pmlogger[9862]: Starting pmlogger ...
Nov 21 13:41:24 tbowling audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmlogger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:41:25 tbowling audit[9527]: AVC avc:  denied  { write } for  pid=9527 comm="Install" name="pmcd.conf" dev="dm-1" ino=17860410 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0
Nov 21 13:41:25 tbowling pmcd[9406]: Installing nfsclient PMDA ...
Nov 21 13:41:27 tbowling audit[11618]: AVC avc:  denied  { write } for  pid=11618 comm="Install" name="pmcd.conf" dev="dm-1" ino=17860410 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0
Nov 21 13:41:29 tbowling audit[15624]: AVC avc:  denied  { search } for  pid=15624 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmlogger_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0
Nov 21 13:41:31 tbowling audit[15877]: AVC avc:  denied  { search } for  pid=15877 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmlogger_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0
Nov 21 13:41:38 tbowling setroubleshoot[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf. For complete SELinux messages run: sealert -l 7319dc75-1176-4d1e-b50c-2e51d8b863f9
Nov 21 13:41:38 tbowling python3[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf.#012#012*****  Plugin catchall (100. confidence) suggests   **************************#012#012If you believe that Install should be allowed write access on the pmcd.conf file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'Install' --raw | audit2allow -M my-Install#012# semodule -X 300 -i my-Install.pp#012
Nov 21 13:41:38 tbowling setroubleshoot[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf. For complete SELinux messages run: sealert -l 7319dc75-1176-4d1e-b50c-2e51d8b863f9
Nov 21 13:41:38 tbowling python3[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf.#012#012*****  Plugin catchall (100. confidence) suggests   **************************#012#012If you believe that Install should be allowed write access on the pmcd.conf file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'Install' --raw | audit2allow -M my-Install#012# semodule -X 300 -i my-Install.pp#012
Nov 21 13:42:02 tbowling audit[15977]: AVC avc:  denied  { search } for  pid=15977 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmie_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0
Nov 21 13:42:02 tbowling audit[15997]: AVC avc:  denied  { search } for  pid=15997 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmie_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0
Nov 21 13:42:02 tbowling pmie[15947]: /usr/share/pcp/lib/pmie: Warning: Performance Co-Pilot Inference Engine (pmie) not permanently enabled.
Nov 21 13:42:02 tbowling pmie[15947]:    To enable pmie, run the following as root:
Nov 21 13:42:02 tbowling pmie[15947]:    # /usr/bin/systemctl enable pmie.service
Nov 21 13:42:02 tbowling pmie[15947]: Starting pmie ...
Nov 21 13:42:02 tbowling audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:55:01 tbowling audit[16924]: USER_ACCT pid=16924 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:55:01 tbowling audit[16924]: CRED_ACQ pid=16924 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:55:01 tbowling audit[16930]: USER_ACCT pid=16930 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:55:01 tbowling audit[16930]: USER_START pid=16930 uid=0 auid=985 ses=540 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:55:01 tbowling systemd[1]: Started Session 539 of user pcp.
Nov 21 13:55:01 tbowling audit[16924]: USER_START pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:55:01 tbowling audit[16924]: CRED_REFR pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:55:02 tbowling audit[16924]: CRED_DISP pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:55:02 tbowling audit[16924]: USER_END pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:58:01 tbowling audit[17668]: USER_ACCT pid=17668 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:58:01 tbowling audit[17668]: CRED_ACQ pid=17668 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:58:02 tbowling audit[17673]: USER_ACCT pid=17673 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:58:02 tbowling audit[17673]: USER_START pid=17673 uid=0 auid=985 ses=542 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 13:58:02 tbowling systemd[1]: Started Session 541 of user pcp.
Nov 21 13:58:02 tbowling audit[17668]: USER_START pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:58:02 tbowling audit[17668]: CRED_REFR pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:58:02 tbowling audit[17668]: CRED_DISP pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 13:58:02 tbowling audit[17668]: USER_END pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 14:25:01 tbowling audit[19925]: USER_ACCT pid=19925 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 14:25:01 tbowling audit[19925]: CRED_ACQ pid=19925 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 14:25:01 tbowling audit[19930]: USER_ACCT pid=19930 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 14:25:01 tbowling audit[19930]: USER_START pid=19930 uid=0 auid=985 ses=544 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 21 14:25:01 tbowling systemd[1]: Started Session 543 of user pcp.
Nov 21 14:25:01 tbowling audit[19925]: USER_START pid=19925 uid=0 auid=985 ses=543 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
Nov 21 14:25:01 tbowling audit[19925]: CRED_REFR pid=19925 uid=0 auid=985 ses=543 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'



pcp-4.2.0-1.fc29.x86_64

[root@tbowling tbowling.localdomain]# pwd
/var/log/pcp/pmlogger/tbowling.localdomain

[root@tbowling tbowling.localdomain]# du -h *|sort -h; du -h
4.0K	Latest
36K	pmlogger.log
36K	pmlogger.log.prior
84K	20181121.index
128K	20181123.00.10.index
216K	20181122.index
22M	20181123.00.10.meta
29M	20181121.meta
56M	20181122.meta
60M	20181122.26
68M	20181123.00.10.9
101M	20181122.0
101M	20181122.1
101M	20181122.10
101M	20181122.11
101M	20181122.12
101M	20181122.13
101M	20181122.14
101M	20181122.15
101M	20181122.16
101M	20181122.17
101M	20181122.18
101M	20181122.19
101M	20181122.2
101M	20181122.20
101M	20181122.21
101M	20181122.22
101M	20181122.23
101M	20181122.24
101M	20181122.25
101M	20181122.3
101M	20181122.4
101M	20181122.5
101M	20181122.6
101M	20181122.7
101M	20181122.8
101M	20181122.9
101M	20181123.00.10.0
101M	20181123.00.10.1
101M	20181123.00.10.2
101M	20181123.00.10.3
101M	20181123.00.10.4
101M	20181123.00.10.5
101M	20181123.00.10.6
101M	20181123.00.10.7
101M	20181123.00.10.8
1.2G	20181121.0
4.8G	.

Comment 9 Terry Bowling 2018-11-23 15:13:09 UTC
I set selinux to permissive, reinstalled all of the pcp packages, and restarted the services.  This results in far fewer errors and looks cleaner.

I have pmlogger set to keep 1 day of logs.  I still have a 1.2 GB file from the 21st laying around.

Nov 22 has 2.6 GB of logs not compressed.  I understand it should not compress until tomorrow, but still concerning that 1 day's logs are so large. Many customers might find this concerning as I assume at the end of today before they get compressed, Nov 22 + 23 will consume >5G of space for normal operations.  This could be a problem in the cloud and large virt environments where space is a premium.

Comment 10 Lukas Berk 2018-11-23 15:17:53 UTC
Terry, if things are working properly, the logs should be compressed within the same day.  Well before the 1Gb+ sizes you're seeing.

% date
Fri Nov 23 10:16:38 EST 2018

% ls -lh /var/log/pcp/pmlogger/`hostname`/20181123*
-rw-r--r--. 1 pcp pcp  70K Nov 23 09:31 /var/log/pcp/pmlogger/toium/20181123.09.30.0.xz
-rw-r--r--. 1 pcp pcp  412 Nov 23 09:31 /var/log/pcp/pmlogger/toium/20181123.09.30.index
-rw-r--r--. 1 pcp pcp  51K Nov 23 09:31 /var/log/pcp/pmlogger/toium/20181123.09.30.meta.xz
-rw-r--r--. 1 pcp pcp  57M Nov 23 10:16 /var/log/pcp/pmlogger/toium/20181123.09.31.0
-rw-r--r--. 1 pcp pcp 5.7K Nov 23 10:16 /var/log/pcp/pmlogger/toium/20181123.09.31.index
-rw-r--r--. 1 pcp pcp 1.7M Nov 23 10:16 /var/log/pcp/pmlogger/toium/20181123.09.31.meta

Comment 11 Mark Goodwin 2018-11-27 22:38:10 UTC
Hi Terry, I think the only way we'll figure this one out is to grab an extensive tarball of all the relevant directories on your system and then unpack it on an f28 dev machine to repro it. Can you give me a temp login to grab that tarball? Please give it whatever prio you can - we obviously need the log management scripts and compression/culling to be bullet-proof!

Cheers
-- Mark

Comment 12 Terry Bowling 2018-11-28 13:00:26 UTC
Mark,

As we discussed, I have since upgraded to F29.
As you requested, I executed the following command:

cd /; sudo tar czf /tmp/pcp-tarball.tgz etc/{cron.d,pcp,hosts,hostname,sysconfig} usr/lib/systemd/system/pm* var/lib/pcp usr/libexec/pcp usr/share/pcp var/lib/pcp var/log

The resulting file pcp-tarball.tgz has been uploaded to guest.redhat.com and I have verified that md5sums match.

Comment 13 Mark Goodwin 2018-11-29 02:52:29 UTC
Thanks Terry - I've examined the tarball, with the following notes:

1. errors in some of the pmlogger daily/check cronjob logs:

[mgoodwin@pcp pmlogger]$ cat pmlogger_daily.log.prev
pmlogextract: Error: __pmLogRead[log 20181108.00.10]: Corrupted record in a PCP archive log
pmlogextract: Error occurred at byte offset 0 into a file of 23552000 bytes.
The last record, and the remainder of this file will not be extracted.

[mgoodwin@pcp pmlogger]$ cat pmlogger_daily.log
pmlogextract: New log volume 1, at 22:07:06.924
[mgoodwin@pcp pmlogger]$ cat pmlogger_check.log.prev
Restarting primary pmlogger for host "local:" ... [process 8584]  done
Latest folio created for 20181126.07.01
cat: write error: Broken pipe

The first one, "Corrupted record" can cause issues with the script completing it's log merging.
This is probably consequential to running out of filesystem space on /var on or before Nov 8th.

The second one, "cat: write error: Broken pipe" has been seen recently on other systems
(including mine and nathan's!). Not sure if it's benign, but needs further investigation.

2. Other than the above, the only non-standard config changes in your PCP set up are :

   $PCP_COMPRESSAFTER=1 in /etc/pcp/pmlogger/control.d/local (this overrides the default)

   and

   /etc/sysconfig/pmlogger has PMLOGGER_LOCAL=1 enabled (i.e. comment removed)
   I thought this was supposed to be enabled anyway, but a virgin install on an f29 VM
   with pcp-4.2 still had it commented out.

3. I have made both changes listed in (2) on my f29 VM / pcp-4.2 and will let it soak
   for a few days

Regards

Comment 14 Mark Goodwin 2018-12-04 04:42:36 UTC
Created attachment 1511187 [details]
patch for whole day semantics for $PCP_COMPRESSAFTER


The issue has been reproduced on the test VM after several days soak. The root cause appears to be the semantics of the find -mtime option, where the default semantics are N * 24hours before the current time. The desirable semantics are 24 hours before the start of the current day, i.e. "number of days") before archive files are compressed. The fix under test is to use find -daystart -mtime N where N is the desired number of days to hold off compressing PCP archive logs managed by the PCP log rotation scripts (pmlogger_daily et al). Since not all platforms support find -daystart, we test for it early in the script and only use it if it's available. For platforms that do not support this option (known to be BSD derivatives), we are already handling the find options specially anyway, so there will be no change on those platforms.

For QA, all tests in groups 'sanity', 'pmlogger' and 'logutil' are passing with the change applied. Applicable new QA involves at least a week of soak testing on assorted platforms - starting with the test f29 VM.

Comment 15 Mark Goodwin 2018-12-04 21:29:15 UTC
Created attachment 1511470 [details]
patch (v2) for whole day semantics for $PCP_COMPRESSAFTER

Updated patch, quoting wasn't quite right.

Comment 16 Mark Goodwin 2019-01-23 03:26:41 UTC
Created attachment 1522563 [details]
reduce per-process logging filesystem space requirements

New patch, as committed upstream for pcp-4.3.0. This obsoletes the "whole day patch", which was faulty.

Comment 17 Fedora Update System 2019-04-26 08:17:08 UTC
pcp-4.3.2-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-89ed2e2383

Comment 18 Fedora Update System 2019-04-26 08:18:35 UTC
pcp-4.3.2-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-30c683f0ae

Comment 19 Fedora Update System 2019-04-26 08:19:33 UTC
pcp-4.3.2-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2f04e77d53

Comment 20 Fedora Update System 2019-04-27 22:30:21 UTC
pcp-4.3.2-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-89ed2e2383

Comment 21 Fedora Update System 2019-04-27 23:11:43 UTC
pcp-4.3.2-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-2f04e77d53

Comment 22 Fedora Update System 2019-04-28 00:13:36 UTC
pcp-4.3.2-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-30c683f0ae

Comment 23 Ben Cotton 2019-05-02 19:38:32 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora Update System 2019-05-03 00:58:53 UTC
pcp-4.3.2-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Fedora Update System 2019-05-03 01:35:53 UTC
pcp-4.3.2-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 26 Fedora Update System 2019-05-03 03:42:02 UTC
pcp-4.3.2-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.