Bug 444342 - sealert: Input is not proper UTF-8, indicate encoding
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: setroubleshoot
Version: 11
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Daniel Walsh
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F9Target
Reported: 2008-04-27 14:33 UTC by Robert Scheck
Modified: 2010-04-08 14:39 UTC
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-04-08 14:39:21 UTC
Type: ---
Embargoed:


Attachments
/var/lib/setroubleshoot/audit_listener_database.xml (deleted)
2008-04-28 18:27 UTC, Robert Scheck

Description Robert Scheck 2008-04-27 14:33:51 UTC
Description of problem:
$ sealert -v -l 27829382-ddb4-42b4-a1f5-dc02c1d2754b
Entity: line 58: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xF6 0x6E 0x6E 0x65
    Sie können ein lokales Richtlinienmodul generieren, um diesen Zugriff
         ^
2008-04-27 16:30:53,017 [rpc.ERROR] exception parserError: xmlParseDoc() failed
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 940, in
handle_client_io
    self.receiver.feed(data)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 762, in feed
    self.process()
  File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 754, in
process
    self.dispatchFunc(self.header, self.body)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 972, in
default_request_handler
    self.handle_return(type, rpc_id, body)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 958, in
handle_return
    interface, method, args = convert_rpc_xml_to_args(body)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 143, in
convert_rpc_xml_to_args
    doc = libxml2.parseDoc(cmd)
  File "/usr/lib/python2.5/site-packages/libxml2.py", line 1263, in parseDoc
    if ret is None:raise parserError('xmlParseDoc() failed')
parserError: xmlParseDoc() failed
failed to connect to server: xmlParseDoc() failed
$

Version-Release number of selected component (if applicable):
setroubleshoot-2.0.6-1
setroubleshoot-plugins-2.0.4-5

How reproducible:
Every time (I have LANG=de_DE@euro set)

Actual results:
sealert: Input is not proper UTF-8, indicate encoding

Expected results:
Just working... ;-)

Comment 1 Robert Scheck 2008-04-27 14:42:44 UTC
Apr 27 16:30:53 tux setroubleshoot: [rpc.ERROR] exception parserError:
xmlParseDoc() failed#012Traceback (most recent call last):#012  File
"/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 940, in
handle_client_io#012    self.receiver.feed(data)#012  File
"/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 762, in feed#012
   self.process()#012  File
"/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 754, in
process#012    self.dispatchFunc(self.header, self.body)#012  File
"/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 972, in
default_request_handler#012    self.handle_return(type, rpc_id, body)#012  File
"/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 958, in
handle_return#012    interface, method, args = convert_rpc_xml_to_args(body)#012
 File "/usr/lib/python2.5/site-packages/setroubleshoot/rpc.py", line 143, in
convert_rpc_xml_to_args#012    doc = libxml2.parseDoc(cmd)#012  File
"/usr/lib/python2.5/site-packages/libxml2.py", line 1263, in parseDoc#012    if
ret is None:raise parserError('xmlParseDoc() failed')#012parserError:
xmlParseDoc() failed

Comment 2 John Dennis 2008-04-28 18:13:47 UTC
Robert, would you please attach the contents of
/var/lib/setroubleshoot/audit_listener_database.xml using the "Create a New
Attachment" link below?

I need to see the data in that file to diagnose the problem. Thank you Robert.

Comment 3 Robert Scheck 2008-04-28 18:27:35 UTC
Created attachment 304022
/var/lib/setroubleshoot/audit_listener_database.xml

Of course, here it is. Looks like the translation is the problem.

Comment 4 John Dennis 2008-04-28 22:15:45 UTC
The problem originates in the de translation of the fix_description in the
plugins/catchall.py plugin, which uses umlaut o. Umlaut o should be encoded
in UTF-8 as 0xC3,0xB6 (i.e. 0303,0266 octal).

In my de.gmo file I have the following snippet (with the correct umlaut o):

    Sie k\303\266nnen ein lokales Richtlinienmodul generieren, um diesen Zugriff

But what appears in your xml database is this snippet:

    Sie k\366nnen ein lokales Richtlinienmodul generieren, um diesen Zugriff

which is wrong. I did a brief test using libxml2 to parse the above phrase with
the correct umlaut o encoding and it failed. Somehow the two-byte 0xC3,0xB6
sequence is being converted to the single byte 0xF6. FWIW I even notice
this in my emacs buffers. At this point I'm guessing there is some problem with
encoding/decoding the 0xC3,0xB6 umlaut o UTF-8 byte sequence, but I don't have a
handle on it yet.
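
For reference, the two encodings of umlaut o discussed above can be reproduced with a minimal Python sketch. It is only an illustration, not code from setroubleshoot; the names word, utf8_bytes, latin1_bytes and is_valid_utf8 are made up for this example.

```python
# Illustrative sketch only; the byte values are the ones quoted in this
# comment, while every name below is an assumption made for the example.
word = "k\u00f6nnen"  # "können"

utf8_bytes = word.encode("utf-8")         # umlaut o becomes 0xC3 0xB6
latin1_bytes = word.encode("iso-8859-1")  # umlaut o becomes the single byte 0xF6

print(utf8_bytes)    # b'k\xc3\xb6nnen'
print(latin1_bytes)  # b'k\xf6nnen'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

The lone 0xF6 byte is not a legal UTF-8 sequence, which matches the "Bytes: 0xF6 0x6E 0x6E 0x65" line in the parser error.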

Comment 5 John Dennis 2008-04-28 22:36:14 UTC
Umlaut o in ISO-8859-1 (and ISO-8859-15) is 0xF6, so it appears that at some
point the text is being written as ISO-8859 rather than UTF-8.

However any place in the code where we serialize xml we do so via:

serialize(encoding=i18n_encoding)

where the i18n_encoding comes from the config file and should be utf-8.

Still not sure where the encode/decode problem is, but just capturing the
investigation so far.
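
The failure mode can probably be shown without libxml2. The sketch below uses the expat-based parser from the Python standard library as a stand-in (an assumption; the daemon itself calls libxml2.parseDoc), and the names payload and try_parse are hypothetical. It feeds the parser the same text three ways: raw ISO-8859-1 bytes with no encoding declaration, the same bytes with a declaration, and proper UTF-8.

```python
import xml.etree.ElementTree as ET

# Assumption: the stdlib parser stands in for libxml2 here; both reject
# bytes that are not valid in the document's (declared or default) encoding.
payload = "Sie k\u00f6nnen".encode("iso-8859-1")  # contains the raw byte 0xF6


def try_parse(doc: bytes):
    """Return the root element's text, or None if the parser rejects doc."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# No declaration: the parser assumes UTF-8 and trips over 0xF6.
undeclared = b"<msg>" + payload + b"</msg>"

# Same bytes, but the document says which encoding it uses.
declared = (b'<?xml version="1.0" encoding="ISO-8859-1"?><msg>'
            + payload + b"</msg>")

# What the database should contain if everything were UTF-8.
utf8 = "<msg>Sie k\u00f6nnen</msg>".encode("utf-8")

print(try_parse(undeclared))  # None
print(try_parse(declared))    # Sie können
print(try_parse(utf8))        # Sie können
```

This is consistent with the "indicate encoding" hint in the error message: the bytes are only a problem because nothing tells the parser they are not UTF-8.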


Comment 6 Robert Scheck 2008-04-29 09:00:06 UTC
All I can say is that my system locale is de_DE@euro, which uses
ISO-8859-15, in case that helps.

Comment 7 John Dennis 2008-04-29 13:10:41 UTC
Your system locale should in theory not be significant because internally we
force everything to UTF-8. However, if you used a tool that touched the database
file, for example an editor, it probably would have rewritten the file in
ISO-8859. By any chance did you do something like that?
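
One way to check whether a file has ended up with non-UTF-8 bytes, without opening it in an editor, is sketched below. The helper name first_invalid_utf8_offset is hypothetical and not part of setroubleshoot.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Small self-contained check using the phrase from this bug.
print(first_invalid_utf8_offset(b"Sie k\xf6nnen"))      # 5
print(first_invalid_utf8_offset(b"Sie k\xc3\xb6nnen"))  # None

# To inspect the database itself, read it in binary mode first, e.g.:
#   with open("/var/lib/setroubleshoot/audit_listener_database.xml", "rb") as f:
#       print(first_invalid_utf8_offset(f.read()))
```

Reading in binary mode keeps the check independent of the system locale, so it gives the same answer under de_DE@euro as under a UTF-8 locale.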

Comment 8 Robert Scheck 2008-04-29 15:35:38 UTC
...I still hope less(1) is read-only by default. If not, please open a
bug report against less :)

Comment 9 Bug Zapper 2008-05-14 10:15:27 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 10 Robert Scheck 2008-05-17 19:01:59 UTC
Ping?

Comment 11 Robert Scheck 2008-07-27 14:43:15 UTC
Ping?

Comment 12 Bug Zapper 2008-11-26 02:14:16 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bug Zapper 2009-06-09 09:33:18 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Daniel Walsh 2010-01-19 21:04:49 UTC
Fixed in setroubleshoot-2.2.57-1.fc12

