Bug 1276711
Summary: | sort: sort results don't match when processing same source data set while cs_CZ.utf8 lang set | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ondrej Kozina <okozina> |
Component: | glibc | Assignee: | Carlos O'Donell <codonell> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 23 | CC: | admiller, arjun.is, codonell, fweimer, jakub, kdudka, kzak, law, mfabian, okozina, ooprala, ovasik, pbrady, p, pfrankli, praiskup, siddhesh, twaugh |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glibc-2.22-5.fc23 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-11-12 05:22:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Created attachment 1087965 [details]
Second source file
Let me make this clearer: I don't expect the result files to be equal when sorted under different locales, but I do expect both inputs to sort to the same result when processed under one specific locale.

I am not able to reproduce it with coreutils-8.24-4.fc24.x86_64 and cs_CZ.utf8. Please also attach the sort output you are getting in both cases.

Created attachment 1087967 [details]
Not matching res.00 (doesn't match with res.01) cz_CS.utf-8 set
Created attachment 1087969 [details]
Not matching res.01 (doesn't match with res.00) cz_CS.utf-8 set
Created attachment 1087971 [details]
matching res.00 (with LC_ALL=C set)
res.00 and res.01 don't match. These are the results of running sort with cs_CZ.utf-8 set, as stated above. res.00.C is the result of running sort with LC_ALL=C:

export LC_ALL=C
cat src.00 | sort > res.00.C

(res.00.C and res.01.C matched, so I didn't upload the identical file res.01.C.)

The attached output looks broken to me. Do you have valid locale data installed? Please paste the output of the following commands:

rpm -q glibc-common
rpm -V glibc-common

I have just reproduced it on a rawhide machine. It happened only if unavailable locales were requested (like the incorrectly spelled cz_CS.utf-8 string you use in places in this bug report). After upgrading my rawhide machine, the issue went away. I guess it was a glibc update that fixed it; I will try to confirm. In any case, we should consider an explicit fallback to the POSIX locale when the requested locales are not available, as Pádraig suggests in bug #1270480 comment #8.

Unfortunately, downgrading glibc did not reintroduce the issue. It may happen only if the process generating locale-archive crashes during the installation of glibc (bug #1270480 comment #7), which I could not reproduce either.

Please update the glibc* packages to the latest available version and check whether this bug is still reproducible.

Maybe this is the recent issue with cs_CZ I analyzed at: https://bugzilla.opensuse.org/show_bug.cgi?id=948165#c10

Re falling back to "C" upon setlocale() failure: that's what we do, but it is silent. We really should be bleating, for sort(1) at least.

[root@frawhide ~]# rpm -q glibc-common
glibc-common-2.22.90-8.fc24.x86_64
[root@frawhide ~]# rpm -V glibc-common; echo $?
0

The typo in the locale name (utf8 -> utf-8) was only in my answers (and the bug description), not in my system settings. The only relevant and correct names are in comment #1. My apologies for any confusion.

Perhaps you've found yet another way to reproduce the bug...
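Pádraig's suggestion of an explicit fallback, together with the remark that the silent fallback to "C" should at least be reported for sort(1), can be approximated with a small wrapper. The following is a minimal POSIX sh sketch under stated assumptions, not what coreutils or glibc actually do: `normalize_locale`, `locale_installed` and `sort_warn` are hypothetical names, and the normalization only imitates how glibc's `locale -a` lists names (codeset lowercased, punctuation dropped, so cs_CZ.UTF-8 becomes cs_CZ.utf8). @modifiers and other glibc normalization rules are ignored.

```sh
#!/bin/sh
# Hypothetical helpers (not part of coreutils or glibc): warn when the
# collation locale that sort(1) would use is not installed.

# Simplified imitation of glibc codeset normalization as shown by `locale -a`:
# lowercase the codeset and drop non-alphanumerics (cs_CZ.UTF-8 -> cs_CZ.utf8).
# Assumption: the name carries no @modifier.
normalize_locale() {
  case $1 in
    *.*)
      codeset=$(printf '%s' "${1#*.}" | tr '[:upper:]' '[:lower:]' | tr -cd '[:lower:][:digit:]')
      printf '%s.%s\n' "${1%%.*}" "$codeset"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Succeed if the locale is C/POSIX or is listed by `locale -a`.
locale_installed() {
  case $1 in
    C|POSIX) return 0 ;;
  esac
  wanted=$(normalize_locale "$1")
  locale -a 2>/dev/null | grep -qxF "$wanted"
}

# Run sort, but first complain on stderr if the effective collation locale
# (LC_ALL, then LC_COLLATE, then LANG; empty values count as unset)
# is not installed.
sort_warn() {
  collate=${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}
  if ! locale_installed "$collate"; then
    printf 'sort_warn: collation locale "%s" is not installed\n' "$collate" >&2
  fi
  sort "$@"
}
```

With `LC_ALL=cz_CS.utf-8` exported, `sort_warn src.00` would print a warning before sorting, because the misspelled name normalizes to cz_CS.utf8, which `locale -a` does not list on a typical system.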
I'm going to try to reproduce it with an updated glibc (if an update is available), as requested.

After updating to glibc-common-2.22.90-13, the bug is gone. (I accidentally closed this bug; I'm leaving it to the maintainers to decide its fate...)

While I couldn't find the commit fixing the bug in Fedora, I'm fairly sure this is the bug mentioned in comment #12.

I came to the same conclusion... Andreas (on the SUSE bz) also notes this should be fixed by a glibc patch...

Actually, I don't see an update for this issue in F23, so I'm reassigning the component.

This is to track the backport of this to F23: https://sourceware.org/git/?p=glibc.git;a=commit;h=87701a58

Fixed in f23. Waiting for final builds before bodhi. http://koji.fedoraproject.org/koji/taskinfo?taskID=11762533

glibc-2.22-5.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa

glibc-2.22-5.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with `su -c 'dnf --enablerepo=updates-testing update glibc'`. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa

*** Bug 1269895 has been marked as a duplicate of this bug. ***

I'm just curious: have we added a test case for this issue? I mean something like `assert(strcoll("cx", "ch") < 0)` in cs_CZ?

glibc-2.22-5.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
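The suggested `assert(strcoll("cx", "ch") < 0)` relies on Czech collation treating "ch" as a separate letter that sorts after "h", so "cx" must come before "ch". A shell-level analogue can check the same ordering through sort(1) itself. This is a hypothetical sketch, not the test glibc actually added, and `sorts_before` is an invented name. If the requested locale is missing, sort falls back to "C" byte order, where "ch" precedes "cx", so the check fails instead of passing silently.

```sh
#!/bin/sh
# Hypothetical check: succeed if, when sort(1) runs under locale $1,
# string $2 is output before string $3.
sorts_before() {
  loc=$1
  first=$2
  second=$3
  # Feed the pair in reverse order so that a no-op "sort" cannot pass.
  top=$(printf '%s\n%s\n' "$second" "$first" | LC_ALL=$loc sort | head -n 1)
  [ "$first" != "$second" ] && [ "$top" = "$first" ]
}

# Usage sketch (assumes cs_CZ.UTF-8 is installed):
# sorts_before cs_CZ.UTF-8 cx ch || echo "cs_CZ collation looks broken" >&2
```

Under the C locale the opposite ordering holds ("ch" before "cx", since 'h' < 'x' in byte order), which also makes it possible to exercise the helper on a machine without Czech locale data.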
Created attachment 1087964 [details]
First source file

Description of problem:
I've experienced strange behaviour while sorting the following data. I have two source files, 'src.00' and 'src.01'. Both contain the same set of lines; they differ only in the order in which the lines are stored.

When I sort the source files with LC_ALL=C set, the sorted output is always the same. In other words, the sorted output files res.00 and res.01 are equal. To get res.00 and res.01 I used the following commands:

cat src.00 | sort > res.00
cat src.01 | sort > res.01

On the other hand, when I sort the source files with the following locale settings:

LANG=cs_CZ.utf8
LC_CTYPE="cs_CZ.utf8"
LC_NUMERIC="cs_CZ.utf8"
LC_TIME="cs_CZ.utf8"
LC_COLLATE="cs_CZ.utf8"
LC_MONETARY="cs_CZ.utf8"
LC_MESSAGES=en_US.utf8
LC_PAPER="cs_CZ.utf8"
LC_NAME="cs_CZ.utf8"
LC_ADDRESS="cs_CZ.utf8"
LC_TELEPHONE="cs_CZ.utf8"
LC_MEASUREMENT="cs_CZ.utf8"
LC_IDENTIFICATION="cs_CZ.utf8"
LC_ALL=

the res.00 and res.01 files generated with the same procedure as above differ. Is this expected?

Version-Release number of selected component (if applicable):
coreutils-8.24-4.fc24.x86_64
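The expectation stated in this report is that, under any single locale, two inputs holding the same lines in different orders sort to identical output. A minimal POSIX sh sketch of that check follows. `same_sorted_output` is a hypothetical helper, not part of coreutils; it passes the files to sort directly instead of using `cat file | sort`, which gives the same result.

```sh
#!/bin/sh
# Hypothetical helper: succeed if sort(1) under locale $1 turns files $2 and $3
# into byte-identical output. Returns 2 if a temporary file cannot be created.
same_sorted_output() {
  loc=$1
  out_a=$(mktemp) || return 2
  out_b=$(mktemp) || { rm -f "$out_a"; return 2; }
  LC_ALL=$loc sort "$2" > "$out_a"
  LC_ALL=$loc sort "$3" > "$out_b"
  cmp -s "$out_a" "$out_b"
  status=$?
  rm -f "$out_a" "$out_b"
  return "$status"
}

# Usage with the attached files:
# same_sorted_output C          src.00 src.01 && echo "C: outputs match"
# same_sorted_output cs_CZ.utf8 src.00 src.01 || echo "cs_CZ.utf8: outputs differ" >&2
```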