Bug 1276711 - sort: sort results don't match when processing same source data set while cs_CZ.utf8 lang set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1269895
Depends On:
Blocks:
Reported: 2015-10-30 15:18 UTC by Ondrej Kozina
Modified: 2015-11-12 05:22 UTC (History)
CC List: 18 users

Fixed In Version: glibc-2.22-5.fc23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-12 05:22:08 UTC
Type: Bug


Attachments
First source file (10.83 KB, text/plain), 2015-10-30 15:18 UTC, Ondrej Kozina
Second source file (10.83 KB, text/plain), 2015-10-30 15:19 UTC, Ondrej Kozina
Not matching res.00 (doesn't match with res.01) cz_CS.utf-8 set (10.83 KB, text/plain), 2015-10-30 15:47 UTC, Ondrej Kozina
Not matching res.01 (doesn't match with res.00) cz_CS.utf-8 set (10.83 KB, text/plain), 2015-10-30 15:47 UTC, Ondrej Kozina
matching res.00 (with LC_ALL=C set) (10.83 KB, text/plain), 2015-10-30 15:54 UTC, Ondrej Kozina


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Sourceware | 18589 | 0 | None | None | None | Never

Description Ondrej Kozina 2015-10-30 15:18:28 UTC
Created attachment 1087964 [details]
First source file

Description of problem:

I've experienced strange behaviour while sorting the following data. I have two source files: 'src.00' and 'src.01'.

Both source files contain the same set of lines. They differ only in the order in which the lines are stored.

When I sort the source files with LC_ALL=C set, the sorted output is always the same. In other words, the sorted output files res.00 and res.01 are identical.

To get the files res.00 and res.01 I used the following commands:
cat src.00 | sort > res.00
cat src.01 | sort > res.01

On the other hand, when I sort the source files with the following locale settings:
LANG=cs_CZ.utf8
LC_CTYPE="cs_CZ.utf8"
LC_NUMERIC="cs_CZ.utf8"
LC_TIME="cs_CZ.utf8"
LC_COLLATE="cs_CZ.utf8"
LC_MONETARY="cs_CZ.utf8"
LC_MESSAGES=en_US.utf8
LC_PAPER="cs_CZ.utf8"
LC_NAME="cs_CZ.utf8"
LC_ADDRESS="cs_CZ.utf8"
LC_TELEPHONE="cs_CZ.utf8"
LC_MEASUREMENT="cs_CZ.utf8"
LC_IDENTIFICATION="cs_CZ.utf8"
LC_ALL=

the res.00 and res.01 files generated with the same procedure as above differ. Is this expected?
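
The check described above can be sketched as a small shell function. This is only an illustrative sketch, not something taken from the report: the function name `same_sorted_output` is hypothetical, and it assumes that `sort`, `cmp` and `mktemp` are available and that both files hold the same lines in a different order.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): succeed only when both files
# sort to byte-identical output under the locale currently in effect.
same_sorted_output() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    sort "$1" > "$tmp_a"
    sort "$2" > "$tmp_b"
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

One way to use it under a specific locale is to export the setting in a subshell, for example `( export LC_ALL=cs_CZ.utf8; same_sorted_output src.00 src.01 ) || echo "outputs differ"`.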

Version-Release number of selected component (if applicable):
coreutils-8.24-4.fc24.x86_64

Comment 1 Ondrej Kozina 2015-10-30 15:19:03 UTC
Created attachment 1087965 [details]
Second source file

Comment 2 Ondrej Kozina 2015-10-30 15:24:36 UTC
I hope this makes it clearer:

I don't expect the result files to be equal when sorted under different locales, but I would expect the same input data to sort into the same result when processed under one specific locale.

Comment 3 Kamil Dudka 2015-10-30 15:38:15 UTC
I am not able to reproduce it with coreutils-8.24-4.fc24.x86_64 and cs_CZ.utf8. Please also attach the output of sort you are getting in both cases.

Comment 4 Ondrej Kozina 2015-10-30 15:47:11 UTC
Created attachment 1087967 [details]
Not matching res.00 (doesn't match with res.01) cz_CS.utf-8 set

Comment 5 Ondrej Kozina 2015-10-30 15:47:42 UTC
Created attachment 1087969 [details]
Not matching res.01 (doesn't match with res.00) cz_CS.utf-8 set

Comment 6 Ondrej Kozina 2015-10-30 15:54:24 UTC
Created attachment 1087971 [details]
matching res.00 (with LC_ALL=C set)

Comment 7 Ondrej Kozina 2015-10-30 15:57:51 UTC
res.00 and res.01 don't match. These are the results of running sort with cs_CZ.utf-8 set as stated above.

res.00.C is the result of running sort with:
export LC_ALL=C
cat src.00 | sort > res.00.C

Comment 8 Ondrej Kozina 2015-10-30 15:59:18 UTC
(res.00.C and res.01.C matched so I didn't upload the identical matching file res.01.C)

Comment 9 Kamil Dudka 2015-10-30 16:13:08 UTC
The attached output looks broken to me.  Do you have valid locale data installed?

Please paste output of the following commands:
rpm -q glibc-common
rpm -V glibc-common

Comment 10 Kamil Dudka 2015-10-30 16:42:30 UTC
I have just reproduced it on a rawhide machine. It happened only if unavailable locales were requested (like the incorrectly spelled cz_CS.utf-8 string you use somewhere in this bug report). After upgrading my rawhide machine, the issue went away. I guess it was the glibc update that fixed it; I will try to confirm.

Anyway, we should consider an explicit fallback to POSIX locales in case the requested locales are not available as Pádraig suggests in bug #1270480 comment #8.

Comment 11 Kamil Dudka 2015-10-30 16:53:13 UTC
Unfortunately, a downgrade of glibc did not reintroduce the issue. It could be that it happens only if the process generating locale-archive crashes during the installation of glibc (bug #1270480 comment #7), which I also failed to reproduce.

Please update the glibc* packages to the latest available version and check whether this bug is still reproducible.

Comment 12 Pádraig Brady 2015-10-31 02:06:01 UTC
Maybe this is the recent issue with cs_CZ I analyzed at:
https://bugzilla.opensuse.org/show_bug.cgi?id=948165#c10

Re falling back to "C" upon setlocale() failure.
That's what we do, but this is silent.
We really should be bleating for sort(1) at least.
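
The silent fallback discussed here can be made visible from the shell. The following is a minimal sketch under stated assumptions, not a coreutils or glibc feature: `locale_available` and `sort_in_locale` are hypothetical names, and the check relies on an exact match against the names printed by `locale -a` (glibc typically prints `cs_CZ.utf8`, so a spelling such as `cs_CZ.UTF-8` would be reported as unavailable by this simple test even though setlocale() might accept it).

```shell
#!/bin/sh
# Hypothetical helpers (not part of coreutils or glibc).

# Succeed if the exact locale name appears in the output of `locale -a`.
locale_available() {
    locale -a 2>/dev/null | grep -qxF -- "$1"
}

# Run sort under the requested locale; if the locale is not installed,
# say so on stderr and fall back to the C locale explicitly.
sort_in_locale() {
    wanted=$1
    shift
    if locale_available "$wanted"; then
        LC_ALL=$wanted sort "$@"
    else
        echo "sort_in_locale: locale '$wanted' not available, using C" >&2
        LC_ALL=C sort "$@"
    fi
}
```

For example, `sort_in_locale cs_CZ.utf8 src.00 > res.00` would sort as before when the locale is installed, and would print a warning instead of silently using C when it is not.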

Comment 13 Ondrej Kozina 2015-11-02 08:24:05 UTC
[root@frawhide ~]# rpm -q glibc-common
glibc-common-2.22.90-8.fc24.x86_64

[root@frawhide ~]# rpm -V glibc-common; echo $?
0

The typo in the locale name (utf8 -> utf-8) was only in my answers (and the bug description), not in my system settings. The only relevant and correct names are in comment #1. My apologies for any confusion in this matter.

Perhaps you've found yet another way to reproduce the bug...

I'm going to try to reproduce it with an updated glibc (if an update is available), as requested.

Comment 14 Ondrej Kozina 2015-11-02 08:50:09 UTC
After updating to glibc-common-2.22.90-13 the bug is gone.

Comment 15 Ondrej Kozina 2015-11-02 09:58:10 UTC
(Accidentally closed this bug. Letting the maintainers decide its fate...)

Comment 16 Pádraig Brady 2015-11-02 11:56:43 UTC
While I couldn't find the commit fixing the bug in Fedora, I'm fairly sure this is the bug mentioned in comment #12.

Comment 17 Ondrej Oprala 2015-11-03 15:32:21 UTC
I came to the same conclusion. Andreas (on the SUSE Bugzilla) also notes that this should be fixed by a glibc patch.

Comment 18 Pádraig Brady 2015-11-06 17:34:12 UTC
Actually I don't see an update for this issue in F23, so reassigning component.

This is to track the backport of this to F23:
https://sourceware.org/git/?p=glibc.git;a=commit;h=87701a58

Comment 19 Carlos O'Donell 2015-11-10 02:24:42 UTC
Fixed in f23. Waiting for final builds before bodhi.

http://koji.fedoraproject.org/koji/taskinfo?taskID=11762533

Comment 20 Fedora Update System 2015-11-10 03:54:01 UTC
glibc-2.22-5.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa

Comment 21 Fedora Update System 2015-11-11 02:23:24 UTC
glibc-2.22-5.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update glibc'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa

Comment 22 Pavel Raiskup 2015-11-11 14:08:15 UTC
*** Bug 1269895 has been marked as a duplicate of this bug. ***

Comment 23 Pavel Raiskup 2015-11-11 14:50:14 UTC
I'm just curious, have we installed a test case for this issue? I mean something like 'assert(strcoll("cx", "ch") < 0)' in cs_CZ?
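
A rough shell counterpart of that assertion is sketched below, on the assumption that sort(1) orders lines by the locale's collation rules (as GNU sort does). The helper name `collates_before` is hypothetical, and the cs_CZ part only runs when `locale -a` lists `cs_CZ.utf8`, because an unavailable locale would otherwise be replaced by C without any notice.

```shell
#!/bin/sh
# Hypothetical helper: succeed if string $1 sorts strictly before string $2
# under locale $3, as judged by sort(1).
collates_before() {
    [ "$1" != "$2" ] &&
        printf '%s\n%s\n' "$1" "$2" | LC_ALL=$3 sort -c 2>/dev/null
}

# Only meaningful when cs_CZ.utf8 is installed; otherwise the comparison
# would silently be done in the C locale.
if locale -a 2>/dev/null | grep -qxF 'cs_CZ.utf8'; then
    if collates_before cx ch cs_CZ.utf8; then
        echo "cx sorts before ch in cs_CZ.utf8"
    else
        echo "cx does not sort before ch in cs_CZ.utf8"
    fi
else
    echo "cs_CZ.utf8 not installed; check skipped"
fi
```

In the C locale the comparison is bytewise, so `collates_before ch cx C` succeeds there; the assertion quoted above expects the opposite order under cs_CZ.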

Comment 24 Pádraig Brady 2015-11-11 14:57:43 UTC
glibc now has:
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/bug-strcoll2.c

Comment 25 Fedora Update System 2015-11-12 05:21:54 UTC
glibc-2.22-5.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

