Description Jaroslav Škarvada
2018-05-24 15:19:21 UTC
Created attachment 1441096 [details]
Reproducer
Description of problem:
For example, the regex '[[=a=]]' doesn't match 'á' in the Czech locale as it should. It seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.
Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64
How reproducible:
Always
Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex
3.
Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1
Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0
Additional info:
It's blocking the grep rebuild.
*** Bug 1582224 has been marked as a duplicate of this bug. ***
Comment 2 Jaroslav Škarvada
2018-05-24 15:32:58 UTC
It's not only about 'á' in cs_CZ.UTF-8 or en_US.UTF-8. Other matches that used to work no longer do, e.g.:
$ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
...
This appears to be a deliberate change in character equivalences. As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent. I do not know whether this is the intent of the current Unicode version.