Bug 1582229 - glibc: regex functions ignore character equivalents
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1582224
Depends On:
Blocks: 1582219
 
Reported: 2018-05-24 15:19 UTC by Jaroslav Škarvada
Modified: 2018-07-17 15:17 UTC

Fixed In Version: glibc-2.27-30.fc28 glibc-2.27.9000-38.fc29
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-17 15:17:40 UTC
Type: Bug
Embargoed:


Attachments
Reproducer (364 bytes, text/x-csrc)
2018-05-24 15:19 UTC, Jaroslav Škarvada


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1551009 0 unspecified CLOSED glibc: collation update and sync with cldr 2022-05-16 11:32:56 UTC
Sourceware 23036 0 None None None 2018-05-24 15:19:21 UTC

Internal Links: 1551009

Description Jaroslav Škarvada 2018-05-24 15:19:21 UTC
Created attachment 1441096
Reproducer

Description of problem:
For example, the regex '[[=a=]]' doesn't match 'á' in the Czech locale as it should. This seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.

Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex

Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1

Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0

Additional info:
It's blocking the grep rebuild.

Comment 1 Florian Weimer 2018-05-24 15:21:43 UTC
*** Bug 1582224 has been marked as a duplicate of this bug. ***

Comment 2 Jaroslav Škarvada 2018-05-24 15:32:58 UTC
It's not only about 'á' in cs_CZ.UTF-8 or en_US.UTF-8. There are more matches that worked and don't work now, e.g.:

$ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
...

Comment 3 Florian Weimer 2018-07-09 12:03:56 UTC
This appears to be a deliberate change in character equivalences. As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent. I do not know if this is the intent of the current Unicode version.

Comment 4 Florian Weimer 2018-07-09 15:35:18 UTC
This may be an algorithmic issue after all, not a data problem.

Comment 5 Fedora Update System 2018-07-12 15:41:42 UTC
glibc-2.27-30.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 6 Fedora Update System 2018-07-13 19:29:20 UTC
glibc-2.27-30.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 7 Fedora Update System 2018-07-17 15:17:40 UTC
glibc-2.27-30.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

