Bug 1582229

Summary: glibc: regex functions ignore character equivalents
Product: [Fedora] Fedora
Reporter: Jaroslav Škarvada <jskarvad>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 28
CC: aoliva, arjun.is, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glibc-2.27-30.fc28 glibc-2.27.9000-38.fc29
Last Closed: 2018-07-17 15:17:40 UTC
Type: Bug
Bug Blocks: 1582219
Attachments:
Description Flags
Reproducer none

Description Jaroslav Škarvada 2018-05-24 15:19:21 UTC
Created attachment 1441096
Reproducer

Description of problem:
For example, the regex '[[=a=]]' doesn't match 'á' in the Czech locale as it should. It seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.

Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex

Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1

Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0

Additional info:
This is blocking the grep rebuild.

Comment 1 Florian Weimer 2018-05-24 15:21:43 UTC
*** Bug 1582224 has been marked as a duplicate of this bug. ***

Comment 2 Jaroslav Škarvada 2018-05-24 15:32:58 UTC
It's not only about 'á' in cs_CZ.UTF-8 or en_US.UTF-8. There are more matches that used to work and no longer do, e.g.:

$ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
...

Comment 3 Florian Weimer 2018-07-09 12:03:56 UTC
This appears to be a deliberate change in character equivalences.  As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent.  I do not know if this is the intent of the current Unicode version.

Comment 4 Florian Weimer 2018-07-09 15:35:18 UTC
This may be an algorithmic issue after all, not a data problem.

Comment 5 Fedora Update System 2018-07-12 15:41:42 UTC
glibc-2.27-30.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 6 Fedora Update System 2018-07-13 19:29:20 UTC
glibc-2.27-30.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 7 Fedora Update System 2018-07-17 15:17:40 UTC
glibc-2.27-30.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.