Bug 503816 - ICE from c++ after eating all memory
Summary: ICE from c++ after eating all memory
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: 11
Hardware: s390x
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: FedoraTracker
 
Reported: 2009-06-02 20:28 UTC by Dan Horák
Modified: 2009-07-29 10:31 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-23 23:16:09 UTC
Type: ---
Embargoed:


Attachments
failing file (322.74 KB, application/x-gzip), 2009-06-02 20:28 UTC, Dan Horák

Description Dan Horák 2009-06-02 20:28:03 UTC
Created attachment 346314 [details]
failing file

cc1plus eats all available memory and gets killed with

c++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

when compiling a file on s390x. It occurred while building scribus for Fedora 11 on s390x; the build is https://s390.koji.fedoraproject.org/koji/taskinfo?taskID=72412 (search build.log for "scribus134format.o").

g++ (GCC) 4.4.0 20090506 (Red Hat 4.4.0-4)

Preprocessed file is in attachment, the command used to compile is
g++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -march=z9-109 -mtune=z10  -O2 -Wall -fPIC -fPIC -o scribus134format.o -c scribus134format.i

With plain "g++ -o scribus134format.o -c scribus134format.i" the compile succeeds.

I don't have a minimal test case available yet. A copy of the buildroot is stored on the builder for further investigation.

Comment 1 Dan Horák 2009-06-09 20:50:16 UTC
While creating the minimal test case I have found the following:
- the option causing the abnormal memory usage is -O2
- run times of g++ are approx. 5 sec (without -O2) vs. 5 min (with -O2)
- memory consumption by cc1plus is approx. 300 MB vs. 1.8 GB (as seen in "top")
The numbers are from a reduced source file, where the compile is successful.

The builder has 2GB RAM + 0.5GB swap

Comment 2 Dan Horák 2009-06-10 12:16:41 UTC
Hm, I was able to successfully compile the original file; it took 13.5 minutes (user time) with only a few MB of free memory left, but it built. I still think that something is wrong when 140 KB / 3600 lines of C++ code using the Qt library doesn't always build with 2.5 GB of memory.

Comment 3 Jakub Jelinek 2009-06-15 19:25:31 UTC
The source isn't that small; e.g. the loadFile method has 13784 basic blocks.
Apparently most of the time is spent in var-tracking: without -g, or with -fno-var-tracking, cc1plus (a cross compiler from x86-64) topped out at 800 MB and was also much faster.
Var-tracking is where it spends a huge amount of time. I haven't tried to find out how many variables are tracked, but probably many.

Comment 4 Jakub Jelinek 2009-06-18 07:17:44 UTC
I've gathered some statistics on this testcase:
--- var-tracking.c.xx	2009-03-04 12:12:08.000000000 +0100
+++ var-tracking.c	2009-06-18 08:54:34.000000000 +0200
@@ -2143,6 +2143,31 @@ compute_bb_dataflow (basic_block bb)
   return changed;
 }
 
+void
+print_vta_stats (void)
+{
+  static int cnt;
+  basic_block bb;
+  size_t size, elements, collisions;
+  fprintf (stderr, "VTA step %d", ++cnt);
+  fprintf (stderr, " attrs_pool %zd (%zd %zd)", attrs_pool->block_size * attrs_pool->blocks_allocated, attrs_pool->elts_allocated, attrs_pool->elts_free);
+  fprintf (stderr, " var_pool %zd (%zd %zd)", var_pool->block_size * var_pool->blocks_allocated, var_pool->elts_allocated, var_pool->elts_free);
+  fprintf (stderr, " loc_chain_pool %zd (%zd %zd)", loc_chain_pool->block_size * loc_chain_pool->blocks_allocated, loc_chain_pool->elts_allocated, loc_chain_pool->elts_free);
+  size = 0;
+  elements = 0;
+  collisions = 0;
+  FOR_EACH_BB (bb)
+    {
+      size += htab_size (VTI (bb)->in.vars);
+      elements += htab_elements (VTI (bb)->in.vars);
+      collisions += htab_collisions (VTI (bb)->in.vars);
+      size += htab_size (VTI (bb)->out.vars);
+      elements += htab_elements (VTI (bb)->out.vars);
+      collisions += htab_collisions (VTI (bb)->out.vars);
+    }
+  fprintf (stderr, " htab %zd (%zd, %zd, %zd)\n", size * sizeof (void *), size, elements, collisions);
+}
+
 /* Find the locations of variables in the whole function.  */
 
 static void
@@ -2185,6 +2210,9 @@ vt_find_locations (void)
       in_pending = in_worklist;
       in_worklist = sbitmap_swap;
 
+if (n_basic_blocks > 5000)
+print_vta_stats ();
+
       sbitmap_zero (visited);
 
       while (!fibheap_empty (worklist))

printed:

VTA step 1 attrs_pool 32776 (1024 1022) var_pool 25608 (64 62) loc_chain_pool 32776 (1024 1022) htab 1543248 (192906, 0, 0)
VTA step 2 attrs_pool 721072 (22528 269) var_pool 5889840 (14720 44) loc_chain_pool 917728 (28672 875) htab 597983248 (74747906, 41276836, 22763)
VTA step 3 attrs_pool 983280 (30720 110) var_pool 8783544 (21952 3224) loc_chain_pool 1311040 (40960 5862) htab 1006302720 (125787840, 67735133, 7017)
VTA step 4 attrs_pool 2064888 (64512 688) var_pool 9014016 (22528 391) loc_chain_pool 1343816 (41984 105) htab 1527233520 (190904190, 93031713, 1047)
VTA step 5 attrs_pool 2589304 (80896 22) var_pool 10729752 (26816 3658) loc_chain_pool 1606024 (50176 6183) htab 1790226768 (223778346, 103611403, 325)
VTA step 6 attrs_pool 4064224 (126976 94) var_pool 11011440 (27520 4342) loc_chain_pool 1638800 (51200 7202) htab 1790226768 (223778346, 107777443, 325)
VTA step 7 attrs_pool 4064224 (126976 94) var_pool 11011440 (27520 3369) loc_chain_pool 1638800 (51200 5767) htab 1790226768 (223778346, 109835059, 325)
VTA step 8 attrs_pool 32776 (1024 1019) var_pool 25608 (64 59) loc_chain_pool 32776 (1024 1019) htab 618576 (77322, 0, 0)
VTA step 9 attrs_pool 884952 (27648 969) var_pool 2227896 (5568 53) loc_chain_pool 360536 (11264 678) htab 70594384 (8824298, 4875009, 8525)
VTA step 10 attrs_pool 1212712 (37888 983) var_pool 3252216 (8128 1069) loc_chain_pool 491640 (15360 1746) htab 159846368 (19980796, 9505894, 2727)
VTA step 11 attrs_pool 1737128 (54272 56) var_pool 3380256 (8448 165) loc_chain_pool 524416 (16384 236) htab 173239072 (21654884, 12905301, 460)
VTA step 12 attrs_pool 1835456 (57344 262) var_pool 3841200 (9600 308) loc_chain_pool 622744 (19456 1293) htab 174586496 (21823312, 14869139, 181)
VTA step 13 attrs_pool 2327096 (72704 538) var_pool 4148496 (10368 1061) loc_chain_pool 655520 (20480 2219) htab 174619360 (21827420, 15376847, 181)
VTA step 14 attrs_pool 2327096 (72704 538) var_pool 4148496 (10368 394) loc_chain_pool 655520 (20480 1397) htab 184610016 (23076252, 16048639, 181)

Steps 1-7 are in the largest function, which shows that the 3 alloc pools are really uninterestingly small; nearly all of the VTA memory (1.7 GB) is in the hash tables.
There are 13784 basic blocks in loadFile and each basic block has 2 hash tables (in.vars and out.vars), so on average each hash table has 3984 occupied elements and 8117 allocated slots.

Comment 5 Jakub Jelinek 2009-06-18 08:20:29 UTC
I've gathered another data point: for each bb accounted in print_vta_stats, check whether
dataflow_set_different (&VTI (bb)->in, &VTI (bb)->out).  In the loadFile function, of the 13779 bbs processed, the first step obviously had 0 differences, the next step 4083, and the remaining steps ranged from 2239 to 2250 basic blocks where in and out actually differed.  That's just 16% while memory consumption goes through the roof, which raises the question of how many bbs also have !dataflow_set_different between the out sets of their predecessors and their own in set.
Perhaps we could use refcounted copy-on-write hash tables instead of emptying them and refilling them all the time.  If a hash table is shared (refcount > 1), we'd just use NO_INSERT lookups; only if we find we want to modify it would we allocate (or pick from a free list) a htab_t (with a refcount), vars_copy the original htab_t into it, set its refcount to 1, and then start doing normal INSERT lookups in it.

Comment 6 Jakub Jelinek 2009-06-23 23:16:09 UTC
Please try gcc-4.4.0-10 in rawhide.

Comment 7 Dan Horák 2009-07-29 10:31:24 UTC
I have updated the buildroot with

gcc-c++-4.4.0-4.s390x.rpm
binutils-2.19.51.0.14-29.1.fc12.s390x.rpm

and this time the build crashes with

{standard input}: Assembler messages:
{standard input}:224817: Warning: end of file not at end of a line; newline inserted
{standard input}:226237: Error: unknown pseudo-op: `.stri'
c++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [scribus/plugins/fileloader/scribus134format/CMakeFiles/scribus134format.dir/scribus134format.o] Error 1
make[1]: *** [scribus/plugins/fileloader/scribus134format/CMakeFiles/scribus134format.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

but it's most likely because of the memory requirements during a parallel build (make -j2). With a sequential build it's very tight, but fine.

