Bug 216534 - Review Request: gocr - GNU Optical Character Recognition program
Summary: Review Request: gocr - GNU Optical Character Recognition program
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Patrice Dumas
QA Contact: Fedora Package Reviews List
URL: http://www.cora.nwra.com/~orion/fedora/
Whiteboard:
Depends On:
Blocks: FE-ACCEPT 216536
Reported: 2006-11-20 22:59 UTC by Orion Poplawski
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-03-21 16:09:20 UTC
Type: ---
Embargoed:
pertusus: fedora-review+



Description Orion Poplawski 2006-11-20 22:59:21 UTC
Spec Name or Url: http://www.cora.nwra.com/~orion/fedora/gocr.spec
SRPM Name or Url: http://www.cora.nwra.com/~orion/fedora/gocr-0.41-1.fc6.src.rpm
Description: 

GOCR is an OCR (Optical Character Recognition) program, developed under the
GNU General Public License. It converts scanned images of text back to text files.
Joerg Schulenburg started the program, and now leads a team of developers.

GOCR can be used with different front-ends, which makes it very easy to port
to different OSes and architectures. It can open many different image
formats, and its quality has been improving on a daily basis.

Comment 1 Patrice Dumas 2006-11-21 22:59:05 UTC
In files, the de file could be marked as:
%lang(de) %doc READMEde.txt

There is a gtk frontend, maybe it could be shipped in a sub-package?

There is a missing dependency on wish. I also think that maybe it 
could make sense to have gocr-tcl for gocr.tcl, because of that
requires?

There are many Requires missing. At least (in pnm.c), 
gzip, bzip2, transfig, netpbm-progs, libjpeg
Maybe upstream could use convert...

Comment 2 Orion Poplawski 2006-11-22 18:26:16 UTC
(In reply to comment #1)
> In files, the de file could be marked as:
> %lang(de) %doc READMEde.txt

Done.

> There is a gtk frontend, maybe it could be shipped in a sub-package?

Done.
 
> There is a missing dependency on wish. I also think that maybe it 
> could make sense to have gocr-tcl for gocr.tcl, because of that
> requires?

Done.
 
> There are many Requires missing. At least (in pnm.c), 
> gzip, bzip2, transfig, netpbm-progs, libjpeg
> Maybe upstream could use convert...

I'm wondering whether to make these hard Requires or not.  Obviously you need
some to get extra functionality, but only for the image types you need to
process.  Perhaps just a note in the description?

The problem at the moment with using convert is that you would need a command
like:  convert <file> pnm:-, but the code expects to append the filename to the
end of the command.
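
The difference between the two command shapes can be sketched as follows. This is only an illustration, not gocr's code: the helper names and the `{file}` placeholder are hypothetical, and `pngtopnm` stands in for any filter that takes the input file as its last argument.

```python
import shlex


def build_appended(prefix, filename):
    """Mimic the scheme described above: the filename is appended to the end."""
    return f"{prefix} {shlex.quote(filename)}"


def build_templated(template, filename):
    """Hypothetical alternative: substitute the filename at a {file} placeholder."""
    return template.format(file=shlex.quote(filename))


# A filter that takes the input file last fits the append scheme.
appended = build_appended("pngtopnm", "ex.png")

# convert needs the input file before the output spec, so it needs a placeholder.
templated = build_templated("convert {file} pnm:-", "ex.png")

# Appending to a convert prefix puts the filename last, where convert expects
# the output, which is why the append scheme does not work for it.
misplaced = build_appended("convert pnm:-", "ex.png")
```

With a placeholder, the same table of commands could hold both kinds of filter, but that would mean changing how upstream builds its command lines.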

The more I look at the code, the less I like it, but I suppose it's developing
and may be useful.

Just need the new spec file:
http://www.cora.nwra.com/~orion/fedora/gocr.spec
http://www.cora.nwra.com/~orion/fedora/gocr-0.41-2.fc6.src.rpm

Comment 3 Patrice Dumas 2006-11-22 22:52:12 UTC
(In reply to comment #2)

> > There are many Requires missing. At least (in pnm.c), 
> > gzip, bzip2, transfig, netpbm-progs, libjpeg
> 
> I'm wondering whether to make these hard Requires or not.  Obviously you need
> some to get extra functionality, but only for the image types you need to
> process.  Perhaps just a note in the description?

It depends how it fails. But given what those deps are,
except maybe for transfig, I can't see why they couldn't be 
hard requires. png, jpeg, gif and eps support seems to be 
a must to me.

I tested a bit, but I get only segfaults on non pnm files (tried
png and eps):

$ gocr ex.pcx 
Special chars: àá__åæç À Å Æ ß &$Xgo ØØ44t>¢µ
Special chars= àáâãäåæç À Å Æ ß &$XO_o øØ44 _>_µ
Special  chars :  àáâăäåæç À Å _ _  G_#9o 0Ø44>>tµ
$ convert ex.pcx ex.png
$ gocr ex.png 
pngtopnm: warning - non-square pixels; to fix do a 'pamscale -yscale 4.28479'
Segmentation fault

$ gdb --args gocr ex.png
....
(gdb) run
Starting program: /usr/bin/gocr ex.png
pngtopnm: warning - non-square pixels; to fix do a 'pamscale -yscale 4.28479'

Program received signal SIGSEGV, Segmentation fault.
0x00aebf18 in pnm_readpaminit () from /usr/lib/libnetpbm.so.10
(gdb) bt
#0  0x00aebf18 in pnm_readpaminit () from /usr/lib/libnetpbm.so.10
#1  0x080a0560 in readpgm (name=0xbfbc6a06 "ex.png", p=0xbfbbc2cc, vvv=0)
    at pnm.c:149
#2  0x080493b9 in main (argn=2, argv=0xbfbc5464) at gocr.c:272
#3  0x0082ce5c in __libc_start_main () from /lib/libc.so.6
#4  0x08048e01 in _start ()



With gzipped or bzip2-compressed files, things are not better:
$ gzip ex.pcx
$ gocr ex.pcx.gz 
ERROR pcx.c L28: no ZSoft sign


Another issue is that in gocr.tcl, the show button seems to
invoke a program which isn't installed. There is an error with

couldn't execute "xli": no such file or directory

similarly with spell

couldn't execute "tkispell": no such file or directory

and with scan it starts xsane, so there is a missing dependency.

Comment 4 Orion Poplawski 2006-11-23 00:00:06 UTC
Added Requires for those things we ship.  We don't ship xli or tkispell though.
So, change to equivalent apps we do ship or forget about them?

Worry about the segfault, or just report upstream?

Comment 5 Patrice Dumas 2006-11-23 10:51:49 UTC
Looking at tkispell on the web, it doesn't seem to be 
maintained, and it is not obvious where upstream is.
Looking at the gocr.tcl code, it looks like spellchecking involves
putting a file named out01.txt in the current directory which is not
cleaned up, and is the same file the output text is saved to in the 
default case... My opinion would be to disable this functionality. I did
it simply by commenting out

pack .abar.spell -side left

I spotted another issue: the config file is looked up and written in the
current directory, and not in $HOME! This is bad... Maybe we shouldn't
ship gocr.tcl? It hasn't really been changed in 4 years.
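
For illustration, resolving a per-user config path under the home directory rather than the current directory could look like the sketch below. The file name `.gocr-tcl.conf` and the helper name are hypothetical; gocr.tcl itself is Tcl and its real config file name is not stated here.

```python
import os


def config_path(filename=".gocr-tcl.conf", home=None):
    """Return the config file path under the user's home directory.

    filename is a made-up name used only for this sketch.
    home can be passed explicitly; otherwise the current user's home is used.
    """
    base = home if home is not None else os.path.expanduser("~")
    return os.path.join(base, filename)


# The result does not depend on the directory the program was started from.
example = config_path(home="/home/user")
```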

Testing gtk-ocr a bit, I found at least 2 bugs (a crash, and at another
point the files appeared but I couldn't convert them). It is saner with
regard to the handling of the config file. However, the converted text is
saved to a file with the same name as the input file with .txt appended,
without any possibility to override this, and without any explanation of
where the converted file is saved... The default image viewer here is 
display from ImageMagick. Looking at the cvs, it seems that it hasn't been
changed in 6 years.

My personal opinion is that those 2 frontends are too buggy and unmaintained
to be shipped. 

Now regarding the segfault, I think it is problematic since it seems to me 
that support for widely used image formats (png, eps, jpeg) should be 
working in a shipped package. For devel it is not problematic, but for FC-6
and below I think this should be a must. Not supporting compressed 
images is not an issue in my opinion.

Comment 6 Greg Swallow 2006-11-27 23:05:51 UTC
Is this here just for FuzzyOcr?  If so, the developers recommend gocr-0.40 now 
according to:
http://fuzzyocr.own-hero.net/wiki/Installation-3.x
"preferably version 0.40 (some people reported bad recognition with 0.41)"


Comment 7 Patrice Dumas 2006-12-12 09:02:34 UTC
There is a new version available, maybe it fixes some of the issues?

Comment 8 Greg Swallow 2006-12-12 15:15:41 UTC
Maybe it does, but relating to use with FuzzyOcr, I read this comment on the 
FuzzyOcr mailing list: (it is a private archive)
http://lists.own-hero.net/mailman/private/devel-spam/2006-December/001091.html
Someone wrote:
"I can confirm that - on large images, scanning times can go through the 
roof (over 30 secs on a pic i had... gocr0.40 needed 1 sec, 0.41 8 secs 
and 0.42 35 secs)
And I already found 3 of 10 images which crash gocr 0.42 with
Error in ocr0.c L208: idx out of range"

Granted, it's just one person's comment, but it seems gocr is heading in a 
different direction than what is good for scanning possible spam.



Comment 9 Greg Swallow 2006-12-17 04:38:21 UTC
Good news, the FuzzyOcr developers are recommending gocr 0.43 now.

Comment 10 Orion Poplawski 2006-12-20 16:41:00 UTC
Just need the new spec file:
http://www.cora.nwra.com/~orion/fedora/gocr.spec
http://www.cora.nwra.com/~orion/fedora/gocr-0.43-1.fc6.src.rpm


This disables the front-ends as they seem unmaintained.  No segfaults. 
gzip/bzip2 only supported for:

src/pnm.c:  ".pnm.gz",  "gzip -cd",  /* compressed pnm-files, gzip package */
src/pnm.c:  ".pbm.gz",  "gzip -cd",
src/pnm.c:  ".pgm.gz",  "gzip -cd",
src/pnm.c:  ".ppm.gz",  "gzip -cd",

But this is in the source and I'm not really interested in adding features.
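
To show why ex.pcx.gz from comment 3 is not decompressed, here is a rough sketch of a suffix lookup over the four entries quoted above. The table contents come from the quoted src/pnm.c lines; the lookup function and the assumption that matching is done on the end of the file name are mine, not taken from the source.

```python
# Suffix/command pairs as quoted from src/pnm.c above.
FILTERS = [
    (".pnm.gz", "gzip -cd"),
    (".pbm.gz", "gzip -cd"),
    (".pgm.gz", "gzip -cd"),
    (".ppm.gz", "gzip -cd"),
]


def find_filter(filename):
    """Return the decompression command for filename, or None if no suffix matches.

    Assumption: a file is matched by the end of its name.
    """
    for suffix, command in FILTERS:
        if filename.endswith(suffix):
            return command
    return None


pgm_filter = find_filter("scan.pgm.gz")  # matches the ".pgm.gz" entry
pcx_filter = find_filter("ex.pcx.gz")    # no ".pcx.gz" entry in the table
```

Under that assumption a gzipped pcx file is passed to the pcx reader still compressed, which would be consistent with the "no ZSoft sign" error seen earlier.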

Comment 11 Patrice Dumas 2007-01-03 14:34:53 UTC
* rpmlint says:
E: gocr explicit-lib-dependency libjpeg
I posted a comment above asking for that Requires, I guess there is an
executable from libjpeg used for conversion.
* follow guidelines
X License is GPL, not included. You should ask upstream to include the
license file, otherwise he may not be able to defend his license.

Some files have an author but no license. This should be investigated 
and certainly corrected upstream. Except for otsu.c, the author is the upstream
author, so there shouldn't be much trouble. Maybe the upstream author thinks
that no license means public domain, but that is not the case; he should
either remove the author notice or explicitly place the file in the public 
domain.

otsu.c has no license but an author (in fact 2, as shown by looking at
the comments).
 the following code was send by Ryan Dibble <dibbler>

pnm.c has no license but an author
/* (c) Joerg Schulenburg 2000-2006

pcx.c and tga.c have no license but an author 
// Joerg Schulenburg Mai99
// Joerg Schulenburg Mai99

* build and run fine
* right Requires and BuildRequires. Maybe a comment explaining the
need for the requires could be in order.
* %files section right
* sane provides
* match upstream
f989fe8e24f82d19c8ce55df15784e15  gocr-0.43.tar.gz
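
The "match upstream" check compares the checksum of the tarball in the SRPM with the one downloaded from upstream. A minimal sketch of that comparison, assuming the sum above is an MD5 and that both tarballs are available as local files (the function names are illustrative):

```python
import hashlib

# MD5 sum listed in the review for gocr-0.43.tar.gz.
EXPECTED_MD5 = "f989fe8e24f82d19c8ce55df15784e15"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path, expected=EXPECTED_MD5):
    """Return True if the file at path has the expected MD5 sum."""
    return md5_of_file(path) == expected
```

Calling `matches_upstream("gocr-0.43.tar.gz")` on the tarball extracted from the SRPM would then give the same answer as comparing `md5sum` output by hand.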


The only remaining blocker is the license issue. A statement from
upstream and a promise to fix things for the next release would be 
enough for me.

Comment 12 Patrice Dumas 2007-01-03 15:58:43 UTC
(In reply to comment #11)

> X License is GPL, not included. You should ask upstream to include the
> license file, otherwise he may not be able to defend his license.

Ooops, sorry it is included. The only issue is with files with author
and no license.

Comment 13 Orion Poplawski 2007-03-02 18:48:25 UTC
(In reply to comment #12)
> Ooops, sorry it is included. The only issue is with files with author
> and no license.

0.44 has been released with the licenses added.

http://www.cora.nwra.com/~orion/fedora/gocr.spec
http://www.cora.nwra.com/~orion/fedora/gocr-0.44-1.fc6.src.rpm




Comment 14 Patrice Dumas 2007-03-02 20:49:45 UTC
The way the license has been added to otsu.c is a bit dubious and 
looks like Ryan Dibble's copyright was taken away, since there is no
evidence that he transferred it. Anyway this was the only blocker,
so it is 

APPROVED.

Comment 15 Orion Poplawski 2007-03-02 20:54:45 UTC
Need initial import and FC-5/6 branches.

Comment 16 Warren Togami 2007-03-04 21:55:21 UTC
Set, but please use the template in the future as described on CVSAdminProcedure.

Comment 17 Bernard Johnson 2007-03-20 19:58:20 UTC
This was approved over two weeks ago and still is not imported.  Is it a dead
package?

Comment 18 Orion Poplawski 2007-03-21 16:09:20 UTC
Just got busy.  Checked in and built.

Comment 19 Jens Petersen 2007-03-22 03:23:38 UTC
If there is no CVS request then please do not change the fedora-cvs flag.

