Bug 520189
| Field | Value |
|---|---|
| Summary | yum should use LOW_SPEED_{LIMIT,TIMEOUT} for timeout |
| Product | [Fedora] Fedora |
| Component | python-urlgrabber |
| Version | 12 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | medium |
| Priority | low |
| Reporter | Mads Kiilerich <mads> |
| Assignee | James Antill <james.antill> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | ackistler, dant, ffesti, james.antill, jason, kdudka, khchanel, martin.nad89, maxamillion, pmatilai, tim.lauridsen, wolfgang.rupprecht |
| Target Milestone | --- |
| Target Release | --- |
| Doc Type | Bug Fix |
| Last Closed | 2010-04-29 20:59:12 UTC |
Description (Mads Kiilerich, 2009-08-28 20:00:00 UTC)
What version of python-urlgrabber?

python-urlgrabber-3.9.0-8.fc12.noarch

heh. Well, how do you know it wasn't going to take a nearly infinite amount of time? :) I'll see what I can do to make it less nutty, but it's not a super high priority.

I've seen this bug, too. yum-3.2.24-2, python-urlgrabber-3.9.0-8. It's more than a nutty ETA: yum stops downloading. It can happen at any point in a file, not just at the end. From that point on, the ETA just gets more and more spectacular. I've verified that there's no network traffic upstream of yum. I'm not convinced that it's the server's or network's fault. As the reporter noted, the only recovery is to kill yum, try again, and hope for something better.

So the download is slowing down and eventually stopping but never aborting, and you're thinking that urlgrabber is doing it and not your network connection?

(In reply to comment #5)
> So the download is slowing down and eventually stopping but never aborting and
> you're thinking that urlgrabber is doing it and not your network connection?

It's more accurate to say I haven't observed the occurrences carefully enough to exclude anything yet. So far the only data I have is that a ridiculous number of ETA digits means downloading has stopped.

Re comment #5: OK. There are two problems: the hanging download and the wild estimates. Right now the wild estimates clobber the progress output and make it harder to find out when the download is hanging. The estimates are so high that there must be either an almost-division-by-zero error or some other bug in the estimate calculation. For now I am fine with blaming the network connection.

I got it again:

```
(6/11): empathy-debuginfo-2.28.1.1-3.fc12.i686.rpm | 1.3 MB 00:06
(7/11): gcc-debuginfo-4.4.2-7.fc12.i686.rpm (19%) 12% [======     ] 0.0 B/s | 9.4 MB 2017959278923735059258845:52 ETA
(7/11): gcc-debuginfo-4.4.2-7.fc12.i686.rpm (19%) 12% [=====      ] 0.0 B/s | 9.4 MB 76257225264984816572947599078439867:44 ETA
```

But this time it was triggered by a restart of NetworkManager. I assume that caused the TCP connection to break, and apparently curl or yum got that wrong somehow.

Can anyone here routinely make this happen? If so, let me know - I need someone to test a minimum rate patch that will help.

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and the reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

It seems to me that this bug will be triggered when the network goes out during a download. That explains #8.

I think I tracked it down in libcurl-7.19.7-1.fc12.i686. The download with the broken TCP socket keeps spinning in libcurl every 1000 ms in the while loop at transfer.c line 1887, and the only way out is in line 1948 if Curl_socket_ready returns -1. But when the poll in select.c line 218 returns POLLERR, then Curl_socket_ready doesn't return -1 but something with CURL_CSELECT_ERR set. Another issue discussed in http://www.mail-archive.com/curl-library@cool.haxx.se/msg02450.html references http://lists.danga.com/pipermail/memcached/2003-October/000336.html, which perhaps also can explain this issue. I can imagine something like the following could solve it - but it is completely untested and I neither know nor understand the libcurl code.
```diff
--- /usr/src/debug/curl-7.19.7/lib/transfer.c	2009-09-27 23:37:24.000000000 +0200
+++ transfer.c	2009-11-28 01:02:54.000000000 +0100
@@ -1945,7 +1945,7 @@
     else
       timeout_ms = 1000;
 
-    switch (Curl_socket_ready(fd_read, fd_write, timeout_ms)) {
+    switch ((res = Curl_socket_ready(fd_read, fd_write, timeout_ms))) {
     case -1: /* select() error, stop reading */
 #ifdef EINTR
       /* The EINTR is not serious, and it seems you might get this more
@@ -1955,14 +1955,16 @@
 #endif
       return CURLE_RECV_ERROR;  /* indicate a network problem */
     case 0:  /* timeout */
+      break; /* loop to allow throttle fds to be selectable again */
     default: /* readable descriptors */
-
+      if (res & CURL_CSELECT_ERR)
+        return CURLE_RECV_ERROR; /* indicate a network problem */
       result = Curl_readwrite(conn, &done);
+      if(result)
+        return result;
       /* "done" signals to us if the transfer(s) are ready */
       break;
     }
-    if(result)
-      return result;
 
     first = FALSE; /* not the first lap anymore */
   }
```

Does this make sense? Should this issue be reassigned to curl? Still, I don't understand how this can cause the crazy estimates, but I think it is related.

Created attachment 375131 [details]: fixed several problems with the transfer progress meter (upstream patch)

Attached is a patch for the progress meter written by Daniel Stenberg. Please give it a try:

```
+Daniel Stenberg (4 Nov 2009)
+- I fixed several problems with the transfer progress meter. It showed the
+  wrong percentage for small files, most notable for <1000 bytes and could
+  easily end up showing more than 100% at the end. It also didn't show any
+  percentage, transfer size or estimated transfer times when transferring
+  less than 100 bytes.
```

Is that actually the case? Anyway, I don't understand what exactly this bug is about:

1) Is it about the broken progress meter, as the summary says?
2) Or is it about a hanging transfer, as stated in comment #0?
3) Do we have any curl-based minimal example?

Once I am able to reliably reproduce the behavior I am happy to review the patch and/or write another one. Thanks in advance for bringing some light to this!

> +Daniel Stenberg (4 Nov 2009)
> +- I fixed several problems with the transfer progress meter. It showed the
> +  wrong percentage for small files, most notable for <1000 bytes and could
> +  easily end up showing more than 100% at the end. It also didn't show any
> +  percentage, transfer size or estimated transfer times when transferring
> +  less than 100 bytes.
>
> Is that actually the case?

No. The percentage stays OK, so looking at the patch I cannot imagine that it makes any difference, so I haven't tried it. OK?

> Anyway, I don't understand what exactly this bug is about:
>
> 1) Is it about the broken progress meter, as the summary says?

It is only the ETA that is crazy, as the summary says. The percentage and progress meter are fine - but fixed, because ...

> 2) Or is it about a hanging transfer, as stated in comment #0?

Yes, the download has apparently stopped, but the ETA keeps increasing. (Obviously the ETA should neither decrease nor stay constant, but getting so high doesn't make sense.)

> 3) Do we have any curl-based minimal example?

No, sorry. I am just a random yum user who noticed the problem - and attached gdb to a failing yum and tried to draw conclusions without knowing anything.

> Once I am able to reliably reproduce the behavior I am happy to review the
> patch and/or write another one.

I can't reproduce what I saw the other day.
But please try to follow my reasoning: select.c Curl_socket_ready() can return CURL_CSELECT_ERR according to the docstring, and that is an error situation which should stop the download, but that error situation is not handled by transfer.c Transfer(), and it keeps spinning forever. I am pretty sure that is what happened, but I cannot rule out that I might have been tricked by gdb on optimized code...

However, here is something which seems to come pretty close and which might have happened when I have been using a flaky wireless network. Start a yum download:

```
yumdownloader wesnoth-data
```

and wait for it to start downloading. While it is downloading, close the connection on the server side without notifying the client side:

```
iptables -A INPUT -p tcp --sport 80 -j REJECT --reject-with tcp-reset
```

The client now sits waiting forever, and the ETA starts increasing and gets crazy while the download rate approaches 0. Once the server has dropped the TCP connection, the reject rule can be removed:

```
iptables -D INPUT -p tcp --sport 80 -j REJECT --reject-with tcp-reset
```

I can see that in some cases this may be how some would like curl to work, but in yum's case the connection should be dropped and it should try another mirror. I don't know if libcurl (and whatever is in the path between yum and curl) has a good way to set a passiveness timeout for a download, or if the caller (yum) should detect the situation through the progress callback?

(In reply to comment #14)
> Yes, the download has apparently stopped, but the ETA keeps increasing.
> (Obviously the ETA should neither decrease nor stay constant, but getting
> so high doesn't make sense.)

If the transfer hangs, the ETA grows. It does not sound like a bug to me. Or are you saying the transfer hangs in a case where it should not?

> No, sorry. I am just a random yum user who noticed the problem - and attached
> gdb to a failing yum and tried to draw conclusions without knowing anything.

Great! Then attach the backtrace please.

> ... but I cannot
> rule out that I might have been tricked by gdb on optimized code...

Are you able to recompile libcurl without optimization? Should I prepare such a build for you?

> ...
> iptables -D INPUT -p tcp --sport 80 -j REJECT --reject-with tcp-reset

Thanks! I'll try it myself. What are you actually expecting curl to do in this case?

> I can see that in some cases this may be how some would like curl to work,
> but in yum's case the connection should be dropped and it should try another
> mirror. I don't know if libcurl (and whatever is in the path between yum and
> curl) has a good way to set a passiveness timeout for a download, or if the
> caller (yum) should detect the situation through the progress callback?

If you don't want to wait indefinitely for the connection to become ready, I think setting a timeout is the way to go.

Trying to understand and answer the questions, I read more of the code, and now I see that the status from Curl_socket_ready is intentionally ignored and the real error handling happens in Curl_readwrite. So my observations _must_ have been wrong, and what I saw was probably just the case I reproduced in #14. Curl was spinning (slowly, once a second) on a stalled connection, not a closed connection. So I conclude that curl does what it is told to do, but that yum either should cancel a stalled download from the progress callback or should have set a timeout. This issue should thus be sent back to yum. Do you agree?
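As an aside, the progress-callback option mentioned above can be sketched in a few lines of pycurl. This is a minimal, untested illustration, not urlgrabber's actual code; the URL, file name, and 30-second threshold are placeholders. Returning a non-zero value from the progress callback makes libcurl abort the transfer:

```python
import time
import pycurl

STALL_SECONDS = 30  # illustrative threshold, not a yum/urlgrabber default

class StallDetector:
    """Abort the transfer if no new bytes arrive for STALL_SECONDS."""
    def __init__(self):
        self.last_progress = time.time()
        self.last_downloaded = 0

    def __call__(self, download_total, downloaded, upload_total, uploaded):
        now = time.time()
        if downloaded > self.last_downloaded:
            self.last_downloaded = downloaded
            self.last_progress = now
        elif now - self.last_progress > STALL_SECONDS:
            return 1  # non-zero return aborts the transfer
        return 0

curl = pycurl.Curl()
curl.setopt(pycurl.URL, "http://example.org/some.rpm")   # placeholder URL
curl.setopt(pycurl.WRITEDATA, open("some.rpm", "wb"))
curl.setopt(pycurl.NOPROGRESS, 0)                        # enable the progress callback
curl.setopt(pycurl.PROGRESSFUNCTION, StallDetector())
try:
    curl.perform()
except pycurl.error as exc:
    print("transfer aborted or failed:", exc)            # a stall ends up here
finally:
    curl.close()
```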
(In reply to comment #13)
> Once I am able to reliably reproduce the behavior I am happy to review the
> patch and/or write another one.

Try to unplug the network during a transfer.

Now I hopefully see your point. It just hangs too long on a dead connection. That's exactly what CURLOPT_TIMEOUT is for. I completely agree this is a bug in yum, reassigning back. Let me know if you need some additional information.

Comment on attachment 375131 [details] (fixed several problems with the transfer progress meter (upstream patch)):

The proposed patch does not fix the reported problem. The bug has to be fixed within yum.
We USED to set curlopt_timeout - but when we do, curl aborts ANY download which takes longer than curlopt_timeout.

Any ideas?

(In reply to comment #20)
> We USED to set curlopt_timeout - but when we do, curl aborts ANY download
> which takes longer than curlopt_timeout.
>
> Any ideas?

No problem. You can of course resume the transfer, and thus don't have to download the already downloaded part again. That's IMO the most common way yum-like tools usually work. You can even download different parts from different mirrors and finally only check their size and hash, and eventually move back to step zero ;-)

No, you don't understand. If I set curlopt_timeout in python to, say, 300s, I would assume that means that if the download stalls for more than 300s then it times out. What happens is: if the download is actively downloading data, but the download takes more than 300s to come down, then the whole download aborts.

Which makes no sense at all.

For a bit more info: https://bugzilla.redhat.com/show_bug.cgi?id=515497

(In reply to comment #22)
> If I set curlopt_timeout in python to, say, 300s, I would assume that means
> that if the download stalls for more than 300s then it times out.

That's only your wrong assumption, not a curl bug. Please read the documentation properly: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTTIMEOUT

> What happens is: if the download is actively downloading data, but the
> download takes more than 300s to come down, then the whole download aborts.
>
> Which makes no sense at all.

This ^^^ is the documented behavior. You are free to implement your own heuristic to abort (or not abort) the transfer at the application level, but I can't see your point. We have well-tested (and widely used) network protocols, and you are going to come up with something tricky which solves all problems caused by an unreliable network? This drives me crazy :-D

(In reply to comment #23)
> For a bit more info:
> https://bugzilla.redhat.com/show_bug.cgi?id=515497

IMO the approach described in comment #21 solves it better than what you have (probably) done.

I understand the docs. The problem is that what curl calls a timeout and what python socket used for a timeout are not the same thing. python socket was saying 'if the socket is open but nothing is going on after N seconds, abort'; curl is saying 'if the socket is open, AT ALL, for more than N seconds, abort'. I had not done it yet, but was planning on implementing a minimum speed using: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTLOWSPEEDLIMIT

(In reply to comment #25)
> curl is saying 'if the socket is open, AT ALL, for more than N seconds, abort'.

Plus you can set the connection timeout separately, which usually makes sense. It has been broken for a long time because of the migration to NSS, but it's slowly starting to work ;-)

> I had not done it yet, but was planning on implementing a minimum speed using:
> http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTLOWSPEEDLIMIT

Sure, go ahead and try to set it. That's maybe what you are looking for, though I've never used the option myself. Nevertheless, consider also transfer resuming if it is not implemented already. It can be pretty annoying to download an RPM of e.g. OpenOffice several times on a broken network...

Connection restarting is already implemented. urlgrabber has done it since just about forever.

Seth, I doubt that'll work well as a replacement for a timeout. Can we reset the timeout for the curl object in the middle of a callback? That seems like the best fix, if it works.
No, you can't touch curlopts after perform() has been called.

(In reply to comment #27)
> Connection restarting is already implemented.

However, I am talking about transfer *resuming*, not connection restarting.

(In reply to comment #28)
> Can we reset the timeout for the curl object in the middle of a callback?

http://permalink.gmane.org/gmane.comp.web.curl.library/24861

Generally you can't rely on anything beyond the documented cURL API: http://curl.haxx.se/libcurl/c

(In reply to comment #30)
> (In reply to comment #27)
> > Connection restarting is already implemented.
>
> However, I am talking about transfer *resuming*, not connection restarting.

I am too, I misspoke. We resume using byte-ranges.

> (In reply to comment #28)
> > Can we reset the timeout for the curl object in the middle of a callback?
>
> http://permalink.gmane.org/gmane.comp.web.curl.library/24861

Curiously, I ran into the items mentioned in that email when porting urlgrabber to pycurl. Specifically, there was no way to hand back a sensible progress callback that included a total expected size unless you parsed/accessed the header yourself.

If there are better ways of doing this I'm all ears. I've found the python bindings somewhat frustrating. They aren't hard to understand, but it is hard to know which way is 'better' or 'suggested'.
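To make the timeout distinction discussed above concrete, here is a minimal, untested pycurl sketch (the URL, file name, and the 30 s/300 s values are only examples, not yum defaults). CURLOPT_TIMEOUT caps the total transfer time regardless of progress, while CURLOPT_LOW_SPEED_LIMIT/CURLOPT_LOW_SPEED_TIME only abort when the transfer rate stays below a threshold for a given period, which is the behavior the grabber.py change further down relies on:

```python
import pycurl

curl = pycurl.Curl()
curl.setopt(pycurl.URL, "http://example.org/big.rpm")   # placeholder URL
curl.setopt(pycurl.WRITEDATA, open("big.rpm", "wb"))

# Connection setup may take at most 30 s.
curl.setopt(pycurl.CONNECTTIMEOUT, 30)

# Option A (the curlopt_timeout approach described above): the ENTIRE transfer
# may take at most 300 s, so a healthy but slow download of a large package
# gets killed as well.
# curl.setopt(pycurl.TIMEOUT, 300)

# Option B (stall detection): abort only if the transfer rate stays below
# 1 byte/s for 300 consecutive seconds, i.e. the transfer has effectively died.
curl.setopt(pycurl.LOW_SPEED_LIMIT, 1)
curl.setopt(pycurl.LOW_SPEED_TIME, 300)

try:
    curl.perform()
except pycurl.error as exc:
    # With option B, a stalled transfer ends up here (typically
    # CURLE_OPERATION_TIMEDOUT), so the caller can retry another mirror.
    print("download failed:", exc)
finally:
    curl.close()
```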
(In reply to comment #31)
> We resume using byte-ranges.

Then it might work fairly well even with the fixed timeout set.

> Curiously, I ran into the items mentioned in that email when porting urlgrabber
> to pycurl. Specifically, there was no way to hand back a sensible progress
> callback that included a total expected size unless you parsed/accessed the
> header yourself.
>
> If there are better ways of doing this I'm all ears. I've found the python
> bindings somewhat frustrating. They aren't hard to understand, but it is hard
> to know which way is 'better' or 'suggested'.

http://curl.haxx.se/libcurl/c/curlgtk.html

Are you saying it's not possible to write the same using pycurl?

Part of the requirement of the port from urllib to pycurl for urlgrabber was to do so without changing the urlgrabber interface. With urllib I could urlopen the URL and get back the header info to do urlgrabber's progress object setup.

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=blob;f=urlgrabber/grabber.py;h=0023fedbd99c8b90147c58204a9b9d9fcdf35c8f;hb=e86d27a4a7a72a8832ad4e1e63996ed8ac616621#l1039

That's where the urlgrabber code using pycurl starts. If you have the time I'd be happy to get some feedback on ways to improve things (but maybe off this bug).

(In reply to comment #33)
> That's where the urlgrabber code using pycurl starts. If you have the time I'd
> be happy to get some feedback on ways to improve things (but maybe off this
> bug).

Sure. The best place to discuss this is the curl-library mailing list: http://cool.haxx.se/mailman/listinfo/curl-library

Most of the libcurl hackers hang around there, and the response time is mostly close to zero. pycurl has (probably) its own community, but I don't think your problem is somehow python specific.

I will subscribe to the list again. I unsubscribed after a couple of days of extremely disgusting spam.

FWIW, this seems to do what I would like:

```diff
--- grabber.py.org	2009-12-04 01:13:16.000000000 +0100
+++ /usr/lib/python2.6/site-packages/urlgrabber/grabber.py	2009-12-04 01:20:44.000000000 +0100
@@ -1170,10 +1170,11 @@
         self.curl_obj.setopt(pycurl.MAXREDIRS, 5)
 
         # timeouts
-        timeout = 300
         if opts.timeout:
             timeout = int(opts.timeout)
             self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout)
+            self.curl_obj.setopt(pycurl.LOW_SPEED_LIMIT, 1)
+            self.curl_obj.setopt(pycurl.LOW_SPEED_TIME, timeout)
 
         # ssl options
         if self.scheme == 'https':
```

Even more FWIW, I think that the 30 s used by yum is a bit high - I think 10 s would be more appropriate.

BTW, I noticed, and wonder if it is intentional, that many of the initial downloads made by yum don't use a timeout at all.

*** Bug 539563 has been marked as a duplicate of this bug. ***

I am having this problem. I have updated to the latest and this problem is still occurring for me. I find that every time I run yum, the network connection gets dropped, and it is more than random. I have tried two different NICs and it has no effect; the problem still remains. It is very difficult to do updates and installs, and success is based on luck after repeated retries.

```
# uname -r
2.6.31.12-174.2.22.fc12.i686
# rpm -qa | grep yum
anaconda-yum-plugins-1.0-5.fc12.noarch
PackageKit-yum-0.5.6-1.fc12.i686
yum-3.2.25-1.fc12.noarch
PackageKit-yum-plugin-0.5.6-1.fc12.i686
yum-plugin-fastestmirror-1.1.26-1.fc12.noarch
yum-utils-1.1.26-1.fc12.noarch
yum-presto-0.6.2-1.fc12.noarch
yum-metadata-parser-1.1.2-14.fc12.i686
```

I have tried removing each of the yum plugins and it seems to have no effect. If there is any information I can provide to help resolve this issue, please let me know.

(In reply to comment #38)
> I am having this problem. I have updated to the
> latest and this problem is still occurring for me.

Thank you for the heads up! Please define 'this problem'. AFAIK this bug is only about the missing timeout in yum downloads. Is your problem specific to yum? Do other network transfers work fine? Have you tried the curl(1) tool?

It seems to be specific to Yum (and its components) and the associated 'Update Software' & 'Add/New software' sort of thing. What I noticed is that on my latest, minimally installed OS, yum installs/updates are disconnecting quite often, and so I gave up trying. Note, however, that I previously installed an F12 several weeks ago (in a different partition), fully installed, and I don't recall yum acting up this badly, but I do recall it was not smooth - because normally I use 'Software Update' and 'Add/Remove Software', and it did hang, so I finished off the rest by using yum directly, but not without hang problems. It seemed at the time it was just a simple annoyance, it wasn't THAT bad, but somehow my mind was set on getting a working F12!

I have gkrellm installed and I can see an immediately dropped network connection from which Yum spins its wheels. Most of the time, the ETA spins up fast and ends up as "Infinite", and other times it simply shows --:--. But in all cases, the transfer rate incrementally drops to 0 B. It seems to be random, but it always breaks given a long enough file list. Strangely enough, I have seen a hang by doing a 'yum clean all' followed by 'yum update', with a hang on attempts to download the repo databases!
I seem to recall that at least with previous releases of yum (F9/10/11) there was a built-in network timeout mechanism that would drop the mirror and try another mirror, and not once have I seen this behaviour with F12's yum program. It seems like the "robustness code" was removed or is prevented from kicking in?

I have pulled the network cable out to see how yum responds, and sure enough - it hangs. Dunno, this is just an observation.

I have no idea what curl(1) is, but perhaps you can tell me what I can do to nail this problem down?

(In reply to comment #40)
> I seem to recall that at least with previous releases of yum (F9/10/11)
> there was a built-in network timeout mechanism that would drop the mirror
> and try another mirror, and not once have I seen this behaviour with F12's
> yum program. It seems like the "robustness code" was removed or is prevented
> from kicking in?

+1 for allowing the timeout in yum. From the comments above you can see it's already on my wish-list. Has anybody at least considered the solution from comment #36?

> I have pulled the network cable out to see how yum responds, and sure
> enough - it hangs. Dunno, this is just an observation.

That's unfortunate.

> I have no idea what curl(1) is, but perhaps you can tell me what I can do to
> nail this problem down?

It's a tool for downloading/uploading content using various network protocols. It uses libcurl, as yum indirectly does. So you can try to use it to download the remote stuff directly and compare the behavior.

OK, I have done what you asked me to do with curl, and soon after running curl, the network connection is dropped. curl behaves in a similar way to yum.

```
# curl -LO 'http://mirror.uoregon.edu/fedora/linux/releases/12/Everything/i386/os/Packages/kdelibs-apidocs-4.3.2-4.fc12.noarch.rpm'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  3  242M    3 9027k    0     0   103k      0  0:39:56  0:01:27  0:38:29     0
```

In the above, the 'Average Dload' keeps dropping until 0 is reached. It seems clear to me that somehow there is no recovery when the network is dropped, at least from curl's or yum's standpoint. I mean, my network is working for everything else as far as I can tell.

Additionally, I tried using wget to copy all the Fedora packages over sequentially, and the network connection gets dropped.

```
# wget -nc -r 'http://mirror.uoregon.edu/fedora/linux/releases/12/Everything/i386/os/Packages'
[...]
--2010-02-28 10:23:15-- http://mirror.uoregon.edu/fedora/linux/releases/12/Everything/i386/os/Packages/CodeAnalyst-gui-2.8.54-19.fc12.i686.rpm
Connecting to mirror.uoregon.edu|128.223.157.9|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7852468 (7.5M) [application/x-rpm]
Saving to: “mirror.uoregon.edu/fedora/linux/releases/12/Everything/i386/os/Packages/CodeAnalyst-gui-2.8.54-19.fc12.i686.rpm”

67% [=========================>                ] 5,270,454   --.-K/s  eta 45s
```

(In reply to comment #42)
> In the above, the 'Average Dload' keeps dropping until 0 is reached. It seems
> clear to me that somehow there is no recovery when the network is dropped,
> at least from curl's or yum's standpoint. I mean, my network is working for
> everything else as far as I can tell.

Well, curl probably can't fix your unreliable network :-) But as for the "recovery" you wanted, something like that is indeed there. You want to play with --max-time, --retry, --retry-delay, --retry-max-time, etc.

...but the existence of curl command-line arguments doesn't help with yum.
I am only saying libcurl has the ability. It's the yum/urlgrabber team's turn now to apply the mentioned 3-line patch or so :-)

Seth said that urlgrabber supported even transfer resuming. It would be great to bring it to reality and enable it in yum.

Interesting. Thanks for that comment above. I have switched over to F11 (on the same system) and noticed that with the wget command line, the network connection is dropped, but after some delay of several seconds to several minutes, the connection is retried, picks up where it was dropped (resumes), and continues on. This was the behaviour I was expecting. I am not seeing this with F12. With F12, the dropped network connection is not timing out, nor is it retried, so it hangs.

Keep in mind that since I have tried curl/wget and see the same "hang" problem, I wonder if it is more than just a yum issue?

(In reply to comment #47)
> Keep in mind that since I have tried curl/wget and see the same "hang"
> problem, I wonder if it is more than just a yum issue?

The networking problem itself can't be a bug in yum. Nevertheless, we may improve it to work better in case of an unreliable network.

For downloading packages yum already uses regets. If it is not regetting then it is possible the mirror we're talking to doesn't support byte-ranges.

I'm not sure what bug we're dealing with here anymore with all the noise of the last few days.

Noise is noise. The core of the issue is IMHO (and with some local authority, being the reporter) that the yum/rpm mirroring system builds on the sound "don't scale up - scale out" mantra and utilizes a lot of unreliable servers with limited bandwidth instead of one central resource. Yum as the client thus has to fail over seamlessly (if possible) whenever any download fails or hangs or misbehaves in any way. The user experience is currently that yum isn't good enough at that.

One specific problem and solution has been pointed out: if an rpm download stalls (for example because of temporary network problems), then it sometimes hangs forever and neither fails nor fails over. Setting LOW_SPEED_LIMIT and LOW_SPEED_TIME seems to solve this specific problem.

Ok, I just put the patch from comment #36 into upstream.

Note that it seemed to me like yum calls curl from several places (probably for downloading different kinds of metadata) and uses different timeout settings in different places. The other places probably also need fixing - or a general abstraction layer.

Well... then there is the question of why yum/rpm behaves correctly (with failover) on F11, but not on F12, on the same hardware? My HW network setup has not changed in the last couple of years... This is what stymies me... While the patch is good practice, it does not explain why F11 works and F12 does not, unless the timeout code was dropped?
On F11, the latest I have installed:

```
# rpm -qa | grep yum
anaconda-yum-plugins-1.0-4.fc11.noarch
PackageKit-yum-plugin-0.4.9-1.fc11.i586
yum-presto-0.6.2-1.fc11.noarch
yum-utils-1.1.23-1.fc11.noarch
yum-3.2.24-2.fc11.noarch
yum-arch-2.2.2-8.fc11.noarch
yum-metadata-parser-1.1.2-12.fc11.i586
PackageKit-yum-0.4.9-1.fc11.i586
yum-plugin-protect-packages-1.1.23-1.fc11.noarch
yum-plugin-fastestmirror-1.1.23-1.fc11.noarch
yum-updatesd-0.9-2.fc11.noarch
# yum whatprovides */grabber.py
PyQt4-devel-4.7-1.fc11.i586 : Files needed to build other bindings based on Qt4
Repo        : updates
Matched from:
Filename    : /usr/share/doc/PyQt4-devel-4.7/examples/opengl/grabber.py

yum-arch-2.2.2-8.fc11.noarch : Extract headers from rpm in a old yum repository
Repo        : updates
Matched from:
Filename    : /usr/share/yum-arch/urlgrabber/grabber.py

python-urlgrabber-3.0.0-15.fc11.noarch : A high-level cross-protocol url-grabber
Repo        : installed
Matched from:
Filename    : /usr/lib/python2.6/site-packages/urlgrabber/grabber.py
```

(In reply to comment #53)
> While the patch is good practice, it does not explain why F11 works and F12
> does not, unless the timeout code was dropped?

```
$ rpm -q --changelog python-urlgrabber-3.9.1-4.fc12.noarch | grep -A1 3.9.0-1
* Thu Jul 30 2009 Seth Vidal <skvidal at fedoraproject.org> - 3.9.0-1
- new version - curl-based
```

In reply to comment #52, I think where the change is done now (in urlgrabber) should affect all code paths from yum.

If you want to test it, and have F12, I think this should apply cleanly: http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commitdiff;h=8e57ad3fbf14c55434eab5c04c4e00ba4f5986f9

Ok, thanks for supplying the code fix for testing. I have manually added the changes to grabber.py and I have checked out:

+ rpm
+ yum
+ Add/Remove Software
+ Update Software

In all the above apps there is at least one or more network disconnects per app, but in every case the disconnect retries work, although it can take seconds to minutes between drops and retries. I have not seen a complete retry failure. In yum, there would visually appear a "frozen state" for seconds to minutes, followed by a message indicating a "mirror switch", followed by a retry on the file download and curses text status. Overall, connection retries work much better, at least for the above apps.

However, I would like to mention that with F8/9/11 I have never seen these network disconnects/retries. I suspect that there is an underlying problem causing these disconnects in the first place. This ought to be looked into.

Outside the scope of this bug, but to comment:

+ 'Add/Remove Software' could be improved to show more status/activity than a simple "bouncing download icon". When hung, the "bouncing download icon" implied it was still working. Perhaps additional info/status should be shown as to the file count/total being worked on, or something similar to the app below. Need more visual feedback so that one can estimate how long downloads might take, if it is working at all.
+ 'Update Software' has better statistics reporting; however, it has strange activity, "jumping around" as to the file being worked on instead of displaying sequential activity. As it is, I was starting to get vertigo just by watching it. :P

Dan: It looks like you have done some impressive testing. I haven't. Thank you!

I think that your test supports my unsubstantiated claim that other code paths don't apply any limits. IIRC there was a 5 minute timeout in some places.

I would suggest configuring a timeout value of 10 seconds in yum.conf.
30 seconds is a long time to wait for a failover. IMHO it would make sense to change the default.

So, are you suggesting that I put the following under [main] in /etc/yum.conf: timeout=10 Please advise.

Yes, I think timeout=10 is better. But it is a matter of personal taste and preference - not something that will make a huge difference or make things work. So use whatever you want - and if you think it makes a big difference then suggest to the maintainers that the default be changed ;-)

This should be fixed upstream and in rawhide.
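As a footnote to the transfer-resuming ("reget") discussion above, here is a minimal, untested pycurl sketch of byte-range resuming combined with the stall timeout. This is only an illustration of the idea, not urlgrabber's actual reget code; the function name, URL, file name, and the 30 s value are made up:

```python
import os
import pycurl

def fetch_with_resume(url, filename):
    """Download url to filename, continuing from any partial file on disk."""
    resume_from = os.path.getsize(filename) if os.path.exists(filename) else 0
    with open(filename, "ab") as out:          # append to the partial file
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.WRITEDATA, out)
        if resume_from:
            # Ask the server for the remaining bytes via an HTTP Range request;
            # mirrors that don't support byte-ranges will not honor this.
            curl.setopt(pycurl.RESUME_FROM, resume_from)
        # Stall detection as discussed above: give up if the rate stays below
        # 1 byte/s for 30 s, so the caller can retry or switch mirrors.
        curl.setopt(pycurl.LOW_SPEED_LIMIT, 1)
        curl.setopt(pycurl.LOW_SPEED_TIME, 30)
        try:
            curl.perform()
        finally:
            curl.close()

# Example (placeholder URL): retrying after a failure continues the download.
# fetch_with_resume("http://example.org/big.rpm", "big.rpm")
```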