IRCaBot 2.1.0
GPLv3 © acetone, 2021-2022
#i2p-dev
/2022/05/10
RN does I2P send the message "Maximum number of open connections reached."
RN I'm trying to locate a clog in my plumbing that comes up intermittantly and could be from other things than I2P
RN I get that message in my browser, sometimes in place of the page I am trying to load, in one instance in a sub frame of the page.
RN just need to figure out if this is coming from OS or privoxy, or I2P so I can increase the offending limit as the machine is mostly sitting there with almost no load
RN I know it is not the other end of the link, since it happens on different things
RN I've noticed this on planet, zzz, fishsticks and others
RN fishsticks is where the message comes up in a sub block of the page
dr|z3d That's most likely a privoxy originated message, RN.
dr|z3d I2P will never inject a message into a 3rd party web page.
RN thanks dr|z3d. I'll investigate privoxy. Though, when I get these messages I'm not loading a bunch of tabs or anything....
dr|z3d check ulimit -n for the user account that's running i2p.
RN not sure I would describe it as injecting though, as I just get that message instead of the page
RN other than fishsticks
RN there I get the page, with the links to the onion version and such, but where the chat part goes (js required) is where that pops up
RN so privoxy does make sense
RN 7028?
RN if it is a privoxy thing, then the number of particiating tunnels should not matter
dr|z3d if privoxy's hitting max open file limits, then participating tunnels can play a part in that.
RN I tried googling privoxy with that message but didn't find anything helpfull...
RN mmm
RN ok
dr|z3d check the link I just posted.
RN trying to envision how that works
RN I am
dr|z3d also, try ulimit -n 100000 and see if you can reproduce the error.
RN I did bump up the kernel limit
RN hmmm
RN still reading
dr|z3d that's a temp fix, you'll want to edit /etc/security/limits.conf or whatever your os uses to permanently configure open file limits.
RN seems I have privoxy log disabled...
RN probably for hardening reasons
RN still reading
dr|z3d might require a reboot. but you'll soon find out if you're testing web browsing.
RN none of the commands in that post work
RN not sure where to look next, think I'm gonna burry my head in the sand and see how long I can hold my breath
RN LOL
RN yeah, I bumped the os limits a few reboots ago, so those are in effect
RN I googled it before, but lemme see...
RN looks like same reults... my search continues. thanks for the consideration.
dr|z3d did you check privoxy's config limits?
RN I'm sure I did... don't recall where.
RN max client connections are unlimitied
RN hmmm
RN just the buffer, which I think I already had above the default, but I doubled it
RN see if that helps
mesh_ zzz: are the properties in ConnectOptions part of the public interface?
mesh zzz: are the ConnectOptions property part of the public interface?
mesh ow the router recovered
mesh zzz: awake?
dr|z3d mesh: he's in hiding. from you. :)
dr|z3d and me too, apparently :)
zzz nope
zzz mesh, you have any luck searching for apache usage?
dr|z3d he's updated his bug report, zzz. I think that's what he wanted to bring to your attention.
zzz he was asking about streaming options
dr|z3d ah, right. well he's updated his report, too.
zzz mesh, the supported streaming options are documented here i2p-projekt.i2p/en/docs/api/streaming
dr|z3d did you see my comments on a url filter for the http tunnel?
zzz yes I believe you proposed it before but don't remember what I said about it
dr|z3d I probably mentioned it briefly before, but I don't recall us having a conversation.
dr|z3d I thought it might interest you, not least because it offers a way to short circuit web spiders.
zzz you have some candidate url list? how many?
dr|z3d I don't have a compiled list, but based on various web logs, I can see probably around 10-20 urls that are getting hit on a repeated basis, and it looks like whoever's doing the crawling is using multiple dests in an attempt to fly under the radar.
zzz nothing new at all, but you say it's getting worse?
dr|z3d yeah, more and more frequent based on observation. here's a small sample of urls:
dr|z3d > /products
dr|z3d > /products/cat
dr|z3d > /wp-users/
dr|z3d > /wp-admin/
dr|z3d > /wp-uploads/
dr|z3d I can virtually guarantee if you review your web logs you'll see those, repeatedly.
dr|z3d that is, assuming you're not suppressing 404s.
zzz sure
zzz so you propose to perma-ban by default after one hit?
dr|z3d to block a webspider, you'd just add an <a href=/"foo" hidden>foo</a> to page(s) and spiders be gone.
dr|z3d for urls you're not hosting, sure, 1 hit and ban, though that may be user-configurable.
zzz this would probably kill all the inproxies for everybody. is that good or bad?
dr|z3d I'm envisioning something along the lines of the whitelist/blacklist in the tunnel manager in terms of implementation.
dr|z3d how would it kill inproxies?
zzz one hit via an inproxy, and the inproxy b32 gets banned
dr|z3d true. so maybe a time limit on the ban as per throttles.
zzz would these need to be exact url matches, or prefixes, or regex patterns?
dr|z3d but you raise an interesting issue.. inproxies could get around that by pre-empting the url requests.
dr|z3d I think we could get away with exact matches or very simple regex of the form ^/foo.*?$ or maybe even simpler.
zzz persisted or not?
dr|z3d checkbox option, dests saved to text file if true, as per tunnel filtering?
dr|z3d maybe with a max limit on dests in the file so older dests get cleaned out.
dr|z3d to prevent bloat.
zzz zab's access filter doesn't offer filtering based on http request line or headers
dr|z3d at it's simplest, regex could just be of the form ^/string or string$
dr|z3d indeed it doesn't. but it does offer persistence.
zzz could be done at streaming layer but it has limited knowledge of http, and the global blacklist there is of limited size
dr|z3d and it might be the easiest way to implement url filtering, by extending it.
zzz because it's just a router.config property
zzz if it's going to turn into a cat/mouse game (like just adding ? at the end) then a simple solution will quickly fail
dr|z3d true. so maybe a simple wildcard in addition to ^ and $
zzz however if it's just dumb crawlers coming thru the inproxy that don't even know anything about i2p, a simple solution is sufficient
dr|z3d sure, for crawlers, they'll likely just be hitting the sites and spidering whatever's there, so they're unlikely to adapt. for the vuln skiddies, a more fine-grained appraoch may be necessary in the medium term.
eyedeekay Slightly off topic but related, do the crawlers have a common user-agent?
dr|z3d maybe url and query are separated, so you can differentiate, similar to how nginx does with $request_url and $request_uri
dr|z3d eyedeekay: yeah. they do. common to the entire network :)
dr|z3d MY/OB 6.66
eyedeekay OK. I know somebody who will show as `Go-HTTP-Client` but I think that crawler has only run once so far
eyedeekay SAM crawler, no user-agent rewrite
dr|z3d if they're using the inproxy, they'll be using the generic UA provided by the inproxy.
zzz unless inproxy disables that (which they should)
dr|z3d zzz: httpserver not sufficient to grep for urls?
dr|z3d > [HTTPServer] Received Request headers
dr|z3d Request: GET /hosts.txt HTTP/1.1
zzz it could go there, may or may not be the best place
zzz because all the blocking right now is down in streaming
dr|z3d sure, but streaming blocking is globally applicable, not just for http. but whatever seems best.
zzz if you want to persist it, and it needs to be flexible, filter feels like the right place
zzz but it will need enhancements
dr|z3d yeah, I was thinking the filter might be a good place for it, with a gui along the lines of the blacklist. the filter as it stands needs work to make it more accessible to average user.
dr|z3d if we're going to roll with this, I can put together a prototype ui demo.
dr|z3d something the embraces both the url filtering and adds a ui for the existing filter options.
dr|z3d I should qualify that. the UI would probably take on characteristics of both the blacklist/whitelist and the throttler.
dr|z3d and thinking about it, it might be a good time to consider merging the functionality of the throttler with the filter, if we can basically provide the same functionality.
zzz zlatinb, thoughts on extending access filter to pass the fetch URL? (and maybe method and/or all the headers too) ?
zlatinb RN suggested something along those lines a while ago; I think it should be a separate filter or maybe even a plugin
zlatinb the access filter config format won't work well with urls/headers
zlatinb also, in its current form the access filter can be used for any type of server tunnel
zlatinb whereas URLs/headers are specific to http tunnels
zlatinb but overall yes, good idea
zzz public boolean allowDestination(Destination d);
zzz basically just need another arg
zlatinb other arg would be very specific to http traffic
zzz I forget, do you specify the filter class name so people can load their own?
zlatinb iirc yes but not sure
zlatinb or you call setFilter
zlatinb but definitely possible for ppl to use custom filters
zlatinb I was thinking of using one for muwire at some point
zzz the idea is to perma-ban the wordpress spiders
zlatinb yes I'm all for that lol
zzz filter feels like the right place to me, but streaming and i2ptunnelhttpserver are alternatives
zlatinb I'd vote for i2ptunnelhttpserver
zlatinb and call it HTTPFilter
zzz the issues are the url ruleset could get large or complex, and if there's persistence the banlist could get large also
zlatinb well yeah
dr|z3d banlist could be limited to a max number of entries, where the oldest get deleted.
dr|z3d (a la round robin)
zzz then you have to track last-seen and it gets more complex
dr|z3d do you? or could you just add the dests in the order they're found, and periodically dedupe?
dr|z3d for periodically read "when the maximum limit is almost reached"
dr|z3d or on a schedule, whatever works.
zzz all is possible, the question is how much is necessary
zlatinb there's no limit to the recorders in the current access filter, but could be added and made configurable
dr|z3d sure.. we could simplify things and forgo the persistence, that's another option, though I do like the idea of an external log of offending dests and a rollover option.
dr|z3d as for filters, if dests were timestamped they could expire after a configured period, and/or when the recorder file reaches a certain size.
zzz I see a fleet of 4 b32's acting simultaneously
dr|z3d there you go.. :)
zzz thats an odd setup but I guess indicates in-i2p?
dr|z3d inproxy access? not sure. it could just indicate that whoever's doing the vuln scanning is rotating the http proxies to evade detection or make blocks harder.
dr|z3d or you mean in-i2p == !inproxy access. my vote's on that.
zzz my filter never used to work but now it does
zzz zlatinb, is that the work you did with RN?
dr|z3d I think that got fixed a while back.
zzz may have to take another look at filters then. I never could figure them out, maybe that was why
zlatinb zzz: yes pretty much
zzz guess I left one in because it's complaining in the logs now
zzz holy crap 2447 entries in the block file
dr|z3d that's the bit about timestamps and or limiting the size.
zzz to do this right we'd need to pass the URL to some http flavor of filter and put URL regexes in the definition file
dr|z3d *nods*
dr|z3d and ideally provide a front end in the tunnel manager to allow easy editing and reviewing of existing filters.
zzz the filter would load all the regexes and compile it as one_big_java_pattern /this|/that|/foo*|...
mesh zzz: I updated the issue with uses of org.apache.http. Did you see?
zzz no mesh what did you find?
dr|z3d sounds good, zzz. a single pattern will be a lot faster to process.
zzz maybe
zzz I'm just thinking the patterns can't be hardcoded, e.g. somebody may want to use wp-foo for remote admin
dr|z3d definitely not hard-coded, no. user-configurable is the way forward.
dr|z3d some sane defaults would be ok, though.
dr|z3d and /wp-uploads is a legit url if you're actually running a wp instance.
zzz as zlatinb points out the current filtering, while the code is in i2ptunnel, actually gets done in streaming, and streaming knows nothing about http
dr|z3d sure, it's protocol agnostic. unlike httpserver
zzz last call for string changes, I'm going to push to tx shortly
mesh zzz: I updated git.idk.i2p/i2p-hackers/i2p.i2p/-/issues/353 . the org.apache.http code is really only used in one place, the i2p.i2p repo in just 2 classes
zzz looks like its in the i2pcontrol plugin too
mesh the i2pcontrol plugin is a bit strange. it looks like code that is duplicated in the i2p.i2p code
zzz yeah that plugin is obsolete, we pulled it into the standard release package
zzz what's the name of your project again? I forget
mesh zzz: the name of the project is 'mesh' hehe as in mesh computing
zzz got a link to the source?
zzz *source code
mesh zzz: the fundamental problem is that code actually uses httpcomponents to start a webserver which is talking to an i2p endpoint
zzz thanks
zzz I did one round of grepping here, I'm going to do one more pass
mesh if you try to run our our app from the command line you will get the error:
mesh Exception in thread "HTTP-listener-0-1" java.lang.NoSuchMethodError:'int org.apache.http.util.Args.positive(int, java.lang.String)'
mesh because the org.apache.http.util.Args in i2p.jar hides the org.apache.http.util.Args in httpcomponents.jar
zzz I understand the issue, you've explained it several times
mesh besides the code usages there's some stuff going on build.xml that I really couldn't understand. but I suspect once the rename is done that stuff could be deleted. (it seems at one point the shading of these classes was optional...?)
zzz yes thats debian stuff, it all has to be untangled
mesh cursed debian. this is why I just dist jars.
mesh zzz: who does the debian release?
mesh alright well let me know if there's anything else I can do to help. Not sure about the build.xml stuff but the other stuff should be a simple rename methinks
mesh zzz: is there a reason why I2PSocket doesn't expose get/setKeepAlive btw?
zzz don't know offhand
mesh zzz: what do you think about adding get/setKeepAlive to I2PSocket? This functionality is exposed through StandardSocket but not through I2PSocket for some reason?
mesh (fortunately keepalive is true by default and is this is the desired behavior 90% of the time but this seems like a strange omission)
mesh zzz: also I wonder if the property name constants defined on ConnectOptions should be moved to the I2PSocketOptions interface. Right now ther doesn't seem to be a way for client code to access these constants and they end up being duplicated
zzz we don't need more options
zzz and I suggest you not worry about micro-micro-micro optimizations
mesh zzz: I'm not saying more options should be added. I'm saying make it easier to set the options that already exist. Right now if I want to specify say PROP_MAX_WINDOW_SIZE I have to redefine the constant in my code, though this constant already exists in ConnectionOptions.PROP_MAX_WINDOW_SIZE
mesh if these constants were defined on I2PSocketOptions then both ConnectionOptions and client code could refer to PROP_MAX_WINDOW_SIZE
zzz I understand. I'm saying don't worry about 20 bytes here and there
mesh I guess. it's not about performance it's about eliminating code duplication.
zzz if you're concerned about that, build it as one big jar and then use pack200 to dedup
mesh pack2000 isn't a thing any more hehe
dr|z3d except it is, if you use apache commons its use could be restored.
zzz still alive and well in older JDKs, and we use it on our releases