mareki2p
Somebody is definitely scanning all known b32s for HTTP servers. I got a GET request after running my I2CP app (and announcing my b32) for 3 days. I never mentioned my b32 anywhere else. Now I maybe sort of understand the portal guy (the CSS redirect guy).
dr|z3d
no doubt. there are at least 3 people I know of.
dr|z3d
amusingly, they're all in this channel.
snex
i turned my scanner off a while ago
dr|z3d
make that two :)
snex
but go to #scanners if you want to learn more about it
Leopold_
mareki2p: that's notbob.i2p
cumlord
yeah he got kind of weird after i told him i was doing that
Leopold_
It seems like he's hiding something
Leopold_
we need to find out
Leopold_
I saw in your news about his thousandth post
dr|z3d
I don't think there's anything dubious happening, other than some of the hosted content.
cumlord
lol he's disappeared
cumlord
yeahhhh big 1k :D
dr|z3d
which brings me back to the idea snex mentioned a while ago.. some sort of subscription blocklist for floodfills to exclude said dubious sites.
snex
yeah i had 2 ideas on implementation (only one original to me) but both have downsides
cumlord
glad there’s so few of them anyway
dr|z3d
well that's a relief.
snex
1. share only a partial b32 string, enough to avoid most false positives but not enough to let users brute force the real one
snex
2. share hashed strings and have routers hash each b32 before allowing it. could be too slow maybe
snex
hash and check before allowing*
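snex's second idea (publish hashes, then have each router hash a b32 and check it before allowing it) could be sketched roughly as below. This is a minimal illustration, not how any I2P router behaves: the function names, the choice of SHA-256, and the lowercase normalisation are all assumptions made for the example.

```python
import hashlib


def b32_digest(b32_address: str) -> str:
    """Return the SHA-256 hex digest of a normalised b32 hostname (assumed scheme)."""
    normalised = b32_address.strip().lower()
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def build_blocklist(addresses):
    """Turn plain b32 hostnames into the set of digests that would be shared."""
    return {b32_digest(address) for address in addresses}


def is_blocked(b32_address: str, blocklist) -> bool:
    """Hash the candidate address and check it against the shared digests."""
    return b32_digest(b32_address) in blocklist
```

With a set, each check costs one hash plus one lookup, so the "too slow" worry depends mostly on how often the check runs. A plain hash can still be matched by anyone who already holds a list of candidate b32s and hashes them all.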
cumlord
I pointed a squirt gun at one of them a couple months ago
dr|z3d
sure, either a partial string of the b32 and/or b64, or a hash.
cumlord
I kinda like the partial, maybe b64 though
dr|z3d
*** laughs at the notion of a squirt gun. ***
snex
yeah doing partial b64 should reduce false positives
snex
i bet theres even an equation that can tell us the exact number of characters to obscure to minimize the chance of false positives
snex
while maximizing the cpu required to brute force
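A rough version of the equation snex is guessing at, for the partial-b32 case. It assumes the standard 52-character b32 form encoding a 256-bit hash, that the characters are uniformly distributed, and that each revealed base32 character carries 5 bits. The symbols $k$ (revealed characters) and $N$ (addresses checked against one entry) are introduced here for illustration.

```latex
\begin{align}
  P(\text{one unrelated address matches}) &= 32^{-k} = 2^{-5k} \\
  \mathbb{E}[\text{false positives}] &\approx N \cdot 2^{-5k} \\
  \text{hash values consistent with the revealed part} &\approx 2^{256 - 5k}
\end{align}
```

As an illustrative example, $k = 8$ and $N = 5 \times 10^{5}$ give about $5 \times 10^{5} \cdot 2^{-40} \approx 4.5 \times 10^{-7}$ expected false positives per entry, while roughly $2^{216}$ hash values remain consistent with the revealed part.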
Leopold_
What are dubious sites?
Leopold_
Are you referring to cases when an obviously private resource is found, or is it about content censorship?
cumlord
Guess in this case to brute force it you’d have to do trial and error and at that point you might as well just scan for them
cumlord
it’s about cp
snex
it's about whatever you want to block your hardware from serving
snex
it's up to each individual to decide what they want to block and allow
snex
but you want to be able to share your blocks to others without revealing the actual sites
cumlord
true, could see it working just like hosts.txt kinda thing
snex
exactly
cumlord
don’t want it don’t follow it
cumlord
I think it could be more forgiving than I thought because you'd need to check all of them, so maybe it could have some leeway like string…string so it adjusts the omitted chars
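The string…string form cumlord describes (a revealed start and end with the middle omitted) could be matched along these lines. This is only a sketch: the `…` separator, the function names, and the handling of the `.b32.i2p` suffix are assumptions, not an existing blocklist format.

```python
SEPARATOR = "…"          # assumed marker for the omitted middle part
B32_SUFFIX = ".b32.i2p"


def parse_pattern(pattern: str):
    """Split 'prefix…suffix' into its two revealed parts."""
    prefix, sep, suffix = pattern.partition(SEPARATOR)
    if not sep:
        raise ValueError("pattern must contain the separator")
    return prefix.lower(), suffix.lower()


def matches(address: str, pattern: str) -> bool:
    """True if the b32 host starts with the prefix and ends with the suffix."""
    prefix, suffix = parse_pattern(pattern)
    host = address.strip().lower()
    if host.endswith(B32_SUFFIX):
        host = host[: -len(B32_SUFFIX)]
    # Require the two revealed parts not to overlap inside the host.
    if len(host) < len(prefix) + len(suffix):
        return False
    return host.startswith(prefix) and host.endswith(suffix)
```

Because the omitted length is not fixed, a list maintainer could reveal more or fewer characters per entry without changing the matching code.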
not_bob
mareki2p: Likely me. I should not be hitting very often.
not_bob
Leopold_: I am simply collecting data. Nothing more.
not_bob
For stats. there are eight sites serving unwanted content out of the 3100 that I have identified. That's a very small number.
cumlord
yup most likely, I pause mine now and then
not_bob
Mine has been running solid for about two months now?
not_bob
But, I use a backoff so if a site does not respond, while it will get tried again, eventually that chance becomes almost zero.
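One way to get the behaviour not_bob describes, where an unresponsive site is still retried but with a chance that shrinks toward zero, is a probabilistic backoff. This is a guess at the general shape, not not_bob's actual scanner: the halving `decay` factor and the function names are assumptions.

```python
import random


def retry_probability(failures: int, decay: float = 0.5) -> float:
    """Chance of probing again after a number of consecutive failures."""
    return decay ** failures


def should_probe(failures: int, rng=random.random, decay: float = 0.5) -> bool:
    """Draw once and decide whether this address gets probed in this round."""
    return rng() < retry_probability(failures, decay)
```

With `decay=0.5`, an address that failed ten times in a row is retried in roughly one round out of a thousand, so dead addresses cost almost nothing while a site that comes back can still be found eventually.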
cumlord
it’s a small number at least, more than I have
Leopold_
not_bob: Do you have crawlers for content search? I often see crawlers in the logs for wordpress and basic web server panels such as status-server and admin
not_bob
Leopold_: Nope. I grab the site data from /, strip the html and never touch that site again. This is just so I can categorize them.
not_bob
And, only the html.
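The "strip the html" step can be sketched with Python's standard library alone. This is an assumed approach, not not_bob's code: the `TextExtractor` class and `strip_html` function are hypothetical names, and dropping `script` and `style` content is a choice made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def strip_html(html: str) -> str:
    """Return the page text with tags removed, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The resulting plain text is what a categoriser would look at; fetching the page from `/` over I2P is left out here.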
dr|z3d
more info on the zzzot client bug, zzz: Caused by: org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 85 in state 0
not_bob
Leopold_: I get those wordpress and related scans on my sites as well.
cumlord
I might do some of that.
not_bob
Leopold_: I also have no plans to share my data with anyone, other than as general stats. And, if I do share b32s, they will be examples, not the actual b32 addresses.
not_bob
"Wall of shame"
dr|z3d
Leopold_: if you're seeing non-published urls being accessed, that's likely a vuln scanner.
Leopold_
eight eepsites...
dr|z3d
And in +, we have mitigations against vuln scanners.
not_bob
That I'm aware of, yes. Out of 3100 or so.
dr|z3d
(You have to activate them, though)
not_bob
Note that due to the way I2P works, I likely have only a fraction of a percent view of the network at any given time.
snex
seems like given enough time you'll eventually see everything
not_bob
If you are curious, I've seen just over half a million unique b32 addresses.
cumlord
I try not to run the vuln scanner constantly pegging the same sites though
not_bob
snex: Possibly.
not_bob
Anyway, once my scanner finds something, it stops on that address. And, if nothing is found, there is a backoff.
cumlord
That’s what I think, even just running it intermittently is enough to get all the stable ones, I think
not_bob
cumlord: Exactly.
snex
i still wanna know why there are so many ntp servers
not_bob
I have some of that logic in my reports. "hosts seen per hour" and then another graph that shows "hosts seen per hour that have been seen more than x times"
mareki2p
Few things, some unrelated to others. The scrape was from 6f5ufkq6636k423ravrzxqsskmehyj2htloyp3dm4bflj32y63pq, crypto type elgamal+eddsa-sha512-ed25519. I2P looks to me like the early internet, so there will be the same problems and challenges as there were back then. So a suggestion for people running scrapers: Do you want to respect robots.txt? I can easily protect myself by dropping the initial packet anyway. Or so many packets until I realize I don't want to serve that connection. Or use a different port. Or something, there are plenty of options. I already discovered one child porn site and I was not even looking for any.
not_bob_afk
mareki2p: I do not bother with robots.txt. Why? If a site responds with http, then my scanner will never hit it again. If you have a robots.txt or not doesn't matter. Either way it will stop poking you.
dr|z3d
if you ask cumlord, Leopold_, he might tell you how to block and ban vuln scanners.
dr|z3d
(in +, with the http_blocklist file)
snex
if your service is only for your use, use encrypted leasesets
mareki2p
No no, I will deal with this myself. And I'm running this from my own I2CP app, not using Jetty from router (if this is relevant).
mareki2p
Encrypted LeaseSets ... I haven't even started reading the docs about them yet.
cumlord
mine doesn’t do any crawling, it just saves the html from the index page
cumlord
oh yeah very good for that I think I had a guide for that, had to clear out the wall of shame the other day
cumlord
yeah I think you’d need to do it yourself then with some throttling and blocking endpoints
mareki2p
ok, I read the blog post about encrypted leasesets by idk from 2021. It is ideal for me. But I didn't get the DH part. Who is allowing the clients to obtain the encrypted data? I'm guessing not the final destination as it wants to remain hidden. So I guess netDB then?
mareki2p
Oh yeah, I'm stupid, the clients must already know the server's destination, of course.
mareki2p
Another question. Are there any UDP HTTP servers (quic/http2.0/http3.0)? I think nothing (I2P itself) is preventing me from running one.
dr|z3d
no.
dr|z3d
at the bare minimum, http 2 or 3 requires https.
mareki2p
Oh, web browsers refuse to connect to non-HTTPS over UDP.
mareki2p
yes
dr|z3d
we had a discussion about http/2 a while back, it's technically supported in jetty but there's no plan to implement it.
mareki2p
Yes, I remembered, HTTP spec allows it, but browsers decided to not do it.