not_bob
dr|z3d: I was not offered anything :(
dr|z3d
probably a blessing. I still have no idea what was being offered, other than "an app".
not_bob
Likely malware. Odd.
dr|z3d
I must have "I'm a sucker, try me" tattooed on my forehead.
not_bob
You don't seem to be the kind of guy who wears tattoos.
dr|z3d
*** laughs. ***
dr|z3d
how's your image gen stuff going? had a chance to play yet?
not_bob
Some chance, yes. Made some lemons yesterday.
dr|z3d
hooray for lemons! \o/
not_bob
Real lemons, with horror baby heads.
not_bob
I set it to generate about 1000 of them and checked on it a few hours later when it was done.
not_bob
Some real horrorshow stuff in there.
not_bob
Which is what I was going for.
dr|z3d
maybe use rembg or similar and overlay them on lemon-class sites, that way you have a use for them and a forum for display :)
dr|z3d
*lemon-class site screengrabs.
not_bob
Hah, good idea.
not_bob
I also found a model that I really like to generate real people.
not_bob
It's mostly really amazing.
not_bob
Though, it falls flat on its face from time to time. But, if I generate 1000 images I will get a very nice selection to pick the best from.
not_bob
Though, I can often get a really good image in only 100.
dr|z3d
are you on comfyui yet?
not_bob
No, I have not had time to do the switch yet.
not_bob
I will eventually.
not_bob
The number of steps really matters. Some models work fine at 25 steps, but others need 40 to get amazing results.
cumlord
ah seems like the repo for the "app" is still private
dr|z3d
you got offered, cumlord?
dr|z3d
steps matter, sure, as does the scheduler. some schedulers "converge" a lot sooner than others.
dr|z3d
but maybe schedulers aren't so much a thing in A1111 or whatever you're using. sd.next?
not_bob
Yeah, sd.
not_bob
My main focus as of late has been my scanner.
not_bob
I'm not sure how much I've talked about this here. But, it's like what snex and cumlord do.
dr|z3d
network sniff sniff.
not_bob
Found some cool stuff. Mostly broken sites.
cumlord
yes
cumlord
afaik it's a reverse proxy to try to keep such network sniffy bots out of sites
not_bob
heh
not_bob
I've also set up some sites that I've not published. They are getting hits.
not_bob
Which means someone else is sniffing :)
not_bob
But, it could also be the other guys in #scanners too.
not_bob
The view of a single floodfill on all the published leasesets is pretty small.
not_bob
But, I think if I had about 50 floodfills I could get pretty good coverage.
cumlord
i got drifted to the guessthesong thing XD
not_bob
I'd have to brute force the router hashes to position them just right though.
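[A rough Go sketch of the brute force not_bob describes - a hypothetical simplification, since as I understand it real I2P routing keys hash the RouterIdentity together with a daily rotation value; here random bytes stand in and the keyspace is split into 256 slices by leading hash byte, so each slice takes ~256 tries:]
package main

import (
    "crypto/rand"
    "crypto/sha256"
    "fmt"
)

// mineIdentity regenerates a random identity until its SHA-256 hash
// lands in the wanted leading-byte bucket (0-255), i.e. the wanted
// 1/256th slice of the keyspace.
func mineIdentity(bucket byte) (id []byte, hash [32]byte, tries int) {
    id = make([]byte, 387) // a RouterIdentity is roughly this size
    for tries = 1; ; tries++ {
        rand.Read(id)
        hash = sha256.Sum256(id)
        if hash[0] == bucket {
            return id, hash, tries
        }
    }
}

func main() {
    // 50 floodfills spread over evenly spaced buckets.
    for i := 0; i < 50; i++ {
        bucket := byte(i * 256 / 50)
        _, h, tries := mineIdentity(bucket)
        fmt.Printf("bucket %3d: hash %x... after %4d tries\n", bucket, h[:4], tries)
    }
}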
not_bob
I saw that, curious site.
not_bob
Better coverage would also help me detect multi-homed sites as well.
not_bob
I'd just have to query the master database for which ones get published more often than normal.
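[A hedged sketch of that frequency test in Go - counts and threshold are hypothetical; the point is just that a multihomed site's leaseset gets announced far more often than a single router would manage:]
package main

import "fmt"

// flagMultihomed returns destinations whose announcement count is
// more than factor times the mean - more publishes than one router
// re-announcing on schedule would explain.
func flagMultihomed(counts map[string]int, factor float64) []string {
    var total float64
    for _, n := range counts {
        total += float64(n)
    }
    mean := total / float64(len(counts))
    var suspects []string
    for dest, n := range counts {
        if float64(n) > factor*mean {
            suspects = append(suspects, dest)
        }
    }
    return suspects
}

func main() {
    counts := map[string]int{ // announcements seen per b32, made up
        "aaaa...b32.i2p": 24,
        "bbbb...b32.i2p": 26,
        "cccc...b32.i2p": 95, // ~4x as often: multihome candidate
    }
    fmt.Println(flagMultihomed(counts, 1.5))
}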
not_bob
Thoughts?
cumlord
i haven't gone through and tried to categorize things in a while lol. lots of abandoned sites
not_bob
I have about 1500 sorted into categories.
not_bob
Out of about 3000 found.
not_bob
3200, current count.
cumlord
holy shit, all online?
not_bob
They were at one time.
not_bob
I don't track them past "this one works", so I have no system to determine if they are still online.
not_bob
But, I log every occurrence of an announcement, so I can sort out ones that I see more than a few times.
not_bob
I actually do that in my scanning.
not_bob
Scanning happens in three phases: "new sites" - these all get scanned. "sites that have never been tried" - these all get hit as well. And then the magic one: retries.
not_bob
Retries have an exponential backoff. I also don't bother to retry sites that have not been seen at least three times.
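[A minimal Go sketch of that retry rule as described - at least three sightings, then each failure halving the chance of another attempt; names and structure are hypothetical:]
package main

import (
    "fmt"
    "math"
    "math/rand"
)

type site struct {
    addr     string
    seen     int // times its leaseset announcement was observed
    failures int // consecutive failed scan attempts
}

// shouldRetry applies the two rules from the chat: never retry a
// site seen fewer than three times, otherwise retry with
// probability 0.5^failures on each pass.
func shouldRetry(s site) bool {
    if s.seen < 3 {
        return false
    }
    return rand.Float64() < math.Pow(0.5, float64(s.failures))
}

func main() {
    sites := []site{
        {"aaaa...b32.i2p", 10, 0}, // always retried
        {"bbbb...b32.i2p", 5, 4},  // ~6% chance per pass
        {"cccc...b32.i2p", 1, 0},  // seen too rarely, never retried
    }
    for _, s := range sites {
        fmt.Println(s.addr, "retry this pass:", shouldRetry(s))
    }
}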
not_bob
There are 313000 unique b32 addresses in my database.
not_bob
Many of those are one hit wonders, or short lived.
cumlord
think i'm seeing around 300 online atm
not_bob
I am aware of some stable sites that are not in any hosts.txt that I'm aware of.
not_bob
Just because we do DNS, does not mean anyone has to use it.
cumlord
i never ended up doing exponential backoff
not_bob
Oh, how do you do rescans then?
not_bob
The number is just too large to do it blindly.
cumlord
that's about double the uniques i have, good work :D
cumlord
it just will retry it 3 times, spaced out about 10 minutes apart
not_bob
Ahh
cumlord
i'm likely missing some from that
not_bob
I retry forever, but with the backoff.
not_bob
Every failure on a site decreases the chance it will be hit again by 50%
not_bob
And, I do find things in the retries.
not_bob
I get about 8 new hits an hour.
not_bob
Most of them tend to be temp stuff that doesn't last long.
not_bob
Also, 8 is wrong. It's more like 4 average per hour.
not_bob
I can't math today.
not_bob
That's new unique hosts that respond with HTTP.
cumlord
i do a separate one to check if online, that one's a bit more thorough. caching and attempts to sort into some categories
dr|z3d
btw, if you missed the missive, not_bob, wall.i2p will function without js now.
not_bob
Does it? Let me try it!
dr|z3d
js still recommended for the optimal experience, but hey.
cumlord
i think there's a lot of churn with like temp webui stuff
not_bob
Yeah
not_bob
And, that is to be expected, the churn.
not_bob
Long running services are what I'm looking for.
not_bob
And, with enough data I can find them.
not_bob
dr|z3d: It does not seem to be updating here.
not_bob
Without JS.
not_bob
hahaha, now I get a 429 :)
cumlord
ah, seems to be working for me. last time i tried it did something weird
cumlord
after the little loady boy at the top it reloads
not_bob
I'll let it rest for a while and try it again.
dr|z3d
if you have js enabled, when you browse to another tab it'll pause.
not_bob
I do like that feature.
dr|z3d
the 429 thing I see from time to time, still trying to tweak that.
dr|z3d
escape will also pause / play, and space will advance to the next image, though you're throttled when you hit space.
dr|z3d
so you can't hit space too many times in a minute, or once every 3s iirc.
dr|z3d
obviously you need js for that, too.
cumlord
so far so good with the nojs
dr|z3d
*thumbs up*
dr|z3d
just to be clear, not_bob_afk, you did download the snark update from skank and not clearnet?
not_bob_afk
skank, yes.
dr|z3d
ok
dr|z3d
there are 3 links on skank is why I ask, but ok. let me throw up a new build.
not_bob_afk
I grabbed the i2p one.
dr|z3d
ok, can no longer reproduce the issue here, but I haven't tried super hard. anyways, give it 5 minutes to upload, there will be a new build.
not_bob_afk
Thank you, and I'll test.
not_bob_afk
I also found a new thing, but I think it's a me thing. When I have the window really narrow it looks funny. This is not totally unexpected.
not_bob_afk
Yep, does it in more than one browser.
not_bob_afk
But, that's an edge case. If I make the browser full screen it looks fine.
not_bob_afk
The way I'm using it here is an edge case.
not_bob_afk
I have no idea how many people use it on this OS.
dr|z3d
ok, uploaded.
dr|z3d
I don't think it's an android issue in this case, but you'll want to be on at least java17 real soon now, otherwise it won't run for much longer if you keep updating it.
not_bob_afk
I'm running newer than that.
dr|z3d
ok, good, you're fine then.
not_bob_afk
Testing.
dr|z3d
ok
not_bob_afk
Shit, when did cake die?
not_bob_afk
Oh, May 14th.
not_bob_afk
I had no idea.
not_bob_afk
Lame.
not_bob_afk
I'll be back online later. Need to deal with people...
dr|z3d
aight, laters.
RN
dr|z3d, won't run on this version of comfy
RN
really need to move comfy to pyvenv
dr|z3d
RN: yes you do.
dr|z3d
*pyenv
dr|z3d
but it sounds like you need to update comfy, if you don't have native support for whatever you can't run.
zzz
so tracker2 treats whitespace as a wildcard, so 'white lotus' becomes 'white*lotus' - effectively it enforces token order
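[Roughly this, as a Go one-off - filepath.Match stands in for whatever tracker2 actually uses, so treat the details as hypothetical:]
package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

func main() {
    // whitespace becomes a wildcard, so token order is enforced:
    // "white lotus" matches "White.Lotus.S01" but not "Lotus White".
    pat := "*" + strings.ReplaceAll("white lotus", " ", "*") + "*"
    for _, title := range []string{"White.Lotus.S01", "Lotus White"} {
        ok, _ := filepath.Match(pat, strings.ToLower(title))
        fmt.Printf("%-16s %v\n", title, ok)
    }
}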
mareki2p
Hi, hectobit also contacted me about his new app. Sorry I'm late to respond because I'm stupid and I didn't notice the scrollbars in my IRC client. His app is an HTTP server in Go that aims to prevent scrapers from scraping your eepsite; it is available on community git. It works even without JavaScript. The idea behind it is that scrapers usually don't download, parse, and render CSS. So he serves a landing page that
mareki2p
contains a link to a CSS file, which contains a bunch of rules that use links to images. He believes that scrapers will not download those images, but a real-world browser would. So he listens on his server for a legit browser to request such images (I call it "knocking") and then he allows a redirect to the real eepsite. A scraper would not "knock", so it will not get to the real web page.
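[A minimal Go sketch of the flow mareki2p describes - not hectobit's actual code; this variant uses a cookie token plus a meta-refresh reload, where the real app apparently holds the first request open instead:]
package main

import (
    "crypto/rand"
    "encoding/hex"
    "net/http"
    "sync"
)

var (
    mu      sync.Mutex
    knocked = map[string]bool{} // token -> CSS image was fetched
)

func newToken() string {
    b := make([]byte, 16)
    rand.Read(b)
    return hex.EncodeToString(b)
}

// landing serves the challenge page; once the visitor's token has
// "knocked" (fetched the CSS background image), it serves the real
// content - in the real app, presumably proxied upstream.
func landing(w http.ResponseWriter, r *http.Request) {
    if c, err := r.Cookie("knock"); err == nil {
        mu.Lock()
        ok := knocked[c.Value]
        mu.Unlock()
        if ok {
            w.Write([]byte("the real eepsite content"))
            return
        }
    }
    http.SetCookie(w, &http.Cookie{Name: "knock", Value: newToken()})
    w.Header().Set("Content-Type", "text/html")
    // the meta refresh retries once the knock has landed
    w.Write([]byte(`<html><head><meta http-equiv="refresh" content="2">` +
        `<link rel="stylesheet" href="/style.css"></head>` +
        `<body>checking your browser...</body></html>`))
}

// css is the gate: only a client that downloads, parses and renders
// CSS will evaluate the media query and request the image.
func css(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/css")
    w.Write([]byte(`@media (min-width: 100px) {
    body { background-image: url("/knock.png"); }
}`))
}

// knock records that this token's client behaved like a browser.
func knock(w http.ResponseWriter, r *http.Request) {
    if c, err := r.Cookie("knock"); err == nil {
        mu.Lock()
        knocked[c.Value] = true
        mu.Unlock()
    }
}

func main() {
    http.HandleFunc("/", landing)
    http.HandleFunc("/style.css", css)
    http.HandleFunc("/knock.png", knock)
    http.ListenAndServe("127.0.0.1:8080", nil)
}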
RN
I also got a dm from hectobyte
RN
and dr|z3d, I can't update comfy. if I update, everything breaks, whether I go one or four versions up or all the way to current.
RN
mareki2p, it sounds like he becomes a middleman to your site's connection
RN
I don't like the sound of that
mareki2p
Yeah, something like that. It is targeted at eepsite operators, not eepsite visitors; it is to prevent scrapers from scraping your eepsite. He has a demo running - visit his community git repo, there is a link. But I managed to circumvent it using headless Chrome.
mareki2p
His repo: git.community.i2p/portal-app/portal-app you need an account there to view it.
RN
yeah I'm not interested in letting someone MITM my site
RN
if your description is accurate I'll stay far away from it.
dr|z3d
there's already some anti-scraping options in +
cumlord
yeah i looked through some of it, neat idea, but not sure if it adds value if you're taking advantage of that already
cumlord
didn't notice the demo before
RN
did mareki2p describe it accurately? sounds like mitm to me.
mareki2p
I thought it is a MITM that you install on your own server (like a reverse proxy) that will protect you against scrapers. You compile it, you run it, you configure it, not hectobit. Maybe I misunderstood it? Maybe I described it wrongly?
cumlord
that's the way i understand it, the source is up
cumlord
if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") == false {
    ErrorHandler(w, r, 403, "Please use a modern browser")
    return
}
cumlord
i just need to send this header then?
mareki2p
As hectobit described this to me, it is to distinguish a legit web browser from a scraper. For example, curl from the command line does not send this header; you need to add an additional command-line parameter to curl to enable it.
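[In Go terms, a scraper passes that check by setting the header itself - a hypothetical sketch; curl gets the same effect with --compressed or -H "Accept-Encoding: gzip":]
package main

import (
    "fmt"
    "net/http"
)

func main() {
    // setting the header by hand is all it takes to pass the
    // strings.Contains check quoted above
    req, err := http.NewRequest("GET", "http://example.i2p/", nil)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Accept-Encoding", "gzip")
    fmt.Println(req.Header)
}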
RN
ok.. so not that scary then
mareki2p
Does cake support zip files? I could upload my conversation with him to cake.
RN
if cake is back up and running, then yes it allows zip files
mareki2p
Oh, any other upload service?
dr|z3d
just embed some invisible links in your site, or add nofollow tags as cumlord does, and then add the paths to your http_blocklist.txt file and use tunnel filtering to ban the dests. done.
RN
otherwise there's arav and privatebin and i think there's a bin on simp
cumlord
yup, i've added it to my scanner, see if it gets through now
mareki2p
OK, not a zip file, just plain text, here: privatebin.i2p/?5f543658986f2213#FVe6nZ2SsrAArVP8U8vhcDBmhbGBGKi1hVKojydHSGYJ but that site requires JS.
cumlord
damn i think i triggered some sort of global ratelimit on the demo, oh well
cumlord
wasn't that easy. 3p4mcgb4k3mu56q2ck6aisysvlzoe23gwfuv2qat7xztclpltzzgyzwo.b32.i2p/style.css - you can see what's going on
mareki2p
Yes, I believe you can download that file via curl (or scraper), after you get its URL from the root document, also downloaded via curl (or scraper).
mareki2p
But if you have headless Chrome, all defenses are moot. Headless Chrome can be invoked from the command line or from their own scripting thing (I'm guessing node.js). Headless Chrome can be run on a server even without an X server, and it can produce screenshots or DOM dumps. So... this protection is useless.
mareki2p
Should be something like this: curl --compressed --proxy 127.0.0.1:4444 3p4mcgb4k3mu56q2ck6aisysvlzoe23gwfuv2qat7xztclpltzzgyzwo.b32.i2p -o root.html and curl --compressed --proxy 127.0.0.1:4444 3p4mcgb4k3mu56q2ck6aisysvlzoe23gwfuv2qat7xztclpltzzgyzwo.b32.i2p/style.css -o style.css
cumlord
there's usually ways around these things, i've tried to avoid doing the browser route for scraping, this would just be cleaner to fit into my existing stuff
cumlord
looks like i'd need to follow all of the links from the css to brute force it without triggering ratelimit
mareki2p
I'm not sure. It seems there are plenty of if-conditions about screen size or aspect ratio, so only some of the links need to be followed. I'm not 100% sure.
cumlord
yeah not sure if it will block on the wrong answer, if it does it would be more complicated to scrape
orignal
how about wget?
orignal
it downloads everything
mareki2p
I'm not sure how it works, since it does not use JS. I guess the main request must be done in parallel with all the CSS sub-requests. So the main request is held in limbo for a while, and only after the "knocking" is finished is the main request allowed to continue, I'm guessing by inserting some HTML redirect tag. wget would block on the first/main request.
mareki2p
And since legit browsers issue requests in parallel, that distinguishes them from scrape scripts, I guess.
cumlord
i think i get it now, he's using media queries
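[A hypothetical reconstruction of what such a stylesheet could look like - mutually exclusive media queries mean a real browser fetches exactly one image per pair, so a scraper that follows every link (or none) gives itself away:]
package main

import "fmt"

// challengeCSS sketches the media-query trick: a rendering client
// evaluates the conditions against a real viewport and fetches one
// image per exclusive pair; brute-forcing all of the URLs is itself
// a detectable (and rate-limitable) signature.
const challengeCSS = `
@media (min-width: 800px) { body { background: url("/k/wide.png"); } }
@media (max-width: 799px) { body { background: url("/k/narrow.png"); } }
@media (orientation: landscape) { h1 { background: url("/k/land.png"); } }
@media (orientation: portrait)  { h1 { background: url("/k/port.png"); } }
`

func main() { fmt.Print(challengeCSS) }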