IRCaBot 2.1.0
GPLv3 © acetone, 2021-2022
#i2p-dev
/2026/04/13
@eyedeekay
&zzz
+FreefallHeavens
+R4SAS
+RN_
+Romster
+StormyCloud
+cims
+eche|off
+hagen
+nilbog
+nyaa2pguy
+orignal
+postman
+qend-irc2p
+rednode
+snex
+synergy3582
Arch
Danny
Holmes
Irc2PGuest17692
Irc2PGuest28384
Irc2PGuest74003
OfficialCIA_
Onn4l7h
Onn4|7h
Over1
Sleepy
U1F642
Wikk_0_
Zapek
aargh4
ac9f
acetone_
ahiru
ananas
anontor2
calamares
dr4wd3_
duanin2
eyedeekay_
eyedeekay_bnc
leopold_
mahlay
makoto
marek
mareki2p_
n2
not_bob_afk
poriori_
profetikla
r00tobo
rapidash
test3847473
uop23ip
urist_
wodencafe2
x74a6
zelgomer
rednode Ever thought about looking into getting I2P’s code audited by Mythos AI? Might not be a bad idea. It’s really good at finding vulns.
rednode Some of the Hyphanet (original Freenet) devs were talking about getting their project audited by it in the FLIP IRC on Hyphanet.
zzz welcome rednode. others seem to be doing the AI stuff for us. There's also lots of updated static analysis tools out there. I've started a new run thru a spotbugs report.
zzz so we're not lacking things to do
orignal zzz, you should state clearly somewhere that AI slop is not allowed in the project
zzz maybe, will discuss with eyedeekay
nyaa2pguy there has been a huge amount of ai slop bug reports overwhelming open source, but the mythos thing does seem at least slightly interesting
rednode That’s good, thanks! And agreed Mythos seems to be on another level. Headlines like “Anthropic's Mythos AI can spot weaknesses in almost every computer on earth. Uh-oh.” don’t seem to be THAT much of an exaggeration.
rednode Then again, it probably is a bit of an exaggeration. I trust yalls judgment on it, just wanted to bring it up in the off chance you haven’t heard of it.
zzz if this is the new normal I'm going to need a lot more of eyedeekay's help on the java side
eyedeekay OK sorry I didn't get with you yesterday, was away from my desk. Bit of a wall-of-text incoming here. Re: LLM generated bug reports I've noticed a few things about them:
eyedeekay - They're getting credibility in places that matter but almost never without a human piloting the thing. See: Linux kernel for instance. LLM assistance is acceptable and sometimes actually helpful, LLM's conjuring reports whole-cloth is not.
eyedeekay - They're actually pretty OK at it, with some qualifications:
eyedeekay - They "Expect" things, they don't "Know" things. Case in point: if there is a code path which is unreachable but which is technically (if only superficially) incorrect, they will consistently overestimate how important the unreachable error is. The "logic(TM)" seems to be that the error arising from the violated expectation is obvious, so it must be severe.
eyedeekay - They lose focus. Working over a large, old codebase like Java I2P they will surface the most obvious issues across a variety of packages, not deep issues in a single package.
eyedeekay - When you request an itemized analysis of bugs, roughly 50% of the bugs will be obviously incorrect or irrelevant. If you take time to explain the software to it, sometimes you can get the false positives down to 20 or 30% for a while, but the better your software works, the more false positives there will be.
eyedeekay - When you surface all the bugs the LLM can predict, then the false positive ratio will go to 100%. In the absence of bugs it can find, it will make some up.
eyedeekay - The results are surprisingly reproducible. Not deterministic, but if you ask it to audit the same code twice, it will generally find the same issues twice, *including* the false positives
eyedeekay So to make it give a useful report, the human who initiated the report needs to sort out the false positives, or at least be able to sort them out.
eyedeekay That gives us a baseline for establishing any rules for LLM submitted bug reports.
eyedeekay I also think this is relevant because the behavior tends to converge on this 100% false positive point.
eyedeekay There does seem to be a point where these things stop surfacing real bugs and start surfacing consistent-seeming nonsense.
snex tell the LLM to write some automated tests for the bugs
eyedeekay They'll do that if you ask but it won't help people who can't read the tests
snex presumably you guys can read them
snex and can run them
eyedeekay Sure yeah. Just a matter of our time. Ideally, the person who initiated the report should be able to read and run the tests, filter false positives, etc
eyedeekay A bullshit report with a bullshit test is just twice as much bullshit
snex well you need tests anyway so at least make them submit some useful work =P
eyedeekay A real report with a real test is certainly more helpful