IRCaBot 2.1.0
GPLv3 © acetone, 2021-2022
#i2p-dev
/2026/04/13
@eyedeekay
&zzz
+FreefallHeavens
+R4SAS
+RN_
+Romster
+StormyCloud
+cims
+eche|off
+hagen
+nilbog
+nyaa2pguy
+orignal
+postman
+qend-irc2p
+rednode
+snex
+synergy3582
Arch
Danny
Holmes
Irc2PGuest17692
Irc2PGuest28384
Irc2PGuest74003
OfficialCIA_
Onn4l7h
Onn4|7h
Over1
Sleepy
U1F642
Wikk_0_
Zapek
aargh4
ac9f
acetone_
ahiru
ananas
anontor2
calamares
dr4wd3_
duanin2
eyedeekay_
eyedeekay_bnc
leopold_
mahlay
makoto
marek
mareki2p_
n2
not_bob_afk
poriori_
profetikla
r00tobo
rapidash
test3847473
uop23ip
urist_
wodencafe2
x74a6
zelgomer
rednode Ever thought about looking into getting I2P’s code audited by Mythos AI? Might not be a bad idea. It’s really good at finding vulns.
rednode Some of the Hyphanet (original Freenet) devs were talking about getting their project audited by it in the FLIP IRC on Hyphanet.
zzz welcome rednode. others seem to be doing the AI stuff for us. There's also lots of updated static analysis tools out there. I've started a new run thru a spotbugs report.
zzz so we're not lacking things to do
orignal zzz, you should state clearly somewhere that AI slop is not allowed in the project
zzz maybe, will discuss with eyedeekay
nyaa2pguy there has been a huge amount of ai slop bug reports overwhelming open source, but the mythos thing does seem at least slightly interesting
rednode That’s good, thanks! And agreed Mythos seems to be on another level. Headlines like “Anthropic's Mythos AI can spot weaknesses in almost every computer on earth. Uh-oh.” don’t seem to be THAT much of an exaggeration.
rednode Then again, it probably is a bit of an exaggeration. I trust yalls judgment on it, just wanted to bring it up in the off chance you haven’t heard of it.
zzz if this is the new normal I'm going to need a lot more of eyedeekay's help on the java side
eyedeekay OK sorry I didn't get with you yesterday, was away from my desk. Bit of a wall-of-text incoming here. Re: LLM generated bug reports I've noticed a few things about them:
eyedeekay - They're getting credibility in places that matter but almost never without a human piloting the thing. See: Linux kernel for instance. LLM assistance is acceptable and sometimes actually helpful, LLM's conjuring reports whole-cloth is not.
eyedeekay - They're actually pretty OK at it, with some qualifications:
eyedeekay - They "Expect" things, they don't "Know" things. Case in point: if there is a code path which is unreachable but which is technically (if only superficially) incorrect, they will consistently overestimate how important the unreachable error is. The "logic(TM)" seems to be that the error arising from the violated expectation is obvious, so it must be severe.
eyedeekay - They lose focus. Working over a large, old codebase like Java I2P they will surface the most obvious issues across a variety of packages, not deep issues in a single package.
eyedeekay - When you request an itemized analysis of bugs, roughly 50% of the bugs will be obviously incorrect or irrelevant. If you take time to explain the software to it, sometimes you can get the false positives down to 20 or 30% for a while, but the better your software works, the more false positives there will be.
eyedeekay - When you surface all the bugs the LLM can predict, then the false positive ratio will go to 100%. In the absence of bugs it can find, it will make some up.
eyedeekay - The results are surprisingly reproducible. Not deterministic, but if you ask it to audit the same code twice, it will generally find the same issues twice, *including* the false positives
eyedeekay So to make it give a useful report, the human who initiated the report needs to sort out the false positives, or at least be able to sort them out.
eyedeekay That gives us a baseline for establishing any rules for LLM submitted bug reports.
eyedeekay I also think this is relevant because the behavior tends to converge on this 100% false positive point.
eyedeekay There does seem to be a point where these things stop surfacing real bugs and start surfacing consistent-seeming nonsense.
snex tell the LLM to write some automated tests for the bugs
eyedeekay They'll do that if you ask but it won't help people who can't read the tests
snex presumably you guys can read them
snex and can run them
eyedeekay Sure yeah. Just a matter of our time. Ideally, the person who initiated the report should be able to read and run the tests, filter false positives, etc
eyedeekay A bullshit report with a bullshit test is just twice as much bullshit
snex well you need tests anyway so at least make them submit some useful work =P
eyedeekay A real report with a real test is certainly more helpful