zlatinb
I asked parg if he's doing or wants to do something with BiglyBT on sleep callback
zzz
zlatinb, I have a long set of emails from him a month ago with failure scenarios, he's waiting for me
zzz
dr|z3d, you have any i2cp fix test results for me?
zlatinb
here's a proposal:
zlatinb
1. Introduce Router.onSystemSuspend() and Router.onSystemResume(), which record the timestamp of the last resume as well as the current state
zlatinb
2. Modify Router.clockShift to not die if the clock shift notification occurred within X seconds of onSystemResume() or if still suspended
zlatinb
3. Modify RouterWatchdog to not restart router under those same conditions
zlatinb
4. In external apps (MuWire, BiglyBT, easy-install launchers) do the necessary hooking into system suspend/resume events
zlatinb
eot
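Items 1 to 3 of the proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: SuspendTracker, inGracePeriod() and the injected clock are hypothetical names that do not exist in the router, and graceMs stands in for the unspecified "X seconds". Router.clockShift and RouterWatchdog would both consult the same check before dying or restarting.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper, not part of the I2P Router API.
 * Tracks suspend/resume notifications so that callers can tolerate
 * a clock shift that arrives while suspended or shortly after resume.
 */
final class SuspendTracker {
    private final long graceMs;        // the "X seconds" from item 2, in milliseconds (assumed)
    private final LongSupplier clock;  // time source in ms, injected so it can be tested
    private boolean suspended;         // current state
    private long lastResumeMs = -1;    // -1 until the first resume is seen

    SuspendTracker(long graceMs, LongSupplier clock) {
        this.graceMs = graceMs;
        this.clock = clock;
    }

    /** Called by the embedding app when the OS reports an impending suspend. */
    synchronized void onSystemSuspend() {
        suspended = true;
    }

    /** Called by the embedding app when the OS reports a resume. */
    synchronized void onSystemResume() {
        suspended = false;
        lastResumeMs = clock.getAsLong();
    }

    /**
     * True if a clock shift or watchdog timeout should be tolerated right now:
     * either still suspended, or within graceMs of the last resume.
     */
    synchronized boolean inGracePeriod() {
        if (suspended) return true;
        if (lastResumeMs < 0) return false;
        return clock.getAsLong() - lastResumeMs <= graceMs;
    }
}
```

Which time source to inject is left open here; a wall clock is itself affected by the shift being handled, so that choice would need care in a real implementation.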
zzz
zlatinb, we don't have any issues (on our side) with detecting and notification of clock shifts. that's working reliably
zzz
and there is some threshold, small shifts are not notified
zzz
so I don't think we need any more onXXX callbacks
zzz
the issues are in the recovery, both with OP on my forum, and with bigly (I know you haven't seen those emails, so you're missing the context)
zzz
with OP, the bug is, why didn't the router client manager come back?
zzz
with bigly, there are some race issues
zlatinb
will an internal I2CP client know to reconnect after such a soft restart?
zzz
yes, see zzz.i2p/files/Log.txt
zzz
it's trying
zzz
it looks like Restarter hung
zzz
between "Stopping the comm system" and "Stopping the tunnel manager"
zzz
because the latter was never logged
zzz
looks unrelated to parg's complaints
dr|z3d
zzz: not seeing any more issues in the logs, grep I2CP gives me nothing now.
zlatinb
having trouble reproducing the clock shift in a Windows VM; VMware intercepts the sleep call to suspend the VM
mesh
zlatinb: is there no way to virtualize the clock used by the router?
zlatinb
not out of the box I don't think
zlatinb
I was able to reproduce a clock shift in the vm, just had to wait a bit longer
zlatinb
now to get it to restart itself...
mesh
usually that happens automatically I think
zzz
dr|z3d, has it been long enough to have some confidence?
dr|z3d
zzz: about 70%.
dr|z3d
confidence that it's fixed, that is.
zzz
ok thanks for the testing and report, I'll push it. Really bad 17-year-old bug
zzz
zlatinb, yeah you have to disable the OS's NTP to do any testing with clock shifts (as opposed to suspend)
zzz
my guess is that OP's issue is some deadlock in the comm system restart that will be hard to reproduce, but isn't OS-specific
mesh
I had the same problem I think until I stopped my laptop from going to sleep
zzz
what "same problem" ?
mesh
this is the problem where, with a router in hidden mode, Windows goes to sleep, then wakes up, the router sees a massive clock skew, and all its known peers are gone?
zzz
the connected peers should go away, but not the known peers. maybe we have a problem with expiring them all
mesh
it may depend on how long your laptop sleeps for. mine was sleeping a good 4-6 hours, and each day when Windows woke up, the router was in a wacky state with the known peers gone, I believe
zlatinb
ok, reproduced it
zlatinb
with muwire on windows, exact same behavior
zzz
which is what?
mesh
zlatinb: including hidden mode?
zlatinb
actually no, router restarted successfully
zlatinb
nvm
mesh
zlatinb: something else I noticed is that after a "soft restart" the router wasn't always in a good state. stuff like the graphs would be screwy.
dr|z3d
zzz: essentially when the system goes to sleep for a period, RI expiry goes into overdrive when it wakes, so if there's a clock jump, expiry should probably be suspended while the refresh job runs, and then resumed. that's my reading, anyways.
zzz
dr|z3d, are you just guessing, or do you have evidence?
dr|z3d
that's based on prior experience.
dr|z3d
I'd probably just hike the expiry time for RIs to clockjump + 2 hours until a full router refresh sweep has run.
dr|z3d
and it wouldn't hurt to force a reseed if there's a clockjump, either. might help mitigate the issue, too.
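The expiry idea suggested above (raise the RI expiry age to clock jump + 2 hours until a refresh sweep completes) might look something like this. It is a sketch of the suggestion only, not of how the router's netDb expiry works; RouterInfoExpiryPolicy and all of its methods are hypothetical names, and the normal maximum age is passed in rather than taken from any real setting.

```java
/**
 * Hypothetical policy object, not part of the I2P netDb code.
 * After a clock jump, the allowed RouterInfo age is widened to
 * (jump + 2 hours) until a full refresh sweep has been reported.
 */
final class RouterInfoExpiryPolicy {
    private static final long TWO_HOURS_MS = 2L * 60 * 60 * 1000;

    private final long normalMaxAgeMs;  // usual expiry age, supplied by the caller
    private long jumpMs;                // size of the pending clock jump, 0 if none
    private boolean refreshPending;     // true between a jump and the next completed sweep

    RouterInfoExpiryPolicy(long normalMaxAgeMs) {
        this.normalMaxAgeMs = normalMaxAgeMs;
    }

    /** Record a detected clock jump of deltaMs (sign is ignored). */
    synchronized void onClockJump(long deltaMs) {
        jumpMs = Math.abs(deltaMs);
        refreshPending = true;
    }

    /** Report that a full refresh sweep has finished; normal expiry resumes. */
    synchronized void onRefreshSweepComplete() {
        jumpMs = 0;
        refreshPending = false;
    }

    /** Maximum RouterInfo age currently allowed before expiry. */
    synchronized long maxAgeMs() {
        if (!refreshPending) return normalMaxAgeMs;
        // Never go below the normal age, even for a small jump.
        return Math.max(normalMaxAgeMs, jumpMs + TWO_HOURS_MS);
    }

    /** True if an entry published at publishedMs should be expired at nowMs. */
    synchronized boolean isExpired(long publishedMs, long nowMs) {
        return nowMs - publishedMs > maxAgeMs();
    }
}
```

With a one-hour normal age and a five-hour jump, entries up to seven hours old would survive until the sweep finishes, after which the one-hour limit applies again.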
zzz
there's an easy fix but I'll await confirmation of the problem
dr|z3d
ok
mesh
I love how RAM has become this horribly expensive commodity
zlatinb
heard from parg - bigly does have some native hooks into shutdown/suspend/resume events but didn't say what they do