zzz
zlatinb, would really appreciate it if you could allocate some time to review the pumper MR
zzz
not sure it's solid enough yet for the next release but if we don't start working on it soon it won't make it in
zlatinb
zzz: commented on the MR. There is an architectural flaw I'm afraid - all interest changes must happen on the pumper thread
zlatinb
so one needs to introduce a queue with interest operation changes and wakeup() the selector after enqueing
zzz
hmm. I looked at what jetty does and this is similar
zzz
take a look at the locking before you declare it hopeless
zlatinb
this is not hopeless per se because we have a time limit to how long we wait in select(..) but still as is this patch will add latency in the case where writes need to be done with interest ops
zlatinb
you can write a small test program with two threads - one blocked in select() w/o a timeout, and the other trying to change interest ops
zlatinb
actually let me double-check..
zlatinb
yes, SELECTOR_LOOP_DELAY = 200
zlatinb
so it may not be visible with naked eye
zlatinb
I used the exact same design in one of my HFT jobs where microseconds mattered
zlatinb
since they don't and the selector is guaranteed to wake up within 200ms it's probably not fatal
zzz
zlatinb, the interest ops lock is on the individual SelectionKey, not on the selector; the write lock per-connection, not on the selector
zzz
there's no possibility for a 200ms block
zzz
like I said, take your time to review the diff
zlatinb
I did and added other comments too
zlatinb
but the 200ms wait is real
zzz
I'm looking at github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/sun/nio/ch/SelectionKeyImpl.java
zzz
where's the lock? line 101?
zlatinb
yes, which is an abstract method in SelectorImpl implemented in the subclasses
zzz
github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/sun/nio/ch/SelectorImpl.java at the bottom, still abstract
zzz
still no sign of a lock
zlatinb
EPollSelectorImpl.java
zlatinb
github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/linux/classes/sun/nio/ch/EPollSelectorImpl.java
zlatinb
and the windows equivalent: github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/windows/classes/sun/nio/ch/WindowsSelectorImpl.java
zzz
so that holds updateLock which is also held in processUpdateQueue() called from doSelect()
zzz
processUpdateQueue() is called at line 113; the timeout wait is at line 120
zzz
so the lock's not held in the wait
zzz
it would be very bad design if it did hold a lock in the wait()
zlatinb
it was at least in earlier JDKs. Let me write a quick program
zzz
how early? java 4 or 5? those were known to be bad
zlatinb
around that early yes
zzz
this jdk 11 code we're looking at is pretty clear
zlatinb
it's not clear to me the interest ops will be changed until the next iteration of doSelect
zzz
sure but most writes are happening directly, not in the selector thread
zzz
if they do get thrown onto the selector, it will be woken up as usual
zzz
99.9% of writes bypass the selector
zlatinb
that depends, maybe not in a bulk transfer. That will need to be tested in a testnet
zzz
sure, not claiming it's perfect yet. just that it's been through several live net testing / bug fix iterations already to get to this pint
zlatinb
ok, wrote a small program - the changing of the interestOps does not block the changing thread, but it does not take effect until the next iteration of select
zlatinb
waiting for pastebins to load
zzz
thats fine. if the write doesn't complete, it will get thrown onto the selector which will get woken up, just like before
zlatinb
ok. In the code I did for work I would set a state that the channel is full and any further writes to the channel would need to go through the selector. I don't see such state kept in this MR
zzz
there's no state. every call to NTCPCon.write() will try to write directly, and then if it fails, throw it to the selector
zlatinb
hmm that might be a problem because there is no guarantee that data will be written in order
zzz
true, my statement above was a simplification
zzz
NTCPCon.write() -> pumper.processWrite() tries to clear the write buf queue
zlatinb
yes but there is no guarantee that the pumper thread will grab the cpu before the writing thread writes to a writeable socketchannel
zzz
that's why there's now a separate write lock
zzz
all writes happen in pumper.processWrite(), whether via the selector thread or another thread
zlatinb
that's not sufficient, consider this sequence:
zlatinb
1. writer thread writes to socketchannel, fills it up, registers for interest with selector, releases all locks goes to sleep
zlatinb
2. some time passes, socketchannel becomes writeable again, wakes up selector thread
zlatinb
3. before the selector thread gets the cpu, writer thread comes back, acquires all locks and writes more
zlatinb
= data out of order
zzz
no, because everything to be written gets put in the write buf queue (no matter what thread), and then pumper.processWrite() takes it off the queue
zzz
in your sequence above, if there's still data in the write buf queue after 3), then 4) the selector will write the rest of it (or not)
zzz
so when I say a thread 'writes directly', it is in the thread, but it's first putting the data at the tail of the write buf queue, then writing everything it can starting at the head of the write buf queue
zzz
that keeps it all in-order
zlatinb
ok that looks fine then
zlatinb
I'll run it through the testnet and see how it behaves
zlatinb
not tonight though
zzz
great, glad to finally get your eyeballs on it