SVDM is stops outbound calling after some time - where the mailer using it thinks the other end is "sleeping".
It looks like outbound data is sent, but incoming data is not seen by SVDM and passed onto the mailer.
However, inbound calling works fine, and if the inbound call causes the mailer to exit (and start the BBS), upon return, outbound calling works again for a period of time.
Naturally stopping the mailer (and thus exiting SVDM) and restarting works fine.
Attached is a log from dbgview, it shows:
After a fresh start of SVDM, outbound polling works successfully. I made several outbound polls after each other and they were all OK. See time stamp 08:33:57 for the first call, which was successful and completed 08:34:21 (not mail was actually transferred - so it was all EMSI chat). There were a couple of other polls up to 08:37:28 which were successful.
At 15:21:44 and outbound call failed. (From the other end, I can see the incoming connection and the start of the EMSI setup, but no response to it - and the calling mailer didnt see the chatter.
What's obvious for me:
On the successful calls, there are two "recv returned 107" (probably as part of the EMSI setup), on the failed call there is only 1.
On the failed call SVDM is "TX: 2 bytes" twice a second, where on the successful calls this is normally in done twice, followed by a "TX: 15 bytes".
Are you using the receive data rate option (-r on the command-line, or "Rate" in the svdm.ini file) or the -R command-line option?
It appears you are running with the debug (-d) option, so I would expect the "rx_delay" setting value to be output to the debug log when SVDM terminates. Normally, this value would be 0.
I just committed a change that adds the clock values to several of the debug messages that will help me to eliminate the rx_delay setting/checking as the root cause of the issue (or not). Please run again with a build using that commit, or later.
This smells like an issue sending or receiving over the socket, but I can't tell if that 2-byte TX's are actually getting through to the receiving app or not. A TCP packet dump (e.g. using Wireshark) could be helpful to understand whether this is a transmit or a receive problem.
OK, I'll get your debug file after I've submitted this.
To answer your questions, no, dont use -r (or Rate) in the SVDM.INI file. And -d is only used since I started debugging this. (Not sure where the debug log is, but it outputs to the screen on exit: rx_delay: 0.)
I start SVDM with "-raw -d -l -p2021 pop %4.0.0 %2 %3 %4" (but generally only start the bat file with 1 argument)
Attached, I've created more debugging with dbgview - "a good session" and a "bad session", together with tcpdump on the linux side. (The times in the timestamps are in sync.)
I'm guessing (and my memory of how this starts), on it, the mailer sends 0d 0a to start the EMSI session (the two bytes), where the server responds with two EMSI_REQ, the client should respond with an EMSI_DAT.
In the good logs, you can see this starting.
In the bad logs, I'm thinking that SVDM doesnt receive (or pass through) the EMSI_REQs as received, so the mailer is still sending the 0d 0a to start the session...
W7-SVDM-DEON.zip
The Windows debug log is that thing that's viewable with the "dbgview" utility I referred you to. That said, the rx_delay value and other debug values that are output upon termination of SVDM are only logged to stdout (the console), not the debug log, and they're only output when the '-d' option is used. So I misspoke. In any case, an rx_delay value of 0 is good/normal.
The SVDM debug log output in your attached W7*.TXT files is not from the current revision of svdm.exe in Git (the clock values I recently added and was looking for in the log messages are not there). Do you need me to build a new version and put it up for download on Vertrauen or can you grab an artifact from gitlab.synchro.net?
Also, please attach the resulting cap/pcap files from Ethernet packet captures, not the plain text/hexadecimal output of tcpdump, so I can use a visual packet dissector (e.g. Wireshark), and not have to try and make sense of the layers of protocol headers in hexadecimal, myself.
Right, I wrote the message, and captures with tcpdump before I got your new debug one (to answer the question that you couldnt tell if the two byte TX's were getting through - from the tcpdump, those two bytes are 0x20 0x0d) - and to help determine if it was a sending or receiving issue. (I'm betting the latter, since those two bytes made it to the server's tcpdump capture...). It was failing at the time, so it was easy to quickly grab that tcpdump.
Anyway, since running the debug svdm (I got it from the build artifact from the pipeline linked to the commit 19545587 linked to this issue), it hasnt failed (two days so far). Nightly polling has worked, as well as the occasional forced poll...
I'll leave it running as is for a few days and see if it does eventually fail... If it does, I'll get you the pcaps from tcpdump as well...
OK, I think I have it this time: The zip file has 3 files:
W7-bad.log is the log from DBGVIEW, running for a few days...
w7-bad.pcap is from 8:53:23am (line 1117). Just before 8:53 it failed, but I didnt have tcpdump running. When I started tcpdump, the next two sessions worked OK (and mail transferred - so a capture of a "good session"). I tried twice 15 mins later and it failed (9:10:33am), so this has the first failed sesssion and the below log file has the second.
w7-bad-1.pcap is just a single session showing receipt of the two bytes, but not response passed back to the mailer. (9:11:23am - from line 1340).
I'm not really seeing any clues as to the cause yet - nothing to point the finger at SVDM, specifically.
So far the packet captures have been from the server-side (your "linux mailer"). Can you provide a packet capture for the Windows client side (running SVDM)? Hopefully this will prove if the missing 107 bytes (that POP is looking for as part of the EMSI exchange) were actually received by the network stack on the client.
00000224 10:10:25 PM [3768] SBBSVDM: recv returned 107
Also my TCP networking is a little rusty, but given this is TCP, from the tcpdump that I took it can validate that the receiver (Windows) got my packet right? IE (looking at the w7-bad-1.pcap file):
I ...fe98:b5d1 sent (the 107 bytes - confirmed by length 107) as SEQuence 1:108:
But this did give me an idea - I'm wondering if it is timing issue, ie: I send the EMSI_REQ immediately upon connection and perhaps it got lost (?) before the mailer was ready to consume it? Since the mailer hasnt seen it, it is sending the 0x20 0x0d and not getting a reply (because I already send the EMSI_REQ).
(Not sure why then restarting SVDM resolves it.)
So is there anyway to proove the mailer picked up the 107 bytes?
I'll also get my mailer, to resend the EMSI_REQ a second time (or even delay a few seconds after connect), to see if that helps...
OK, it is a timing issue, and perhaps a lost data issue?
My mailer should have been resending the emsi_dat again after 5 seconds (but wasnt because it didnt time out per-se, it was receiving from SVDM - albeit 0x20 0x0d).
When I fixed that, the session proceeded as normal. So I'll keep an eye on it and see if that does actually provide a workaround to the issue.
00000149 2:15:19 PM [2928] SBBSVDM: Modem response: CONNECT 960000000150 2:15:19 PM [2928] SBBSVDM: select returned 1 at 77298474800000151 2:15:19 PM [2928] SBBSVDM: recv returned 10700000152 2:15:22 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000153 2:15:22 PM [2928] SBBSVDM: New esc count = 0 at 77298801000000154 2:15:22 PM [2928] SBBSVDM: TX: 2 bytes00000155 2:15:23 PM [2928] SBBSVDM: TX: 2 bytes00000156 2:15:24 PM [2928] SBBSVDM: TX: 2 bytes00000157 2:15:24 PM [2928] SBBSVDM: TX: 2 bytes00000158 2:15:24 PM [2928] SBBSVDM: select returned 1 at 77298975200000159 2:15:24 PM [2928] SBBSVDM: recv returned 10700000160 2:15:25 PM [2928] SBBSVDM: TX: 2 bytes00000161 2:15:26 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000162 2:15:26 PM [2928] SBBSVDM: TX: 15 bytes00000163 2:15:26 PM [2928] SBBSVDM: select returned 1 at 77299119100000164 2:15:26 PM [2928] SBBSVDM: recv returned 1500000165 2:15:26 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000166 2:15:26 PM [2928] SBBSVDM: TX: 280 bytes00000167 2:15:26 PM [2928] SBBSVDM: select returned 1 at 77299200500000168 2:15:26 PM [2928] SBBSVDM: recv returned 30
I think the question is: where are the second 107 bytes?
The working log you sent previously, showed:
00000013 10:37:39 AM [3572] SBBSVDM: recv returned 107
00000016 10:37:43 AM [3572] SBBSVDM: TX: 2 bytes
00000018 10:37:44 AM [3572] SBBSVDM: recv returned 107
00000020 10:37:44 AM [3572] SBBSVDM: TX: 2 bytes
In the failing logs, SVDM is responding to the first 107 bytes received (with a 2 byte response), but never gets the apparently-expected second 107 message from the server.
Actually a response to an EMSI_REQ is an EMSI_DAT (not 0x20 0x0d) and the logs show the mailer responds to it when it is sent a second time.
There was a particular scenario that my mailer wasnt sending it a second time, and I think that was what resulted in each failed session.
IIRC, the process should be - upon connection:
Client sends something <-- I always thought this was optional, perhaps not. (eg: 0x20 0x0d)
Server sends EMSI_REQ
Client responds with EMSI_DAT
Thus it could well be that the first 107 bytes is actually sent to POP, but because it hasnt sent it's 0x20 0x0d it ignores it. (ie it's not ready to to start a session).
Now that I've fixed sending it again, I'll monitor it over the next few days - and if there are no more problems, I guess we can put it down to POP throwing away the first EMSI_REQ because it wasnt ready for it yet...
I put in a 2 second delay in my mailer when a TCP connection is made, but POP is still not reacting to the first EMSI_REQ (it still responds with 0x20 0x0d). When a second EMSI_REQ is sent 5 seconds later, it still responds with 0x20 0x0d, but then followed by an EMSI_DAT and all is good.
So my fix was to make sure that that second EMSI_DAT gets sent and all is good :)