SVDM - outbound calling issue

mentioned in commit e4bfafb6

Are you using the receive data rate option (-r on the command-line, or "Rate" in the svdm.ini file) or the -R command-line option?

It appears you are running with the debug (-d) option, so I would expect the "rx_delay" setting value to be output to the debug log when SVDM terminates. Normally, this value would be 0.

I just committed a change that adds the clock values to several of the debug messages that will help me to eliminate the rx_delay setting/checking as the root cause of the issue (or not). Please run again with a build using that commit, or later.

This smells like an issue sending or receiving over the socket, but I can't tell if that 2-byte TX's are actually getting through to the receiving app or not. A TCP packet dump (e.g. using Wireshark) could be helpful to understand whether this is a transmit or a receive problem.

mentioned in commit 19545587

OK, I'll get your debug file after I've submitted this.

To answer your questions, no, dont use -r (or Rate) in the SVDM.INI file. And -d is only used since I started debugging this. (Not sure where the debug log is, but it outputs to the screen on exit: rx_delay: 0.)

I start SVDM with "-raw -d -l -p2021 pop %4.0.0 %2 %3 %4" (but generally only start the bat file with 1 argument)

Attached, I've created more debugging with dbgview - "a good session" and a "bad session", together with tcpdump on the linux side. (The times in the timestamps are in sync.)

I'm guessing (and my memory of how this starts), on it, the mailer sends 0d 0a to start the EMSI session (the two bytes), where the server responds with two EMSI_REQ, the client should respond with an EMSI_DAT.

In the good logs, you can see this starting.

In the bad logs, I'm thinking that SVDM doesnt receive (or pass through) the EMSI_REQs as received, so the mailer is still sending the 0d 0a to start the session... W7-SVDM-DEON.zip

The Windows debug log is that thing that's viewable with the "dbgview" utility I referred you to. That said, the rx_delay value and other debug values that are output upon termination of SVDM are only logged to stdout (the console), not the debug log, and they're only output when the '-d' option is used. So I misspoke. In any case, an rx_delay value of 0 is good/normal.

The SVDM debug log output in your attached W7*.TXT files is not from the current revision of svdm.exe in Git (the clock values I recently added and was looking for in the log messages are not there). Do you need me to build a new version and put it up for download on Vertrauen or can you grab an artifact from gitlab.synchro.net?

Also, please attach the resulting cap/pcap files from Ethernet packet captures, not the plain text/hexadecimal output of tcpdump, so I can use a visual packet dissector (e.g. Wireshark), and not have to try and make sense of the layers of protocol headers in hexadecimal, myself.

Thank you!

Right, I wrote the message, and captures with tcpdump before I got your new debug one (to answer the question that you couldnt tell if the two byte TX's were getting through - from the tcpdump, those two bytes are 0x20 0x0d) - and to help determine if it was a sending or receiving issue. (I'm betting the latter, since those two bytes made it to the server's tcpdump capture...). It was failing at the time, so it was easy to quickly grab that tcpdump.

Anyway, since running the debug svdm (I got it from the build artifact from the pipeline linked to the commit 19545587 linked to this issue), it hasnt failed (two days so far). Nightly polling has worked, as well as the occasional forced poll...

I'll leave it running as is for a few days and see if it does eventually fail... If it does, I'll get you the pcaps from tcpdump as well...

Thanks and Merry Christmas :)

OK, I think I have it this time: The zip file has 3 files:

W7-bad.log is the log from DBGVIEW, running for a few days...
w7-bad.pcap is from 8:53:23am (line 1117). Just before 8:53 it failed, but I didnt have tcpdump running. When I started tcpdump, the next two sessions worked OK (and mail transferred - so a capture of a "good session"). I tried twice 15 mins later and it failed (9:10:33am), so this has the first failed sesssion and the below log file has the second.
w7-bad-1.pcap is just a single session showing receipt of the two bytes, but not response passed back to the mailer. (9:11:23am - from line 1340).

Hope this helps... SVDM-TAKE2.ZIP

mentioned in commit 2a6d9983

Okay, I added some more debug log output around select() for receive and incremented the version to 0.3.

Try that and please supply the debug log output again around a failure scenario (the packet captures aren't necessary right now).

I also did the drop file clean up thing you suggested on DOVE-Net.

Here you go:

00000212        10:10:25 PM     [3768] SBBSVDM: RX command: ' ATD010003001 ' at 455925935
00000213        10:10:25 PM     [3768] SBBSVDM: dial(010003001)
00000214        10:10:25 PM     [3768] SBBSVDM: Connecting to port 11179 at host 'd-11-1.leenooks.lan' via Raw
00000215        10:10:25 PM     [3768] SBBSVDM: C:\GitLab-Runner\builds\4S9KMgUy\2\main\sbbs\src\vdmodem\vdmodem.c 640 calling getaddrinfo
00000216        10:10:25 PM     [3768] SBBSVDM: C:\GitLab-Runner\builds\4S9KMgUy\2\main\sbbs\src\vdmodem\vdmodem.c 662 calling connect
00000217        10:10:25 PM     [3768] SBBSVDM: Setting socket options
00000218        10:10:25 PM     [3768] SBBSVDM: Socket send buffer length: 8192 bytes
00000219        10:10:25 PM     [3768] SBBSVDM: Socket receive buffer length: 8192 bytes
00000220        10:10:25 PM     [3768] SBBSVDM: Socket TCP_NODELAY: 0
00000221        10:10:25 PM     [3768] SBBSVDM: C:\GitLab-Runner\builds\4S9KMgUy\2\main\sbbs\src\vdmodem\vdmodem.c 714 connected!
00000222        10:10:25 PM     [3768] SBBSVDM: Modem response: CONNECT 9600
00000223        10:10:25 PM     [3768] SBBSVDM: select returned 1 at 455926172
00000224        10:10:25 PM     [3768] SBBSVDM: recv returned 107
00000225        10:10:27 PM     [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED
00000226        10:10:28 PM     [3768] SBBSVDM: New esc count = 0 at 455929222
00000227        10:10:28 PM     [3768] SBBSVDM: TX: 2 bytes
00000228        10:10:28 PM     [3768] SBBSVDM: TX: 2 bytes
00000229        10:10:29 PM     [3768] SBBSVDM: TX: 2 bytes
00000230        10:10:30 PM     [3768] SBBSVDM: TX: 2 bytes
00000231        10:10:30 PM     [3768] SBBSVDM: TX: 2 bytes
00000232        10:10:31 PM     [3768] SBBSVDM: TX: 2 bytes
00000233        10:10:32 PM     [3768] SBBSVDM: TX: 2 bytes
00000234        10:10:32 PM     [3768] SBBSVDM: TX: 2 bytes
00000235        10:10:33 PM     [3768] SBBSVDM: TX: 2 bytes
00000236        10:10:33 PM     [3768] SBBSVDM: TX: 2 bytes
00000237        10:10:34 PM     [3768] SBBSVDM: TX: 2 bytes
00000238        10:10:35 PM     [3768] SBBSVDM: TX: 2 bytes
00000239        10:10:35 PM     [3768] SBBSVDM: TX: 2 bytes
00000240        10:10:36 PM     [3768] SBBSVDM: TX: 2 bytes
00000241        10:10:36 PM     [3768] SBBSVDM: TX: 2 bytes
00000242        10:10:36 PM     [3768] SBBSVDM: hangup_event signaled at 455937636
00000243        10:10:36 PM     [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED
00000244        10:10:36 PM     [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED
00000245        10:10:36 PM     [3768] SBBSVDM: RX command: '  ATZ ' at 455937809
00000246        10:10:37 PM     [3768] SBBSVDM: Modem response: OK
00000247        10:10:37 PM     [3768] SBBSVDM: RX command: 'ATI ' at 455938062
00000248        10:10:37 PM     [3768] SBBSVDM: Modem response: Synchronet Virtual DOS Modem for Windows v0.3 Copyright 2022 Rob Swindell
00000249        10:10:37 PM     [3768] HEAD/2a6d9983
00000250        10:10:37 PM     [3768]
00000251        10:10:37 PM     [3768] OK
00000252        10:10:38 PM     [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED

I neglected to give you this when svdm exited:

rx_delay: 0
rx_buflen: 10000
largest recv: 1030
lasttx: 502008272
lastrx: 501996714
lastring: 0
timer: 502013604

I'm not really seeing any clues as to the cause yet - nothing to point the finger at SVDM, specifically.

So far the packet captures have been from the server-side (your "linux mailer"). Can you provide a packet capture for the Windows client side (running SVDM)? Hopefully this will prove if the missing 107 bytes (that POP is looking for as part of the EMSI exchange) were actually received by the network stack on the client.

SVDM confirms receipt of the 107 bytes:

00000224        10:10:25 PM     [3768] SBBSVDM: recv returned 107

Also my TCP networking is a little rusty, but given this is TCP, from the tcpdump that I took it can validate that the receiver (Windows) got my packet right? IE (looking at the w7-bad-1.pcap file):

I ...fe98:b5d1 sent (the 107 bytes - confirmed by length 107) as SEQuence 1:108:

09:11:24.459482 IP6 fd00:368:0:1000:20c:29ff:fe98:b5d1.11179 > fd00:368:0:1000:29a7:8f39:26b1:9e6.49366: Flags [P.], seq 1:108, ack 1, win 507, length 107
        0x0000:  6001 989b 007f 0640 fd00 0368 0000 1000  `......@...h....
        0x0010:  020c 29ff fe98 b5d1 fd00 0368 0000 1000  ..)........h....
        0x0020:  29a7 8f39 26b1 09e6 2bab c0d6 a286 a9c0  )..9&...+.......
        0x0030:  a8d2 2a12 5018 01fb eb44 0000 2a2a 454d  ..*.P....D..**EM
        0x0040:  5349 5f52 4551 4137 3745 0808 0808 0808  SI_REQA77E......
        0x0050:  0808 0808 0808 0808 5468 6973 2069 7320  ........This.is.
        0x0060:  6120 6d61 696c 206f 6e6c 7920 7379 7374  a.mail.only.syst
        0x0070:  656d 202d 2075 6e6c 6573 7320 796f 7520  em.-.unless.you.
        0x0080:  6172 6520 6120 6d61 696c 6572 2c20 796f  are.a.mailer,.yo
        0x0090:  7520 7368 6f75 6c64 2064 6973 636f 6e6e  u.should.disconn
        0x00a0:  6563 7420 3a29 0d                        ect.:).

The windows system ...26b1:9e6 provided both a SACK and an ACK ack 108:

09:11:24.462327 IP6 fd00:368:0:1000:29a7:8f39:26b1:9e6.49366 > fd00:368:0:1000:20c:29ff:fe98:b5d1.11179: Flags [.], ack 108, win 16533, options [nop,nop,sack 1 {1:108}], length 0
        0x0000:  6000 0000 0020 0680 fd00 0368 0000 1000  `..........h....
        0x0010:  29a7 8f39 26b1 09e6 fd00 0368 0000 1000  )..9&......h....
        0x0020:  020c 29ff fe98 b5d1 c0d6 2bab a8d2 2a12  ..).......+...*.
        0x0030:  a286 aa2b 8010 4095 a956 0000 0101 050a  ...+..@..V......
        0x0040:  a286 a9c0 a286 aa2b                      .......+
09:11:27.695760 IP6 fd00:368:0:1000:29a7:8f39:26b1:9e6.49366 > fd00:368:0:1000:20c:29ff:fe98:b5d1.11179: Flags [P.], seq 1:3, ack 108, win 16533, length 2
        0x0000:  6000 0000 0016 0680 fd00 0368 0000 1000  `..........h....
        0x0010:  29a7 8f39 26b1 09e6 fd00 0368 0000 1000  )..9&......h....
        0x0020:  020c 29ff fe98 b5d1 c0d6 2bab a8d2 2a12  ..).......+...*.
        0x0030:  a286 aa2b 5018 4095 5850 0000 200d       ...+P.@.XP....

The ACK included the two bytes (TX: 2 bytes) 0x20 0x0d - as sequence 1:3, which my linux server also ACKed as ack 3:

09:11:27.695901 IP6 fd00:368:0:1000:20c:29ff:fe98:b5d1.11179 > fd00:368:0:1000:29a7:8f39:26b1:9e6.49366: Flags [.], ack 3, win 507, length 0
        0x0000:  6001 989b 0014 0640 fd00 0368 0000 1000  `......@...h....
        0x0010:  020c 29ff fe98 b5d1 fd00 0368 0000 1000  ..)........h....
        0x0020:  29a7 8f39 26b1 09e6 2bab c0d6 a286 aa2b  )..9&...+......+
        0x0030:  a8d2 2a14 5010 01fb ead9 0000            ..*.P.......

But this did give me an idea - I'm wondering if it is timing issue, ie: I send the EMSI_REQ immediately upon connection and perhaps it got lost (?) before the mailer was ready to consume it? Since the mailer hasnt seen it, it is sending the 0x20 0x0d and not getting a reply (because I already send the EMSI_REQ).

(Not sure why then restarting SVDM resolves it.)

So is there anyway to proove the mailer picked up the 107 bytes?

I'll also get my mailer, to resend the EMSI_REQ a second time (or even delay a few seconds after connect), to see if that helps...

OK, it is a timing issue, and perhaps a lost data issue?

My mailer should have been resending the emsi_dat again after 5 seconds (but wasnt because it didnt time out per-se, it was receiving from SVDM - albeit 0x20 0x0d).

When I fixed that, the session proceeded as normal. So I'll keep an eye on it and see if that does actually provide a workaround to the issue.

00000149        2:15:19 PM      [2928] SBBSVDM: Modem response: CONNECT 9600
00000150        2:15:19 PM      [2928] SBBSVDM: select returned 1 at 772984748
00000151        2:15:19 PM      [2928] SBBSVDM: recv returned 107
00000152        2:15:22 PM      [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED
00000153        2:15:22 PM      [2928] SBBSVDM: New esc count = 0 at 772988010
00000154        2:15:22 PM      [2928] SBBSVDM: TX: 2 bytes
00000155        2:15:23 PM      [2928] SBBSVDM: TX: 2 bytes
00000156        2:15:24 PM      [2928] SBBSVDM: TX: 2 bytes
00000157        2:15:24 PM      [2928] SBBSVDM: TX: 2 bytes
00000158        2:15:24 PM      [2928] SBBSVDM: select returned 1 at 772989752
00000159        2:15:24 PM      [2928] SBBSVDM: recv returned 107
00000160        2:15:25 PM      [2928] SBBSVDM: TX: 2 bytes
00000161        2:15:26 PM      [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED
00000162        2:15:26 PM      [2928] SBBSVDM: TX: 15 bytes
00000163        2:15:26 PM      [2928] SBBSVDM: select returned 1 at 772991191
00000164        2:15:26 PM      [2928] SBBSVDM: recv returned 15
00000165        2:15:26 PM      [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED
00000166        2:15:26 PM      [2928] SBBSVDM: TX: 280 bytes
00000167        2:15:26 PM      [2928] SBBSVDM: select returned 1 at 772992005
00000168        2:15:26 PM      [2928] SBBSVDM: recv returned 30

The question is, where is the first 107 bytes? :)

I think the question is: where are the second 107 bytes?

The working log you sent previously, showed:

00000013 10:37:39 AM [3572] SBBSVDM: recv returned 107 00000016 10:37:43 AM [3572] SBBSVDM: TX: 2 bytes 00000018 10:37:44 AM [3572] SBBSVDM: recv returned 107 00000020 10:37:44 AM [3572] SBBSVDM: TX: 2 bytes

In the failing logs, SVDM is responding to the first 107 bytes received (with a 2 byte response), but never gets the apparently-expected second 107 message from the server.

Actually a response to an EMSI_REQ is an EMSI_DAT (not 0x20 0x0d) and the logs show the mailer responds to it when it is sent a second time.

There was a particular scenario that my mailer wasnt sending it a second time, and I think that was what resulted in each failed session.

IIRC, the process should be - upon connection:

Client sends something <-- I always thought this was optional, perhaps not. (eg: 0x20 0x0d)
Server sends EMSI_REQ
Client responds with EMSI_DAT

Thus it could well be that the first 107 bytes is actually sent to POP, but because it hasnt sent it's 0x20 0x0d it ignores it. (ie it's not ready to to start a session).

Now that I've fixed sending it again, I'll monitor it over the next few days - and if there are no more problems, I guess we can put it down to POP throwing away the first EMSI_REQ because it wasnt ready for it yet...

So appreciate your help working through this one.

I put in a 2 second delay in my mailer when a TCP connection is made, but POP is still not reacting to the first EMSI_REQ (it still responds with 0x20 0x0d). When a second EMSI_REQ is sent 5 seconds later, it still responds with 0x20 0x0d, but then followed by an EMSI_DAT and all is good.

So my fix was to make sure that that second EMSI_DAT gets sent and all is good :)

HNY.

closed

mentioned in commit 78c4c5d1

mentioned in commit 64188745

SVDM - outbound calling issue

Designs

Child items ...

Activity