- Written by Scott G Taylor
During the Fall of 2021, my Nissho PDP-11 (11/94 equivalent CPU in a PDP-11/34a BA11-K and DD11-PK chassis set-up) was humming along unmolested when it suddenly stopped communicating over Ethernet.
Installed in that machine was a Digital DELUA M7521 Network Interface board. And it decided to quit working.
Ethernet networking is important to me for transferring large amounts of data from my simh PDP-11 2.12BSD (my versioning of a completely refactored 2.11BSD with many extensions) to the real PDP-11 machine for boot and performance testing. I also use that machine for imaging RL02 drive packs and utilizing SMD disks (CDC Sabre SMD, Mitsubishi SMD drives and a CDC RSD 80 megabyte cartridge drive).
There's really no finding DELUA boards easily, or affordably, any more.
My diagnosis, and resulting easy repair, follows.
The first thing that I noticed was that the power light was not illuminated on the transceiver bulkhead panel for the cable connection to the DELUA board.
The DELUA User Guide, EK-DELUA-UG-002, section 6.2.1 "Selftest" under "Troubleshooting" on page 6-1, states the following about the indicator light at the bulkhead:
"If the DELUA fails its external loopback test, check the LED on the UNA bulkhead assembly that monitors the -15 V supply to the transceiver. There is also a circuit break on the UNA bulkhead assembly for the -15 V power to the transceiver."
The fuse at the bulkhead assembly was OK, but a VOM showed no voltage at the fuse.
Time to check the -15 V path coming from the backplane. According to the chart provided on page 2-2, the -15 V feed is at "finger" F (bottom left-most edge connector #6 when facing component side), side B (aka Side 2), Pin 2 from the right.
Here's the pin on the connector:
I followed that thick trace from pin 2, along the edge of the board, going up to the top corner where the chassis lever is, to its termination at a through-hole junction to the component side. On the component side is a through-hole fuse component, which returns to this side at the "Post-fuse connection" point identified. This circuit continues to the pin on the berg header to the left, to which the bulkhead cable attaches:
I tested the voltage at the top -15 V connection, and the voltage was there. The bottom connection was dead. I had found the failure.
On the component side, this is the -15 V circuit through-hole fuse component (after replacement):
I did not know the rating of the fuse that failed, not having found a schematic for the DELUA M7521 board. I did have, however, a failed DEQNA (Qbus Ethernet) board that had the same on-board fuse. I removed it from the scrap DEQNA, soldered it onto the DELUA, and it is now repaired and working. The full CZUAD diagnostics have passed.
I wish all board repairs were so easy, particularly because I do not have board extenders to diagnose faulty boards in-situ. Logic analyzers are not usable in the tight confines of a UNIBUS chassis with its board spacing.
- Written by Scott G Taylor
Today, while porting the MRY Print Spooler banner-page language processor to TOPS-20, I discovered that the KCC C Compiler does not support binary constants.
I really didn't want to re-code 4,600 lines of big-letter font C-headers.
So I added binary constants to KCC-6. It was trivial.
$diff -C 7 tops20:<kcc-6.kcc>cclex.c cclex.c
*** dist:<kcc-6.kcc>cclex.c Sun Mar 6 23:58:24 2022
--- work:<kccdist.kcc-6.kcc>cclex.c Mon Mar 7 20:27:17 2022
***************
*** 288,301 ****
--- 288,306 ----
  if ((c = *cp) == '0') { /* Octal/Hex prefix? */
      c = *++cp;
      if (c == 'x' || c == 'X') { /* Hex (base 16) */
          while (isxdigit(c = *++cp)) {
              if (v & (017 << (TGSIZ_LONG-4))) ovfl++;
              v = ((unsigned long)v << 4) + toint(c);
          }
+     } else if (c == 'b' || c == 'B') { /* Binary (base 2) */
+         while (((c = *++cp) == '0' || c == '1')) {
+             if (v & (01 << (TGSIZ_LONG-1))) ovfl++;
+             v = ((unsigned long)v << 1) + toint(c);
+         }
      } else { /* Octal (base 8) */
          while (isodigit(c)) {
              if (v & (07 << (TGSIZ_LONG-3))) ovfl++;
              v = ((unsigned long)v << 3) + c - '0';
              c = *++cp;
          }
          if (isdigit(c)) { /* Helpful msg for common error */
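For readers without the KCC sources at hand, the idea behind the added branch can be sketched as a stand-alone function. This is only an illustration, not KCC code: the name parse_binary_digits and its interface are hypothetical, and it tests the top bit of the host's unsigned long rather than using KCC's TGSIZ_LONG.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone helper (not part of KCC): parse the digits
 * that follow a "0b"/"0B" prefix. Returns the number of digits consumed,
 * stores the value in *out, and sets *overflow when a set bit would be
 * shifted out of the top of an unsigned long.
 */
static size_t parse_binary_digits(const char *s, unsigned long *out, int *overflow)
{
    const unsigned long top_bit = 1UL << (sizeof(unsigned long) * CHAR_BIT - 1);
    unsigned long v = 0;
    size_t n = 0;

    *overflow = 0;
    while (s[n] == '0' || s[n] == '1') {
        if (v & top_bit)            /* the next shift would lose this bit */
            *overflow = 1;
        v = (v << 1) + (unsigned long)(s[n] - '0');
        n++;
    }
    *out = v;
    return n;
}
```

Called on the text "1011", the function consumes four digits and stores 11. As in the patch, parsing simply stops at the first character that is not 0 or 1.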
- Written by Scott G Taylor
The SIMH PDP-10 simulator from Richard Cornwell was updated to his Version 3 recently. See his posting to the SIMH message board on groups.io: https://groups.io/g/simh/message/1111
Having learned that it was supporting the DBD9 disk images for RP07, I decided to set up his version to boot my KLH10 images of TOPS-20 7.1 (my PANDA and SRI hybrid).
Things did not go particularly well, however.
I copied my system disk images (a dual-RP07 pair), and booted to standalone. I disabled almost all services (networking, etc.) in TOPS-20 and then shut down. I then disabled networking (the NI20 device) in SIMH, and booted again.
Once booted, I ran a Dhrystone test on Cornwell's SIMH PDP-10 (KL10B) and under KLH10.
SIMH reported 12936 dhrystones/second.
KLH10 reported 28355 dhrystones/second.
This was all fine and well, as I didn't expect SIMH to perform as fast as KLH10.
The problems encountered were due to the crashes that occur when booted under SIMH when any code compiled with the KCC C compiler is run. Many programs seem to just hang, but can be stopped with ^C. Some sort-of work, but TTY output is problematic. A simplified UNIX cat(1) program doesn't output to the TTY correctly. And many programs will crash TOPS-20. While I'm aware that there is code in KLH10 to accommodate the "Panda Lights" console-light hardware, I do not know of any changes to the underlying machine simulation particular to the Panda or SRI TOPS-20 kernel changes, nor any changes made to accommodate KCC C-generated machine code.
There is also something amiss with the SIMH PDP-10 timer support. Calls to the DISMS monitor JSYS result in waits (dismissals) that are 4 to 5 times longer than the milliseconds requested. That is, a one-second wait/dismissal request doesn't return for 4-5 seconds. Timer-based macro programming through DISMS is failing under SIMH.
I may investigate the SIMH PDP-10 issues/limitations at a later date. For now, I have satisfied my curiosity about the SIMH-based PDP-10 simulation functionality.
- Written by Scott G Taylor
When connecting via TELNET to a running KLH10-simulated OS instance from a FreeBSD machine, I have found that the connection was being dropped after two hours. The following is what I learned about FreeBSD, and how I resolved the issue.
After an upgrade of my KLH10-hosting FreeBSD platform from FreeBSD 12-RELEASE to FreeBSD 13-RELEASE, TELNET connections began dropping after two hours of inactivity. I had not associated this with the FreeBSD upgrade, and didn't take time to diagnose the problem right away. I did determine that the disconnections did not occur when originating the TELNET connection from Linux-based machines. At that point I opted to create a relay-login account on Linux to perform the TELNET, and I would investigate later.
Today is "Later."
I've whipped out the tcpdump(8) and monitored the TELNET session.
In the output and text below, the following hostnames and operating systems are at issue:
TELNET Client host: shell.mrynet.com running FreeBSD 13.0-RELEASE
TELNET Server host: tops20.mrynet.com running TOPS-20 under the KLH10 PDP-10 simulator.
The actual OS hosting the KLH10 PDP-10 simulator is not at issue, since the raw packets are passed to the TOPS-20 OS and are not otherwise acted upon by the hosting OS.
The following tcpdump(8) output was captured on the KLH10 simulator host with:
# tcpdump -nnSX -i em2 host tops20.mrynet.com and port 23
(em2 is a dedicated ethernet interface name on the simulator host logically attached to the KLH10 simulator):
21:53:43.950967 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1102, win 65535, length 0
21:53:43.981740 IP tops20.mrynet.com.telnet > shell.mrynet.com.21707: Flags [P.], seq 1102:1131, ack 105, win 1356, length 29
21:53:43.982567 IP tops20.mrynet.com.telnet > shell.mrynet.com.21707: Flags [P.], seq 1131:1132, ack 105, win 1356, length 1
21:53:43.982697 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
23:53:44.029075 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
23:54:59.029593 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
23:56:14.029869 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
23:57:29.036743 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
23:58:44.044758 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
23:59:59.058763 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
00:01:14.071775 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
00:02:29.117760 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [.], ack 1132, win 65535, length 0
00:03:44.133015 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [R.], seq 105, ack 1132, win 0, length 0
00:03:44.134285 IP tops20.mrynet.com.telnet > shell.mrynet.com.21707: Flags [.], ack 105, win 0, length 0
00:03:44.134385 IP shell.mrynet.com.21707 > tops20.mrynet.com.telnet: Flags [R], seq 2319642573, win 0, length 0
Shown in the above log section are the following activities:
21:53:43.982697 is the timestamp of the last TELNET connection transmission, this one being an ACK from the TELNET client.
23:53:44.029075 is the timestamp of the FreeBSD TELNET client's host sending a keep-alive probe on the TELNET connection to the TELNET server on the emulated TOPS-20 OS. This is odd, since the TELNET protocol itself does not open the TCP connection with keep-alive (SO_KEEPALIVE) enabled, and TCP connections themselves are not required to employ such functionality. The keep-alive packet is sent by the FreeBSD TELNET client's host, and not by the TELNET client itself. It is a connection keep-alive probe in the form of an ACK-flagged packet with no data payload. The TOPS-20 OS does not respond to this packet.
23:54:59.029593 \
23:56:14.029869 |
23:57:29.036743 |
23:58:44.044758 > These are additional duplicate keep-alive packets sent by the FreeBSD TELNET client host.
23:59:59.058763 |
00:01:14.071775 |
00:02:29.117760 /
00:03:44.133015 This packet is sent by the FreeBSD TELNET client's host to finally shut down the TCP connection carrying the TELNET communications. It is sent as a result of the FreeBSD client host not having received responses to the keep-alive packets.
00:03:44.134285 is the response from the TOPS-20 OS dutifully complying with the connection shutdown. TOPS-20 will have notified the TELNET server of the connection being closed.
00:03:44.134385 is the final packet from the FreeBSD TELNET client host closing the TCP connection.
While I do not know why this has suddenly become an issue when upgrading my FreeBSD systems, I have tracked down the functionality within FreeBSD that controls these keep-alive packets being sent on established TCP connections.
The four relevant FreeBSD sysctl(8) settings are in the net.inet.tcp sysctl(3) grouping. They are listed below along with their default values:
net.inet.tcp.always_keepalive: 1 (boolean)
net.inet.tcp.keepidle: 7200000 (msec)
net.inet.tcp.keepintvl: 75000 (msec)
net.inet.tcp.keepcnt: 8 (packets)
The descriptions of these settings, from the FreeBSD tcp(4) man page, are:
net.inet.tcp.always_keepalive
When set, assume that SO_KEEPALIVE is set on all TCP connections, the kernel will periodically send a packet to the remote host to verify the connection is still up.
net.inet.tcp.keepidle
Amount of time, in milliseconds, that the connection must be idle before keep-alive probes (if enabled) are sent.
net.inet.tcp.keepintvl
The interval, in milliseconds, between keep-alive probes sent to remote machines, when no response is received on a keep-alive probe.
net.inet.tcp.keepcnt
Number of probes sent, with no response, before a connection is dropped.
These settings, and their default values, result in the following explanation of the packets at issue:
- Since net.inet.tcp.always_keepalive is enabled, FreeBSD will perform the keep-alive operation on all TCP connections being originated. In this case, the TELNET client host will perform the keep-alive process on the TELNET connection.
- The keep-alive probes begin at exactly 7200000 milliseconds (7200 seconds, or 120 minutes) of idle time (net.inet.tcp.keepidle) on the connection.
- The FreeBSD TELNET CLIENT host (shell.mrynet.com), and not the TELNET client itself, sends 8 (net.inet.tcp.keepcnt) keep-alive packets at 75000 millisecond (75 second) intervals (net.inet.tcp.keepintvl).
- After no response to the 8th packet, for a total of 10 minutes of keep-alive attempts, the FreeBSD TELNET client host closes the TCP TELNET connection.
- There was a total of 7800 seconds (130 minutes) of time since the TELNET connection was used until the connection was closed by the FreeBSD client host.
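The arithmetic behind those figures can be written out as a small C sketch. The struct and function names here are made up for illustration; only the three default values come from the sysctl listing above.

```c
/* Keep-alive settings, in the units used by the sysctl listing above. */
struct keepalive_cfg {
    long keepidle_ms;   /* idle time before the first probe */
    long keepintvl_ms;  /* spacing between unanswered probes */
    int  keepcnt;       /* unanswered probes before the drop */
};

/* Milliseconds from the first probe to the drop when nothing answers. */
static long probe_window_ms(const struct keepalive_cfg *c)
{
    return (long)c->keepcnt * c->keepintvl_ms;
}

/* Milliseconds from the last activity on the connection to the drop. */
static long ms_until_drop(const struct keepalive_cfg *c)
{
    return c->keepidle_ms + probe_window_ms(c);
}
```

With the defaults (7200000, 75000, 8), probe_window_ms gives 600000 ms (10 minutes) and ms_until_drop gives 7800000 ms (130 minutes). That matches the capture: last traffic at 21:53:43, first probe at 23:53:44, reset at 00:03:44.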
The TCP specification does not state that any keep-alive protocol should be employed. And it hasn't been for TELNET connections previously.
The resolution (workaround?) for this situation is to turn off the always_keepalive setting on the host running the TELNET client to connect to the TOPS-20 system instance.
I do not know if this is purely specific to FreeBSD, but I do know that it isn't occurring with the Ubuntu Linux kernel 5.4.0.
Regardless, the specific action to fix this is to run the following command on the TCP-connection-originating FreeBSD host:
# sysctl -w net.inet.tcp.always_keepalive=0
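With always_keepalive off, a program that still wants keep-alive probes on one particular connection has to ask for them itself. The following is a minimal sketch of that per-socket opt-in, assuming the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options are available (they are on current FreeBSD and Linux). The function name enable_keepalive is hypothetical, and note that these socket options take seconds, unlike the millisecond sysctl values above.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: turn on keep-alive probing for a single TCP
 * socket and set its timing. idle_s and intvl_s are in seconds.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

Calling enable_keepalive(fd, 7200, 75, 8) on a TCP socket would reproduce the system defaults for that one connection only, leaving other connections such as the TELNET session to TOPS-20 unprobed.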