
Tuesday, July 11, 2006

Initial response from JSL - design notes I

I (ralph) am considering building a replica of the ECD Micromind, a high-powered microcomputer designed in the late 1970s.
It did not establish itself in the marketplace, but the designers of the system are still around.


This e-mail exchange comes as a result of my eagerness to build one.
It is interspersed with warnings and words of wisdom from the original designer.
The Micromind relied on unusual system design rules, including asynchronous design.

>I'd love even more to have schematics so I can build a semi-clone.

No, trust me, you wouldn't.

Well, you might like to have schematics, so you could marvel at
them (not necessarily at their cleverness, mind you). But you don't
want to build one.

First, the systems were built on fairly large Augat wire-wrap
boards, which were machine-wrapped, except for two prototypes.
Olin wrapped the first; I'm not sure who did number two. Maybe
Dave A. (I forget how to spell his surname; he was a housemate in
the original ECD location at 232 Broadway, a residence in
Cambridge.) The boards alone cost hundreds each, and there were
three per system.

Each board was populated with something like 150 integrated
circuits. The complexity of the system excluding the 6502 surpassed
what was inside the nominal CPU in its 40-pin DIP.

> I have enough Apple 2 carcasses to get CPUs,

Those parts were not specified as fast enough. We used 4-MHz 6502s,
and the Apple systems ran quite a bit slower than that. The early
Apple 2 ran with a 2-MHz memory system, but ran the CPU at 1 MHz,
with half the cycles dedicated to the display subsystem. The late
models may have been faster, but I think the 6502 had been abandoned
(in favor of the 68000) before DRAM got fast enough.

The processor and the bus were totally asynchronous. That is, the
CPU ran at a variable speed. The bus resembled a DEC Unibus, but we
went through some contortions to avoid infringing the DEC patents.
It used arbiters (asynchronous circuits composed of NAND gates and
Schmitt triggers) in a way that would jam FM radios if not shielded;
it radiated a lot of power at around 100 MHz, but not at any single
frequency. Most of the logic we designed was LS (low-power
Schottky, faster than TTL), but some was S (high-power, much faster
than TTL). The processor board alone had quite a bit of what must
now be unobtainium, including two different kinds of PROMs that may
no longer be made, except maybe the military versions, the DRAMs,
and a one-shot that was (sigh) single-sourced, an AMD part, because
none of the other vendors came close to its performance. (Boy were
we up a creek when they had process problems and couldn't make them
for a while; we had to make a variant design that used a different
component.)

We got 4-MHz processors. These could run at 125 ns for each clock
phase. However, the bulk of the work inside the CPU was done during
phase 1, so phase 2 could be 100 ns or less. A one-shot was used
for p1, and p2 could vary from 100 ns to several microseconds, depending
on how long it took for the thing addressed to respond. Flat out, it
could run at 4.5 MHz, but that never lasted for long. The memory
subsystem was too slow. It used one-shots also; it always started
as soon as an address was available.
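The timing figures above can be checked with a little arithmetic. This is only a sketch: it assumes phase 1 is fixed at 125 ns and that phase 2 is never shorter than 100 ns, and the names (P1_NS, P2_MIN_NS, cycle_ns, clock_mhz) are made up for the illustration, not taken from the design.

```python
# Assumed from the text: p1 is a fixed 125 ns, p2 stretches from
# 100 ns up to several microseconds depending on the device addressed.
P1_NS = 125.0
P2_MIN_NS = 100.0


def cycle_ns(p2_ns):
    """Length of one CPU cycle in ns for a given phase-2 width."""
    if p2_ns < P2_MIN_NS:
        raise ValueError("p2 is assumed never to be shorter than 100 ns")
    return P1_NS + p2_ns


def clock_mhz(p2_ns):
    """Effective clock rate in MHz if every cycle used this p2 width."""
    return 1000.0 / cycle_ns(p2_ns)


if __name__ == "__main__":
    for p2 in (100.0, 125.0, 1000.0):
        print(f"p2 = {p2:6.0f} ns -> {clock_mhz(p2):.2f} MHz")
```

With the shortest phase 2 the cycle is 225 ns, a little over 4.4 MHz, which agrees with the "flat out" figure of roughly 4.5 MHz; a symmetric 125/125 ns cycle gives the nominal 4 MHz.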

There was no cache as such. When the CPU fetched a one-byte
instruction, it incremented the program counter to fetch the next
byte, but then did not increment it on the second cycle, because it
was only on the cycle after that that it fetched the next instruction.
In most 6502 designs, the byte would be read twice. We detected when
a one-byte instruction had been fetched (using an external opcode
lookup, about which more below), and did the second cycle at the
maximum speed, since the memory was not needed by the CPU.

In addition, having the address of the next instruction, we started
the memory during the second cycle of the first instruction, effectively
pre-fetching that next opcode, so its first cycle could also run at
the maximum speed. Sadly, only a small portion of any actual
program is made up of one-byte instructions. Other cycles when the
memory was not needed were also done at full speed, e.g., the
penultimate cycle of a R/M/W instruction. There were a lot of
two-byte/two-clock instructions; I think we averaged out to about 1
MIPS.

Interrupts were detected by a failure of the program counter to
increment following an opcode fetch. This also caused a transition
of the address space from user to supervisor mode, as appropriate
(the mode bit was implemented externally to the 6502).

The 6502 was not specified to do anything reasonable when an
undefined instruction was encountered; we were warned the CPU might
even hang. This was unacceptable, so we looked up each opcode in a
high-speed PROM at the end of each opcode fetch, and when an invalid
opcode was detected, the clock was stopped and a break instruction
substituted on the data bus to implement an invalid opcode fault.
This made the machine safe for running arbitrary code.
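The opcode check described above amounts to a 256-entry table lookup. The sketch below is a software analogy only, not the actual PROM contents: the set of valid opcodes is a small illustrative subset of the documented 6502 instructions, and the names OPCODE_PROM and byte_presented_to_cpu are hypothetical.

```python
BRK = 0x00  # 6502 break instruction, substituted on the data bus on a fault

# Illustrative subset only. A real table would mark every documented
# opcode as valid; these few are enough to show the mechanism.
VALID_OPCODES = frozenset({
    0x00,  # BRK
    0x40,  # RTI
    0x4C,  # JMP absolute
    0x60,  # RTS
    0x8D,  # STA absolute
    0xA9,  # LDA immediate
    0xEA,  # NOP
})

# One entry per possible opcode byte, standing in for the high-speed PROM.
OPCODE_PROM = bytes(1 if op in VALID_OPCODES else 0 for op in range(256))


def byte_presented_to_cpu(fetched):
    """Return (byte the CPU sees, fault flag) at the end of an opcode fetch."""
    if OPCODE_PROM[fetched & 0xFF]:
        return fetched, False
    # Invalid opcode: hand the CPU a BRK instead and flag the fault.
    return BRK, True


if __name__ == "__main__":
    print(byte_presented_to_cpu(0xEA))  # NOP passes through
    print(byte_presented_to_cpu(0x02))  # 0x02 is not a documented opcode
```

In the hardware the lookup happened in parallel with the fetch and the clock was stopped while the substitution took place; the function here just shows which byte ends up on the data bus.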

I think I recall we did some other opcode chicanery to implement
other instructions external to the CPU. (It was 28 years ago; I
don't recall everything equally well.) What I recall is that the
address space translation and other states had to be implemented on
the processor board, and some method of accessing those functions
was needed. Some of those functions *had* to be in the global
address space, so they could be accessed by other processors, but I
think there were some local-access provisions. Ah, one I'm fairly
sure of was extending the one-byte return-from-interrupt instruction
to a two-byte instruction where the second byte contained flags for
the memory mapping subsystem, i.e., whether to go back to user mode.

The Apple 2 used a straight bit-map in main memory for its graphics,
which were consequently low-resolution. They did a clever thing
with the frequencies so they could fake out the chrominance
subsystem in TV sets to give even-lower resolution in color. Plus
that took care of DRAM refresh.

We wanted the performance of the best displays we had seen over in
Tech Square, and color, but RAM was not cheap. The display board by
Jerry Roberts had two memory banks, one for text, and one for fonts.
The text memory was scanned to draw the screen, and used to look up
bitmaps which were then piped out to the display. We could put up
as much text and fancy fonts as the Knight consoles, but far cheaper
with less RAM. The price of RAM has been in free fall since, but in
those days it was not cheap. The width and height of the typefaces
were dynamically programmable.
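The two-bank scheme can be pictured as a character-cell rasterizer: text memory holds character codes, and each code selects a bitmap in font memory. What follows is a toy model under that assumption, with invented names (rasterize, text_rows, font) and a 2x2 glyph size chosen only to keep the example short.

```python
def rasterize(text_rows, font, glyph_height):
    """Expand rows of character codes into scan lines of '0'/'1' pixels.

    text_rows    -- list of rows, each a list of character codes (text memory)
    font         -- dict mapping code -> list of bit strings (font memory)
    glyph_height -- number of scan lines per character cell
    """
    scan_lines = []
    for row in text_rows:
        for y in range(glyph_height):
            # Each character cell contributes one slice of its glyph.
            scan_lines.append("".join(font[code][y] for code in row))
    return scan_lines


if __name__ == "__main__":
    font = {
        0: ["00", "00"],  # blank cell
        1: ["10", "01"],  # a diagonal stroke
    }
    text_rows = [[1, 0], [0, 1]]
    for line in rasterize(text_rows, font, glyph_height=2):
        print(line)
```

Rewriting a font entry changes every cell that uses that code without touching text memory, which is the same property the graphics trick for Space War relied on.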

Graphics were a hack; it's amazing (to me) that Space War worked,
because it relied on dynamically changing part of the character set
to contain the graphics, so only a small fraction of the screen
could actually contain non-background graphics. It took a lot of
logic to do so much with so little memory, but the display load on
the processor board was low, because that logic did so much of
the work. First, separate memory banks meant the precious main
memory bandwidth was not used, and the hardware rasterized the
characters so the feeble CPU didn't have to do that.

The remaining board, done by Roy Zito, was the I/O board, which became
the catch-all for any system function not on a processor module or a
display module. There could be only one per system. If I recall
correctly, there were 14 I/O interfaces and some housekeeping
functions, including bootstrap (ROM and processor selection and
system reset) and the priority-interrupt scanner. Roy might
remember more precisely; he was unhappy about how the process made
him the garbage-man of the design.

This could probably be done quite handily all in one ASIC today,
but even if the 6502 and RAM were available as macros, the rest would
be a redesign. You might get closer using FPGAs, but I don't know
enough about them. I would worry about the asynchrony and timing
elements; it might all become much simpler if redesigned with a
clock, at least if you are restricted to the vocabulary provided by
available programmable parts, or at least you'd have to use
different async design techniques.

> and that 'up to 15' 6502 multiprocessor sounds interesting.

That limit was pretty much dictated by the bus capacitance; the
drivers could not drive any more inputs (and wire) than that. The
global address space was 26 bits, or 64 megabytes, while a
processor board only contained 16 kilobytes (later this may have
been extended when bigger DRAMs became available). We spanned the
1K to 4K to 16K transitions, I think.
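For scale, the address-space numbers above work out as follows. This is plain arithmetic on the figures quoted in the text (26 address bits, 16 kilobytes per processor board, at most 15 processors); the variable names are just for the illustration.

```python
GLOBAL_ADDRESS_BITS = 26      # from the text
BOARD_RAM_BYTES = 16 * 1024   # 16 kilobytes per processor board
MAX_PROCESSORS = 15           # bus loading limit

global_bytes = 2 ** GLOBAL_ADDRESS_BITS
global_megabytes = global_bytes // (1024 * 1024)

# RAM actually present if every processor slot were filled.
populated_bytes = MAX_PROCESSORS * BOARD_RAM_BYTES
populated_kilobytes = populated_bytes // 1024

# How many 16K boards' worth of memory the global space could address.
boards_addressable = global_bytes // BOARD_RAM_BYTES

if __name__ == "__main__":
    print(global_megabytes, "MB of global address space")
    print(populated_kilobytes, "KB across a full set of processor boards")
    print(boards_addressable, "board-sized blocks fit in the global space")
```

So even a fully loaded system would populate only a small fraction of the 64-megabyte space with processor-board RAM.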

With an additional interface to bridge or network several such
systems, we envisioned up to 1000 processors. We could hardly
advertise such vaporware (we were already farther out on a limb
than we understood, but without even a working prototype, even we
could see the folly of announcing it).

> There are tidbits here and there, but I found my 29-year-old issues
> of Kilobaud, with the full-page color ads.

Yes. With the cats. They were pretty, and very slick.

.... additional details available from the author

email response from JSL
