Computer: Storage: Memory
ECC on the Cheap
Compiler achieves seamless ECC at the software level

Sometimes stability and cost are far more important than performance. Sometimes all you can find is a Black Friday i3 when you really need a hardened 486. With ECC memory and supported motherboards costing a bit more than standard memory, surely somebody out there is looking for a cheap fault-tolerant system. Proposed is a compiler that rewrites all data and program structures to include fault tolerance at the software level. Even the execution stack would be structured to include fault tolerance. The operating system itself would have to be compiled with this compiler for the system to have real usefulness. The compiler would work on legacy code and would not expose any of the underlying complications to the programmer unless desired. This would probably be ridiculously slow, but I'll bet for certain applications, such as for certain virtual machines or for certain embedded systems, the benefits would outweigh the performance hit. This is particularly true when a system is housing dozens of virtual machines but only a couple need fault tolerance. (Hardware fault tolerance is a different story altogether. Memory is usually the least of your troubles; software, power supplies, and storage devices cause a lot more trouble. This would be for niche applications since, for most applications, it's not hard to find an old server on eBay with yesterday's ECC RAM already installed. Also, most virtualized environments are already using ECC RAM.)
-- kevinthenerd, Nov 18 2013
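
As a rough illustration of what "fault tolerance at the software level" could mean for a single stored value, here is a minimal Python sketch of a Hamming(7,4) code, which stores 4 data bits in 7 bits and can correct any single flipped bit. This is not the proposed compiler, and the function names are made up for the example; real ECC memory typically uses a wider single-error-correct, double-error-detect code (72 stored bits per 64 data bits), but the principle is the same.

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value (0..15) into a 7-bit Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]  # parity over codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over codeword positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over codeword positions 4, 5, 6, 7
    # Codeword positions 1..7 hold: p1 p2 d0 p4 d1 d2 d3
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))  # bit i holds position i + 1


def hamming74_decode(code):
    """Return (nibble, corrected) where corrected is True if one bit was repaired."""
    # The XOR of the 1-based positions of all set bits is zero for a valid
    # codeword; after a single bit flip it equals the position of that bit.
    syndrome = 0
    for pos in range(1, 8):
        if (code >> (pos - 1)) & 1:
            syndrome ^= pos
    corrected = syndrome != 0
    if corrected:
        code ^= 1 << (syndrome - 1)  # flip the faulty bit back
    d = [(code >> (pos - 1)) & 1 for pos in (3, 5, 6, 7)]
    nibble = d[0] | (d[1] << 1) | (d[2] << 2) | (d[3] << 3)
    return nibble, corrected


stored = hamming74_encode(0b1011)
damaged = stored ^ (1 << 4)       # simulate one bit flipping in memory
value, repaired = hamming74_decode(damaged)
```

A compiler along the lines proposed would have to wrap every load and store in something like this, which gives a feel for where the performance hit comes from. Two flipped bits in the same codeword would be mis-corrected by this simple code.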

VIPER microprocessor http://en.m.wikiped...IPER_microprocessor
Not a lot of detail, admittedly. [8th of 7, Nov 19 2013]

What is ECC? http://en.wikipedia.org/wiki/ECC_memory
Defines ECC [popbottle, Nov 19 2013]


// You can't get 'mission critical' on a budget. //

Really? Oh dear … maybe you should let the designers of the 787's RDC network know about that.

Or then again, maybe not. It would only worry them, and it's too late now.
-- 8th of 7, Nov 19 2013


What little I know of computers is hopelessly out of date. So grain of salt.

What is the next step?

Is there any part of "Proposed is a compiler that rewrites all data and program structures to include fault tolerance at the software level" that could be done in the small? The doormat or garbage disposal instead of the whole house. Do faults show up in any part on a regular basis?

An entire compiler seems like something too big just to prove the concept works.
-- popbottle, Nov 19 2013


The post reminds me of an old flame-war in alt.folklore.computers concerning what to do about C language "buffer overflow" exploits (which were actually "out of bounds" problems), the consensus being "hire competent programmers".
-- FlyingToaster, Nov 19 2013


// An entire compiler seems like something too big just to prove the concept works //

It depends on the complexity of the language.

For an 8-bit CISC microprocessor with limited registers, running native code, and a very simple HLL compiler that doesn't support libraries or linking - like a Tiny BASIC - then it's possible to design in a reasonable degree of self-monitoring.

Go beyond that and the problems of validation increase exponentially.

The moment a real-time OS is introduced, it's impossible to achieve in any sensible timescale.

The best approach would most likely be to network huge numbers of PIC-16s, each one executing its own little crumb of code, and each individually black-box tested.
-- 8th of 7, Nov 19 2013


// The software alone solution sounds like a non-starter as cached code (like the checking code) could get corrupted. //

So what if you have three copies of the checking code in memory running in 3 different threads?

One problem is that if the instructions get corrupted for one of the three threads, it could theoretically execute random actions and corrupt the other threads or override the protection system. Of course the probability of that is probably less than that of a 2-bit memory corruption, which will take down an ECC-protected system anyway. I think having general-purpose code running in 3 threads will be hard to synchronize, especially when hardware access is involved, so unless you can carefully control the code execution so it only runs code from the memory cache that has already been verified, I'd say it's best to run an emulator on the 3-threaded code. You can't have the OS running non-ECC underneath this, so maybe take the VMware hypervisor and rewrite that to run in a three-threaded way. It could then run multiple VMs. Some could be direct hardware VMs with no ECC. Others could be emulated VMs with ECC. This approach would mean that you don't need to recompile most code (or rewrite the compiler), just carefully code your hypervisor.
-- scad mientist, Nov 19 2013


// It could then run multiple VMs. Some could be direct hardware VMs with no ECC. Others could be emulated VMs with ECC. //

That's going to be a big task-switching overhead, though. Better to have 3 separate isolated systems, and voting, as [bigs] suggested.

Anything that adds any complexity to the software is a Bad Thing.
-- 8th of 7, Nov 19 2013
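
The voting step discussed above can be sketched in a few lines of Python. This only models three redundant copies of one integer and a bitwise majority vote; it says nothing about the hard part raised in the thread (keeping three threads or three machines in step), and the `TripleCell` class and `vote` function are hypothetical names for illustration only.

```python
def vote(a, b, c):
    """Bitwise majority: each result bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


class TripleCell:
    """Holds three copies of an integer and repairs a damaged copy on read."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def write(self, value):
        self.copies = [value, value, value]

    def read(self):
        winner = vote(*self.copies)
        # Rewrite all three copies so a single error does not linger
        # and later combine with a second one.
        self.copies = [winner, winner, winner]
        return winner


cell = TripleCell(0b1011)
cell.copies[1] ^= 0b0100   # simulate a bit flip in one copy
recovered = cell.read()    # the other two copies outvote the damaged one
```

Because the vote is bitwise, it even tolerates different copies being hit in different bit positions; it fails only when two copies are wrong in the same bit.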


Might it not be cheaper/simpler to put a bunch of lead shielding around the memory? My understanding is that a large fraction of memory errors are caused by cosmic rays.
-- MaxwellBuchanan, Nov 19 2013


