[Yaffs] ECC algorithm is Hamming code now; will any new algorithm enter YAFFS?

Charles Manning manningc2 at actrix.gen.nz
Sun Aug 7 20:13:48 BST 2005


On Sunday 07 August 2005 20:22, Thomas Gleixner wrote:
> On Sun, 2005-08-07 at 16:07 +1200, Charles Manning wrote:
> > > M-Systems uses a BCH code to support MLC inside its DoC products. This
> > > algorithm can detect and correct 4-bit errors. Will YAFFS employ
> > > any other new ECC algorithm?
> >
> > Are those 4 bits per page, or what? With most ECC structures used
> > with NAND, the ECC corrects one bad bit per 256 bytes. Correcting more
> > requires larger ECC areas and more ECC computation (hardware or
> > software).
> >
> > Since ECC is part of MTD (or whatever NAND layer you are using), this is
> > really independent of YAFFS.
>
> The NAND/MTD layer supports a couple of different ECC solutions. The DoC
> devices use a Reed-Solomon code, which is supported by an encoder/decoder
> library.
> Reed-Solomon codes can correct and detect more errors than the
> SmartMedia Hamming code, which has been the standard ECC since NAND came up.
> OTOH, such codes need hardware support, because the calculation in software
> would be too time-consuming. DoCs have a built-in hardware RS encoder.
>
> > It is also important to consider the most likely failure modes. I am not
> > familiar with MLC failure modes, but single-bit errors (as corrected by
> > ECC) are typically very rare with NAND (as used by YAFFS). Double-bit
> > errors are even rarer. I have done tests a few times where over
> > 100Gbytes of data was written to a file system without a single bit of
> > corruption.  Since 100Gbytes translates into many lifetimes of most
> > mobile/embedded products, I am pretty confident that for most usages bit
> > errors are not a significant problem when used with single-bit ECC.
>
> 100GiB of data relative to which partition size?
The partition size was about 450MB. The largest test was 300GB of actual NAND 
writing.

>
> Let's assume a 16 MiB partition where you write 100GiB of data. Let's
> further assume that a real total of 256GiB of data is written to
> flash due to garbage collection, wear levelling...
>
> 256 GiB / 16 MiB = 16384
>
> That means we erased/programmed each block of the flash 16k times.
> This is nowhere near the 100k erase-cycle limit.

Yes, *most* systems never get anywhere near the 100k lifetime - a fact that
should be kept in mind when worrying about lifetime and wear-levelling
issues.  For most mobile/embedded systems you can do a lifetime calculation
something like this (a code sketch follows below):
 10Mbytes of data per day, 365 days per year, 10 years product life =
36500Mbytes.
 Say a 16MB flash size: 36500/16 = 2281 cycles average.
 Say *10 for garbage collection, skew etc. = 22810 cycles.

16MB is probably unrealistically small for most devices that would see
anywhere near this sort of traffic.

It is really up to the system integrator to work the numbers for a particular 
system.
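
As an illustration only, here is a minimal C sketch of that kind of lifetime
arithmetic. All of the figures and names are assumptions made up for the
example; none of this is YAFFS or MTD code:

/*
 * Rough flash-lifetime estimate, following the arithmetic above.
 * All figures here are illustrative assumptions, not YAFFS code.
 */
#include <stdio.h>

int main(void)
{
    const double mb_per_day      = 10.0;  /* application write traffic */
    const double product_years   = 10.0;  /* expected product life */
    const double flash_size_mb   = 16.0;  /* total flash size */
    const double overhead_factor = 10.0;  /* garbage collection, wear skew, ... */

    double total_mb   = mb_per_day * 365.0 * product_years; /* 36500 MB */
    double avg_cycles = total_mb / flash_size_mb;           /* ~2281 cycles */
    double worst_est  = avg_cycles * overhead_factor;       /* ~22800 cycles */

    printf("Total data written  : %.0f Mbytes\n", total_mb);
    printf("Average erase cycles: %.0f\n", avg_cycles);
    printf("With overhead (x10) : %.0f (rated lifetime is ~100000)\n", worst_est);
    return 0;
}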

>
> We conducted long-term tests, where we definitely encountered multi-bit
> errors in different contexts. The effects start to show up when you
> reach the 60k erase/program range. There are also known effects with
> occasional bitflips on read, but those seem to be single-bit only.

Such test results are going to be dependent on many factors:
1) Were you doing partial-page programming? This hurts the flash more.
2) What flash? The newer stuff seems far more reliable than the older stuff.

The 100k lifetimes are based on using 1-bit ECC, and assume a few lost blocks.

Multi-bit errors are most likely to occur when you are doing partial-page
programming, something that YAFFS2 does not do.

YAFFS currently retires blocks if they show any ECC errors - read or write,
single or multi-bit. This might seem a bit conservative, but it is probably
safer, based on the assumption that a block of NAND will encounter 1-bit
errors before it encounters multi-bit errors.
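
In rough terms the policy amounts to something like the sketch below. The
names and helpers are invented for illustration and are not the actual YAFFS
code:

/*
 * Hypothetical sketch of the "retire on any ECC error" policy described
 * above. Names and helpers are invented for illustration only.
 */
enum ecc_result {
    ECC_OK,            /* data read back clean */
    ECC_CORRECTED,     /* single-bit error, fixed by ECC */
    ECC_UNCORRECTABLE  /* multi-bit error, data lost */
};

/* Assumed helpers, provided elsewhere by the file-system layer. */
extern void rewrite_block_data(int block); /* copy still-good data elsewhere */
extern void mark_block_bad(int block);     /* never allocate this block again */

/*
 * Any ECC event - read or write, single or multi-bit - retires the block,
 * on the assumption that 1-bit errors usually precede multi-bit errors.
 */
void handle_ecc_result(int block, enum ecc_result result)
{
    if (result == ECC_OK)
        return;

    if (result == ECC_CORRECTED)
        rewrite_block_data(block); /* data is still recoverable; move it */

    mark_block_bad(block);
}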

Disclaimer: I have not tried out MLC, and the above might not hold for it.

-- Charles





