[Yaffs] Bad block management

Charles Manning manningc2@actrix.gen.nz
Fri, 21 Jan 2005 08:17:52 +1300


On Thursday 20 January 2005 23:02, Jacob Dall wrote:
> Hello yaffers,
>
> I've a few questions regarding why yaffs' bad block management is designed
> the way it is.
>
> According to Toshiba, NAND failures can be distinguished as "permanent
> failures" or "soft errors"
>
> 1) Permanent failures: this error occurs when programming or erasing, and
> can be detected by reading the status register after operation.
>
> 2) Soft errors: this error occurs during a program, but can only be
> detected by reads. The error is cleared by a block erase.
>
> Now, upon read, if yaffs detects an unfixable ECC error in a page, the
> block holding that page is marked as bad. According to 2) it would be ok to
> just mark the page as discarded and let the garbage collector do its job -
> or have I missed something?

This mechanism was designed before Toshiba shared their wonderful document
with the world. I have considered changing this, but it has never been a very
high priority and changing it does put data at risk.

The "soft errors" are typically write disturb failures that can (hopefully)
be fixed by ECC. My concern is that if a block displays write disturb
problems then perhaps it is "going bad". ECC can only fix single-bit errors.
I don't want to wait until it has "gone bad" and lost data before I retire
it. I'd prefer to retire dodgy-looking blocks earlier.
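
The trade-off described above can be sketched as a small decision function. This is only an illustration, not the YAFFS code: the enum names, the two policies and should_retire_block() are all hypothetical, and it assumes the ECC layer reports one of three outcomes per page read.

```c
#include <stdbool.h>

/* Hypothetical outcome of the ECC check on one page read. */
enum ecc_result {
    ECC_OK,         /* no bit errors detected */
    ECC_CORRECTED,  /* a single-bit error was found and fixed */
    ECC_UNFIXABLE   /* more errors than the ECC can repair */
};

/* Hypothetical retirement policies, for comparison only. */
enum retire_policy {
    RETIRE_ON_UNFIXABLE, /* wait until data has already been lost */
    RETIRE_ON_ANY_ERROR  /* retire as soon as the block looks dodgy */
};

/* Decide whether the block holding the page should be retired. */
bool should_retire_block(enum ecc_result result, enum retire_policy policy)
{
    if (result == ECC_UNFIXABLE)
        return true;                          /* both policies retire here */
    if (result == ECC_CORRECTED)
        return policy == RETIRE_ON_ANY_ERROR; /* early retirement only */
    return false;                             /* clean read: keep the block */
}
```

With RETIRE_ON_ANY_ERROR a block is given up while its data can still be recovered by ECC, at the cost of losing some blocks that might have been fine after an erase.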

>
> In yaffs, a block is marked bad by writing 0 to byte 517 in page 0 / 1 in
> the block. Why wasn't it decided to use another value (for instance, like
> SmartMedia's 0xF0)? Then it would have been possible to distinguish initial
> bad blocks from operational bad blocks.

This was considered. However, I decided to use 0x00 because it has the
greatest likelihood of programming in a block where the bits don't "stick"
well. A sparse bit pattern is less likely to program than all 0s.

This could be changed quite easily.

Generally the factory-marked bad blocks are not just marked with this byte.
Mostly the whole OOB area or even the whole block is marked zero. This
generally makes it easy enough to distinguish factory-marked from YAFFS-marked
bad blocks.
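
As a rough sketch of that heuristic, the check below looks at the spare (OOB) bytes of a block's first page. Everything here is an assumption for illustration: a 16-byte OOB area per 512-byte page, the status byte at OOB index 5 (page offset 517 minus 512, counting from 0), and the names classify_block() and enum bad_block_origin. It is not the YAFFS implementation, and it treats any status byte other than 0xFF as "bad" for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define OOB_SIZE 16           /* assumed spare bytes per 512-byte page */
#define BLOCK_STATUS_INDEX 5  /* assumed: page offset 517 - 512 */

enum bad_block_origin {
    BLOCK_GOOD,         /* status byte still erased (0xFF) */
    BAD_RUNTIME_MARKED, /* status byte cleared, rest of OOB not all zero */
    BAD_FACTORY_MARKED  /* the whole OOB area reads as zero */
};

/* Guess who marked a block bad from the OOB bytes of its first page. */
enum bad_block_origin classify_block(const uint8_t oob[OOB_SIZE])
{
    size_t i;
    int all_zero = 1;

    if (oob[BLOCK_STATUS_INDEX] == 0xFF)
        return BLOCK_GOOD;

    /* Factory marks usually zero the whole OOB area, not just one byte. */
    for (i = 0; i < OOB_SIZE; i++) {
        if (oob[i] != 0x00)
            all_zero = 0;
    }
    return all_zero ? BAD_FACTORY_MARKED : BAD_RUNTIME_MARKED;
}
```

Since factory marking is only "mostly" done this way, the result is a guess rather than a guarantee.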

>
> I've an issue with some of my devices - the number of bad blocks is
> increasing very rapidly. Beyond the fact that it's due to ECC read errors,
> I'm yet to discover the root of the problem.


I've done extensive lifetime testing on some devices. One test I did wrote
approx 130GB of data, then read and verified it without a single ECC failure
or bit getting munged.

Some other people doing lifetime testing have expressed concern because they
lose 1-2% of flash during the lifetime of a device.

What do you mean by rapidly? I assume it is far worse than either of these!

If you're using Linux, then the most likely cause of the problem is a mismatch
between the ECC strategy you're using in YAFFS and what you have
configured in mtd.

>
> I'm not blaming yaffs - I'm sure the problem is to be found elsewhere, but
> I'm thinking really hard of making those changes to yaffs, making me able
> to get back to the state when the NAND was first taken into use.
>
> Please let me know your reasons / thoughts...

Being able to change the bad block marker would help you with bench testing
until you have fixed the real problem.

There are two things you could try:
1) In yaffs_RetireBlock, change the blockstatus to some easy to detect value
that has at least two zero bits (eg. 0xFC).
2) Or even turn off the writing of bad block markers completely. This would
cause problems in the file system state, but that probably does not matter
for you at the moment.
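
To illustrate option 1, here is a minimal sketch of a marker check built around the "at least two zero bits" rule. It assumes, without quoting the YAFFS source, that the reader treats a status byte as bad once two or more bits are zero, presumably so that one flipped bit in a good block's 0xFF is not misread as a mark. The names BENCH_BAD_MARKER, count_zero_bits(), status_marks_bad() and is_bench_mark() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BENCH_BAD_MARKER 0xFC /* example value from the text: two zero bits */

/* Count how many of the 8 bits in a status byte are zero. */
int count_zero_bits(uint8_t value)
{
    int zeros = 0;
    int bit;

    for (bit = 0; bit < 8; bit++) {
        if (!(value & (1u << bit)))
            zeros++;
    }
    return zeros;
}

/* Assumed rule: a block counts as bad once two or more bits are zero. */
bool status_marks_bad(uint8_t status)
{
    return count_zero_bits(status) >= 2;
}

/* True only for the bench-test marker, so such blocks can be found again. */
bool is_bench_mark(uint8_t status)
{
    return status == BENCH_BAD_MARKER;
}
```

A bench-test tool could then erase only the blocks for which is_bench_mark() is true and leave factory-marked blocks alone.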

Of course I'm assuming you just want to do these changes while you find and
fix the real problem. I would not suggest shipping product with either of
these changes.

>
>
> Thanks and regards,
> Jacob Dall
>
> FYI: the 'According to Toshiba' stuff was taken from a document named 'NAND
> Flash Application Design Guide'

Great doc. Should be required reading for anyone working with NAND.
