Hi Ian,

Thanks for taking the time to discuss this - I appreciate it!

>> [yaffs] retires the block at erase time
>I don't know about the real marking of bad blocks. We have
>actually disabled this in some versions of products where we
>were bitten by transient write errors causing a large number of
>blocks to be persistently marked bad (OOB) and taken out of
>service.
> ...

Meaning that in your experience it's OK to just defer to the "write
fail" mechanism - if a write to a page fails this time, then after
erasure either the write to that page will fail again, or, if it
happens to succeed, the data in that page is reliably OK. Right?
(with some trepidation)

>> Is this right?  If so, it seems OK as long as bad pages within
>> an eraseblock do not imply unreliability of other pages
>> within the same eraseblock.
>
>The logic around declaring a block truly bad and broken is
>lacking (both Yaffs and MTD). IIRC, NAND vendors recommend that
>blocks should be erased when there are write/read errors, and
>only marked bad if the erase fails, and then perhaps only after
>several attempts. Neither Yaffs nor MTD do this.
>
[and from a later message]
>We'd need the NAND vendors to reveal that, but I think it
>reasonable to suspect that if a block is improperly erased that
>any data subsequently written to that block is liable to
>failure. But if an individual page is bad because of, say,
>power loss at the time of the write, that the other pages within
>that block would be solid. But this is JUST A GUESS.

OK, the whole concept is a bit scary. But I guess that in a
questionable eraseblock, an erase failure is more probable than the
sequence of a page write failing before erasure and then succeeding,
unreliably, after erasure. If this is the case, then we're left with
a discussion of how aggressive we should be about permanently
retiring stuff, which is really just a discussion about how quickly
the flash "wears out". That's not a big issue - but the possibility
of writing data which later proves to be unreliable is.

-Scott
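
P.S. To make sure I'm reading the vendor-recommended policy the same way you are (erase on a write/read error, mark bad only when the erase itself fails, and then only after several attempts), here is a rough sketch in C. Every name in it is hypothetical: this is not Yaffs or MTD code, the retry count of 3 is an arbitrary assumption, and fake_erase() is only a stand-in for a real driver's erase call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch only: none of these names exist in Yaffs or MTD. */
#define MAX_ERASE_ATTEMPTS 3   /* assumed retry budget, not a vendor figure */

enum block_state { BLOCK_OK, BLOCK_NEEDS_ERASE, BLOCK_RETIRED };

struct block {
    enum block_state state;
    unsigned erase_failures;   /* total failed erase attempts seen so far */
};

/* Driver hook supplied by the caller: returns true if the erase succeeded. */
typedef bool (*erase_fn)(unsigned block_no, void *ctx);

/* A page write or read failed: do NOT mark the block bad, just queue an erase. */
static void block_note_io_error(struct block *b)
{
    assert(b != NULL);
    if (b->state != BLOCK_RETIRED)
        b->state = BLOCK_NEEDS_ERASE;
}

/* Erase a queued block; retire it only if every erase attempt fails. */
static enum block_state block_try_recover(struct block *b, unsigned block_no,
                                          erase_fn erase, void *ctx)
{
    assert(b != NULL && erase != NULL);
    if (b->state != BLOCK_NEEDS_ERASE)
        return b->state;

    for (unsigned attempt = 0; attempt < MAX_ERASE_ATTEMPTS; attempt++) {
        if (erase(block_no, ctx)) {
            b->state = BLOCK_OK;      /* back in service after a clean erase */
            return b->state;
        }
        b->erase_failures++;
    }
    b->state = BLOCK_RETIRED;         /* persistent erase failure: give up */
    return b->state;
}

/* Stand-in for a real erase: fails while *ctx > 0, then succeeds. */
static bool fake_erase(unsigned block_no, void *ctx)
{
    unsigned *remaining_failures = ctx;
    (void)block_no;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

Note that a transient write error never retires anything here, which would avoid the mass bad-block marking you were bitten by. It does nothing about my real worry, though: a block that erases cleanly and then accepts a write that later proves unreliable.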