Hi Peter:

On 2012-02-13 23:09, Peter Barada wrote:

>> >         Here are my questions:
>> >             1. Is my patch wrong?
>> >             2. Why does the official yaffs2 code require 3 chunkErrorStrikes
>> >                to retire a block? Would reducing it to 1 chunkErrorStrike
>> >                wrongly mark good blocks bad?
>> >             3. Should I remove the patch?
>> >
>> >         Thanks a lot for your advice.
> Yes, your patch is wrong as any read error will retire the block.
>
> If you see bit-flips from data read out of MTD, then your NAND driver
> isn't properly using ECC to correct the data.  If MTD used ECC to
> correct the data you would see a -EUCLEAN return from MTD on read which
> will percolate through yaffs_HandleChunkError() - and increment the
> strike count.


    Thanks for your reply. Now I know the patch is wrong. I've read the Samsung
NAND chip datasheet and analysed the kernel log, and I think most of the blocks
that struck out were caused by errors in write operations. It is very strange,
though, that those blocks went into the program error state. According to the
chip datasheet, if a program operation results in an error, the block containing
the failing page should be mapped out and the target data copied to another
block. So it is reasonable for yaffs to retire the block in
yaffs_HandleWriteChunkError even if the chunkErrorStrike count is only one. But
why are there so many program errors? Any ideas?
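
    To check that I understand the two policies, here is a minimal sketch of
how I picture them. It is not the real yaffs code: struct block_state,
on_read_error(), on_program_error() and STRIKE_LIMIT are names I made up for
illustration, and the exact threshold comparison in yaffs may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed threshold, taken from the "3 strikes" discussed in this thread. */
#define STRIKE_LIMIT 3

/* Hypothetical per-block bookkeeping, not the real yaffs block info. */
struct block_state {
    int  strikes;          /* errors seen on this block so far */
    bool needs_retiring;   /* set once the block should be mapped out */
};

/* Read side: one bad read only adds a strike; the block is retired
 * once the strike count reaches the limit. */
void on_read_error(struct block_state *b)
{
    assert(b);
    b->strikes++;
    if (b->strikes >= STRIKE_LIMIT)
        b->needs_retiring = true;
}

/* Write side: a failed program operation retires the block at once,
 * as the datasheet recommends, whatever the strike count is. */
void on_program_error(struct block_state *b)
{
    assert(b);
    b->strikes++;
    b->needs_retiring = true;
}
```

    With this model, my patch effectively set STRIKE_LIMIT to 1 on the read
side, which is why a single read error was enough to retire a block.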

    In addition, I use hardware ECC in the MTD driver, and the error-correcting
code is a Hamming code. The NAND chip is MLC, so the hardware ECC cannot correct
multi-bit errors and MTD returns a read error to yaffs; this may increase the
number of blocks struck out. I wonder how yaffs handles uncorrectable bit errors
in order to keep filesystem data reliable and intact. If key yaffs2 data read
from NAND has errors in some bits, how can yaffs2 keep working without crashing?
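
    To illustrate the limit I mean, here is a toy sketch of a Hamming code
with one extra overall parity bit, protecting only 4 data bits. It is not the
ECC my controller or MTD actually computes, and all names in it are made up.
It only shows the behaviour: one flipped bit is corrected, two flipped bits
are detected but cannot be corrected.

```c
#include <assert.h>
#include <stdint.h>

enum ecc_result { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* Returns 1 if an odd number of bits are set in v. */
static int parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1;
}

/* Encode a 4-bit value. Bit i of the result is codeword position i:
 * positions 1, 2, 4 hold Hamming parity, 3, 5, 6, 7 hold data,
 * position 0 holds the overall parity bit. */
uint8_t secded_encode(uint8_t nibble)
{
    uint8_t d1 = nibble & 1, d2 = (nibble >> 1) & 1;
    uint8_t d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;
    uint8_t cw = (uint8_t)((d1 << 3) | (d2 << 5) | (d3 << 6) | (d4 << 7));

    cw |= (uint8_t)((d1 ^ d2 ^ d4) << 1);   /* covers positions 3, 5, 7 */
    cw |= (uint8_t)((d1 ^ d3 ^ d4) << 2);   /* covers positions 3, 6, 7 */
    cw |= (uint8_t)((d2 ^ d3 ^ d4) << 4);   /* covers positions 5, 6, 7 */
    cw |= (uint8_t)parity8(cw);             /* make total parity even */
    return cw;
}

/* Decode a codeword, correcting one flipped bit if possible. */
enum ecc_result secded_decode(uint8_t cw, uint8_t *nibble)
{
    enum ecc_result res = ECC_OK;
    int syndrome = 0;

    assert(nibble);
    for (int pos = 1; pos < 8; pos++)
        if (cw & (1u << pos))
            syndrome ^= pos;                /* XOR of set positions */

    if (parity8(cw)) {
        /* Odd overall parity: treat as one flipped bit at 'syndrome'
         * (syndrome 0 means the overall parity bit itself). */
        cw ^= (uint8_t)(1u << syndrome);
        res = ECC_CORRECTED;
    } else if (syndrome != 0) {
        /* Even parity but nonzero syndrome: two bits flipped. */
        return ECC_UNCORRECTABLE;
    }

    *nibble = (uint8_t)(((cw >> 3) & 1) | (((cw >> 5) & 1) << 1) |
                        (((cw >> 6) & 1) << 2) | (((cw >> 7) & 1) << 3));
    return res;
}
```

    Note that three or more flipped bits can look like a single-bit error to
such a code and be "corrected" into wrong data without any error being
reported, which is part of why I am worried about MLC.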

    Thanks again.

    Regards,

    Xueqin Chen