Re: [Yaffs] Can removing chunkErrorStrikes check cause yaffs…

Author: Peter Barada
Date:  
To: yaffs
Subject: Re: [Yaffs] Can removing chunkErrorStrikes check cause yaffs2 too many Block struck out ?
On 02/13/2012 02:35 AM, CHEN XUEQIN wrote:
> Hi:
>
>      I've run yaffs2 on a device with the following environment:
>          * kernel 2.6.23
>          * cpu:  powerpc MPC8323
>          * nand: Samsung NAND 1GiB, 4 Level Cell, 3.3V 8-bit
>
>      Sometimes I found bit flips in some key files. The program
> that used the damaged file would go wrong. These bit flips
> could not be detected by hardware ECC. I thought some more aggressive
> error handling should be used to reduce the problem. So I changed the
> source code in yaffs_HandleChunkError from:
>
> void yaffs_HandleChunkError(yaffs_Device *dev, yaffs_BlockInfo *bi)
> {
>     if (!bi->gcPrioritise) {
>         bi->gcPrioritise = 1;
>         dev->hasPendingPrioritisedGCs = 1;
>         bi->chunkErrorStrikes++;
>
>         if (bi->chunkErrorStrikes > 3) {
>             bi->needsRetiring = 1; /* Too many stikes, so retire this */
>             T(YAFFS_TRACE_ALWAYS, (TSTR("yaffs: Block struck out" TENDSTR)));
>         }
>     }
> }
>
> to:
>
> void yaffs_HandleChunkError(yaffs_Device *dev, yaffs_BlockInfo *bi)
> {
>     if (!bi->gcPrioritise) {
>         bi->gcPrioritise = 1;
>         dev->hasPendingPrioritisedGCs = 1;
>         bi->chunkErrorStrikes++;
>
>         bi->needsRetiring = 1; /* Too many stikes, so retire this */
>         T(YAFFS_TRACE_ALWAYS, (TSTR("yaffs: Block struck out" TENDSTR)));
>     }
> }
>
>      In other words, with the above patch, any write or verify error
> will cause yaffs2 to mark the affected block bad. Recently I found
> faults in some devices. The kernel prints many consecutive bad blocks.
> The log looks like this:
>
> // 3 consecutive bad blocks
> block 773 is bad
> block 774 is bad
> block 775 is bad
> // 7 consecutive bad blocks
> block 777 is bad
> block 778 is bad
> block 779 is bad
> block 780 is bad
> block 781 is bad
> block 782 is bad
> block 783 is bad
>
> // 44 consecutive bad blocks
> block 816 is bad
> block 817 is bad
> block 818 is bad
> block 819 is bad
> block 820 is bad
> block 821 is bad
> block 822 is bad
> block 823 is bad
> block 824 is bad
> block 825 is bad
> block 826 is bad
> block 827 is bad
> block 828 is bad
> block 829 is bad
> block 830 is bad
> block 831 is bad
> ...
> ...
>
>       Here are my questions:
>           1. Is my patch wrong?
>           2. Why does the official yaffs2 code use a threshold of 3
>              chunkErrorStrikes before retiring a block? Would reducing
>              it to 1 chunkErrorStrike wrongly mark good blocks bad?
>           3. Should I remove the patch?
>
>       Thanks a lot for your advice.


Yes, your patch is wrong as any read error will retire the block.
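
To make the difference concrete, here is a rough model of the two versions of
the strike logic quoted above. It is only a sketch, not yaffs code: the struct
and function names are made up for illustration, and it assumes that
gcPrioritise is cleared again once the block has been garbage collected, so
that a later error on the same block can count as a new strike.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the yaffs_BlockInfo fields used by the handler. */
struct block_model {
    bool gc_prioritise;        /* block is already queued for prioritised GC */
    int  chunk_error_strikes;  /* strikes accumulated so far */
    bool needs_retiring;       /* block is going to be marked bad */
};

/*
 * Same shape as the quoted yaffs_HandleChunkError(). A strike_limit of 3
 * matches the unpatched code; a strike_limit of 0 behaves like the patch,
 * because the first strike already exceeds the limit.
 */
void model_handle_chunk_error(struct block_model *bi, int strike_limit)
{
    if (!bi->gc_prioritise) {
        bi->gc_prioritise = true;
        bi->chunk_error_strikes++;
        if (bi->chunk_error_strikes > strike_limit)
            bi->needs_retiring = true;
    }
}

/* Assumption: GC of the block finishes between errors and clears the flag. */
void model_gc_done(struct block_model *bi)
{
    bi->gc_prioritise = false;
}

/* Count how many reported errors it takes to retire a fresh block. */
int errors_until_retired(int strike_limit)
{
    struct block_model bi = { false, 0, false };
    int errors = 0;

    while (!bi.needs_retiring) {
        model_handle_chunk_error(&bi, strike_limit);
        errors++;
        model_gc_done(&bi);
    }
    return errors;
}
```

In this model errors_until_retired(3) comes out as 4, while
errors_until_retired(0) comes out as 1, so with the patch a single reported
error is enough to lose the block.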

If you see bit-flips in data read out of MTD, then your NAND driver
isn't properly using ECC to correct the data. If MTD had used ECC to
correct the data, you would see a -EUCLEAN return from MTD on read, which
percolates through to yaffs_HandleChunkError() and increments the
strike count.
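
As a small illustration of what to look for, the sketch below sorts the
return value of an MTD read into the cases that matter here. Linux MTD uses
-EUCLEAN for "bit-flips were found and corrected" and -EBADMSG for "ECC could
not correct the data". The enum and the function name are hypothetical and
exist only for this example; they are not part of MTD or yaffs.

```c
#include <errno.h>

/* Fallbacks so the sketch also builds on hosts whose errno.h lacks these. */
#ifndef EUCLEAN
#define EUCLEAN 117  /* Linux value */
#endif
#ifndef EBADMSG
#define EBADMSG 74   /* Linux value */
#endif

/* Hypothetical classification of an MTD read result. */
enum read_health {
    READ_CLEAN,          /* no bit-flips seen */
    READ_CORRECTED,      /* ECC fixed the data; the block deserves a strike */
    READ_UNCORRECTABLE,  /* ECC failed; the returned data cannot be trusted */
    READ_OTHER_ERROR     /* some unrelated I/O failure */
};

enum read_health classify_mtd_read(int ret)
{
    if (ret == 0)
        return READ_CLEAN;
    if (ret == -EUCLEAN)
        return READ_CORRECTED;
    if (ret == -EBADMSG)
        return READ_UNCORRECTABLE;
    return READ_OTHER_ERROR;
}
```

If a driver only ever produces the READ_CLEAN case while files are visibly
corrupted, that points at ECC not being applied on the read path.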

You should dig into your MTD driver to verify that it not only writes
with ECC, but uses that ECC to correct any data read from NAND.


--
Peter Barada