[Yaffs] Sorry state of YAFFS1 :((

Sergey Kubushyn ksi at koi8.net
Mon Sep 19 02:16:58 BST 2005


On Mon, 19 Sep 2005, Charles Manning wrote:

> On Sunday 18 September 2005 11:25, Sergey Kubushyn wrote:
>
> > I'm not new to YAFFS and I do know how it works. MTD has nothing to
> > do with it if one uses YAFFS ECC.
>
> That is not necessarily true. It depends on what you have enabled in mtd.

There is not much one can enable (or disable) in MTD these days. The only
option available is "Verify Writes", and that is disabled.

> > As a matter of fact it looks like YAFFS1 DOES work properly, because
> > my simple test of copying a file to it and then removing it in an
> > infinite loop worked just fine for 24 hours, both with YAFFS and MTD
> > ECC.
>
> > But every single read and write produces a whole bunch of those
> > "**>>ecc error unfixed" and "**>Block XYZ marked for retirement"
> > messages. When using MTD ECC, /proc/yaffs shows 1 in "tagsEccUnfixed";
> > "eccFixed" and "eccUnfixed" both stay at 0.
>
> These warnings are trying to tell you something. If you just ignore
> them then expect problems.

I know it's supposed to be this way. But this part of YAFFS looks completely
bogus: the kernel log gets flooded with those errors, yet /proc/yaffs shows
all 0's and everything seems to work fine, surviving a 24-hour torture test.
It also generates the same stream of errors when instructed NOT to use MTD
ECC. And it really doesn't use it, because MTD constantly complains about
reading/writing data without ECC. In that case, though, the error counters
in /proc/yaffs are updated and the error numbers go through the roof. BTW,
_ALL_ those errors are generated by YAFFS; MTD stays silent and doesn't
complain.
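
For anyone who wants to reproduce this, the copy-and-remove test mentioned
above could be sketched roughly as follows. This is only an illustration
assuming plain POSIX calls, not the exact test that was run; the names
copy_file and stress_loop are made up for the example, and the loop is
bounded so that it terminates.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy src to dst. Returns 0 on success, -1 with errno set on failure.
 * (Hypothetical helper, not part of YAFFS or MTD.) */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n = 0;
    int in, out, saved;

    assert(src != NULL && dst != NULL);
    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        saved = errno;
        close(in);
        errno = saved;
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* handle short writes */
            ssize_t w = write(out, p, (size_t)n);
            if (w <= 0) {
                if (w == 0)
                    errno = EIO;
                goto fail;
            }
            p += w;
            n -= w;
        }
    }
    if (n < 0)
        goto fail;
    close(in);
    return close(out);                  /* close() may report a late write error */
fail:
    saved = errno;
    close(in);
    close(out);
    errno = saved;
    return -1;
}

/* Run up to `passes` copy-then-remove cycles.
 * Returns the number of cycles that completed without error. */
long stress_loop(const char *src, const char *dst, long passes)
{
    long done;

    for (done = 0; done < passes; done++) {
        if (copy_file(src, dst) != 0) {
            fprintf(stderr, "pass %ld: copy failed: %s\n", done, strerror(errno));
            break;
        }
        if (unlink(dst) != 0) {
            fprintf(stderr, "pass %ld: unlink failed: %s\n", done, strerror(errno));
            break;
        }
    }
    return done;
}
```

Calling stress_loop() from a small main() with dst on the YAFFS mount, and
comparing the kernel log and /proc/yaffs before and after the run, would
show whether the messages track real failures or only the reporting.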

> If you are getting these, then it suggests that both mtd and YAFFS are
> doing ECC or that the mtd is mashing something so that the ECC fields
> being used by YAFFS are incorrect.

It might be that MTD is mashing something, no doubt. But it doesn't look
like something is broken in MTD, because it has behaved exactly the same way
for the last 2 years. If it were an MTD bug, 2 years would be more than
enough to fix it (or to fix YAFFS to work around it) and to have long
forgotten that something like this ever existed. However, that infamous
"**>>ecc error unfixed" has been discussed on the list every week or two for
the entire existence of the list. We have lived through two kernel
generations in that time, everything got totally broken and fixed several
times, but there is only one mantra from the YAFFS guys: "Fix the MTD"...

> > They are growing when using YAFFS ECC. For me it looks like those are
> > not errors per se but some bugs in _REPORTING_ errors. Unfortunately I
> > don't have time to fix it, we're facing a deadline and there is a lot
> > to be done besides YAFFS.
> >
> > > Outside of the mtd interfacing issues I can assure you that YAFFS (1
> > > and 2) work very well and provide a storage backbone for many
> > > high-reliability products. Those that are not using mtd (ie. are
> > > using YAFFS outside of Linux) generally have a much simpler time.
> > >
> > > The YAFFS2 stuff is generally a lot cleaner because it uses the new
> > > mtd model. A few changes are still required to the mtd to use the
> > > new model for 512-byte page devices with YAFFS1. When those changes
> > > are in place YAFFS1 interfacing should become a lot cleaner. Behind
> > > the scenes there has been quite a bit of effort trying to straighten
> > > out issues by trying to get mtd to be more YAFFS friendly.
> >
> > I don't see how MTD might be unfriendly. Just pass a proper
> > nand_oobinfo to read/write_ecc and that's it. That is when using MTD
> > ECC. When YAFFS ECC is used there are no problems at all: MTD does NOT
> > do any error checking, everything is done in YAFFS guts. So no matter
> > how one "fixes" MTD, if YAFFS itself generates error messages the
> > problem is definitely inside YAFFS code and in no way related to MTD.
> > There are complaints from MTD that it is not wise to read/write data
> > without ECC, but they are harmless and they are supposed to be output
> > because YAFFS really reads/writes data without ECC.
>
> There are some areas of unfriendliness in that the warnings being
> generated are bogus. There is no "trust me I know what I am doing" flag
> that you can pass into mtd to make this shut up.

Yeah, it complains about reading/writing data without ECC. So what? Let it
complain; those complaints are a much smaller problem than the stream of
"**>>ecc error unfixed"... Anyway, one can patch them outta MTD, no big
deal...

> There is also an issue with the verification and how this interacts
> with the writing of deleted flags markers.

MTD verification is turned off.

> The nand_oobinfo method is broken as a general solution because it
> needs to be tweaked to match the hardware + nand driver being used. If
> you are using, for example, hardware ECC that uses some different
> location then the nand_oobinfo needs to be changed to work with that.
> These differences have made it hard to get one set of working settings.

It shows up no matter what. We do NOT use hardware ECC, just a single-chip
64 MByte NAND. Everything looks fine in the nanddump output; there is no
apparent reason for YAFFS to complain. Even when NAND ECC is not used the
behavior is still the same.

BTW, there are no complaints when mounting the FS or reading directories
(e.g. with ls -lR). They are only produced when reading actual file
contents.

> The newer strategy is far cleaner since it does not rely on any
> particular placement but uses AUTO_PLACE. Actual placement is thus
> handled in one place only and YAFFS gets cleaner. There are still some
> issues because, AFAIK, the AUTOPLACE stuff is not yet fully implemented
> in mtd. This is what is holding back the shift to AUTOPLACE and a far
> cleaner future.
>
> All the testing I have done personally on yaffs in the kernel uses code
> that conforms to the NAND mtd interface, but does not use the mtd nand
> code behind the scenes. Thus, for real interactions with the mtd nand
> code I have had to rely on community feedback. I think I should also use
> the nand simulator to be able to reproduce this entirely myself.

There is another issue that was mentioned yesterday. I can't overwrite files
in /etc with a setuid root program. E.g. passwd fails to change a password,
vi (which is a symlink to setuid BusyBox) can't save a new passwd (or any
other file under /etc), mount cannot overwrite /etc/mtab, etc. However, that
same vi is perfectly fine when working with any file in any other directory,
including a subdirectory under /etc. Non-setuid joe edits any file,
including passwd in /etc, without problems. So do sed, awk etc. And the
funniest thing is the error code -- ENOTEMPTY...
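
ENOTEMPTY is normally associated with rmdir() or with rename() onto a
non-empty directory, so it would help to know which call actually returns
it. A small probe along these lines could narrow that down. It is only a
sketch: how passwd or BusyBox vi really save files is an assumption here,
and the function names are made up for the example. It tries the two usual
strategies (rewrite in place, or write a temporary file and rename it over
the target) and returns the errno of the first failing call.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer to fd. Returns 0 on success or an errno value. */
int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, data, len);
        if (w < 0)
            return errno;
        if (w == 0)
            return EIO;
        data += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Strategy 1: truncate the file and rewrite it in place.
 * Returns 0 on success or the errno of the failing call. */
int overwrite_in_place(const char *path, const char *data)
{
    int fd, err;

    assert(path != NULL && data != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;
    err = write_all(fd, data, strlen(data));
    if (close(fd) < 0 && err == 0)
        err = errno;
    return err;
}

/* Strategy 2: write "<path>.tmp" in the same directory, then rename() it
 * over the target. Returns 0 on success or the errno of the failing call. */
int replace_via_rename(const char *path, const char *data)
{
    char tmp[4096];
    int fd, err, n;

    assert(path != NULL && data != NULL);
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return ENAMETOOLONG;
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;
    err = write_all(fd, data, strlen(data));
    if (close(fd) < 0 && err == 0)
        err = errno;
    if (err == 0 && rename(tmp, path) < 0)
        err = errno;
    if (err != 0)
        unlink(tmp);                    /* do not leave the temp file behind */
    return err;
}
```

Running both functions on a scratch file directly in /etc and on one in a
subdirectory of /etc, once from a setuid binary and once from a normal one,
and printing strerror() of the result, would show whether the error comes
from open(), write() or rename().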

It might be my own fault -- I finished my system image late yesterday and
didn't have much time to check it -- like a bogus directory permission or
something, but I doubt it. Anyway, I'll let you know about it tomorrow when
I'm back at my desk.

---
******************************************************************
*  KSI at home    KOI8 Net  < >  The impossible we do immediately.  *
*  Las Vegas   NV, USA   < >  Miracles require 24-hour notice.   *
******************************************************************
