[Yaffs] Some clarifications of AUTOPLACE with mtd & YAFFS2

Mon Oct 17 23:35:36 BST 2005

Hi all

I would like to let the vitriol of the last week or so slide, but a lot of 
it contained technical content that was wrong, so I'd like to straighten out 
a few points on YAFFS2 and AUTOPLACE, hopefully in a reasonably coherent 
fashion.

Three main assertions that have been made are wrong:
1) That mtd interfaces are "golden" and cannot be changed, so YAFFS must go 
through whatever tricks are required to make a working solution.
2) That it is both possible and wise to attempt this.
3) That I haven't a clue what AUTOPLACE is all about and how it should work.

Below I make some comments on what Thomas has done. I am not levelling blame 
at him. He is a very busy man and works to a set of priorities that are not 
always what everyone else wants (besides, it is all open source, so someone 
else could also jump in and fix things ;-)). I have the highest regard for Thomas' 
knowledge and work.

A POTTED HISTORY OF THE AUTOPLACE NAND INTERFACE

Those that have been around YAFFS a long time will remember the original 
integration problems because YAFFS wanted to see things that mtd was not 
making available, in particular the oob info. To get this working, the first 
releases of YAFFS needed mtd patches to work. At that time, mtd nand support 
was in its very early days, and the patch soon became unnecessary as the mtd 
layer came up to speed (damn good effort from Thomas).

YAFFS1 is designed for the Smartmedia data layout, hence the hard-wired 
yaffs_Spare structures etc. When more, and different, flash layouts were 
introduced, things like oob_sel were added to allow the generic nand code to 
work with various byte layouts and ECC mechanisms. oob_sel has also changed a 
few times. YAFFS1 suffered some minor traumas due to this, but nothing too 
bad.

When I started designing YAFFS2, about 2.5 years ago now, one of the main 
goals was to be able to support a much wider range of NAND devices and work 
with different hardware ECC etc. So, one of the first areas that was 
designed was the NAND interface. Clearly fixed binary structures would not 
work properly and a level of abstraction was required. This drove the move 
from yaffs_Spare to yaffs_ExtendedTags and the packed tags mechanisms.
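To illustrate what moving away from hard-wired structures means, here is a minimal sketch of packing a logical set of tags into an opaque byte buffer. The struct, its field names and the 16-byte size are hypothetical and are NOT the actual yaffs_ExtendedTags or packed tags format. The point is only that the file system decides what the bytes mean, while where they go in the oob is left to the NAND layer.

```c
#include <stdint.h>

/* Hypothetical logical tags: field names and widths are illustrative only,
 * not YAFFS2's real yaffs_ExtendedTags or packed tags layout. */
struct example_tags {
    uint32_t obj_id;
    uint32_t chunk_id;
    uint32_t n_bytes;
    uint32_t seq_number;
};

#define EXAMPLE_PACKED_TAGS_SIZE 16

/* Store a 32-bit value little-endian so the packed form does not depend
 * on host byte order. */
static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialise tags into an opaque buffer. Nothing here says which oob
 * bytes will hold the result; that is the NAND layer's business. */
void example_pack_tags(const struct example_tags *t,
                       uint8_t out[EXAMPLE_PACKED_TAGS_SIZE])
{
    put_u32(out + 0, t->obj_id);
    put_u32(out + 4, t->chunk_id);
    put_u32(out + 8, t->n_bytes);
    put_u32(out + 12, t->seq_number);
}

/* Inverse of example_pack_tags(). */
void example_unpack_tags(const uint8_t in[EXAMPLE_PACKED_TAGS_SIZE],
                         struct example_tags *t)
{
    t->obj_id = get_u32(in + 0);
    t->chunk_id = get_u32(in + 4);
    t->n_bytes = get_u32(in + 8);
    t->seq_number = get_u32(in + 12);
}
```

The packed buffer is exactly the kind of "here are some bytes" payload that an abstract oob interface can store without the file system knowing the byte placement.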

As things slowly unfolded, I realised that it would be better to get 
buy-in from Thomas sooner, rather than later, on how to progress 2k and 
other support in mtd. It is far better to get a sensible interface in place 
early than to try to retro-fit things later. As YAFFS is a customer of those 
interfaces, it made sense to get involved and discuss things. So in late 
2003, Thomas and I started some discussions on this subject. [Much of the 
discussion was on IRC, and some was by email. I deleted some of my old email, 
but I still have some stuff I sent in October 2003]. My starting point was a 
more abstract NAND interface that did not require any knowledge of actual 
byte placement. [See below for a technical outline of the rationale]. Thomas, 
being a very knowledgeable fellow, also brought along a bunch of ideas - many 
of them on the same wavelength. The result of all this was a definition of 
the functionality, but not an absolute function call definition, of AUTOPLACE 
and abstract bad block handling.

I then continued with YAFFS2 development, outside of Linux, and a YAFFS2 
prototype was being stress tested by Christmas 2003. In January it was being 
stress tested on large arrays of 2k page NAND. Note that this was a 
stripped-down YAFFS1 and did not support simultaneous yaffs1/2 functionality 
or yaffs1 compatibility.

Around April 2004, Thomas started working on AUTOPLACE and bad block 
handling. IIRC, this was all in place by the end of May. 

I then set about retooling yaffs2 to support both yaffs2 and yaffs1 formats 
through a backward compatibility layer (tags compatibility) - in essence a 
fusion of the YAFFS1 and YAFFS2 prototypes. This effort got somewhat 
delayed by me taking some time off from YAFFS for personal reasons.

When I started checking out the YAFFS2<->mtd interfacing I used a small RAM 
emulation driver that I hacked up quickly. This conformed to the interface 
that Thomas and I agreed on.

Then YAFFS2 was released to the world. A few people picked it up and started 
playing. A few patches fixed some of the fusion problems (eg. Nick Bane's 
patches for compatibility mode). Unfortunately it takes time before people 
switch over and start testing with 2k page devices. Some of the bugs 
discovered were due to mtd not doing AUTOPLACE properly (!!shock! horror! 
Thomas is human!!). JFFS2 had not shown this because it does not rely on oob 
to the same extent. This was discussed on the list sometime in May 2005. Some 
aspects of this were corrected, but it is not yet fully sound.

Since then, the AUTOPLACE issue has been somewhat of an open wound, awaiting 
resolution. In the meantime, pragmatic people have worked around the 
problem, but it has not yet been cured.

The good news, though, is that a fine fellow by the name of Vitaly Wool has 
started looking at some issues and has clearly identified the problem (see 
http://lists.infradead.org/pipermail/linux-mtd/2005-September/013949.html). I 
therefore hope for some resolution to this pretty soon. If Vitaly's 
suggestions come into being then the current code in YAFFS2 will work as it 
is with no modifications.

So that is pretty much a history of the AUTOPLACE business. Things don't 
always progress the way one hopes, and looking back I'd have done some things 
a bit differently.

As to these mtd interfaces being golden kernel interfaces: Not many people 
think so. Thomas doesn't. grep a whole Linux+yaffs source tree for read_oob 
and you'll only see references in mtd and yaffs. It is not like we're saying 
"change kmalloc".

There is no immediate solution, but it looks like the proper solution should 
be there soon. I see Vitaly has proposed some patches. Thomas has asked me to 
look at them and comment. 

-- Charles

[Technical side bar: Why the correct solution is to fix it in mtd]

There are many NAND types and implementations of hardware ECC etc, which means 
there are a lot of different ways to do bad block management and many 
different byte layouts on NAND. For instance:
1) The default nand_base uses bytes 0 and 1 for bad block marking.
2) The HW_ECC for the S3C2410 uses these bytes for ECC.
3) Some hardware, perhaps not yet designed, might do something entirely 
different.
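A small sketch of why a hard-wired byte placement is fragile: the same client byte range can be safe in one layout and collide with reserved bytes in another. Both reserved-byte tables are hypothetical, loosely following items 1 and 2 above; they are not the real nand_base or S3C2410 layouts.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* A run of oob bytes: [offset, offset + length). */
struct oob_range {
    size_t offset;
    size_t length;
};

/* Hypothetical layout A: bytes 0-1 reserved for the bad block marker. */
static const struct oob_range layout_a_reserved[] = {
    { 0, 2 },
};

/* Hypothetical layout B: hardware ECC in bytes 0-2 and the bad block
 * marker in byte 5. Illustrative only. */
static const struct oob_range layout_b_reserved[] = {
    { 0, 3 },
    { 5, 1 },
};

static bool ranges_overlap(struct oob_range x, struct oob_range y)
{
    return x.offset < y.offset + y.length && y.offset < x.offset + x.length;
}

/* True if a client's hard-wired byte range avoids every reserved range
 * in the given layout. */
bool fixed_placement_ok(struct oob_range client,
                        const struct oob_range *reserved, size_t n_reserved)
{
    for (size_t i = 0; i < n_reserved; i++)
        if (ranges_overlap(client, reserved[i]))
            return false;
    return true;
}
```

A client that hard-wires bytes 2-9 is fine with layout A but collides with both the ECC bytes and the marker in layout B, and a future layout could break a placement that happens to suit both.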

Further, one NAND driver might be serving up data to many file systems, so it 
is unreasonable to impose one file system's preferred binary layout. With 
some hardware this might even be impossible anyway.

One of the fundamental tenets of Computer Science is to use abstract 
interfaces to hide detail. It is this thinking which led to AUTOPLACE and 
the abstract bad block handling interface. This is nothing new and examples 
of abstract interfaces abound.

So the AUTOPLACE handling is there to provide a mechanism for abstracting away 
the physical location of oob bytes. It says "here are some bytes, save them 
wherever. When I ask for them back, give them back. Whatever you do behind 
the scenes I don't want to know".
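A minimal sketch of the AUTOPLACE idea, assuming the driver describes its layout as a list of free oob regions. The client hands over a flat buffer; the NAND layer scatters it into whatever bytes are free and gathers it back on read, leaving ECC and bad block bytes alone. The types and function names are hypothetical and are not the actual mtd API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One run of oob bytes the driver leaves free for clients. */
struct oob_free {
    size_t offset;
    size_t length;
};

/* Hypothetical per-driver layout description. */
struct oob_layout {
    const struct oob_free *regions;
    size_t n_free;
};

/* Total number of client bytes the layout can hold. */
size_t autoplace_capacity(const struct oob_layout *l)
{
    size_t total = 0;
    for (size_t i = 0; i < l->n_free; i++)
        total += l->regions[i].length;
    return total;
}

/* Scatter len client bytes into the free regions, in order. Bytes outside
 * the free regions are not touched. Returns 0, or -1 if it won't fit. */
int autoplace_write(const struct oob_layout *l, uint8_t *oob,
                    const uint8_t *buf, size_t len)
{
    if (len > autoplace_capacity(l))
        return -1;
    for (size_t i = 0; i < l->n_free && len > 0; i++) {
        size_t n = l->regions[i].length < len ? l->regions[i].length : len;
        memcpy(oob + l->regions[i].offset, buf, n);
        buf += n;
        len -= n;
    }
    return 0;
}

/* Gather len client bytes back out of the free regions, in the same order. */
int autoplace_read(const struct oob_layout *l, const uint8_t *oob,
                   uint8_t *buf, size_t len)
{
    if (len > autoplace_capacity(l))
        return -1;
    for (size_t i = 0; i < l->n_free && len > 0; i++) {
        size_t n = l->regions[i].length < len ? l->regions[i].length : len;
        memcpy(buf, oob + l->regions[i].offset, n);
        buf += n;
        len -= n;
    }
    return 0;
}
```

Because the client only ever sees buf and len, moving to a layout with different free regions needs no change on the file system side; only the driver's region list differs.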

With an abstract interface, we can get around a lot of problems quite neatly:
1) Changes in mtd don't mean changes to YAFFS, or other file systems.
2) Implementation of some funky hardware can be done without adding new 
fields to oob_sel structures and having to fiddle with multiple bodies of 
code. Data can also be handled optimally (for example, special hardware 
tricks can be exploited).
3) Change in one place, test in one place, then it should "just work".

If we don't use an abstract interface then we get into a world of pain and we 
end up with code looking like 
http://www.aleph1.co.uk/pipermail/yaffs/2005q4/001581.html, which still does 
not handle all cases. This raises a bunch of problems including, but not 
limited to:
1) Replicated code. The algorithms are already in mtd, so we should rather use 
them there.
2) mtd interaction. mtd changes. oobsel will most likely change again (it is 
now in about its third or fourth iteration). We don't want to have to change 
YAFFS and put in conditional code for even more changes.
3) Maintenance woes: We want to test YAFFS *once* and not have to test it 
against a zillion different hardware types etc and handle all the associated 
patches etc. When a problem is found, strict adherence to interfaces helps 
isolate problems.
4) If someone makes some new hardware/drivers that do not use oobsel, then we 
don't want to have to expose more detail to the outside world.
5) It is insane to use an abstract AUTOPLACE interface for some accesses, and 
not for others. It is important to use abstract interfaces consistently, 
otherwise why have them at all? [Analogy: When you write data to a serial 
port driver, you use some relatively abstract interface like write_byte(); 
you don't do something like outp(dev->uart->tx,b).]
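The same reasoning applies to bad block handling. In this sketch (hypothetical names, not the actual mtd interface), each driver supplies its own block_is_bad() hook and the file system only ever calls through the hook, much as it would call write_byte() rather than poking a UART register. The byte positions follow the hypothetical layouts used earlier, not real hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct nand_dev;

/* Hypothetical driver operations table. */
struct nand_ops {
    bool (*block_is_bad)(const struct nand_dev *dev, size_t block);
};

struct nand_dev {
    const struct nand_ops *ops;
    uint8_t (*oob)[16];   /* first-page oob of each block, simulated in RAM */
    size_t n_blocks;
};

/* Driver A: marker in bytes 0 and 1 (anything other than 0xFF = bad). */
static bool default_is_bad(const struct nand_dev *dev, size_t block)
{
    const uint8_t *o = dev->oob[block];
    return o[0] != 0xFF || o[1] != 0xFF;
}

/* Driver B (hypothetical): bytes 0-2 hold ECC, so the marker is byte 5. */
static bool hwecc_is_bad(const struct nand_dev *dev, size_t block)
{
    return dev->oob[block][5] != 0xFF;
}

static const struct nand_ops default_ops = { default_is_bad };
static const struct nand_ops hwecc_ops = { hwecc_is_bad };

/* File system side: identical for every driver, no byte positions. */
size_t count_good_blocks(const struct nand_dev *dev)
{
    size_t good = 0;
    for (size_t b = 0; b < dev->n_blocks; b++)
        if (!dev->ops->block_is_bad(dev, b))
            good++;
    return good;
}
```

The same raw oob bytes can mean "bad" to one driver and "good" to another, which is exactly why the file system should ask rather than look.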

-- Charles