[Yaffs] YAFFS2 and kswapd dead lock problem

Lawson.Reed Reed.Lawson at IGT.com
Fri Dec 2 18:56:07 GMT 2005


Found it.... took me 5 work days. 
And this deadlock issue IS in the YAFFS2 tip code on CVS here:
http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?rev=1.34&view=auto

So, no one has seen this???

Here is what is happening:

Process 'A' grabs the YAFFS2 grossLock.
Process 'B' preempts and its job is to free unused inodes everywhere.
(hint: 'B' is kswapd). So, 'B' sets I_FREEING. Then it calls
yaffs_clear_inode() which needs the grossLock. So, it goes 
on the wait queue because 'A' has the grossLock.

Now process 'A' runs. It's holding the grossLock. It calls 
yaffs_get_inode() which calls BACK UP to iget()... With
the grossLock held! That calls find_inode(). It finds 
I_FREEING set and then gets put on a wait queue in 
__wait_on_freeing_inode().

Presto chango deadlock.
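
The cycle described above can be sketched as a tiny user-space model.
This is only an illustration, not YAFFS or kernel code: the task table,
the resource names and every function below are made up for the example.
It just records who holds what and who waits for what, then follows the
wait-for chain to see whether it loops back on itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy resources, named after the two things the tasks fight over. */
enum resource { RES_NONE = -1, RES_GROSS_LOCK = 0, RES_INODE_FREEING = 1 };

struct task {
    const char *name;
    enum resource holds;     /* resource this task currently owns */
    enum resource waits_for; /* resource it is blocked on, or RES_NONE */
};

/* Return the task that owns resource r, or NULL if nobody owns it. */
static const struct task *owner_of(const struct task *tasks, size_t n,
                                   enum resource r)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].holds == r)
            return &tasks[i];
    return NULL;
}

/* Follow the wait-for chain from start; true if it leads back to start. */
static bool is_deadlocked(const struct task *tasks, size_t n,
                          const struct task *start)
{
    const struct task *t = start;

    assert(start != NULL);
    for (size_t hops = 0; hops < n; hops++) {
        if (t->waits_for == RES_NONE)
            return false;            /* not blocked at all */
        t = owner_of(tasks, n, t->waits_for);
        if (t == NULL)
            return false;            /* resource is free, wait will end */
        if (t == start)
            return true;             /* cycle: nobody can make progress */
    }
    return false;
}

/*
 * Hypothetical scenario builder. 'A' waits for the inode that 'B' marked
 * as freeing; 'B' (kswapd) waits for the grossLock. The flag says whether
 * 'A' still holds the grossLock while it waits.
 */
static bool scenario_deadlocks(bool a_holds_gross_lock_in_iget)
{
    struct task tasks[2] = {
        { "A",          RES_GROSS_LOCK,    RES_INODE_FREEING },
        { "B (kswapd)", RES_INODE_FREEING, RES_GROSS_LOCK },
    };

    if (!a_holds_gross_lock_in_iget)
        tasks[0].holds = RES_NONE;
    return is_deadlocked(tasks, 2, &tasks[0]);
}
```

With the lock held while 'A' waits, the chain goes A -> B -> A and the
model reports a deadlock. If 'A' does not hold the grossLock at that
point, 'B' can take it and the chain ends.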

So, my solution is to make sure the grossLock is not held when
calling yaffs_get_inode(). Plus, I added grossLocking to 
yaffs_read_inode() since NB's comment in there is no longer
true.
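
To show the shape of that change, here is a rough user-space sketch. It
is not the actual patch and does not reproduce yaffs_fs.c: the pthread
mutex stands in for the grossLock, and model_lookup() and
model_get_inode() are hypothetical stand-ins. The only point is the
ordering: do the YAFFS work under the lock, drop the lock, and only then
call up into the code that may sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t gross_lock = PTHREAD_MUTEX_INITIALIZER;

/* Tracked by hand for this demo; only meaningful when single-threaded. */
static bool gross_lock_held;
static bool lock_held_during_get;

static void gross_lock_acquire(void)
{
    pthread_mutex_lock(&gross_lock);
    gross_lock_held = true;
}

static void gross_lock_release(void)
{
    gross_lock_held = false;
    pthread_mutex_unlock(&gross_lock);
}

/*
 * Stand-in for the call that goes back up into the VFS and may sleep.
 * It records whether the caller was still holding the lock.
 */
static int model_get_inode(int object_id)
{
    lock_held_during_get = gross_lock_held;
    return object_id;
}

/* Hypothetical lookup path: lock, search, unlock, then call up. */
static int model_lookup(int object_id)
{
    int found;

    gross_lock_acquire();
    found = object_id;      /* pretend we searched the YAFFS structures */
    gross_lock_release();   /* drop the lock BEFORE calling up */

    return model_get_inode(found);
}
```

The design choice is simply never to sleep on somebody else's wait queue
while holding the grossLock. Anything that must be protected is finished
before the release.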

I ran my 20-thread torture test, which usually deadlocks in under 30 seconds.
It ran overnight with this fix. The test found no compare 
errors in the 20 files that it reads and writes at random times with
random data and random lengths.
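
For anyone who wants to try something similar, a test along those lines
might look roughly like the sketch below. This is not my actual test
program: the function names, the 4096-byte maximum length, the seeds and
the file naming are all assumptions made for the example, and the random
sleeps between operations are left out to keep it short. Each thread
writes random data of random length to its own file, reads it back, and
counts compare errors.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN     4096   /* assumed upper bound on one write */
#define MAX_THREADS 64

struct worker {
    char path[256];
    unsigned int seed;
    int iterations;
    int compare_errors;
};

/* Small private LCG so each thread has its own random stream. */
static unsigned int next_rand(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Write a random buffer, read it back; return 1 on any error or mismatch. */
static int write_and_verify(const char *path, unsigned int *seed)
{
    unsigned char out[MAX_LEN], in[MAX_LEN];
    size_t len = 1 + next_rand(seed) % MAX_LEN;
    size_t n;
    FILE *f;

    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)next_rand(seed);

    f = fopen(path, "wb");
    if (!f)
        return 1;
    n = fwrite(out, 1, len, f);
    if (fclose(f) != 0 || n != len)
        return 1;

    f = fopen(path, "rb");
    if (!f)
        return 1;
    n = fread(in, 1, len, f);
    fclose(f);
    return (n != len || memcmp(in, out, len) != 0) ? 1 : 0;
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < w->iterations; i++)
        w->compare_errors += write_and_verify(w->path, &w->seed);
    return NULL;
}

/* Run n_threads workers, one file each under dir; return total errors. */
static int run_torture(const char *dir, int n_threads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];
    int started = 0, total = 0;

    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    for (int i = 0; i < n_threads; i++) {
        struct worker *w = &workers[i];

        snprintf(w->path, sizeof w->path, "%s/torture_%d.bin", dir, i);
        w->seed = 1234u + (unsigned int)i;
        w->iterations = iterations;
        w->compare_errors = 0;
        if (pthread_create(&tids[i], NULL, worker_main, w) != 0) {
            total++;        /* count a failed thread start as an error */
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += workers[i].compare_errors;
        remove(workers[i].path);
    }
    return total;
}
```

Calling run_torture() with the mount point, 20 threads and a large
iteration count gives the same general kind of load. If the file system
deadlocks, the call simply never returns, so run it under a watchdog or
keep an eye on it.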

So, I strongly suggest that someone close to the YAFFS effort review
this change and incorporate it. I am kinda new to all this and I'm
not even sure what the correct way to submit the changes is.
So, let me know how I can help.

Are you sure you have never seen any infrequent unexplained deadlocks
with YAFFS2?

Thanks,

__________________________________
Reed Lawson
IGT Firmware Engineering
(775) 448-0755



> -----Original Message-----
> From: yaffs-bounces at stoneboat.aleph1.co.uk
> [mailto:yaffs-bounces at stoneboat.aleph1.co.uk]On Behalf Of Lawson.Reed
> Sent: Wednesday, November 23, 2005 12:47 PM
> To: yaffs at stoneboat.aleph1.co.uk
> Subject: RE: [Yaffs] YAFFS2 and kswapd dead lock problem
> 
> 
> Hi again.
> 
> I did some more testing and discovered to my dismay that it's
> not JUST kswapd that is causing this... Because I still
> see the deadlock even without kswapd running :-(
> I caught ftp and my application crossing swords as well.
> (*sigh*)
> 
> Oh, I know someone is going to ask for this, so here it is:
> (and the $Id's are coming from MY checkin on MY
> CVS server... so guess that does not help much. Sorry)
> 
> /> cat /proc/yaffs
> YAFFS built:Nov 22 2005 16:18:11
> $Id: yaffs_fs.c,v 1.1 2005/06/29 22:14:08 rlawson Exp $
> $Id: yaffs_guts.c,v 1.2 2005/07/11 22:05:18 rlawson Exp $
>  
> Device yaffs
> startBlock......... 1
> endBlock........... 8191
> chunkGroupBits..... 3
> chunkGroupSize..... 8
> nErasedBlocks...... 7709
> nTnodesCreated..... 1900
> nFreeTnodes........ 0
> nObjectsCreated.... 100
> nFreeObjects....... 85
> nFreeChunks........ 494852
> nPageWrites........ 0
> nPageReads......... 0
> nBlockErasures..... 14
> nGCCopies.......... 1
> garbageCollections. 1
> passiveGCs......... 1
> nRetriedWrites..... 0
> nRetireBlocks...... 0
> eccFixed........... 0
> eccUnfixed......... 0
> tagsEccFixed....... 0
> tagsEccUnfixed..... 0
> cacheHits.......... 1434
> nDeletedFiles...... 3
> nUnlinkedFiles..... 19
> nBackgroudDeletions 0
> useNANDECC......... 1
> isYaffs2........... 1
> grossLock count.... -2
> />
> 
> So, again, if anyone has ideas, or has seen this before,
> let me know.
> 
> __________________________________
> Reed Lawson
> IGT Firmware Engineering
> (775) 448-0755
> 
> 
> 
> > -----Original Message-----
> > From: yaffs-bounces at stoneboat.aleph1.co.uk
> > [mailto:yaffs-bounces at stoneboat.aleph1.co.uk]On Behalf Of Lawson.Reed
> > Sent: Wednesday, November 23, 2005 8:58 AM
> > To: yaffs at stoneboat.aleph1.co.uk
> > Subject: [Yaffs] YAFFS2 and kswapd dead lock problem
> > 
> > 
> > Hi,
> > 
> > <Background>
> > 
> > We have been using YAFFS2 on 2 Samsung 512Meg parts (1 gig total)
> > on our project for about 6 months now (I'm on Linux 2.4.24-uc0 
> > (I know, I know... upgrade... well, to make a long story short
> > I can not.. because we are on a 5272 ColdFire and no one has ported
> > the USB or FEC drivers to 2.6 that I know of and I sure do not have
> > that much time on my hands....)) and it has been working GREAT!   
> > 
> > Except....
> > 
> > Once in a while (like maybe three times a week), access to the YAFFS2
> > file system blocks. The rest of the system continues to work, but 
> > somewhere someone is holding the grossLock and not letting go. 
> > Once this happens, if I just cd into my /nandfs directory, 
> > an 'ls' just hangs.
> > 
> > I originally thought it was an issue with the NAND MTD code,
> > but I put debug LED flashes around every entry point and when
> > it hangs, none of those LEDs are on. 
> > 
> > I wrote a test app that spawns 10 pthreads that all hammer the YAFFS2
> > with fopen, fwrite, and fclose continuously with random nanosleep
> > times of up to 100 ms. It hangs after about 8 seconds of that.
> > I added a grossLock count to the /proc/yaffs file and sure enough
> > when this happens, it is way negative (like -4 or -7 or something)
> > and holding.
> > 
> > So, I modified yaffs_fs.c to print the pid of each call to 
> > the grossLock and grossUnlock routines. I see a flurry of pids
> > in the range of 30 to 40. That's my test app. But then I see
> > a pid of 4 and shortly after, the dead lock occurs....
> > 
> > pid 4 is kswapd.....
> > 
> > Well, being ignorant about kswapd, I googled it and could not find 
> > anything very useful on it. We are a small uClinux embedded system
> > and hence, we do not have a swap partition or an MMU.... So, what
> > good is kswapd??? 
> > 
> > I modified vmscan.c to simply not start the kswapd thread 
> > and my test app ran all night with no hangs....
> > 
> > </Background>
> > 
> > So, to my questions....
> > 
> >   Is anybody else seeing this?
> >   What is different about the way kswapd accesses YAFFS that 
> > causes the deadlock?
> >   Why is kswapd accessing the YAFFS file system anyway?
> >   How does it even know it's there?
> >   What benefit is kswapd in my mmu-less embedded system?
> >   Is there any danger in just disabling kswapd?
> > 
> > I'll be searching for the answers to these questions and will post
> > answers if and when I find anything....
> > 
> > In the meantime, I'd sure appreciate 
> > any light y'all can shed on this.
> > 
> > Thanks,
> > - Reed.
> > 
> > __________________________________
> > Reed Lawson
> > IGT Firmware Engineering
> > (775) 448-0755
> >  
> > 
> > _______________________________________________
> > yaffs mailing list
> > yaffs at stoneboat.aleph1.co.uk
> > http://stoneboat.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs
> > 
> 
> 


