[Yaffs] Yaffs deadlock in yaffs_evict_inode()

Author: Eivind Tagseth
Date:  
To: yaffs
Subject: [Yaffs] Yaffs deadlock in yaffs_evict_inode()
Hi

We're experiencing deadlocks within the yaffs fs on some of our systems.
The issue is hard to reproduce: it happens on only a few percent of the
systems, and then only after a couple of months.

The systems are running an older version of yaffs
(6820610d6b3ea887af57fbd9706fff78923a2115), but as far as I can see from
the code, the problem is still present.

As far as I can see (see backtrace below), yaffs_bg_thread_fn() runs
yaffs_bg_gc() while holding the gross lock (taken with
yaffs_gross_lock()). During the gc run, yaffs finds a bad block and
tells the nand layer to mark it bad. nand_update_bbt() then needs to
kmalloc a buffer for the bbt, there are no free pages, and the
allocator calls try_to_free_pages().
try_to_free_pages() goes through shrink_slab() and
shrink_icache_memory(), finds a yaffs inode that it wants to evict, and
calls yaffs_evict_inode(). That function needs the gross lock, which the
same thread already holds, so the yaffs fs is now deadlocked!

As far as I can see, the latest versions of yaffs_bg_thread_fn() and
yaffs_evict_inode() still both take the gross lock with
yaffs_gross_lock(), so I think this could still happen in the latest
version. If I'm wrong, please let me know.

I'd appreciate input on the validity of my analysis, and of course, on
how to fix it.

The syslog shows the following:

Aug 30 21:29:01 INFO: task yaffs-bg-1:28 blocked for more than 120 seconds.
Aug 30 21:29:01 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 30 21:29:01 yaffs-bg-1    D c02ae38c     0    28      2 0x00000000
Aug 30 21:29:01 Backtrace:
Aug 30 21:29:01 [<c02ae030>] (schedule+0x0/0x3e4) from [<c02af140>]
(__mutex_lock_slowpath+0xa0/0x118)
Aug 30 21:29:01 [<c02af0a0>] (__mutex_lock_slowpath+0x0/0x118) from
[<c02af1d8>] (mutex_lock+0x20/0x24)
Aug 30 21:29:01 r7:c74bc600 r6:c7817088 r5:c79e1648 r4:c7817088
Aug 30 21:29:01 [<c02af1b8>] (mutex_lock+0x0/0x24) from [<c015e9b4>]
(yaffs_gross_lock.clone.3+0x44/0x7c)
Aug 30 21:29:01 [<c015e970>] (yaffs_gross_lock.clone.3+0x0/0x7c) from
[<c015fd68>] (yaffs_evict_inode+0xc8/0xf8)
Aug 30 21:29:01 r4:c74bc5f0 r3:00000001
Aug 30 21:29:01 [<c015fca0>] (yaffs_evict_inode+0x0/0xf8) from
[<c00be3a4>] (evict+0x28/0x9c)
Aug 30 21:29:01 r6:c7ae1b08 r5:c7ae0000 r4:c74bc5f0 r3:c015fca0
Aug 30 21:29:01 [<c00be37c>] (evict+0x0/0x9c) from [<c00be730>]
(dispose_list+0x44/0xcc)
Aug 30 21:29:01 r4:c74bc5f0 r3:c7ae1b08
Aug 30 21:29:01 [<c00be6ec>] (dispose_list+0x0/0xcc) from [<c00bf07c>]
(shrink_icache_memory+0x2ac/0x2f0)
Aug 30 21:29:01 r7:c7ae0000 r6:000000da r5:c77a70a0 r4:c77a7090
Aug 30 21:29:01 [<c00bedd0>] (shrink_icache_memory+0x0/0x2f0) from
[<c00849b4>] (shrink_slab+0x110/0x1ac)
Aug 30 21:29:01 [<c00848a4>] (shrink_slab+0x0/0x1ac) from [<c00873f0>]
(try_to_free_pages+0x1e4/0x354)
Aug 30 21:29:01 [<c008720c>] (try_to_free_pages+0x0/0x354) from
[<c0080018>] (__alloc_pages_nodemask+0x34c/0x578)
Aug 30 21:29:01 [<c007fccc>] (__alloc_pages_nodemask+0x0/0x578) from
[<c008025c>] (__get_free_pages+0x18/0x44)
Aug 30 21:29:01 [<c0080244>] (__get_free_pages+0x0/0x44) from
[<c00a6f20>] (__kmalloc+0x3c/0xcc)
Aug 30 21:29:01 [<c00a6ee4>] (__kmalloc+0x0/0xcc) from [<c01fe8a4>]
(nand_update_bbt+0x60/0x154)
Aug 30 21:29:01 r8:00010000 r7:c7811198 r6:c7811000 r5:c0398198 r4:c03981dc
Aug 30 21:29:01 r3:00000000
Aug 30 21:29:01 [<c01fe844>] (nand_update_bbt+0x0/0x154) from
[<c01fb6f8>] (nand_default_block_markbad+0xa0/0x19c)
Aug 30 21:29:01 [<c01fb658>] (nand_default_block_markbad+0x0/0x19c) from
[<c01fa678>] (nand_block_markbad+0x48/0x4c)
Aug 30 21:29:01 [<c01fa630>] (nand_block_markbad+0x0/0x4c) from
[<c01f4a74>] (part_block_markbad+0x58/0x74)
Aug 30 21:29:01 r8:0001307f r7:00000000 r6:09800000 r5:c797d400 r4:c797d400
Aug 30 21:29:01 r3:00000000
Aug 30 21:29:01 [<c01f4a1c>] (part_block_markbad+0x0/0x74) from
[<c016a7e8>] (nandmtd2_mark_block_bad+0x54/0x68)
Aug 30 21:29:01 r7:c79a0000 r6:00013000 r4:c7817000 r3:00000000
Aug 30 21:29:01 [<c016a794>] (nandmtd2_mark_block_bad+0x0/0x68) from
[<c01694f8>] (yaffs_mark_bad+0x24/0x30)
Aug 30 21:29:01 r6:c79a2600 r5:000004c1 r4:c7817000 r3:c016a794
Aug 30 21:29:01 [<c01694d4>] (yaffs_mark_bad+0x0/0x30) from [<c0163860>]
(yaffs_block_became_dirty+0x244/0x414)
Aug 30 21:29:01 [<c016361c>] (yaffs_block_became_dirty+0x0/0x414) from
[<c0163c44>] (yaffs_chunk_del+0x214/0x234)
Aug 30 21:29:01 [<c0163a30>] (yaffs_chunk_del+0x0/0x234) from
[<c0165498>] (yaffs_check_gc+0x990/0xc1c)
Aug 30 21:29:01 [<c0164b08>] (yaffs_check_gc+0x0/0xc1c) from
[<c0166bec>] (yaffs_bg_gc+0x40/0x60)
Aug 30 21:29:01 [<c0166bac>] (yaffs_bg_gc+0x0/0x60) from [<c016028c>]
(yaffs_bg_thread_fn+0x120/0x1cc)
Aug 30 21:29:01 r5:c7817000 r4:00000000
Aug 30 21:29:01 [<c016016c>] (yaffs_bg_thread_fn+0x0/0x1cc) from
[<c005ad54>] (kthread+0x90/0x98)
Aug 30 21:29:01 [<c005acc4>] (kthread+0x0/0x98) from [<c0044698>]
(do_exit+0x0/0x68c)
Aug 30 21:29:01 r6:c0044698 r5:c005acc4 r4:c7829d10
Eivind