[Abstract]

[Body] Mounting the filesystem

[Body] How squashfs actually reads flash through the mtdblock block device

[Body] Metadata blocks: creating inodes

[Body] Reading a file: squashfs_readpage

[Summary]




【Abstract】

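This article uses the squashfs filesystem as an example of how the Linux kernel reads a file. It follows a read from the filesystem layer down to the MTD driver and then to the NAND driver, and compares reading a file with reading flash directly.

In short, reading flash directly means going through the MTD driver to the flash driver and reading a given offset inside a partition. Reading a file takes more steps:
1. The kernel first reads the metadata blocks using the direct flash read path. A metadata block is a flash region that holds inode and directory information.
2. It uses the inode to find the flash address of the file contents.
3. It reads those flash contents into a squashfs_cache.

Before reading this article, you may want to read the overview of how Linux filesystems are implemented: http://blog.csdn.net/eleven_xiy/article/details/71249365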

【Body】Mounting the filesystem

1 Building the filesystem. Example: the user partition occupies 0x1a00000-0x3200000 on flash (i.e. 26M to 50M):

1> mksquashfs ./user user.squashfs -comp xz -all-root -processors 1

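Explanation:

./user: the user directory holds the partition contents. This directory and everything under it are packed into the filesystem image.

user.squashfs: the generated filesystem image for user. This file is flashed into the user partition.

-comp xz: compress with xz, which gives a high compression ratio. Files must be decompressed when they are read.

-all-root: make every file in the user partition owned by root. Optional.

-processors 1: the number of processors mksquashfs uses while packing. Optional.

-b: not given here, so the default logical block size of 128K is used. This block size is read back when the partition is mounted and the superblock is initialised; see the analysis below.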

2> mksquashfs ./user user.squashfs -comp xz  -Xbcj arm -Xdict-size 512K -b 512K -processors 1

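./user: the user directory holds the partition contents.

user.squashfs: the generated filesystem image for user. This file is flashed into the user partition.

-comp xz: compress with xz, which gives a high compression ratio. Files must be decompressed when they are read.

-Xbcj arm: an xz-specific option that adds the ARM BCJ filter, which usually improves compression of ARM executables.

-Xdict-size 512K: sets the xz dictionary size to 512K.

-processors 1: the number of processors mksquashfs uses while packing. Optional.

-b: a logical block size of 512K. This block size is read back when the partition is mounted and the superblock is initialised; see the analysis below.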
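The block size chosen with -b must pass the checks that squashfs_fill_super performs at mount time (shown in the next section). Below is a minimal sketch of those checks. It assumes the limits SQUASHFS_FILE_MAX_SIZE = 1 MiB and SQUASHFS_FILE_MAX_LOG = 20 from the kernel's squashfs_fs.h. block_size_ok is a hypothetical helper, and page_size stands in for PAGE_CACHE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match SQUASHFS_FILE_MAX_SIZE / SQUASHFS_FILE_MAX_LOG in
 * squashfs_fs.h; copied here so the sketch is self-contained. */
#define SKETCH_FILE_MAX_SIZE (1u << 20) /* 1 MiB */
#define SKETCH_FILE_MAX_LOG  20u

/* Hypothetical helper mirroring the block-size checks done in
 * squashfs_fill_super(). Returns 1 if the image would be accepted. */
static int block_size_ok(uint32_t block_size, uint16_t block_log,
                         uint32_t page_size)
{
    assert(page_size != 0);
    if (block_size > SKETCH_FILE_MAX_SIZE)
        return 0;
    if (page_size > block_size)          /* page larger than fs block: unsupported */
        return 0;
    if (block_log > SKETCH_FILE_MAX_LOG) /* checked before shifting below */
        return 0;
    if (block_size != (1u << block_log)) /* size and its log2 must agree */
        return 0;
    return 1;
}
```

With 4K pages, both -b 128K (block_log 17) and -b 512K (block_log 19) pass. A block_log that does not match block_size is rejected.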

2 Superblock initialisation: squashfs_fill_super

static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct squashfs_sb_info *msblk;
struct squashfs_super_block *sblk = NULL;
char b[BDEVNAME_SIZE];
struct inode *root;
long long root_inode;
unsigned short flags;
unsigned int fragments;
u64 lookup_table_start, xattr_id_table_start, next_table;
int err;
/* most of the key superblock information is kept in squashfs_sb_info */
sb->s_fs_info = kzalloc(sizeof(*msblk), GFP_KERNEL);
if (sb->s_fs_info == NULL) {
ERROR("Failed to allocate squashfs_sb_info\n");
return -ENOMEM;
}
msblk = sb->s_fs_info;
/*
msblk->devblksize = 1024;msblk->devblksize_log2=10;
*/
msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
msblk->devblksize_log2 = ffz(~msblk->devblksize);

mutex_init(&msblk->meta_index_mutex);

msblk->bytes_used = sizeof(*sblk);
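/*
Read squashfs_super_block. This information comes entirely from flash: the read starts at offset 0 inside the partition and is sizeof(struct squashfs_super_block) = 96 bytes long.
In the example above the flash holds the contents of user.squashfs, so sblk receives the first sizeof(struct squashfs_super_block) bytes of user.squashfs, starting at offset 0.
*/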
sblk = squashfs_read_table(sb, SQUASHFS_START, sizeof(*sblk));

if (IS_ERR(sblk)) {
ERROR("unable to read squashfs_super_block\n");
err = PTR_ERR(sblk);
sblk = NULL;
goto failed_mount;
}

err = -EINVAL;
/* magic read from the image (e.g. user.squashfs): 0x73717368 */
sb->s_magic = le32_to_cpu(sblk->s_magic);
if (sb->s_magic != SQUASHFS_MAGIC) {
if (!silent)
ERROR("Can't find a SQUASHFS superblock on %s\n",
bdevname(sb->s_bdev, b));
goto failed_mount;
}

/* sblk->compression = 4 (xz) read from the image (e.g. user.squashfs) selects the decompressor: squashfs_decompressor = squashfs_xz_comp_ops */
msblk->decompressor = supported_squashfs_filesystem(
le16_to_cpu(sblk->s_major),
le16_to_cpu(sblk->s_minor),
le16_to_cpu(sblk->compression));
if (msblk->decompressor == NULL)
goto failed_mount;

/* number of bytes used in the partition, read from the image (e.g. user.squashfs). Example: a 23M partition with 20M used */
msblk->bytes_used = le64_to_cpu(sblk->bytes_used);
if (msblk->bytes_used < 0 || msblk->bytes_used >
i_size_read(sb->s_bdev->bd_inode))
goto failed_mount;

/* logical block size read from the image (e.g. user.squashfs), here 512k; set by the mksquashfs -b option */
msblk->block_size = le32_to_cpu(sblk->block_size);
if (msblk->block_size > SQUASHFS_FILE_MAX_SIZE)
goto failed_mount;

/*
* Check the system page size is not larger than the filesystem
* block size (by default 128K).  This is currently not supported.
*/
if (PAGE_CACHE_SIZE > msblk->block_size) {
ERROR("Page size > filesystem block size (%d).  This is "
"currently not supported!\n", msblk->block_size);
goto failed_mount;
}

/* log2 of the logical block size read from the image (19 for 512k); used to validate block_size */
msblk->block_log = le16_to_cpu(sblk->block_log);
if (msblk->block_log > SQUASHFS_FILE_MAX_LOG)
goto failed_mount;

/* Check that block_size and block_log match */
if (msblk->block_size != (1 << msblk->block_log))
goto failed_mount;

/* Check the root inode for sanity */
root_inode = le64_to_cpu(sblk->root_inode);
if (SQUASHFS_INODE_OFFSET(root_inode) > SQUASHFS_METADATA_SIZE)
goto failed_mount;

/*
  Read the table locations from the image (e.g. user.squashfs). For the user partition:
  sblk->inode_table_start = 0x1497002     -- where this superblock's inode table starts on flash; usage described below.
  sblk->directory_table_start = 0x1497ce2 -- where this superblock's directory table starts on flash; usage described below.
  sblk->fragment_table_start = 0x1498d72;
  sblk->id_table_start = 0x1499036;
  These are offsets inside the partition: 0x1497002 means flash address = partition start + offset, i.e. 0x1a00000 + 0x1497002.
*/
msblk->inode_table = le64_to_cpu(sblk->inode_table_start);
msblk->directory_table = le64_to_cpu(sblk->directory_table_start);
msblk->inodes = le32_to_cpu(sblk->inodes);
flags = le16_to_cpu(sblk->flags);
/* e.g. "Found valid superblock on mtdblock8" */
TRACE("Found valid superblock on %s\n", bdevname(sb->s_bdev, b));
/* e.g. "Inodes are compressed" */
TRACE("Inodes are %scompressed\n", SQUASHFS_UNCOMPRESSED_INODES(flags)? "un" : "");
/* e.g. "Data is compressed" */
TRACE("Data is %scompressed\n", SQUASHFS_UNCOMPRESSED_DATA(flags)? "un" : "");
TRACE("Filesystem size %lld bytes\n", msblk->bytes_used);
TRACE("Block size %d\n", msblk->block_size);
/*inodes 451*/
TRACE("Number of inodes %d\n", msblk->inodes);
/*fragments 21*/
TRACE("Number of fragments %d\n", le32_to_cpu(sblk->fragments));
/* ids 2*/
TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids));
TRACE("sblk->inode_table_start %llx\n", msblk->inode_table);
TRACE("sblk->directory_table_start %llx\n", msblk->directory_table);
TRACE("sblk->fragment_table_start %llx\n",(u64) le64_to_cpu(sblk->fragment_table_start));
TRACE("sblk->id_table_start %llx\n",(u64) le64_to_cpu(sblk->id_table_start));

sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_flags |= MS_RDONLY;
sb->s_op = &squashfs_super_ops;

err = -ENOMEM;
/* create the metadata squashfs_cache: block_cache caches the contents of metadata blocks, which hold inode and directory information */
msblk->block_cache = squashfs_cache_init("metadata",SQUASHFS_CACHED_BLKS, SQUASHFS_METADATA_SIZE);
if (msblk->block_cache == NULL)
goto failed_mount;

/* Allocate read_page block */
msblk->read_page = squashfs_cache_init("data",
squashfs_max_decompressors(), msblk->block_size);
if (msblk->read_page == NULL) {
ERROR("Failed to allocate read_page block\n");
goto failed_mount;
}

msblk->stream = squashfs_decompressor_setup(sb, flags);
if (IS_ERR(msblk->stream)) {
err = PTR_ERR(msblk->stream);
msblk->stream = NULL;
goto failed_mount;
}

/* Handle xattrs */
sb->s_xattr = squashfs_xattr_handlers;
xattr_id_table_start = le64_to_cpu(sblk->xattr_id_table_start);
if (xattr_id_table_start == SQUASHFS_INVALID_BLK) {
next_table = msblk->bytes_used;
goto allocate_id_index_table;
}

/* Allocate and read xattr id lookup table */
msblk->xattr_id_table = squashfs_read_xattr_id_table(sb,
xattr_id_table_start, &msblk->xattr_table, &msblk->xattr_ids);
if (IS_ERR(msblk->xattr_id_table)) {
ERROR("unable to read xattr id index table\n");
err = PTR_ERR(msblk->xattr_id_table);
msblk->xattr_id_table = NULL;
if (err != -ENOTSUPP)
goto failed_mount;
}
next_table = msblk->xattr_table;

allocate_id_index_table:
/* Allocate and read id index table */
msblk->id_table = squashfs_read_id_index_table(sb,
le64_to_cpu(sblk->id_table_start), next_table,
le16_to_cpu(sblk->no_ids));
if (IS_ERR(msblk->id_table)) {
ERROR("unable to read id index table\n");
err = PTR_ERR(msblk->id_table);
msblk->id_table = NULL;
goto failed_mount;
}
next_table = le64_to_cpu(msblk->id_table[0]);

/* Handle inode lookup table */
lookup_table_start = le64_to_cpu(sblk->lookup_table_start);
if (lookup_table_start == SQUASHFS_INVALID_BLK)
goto handle_fragments;

/* Allocate and read inode lookup table */
msblk->inode_lookup_table = squashfs_read_inode_lookup_table(sb,
lookup_table_start, next_table, msblk->inodes);
if (IS_ERR(msblk->inode_lookup_table)) {
ERROR("unable to read inode lookup table\n");
err = PTR_ERR(msblk->inode_lookup_table);
msblk->inode_lookup_table = NULL;
goto failed_mount;
}
next_table = le64_to_cpu(msblk->inode_lookup_table[0]);

sb->s_export_op = &squashfs_export_ops;

handle_fragments:
fragments = le32_to_cpu(sblk->fragments);
if (fragments == 0)
goto check_directory_table;

msblk->fragment_cache = squashfs_cache_init("fragment",
SQUASHFS_CACHED_FRAGMENTS, msblk->block_size);
if (msblk->fragment_cache == NULL) {
err = -ENOMEM;
goto failed_mount;
}

/* Allocate and read fragment index table */
msblk->fragment_index = squashfs_read_fragment_index_table(sb,
le64_to_cpu(sblk->fragment_table_start), next_table, fragments);
if (IS_ERR(msblk->fragment_index)) {
ERROR("unable to read fragment index table\n");
err = PTR_ERR(msblk->fragment_index);
msblk->fragment_index = NULL;
goto failed_mount;
}
next_table = le64_to_cpu(msblk->fragment_index[0]);

check_directory_table:
/* Sanity check directory_table */
if (msblk->directory_table > next_table) {
err = -EINVAL;
goto failed_mount;
}

/* Sanity check inode_table */
if (msblk->inode_table >= msblk->directory_table) {
err = -EINVAL;
goto failed_mount;
}

/* allocate the in-memory root inode */
root = new_inode(sb);
if (!root) {
err = -ENOMEM;
goto failed_mount;
}
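/*
The flash location of this superblock's root inode is fixed by mksquashfs when the image is built and stored in squashfs_super_block;
squashfs_super_block itself sits at offset 0x0 of the flash partition (see above).
root is the inode structure the kernel allocates in DRAM; its key fields are filled in from flash by squashfs_read_inode.
(root_inode >> 16) + inode_table_start is the flash offset of the metadata block that holds the root inode;
the low 16 bits of root_inode are the offset of the squashfs inode inside that (uncompressed) metadata block.
*/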
err = squashfs_read_inode(root, root_inode);
if (err) {
make_bad_inode(root);
iput(root);
goto failed_mount;
}
insert_inode_hash(root);

sb->s_root = d_make_root(root);
if (sb->s_root == NULL) {
ERROR("Root inode create failed\n");
err = -ENOMEM;
goto failed_mount;
}

TRACE("Leaving squashfs_fill_super\n");
kfree(sblk);
return 0;

failed_mount:
squashfs_cache_delete(msblk->block_cache);
squashfs_cache_delete(msblk->fragment_cache);
squashfs_cache_delete(msblk->read_page);
squashfs_decompressor_destroy(msblk);
kfree(msblk->inode_lookup_table);
kfree(msblk->fragment_index);
kfree(msblk->id_table);
kfree(msblk->xattr_id_table);
kfree(sb->s_fs_info);
sb->s_fs_info = NULL;
kfree(sblk);
return err;
}
Creating the metadata squashfs_cache: here entries = 8 and block_size = 8192. The cache is used by squashfs_cache_get.
struct squashfs_cache *squashfs_cache_init(char *name, int entries,int block_size)
{
 int i, j;
 struct squashfs_cache *cache = kzalloc(sizeof(*cache), GFP_KERNEL);
 if (cache == NULL) {
  ERROR("Failed to allocate %s cache\n", name);
  return NULL;
 }
 cache->entry = kcalloc(entries, sizeof(*(cache->entry)), GFP_KERNEL);
 if (cache->entry == NULL) {
  ERROR("Failed to allocate %s cache\n", name);
  goto cleanup;
 }
 cache->curr_blk = 0;
 cache->next_blk = 0;
 cache->unused = entries;//the cache has 8 entries, each with an 8192-byte data area
 cache->entries = entries;
 cache->block_size = block_size; //8192
 cache->pages = block_size >> PAGE_CACHE_SHIFT;
 cache->pages = cache->pages ? cache->pages : 1;
 cache->name = name;
 cache->num_waiters = 0;
 spin_lock_init(&cache->lock);
 init_waitqueue_head(&cache->wait_queue);
 for (i = 0; i < entries; i++) { //entries=8
  struct squashfs_cache_entry *entry = &cache->entry[i];
  init_waitqueue_head(&cache->entry[i].wait_queue);
  entry->cache = cache;
  entry->block = SQUASHFS_INVALID_BLK;
  entry->data = kcalloc(cache->pages, sizeof(void *), GFP_KERNEL);
  if (entry->data == NULL) {
   ERROR("Failed to allocate %s cache entry\n", name);
   goto cleanup;
  }
  for (j = 0; j < cache->pages; j++) {
    /* each metadata entry spans 2 pages (with 4K pages); entry->data holds the data read from flash */
   entry->data[j] = kmalloc(PAGE_CACHE_SIZE, GFP_KERNEL);
   if (entry->data[j] == NULL) {
    ERROR("Failed to allocate %s buffer\n", name);
    goto cleanup;
   }
  }
  entry->actor = squashfs_page_actor_init(entry->data,
      cache->pages, 0);
  if (entry->actor == NULL) {
   ERROR("Failed to allocate %s cache entry\n", name);
   goto cleanup;
  }
 }
 return cache;
cleanup:
 squashfs_cache_delete(cache);
 return NULL;
}
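The sizes in squashfs_cache_init follow from two lines of arithmetic: pages = block_size >> PAGE_CACHE_SHIFT, clamped to at least 1. Below is a sketch of that calculation, assuming 4K pages (page_shift = 12). Both helper names are hypothetical.

```c
#include <assert.h>

/* Pages per cache entry, computed as in squashfs_cache_init():
 * block_size >> page_shift, but never less than one page. */
static int cache_pages_per_entry(int block_size, int page_shift)
{
    int pages;

    assert(block_size > 0 && page_shift > 0);
    pages = block_size >> page_shift;
    return pages ? pages : 1;
}

/* Bytes of entry buffers a cache allocates: entries * pages * page size. */
static long cache_buffer_bytes(int entries, int block_size, int page_shift)
{
    return (long)entries * cache_pages_per_entry(block_size, page_shift)
           << page_shift;
}
```

For the metadata cache (8 entries of 8192 bytes) this gives 2 pages per entry and 64 KiB of buffers. A "data" cache entry for a 512K block needs 128 pages.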

2.1 Reading the superblock of the flash partition, i.e. the information behind super_block->s_fs_info, which is stored in squashfs_super_block.

squashfs_fill_super->squashfs_read_table()

/*
block: an offset inside the flash partition, not a block number. The partition
start address still has to be added to it to obtain the real flash address.
*/
void *squashfs_read_table(struct super_block *sb, u64 block, int length)
{
	int pages = (length + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
	int i, res;
	void *table, *buffer, **data;
	struct squashfs_page_actor *actor;

	table = buffer = kmalloc(length, GFP_KERNEL);
	if (table == NULL)
		return ERR_PTR(-ENOMEM);

	data = kcalloc(pages, sizeof(void *), GFP_KERNEL);
	if (data == NULL) {
		res = -ENOMEM;
		goto failed;
	}

	actor = squashfs_page_actor_init(data, pages, length);
	if (actor == NULL) {
		res = -ENOMEM;
		goto failed2;
	}

	for (i = 0; i < pages; i++, buffer += PAGE_CACHE_SIZE)
		data[i] = buffer;

	/* length != 0 selects the data-block path; SQUASHFS_COMPRESSED_BIT_BLOCK marks the table as stored uncompressed */
	res = squashfs_read_data(sb, block, length |
		SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, actor);

	kfree(data);
	kfree(actor);
	if (res < 0)
		goto failed;

	return table;

failed2:
	kfree(data);
failed:
	kfree(table);
	return ERR_PTR(res);
}
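squashfs_read_table returns the raw 96 bytes, and squashfs_fill_super then picks fields out of them with le32_to_cpu/le16_to_cpu/le64_to_cpu. Below is a user-space sketch of the same decoding. It assumes the squashfs 4.0 on-disk layout of struct squashfs_super_block, with the field offsets written out by hand. sb_fields and parse_sb are hypothetical names, and only a few fields are decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SQUASHFS_MAGIC 0x73717368u
#define SKETCH_SB_SIZE 96 /* sizeof(struct squashfs_super_block) */

/* Little-endian readers, standing in for le16/32/64_to_cpu on raw bytes. */
static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p) { return (uint32_t)get_le16(p) | ((uint32_t)get_le16(p + 2) << 16); }
static uint64_t get_le64(const uint8_t *p) { return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32); }

/* Hypothetical subset of the superblock fields used in squashfs_fill_super(). */
struct sb_fields {
    uint32_t magic, inodes, block_size;
    uint16_t compression, block_log;
    uint64_t root_inode, bytes_used, inode_table_start, directory_table_start;
};

/* Decode the first 96 bytes of an image; returns -1 if too short or bad magic. */
static int parse_sb(const uint8_t *buf, size_t len, struct sb_fields *out)
{
    assert(buf != NULL && out != NULL);
    if (len < SKETCH_SB_SIZE)
        return -1;
    out->magic = get_le32(buf + 0);
    if (out->magic != SKETCH_SQUASHFS_MAGIC)
        return -1;
    out->inodes = get_le32(buf + 4);
    out->block_size = get_le32(buf + 12);
    out->compression = get_le16(buf + 20);
    out->block_log = get_le16(buf + 22);
    out->root_inode = get_le64(buf + 32);
    out->bytes_used = get_le64(buf + 40);
    out->inode_table_start = get_le64(buf + 64);
    out->directory_table_start = get_le64(buf + 72);
    return 0;
}
```

Fed the first 96 bytes of an image such as user.squashfs, this would report the same magic, compression id (4 for xz) and table offsets that the TRACE lines print during mount.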
【Body】How squashfs actually reads flash through the mtdblock block device

The read consists of the following steps:

1> Get a buffer_head: squashfs_read_data->sb_getblk();

  Submit a read request: squashfs_read_data->ll_rw_block->submit_bh->submit_bio->blk_queue_bio();

  buffer_head->b_data receives the data read from flash;

2> Process the read request; this is where the driver actually performs the read:
mtd_blktrans_work->do_blktrans_request->mtdblock_tr->mtdblock_readsect->do_cached_read->(mtd_read->mtd->_read=part_read)
->nand_read()->nand_do_read_ops()->(chip->cmdfunc);

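1 Reading flash data: squashfs_read_data

squashfs_read_data is the main interface through which the filesystem layer accesses flash. It tells the two kinds of access apart by its length argument:

1> length != 0: read a data block. index is the offset inside the flash partition to read, e.g. 0x1497ce2.

2> length = 0: read a metadata block. index is again an offset inside the flash partition, e.g. 0x1497ce2.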

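How data blocks and metadata blocks compare:

Similarity: both are read by accessing flash addresses directly through the mtdblock block device.

Differences:

1> A data block can be any flash address. In principle a metadata block could also be read as a data block. Metadata blocks have properties of their own, however: a 2-byte length header, their own compression flag, and caching in block_cache. For that reason the kernel provides a dedicated interface, squashfs_read_metadata, to read them. squashfs_read_metadata still ends up calling squashfs_read_data, with length = 0, to fetch the metadata block; see below.

2> A metadata block is also a flash region. It holds inode information and similar data and is maintained by the filesystem. Since squashfs is read-only, its contents are generated only when the image is built with mksquashfs.

3> Examples:

Typical data block use, when mounting: squashfs_fill_super->squashfs_read_table->squashfs_read_data. This reads squashfs_super_block, which includes the start address of the inode table, from offset 0x0 of the partition. All later metadata block lookups depend on it.

Typical metadata block use, when opening a file: do_sys_open->do_last->lookup_real->squashfs_lookup->squashfs_read_metadata. Reading the metadata block yields the inode of the file, and the inode is then used to read the file contents. More on this below.

In summary, accessing flash directly through the mtdblock block device means reading data blocks, and accessing flash indirectly through files means reading metadata blocks. Reading a metadata block itself goes through the same underlying squashfs_read_data block read.

This chapter focuses on reading data blocks; metadata reads are covered later.

Note: index is an offset inside the flash partition, not a block number. The partition start address must be added to it to get the real flash address.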

After the data is read from flash it still has to be decompressed; see the separate post on how xz-compressed files are decompressed.

int squashfs_read_data(struct super_block *sb, u64 index, int length,u64 *next_index, struct squashfs_page_actor *output)
{
/* set in squashfs_fill_super above */
struct squashfs_sb_info *msblk = sb->s_fs_info;
struct buffer_head **bh;
/* index is the in-partition flash offset to access; devblksize = 1024 bytes */
int offset = index & ((1 << msblk->devblksize_log2) - 1);
/* 
msblk->devblksize_log2 = 10; cur_index is the logical block that contains the in-partition offset index
*/
u64 cur_index = index >> msblk->devblksize_log2;
int bytes, compressed, b = 0, k = 0, avail, i;

bh = kcalloc(((output->length + msblk->devblksize - 1)
     >> msblk->devblksize_log2) + 1, sizeof(*bh), GFP_KERNEL);
if (bh == NULL)
return -ENOMEM;

if (length) {
/*
* Datablock: read the contents of a data block.
*/
bytes = -offset;
compressed = SQUASHFS_COMPRESSED_BLOCK(length);
length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length);
if (next_index)
*next_index = index + length;

TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
index, compressed ? "" : "un", length, output->length);

if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
goto read_failure;
/* 
  Loop over logical blocks of devblksize = 1024 bytes each, starting with the block that contains the in-partition offset.
*/
for (b = 0; bytes < length; b++, cur_index++) {
/* get the buffer_head */
bh[b] = sb_getblk(sb, cur_index);
if (bh[b] == NULL)
goto block_release;
/*devblksize=1024*/
bytes += msblk->devblksize;
}
/* 
1 submit the read request: ll_rw_block->submit_bh->submit_bio, later handled by do_blktrans_request();
2 process the request, where the driver performs the actual read:
mtd_blktrans_work->do_blktrans_request->mtdblock_tr->mtdblock_readsect->do_cached_read->(mtd_read->mtd->_read=part_read)
->nand_read()->nand_do_read_ops()->(chip->cmdfunc)
*/
ll_rw_block(READ, b, bh);
} else {
/*
* Metadata block: the length comes from the 2-byte header at index.
*/
if ((index + 2) > msblk->bytes_used)
goto read_failure;

bh[0] = get_block_length(sb, &cur_index, &offset, &length);
if (bh[0] == NULL)
goto read_failure;
b = 1;

bytes = msblk->devblksize - offset;
compressed = SQUASHFS_COMPRESSED(length);
length = SQUASHFS_COMPRESSED_SIZE(length);
if (next_index)
*next_index = index + length + 2;

TRACE("Block @ 0x%llx, %scompressed size %d\n", index,compressed ? "" : "un", length);

if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
goto block_release;

for (; bytes < length; b++) {
bh[b] = sb_getblk(sb, ++cur_index);
if (bh[b] == NULL)
goto block_release;
bytes += msblk->devblksize;
}
ll_rw_block(READ, b - 1, bh + 1);
}

for (i = 0; i < b; i++) {
wait_on_buffer(bh[i]);
if (!buffer_uptodate(bh[i]))
goto block_release;
}

if (compressed) {
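/* 
Decompress the data read from flash.
msblk: superblock information, squashfs_sb_info;
bh: the buffer_heads; bh->b_data holds the data read from flash;
b: number of logical blocks covering the data read;
offset: offset of the start address inside the first logical block; a logical block is 1024 bytes, so offset = index & 0x3ff;
length: length of the data read from flash
*/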
length = squashfs_decompress(msblk, bh, b, offset, length,output);
if (length < 0)
goto read_failure;
} else {
/*
* Block is uncompressed.
*/
int in, pg_offset = 0;
void *data = squashfs_first_page(output);

for (bytes = length; k < b; k++) {
in = min(bytes, msblk->devblksize - offset);
bytes -= in;
while (in) {
if (pg_offset == PAGE_CACHE_SIZE) {
data = squashfs_next_page(output);
pg_offset = 0;
}
avail = min_t(int, in, PAGE_CACHE_SIZE -
pg_offset);
memcpy(data + pg_offset, bh[k]->b_data + offset,
avail);
in -= avail;
pg_offset += avail;
offset += avail;
}
offset = 0;
put_bh(bh[k]);
}
squashfs_finish_page(output);
}

kfree(bh);
return length;

block_release:
for (; k < b; k++)
put_bh(bh[k]);

read_failure:
ERROR("squashfs_read_data failed to read block 0x%llx\n",
(unsigned long long) index);
kfree(bh);
return -EIO;
}
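The first half of squashfs_read_data is pure arithmetic on index. Below is a sketch of the data-block branch that shows which 1 KiB blocks get a buffer_head. span_for_read and struct bh_span are hypothetical names. For index 0x1497ce2 and an 8 KiB read, the first block is 0x525f and the start lies 0xe2 bytes into it. Nine buffer_heads are needed because the read crosses into one extra block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of which device blocks a read touches. */
struct bh_span {
    uint64_t first_blk; /* cur_index of the first buffer_head */
    int offset;         /* byte offset inside that first block */
    int nblocks;        /* buffer_heads needed (value of b after the loop) */
};

/* Same steps as the data-block branch of squashfs_read_data():
 * offset = index & (devblksize - 1), cur_index = index >> log2,
 * then add devblksize-sized blocks until length bytes are covered. */
static struct bh_span span_for_read(uint64_t index, int length, int log2)
{
    int devblksize;
    int bytes;
    struct bh_span s;

    assert(length > 0 && log2 > 0 && log2 < 31);
    devblksize = 1 << log2;
    s.offset = (int)(index & (uint64_t)(devblksize - 1));
    s.first_blk = index >> log2;
    bytes = -s.offset;
    for (s.nblocks = 0; bytes < length; s.nblocks++)
        bytes += devblksize;
    return s;
}
```

The superblock read (index 0, length 96) needs a single buffer_head.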
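1.1 Getting a buffer_head: looking it up

(1) Getting a buffer_head is a two-step process:
- First the kernel looks for an existing buffer_head for block (the logical block that contains the in-partition offset). This happens in __find_get_block.
- If none is found, a new one is created in __getblk_slow.

This section describes the lookup first. The lookup can only find what the creation path has set up, so you may prefer to read the creation path (1.2) first.

(2) Why does accessing flash need a buffer_head at all?

Reason: taking a read as an example, every offset inside the flash partition belongs to exactly one logical block, block. The buffer_head is the kernel's in-memory handle for that block and for the DRAM buffer its data is read into.

When a buffer_head is created:

grow_dev_page() obtains the page with find_or_create_page(), and alloc_page_buffers() allocates the buffer_head structures for that page;

init_page_buffers() sets buffer_head->b_blocknr to block, the logical block that contains the in-partition offset being read;

set_bh_page() sets buffer_head->b_data to a slice of that page (page_address(page) + offset). bh->b_data receives the data read from flash.

link_dev_buffers()->attach_page_buffers() sets page->private to the buffer_head;

(3) squashfs_read_data()->sb_getblk():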

static inline struct buffer_head *sb_getblk(struct super_block *sb, sector_t block)
{
 /* block is the logical block that contains the in-partition offset (block = in-partition offset / (devblksize = 1024)).
    A logical block is 1024 bytes; sb->s_blocksize is 1024 bytes and is assigned to bh->b_size.
 */
 return __getblk_gfp(sb->s_bdev, block, sb->s_blocksize, __GFP_MOVABLE);
}

squashfs_read_data()->sb_getblk()->__getblk_gfp(): 

struct buffer_head *__getblk_gfp(struct block_device *bdev, sector_t block,unsigned size, gfp_t gfp)
{
/* block = in-partition offset / (devblksize = 1024) */
struct buffer_head *bh = __find_get_block(bdev, block, size);
might_sleep();
/* if no buffer_head was found (see __find_get_block and __find_get_block_slow for why), create one */
if (bh == NULL)
bh = __getblk_slow(bdev, block, size, gfp);
return bh;
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__find_get_block():
struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
{
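/* 
If the logical block has never been accessed before, bh is NULL here; go straight to __find_get_block_slow.
If it has been accessed, lookup_bh_lru may find it.
It is easiest to analyse __find_get_block_slow (the not-found case) first and then come back to lookup_bh_lru.
*/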
struct buffer_head *bh = lookup_bh_lru(bdev, block, size);

if (bh == NULL) {
/* __find_get_block_slow will mark the page accessed */
bh = __find_get_block_slow(bdev, block);
if (bh)
bh_lru_install(bh);
} else
touch_buffer(bh);

return bh;
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__find_get_block->__find_get_block_slow()
static struct buffer_head *__find_get_block_slow(struct block_device *bdev, sector_t block)
{
struct inode *bd_inode = bdev->bd_inode;
struct address_space *bd_mapping = bd_inode->i_mapping;
struct buffer_head *ret = NULL;
pgoff_t index;
struct buffer_head *bh;
struct buffer_head *head;
struct page *page;
int all_mapped = 1;

index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);
/* look up the page in the block device's page cache; this does not allocate a new page */
page = find_get_page_flags(bd_mapping, index, FGP_ACCESSED);
if (!page)
goto out;

spin_lock(&bd_mapping->private_lock);
/* 
If page->private is empty, the page has no buffer_heads and there is nothing to find: goto out_unlock.
page->private holds the head of the page's buffer_head list.
The first time a block (in-partition offset / (devblksize = 1024)) is read, if no buffer_head covers it, one is created for it
and its address is stored in page->private; see __getblk_slow(), the other branch of
squashfs_read_data()->sb_getblk()->__getblk_gfp().
*/
if (!page_has_buffers(page))
goto out_unlock;
head = page_buffers(page);
bh = head;
/*
page->private holds the head of the page's buffer_head list;
bh->b_blocknr == block tells whether this buffer_head belongs to the block we want to read.
The buffer_heads on this page were created and numbered by __getblk_slow() (see 1.2).
*/
do {
if (!buffer_mapped(bh))
all_mapped = 0;
else if (bh->b_blocknr == block) {
ret = bh;
get_bh(bh);
goto out_unlock;
}
bh = bh->b_this_page;
} while (bh != head);
/* we might be here because some of the buffers on this page are
* not mapped.  This is due to various races between
* file io on the block device and getblk.  It gets dealt with
* elsewhere, don't buffer_error if we had some unmapped buffers
*/
if (all_mapped) {
char b[BDEVNAME_SIZE];

printk("__find_get_block_slow() failed. ""block=%llu, b_blocknr=%llu\n",(unsigned long long)block,(unsigned long long)bh->b_blocknr);
printk("b_state=0x%08lx, b_size=%zu\n",bh->b_state, bh->b_size);
printk("device %s blocksize: %d\n", bdevname(bdev, b),1 << bd_inode->i_blkbits);
}
out_unlock:
spin_unlock(&bd_mapping->private_lock);
page_cache_release(page);
out:
return ret;
}
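__find_get_block_slow locates the page with index = block >> (PAGE_CACHE_SHIFT - i_blkbits). Below is a sketch of that mapping, assuming 4K pages and i_blkbits = 10, i.e. the 1 KiB blocks set up by sb_min_blocksize. Each page of the block device's page cache then holds 4 buffer_heads. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Page-cache index of the page holding a given device block,
 * as computed in __find_get_block_slow(). */
static uint64_t blk_to_page_index(uint64_t block, int page_shift, int blkbits)
{
    assert(page_shift >= blkbits);
    return block >> (page_shift - blkbits);
}

/* Position of that block's buffer_head within the page
 * (0 = the buffer at page offset 0). */
static unsigned blk_slot_in_page(uint64_t block, int page_shift, int blkbits)
{
    assert(page_shift >= blkbits);
    return (unsigned)(block & ((1u << (page_shift - blkbits)) - 1));
}
```

Block 0x525f (from the 0x1497ce2 example) therefore lives in page 0x1497, as the fourth buffer on that page.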
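1.2 Getting a buffer_head: creating it

If __find_get_block does not find a buffer_head for block (the logical block that contains the in-partition offset), one is created.

When a buffer_head is created:

grow_dev_page() obtains the page with find_or_create_page(), and alloc_page_buffers() allocates the buffer_head structures for that page;

init_page_buffers() sets buffer_head->b_blocknr to block, the logical block that contains the in-partition offset being read;

set_bh_page() sets buffer_head->b_data to a slice of that page (page_address(page) + offset). bh->b_data receives the data read from flash.

link_dev_buffers()->attach_page_buffers() sets page->private to the buffer_head;

Analysis: squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow():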

struct buffer_head *__getblk_slow(struct block_device *bdev, sector_t block,unsigned size, gfp_t gfp)
{
 /* Size must be multiple of hard sectorsize */
 if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
   (size < 512 || size > PAGE_SIZE))) {
  printk(KERN_ERR "getblk(): invalid block size %d requested\n",
     size);
  printk(KERN_ERR "logical block size: %d\n",
     bdev_logical_block_size(bdev));
  dump_stack();
  return NULL;
 }
 for (;;) {
  struct buffer_head *bh;
  int ret;
  bh = __find_get_block(bdev, block, size);
  if (bh)
   return bh;
  /* create the buffer_heads */
  ret = grow_buffers(bdev, block, size, gfp);
  if (ret < 0)
   return NULL;
  if (ret == 0)
   free_more_memory();
 }
} 

Gets the page and allocates its buffer_heads, and sets up page->private, bh->b_blocknr, bh->b_data and related fields.

squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page():

static int grow_dev_page(struct block_device *bdev, sector_t block,pgoff_t index, int size, int sizebits, gfp_t gfp)
{
 struct inode *inode = bdev->bd_inode;
 struct page *page;
 struct buffer_head *bh;
 sector_t end_block;
 int ret = 0;  /* Will call free_more_memory() */
 gfp_t gfp_mask;
 gfp_mask = (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS) | gfp;
 /*
  * XXX: __getblk_slow() can not really deal with failure and
  * will endlessly loop on improvised global reclaim.  Prefer
  * looping in the allocator rather than here, at least that
  * code knows what it's doing.
  */
 gfp_mask |= __GFP_NOFAIL;
 /* find or create the page, with page->mapping = inode->i_mapping; this page becomes bh->b_page (set in set_bh_page) */
 page = find_or_create_page(inode->i_mapping, index, gfp_mask);
 if (!page)
  return ret;
 BUG_ON(!PageLocked(page));
 /* page_has_buffers() is true when page->private already holds buffer_heads */
 if (page_has_buffers(page)) {
  bh = page_buffers(page);
  if (bh->b_size == size) {
   end_block = init_page_buffers(page, bdev,
      (sector_t)index << sizebits,
      size);
   goto done;
  }
  if (!try_to_free_buffers(page))
   goto failed;
 }
 /*
  * Reaching here means page->private holds no buffer_heads for block, so allocate them; block is the logical block that contains the in-partition flash offset being accessed
  */
 bh = alloc_page_buffers(page, size, 0);
 if (!bh)
  goto failed;
 /*
  * Link the page to the buffers and initialise them.  Take the
  * lock to be atomic wrt __find_get_block(), which does not
  * run under the page lock.
  */
 spin_lock(&inode->i_mapping->private_lock);
 link_dev_buffers(page, bh);
 end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
   size);
 spin_unlock(&inode->i_mapping->private_lock);
done:
 ret = (block < end_block) ? 1 : -ENXIO;
failed:
 unlock_page(page);
 page_cache_release(page);
 return ret;
}

squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page()->alloc_page_buffers():

struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,int retry)
{
 struct buffer_head *bh, *head;
 long offset;
try_again:
 head = NULL;
 offset = PAGE_SIZE;
 while ((offset -= size) >= 0) {
   /* create a new buffer_head; b_blocknr and b_bdev are filled in later by init_page_buffers */
  bh = alloc_buffer_head(GFP_NOFS);
  if (!bh)
   goto no_grow;
  bh->b_this_page = head;
  bh->b_blocknr = -1;
  head = bh;
  /*bh->b_size=sb->s_blocksize=1024*/
  bh->b_size = size;
  /* Link the buffer to its page */
  /* the page this bh belongs to; data read from flash is stored in this page */
  set_bh_page(bh, page, offset);
 }
 return head;
/*
 * In case anything failed, we just free everything we got.
*/
no_grow:
 if (head) {
  do {
   bh = head;
   head = head->b_this_page;
   free_buffer_head(bh);
  } while (head);
 }
 /*
  * Return failure for non-async IO requests.  Async IO requests
  * are not allowed to fail, so we have to wait until buffer heads
  * become available.  But we don't want tasks sleeping with 
  * partially complete buffers, so all were released above.
  */
 if (!retry)
  return NULL;
 /* We're _really_ low on memory. Now we just
  * wait for old buffer heads to become free due to
  * finishing IO.  Since this is an async request and
  * the reserve list is empty, we're sure there are 
  * async buffer heads in use.
  */
 free_more_memory();
 goto try_again;
}
alloc_page_buffers()->set_bh_page(): sets bh->b_data;
void set_bh_page(struct buffer_head *bh,struct page *page, unsigned long offset)
{
 bh->b_page = page;
 BUG_ON(offset >= PAGE_SIZE);
 /* bh->b_data receives the data read from flash */
 if (PageHighMem(page))
  /*
   * This catches illegal uses and preserves the offset:
   */
  bh->b_data = (char *)(0 + offset);
 else
  bh->b_data = page_address(page) + offset;
}

squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page()->link_dev_buffers():

static inline void link_dev_buffers(struct page *page, struct buffer_head *head)
{
 struct buffer_head *bh, *tail;
 bh = head;
 do {
  tail = bh;
  bh = bh->b_this_page;
 } while (bh);
 tail->b_this_page = head;
 attach_page_buffers(page, head);
}
link_dev_buffers()->attach_page_buffers(): sets page->private
static inline void attach_page_buffers(struct page *page,struct buffer_head *head)
{
 page_cache_get(page);
 SetPagePrivate(page);
 /* set page->private; the lookup path checks it with page_has_buffers(page) */
 set_page_private(page, (unsigned long)head);
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page()->init_page_buffers():
static sector_t init_page_buffers(struct page *page, struct block_device *bdev,sector_t block, int size)
{
 struct buffer_head *head = page_buffers(page);
 struct buffer_head *bh = head;
 int uptodate = PageUptodate(page);
 sector_t end_block = blkdev_max_block(I_BDEV(bdev->bd_inode), size);
 do {
  if (!buffer_mapped(bh)) {
   init_buffer(bh, NULL, NULL);
   bh->b_bdev = bdev;
    /* the lookup matches on b_blocknr: the logical block (1024 bytes) that contains the in-partition flash offset */
   bh->b_blocknr = block;
   if (uptodate)
    set_buffer_uptodate(bh);
   if (block < end_block)
    set_buffer_mapped(bh);
  }
  block++;
  bh = bh->b_this_page;
 } while (bh != head);
 /*
  * Caller needs to validate requested block against end of device.
  */
 return end_block;
}
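Taken together, alloc_page_buffers, link_dev_buffers and init_page_buffers build a circular list of buffer_heads for each page. __find_get_block_slow then walks that ring looking for a matching b_blocknr. The toy model below reproduces that structure in user space. struct toy_bh and both functions are hypothetical. The model uses a caller-supplied array instead of alloc_buffer_head, and it skips locking, state bits and the end_block check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct buffer_head: only the fields used below. */
struct toy_bh {
    struct toy_bh *b_this_page; /* next buffer on the same page (ring) */
    uint64_t b_blocknr;         /* device block this buffer caches */
    size_t b_offset;            /* where b_data sits inside the page */
};

#define TOY_PAGE_SIZE 4096
#define TOY_BLK_SIZE  1024
#define TOY_PER_PAGE  (TOY_PAGE_SIZE / TOY_BLK_SIZE)

/* Build the per-page ring the way the three kernel helpers do. */
static struct toy_bh *toy_grow_page(struct toy_bh bhs[TOY_PER_PAGE], uint64_t page_index)
{
    struct toy_bh *head = NULL, *tail, *bh;
    long offset = TOY_PAGE_SIZE;
    uint64_t blk = page_index * TOY_PER_PAGE; /* index << sizebits */
    int i = 0;

    assert(bhs != NULL);
    /* alloc_page_buffers(): walk offsets downwards, pushing onto the list front */
    while ((offset -= TOY_BLK_SIZE) >= 0) {
        bh = &bhs[i++];
        bh->b_this_page = head;
        bh->b_offset = (size_t)offset;
        head = bh;
    }
    /* link_dev_buffers(): close the NULL-terminated list into a ring */
    for (tail = head; tail->b_this_page != NULL; tail = tail->b_this_page)
        ;
    tail->b_this_page = head;
    /* init_page_buffers(): consecutive block numbers around the ring */
    bh = head;
    do {
        bh->b_blocknr = blk++;
        bh = bh->b_this_page;
    } while (bh != head);
    return head;
}

/* __find_get_block_slow()-style walk: match on b_blocknr. */
static struct toy_bh *toy_find(struct toy_bh *head, uint64_t block)
{
    struct toy_bh *bh = head;

    assert(head != NULL);
    do {
        if (bh->b_blocknr == block)
            return bh;
        bh = bh->b_this_page;
    } while (bh != head);
    return NULL;
}
```

For page 0x23 the ring holds blocks 0x8c-0x8f. Block 0x8e sits at page offset 2048, and a block outside that range is not found.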

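1.3 Submitting the request

squashfs_read_data()->ll_rw_block()->submit_bh->_submit_bh():

Allocates a bio and fills it in from the buffer_head created above. ll_rw_block submits one bio per buffer_head, i.e. per 1024-byte block. The mtdblock layer later handles each request as 512-byte (0x200) sectors.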

int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
{
 struct bio *bio;
 int ret = 0;
 BUG_ON(!buffer_locked(bh));
 BUG_ON(!buffer_mapped(bh));
 BUG_ON(!bh->b_end_io);
 BUG_ON(buffer_delay(bh));
 BUG_ON(buffer_unwritten(bh));
 /*
  * Only clear out a write error when rewriting
  */
 if (test_set_buffer_req(bh) && (rw & WRITE))
  clear_buffer_write_io_error(bh);
 /*
  * from here on down, it's all bio -- do the initial mapping,
  * submit_bio -> generic_make_request may further map this bio around
  */
 bio = bio_alloc(GFP_NOIO, 1);
 /* b_blocknr is the logical block that contains the in-partition flash offset to read; bh->b_size = 1024 */
 bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 bio->bi_bdev = bh->b_bdev;
 /* as shown in the creation path above, page->private holds the buffer_heads; bh->b_size = sb->s_blocksize = 1024; bh->b_data lies inside the page bh->b_page */
 bio->bi_io_vec[0].bv_page = bh->b_page;
 bio->bi_io_vec[0].bv_len = bh->b_size;
 bio->bi_io_vec[0].bv_offset = bh_offset(bh);
 bio->bi_vcnt = 1;
 bio->bi_iter.bi_size = bh->b_size;
 bio->bi_end_io = end_bio_bh_io_sync;
 /* lets the completion path find bh, and through it the DRAM buffer that receives the flash data */
 bio->bi_private = bh;
 bio->bi_flags |= bio_flags;
 /* Take care of bh's that straddle the end of the device */
 guard_bio_eod(rw, bio);
 if (buffer_meta(bh))
  rw |= REQ_META;
 if (buffer_prio(bh))
  rw |= REQ_PRIO;
 bio_get(bio);
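 /*
 Example: read address 0x23856 in a partition that starts at flash address 0x300000.
      do_blktrans_request handles the request submitted here:
      the in-partition logical block is bh->b_blocknr = 0x23856 >> 10 = 0x8e, so bi_sector = 0x8e << 1 = 0x11c;
      do_cached_read then reads from partition offset bi_sector << 9 = 0x23800;
      and part_read reads flash address 0x300000 + 0x23800.

 */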
 submit_bio(rw, bio);
 if (bio_flagged(bio, BIO_EOPNOTSUPP))
  ret = -EOPNOTSUPP;
 bio_put(bio);
 return ret;
}
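The address example in the comment above can be checked step by step. The sketch below follows one 1 KiB buffer_head through sb_getblk, _submit_bh, mtdblock_readsect and part_read. It assumes b_size = 1024 and the example partition start 0x300000. struct addr_trace and trace_read are hypothetical names; only the arithmetic mirrors the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* One buffer_head's address as it passes through each layer. */
struct addr_trace {
    uint64_t b_blocknr;  /* sb_getblk(): in-partition offset / b_size */
    uint64_t bi_sector;  /* _submit_bh(): b_blocknr * (b_size >> 9) */
    uint64_t part_pos;   /* mtdblock_readsect(): sector << 9, partition-relative */
    uint64_t flash_addr; /* part_read(): partition start + part_pos */
};

static struct addr_trace trace_read(uint64_t part_start, uint64_t off, uint32_t b_size)
{
    struct addr_trace t;

    /* b_size must be a power of two of at least one 512-byte sector */
    assert(b_size >= 512 && (b_size & (b_size - 1)) == 0);
    t.b_blocknr = off / b_size;
    t.bi_sector = t.b_blocknr * (b_size >> 9);
    t.part_pos = t.bi_sector << 9;
    t.flash_addr = part_start + t.part_pos;
    return t;
}
```

The byte originally asked for (0x23856) ends up 0x56 bytes into bh->b_data, because the whole 1 KiB block starting at 0x23800 is read.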
1.4 do_blktrans_request handles the requests submitted by _submit_bh->submit_bio.
static struct mtd_blktrans_ops mtdblock_tr = {
	.name		= "mtdblock",
	.major		= MTD_BLOCK_MAJOR,
	.part_bits	= 0,
	.blksize 	= 512,
	.open		= mtdblock_open,
	.flush		= mtdblock_flush,
	.release	= mtdblock_release,
	.readsect	= mtdblock_readsect,
	.writesect	= mtdblock_writesect,
	.add_mtd	= mtdblock_add_mtd,
	.remove_dev	= mtdblock_remove_dev,
	.owner		= THIS_MODULE,
};
static int do_blktrans_request(struct mtd_blktrans_ops *tr,struct mtd_blktrans_dev *dev,struct request *req)
{
	unsigned long block, nsect;
	char *buf;
	/* for mtdblock_tr: tr->blkshift = 9, tr->blksize = 512 */
	block = blk_rq_pos(req) << 9 >> tr->blkshift;
	nsect = blk_rq_cur_bytes(req) >> tr->blkshift;
	buf = bio_data(req->bio);

	if (req->cmd_type != REQ_TYPE_FS)
		return -EIO;

	if (req->cmd_flags & REQ_FLUSH)
		return tr->flush(dev);

	if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
	    get_capacity(req->rq_disk))
		return -EIO;

	if (req->cmd_flags & REQ_DISCARD)
		return tr->discard(dev, block, nsect);

	switch(rq_data_dir(req)) {
	case READ:
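	   /* 
	   block is the bi_sector from _submit_bh, bi_sector = bh->b_blocknr * (bh->b_size >> 9),
	   i.e. a 512-byte sector number inside the partition; mtdblock_readsect turns it into a byte offset with block << 9.
	   Example: read address 0x23856 in a partition that starts at 0x300000.
	   The in-partition logical block is bh->b_blocknr = 0x23856 / 1024 = 0x8e, so bi_sector = 0x11c;
	   do_cached_read reads from partition offset bi_sector << 9 = 0x23800,
	   and part_read reads flash address 0x300000 + 0x23800.
	   */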
		for (; nsect > 0; nsect--, block++, buf += tr->blksize)
			if (tr->readsect(dev, block, buf))
				return -EIO;
		rq_flush_dcache_pages(req);
		return 0;
	case WRITE:
		if (!tr->writesect)
			return -EIO;

		rq_flush_dcache_pages(req);
		for (; nsect > 0; nsect--, block++, buf += tr->blksize)
			if (tr->writesect(dev, block, buf))
				return -EIO;
		return 0;
	default:
		printk(KERN_NOTICE "Unknown request %u\n", rq_data_dir(req));
		return -EIO;
	}
}
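For mtdblock (blkshift = 9), the READ branch above walks the request one 512-byte sector at a time. Below is a minimal model of that loop. The name `split_request` is hypothetical. It records the partition-relative byte offsets that mtdblock_readsect passes on to do_cached_read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLKSHIFT 9 /* mtdblock_tr: blksize = 512 */

/* Hypothetical model of the READ branch of do_blktrans_request for mtdblock.
 * Fills offsets[] with the byte offsets (block << 9) that mtdblock_readsect
 * hands to do_cached_read; returns the number of readsect calls, or -1 if
 * offsets[] is too small. */
static int split_request(uint64_t rq_pos, uint32_t rq_bytes,
                         uint64_t *offsets, size_t max)
{
    uint64_t block = rq_pos << 9 >> BLKSHIFT; /* same expression as the kernel */
    uint32_t nsect = rq_bytes >> BLKSHIFT;
    size_t n = 0;

    for (; nsect > 0; nsect--, block++) {
        if (n == max)
            return -1;
        offsets[n++] = block << 9; /* mtdblock_readsect: block << 9 */
    }
    return (int)n;
}
```

A 1 KB buffer_head at sector 0x11c therefore becomes two readsect calls, at partition offsets 0x23800 and 0x23a00.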

2 How mtdblock reads flash

Take ubifs accessing MTD as an example: when ubifs opens a file, it reads the inode information from flash:

do_filp_open->do_last->lookup_real->ubifs_lookup->ubifs_tnc_lookup_nm->ubifs_tnc_locate

->ubifs_lookup_level0->ubifs_load_znode->ubifs_read_node->ubifs_io_read->mtd_read->part_read->nand_read

ubifs_dump_node: dumps the inode (node) information that was read;

2.1 do_blktrans_request()->mtdblock_readsect():

static int mtdblock_readsect(struct mtd_blktrans_dev *dev,unsigned long block, char *buf)
{
	struct mtdblk_dev *mtdblk = container_of(dev, struct mtdblk_dev, mbd);
     /* block<<9 is the byte offset inside the partition; note that address 0 here is the start of the partition, not hardware flash address 0 */
	return do_cached_read(mtdblk, block<<9, 512, buf);
}
2.2 do_blktrans_request()->mtdblock_readsect()->do_cached_read():
static int do_cached_read (struct mtdblk_dev *mtdblk, unsigned long pos,int len, char *buf)
{
	struct mtd_info *mtd = mtdblk->mbd.mtd;
     /* mtdblk->cache_size is the erase-block size, 0x20000 on this board */
	unsigned int sect_size = mtdblk->cache_size;
	size_t retlen;
	int ret;

	pr_debug("mtdblock: read on \"%s\" at 0x%lx, size 0x%x\n",
			mtd->name, pos, len);

	if (!sect_size)
		return mtd_read(mtd, pos, len, &retlen, buf);

	while (len > 0) {
		unsigned long sect_start = (pos/sect_size)*sect_size;
		unsigned int offset = pos - sect_start;
		unsigned int size = sect_size - offset;
		if (size > len)
			size = len;

		/*
		 * Check if the requested data is already cached
		 * Read the requested amount of data from our internal cache if it
		 * contains what we want, otherwise we read the data directly
		 * from flash.
		 */
		if (mtdblk->cache_state != STATE_EMPTY &&
		    mtdblk->cache_offset == sect_start) {
			memcpy (buf, mtdblk->cache_data + offset, size);
		} else {
			ret = mtd_read(mtd, pos, size, &retlen, buf);
			if (ret)
				return ret;
			if (retlen != size)
				return -EIO;
		}

		buf += size;
		pos += size;
		len -= size;
	}

	return 0;
}
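do_cached_read splits a read at every `cache_size` boundary. The model below is a hypothetical `cached_read_pieces` that assumes `cache_size = 0x20000`. It only records the pieces and does not copy data. In practice mtdblock_readsect always requests 512 aligned bytes, so a read never straddles a boundary, but the loop handles that case too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the chunking loop in do_cached_read. Each piece is
 * either copied from mtdblk->cache_data (when cache_offset == sect_start)
 * or read with mtd_read; here the pieces are only recorded. */
struct piece {
    unsigned long pos, sect_start;
    unsigned int offset, size;
};

static size_t cached_read_pieces(unsigned long pos, int len, unsigned int sect_size,
                                 struct piece *out, size_t max)
{
    size_t n = 0;

    assert(sect_size > 0); /* with sect_size == 0 the kernel calls mtd_read directly */
    while (len > 0 && n < max) {
        unsigned long sect_start = (pos / sect_size) * sect_size;
        unsigned int offset = pos - sect_start;
        unsigned int size = sect_size - offset;

        if (size > (unsigned int)len)
            size = len;
        out[n].pos = pos;
        out[n].sect_start = sect_start;
        out[n].offset = offset;
        out[n].size = size;
        n++;
        pos += size;
        len -= size;
    }
    return n;
}
```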

2.3 mtdblock_readsect->do_cached_read->mtd_read->part_read():

static int part_read(struct mtd_info *mtd, loff_t from, size_t len,size_t *retlen, u_char *buf)
{
	struct mtd_part *part = PART(mtd);
	struct mtd_ecc_stats stats;
	int res;

	stats = part->master->ecc_stats;
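        /* nand_read: this is where flash is actually accessed, so the partition-relative offset is turned into a real hardware flash address by adding the partition start (part->offset) */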
	res = part->master->_read(part->master, from + part->offset, len,
				  retlen, buf);
	if (unlikely(mtd_is_eccerr(res)))
		mtd->ecc_stats.failed +=
			part->master->ecc_stats.failed - stats.failed;
	else
		mtd->ecc_stats.corrected +=
			part->master->ecc_stats.corrected - stats.corrected;
	return res;
}

2.4 part_read->nand_read->nand_do_read_ops->(chip->cmdfunc=amb_nand_cmdfunc)

static int nand_read(struct mtd_info *mtd, loff_t from, size_t len,
       size_t *retlen, uint8_t *buf)
{
 struct mtd_oob_ops ops;
 int ret;
 nand_get_device(mtd, FL_READING);
 ops.len = len;
 ops.datbuf = buf;
 ops.oobbuf = NULL;
 ops.mode = MTD_OPS_PLACE_OOB;
 ret = nand_do_read_ops(mtd, from, &ops);
 *retlen = ops.retlen;
 nand_release_device(mtd);
 return ret;
}
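nand_read->nand_do_read_ops:
/* from is the flash address to read, as an offset from flash address 0x0; it is converted into pages for reading, one page being 2 KB.
Example: the log partition spans <0x7200000, 0x7600000>; from = 0x7300000 means reading log-partition data in page 0x7300000/2048 = 0xe600;
*/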

static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
       struct mtd_oob_ops *ops)
{
 int chipnr, page, realpage, col, bytes, aligned, oob_required;
 struct nand_chip *chip = mtd->priv;
 int ret = 0;
 uint32_t readlen = ops->len;
 uint32_t oobreadlen = ops->ooblen;
 uint32_t max_oobsize = ops->mode == MTD_OPS_AUTO_OOB ?
                 mtd->oobavail : mtd->oobsize;
 uint8_t *bufpoi, *oob, *buf;
 int use_bufpoi;
 unsigned int max_bitflips = 0;
 int retry_mode = 0;
 bool ecc_fail = false;
 chipnr = (int)(from >> chip->chip_shift);
 chip->select_chip(mtd, chipnr);

 /* realpage is the number of the flash page to read; one flash page is 2 KB here */
 realpage = (int)(from >> chip->page_shift);
 page = realpage & chip->pagemask;
 col = (int)(from & (mtd->writesize - 1));
 buf = ops->datbuf;
 oob = ops->oobbuf;
 oob_required = oob ? 1 : 0;
 while (1) {
  unsigned int ecc_failures = mtd->ecc_stats.failed;
  bytes = min(mtd->writesize - col, readlen);
  aligned = (bytes == mtd->writesize);
  if (!aligned)
   use_bufpoi = 1;
  else if (chip->options & NAND_USE_BOUNCE_BUFFER)
   use_bufpoi = !virt_addr_valid(buf);
  else
   use_bufpoi = 0;
  /* Is the current page in the buffer? */
  /* Has page realpage already been read into the buffer? If not, chip->cmdfunc starts the read in the flash driver:
   (chip->cmdfunc=amb_nand_cmdfunc)->nand_amb_read_data->nand_amb_request.
    If it has, the data is copied straight from chip->buffers->databuf. */
  if (realpage != chip->pagebuf || oob) {
   bufpoi = use_bufpoi ? chip->buffers->databuf : buf;
   if (use_bufpoi && aligned)
    pr_debug("%s: using read bounce buffer for buf@%p\n", __func__, buf);
read_retry:
/*
chip->cmdfunc is what actually triggers the read in the flash driver: (chip->cmdfunc=amb_nand_cmdfunc)->nand_amb_read_data->nand_amb_request;
nand_amb_request waits for the flash access to complete.
*/
   chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
   /*
    * Now read the page into the buffer.  Absent an error,
    * the read methods return max bitflips per ecc step.
    */
   if (unlikely(ops->mode == MTD_OPS_RAW))
       ret = chip->ecc.read_page_raw(mtd, chip, bufpoi,oob_required,page);
   else if (!aligned && NAND_HAS_SUBPAGE_READ(chip) &&!oob)
       ret = chip->ecc.read_subpage(mtd, chip,col, bytes, bufpoi,page);
   else
       /* nand_read_page_hwecc->amb_nand_read_buf only copies the data out of the flash driver's DMA buffer; the actual read was triggered in cmdfunc */
   ret = chip->ecc.read_page(mtd, chip, bufpoi,oob_required, page);
   if (ret < 0) {
    if (use_bufpoi)
     /* Invalidate page cache */
     chip->pagebuf = -1;
    break;
   }
   max_bitflips = max_t(unsigned int, max_bitflips, ret);
   /* Transfer not aligned data */
   if (use_bufpoi) {
    if (!NAND_HAS_SUBPAGE_READ(chip) && !oob &&
        !(mtd->ecc_stats.failed - ecc_failures) &&
        (ops->mode != MTD_OPS_RAW)) {
     chip->pagebuf = realpage;
     chip->pagebuf_bitflips = ret;
    } else {
     /* Invalidate page cache */
     chip->pagebuf = -1;
    }
    memcpy(buf, chip->buffers->databuf + col, bytes);
   }
   if (unlikely(oob)) {
    int toread = min(oobreadlen, max_oobsize);
    if (toread) {
     oob = nand_transfer_oob(chip,
      oob, ops, toread);
     oobreadlen -= toread;
    }
   }

   /* wait for the flash to become ready before the next page is accessed */
   if (chip->options & NAND_NEED_READRDY) {
    /* Apply delay or wait for ready/busy pin */
    if (!chip->dev_ready)
     udelay(chip->chip_delay);
    else
     nand_wait_ready(mtd);
   }
   if (mtd->ecc_stats.failed - ecc_failures) {
    if (retry_mode + 1 < chip->read_retries) {
     retry_mode++;
     ret = nand_setup_read_retry(mtd,
       retry_mode);
     if (ret < 0)
      break;
     /* Reset failures; retry */
     mtd->ecc_stats.failed = ecc_failures;
     goto read_retry;
    } else {
     /* No more retry modes; real failure */
     ecc_fail = true;
    }
   }
   buf += bytes;
  } else {
   memcpy(buf, chip->buffers->databuf + col, bytes);
   buf += bytes;
   max_bitflips = max_t(unsigned int, max_bitflips,
          chip->pagebuf_bitflips);
  }
  readlen -= bytes;
  /* Reset to retry mode 0 */
  if (retry_mode) {
   ret = nand_setup_read_retry(mtd, 0);
   if (ret < 0)
    break;
   retry_mode = 0;
  }
  if (!readlen)
   break;
  /* For subsequent reads align to page boundary */
  col = 0;
  /* Increment page address */
  realpage++;
  page = realpage & chip->pagemask;
  /* Check, if we cross a chip boundary */
  if (!page) {
   chipnr++;
   chip->select_chip(mtd, -1);
   chip->select_chip(mtd, chipnr);
  }
 }
 chip->select_chip(mtd, -1);
 ops->retlen = ops->len - (size_t) readlen;
 if (oob)
  ops->oobretlen = ops->ooblen - oobreadlen;
 if (ret < 0)
  return ret;
 if (ecc_fail)
  return -EBADMSG;
 return max_bitflips;
}
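The address split at the top of nand_do_read_ops, and the page-by-page loop that follows it, can be modelled as below. The sketch assumes 2 KB pages (`page_shift = 11`) and a 256 MB chip (`chip_shift = 28`). The real values come from the NAND chip description. `nand_split` and `nand_pages_touched` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 2 KiB pages, 256 MiB per chip. */
#define PAGE_SHIFT_NAND 11
#define CHIP_SHIFT      28
#define WRITESIZE       (1u << PAGE_SHIFT_NAND)
#define PAGEMASK        ((1u << (CHIP_SHIFT - PAGE_SHIFT_NAND)) - 1)

struct nand_pos { int chipnr, realpage, page, col; };

/* Same decomposition as the start of nand_do_read_ops. */
static struct nand_pos nand_split(uint64_t from)
{
    struct nand_pos p;

    p.chipnr = (int)(from >> CHIP_SHIFT);
    p.realpage = (int)(from >> PAGE_SHIFT_NAND);
    p.page = p.realpage & PAGEMASK;
    p.col = (int)(from & (WRITESIZE - 1));
    return p;
}

/* Number of iterations of the while (1) loop for len bytes at from: the first
 * iteration reads writesize - col bytes, later ones start at col = 0. */
static int nand_pages_touched(uint64_t from, uint32_t len)
{
    uint32_t col = (uint32_t)(from & (WRITESIZE - 1));
    int pages = 0;

    while (len) {
        uint32_t bytes = WRITESIZE - col < len ? WRITESIZE - col : len;

        len -= bytes;
        col = 0;
        pages++;
    }
    return pages;
}
```

For the log-partition example, from = 0x7300000 gives realpage 0xe600 and col 0. A 4 KB read starting there touches two pages.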

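【Main text】Metadata blocks: inode creation

The previous section already touched on metadata blocks. This section uses opening a file as a concrete example to explain what a metadata block is and how it is used.

When flash is accessed indirectly through a file, metadata blocks are usually involved. The file's inode information is first read from a metadata block, and the file contents are then read using that inode information.

1> For the file-open path itself, see:

 the article on Linux file system permission management: http://blog.csdn.net/eleven_xiy/article/details/70210828

2> To read a regular file, its inode information must be read first; the inode then provides the details needed to locate the file's data.

1 squashfs_read_inode reads the inode information from flash through the direct block-device read path described above.

The inode information stored on flash is: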

struct squashfs_base_inode {
	__le16			inode_type;
	__le16			mode;
	__le16			uid;
	__le16			guid;
	__le32			mtime;
	__le32			inode_number;
};
union squashfs_inode {
	struct squashfs_base_inode		base;
	struct squashfs_dev_inode		dev;
	struct squashfs_ldev_inode		ldev;
	struct squashfs_symlink_inode		symlink;
	struct squashfs_reg_inode		reg;
	struct squashfs_lreg_inode		lreg;
	struct squashfs_dir_inode		dir;
	struct squashfs_ldir_inode		ldir;
	struct squashfs_ipc_inode		ipc;
	struct squashfs_lipc_inode		lipc;
}; 
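The on-flash fields are little-endian (hence le16_to_cpu/le32_to_cpu), so squashfs_base_inode can be decoded portably from raw bytes. Below is a minimal user-space sketch. `decode_base_inode` and its helpers are hypothetical, and the layout is the 16-byte struct listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space decoder for the 16-byte on-flash squashfs_base_inode. */
struct base_inode {
    uint16_t inode_type, mode, uid, guid;
    uint32_t mtime, inode_number;
};

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static struct base_inode decode_base_inode(const uint8_t raw[16])
{
    struct base_inode b;

    b.inode_type   = get_le16(raw + 0);  /* e.g. 2 = SQUASHFS_REG_TYPE */
    b.mode         = get_le16(raw + 2);
    b.uid          = get_le16(raw + 4);  /* index into the id table, not the uid itself */
    b.guid         = get_le16(raw + 6);  /* index into the id table */
    b.mtime        = get_le32(raw + 8);
    b.inode_number = get_le32(raw + 12);
    return b;
}
```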

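The inode information is read from a metadata block on flash. The on-flash squashfs_base_inode is the common header at the start of every type-specific inode in union squashfs_inode, so both cover the same contiguous range of the metadata block. squashfs_read_inode therefore reads squashfs_base_inode first to learn the inode type, then resets block/offset and reads the complete type-specific structure: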

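/*
Parameters:
inode is the in-DRAM inode allocated by the kernel; its key fields are filled in by squashfs_read_inode from the data on flash.
ino is the inode reference: it encodes where the inode information is stored on flash, i.e. its metadata block location.
ino >> 16, added to the superblock's inode_table, gives the flash address of the metadata block holding the inode.
The low 16 bits of ino give the offset of squashfs_base_inode inside that (uncompressed) metadata block;
ino is obtained differently for regular inodes and for the root inode, see below.
*/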
int squashfs_read_inode(struct inode *inode, long long ino)
{
	struct super_block *sb = inode->i_sb;
	struct squashfs_sb_info *msblk = sb->s_fs_info;
     /* flash address of the metadata block holding this inode: inode_table + (ino >> 16) */
	u64 block = SQUASHFS_INODE_BLK(ino) + msblk->inode_table;
     /* offset of squashfs_base_inode inside that metadata block: ino & 0xffff */
	int err, type, offset = SQUASHFS_INODE_OFFSET(ino);
	union squashfs_inode squashfs_ino;
	struct squashfs_base_inode *sqshb_ino = &squashfs_ino.base;
	int xattr_id = SQUASHFS_INVALID_XATTR;

	TRACE("Entered squashfs_read_inode\n");

	/*
	 * Read inode base common to all inode types.
	 */
     /* read squashfs_base_inode from flash */
	err = squashfs_read_metadata(sb, sqshb_ino, &block,
				&offset, sizeof(*sqshb_ino));
	if (err < 0)
		goto failed_read;
     /* fill in the in-DRAM inode from the inode information read from flash */
	err = squashfs_new_inode(sb, inode, sqshb_ino);
	if (err)
		goto failed_read;

	block = SQUASHFS_INODE_BLK(ino) + msblk->inode_table;
	offset = SQUASHFS_INODE_OFFSET(ino);

	type = le16_to_cpu(sqshb_ino->inode_type);
	switch (type) {
	case SQUASHFS_REG_TYPE: {
		unsigned int frag_offset, frag;
		int frag_size;
		u64 frag_blk;
		struct squashfs_reg_inode *sqsh_ino = &squashfs_ino.reg;
           /* read the full squashfs_reg_inode from flash */
		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
							sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		frag = le32_to_cpu(sqsh_ino->fragment);
		if (frag != SQUASHFS_INVALID_FRAG) {
			frag_offset = le32_to_cpu(sqsh_ino->offset);
			frag_size = squashfs_frag_lookup(sb, frag, &frag_blk);
			if (frag_size < 0) {
				err = frag_size;
				goto failed_read;
			}
		} else {
			frag_blk = SQUASHFS_INVALID_BLK;
			frag_size = 0;
			frag_offset = 0;
		}

		set_nlink(inode, 1);
		inode->i_size = le32_to_cpu(sqsh_ino->file_size);
		inode->i_fop = &generic_ro_fops;
		inode->i_mode |= S_IFREG;
		inode->i_blocks = ((inode->i_size - 1) >> 9) + 1;
		squashfs_i(inode)->fragment_block = frag_blk;
		squashfs_i(inode)->fragment_size = frag_size;
		squashfs_i(inode)->fragment_offset = frag_offset;
		squashfs_i(inode)->start = le32_to_cpu(sqsh_ino->start_block);
		squashfs_i(inode)->block_list_start = block;
		squashfs_i(inode)->offset = offset;
		inode->i_data.a_ops = &squashfs_aops;

		TRACE("File inode %x:%x, start_block %llx, block_list_start "
			"%llx, offset %x\n", SQUASHFS_INODE_BLK(ino),
			offset, squashfs_i(inode)->start, block, offset);
		break;
	}
	case SQUASHFS_LREG_TYPE: {
		unsigned int frag_offset, frag;
		int frag_size;
		u64 frag_blk;
		struct squashfs_lreg_inode *sqsh_ino = &squashfs_ino.lreg;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
							sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		frag = le32_to_cpu(sqsh_ino->fragment);
		if (frag != SQUASHFS_INVALID_FRAG) {
			frag_offset = le32_to_cpu(sqsh_ino->offset);
			frag_size = squashfs_frag_lookup(sb, frag, &frag_blk);
			if (frag_size < 0) {
				err = frag_size;
				goto failed_read;
			}
		} else {
			frag_blk = SQUASHFS_INVALID_BLK;
			frag_size = 0;
			frag_offset = 0;
		}

		xattr_id = le32_to_cpu(sqsh_ino->xattr);
		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		inode->i_size = le64_to_cpu(sqsh_ino->file_size);
		inode->i_op = &squashfs_inode_ops;
		inode->i_fop = &generic_ro_fops;
		inode->i_mode |= S_IFREG;
		inode->i_blocks = (inode->i_size -
				le64_to_cpu(sqsh_ino->sparse) + 511) >> 9;

		squashfs_i(inode)->fragment_block = frag_blk;
		squashfs_i(inode)->fragment_size = frag_size;
		squashfs_i(inode)->fragment_offset = frag_offset;
		squashfs_i(inode)->start = le64_to_cpu(sqsh_ino->start_block);
		squashfs_i(inode)->block_list_start = block;
		squashfs_i(inode)->offset = offset;
		inode->i_data.a_ops = &squashfs_aops;

		TRACE("File inode %x:%x, start_block %llx, block_list_start "
			"%llx, offset %x\n", SQUASHFS_INODE_BLK(ino),
			offset, squashfs_i(inode)->start, block, offset);
		break;
	}
	case SQUASHFS_DIR_TYPE: {
		struct squashfs_dir_inode *sqsh_ino = &squashfs_ino.dir;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		inode->i_size = le16_to_cpu(sqsh_ino->file_size);
		inode->i_op = &squashfs_dir_inode_ops;
		inode->i_fop = &squashfs_dir_ops;
		inode->i_mode |= S_IFDIR;
		squashfs_i(inode)->start = le32_to_cpu(sqsh_ino->start_block);
		squashfs_i(inode)->offset = le16_to_cpu(sqsh_ino->offset);
		squashfs_i(inode)->dir_idx_cnt = 0;
		squashfs_i(inode)->parent = le32_to_cpu(sqsh_ino->parent_inode);

		TRACE("Directory inode %x:%x, start_block %llx, offset %x\n",
				SQUASHFS_INODE_BLK(ino), offset,
				squashfs_i(inode)->start,
				le16_to_cpu(sqsh_ino->offset));
		break;
	}
	case SQUASHFS_LDIR_TYPE: {
		struct squashfs_ldir_inode *sqsh_ino = &squashfs_ino.ldir;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		xattr_id = le32_to_cpu(sqsh_ino->xattr);
		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		inode->i_size = le32_to_cpu(sqsh_ino->file_size);
		inode->i_op = &squashfs_dir_inode_ops;
		inode->i_fop = &squashfs_dir_ops;
		inode->i_mode |= S_IFDIR;
		squashfs_i(inode)->start = le32_to_cpu(sqsh_ino->start_block);
		squashfs_i(inode)->offset = le16_to_cpu(sqsh_ino->offset);
		squashfs_i(inode)->dir_idx_start = block;
		squashfs_i(inode)->dir_idx_offset = offset;
		squashfs_i(inode)->dir_idx_cnt = le16_to_cpu(sqsh_ino->i_count);
		squashfs_i(inode)->parent = le32_to_cpu(sqsh_ino->parent_inode);

		TRACE("Long directory inode %x:%x, start_block %llx, offset "
				"%x\n", SQUASHFS_INODE_BLK(ino), offset,
				squashfs_i(inode)->start,
				le16_to_cpu(sqsh_ino->offset));
		break;
	}
	case SQUASHFS_SYMLINK_TYPE:
	case SQUASHFS_LSYMLINK_TYPE: {
		struct squashfs_symlink_inode *sqsh_ino = &squashfs_ino.symlink;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		inode->i_size = le32_to_cpu(sqsh_ino->symlink_size);
		inode->i_op = &squashfs_symlink_inode_ops;
		inode->i_data.a_ops = &squashfs_symlink_aops;
		inode->i_mode |= S_IFLNK;
		squashfs_i(inode)->start = block;
		squashfs_i(inode)->offset = offset;

		if (type == SQUASHFS_LSYMLINK_TYPE) {
			__le32 xattr;

			err = squashfs_read_metadata(sb, NULL, &block,
						&offset, inode->i_size);
			if (err < 0)
				goto failed_read;
			err = squashfs_read_metadata(sb, &xattr, &block,
						&offset, sizeof(xattr));
			if (err < 0)
				goto failed_read;
			xattr_id = le32_to_cpu(xattr);
		}

		TRACE("Symbolic link inode %x:%x, start_block %llx, offset "
				"%x\n", SQUASHFS_INODE_BLK(ino), offset,
				block, offset);
		break;
	}
	case SQUASHFS_BLKDEV_TYPE:
	case SQUASHFS_CHRDEV_TYPE: {
		struct squashfs_dev_inode *sqsh_ino = &squashfs_ino.dev;
		unsigned int rdev;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		if (type == SQUASHFS_CHRDEV_TYPE)
			inode->i_mode |= S_IFCHR;
		else
			inode->i_mode |= S_IFBLK;
		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		rdev = le32_to_cpu(sqsh_ino->rdev);
		init_special_inode(inode, inode->i_mode, new_decode_dev(rdev));


		TRACE("Device inode %x:%x, rdev %x\n",
				SQUASHFS_INODE_BLK(ino), offset, rdev);
		break;
	}
	case SQUASHFS_LBLKDEV_TYPE:
	case SQUASHFS_LCHRDEV_TYPE: {
		struct squashfs_ldev_inode *sqsh_ino = &squashfs_ino.ldev;
		unsigned int rdev;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		if (type == SQUASHFS_LCHRDEV_TYPE)
			inode->i_mode |= S_IFCHR;
		else
			inode->i_mode |= S_IFBLK;
		xattr_id = le32_to_cpu(sqsh_ino->xattr);
		inode->i_op = &squashfs_inode_ops;
		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		rdev = le32_to_cpu(sqsh_ino->rdev);
		init_special_inode(inode, inode->i_mode, new_decode_dev(rdev));

		TRACE("Device inode %x:%x, rdev %x\n",
				SQUASHFS_INODE_BLK(ino), offset, rdev);
		break;
	}
	case SQUASHFS_FIFO_TYPE:
	case SQUASHFS_SOCKET_TYPE: {
		struct squashfs_ipc_inode *sqsh_ino = &squashfs_ino.ipc;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		if (type == SQUASHFS_FIFO_TYPE)
			inode->i_mode |= S_IFIFO;
		else
			inode->i_mode |= S_IFSOCK;
		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		init_special_inode(inode, inode->i_mode, 0);
		break;
	}
	case SQUASHFS_LFIFO_TYPE:
	case SQUASHFS_LSOCKET_TYPE: {
		struct squashfs_lipc_inode *sqsh_ino = &squashfs_ino.lipc;

		err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
				sizeof(*sqsh_ino));
		if (err < 0)
			goto failed_read;

		if (type == SQUASHFS_LFIFO_TYPE)
			inode->i_mode |= S_IFIFO;
		else
			inode->i_mode |= S_IFSOCK;
		xattr_id = le32_to_cpu(sqsh_ino->xattr);
		inode->i_op = &squashfs_inode_ops;
		set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
		init_special_inode(inode, inode->i_mode, 0);
		break;
	}
	default:
		ERROR("Unknown inode type %d in squashfs_iget!\n", type);
		return -EINVAL;
	}

	if (xattr_id != SQUASHFS_INVALID_XATTR && msblk->xattr_id_table) {
		err = squashfs_xattr_lookup(sb, xattr_id,
					&squashfs_i(inode)->xattr_count,
					&squashfs_i(inode)->xattr_size,
					&squashfs_i(inode)->xattr);
		if (err < 0)
			goto failed_read;
		inode->i_blocks += ((squashfs_i(inode)->xattr_size - 1) >> 9)
				+ 1;
	} else
		squashfs_i(inode)->xattr_count = 0;

	return 0;

failed_read:
	ERROR("Unable to read inode 0x%llx\n", ino);
	return err;
}
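The inode reference `ino` packs a metadata block location and an offset into one 64-bit value. The helpers below are user-space copies modelled on SQUASHFS_MKINODE, SQUASHFS_INODE_BLK and SQUASHFS_INODE_OFFSET as used above. The `inode_table` value in the test is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bits: start of the (compressed) metadata block relative to inode_table.
 * Low 16 bits: byte offset inside the uncompressed 8 KiB metadata block. */
static inline long long mkinode(unsigned int blk, unsigned int off)
{
    return ((long long)blk << 16) + off;
}

static inline unsigned int inode_blk(long long ino)
{
    return (unsigned int)(ino >> 16);
}

static inline unsigned int inode_offset(long long ino)
{
    return (unsigned int)(ino & 0xffff);
}

/* Address of the metadata block that squashfs_read_inode reads first. */
static inline uint64_t inode_block_addr(long long ino, uint64_t inode_table)
{
    return inode_blk(ino) + inode_table;
}
```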

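2 Getting directory and inode information from flash through metadata blocks

Where is the root inode stored on flash? As squashfs_fill_super shows, the flash location of the root inode is also taken from the superblock stored on flash;

For how the flash location of a regular file's inode information is found, see squashfs_lookup();

For example, when a file is opened: do_sys_open->do_last->lookup_real->squashfs_lookup:

On flash, squashfs_dir_header, squashfs_dir_entry and squashfs_dir_entry->name lie in one contiguous range, all inside metadata blocks. For example, to look up /mnt/test the kernel first reads from flash the squashfs_dir_header of mnt, the directory containing test. The header gives the number of entries (subdirectories and regular files) that follow it (count + 1; a large directory has several headers). The kernel then walks the entries of mnt, reading each entry's squashfs_dir_entry and squashfs_dir_entry->name from flash, and compares the name with the one being looked up. On a match, the flash location of the inode information is given by squashfs_dir_header->start_block and squashfs_dir_entry->offset, and the squashfs_inode is read from there.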

struct squashfs_dir_entry {
	__le16			offset;
	__le16			inode_number;
	__le16			type;
	__le16			size;
	char			name[0];
};
struct squashfs_dir_header {
	__le32			count;
	__le32			start_block;
	__le32			inode_number;
};

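squashfs_lookup shows how the inode information of a regular file is located and read from flash:

/*
Parameters: dir is the inode of the directory containing the file; dentry is the file's directory entry (dentry);
*/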
static struct dentry *squashfs_lookup(struct inode *dir, struct dentry *dentry,
				 unsigned int flags)
{
	const unsigned char *name = dentry->d_name.name;
	int len = dentry->d_name.len;
	struct inode *inode = NULL;
	struct squashfs_sb_info *msblk = dir->i_sb->s_fs_info;
	struct squashfs_dir_header dirh;
	struct squashfs_dir_entry *dire;
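        /*
        From the directory's inode, compute the flash address of the metadata block holding the directory's squashfs_dir_header (directory_table + start).
        The root inode is set up when the file system is mounted, so the directory information can always be found level by level from the root.
        */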
	u64 block = squashfs_i(dir)->start + msblk->directory_table;
        /* offset of squashfs_dir_header inside that metadata block, also taken from the directory's inode */
	int offset = squashfs_i(dir)->offset;
	int err, length;
	unsigned int dir_count, size;

	TRACE("Entered squashfs_lookup [%llx:%x]\n", block, offset);

	dire = kmalloc(sizeof(*dire) + SQUASHFS_NAME_LEN + 1, GFP_KERNEL);
	if (dire == NULL) {
		ERROR("Failed to allocate squashfs_dir_entry\n");
		return ERR_PTR(-ENOMEM);
	}

	if (len > SQUASHFS_NAME_LEN) {
		err = -ENAMETOOLONG;
		goto failed;
	}

	length = get_dir_index_using_name(dir->i_sb, &block, &offset,
				squashfs_i(dir)->dir_idx_start,
				squashfs_i(dir)->dir_idx_offset,
				squashfs_i(dir)->dir_idx_cnt, name, len);

	while (length < i_size_read(dir)) {
		/*
		 * Read directory header.
		 */
                /* read the squashfs_dir_header of the directory from flash */
		err = squashfs_read_metadata(dir->i_sb, &dirh, &block,
				&offset, sizeof(dirh));
		if (err < 0)
			goto read_failure;

		length += sizeof(dirh);

		dir_count = le32_to_cpu(dirh.count) + 1;

		if (dir_count > SQUASHFS_DIR_COUNT)
			goto data_error;
                /* Walk all entries (subdirectories and regular files) under this header. squashfs_dir_header, squashfs_dir_entry and squashfs_dir_entry->name are read back to back,
                  which shows that they occupy one contiguous range of the metadata block on flash.
                */
		while (dir_count--) {
			/*
			 * Read directory entry.
			 */
                       /* read the next squashfs_dir_entry from flash; it holds the name size and the inode offset, and is immediately followed by the name */
			err = squashfs_read_metadata(dir->i_sb, dire, &block,
					&offset, sizeof(*dire));
			if (err < 0)
				goto read_failure;

			size = le16_to_cpu(dire->size) + 1;

			/* size should never be larger than SQUASHFS_NAME_LEN */
			if (size > SQUASHFS_NAME_LEN)
				goto data_error;

                        /* read the name of the subdirectory or regular file from flash */
			err = squashfs_read_metadata(dir->i_sb, dire->name,
					&block, &offset, size);
			if (err < 0)
				goto read_failure;

			length += sizeof(*dire) + size;

			if (name[0] < dire->name[0])
				goto exit_lookup;
                        /* compare the name read from flash with the name squashfs_lookup is looking for */
			if (len == size && !strncmp(name, dire->name, len)) {
				unsigned int blk, off, ino_num;
				long long ino;
				blk = le32_to_cpu(dirh.start_block);
				off = le16_to_cpu(dire->offset);
				ino_num = le32_to_cpu(dirh.inode_number) +
					(short) le16_to_cpu(dire->inode_number);
				ino = SQUASHFS_MKINODE(blk, off);


				TRACE("calling squashfs_iget for directory "
					"entry %s, inode  %x:%x, %d\n", name,
					blk, off, ino_num);
                     /*
                     The name was found on flash, so create an inode for the file.
                     Note that the inode information is itself read from flash, via squashfs_iget->squashfs_read_inode.
                     */
				inode = squashfs_iget(dir->i_sb, ino, ino_num);
				goto exit_lookup;
			}
		}
	}

exit_lookup:
	kfree(dire);
	return d_splice_alias(inode, dentry);

data_error:
	err = -EIO;

read_failure:
	ERROR("Unable to read directory block [%llx:%x]\n",
		squashfs_i(dir)->start + msblk->directory_table,
		squashfs_i(dir)->offset);
failed:
	kfree(dire);
	return ERR_PTR(err);
}
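The search in squashfs_lookup can be modelled on a directory listing that has already been decoded. In this hypothetical sketch the headers and entries are plain C arrays, not bytes pulled through squashfs_read_metadata, and `lookup` is not a kernel function. The ino and inode-number computations follow the code above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical decoded directory: on flash each header is followed by
 * count + 1 entries, each entry by size + 1 name bytes (no NUL). */
struct dir_entry  { unsigned short offset; short inode_delta; const char *name; };
struct dir_header { unsigned int start_block, inode_number, count; const struct dir_entry *entries; };

/* Same search order as squashfs_lookup: returns 1 and fills ino/ino_num on a
 * match, 0 otherwise. Entries are sorted by name, so an entry whose first
 * character is greater than the target's ends the search early. */
static int lookup(const struct dir_header *hdrs, int nhdrs, const char *name,
                  long long *ino, unsigned int *ino_num)
{
    size_t len = strlen(name);

    for (int h = 0; h < nhdrs; h++) {
        unsigned int dir_count = hdrs[h].count + 1;

        for (unsigned int i = 0; i < dir_count; i++) {
            const struct dir_entry *e = &hdrs[h].entries[i];

            if ((unsigned char)name[0] < (unsigned char)e->name[0])
                return 0;
            if (len == strlen(e->name) && !strncmp(name, e->name, len)) {
                *ino = ((long long)hdrs[h].start_block << 16) + e->offset; /* SQUASHFS_MKINODE */
                *ino_num = hdrs[h].inode_number + e->inode_delta;
                return 1;
            }
        }
    }
    return 0;
}
```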
squashfs_lookup->get_dir_index_using_name:
static int get_dir_index_using_name(struct super_block *sb,
			u64 *next_block, int *next_offset, u64 index_start,
			int index_offset, int i_count, const char *name,
			int len)
{
	struct squashfs_sb_info *msblk = sb->s_fs_info;
	int i, length = 0, err;
	unsigned int size;
	struct squashfs_dir_index *index;
	char *str;

	TRACE("Entered get_dir_index_using_name, i_count %d\n", i_count);

	index = kmalloc(sizeof(*index) + SQUASHFS_NAME_LEN * 2 + 2, GFP_KERNEL);
	if (index == NULL) {
		ERROR("Failed to allocate squashfs_dir_index\n");
		goto out;
	}

	str = &index->name[SQUASHFS_NAME_LEN + 1];
	strncpy(str, name, len);
	str[len] = '\0';
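     /* i_count is 0 when the directory has no index (SQUASHFS_DIR_TYPE sets dir_idx_cnt = 0), so this loop is skipped */
	for (i = 0; i < i_count; i++) {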
		err = squashfs_read_metadata(sb, index, &index_start,
					&index_offset, sizeof(*index));
		if (err < 0)
			break;

		size = le32_to_cpu(index->size) + 1;
		if (size > SQUASHFS_NAME_LEN)
			break;

		err = squashfs_read_metadata(sb, index->name, &index_start,
					&index_offset, size);
		if (err < 0)
			break;

		index->name[size] = '\0';

		if (strcmp(index->name, str) > 0)
			break;


		length = le32_to_cpu(index->index);
		*next_block = le32_to_cpu(index->start_block) +
					msblk->directory_table;
	}

	*next_offset = (length + *next_offset) % SQUASHFS_METADATA_SIZE;
	kfree(index);

out:
	/*
	 * Return index (f_pos) of the looked up metadata block.  Translate
	 * from internal f_pos to external f_pos which is offset by 3 because
	 * we invent "." and ".." entries which are not actually stored in the
	 * directory.
	 */
	return length + 3;
}
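get_dir_index_using_name lets a large directory (SQUASHFS_LDIR_TYPE with an index) skip straight to the right directory header. The sketch below runs the same scan over an index that has already been decoded. `struct dir_index` here is a simplified, hypothetical stand-in for squashfs_dir_index, and `dir_index_lookup` is not a kernel function.

```c
#include <assert.h>
#include <string.h>

#define METADATA_SIZE 8192 /* SQUASHFS_METADATA_SIZE */

/* index: byte position of a directory header within the directory listing;
 * start_block: its metadata block (relative to directory_table);
 * name: the first name stored under that header. */
struct dir_index { unsigned int index, start_block; const char *name; };

/* Keep the last index entry whose name sorts <= name, then return the
 * f_pos, offset by 3 for the invented "." and ".." entries. */
static int dir_index_lookup(const struct dir_index *idx, int i_count, const char *name,
                            unsigned long long directory_table,
                            unsigned long long *next_block, int *next_offset)
{
    int length = 0;

    for (int i = 0; i < i_count; i++) {
        if (strcmp(idx[i].name, name) > 0)
            break;
        length = idx[i].index;
        *next_block = idx[i].start_block + directory_table;
    }
    *next_offset = (length + *next_offset) % METADATA_SIZE;
    return length + 3;
}
```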

Directory and inode information is stored in metadata blocks; the interface for reading metadata blocks is squashfs_read_metadata:

int squashfs_read_metadata(struct super_block *sb, void *buffer,
		u64 *block, int *offset, int length)
{
	struct squashfs_sb_info *msblk = sb->s_fs_info;
	int bytes, res = length;
	struct squashfs_cache_entry *entry;

	TRACE("Entered squashfs_read_metadata [%llx:%x]\n", *block, *offset);

	while (length) {
/*
Read directory or inode information from a metadata block on flash; block_cache is the cache created in squashfs_fill_super
to hold metadata block data;
*/
		entry = squashfs_cache_get(sb, msblk->block_cache, *block, 0);
		if (entry->error) {
			res = entry->error;
			goto error;
		} else if (*offset >= entry->length) {
			res = -EIO;
			goto error;
		}

          /* copy the requested bytes from the cached metadata block (entry->data) into buffer */
		bytes = squashfs_copy_data(buffer, entry, *offset, length);
		if (buffer)
			buffer += bytes;
		length -= bytes;
		*offset += bytes;

		if (*offset == entry->length) {
                        /* Move block on to the next metadata block. squashfs_cache_get uses block to check whether that block is already in a squashfs_cache_entry;
                          if it is not, the block is read from flash into squashfs_cache_entry->data
                         */
			*block = entry->next_index;
			*offset = 0;
		}

		squashfs_cache_put(entry);
	}

	return res;

error:
	squashfs_cache_put(entry);
	return res;
}
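The key behaviour of squashfs_read_metadata is that a read may run past the end of one metadata block and continue at offset 0 of the next. Afterwards `*block`/`*offset` point just past the data that was read. In the sketch below, in-memory blocks stand in for block_cache. All names are hypothetical and there is no decompression.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decompressed metadata block: start is the on-flash address of
 * the compressed block, next_index the address of the following one. */
struct meta_block { unsigned long long start, next_index; const char *data; int length; };

static const struct meta_block *find_block(const struct meta_block *b, int n,
                                           unsigned long long start)
{
    for (int i = 0; i < n; i++)
        if (b[i].start == start)
            return &b[i];
    return NULL;
}

/* Same loop as squashfs_read_metadata; buffer == NULL just skips bytes. */
static int read_metadata(const struct meta_block *b, int n, void *buffer,
                         unsigned long long *block, int *offset, int length)
{
    int res = length;
    char *out = buffer;

    while (length) {
        const struct meta_block *e = find_block(b, n, *block);

        if (e == NULL || *offset >= e->length)
            return -EIO;
        int bytes = e->length - *offset < length ? e->length - *offset : length;
        if (out) {
            memcpy(out, e->data + *offset, bytes);
            out += bytes;
        }
        length -= bytes;
        *offset += bytes;
        if (*offset == e->length) { /* block exhausted: move to the next one */
            *block = e->next_index;
            *offset = 0;
        }
    }
    return res;
}
```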

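squashfs_cache_get: flash data is read into a cache. A read first looks in the cache and goes to flash only if the data is not there. Data, metadata and fragment reads all go through this interface.

The squashfs_cache is small and serves mainly as a staging buffer for reads. It is different from the page cache, which caches file contents: file contents held in squashfs_cache still have to be copied into the page cache.

The squashfs_cache consists mainly of:

1 block_cache (caches metadata block contents; metadata blocks mainly hold inode and directory information).

2 read_page cache (caches data block contents, i.e. mainly file contents).

3 fragment_cache.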
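The entry counts quoted in this article can be used to estimate how much memory the three caches hold. Below is a small sketch of that arithmetic. The constants mirror the numbers given in the text, and `squashfs_cache_bytes` is a hypothetical helper.

```c
#include <assert.h>

/* block_cache: 8 x 8 KiB metadata blocks; read_page: 1 x block_size;
 * fragment_cache: 3 x block_size (3 being the usual default of
 * CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE). */
#define METADATA_SIZE    8192u
#define CACHED_BLKS      8u
#define CACHED_FRAGMENTS 3u

static unsigned long squashfs_cache_bytes(unsigned long block_size)
{
    unsigned long block_cache = CACHED_BLKS * METADATA_SIZE;
    unsigned long read_page   = 1 * block_size;
    unsigned long frag_cache  = CACHED_FRAGMENTS * block_size;

    return block_cache + read_page + frag_cache;
}
```

With the default 128 KB blocks this comes to 576 KB. With `-b 512K` it is about 2.06 MB, mostly taken by the fragment cache.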

Example: squashfs_read_metadata()->squashfs_cache_get() is how a metadata block is obtained from block_cache; if block_cache does not hold that metadata block, it is read from flash into block_cache.

struct squashfs_cache_entry *squashfs_cache_get(struct super_block *sb,struct squashfs_cache *cache, u64 block, int length)
{
	int i, n;
	struct squashfs_cache_entry *entry;

	spin_lock(&cache->lock);

	while (1) {
		/* for block_cache (metadata), cache->entries = 8 */
		for (i = cache->curr_blk, n = 0; n < cache->entries; n++) {
			/* an entry holds this block if its block address matches */
			if (cache->entry[i].block == block) {
				cache->curr_blk = i;
				break;
			}
			i = (i + 1) % cache->entries;
		}

		/*
		 * n == cache->entries: the metadata block is not in block_cache and
		 * must be read from flash into the cache. For how the metadata cache
		 * is created, see squashfs_fill_super->squashfs_cache_init above.
		 */
		if (n == cache->entries) {
			/*
			 * Block not in cache, if all cache entries are used
			 * go to sleep waiting for one to become available.
			 */
			if (cache->unused == 0) {
				cache->num_waiters++;
				spin_unlock(&cache->lock);
				wait_event(cache->wait_queue, cache->unused);
				spin_lock(&cache->lock);
				cache->num_waiters--;
				continue;
			}

			/*
			 * At least one unused cache entry.  A simple
			 * round-robin strategy is used to choose the entry to
			 * be evicted from the cache.
			 */
			i = cache->next_blk;
			for (n = 0; n < cache->entries; n++) {
				if (cache->entry[i].refcount == 0)
					break;
				i = (i + 1) % cache->entries;
			}

			cache->next_blk = (i + 1) % cache->entries;
			entry = &cache->entry[i];

			/*
			 * Initialise chosen cache entry, and fill it in from
			 * disk. One fewer squashfs_cache_entry is now unused.
			 */
			cache->unused--;
			entry->block = block;
			entry->refcount = 1;
			entry->pending = 1;
			entry->num_waiters = 0;
			entry->error = 0;
			spin_unlock(&cache->lock);

			/* read the flash data into this cache entry, so later reads can be served from the cache instead of flash */
			entry->length = squashfs_read_data(sb, block, length,
				&entry->next_index, entry->actor);

			spin_lock(&cache->lock);

			if (entry->length < 0)
				entry->error = entry->length;

			entry->pending = 0;

			/*
			 * While filling this entry one or more other processes
			 * have looked it up in the cache, and have slept
			 * waiting for it to become available.
			 */
			if (entry->num_waiters) {
				spin_unlock(&cache->lock);
				wake_up_all(&entry->wait_queue);
			} else
				spin_unlock(&cache->lock);

			goto out;
		}

		/*
		 * Block already in cache.  Increment refcount so it doesn't
		 * get reused until we're finished with it, if it was
		 * previously unused there's one less cache entry available
		 * for reuse.
		 */
		entry = &cache->entry[i];
		if (entry->refcount == 0)
			cache->unused--;
		entry->refcount++;

		/*
		 * If the entry is currently being filled in by another process
		 * go to sleep waiting for it to become available.
		 */
		if (entry->pending) {
			entry->num_waiters++;
			spin_unlock(&cache->lock);
			wait_event(entry->wait_queue, !entry->pending);
		} else
			spin_unlock(&cache->lock);

		goto out;
	}

out:
	TRACE("Got %s %d, start block %lld, refcount %d, error %d\n",
		cache->name, i, entry->block, entry->refcount, entry->error);

	if (entry->error)
		ERROR("Unable to read %s cache entry [%llx]\n", cache->name,
			block);
	return entry;
}
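Without the locking and wait queues, squashfs_cache_get is a small fixed-size cache. A lookup searches the entries starting at `curr_blk`. On a miss, it evicts the next unreferenced entry in round-robin order, starting from `next_blk`. The sketch below is a single-threaded model of that policy. All names are hypothetical, and `misses` counts the places where squashfs_read_data would be called.

```c
#include <assert.h>

#define ENTRIES 8 /* block_cache has 8 entries */

struct entry { long long block; int refcount; };
struct cache { struct entry e[ENTRIES]; int curr_blk, next_blk, unused, misses; };

static void cache_init(struct cache *c)
{
    for (int i = 0; i < ENTRIES; i++) {
        c->e[i].block = -1; /* SQUASHFS_INVALID_BLK */
        c->e[i].refcount = 0;
    }
    c->curr_blk = c->next_blk = c->misses = 0;
    c->unused = ENTRIES;
}

/* Returns the entry index now holding block, or -1 when every entry is
 * referenced (the kernel would sleep on cache->wait_queue instead). */
static int cache_get(struct cache *c, long long block)
{
    int i, n;

    for (i = c->curr_blk, n = 0; n < ENTRIES; n++) { /* hit search */
        if (c->e[i].block == block) {
            c->curr_blk = i;
            if (c->e[i].refcount++ == 0)
                c->unused--;
            return i;
        }
        i = (i + 1) % ENTRIES;
    }
    if (c->unused == 0)
        return -1;
    for (i = c->next_blk, n = 0; n < ENTRIES; n++) { /* round-robin victim */
        if (c->e[i].refcount == 0)
            break;
        i = (i + 1) % ENTRIES;
    }
    c->next_blk = (i + 1) % ENTRIES;
    c->unused--;
    c->e[i].block = block;
    c->e[i].refcount = 1;
    c->misses++; /* squashfs_read_data would fill the entry here */
    return i;
}

static void cache_put(struct cache *c, int i)
{
    if (--c->e[i].refcount == 0)
        c->unused++;
}
```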
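To summarize inode creation, taking /mnt/test as the example:

1> Creating the in-kernel inode structure requires reading its key information from flash.
2> First the directory header squashfs_dir_header of mnt, the directory containing test, is read from flash. It tells how many entries follow; every regular file or subdirectory has one squashfs_dir_entry.
3> Then all files under mnt are walked: each file's squashfs_dir_entry is read from flash, followed by its name.
4> If a name stored in the metadata block on flash matches the name being looked up, an inode is created; creating it also requires reading the inode information from flash.
5> If the metadata block has already been read into squashfs_cache, it is not read from flash again; see squashfs_cache_get.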
【Main text】Reading a file: squashfs_readpage

When a file is read with read(), the system call path is: generic_file_aio_read->do_generic_file_read()

When a file is read through mmap, the page-fault path is: handle_pte_fault->do_linear_fault (do_nonlinear_fault for nonlinear mappings)->__do_fault->filemap_fault

For the mmap path, see the post on the Linux memory reclaim mechanism: http://blog.csdn.net/eleven_xiy/article/details/75195490;

For do_generic_file_read, see the post on how Linux file systems are implemented: http://write.blog.csdn.net/postedit/71249365;

Regular file read via read(): generic_file_aio_read->do_generic_file_read->squashfs_readpage();
squashfs_readpage happens to exercise all three kinds of flash reads in squashfs:

First, reading a metadata block: squashfs_read_metadata->squashfs_cache_get takes the data from block_cache, which caches metadata block data (inode and directory information). If the data is not cached, squashfs_cache_get->squashfs_read_data reads the metadata block from flash into block_cache (the data is kept in squashfs_cache_entry->data), so the next read does not need to touch flash.
Second, reading a data block: squashfs_get_datablock->squashfs_cache_get takes the data from the read_page cache, which caches data block contents (file contents). If the data is not cached, squashfs_cache_get->squashfs_read_data reads the data block from flash into the read_page cache, so the next read does not need to touch flash.
Third, reading a fragment block: squashfs_get_fragment->squashfs_cache_get takes the data from fragment_cache.
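The index arithmetic at the start of squashfs_readpage decides which logical block a page belongs to. It also decides whether the page is served through the block list or from the fragment. The helper below is hypothetical and reproduces that arithmetic, assuming 4 KB pages (`page_shift = 12`).

```c
#include <assert.h>

/* block_log is log2 of the mksquashfs -b block size (17 for the 128 KiB
 * default, 19 for -b 512K); page_shift stands for PAGE_CACHE_SHIFT. */
struct rp_idx { int index, file_end, mask, start_index, end_index, use_fragment; };

static struct rp_idx readpage_index(unsigned long page_index, long long i_size,
                                    int block_log, int page_shift, int has_fragment)
{
    struct rp_idx r;

    r.index = (int)(page_index >> (block_log - page_shift)); /* logical block of this page */
    r.file_end = (int)(i_size >> block_log);                 /* number of full blocks */
    r.mask = (1 << (block_log - page_shift)) - 1;            /* pages per block - 1 */
    r.start_index = (int)(page_index & ~(unsigned long)r.mask);
    r.end_index = r.start_index | r.mask;
    /* Pages past the last full block come from the fragment, if there is one. */
    r.use_fragment = r.index >= r.file_end && has_fragment;
    return r;
}
```

Take a file slightly over 1 MB with 128 KB blocks. Page 170 lies in block 5 (pages 160 to 191) and is served through the block list. Page 256 lies past the last full block, so it is served from fragment_cache if the file has a fragment.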

static int squashfs_readpage(struct file *file, struct page *page)
{
struct inode *inode = page->mapping->host;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
int index = page->index >> (msblk->block_log - PAGE_CACHE_SHIFT);
/*convert the file size into a number of full logical blocks; one block is 512k here (set with mksquashfs -b)*/
int file_end = i_size_read(inode) >> msblk->block_log;
int res;
void *pageaddr;
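/*mask: the number of pages in one logical block, minus 1.
For example, if the partition's logical block is 128k (the mksquashfs default) and a page is 4k, mask = 2^5 - 1 = 31;
i.e. the logical block contains 32 pages, and squashfs_readpage reads all 32 pages of that block of the file at once.
*/
int mask = (1<<(msblk->block_log-PAGE_CACHE_SHIFT))-1;
/*index (declared above) is page->index>>5 in this example: the logical block (32 pages each) that holds the requested file offset*/
/*
start_index and end_index are the first and last page numbers of that logical block, e.g.
[160-191]: start_index=160, end_index=191
(illustrative: in this kernel version the pages are filled by squashfs_copy_cache, which computes the same range)
*/
int start_index=page->index&~mask;
int end_index=start_index|mask;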
TRACE("Entered squashfs_readpage, page index %lx, start block %llx\n",
page->index, squashfs_i(inode)->start);

if (page->index >= ((i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
PAGE_CACHE_SHIFT))
goto out;

if (index < file_end || squashfs_i(inode)->fragment_block ==
SQUASHFS_INVALID_BLK) {
u64 block = 0;
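  /*
   read_blocklist looks up, in the file's block list, the start address (block) and the on-flash size word (bsize; 0 means a sparse block) of logical block index.
   The block list is stored in metadata blocks, so this reads metadata through the block_cache buffer, which is initialized in squashfs_fill_super and is 8*8192 bytes
   (8 is squashfs_cache->entries, 8192 is the size of one entry). Read path: squashfs_read_metadata()->squashfs_cache_get()
  */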
int bsize = read_blocklist(inode, index, &block);
if (bsize < 0)
goto error_out;
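  /*squashfs_readpage_block->squashfs_get_datablock()->squashfs_cache_get():
   reads data block data from the read_page buffer, which is initialized in squashfs_fill_super and is 1*512 KB
   (1 is squashfs_cache->entries, 512K is the size of one entry). Read path: squashfs_get_datablock()->squashfs_cache_get()
    squashfs_get_datablock() obtains a squashfs_cache_entry; squashfs_cache_entry->data holds the data read from flash.
   If bsize == 0 the block is sparse (all zeros), and squashfs_readpage_sparse fills the page without reading flash.
  */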
if (bsize == 0)
res = squashfs_readpage_sparse(page, index, file_end);
else
res = squashfs_readpage_block(page, block, bsize);
} else
  /*squashfs_readpage_fragment->squashfs_get_fragment()->squashfs_cache_get():
   reads the file's tail from the fragment_cache buffer, which is initialized in squashfs_fill_super and is 3*512 KB
   (3 is squashfs_cache->entries, 512K is the size of one entry). Read path: squashfs_get_fragment()->squashfs_cache_get()
    squashfs_get_fragment() obtains a squashfs_cache_entry; squashfs_cache_entry->data holds the data read from flash.
  */
res = squashfs_readpage_fragment(page);

if (!res)
return 0;

error_out:
SetPageError(page);
out:
pageaddr = kmap_atomic(page);
memset(pageaddr, 0, PAGE_CACHE_SIZE);
kunmap_atomic(pageaddr);
flush_dcache_page(page);
if (!PageError(page))
SetPageUptodate(page);
unlock_page(page);

return 0;
}
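The page/block arithmetic in squashfs_readpage can be checked in user space. This sketch assumes 4 KiB pages (PAGE_CACHE_SHIFT = 12). block_log is 17 for the 128K default and 19 for -b 512K.

The helper names are hypothetical, but the expressions are the ones used in the function above. log2_pow2 mirrors how squashfs_fill_super derives devblksize_log2 with ffz(~devblksize) for a power of two.

```c
#include <assert.h>

#define SIM_PAGE_SHIFT 12   /* assumption: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12 */

struct block_pages {
    int index;        /* logical block that holds the page */
    int mask;         /* pages per logical block - 1 */
    int start_index;  /* first page of that block */
    int end_index;    /* last page of that block */
};

/* Same expressions as squashfs_readpage / squashfs_copy_cache. */
static struct block_pages pages_of_block(unsigned long page_index, int block_log)
{
    struct block_pages b;
    assert(block_log >= SIM_PAGE_SHIFT);
    b.mask = (1 << (block_log - SIM_PAGE_SHIFT)) - 1;
    b.index = (int)(page_index >> (block_log - SIM_PAGE_SHIFT));
    b.start_index = (int)page_index & ~b.mask;
    b.end_index = b.start_index | b.mask;
    return b;
}

/* file_end = i_size >> block_log: number of full blocks, i.e. index of the tail block. */
static int file_end_of(long long i_size, int block_log)
{
    return (int)(i_size >> block_log);
}

/* Same branch as squashfs_readpage: the fragment path is taken only for the
   tail block of a file that actually has a fragment. */
static int uses_fragment(int index, int file_end, int has_fragment)
{
    return !(index < file_end || !has_fragment);
}

/* log2 of a power of two (what ffz(~x) yields for such x). */
static int log2_pow2(unsigned int x)
{
    int n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}
```

With 128K blocks, page 170 lies in block 5, whose pages are 160 to 191. With 512K blocks, page 300 lies in block 2, whose pages are 256 to 383.

A 1300000-byte file with 512K blocks has file_end = 2. Blocks 0 and 1 therefore go through read_blocklist. Block 2 (the tail) comes from the fragment if the image was built with fragments.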
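【Summary】
1> To operate on a file, the kernel goes from the superblock to its private information, squashfs_sb_info = sb->s_fs_info, which is initialized in squashfs_fill_super.
  When the file system is mounted, the superblock information is initialized as follows:
  super_block->s_blocksize=0x400=1024; //block size used by the buffer_head layer (set by sb_min_blocksize)
  squashfs_sb_info->devblksize=1024; //initialized in squashfs_fill_super
  squashfs_sb_info->block_size=512K; //logical block size, specified with -b when mksquashfs builds the image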
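2> The interface the file system layer uses to read flash is squashfs_cache_get. It can read metadata blocks, data blocks and fragment blocks (see the analysis above for details). squashfs_cache_get first looks for the metadata block or data block contents in a squashfs_cache buffer. Only if they are not there does it read them from flash through squashfs_read_data.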
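3> When squashfs reads a file, it submits the read request through squashfs_read_data->ll_rw_block->submit_bh->submit_bio->generic_make_request, where the queue's make_request function is blk_queue_bio. At the file system level (squashfs_read_data and so on), every block address is an offset within the partition, counted in devblksize (1024-byte) units for buffer_heads. It is converted into a real flash address only in part_read.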
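The unit conversions in point 3> can be sketched as follows. The helper sim_locate is hypothetical and assumes devblksize = 1024 (devblksize_log2 = 10), as set in squashfs_fill_super.

It mirrors two steps. First, squashfs_read_data splits a partition-relative byte offset into a buffer_head block number and an offset inside that buffer_head. Second, submit_bh turns the block number into a 512-byte sector (b_blocknr * (b_size >> 9)). A block that crosses a 1024-byte boundary simply needs several consecutive buffer_heads.

```c
#include <assert.h>
#include <stdint.h>

struct bh_pos {
    uint64_t blocknr;   /* buffer_head block number (devblksize units, partition-relative) */
    uint32_t offset;    /* byte offset inside that buffer_head */
    uint64_t sector;    /* 512-byte sector handed to the block layer */
};

/* Hypothetical helper: where a partition-relative byte offset lands at the buffer_head level. */
static struct bh_pos sim_locate(uint64_t index, int devblksize_log2)
{
    struct bh_pos p;
    assert(devblksize_log2 >= 9);
    p.blocknr = index >> devblksize_log2;                           /* like cur_index in squashfs_read_data */
    p.offset = (uint32_t)(index & ((1u << devblksize_log2) - 1));  /* offset inside the buffer_head */
    p.sector = p.blocknr << (devblksize_log2 - 9);                  /* as computed in submit_bh */
    return p;
}
```

For instance, partition offset 0x123456 lands in buffer_head block 0x48D at byte 0x56, which the block layer sees as sector 0x91A.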
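4> The kernel has a dedicated worker that carries out these requests against flash. For a read, the call chain is:
mtd_blktrans_work->do_blktrans_request->mtdblock_readsect->mtd_read->part_read->nand_read->nand_do_read_ops->nand_read_page_raw->
(the driver's read operation, e.g. hifmc100_read_buf). do_blktrans_request() handles the requests submitted through submit_bh and performs the flash read or write described by each request.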
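The remaining translation, from sector to flash address, can be modeled with a few hypothetical functions. They stand in for mtdblock_readsect, part_read and the NAND driver.

The model assumes 512-byte mtdblock sectors and ignores mtdblock's sector cache and ECC. It uses the user partition defined earlier in this article (0x1a00000-0x3200000, so 0x1800000 bytes long). The fake NAND returns the low byte of each address so the result is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an MTD partition: only the fields part_read needs. */
struct sim_part {
    uint64_t offset;   /* start of the partition on the whole flash */
    uint64_t size;     /* partition length */
};

static uint64_t last_flash_addr;   /* absolute address the "NAND driver" was asked for */

/* Stand-in for nand_read + driver: pretends each flash byte equals the low 8 bits of its address. */
static int sim_nand_read(uint64_t from, size_t len, uint8_t *buf)
{
    last_flash_addr = from;
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(from + i);
    return 0;
}

/* Like part_read: reject reads past the partition, then add the partition's start offset. */
static int sim_part_read(const struct sim_part *part, uint64_t from, size_t len, uint8_t *buf)
{
    if (from >= part->size || len > part->size - from)
        return -1;
    return sim_nand_read(from + part->offset, len, buf);
}

/* Like mtdblock_readsect: a 512-byte sector number becomes a byte offset in the partition. */
static int sim_readsect(const struct sim_part *part, uint64_t sector, uint8_t *buf)
{
    return sim_part_read(part, sector << 9, 512, buf);
}
```

Reading sector 0x91A of the user partition therefore reaches the NAND driver at flash address 0x1a00000 + 0x123400 = 0x1b23400. Adding the partition offset in the part_read step is the only place where a partition-relative address becomes an absolute flash address.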
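5> To actually read a flash address, the kernel first finds the buffer_head for that address. The buffer_head records the DRAM address the data is read into, bh->b_data (see the analysis above).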
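6> inode and directory information is stored in metadata blocks, which are regions on flash dedicated to file inodes and similar information. squashfs_read_metadata is the interface for reading metadata blocks; for a regular file, it is used to fetch the file's inode information. squashfs_read_data is the lower-level interface that reads (and decompresses) the block at a given flash address. It can be called directly to get the contents of a specific flash address, and the cache layer uses it for data blocks as well as metadata blocks.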
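Two encodings tie point 6> together. They are sketched below with hypothetical helper names:

- Each metadata block on flash starts with a 2-byte little-endian length word. Its bit 15 (SQUASHFS_COMPRESSED_BIT) is set when the block is stored uncompressed.
- An inode is located by a 64-bit reference. The upper bits give the metadata block's start, relative to the inode table. The low 16 bits give the offset inside the decompressed 8192-byte block.

```c
#include <assert.h>
#include <stdint.h>

#define SQFS_COMPRESSED_BIT (1u << 15)   /* same value as SQUASHFS_COMPRESSED_BIT */
#define SQFS_METADATA_SIZE  8192u        /* uncompressed size of one metadata block */

/* Decode the 2-byte length word that precedes every metadata block on flash. */
static void metadata_header(const uint8_t hdr[2], uint32_t *length, int *compressed)
{
    uint32_t word = hdr[0] | (uint32_t)hdr[1] << 8;
    *compressed = !(word & SQFS_COMPRESSED_BIT);   /* bit 15 set => stored uncompressed */
    *length = word & ~SQFS_COMPRESSED_BIT;         /* number of bytes that follow on flash */
}

/* Split an inode reference into the metadata block start (relative to the
   inode table) and the offset inside the decompressed block. */
static void inode_ref(uint64_t ref, uint32_t *block, uint32_t *offset)
{
    *block = (uint32_t)(ref >> 16);
    *offset = (uint32_t)(ref & 0xffff);
    assert(*offset < SQFS_METADATA_SIZE);
}
```

A header of 0xA000 means 0x2000 bytes follow, stored uncompressed. Reference 0x1f40001a points 0x1a bytes into the metadata block that starts 0x1f40 bytes after the start of the inode table.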
