Analysis of the Linux file read process
[Abstract]
[Body] Mounting the filesystem
[Body] How squashfs actually reads flash through the mtdblock block device
[Body] Metadata blocks: inode creation
[Body] Reading a file: squashfs_readpage
[Summary]
【Abstract】
This article uses the squashfs filesystem as an example to show how the Linux kernel reads a file: how a read travels from the filesystem layer through the mtd driver down to the nand driver, and how reading a file differs from reading flash directly.
In short: a direct flash read goes through the mtd driver to the flash driver and reads a given offset within a partition. Reading a file is more involved: the filesystem first uses direct flash reads to fetch metadata blocks (the flash regions that hold inode and directory information), uses the inode found there to locate the flash address of the file contents, and then reads those contents into a squashfs_cache.
Before reading this article you may want to skim the overview of Linux filesystem internals at http://blog.csdn.net/eleven_xiy/article/details/71249365
【Body】Mounting the filesystem
1 Building the filesystem image. As an example, the user partition occupies the flash range 0x1a00000-0x3200000 (i.e. 26 MB-50 MB):
1> mksquashfs ./user user.squashfs -comp xz -all-root -processors 1
Explanation:
./user: the directory whose contents become the partition; this directory is packed into the filesystem image.
user.squashfs: the resulting user filesystem image, which is burned into the user partition on flash.
-comp xz: compress with xz; every file read must then be decompressed. High compression ratio.
-all-root: all files in the user partition are owned by root. Optional.
-processors 1: how many processors mksquashfs may use while packing. Optional.
-b: not given here, so the default logical block size of 128 KB applies. This block size is read back when the partition is mounted and the superblock is initialized; see below.
2> mksquashfs ./user user.squashfs -comp xz -Xbcj arm -Xdict-size 512K -b 512K -processors 1
The options are as above, except:
-b: logical block size of 512 KB, read back during superblock initialization; see below.
-Xbcj arm / -Xdict-size 512K: xz compressor filter and dictionary-size options.
2 Superblock initialization: squashfs_fill_super
static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct squashfs_sb_info *msblk;
struct squashfs_super_block *sblk = NULL;
char b[BDEVNAME_SIZE];
struct inode *root;
long long root_inode;
unsigned short flags;
unsigned int fragments;
u64 lookup_table_start, xattr_id_table_start, next_table;
int err;
/* Most of the superblock's key state lives in squashfs_sb_info */
sb->s_fs_info = kzalloc(sizeof(*msblk), GFP_KERNEL);
if (sb->s_fs_info == NULL) {
ERROR("Failed to allocate squashfs_sb_info\n");
return -ENOMEM;
}
msblk = sb->s_fs_info;
/*
msblk->devblksize = 1024;msblk->devblksize_log2=10;
*/
msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
msblk->devblksize_log2 = ffz(~msblk->devblksize);
mutex_init(&msblk->meta_index_mutex);
msblk->bytes_used = sizeof(*sblk);
/*
Read the squashfs_super_block. This data comes straight from flash: the read starts at offset 0 within the partition and spans sizeof(struct squashfs_super_block) = 96 bytes.
In our example the flash holds the contents of user.squashfs, so sblk receives the start of that image: partition offset 0, length sizeof(struct squashfs_super_block).
*/
sblk = squashfs_read_table(sb, SQUASHFS_START, sizeof(*sblk));
if (IS_ERR(sblk)) {
ERROR("unable to read squashfs_super_block\n");
err = PTR_ERR(sblk);
sblk = NULL;
goto failed_mount;
}
err = -EINVAL;
/* Read the magic from the partition image (e.g. user.squashfs): magic = 0x73717368 */
sb->s_magic = le32_to_cpu(sblk->s_magic);
if (sb->s_magic != SQUASHFS_MAGIC) {
if (!silent)
ERROR("Can't find a SQUASHFS superblock on %s\n",
bdevname(sb->s_bdev, b));
goto failed_mount;
}
/* sblk->compression read from the image is 4 (xz), which selects the decompressor: squashfs_decompressor = squashfs_xz_comp_ops */
msblk->decompressor = supported_squashfs_filesystem(
le16_to_cpu(sblk->s_major),
le16_to_cpu(sblk->s_minor),
le16_to_cpu(sblk->compression));
if (msblk->decompressor == NULL)
goto failed_mount;
/* Bytes used within the partition, read from the image. Example: 23 MB partition, 20 MB used */
msblk->bytes_used = le64_to_cpu(sblk->bytes_used);
if (msblk->bytes_used < 0 || msblk->bytes_used >
i_size_read(sb->s_bdev->bd_inode))
goto failed_mount;
/* Logical block size (e.g. 512 KB), read from the image; set by the -b option of mksquashfs */
msblk->block_size = le32_to_cpu(sblk->block_size);
if (msblk->block_size > SQUASHFS_FILE_MAX_SIZE)
goto failed_mount;
/*
* Check the system page size is not larger than the filesystem
* block size (by default 128K). This is currently not supported.
*/
if (PAGE_CACHE_SIZE > msblk->block_size) {
ERROR("Page size > filesystem block size (%d). This is "
"currently not supported!\n", msblk->block_size);
goto failed_mount;
}
/* log2 of the logical block size (e.g. log2(512K) = 19), read from the image; used to validate block_size */
msblk->block_log = le16_to_cpu(sblk->block_log);
if (msblk->block_log > SQUASHFS_FILE_MAX_LOG)
goto failed_mount;
/* Check that block_size and block_log match */
if (msblk->block_size != (1 << msblk->block_log))
goto failed_mount;
/* Check the root inode for sanity */
root_inode = le64_to_cpu(sblk->root_inode);
if (SQUASHFS_INODE_OFFSET(root_inode) > SQUASHFS_METADATA_SIZE)
goto failed_mount;
/*
Read the table start offsets from the image. For the user partition:
sblk->inode_table_start = 0x1497002 -- partition offset where this superblock's inode metadata starts on flash; usage below.
sblk->directory_table_start = 0x1497ce2 -- partition offset where the directory metadata starts on flash; usage below.
sblk->fragment_table_start = 0x1498d72;
sblk->id_table_start = 0x1499036;
These are offsets within the partition; the real flash address adds the partition base, e.g. 0x1497002 maps to 0x1a00000 + 0x1497002.
*/
msblk->inode_table = le64_to_cpu(sblk->inode_table_start);
msblk->directory_table = le64_to_cpu(sblk->directory_table_start);
msblk->inodes = le32_to_cpu(sblk->inodes);
flags = le16_to_cpu(sblk->flags);
/* e.g. "Found valid superblock on mtdblock8" */
TRACE("Found valid superblock on %s\n", bdevname(sb->s_bdev, b));
/* e.g. "Inodes are compressed" */
TRACE("Inodes are %scompressed\n", SQUASHFS_UNCOMPRESSED_INODES(flags)? "un" : "");
/* e.g. "Data is compressed" */
TRACE("Data is %scompressed\n", SQUASHFS_UNCOMPRESSED_DATA(flags)? "un" : "");
TRACE("Filesystem size %lld bytes\n", msblk->bytes_used);
TRACE("Block size %d\n", msblk->block_size);
/*inodes 451*/
TRACE("Number of inodes %d\n", msblk->inodes);
/*fragments 21*/
TRACE("Number of fragments %d\n", le32_to_cpu(sblk->fragments));
/* ids 2*/
TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids));
TRACE("sblk->inode_table_start %llx\n", msblk->inode_table);
TRACE("sblk->directory_table_start %llx\n", msblk->directory_table);
TRACE("sblk->fragment_table_start %llx\n",(u64) le64_to_cpu(sblk->fragment_table_start));
TRACE("sblk->id_table_start %llx\n",(u64) le64_to_cpu(sblk->id_table_start));
sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_flags |= MS_RDONLY;
sb->s_op = &squashfs_super_ops;
err = -ENOMEM;
/* Create the "metadata" squashfs_cache; block_cache caches metadata blocks, which hold inode and directory information */
msblk->block_cache = squashfs_cache_init("metadata",SQUASHFS_CACHED_BLKS, SQUASHFS_METADATA_SIZE);
if (msblk->block_cache == NULL)
goto failed_mount;
/* Allocate read_page block */
msblk->read_page = squashfs_cache_init("data",
squashfs_max_decompressors(), msblk->block_size);
if (msblk->read_page == NULL) {
ERROR("Failed to allocate read_page block\n");
goto failed_mount;
}
msblk->stream = squashfs_decompressor_setup(sb, flags);
if (IS_ERR(msblk->stream)) {
err = PTR_ERR(msblk->stream);
msblk->stream = NULL;
goto failed_mount;
}
/* Handle xattrs */
sb->s_xattr = squashfs_xattr_handlers;
xattr_id_table_start = le64_to_cpu(sblk->xattr_id_table_start);
if (xattr_id_table_start == SQUASHFS_INVALID_BLK) {
next_table = msblk->bytes_used;
goto allocate_id_index_table;
}
/* Allocate and read xattr id lookup table */
msblk->xattr_id_table = squashfs_read_xattr_id_table(sb,
xattr_id_table_start, &msblk->xattr_table, &msblk->xattr_ids);
if (IS_ERR(msblk->xattr_id_table)) {
ERROR("unable to read xattr id index table\n");
err = PTR_ERR(msblk->xattr_id_table);
msblk->xattr_id_table = NULL;
if (err != -ENOTSUPP)
goto failed_mount;
}
next_table = msblk->xattr_table;
allocate_id_index_table:
/* Allocate and read id index table */
msblk->id_table = squashfs_read_id_index_table(sb,
le64_to_cpu(sblk->id_table_start), next_table,
le16_to_cpu(sblk->no_ids));
if (IS_ERR(msblk->id_table)) {
ERROR("unable to read id index table\n");
err = PTR_ERR(msblk->id_table);
msblk->id_table = NULL;
goto failed_mount;
}
next_table = le64_to_cpu(msblk->id_table[0]);
/* Handle inode lookup table */
lookup_table_start = le64_to_cpu(sblk->lookup_table_start);
if (lookup_table_start == SQUASHFS_INVALID_BLK)
goto handle_fragments;
/* Allocate and read inode lookup table */
msblk->inode_lookup_table = squashfs_read_inode_lookup_table(sb,
lookup_table_start, next_table, msblk->inodes);
if (IS_ERR(msblk->inode_lookup_table)) {
ERROR("unable to read inode lookup table\n");
err = PTR_ERR(msblk->inode_lookup_table);
msblk->inode_lookup_table = NULL;
goto failed_mount;
}
next_table = le64_to_cpu(msblk->inode_lookup_table[0]);
sb->s_export_op = &squashfs_export_ops;
handle_fragments:
fragments = le32_to_cpu(sblk->fragments);
if (fragments == 0)
goto check_directory_table;
msblk->fragment_cache = squashfs_cache_init("fragment",
SQUASHFS_CACHED_FRAGMENTS, msblk->block_size);
if (msblk->fragment_cache == NULL) {
err = -ENOMEM;
goto failed_mount;
}
/* Allocate and read fragment index table */
msblk->fragment_index = squashfs_read_fragment_index_table(sb,
le64_to_cpu(sblk->fragment_table_start), next_table, fragments);
if (IS_ERR(msblk->fragment_index)) {
ERROR("unable to read fragment index table\n");
err = PTR_ERR(msblk->fragment_index);
msblk->fragment_index = NULL;
goto failed_mount;
}
next_table = le64_to_cpu(msblk->fragment_index[0]);
check_directory_table:
/* Sanity check directory_table */
if (msblk->directory_table > next_table) {
err = -EINVAL;
goto failed_mount;
}
/* Sanity check inode_table */
if (msblk->inode_table >= msblk->directory_table) {
err = -EINVAL;
goto failed_mount;
}
/* Allocate the in-memory root inode */
root = new_inode(sb);
if (!root) {
err = -ENOMEM;
goto failed_mount;
}
/*
The flash location of this superblock's root inode was fixed when mksquashfs built the image and is stored in squashfs_super_block, which itself sits at offset 0x0 of the partition (see above).
root is the in-DRAM inode structure allocated by the kernel; its key fields are filled from flash by squashfs_read_inode.
root_inode >> 16, added to this superblock's inode_table_start, gives the flash address of the metadata block holding the root inode;
the low 16 bits of root_inode give the offset of the squashfs_inode within that (decompressed) metadata block.
*/
err = squashfs_read_inode(root, root_inode);
if (err) {
make_bad_inode(root);
iput(root);
goto failed_mount;
}
insert_inode_hash(root);
sb->s_root = d_make_root(root);
if (sb->s_root == NULL) {
ERROR("Root inode create failed\n");
err = -ENOMEM;
goto failed_mount;
}
TRACE("Leaving squashfs_fill_super\n");
kfree(sblk);
return 0;
failed_mount:
squashfs_cache_delete(msblk->block_cache);
squashfs_cache_delete(msblk->fragment_cache);
squashfs_cache_delete(msblk->read_page);
squashfs_decompressor_destroy(msblk);
kfree(msblk->inode_lookup_table);
kfree(msblk->fragment_index);
kfree(msblk->id_table);
kfree(msblk->xattr_id_table);
kfree(sb->s_fs_info);
sb->s_fs_info = NULL;
kfree(sblk);
return err;
}
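The root_inode decoding described in the comments above can be illustrated with a minimal user-space sketch (the helper names are mine, not kernel APIs): the upper bits of the 64-bit inode reference select the metadata block relative to inode_table_start, the low 16 bits give the offset inside the decompressed block, and devblksize_log2 is simply the log2 that ffz(~devblksize) computes in the kernel.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the squashfs inode reference layout:
 * bits 16..63: byte offset of the metadata block within inode_table_start
 * bits  0..15: byte offset inside that block after decompression */
static uint64_t inode_ref_block(uint64_t ref)  { return ref >> 16; }
static uint32_t inode_ref_offset(uint64_t ref) { return ref & 0xffff; }

/* Equivalent of devblksize_log2 = ffz(~devblksize) for a power-of-two
 * block size: 1024 -> 10. */
static int devblksize_log2(unsigned int devblksize)
{
    int log2 = 0;
    while ((1u << log2) < devblksize)
        log2++;
    return log2;
}
```

So a reference of 0x12340008 means: decompress the metadata block starting at inode_table_start + 0x1234, then take the inode at byte 8 of the result.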
Creating the squashfs_cache for metadata: here entries = 8 and block_size = 8192; used later by squashfs_cache_get.
struct squashfs_cache *squashfs_cache_init(char *name, int entries,int block_size)
{
int i, j;
struct squashfs_cache *cache = kzalloc(sizeof(*cache), GFP_KERNEL);
if (cache == NULL) {
ERROR("Failed to allocate %s cache\n", name);
return NULL;
}
cache->entry = kcalloc(entries, sizeof(*(cache->entry)), GFP_KERNEL);
if (cache->entry == NULL) {
ERROR("Failed to allocate %s cache\n", name);
goto cleanup;
}
cache->curr_blk = 0;
cache->next_blk = 0;
cache->unused = entries; /* the cache has 8 entries, each with an 8192-byte data area */
cache->entries = entries;
cache->block_size = block_size; //8192
cache->pages = block_size >> PAGE_CACHE_SHIFT;
cache->pages = cache->pages ? cache->pages : 1;
cache->name = name;
cache->num_waiters = 0;
spin_lock_init(&cache->lock);
init_waitqueue_head(&cache->wait_queue);
for (i = 0; i < entries; i++) { //entries=8
struct squashfs_cache_entry *entry = &cache->entry[i];
init_waitqueue_head(&cache->entry[i].wait_queue);
entry->cache = cache;
entry->block = SQUASHFS_INVALID_BLK;
entry->data = kcalloc(cache->pages, sizeof(void *), GFP_KERNEL);
if (entry->data == NULL) {
ERROR("Failed to allocate %s cache entry\n", name);
goto cleanup;
}
for (j = 0; j < cache->pages; j++) {
/* for metadata each entry spans 2 pages; entry->data holds the data read from flash */
entry->data[j] = kmalloc(PAGE_CACHE_SIZE, GFP_KERNEL);
if (entry->data[j] == NULL) {
ERROR("Failed to allocate %s buffer\n", name);
goto cleanup;
}
}
entry->actor = squashfs_page_actor_init(entry->data,
cache->pages, 0);
if (entry->actor == NULL) {
ERROR("Failed to allocate %s cache entry\n", name);
goto cleanup;
}
}
return cache;
cleanup:
squashfs_cache_delete(cache);
return NULL;
}
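The per-entry sizing above can be checked with a small sketch, assuming 4 KiB pages (PAGE_CACHE_SHIFT = 12; the helper is illustrative, not a kernel function): the metadata cache gets 8 entries of 8192 bytes, i.e. 2 pages each, while a data cache built with a 512 KiB block size needs 128 pages per entry.

```c
/* Number of pages backing one squashfs_cache entry, as computed in
 * squashfs_cache_init(): block_size >> PAGE_CACHE_SHIFT, with a minimum
 * of one page for small block sizes. */
static int cache_entry_pages(int block_size, int page_shift)
{
    int pages = block_size >> page_shift;
    return pages ? pages : 1;
}
```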
2.1 Reading the superblock information of a flash partition, i.e. filling in super_block->s_fs_info from the on-flash squashfs_super_block.
squashfs_fill_super->squashfs_read_table()
/*
block: a byte offset within the flash partition, not a block number. The actual flash address is block plus the partition base.
*/
void *squashfs_read_table(struct super_block *sb, u64 block, int length)
{
int pages = (length + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
int i, res;
void *table, *buffer, **data;
struct squashfs_page_actor *actor;
table = buffer = kmalloc(length, GFP_KERNEL);
if (table == NULL)
return ERR_PTR(-ENOMEM);
data = kcalloc(pages, sizeof(void *), GFP_KERNEL);
if (data == NULL) {
res = -ENOMEM;
goto failed;
}
actor = squashfs_page_actor_init(data, pages, length);
if (actor == NULL) {
res = -ENOMEM;
goto failed2;
}
for (i = 0; i < pages; i++, buffer += PAGE_CACHE_SIZE)
data[i] = buffer;
res = squashfs_read_data(sb, block, length |
SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, actor);
kfree(data);
kfree(actor);
if (res < 0)
goto failed;
return table;
failed2:
kfree(data);
failed:
kfree(table);
return ERR_PTR(res);
}
【Body】How squashfs actually reads flash through the mtdblock block device
The read proceeds in the following steps:
1> Get a buffer_head: squashfs_read_data->sb_getblk();
submit a read request: squashfs_read_data->ll_rw_block->submit_bh->submit_bio->blk_queue_bio();
buffer_head->b_data receives the data read from flash;
2> Service the request, i.e. perform the actual read in the driver:
mtd_blktrans_work->do_blktrans_request->mtdblock_tr->mtdblock_readsect->do_cached_read->(mtd_read->mtd->_read=part_read)
->nand_read()->nand_do_read_ops()->(chip->cmdfunc);
1 Reading flash data: squashfs_read_data
squashfs_read_data is the filesystem layer's central interface to flash. Its length argument selects the access type:
1> length != 0: read a data block.
2> length == 0: read a metadata block.
In both cases the index argument is an offset within the flash partition, e.g. 0x1497ce2.
Data blocks vs. metadata blocks:
In common: both are flash addresses read through the mtdblock block device.
Differences:
1> A data block can refer to any flash address. In principle a metadata block could also be fetched as a data block, but metadata blocks have additional structure, so the kernel provides a dedicated interface, squashfs_read_metadata. Internally squashfs_read_metadata still ends up calling squashfs_read_data to fetch the underlying data; see below.
2> A metadata block is also a flash region, but it holds inode and directory information maintained by the filesystem; its contents are produced when the image is built with mksquashfs (or when new files are created on writable filesystems).
3> Examples:
Typical data-block use, mounting: squashfs_fill_super->squashfs_read_table reads the squashfs_super_block (inode_table start address etc.) from partition offset 0x0. All later metadata-block reads depend on it.
Typical metadata-block use, opening a file: do_sys_open->do_last->lookup_real->squashfs_lookup->squashfs_read_metadata reads the metadata block to obtain the file's inode, from which the file contents are then located. More on this below.
In short: accessing flash directly through mtdblock means reading data blocks; accessing flash indirectly through files means reading metadata blocks, and reading a metadata block itself involves reading data blocks.
This section covers the data-block path; metadata reads are covered later.
Here index is a byte offset within the flash partition, not a block number; the real flash address adds the partition base.
After the data is read from flash it still has to be decompressed; see the companion post on decompressing xz data.
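Before reading the function, it helps to see how the length word encodes both size and compression state. Below is a user-space sketch modeled on the SQUASHFS_COMPRESSED_*_BLOCK macros from squashfs_fs.h (my macro name drops the prefix): the low bits carry the on-flash size, and bit 24 set means the block is stored uncompressed, which is why squashfs_read_table() ORs SQUASHFS_COMPRESSED_BIT_BLOCK into the length of its raw table reads.

```c
#include <stdbool.h>

#define COMPRESSED_BIT_BLOCK (1 << 24)  /* mirrors SQUASHFS_COMPRESSED_BIT_BLOCK */

/* Bit 24 clear -> the data block is compressed on flash. */
static bool block_is_compressed(int length)
{
    return !(length & COMPRESSED_BIT_BLOCK);
}

/* On-flash size of the block, regardless of compression state. */
static int block_size_on_flash(int length)
{
    return length & ~COMPRESSED_BIT_BLOCK;
}
```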
int squashfs_read_data(struct super_block *sb, u64 index, int length,u64 *next_index, struct squashfs_page_actor *output)
{
/* assigned in squashfs_fill_super above */
struct squashfs_sb_info *msblk = sb->s_fs_info;
struct buffer_head **bh;
/* index is the partition-relative flash offset to read; devblksize = 1024 bytes */
int offset = index & ((1 << msblk->devblksize_log2) - 1);
/*
msblk->devblksize_log2 = 10; cur_index is the logical block containing the partition-relative offset;
*/
u64 cur_index = index >> msblk->devblksize_log2;
int bytes, compressed, b = 0, k = 0, avail, i;
bh = kcalloc(((output->length + msblk->devblksize - 1)
>> msblk->devblksize_log2) + 1, sizeof(*bh), GFP_KERNEL);
if (bh == NULL)
return -ENOMEM;
if (length) {
/*
* Data block: read block contents.
*/
bytes = -offset;
compressed = SQUASHFS_COMPRESSED_BLOCK(length);
length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length);
if (next_index)
*next_index = index + length;
TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
index, compressed ? "" : "un", length, output->length);
if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
goto read_failure;
/*
Read logical blocks in a loop, devblksize = 1024 bytes at a time, starting with the logical block containing the partition-relative offset.
*/
for (b = 0; bytes < length; b++, cur_index++) {
/* get a buffer_head */
bh[b] = sb_getblk(sb, cur_index);
if (bh[b] == NULL)
goto block_release;
/*devblksize=1024*/
bytes += msblk->devblksize;
}
/*
1 Submit a read request: ll_rw_block->submit_bh->submit_bio->blk_queue_bio();
2 Service the request, performing the actual driver read:
mtd_blktrans_work->do_blktrans_request->mtdblock_tr->mtdblock_readsect->do_cached_read->(mtd_read->mtd->_read=part_read)
->nand_read()->nand_do_read_ops()->(chip->cmdfunc)
*/
ll_rw_block(READ, b, bh);
} else {
/*
* Metadata block: read block contents;
*/
if ((index + 2) > msblk->bytes_used)
goto read_failure;
bh[0] = get_block_length(sb, &cur_index, &offset, &length);
if (bh[0] == NULL)
goto read_failure;
b = 1;
bytes = msblk->devblksize - offset;
compressed = SQUASHFS_COMPRESSED(length);
length = SQUASHFS_COMPRESSED_SIZE(length);
if (next_index)
*next_index = index + length + 2;
TRACE("Block @ 0x%llx, %scompressed size %d\n", index,compressed ? "" : "un", length);
if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
goto block_release;
for (; bytes < length; b++) {
bh[b] = sb_getblk(sb, ++cur_index);
if (bh[b] == NULL)
goto block_release;
bytes += msblk->devblksize;
}
ll_rw_block(READ, b - 1, bh + 1);
}
for (i = 0; i < b; i++) {
wait_on_buffer(bh[i]);
if (!buffer_uptodate(bh[i]))
goto block_release;
}
if (compressed) {
/*
Decompress the data read from flash.
msblk: superblock info (squashfs_sb_info);
bh: buffer_heads; bh->b_data holds the data read from flash;
b: number of logical blocks covered by the read;
offset: offset of the flash address within its logical block (1024 bytes per block), offset = index & 0x3ff;
length: number of bytes read from flash
*/
length = squashfs_decompress(msblk, bh, b, offset, length,output);
if (length < 0)
goto read_failure;
} else {
/*
* Block is uncompressed.
*/
int in, pg_offset = 0;
void *data = squashfs_first_page(output);
for (bytes = length; k < b; k++) {
in = min(bytes, msblk->devblksize - offset);
bytes -= in;
while (in) {
if (pg_offset == PAGE_CACHE_SIZE) {
data = squashfs_next_page(output);
pg_offset = 0;
}
avail = min_t(int, in, PAGE_CACHE_SIZE -
pg_offset);
memcpy(data + pg_offset, bh[k]->b_data + offset,
avail);
in -= avail;
pg_offset += avail;
offset += avail;
}
offset = 0;
put_bh(bh[k]);
}
squashfs_finish_page(output);
}
kfree(bh);
return length;
block_release:
for (; k < b; k++)
put_bh(bh[k]);
read_failure:
ERROR("squashfs_read_data failed to read block 0x%llx\n",
(unsigned long long) index);
kfree(bh);
return -EIO;
}
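The index arithmetic at the top of squashfs_read_data() can be reproduced in user space (helpers are illustrative; devblksize_log2 = 10 for the 1 KiB device block size used throughout the text):

```c
#include <stdint.h>

/* Logical block containing a partition-relative byte offset, as in
 * cur_index = index >> msblk->devblksize_log2. */
static uint64_t to_cur_index(uint64_t index, int devblksize_log2)
{
    return index >> devblksize_log2;
}

/* Offset of that byte within its logical block, as in
 * offset = index & ((1 << msblk->devblksize_log2) - 1). */
static int to_blk_offset(uint64_t index, int devblksize_log2)
{
    return (int)(index & ((1u << devblksize_log2) - 1));
}
```

For the example offset 0x23856 used later in the text this yields logical block 0x8e at in-block offset 0x56.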
1.1 Getting a buffer_head: lookup
(1) Getting a buffer_head first looks for an existing buffer_head for the block (the logical block containing the partition-relative offset); this happens in __find_get_block. If none is found, one is created in __getblk_slow. The lookup relies on structures set up during creation, so you may prefer to read the creation path (1.2) first.
(2) Why does flash access need a buffer_head?
Taking a read as an example: a partition-relative offset maps to exactly one logical block within the partition, and the buffer_head is the cache object for that block.
When a buffer_head is created:
alloc_page_buffers() allocates a page and the buffer_head structures;
init_page_buffers() sets buffer_head->b_blocknr to the logical block being read;
set_bh_page() points buffer_head->b_data into a slice of the page allocated by alloc_page_buffers; bh->b_data receives the data read from flash.
link_dev_buffers()->attach_page_buffers() stores the buffer_head list head in page->private;
(3) squashfs_read_data()->sb_getblk():
(3)squashfs_read_data()->sb_getblk():
static inline struct buffer_head *sb_getblk(struct super_block *sb, sector_t block)
{
/* block is the logical block containing the partition-relative offset (block = offset / (devblksize = 1024));
one logical block is 1024 bytes; sb->s_blocksize is also 1024 and is copied into bh->b_size
*/
return __getblk_gfp(sb->s_bdev, block, sb->s_blocksize, __GFP_MOVABLE);
}
squashfs_read_data()->sb_getblk()->__getblk_gfp():
struct buffer_head *__getblk_gfp(struct block_device *bdev, sector_t block,unsigned size, gfp_t gfp)
{
/* block = partition-relative offset / (devblksize = 1024) */
struct buffer_head *bh = __find_get_block(bdev, block, size);
might_sleep();
/* If the lookup fails, see __find_get_block and __find_get_block_slow for why */
if (bh == NULL)
bh = __getblk_slow(bdev, block, size, gfp);
return bh;
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__find_get_block():
struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
{
/*
If the logical block has never been accessed, bh is NULL here; go straight to __find_get_block_slow.
If it has been accessed before, lookup_bh_lru may find it.
It helps to first read __find_get_block_slow (the miss case) and then come back to lookup_bh_lru.
*/
struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
if (bh == NULL) {
/* __find_get_block_slow will mark the page accessed */
bh = __find_get_block_slow(bdev, block);
if (bh)
bh_lru_install(bh);
} else
touch_buffer(bh);
return bh;
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__find_get_block->__find_get_block_slow()
static struct buffer_head *__find_get_block_slow(struct block_device *bdev, sector_t block)
{
struct inode *bd_inode = bdev->bd_inode;
struct address_space *bd_mapping = bd_inode->i_mapping;
struct buffer_head *ret = NULL;
pgoff_t index;
struct buffer_head *bh;
struct buffer_head *head;
struct page *page;
int all_mapped = 1;
index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);
/* look up the page in the page cache */
page = find_get_page_flags(bd_mapping, index, FGP_ACCESSED);
if (!page)
goto out;
spin_lock(&bd_mapping->private_lock);
/*
If page->private is NULL the page carries no buffer_heads, so none can be returned: goto out_unlock.
page->private holds the head of the page's buffer_head list.
On the first read of a block (block = partition-relative offset / (devblksize = 1024)) no buffer_head exists for it yet; one is created and its address stored in page->private. See the other branch of __getblk_gfp, __getblk_slow().
*/
if (!page_has_buffers(page))
goto out_unlock;
head = page_buffers(page);
bh = head;
/*
Walk the buffer_head list stored in page->private;
a buffer_head matches if its bh->b_blocknr equals the block we want to read.
(See __getblk_slow() for how this list is created and attached on first access.)
*/
do {
if (!buffer_mapped(bh))
all_mapped = 0;
else if (bh->b_blocknr == block) {
ret = bh;
get_bh(bh);
goto out_unlock;
}
bh = bh->b_this_page;
} while (bh != head);
/* we might be here because some of the buffers on this page are
* not mapped. This is due to various races between
* file io on the block device and getblk. It gets dealt with
* elsewhere, don't buffer_error if we had some unmapped buffers
*/
if (all_mapped) {
char b[BDEVNAME_SIZE];
printk("__find_get_block_slow() failed. ""block=%llu, b_blocknr=%llu\n",(unsigned long long)block,(unsigned long long)bh->b_blocknr);
printk("b_state=0x%08lx, b_size=%zu\n",bh->b_state, bh->b_size);
printk("device %s blocksize: %d\n", bdevname(bdev, b),1 << bd_inode->i_blkbits);
}
out_unlock:
spin_unlock(&bd_mapping->private_lock);
page_cache_release(page);
out:
return ret;
}
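The page-cache indexing above, index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits), can be sketched in user space, assuming 4 KiB pages and 1 KiB logical blocks so that four consecutive blocks share one page (the helper name is mine):

```c
/* Page-cache index of the page that caches a given logical block.
 * Each page covers 2^(page_shift - blkbits) consecutive blocks, so the
 * page index is the block number divided by blocks-per-page. */
static unsigned long block_to_page_index(unsigned long block,
                                         int page_shift, int blkbits)
{
    return block >> (page_shift - blkbits);
}
```

For logical block 0x8e from the running example, the buffer_head lives on page index 0x23 of the block device's address_space.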
1.2 Getting a buffer_head: creation
If __find_get_block found no buffer_head for the block (the logical block containing the partition-relative offset), create one.
During creation:
alloc_page_buffers() allocates a page and the buffer_head structures;
init_page_buffers() sets buffer_head->b_blocknr to the logical block being read;
set_bh_page() points buffer_head->b_data into a slice of the allocated page; bh->b_data receives the data read from flash.
link_dev_buffers()->attach_page_buffers() stores the buffer_head list head in page->private;
Analysis: squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow():
struct buffer_head *__getblk_slow(struct block_device *bdev, sector_t block,unsigned size, gfp_t gfp)
{
/* Size must be multiple of hard sectorsize */
if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
(size < 512 || size > PAGE_SIZE))) {
printk(KERN_ERR "getblk(): invalid block size %d requested\n",
size);
printk(KERN_ERR "logical block size: %d\n",
bdev_logical_block_size(bdev));
dump_stack();
return NULL;
}
for (;;) {
struct buffer_head *bh;
int ret;
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
/* create the buffer_head */
ret = grow_buffers(bdev, block, size, gfp);
if (ret < 0)
return NULL;
if (ret == 0)
free_more_memory();
}
}
Allocate the buffer_head and page, and initialize page->private, bh->b_blocknr, bh->b_data, etc.
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page():
static int grow_dev_page(struct block_device *bdev, sector_t block,pgoff_t index, int size, int sizebits, gfp_t gfp)
{
struct inode *inode = bdev->bd_inode;
struct page *page;
struct buffer_head *bh;
sector_t end_block;
int ret = 0; /* Will call free_more_memory() */
gfp_t gfp_mask;
gfp_mask = (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS) | gfp;
/*
* XXX: __getblk_slow() can not really deal with failure and
* will endlessly loop on improvised global reclaim. Prefer
* looping in the allocator rather than here, at least that
* code knows what it's doing.
*/
gfp_mask |= __GFP_NOFAIL;
/* find or allocate the page, with page->mapping = inode->i_mapping; this page becomes bh->b_page (set in set_bh_page) */
page = find_or_create_page(inode->i_mapping, index, gfp_mask);
if (!page)
return ret;
BUG_ON(!PageLocked(page));
/* If page_has_buffers(page) is true, page->private already holds a buffer_head list */
if (page_has_buffers(page)) {
bh = page_buffers(page);
if (bh->b_size == size) {
end_block = init_page_buffers(page, bdev,
(sector_t)index << sizebits,
size);
goto done;
}
if (!try_to_free_buffers(page))
goto failed;
}
/*
* Reaching here means page->private holds no buffer_head for this block, so allocate one; block is the logical block containing the partition-relative offset
*/
bh = alloc_page_buffers(page, size, 0);
if (!bh)
goto failed;
/*
* Link the page to the buffers and initialise them. Take the
* lock to be atomic wrt __find_get_block(), which does not
* run under the page lock.
*/
spin_lock(&inode->i_mapping->private_lock);
link_dev_buffers(page, bh);
end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
size);
spin_unlock(&inode->i_mapping->private_lock);
done:
ret = (block < end_block) ? 1 : -ENXIO;
failed:
unlock_page(page);
page_cache_release(page);
return ret;
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page()->alloc_page_buffers():
struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,int retry)
{
struct buffer_head *bh, *head;
long offset;
try_again:
head = NULL;
offset = PAGE_SIZE;
while ((offset -= size) >= 0) {
/* create a new buffer_head; init_page_buffers fills it in */
bh = alloc_buffer_head(GFP_NOFS);
if (!bh)
goto no_grow;
bh->b_this_page = head;
bh->b_blocknr = -1;
head = bh;
/*bh->b_size=sb->s_blocksize=1024*/
bh->b_size = size;
/* Link the buffer to its page */
/* bind this bh to its page; the data read from flash lands in this page */
set_bh_page(bh, page, offset);
}
return head;
/*
* In case anything failed, we just free everything we got.
*/
no_grow:
if (head) {
do {
bh = head;
head = head->b_this_page;
free_buffer_head(bh);
} while (head);
}
/*
* Return failure for non-async IO requests. Async IO requests
* are not allowed to fail, so we have to wait until buffer heads
* become available. But we don't want tasks sleeping with
* partially complete buffers, so all were released above.
*/
if (!retry)
return NULL;
/* We're _really_ low on memory. Now we just
* wait for old buffer heads to become free due to
* finishing IO. Since this is an async request and
* the reserve list is empty, we're sure there are
* async buffer heads in use.
*/
free_more_memory();
goto try_again;
}
alloc_page_buffers()->set_bh_page(): initializes bh->b_data;
void set_bh_page(struct buffer_head *bh,struct page *page, unsigned long offset)
{
bh->b_page = page;
BUG_ON(offset >= PAGE_SIZE);
/* bh->b_data receives the data read from flash */
if (PageHighMem(page))
/*
* This catches illegal uses and preserves the offset:
*/
bh->b_data = (char *)(0 + offset);
else
bh->b_data = page_address(page) + offset;
}
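The offset loop in alloc_page_buffers() walks downward from PAGE_SIZE, so with 1024-byte buffers on a 4096-byte page the buffer_heads are created at offsets 3072, 2048, 1024 and 0, and the returned head is the last one created, at offset 0. A sketch of just that arithmetic (the helper is illustrative, not a kernel function):

```c
/* Record the b_data offsets alloc_page_buffers() would hand to
 * set_bh_page(), in creation order (descending from the top of the
 * page). Returns the number of buffers that fit. */
static int page_split_offsets(int page_size, int size, int out[], int max)
{
    int n = 0;
    long offset = page_size;
    while ((offset -= size) >= 0 && n < max)
        out[n++] = (int)offset;
    return n;
}
```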
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page()->link_dev_buffers():
static inline void link_dev_buffers(struct page *page, struct buffer_head *head)
{
struct buffer_head *bh, *tail;
bh = head;
do {
tail = bh;
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
attach_page_buffers(page, head);
}
link_dev_buffers()->attach_page_buffers(): initializes page->private
static inline void attach_page_buffers(struct page *page,struct buffer_head *head)
{
page_cache_get(page);
SetPagePrivate(page);
/* set page->private; the lookup path's page_has_buffers(page) depends on this */
set_page_private(page, (unsigned long)head);
}
squashfs_read_data()->sb_getblk()->__getblk_gfp()->__getblk_slow()->grow_buffers->grow_dev_page()->init_page_buffers():
static sector_t init_page_buffers(struct page *page, struct block_device *bdev,sector_t block, int size)
{
struct buffer_head *head = page_buffers(page);
struct buffer_head *bh = head;
int uptodate = PageUptodate(page);
sector_t end_block = blkdev_max_block(I_BDEV(bdev->bd_inode), size);
do {
if (!buffer_mapped(bh)) {
init_buffer(bh, NULL, NULL);
bh->b_bdev = bdev;
/* lookups match buffer_heads by b_blocknr: the logical block (1024 bytes each) containing the partition-relative offset */
bh->b_blocknr = block;
if (uptodate)
set_buffer_uptodate(bh);
if (block < end_block)
set_buffer_mapped(bh);
}
block++;
bh = bh->b_this_page;
} while (bh != head);
/*
* Caller needs to validate requested block against end of device.
*/
return end_block;
}
1.3 Submitting the read request
squashfs_read_data()->ll_rw_block()->submit_bh()->_submit_bh():
Allocates a bio built from the buffer_head created above; squashfs_read_data() submits one request per buffer_head (1024 bytes each).
int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
{
struct bio *bio;
int ret = 0;
BUG_ON(!buffer_locked(bh));
BUG_ON(!buffer_mapped(bh));
BUG_ON(!bh->b_end_io);
BUG_ON(buffer_delay(bh));
BUG_ON(buffer_unwritten(bh));
/*
* Only clear out a write error when rewriting
*/
if (test_set_buffer_req(bh) && (rw & WRITE))
clear_buffer_write_io_error(bh);
/*
* from here on down, it's all bio -- do the initial mapping,
* submit_bio -> generic_make_request may further map this bio around
*/
bio = bio_alloc(GFP_NOIO, 1);
/* b_blocknr is the logical block containing the partition-relative offset; bh->b_size = 1024, so bi_sector counts 512-byte sectors */
bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
/* from the creation path above, page->private holds the buffer_head; bh->b_size = sb->s_blocksize = 1024; bh->b_data lies within bh->b_page */
bio->bi_io_vec[0].bv_page = bh->b_page;
bio->bi_io_vec[0].bv_len = bh->b_size;
bio->bi_io_vec[0].bv_offset = bh_offset(bh);
bio->bi_vcnt = 1;
bio->bi_iter.bi_size = bh->b_size;
bio->bi_end_io = end_bio_bh_io_sync;
/* bh lets completion code find the DRAM buffer that received the flash data */
bio->bi_private = bh;
bio->bi_flags |= bio_flags;
/* Take care of bh's that straddle the end of the device */
guard_bio_eod(rw, bio);
if (buffer_meta(bh))
rw |= REQ_META;
if (buffer_prio(bh))
rw |= REQ_PRIO;
bio_get(bio);
/*
Example: reading offset 0x23856 in a partition based at flash address 0x300000.
do_blktrans_request handles the request submitted here:
the logical block is bh->b_blocknr = 0x23856 >> 10 = 0x8e; bi_sector = 0x8e << 1 = 0x11c;
do_cached_read then reads from logical byte address bi_sector << 9 = 0x23800,
and part_read reads flash address 0x300000 + 0x23800.
*/
submit_bio(rw, bio);
if (bio_flagged(bio, BIO_EOPNOTSUPP))
ret = -EOPNOTSUPP;
bio_put(bio);
return ret;
}
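The sector conversion in _submit_bh() is worth pinning down with the example from the comment above (a user-space sketch; 512-byte sectors are assumed, hence the >> 9):

```c
#include <stdint.h>

/* bio sector for a buffer_head: the logical block number scaled to
 * 512-byte sectors, as in
 * bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9). */
static uint64_t bh_to_sector(uint64_t b_blocknr, unsigned int b_size)
{
    return b_blocknr * (b_size >> 9);
}
```

Shifting the sector back left by 9 recovers the byte address of the logical block, 0x23800 in the example, which is what mtdblock_readsect later passes to do_cached_read.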
1.4 do_blktrans_request services the requests submitted via _submit_bh->submit_bio.
static struct mtd_blktrans_ops mtdblock_tr = {
.name = "mtdblock",
.major = MTD_BLOCK_MAJOR,
.part_bits = 0,
.blksize = 512,
.open = mtdblock_open,
.flush = mtdblock_flush,
.release = mtdblock_release,
.readsect = mtdblock_readsect,
.writesect = mtdblock_writesect,
.add_mtd = mtdblock_add_mtd,
.remove_dev = mtdblock_remove_dev,
.owner = THIS_MODULE,
};
static int do_blktrans_request(struct mtd_blktrans_ops *tr,struct mtd_blktrans_dev *dev,struct request *req)
{
unsigned long block, nsect;
char *buf;
/* for mtdblock_tr: tr->blkshift = 9; tr->blksize = 512 */
block = blk_rq_pos(req) << 9 >> tr->blkshift;
nsect = blk_rq_cur_bytes(req) >> tr->blkshift;
buf = bio_data(req->bio);
if (req->cmd_type != REQ_TYPE_FS)
return -EIO;
if (req->cmd_flags & REQ_FLUSH)
return tr->flush(dev);
if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
get_capacity(req->rq_disk))
return -EIO;
if (req->cmd_flags & REQ_DISCARD)
return tr->discard(dev, block, nsect);
switch(rq_data_dir(req)) {
case READ:
/*
block here is bi_sector from _submit_bh: bh->b_blocknr * (bh->b_size >> 9);
mtdblock_readsect converts it with block << 9 into a partition-relative byte address.
Example: reading offset 0x23856 in a partition based at flash address 0x300000:
bh->b_blocknr = 0x23856 / 1024 = 0x8e; bi_sector = 0x11c;
do_cached_read reads from byte address bi_sector << 9 = 0x23800,
and part_read reads flash address 0x300000 + 0x23800.
*/
for (; nsect > 0; nsect--, block++, buf += tr->blksize)
if (tr->readsect(dev, block, buf))
return -EIO;
rq_flush_dcache_pages(req);
return 0;
case WRITE:
if (!tr->writesect)
return -EIO;
rq_flush_dcache_pages(req);
for (; nsect > 0; nsect--, block++, buf += tr->blksize)
if (tr->writesect(dev, block, buf))
return -EIO;
return 0;
default:
printk(KERN_NOTICE "Unknown request %u\n", rq_data_dir(req));
return -EIO;
}
}
2 The mtdblock flash read path
For comparison, when the ubi filesystem opens a file it reads inode information from flash along this path:
do_filp_open->do_last->lookup_real->ubifs_lookup->ubifs_tnc_lookup_nm->ubifs_tnc_locate
->ubifs_lookup_level0->ubifs_load_znode->ubifs_read_node->ubifs_io_read->mtd_read->part_read->nand_read
ubifs_dump_node: dumps the inode information;
2.1 do_blktrans_request()->mtdblock_readsect():
static int mtdblock_readsect(struct mtd_blktrans_dev *dev,unsigned long block, char *buf)
{
struct mtdblk_dev *mtdblk = container_of(dev, struct mtdblk_dev, mbd);
/* block<<9 is the byte address within the partition; note that address 0 here means the start of the partition, not physical flash address 0 */
return do_cached_read(mtdblk, block<<9, 512, buf);
}
2.2 do_blktrans_request()->mtdblock_readsect()->do_cached_read():
static int do_cached_read (struct mtdblk_dev *mtdblk, unsigned long pos,int len, char *buf)
{
struct mtd_info *mtd = mtdblk->mbd.mtd;
/* mtdblk->cache_size = 0x20000, i.e. one erase-block-sized cache */
unsigned int sect_size = mtdblk->cache_size;
size_t retlen;
int ret;
pr_debug("mtdblock: read on \"%s\" at 0x%lx, size 0x%x\n",
mtd->name, pos, len);
if (!sect_size)
return mtd_read(mtd, pos, len, &retlen, buf);
while (len > 0) {
unsigned long sect_start = (pos/sect_size)*sect_size;
unsigned int offset = pos - sect_start;
unsigned int size = sect_size - offset;
if (size > len)
size = len;
/*
* Check if the requested data is already cached
* Read the requested amount of data from our internal cache if it
* contains what we want, otherwise we read the data directly
* from flash.
*/
if (mtdblk->cache_state != STATE_EMPTY &&
mtdblk->cache_offset == sect_start) {
memcpy (buf, mtdblk->cache_data + offset, size);
} else {
ret = mtd_read(mtd, pos, size, &retlen, buf);
if (ret)
return ret;
if (retlen != size)
return -EIO;
}
buf += size;
pos += size;
len -= size;
}
return 0;
}
2.3 mtdblock_readsect->do_cached_read->mtd_read->part_read():
static int part_read(struct mtd_info *mtd, loff_t from, size_t len,size_t *retlen, u_char *buf)
{
struct mtd_part *part = PART(mtd);
struct mtd_ecc_stats stats;
int res;
stats = part->master->ecc_stats;
/* nand_read: this is where the real flash access happens; the partition-relative offset must be added to the partition start to form the actual flash address */
res = part->master->_read(part->master, from + part->offset, len,
retlen, buf);
if (unlikely(mtd_is_eccerr(res)))
mtd->ecc_stats.failed +=
part->master->ecc_stats.failed - stats.failed;
else
mtd->ecc_stats.corrected +=
part->master->ecc_stats.corrected - stats.corrected;
return res;
}
2.4 part_read->nand_read->nand_do_read_ops->(chip->cmdfunc=amb_nand_cmdfunc)
static int nand_read(struct mtd_info *mtd, loff_t from, size_t len,
size_t *retlen, uint8_t *buf)
{
struct mtd_oob_ops ops;
int ret;
nand_get_device(mtd, FL_READING);
ops.len = len;
ops.datbuf = buf;
ops.oobbuf = NULL;
ops.mode = MTD_OPS_PLACE_OOB;
ret = nand_do_read_ops(mtd, from, &ops);
*retlen = ops.retlen;
nand_release_device(mtd);
return ret;
}
nand_read->nand_do_read_ops->
/* from is the flash address to read, offset from flash address 0x0; it is converted to pages for reading, one page being 2KB.
Example: if the log partition spans <0x7200000,0x7600000> and from=0x7300000, the page to read is 0x7300000/2048.
*/
static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
struct mtd_oob_ops *ops)
{
int chipnr, page, realpage, col, bytes, aligned, oob_required;
struct nand_chip *chip = mtd->priv;
int ret = 0;
uint32_t readlen = ops->len;
uint32_t oobreadlen = ops->ooblen;
uint32_t max_oobsize = ops->mode == MTD_OPS_AUTO_OOB ?
mtd->oobavail : mtd->oobsize;
uint8_t *bufpoi, *oob, *buf;
int use_bufpoi;
unsigned int max_bitflips = 0;
int retry_mode = 0;
bool ecc_fail = false;
chipnr = (int)(from >> chip->chip_shift);
chip->select_chip(mtd, chipnr);
/* realpage is the number of the flash page to read; one flash page is 2KB */
realpage = (int)(from >> chip->page_shift);
page = realpage & chip->pagemask;
col = (int)(from & (mtd->writesize - 1));
buf = ops->datbuf;
oob = ops->oobbuf;
oob_required = oob ? 1 : 0;
while (1) {
unsigned int ecc_failures = mtd->ecc_stats.failed;
bytes = min(mtd->writesize - col, readlen);
aligned = (bytes == mtd->writesize);
if (!aligned)
use_bufpoi = 1;
else if (chip->options & NAND_USE_BOUNCE_BUFFER)
use_bufpoi = !virt_addr_valid(buf);
else
use_bufpoi = 0;
/* Is the current page in the buffer? */
/* Checks whether page realpage has already been read. If not, chip->cmdfunc triggers the read from the flash driver:
(chip->cmdfunc=amb_nand_cmdfunc)->nand_amb_read_data->nand_amb_request.
If the page was read before, the data is copied directly from chip->buffers->databuf. */
if (realpage != chip->pagebuf || oob) {
bufpoi = use_bufpoi ? chip->buffers->databuf : buf;
if (use_bufpoi && aligned)
pr_debug("%s: using read bounce buffer for buf@%p\n", __func__, buf);
read_retry:
/*
chip->cmdfunc actually triggers the data read from the flash driver: (chip->cmdfunc=amb_nand_cmdfunc)->nand_amb_read_data->nand_amb_request;
nand_amb_request waits for the flash access to complete.
*/
chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
/*
* Now read the page into the buffer. Absent an error,
* the read methods return max bitflips per ecc step.
*/
if (unlikely(ops->mode == MTD_OPS_RAW))
ret = chip->ecc.read_page_raw(mtd, chip, bufpoi,oob_required,page);
else if (!aligned && NAND_HAS_SUBPAGE_READ(chip) &&!oob)
ret = chip->ecc.read_subpage(mtd, chip,col, bytes, bufpoi,page);
else
/* nand_read_page_hwecc->amb_nand_read_buf copies the data straight from the driver's DMA buffer; the actual read was triggered in cmdfunc */
ret = chip->ecc.read_page(mtd, chip, bufpoi,oob_required, page);
if (ret < 0) {
if (use_bufpoi)
/* Invalidate page cache */
chip->pagebuf = -1;
break;
}
max_bitflips = max_t(unsigned int, max_bitflips, ret);
/* Transfer not aligned data */
if (use_bufpoi) {
if (!NAND_HAS_SUBPAGE_READ(chip) && !oob &&
!(mtd->ecc_stats.failed - ecc_failures) &&
(ops->mode != MTD_OPS_RAW)) {
chip->pagebuf = realpage;
chip->pagebuf_bitflips = ret;
} else {
/* Invalidate page cache */
chip->pagebuf = -1;
}
memcpy(buf, chip->buffers->databuf + col, bytes);
}
if (unlikely(oob)) {
int toread = min(oobreadlen, max_oobsize);
if (toread) {
oob = nand_transfer_oob(chip,
oob, ops, toread);
oobreadlen -= toread;
}
}
/* wait until the flash is ready for the next page access */
if (chip->options & NAND_NEED_READRDY) {
/* Apply delay or wait for ready/busy pin */
if (!chip->dev_ready)
udelay(chip->chip_delay);
else
nand_wait_ready(mtd);
}
if (mtd->ecc_stats.failed - ecc_failures) {
if (retry_mode + 1 < chip->read_retries) {
retry_mode++;
ret = nand_setup_read_retry(mtd,
retry_mode);
if (ret < 0)
break;
/* Reset failures; retry */
mtd->ecc_stats.failed = ecc_failures;
goto read_retry;
} else {
/* No more retry modes; real failure */
ecc_fail = true;
}
}
buf += bytes;
} else {
memcpy(buf, chip->buffers->databuf + col, bytes);
buf += bytes;
max_bitflips = max_t(unsigned int, max_bitflips,
chip->pagebuf_bitflips);
}
readlen -= bytes;
/* Reset to retry mode 0 */
if (retry_mode) {
ret = nand_setup_read_retry(mtd, 0);
if (ret < 0)
break;
retry_mode = 0;
}
if (!readlen)
break;
/* For subsequent reads align to page boundary */
col = 0;
/* Increment page address */
realpage++;
page = realpage & chip->pagemask;
/* Check, if we cross a chip boundary */
if (!page) {
chipnr++;
chip->select_chip(mtd, -1);
chip->select_chip(mtd, chipnr);
}
}
chip->select_chip(mtd, -1);
ops->retlen = ops->len - (size_t) readlen;
if (oob)
ops->oobretlen = ops->ooblen - oobreadlen;
if (ret < 0)
return ret;
if (ecc_fail)
return -EBADMSG;
return max_bitflips;
}
【Main Text】Metadata blocks and inode creation
Metadata blocks were mentioned above; this section uses the file-open path as an example to explain what metadata blocks are and how they are used.
Accessing flash indirectly through a file usually involves metadata blocks: the metadata block is read first to obtain the file's inode information, and the inode then locates the file's contents on flash.
1> For the file-open path itself, see:
linux文件系统权限管理: http://blog.csdn.net/eleven_xiy/article/details/70210828
2> To read a regular file, its inode information must be read first; the inode describes where the file's contents live.
1 squashfs_read_inode reads the on-flash inode information using the direct block-device read path described above.
The inode information stored on flash:
struct squashfs_base_inode {
__le16 inode_type;
__le16 mode;
__le16 uid;
__le16 guid;
__le32 mtime;
__le32 inode_number;
};
union squashfs_inode {
struct squashfs_base_inode base;
struct squashfs_dev_inode dev;
struct squashfs_ldev_inode ldev;
struct squashfs_symlink_inode symlink;
struct squashfs_reg_inode reg;
struct squashfs_lreg_inode lreg;
struct squashfs_dir_inode dir;
struct squashfs_ldir_inode ldir;
struct squashfs_ipc_inode ipc;
struct squashfs_lipc_inode lipc;
};
When reading inode information from an on-flash metadata block, the squashfs_base_inode and the type-specific part of the squashfs_inode occupy a contiguous flash range, both inside the metadata block; the squashfs_base_inode is read first, then the full squashfs_inode:
/*
Parameters:
inode is the in-DRAM inode structure allocated by the kernel; its key fields are filled in from flash by squashfs_read_inode.
ino encodes where the inode information is stored on flash, i.e. its location within the metadata blocks:
ino >> 16 plus the superblock's inode_table gives the flash address of the metadata block holding the inode;
the low 16 bits of ino give the offset of the squashfs_base_inode within that block.
Regular inodes and the root inode obtain ino differently; see below.
*/
int squashfs_read_inode(struct inode *inode, long long ino)
{
struct super_block *sb = inode->i_sb;
struct squashfs_sb_info *msblk = sb->s_fs_info;
/* compute, from ino, the address of the metadata block holding the inode, relative to inode_table */
u64 block = SQUASHFS_INODE_BLK(ino) + msblk->inode_table;
/* compute, from ino, the offset of the squashfs_base_inode within that metadata block */
int err, type, offset = SQUASHFS_INODE_OFFSET(ino);
union squashfs_inode squashfs_ino;
struct squashfs_base_inode *sqshb_ino = &squashfs_ino.base;
int xattr_id = SQUASHFS_INVALID_XATTR;
TRACE("Entered squashfs_read_inode\n");
/*
* Read inode base common to all inode types.
*/
/* read the squashfs_base_inode information from flash */
err = squashfs_read_metadata(sb, sqshb_ino, &block,
&offset, sizeof(*sqshb_ino));
if (err < 0)
goto failed_read;
/* fill in the in-DRAM inode structure from the inode information read off flash */
err = squashfs_new_inode(sb, inode, sqshb_ino);
if (err)
goto failed_read;
block = SQUASHFS_INODE_BLK(ino) + msblk->inode_table;
offset = SQUASHFS_INODE_OFFSET(ino);
type = le16_to_cpu(sqshb_ino->inode_type);
switch (type) {
case SQUASHFS_REG_TYPE: {
unsigned int frag_offset, frag;
int frag_size;
u64 frag_blk;
struct squashfs_reg_inode *sqsh_ino = &squashfs_ino.reg;
/* read the type-specific squashfs_inode information from flash */
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
frag = le32_to_cpu(sqsh_ino->fragment);
if (frag != SQUASHFS_INVALID_FRAG) {
frag_offset = le32_to_cpu(sqsh_ino->offset);
frag_size = squashfs_frag_lookup(sb, frag, &frag_blk);
if (frag_size < 0) {
err = frag_size;
goto failed_read;
}
} else {
frag_blk = SQUASHFS_INVALID_BLK;
frag_size = 0;
frag_offset = 0;
}
set_nlink(inode, 1);
inode->i_size = le32_to_cpu(sqsh_ino->file_size);
inode->i_fop = &generic_ro_fops;
inode->i_mode |= S_IFREG;
inode->i_blocks = ((inode->i_size - 1) >> 9) + 1;
squashfs_i(inode)->fragment_block = frag_blk;
squashfs_i(inode)->fragment_size = frag_size;
squashfs_i(inode)->fragment_offset = frag_offset;
squashfs_i(inode)->start = le32_to_cpu(sqsh_ino->start_block);
squashfs_i(inode)->block_list_start = block;
squashfs_i(inode)->offset = offset;
inode->i_data.a_ops = &squashfs_aops;
TRACE("File inode %x:%x, start_block %llx, block_list_start "
"%llx, offset %x\n", SQUASHFS_INODE_BLK(ino),
offset, squashfs_i(inode)->start, block, offset);
break;
}
case SQUASHFS_LREG_TYPE: {
unsigned int frag_offset, frag;
int frag_size;
u64 frag_blk;
struct squashfs_lreg_inode *sqsh_ino = &squashfs_ino.lreg;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
frag = le32_to_cpu(sqsh_ino->fragment);
if (frag != SQUASHFS_INVALID_FRAG) {
frag_offset = le32_to_cpu(sqsh_ino->offset);
frag_size = squashfs_frag_lookup(sb, frag, &frag_blk);
if (frag_size < 0) {
err = frag_size;
goto failed_read;
}
} else {
frag_blk = SQUASHFS_INVALID_BLK;
frag_size = 0;
frag_offset = 0;
}
xattr_id = le32_to_cpu(sqsh_ino->xattr);
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le64_to_cpu(sqsh_ino->file_size);
inode->i_op = &squashfs_inode_ops;
inode->i_fop = &generic_ro_fops;
inode->i_mode |= S_IFREG;
inode->i_blocks = (inode->i_size -
le64_to_cpu(sqsh_ino->sparse) + 511) >> 9;
squashfs_i(inode)->fragment_block = frag_blk;
squashfs_i(inode)->fragment_size = frag_size;
squashfs_i(inode)->fragment_offset = frag_offset;
squashfs_i(inode)->start = le64_to_cpu(sqsh_ino->start_block);
squashfs_i(inode)->block_list_start = block;
squashfs_i(inode)->offset = offset;
inode->i_data.a_ops = &squashfs_aops;
TRACE("File inode %x:%x, start_block %llx, block_list_start "
"%llx, offset %x\n", SQUASHFS_INODE_BLK(ino),
offset, squashfs_i(inode)->start, block, offset);
break;
}
case SQUASHFS_DIR_TYPE: {
struct squashfs_dir_inode *sqsh_ino = &squashfs_ino.dir;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le16_to_cpu(sqsh_ino->file_size);
inode->i_op = &squashfs_dir_inode_ops;
inode->i_fop = &squashfs_dir_ops;
inode->i_mode |= S_IFDIR;
squashfs_i(inode)->start = le32_to_cpu(sqsh_ino->start_block);
squashfs_i(inode)->offset = le16_to_cpu(sqsh_ino->offset);
squashfs_i(inode)->dir_idx_cnt = 0;
squashfs_i(inode)->parent = le32_to_cpu(sqsh_ino->parent_inode);
TRACE("Directory inode %x:%x, start_block %llx, offset %x\n",
SQUASHFS_INODE_BLK(ino), offset,
squashfs_i(inode)->start,
le16_to_cpu(sqsh_ino->offset));
break;
}
case SQUASHFS_LDIR_TYPE: {
struct squashfs_ldir_inode *sqsh_ino = &squashfs_ino.ldir;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
xattr_id = le32_to_cpu(sqsh_ino->xattr);
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le32_to_cpu(sqsh_ino->file_size);
inode->i_op = &squashfs_dir_inode_ops;
inode->i_fop = &squashfs_dir_ops;
inode->i_mode |= S_IFDIR;
squashfs_i(inode)->start = le32_to_cpu(sqsh_ino->start_block);
squashfs_i(inode)->offset = le16_to_cpu(sqsh_ino->offset);
squashfs_i(inode)->dir_idx_start = block;
squashfs_i(inode)->dir_idx_offset = offset;
squashfs_i(inode)->dir_idx_cnt = le16_to_cpu(sqsh_ino->i_count);
squashfs_i(inode)->parent = le32_to_cpu(sqsh_ino->parent_inode);
TRACE("Long directory inode %x:%x, start_block %llx, offset "
"%x\n", SQUASHFS_INODE_BLK(ino), offset,
squashfs_i(inode)->start,
le16_to_cpu(sqsh_ino->offset));
break;
}
case SQUASHFS_SYMLINK_TYPE:
case SQUASHFS_LSYMLINK_TYPE: {
struct squashfs_symlink_inode *sqsh_ino = &squashfs_ino.symlink;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le32_to_cpu(sqsh_ino->symlink_size);
inode->i_op = &squashfs_symlink_inode_ops;
inode->i_data.a_ops = &squashfs_symlink_aops;
inode->i_mode |= S_IFLNK;
squashfs_i(inode)->start = block;
squashfs_i(inode)->offset = offset;
if (type == SQUASHFS_LSYMLINK_TYPE) {
__le32 xattr;
err = squashfs_read_metadata(sb, NULL, &block,
&offset, inode->i_size);
if (err < 0)
goto failed_read;
err = squashfs_read_metadata(sb, &xattr, &block,
&offset, sizeof(xattr));
if (err < 0)
goto failed_read;
xattr_id = le32_to_cpu(xattr);
}
TRACE("Symbolic link inode %x:%x, start_block %llx, offset "
"%x\n", SQUASHFS_INODE_BLK(ino), offset,
block, offset);
break;
}
case SQUASHFS_BLKDEV_TYPE:
case SQUASHFS_CHRDEV_TYPE: {
struct squashfs_dev_inode *sqsh_ino = &squashfs_ino.dev;
unsigned int rdev;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
if (type == SQUASHFS_CHRDEV_TYPE)
inode->i_mode |= S_IFCHR;
else
inode->i_mode |= S_IFBLK;
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
rdev = le32_to_cpu(sqsh_ino->rdev);
init_special_inode(inode, inode->i_mode, new_decode_dev(rdev));
TRACE("Device inode %x:%x, rdev %x\n",
SQUASHFS_INODE_BLK(ino), offset, rdev);
break;
}
case SQUASHFS_LBLKDEV_TYPE:
case SQUASHFS_LCHRDEV_TYPE: {
struct squashfs_ldev_inode *sqsh_ino = &squashfs_ino.ldev;
unsigned int rdev;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
if (type == SQUASHFS_LCHRDEV_TYPE)
inode->i_mode |= S_IFCHR;
else
inode->i_mode |= S_IFBLK;
xattr_id = le32_to_cpu(sqsh_ino->xattr);
inode->i_op = &squashfs_inode_ops;
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
rdev = le32_to_cpu(sqsh_ino->rdev);
init_special_inode(inode, inode->i_mode, new_decode_dev(rdev));
TRACE("Device inode %x:%x, rdev %x\n",
SQUASHFS_INODE_BLK(ino), offset, rdev);
break;
}
case SQUASHFS_FIFO_TYPE:
case SQUASHFS_SOCKET_TYPE: {
struct squashfs_ipc_inode *sqsh_ino = &squashfs_ino.ipc;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
if (type == SQUASHFS_FIFO_TYPE)
inode->i_mode |= S_IFIFO;
else
inode->i_mode |= S_IFSOCK;
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
init_special_inode(inode, inode->i_mode, 0);
break;
}
case SQUASHFS_LFIFO_TYPE:
case SQUASHFS_LSOCKET_TYPE: {
struct squashfs_lipc_inode *sqsh_ino = &squashfs_ino.lipc;
err = squashfs_read_metadata(sb, sqsh_ino, &block, &offset,
sizeof(*sqsh_ino));
if (err < 0)
goto failed_read;
if (type == SQUASHFS_LFIFO_TYPE)
inode->i_mode |= S_IFIFO;
else
inode->i_mode |= S_IFSOCK;
xattr_id = le32_to_cpu(sqsh_ino->xattr);
inode->i_op = &squashfs_inode_ops;
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
init_special_inode(inode, inode->i_mode, 0);
break;
}
default:
ERROR("Unknown inode type %d in squashfs_iget!\n", type);
return -EINVAL;
}
if (xattr_id != SQUASHFS_INVALID_XATTR && msblk->xattr_id_table) {
err = squashfs_xattr_lookup(sb, xattr_id,
&squashfs_i(inode)->xattr_count,
&squashfs_i(inode)->xattr_size,
&squashfs_i(inode)->xattr);
if (err < 0)
goto failed_read;
inode->i_blocks += ((squashfs_i(inode)->xattr_size - 1) >> 9)
+ 1;
} else
squashfs_i(inode)->xattr_count = 0;
return 0;
failed_read:
ERROR("Unable to read inode 0x%llx\n", ino);
return err;
}
2 Obtaining dir and inode information from metadata blocks.
The flash location of the root inode comes from the on-flash superblock, as seen in squashfs_fill_super;
for how the flash location of an ordinary inode is found, see squashfs_lookup().
For example, when opening a file: do_sys_open->do_last->lookup_real->squashfs_lookup.
On flash, squashfs_dir_header, squashfs_dir_entry and squashfs_dir_entry->name occupy contiguous flash ranges, all within metadata blocks. When looking up a file such as /mnt/test, the kernel first reads the squashfs_dir_header of the parent directory mnt, which gives the number of directory and regular files under mnt. It then iterates over every entry in mnt, reading each one's squashfs_dir_entry and squashfs_dir_entry->name from flash; if the name matches the one being looked up, the inode location given by squashfs_dir_header->start_block and squashfs_dir_entry->offset is used to read the squashfs_inode information.
struct squashfs_dir_entry {
__le16 offset;
__le16 inode_number;
__le16 type;
__le16 size;
char name[0];
};
struct squashfs_dir_header {
__le32 count;
__le32 start_block;
__le32 inode_number;
};
squashfs_lookup shows how a regular file's inode information is located and read from flash:
/*
Parameters: dir is the inode of the directory containing the file; dentry is the file's dentry.
*/
static struct dentry *squashfs_lookup(struct inode *dir, struct dentry *dentry,
unsigned int flags)
{
const unsigned char *name = dentry->d_name.name;
int len = dentry->d_name.len;
struct inode *inode = NULL;
struct squashfs_sb_info *msblk = dir->i_sb->s_fs_info;
struct squashfs_dir_header dirh;
struct squashfs_dir_entry *dire;
/*
From the parent directory's inode, compute the address of its squashfs_dir_header relative to directory_table.
Because the root inode is initialised at mount time, the on-flash location of any directory can be found level by level.
*/
u64 block = squashfs_i(dir)->start + msblk->directory_table;
/* offset of the squashfs_dir_header within the directory information, from the parent directory's inode */
int offset = squashfs_i(dir)->offset;
int err, length;
unsigned int dir_count, size;
TRACE("Entered squashfs_lookup [%llx:%x]\n", block, offset);
dire = kmalloc(sizeof(*dire) + SQUASHFS_NAME_LEN + 1, GFP_KERNEL);
if (dire == NULL) {
ERROR("Failed to allocate squashfs_dir_entry\n");
return ERR_PTR(-ENOMEM);
}
if (len > SQUASHFS_NAME_LEN) {
err = -ENAMETOOLONG;
goto failed;
}
length = get_dir_index_using_name(dir->i_sb, &block, &offset,
squashfs_i(dir)->dir_idx_start,
squashfs_i(dir)->dir_idx_offset,
squashfs_i(dir)->dir_idx_cnt, name, len);
while (length < i_size_read(dir)) {
/*
* Read directory header.
*/
/* read the parent directory's squashfs_dir_header from flash */
err = squashfs_read_metadata(dir->i_sb, &dirh, &block,
&offset, sizeof(dirh));
if (err < 0)
goto read_failure;
length += sizeof(dirh);
dir_count = le32_to_cpu(dirh.count) + 1;
if (dir_count > SQUASHFS_DIR_COUNT)
goto data_error;
/* Iterate over all entries (directories and regular files) in this directory. Note that squashfs_dir_header, squashfs_dir_entry and squashfs_dir_entry->name are read back to back,
which shows that they occupy contiguous flash ranges within the metadata block.
*/
while (dir_count--) {
/*
* Read directory entry.
*/
/* read the entry's squashfs_dir_entry from flash; it records, among other things, where the file name is stored */
err = squashfs_read_metadata(dir->i_sb, dire, &block,
&offset, sizeof(*dire));
if (err < 0)
goto read_failure;
size = le16_to_cpu(dire->size) + 1;
/* size should never be larger than SQUASHFS_NAME_LEN */
if (size > SQUASHFS_NAME_LEN)
goto data_error;
/* read the entry's file name from flash */
err = squashfs_read_metadata(dir->i_sb, dire->name,
&block, &offset, size);
if (err < 0)
goto read_failure;
length += sizeof(*dire) + size;
if (name[0] < dire->name[0])
goto exit_lookup;
/* compare the name read from flash with the one squashfs_lookup is searching for, to decide whether this is the wanted inode */
if (len == size && !strncmp(name, dire->name, len)) {
unsigned int blk, off, ino_num;
long long ino;
blk = le32_to_cpu(dirh.start_block);
off = le16_to_cpu(dire->offset);
ino_num = le32_to_cpu(dirh.inode_number) +
(short) le16_to_cpu(dire->inode_number);
ino = SQUASHFS_MKINODE(blk, off);
TRACE("calling squashfs_iget for directory "
"entry %s, inode %x:%x, %d\n", name,
blk, off, ino_num);
/*
If the name was found on flash, create an inode for the file;
note that the inode information is also read from flash, via squashfs_iget->squashfs_read_inode.
*/
inode = squashfs_iget(dir->i_sb, ino, ino_num);
goto exit_lookup;
}
}
}
exit_lookup:
kfree(dire);
return d_splice_alias(inode, dentry);
data_error:
err = -EIO;
read_failure:
ERROR("Unable to read directory block [%llx:%x]\n",
squashfs_i(dir)->start + msblk->directory_table,
squashfs_i(dir)->offset);
failed:
kfree(dire);
return ERR_PTR(err);
}
squashfs_lookup->get_dir_index_using_name:
static int get_dir_index_using_name(struct super_block *sb,
u64 *next_block, int *next_offset, u64 index_start,
int index_offset, int i_count, const char *name,
int len)
{
struct squashfs_sb_info *msblk = sb->s_fs_info;
int i, length = 0, err;
unsigned int size;
struct squashfs_dir_index *index;
char *str;
TRACE("Entered get_dir_index_using_name, i_count %d\n", i_count);
index = kmalloc(sizeof(*index) + SQUASHFS_NAME_LEN * 2 + 2, GFP_KERNEL);
if (index == NULL) {
ERROR("Failed to allocate squashfs_dir_index\n");
goto out;
}
str = &index->name[SQUASHFS_NAME_LEN + 1];
strncpy(str, name, len);
str[len] = '\0';
/* i_count is 0 when the directory has no index */
for (i = 0; i < i_count; i++) {
err = squashfs_read_metadata(sb, index, &index_start,
&index_offset, sizeof(*index));
if (err < 0)
break;
size = le32_to_cpu(index->size) + 1;
if (size > SQUASHFS_NAME_LEN)
break;
err = squashfs_read_metadata(sb, index->name, &index_start,
&index_offset, size);
if (err < 0)
break;
index->name[size] = '\0';
if (strcmp(index->name, str) > 0)
break;
length = le32_to_cpu(index->index);
*next_block = le32_to_cpu(index->start_block) +
msblk->directory_table;
}
*next_offset = (length + *next_offset) % SQUASHFS_METADATA_SIZE;
kfree(index);
out:
/*
* Return index (f_pos) of the looked up metadata block. Translate
* from internal f_pos to external f_pos which is offset by 3 because
* we invent "." and ".." entries which are not actually stored in the
* directory.
*/
return length + 3;
}
Directory and inode information lives in metadata blocks; the interface for reading metadata blocks is squashfs_read_metadata:
int squashfs_read_metadata(struct super_block *sb, void *buffer,
u64 *block, int *offset, int length)
{
struct squashfs_sb_info *msblk = sb->s_fs_info;
int bytes, res = length;
struct squashfs_cache_entry *entry;
TRACE("Entered squashfs_read_metadata [%llx:%x]\n", *block, *offset);
while (length) {
/*
Read directory or inode information from the on-flash metadata block. block_cache is the cache created
for metadata blocks in squashfs_fill_super; it caches metadata-block contents.
*/
entry = squashfs_cache_get(sb, msblk->block_cache, *block, 0);
if (entry->error) {
res = entry->error;
goto error;
} else if (*offset >= entry->length) {
res = -EIO;
goto error;
}
/* copy the metadata-block data cached in entry->data into the caller's buffer */
bytes = squashfs_copy_data(buffer, entry, *offset, length);
if (buffer)
buffer += bytes;
length -= bytes;
*offset += bytes;
if (*offset == entry->length) {
/* the block address advances; squashfs_cache_get uses it to decide whether the block is already
in a squashfs_cache_entry, and reads it from flash into squashfs_cache_entry->data if not
*/
*block = entry->next_index;
*offset = 0;
}
squashfs_cache_put(entry);
}
return res;
error:
squashfs_cache_put(entry);
return res;
}
squashfs_cache_get reads flash data into a cache; file reads hit the cache first and fall back to flash on a miss. data, metadata and fragment blocks are all read through this interface.
The squashfs_cache is small and serves only the read path; unlike the page cache, which caches file contents directly, data cached here must still be copied into the page cache.
The squashfs caches are:
1 block_cache (caches metadata blocks, which mainly hold inode and directory information).
2 read_page (caches data blocks, i.e. file contents).
3 fragment_cache (caches fragment blocks).
Example: squashfs_read_metadata()->squashfs_cache_get() fetches a metadata block from block_cache; if it is not cached, the metadata block is read from flash into block_cache.
struct squashfs_cache_entry *squashfs_cache_get(struct super_block *sb,struct squashfs_cache *cache, u64 block, int length)
{
int i, n;
struct squashfs_cache_entry *entry;
spin_lock(&cache->lock);
while (1) {
/* the metadata cache has 8 entries: cache->entries = 8 */
for (i = cache->curr_blk, n = 0; n < cache->entries; n++) {
/* compare block numbers to find a matching cache entry */
if (cache->entry[i].block == block) {
cache->curr_blk = i;
break;
}
i = (i + 1) % cache->entries;
}
/*
n == cache->entries means the metadata block is not in block_cache and must be read from flash into the cache;
see squashfs_fill_super above for how the metadata cache is created.
*/
if (n == cache->entries) {
/*
* Block not in cache: if all cache entries are used,
* go to sleep waiting for one to become available.
*/
if (cache->unused == 0) {
cache->num_waiters++;
spin_unlock(&cache->lock);
wait_event(cache->wait_queue, cache->unused);
spin_lock(&cache->lock);
cache->num_waiters--;
continue;
}
/*
* At least one unused cache entry. A simple
* round-robin strategy is used to choose the entry to
* be evicted from the cache.
*/
i = cache->next_blk;
for (n = 0; n < cache->entries; n++) {
if (cache->entry[i].refcount == 0)
break;
i = (i + 1) % cache->entries;
}
cache->next_blk = (i + 1) % cache->entries;
entry = &cache->entry[i];
/*
* Initialise chosen cache entry and fill it in from
* disk; the count of unused cache entries drops by one.
*/
cache->unused--;
entry->block = block;
entry->refcount = 1;
entry->pending = 1;
entry->num_waiters = 0;
entry->error = 0;
spin_unlock(&cache->lock);
/* read the flash data into the cache entry, so later reads can be served from the cache without touching flash */
entry->length = squashfs_read_data(sb, block, length,
&entry->next_index, entry->actor);
spin_lock(&cache->lock);
if (entry->length < 0)
entry->error = entry->length;
entry->pending = 0;
/*
* While filling this entry one or more other processes
* have looked it up in the cache, and have slept
* waiting for it to become available.
*/
if (entry->num_waiters) {
spin_unlock(&cache->lock);
wake_up_all(&entry->wait_queue);
} else
spin_unlock(&cache->lock);
goto out;
}
/*
* Block already in cache. Increment refcount so it doesn't
* get reused until we're finished with it, if it was
* previously unused there's one less cache entry available
* for reuse.
*/
entry = &cache->entry[i];
if (entry->refcount == 0)
cache->unused--;
entry->refcount++;
/*
* If the entry is currently being filled in by another process
* go to sleep waiting for it to become available.
*/
if (entry->pending) {
entry->num_waiters++;
spin_unlock(&cache->lock);
wait_event(entry->wait_queue, !entry->pending);
} else
spin_unlock(&cache->lock);
goto out;
}
out:
TRACE("Got %s %d, start block %lld, refcount %d, error %d\n",
cache->name, i, entry->block, entry->refcount, entry->error);
if (entry->error)
ERROR("Unable to read %s cache entry [%llx]\n", cache->name,
block);
return entry;
}
To summarise inode creation, taking /mnt/test as the file being looked up:
1> the kernel allocates the inode structure; its key fields must be read from flash.
2> it first reads the squashfs_dir_header of the parent directory mnt, which tells how many entries mnt contains; each regular file or directory has one squashfs_dir_entry.
3> it then iterates over all of mnt's entries, reading each one's squashfs_dir_entry and file name from flash.
4> if a name stored in the metadata block matches the one being looked up, the inode is created; creating it again reads inode information from flash.
5> if the metadata block has already been read into squashfs_cache, flash is not touched again; see squashfs_cache_get.
【Main Text】Reading a file: squashfs_readpage
For read(), the syscall path is generic_file_aio_read->do_generic_file_read();
for mmap, the page-fault path is handle_pte_fault->do_nonlinear_fault->__do_fault->filemap_fault.
For the mmap path, see: linux内存回收机制 http://blog.csdn.net/eleven_xiy/article/details/75195490;
for do_generic_file_read, see: linux文件系统实现原理简述 http://write.blog.csdn.net/postedit/71249365.
The regular-file read path is generic_file_aio_read->do_generic_file_read->squashfs_readpage().
squashfs_readpage covers all three kinds of flash reads in squashfs:
First, metadata blocks: squashfs_read_metadata->squashfs_cache_get fetches from the block_cache, which caches metadata-block data (inode and directory information). On a miss, squashfs_cache_get->squashfs_read_data reads the metadata block from flash into block_cache (into squashfs_cache_entry->data), so later reads skip the flash.
Second, data blocks: squashfs_get_datablock->squashfs_cache_get fetches from the read_page cache, which caches data-block contents (file data). On a miss, squashfs_cache_get->squashfs_read_data reads the data block from flash into read_page, so later reads skip the flash.
Third, fragment blocks: squashfs_get_fragment->squashfs_cache_get fetches data-block contents from fragment_cache.
static int squashfs_readpage(struct file *file, struct page *page)
{
struct inode *inode = page->mapping->host;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
int index = page->index >> (msblk->block_log - PAGE_CACHE_SHIFT);
/* file size in logical blocks; one block is 512KB here (set via mksquashfs -b) */
int file_end = i_size_read(inode) >> msblk->block_log;
int res;
void *pageaddr;
/* mask over the pages occupied by one logical block.
E.g. with a 128KB logical block (the mksquashfs default), mask = 2^5 - 1 = 31,
i.e. one logical block spans 32 pages, so squashfs_readpage handles 32 pages of the same file at a time.
*/
int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1;
/* index (declared above) is page->index >> 5: which logical block (of 32 pages each) the requested file offset falls in */
/* first page index covered by this logical block, e.g. pages [160-191]: start_index = 160 */
int start_index = page->index & ~mask;
/* last page index covered by this logical block, e.g. pages [160-191]: end_index = 191 */
int end_index = start_index | mask;
TRACE("Entered squashfs_readpage, page index %lx, start block %llx\n",
page->index, squashfs_i(inode)->start);
if (page->index >= ((i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
PAGE_CACHE_SHIFT))
goto out;
if (index < file_end || squashfs_i(inode)->fragment_block ==
SQUASHFS_INVALID_BLK) {
u64 block = 0;
/*
Read the metadata block from block_cache, initialised in squashfs_fill_super; block_cache is 8*8192 bytes
(8 squashfs_cache->entries of 8192 bytes each); read via squashfs_read_metadata()->squashfs_cache_get().
*/
int bsize = read_blocklist(inode, index, &block);
if (bsize < 0)
goto error_out;
/* squashfs_readpage_block->squashfs_get_datablock()->squashfs_cache_get():
read the data block from the read_page cache, initialised in squashfs_fill_super; read_page is 1*512KB
(1 squashfs_cache->entries of 512KB each); read via squashfs_get_datablock()->squashfs_cache_get().
squashfs_get_datablock() returns the squashfs_cache_entry whose ->data holds the data read from flash.
*/
if (bsize == 0)
res = squashfs_readpage_sparse(page, index, file_end);
else
res = squashfs_readpage_block(page, block, bsize);
} else
/* squashfs_readpage_fragment->squashfs_get_fragment()->squashfs_cache_get():
read the data from fragment_cache, initialised in squashfs_fill_super; fragment_cache is 3*512KB
(3 squashfs_cache->entries of 512KB each); read via squashfs_get_fragment()->squashfs_cache_get().
squashfs_get_fragment() returns the squashfs_cache_entry whose ->data holds the data read from flash.
*/
res = squashfs_readpage_fragment(page);
if (!res)
return 0;
error_out:
SetPageError(page);
out:
pageaddr = kmap_atomic(page);
memset(pageaddr, 0, PAGE_CACHE_SIZE);
kunmap_atomic(pageaddr);
flush_dcache_page(page);
if (!PageError(page))
SetPageUptodate(page);
unlock_page(page);
return 0;
}
【Summary】
1> Operating on a file requires the superblock's squashfs_sb_info = sb->s_fs_info, which is initialised in squashfs_fill_super.
At mount time the superblock is initialised with:
super_block->s_blocksize = 0x400 = 1024; // logical block size
squashfs_sb_info->devblksize = 1024; // set in squashfs_fill_super;
squashfs_sb_info->block_size = 512K; // set by mksquashfs when the file system is built.
2> The file-system-level interface for reading flash is squashfs_cache_get, which serves metadata blocks, data blocks and fragment blocks (see above). It first tries the squashfs_cache; on a miss it falls back to squashfs_read_data to read from flash.
3> squashfs submits read/write requests via squashfs_read_data->ll_rw_block->submit_bh->submit_bio->generic_make_request=blk_queue_bio. Note that at the file-system level (e.g. in squashfs_read_data) every block address is a partition-relative offset; it becomes a real flash address only in part_read.
4> A dedicated kernel task services the submitted requests:
mtd_blktrans_work->do_blktrans_request->mtdblock_readsect->mtd_read->part_read->nand_read->nand_do_read_ops->nand_read_page_raw->
(driver-level read, e.g. hifmc100_read_buf); do_blktrans_request() handles the requests submitted via submit_bh and performs the flash reads/writes they describe.
5> Actually reading a flash address goes through the corresponding buffer_head, whose bh->b_data records the DRAM address the data is read into; see the analysis above.
6> inode and directory information is stored in metadata blocks, a flash region dedicated to file metadata such as inodes. squashfs_read_metadata is the dedicated interface for metadata blocks and is how regular files obtain their inode information; squashfs_read_data reads ordinary data blocks and can fetch the contents of a given flash address directly.