AlexJ's Computer Science Journal

alexandru.juncu.ro

File magic

Let’s have some fun with files and filesystems. For practical purposes, I am going to use a simple file that I will treat as a pseudo-block device, so think of ‘vdisk’ as a generic partition on a disk. I am going to format it with an ext4 filesystem and mount it through a loop device. The virtual disk will have a size of 1 GB.

[root@ptah tmp]# dd if=/dev/zero of=/tmp/vdisk1 bs=1MB count=1000
1000+0 records in
1000+0 records out
1000000000 bytes (1.0 GB) copied, 0.772459 s, 1.3 GB/s

[root@ptah tmp]# mkfs.ext4 /tmp/vdisk1
mke2fs 1.42.9 (28-Dec-2013)
/tmp/vdisk1 is not a block special device.
Proceed anyway? (y,n) y
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
61056 inodes, 244140 blocks
12207 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=251658240
8 block groups
32768 blocks per group, 32768 fragments per group
7632 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

[root@ptah tmp]# mkdir /tmp/vdisk.ext4
[root@ptah tmp]# mount /tmp/vdisk1 /tmp/vdisk.ext4

[root@ptah tmp]# cd /tmp/vdisk.ext4/
[root@ptah vdisk.ext4]# df -h
Filesystem      Size  Used Avail Use% Mounted on
[..]
/dev/loop0      923M  2.4M  857M   1% /tmp/vdisk.ext4

Now, I want to create a file of a certain size. truncate is a good command for doing that.

[root@ptah vdisk.ext4]# truncate -s 1T huge_file
[root@ptah vdisk.ext4]# ls -lh huge_file
-rw-r--r--. 1 root root 1.0T Mar 7 16:15 huge_file

At this point, you should notice that something is wrong: I just created a one-terabyte file on a filesystem that has only one gigabyte. Moreover, the filesystem still appears to be far from full:

[root@ptah vdisk.ext4]# df -h
Filesystem      Size  Used Avail Use% Mounted on
[..]
/dev/loop0      923M  2.4M  857M   1% /tmp/vdisk.ext4

In fact, it seems that I haven’t used any space on the filesystem at all. To check the actual disk usage of the file, I use the du command:

[root@ptah vdisk.ext4]# du -h huge_file
0 huge_file

This confirms that the file isn’t using any space, even though its size is 1 TB. How come?

To understand why this happened, we need to understand what a file is. A file is actually composed of an inode and zero or more data blocks. Note that a file also needs a dentry (directory entry) to exist, but we don’t need to get into that now. An inode is a structure that describes the file: it holds things like the file owner, permissions, creation/modification/access times, the file size (which is what we care about now) and other things that depend on the specific filesystem (the name of the file is NOT contained in the inode… that’s why we need a dentry). A data block is a structure where the actual contents of the file are stored. The size of a block depends on how the filesystem was formatted (in this example, one block has 4096 bytes). An empty file has zero blocks, but as the file grows, more and more blocks are allocated.
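The allocation rule described above boils down to rounding the file size up to a whole number of blocks. Here is a minimal Python sketch of that arithmetic, assuming the 4096-byte block size from the mkfs output and ignoring metadata such as extent tree blocks or inline data; `blocks_needed` is just a hypothetical helper name:

```python
BLOCK_SIZE = 4096  # block size reported by mkfs.ext4 above


def blocks_needed(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Data blocks a fully written (non-sparse) file of this size occupies.

    Simplified model: rounds up to whole blocks using integer arithmetic
    (no float rounding for huge sizes) and does not count metadata blocks.
    """
    return (size_bytes + block_size - 1) // block_size


for size in (0, 42, 1024, 4096, 4097, 2**40):
    print(f"{size:>14} bytes -> {blocks_needed(size):>10} blocks")
```

For 2^40 bytes, which is what `truncate -s 1T` asks for, the result is 268435456 blocks.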

So a file that has 0 bytes occupies 0 blocks. A file that has 42 bytes occupies one block, and so does a file that has 1024 or 4096 bytes. If the file has 4097 bytes, it will occupy two blocks (consuming 8192 bytes on disk), and so on. That means our 1TB file (1099511627776 bytes, since truncate interprets 1T as 2^40 bytes) should occupy a lot of blocks: 268435456 to be exact. Yet it looks like it doesn’t use any. stat confirms this:

[root@ptah vdisk.ext4]# stat huge_file
File: ‘huge_file’
Size: 1099511627776 Blocks: 0 IO Block: 4096 regular file
Device: 700h/1792d Inode: 12 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)

Why is that? It’s because the truncate command only set the file size value in the inode of the file; it did not allocate any blocks for the data, because I didn’t write any data into it. So truncate only affects the inode, not the actual data blocks. A file like this, whose size is larger than the space allocated for it, is called a sparse file.
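The same effect can be reproduced without a dedicated filesystem. The following is a minimal sketch, assuming Linux and a temporary directory on a filesystem that supports sparse files (ext4, XFS, btrfs and tmpfs all do). It uses Python's `os.truncate()` and `os.stat()`, thin wrappers around the truncate(2) and stat(2) system calls; the helper name `make_sparse_file` and the 64 MiB size are arbitrary choices for illustration. Keep in mind that `st_blocks` is counted in 512-byte units, not in filesystem blocks.

```python
import os
import tempfile


def make_sparse_file(path: str, size: int) -> os.stat_result:
    """Create an empty file and grow it with truncate(2), writing no data."""
    with open(path, "wb"):
        pass                 # create a 0-byte file
    os.truncate(path, size)  # only the size recorded in the inode changes
    return os.stat(path)


with tempfile.TemporaryDirectory() as d:
    st = make_sparse_file(os.path.join(d, "huge_file"), 64 * 1024 * 1024)
    print("apparent size:  ", st.st_size)          # what ls -l reports
    print("allocated bytes:", st.st_blocks * 512)  # what du reports
```

On a filesystem with sparse-file support, the allocated size is typically 0, matching the `Blocks: 0` line in the stat output.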

If I actually create a file and write data into it, I find that I can’t write anywhere near that much:

[root@ptah vdisk.ext4]# dd if=/dev/zero of=actual_huge_file bs=1MB count=1000000
dd: error writing ‘actual_huge_file’: No space left on device
949+0 records in
948+0 records out
948240384 bytes (948 MB) copied, 0.910452 s, 1.0 GB/s
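Writing actual data is what makes the filesystem allocate blocks. This hedged sketch assumes Linux and a local temporary directory; `write_real_data` is a hypothetical name, and the 4 MiB size is arbitrary. It writes in chunks the way dd does, but with random bytes rather than zeros, because a filesystem with compression enabled may store runs of zeros without allocating space for them. Unlike the dd run above, it does not try to fill the disk; it only shows that written data consumes blocks.

```python
import os
import tempfile


def write_real_data(path: str, size: int, chunk: int = 1024 * 1024) -> os.stat_result:
    """Write `size` bytes of real data in chunks (like dd) and return the file's stat."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))  # real, incompressible data: blocks must be allocated
            remaining -= n
        f.flush()
        os.fsync(f.fileno())        # flush data to storage before inspecting allocation
    return os.stat(path)


with tempfile.TemporaryDirectory() as d:
    st = write_real_data(os.path.join(d, "actual_file"), 4 * 1024 * 1024)
    print("apparent size:  ", st.st_size)
    print("allocated bytes:", st.st_blocks * 512)
```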

Now, let’s try the same thing on a FAT filesystem.

[root@ptah vdisk.ext4]# dd if=/dev/zero of=/tmp/vdisk2 bs=1MB count=1000
1000+0 records in
1000+0 records out
1000000000 bytes (1.0 GB) copied, 0.923546 s, 1.1 GB/s
[root@ptah vdisk.ext4]# mkfs.vfat /tmp/vdisk2
mkfs.fat 3.0.20 (12 Jun 2013)
[root@ptah vdisk.ext4]# mkdir /tmp/vdisk.vfat/
[root@ptah vdisk.ext4]# mount /tmp/vdisk2 /tmp/vdisk.vfat/
[root@ptah vdisk.ext4]# df -h
Filesystem                                     Size  Used Avail Use% Mounted on
[..]
/dev/loop1                                     952M  4.0K  952M   1% /tmp/vdisk.vfat
[root@ptah vdisk.ext4]# truncate -s 1T huge_file
[root@ptah vdisk.ext4]# cd /tmp/vdisk.vfat/
[root@ptah vdisk.vfat]# truncate -s 1T huge_file
truncate: failed to truncate ‘huge_file’ at 1099511627776 bytes: File too large

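So the trick doesn’t work. The immediate reason for the “File too large” error is that FAT stores the file size in a 32-bit field of the directory entry (FAT has no inodes), so no FAT file can be larger than 4 GiB minus one byte, and 1T is far beyond that. But even a smaller size would not give us a sparse file: FAT cannot represent unallocated regions inside a file, so when the file size is increased, the clusters (FAT’s blocks) needed to hold that size have to be allocated as well. Unlike FAT, ext* filesystems do not need to allocate blocks until they are actually needed.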
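The size limit behind the “File too large” error is plain arithmetic. A minimal sketch, assuming the 32-bit file size field of a FAT directory entry; `FAT_MAX_FILE_SIZE` and `fits_on_fat` are hypothetical names:

```python
FAT_MAX_FILE_SIZE = 2**32 - 1  # largest value a 32-bit size field can hold


def fits_on_fat(size_bytes: int) -> bool:
    """True if a file of this size can be represented on FAT at all."""
    return size_bytes <= FAT_MAX_FILE_SIZE


one_tib = 2**40                 # what `truncate -s 1T` requests (1099511627776 bytes)
print(fits_on_fat(one_tib))     # False: beyond the format's limit
print(fits_on_fat(3 * 2**30))   # True: within the limit
```

A size that passes this check is still not free on FAT: the clusters for it have to be allocated, so on this 1 GB volume a 3 GiB truncate would most likely fail for lack of space instead.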

Some lessons to take away from this:

  • ls -l does not show the space a file actually occupies on disk; it shows the file size recorded in the inode. For a directory, ls reports the size of the directory itself (its list of entries), not the total size of its contents, which is why it answers almost instantly while du has to walk the whole tree
  • du calculates the space actually occupied on disk by counting the blocks allocated to each file
  • stat shows how many blocks are allocated to a file (how many data blocks are associated with its inode); GNU stat counts them in 512-byte units, so one 4096-byte block shows up as 8
  • a file can have a smaller size (in the inode’s file size field) than the space it occupies on disk, because the block is the unit of allocation
  • when growing a file, the truncate command (along with the truncate system call) only sets/modifies the file size field in the inode
  • depending on the filesystem and its implementation, setting the file size may or may not actually allocate data blocks
  • dd actually creates data blocks, because it really writes data into the file
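The difference between apparent size and allocated space can be sketched by walking a directory and summing both numbers, roughly what `ls -l` (apparent size) and `du` (allocated space) look at. This is a simplified model, not how du is implemented: unlike du, it counts a hard-linked file once per link and ignores the directories’ own blocks. It assumes Linux, Python 3.9+ and a temporary directory that supports sparse files; `apparent_and_allocated` is a hypothetical name.

```python
import os
import stat
import tempfile


def apparent_and_allocated(root: str) -> tuple[int, int]:
    """Sum apparent sizes (st_size) and allocated bytes (st_blocks * 512) of regular files."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                apparent += st.st_size           # size field from the inode (ls -l)
                allocated += st.st_blocks * 512  # st_blocks uses 512-byte units (du)
    return apparent, allocated


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "small.txt"), "wb") as f:
        f.write(b"x" * 42)                       # 42 bytes still take a whole block
    sparse = os.path.join(d, "sparse.img")
    with open(sparse, "wb"):
        pass
    os.truncate(sparse, 16 * 1024 * 1024)        # size set, no data written
    apparent, allocated = apparent_and_allocated(d)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```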

Tribute to my colleagues in the Storage and Filesystems team at Red Hat who gave me the idea of writing this.
