There is one weird thing about it, though. After a certain number of mounts, both my root and my home partitions get checked. While this isn't strange at all, the information this process prints on my screen is: both filesystems are around 20% non-contiguous. To be honest, I'm being a dick here (I'm sorry if your name is Dick); there is no noticeable difference in performance that I can tell, but still, 20% is a big number and I thought that defragging these partitions would be a good thing to do.
The reason I say "would" is that there is no official defragger for ext4. What now, Jose?
I remembered the days when I used XFS, which had an online defragmenter. I also remember it was terribly slow with small files, so updating my system with pacman was frigging slow. Since then, though, I've read a lot about filesystems and optimizations, and I never tried using noatime with XFS. I searched some pages about optimizing XFS, and it's incredible how much better it can get with some simple options at creation time and a few others that go in fstab.
That being said, I'll try it in a spare partition I have, and maybe change my system to use XFS (hopefully, I'll do it without having to reinstall anything). Wish me luck and let LVM be with you.
Benchmarks
So I decided to run some benchmarks before changing my partitions. Don't ask for benchmarks of other filesystems; I won't do them. I chose to test only ext4 and XFS, as Reiser3 and ext3 are dated filesystems (my backup is ext3, though) and other benchmarks showed that JFS doesn't have the performance I expect. It may use less CPU time, but whatever. Btrfs is still being developed, and other filesystems don't seem to be ready either.
OK... The actual reason I didn't want to run a lot of tests is that I'll use my only computer to do them, and I want these tests to be honest, so I'll run them with a minimal number of processes running. This isn't a problem per se, but what am I supposed to do during the tests? Shave? So I decided to do only three tests: ext4 with delayed allocation (that is, without nodelalloc), default XFS, and optimized XFS. The reason I didn't optimize ext4 is that I didn't find any good write-up about it. There are only small modifications, such as using noatime, but this applies to all filesystems.
I have a spare partition here of 15GB that I'll use for the tests. My home partition is a lot bigger than that, but that's what I have. Besides, my root partition is a little smaller than that, so although the tests won't be a good representation of my home partition, they'll be a very good one of my root partition. This test partition won't be using LVM or encryption, and these are the options I've used:
mkfs.ext4 /dev/sda3
mkfs.xfs /dev/sda3
mkfs.xfs -l lazy-count=1,size=128m /dev/sda3
The ext4 partition uses the defaults of Arch Linux:
[defaults]
base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
blocksize = 4096
inode_size = 256
inode_ratio = 16384
[fs_types]
ext4 = {
features = has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
inode_size = 256
}
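To get a feeling for what inode_ratio and inode_size mean in practice, here is a rough sketch. It assumes mke2fs creates about one inode per inode_ratio bytes of partition space and that "15GB" means 15 GiB; the real tool rounds per block group, so treat the numbers as an estimate. The function name is made up for illustration.

```python
def estimate_inodes(partition_bytes, inode_ratio=16384, inode_size=256):
    """Rough inode count and inode-table size for the defaults shown above."""
    # One inode is reserved for every inode_ratio bytes of space.
    inodes = partition_bytes // inode_ratio
    # Each inode occupies inode_size bytes in the inode tables.
    table_bytes = inodes * inode_size
    return inodes, table_bytes


if __name__ == "__main__":
    size = 15 * 1024 ** 3  # assumption: the 15GB test partition, read as 15 GiB
    inodes, table_bytes = estimate_inodes(size)
    print("inodes: %d" % inodes)
    print("inode tables: %d MiB" % (table_bytes // 1024 ** 2))
```

Under these assumptions the partition would get roughly 983040 inodes, taking about 240 MiB of inode tables.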
The optimized XFS has two different options related to logs: by default, XFS uses a log of 22MB, but its performance increases with bigger logs, so we use 128MB. Also, XFS tries to keep counters in superblocks always up to date, but this information can be retrieved only when necessary. Turning on lazy-count, we avoid some disk writes.
All three filesystems will be mounted with noatime. Ext4 and default XFS won't have any other option specified beyond this one. The optimized XFS will be mounted with two other options: logbufs=8, which increases the number of log buffers from 2 to 8, and logbsize=256k, which increases the size of each log buffer from 32KB to 256KB. This increases memory usage, of course, but I have 3GB. It won't be 2MB that will make me run out of memory.
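The 2MB figure is simply the number of log buffers times the size of each one. A minimal sketch of that arithmetic follows; the default values (2 buffers of 32KB) are taken from the paragraph above, not from any particular kernel version, and the function name is made up for illustration.

```python
KIB = 1024


def log_buffer_memory(logbufs, logbsize_kib):
    """Memory used by the in-memory XFS log buffers, in bytes."""
    return logbufs * logbsize_kib * KIB


# Defaults as stated in the text (an assumption, they may differ per kernel).
default_bytes = log_buffer_memory(2, 32)
# Values used for the optimized XFS mount: logbufs=8, logbsize=256k.
tuned_bytes = log_buffer_memory(8, 256)

print("default: %d KiB" % (default_bytes // KIB))
print("tuned:   %d KiB" % (tuned_bytes // KIB))
```

With these inputs the defaults come to 64 KiB and the tuned mount to 2048 KiB, which is the 2MB mentioned above.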
Note: I've decided to benchmark only the optimized XFS. When I finished the first set of benchmarks, I was already pretty bored.
Kernel
Every time I see a benchmark of file systems, there is a test like this. So I'll do it, too! I first extract the contents from the kernel .tar.bz2 and then I copy the folder to another place in the same partition and then rm everything.
#!/bin/sh
cp /home/andre/code/linux-2.6.30.5.tar.bz2 /media/bench/
cd /media/bench
tar -xjf linux-2.6.30.5.tar.bz2
cp -R linux-2.6.30.5 lunix
rm -rf linux-2.6.30.5{,.tar.bz2} lunix
Pacman
Pacman is the package manager of Arch Linux. It handles a lot of small files, so I guess this is a pretty good test. I run pacman -Syy -b /new/partition, which generates a database in a different folder from the default, and then I search it for three packages: kernel26, gimp and ncmpcpp. There is a problem with this test, though: the first part of it depends on the network, too. Statistically, running this test a few times should minimize the differences.
#!/bin/sh
pacman -Syy -b /media/bench
pacman -Si kernel26 -b /media/bench
pacman -Si gimp -b /media/bench
pacman -Si ncmpcpp -b /media/bench
With you... The Beatles!
I decided to use The Beatles' discography as a test for a big number of files of medium size (~4MB). My musical collection is much bigger than that, but I guess we can get some good information from this.
#!/bin/sh
cp -R /home/andre/media/music/The\ Beatles /media/bench/Beatles
cp -R /media/bench/Beatles /media/bench/Rutles
rm -r /media/bench/Beatles /media/bench/Rutles
Moving a disk around
I guess the biggest file I have in my computer is a virtual disk of a virtual machine, so I use it to test the performance of the filesystem with big files (~2.2GB).
#!/bin/sh
cp /home/andre/.local/.VirtualBox/HardDisks/arch.vdi /media/bench
cp /home/andre/.local/.VirtualBox/HardDisks/lose32.vdi /media/bench
cp /media/bench/arch.vdi /media/bench/arch2.vdi
cp /media/bench/lose32.vdi /media/bench/lose64.vdi
rm /media/bench/{arch,arch2,lose32,lose64}.vdi
Results
All the scripts above were run 5 times, then I calculated the mean and the sanitized mean (the mean without the highest and the lowest value). Actually, I used a Python script that can be found at the end of this article.
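To make the difference between the two means concrete, here is a small sketch (not the script used for the benchmarks) applied to the ext4 pacman timings from the results listed further down. The helper names are illustrative.

```python
def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def sanitized_mean(values):
    """Mean after dropping the single highest and single lowest value."""
    if len(values) < 3:
        raise ValueError("need at least 3 values to drop the extremes")
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


# ext4 pacman timings, in seconds, copied from the results.
pacman_ext4 = [22.139, 5.6207, 4.98342, 5.74287, 5.2469]

print("mean: %g" % mean(pacman_ext4))                      # mean: 8.74658
print("sanitized mean: %g" % sanitized_mean(pacman_ext4))  # sanitized mean: 5.53682
```

The first run (22.139 s) pulls the plain mean up to almost 9 seconds, while the sanitized mean stays close to the other four runs.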
To be honest, I thought XFS would perform much better. With medium and large files, XFS got really close to ext4, but it was never meaningfully faster than ext4, and it was more than three times slower when handling the kernel files.
After these tests, I'll keep ext4 for longer, as there's probably no match for it.
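As a rough check of the comparison above, the sketch below divides the XFS sanitized mean by the ext4 one for each test, using the figures from the listing that follows. The function and variable names are only for illustration.

```python
def slowdown(ext4_seconds, xfs_seconds):
    """How many times longer XFS took than ext4 (1.0 means equal)."""
    return xfs_seconds / ext4_seconds


# (test name, ext4 sanitized mean, xfs-opt sanitized mean), in seconds.
sanitized_means = [
    ("kernel", 45.6725, 153.181),
    ("pacman", 5.53682, 14.4209),
    ("beatles", 300.406, 300.387),
    ("virtual", 440.527, 439.391),
]

for name, ext4, xfs in sanitized_means:
    print("%-8s xfs/ext4 = %.2f" % (name, slowdown(ext4, xfs)))
```

This prints ratios of 3.35 for the kernel test, 2.60 for pacman, and 1.00 for both the Beatles and the virtual disk tests, so the two filesystems are effectively tied on medium and large files.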
=KERNEL
./time-it 5 ./kernel.sh
ext4
52.2324
42.7818
42.7238
51.5118
42.6375
mean: 46.3774
sanitized mean: 45.6725
xfs-opt
147.237
164.076
188.027
148.23
134.843
mean: 156.483
sanitized mean: 153.181
=PACMAN
./time-it 5 ./pacman.sh
ext4
22.139
5.6207
4.98342
5.74287
5.2469
mean: 8.74658
sanitized mean: 5.53682
xfs-opt
8.51362
14.2627
15.3344
14.9117
14.0883
mean: 13.4222
sanitized mean: 14.4209
=BEATLES
./time-it 5 ./beatles.sh
ext4
292.881
306.477
297.082
303.119
301.018
mean: 300.115
sanitized mean: 300.406
xfs-opt
294.875
301.987
297.663
304.623
301.512
mean: 300.132
sanitized mean: 300.387
=VIRTUAL
./time-it 5 ./virtual.sh
ext4
435.735
432.042
439.759
446.086
502.91
mean: 451.306
sanitized mean: 440.527
* I opened Firefox during the last run, so maybe it would be better to consider only the first 4 runs.
mean: 438.406
xfs-opt
444.722
441.038
432.156
432.414
447.277
mean: 439.521
sanitized mean: 439.391
=USAGE
ext4
/dev/sda3 11535376 159680 10789728 2% /media/bench
xfs-opt
/dev/sda3 11588344 4256 11584088 1% /media/bench
=NOTES
ext4: with both BEATLES and VIRTUAL, the system got really slow.
xfs-opt: the same.
time-it.py
#!/usr/bin/env python
# Run a command several times and report the wall-clock time of each run.
import math
import subprocess
import sys
import time

if len(sys.argv) < 3:
    print("usage: time-it.py <runs> <command> [args...]")
    sys.exit(1)

runs = int(sys.argv[1])
command = sys.argv[2:]

def mean(lst):
    # Arithmetic mean of all timings.
    return math.fsum(lst) / len(lst)

def san_mean(lst):
    # Mean without the highest and the lowest value (needs at least 3 runs).
    trimmed = sorted(lst)[1:-1]
    return math.fsum(trimmed) / len(trimmed)

# Wait two seconds before the first run.
time.sleep(2)

timing = []
for _ in range(runs):
    t1 = time.time()
    subprocess.call(command)
    t2 = time.time()
    timing.append(t2 - t1)
    # Pause one second between runs.
    time.sleep(1)

print(" ".join(command))
for t in timing:
    print("%g" % t)
print("mean: %g" % mean(timing))
print("sanitized mean: %g" % san_mean(timing))
Good work, but XFS is still faster and safer than ext4. Ext4 has some very dangerous omissions in its delayed allocation. XFS uses delayed allocation too (ext4 copied the idea from XFS), but XFS has implemented safety procedures that prevent destruction of old file data if an unclean shutdown or power failure occurs while updating that file's data. Ext4, however, will have zeroed out the old file data before updating the file data from the data journal; if an unclean shutdown or power failure occurs, not only will the new data in the journal be lost, but also the original data in the file (because the stupid ext4 fs would have zeroed out the data prior to performing a commit from the journal). Unlike what some ill-informed programmers have written on other blogs, XFS does not suffer from this destructive critical failure.
Curious comment that XFS is still faster when the numbers indicate otherwise?
Is this still the case in 2011? And what about large modern files like HD movies?
Hi, I don't think that ext4 is always faster than XFS (and vice versa).
Here you can find some benchmarks I ran in the past weeks:
1) http://www.ilsistemista.net/index.php/linux-a-unix/13-ext4-vs-xfs-large-volumes-with-low-end-raid-controller.html
2) http://www.ilsistemista.net/index.php/linux-a-unix/6-linux-filesystems-benchmarked-ext3-vs-ext4-vs-xfs-vs-btrfs.html
Thanks.