A blog about my nerdy stuff. 0x3a29


Friday, August 28, 2009

Benchmark: XFS vs ext4

I guess by now everybody has read about the "files truncated to 0 bytes" bug of ext4. As a precaution, I turned on the nodelalloc option in fstab just after I installed Arch Linux on my recently bought notebook. Everything is running fine, safety- and speed-wise (especially considering my partitions are encrypted with LUKS and my password is ********... oops!), but I can't compare it to anything, as I've never used any other filesystem on this computer, and when it arrived I was too anxious to run benchmarks and more benchmarks to decide which filesystem to use.
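
For reference, this is roughly what the relevant line in my fstab looks like (the mapper name comes from my LUKS setup; yours will differ):

/dev/mapper/root   /   ext4   defaults,nodelalloc   0 1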

There is one thing weird about it, though. After a certain number of mounts, both my root and my home partitions get checked. While this isn't strange at all, the information this process prints on my screen is: both filesystems are around 20% non-contiguous. To be honest, I'm being a dick here (I'm sorry if your name is Dick); there is no noticeable difference in performance that I can tell, but still, 20% is a big number and I thought that defragging these partitions would be a good thing to do.
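
By the way, you don't have to wait for the periodic check to see that number: a read-only fsck prints the non-contiguous percentage at the end (the device name is just an example, and the partition should preferably be unmounted):

fsck.ext4 -fn /dev/sda3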

The reason I say "would" is that there is no official defragmentation tool for ext4. What now, Jose?

I remembered the days when I used XFS and it had an online defragmenter. I also remember it was terribly slow with small files, so updating my system with pacman was frigging slow. But since then I've read a lot about filesystems and optimizations, and I had never tried using noatime with XFS. I searched some pages about optimizing XFS, and it's incredible how much better it can get with a few simple options at creation time and a few others in fstab.

That being said, I'll try it in a spare partition I have, and maybe change my system to use XFS (hopefully, I'll do it without having to reinstall anything). Wish me luck and let LVM be with you.

Benchmarks



So I decided to run some benchmarks before changing my partitions. Don't ask for benchmarks of other filesystems; I won't do them. I chose to test only ext4 and XFS: Reiser3 and ext3 are dated filesystems (my backup disk is ext3, though), and other benchmarks showed that JFS doesn't have the performance I expect. It may use less CPU time, but whatever. Btrfs is still being developed, and other filesystems don't seem to be ready either.

OK... The actual reason I didn't want to run a lot of tests is that I'll be using my only computer to do them, and I want these tests to be honest, so I'll run them with the minimal number of processes running. That isn't a problem per se, but what am I supposed to do during the tests? Shave? So I decided to run only three tests: ext4 with delayed allocation (that is, without nodelalloc), default XFS, and optimized XFS. The reason I didn't optimize ext4 is that I didn't find any nice text about it; there are only small tweaks, such as noatime, and those apply to all filesystems.

I have a spare 15GB partition here that I'll use for the tests. My home partition is a lot bigger than that, but it's what I have. Besides, my root partition is a little smaller than that, so although the tests won't be a good representation of my home partition, they'll be a very good one of my root partition. The test partition uses neither LVM nor encryption, and these are the commands I used:

mkfs.ext4 /dev/sda3
mkfs.xfs /dev/sda3
mkfs.xfs -l lazy-count=1,size=128m /dev/sda3


The ext4 partition uses the defaults of Arch Linux (taken from /etc/mke2fs.conf):

[defaults]
    base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
    blocksize = 4096
    inode_size = 256
    inode_ratio = 16384

[fs_types]
    ext4 = {
        features = has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
        inode_size = 256
    }


The optimized XFS differs in two log-related options: by default, XFS uses a 22MB log, but its performance increases with bigger logs, so we use 128MB. Also, XFS normally keeps some superblock counters up to date at all times, although this information can be recomputed only when it's actually needed; turning on lazy-count avoids those extra disk writes.
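
To confirm these options actually took effect, xfs_info can be pointed at the mounted filesystem and will print the log geometry, including the lazy-count flag (the mount point here is just my test one):

xfs_info /media/bench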

All three filesystems will be mounted with noatime. ext4 and the default XFS won't get any other options beyond that one. The optimized XFS will be mounted with two more: logbufs=8, which increases the number of in-memory log buffers from 2 to 8, and logbsize=256k, which increases the size of each log buffer from 32KB to 256KB. This increases memory usage, of course, but I have 3GB of RAM; an extra 2MB won't make me run out of memory.
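
In fstab terms, the optimized XFS entry looks roughly like this (device and mount point are from my test setup):

/dev/sda3   /media/bench   xfs   noatime,logbufs=8,logbsize=256k   0 0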

Note: I've decided to benchmark only the optimized XFS. When I finished the first set of benchmarks, I was already pretty bored.

Kernel
Every time I see a filesystem benchmark, there is a test like this, so I'll do it too! I first extract the contents of the kernel .tar.bz2, then copy the resulting folder to another place in the same partition, and then rm everything.

#!/bin/bash
# bash, not plain sh: the rm line relies on brace expansion
cp /home/andre/code/linux-2.6.30.5.tar.bz2 /media/bench/
cd /media/bench
tar -xjf linux-2.6.30.5.tar.bz2
cp -R linux-2.6.30.5 lunix
rm -rf linux-2.6.30.5{,.tar.bz2} lunix


Pacman
Pacman is Arch Linux's package manager. It handles a lot of small files, so I guess this is a pretty good test. I run pacman -Syy -b /new/partition, which generates a database in a folder other than the default, and then search it for three packages: kernel26, gimp and ncmpcpp. There is a problem with this test, though: the first part also depends on the network. Statistically, running it a few times should minimize the differences.

#!/bin/sh
pacman -Syy -b /media/bench
pacman -Si kernel26 -b /media/bench
pacman -Si gimp -b /media/bench
pacman -Si ncmpcpp -b /media/bench


With you... The Beatles!
I decided to use The Beatles' discography as a test for a large number of medium-sized files (~4MB each). My music collection is much bigger than that, but I guess we can get some good information from this.

#!/bin/sh
cp -R /home/andre/media/music/The\ Beatles /media/bench/Beatles
cp -R /media/bench/Beatles /media/bench/Rutles
rm -r /media/bench/Beatles /media/bench/Rutles


Moving a disk around
I guess the biggest files I have on my computer are the virtual disks of my virtual machines, so I use them to test the performance of the filesystem with big files (~2.2GB).

#!/bin/sh
cp /home/andre/.local/.VirtualBox/HardDisks/arch.vdi /media/bench
cp /home/andre/.local/.VirtualBox/HardDisks/lose32.vdi /media/bench
cp /media/bench/arch.vdi /media/bench/arch2.vdi
cp /media/bench/lose32.vdi /media/bench/lose64.vdi
rm /media/bench/{arch,arch2,lose32,lose64}.vdi


Results



All the scripts above were run 5 times; then I calculated the mean and the sanitized mean (the mean without the highest and the lowest value, so with 5 runs only the middle 3 count). Actually, a Python script did it for me; it can be found at the end of this article.

To be honest, I thought XFS would perform much better. With medium and large files, XFS got really close to ext4, but it was never faster, and it was almost three times slower when handling the kernel files.

After these tests, I'll keep ext4 for a while longer, as there's probably no match for it.

=KERNEL
./time-it 5 ./kernel.sh

ext4
52.2324
42.7818
42.7238
51.5118
42.6375
mean: 46.3774
sanitized mean: 45.6725

xfs-opt
147.237
164.076
188.027
148.23
134.843
mean: 156.483
sanitized mean: 153.181

=PACMAN
./time-it 5 ./pacman.sh

ext4
22.139
5.6207
4.98342
5.74287
5.2469
mean: 8.74658
sanitized mean: 5.53682

xfs-opt
8.51362
14.2627
15.3344
14.9117
14.0883
mean: 13.4222
sanitized mean: 14.4209

=BEATLES
./time-it 5 ./beatles.sh

ext4
292.881
306.477
297.082
303.119
301.018
mean: 300.115
sanitized mean: 300.406

xfs-opt
294.875
301.987
297.663
304.623
301.512
mean: 300.132
sanitized mean: 300.387

=VIRTUAL
./time-it 5 ./virtual.sh

ext4
435.735
432.042
439.759
446.086
502.91
mean: 451.306
sanitized mean: 440.527
* I opened Firefox during the fifth run, so it may be better to consider only the first 4 times.
mean of the first 4 runs: 438.406

xfs-opt
444.722
441.038
432.156
432.414
447.277
mean: 439.521
sanitized mean: 439.391

=USAGE
(df output: filesystem, 1K-blocks, used, available, use%, mount point)

ext4
/dev/sda3 11535376 159680 10789728 2% /media/bench

xfs-opt
/dev/sda3 11588344 4256 11584088 1% /media/bench

=NOTES
ext4: with both BEATLES and VIRTUAL, the system got really slow.
xfs-opt: the same.


time-it.py



#!/usr/bin/env python
import time
import sys
import subprocess
import math

if len(sys.argv) < 3:
    print "usage: time-it.py <runs> <command>"
    sys.exit()

runs = int(sys.argv[1])
command = sys.argv[2:]

def mean(lst):
    return math.fsum(lst) / float(len(lst))

def san_mean(lst):
    # the mean without the highest and the lowest value
    lst = sorted(lst)
    return math.fsum(lst[1:-1]) / float(len(lst) - 2)

time.sleep(2)

timing = []
for _ in range(runs):
    t1 = time.time()
    subprocess.call(command)
    t2 = time.time()
    timing.append(t2 - t1)
    time.sleep(1)

print " ".join(command)
for t in timing:
    print "%g" % t

print "mean: %g" % mean(timing)
print "sanitized mean: %g" % san_mean(timing)
print

Saturday, August 8, 2009

The “Linux is only free if your time has no value” myth

Or: if “Linux is only free if your time has no value”, then Windows is more expensive than it seems.


My first contact with Linux was around 5 years ago, through a Brazilian distro named Kurumin, meant to be used as a LiveCD and based on Knoppix/Debian. One year later, I decided to install Slackware on my computer, and as I had a small HD at the time, it became my only operating system.

Tired of having to keep track of dependencies and the like, I replaced it with Arch Linux, my distro of choice for the last few years (and for the next ones, too). While I am relatively new to Linux (I have never used a kernel from the 2.4 series), I'm used to “hard to use” distros, and I fear no text command.

During this time, I did “waste” my time configuring X, trying different filesystems, partition schemes, window managers... but I don't see it as a waste of time; it was time spent learning something. One may ask, “what's the use of learning an OS that almost nobody uses?”, but that wasn't the only thing I learnt. Throughout the years I've used Linux, I also improved my English and my programming skills, I learnt more about how computers work (and how they don't), I started worrying about my privacy online, I learnt how much being a minority sucks, I learnt to RTFM... Some of this might not be useful from a professional point of view, but it helped develop the person I am today.

When the time came to go to college, I chose Electrical Engineering. As some of you may know, the engineering field uses proprietary software such as AutoCAD, Matlab and others, and I had no choice but to learn how to use them. I thought, “I'll get educational licenses from the college and install everything in a virtual machine. How bad can it be?” It turned out to be a lot worse than I expected.

First step: Install Windows XP
I didn't measure how long it took to install Windows XP, but let's assume it takes about as long as installing a Linux distro, so installing either one “costs” the same, right? Yes, except for the fact that after installing Linux you already have an office suite, an image editor, music and video players, an editor with syntax highlighting, a C compiler and a Python interpreter. Some less ideological distros even come with Mono and Java preinstalled.

When you first install Windows, there's almost nothing there. Now you have some options:
  • Install the software you need from a CD, and possibly type a key code.
  • Search and download a freeware version from the internet.
  • Search and download illegal software and hunt for cracks on the internet.

Installing from CDs is bad: you have to insert them, click “next” a few times, type the key code, swap discs... Searching for freeware and downloading it takes some time, and then you have to click “next” a few times, too. Installing cracked software is even worse, as it involves some risk. How good would it be if you could just pick some software from a list, leave to drink some coffee, and let the computer do its own job of downloading and installing? That's how installing software on Linux works.
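
Just to illustrate the difference (the package names here are examples; run as root):

# one command, any number of programs, no CDs, no key codes
pacman -S gimp vlc abiword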

I think it's clear by now that installing Windows is more time-expensive than installing Linux, but the comparison isn't complete yet. If the Windows installation we're talking about is an OEM one, some software may have been installed for us; sometimes even software we don't want comes preinstalled, and removing it is usually cumbersome. If, on the other hand, the installation is a “normal” one, you probably have to install some drivers now. But what's your hardware? On Linux, a simple lspci describes your computer; if you need more information, add the -v flag. On Windows, though, there's no such tool: you have to search for one and install it. Then, with a program like Everest or similar, we can start hunting for drivers.
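
For example (the verbose output varies with your kernel and permissions):

lspci      # one line per PCI device: chipset, video, network...
lspci -v   # verbose: memory ranges, IRQs, which kernel driver is in use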

I'm in your computer, stealing your CPU time
Of course, no sane person would ever download and install anything on Windows without running anti-virus software, and probably a firewall too. This means one program is scanning all your traffic while you search the web and another is scanning your files while you install something, so finding and installing the software you need takes longer than it would if there were no need for any of that.

A problem arises...
Things worked fine for about two weeks; then suddenly AutoCAD stopped working. It could have been any other software, it just happens that it was AutoCAD. I tried it again, and it crashed again. I rebooted the machine, and it happened again. I'm not saying this kind of thing doesn't happen on Linux, but there I can at least run the app from a terminal and see whether it's segfaulting or looking for a file that can't be found.
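
Something along these lines, where someapp stands in for whatever program is misbehaving:

someapp                                           # run it from a terminal and watch the error output
strace -e trace=open someapp 2>&1 | grep ENOENT   # files it tried and failed to open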

Unaware of what else I could do, I did what any other Windows user would do: reinstall the program. Now ask me if it worked. No, it didn't. I tried again, this time manually deleting some files from the “Documents and Settings” folder and manually deleting any registry entries that could have been left behind. I gave up on trying to understand what goes on in the registry and installed a registry cleaner; now, whenever I try to access a shared folder, Explorer simply crashes.

From all that, what I've learnt is that Windows is expensive, even if you're using a free educational license.

Notes
  • Out of laziness, throughout this article I wrote Linux when I meant the GNU/Linux OS. Don't get mad at me, RMS.
  • I compared Linux with Windows because Macs are rare around here. From what I've seen, some of what's written here may apply to Mac OS X, too.