AlexJ's Computer Science Journal

alexandru.juncu.ro

K, kilo, kibi, bytes, bits and the rest

While writing the previous article, I once again had to deal with the confusion caused by the question “What is a KB?”.
Let’s start with some well-known facts. First, unless you live in the US (or Myanmar or Liberia), you use the metric system. This means that you have a set of units, like the meter, the second, the newton etc. Like any unit, they are arbitrarily defined, but they are clearly defined and standardized (the International System of Units, aka SI). What is not arbitrary (unlike in the imperial system) are the multipliers.

The multipliers (prefixes) are: kilo (k), mega (M), giga (G), tera (T) etc. There are also the fractional ones, like milli (m), nano (n) etc., but they are not important to this discussion. What is not arbitrary about them is that each one is 1000 times the previous one. Any factor would have been acceptable, but base 10 makes sense since humans decided that base 10 is the ‘natural system’.

Second, we know that the smallest unit of data is the bit (binary digit). We also “know” that one byte is 8 bits. Actually, the byte has had different sizes over time, until it was standardized as 8 bits by IEC 80000-13. The name octet is used to avoid confusion. But WE KNOW a byte has 8 bits.

So how much is a gigabyte? This is where the debate starts. In computer science and engineering, the ‘natural base’ is base 2… usually. So multiples of 1024 are often more helpful, and people started to use kilo, mega, giga to refer to multiples of 1024. Some say that 1 kilobyte is 1024 bytes. And since words can have different meanings, we could use kilo to mean something else in IT than in the rest of science. End of story.

Only it’s not that simple: even in IT, sometimes you need multiples of 1000 and sometimes multiples of 1024.

Memory measurements (like the “size of RAM”) need to be in multiples of 1024, because memory sizes are powers of 2 (or multiples of them). And that is because memory is addressed in binary. So a megabyte of RAM is 1024 kilobytes, which is 1024*1024 bytes. This might also extend to storage these days because of flash-memory-based storage.
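A minimal Python sketch of why RAM sizes land on binary multiples, assuming byte-addressable memory (one byte per address, which is a simplification): n address bits can name exactly 2**n locations. The helper names are made up for illustration.

```python
def addressable_bytes(address_bits: int) -> int:
    """Distinct byte addresses reachable with `address_bits` bits
    (assumes one byte per address)."""
    return 2 ** address_bits


def as_binary_unit(size: int) -> str:
    """Express `size` in the largest binary unit that divides it exactly."""
    for unit, factor in (("GiB", 1024 ** 3), ("MiB", 1024 ** 2), ("KiB", 1024)):
        if size % factor == 0:
            return f"{size // factor} {unit}"
    return f"{size} B"


for bits in (10, 20, 30, 33):
    size = addressable_bytes(bits)
    print(f"{bits} address bits -> {size} bytes = {as_binary_unit(size)}")
```

With 33 address bits you get 8589934592 bytes: exactly 8 GiB, but a much less tidy 8.59 GB in decimal units.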

But in networking, for example, we don’t have that ‘limitation’. The base unit is the bit and we can transmit as many bits as we want. Speeds are measured in bits per second, and they can use the normal SI multipliers. So Gigabit Ethernet is 1 gigabit per second, or 1000 megabits per second, or 1 000 000 kilobits per second. We can divide by 8 and use bytes instead, but it’s still base 10: 0.125 gigabytes per second is 125 megabytes per second, is 125 000 kilobytes per second, is 125 000 000 bytes per second.
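The same arithmetic as a short Python sketch. It uses the raw line rate and ignores protocol overhead (framing, headers), so these are not achievable throughput figures. The last line shows what happens if someone mixes in binary prefixes.

```python
BITS_PER_BYTE = 8


def bits_to_bytes_per_second(bits_per_second: int) -> float:
    """Convert a line rate in bit/s to byte/s (no overhead accounted for)."""
    return bits_per_second / BITS_PER_BYTE


gigabit = 1_000_000_000  # Gigabit Ethernet: 10**9 bit/s, SI prefix

bytes_per_s = bits_to_bytes_per_second(gigabit)
print(bytes_per_s / 10**9, "GB/s")             # 0.125 GB/s
print(bytes_per_s / 10**6, "MB/s")             # 125.0 MB/s
print(bytes_per_s / 10**3, "kB/s")             # 125000.0 kB/s
print(bytes_per_s, "B/s")                      # 125000000.0 B/s
print(round(bytes_per_s / 2**20, 2), "MiB/s")  # 119.21 MiB/s
```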

This is confusing, and a solution was needed. One was found: a separate set of multipliers for base 2, called binary prefixes. Thus the terms kibi (Ki), mebi (Mi), gibi (Gi), tebi (Ti) etc. were introduced. All they say is that one kibi is 1024 of something and one mebi is 1024*1024 of something, just like any other prefix. So a kibimeter would be 1024 meters and a gibigram would be 1073741824 grams. But I don’t imagine them being used for anything other than bits and bytes.

So that should solve it, right? It should have, but it didn’t, because people keep using kilobit and kilobyte in a non-standard way. This is a problem, and it’s going to be an even bigger one: a kilobyte and a kibibyte are only 2.4% apart and could pass as equivalent within a margin of error, but a tebibyte is almost 110% of a terabyte, and that is a rather big difference.
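A short Python sketch that computes the gap for each prefix pair. The difference compounds because every step up multiplies by 1024 instead of 1000, adding roughly another 2.4% each time:

```python
PAIRS = [("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi")]

for power, (si_name, iec_name) in enumerate(PAIRS, start=1):
    decimal = 1000 ** power  # SI prefix value
    binary = 1024 ** power   # IEC binary prefix value
    ratio = binary / decimal
    print(f"1 {iec_name} = {binary} = {ratio:.4f} {si_name} "
          f"(+{(ratio - 1) * 100:.1f}%)")
```

For the tera/tebi pair it reports a ratio of 1.0995, i.e. about 10% more.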

The problem exists in the consumer market, where hardware manufacturers sell RAM modules labeled 8GB when they are actually 8GiB, and it might be hard to change the perception of the average buyer because the topic is too technical. But it seems technical people are confused too, and even highly technical manuals are lagging behind when documenting things. This brings me to the reason for this article: to point out some problems in the man pages of Linux commands.

Let’s start with the man page for the free utility. By default, free gives its output in kilobytes, as the manual says. But does it? (I’ve snipped the output down to the relevant part.)

[alexj@ptah ~]$ free -k
total
Mem: 7712244

Let’s see the other options from the manual:

-b, --bytes
Display the amount of memory in bytes.

-k, --kilo
Display the amount of memory in kilobytes. This is the default.

-m, --mega
Display the amount of memory in megabytes.

-g, --giga
Display the amount of memory in gigabytes.

--tera
Display the amount of memory in terabytes.

Let’s test.

[alexj@ptah ~]$ free -b
total
Mem: 7897337856

[alexj@ptah ~]$ free -m
total
Mem: 7531

So the values are 7897337856, 7712244 and 7531, which, after a quick calculation, turn out to be successive truncated divisions by 1024. So free by default uses kibibytes, -m shows mebibytes and -g gibibytes. The options should be -Ki or --kibi, -Mi or --mebi etc.
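The check can be reproduced in a few lines of Python, taking the byte count from the free -b output above and assuming free truncates the way integer division does:

```python
total_bytes = 7897337856  # from `free -b` above

kib = total_bytes // 1024  # matches what `free -k` printed
mib = kib // 1024          # matches what `free -m` printed
print(kib, mib)            # 7712244 7531

# With powers of 1000 the numbers would have been different:
print(total_bytes // 1000, total_bytes // 1000**2)  # 7897337 7897
```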

But since free is surely used in a ton of scripts, you can’t change things now. You can only patch them to be less confusing, like with the following addition:

--si
Use power of 1000 not 1024.

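Only this option adds even more confusion. “--si” refers to the International System of Units, as if the default output were somehow non-standard. But kibi is just as standard as kilo: the binary prefixes are defined by the IEC in IEC 80000-13, the same standard that fixed the byte at 8 bits. So the name is just confusing. And switching between 1000 and 1024 doesn’t redefine the prefixes, as the writers of the manual seem to think: a kilobyte is always 1000 bytes and a kibibyte is always 1024 bytes, so the only question is which one the output should be labeled with.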

Oh well, at least in the world of storage things are better. dd is a basic tool for working with block storage. Here is the relevant section of its man page:

N and BYTES may be followed by the following multiplicative suffixes: c =1, w =2, b =512, kB =1000, K =1024, MB =1000*1000, M =1024*1024, xM =M, GB =1000*1000*1000, G =1024*1024*1024, and so on for T, P, E, Z, Y.
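To see how the quoted suffix table behaves, here is a small Python sketch that implements only those rules. It is a hypothetical re-implementation for illustration (parse_dd_size is a made-up name), not GNU dd’s actual parser, which may accept more forms than this. Anything outside the quoted table, such as “KB”, is rejected here.

```python
import re

LETTERS = "KMGTPEZY"  # K = 1st power, M = 2nd, ..., Y = 8th


def _suffix_table() -> dict[str, int]:
    """Build the multiplier table exactly as listed in the quoted man page."""
    table = {"": 1, "c": 1, "w": 2, "b": 512, "xM": 1024 ** 2}
    for power, letter in enumerate(LETTERS, start=1):
        table[letter] = 1024 ** power  # K, M, G, ...: powers of 1024
        decimal = "kB" if letter == "K" else letter + "B"
        table[decimal] = 1000 ** power  # kB, MB, GB, ...: powers of 1000
    return table


SUFFIXES = _suffix_table()


def parse_dd_size(text: str) -> int:
    """Parse a size such as '4M' or '10kB' using the quoted multipliers."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None or match.group(2) not in SUFFIXES:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * SUFFIXES[match.group(2)]
```

For example, parse_dd_size("1K") gives 1024 while parse_dd_size("1kB") gives 1000: one letter of case and a trailing B decide between base 2 and base 10.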

It is better… 1 kB is 1000 bytes and 1 MB is 1000*1000 bytes. But what about K and M? They should be ‘KiB’ and ‘MiB’. Well, at least kB, MB, GB are used consistently in storage tools (like df and du). Except for one thing…

This is from the man page of du:

 Units are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, … (powers of 1000).

The odd one out here is “KB”, because “K” does not exist as a multiplier in the SI: K is the symbol of the kelvin, the SI base unit of temperature.

So we are back to the initial question: what is a “KB”? A kilobyte is “kB” (lowercase k), and a kibibyte is “KiB” (uppercase K). “KB” technically doesn’t exist in either the base 10 or the base 2 prefixes. It’s there because people are too lazy to get upper and lower case right. So how much is 1 KB? Since it’s not officially defined, there is no answer. If it’s meant to look like “MB” and “GB”, it should be the equivalent of 1000 bytes. But it’s very often used as a replacement for “KiB”, so 1024 bytes.

Confused? Good! Maybe now you understand why we need to use a standard, which is the following:

Base 10 prefixes are kilo (k), mega (M), giga (G), tera (T).

1 kilo = 1 k = 1000 units

1 mega = 1 M = 1 000 000 units

1 giga = 1 G = 1 000 000 000 units (let’s not start with what a ‘billion’ and ‘milliard’ is)

Base 2 prefixes are: kibi (Ki), mebi (Mi), gibi (Gi), tebi (Ti).

1 kibi = 1 Ki = 1024 units

1 mebi = 1 Mi = 1024 * 1024 units = 1 048 576 units

1 gibi = 1 Gi = 1024 * 1024 * 1024 units =  1 073 741 824 units
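A sketch of a formatter that follows this convention, in Python. format_size is a hypothetical helper: it picks the largest prefix that keeps the value at or above 1 and always prints the standard symbols (kB/MB/GB/TB or KiB/MiB/GiB/TiB), never “KB”.

```python
DECIMAL = [("TB", 1000 ** 4), ("GB", 1000 ** 3), ("MB", 1000 ** 2), ("kB", 1000)]
BINARY = [("TiB", 1024 ** 4), ("GiB", 1024 ** 3), ("MiB", 1024 ** 2), ("KiB", 1024)]


def format_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count with SI (base 10) or IEC (base 2) prefixes."""
    for symbol, factor in (BINARY if binary else DECIMAL):
        if num_bytes >= factor:
            return f"{num_bytes / factor:.2f} {symbol}"
    return f"{num_bytes} B"
```

Using the byte count from the free example, format_size(7897337856) gives "7.90 GB" and format_size(7897337856, binary=True) gives "7.35 GiB": the same RAM, two correct labels.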

So, let’s recap:

1 kB is 1000 bytes. 1 KiB is 1024 bytes, or 1.024 kB.

1 MB is 1000 kB or 1 000 000 bytes.

1 MiB is 1024 KiB. It’s also 1 048 576 B.

1 kilobit (kb) is 1000 bits.  1 megabit (Mb) is 1000 * 1000 bits.

1 kibibit (Kib) is 1024 bits and 1 mebibit (Mib) is 1024 * 1024 bits.
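The recap can also be turned into a strict parser. This is again a Python sketch with a made-up name (parse_size): it accepts only the standard symbols above, for bytes (B) and bits (b), and deliberately refuses “KB”, since it has no defined value.

```python
import re

PREFIXES = {
    "": 1,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,      # SI
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,  # IEC
}


def parse_size(text: str) -> tuple[int, str]:
    """Return (amount, unit) where unit is 'B' (bytes) or 'b' (bits)."""
    match = re.fullmatch(r"\s*(\d+)\s*(Ki|Mi|Gi|Ti|k|M|G|T|)([Bb])\s*", text)
    if match is None:
        raise ValueError(f"non-standard size: {text!r}")
    number, prefix, unit = match.groups()
    return int(number) * PREFIXES[prefix], unit
```

So parse_size("1 kB") returns (1000, 'B'), parse_size("1Mib") returns (1048576, 'b'), and parse_size("1KB") raises ValueError.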

So, for sanity reasons, please use the standards.

 
