Binary data encoding

Data representation

Computers represent and memorize data using electronic components that can assume only two states: on or off, just like a switch. The information that can be stored in a switch is the simplest that can be stored in a computer system and is called a bit. This means that every kind of information that we can see on the internet or store in an electronic memory, must be made up using a suitable combination of enough bits.

The first and obvious step is to represent data that can be in just two states, like:

  • Black/white
  • Male/female
  • North/south
  • 1/0

Even with these very simple examples, a fundamental problem arises: we have no way to tell if a particular piece of information should be represented using an on or an off switch. For example, if we want to represent one’s and zeroes, we could encode them using the following convention:

positive logic
1 ON
0 OFF

Or we could use the following convention, that is used in some electronic devices.

negative logic
0 ON
1 OFF

This means that all the parties which need to use a certain kind of data, have to share an agreement on the rules that are to be used in its encoding and decoding. Otherwise the recipients of the information would have no way to understand it’s meaning.

If we need to represent data that assume more than two states, we have to use a larger number of bits. For example, if we want to encode the cardinal points, we can use the following combination of bits.

North 00
East 01
South 10
West 11

Whenever we add another bit, the possible combinations related to the new configuration can be calculated from the old ones which will be repeated twice. The first time adding a zero to the left and the second time adding a one. For example, if we start from two bits, we will have the following combinations.

00
01
10
11

But, if we add another bit, we will achieve the following result.

added
bit
original
bits
result
0 00 000
01 001
10 010
11 011
1 00 100
01 101
10 110
11 111

This means that, every time we add a bit, the number of combinations is doubled. In general, the formula that has to be used to perform the calculation is:

states=2^{bits}

Of course, we we want to know how many bits are needed to represent data that can assume a given number of states, we have to use the inverse formula.

bits=log_2(states)