Data Structures And Number Systems
© Copyright Brian Brown, 1984-1999. All rights
reserved.
Reference Books:
DATA STRUCTURES
Just as learning to design programs is important, so is the
understanding of the correct format and usage of data. All
programs use some form of data. To design programs which work
correctly, a good understanding of how data is structured will be
required.
This module introduces you to the various forms of data used by programs. We shall investigate how the data is stored, accessed and its typical usage within programs.
A computer stores information in Binary format. Binary is a number system which uses BITS to store data.
BITS
A bit is the smallest element of information used by a computer.
A bit holds ONE of TWO possible values,
Value | Meaning |
0 | OFF |
1 | ON |
A bit which is OFF is also considered to be FALSE or NOT SET; a bit which is ON is also considered to be TRUE or SET.
Because a single bit can only store two values, bits are combined together into large units in order to hold a greater range of values.
NIBBLE
A nibble is a group of FOUR bits. This gives a maximum number of
16 possible different values.
2 ** 4 = 16 (2 to the power of the number of bits)
It is useful, when dealing with groups of bits, to determine which bit of the group has the least value, and which bit has the most or greatest value.
The Least Significant Bit AND The Most Sigificant Bit
This is deemed to be bit 0, and is always drawn at the extreme
right. The Most significant bit is always shown on the extreme
left, and is the bit with the greatest value.
The diagram below shows a NIBBLE, and each bits position and decimal weight value (for more information, consult the module on Number Systems).
3 2 1 0 Bit Number +---+---+---+---+ | | | | | +---+---+---+---+ 8 4 2 1 Decimal Weighting Value MSB LSB
Lets consider an example of converting binary values into decimal.
Bit 3 2 1 0 Value 1 0 1 1 Bit 3 is set, so it has a decimal weight value of 8 Bit 2 is not set, so it has a decimal weight value of 0 Bit 1 is set, so it has a decimal weight value of 2 Bit 0 is set, so it has a decimal weight value of 1 Adding up all the decimal weight values for each bit= 11 So 1011 in binary is 11 in decimal!
For more examples, consult the module on Number Systems.
BYTES
Bytes are a grouping of 8 bits. This comprises TWO nibbles, as
shown below.
7 6 5 4 3 2 1 0 Bit Number +---+---+---+---+---+---+---+---+ | | | | | | | | | +---+---+---+---+---+---+---+---+ 128 64 32 16 8 4 2 1 Decimal Weighting Value MSB LSB
Bytes are often used to store CHARACTERS. They can also be used to store numeric values,
0 to 255 -127 to +128
Binary Coded Decimal [BCD]
Binary code decimal digits (0-9) are represented using FOUR bits.
The valid combinations of bits and their respective values are
Binary Value | Digit |
0000 | 0 |
0001 | 1 |
0010 | 2 |
0011 | 3 |
0100 | 4 |
0101 | 5 |
0110 | 6 |
0111 | 7 |
1000 | 8 |
1001 | 9 |
The binary combinations 1010 to 1111 are invalid and are not used.
If the computer stores one BCD digit per byte, its called normal BCD. The unused nibble may be either all 0's or all 1's.
If two BCD digits are stored per byte, its called Packed BCD. This occurs in data transmission where numbers are being transmitted over a communications link. Packed BCD reduces the amount of time spent transmitting the numbers, as each data byte transmitted results in the sending of two BCD digits.
Consider the storing of the digits 56 in Packed BCD format.
7 6 5 4 3 2 1 0 Bit Number +---+---+---+---+---+---+---+---+ | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | +---+---+---+---+---+---+---+---+ MSB LSB
The UPPER nibble holds the value 5, whilst the LOWER nibble holds the value 6.
Status And Boolean Variables
BOOLEAN variables use a single bit to hold their value, so can
only assume one of two possible states. This is either 0
(considered to be FALSE), or 1 (considered to be TRUE).
The computer handles each boolean variable as a single bit. If the bit is TRUE, then is has a value of 1. If the bit is FALSE, then it has the value 0.
When a group of bits are grouped together to form a limited range of values, this is known as a STATUS variable.
Consider the case in a program where we need to keep track of the number of minutes a phone line is busy for (within the limited range of 0 to 60). This does not require the use of a full integer, so some programming languages allow you to specify the number of bits used to allocate to variables with limited ranges.
The advantage of this approach, is that the storage space of status variables can be combined together into a single 16 or 32 bits, resulting in a saving of space.
Consider where a computer allocates 16 bits of storage per status variable. If we had three status variables, the space consumed would be 48 bits. BUT, if all the status variables could be combined and fitted into a single 16 bits of storage, we could save 32 bits of memory. This is very important in real-time systems where memory space is at a premium.
Consider the following diagram, which illustrates the packing of boolean and status variables together into a single byte.
7 6 5 4 3 2 1 0 Bit Number +---+---+---+---+---+---+---+---+ | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | +---+---+---+---+---+---+---+---+ | | | | | | | +--- local call/TOLL call (bit 0) | | +------- extension busy/free (bit 1) | +------------------- minutes (bits 2-4) +----------------------- extension diverted/TRUE/FALSE
The American Standard Code for Information Interchange
ASCII is a computer code which uses 128 different encoding
combinations of a group of seven bits (27 = 128) to represent,
Lets now look at the encoding method. The table below shows the bit combinations required for each character.
00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 0A | 0B | 0C | 0D | 0E | 0F | |
00 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | TAB | LF | VT | FF | CR | SO | SI |
10 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
20 | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / | |
30 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
40 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
50 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
60 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
70 | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | DEL |
A computer usually stores information in eight bits. The eighth bit is unused in ASCII, thus is normally set to 0. Some systems may use the eight bit to implement graphics or different language symbols, ie, Greek characters.
Control codes are used in communications and printers. They may be generated from an ASCII keyboard by holding down the CTRL (control) key and pressing another key (A to Z, plus {, \, ], ^, <- ).
Example Code the text string 'Hello.' in ASCII using hexadecimal digits.
H = 48 e = 65 l = 6C l = 6C o = 6F . = 2E
thus the string is represented by the byte sequence
48 65 6C 6C 6F 2E
CHARACTERS
Characters are non-numeric symbols used to convey language and
meaning. In English, they are combined with other characters to
form words. Examples of characters are;
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 ! @ # $ % ^ & * ( ) _ - = + | \ ` , . / ; ' [ ] { } : " < > ?
A computer system normally stores characters using the ASCII code. Each character is stored using eight bits of information, giving a total number of 256 different characters (2**8 = 256).
In the high level language Pascal, characters are defined and used as follows,
var plus_symbol : char; begin plus_symbol := '+';
Variables used in a Pascal program are declared after the keyword var. The above example declares the variable plus_symbol to be a character type, thus eight bits of memory storage are allocated to store its value (as yet undetermined).
Inside the main body of the program, after the keyword begin, the statement shown assigns the symbol + to the character variable plus_symbol. This is equivalent to storing the ASCII value 2B hexadecimal in the eight bits of memory allocated to the variable plus_symbol.
TEXT STRINGS
Text strings are a sequence of characters (ie, words or multi-
character symbols). Each character is stored one after the other,
each occupying eight bits of memory storage.
The text string Hello would be stored as follows
+------+ | 48 | +------+ | 65 | +------+ | 6C | +------+ | 6C | +------+ | 6F | +------+
In Turbo Pascal, text strings are defined and used as follows,
var text_message : string[22]; begin text_message := 'Welcome to text strings';
The above example declares the variable text_message to be a string type of up to 22 characters long (but no more!). Eight bits of memory storage are allocated to store each character in the string (a total of 22 bytes), with the value in each byte as yet undetermined.
Inside the main body of the program, after the keyword begin, the statement shown assigns the message Welcome to text strings to the string variable text_message. This stores the ASCII value of each character into each successive byte of memory allocated to the variable text_message.
INTEGERS
Numeric information cannot efficiently be stored using the ASCII format. Imagine storing the
number 123,769 using ASCII. This would consume 6 bytes, and it
would be difficult to tell if the number was positive or negative
(though we could precede it with the character + or -).
A more efficient way of storing numeric information is to use a different encoding scheme. The encoding scheme in most use is shown below,
Bits 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +---+--------------------------------------------+ | S | Numeric Value in Binary | +---+--------------------------------------------+ S = Sign Bit
Integers store whole numbers only! They do not contain fractional parts. Consider the examples below,
Valid Invalid 123 .987 0 0.0 278903 123.09
The sign bit (which is bit 15) indicates whether the number is positive or negative. A logic 1 indicates negative, a logic 0 indicates positive.
The number is converted to binary and stored in bits 0 to 14 of the two bytes.
Example Store the value +263 as an integer value. 1) The sign bit is 0. 2) The decimal value +263 is 100000111 in binary. Thus the storage in memory of the integer +263 looks like, Bits 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +---+---------------------------------------------+ | 0 | 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 | +---+---------------------------------------------+
When storing negative numbers, the number is stored using the two's complement format.
In Pascal, integers are defined and used as follows,
var whole_number : integer; begin whole_number := 1267;
The example declares the variable whole_number to be an integer type, thus sixteen bits of memory storage are allocated to store its value (as yet undetermined).
Inside the main body of the program, after the keyword begin, the statement shown assigns the numeric value 1267 to the integer variable whole_number. This is equivalent to storing the bit combination 0000010011110011 in the sixteen bits of memory allocated to the variable whole_number.
Signed integers using 16 bits have a number range of,
-32768 to +32767 (+-2^15)
To store larger integer values would require more bits. Some systems and languages also support the use of unsigned integers, which are deemed to be positive only.
FLOATING POINT NUMBERS
There are two problems with integers; they cannot express
fractions, and the range of the number is limited to the number
of bits used. An efficient way of storing fractions is called the
floating point method, which involves splitting the fraction into
two parts, an exponent and a mantissa.
The exponent represents a value raised to the power of 2.
The mantissa represents a fractional value between 0 and 1.
Consider the number
12.50
The number is first converted into the format
2n * 0.xxxxxx
where n represents the exponent and 0.xxxxx is the mantissa.
The computer industry agreed upon a standard for the storage of floating point numbers. It is called the IEEE 754 standard, and uses 32 bits of memory (for single precision), or 64 bits (for double precision accuracy). The single precision format looks like,
Bits 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +---+-----------------------+---------------------------------------------------------------------+ | S | Exponent value | Mantissa value | +---+-----------------------+---------------------------------------------------------------------+ S = Sign Bit
The sign bit is 1 for a negative mantissa, and 0 for a positive mantissa.
The exponent uses a bias of 127.
The mantissa is stored as a binary value using an encoding technique.
Working out the FP bit patterns
The number we have is
12.5
which expressed as fraction to the power of 2 is,
12.5 / 2 = 6.25 6.25 / 2 = 3.125 3.125 / 2 = 1.5625 1.5625 / 2 = 0.78125
NOTE: Keep dividing by 2 till a fraction between 0 and 1 results. The fraction is the mantissa value, the number of divisions is the exponent value.
thus our values now are,
0.78125 * 24
The exponent bit pattern is stored using an excess of 127. This means that this value is added to the exponent when storing (and subtracted when removing).
The exponent bit pattern to store is,
4 + 127 = 131 = '10000011'
As the mantissa is a positive value, the sign bit is 0.
The mantissa is a little more complicated to work out. Each bit represents 2 to the power of a negative number. It looks like,
1st bit of mantissa = 0.5 2nd = 0.25 3rd = 0.125 4th = 0.0625 5th = 0.03125 etc
The mantissa number value we have is 0.78125, which in binary is
11001000000000000000000 (0.5 + 0.25 + 0.03125)
How-ever, to make matters even more complicated, the mantissa is normalized, by moving the bit patterns to the left (each shift subtracts one from the exponent value) till the first 1 drops off.
The resulting pattern is then stored.
The mantissa now becomes
10010000000000000000000
and the exponent is adjusted to become
131 - 1 = 130 = '10000010'
The final assembled format is,
Bits 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +--+-----------------------+---------------------------------------------------------------------+ | 0| 1 0 0 0 0 0 1 0| 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | +--+-----------------------+---------------------------------------------------------------------+ S Exponent Mantissa
Now lets convert the following storage format back into a decimal number.
Bits 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +--+-----------------------+---------------------------------------------------------------------+ | 1| 1 0 0 0 0 0 1 1| 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | +--+-----------------------+---------------------------------------------------------------------+ S Exponent Mantissa
This gives a negative number with an exponent value of,
131 - 127 = 4
and a mantissa value of,
1.0 + 0.5 + 0.0625 = 1.5625
(the 1.0 comes from the bit which was shifted off when the mantissa was normalized ), thus the number is,
-1.5625 * 24 = -25.0
The numeric range for floating point numbers using the IEEE-754 method, using 32 bits, is
+- 0.838860808 * 2-128 to +- 0.838860808 * 2127
or, in decimal,
2.4652 * 10-39 to 1.4272 * 1038
In Pascal, floating point numbers are defined and used as follows,
var fp_number : real; begin fp_number := 12.50;
The example declares the variable fp_number to be a floating- point type, thus thirty-two bits of memory storage are allocated to store its value (as yet undetermined).
Inside the main body of the program, after the keyword begin, the statement shown assigns the numeric value 12.50 to the real variable fp_number. This is equivalent to storing the bit combinations 01000001010010000000000000000000 in the thirty-two bits of memory allocated to the variable fp_number.
The advantages of storing floating point numbers in this way are,
The disadvantages of the storage format are,
Home | Other Courses | Feedback | Notes | Tests
© Copyright Brian Brown, 1984-1999. All rights
reserved.