Data Types

last updated 8-oct-07

Data Types

Data Types, a definition: a set of values.
For many languages this is sufficient (COBOL, FORTRAN, C)

A more modern definition: A set of values together with a set of operations.
(Ada with packages, C++ and Java classes, or in general, an abstract data type)

The relationship of types to algebraic systems is limited to the computer's finite systems. For example the infinite integer range and real number continuum are only approximated by a finite system.

Each data object has a type:

 

 

 

 

 


Type Declarations

The definition of a type beyond what is supplied by the base language. You are naming the type with an identifier.

E.g. typedef in C; class  in C++; type statement in Pascal/Ada. Classes in Java support (or cirumcumvent the need for a separate) type declarations.

Explicit type declaration: strongly typed if all variables must be declared.

No memory allocation is a direct result of a declaration.

Type declarations simplify type equivalence requirements.

Type checking

Type checking: rules that are applied by the compiler for correct usage of the type system.

 

 

 


Data Objects

Scalar data objects:
  • Numeric (Integers, Real)
  • Booleans
  • Characters
  • Enumerations

Composite objects:

  • String
  • Pointer

 

Structured objects:
  • Arrays
  • Records
  • Lists
  • Sets

Abstract data types

  • Classes

Active Objects:

  • Tasks
  • Processes

 

 


Binding of data objects

A compiler tracks two categories:

A variable is a binding of a name to a memory location:

 

 

 

 


L-value and R-value

Location for an object is its L-value. Contents of that location is its R-value.

Where did names L-value and R-value come from?

Consider executing: A = B + C;

  1. Pick up contents of location B
  2. Add contents of location C
  3. Store result into address A.

For each named object, its position on the right-hand-side of the assignment operator (=) is a content-of access, and its position on the left-hand-side of the assignment operator is an address-of access.

 

 

 

 


Subtypes

A is a subtype of B if every value of A is a value of B.

Note: In C almost everything (int, word, long, byte) other than floating point numbers is a subtype of integer.

Conversion between types:

Given variables A and B, when is A := B legal?

Explicit: All conversion between different types must be specified by the programmer

Implicit: Some conversions between different types are implied by the language definition

 

 

 

 


Coercion examples

Examples in Pascal:

var A: real;
    B: integer;

A := B - Implicit, called a coercion - an automatic conversion from one type to another

A := B is called a widening since the type of A has more values than B.

B := A (if it is allowed) is called a narrowing since B has fewer values than A. Information could be lost or corrupted in this case.

In most languages widening coercions are usually allowed; narrowing coercions must be explicit, as in Pascal:

B := round(A); {Go to integer nearest A}

B := trunc(A); {Delete fractional part of A}

 

 


Simple types

Predefined types: built into the language
E.g., Integer, Real, Char, Boolean

Varied precision: E.g., long int, double, short, or number of decimal places in COBOL or Ada

Enumerated types: user defined constant sequence. Implemented as an integer mapping. 
E.g. enum in C, user-defined types in Pascal.

Subranges of a base type.
Generally restricted to ordinal types. Not following the regular principle since it could apply to real types.

in Pascal:

TYPE
ThermTempRange = -40..120;

 


Integer numeric data

Integers:

Binary representation is in 2's complement arithmetic

For 32-bit words:

Maximum value:  231-1

Minimum value: -231

 

 

 

 


Real numeric data

Float (real): hardware representations

 

 

 

 

 

 

Exponents usually biased

e.g., if 8 bits (you have 256 values) then +128 is added to exponent

 

 


IEEE floating point format

IEEE standard 754 specifies both a 32- and 64-bit standard.

Numbers consist of three fields:

S: a one-bit sign field. 0 is positive.

E: an exponent in excess-127 notation. Values (8 bits) range from 0 to 255, corresponding to exponents of 2 that range from -127 to 128.

M: a mantissa of 23 bits. Since the first bit of the mantissa in a normalized number is always 1, it can be omitted and inserted automatically by the hardware, yielding an extra 24th bit of precision.

Examples:

This gives a range from 10-38 to 1038.

In 64-bit format, the exponent is extended to 11 bits giving a range from -1022 to +1023, yielding numbers in the range 10-308 to 10308.

 

 


Other numeric data

Short integers (C) - 16 bit, 8 bit

Long integers (C) - 64 bit

Boolean or logical - 1 bit with value true or false

Byte - 8 bits

Character - Single 8-bit byte - 256 characters

In C, a char variable is simply 8-bit integer numeric data

 

 

 

 


Enumerations

typedef enum thing {A, B, C, D } NewType;

Implemented as small integers with values: A = 0, B = 1, C = 2, D = 3

NewType X, Y, Z;

X = A

Why not simply write: X=0 instead of X=A?

Example:

enum { fresh, soph, junior, senior} ClassLevel;
enum { old, new } BreadStatus;
BreadStatus = fresh; //An error which can be detected

 

 


Decimal data

Fixed decimal in PL/I and COBOL (For financial applications). You see this in database data definition languages.  SQL has the DECIMAL attribute type.

DECLARE X FIXED DECIMAL(p,q);

Example of PL/I fixed decimal:

DECLARE X FIXED DECIMAL (5,3),
Y FIXED DECIMAL (6,2),
Z FIXED DECIMAL (6,1);
X = 12.345;
Y = 9876.54;

What is Z=X+Y?:   By hand you would line up decimal points and add:

0012.345
9876.540
9888.885 = FIXED DECIMAL(8,3)

 


Implementing decimal data

Algorithm:

  1. Store each number as an integer (12345, 987654) Compiler knows scale factor (S=3 for X, S=2 for Y). True value printed by dividing stored integer by 10S
  2. To add, align decimal point. Adjust S by 1 by multiplying by 10.
  3. 10*Y+X = 9876540 + 12345 = 9888885, Compiler knows S=3
  4. S=1 for Z, so need to adjust S of addition by 2; divide by 102 (98888)
  5. Store 98888 into Z. Compiler knows S=1

Note: S never appears in memory, and there is no loss of accuracy by storing data as integers.

 

 

 


Pointer type

Pointer type does not correspond to a set operation

Usually all addresses associated with a particular type

PL/I has just an ADDRESS type. This is easy to lose track of what should be dereferenced.

Pointers support recursive types.

TYPE LinkedListNode = RECORD
   data: Integer;
   next: LinkedListNode;
END;
Above example has infinite size. Why?

The following is a solution that gives a fixed size to the record but dynamic in structure.

TYPE LLNodePtr = POINTER TO LLNode; /*known size*/
TYPE LLNode = RECORD
   data: Integer;
   next: LLNodePtr; /*this is known*/
END;

We see many examples of the use of pointers in C/C++ especially in the compiler project.

 

 


Files and strings

Files are objects and structures found outside the program module and are part of the operating system. Their access are accomplished through function and subroutine system calls. 

Courses such as operating systems, database management, etc consider these structures in great detail.

String issues in implementation

Each language has its own interpretation and implementation of strings.  Some treat strings as an array of characters, others as a simple type.

Static length strings are found in early languages of FORTRAN, COBOL and Pascal and Ada. 

Dynamic length strings are found in interpreted languages such as BASIC, LISP.  Support for dynamic strings are found in Java

Implementation of dynamic strings varies between using a terminating character (null byte) or using a byte or two to represent the length of the string.

 


String Declarations

Character Strings: Primitive object made up of more primitive character data.

Fixed length:

char A[10]; /** C  note A itself is a pointer to 10 bytes **/
DCL B CHAR(10); /** PL/I  **/
var C: packed array [1..10] of char; (** Pascal **)

Variable length:

DCL D CHAR(20) VARYING; /** PL/I - 0 to 20 characters **/
E = "ABC"  /** SNOBOL4 - any size, dynamic **/
char * F = "ABCDEFG\0";  /** C - any size, programmer defined **/
/** F is a pointer into the string constant pool*/

 


String Operations

Usually expressed as built-in function.