OCR Computing A-Level Revision

Records & Files

Record Formats (2.3.d)

Databases are made up of records. Records are made up of fields, which contain data. A record format is simply a table which shows the names of the fields within a record, the type of data each will hold, and the maximum size of that data. For example, this is a record format for a school trip:

Field name          Data type   Size
Student ID          Integer     3 bytes
Name                String      20 bytes
Birthday            Array       8 bytes
Amount paid         Integer     3 bytes
Photo permission?   Boolean     1 byte
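
As a rough sketch of what a fixed-length record built from this format could look like, the Python below packs one record into exactly 35 bytes (3 + 20 + 8 + 3 + 1). The function name pack_record, the sample student, storing the birthday as eight "DDMMYYYY" characters, and storing the amount paid in pence are all assumptions made for illustration; the record format itself does not specify them.

```python
def pack_record(student_id, name, birthday, amount_paid, photo_permission):
    """Pack one school-trip record into a fixed-length bytes object."""
    return (
        student_id.to_bytes(3, "big")                    # Student ID: 3 bytes
        + name.encode("ascii")[:20].ljust(20, b" ")      # Name: padded/cut to 20 bytes
        + birthday.encode("ascii")[:8].ljust(8, b" ")    # Birthday: assumed "DDMMYYYY"
        + amount_paid.to_bytes(3, "big")                 # Amount paid: assumed pence
        + (b"\x01" if photo_permission else b"\x00")     # Photo permission?: 1 byte
    )

# Hypothetical example student
record = pack_record(42, "Ada Lovelace", "10121815", 1500, True)
print(len(record))  # 35
```

Because every record is the same length, the size of the whole file is easy to predict from the number of records.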

File Access (2.3.e)

There are four key modes of file access: serial, sequential, indexed sequential, and random.

Serial

Data is stored in the order in which it arrives. This is the simplest way to store data, but finding a particular record can be slow, because the file has to be read from the start until the record turns up. Serial storage suits data that is unlikely to be needed again, or data whose order is determined by the order of input (for example, log files).
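
A minimal sketch of serial access, assuming a plain text file with one record per line. The helper names append_record and find_record, the temporary file, and the log entries are made up for this example.

```python
import os
import tempfile

def append_record(path, record):
    """Serial storage: each new record simply goes on the end of the file."""
    with open(path, "a") as f:
        f.write(record + "\n")

def find_record(path, term):
    """Serial search: check every record in turn until one matches."""
    with open(path) as f:
        for position, line in enumerate(f):
            if term in line:
                return position
    return -1  # not found

# Demo using a temporary file and hypothetical log entries
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
for entry in ["login zhad", "login adams", "logout zhad"]:
    append_record(log_path, entry)

print(find_record(log_path, "adams"))  # 1
```

Storing is cheap (just append), but a search may have to look at every record in the file.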

Sequential

Data is stored in a specific order, for example alphabetical order by name. Although this requires more processing to store the data, it can be retrieved more quickly. For example, when searching for a record for "Zhad", the search can start from the last record.
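
To illustrate the trade-off, here is a small sketch that models a sequential file as a Python list kept in alphabetical order. This is only an in-memory stand-in for a real file, and the function names and surnames are invented for the example.

```python
import bisect

def insert_in_order(records, name):
    """Sequential storage: place the new record at its sorted position."""
    bisect.insort(records, name)

def search_in_order(records, name):
    """Search from the start, stopping once we pass where name would be."""
    for position, current in enumerate(records):
        if current == name:
            return position
        if current > name:
            break  # sorted order means name cannot appear later
    return -1

records = []
for name in ["Patel", "Zhad", "Adams", "Morris"]:
    insert_in_order(records, name)

print(records)                            # ['Adams', 'Morris', 'Patel', 'Zhad']
print(search_in_order(records, "Zhad"))   # 3
print(search_in_order(records, "Baker"))  # -1
```

The search for "Baker" gives up as soon as it reaches "Morris", which is something a serial file cannot do.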

Storing, Retrieving, and Searching Data (2.3.f)

This is all about creating algorithms in pseudo-code. For example, this is a simple line-by-line searching algorithm:

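def search(filename, term):
    # Read the file one line at a time, stopping at the first match
    with open(filename) as datafile:
        for line in datafile:
            if term in line:
                return True
    return False  # reached the end without finding the term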

Estimating File Sizes (2.3.g)

For questions about file sizes, you will either be given a record format, or you will have designed the record format in a previous question. You will also be told the number of records in a file (although this might not be made obvious - e.g. you might be told that there are 50 students on a school trip, so this means there will be 50 records in the file).

All you have to do is add up the sizes of all the fields to get the size of one record, and then multiply that by the number of records in the file. Depending on the question, you might want to give your answer in KB instead of bytes, in which case divide by 1024 (as there are 1024 bytes in a kilobyte). Finally, add around 10% to the size to allow for overheads, for example metadata about when the file was last modified, which user modified it, etc.
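
As a worked example, the sketch below runs the calculation for the school trip record format from earlier (fields of 3, 20, 8, 3 and 1 bytes), assuming 50 students. The function name estimate_file_size and the default 10% overhead parameter are just choices made for this illustration.

```python
def estimate_file_size(field_sizes, number_of_records, overhead=0.10):
    """Return the estimated file size in bytes, including overheads."""
    record_size = sum(field_sizes)                # size of one record
    raw_size = record_size * number_of_records    # size of all the records
    return raw_size * (1 + overhead)              # add around 10% for overheads

school_trip_fields = [3, 20, 8, 3, 1]             # 35 bytes per record
size_in_bytes = estimate_file_size(school_trip_fields, 50)
size_in_kb = size_in_bytes / 1024

print(round(size_in_bytes))   # 1925
print(round(size_in_kb, 2))   # 1.88
```

By hand: 35 bytes x 50 records = 1750 bytes, plus 10% gives 1925 bytes, which is about 1.88 KB.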

Working with Files (2.3.h)

N.B. I'm using Python; other schools may be teaching other programming languages.

To open a file, call open() and store the file object it returns:

f = open('file.txt', 'w')

The second argument specifies the mode to open the file with: "r" means read only, "w" means write only (any existing file with the same name will be erased), "a" means appending, and "r+" means reading and writing.
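
The difference between "w" and "a" is easiest to see by trying it. This is a small sketch using a temporary file (the path and the text written are made up), and it uses Python's with statement, which closes the file automatically at the end of each block.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")

with open(path, "w") as f:      # "w": create the file (or wipe an existing one)
    f.write("first\n")

with open(path, "a") as f:      # "a": add to the end, keeping what is there
    f.write("second\n")

with open(path, "r") as f:      # "r": read only
    after_append = f.read()

with open(path, "w") as f:      # "w" again: the old contents are erased
    f.write("third\n")

with open(path, "r") as f:
    after_rewrite = f.read()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```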

To read a file, use f.read(size) (assuming you used f as the name for the object). The size argument is optional, but if you read a file which is larger than the computer's free memory without limiting the size, the program may run out of memory and crash. The file object itself can also be used in loops to read one line at a time, for example:

for line in f:
    print(line, end='')  # each line already ends with its own newline
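
The size argument to f.read() can also be used to work through a large file in pieces, so that only one piece is in memory at a time. The sketch below counts the bytes in a file this way; the function name count_bytes, the 1024-byte chunk size, and the temporary test file are assumptions for the example.

```python
import os
import tempfile

def count_bytes(path, chunk_size=1024):
    """Read a file in fixed-size chunks and return its total length."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # at most chunk_size bytes
            if not chunk:                # empty result means end of file
                break
            total += len(chunk)
    return total

# Demo: a 5000-byte temporary file
big_path = os.path.join(tempfile.mkdtemp(), "big.dat")
with open(big_path, "wb") as f:
    f.write(b"x" * 5000)

print(count_bytes(big_path))  # 5000
```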

Similarly, f.write(string) writes the string at the current position in the file. The position in the file can be changed using f.seek(position [, from_what]). There are three possible from_what values: 0 (the default) counts from the beginning of the file, 1 from the current position, and 2 from the end of the file. The position value can be a positive or negative integer. For example, f.seek(5) would set the position to the 6th byte.
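
Here is a short sketch of the three from_what values in action. It opens the file in binary mode ("rb"), because in Python 3 a text-mode file only allows seeking relative to the beginning of the file (apart from jumping to the very end). The file contents and variable names are made up for the example.

```python
import os
import tempfile

seek_path = os.path.join(tempfile.mkdtemp(), "seek.dat")
with open(seek_path, "wb") as f:
    f.write(b"0123456789abcdef")   # 16 bytes

with open(seek_path, "rb") as f:
    f.seek(5)                # from the beginning: position 5 is the 6th byte
    sixth_byte = f.read(1)

    f.seek(-3, 2)            # from the end: 3 bytes before the end of the file
    last_three = f.read()

    f.seek(2)                # back to position 2...
    f.seek(4, 1)             # ...then 4 bytes on from the current position
    after_relative = f.read(1)

print(sixth_byte)       # b'5'
print(last_three)       # b'def'
print(after_relative)   # b'6'
```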

When you've finished working with the file, use f.close() to close the file, freeing up system resources.
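
One way to make sure f.close() always runs, even if something goes wrong while writing, is to put it in a finally block. Python's with statement does the same job in less code. This sketch shows both; the temporary file and its contents are invented for the example.

```python
import os
import tempfile

close_path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Closing by hand: finally runs whether or not the write succeeds
f = open(close_path, "w")
try:
    f.write("hello")
finally:
    f.close()

# Using with: the file is closed automatically when the block ends
with open(close_path) as g:
    text = g.read()

print(text)                # hello
print(f.closed, g.closed)  # True True
```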

A final warning about working with files in Python: the modes above open files as text, which is fine for ASCII files. Binary files like images and executables could get corrupted unless you add "b" to the mode (for example "rb" or "wb").

(psst... read the docs!)