TEACH STATS                                               A. Sloman Jan 1983


Before working through this file you may find it useful to read
    TEACH ARITH

-- DOING STATISTICS IN POP-11 -----------------------------------------------
There is a myth that POP-11 is not a good language for doing arithmetic in.
Programs may not run quite as fast as in Pascal, but the human time needed to
develop them is considerably reduced, and program development puts a much
smaller load on the computer, because you don't change systems so often.

-- FINDING AN AVERAGE -------------------------------------------------------

To find the average of all the numbers in a list, you have to add up the
numbers, then divide by the length of the list. So the average of
    [ 1 2 3]    is 2
and the average of
    [ 5 6 7 8]  is 6.5

First you need a procedure which will add up all the numbers in a list.
Let's call it TOTAL.
We define TOTAL so that it looks at each item in the list and adds it to a
running total, the final value of which is produced as the result.

    define total(list) -> tot;
        ;;; add up numbers in list
        vars item;
        0 -> tot;
        for item in list do
            tot + item -> tot;
        endfor;
    enddefine;

Type that in, then test it, making sure it produces the right totals:

    total([1 2 3])=>
    total([66 33 11])=>

-- How it works -------------------------------------------------------------
The crucial bit is the following:
        for item in list do
            tot + item -> tot;
        endfor;

Note FOR is a syntax word telling POP-11 that this is a 'loop' instruction,
i.e. something to be obeyed repeatedly.
The word ENDFOR merely says where the action to be repeated ends.

The whole thing tells POP-11 to take each element of the list in turn, and
assign it to the variable ITEM, then to do the line in the middle -- the
'action':
        tot + item -> tot;

This command takes the previous value of TOT and adds the value of ITEM to
it, then makes the result the new value of TOT.

Notice that this assumes that TOT is already a number. So we need the line:
        0 -> tot;

to make sure that TOT starts off with a number as a value.

Now look back at your procedure TOTAL and make sure you know what it all
means. Try it on several lists and make sure the totals are right. Then try
it on a long list of big numbers.
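
To see what the FOR ... IN loop is doing for you, here is a sketch of the
same procedure with the loop spelled out by hand. The name TOTAL_BY_HAND is
invented for this example. HD gives the first element of a list and TL gives
the rest of the list.

    define total_by_hand(list) -> tot;
        ;;; add up numbers in list, taking the elements off one at a time
        0 -> tot;
        until list = [] do
            tot + hd(list) -> tot;      ;;; add in the first element
            tl(list) -> list;           ;;; then carry on with the rest
        enduntil;
    enddefine;

It should give the same results as TOTAL, e.g.

    total_by_hand([1 2 3])=>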

-- TOTAL must be given only numbers -----------------------------------------
The line:

        tot + item -> tot;

uses the arithmetic procedure "+". This works only for numbers. So you will
get an error if you include something other than a number in the list, e.g. a
word. To see what happens try:

    total([1 2 three four]) =>
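
If you would rather have a message of your own, one possibility is to check
each item with the POP-11 procedure ISNUMBER before adding it in. This is
only a sketch: the name CHECKED_TOTAL and the wording of the message are
invented for this example.

    define checked_total(list) -> tot;
        ;;; like TOTAL, but complains clearly about non-numbers
        vars item;
        0 -> tot;
        for item in list do
            if isnumber(item) then
                tot + item -> tot;
            else
                mishap('NUMBER NEEDED IN LIST', [^item])
            endif;
        endfor;
    enddefine;

    checked_total([1 2 three four]) =>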

-- Using TOTAL to find an average ------------------------------------------

This is fairly straightforward. To find the average, find the total of the
list, then divide by the number of items in the list. We can use the POP-11
procedure LENGTH for the last part:

    define average(list) -> av;
        total(list)/length(list) -> av
    enddefine;

In the middle line "/" means "divided by".

Now test your procedure with different lists of numbers:

    average([ 1 2 3])=>
    average([ 1 2 3 4 5 6])=>
    average([40 50 60])=>
    average([ 200 300 400 500])=>

-- Averaging an empty list? -------------------------------------------------

What happens if the list is empty? Try:
    average([]) =>

You'll get a mishap, because the length of the list is 0 and you cannot
divide by zero. You can change the procedure AVERAGE so that it checks for an
empty list itself and gives a more helpful message.

    define average(list) -> av;
        if list = [] then
            mishap('CANNOT AVERAGE EMPTY LIST',[])
        else
            total(list)/length(list) -> av
        endif
    enddefine;

Now try it again. This time the mishap message should be your own:
    average([]) =>

-- Finding the sum of squares of numbers in a list --------------------------
This is something often required in statistics, though we need not enquire
why now. Here is a way to do it. Try it.

    define sumsq_list(list) -> tot;
        ;;; find sum of squares of numbers in list
        vars item;
        0 -> tot;
        for item in list do
            item * item + tot -> tot
        endfor;
    enddefine;

Now try it:
    sumsq_list([ 1 2 3]) =>

This is very like the procedure TOTAL except for the line

            item * item + tot -> tot

The first part 'item * item' says multiply the number by itself, i.e. find
its square. The result is then added to TOT, the running total. So in the end
the value of TOT will be the sum of all the squares.

-- Standard Deviation -------------------------------------------------------
You can use the same idea as SUMSQ_LIST to define a procedure to compute the
standard deviation of a list of numbers. Find the sum of the squares of the
differences between each number and the average. Divide that by one less than
the number of elements in the list, which gives the variance. Then take the
square root. The list must contain at least two numbers. Here is one way to
do it.

    define stand_dev(list) -> dev;
        vars item sumsqs av;
        0 -> sumsqs;                ;;; this starts the sum of squares
        average(list) -> av;        ;;; find the average for the list
        for item in list do
                                    ;;; find square of deviation from average
                                    ;;; and add to sumsqs
            (item - av) * (item - av) + sumsqs -> sumsqs;
        endfor;
        sumsqs/(length(list) - 1) -> dev;
                                    ;;; actually this is the variance
                                    ;;; so take its square root to get
                                    ;;; standard deviation
        sqrt(dev) -> dev
    enddefine;


Now try this on various lists, and trace average and total so that you can
see how they are being used:

    trace average, total;
    stand_dev([1 2 3]) =>
    stand_dev([10 11 12]) =>
    stand_dev([1 2 3 4 5]) =>
    stand_dev([10 20 30 40 50]) =>

    vars data;
    [ 5 10 15 6 11 16 7 11 13 8 5 17 6] -> data;
    stand_dev(data) =>
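
As a cross-check, here is a sketch of another definition which does call
SUMSQ_LIST directly. The name STAND_DEV2 is invented for this example. It
relies on the fact that the sum of the squared differences from the average
is equal to the sum of the squares minus N times the square of the average,
where N is the length of the list.

    define stand_dev2(list) -> dev;
        ;;; standard deviation via SUMSQ_LIST and AVERAGE (defined above)
        vars n av;
        length(list) -> n;
        average(list) -> av;
        ;;; sum of squared deviations = sum of squares - n * av * av
        sqrt((sumsq_list(list) - n * av * av) / (n - 1)) -> dev
    enddefine;

    stand_dev2([1 2 3]) =>
    stand_dev2(data) =>

The results should agree with those from STAND_DEV, apart perhaps from small
rounding differences when the numbers are decimals. For [1 2 3] the sum of
squares is 14, and 3 times the square of the average is 12, so the variance
is 2 divided by 2, i.e. 1.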

Suppose you have five subjects, each with one score on each of four tests, and
you want to find for each test the mean and the standard deviation over all
the subjects, and store the results in a file.

You can put the data in a list of lists, e.g.
    vars data;
    [ [ 5 20 31 15]     ;;; four scores for subject 1
      [ 6 22 35 12]     ;;; subject 2
      [ 4 20 29 16]
      [ 5 23 34 16]
      [ 7 19 28 24]
    ] -> data;

We can write a procedure to go through all the data working out the mean and
standard deviation for test number N, where N is 1, 2, 3, or 4:

    define resultsfortest(n,list) -> mean -> dev;
        vars scores subj;
        [% for subj in list do subj(n) endfor%] -> scores;
        average(scores) -> mean;
        stand_dev(scores) -> dev;
    enddefine;
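
To see what the line with the brackets [% ... %] does on its own, try this
small example. The two-subject list is made up for illustration. SUBJ(N)
picks out element number N of the list SUBJ, and everything left on the stack
between [% and %] is collected into a new list:

    vars subj;
    [% for subj in [[5 20 31 15] [6 22 35 12]] do subj(3) endfor %] =>
    ** [31 35]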

You can then get the results for test number 3 thus. Note that the standard
deviation is printed first, then the mean:
    resultsfortest(3,data) =>
    ** 3.04959 31.4

We can now define a procedure to do it for all the tests, and print out the
results.

    define allresults(num,list);
        ;;; num is the number of tests, list the list of records for
        ;;; all subjects
        vars test_num mean dev;
        for test_num from 1 to num do
            resultsfortest(test_num,list) -> mean -> dev;
            pr('RESULTS FOR TEST No: ' >< test_num
                >< ': Mean:\t' >< mean
                ><'\tStandard Deviation:\t' >< dev);
            pr(newline);
        endfor;
    enddefine;

Then
    allresults(4,data);

will cause the following to be printed out:

RESULTS FOR TEST No: 1: Mean:   5.4     Standard Deviation: 1.14018
RESULTS FOR TEST No: 2: Mean:   20.8    Standard Deviation: 1.64317
RESULTS FOR TEST No: 3: Mean:   31.4    Standard Deviation: 3.04959
RESULTS FOR TEST No: 4: Mean:   16.6    Standard Deviation: 4.44972

If you want the output to go to a named file, define a procedure thus:

    define fileresults(num,list,filename);
        ;;; num the number of tests, list the list of data
        vars cucharout;
        discout(filename) -> cucharout; ;;; prepare to print into file
        allresults(num,list);
        cucharout(termin);              ;;; close the file
    enddefine;
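
For example, assuming you want the output in a file called results.txt in
your current directory (the file name is chosen just for illustration):

    fileresults(4, data, 'results.txt');

Nothing should appear on the terminal. Look at the file with the editor to
check that it contains the results.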


NOTE:
This technique is not particularly efficient, since for each test it first
makes a list of all the numbers and then finds the mean and standard
deviation for that test. And it does this separately for each test.
It would be possible to redefine ALLRESULTS to go through all the data
once, working out all the means and deviations in parallel. But that would be
somewhat tedious, and for small data sets not worth the trouble.

There is an additional inefficiency in that all the data are put into a single
list, the whole of which has to be constructed before the program is run.
For very large amounts of data this overhead could be intolerable, and it
would be preferable to store the data in a file, and just read in a record at
a time. Again, there is no problem about doing this in POP-11.
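
As a rough sketch of the idea, the following procedure finds the average of
the numbers in a file without ever building a list. It assumes the file
contains nothing but numbers separated by spaces or newlines. The name
FILE_AVERAGE is invented for this example. DISCIN makes a procedure which
reads characters from the file, and INCHARITEM turns that into a procedure
which produces one item at a time, and TERMIN at the end of the file.

    define file_average(filename) -> av;
        ;;; average of all the numbers in the named file
        vars next_item item tot count;
        incharitem(discin(filename)) -> next_item;
        0 -> tot;
        0 -> count;
        next_item() -> item;
        until item == termin do
            tot + item -> tot;          ;;; running total, as in TOTAL
            count + 1 -> count;         ;;; how many numbers so far
            next_item() -> item;
        enduntil;
        if count = 0 then
            mishap('NO NUMBERS IN FILE', [^filename])
        else
            tot / count -> av
        endif
    enddefine;

The same technique could be extended to read one subject's record at a time
and update the totals for each test.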


--- C.all//teach/stats
--- Copyright University of Sussex 1991. All rights reserved. ----------