## Testing the statistics functions

To make testing of the statistical functions I’m writing repeatable, I decided to create a test script.

    $data1 = @(1,2,3,4,5,6,7,8,9,10)
    $data2 = @(21,22,23,24,25,26,27,28,29,30)

    get-mean -numbers $data1
    get-mean -numbers $data2

    get-standarddeviation -numbers $data1
    get-standarddeviation -numbers $data2

    get-correlation -numbers1 $data1 -numbers2 $data2

As it always uses the same values and calls the functions in the same way, the results are repeatable from run to run.

I’ll add extra lines to the script as I add more functions.
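One possible way to make the script check its own results is to compare each value against a hand-calculated expectation. The sketch below is only an illustration: `Test-Close` is a hypothetical helper that is not part of the original script, and the values are computed inline so the block runs on its own. With the functions from this post loaded, the same helper could be given `(get-mean -numbers $data1).Average` or the output of `get-standarddeviation` instead.

```powershell
# Hypothetical helper (not part of the original script):
# returns $true when two numbers agree within a tolerance.
function Test-Close {
    param (
        [double]$Actual,
        [double]$Expected,
        [double]$Tolerance = 1e-9
    )
    [math]::Abs($Actual - $Expected) -le $Tolerance
}

$data1 = @(1,2,3,4,5,6,7,8,9,10)

# The mean of 1..10 is 55 / 10 = 5.5
$mean = ($data1 | Measure-Object -Average).Average
Test-Close -Actual $mean -Expected 5.5

# The squared differences from the mean sum to 82.5,
# so the sample standard deviation is sqrt(82.5 / 9)
$sumSquares = 0
foreach ($n in $data1) {
    $sumSquares += [math]::Pow($n - $mean, 2)
}
$sd = [math]::Sqrt($sumSquares / ($data1.Count - 1))
Test-Close -Actual $sd -Expected ([math]::Sqrt(82.5 / 9))
```

A tolerance is used rather than `-eq` because floating point results rarely match a hand-calculated value exactly.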

## Calculating the correlation coefficient

This measures the degree of linear dependence between two sets of values:

+1 indicates perfect positive correlation

0 indicates no correlation

-1 indicates perfect negative correlation

We can calculate the correlation coefficient using this function:

    function get-correlation {
        [CmdletBinding()]
        param (
            [double[]]$numbers1,
            [double[]]$numbers2
        )

        $count1 = $numbers1.length
        $count2 = $numbers2.length

        if ($count1 -ne $count2 ){
            Throw "Samples are not of equal length"
        }

        $avg1 = (get-mean -numbers $numbers1).Average
        $avg2 = (get-mean -numbers $numbers2).Average

        $sd1 = get-standarddeviation -numbers $numbers1
        $sd2 = get-standarddeviation -numbers $numbers2

        $varsum = 0
        for ($i=0; $i -le ($count1 -1); $i++) {
            $varsum += ($numbers1[$i]-$avg1) * ($numbers2[$i]-$avg2)
        }

        $correlation = $varsum / (($count1-1) * $sd1 * $sd2)
        $correlation
    }

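After checking that both samples are the same length, get the mean and standard deviation of the two datasets using our existing functions.

For each pair of data points, multiply the difference between the first value and the mean of its dataset by the difference between the second value and the mean of its dataset, then sum those products.

Take that sum and divide it by the product of the two standard deviations multiplied by the number of samples minus 1.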
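Written as a formula, the steps above appear to correspond to the sample correlation coefficient. The symbols are my own labels rather than names from the function: $x_i$ and $y_i$ stand for the values in `$numbers1` and `$numbers2`, $\bar{x}$ and $\bar{y}$ for their means, $s_x$ and $s_y$ for their standard deviations, and $n$ for the number of samples.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x\, s_y}
$$

For the test data, every value in `$data2` is the matching value in `$data1` plus 20, so the differences from the mean are identical in both datasets and the result should be 1, give or take floating point rounding.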

## Standard Deviation

Another simple calculation in PowerShell:

    function get-standarddeviation {
        [CmdletBinding()]
        param (
            [double[]]$numbers
        )

        $avg = $numbers | Measure-Object -Average | select Count, Average

        $popdev = 0
        foreach ($number in $numbers){
            $popdev += [math]::pow(($number - $avg.Average), 2)
        }

        $sd = [math]::sqrt($popdev / ($avg.Count-1))
        $sd
    }

Get the numbers. Calculate the average as we saw last time.

Sum the squares of the differences between each value and the mean. Divide by the number of samples minus 1 (this corrects for the assumption that we are dealing with a sample rather than the whole population) and then take the square root.
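As a formula, that calculation can be written as follows. The symbols are my own labels: $x_i$ for each value in `$numbers`, $\bar{x}$ for the mean and $n$ for the number of samples.

$$
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
$$

As a quick check by hand, for the values 1 to 10 the mean is 5.5 and the squared differences sum to 82.5, so $s = \sqrt{82.5 / 9} \approx 3.03$.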

## Mean and moody

Been looking at some simple statistical calculations. First off, calculating the mean (the arithmetic mean, aka the average in layman’s terms).

For this we can use Measure-Object:

    function get-mean {
        [CmdletBinding()]
        param (
            [double[]]$numbers
        )

        $result = $numbers | Measure-Object -Average | select Count, Average
        $result
    }

We can use the function like this:

    get-mean -numbers $(1..10)

    get-mean -numbers $(1..100)

    (get-mean -numbers $(1..100)).Average