
Testing the statistics functions

To make testing of the statistical functions I’m creating repeatable, I decided to write a test script.

$data1 = @(1,2,3,4,5,6,7,8,9,10)            
$data2 = @(21,22,23,24,25,26,27,28,29,30)            
            
get-mean -numbers $data1            
get-mean -numbers $data2            
            
get-standarddeviation -numbers $data1            
get-standarddeviation -numbers $data2            
            
get-correlation -numbers1 $data1 -numbers2 $data2


As it always uses the same values and calls the functions in the same way, the results are repeatable.

I’ll add extra lines to the script as I add more functions.
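
To make a failure obvious rather than relying on reading the output, each result could be compared with a value worked out by hand. The sketch below is one way of doing that; Assert-Close is a hypothetical helper (not part of the script above) and the means are calculated inline with Measure-Object so the snippet runs on its own.

```powershell
# Hypothetical helper: throws if two numbers differ by more than a tolerance.
function Assert-Close {
  param (
    [double]$Actual,
    [double]$Expected,
    [double]$Tolerance = 1e-9
  )
  if ([math]::Abs($Actual - $Expected) -gt $Tolerance) {
    throw "Expected $Expected but got $Actual"
  }
}

$data1 = @(1,2,3,4,5,6,7,8,9,10)
$data2 = @(21,22,23,24,25,26,27,28,29,30)

# Worked out by hand: (1 + ... + 10) / 10 = 5.5 and (21 + ... + 30) / 10 = 25.5
$mean1 = ($data1 | Measure-Object -Average).Average
$mean2 = ($data2 | Measure-Object -Average).Average

Assert-Close -Actual $mean1 -Expected 5.5
Assert-Close -Actual $mean2 -Expected 25.5
```

In the real test script the inline calculation would be swapped for (get-mean -numbers $data1).Average. Because every value in $data2 is the matching value in $data1 plus 20, the two standard deviations should be equal and the correlation should come out as 1, allowing for rounding.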

Calculating the correlation coefficient

This measures the degree of linear dependence between two sets of values:

+1 indicates perfect positive correlation

0 indicates no correlation

-1 indicates perfect negative correlation

We can calculate the correlation coefficient using this function:

function get-correlation {                        
[CmdletBinding()]                        
param (                        
  [double[]]$numbers1,            
  [double[]]$numbers2                          
)             
            
$count1 = $numbers1.length            
$count2 = $numbers2.length             
if ($count1 -ne $count2 ){            
  Throw "Samples are not of equal length"             
}                       
                        
$avg1 = (get-mean -numbers $numbers1).Average            
$avg2 = (get-mean -numbers $numbers2).Average            
            
$sd1 = get-standarddeviation -numbers $numbers1             
$sd2 = get-standarddeviation -numbers $numbers2                
            
$varsum = 0                        
                        
for ($i=0; $i -le ($count1 -1); $i++) {                        
  $varsum += ($numbers1[$i]-$avg1) * ($numbers2[$i]-$avg2)              
}                        
                        
$correlation = $varsum / (($count1-1) * $sd1 * $sd2)                        
$correlation                       
}


Get the mean and standard deviation of the two datasets, using our existing functions.

Calculate the sum of the products of the differences between each data point and the mean of its dataset.

Take that value and divide it by the product of the two standard deviations multiplied by the number of samples minus 1.
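
As a rough check on the three steps above, the sketch below works through them inline for a small made-up dataset in which one set falls as the other rises. The data and the variable names ($x, $y, $sumXY and so on) are illustrative only and are not taken from the function. The result should come out at -1, give or take floating-point rounding.

```powershell
# Made-up data: y falls by 2 every time x rises by 1
$x = @(1,2,3,4,5)
$y = @(10,8,6,4,2)
$n = $x.Length

# Step 1: mean of each dataset
$avgX = ($x | Measure-Object -Average).Average
$avgY = ($y | Measure-Object -Average).Average

# Step 2: sum of the products of the differences from the means
# (the squared differences are collected in the same loop for the standard deviations)
$sumXY = 0
$sumXX = 0
$sumYY = 0
for ($i = 0; $i -lt $n; $i++) {
  $dx = $x[$i] - $avgX
  $dy = $y[$i] - $avgY
  $sumXY += $dx * $dy
  $sumXX += $dx * $dx
  $sumYY += $dy * $dy
}

# Sample standard deviations (divide by n - 1)
$sdX = [math]::Sqrt($sumXX / ($n - 1))
$sdY = [math]::Sqrt($sumYY / ($n - 1))

# Step 3: divide by (n - 1) times the product of the standard deviations
$r = $sumXY / (($n - 1) * $sdX * $sdY)
$r
```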

Standard Deviation

Another simple calculation in PowerShell

function get-standarddeviation {            
[CmdletBinding()]            
param (            
  [double[]]$numbers            
)            
            
$avg = $numbers | Measure-Object -Average | select Count, Average            
            
$popdev = 0            
            
foreach ($number in $numbers){            
  $popdev +=  [math]::pow(($number - $avg.Average), 2)            
}            
            
$sd = [math]::sqrt($popdev / ($avg.Count-1))            
$sd            
}


Get the numbers. Calculate the average as we saw last time.



Sum the squares of the differences between each value and the mean. Divide by the number of samples minus 1 (this corrects for the fact that we are dealing with a sample rather than the whole population) and then take the square root.
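
To show what the minus 1 changes, here is a minimal sketch that calculates both versions for a small made-up dataset. The data and the variable names ($sampleSd, $populationSd) are illustrative and not part of the function above.

```powershell
# Made-up data with a mean of 5 and a sum of squared differences of 32
$numbers = @(2,4,4,4,5,5,7,9)
$stats = $numbers | Measure-Object -Average

$sumSquares = 0
foreach ($number in $numbers) {
  $sumSquares += [math]::Pow($number - $stats.Average, 2)
}

# Sample version, as used by get-standarddeviation: divide by n - 1
$sampleSd = [math]::Sqrt($sumSquares / ($stats.Count - 1))

# Population version: divide by n
$populationSd = [math]::Sqrt($sumSquares / $stats.Count)

$sampleSd
$populationSd
```

The sample value is the square root of 32/7 (roughly 2.14) and the population value is the square root of 32/8, which is 2. The gap between the two shrinks as the number of values grows.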

Mean and moody

Been looking at some simple statistical calculations. First off, calculating the mean (the arithmetic mean, aka the average in layman’s terms).

For this we can use Measure-Object:

function get-mean {            
[CmdletBinding()]            
param (            
  [double[]]$numbers            
)            
            
$result = $numbers | Measure-Object -Average | select Count, Average            
$result            
}


 



We can use the function like this:



get-mean -numbers $(1..10)
get-mean -numbers $(1..100)
(get-mean -numbers $(1..100)).Average
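
A quick way to sanity-check calls like these is that the mean of the whole numbers 1 to n is (n + 1) / 2. The sketch below compares that with what Measure-Object reports; it calls Measure-Object directly so it runs without the function loaded, and the property names N, Measured and Expected are just for illustration.

```powershell
# Compare the measured mean of 1..n with the closed-form value (n + 1) / 2
$results = foreach ($n in 10, 100) {
  [pscustomobject]@{
    N        = $n
    Measured = (1..$n | Measure-Object -Average).Average
    Expected = ($n + 1) / 2
  }
}
$results
```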