Friday, May 26, 2017

Statistical Analysis - Dan Remenyi - Chapter Summary

Ph.D Research Methodology - Statistical Analysis

Mathematics and statistics are important for the analysis and interpretation of evidence in the business and management world. They enable us to deal with and to solve problems that would otherwise be quite intractable.

Representing Evidence
Evidence may be represented by graphical summaries such as bar charts and histograms, tabular summaries such as one-way and two-way relative frequency tables and by numerical summaries such as the mean and the standard deviation.

Bar Charts and Histograms
A familiar way to represent evidence is with bar charts or frequency histograms. A chart is drawn in which the height of each bar is proportional to the frequency with which that outcome occurs. Dividing the height of each bar by the total number of observed events gives an estimate of the probability with which that outcome occurs.
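As a minimal sketch of the idea, the snippet below counts how often each outcome occurs and divides by the total to get relative frequencies, then prints a crude text bar chart. The data and the function name relative_frequencies are made up for illustration.

```python
from collections import Counter


def relative_frequencies(outcomes):
    """Map each distinct outcome to its share of all observations."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {outcome: count / total for outcome, count in counts.items()}


# Hypothetical evidence: eight observed outcomes.
rolls = [1, 2, 2, 3, 3, 3, 6, 6]
freqs = relative_frequencies(rolls)

# One text bar per outcome; bar length is proportional to relative frequency.
for outcome in sorted(freqs):
    print(outcome, "#" * round(freqs[outcome] * 40), f"{freqs[outcome]:.3f}")
```

The relative frequencies always sum to one, which is what allows them to be read as estimated probabilities.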

Measures of distribution
Distributions can be summarised in terms of certain key characteristics, chiefly location (the mean and the median) and spread (the standard deviation). Range and quartiles are other dimensions that are sometimes used.

The Mean
The most common measure of location is the mean, which is the sum of the observed values divided by the number of values.

The Median
Another measure of location is the median, which is the measurement that falls in the middle of the distribution so that there are as many items below it as above it.
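The difference between the two measures of location shows up most clearly when one value is extreme. The sketch below uses Python's standard statistics module on made-up salary figures; the numbers are purely illustrative.

```python
import statistics

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [21, 23, 24, 26, 28, 30, 150]

mean_salary = statistics.mean(salaries)      # pulled upwards by the outlier
median_salary = statistics.median(salaries)  # the middle value of the sorted list

print(f"mean   = {mean_salary:.2f}")
print(f"median = {median_salary}")
```

Here three values lie below the median and three above it, while the mean sits well above most of the observations.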

Standard Deviation

The standard deviation measures how widely the values are spread around the mean. The range is simply the largest value minus the smallest value. The quartiles are two further points: the lower quartile, below which one quarter of the points lie, and the upper quartile, above which one quarter of the points lie.
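These measures of spread can be computed directly with Python's standard library. The sketch below uses made-up data; note that software packages differ slightly in how they interpolate quartiles, so the exact quartile values depend on the convention (here, the default of statistics.quantiles).

```python
import statistics

# Hypothetical measurements.
data = [2, 4, 4, 5, 7, 9, 10, 12]

data_range = max(data) - min(data)       # largest value minus smallest value
lower_q, median, upper_q = statistics.quantiles(data, n=4)
spread = statistics.stdev(data)          # sample standard deviation (n - 1 divisor)

print("range           =", data_range)
print("lower quartile  =", lower_q)
print("median          =", median)
print("upper quartile  =", upper_q)
print(f"std deviation   = {spread:.3f}")
```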

Several important distributions arise in statistics. The first is the binomial distribution, which applies whenever there are only two possible outcomes: heads or tails, true or false, girls or boys, and so on. The Poisson distribution is the limiting case of the binomial distribution when the probability of one of the outcomes is very small and the number of trials is large.
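The limiting relationship can be checked numerically. The sketch below compares binomial probabilities for many trials and a small success probability with Poisson probabilities for the same expected count; the values of n and p are arbitrary choices for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 1000, 0.002   # many trials, rare outcome (illustrative values)
lam = n * p          # expected number of occurrences

for k in range(6):
    print(k, f"{binomial_pmf(k, n, p):.5f}", f"{poisson_pmf(k, lam):.5f}")

largest_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(10))
```

The two columns agree closely, and the agreement improves as n grows with n times p held fixed.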

But the most important of all is the Normal distribution in which the distribution of outcomes follows the familiar bell-shaped curve.

Testing Hypotheses

The hypothesis of a thesis is often tested using statistical tests of hypothesis.
A null hypothesis and an alternative hypothesis are stated, and the evidence is used to decide between them. Technically, the null hypothesis is never accepted: it is said either to have been rejected or not to have been rejected.

Type I and Type II Errors
The null hypothesis can be rejected when it is true (a Type I error) or accepted when it is false (a Type II error). The probability of making a Type I error is kept small, and it is referred to as the significance level of the test. By convention, a result significant at a level between 5 per cent and 1 per cent is given one star; between 1 per cent and 0.1 per cent, two stars; and below 0.1 per cent, three stars.
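The star convention can be written as a small lookup. This is a sketch only: the function name significance_stars is made up, and the treatment of values falling exactly on a boundary is an assumption, since the text does not say which side they belong to.

```python
def significance_stars(p_value):
    """Return the conventional star rating for a p-value.

    Assumption: a value exactly on a boundary gets the lower rating.
    """
    if p_value < 0.001:
        return "***"   # below 0.1 per cent
    if p_value < 0.01:
        return "**"    # between 1 per cent and 0.1 per cent
    if p_value < 0.05:
        return "*"     # between 5 per cent and 1 per cent
    return ""          # not significant at the 5 per cent level


for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, significance_stars(p))
```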

The probability of making a Type II error determines the power of the test: the power is one minus that probability. A study might, for example, be designed to work at the 5 per cent significance level and with 90 per cent power.
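To show how significance level and power are used together, the sketch below estimates the sample size per group needed to detect a given difference between two means. It relies on the usual normal-approximation formula for two equal-sized groups with a common, known standard deviation; that formula is not taken from the chapter, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Approximate n per group to detect a mean difference delta.

    Assumes two equal groups, common standard deviation sigma,
    a two-sided test, and a normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 5 per cent
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for 90 per cent power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


n_large_effect = sample_size_per_group(delta=1.0, sigma=1.0)
n_small_effect = sample_size_per_group(delta=0.5, sigma=1.0)
print(n_large_effect, n_small_effect)
```

Halving the difference to be detected roughly quadruples the number of evidence points required.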

Suppose two sets of measurements have means m1 and m2 with standard errors S1 and S2. Then the difference in the means is

                       d = m1 - m2

and the standard error of the difference is

                       e = √(S1² + S2²)

The null hypothesis is that the true value of d is zero. If it holds, the magnitude of the calculated d should exceed 1.96 × e with less than 5 per cent probability.
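A minimal sketch of this test follows, assuming that S1 and S2 are the standard errors of the two means and that the samples are large enough for the normal distribution to apply. The function name and the input numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def difference_test(m1, s1, m2, s2):
    """Return d, e, the ratio d/e and a two-sided p-value.

    s1 and s2 are taken to be standard errors of the means.
    """
    d = m1 - m2
    e = math.sqrt(s1**2 + s2**2)
    z = d / e
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return d, e, z, p_value


# Hypothetical means and standard errors for two groups.
d, e, z, p_value = difference_test(52.0, 1.5, 47.0, 2.0)
print(f"d = {d}, e = {e}, d/e = {z:.2f}, p = {p_value:.4f}")
print("reject null at 5 per cent:", abs(d) > 1.96 * e)
```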

Gosset showed that even for small numbers of evidence points it is still possible to test the ratio d/e, and he provided what is now called Student’s t-distribution, which is used instead of the normal distribution.

Paired and Unpaired t-Tests
If it is possible to make both measurements on the same person, organisation or sampling unit, a more powerful test can be developed. Instead of testing the difference between the means, the difference between each pair of measurements is calculated, and then the mean of those differences and its standard error are calculated. This is called a paired t-test, since it has been possible to treat the evidence points in pairs.
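The paired calculation can be sketched with the standard library alone. The before-and-after scores below are made up, and the function only returns the t statistic and its degrees of freedom; the critical value still has to be looked up in a t table.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample SD divided by sqrt(n).
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical scores for the same five people, measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 15]

t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With 4 degrees of freedom the two-sided 5 per cent critical value of Student's t is 2.776, so a t statistic of this size would lead to rejecting the null hypothesis of no change.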

Tests of Association

For example, to ascertain whether more beer is sold when the weather is hot, the first step would be to plot a graph of the amount of beer sold against the temperature.
Linear regression then does two things. First, it draws the straight line that is considered to ‘best fit’ the evidence. Secondly, it enables the error in the slope to be determined, so that it can be seen whether the slope differs significantly from zero. The fitted line has the form

y = a + bx

where a is the intercept and b is the slope.
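A least-squares fit of a straight line, together with the standard error of its slope, can be sketched as follows. The temperature and beer sales figures are invented for illustration, and the function name fit_line is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a, slope b, and standard error of b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 because two parameters were estimated.
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(residual_ss / (n - 2) / sxx)
    return a, b, se_b


# Hypothetical evidence: temperature (degrees C) and beer sold (litres).
temps = [15, 18, 21, 24, 27, 30]
beer = [110, 135, 150, 172, 195, 210]

a, b, se_b = fit_line(temps, beer)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, standard error of b = {se_b:.3f}")
print(f"slope / standard error = {b / se_b:.1f}")
```

The ratio of the slope to its standard error is then compared with Student's t-distribution on n - 2 degrees of freedom to judge whether the slope differs significantly from zero.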

Factor Analysis
Factor analysis is used for data reduction and the exploration of underlying dimensions. It is therefore a technique that can be used to provide a parsimonious description of complex, multi-faceted, intangible concepts such as the quality of service or the relationships between individuals in an organisation.

1. Consult the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, as it offers some idea of how relevant the factor analysis is for the evidence being used. The rule for the use of this statistic is that if the KMO is less than 0.50 there is no value in proceeding with the technique. The greater the value of the KMO, the more effective the factor analysis is likely to be.

2. Examine the eigenvalues. Only factors with an eigenvalue greater than one are used in the analysis, since these explain more variability than any one of the original variables on its own.

3. Study the rotated factor matrix. Examine each factor separately, looking for the input variables that influence the factor, that is, those which have a loading of 0.5 or more.

4. Attempt to combine the meaning of the variables identified in 3 above into an underlying factor or super-variable which will explain the combined effect of these individual variables. What is being sought is a relatively simple description of the complex effect of several of the original variables.
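The decision rules in these steps can be sketched as simple checks applied to the output of a statistics package. Everything below is hypothetical: the KMO value, eigenvalues, variable names and loadings are invented, the function names are made up, and using the absolute value of a loading is an assumption. The code does not perform the factor analysis itself.

```python
def kmo_is_adequate(kmo, minimum=0.50):
    """Step 1: below 0.50 there is no value in proceeding."""
    return kmo >= minimum


def retained_factors(eigenvalues):
    """Step 2: indices of the factors whose eigenvalue exceeds one."""
    return [i for i, value in enumerate(eigenvalues) if value > 1.0]


def salient_variables(rotated_loadings, factor, threshold=0.5):
    """Step 3: variables loading at least the threshold on one factor.

    Assumption: a strong negative loading counts too, hence abs().
    """
    return [name for name, row in rotated_loadings.items()
            if abs(row[factor]) >= threshold]


# Hypothetical output for five service-quality questions.
kmo = 0.78
eigenvalues = [2.6, 1.3, 0.5, 0.4, 0.2]
rotated_loadings = {
    "staff courtesy":  [0.82, 0.10],
    "staff knowledge": [0.76, 0.21],
    "response time":   [0.15, 0.88],
    "queue length":    [0.08, 0.79],
    "opening hours":   [0.31, 0.42],
}

if kmo_is_adequate(kmo):
    for factor in retained_factors(eigenvalues):
        print("factor", factor + 1, salient_variables(rotated_loadings, factor))
```

Step 4, naming the underlying factor, remains a matter of judgement: here the first group might be read as staff quality and the second as speed of service.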

Correspondence Analysis

Correspondence analysis is a multivariate analysis technique that can be used to analyse and interpret cross-tabulations of categorical data. The only constraint on the cell entries in the contingency table is that they be non-negative.

The main output from a correspondence analysis is a graphical display that is a simultaneous plot of the rows and columns of the contingency table in a space of two or more dimensions.  Those rows with similar profiles are plotted ‘close’ together, as are columns with similar profiles.

The number of dimensions needed for a perfect representation of a contingency table is given by the minimum of (R-1) and (C-1), where R and C are the numbers of rows and columns. For a large contingency table this will clearly be too many dimensions to be helpful.
The ANACOR program within the SPSS package can be used to perform a correspondence analysis.
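Two of the ideas above, row profiles and the number of dimensions, can be illustrated without carrying out a full correspondence analysis. The sketch below uses a made-up contingency table; the function names are hypothetical and the code does not reproduce what ANACOR does.

```python
def row_profiles(table):
    """Each row divided by its own total, so every profile sums to one."""
    return [[cell / sum(row) for cell in row] for row in table]


def max_dimensions(table):
    """Dimensions needed for a perfect representation: min(R-1, C-1)."""
    return min(len(table) - 1, len(table[0]) - 1)


# Hypothetical contingency table: 4 rows (groups) by 3 columns (responses).
table = [
    [20, 10, 10],
    [40, 20, 20],
    [5, 25, 20],
    [10, 10, 30],
]

profiles = row_profiles(table)
for profile in profiles:
    print([round(share, 2) for share in profile])
print("dimensions for a perfect fit:", max_dimensions(table))
```

The first two rows have identical profiles even though their totals differ, so a correspondence analysis would plot them at the same point.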

For a very detailed description of the statistics used in research studies, see:

Handbook of Chemometrics and Qualimetrics, Part 1

Elsevier, 12-Dec-1997 - Technology & Engineering - 886 pages

Handbook of Chemometrics and Qualimetrics, Part 2

Elsevier, 04-Dec-1998 - Science - 876 pages

Updated 28 May 2017, 1 June 2013
