ANOVA and Chi Squared Testing.

Discussion in 'General Discussions' started by CrC, Jul 27, 2009.

  1. CrC

    CrC Level II

    Joined:
    Nov 18, 2006
    Messages:
    215
    Likes Received:
    0
    1. An organization dedicated to ensuring fairness is investigating “Wheel of Fortune”. This group knows that each contestant number may get to start a different number of games (3 or more games are played per show). Usually contestant number one gets to start at least TWO games, but contestant number three usually only gets to start off ONE game.

    To see whether the game is fair and each contestant wins the same proportion of the time (or not) this organization takes a random sample of 30 shows and finds the following.

    Show the hypothesis test, summary and implications. (hint: try to figure out whether this is an ANOVA or Chi Square test).
    (second hint: it’s a chi square test..you can tell because the numbers in the table represent counts…they are counting up the number of winners in each category).

    Contestant Number    1    2    3
    Number of Winners   14   10    6

    2. A new-car dealer would like to study the amount of money spent on optional equipment (units in $100) purchased for full-sized cars. He thinks it may vary by age of purchaser and would like to ‘target market’ to the highest spenders. A random sample of individuals in each age category is selected and the amount spent is recorded.
    Test (show ALL tests and information) whether the amount spent differs across age groups. (hint: try to figure out whether this is an ANOVA or Chi Square test).
    (second hint: it's an ANOVA). Show summary and implications.



    Age Group      Amount spent (units of $100)
    young          6.31    4.27    5.75
    middle-aged    7.64    5.36    3.85
    older          8.37    9.26   10.16

    This is due in 9 hours... It isn't too hard... Just a bit time consuming.
     
  2. Commy

    Commy Moderator
    Staff Member

    Joined:
    Aug 11, 2007
    Messages:
    2,781
    Likes Received:
    108
    Location:
    Melbourne
    Question 1:
    Null hypothesis: Each contestant position wins with equal probability (1/3 each), so the observed frequencies match the expected ones
    Alternative hypothesis: At least one position's win probability differs from 1/3
    The degrees of freedom is k-1, where k = 3 is the number of categories, which is 2.

    The observed data is 14, 10 and 6. If the game was fair, and the contestants have an equal chance to win, then the frequency would be 10,10 and 10.
    Chi-Square test is just Sum((Observed - Expected)^2 / Expected).

    (14-10)^2/10 + 0 + (6-10)^2/10
    = 3.2

    To find the significance you can use some chi-sq table, but I just used a program:
    > 1-pchisq(3.2,2)
    [1] 0.2018965

    The p-value is 0.202, which is greater than .05, so do not reject the null hypothesis and conclude that there is no significant difference between the 3 contestants' win frequencies. That said, a random sample of 30 shows is pretty small.
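    For anyone who wants to reproduce the Question 1 numbers end to end, here is a minimal R sketch. The variable names (observed, expected, stat, fit) are made up for illustration; only the counts 14, 10 and 6 come from the question. It does the hand calculation first and then calls chisq.test, which assumes equal category probabilities when no p argument is given.

    ```r
    # Observed wins per contestant position, from the question's table.
    observed <- c(14, 10, 6)

    # Under a fair game every position is expected to win equally often.
    expected <- rep(sum(observed) / length(observed), length(observed))

    # Test statistic: sum((Observed - Expected)^2 / Expected).
    stat <- sum((observed - expected)^2 / expected)
    df <- length(observed) - 1

    # Upper-tail probability, the same quantity as 1 - pchisq(stat, df).
    p_value <- pchisq(stat, df, lower.tail = FALSE)
    print(c(statistic = stat, df = df, p.value = p_value))

    # Built-in goodness-of-fit test for comparison.
    fit <- chisq.test(observed)
    print(fit)
    ```

    Both routes should agree with the 3.2 and the p-value quoted above.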

    Question 2:
    Can't be bothered...so used a program instead. Should be good for the values.

    > Car <- data.frame(spent = c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16), age.f = factor(c(rep('young', 3), rep('middle', 3), rep('older', 3))))
    > lm.h <- lm(spent ~ age.f, data = Car)
    > anova(lm.h)
    Analysis of Variance Table

    Response: spent
              Df  Sum Sq Mean Sq F value  Pr(>F)
    age.f      2 27.9206 13.9603  7.5429 0.02304 *
    Residuals  6 11.1048  1.8508
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Age group is significant with a p-value of 0.02304, which is less than 0.05, so conclude that there is a significant difference in mean amount spent across age groups.
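    If the professor wants the working rather than just program output, the ANOVA table can be rebuilt by hand. This is only a sketch: the names (spent, group, ss_between, ss_within and so on) are my own, and the data are the nine values from the question.

    ```r
    # Amounts spent (units of $100), three buyers per age group.
    spent <- c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16)
    group <- factor(rep(c("young", "middle", "older"), each = 3))

    group_means <- tapply(spent, group, mean)   # one mean per age group
    n_g <- tapply(spent, group, length)         # observations per group
    grand_mean <- mean(spent)

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between <- sum(n_g * (group_means - grand_mean)^2)

    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within <- sum((spent - ave(spent, group))^2)

    df_between <- nlevels(group) - 1
    df_within <- length(spent) - nlevels(group)

    # F is the ratio of the two mean squares.
    ms_between <- ss_between / df_between
    ms_within <- ss_within / df_within
    f_value <- ms_between / ms_within
    p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

    print(round(c(SSB = ss_between, SSW = ss_within, MSB = ms_between,
                  MSE = ms_within, F = f_value, p = p_value), 4))
    ```

    The printed values should line up with the Sum Sq, Mean Sq, F value and Pr(>F) columns in the table above.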
     
  3. Hally

    Hally Level IV

    Joined:
    Nov 16, 2006
    Messages:
    1,184
    Likes Received:
    34
    What the crap is ANOVA Testing, and why didn't it come up in AP stats?
     
  4. CrC

    Wow, didn't think anyone was going to answer it... I am pretty sure it is right just by reading it. I posted it thinking no one was going to do it and a bit jokingly to see if people could actually help with the homework.

    I am in my last week of this class and these assignments coming at me every day have just made me so unmotivated to do them. I understand how to do them, I just hate the fact that it takes a while to do them.

    Thank you for the help on this. It is much appreciated.


    ANOVA testing is basically to test whether your set of series (the title was a bit backwards as the 2nd problem is an ANOVA problem) are significantly different from each other. The professor's definition given to me is... "ANOVA test whether average values differ due to group treatment."

    It is very similar to hypothesis testing except that after you basically finish that... You have to test each of your groups vs the other group... like... young vs middle aged and then do young vs older and then do a middle aged vs older.

    That is how I have interpreted it, but I am still learning how to do them. Just curious how much experience you have with these Commy. You really know your stats.

    Hmm... on a chi squared table found at http://www.statsoft.com/textbook/sttable.html#chi ... It shows .05 alpha with 2 degrees of freedom is 5.99.

    Here we go...
    X^2 calculated (3.2) < X^2 table (5.99), so you would still not reject the null. I guess it's just different ways of how it was taught
     
  5. Zer0

    Zer0 Level IV

    Joined:
    Mar 2, 2008
    Messages:
    3,037
    Likes Received:
    180
    Location:
    Home sweet home
    ANOVA = analysis of variance
    AP stats = a joke :/
     
  6. Commy

    Stats history: Majored in stats back in my bachelor of science degree, and tutoring a 1st year biomed stats class currently.
    Well, the chi-square statistic that we found was 3.2, and in your table that lies between 2.77259 and 4.60517, so the corresponding probability is 0.1 < P < 0.25, which isn't significant. Instead of using a table, I used a program to calculate the p-value directly, which was 0.202.
    We weren't taught differently, I was taught your way, the manual way first.
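    To show that the table route and the p-value route are the same test, here is a small R sketch. The names are illustrative; the 3.2, the 2 degrees of freedom and the two bracketing table entries are the ones quoted in this thread.

    ```r
    stat <- 3.2   # chi-square statistic from Question 1
    df <- 2       # three categories minus one

    # Table route: compare the statistic with the critical value at alpha = 0.05.
    critical <- qchisq(0.95, df)
    reject_by_table <- stat > critical

    # p-value route: upper-tail probability of the statistic itself.
    p_value <- pchisq(stat, df, lower.tail = FALSE)
    reject_by_p <- p_value < 0.05

    # The two table entries that bracket 3.2 and their upper-tail probabilities.
    bracket <- c(2.77259, 4.60517)
    bracket_p <- pchisq(bracket, df, lower.tail = FALSE)

    print(c(critical = critical, p.value = p_value))
    print(round(bracket_p, 3))
    print(c(table = reject_by_table, p = reject_by_p))
    ```

    The two decisions always agree, because the statistic exceeds the critical value exactly when the p-value drops below alpha.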

    And Hally, ANOVA uses an F-test. It's like a two-sample t-test, but it handles more than 2 groups at once. It's still hypothesis testing, but your null hypothesis would be something like:
    u1=u2=u3, where u1 = population mean for young buyers, u2 = population mean for middle aged buyers and u3 = population mean for older buyers.
    Your alternative would be that at least one of the means differs from the others.

    If the F-value is significant, and you need to find out where the significant difference is, you'd use a post-hoc test like Tukey's.
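    As a rough illustration of that post-hoc step, R has Tukey's test built in. This sketch assumes the same nine values as Question 2 and uses made-up object names (spent, group, fit, tukey); TukeyHSD needs a model fitted with aov.

    ```r
    # Amounts spent (units of $100), three buyers per age group.
    spent <- c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16)
    group <- factor(rep(c("young", "middle", "older"), each = 3))

    # One-way ANOVA fitted with aov() so it can be passed to TukeyHSD().
    fit <- aov(spent ~ group)
    print(summary(fit))

    # All pairwise differences in means, with 95% family-wise intervals
    # and adjusted p-values in the "p adj" column.
    tukey <- TukeyHSD(fit, conf.level = 0.95)
    print(tukey)

    # Pairs whose adjusted p-value is below 0.05 differ significantly.
    pairs <- tukey$group
    print(rownames(pairs)[pairs[, "p adj"] < 0.05])
    ```

    A pair differs significantly when its interval excludes zero, which is the same thing as an adjusted p-value below 0.05.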
     
  7. CrC

    Hmm the professor wants us to show the data stuff...

    F table: alpha, K-1, n-K
    F.05 2,6 = 5.14
    Fc = 7.54

    Fc > Ftable. There is a significant difference in amount spent across age groups.

    Then the professor wants us to go further and do the w tables..
    x1 = young ; x2 = middle aged ; x3 = older
    | x1 - x2 | = | 5.443 - 5.617 | = .174
    | x1 - x3 | = | 5.443 - 9.263 | = 3.82
    | x2 - x3 | = | 5.617 - 9.263 | = 3.646

    This is where I am stuck right now...

    edit: oops, forgot to multiply by square root (MSE/n-k?), couldn't read the board

    nope... I wrote it correctly... MSE/ng (ng = number in each sample group)
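    The critical difference w = Q * sqrt(MSE/ng) can be sketched in R as follows. The assumptions here are mine: K = 3 groups and n - K = 9 - 3 = 6 error degrees of freedom, with MSE taken from the ANOVA table earlier in the thread. qtukey is R's studentized range quantile, which is what a Q table tabulates.

    ```r
    mse <- 1.8508     # residual mean square from the ANOVA table
    n_g <- 3          # observations in each age group
    k <- 3            # number of age groups
    df_error <- 6     # error degrees of freedom, n - k = 9 - 3

    # Studentized range critical value at the 5% level.
    q_crit <- qtukey(0.95, nmeans = k, df = df_error)

    # Smallest difference in group means that counts as significant.
    w <- q_crit * sqrt(mse / n_g)

    # Group means as computed in the post above.
    means <- c(young = 5.443, middle = 5.617, older = 9.263)
    diffs <- abs(outer(means, means, "-"))

    print(round(c(Q = q_crit, w = w), 3))
    print(diffs > w)   # TRUE marks pairs that differ significantly
    ```

    Each absolute difference is then compared with w, one pair at a time.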
     
  8. Commy

    I don't think we've learned w-tables. But it looks like you're doing Tukey's standardised range method, where you compare each difference in your "w-table" to the LSDQ, which is basically the Tukey critical difference. If, let's say, | x1 - x2 | is greater than the LSDQ, then the two means, x1 and x2, differ significantly at the 5% level. You should be able to calculate the LSDQ using a table, and you'll also need the standard error of the difference of the two means.

    Here's something I did awhile back
    Tukey test
    LSDQ = Q(3,8,0.95)/√2 × se(difference)
    = 2.857 × √(2 × 97.6/5) = 17.85
    Soil type means:
    Type I = 19.70, Type II = 48.28, Type III = 58.52
    |19.70-48.28| = 28.58 > LSDQ
    |19.70-58.52| = 38.82 > LSDQ
    |48.28-58.52| = 10.24 < LSDQ

    So conclude that type I is significantly different from types II and III, while types II and III do not differ significantly.
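    The soil example can be checked the same way in R. This is a sketch under assumptions: I am reading 97.6 as the error mean square and 5 as the number of observations per soil type, since that is how they appear in the formula, and the object names are made up.

    ```r
    mse <- 97.6   # error mean square, as read from the formula above (assumption)
    n_g <- 5      # observations per soil type, as read from the formula (assumption)

    # Q(3, 8, 0.95): studentized range quantile for 3 means and 8 error df.
    q_crit <- qtukey(0.95, nmeans = 3, df = 8)

    # LSDQ = Q / sqrt(2) * standard error of a difference between two means.
    se_diff <- sqrt(2 * mse / n_g)
    lsdq <- q_crit / sqrt(2) * se_diff

    means <- c(I = 19.70, II = 48.28, III = 58.52)
    diffs <- abs(outer(means, means, "-"))

    print(round(c(Q = q_crit, LSDQ = lsdq), 3))
    print(diffs > lsdq)   # TRUE marks pairs that differ significantly
    ```

    Note that Q/√2 × √(2 × MSE/n) simplifies to Q × √(MSE/n), so the LSDQ here has the same form as the w = Q × sqrt(MSE/ng) used for the car data.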
     
  9. CrC

    Q.05 3,6 = 4.34 (K = 3 groups, n-K = 6)
    w = 4.34 * sqrt (1.85/3) = 3.41
    | x1 - x2 | = | 5.443 - 5.617 | = .174 (do not reject null)
    | x1 - x3 | = | 5.443 - 9.263 | = 3.82 (reject null)
    | x2 - x3 | = | 5.617 - 9.263 | = 3.646 (reject null)

    that should have gone with the other stuff... I guess that is how i would conclude the other stuff. I think its right, but not sure.

    young vs older and middle aged vs older has a higher amount spent. There is not overwhelming evidence to suggest amount spent differs between young and middle aged groups.

    Implications The salesman's target market would be to older group to younger group, and older group to middle aged group. Yet, don't prefer target market to younger group then to middle aged group.
     
  10. Commy

    It's not really "do not reject null" if you're using Tukey's, it's "differ significantly" or not.
    As for your explanation, it seems right if you calculated the Tukey value correctly, although the wording to your answer is a little awkward.
     
  11. CrC

    Sorry, making it up as I go... That is the hardest part for me... Trying to make it sound like English while using a previous example I have to make sure it is correct...

    Summary : An older group compared to the younger and middle aged group have a higher amount spent. There is not overwhelming evidence to suggest that the amount spent between younger and middle aged groups differ.

    Implications : The salesman's target market would be to the older group then to the younger or middle aged group. Yet there is no significant difference between the younger and middle aged group.

    How does that sound?
     
  12. Commy

    Yeah, that's better. Maybe something like:
    Summary: The older group's mean is significantly different to both the younger and middle aged group, implying that the older group are the highest spenders on optional equipment. There is no significant difference between the younger and middle aged groups.
    Implications: The new-car dealer should target the older age group, as they spend a significant amount more on optional equipment than the less elderly groups.
     
  13. CrC

    My professor is really strict on the summary and interpretations... I mean REALLY strict. Which is why I always hate that last part because I don't even want to write down a single wrong word...

    The professor took off 3 points on my test because I wrote the people instead of projects... That test sucked, didn't have enough time to proof-read.

    I like your summary better. I think I need a sentence in the implications to describe the younger and middle aged groups (as a comparison between each other)