ANOVA and Chi Squared Testing.

CrC · Jul 27, 2009

1. An organization dedicated to ensuring fairness is investigating "Wheel of Fortune". This group knows that each contestant number may get to start a different number of games (3 or more games are played per show). Usually contestant number one gets to start at least TWO games, but contestant number three usually only gets to start off ONE game.

To see whether the game is fair and each contestant wins the same proportion of the time (or not) this organization takes a random sample of 30 shows and finds the following.

Show the hypothesis test, summary and implications. (hint: try to figure out whether this is an ANOVA or Chi Square test).
(second hint: it's a chi square test... you can tell because the numbers in the table represent counts... they are counting up the number of winners in each category).

Contestant Number    1    2    3
Number of Winners   14   10    6

2. A new-car dealer would like to study the amount of money spent on optional equipment (units in $100) purchased for full-sized cars. He thinks it may vary by age of purchaser and would like to 'target market' to the highest spenders. A random sample of individuals in each age category is selected and the amount spent is recorded.
Test (show ALL tests and information) on whether there is a different amount spent across age groups. (hint: try to figure out whether this is an ANOVA or Chi Square test).
(second hint: it's an ANOVA). Show summary and implications.

Age Group      Amount spent (in $100)
young          6.31   4.27   5.75
middle-aged    7.64   5.36   3.85
older          8.37   9.26  10.16

This is due in 9 hours... It isn't too hard... Just a bit time consuming.

Commy · Jul 27, 2009

Question 1:
Null hypothesis: No significant difference between the expected and observed frequencies
Alternative hypothesis: Significant difference between expected and observed
The degrees of freedom are k-1 = 2, where k = 3 is the number of contestant categories (not the 30 shows).

The observed data is 14, 10 and 6. If the game was fair, and the contestants have an equal chance to win, then the frequency would be 10,10 and 10.
Chi-Square test is just Sum((Observed - Expected)^2 / Expected).

(14-10)^2/10 + 0 + (6-10)^2/10
= 3.2

To find the significance you can use some chi-sq table, but I just used a program:
> 1-pchisq(3.2,2)
[1] 0.2018965

Significance value of 0.201, which is greater than .05, so do not reject the null hypothesis and conclude that there is no significant difference between the 3 contestants' frequencies. That said, a random sample of 30 is pretty small.
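
For anyone following along in R, the same goodness-of-fit test can also be run in one call. This is only a minimal sketch: the name `wins` is a label picked here, and it relies on `chisq.test` testing against equal proportions when it is given a single vector of counts.

```r
# Observed winners for contestants 1, 2 and 3 (from the question)
wins <- c(14, 10, 6)

# With a single vector of counts, chisq.test() runs a goodness-of-fit test
# against equal proportions (p = rep(1/3, 3)) by default
fit <- chisq.test(wins)

fit$expected   # expected counts under a fair game: 10 10 10
fit$statistic  # chi-square statistic, matches the hand calculation (3.2)
fit$parameter  # degrees of freedom (2)
fit$p.value    # same value as 1 - pchisq(3.2, 2)
```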

Question 2:
Can't be bothered...so used a program instead. Should be good for the values.

> # Note: the column named `age` holds the amount spent (in $100); `age.f` is the age group
> Car <- data.frame(age = c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16),
+                   age.f = factor(rep(c('young', 'middle', 'older'), each = 3)))
> # One-way ANOVA via a linear model of amount spent on age group
> lm.h <- lm(age ~ age.f, data = Car)
> anova(lm.h)
Analysis of Variance Table

Response: age
          Df  Sum Sq Mean Sq F value  Pr(>F)
age.f      2 27.9206 13.9603  7.5429 0.02304 *
Residuals  6 11.1048  1.8508
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The age-group effect is significant (p = 0.02304, less than 0.05), so conclude that there is a significant difference in the mean amount spent across age groups.
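
If the professor wants the working shown, the numbers in that ANOVA table can be rebuilt step by step. The sketch below is one way to do it, not necessarily the layout the course expects; the variable names (`spend`, `group`, `ss_between`, and so on) are made up here for illustration.

```r
# Amount spent (in $100) by age group, from the question
spend <- c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16)
group <- factor(rep(c("young", "middle", "older"), each = 3))

k <- nlevels(group)          # number of groups (3)
n <- length(spend)           # total observations (9)
grand_mean  <- mean(spend)
group_means <- tapply(spend, group, mean)
group_sizes <- tapply(spend, group, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((spend - group_means[as.character(group)])^2)

# Mean squares, F statistic and its p-value on (k - 1, n - k) df
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_value <- ms_between / ms_within
p_value <- pf(f_value, k - 1, n - k, lower.tail = FALSE)

round(c(SSB = ss_between, SSW = ss_within, F = f_value, p = p_value), 4)
```

The four printed values should line up with the Sum Sq, F value and Pr(>F) entries in the table above.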

Hally · Jul 27, 2009

What the crap is ANOVA Testing, and why didn't it come up in AP stats?

CrC · Jul 27, 2009

Wow, didn't think anyone was going to answer it... I am pretty sure it is right just by reading it. I posted it thinking no one was going to do it and a bit jokingly to see if people could actually help with the homework.

I am in my last week of this class and these assignments coming at me every day have just made me so unmotivated to do them. I understand how to do them, just hate the fact that it takes a while to do them.

Thank you for the help on this. It is much appreciated.

ANOVA testing is basically to test whether your set of series (the title was a bit backwards as the 2nd problem is an ANOVA problem) are significantly different from each other. The professor's definition given to me is... "ANOVA test whether average values differ due to group treatment."

It is very similar to hypothesis testing except that after you basically finish that... You have to test each of your groups vs the other group... like... young vs middle aged and then do young vs older and then do a middle aged vs older.

That is how I have interpreted it, but I am still learning how to do them. Just curious how much experience you have with these Commy. You really know your stats.

Commy said:

To find the significance you can use some chi-sq table, but I just used a program:
> 1-pchisq(3.2,2)
[1] 0.2018965

Significance value of 0.201, which is greater than .05, so do not reject the null hypothesis and conclude that there is no significant difference between the 3 contestants' frequencies. That said, a random sample of 30 is pretty small.

Hmm... on a chi squared table found at http://www.statsoft.com/textbook/sttable.html#chi ... It shows .05 alpha with 2 degrees of freedom is 5.99.

Here we go...
X^2 calculated (3.2) < X^2 table (5.99), so you would still not reject the null. I guess it's just different ways of how it was taught.
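
The table approach and the p-value approach can be put side by side in R. A small sketch, with variable names chosen here for illustration:

```r
chi_sq_stat <- 3.2   # statistic from the hand calculation above
df <- 2              # 3 categories - 1
alpha <- 0.05

# Critical value: the 95th percentile of the chi-square distribution
critical <- qchisq(1 - alpha, df)   # about 5.99, as in the table

# Decision rule 1: reject if the statistic exceeds the critical value
reject_by_table <- chi_sq_stat > critical

# Decision rule 2: reject if the p-value is below alpha
p_value <- pchisq(chi_sq_stat, df, lower.tail = FALSE)
reject_by_p <- p_value < alpha

c(critical = critical, p_value = p_value)
c(reject_by_table, reject_by_p)   # both FALSE: do not reject the null
```

Both rules always agree, since comparing the statistic to the critical value is the same as comparing the p-value to alpha.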

Zer0 · Jul 27, 2009

Hally said:

What the crap is ANOVA Testing, and why didn't it come up in AP stats?

ANOVA = analysis of variance
AP stats = a joke :/

Commy · Jul 27, 2009

Stats history: Majored in stats back in my bachelor of science degree, and tutoring a 1st year biomed stats class currently.

CrC said:

Hmm... on a chi squared table found at http://www.statsoft.com/textbook/sttable.html#chi ... It shows .05 alpha with 2 degrees of freedom is 5.99.

Well the chi-square statistic that we found was 3.2, and in your table that lies between 2.77259 < X^2 < 4.60517 and so the corresponding probability is 0.1< P < 0.25, which isn't significant. Instead of using a table, I used a program to calculate the p-value directly, which was 0.201.
We weren't taught differently, I was taught your way, the manual way first.
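
To see where those table entries come from, the row for 2 degrees of freedom can be regenerated in R. This is just a sketch; `tail_probs` only lists three of the column headings a printed table would have, and the names are picked here for illustration.

```r
# Upper-tail probabilities printed as column headings in a typical table
tail_probs <- c(0.25, 0.10, 0.05)

# Corresponding chi-square quantiles for 2 degrees of freedom
table_row <- qchisq(tail_probs, df = 2, lower.tail = FALSE)
round(table_row, 5)   # 2.77259 4.60517 5.99146

# 3.2 falls between the first two entries, so 0.10 < p < 0.25
chi_sq_stat <- 3.2
bracket <- findInterval(chi_sq_stat, table_row)
tail_probs[c(bracket, bracket + 1)]   # 0.25 0.10
```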

And Hally, ANOVA uses an F-test. It's still hypothesis testing like the chi-sq test, but it compares the means of a numeric measurement across groups, usually more than 2. Your null hypothesis would be something like:
u1=u2=u3, where u1 = population mean for young drivers, u2 = population mean for middle aged drivers and u3 = population mean for old drivers.
Your alternative would be not the null, i.e. at least one mean differs.

If the F-value is significant, and you need to find out where the significant difference is, you'd use a post-hoc test like Tukey's.
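
In R, a Tukey post-hoc test on the car data from Question 2 could look roughly like this. It is a sketch using the built-in `TukeyHSD`; the names `spend`, `group` and `sig_pairs` are chosen here and are not from the assignment.

```r
# Same data as Question 2; levels ordered young, middle, older for readable output
spend <- c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16)
group <- factor(rep(c("young", "middle", "older"), each = 3),
                levels = c("young", "middle", "older"))

fit <- aov(spend ~ group)

# Tukey's honest significant differences at a 95% family-wise confidence level
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey$group   # columns: diff, lwr, upr, p adj

# Pairs whose adjusted p-value is below 0.05
sig_pairs <- rownames(tukey$group)[tukey$group[, "p adj"] < 0.05]
sig_pairs
```

Each row of `tukey$group` is one pairwise comparison, so the "where is the difference" question is answered by reading off which rows have a small adjusted p-value.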

CrC · Jul 27, 2009

Hmm the professor wants us to show the data stuff...

Q table alpha K, n-K
Q.05 2,7 = 3.34
Qc= 7.54

Qc > Qtable: there is a significant difference in amount spent across age groups.
Qc= 7.54
Qtable = 3.34
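
The 7.54 here is the F value from the ANOVA table, so one possible cross-check is against an F critical value on (k - 1, n - k) = (2, 6) degrees of freedom. This is a sketch under that assumption and may not be the table the professor has in mind; the variable names are chosen here for illustration.

```r
f_value <- 7.5429        # F statistic from the ANOVA table above
k <- 3                   # number of age groups
n <- 9                   # total number of observations
alpha <- 0.05

# Critical F value on (k - 1, n - k) = (2, 6) degrees of freedom
f_critical <- qf(1 - alpha, df1 = k - 1, df2 = n - k)
f_critical               # about 5.14

# Reject equal means if the observed F exceeds the critical value
f_value > f_critical     # TRUE
```

The conclusion is the same as in the post: the overall difference across age groups is significant at the 5% level.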

Then the professor wants us to go further and do the w tables..
x1 = young ; x2 = middle aged ; x3 = older
| x1 - x2 | = | 5.443 - 5.617 | = .174
| x1 - x3 | = | 5.443 - 9.263 | = 3.82
| x2 - x3 | = | 5.617 - 9.263 | = 3.646

This is where I am stuck right now...

edit: oops, forgot to multiply by sqrt(MSE/(couldn't read the board, n-k?))

nope... I wrote it correctly... MSE/ng (ng = number in each sample group)
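
The group means and the three absolute differences can be double-checked in R. A minimal sketch, with names (`means`, `abs_diff`) picked here for illustration:

```r
spend <- c(6.31, 4.27, 5.75, 7.64, 5.36, 3.85, 8.37, 9.26, 10.16)
group <- factor(rep(c("young", "middle", "older"), each = 3),
                levels = c("young", "middle", "older"))

# Group means: x1 = young, x2 = middle aged, x3 = older
means <- tapply(spend, group, mean)
round(means, 3)

# Matrix of absolute differences between every pair of group means
abs_diff <- abs(outer(means, means, "-"))
round(abs_diff, 3)
```

Working from unrounded means gives about 0.173 for young vs middle aged; the .174 above comes from subtracting the means after rounding them to 3 decimals.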

Commy · Jul 27, 2009

I don't think we've learned w-tables. But it looks like you're doing Tukey's standardised range method, where you compare your "w-table" to the LSDQ, which is basically the Tukey statistic. If, let's say, | x1 - x2 | is greater than LSDQ, then the two means, x1 and x2, differ significantly at the 5% level. You should be able to calculate the LSDQ using a table, and you'll also need the standard error of the difference of the two means.

Here's something I did awhile back
Tukey test
LSDQ = Q(3,8,0.95)/√2 × se(estimate)
= 2.857 × √(2 × 97.6/5) = 17.85
Soil type means:
Type I = 19.70 Type II = 48.28 Type III = 58.52
|19.70-48.28| = 28.58 > LSDQ
|19.70-58.52| = 38.82 > LSDQ
|48.28-58.52| = 10.24 < LSDQ

So conclude that type 1 is significantly different from 2 and 3.
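
The same LSDQ can be reproduced without a table using `qtukey`. This sketch reads 97.6 as the residual mean square and 5 as the number of observations per soil type, which is an assumption about the formula above; the variable names are made up here.

```r
# Values taken from the soil-type example in the post
k <- 3            # number of groups
df_resid <- 8     # residual degrees of freedom used in Q(3, 8, 0.95)
mse <- 97.6       # assumed to be the residual mean square
n_per_group <- 5  # assumed observations per group

# Studentized range quantile divided by sqrt(2), times the SE of a difference
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)
lsdq <- q_crit / sqrt(2) * sqrt(2 * mse / n_per_group)
lsdq   # about 17.85

means <- c(I = 19.70, II = 48.28, III = 58.52)
abs(outer(means, means, "-")) > lsdq   # TRUE marks pairs that differ significantly
```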

CrC · Jul 27, 2009

3.34 * sqrt (1.85/3) = 2.6228
| x1 - x2 | = | 5.443 - 5.617 | = .174 (do not reject null)
| x1 - x3 | = | 5.443 - 9.263 | = 3.82 (reject null)
| x2 - x3 | = | 5.617 - 9.263 | = 3.646 (reject null)

that should have gone with the other stuff... I guess that is how i would conclude the other stuff. I think its right, but not sure.

young vs older and middle aged vs older has a higher amount spent. There is not overwhelming evidence to suggest amount spent differs between young and middle aged groups.

Implications The salesman's target market would be to older group to younger group, and older group to middle aged group. Yet, don't prefer target market to younger group then to middle aged group.
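
The critical difference can also be computed directly rather than read off a table. This is a hedged sketch, assuming the Q value should be looked up for 3 group means and 6 residual degrees of freedom (n - k = 9 - 3); `qtukey` then gives about 4.34, whereas 3.34 looks like the table entry for 2 means and 7 df. The variable names are chosen here for illustration.

```r
mse <- 1.8508     # residual mean square from the ANOVA table
n_g <- 3          # observations in each age group
k <- 3            # number of age groups
df_resid <- 6     # residual degrees of freedom (n - k = 9 - 3)

# Studentized range quantile for 3 means and 6 residual df
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)   # about 4.34

# Critical difference: pairs of means further apart than w differ at the 5% level
w <- q_crit * sqrt(mse / n_g)
w   # about 3.41

diffs <- c(young_middle = 0.174, young_older = 3.82, middle_older = 3.646)
diffs > w
```

Under that assumption the cutoff is larger than 2.6228, but the same two pairs (young vs older, middle aged vs older) still exceed it, so the conclusions do not change.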

Commy · Jul 27, 2009

It's not really "do not reject null" if you're using Tukey's, it's "differ significantly" or not.
As for your explanation, it seems right if you calculated the Tukey value correctly, although the wording to your answer is a little awkward.

CrC · Jul 27, 2009

Sorry making it up as I go... That is the hardest part for me... Trying to make it sound english while trying to use a previous example I have to make sure it is correct...

Summary : An older group compared to the younger and middle aged group have a higher amount spent. There is not overwhelming evidence to suggest that the amount spent between younger and middle aged groups differ.

Implications : The salesman's target market would be to the older group then to the younger or middle aged group. Yet there is no significant difference between the younger and middle aged group.

How does that sound?

Commy · Jul 27, 2009

Yeah, that's better. Maybe something like:
Summary: The older group's mean is significantly different to both the younger and middle aged group, implying that the older group are the highest spenders on optional equipment. There is no significant difference between the younger and middle aged groups.
Implications: The new-car dealer should target the older age group, as they spend a significant amount more on optional equipment than the less elderly groups.

CrC · Jul 27, 2009

My professor is really strict on the summary and interpretations... I mean REALLY strict. Which is why I always hate that last part because I don't even want to write down a single wrong word...

The professor took off 3 points on my test because I wrote the people instead of projects... That test sucked, didn't have enough time to proof-read.

I like your summary better. I think I need a sentence in the implications to describe the younger and middle aged groups (as a comparison between each other).

Neofriends.net
