The t-test function in excel can be incredibly powerful. Without the coding necessary to operate Python or R, everyone can calculate the likelihood that two sets of data are actually different. This function has powered dissertations and scientific discoveries including my own! Unfortunately, many people still do not understand the t-test, how to use this function, or interpret the results of the function.
Through this article, we are going to examine each part of this function and prepare you to use the function to its fullest potential.
When you have two sets of data, you can calculate the average of those data sets. As an example, let’s say that we are measuring the weight of 20 2-year-olds and 3-year-olds. Not all two-year-olds and all three-year-olds are going to weigh the same. Therefore, if we calculate the averages of the two groups, we would need to know if the averages are statistically different. For example, consider the data below.
Overall, it may look like the 2-year-old weights are lower than the 3-year old weights. If we calculate the averages, we see that the average weight of 2-year-olds is 22.35 lbs and 3-year-olds is 37.85 lbs. While the averages are different, we do not know if this difference is just by random chance. If we had weighed more children, would they be closer in weight?
This is where a t-test comes in. The t-test allows us to say the chance is that the difference in the average is by random chance, i.e. we happened to randomly sample the lighter 2-year-olds and the heavier 3-year-olds.
A t-test is specifically used for hypothesis testing. A hypothesis is an educated guess at the relationship or difference between two things.
For example, in our question above, our hypothesis would be that 3-year-olds weigh a different amount than 2-year-olds.
If we want to be more specific, we could say that 3-year-olds weigh more than 2-year-olds. In this case, we have given a direction to the expected change in our groups.
For every hypothesis, there is a null hypothesis. The null hypothesis is the hypothesis that nothing changed and there is no difference.
For example, our null hypothesis would be that 3-year-olds weigh the same as 2-year olds.
The t-test gives us a measurable percentage that we can support our hypothesis and reject our null hypothesis. We will discuss how to use the results of a t-test to reject or accept a null hypothesis later.
So now that you know what the t-test does, it is important to understand how the function works. The return for this function is called a p-value and gives you the significance that two things are in fact different.
=T.Test(array1, array2, tails, type)
There are 4 variables or inputs in the function: array 1, array 2, tails, and type.
The first two arguments for the function are the data you are interested in testing. You want to include all the data for your first set in array 1 and all your data for your second set in array 2. These are interchangeable.
The third input is the tails. There are only two arguments for this input:
1 — one-tailed distribution
2 — two-tailed distribution
The selection of one-tailed vs. two-tailed is dependent on the direction of the expected chance. The most common test is a two-tailed distribution.
A two-tailed distribution is less powerful than a one-tailed distribution, which means all things being considered a two-tailed distribution will give a lower level of significance than a one-tailed distribution.
A one-tailed distribution should only be selected when you know the perceived direction of change and the consequences of missing a change in the other direction are negligible. This is very dependent on the question that we are asking. If we ask are 3-year-olds heavier than 2-year-olds and we do not care that the 3-year-olds may be lighter, then we could perform a one-tailed t-test.
As an example, the results for the above data as a one-tailed and two-tailed t-test are 4.92 x 10^(-8) and 9.82 x 10^(-8), respectively. Since 4.92 is less than 9.82, this means that the one-tailed test is more significant than the two-tailed test.
Importantly, a one-tailed t-test should never be used because a two-tailed t-test found the samples to be not significant.
The fourth and final input is to determine the type of t-test to be performed. There are three arguments for this input:
1 — paired t-test
2 — two-sample equal variance
3 — two-sample unequal variance
Since deciding the type of t-test is fairly complicated, we will explain it in more detail below. However, the most common test is the two-sample unequal variance (3).
When examining which type of t-test you should use, you want to ask two questions: paired vs. two-sampled and then equal variance vs. unequal variance.
The first question you should ask is whether your data and your research question are naturally paired. A paired t-test should be used if one sample from each set has a closer relationship to each other than they do the other samples in the sets and you are investigating this specific relationship.
Let's take the example data above. If our question is “Do 3-year-olds weigh differently than 2-year-olds?”, then we would want to perform a two-sample test (input 2 or 3). This question does not assume any relationship between the individual samples of the 2-year-old and 3-year-olds.
On the other hand, if we ask, “Do 3-year-olds weigh differently than their 2-year-old siblings?”, now we would perform a paired t-test (input 1). In this case, we would arrange our data where siblings are across from each other. For example, the first sample in each set would be siblings and so on. If we included data on non-siblings in our first question, we would need to remove that data for the paired t-test.
The difference between these tests is in the null hypothesis that you are testing.
The first case your null hypothesis is that the average weight of 3-year-olds is the same as the average weight of 2-year-olds.
The second case your null hypothesis is that the difference in weight between 2-year-old and 3-year-old siblings is zero.
One type of test will not necessarily give you a higher level of significance than the other, because they are answering fundamentally different questions. For example, in the data set above, the paired t-test and two-sample t-tests result in a p-value of 2.99 x 10^(-7) and 9.84 x 10^(-8), where the paired t-test has a lower level of significance than the independent two-sample t-test.
If you know that you want to use a two-sample t-test, then you need to decide if you should use an equal variance or unequal variance t-test. The difference between the two-sample equal variance (input = 2) and the two-sample unequal variance (input = 3) is based on the variance or deviation in the population.
Simply put, you should only choose an equal variance if you have data on the variation in the entire population and know that this is equal. Your population variation is not equal to the standard deviation of your subset. In most cases, this information is unavailable, and therefore, a two-sample equal variance t-test should be used the least.
In fact, if your variances are in fact equal then the two-sample unequal variance will perform on par with the two-sample equal variance t-test. Therefore, in general, you should likely choose the two-sample unequal variance to ensure that you are not making false-positive errors.
Once you run your t-test in excel, the output it will give you is called a p-value. The p-value (p) is the probability that the difference between the averages of two sets of data is due to random chance. In more statistical terms, the p-value is the probability that the null hypothesis is actually true, but random chance makes it appear not true.
For example, if the p-value for the differences in the weights of children was 0.01, then there is a 1% chance that 2-year-olds really do weigh the same as 3-year-olds.
Therefore, you need to have a chosen confidence level that you are going to support or reject your null hypothesis. The confidence level tells you what percentage you can be certain that the true population matches your sample.
Confidence level = (1 - p-value) x 100%
For example, if our data is significant at the 95% confidence level (p < 0.05), then you can be 95% sure that two-year-olds weigh less than three-year-olds.
Common confidence levels are 95%, 99%, and 99.9% or a p-value equal to 0.05, 0.01, or 0.001, respectively.
Once you set your confidence level, you will then be able to use the p-value to accept or reject the null hypothesis at that confidence level.
For example, if our confidence level was 95% and p = 0.028, then we can reject our null hypothesis and support our hypothesis with 95% certainty.
Get my 30 Day research jumpstart to go from not knowing your research to collecting publishable data in 30 days.