
Hypothesis Testing Flashcards: Complete Study Guide


Hypothesis testing is a fundamental statistical method used in psychology research to determine whether observed data support a theoretical prediction. Flashcards help you quickly recall critical concepts like p-values, significance levels, Type I and Type II errors, and various test procedures.

Breaking down complex statistical ideas into bite-sized, retrievable chunks enables distributed practice that strengthens long-term retention. This approach builds the conceptual foundation needed to apply hypothesis testing in real research scenarios and ace your research methods course.


Core Components of Hypothesis Testing

Hypothesis testing follows a structured five-step procedure that guides researchers through the decision-making process.

Step 1: Establish Your Hypotheses

You establish the null hypothesis (H0), which posits that no relationship or difference exists between variables. Then you create the alternative hypothesis (H1), which suggests a relationship or difference does exist. These opposing predictions frame your entire analysis.

Steps 2 Through 5: Test and Decide

Second, select a significance level (alpha), typically set at 0.05. This defines the probability threshold for rejecting the null hypothesis. Third, choose an appropriate statistical test based on your research design. Options include t-tests, chi-square tests, or ANOVA.

Fourth, calculate the test statistic and obtain a p-value. This represents the probability of observing your results if the null hypothesis were true. Finally, compare the p-value to your significance level.

Making Your Decision

If p is less than alpha, reject the null hypothesis. If p is greater than alpha, fail to reject it. Understanding each component ensures you can recognize when to apply specific tests and interpret findings correctly.
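To make the five steps concrete, here is a minimal sketch in Python. The reaction-time scores and the 300 ms comparison value are invented for illustration, and SciPy is assumed to be installed:

```python
from scipy import stats

# Step 1: H0: mean reaction time = 300 ms; H1: mean differs from 300 ms
# Step 2: choose the significance level
alpha = 0.05

# Hypothetical sample of reaction times in milliseconds
sample = [312, 299, 305, 320, 296, 315, 308, 301, 310, 304]

# Steps 3-4: a one-sample t-test yields the test statistic and p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=300)

# Step 5: compare the p-value to alpha and decide
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")
```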

Flashcards excel here because you can create cards for each step. Breaking down the decision tree helps you internalize the logical flow before tackling complex problems.

Type I and Type II Errors: Decision Accuracy

When conducting hypothesis tests, two types of errors can occur. Understanding their implications is crucial for research integrity and study design.

Type I Error: False Positive

A Type I error occurs when you reject a true null hypothesis. You conclude a significant effect exists when none actually does. This is also called a false positive. The probability of committing a Type I error equals your significance level (alpha), typically 0.05. In medical research, false positives could harm patients by recommending ineffective treatments.
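A short simulation shows why alpha is the Type I error rate: when the null hypothesis is true, tests run at alpha = 0.05 still reject about 5 percent of the time. The normal data below are fabricated, and NumPy and SciPy are assumed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims = 0.05, 10_000

# H0 is true by construction: both groups share the same distribution
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # rejecting a true H0 is a Type I error

print(f"Type I error rate: {false_positives / n_sims:.3f}")  # close to 0.05
```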

Type II Error: False Negative

A Type II error occurs when you fail to reject a false null hypothesis. You conclude no effect exists when one actually does. This is called a false negative, and its probability is denoted as beta. The complement of beta (1 - beta) is statistical power.
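Beta can be estimated the same way by simulating a world where the null hypothesis is false. The medium effect (0.5 standard deviations) and the small group size are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_sims = 0.05, 10_000

# H0 is false by construction: the group means differ by 0.5 SD
misses = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1, 20)
    b = rng.normal(0.5, 1, 20)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        misses += 1  # failing to reject a false H0 is a Type II error

beta = misses / n_sims  # with only 20 per group, beta is high
print(f"beta = {beta:.2f}, power = 1 - beta = {1 - beta:.2f}")
```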

Power and Research Context

Statistical power represents the probability of correctly rejecting a false null hypothesis. Different research contexts require different error trade-offs. Medical research testing new treatments prioritizes minimizing Type I errors. Exploratory research might tolerate higher Type I error rates to discover new effects.

Flashcards help you quickly reference definitions, symbols, and relationships between these concepts. This prevents confusion that often arises from their similar names and interrelated nature.

Parametric vs. Nonparametric Tests and Selection Criteria

Choosing the correct hypothesis test depends on several characteristics of your data and research design.

Parametric Tests and Their Assumptions

Parametric tests include t-tests, ANOVA, and Pearson correlation. They assume data are normally distributed, have equal variances across groups, and represent interval or ratio measurement scales. These tests are generally more powerful when assumptions are met.

The one-sample t-test compares a sample mean to a population mean. The independent-samples t-test compares means between two unrelated groups. The paired-samples t-test compares means within the same group across two time points.
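In scipy.stats, the three variants map onto three different function calls. This sketch uses invented anxiety scores purely as placeholders:

```python
from scipy import stats

# Hypothetical anxiety scores (illustrative numbers, not real data)
time1 = [22, 25, 19, 30, 27, 24, 21, 26]        # same participants, time 1
time2 = [18, 23, 17, 26, 25, 20, 19, 24]        # same participants, time 2
other_group = [28, 31, 24, 29, 33, 27, 30, 26]  # an unrelated group

# One-sample: does the mean differ from a known population value of 20?
print(stats.ttest_1samp(time1, popmean=20))

# Independent samples: two unrelated groups
print(stats.ttest_ind(time1, other_group))

# Paired samples: the same group measured at two time points
print(stats.ttest_rel(time1, time2))
```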

Nonparametric Alternatives

Nonparametric tests like Mann-Whitney U, Wilcoxon Signed-Rank, Kruskal-Wallis, and Spearman correlation don't require normal distribution assumptions. They work with ordinal data, while chi-square tests handle nominal data. When data violate parametric assumptions, these alternatives keep your conclusions trustworthy.
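Each parametric test has a rank-based counterpart in scipy.stats. The ratings below are invented, and the Wilcoxon call treats the two lists as paired measurements from the same participants:

```python
from scipy import stats

group_a = [3, 5, 4, 2, 5, 4, 3, 5]   # hypothetical ordinal ratings
group_b = [2, 3, 1, 1, 4, 2, 2, 1]
group_c = [4, 5, 5, 3, 4, 5, 4, 4]

# Mann-Whitney U: analog of the independent-samples t-test
print(stats.mannwhitneyu(group_a, group_b))

# Wilcoxon signed-rank: analog of the paired-samples t-test
print(stats.wilcoxon(group_a, group_b))

# Kruskal-Wallis: analog of one-way ANOVA for three or more groups
print(stats.kruskal(group_a, group_b, group_c))

# Spearman correlation: rank-based analog of Pearson's r
print(stats.spearmanr(group_a, group_b))
```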

ANOVA and Post-Hoc Tests

ANOVA extends t-tests to compare means across three or more groups. It requires post-hoc tests like Tukey's HSD to identify which groups differ. Creating flashcards that map research designs to appropriate tests, including their assumptions and alternatives, builds the decision-making schema you need during exams and projects.
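Here is a sketch of that two-stage workflow with invented test scores; scipy.stats.tukey_hsd requires a reasonably recent SciPy release:

```python
from scipy import stats

# Hypothetical scores under three teaching methods
method_a = [75, 82, 78, 80, 77, 84]
method_b = [68, 71, 74, 70, 69, 73]
method_c = [85, 88, 83, 90, 86, 87]

# Omnibus one-way ANOVA: do any of the group means differ?
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# If the omnibus test is significant, Tukey's HSD shows which pairs differ
if p_value < 0.05:
    print(stats.tukey_hsd(method_a, method_b, method_c))
```

If your SciPy version predates tukey_hsd, statsmodels provides pairwise_tukeyhsd as an alternative.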

P-Values and Statistical Significance

The p-value is perhaps the most misunderstood concept in hypothesis testing. Clarifying its precise meaning is essential for accurate research interpretation.

What a P-Value Actually Means

A p-value represents the probability of obtaining your observed results, or results more extreme, assuming the null hypothesis is true. Importantly, it does NOT represent the probability that the null hypothesis is true. It also doesn't represent the probability that your alternative hypothesis is true.

When a p-value is less than your significance level (usually 0.05), the result is statistically significant. This means the observed data are unlikely under the null hypothesis. This leads to rejecting the null hypothesis in favor of the alternative.
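The definition is easiest to see by simulation: build the world where the null hypothesis is true, then count how often results at least as extreme as yours occur. The 60-heads scenario below is invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed: 60 heads in 100 flips of a coin we suspect is biased
observed_heads, n_flips = 60, 100

# Simulate the null world: a fair coin flipped 100 times, over and over
sims = rng.binomial(n=n_flips, p=0.5, size=100_000)

# Two-sided p-value: probability of a result at least this extreme under H0
extreme = np.abs(sims - 50) >= abs(observed_heads - 50)
print(f"Simulated p = {extreme.mean():.3f}")  # near 0.057 for this scenario
```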

Statistical vs. Practical Significance

Statistical significance differs from practical significance. A large sample size can produce statistically significant results with trivial effect sizes. These have minimal real-world importance. Effect size measures like Cohen's d, eta-squared, and r quantify the magnitude of relationships or differences independently of sample size.

Modern research emphasizes reporting both p-values and effect sizes. A result can be statistically significant but practically negligible, or practically meaningful but statistically non-significant with small samples. Flashcards should include specific p-value interpretation scenarios and this distinction to help you avoid common misinterpretations that appear on exams.
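The distinction is easy to demonstrate: with enormous samples, even a trivial true difference yields a tiny p-value. This sketch computes Cohen's d by hand from invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Huge samples with a tiny true difference of 0.05 standard deviations
a = rng.normal(0.00, 1, 50_000)
b = rng.normal(0.05, 1, 50_000)

t_stat, p_value = stats.ttest_ind(a, b)

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd

print(f"p = {p_value:.2g}   (statistically significant)")
print(f"d = {d:.3f}  (trivial effect size)")
```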

Practical Study Strategies for Mastering Hypothesis Testing

Effective learning of hypothesis testing requires combining multiple study approaches rather than passive reading.

Create Concept-Linked Flashcards

Start by creating flashcards that link statistical concepts to real psychological examples. Pair the definition of Type I error with a concrete scenario like a false diagnosis in clinical psychology research. Use active recall daily through spaced repetition, reviewing difficult cards more frequently than mastered ones.

The spacing effect demonstrates that distributed practice produces better retention than cramming. Flashcard apps with built-in scheduling are particularly valuable for this purpose.

Apply Concepts to Real Problems

Work through practice problems involving real data analysis to move beyond memorization toward application. Create cards with sample scenarios requiring you to identify the appropriate test, predict the type of error most concerning in that context, and interpret hypothetical p-values.

Study with Others and Explain Aloud

Study with peers and explain concepts aloud, which strengthens neural pathways and reveals gaps in understanding. Record yourself explaining a procedure and listen while reviewing cards during commute times. This multimodal approach deepens retention.

Focus on Conceptual Logic

Understand the conceptual logic underlying each test rather than memorizing formulas. This approach enables transfer to novel problems. Use your flashcard app to quiz yourself with randomized cards, testing yourself in different orders. This prevents reliance on card sequence rather than genuine recall.

Start Studying Hypothesis Testing

Master statistical inference with interactive flashcards covering core concepts, formulas, test selection, and real-world applications. Build the strong foundation needed to ace your research methods exam and understand peer-reviewed psychology research.

Create Free Flashcards

Frequently Asked Questions

What's the difference between failing to reject the null hypothesis and accepting it?

This distinction is crucial in hypothesis testing. When your p-value exceeds your significance level, you fail to reject the null hypothesis. This means the data don't provide sufficient evidence against it.

However, failing to reject is not the same as accepting or proving it true. The null hypothesis remains unproven. You simply lack evidence to reject it. This could occur because no effect truly exists, or because your study lacked statistical power to detect a real effect (Type II error).

The language of hypothesis testing reflects this asymmetry. We reject or fail to reject hypotheses, never accept them. Understanding this distinction prevents overinterpreting null results and ensures you communicate findings accurately in reports and discussions.

How do sample size and effect size influence hypothesis testing outcomes?

Sample size and effect size have distinct but interrelated roles in hypothesis testing. Effect size measures the magnitude of the relationship or difference between variables, independent of sample size. Larger true effects are easier to detect statistically.

Sample size affects statistical power: larger samples provide more precise estimates and greater ability to detect effects. A large sample can yield statistically significant results even for tiny effect sizes. Meanwhile, a small sample might fail to detect large effects due to insufficient power.

This is why modern research reports both p-values and effect sizes. When planning studies, researchers conduct power analyses using expected effect sizes and desired power levels. These determine required sample size. As a student, understanding this relationship helps you interpret research appropriately. A significant p-value with a tiny effect size in a large study may be less meaningful than a marginally non-significant result with a large effect size in a small study.
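Power analysis is built into libraries like statsmodels; the medium effect size below (d = 0.5) is purely an illustrative assumption:

```python
from statsmodels.stats.power import TTestIndPower

# How many participants per group to detect a medium effect (d = 0.5)
# with 80% power at alpha = 0.05, two-tailed?
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, power=0.80, alpha=0.05
)
print(f"Required n per group: {n_per_group:.0f}")  # roughly 64
```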

Why would researchers choose a nonparametric test over a parametric one?

Researchers choose nonparametric tests when data violate the assumptions required for parametric tests. Parametric tests assume normal distribution, homogeneous variances, and interval/ratio measurement scales.

When working with ordinal data, heavily skewed distributions, or violations of homogeneity of variance, nonparametric alternatives provide more accurate results. Additionally, nonparametric tests make fewer assumptions about population distribution, making them robust across different scenarios.

However, parametric tests are generally more powerful when assumptions are met. This means they are better at detecting true effects. The trade-off involves losing some statistical power for robustness. Another consideration is sample size: nonparametric tests often require larger samples to achieve equivalent power.

In practice, researchers examine data distributions, conduct normality tests, and choose accordingly. Understanding when each test family is appropriate prevents inappropriate analysis choices that could invalidate research conclusions.
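A common first step is a quick normality check. This sketch applies the Shapiro-Wilk test to invented, deliberately skewed scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
skewed = rng.exponential(scale=2.0, size=40)  # hypothetical skewed scores

# Shapiro-Wilk: H0 is that the data come from a normal distribution
stat, p = stats.shapiro(skewed)
print(f"Shapiro-Wilk p = {p:.4f}")

# A small p casts doubt on normality and points toward nonparametric tests
if p < 0.05:
    print("Normality doubtful: consider Mann-Whitney U over a t-test")
```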

What does a confidence interval tell you that a p-value doesn't?

While p-values indicate whether results are statistically significant, confidence intervals provide the range of plausible population parameter values. A 95 percent confidence interval means that if you repeated your study many times, approximately 95 percent of the intervals would contain the true population parameter.

Confidence intervals communicate both the point estimate and the precision of that estimate. A narrow interval indicates precise estimation, while a wide interval suggests greater uncertainty. Unlike p-values, confidence intervals directly estimate effect magnitude and allow comparison across studies.

If a confidence interval doesn't include zero (for comparisons) or one (for ratios), the result is statistically significant at the corresponding alpha level. Many statisticians and journals now emphasize confidence intervals over p-values because they provide richer information about data.

Reporting both together gives the most complete picture. The p-value indicates significance while the confidence interval shows the range and precision of the effect.
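Computing one is straightforward. This sketch builds a 95 percent confidence interval for a mean from invented scores, using the t distribution:

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 5.3, 6.0])  # invented scores

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# 95% CI for the mean, based on the t distribution with n - 1 df
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"Mean = {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```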

How can I avoid confusing similar hypothesis tests like t-tests and ANOVA?

Creating a visual decision tree on flashcards helps distinguish these tests. The key difference is the number of means being compared. Use t-tests for one or two means (a one-sample comparison against a known value, independent samples, or paired samples). Use ANOVA when comparing three or more groups on a continuous outcome.

Both assume normal distribution and interval/ratio data. However, ANOVA tests whether means differ across multiple groups simultaneously rather than in pairs. A critical advantage of ANOVA is controlling Type I error: conducting multiple t-tests inflates error rates, while ANOVA maintains your chosen alpha level.
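You can watch that inflation happen in a simulation: draw three groups from the same population, run all pairwise t-tests, and count how often at least one comes out "significant" by chance. The group sizes here are arbitrary:

```python
import numpy as np
from scipy import stats
from itertools import combinations

rng = np.random.default_rng(5)
alpha, n_sims = 0.05, 5_000

# Three groups from the SAME population, so every rejection is an error
false_alarms = 0
for _ in range(n_sims):
    groups = [rng.normal(0, 1, 20) for _ in range(3)]
    # All three pairwise t-tests, as a naive analysis might run them
    pvals = [stats.ttest_ind(a, b).pvalue for a, b in combinations(groups, 2)]
    if min(pvals) < alpha:
        false_alarms += 1

# The family-wise error rate lands well above the nominal 5 percent
print(f"Family-wise Type I error: {false_alarms / n_sims:.3f}")
```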

If ANOVA shows significance, you conduct post-hoc tests like Tukey's HSD to identify which specific groups differ. Make flashcards showing example scenarios for each test, including the number of groups and independent variables involved.

Practice determining which test applies before performing analyses. This systematic approach prevents the common mistake of running multiple t-tests when ANOVA is appropriate.