Introduction to Correlation and Regression Analysis
We've looked at the interaction effect between two categorical variables. We'll keep working with our trusty General Social Survey data set, this time with a continuous variable and gender (a categorical, dummy coded variable) as our two predictors. What we want to know is: does the gender difference in income differ depending on the continuous predictor?

Association between two continuous variables is usually measured by correlation. The following examples show three situations for three variables: X1, X2, and Y. We may also want to know whether the correlation between two variables is statistically significant. Note that these relationships could just as easily, but perhaps less intuitively, be phrased conversely, in terms of the simple effect of the categorical variable.
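The question above can be written as a regression model with an interaction term. This is only a sketch under assumed notation: x stands for the continuous predictor, g for the gender dummy (0 or 1), and the β symbols are illustrative labels that do not come from the original text.

$$\text{income} = \beta_0 + \beta_1 x + \beta_2 g + \beta_3 (x \cdot g) + \varepsilon$$

Under this model the slope of x is $\beta_1$ when $g = 0$ and $\beta_1 + \beta_3$ when $g = 1$, and the gender difference at a given value of x is $\beta_2 + \beta_3 x$. If $\beta_3 = 0$, the gender difference is the same at every value of x, which is the "no interaction" case.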
Each section gives a brief description of the aim of the statistical test, when it is used, an example showing the Stata commands and Stata output with a brief interpretation of the output.
You can see the page Choosing the Correct Statistical Test for a table that shows an overview of when each test is appropriate to use. In deciding which test is appropriate to use, it is important to consider the type of variables that you have.
About the hsb data file

Most of the examples in this page will use a data file called hsb2, high school and beyond. This data file contains observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race).
It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst). You can load the hsb2 data file from within Stata.

For example, using the hsb2 data file, say we wish to test whether the average writing score (write) differs significantly from a hypothesized value. We can do this with a one sample t-test.
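The arithmetic behind a one sample t-test can be sketched in a few lines of Python. This is not the Stata command or the hsb2 data: the scores, the hypothesized mean of 50 and the function name `one_sample_t` are all made up for illustration.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample SD (n - 1 in the denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical writing scores, NOT taken from hsb2.
scores = [52, 59, 44, 62, 57, 49, 65, 54, 61, 47]
t_stat, df = one_sample_t(scores, mu0=50)  # 50 is an assumed hypothesized mean
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

A positive t statistic means the sample mean lies above the hypothesized value; whether the difference is significant depends on comparing t with the t distribution on n - 1 degrees of freedom.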
We would conclude that this group of students has a significantly higher mean on the writing test than the hypothesized value.

See also: Stata Class Notes: Analyzing Data

One sample median test

A one sample median test allows us to test whether a sample median differs significantly from a hypothesized value. We will use the same variable, write, as we did in the one sample t-test example above, but we do not need to assume that it is interval and normally distributed (we only need to assume that write is an ordinal variable and that its distribution is symmetric).
We will test whether the median writing score (write) differs significantly from a hypothesized value.

See also: Stata Code Fragment: Descriptives, ttests, Anova and Regression

Binomial test

A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value.
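As a rough illustration of what an exact binomial test computes, the sketch below sums the probabilities of all outcomes that are no more likely than the observed one under the hypothesized proportion. That is one common definition of a two-sided exact p-value, not necessarily the one a given package uses. The counts (14 successes in 20 trials), the hypothesized proportion of 0.5 and the function names are assumptions for illustration.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(successes, n, p0):
    """Two-sided exact p-value: total probability of outcomes no more likely than the observed one."""
    observed = binom_pmf(successes, n, p0)
    cutoff = observed * (1 + 1e-9)  # small tolerance for floating-point ties
    p_value = sum(
        prob for prob in (binom_pmf(k, n, p0) for k in range(n + 1)) if prob <= cutoff
    )
    return min(1.0, p_value)


# Hypothetical data: 14 successes out of 20 trials, tested against p0 = 0.5.
p_value = exact_binomial_test(successes=14, n=20, p0=0.5)
print(f"two-sided p = {p_value:.4f}")
```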
Chi-square goodness of fit

A chi-square goodness of fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions. We want to test whether the observed proportions from our sample differ significantly from these hypothesized proportions. To conduct the chi-square goodness of fit test, you need to first download the csgof program that performs this test. You can download csgof from within Stata by typing search csgof (see How can I use the search command to search for programs and get additional help?).
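The statistic behind a goodness of fit test is simple enough to sketch by hand. The Python below is not the csgof program. The observed counts, the hypothesized proportions and the function name `chi_square_gof` are invented for illustration.

```python
def chi_square_gof(observed, hypothesized_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness of fit test."""
    if len(observed) != len(hypothesized_props):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(hypothesized_props) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    total = sum(observed)
    # Expected count in each category if the hypothesized proportions were true.
    expected = [total * p for p in hypothesized_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical counts for a three-level categorical variable.
stat, df = chi_square_gof([30, 20, 50], [0.25, 0.25, 0.50])
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

The larger the statistic relative to a chi-square distribution with the given degrees of freedom, the stronger the evidence that the observed proportions differ from the hypothesized ones.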
Example - Correlation of Gestational Age and Birth Weight

A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured in weeks, and birth weight, measured in grams.
We wish to estimate the association between gestational age and infant birth weight. In this example, birth weight is the dependent variable and gestational age is the independent variable. The data are displayed in a scatter diagram in the figure below. Each point represents an (x, y) pair (in this case, the gestational age, measured in weeks, and the birth weight, measured in grams).
Note that the independent variable is on the horizontal axis (or X-axis), and the dependent variable is on the vertical axis (or Y-axis). The scatter plot shows a positive or direct association between gestational age and birth weight. Infants with shorter gestational ages are more likely to be born with lower weights and infants with longer gestational ages are more likely to be born with higher weights.

The formula for the sample correlation coefficient is

$$r = \frac{\operatorname{Cov}(x,y)}{\sqrt{s_x^2 \, s_y^2}}$$

where Cov(x,y) is the covariance of x and y, defined as

$$\operatorname{Cov}(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

and $s_x^2$ and $s_y^2$ are the sample variances of x and y, defined as

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}, \qquad s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$$

The variances of x and y measure the variability of the x scores and y scores around their respective sample means ($\bar{x}$ and $\bar{y}$), considered separately.
The covariance measures the variability of the x,y pairs around the mean of x and mean of y, considered simultaneously. To compute the sample correlation coefficient, we need to compute the variance of gestational age, the variance of birth weight and also the covariance of gestational age and birth weight. We first summarize the gestational age data.
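The mean gestational age is the sum of the observed gestational ages divided by the sample size (here n = 17):

$$\bar{x} = \frac{\sum x_i}{n}$$

To compute the variance of gestational age, we need to sum the squared deviations (or differences) between each observed gestational age and the mean gestational age.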
The computations are summarized below.
The variance of gestational age (x) is:

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

Next, we summarize the birth weight data. The mean birth weight (y) is:

$$\bar{y} = \frac{\sum y_i}{n}$$

The variance of birth weight is computed just as we did for gestational age, as shown in the table below. The variance of birth weight is:

$$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$$

Next, we compute the covariance. To compute the covariance of gestational age and birth weight, we need to multiply the deviation from the mean gestational age by the deviation from the mean birth weight for each participant:

$$(x_i - \bar{x})(y_i - \bar{y})$$
Notice that we simply copy the deviations from the mean gestational age and birth weight from the two tables above into the table below and multiply.
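The covariance of gestational age (x) and birth weight (y) is:

$$\operatorname{Cov}(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

We now compute the sample correlation coefficient:

$$r = \frac{\operatorname{Cov}(x,y)}{\sqrt{s_x^2 \, s_y^2}}$$

Not surprisingly, the sample correlation coefficient indicates a strong positive correlation.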
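The hand computation described above (means, deviations, variances, covariance, then r) can be sketched in Python. The five data pairs are hypothetical and are not the 17 infants from the study. The function name `sample_correlation` and the variable names are likewise made up for illustration.

```python
import math


def sample_correlation(x, y):
    """Compute r step by step from deviations, mirroring the hand calculation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its sample mean.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Sample variances and covariance use n - 1 in the denominator.
    var_x = sum(d * d for d in dev_x) / (n - 1)
    var_y = sum(d * d for d in dev_y) / (n - 1)
    cov_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / (n - 1)
    r = cov_xy / math.sqrt(var_x * var_y)
    return {"var_x": var_x, "var_y": var_y, "cov_xy": cov_xy, "r": r}


# Hypothetical gestational ages (weeks) and birth weights (grams).
weeks = [30, 34, 36, 38, 40]
grams = [1500, 2300, 2600, 3100, 3500]
result = sample_correlation(weeks, grams)
print(f"variance of x = {result['var_x']:.1f}")
print(f"variance of y = {result['var_y']:.1f}")
print(f"covariance    = {result['cov_xy']:.1f}")
print(f"r             = {result['r']:.3f}")
```

Because r divides the covariance by the product of the two standard deviations, it has no units and always falls between -1 and 1, which is what makes it comparable across variables measured on different scales such as weeks and grams.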