Last fall we engaged Dr. Jing Liu, Ph.D., a Postdoctoral Research Associate at the Brown University School of Education, to complete an independent study on the effectiveness of BookNook among students at five of our partner elementary schools.
When we compared BookNook students to their peers, the BookNook students improved significantly more in reading over the course of the school year. One school even gained almost 13 percentage points on its standardized test. What does this mean for educators?
The results of the study indicate that BookNook is an effective tool in supporting the mastery of literacy for elementary students.
“When we looked at all BookNook students together across the schools, there was a positive effect with a magnitude of 1.8 standard deviations.”
In simple terms this means BookNook is taking kids above and beyond the average. When implemented effectively, BookNook fosters measurable success in reading outcomes among early readers.
The modern education system is all about added value: added value of a program, an intervention, a teacher. The results of Dr. Liu’s study indicate that BookNook is an effective tool in supporting the mastery of literacy for elementary students. Our students, who began significantly behind their peers, made sizable strides toward literacy mastery.
Download the report to learn why BookNook is quickly becoming a favorite guided reading solution for schools and non-profit programs!
First, let’s look at the individual schools. Of the five schools included in our study, three showed positive effects that were statistically significant! Dr. Liu’s results show BookNook helped children from School II gain almost four full points on their standardized tests, equivalent to 13 percentage points. Meanwhile, our two KRR schools showed similar growth of almost twelve points, equivalent to 8.6 percentage points on their standardized test.
When we looked at all BookNook students together across the schools, there was a positive effect with a magnitude of 1.8 standard deviations. This means, on average, BookNook helped students gain a letter grade in their ELA standardized assessments.
Two schools did not show the same positive gains. We are concerned about their results, but not discouraged. In School I, the BookNook results were not statistically significant, or even close to the .05 threshold. Meanwhile, the other interventions used there showed a negative influence that was statistically significant. This suggests to us that there may have been implementation issues that should have been addressed, but the results should not discourage us from promoting our product. School III is a similar case. Perhaps even more exciting than our individual school results are the results of the combined schools test.
In statistics, increasing the number of students in a study increases the weight of the results. If you think about an experiment, the more times you do the experiment and get the same result, the more you can trust that the independent and dependent variables are related. Therefore, Dr. Liu combined the smaller populations of the five schools to simulate the results of one large population. Because the schools used different assessments, he standardized the scores to make them comparable. The combined results show that BookNook improved test scores across the schools by almost two whole standard deviations! But what does that actually mean in real terms? Well…
The mean is a fancy way of saying the average. The standard deviation measures how spread out the individual observations are around that average. The classic example is the bell curve (a normal distribution). The majority of observations fall under the bell, and the oddballs, the very high or very low observations, fall further away. One standard deviation on either side of the average (mean) captures the majority of all observations (about 68%), two standard deviations capture about 95%, and three capture about 99.7%.
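For readers who want to check those coverage figures, here is a minimal sketch that computes them for an idealized normal distribution. The function name share_within is our own choice for illustration. Real test scores are only approximately bell-shaped, so treat these as textbook values rather than numbers from the study.

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard
    deviations of the mean (on either side)."""
    return math.erf(k / math.sqrt(2))

# Print the share of observations captured by 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Running it gives roughly 68%, 95%, and 99.7%, which is where the rule of thumb comes from.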
Let’s look at an example. If the average score on a literacy test is 75% and I know that 68% of all test scores fell between 70% and 80%, that would mean one standard deviation from the mean is five percentage points. If a BookNook student took that test and scored 1.8 standard deviations above the mean, that would be a score of 84% (75 plus 1.8 times 5). As Dr. Liu says, 1.8 standard deviations is a “gigantic effect size,” and a phenomenal indicator that we are improving students’ literacy skills. In even simpler terms, these results indicate we are helping kids get results well beyond the average of their peers.
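The arithmetic in that example can be sketched in a few lines. The helper names to_z and from_z are ours, and the second test (mean 200, standard deviation 20) is a made-up scale added only to show why standardizing lets scores from different assessments be compared. None of these numbers come from the study's data.

```python
def to_z(score: float, mean: float, sd: float) -> float:
    """Convert a raw score to standard deviations from the mean."""
    return (score - mean) / sd

def from_z(z: float, mean: float, sd: float) -> float:
    """Convert standard deviations from the mean back to a raw score."""
    return mean + z * sd

# The example from the text: mean of 75, standard deviation of 5.
example_score = from_z(1.8, mean=75.0, sd=5.0)
print(f"1.8 SD above a mean of 75 (SD 5): {example_score:.1f}")

# A hypothetical second assessment on a different scale.
other_z = to_z(236.0, mean=200.0, sd=20.0)
print(f"236 on a test with mean 200 (SD 20): {other_z:.1f} SD above the mean")
```

Both students sit the same distance above their test's average, even though the raw scores look nothing alike.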
Now, we recognize that this coefficient is not statistically significant at the .05 level most statisticians would like for the full stamp of approval, but it is statistically significant at the .1 level. In the words of my old professor Dr. Rebecca Maynard, yes, 90% is not the 95% convention, but it would be pretty foolish not to think those findings have impacts worth pursuing.
The final point we want to highlight from this study concerns the students themselves. In our study Dr. Liu found that BookNook students on average “are academically much weaker than their peers.” Our mission at BookNook is to serve any student who is struggling in their literacy development, but our primary goal is to help those with the greatest need. These results suggest we are doing just that!
These results demonstrate an incredibly promising future for BookNook. As we improve our platform by incorporating other best practices, and as we improve the training and development of reading guides, we believe these results will continue to improve. This is incredibly exciting news for the team because it means our platform is making a real difference for our students.
Why did we want a study?
At BookNook, we believe that through our platform we can provide educators a tool that amplifies their skills and meets students at their individual level. To make sure we are providing a tool that is truly effective, we want to incorporate and reflect evidence-based practices as much as possible. By funding a third-party evaluation of our own platform, we want to demonstrate that we are true to these values and transparent about the model we promote.
Making sense of statistical jargon.
Unfortunately for researchers (and educators), education does not happen in a vacuum. Children are all unique and they do not live in petri dishes. It is therefore very hard to do that traditional science experiment where we can control everything except our one variable, BookNook.
Instead, researchers must use lots of fun statistical techniques to isolate the real influence of interventions in our kids’ lives. In his report, Dr. Liu used a common method, the ordinary least squares (OLS) regression model, to calculate the influence BookNook had on the students in the study compared to other variables. Put most basically, a regression model takes all the independent variables that describe a child, like sex, race, socio-economic background, attendance, and BookNook participation, and compares their influence on the dependent variable, which for us is a test score.
The equation might look like this:
Student’s test score = influence of age + influence of sex + influence of BookNook + influence of attendance
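To make that equation concrete, here is a minimal sketch of an OLS fit in plain Python. Everything in it is an assumption for illustration: the data are randomly generated, the "true" influences are numbers we made up, and the helper ols solves the textbook normal equations. It is not Dr. Liu's model or data, and a real analysis would use a statistics package that also reports significance.

```python
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
names = ["baseline", "age", "sex", "booknook", "attendance"]
# Made-up influences used only to generate synthetic scores (not study results).
true_influence = [40.0, 2.0, 0.5, 4.0, 20.0]

rows, scores = [], []
for _ in range(400):
    row = [1.0,                               # baseline (intercept)
           float(random.randint(6, 10)),      # age in years
           float(random.randint(0, 1)),       # sex coded 0/1
           float(random.randint(0, 1)),       # used BookNook: 0 = no, 1 = yes
           random.uniform(0.7, 1.0)]          # attendance rate
    noise = random.gauss(0.0, 3.0)            # everything the model cannot see
    rows.append(row)
    scores.append(sum(b * x for b, x in zip(true_influence, row)) + noise)

estimates = ols(rows, scores)
for name, est in zip(names, estimates):
    print(f"influence of {name:>10}: {est:6.2f}")
```

The estimated influence of booknook should land near the made-up value of 4 points, which shows how a regression can pull apart one variable's contribution even when several things vary at once.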
Within these models there are two very important qualifiers for each independent variable: effect size and statistical significance. Effect size is the magnitude of influence one independent variable has compared to the others. Through effect sizes, the model tries to tell a story: which factors contribute to an outcome, and which matter most. From our study we wanted to determine, first, whether BookNook had a positive or negative influence on student reading scores and, second, to what degree.
In science jargon, statistical significance is the degree to which we believe the results did not happen by chance, that is, the degree to which we believe it is safe to reject the hypothesis that our independent variable has no effect on our dependent variable. In plain English, statistical significance asks: do we believe that our results are accurate in explaining an outcome? Within a statistical model, every output gets tested for statistical significance, so every effect size has its own test for significance.
You will see statistical significance reported as a probability called a p-value, and it will never, ever be 0. This is because in the statistical world, there is always the possibility that random chance alone could explain our results. The convention within the statistical community is that you promote results with a p-value of .05 or less, meaning that if the intervention truly had no effect, a result at least this strong would turn up by chance no more than 5% of the time. This is the source of the familiar “95%” shorthand. If we got a p-value of .4, chance alone would produce a result like ours about 40% of the time; that result holds too much uncertainty and we would say ‘our finding is not statistically significant.’ The closer your p-value gets to 0.00, the more confident you can be that chance is not the explanation.
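As a rough illustration of how a p-value is read against these thresholds, the sketch below converts a test statistic into a two-sided p-value and labels it. It assumes a normal approximation, whereas a regression on a small sample would normally use a t-distribution. The function names and the three example statistics are hypothetical, not values from Dr. Liu's report.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a test statistic z under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

def verdict(p: float) -> str:
    """Label a p-value using the .05 convention and the looser .1 level."""
    if p <= 0.05:
        return "significant at the .05 level"
    if p <= 0.10:
        return "significant at the .1 level only"
    return "not statistically significant"

# Hypothetical test statistics chosen to land in each category.
for z in (1.96, 1.70, 0.84):
    p = two_sided_p(z)
    print(f"z = {z:.2f} -> p = {p:.3f}: {verdict(p)}")
```

The middle case is the situation described for the combined-schools result: short of the .05 convention, but inside the .1 level.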
Putting these two things together, in an evaluation we are looking for effect sizes that are both positive and statistically significant.