Datapalooza in the age of EdTech
I've written about this video a few times over the past few years, but feel compelled to trot it out once again because of the impending deluge of EdTech investment in our public schools. EdTech is a dream come true for data miners like Knewton, which promises "the most advanced approach to personalized learning." This post includes a transcript featuring frightening quotes like this gem: "Well education happens to be today the world's most data minable industry by far and it's not even close."
While the presentation looks as if it could be a comedy spoof from a B-list movie, it's not. It's real, and it's horrific. This video has been around since 2012, and during that time it hasn't lost an iota of its "theater of the absurd" quality.
Check out Knewton CEO Jose Ferreira talking about personalized learning and its predictive capabilities. Predictive analytics is all the rage in the EdTech world.
Below is the full transcript.
Knewton - Education Datapalooza
Office of Ed Tech
So the human race is about to enter a totally data mined existence and it's going to
be really fun to watch. It's going to be one of those things where our grandkids
are going to tell our kids I can't believe you grew up in a world like that just the
way our kids complained that we went to record stores. When Tom Cruise walks
through the mall in Minority Report and the ad beams right to his eyes and says
"Hey Mr. Cruise, you should go on that Caribbean vacation you've been
thinking about." I know some entrepreneurs who work on that
technology right now. And I'm still waiting for the day when my refrigerator's
going to know when I'm running out of milk and it's ordered for me automatically
on Fast Track. I think that day's coming in a few years it's not far off. The world
in 30 years is going to be unrecognizably data mined. So what does that mean for education?
Well education happens to be today the world's most data minable industry
by far and it's not even close. So maybe one day healthcare will be up there
when they have little nanobots that are in your bloodstream that are doing real
time analysis, but until then it's not close, education beats everything else hands
down. So let's look at other big data industries.
The really big data industries in the world right now are not surprisingly on the internet
because that's where it's easy to grab the data and that's also where the congregation
of talent that understands data is. So well let's just look at it by the
numbers because the name of the game is Data Per User. So one of the things that fakes
us out about data and education is that education, because it's so big, it's like the
fourth biggest industry in the world, produces an incredible quantity of data. But
data that just produces one or two points per user per day is not really all
that valuable to an individual user. It might be valuable to like a school district administrator,
but maybe not even then. So let's just compare. Netflix and Amazon
get in the ones of data points per user per day. Google and Facebook get in the tens
of data points per user per day. So you do 10 minutes of messing around in Google
you produce about a dozen data points for Google. Okay great. So Knewton today
gets five to ten million actionable data per student per day.
Now we do that because we get people, if you can believe it, to tag every single
sentence of their content so publishers, we have a large publishing partnership
with Pearson, and they tag all their content. And we're an open standard so
anyone can tag us. If you tag all your content and you do it down to the atomic
concept level, down to the sentence, down to the clause, you unlock an incredible
amount of trapped hidden data. Why do you do that? Well if you use programmatic
taxonomy models and item response theory and I think at the bottom,
we haven't given that a name yet, what you figure out is everything in education
is correlated to everything else down to the concept. Now this is where education's
different from search and social networking. If someone tagged every
single line, every single sentence of all the world's web pages for Google, or every
single line of dialogue from Netflix, which no one's done, but even if they had
there are not really a whole lot of interesting correlations there.
Everything in education is correlated to everything else. Every single concept is
correlated in a predictable way to everything else using psychometrics right. So if
you do 10 minutes of work in Google you produce a dozen data points for Google.
Because everything that we do is tagged at such a granular level, if you do 10
minutes of work for Knewton you cascade out lots and lots of other data, and
here's why. When you took the SAT there might be 40 different concepts about
equilateral triangles that are tested on all the SATs ever given in any one year.
But you didn't get all 40 questions, you got two questions on equilateral triangles
because they figure if you're in the top 14th percentile on those two questions,
13th percentile on this one and 15th on that one, if you're in the top 14th
percentile on those two questions on equilateral triangles there's a 98%
chance that you're in the top 14th percentile on every concept in
equilateral triangles. And there's a 96% chance that you're in the top 15th
percentile on all triangle concepts, 3-4-5, 30-60-90, isosceles,
etc., etc. You did a little bit of work for Knewton and
we used just the established science of psychometrics to cascade out hundreds of other
data. So we can produce incredible quantities of data per user per
day. It's really, really hard to get that, but if you can get all that tagging done -
and that's one of our tags is on - that's a
small part of our overall taxonomy, that's just part of one course and we have
dozens of taxonomies, then you can do this. What you can do with the data if you
actually do all that work is you can figure out exactly what students know and
how well they know it. You can figure it out down to the percentile versus the rest
of the population. So Knewton students today, we have about 180,000
right now, by December it'll be 650,000, early next year it'll be in the millions
and the next year it'll be closer to 10 million, and that's just through our Pearson
partnership. So for every one of the students we can figure out within a few
hours what they're strong at and what they're weak at, at the beginning of
the course. So we can produce a unique syllabus for each student each day, literally
unique. There's not enough time in the universe for any two students to have
the same syllabus on any one day, that's how many there are.
So it's optimized for each kid down to the atomic concept. And then we can figure
out things like well here's your homework tomorrow night, you're going to
struggle with that homework or you're going to fail it, because there are concepts in that
homework for which we know you haven't mastered the previous concepts that
build up to them. Or there's concepts in that homework that [inaudible 04:53] very
highly concepts always have trouble with. So we know you're going to fail, we
know it in advance and we can prevent it in advance. We go grab some content
from somewhere else in the portfolio and we're going to seamlessly blend that into your
homework tonight. So every kid gets a perfectly optimized textbook,
except it's also video and other rich media dynamically generated in real time.
And it also uses the combined data power of the entire network. So here's what
I mean by that, like I said next year we'll have close to 10 million students, a
few years from now we'll have 100 million. The 100 million and first shows up to learn
something like rules of exponents or subject-verb agreement, whatever. We take the
combined data power of all hundred million to figure out exactly how
to teach every concept to each kid. So the 100 million and first shows up to learn
the rules of exponents, great let's go find a group of people who are psychometrically
equivalent to that kid. They learn the same ways, they have the same learning
style, they know the same stuff, because Knewton can figure out things like
you learn math best in the morning between 8:40 and 9:13 am. You learn science
best in 42-minute bite sizes; at the 44-minute mark you click right [inaudible 05:47],
you start missing questions you would normally get right.
You learn social studies best with video clips or 22% video to 78% text or
whatever your optimal cocktail. We can tell when we should return content to
you for optimal retention. We literally know everything about what you know and
how you learn best, everything, because we have five orders of magnitude
more data about you than Google has. We literally have more data about our
students than any company has about anybody else about anything, and it's not
even close. That's why we can do all that stuff right.
So then what we can do is take that profile of the 100 million kids, next year it'll be 10
million. We can go figure out okay who's exactly like that kid? Whose learning
styles up and down the line are just the same? Who knew the same stuff at the
same level of mastery when they had [inaudible 06:24]? Great. Statistically
speaking it has to be the case that some 5% or 10% through shared bad luck did
the absolute wrong thing for themselves without knowing it.
They did questions that were too hard, they got discouraged, they bounced. They
accessed text when they should have gotten the video, whatever. It also has to be a fact
of statistics that through pure blind luck, some top 1% did the absolute perfect thing
for themselves without realizing it. And we go take the whole combined data
power of that network of millions, soon to be tens of millions, eventually it'll be
hundreds of millions of people. And for every single concept that your child learns,
2000 concepts in a particular semester-long math course, for every single
atomic concept we take the combined data power of that vast network and use it
to find the perfect plan forward for that kid for that concept.
So that's what we do right now. Let me give you a couple of examples. This is one
student. There's a few hundred learning clusters there, there's a few tens of
thousands of atomic learning objects there. That's one student's path, this is a
real student in a US college right now. And you'll see that each student has a
totally different path. Some students have short paths, some have long paths, in
this particular course there were students who finished it in 14 days, there were
students who finished it in two semesters. This is a course at ASU they had to change
their semester structure to a modular semester structure because we were suddenly
telling them things like if you give this woman here the final right now she'll
get an A, it's only 14 days into the course. I promise you she'll get an A. You
can keep her in that seat if you want, and that's what we've always done now we don't
have to. So let's show you this. This is 150 students in one class and they
kind of all look like fleas but each one is an
individual learning path. Notice that some of them are going really fast, some of
them are going really slow, and then they'll all kind of speed up when the test
comes. It's kind of like organic and so those different color coded things are like
concept clusters. Like some test obviously just happened, that's why they all
started working. And you can look at some of those students and think boy that
poor schmuck is really in a lot of trouble because they're going too slowly.
So where we think we're going with this obviously it's in market right now. We're
going to be in K-12 starting next year and it's an open platform anyone can plug it
in and use it by APIs. And where we think we're going with the data side of it,
which is the really fun stuff for today, is we think within a few years we'll be able
to start predicting grade performance. So teachers grade persistently year in and
year out, if that teacher grades consistently we can match up the student profiles
down to the atomic concept level versus grade performance.
We can tell you you're on track to get a B- in this course right now. Either that or
if your teacher gets totally inconsistent we can't tell you that, but that's another
problem. If your teacher grades consistently we can tell you what your grade's
going to be based on what you know and how fast you're learning it. But if you do
another 30 minutes a day for three days a week you can get it up to an A-. We can
tell you things like that. We're really excited to correlate with other people's
datasets by open API things like, something we've talked about as kind of a joke
but it really should work, is like the food diary. You tell us what you had for
breakfast every morning at the beginning of the semester, by the end of the
semester we should be able to tell you what you had for breakfast because you
always do better on the days you have scrambled eggs or whatever. And more
importantly we should be able to tell you what you should have for breakfast.
So the power of data when you unlock millions of data points per user per day
you can accomplish things that people aren't even conceiving of right now. But
that world is coming we're trying to bring it to you and we're going to be an open
system to allow anyone to just plug that data, take it out, and then plug it back in.
Thanks very much.
© 2012 OfficeOfEdTech