Data vs Privacy
This is the third year in a row that I've chosen "data" as one of the "top trends" in ed-tech. (See 2011 and 2012.) If you're looking for a sunnier view of data in education, read those. 2013, in my opinion, was pretty grim.
Edward Snowden: Not TIME Person of the Year
TIME Magazine announced its Person of the Year this morning: Pope Francis. He seems like a pretty swell guy, don’t get me wrong. But many folks have argued it’s a dull, even cowardly, decision by the magazine. (Of course, its other recent selections include Barack Obama, Vladimir Putin, Ben Bernanke, and Mark Zuckerberg. TIME is not really known for bold choices, let’s be honest.)
The appropriate choice for Person of the Year, some argue, would be Edward Snowden, who, along with the journalist Glenn Greenwald, is certainly responsible for the most important story of the year: revelations about widespread government surveillance by the National Security Agency – the collection of massive amounts of data from telephone and technology companies. “Email, video and voice chat, videos, photos, voice-over-IP chats, file transfers, social networking details, and more” siphoned from Apple, Google, Facebook, Microsoft, Yahoo, Skype, AOL, and others. Encryption undermined. Malware spread. Our social connections mapped. Warrantless spying by governments (not only the US’s) – not just on suspected terrorists, but on all of us.
Interestingly, I heard very little outcry from ed-tech proponents about the troubling implications of NSA surveillance via the technologies being pushed in schools, or about the impact that this might have on students’ privacy – hardly a peep from those who have gone “all in” with Google Apps or iPads or YouTube for Schools or Skype in the Classroom or Facebook.
That’s not to say that there weren’t any red flags raised this year about data collection, data mining, and privacy. But often, these were concerns about corporations’ use of student data and not governments’. The Snowden revelations should serve as a reminder that the two are inseparable.
And perhaps some educators’ excitement about tools like Google Glass should serve to remind us too that just as an uncritical embrace of “ooo! shiny!” runs deep in some ed-tech circles, a culture of surveillance runs deep in schools as well.
Surveillance and Ed-Tech
Google Glass became available to a small number of “explorers” this year – including a handful of educators – who paid $1,500 for the privilege of testing the wearable computing devices. Glass has been hailed by some as a “cybernetic sensory organ.” But it is a sensory organ that delivers its data to a corporate entity (a corporate entity that the government has tapped, so we’ve learned). This extraction of personal data – for the sake of profit or improved marketing or better algorithms – has spurred very little critical response among ed-tech proponents when it comes to the adoption of software and hardware. Very few questions get asked about data: who owns education data, who analyzes it, who uses it. (More on that below.)
As the ACLU’s Christopher Soghoian recently tweeted, “Google built one of the largest surveillance networks in the world. Of course the NSA was going to find a way to use it too.” I might add, “Of course schools will try to use it as well.”
For many educators enthralled by Google Glass, it’s the unobtrusive, hands-free camera that they tout most frequently. They shrug off privacy concerns, saying that students already have cameras in the classroom via their various computing devices. But there are important distinctions here: Glass’s photos and metadata are automatically shared with Google (and, thanks to Google’s Terms of Service, users’ data is consolidated across all Google services); it is much easier to take photos surreptitiously with Glass; and Glass offers “always on” surveillance. Surveillance and sousveillance practices foster coercive and exploitative learning spaces. As Jeremy Bentham might argue, that’s a feature, not a bug.
Interestingly, having more surveillance cameras in the classroom is one of the goals laid out by Bill Gates this year as part of the Gates Foundation’s efforts to implement a $5 billion teacher monitoring and measurement system – one that includes installing cameras in every classroom in the US. (The Gates Foundation is, of course, best known for its funding of healthcare and education initiatives. But it also invested this year in the security company G4S. Again with the obligatory Bentham nod, I guess, eh?)
Other surveillance efforts undertaken (or proposed) by schools and school districts this year:
Glendale Unified School District hired Geo Listening to monitor students on social media – all their public social media posts, even those made off-campus and after school hours. “For safety,” the district insists (the same reason, of course, that the NSA gives for monitoring our data too).
Schools in West Cheshire (UK) and Longmont (Colorado) used RFID chips and GPS tracking systems in students’ IDs and bus passes in order to track their locations. A student in San Antonio, Texas, who was suspended for refusing to wear an RFID-enabled ID, sued her school, claiming that the requirement violated her religious beliefs, but she lost the case. The district later dropped the RFID program, finding it uneconomical.
The universities of Sunderland and Ulster installed biometric monitoring systems on their satellite campuses to track whether students – international students, not British ones – are attending lectures.
Alabama University announced that it would use drones to monitor students on campus. Chicago Alderman George Cardenas suggested that the city deploy drones to monitor the city’s “Safe Passage” routes used by children to get to and from school.
It’s the normalization of military and police technology, you might argue, disguised as consumer and ed-tech: drones delivering Amazon packages, drones delivering textbooks, fingerprint scanners on Apple devices, any number of surveillance accessories and practices that parents can use on their children.
Don’t Worry. It’s “Just Metadata”
In the early days of the Snowden-NSA story, President Obama tried to reassure people that the government wasn’t actually reading their email or listening to their phone calls. “Just the metadata,” he insisted.
But analyzing metadata – even without looking at the explicit content of a message – is incredibly revealing. Who you emailed. How often. The IP address from which a website was accessed. Who you called. How long you talked. The geolocation of your cellphone. The patterns that all of these form, particularly when gathered at scale. Metadata is the message, argues Matt Blaze in Wired:
At such a scale, people’s intuition about the relative invasiveness of content and metadata starts to fail them. Phone records can actually be more revealing than content when someone has as many records and as complete a set of them as the NSA does.
Voice content is hard to process. It ultimately requires at least some human analysis, and that inherently limits the scale at which it can be used, no matter how much raw material the NSA might have. Intelligence agencies are famously backlogged in translating and analyzing even high-priority intercepts. More content only makes the problem worse.
Metadata, on the other hand, is ideally suited to automated analysis by computer. Having more of it just makes the analysis more accurate, easier, and better. So while the NSA quickly drowns in data with more voice content, it just builds up a clearer and more complete picture of us with more metadata.
But that’s not the most revealing thing about metadata, or the only reason to be concerned about the privacy implications of a massive call records database. Metadata ultimately exposes something deeper, far more than what a target is talking about.
Metadata is our context. And that can reveal far more about us — both individually and as groups — than the words we speak.
Such is the promise of “big data” and analytics at scale. Such is the promise of big educational data and learning analytics at scale.
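Blaze's point about automated analysis is easy to see in miniature. The Python sketch below assumes a hypothetical call-record format (caller, callee, start time, duration) and uses invented toy data. It illustrates the general idea only and does not describe any agency's or vendor's actual methods. The code never touches the content of a conversation, yet it still reveals who someone talks to, for how long, and at what hours.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical metadata fields: who called whom, when, and for how long.
    # Note that there is no field for what was actually said.
    caller: str
    callee: str
    start: datetime
    duration_min: int


def contact_profile(records, person):
    """Count calls, total minutes, and late-night calls per contact for one person."""
    calls, minutes, late_night = Counter(), Counter(), Counter()
    for r in records:
        if person not in (r.caller, r.callee):
            continue
        other = r.callee if r.caller == person else r.caller
        calls[other] += 1
        minutes[other] += r.duration_min
        if r.start.hour >= 23 or r.start.hour < 5:  # between 11pm and 5am
            late_night[other] += 1
    return calls, minutes, late_night


def shared_contacts(records, a, b):
    """Contacts two people have in common: the raw material of a social graph."""
    return set(contact_profile(records, a)[0]) & set(contact_profile(records, b)[0])


# Toy data, invented for illustration.
RECORDS = [
    CallRecord("alice", "clinic", datetime(2013, 6, 3, 9, 15), 12),
    CallRecord("alice", "clinic", datetime(2013, 6, 10, 9, 5), 8),
    CallRecord("alice", "bob", datetime(2013, 6, 10, 23, 40), 45),
    CallRecord("bob", "alice", datetime(2013, 6, 11, 0, 30), 30),
    CallRecord("carol", "clinic", datetime(2013, 6, 12, 14, 0), 5),
]

calls, minutes, late_night = contact_profile(RECORDS, "alice")
```

On this toy data, Alice's profile shows two late-night calls with Bob totaling 75 minutes and repeated morning calls to a clinic. `shared_contacts` also links her to Carol through that same clinic. None of this required reading a single word of either conversation, and with more records the picture only gets sharper.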
What Are and Who Owns Education Data?
Many people still consider “education data” to be simply what we’ve thought of as an individual student’s educational record: name, home address, grade level, dates of attendance, final grade – the sort of stuff that appears on a report card. But thanks in no small part to our increasing use of technology, education data is so much more – so much more “metadata.”
Students’ search engine history. Learning management system log-ins and the duration of their LMS sessions. Blog and forum comment history. Internet usage while on campus. Geolocation. Emails sent and received. Social media profiles, how often they’re updated, and their “influence.” Pages read in digital textbooks. Videos watched on Coursera or Khan Academy or Udacity, along with whether and where students paused them. Exercises completed on any of these platforms. Keystrokes and mouse clicks logged. (That last item, along with biometric data, is how Coursera said it plans to verify students’ identities as part of its “signature track.”)
Again and again and again this year I’ve tried to ask “who owns education data?” Who controls it? Who sells it? Who analyzes it? To what end? Who gets to learn from it? (The answer in almost all cases is not “the student.”)
A brief look at some of what we’ve learned from “the data” this year (granted, much of this from pretty “traditional” sources):
College enrollment is down; the US News & World Report’s college rankings are still worth ignoring; teens do pay attention to privacy and mobile apps; SAT scores remain flat; the majority of students in public schools in the American South and West are now low income; Division I public universities’ spending on athletics is growing faster than their spending on academics; state universities are giving a growing share of financial aid support to wealthier students; 95% of teens use the Internet; most MOOCs have a completion rate of around 13%; teacher job satisfaction is at a 25-year low; per-student public education spending in the US dropped for the first time in almost four decades; parents still think libraries are important no matter what crap TechCrunch tries to argue; 40 states have suspected cheating on K–12 standardized tests; PISA scores can probably confirm whatever education narrative you want to tell; the same probably goes for NAEP scores; the elite Hunter College High School is the saddest place in New York (based on a sentiment analysis of the city’s Tweets, at least); American adults don’t do well on OECD math tests; and journalists love to misconstrue academic research when it can provide them with a titillating headline like “Tenured Professors Make Worse Teachers.” Maybe we’ll do better in 2014 when data guru Nate Silver, who quit his gig at The New York Times this year, launches the new FiveThirtyEight blog. He did suggest in a Reddit Ask Me Anything this year that he might write more about education data (and hopefully that doesn’t just mean writing about college sports stats, now that he’s working for ESPN).
Public Data / Personal Data
One of the great challenges we face with collecting and analyzing education data is that it often exists in a murky and uncomfortable overlap between the public and the personal. When we push to open data from the former, we must weigh the implications for the latter – we must weigh the ethics and consider the politics of our data initiatives. Open data, while it claims to promote more governmental transparency, is not apolitical.
We can see this in the public records requests for emails relating to Facebook CEO Mark Zuckerberg’s $100 million donation to Newark, New Jersey, for example, and for emails from former Indiana and Florida school chief Tony Bennett, which revealed his move to change the grade of a campaign donor’s school.
We can see this too in the ongoing attempts by many local newspapers to print teachers’ VAM (value-added model) scores, despite widespread recognition that these models are quite flawed: The LA Times, The Florida Times-Union, The Boston Globe, and The Cleveland Plain Dealer have all requested that districts provide them with teachers’ names and scores (sometimes suing when districts refused) so they could publish them.
And we can see this – and we’ll see more of it in 2014, I’m sure – in the call by President Obama to “enlist entrepreneurs and technology leaders with a ‘Datapalooza’ to catalyze new private-sector tools, services, and apps to help students evaluate and select colleges.” Collecting, measuring, analyzing data – “data-driven decision-making” – is a cornerstone of the Obama Administration’s education policies at both the K–12 and higher education level.
Privacy, Data, and the Law
OK, sure. The NSA’s surveillance program might have made much of this moot, but there are laws that purport to protect students’ and children’s data. Some legal and legislative updates this year:
A revised COPPA (the Children’s Online Privacy Protection Act) went into effect on July 1. The update clearly reflects lobbying efforts by tech companies: contextual advertising is now exempt (data can be collected from minors without parental permission for this purpose). But oh! Lest we think that the FTC doesn’t care a whit about kids’ privacy (snicker), it did fine Path $800,000 this year for letting kids under 13 sign up.
In November, Senators Edward Markey (D-MA) and Mark Kirk (R-IL) and Representatives Joe Barton (R-TX) and Bobby Rush (D-IL) re-introduced their Do Not Track Kids Act, an attempt to extend COPPA provisions to make it tougher to disclose kids’ data, particularly geolocation data, and to create an “eraser button” for kids’ data.
Speaking of erasers, California passed a bill that would do just that: require Web companies (starting in 2015) to remove content that a minor in the state has posted, should that minor request it. (A good idea in theory, perhaps, but there are lots of problems with how this will actually work.)
A bill was proposed in Massachusetts that would, according to Wired, “ban companies that provide cloud computing services from processing student data for commercial purposes.” Turns out the bill was backed by Microsoft in an attempt to unseat Google Apps from schools in the state. Like a lot of recent things Microsoft, the bill went nowhere.
The Atlanta Public Schools cheating scandal started to wind its way through the courts this year, with a former elementary teacher pleading guilty to obstruction of justice and an administrator being acquitted of witness tampering.
And lest one think legislation about student data has all been written and submitted, Education Week suggests that this will be a major push of the corporate lobbying group the American Legislative Exchange Council (ALEC) in 2014. It will push legislation that would require states to have a chief privacy officer to monitor student data collection. (There are other proposals out there regarding CPOs, incidentally, ones that are more privacy-focused.)
Data as “The New Oil”
Privacy concerns and legal protections aside, lots of people are betting on “big data” to “fix” education, to offer insights into how people learn, and/or to make a neat profit.
McKinsey issued a report in October arguing that opening up education data could have a potential value of $890 billion to $1.18 trillion. But Common Sense Media cautioned against doing so at the expense of children’s privacy.
If data really is “the new oil,” then we should probably pay attention to data spills – that is, data leaks. FSU admitted this year that it had leaked data on more than 47,000 teachers-in-training. The personal data of some 72,000 past and present employees of the University of Delaware was leaked. One security company said that these sorts of leaks were facilitated, in no small part, by the fact that a quarter of higher ed institutions transmit sensitive data without encryption.
If data is “the new oil” we should probably think about the security of our mining practices. The New York Times questioned the data security of Edmodo in a story this summer, for example, prompting the company to switch on SSL for all users.
So once mined and drilled and extracted and processed, what does all this data give us? “Adaptive” technology! “Personalized” software! Algorithms! Recommendations! Analytics! Insights!
Oh, and if you’re a company that features “data” in your slide deck for investors, perhaps a nice chunk of funding:
Panorama raised $4 million from Mark Zuckerberg’s Startup: Education fund (the startup offers a survey tool to schools). Clever raised $10 million to standardize APIs for school information systems and “unlock and share” student data. Junyo, which pivoted last year away from selling dashboards to schools to selling schools’ data to other companies, acquired a database of K–12 grants – “market intelligence.” Pearson acquired Learning Catalytics, a learning analytics company co-founded by Harvard professor Eric Mazur. Kidaptive raised $10.1 million and launched its adaptive learning tools, including an iPad app so parents can track their kids’ development. KnowRe raised $1.4 million for its “adaptive learning” platform. McGraw-Hill acquired a 20% equity stake in Area9, which is helping it build out its “adaptive learning” platform. Desire2Learn acquired Degree Compass, a tool that offers “personalized” course recommendations to students. Desire2Learn also acquired Knowillage for its “adaptive learning” technology. Knewton expanded its “adaptive learning” platform, partnering with Houghton Mifflin Harcourt and Macmillan.
A couple of important hiccups in the mining process this year:
Course Signals / Error Signals
Purdue University’s Course Signals is probably one of the best-known products in the relatively new field of learning analytics. The software uses predictive modeling to give students a red, yellow, or green “traffic light,” signaling whether they are on track to pass a class or at risk of failing it. It’s been shown to be quite good at helping students improve their grades. Not all courses at Purdue use Course Signals (it’s integrated into the LMS), but this fall the university issued a press release claiming that the software has a long-term effect on students and “boosts graduation rate 21 percent.”
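The post describes Course Signals only at a high level. Here is a hedged, minimal Python sketch of what "predictive modeling with a traffic-light output" can mean. It uses a logistic-style risk score built from two hypothetical inputs: the student's current grade and their LMS logins relative to the class median. The inputs, weights, and thresholds are all invented for illustration. This is not Purdue's algorithm.

```python
import math


def failure_risk(grade_pct, lms_logins, class_median_logins,
                 bias=6.0, grade_weight=-0.1, effort_weight=-1.5):
    """Toy logistic model: estimated probability that a student fails the course.

    The inputs and the default weights are hypothetical, not Purdue's model.
    """
    # "Effort" measured as LMS activity relative to classmates (guard against zero).
    effort_ratio = lms_logins / max(class_median_logins, 1)
    z = bias + grade_weight * grade_pct + effort_weight * effort_ratio
    return 1 / (1 + math.exp(-z))  # logistic function maps z to a probability


def traffic_light(risk, red_at=0.6, yellow_at=0.3):
    """Map a risk probability onto a red/yellow/green signal (thresholds hypothetical)."""
    if risk >= red_at:
        return "red"
    if risk >= yellow_at:
        return "yellow"
    return "green"


for grade, logins in [(90, 20), (55, 10), (40, 4)]:
    risk = failure_risk(grade, logins, class_median_logins=20)
    print(grade, logins, f"{risk:.2f}", traffic_light(risk))
```

With these made-up weights, a student at 90% with typical LMS activity gets green. A student at 55% who logs in half as often as the median gets yellow, and one at 40% who logs in a fifth as often gets red. Each of these choices is a modeling decision someone has to make: counting logins as "effort," weighting effort against grades, and deciding where the red line sits.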
Mike Caulfield was one of the first to suggest that the math “doesn’t add up” and that the experiment might suffer from a “reverse-causality” problem – something that led to inquiries by Michael Feldstein, Alfred Essa, and Doug Clow (among others), along with questions about the ethics of the university and even the future of the field of learning analytics. (A lengthy “explainer” by Caulfield can be found on the e-Literate blog.)
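A toy simulation can make the "reverse-causality" worry concrete. The Python sketch below uses made-up parameters and builds in the assumption that the early-warning tool has no effect at all. Every student faces the same dropout risk each semester, and each course independently has some chance of using Signals. Students who stay enrolled longer simply take more courses, so they are more likely to land in the "two or more Signals courses" group. That group therefore shows a higher graduation rate even though the tool did nothing. This is a hedged illustration of the statistical trap, not a reanalysis of Purdue's data.

```python
import random


def simulate_graduation(n_students=20_000, semesters=8, courses_per_semester=4,
                        p_signals_course=0.1, p_dropout=0.08, seed=2013):
    """Graduation rate grouped by number of Signals courses taken.

    All parameters are made up. Crucially, the tool has NO effect here:
    dropout risk never depends on how many Signals courses a student took.
    """
    rng = random.Random(seed)
    totals = {"0": 0, "1": 0, "2+": 0}
    graduates = {"0": 0, "1": 0, "2+": 0}
    for _ in range(n_students):
        signals_courses = 0
        graduated = True
        for semester in range(semesters):
            # Same dropout risk for every student, every semester after the first.
            if semester > 0 and rng.random() < p_dropout:
                graduated = False
                break
            # Each course this semester independently may or may not use Signals.
            signals_courses += sum(rng.random() < p_signals_course
                                   for _ in range(courses_per_semester))
        group = "0" if signals_courses == 0 else "1" if signals_courses == 1 else "2+"
        totals[group] += 1
        graduates[group] += int(graduated)
    return {g: graduates[g] / totals[g] for g in totals if totals[g]}


rates = simulate_graduation()
for group, rate in rates.items():
    print(f"{group} Signals courses: {rate:.0%} graduated")
```

Comparing the "0" and "2+" groups shows a large gap in graduation rates. The model produces that gap purely through selection: staying enrolled drives exposure to the tool, not the other way around.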
While this might sound like a minor glitch in research or PR, it’s a pretty significant stumble. As Feldstein argues,
This is a problem that goes well beyond Course Signals itself for several reasons. First, both Desire2Learn and Blackboard have modeled their own retention early warning systems after Purdue’s work. For that matter, I have praised Course Signals up and down and criticized these companies for not modeling their products more closely on that work, largely based on the results of the effectiveness studies. So we don’t know what we thought we knew about effective early warning systems. The fact that the research results appear to be spurious does not mean that systems like Course Signals has no value, but it does mean that we don’t have the proof that we thought we had of their value.
More generally, we need to work much harder as a community to critically evaluate effectiveness study results. Big decisions are being made based on this research. Products are being designed and bought. Grants are being awarded. Laws are starting to be written. I believe strongly in effectiveness research, but I also believe strongly that effectiveness research is hard. The Purdue results have been around for quite a while now. It is disturbing that they are only now getting critical examination.
While Course Signals has been widely praised (up until very recently at least) for its effective use of data to improve student outcomes, inBloom has never really been successful at convincing the education sector that it would be a good, useful, or even plausible project.
Initially called the Shared Learning Collaborative, the non-profit has received $100 million from the Gates Foundation, the Carnegie Corporation, and others to build a student data infrastructure for public schools – one that would improve both the storage of student information and the ease with which third-party developers can access it.
The SLC rebranded in February of this year to inBloom (an indication, I reckon, that none of those folks know the lyrics to the Nirvana song “In Bloom” – the part about “sell the kids for food.” Anyway…). It had a major presence at SXSWedu in March for its official launch: an inBloom lounge and an inBloom session track (in addition to the data track) and an inBloom party and an inBloom hackathon and lots of folks in inBloom t-shirts and a Gates Foundation party and a Bill Gates keynote. You get the picture.
At that launch at SXSWedu, inBloom boasted 9 state partners (Delaware, Massachusetts, Colorado, Louisiana, New York, Illinois, North Carolina, Georgia, and Kentucky). Many companies said they were on board too, with plans to use and integrate inBloom data, including Amazon, Clever, Compass Learning, Dell, eScholar, Goalbook, Kickboard, LearnSprout, Promethean, Scholastic, and Schoology. But if you visit the partner pages on the inBloom site today, you can see a lot of those names are missing. inBloom has been abandoned right and left.
Louisiana pulled out in April. North Carolina pulled out in May. That same month, Kentucky, Georgia, and Delaware told Reuters that they’d never actually made a commitment to use the platform. Massachusetts said it was on the fence and hadn’t shared any student data with inBloom. In November, the Jefferson County School Board (in Colorado) voted to scrap their partnership with inBloom, and the Chicago Public Schools opted to use their own state-run database instead. New York remains committed to the project, although a lawsuit was recently filed to block it from sharing data with the non-profit.
Much like the roll-out of the Common Core State Standards, opposition to inBloom comes from a variety of perspectives and politics – those fearing a “big brother” government; those fearing a Bill Gates and Rupert Murdoch-led data grab (Wireless Generation, part of Murdoch’s News Corp, built part of the inBloom infrastructure); those fearing students’ personal data will be used for nefarious purposes; those fearing students’ personal data will be used for profit.
inBloom was never able to assuage these fears. It was never able to successfully articulate why an updated data infrastructure was necessary for public schools, often sidestepping inquiries about its plans for student data by pushing the decisions and the liabilities back onto states and districts.
Of course, the collection of student data isn’t new. The storage of student data isn’t new. The sharing of student data with third-party vendors isn’t new. There are several other data models (CEDS, SIF, Ed-Fi) that facilitate this.
But inBloom, with its connections to the controversial figures of Bill Gates, Joel Klein, and Rupert Murdoch and with its rollout timed in parallel with the controversial Common Core, became this year a symbol to many of technology’s role in the privatization of public education. It’s unclear how inBloom, or more broadly speaking ed-tech, will be affected by this association.
Data, Privacy, and the Future of Ed-Tech
Facebook CEO Mark Zuckerberg famously declared privacy “dead” back in 2010. This year, incidentally, he bought the four houses adjacent to his after hearing that a developer had plans to market a neighboring property as being “next door to Mark Zuckerberg.”
Nevertheless, you hear it a lot in technology circles – “privacy is dead” – often uttered by those with a stake in our handing over increasing amounts of personal data without question.
To see privacy as something that will inevitably “die,” to view it as a monolithic notion, is quite ahistorical. To do so ignores the varied cultural and social expectations we have about privacy today. It ignores how power relations have always shaped who has rights and access to autonomy, self-determination, and solitude. It ignores the ongoing resistance to surveillance (by teens, for example, by activists, and by librarians).
Nonetheless, as the adoption of ed-tech continues (and with it, the increasing amount of data created – intentionally or unintentionally, as content or as “exhaust”), there are incredibly important discussions to be had about data and privacy:
- What role will predictive modeling and predictive policing have in education? Who will be marked as “deviant”? Why? Against whom will data discriminate?
- What role does privacy play (or, phrased differently, what role does a respite from surveillance play) in a child’s development?
- How can we foster agency and experimentation in a world of algorithms?
- What assumptions go into our algorithms and models? Who builds them? Are they transparent? (After all, data is not objective.)
- What can we really learn from big data in education? Bill Gates says big data will “save American schools.” Really? Save from what? For whom? Or is all this data hype just bullshit?
- Who owns education data?
- How well do schools protect student data, particularly as they adopt more and more cloud-based tools?
- What happens to our democracy if we give up our privacy and surrender our data to tech companies and to the federal government? What role will education play in resisting or acquiescing to these institutions’ demands?