Advocacy of existing open standards was a core purpose of the original DataPortability workgroup, as evidenced in numerous discussions.
A post by Dave Winer was linked, in which he said:
There are thorny issues here, but we want these companies to give up control of our information, and we don't want them to be overly scared of public opinion as they do it.
And this is hardly the most important giving up of control. Most important, I want them to give me control of my data.
So before we overly politicize the leading edge of technology, let's get together on what actually does and doesn't serve the user's interest.
I want Netflix and Yahoo to give me an XML version of my movie ratings, for me to decide what to do with. I've been asking for this for a couple of years, I still don't have it. This is information I created. I want to keep a copy. I want to make sure that Netflix knows about all my Yahoo ratings and vice versa. I'd like to give a copy to Facebook (assuming they agree to not disclose it) and maybe to Amazon, so they can recommend products I might want to purchase (again keeping it to themselves). I want to begin a negotiation with various vendors, where I give them something of value, and they give me back something of value.
The leaders of Silicon Valley begrudgingly gave up their view of us as couch potatoes, now they think of us as generators of content they can put ads on (and pay us nothing). We still need to work on that respect thing. When I have an XML file here on my local hard drive that they want they'll make me a better offer. Two companies that are not as shiny as they used to be, Netflix and Yahoo, have the power to take a leadership role in a what will be the next revolution of the Internet, but neither of them are moving.
That's something worth fighting for, because once one vendor gives us power over our data, the dominoes will start falling, I bet it'll happen very quickly.
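As a rough illustration of the kind of file Winer is asking for, the sketch below writes a list of movie ratings to XML and reads it back. The element and attribute names (`ratings`, `rating`, `title`, `stars`) are invented for this example; nothing here reflects a format that Netflix or Yahoo is known to have offered.

```python
import xml.etree.ElementTree as ET


def ratings_to_xml(user, ratings):
    """Serialise (title, stars) pairs into a small XML document.

    The schema is hypothetical and exists only for illustration.
    """
    root = ET.Element("ratings", {"user": user})
    for title, stars in ratings:
        ET.SubElement(root, "rating", {"title": title, "stars": str(stars)})
    return ET.tostring(root, encoding="unicode")


def ratings_from_xml(text):
    """Parse the document produced by ratings_to_xml back into pairs."""
    root = ET.fromstring(text)
    return [(r.get("title"), int(r.get("stars"))) for r in root.findall("rating")]


my_ratings = [("Blade Runner", 5), ("Waterworld", 2)]
exported = ratings_to_xml("dave", my_ratings)
restored = ratings_from_xml(exported)
```

Because the file is plain XML held by the user, the same export could in principle be handed to any other vendor that agrees to read it, which is the negotiation Winer describes.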
Danny Ayers made a distinction about the standards being used:
The only difference I can see here is whether the data is portable and open (i.e. anyone can use it) or portable and access-controlled (at a personal or organisational level)...I guess what I'm trying to say is that with the right systems in place for transparent communication, there's no clear separation between what we call portable data and any other data - just different kinds of access control. Standards like OpenID etc can help get those systems in place.
Chris Saad first stated the mission of DataPortability as follows:
Standardized Data Portability is the next great frontier for the web.
As users, our identity, photos, videos and other forms of personal
data should be discoverable by, and shared between our chosen tools or
vendors. We need a "DHCP" for Identity. A distributed File System for
data. This page will list the standards and contributors who are
making it happen.
The Data Portability Workgroup is working towards choosing,
contextualizing and evangelizing a set of open standards and protocols
for data interchange and a reference design for a distributed data
discovery, import, export and sync.
...whilst fellow workgroup member Chris Messina put it as:
"We want open and non-proprietary formats, methods and protocols for identity discovery, data import and export towards sovereignty over the content and media that we create and the profiles and relationships that we maintain."
Terrell Russell wrote an interesting blog post about his vision that looked at Data Repository, Data Channels, Data Management, and Interfaces.
Chris Messina made a suggestion on what he thinks the workgroup should be doing:
I'd rather see this group focus on documenting existing pain points and prescribing antidotes that could alleviate that pain for issues found in the wild... And perhaps that's the route we're already on.
Danny Ayers challenges the approach with a SemWeb angle:
to solve the general
problem I'd start by looking at how to solve the problem of modelling,
integrating (and querying) data in general using the Web to best
advantage. Well, I don't have to, because those aspects are being
covered by Semantic Web technologies (primarily RDF & SPARQL). What's
more the data conveyed in hCard, xfn, apml, rss, opml etc can be
integrated (and queried) this way, and the exact same tools can be
used to talk about Stratocasters and the population of Luxembourg.
For sure a lot of work is needed around authentication and integration
of data through the generic protocols (HTTP, XMPP...), but I'm
continually surprised to find how many of the /other/ problems around
data portability have simple solutions when you bring Semantic Web
technologies into one's stack.
The reason I've absolutely no intention of kicking up a fuss on this
is that although I'm not convinced approaching data through a specific
set of arbitrary domains/formats is the best route, it's still
pointing towards a more general Web of Data (and it's generally
straightforward to work with formats like these using Semantic Web
Chris Messina in a discussion points out: "A principle for this group should be to invent as little as possible and reuse as much as is useful."
Brian Suda raises some interesting points about whether DataPortability ought to be "open source" as well:
The gist of it was that open standards are not enough, you need
openness at all portions of the stack.
This reminds me of the "i want to export my email" option in alot of
clients. Sure if i want to switch from thunderbird to apple mail or
something else, all i need to do is find my .mbox file. That is a unix
standard that apps tend to use. The problem is that it doesn't also
export all the metadata. I want my mail filter rules, my preferences,
the spam database and others.
So is just having open formats, like APML, or OPML enough? it is a
great start, but when it comes to workflows, if the whole chain, from
the machine to the data, is not open, then there are potential for
He links to an article titled "open standards are not enough", which makes several points. A useful quote from the UK Government:
There can sometimes be a danger of lock-in with some proprietary providers, and we must avoid developing an over-reliance on individual suppliers. The Government, via the Office of Government Commerce, work hard to avoid that by using open standards to ensure that different suppliers' software can be used interchangeably. (Angela Eagle, The Exchequer Secretary to the Treasury during a parliamentary debate)
A legacy DataPortability wiki from December 2007 may contain useful information, especially on action packs to executives: http://dataportability.pbwiki.com/AllPages
This was linked to http://wiki.opencontacts.org/index.php?title=Main_Page
Scoble writes about social network portability: http://scobleizer.com/2007/12/13/can-we-get-a-first-step-in-social-networking-portability/
Danny Ayers writes about what made the web successful:
- The success of the Web is due in no small part to the ability for
developers to work independently yet have interop (mostly) guaranteed
by the base spec - critical here is the use of URIs as identifiers.
Danny Ayers introduced his data licence idea, making some useful points and sparking an interesting discussion:
The starting point I would suggest is "I own my data". This would
correspond more or less to the default copyright on documents - even
if you don't say anything explicit on something you write, you have
What happens when we sign up for a service is we allow that party
access to (some parts of) our own data, currently usually by filling
in forms. When we connect to friends within social networking systems
we allow them access to (some parts of) our own data. In both cases
this seems an implicit licensing of that data for subsequent use.
However, not everyone sees things that way.
Dare Obasanjo draws a distinction between information exposed on
the service's web pages and that exposed through the API. While the
quality of the data may differ significantly, I'd suggest that in
terms of licensing this distinction is bogus. If I can copy & paste
from one app to another, that can have the net end result as scraping.
As Paul Downey put it, good web APIs are just web sites.
A more extreme view can be found in a comment on Scoble's blog -
"...you stole my personal details...". While this seems a kneejerk
reaction, it's clear how such a perception could arise.
Right now the service providers generally allow connection with a
vague "he my friend", and bury any details deep within their Terms of
Service. But if the terms of the connection were made explicit, not
only for signup with the service, but with every connection event, any
ambiguity would be removed. Hence:
"Robert is my friend...I'd like to grant him access to my data"
Which leads onto the question of what form such a license might take.
Many of the options are already visible in copyright and software
licenses, though I don't believe (m)any are directly suitable for use
with data. The difficulties arrive with data derived from the original
data - along the lines of software extension and modification, but
twistier (e.g. attention profiles which might only contain derived
data, but couldn't exist without the original).
Anyhow, possible examples would be:
1. open license - anyone can use my data (with/without attribution)
2. reciprocal open license - anyone can use my data, but whatever they use it with must also be exposed under this license
3. trust license - the person to whom I license this data may use it as they please
4. silo license - the person to whom I license this data may use it as they please within the local system
1. is likely to be impractical in the context of social networking
sites without fine granularity of data access - e.g. I'm happy for my
name and homepage to be associated together in public, but would
rather my email and geographic address are restricted. Long term I
believe we will need this.
2. is ideal for Open Data, in fact this is essentially what the new
Open Data Commons license looks like (disclosure - I work for
Talis, the company who got together with Creative Commons/Science
Commons to produce this license). But the copyleft nature of the
license probably wouldn't appeal to many social networking services
who see their data garden as business value.
3. seems naive, but I can't think of a better way of approaching things
4. would in effect be a formalisation of the current Facebook position
So I think it might be worth considering what 3 & 4 might look like in
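The four licence options listed in the message above can be sketched as a small data model. This is only an illustration of how the distinctions might be checked in code, not anything the workgroup specified: the names `DataLicense`, `Grant` and `may_use`, and the idea of passing the licence of the resulting combined data as `result_license`, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataLicense(Enum):
    OPEN = "open"                        # 1. anyone can use my data
    RECIPROCAL_OPEN = "reciprocal open"  # 2. derived data must carry the same licence
    TRUST = "trust"                      # 3. the licensee may use it as they please
    SILO = "silo"                        # 4. the licensee may use it within one system


@dataclass(frozen=True)
class Grant:
    owner: str
    license: DataLicense
    licensee: Optional[str] = None  # None for the two "anyone" licences
    system: Optional[str] = None    # only meaningful for SILO


def may_use(grant, user, system, result_license=None):
    """Decide whether `user` may use the granted data inside `system`.

    `result_license` is the licence under which the user would expose
    whatever the data is combined with (hypothetical parameter).
    """
    if grant.license is DataLicense.OPEN:
        return True
    if grant.license is DataLicense.RECIPROCAL_OPEN:
        return result_license is DataLicense.RECIPROCAL_OPEN
    if user != grant.licensee:
        return False
    if grant.license is DataLicense.TRUST:
        return True
    # SILO: the data must stay inside the system where it was shared.
    return system == grant.system


silo = Grant("danny", DataLicense.SILO, licensee="robert", system="facebook")
share_alike = Grant("danny", DataLicense.RECIPROCAL_OPEN)
```

Under this sketch, `may_use(silo, "robert", "facebook")` is allowed while the same call with another system is refused, which corresponds to the difference between options 3 and 4.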
This was followed by an in-depth analysis of the concept of privacy by Eran Hammer-Lahav:
This is very interesting.
I think there are two key questions when trying to address this topic.
The first is, what is this data people are so emotional about, and
care so much to keep private. The second, how much of this is coming
out of an American culture of identity that might not be applicable to
If the Plaxo script only took Scoble friends' names and birthday,
would people be so pissed off? What about just their school
affiliation? I think the scars of dealing with spam, having to change
email address every couple of months, and just wasting time has made
people scared of their email being harvested. But the reality is, spam
is no longer such a big problem. Most people who complain about email
overload, talk about valid email from people they trust, just too much
of it. In a world with no spam, would we be looking for other
identifiers than email?
When we say "share our data", email is the first thing that pops into
people's mind and that tends to frame the conversation. Most of the
people who posted against Scoble's stealing their data, publish all
their contacts on their blog, emails, groups, etc. They hand out their
cards with full information on them like candy.
So answering the first question, what is this data we are trying to
protect, is key. Why? Because while many think this should be an
abstract exercise that can fit any data we have now or might have in
the future, for most of what we have today, we already have pretty
good solutions. As Danny mentioned, we got software licenses and
copyrights where the law can help define rights. We got many years of
dealing with file systems to define read/write/execute access rights
for resources. So listing the kind of data we want to protect is key.
The second question is what's behind the emotional reaction to "losing
one's privacy" and how is privacy defined. I think our conversation is
framed by an American point of view of privacy that is very much
unique. In Israel, your friends will ask you how much you make, people
on the street will ask you how much you paid for something, and if you
don't answer, you might be considered rude. And at the same time, this
is the same country where I can find out exactly how much you paid for
your house, internet or no internet. People post photos of the kids,
sometime wearing very little on their family website, and they don't
worry about it because their site gets like 2 hits a month, so for
them it is private. But when they post the same photos on Flickr,
where they can be searched, they freak out that perverts might do bad
things with them.
Some of this is just basic anti-government American culture, and some
comes from the way the credit history system is setup where there is
no strong identity provided by the government, making it easier to
steal your identity, hence making information about you very valuable.
Most accounts you have are happy to verify your identity on the phone
with your address and mother maiden name. In 20 years, mom's maiden
name will be a quick Facebook lookup away, because your mom and dad
met on a social network.
I think the 'why' people want to keep their data private, and the
'what' this data is, are critical to our solution of 'how' to
partition it and protect it. Copyrights work because the government
enforces it. Being banned from Facebook doesn't seem to be a strong
enough reason not to run scripts if the data is of value.
My approach has always been to ask, what can't we do today because
we are afraid of losing our privacy. Social networks are successful
because at the end, most people just don't care if you can find their
birthday on Google or see their drunken photos. They might live to
regret it but it will never stop them from doing it in the first
place. But there is plenty that we would like to do but can't. To me,
this is the interesting part.
So back to Danny's post, what kind of framework would actually enable
new kinds of services that today are not possible?
Elias Bizannes states his view on privacy as such:
EHL - good analysis there.
And of course privacy is a form of control. Subset maybe, but what
else is a sibling of the subset? I think privacy is a pretty broad
Privacy, in my eyes, is person specific. My two best friends dont
want to reveal their salaries, whereas my third friend openly admits
his 300k sales salary. The point about data like such, is that
everyone has their own interpretation. I am slowly getting pedantic
about giving out my email address, because by controlling it, i
prevent spam. Others dont care. It is interesting you mention email
addresses, because when I researched this whole issue a few years
back, i noticed all the major sites seem to exclude e-mail as
personally identifiable information in their terms of service (and
which I recently had a whinge about re facebook)
To me, privacy is the right to control
1) who sees 2) what data about you, 3) when you want them to. I am
more than happy to share my birth date to, say members of the
workgroup. A woman in a nightclub, i wouldnt as freely, because its a
automatic determinant of your status before you even get to flirt to
build rapport. People have their own bias as to how they share their
information, and just like they have the freedom to spend their money
as freely as possible, so let them.
So when we talk about privacy, lets not standardise what information
we deem as private or not for the entire population. Its not just
impossible to get it right, but the wrong approach.
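The three-part definition in the message above (who sees, what data, when) can be read as a per-person rule set. The following is a minimal sketch under that reading; `SharingRule`, `can_see`, the field names and the sample rules are all invented for illustration, and the default-deny behaviour is a design assumption rather than something stated in the discussion.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SharingRule:
    viewer: str                            # 1) who sees
    data_field: str                        # 2) what data about you
    not_before: Optional[datetime] = None  # 3) when: start of the window, if any
    not_after: Optional[datetime] = None   #    end of the window, if any


def can_see(rules, viewer, data_field, at):
    """Return True if any of the owner's rules lets `viewer` see `data_field` at time `at`."""
    for rule in rules:
        if rule.viewer != viewer or rule.data_field != data_field:
            continue
        if rule.not_before is not None and at < rule.not_before:
            continue
        if rule.not_after is not None and at > rule.not_after:
            continue
        return True
    # Nothing is shared unless the owner wrote a rule for it.
    return False


# Each person writes their own rules; nothing is globally "private" or "public".
my_rules = [
    SharingRule("workgroup", "birth_date"),
    SharingRule("employer", "salary", not_after=datetime(2008, 6, 30)),
]
```

The rules belong to the individual rather than to the service, so two people can make opposite choices about the same field, which is the point made about salaries and email addresses.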
So I'm thinking that we should rename the two main planned
deliverables for the group from:
DataPortability Technical Reference Design
DataPortability Policy Reference Design
to:
DataPortability Technical Blueprint
DataPortability Policy Blueprint
I think blueprint is more accessible and more immediately descriptive.
Josh Patterson then states his reason for spinning off WRFS:
Chris and group,
Per our discussion, and from some of my discussions with various
people, we want to make a few distinctions about group focus and
technology focus. dp.org is about evangelizing multiple technologies
(OAuth, openID, APML, microformats, etc), and WRFS is simply a new
developmental protocol. dp.org's mission has been stated to not
"reinvent" thing, but in some aspects, WRFS is a new technology
(considering how it has an index "wNode", and the mechanics between
that and the endpoints). From that regard, we decided to announce
draft 1 of the spec here, and then spin WRFS off into its own
development group, since engineering is not the core focus of this
group, and that will help provide a clear distinction between the two
(Which is a complaint ive heard more than once from people). dp.org
and WRFS will continue to be "best buds", but WRFS has to take its
place in the "stack" and be a solid engineering effort and stand on
its own two feet. This is in no way a slight of any group, or a
"divorce" of anyone, just a clear distinction to make WRFS what it is
(nice, solid, engineering effort), and dp.org what it is (nice
organizational effort and evangelization).
Chris Messina is against renaming the deliverables and makes the following comment:
As for Chris' question, I'm against the notion of a "blueprint" (at
least for the foreseeable future) and prefer "guidelines" or "best
practices". Especially since we're quite a ways off from any kind of
consensus, it seems a stretch to think that we have anything close to
a "blueprint" yet, or will, for that matter, for some time to come.
Chris Saad makes a page which describes the group:
The goal here is for self-organizing (if it isn't there, then start a thread, write it), constructive (try to work to consensus rather than sticking to an agenda), and ultimately fruitful (write the result down in a document so the discussion and consensus is not lost) discussion.
Also remember this is not a normal standards development group. It is really a story telling group. We need to build a technical, political and user experience story that connects all the dots between existing pieces of technology and political will.
Once upon a time, in a galaxy far, far away...
This document was created to outline the design goals of the blueprint: http://groups.google.com/group/dataportability-public/web/design-goals
Like the document above, another was created and then edited after the workgroup ended. It states the workgroup's roadmap:
This is the Roadmap for the DataPortability Workgroup
1. Create Decision Making Framework (here)
2. Clarify the DataPortability Charter (here)
3. Decide on Design Principles for DataPortability Technical Blueprint and Policy Blueprint (here)
4. Decide on Use Cases for Technical and Policy Blueprints (here)
5. Debate and design the Technical and Policy Blueprints (here and here)
6. Steering Action Group to ratify Technical and Policy Blueprints
9. Evangelize the Technical and Policy Blueprints
Danny Ayers raises points about why the DP workgroup should not adopt a charter under Identity Commons:
..but I'm not so sure about this, because the purpose of the identity
commons group ("open identity layer for the Internet") is not the same
as that of the DataPortability group.
Not being directly involved, I can't give a definitive answer, but
from the open identity perspective, I suspect data portability would
be perceived as something enabled by open identity. The identity stuff
certainly does have a special role when it comes to trust,
authorization and authentication in data portability. But
fundamentally identity (according to the OI definition) is just one
specific kind of data. (I have a few comments on those definitions,
will mail separately).
A high-level approximation of this would be that to OI folks DP is a
subset of their purpose, and contrariwise to DP folks OI is a subset
of their purpose...
Elias Bizannes makes a post saying he is working with others to create a roadmap for various groups, with this one being for the technical action group.
Amongst a discussion about whether DP should affiliate itself with the W3C, Matthew Rothenberg makes a suggestion:
Perhaps it might make sense to keep the overall DP Working Group as a
more ad-hoc institution, but as more concrete deliverables emerge,
moving some of the technical standards finalization into more formal
W3C groups over time.
A suggestion is made about what dp should really be about: policy
I'm wondering what steps are being taken to explore the privacy
implications of data portability. There are some significant social
and legal concerns that I believe require addressing early on in the
JP Rangaswami picked up on some of the issues here. http://confusedofcalcutta.com/2008/01/04/information-ownership-in-an-...
When I posted on Robert Scoble's "scraping" incident in relationship
to the EU data protection directive, I felt I was raising a reasonable
concern, at least from a European legal and cultural position. Many
folks (especially Americans) didn't agree with me, but free speech is
Those that didnt see the post, http://theotherthomasotter.wordpress.com/2008/01/08/facebook-scoble-m...
I would strongly advise that the data portability group place the
privacy issues at the centre of the design and concept. If not, it is
likely to create significant social and legal backlash at a later
date. It is not easy to retrofit PET (privacy enhancing technology)
The workgroup's role was overshadowed by the creation of action groups, and by February it was officially announced that the workgroup mailing list had been deprecated. This should mark the first phase of the DataPortability Project lifecycle, and the start of its social experiment in openness and decentralisation.