This is my interpretation of the discussions on the DP mailing lists, and does not reflect the opinions/policy of my employer (Microsoft).
The Data-portability Vision
Today, many services try to collect as much user data as possible and discourage users from moving that data to other services. This is a rational move on the part of the services: without data portability, each service knows that if a user moves to another system, it loses access to that user's information. Keeping users within its own ecosystem also gives a service room to try out new innovations without losing customers if those innovations don't work out. Locking in users' data gives a service breathing space until it becomes a viable business.
The data-portability vision is to transform this landscape into a place where participating organizations all agree to make their data available to each other in a fair manner (perhaps freely, perhaps on a non-profit basis, or by some other measure of fairness). Each participating business/service knows that it does not have to worry about losing access to customers' data. This not only gives users freedom in choosing their services and providers, but also allows services/organizations to focus on technical innovation (without spending effort guarding their users' data or trying to scrape data from other services).
While most data portability scenarios have focused on social networking data, I believe that the benefits of data portability go well beyond it. Enabling data portability for health records and financial records could have a much bigger impact in the marketplace (I'm not saying that the group should tackle those problems, just that we should keep them in mind while we design solutions).
Official data-portability FAQ here. All the "official" definitions here.
For design principles, click here.
What has been discussed on the forum?
Scenarios
While most discussion has focused on social-networking data, there have been posts on the portability of email and personal communications data, as well as data like profiles, blogs and photos.
More interestingly, there have been discussions around portability of health and financial records. In particular, one thread has focused on how health-records portability might facilitate a market in companies that act as health-record repositories. Patients get to choose a provider they trust to store their records, and then get to decide how much access to give to whom. The health-record repository potentially evaluates the trustworthiness of other organizations with which the patient wants to share their information. The result is a world where the user is in control of their data, there is better privacy and the user's data is more secure.
Here is a link to the official use-cases document on data portability site.
Aggregation or Migration?
The name of the effort might suggest that the focus is to enable data migration to one single place. In fact, the group focuses on both of the following aspects:
1. Aggregating data that resides in different sites (for instance, social networking data in different sites) without actually migrating the data.
2. Making it easier to migrate user data to one place.
Aggregation: The premise behind aggregation is that it will never be possible to consolidate, say, all the user's social networking data in one place. There might be legal restrictions - for example, the EU allows migration of data only if the target repository conforms to EU standards of data privacy. Aggregation (without migration) requires common formats and common protocols. In this model, the web is seen as a distributed file system with the data scattered in different places.
One issue with aggregation is that you need to store at least the meta-data in one single place. This meta-data should contain links to all the places where the user's data resides and say what kind of data each one holds. The question has been asked: what if each of these distributed repositories supports a different API to get at the data? Does the meta-data store also need to store details of the APIs that can be used to get at the data? Do we also need to push for API standardization?
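To make the idea of a meta-data store concrete, here is a minimal sketch in Python. It is only an illustration, not something proposed on the list: the class names, the fields (`service`, `kind`, `url`, `fmt`) and the example.org-style URLs are all assumptions. The index records where each kind of data lives and in what format, but holds none of the data itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataLocation:
    """One meta-data entry: where a piece of the user's data lives (hypothetical schema)."""
    service: str  # made-up service name
    kind: str     # kind of data, e.g. "contacts" or "photos"
    url: str      # link to the place where the data resides
    fmt: str      # format the data is served in, e.g. "foaf" or "atom"


@dataclass
class MetadataIndex:
    """The single place that stores links to a user's scattered data."""
    locations: list[DataLocation] = field(default_factory=list)

    def register(self, location: DataLocation) -> None:
        self.locations.append(location)

    def find(self, kind: str) -> list[DataLocation]:
        """Return every known location that holds data of the given kind."""
        return [loc for loc in self.locations if loc.kind == kind]


# Illustrative entries only; none of these URLs refer to real services.
index = MetadataIndex()
index.register(DataLocation("site-a", "contacts", "https://site-a.example/alice/contacts", "foaf"))
index.register(DataLocation("site-b", "contacts", "https://site-b.example/users/alice/friends", "xfn"))
index.register(DataLocation("site-b", "photos", "https://site-b.example/users/alice/photos", "atom"))

contact_urls = [loc.url for loc in index.find("contacts")]
```

In this sketch the index stores a format name rather than a per-service API description, which sidesteps the API question by assuming the data can be fetched with a plain GET. Whether that assumption holds is exactly what the discussion is about.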
Others have pushed back, saying that it is not scalable for an aggregator to understand all the different APIs that different services use, and that we have to follow the philosophy of REST, where the API is always a small, standard set of actions and all the required information is already embedded in the data. We should not need a complex API to parse this data - if we do, then the data is not really very portable.
One other issue with aggregation is how to keep the information in all the places consistent if there is duplication/replication. XMPP has some support for this, and in general, for advanced scenarios, there is always the need for a notification/replication system. This thread talks about it more.
Migration: Other members of the group have talked about facilitating data migration. People have talked about how some tools that allow you to move blogs from one service to another do not really fix up the links in your blogs. So if you have links in your blog entries that point to other blog entries of yours, the links do not remain consistent. Another example of this is Facebook. In Facebook, a user's profile has references to other users' information. What does it mean to make a user's data migratable if the references to other profiles cannot be accessed by a third party from outside the Facebook ecosystem? The user's data is not very portable if that is the case.
The key takeaway here is that not only should the data be portable, but the links, the references to other data should also be portable.
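The blog example can be sketched as a link fix-up step run during migration. This is a minimal illustration under stated assumptions, not a description of any existing tool: the function name `rewrite_links`, the old-to-new URL map and the example.org-style addresses are all hypothetical, and the regular expression only handles double-quoted `href` attributes.

```python
import re


def rewrite_links(html: str, url_map: dict[str, str]) -> str:
    """Replace href targets found in url_map; leave every other link untouched."""
    def swap(match: re.Match) -> str:
        old_url = match.group(2)
        # Fall back to the original URL when the target was not migrated.
        return match.group(1) + url_map.get(old_url, old_url) + match.group(3)

    # Simplification: only matches href="..." with double quotes.
    return re.sub(r'(href=")([^"]*)(")', swap, html)


# Hypothetical mapping from old blog-entry URLs to their new locations.
url_map = {
    "https://old.example/blog/1": "https://new.example/posts/first",
    "https://old.example/blog/2": "https://new.example/posts/second",
}

entry = (
    '<p>See <a href="https://old.example/blog/1">my earlier post</a> '
    'and <a href="https://other.example/x">this one</a>.</p>'
)
migrated = rewrite_links(entry, url_map)
```

The fix-up only works for links whose targets the migrating tool knows about. References into another service's closed ecosystem, as in the Facebook case, have no entry in the map and cannot be repaired this way.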
The Policy Dimension
While most of the effort has been focused on technical issues, there have been a few threads focusing on policy issues. What kind of licenses should organizations use to expose user-data such that privacy concerns are addressed and yet the data is portable to other services?
Do we need guidelines on how to come up with effective licenses? One example in this space is the Science Commons effort, which aims to let scientific projects reference data belonging to other organizations without any fear of royalty/copyright issues. The Science Commons protocol is a set of guidelines for coming up with a license under which to release data. The protocol does not itself define a license - it defines rules to be followed to come up with an acceptable license. Other related efforts are the Open Knowledge Definition and the Budapest Declaration on Open Access.
Of course, the data portability problem is harder for user-data because, in addition to open access, we need to worry about privacy. Privacy-focused initiatives include New Zealand's privacy principles.
Authorization
There seems to be widespread agreement that the effort should also look at how to allow third-party sites and other users to safely view the user's data. Personally, I feel that while effort should be put into figuring out how to safely share data with other services and applications, the problem of sharing data with other users is an authorization problem, not a data portability problem.
Formats
While a majority of members think that XML is the best format, periodically there are dissenting voices advocating JSON. For social networking data, the consensus seems to be RDF/FOAF and XFN (most discussion has assumed FOAF). Similarly, for feeds, I think the consensus was tending towards RSS because it has the most mindshare, but there has been discussion about making Atom the standard.
The Question of Ownership of data
Some members have tried to dig deeper into who owns the data. For address book data, profile data, your photos, your blogs, your email, etc., it is clear that the user is the owner.
But how about the case where a site amasses a huge set of music albums and allows users to indicate who their favorite artists are? Who owns that favorites data? Similarly, consider Facebook data, where you own your own data but also have links to other users' profiles. Who owns those links? Clearly, many of the links were established because Facebook provided a friends-finder and allowed you to make links with those people. Do you co-own those links with Facebook?
One answer is that these references are just links - Facebook should make these links globally dereferenceable so that they can be accessed from any site and from any experience. In other words, these links to other profiles should be URL-like identifiers that can be referenced from anywhere.
There are other examples - what if a photo-editing service allows you to edit your photos: who owns the edited photos? Who owns your transaction record at Amazon.com?
Other members of the group have taken the view that the user owns the data in most of the above examples. In the case of photo-editing, if the service made the tool available for free, then it cannot claim co-ownership of the data. Facebook cannot allow users to find their friends for free, and later dangle the threat that the "links" that users have made with other users are its property.
Here is a blog post that discusses the tension between ownership, sharing and rights management.
Privacy
Privacy remains a big concern. Some folks think that privacy will actually be better once the user is in control of their data, no matter where it is. Others have concerns about how to ensure proper sharing of a user's data.