This past January and February, I had the opportunity to work at SURFnet in Utrecht, The Netherlands. I met a lot of great people from around the world, and learned how their organizations operate and what they're currently looking into.
One area of investigation is the concept of 'Federated Storage'.
Everyone I talked to about this freely admitted that they didn't exactly know what 'Federated Storage' is or how it works, but they saw it as a valid idea worth researching. I agree.
Personally, I see Federated Storage as taking the benefits of Federated Identity and applying them to a shared, distributed storage service. Federated Identity, in a nutshell, allows an organization's users to log into a foreign online service with their local accounts. The foreign service defers authentication to the user's home organization, but still retains authorization over which areas the account can access.
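To make that split concrete, here is a minimal sketch of it. Everything in it is an assumption made for illustration: the class names, the `user@home` identifier format, and the HMAC-signed assertion, which stands in for what a real federation would do with a protocol such as SAML. The point is only that the home institution checks the password, while the foreign service checks the assertion and applies its own access list.

```python
import hashlib
import hmac


class HomeInstitution:
    """Hypothetical identity provider: the only party that sees passwords."""

    def __init__(self, name, secret, passwords):
        self.name = name
        self._secret = secret        # bytes shared with services that trust us (assumption)
        self._passwords = passwords  # username -> password, plain text for brevity only

    def issue_assertion(self, username, password):
        """Authenticate locally and return (user_id, signature), or None on failure."""
        if self._passwords.get(username) != password:
            return None
        user_id = f"{username}@{self.name}"
        signature = hmac.new(self._secret, user_id.encode(), hashlib.sha256).hexdigest()
        return user_id, signature


class ForeignStorageService:
    """Hypothetical storage provider: trusts assertions, keeps its own authorization."""

    def __init__(self, trusted_secrets, acl):
        self._trusted = trusted_secrets  # home institution name -> shared secret
        self._acl = acl                  # user_id -> set of storage areas

    def allowed_areas(self, assertion):
        """Verify the assertion, then decide locally what the user may access."""
        if assertion is None:
            return set()
        user_id, signature = assertion
        home = user_id.rpartition("@")[2]
        secret = self._trusted.get(home)
        if secret is None:
            return set()  # not a member of the federation
        expected = hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(signature, expected):
            return set()  # forged or corrupted assertion
        return set(self._acl.get(user_id, set()))


home = HomeInstitution("home.example", b"shared-key", {"alice": "correct horse"})
foreign = ForeignStorageService(
    {"home.example": b"shared-key"},
    {"alice@home.example": {"scratch"}},
)
areas = foreign.allowed_areas(home.issue_assertion("alice", "correct horse"))
```

Note that the foreign service never handles the password. It only learns that the home institution vouches for the user.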
By applying these concepts to Federated Storage, I can envision a service that allows users to utilize foreign storage without having to create an account with the foreign provider. It's actually a pretty cool idea.
Three use-cases come to my mind for this idea of Federated Storage:
- The first use-case can be thought of as the eduroam of Dropbox. When a user is visiting a foreign institution, they can easily discover if it provides this Dropbox-like service, log into it, and have local access to all of their files. The main benefit of this is that their files are always local, so if they are visiting from across the world, they don't have to worry about latency and slow transfer speed.
- The second use-case takes the same concept as the first, but removes the file replication aspect. So when a user visits a foreign institution and logs into the local 'Dropbox' service for the first time, they have no files. But they can begin uploading new files to this new institution. The benefit of this is the ability to have a separate 'Dropbox' at each institution, which can be accessed seamlessly when traveling between locations.
- The final use-case is a hybrid of the first two: giving the user control over which files they would like replicated between the institutions they visit.
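The three use-cases differ only in which files a user finds on their first login at a visited institution. A rough sketch of that difference follows; the mode names and the `files_at_visited_site` function are hypothetical, chosen just for this illustration.

```python
from enum import Enum


class ReplicationMode(Enum):
    """Hypothetical labels for the three use-cases above."""
    REPLICATE_ALL = "all"      # first use-case: every file follows the user
    SEPARATE = "none"          # second use-case: each institution starts empty
    SELECTIVE = "selective"    # third use-case: the user picks what follows


def files_at_visited_site(home_files, mode, selected=frozenset()):
    """Return the set of files visible on first login at a visited institution."""
    if mode is ReplicationMode.REPLICATE_ALL:
        return set(home_files)
    if mode is ReplicationMode.SEPARATE:
        return set()
    # SELECTIVE: only files that exist at home and that the user chose to replicate
    return set(home_files) & set(selected)


home_files = {"thesis.tex", "photos.zip", "notes.txt"}
everything = files_at_visited_site(home_files, ReplicationMode.REPLICATE_ALL)
nothing = files_at_visited_site(home_files, ReplicationMode.SEPARATE)
chosen = files_at_visited_site(home_files, ReplicationMode.SELECTIVE, {"thesis.tex"})
```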
Make it work
Dreaming up ideas of how Federated Storage can work is all well and good, but at some point, some tangible service will need to be created. When discussion came to how a Federated Storage service would actually be implemented, most people I talked to suggested using a distributed file system such as Swift, Gluster, or Ceph.
Distributed file systems do seem like the obvious choice for this type of service. Their attributes include the ability to manage a file system across long distances and provide native replication of files.
But I think distributed file systems might be too low-level for a federation that involves storage. Distributed file systems have no knowledge of the actual data they are storing. All the file system knows is that it's currently storing some bytes on behalf of a user. There's no built-in intelligence that prevents sensitive information from being stored by a party that is not authorized to store such information.
In order to manage this, a layer above the file system is required: a policy layer. The easiest way to achieve this is to place policy responsibility with the owner of the data. This is where application-level storage distribution systems such as Seafile or Lobber come in.
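As a rough sketch of what owner-controlled policy could look like, the snippet below lets the data owner attach a list of permitted sites to each file, and has the policy layer consult that list before anything is replicated. The `FilePolicy` type, the `replication_targets` function, and the site names are all hypothetical. This is not how Seafile or Lobber implement policy; it only illustrates the kind of decision a file system on its own cannot make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilePolicy:
    """Hypothetical per-file policy set by the owner of the data."""
    path: str
    allowed_sites: frozenset  # institutions permitted to store this file


def replication_targets(policy, federation_sites):
    """Return the federation sites the file may be copied to, in sorted order."""
    return sorted(site for site in federation_sites if site in policy.allowed_sites)


federation = {"site-a.example", "site-b.example", "site-c.example"}

public_paper = FilePolicy("paper.pdf", frozenset(federation))
patient_data = FilePolicy("patients.csv", frozenset({"site-a.example"}))

paper_targets = replication_targets(public_paper, federation)
patient_targets = replication_targets(patient_data, federation)
```

The storage layer underneath would then replicate each file only to the sites returned for it, rather than to every node it knows about.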
Another piece to consider is what kind of data should be hosted in the Federation. For example, should the storage service be completely agnostic and allow the user to store any arbitrary file? Or should the service have some intelligence built in to easily host Calendar and Contact data on behalf of the user?
I think the only way to answer these questions is to just jump in and start trying out different models. This was the conclusion I reached during my time at SURFnet. The involved parties should pick a common file system, set it up at their own locations, and try to tie the installations together.
Agile innovation in action.