An App Store for functions

21 Oct 2015

Today I was on the bus, and I had a very odd fleeting thought: why don't I contribute more to open source? I've been an engineer for going on a decade, and the only open source projects to my name date back to things I did in college. This got me thinking that two barriers to open source contribution are utility and time. What follows is a very brief attempt at describing a new ecosystem that might lead to more code contribution, and perhaps a richer ecosystem in which fledgling startups could make money.

What I'm thinking

I'm going to postulate that if we took any current API and, instead of exposing API calls, built a compute cluster, we might see greater diversity in the applications that are built on top of other applications. The cluster would let users submit code that runs against the cluster's data formats and returns a particular value, and it would make those functions discoverable. The cluster maintainers could charge for computation time, much like AWS Lambda, and function authors could release their functionality for free, charge a per-use fee, or choose any number of other pricing arrangements. In a way, it would let API maintainers create an app store for functions.
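Here is a minimal sketch of what the "app store" part might look like, assuming a single in-memory registry. `Listing`, `FunctionStore`, and the flat per-call compute fee are hypothetical names and a simplified pricing model, not any real cluster's API. Authors register a function with a per-call price (zero for free), users search listings by keyword, and each call bills the caller the author's price plus the maintainer's compute fee.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Listing:
    """A function published to the (hypothetical) store."""
    name: str
    author: str
    fn: Callable[..., Any]
    price_cents: int          # author's per-call usage fee; 0 means free
    description: str = ""


class FunctionStore:
    """In-memory registry: authors publish functions, users discover and call them."""

    def __init__(self, compute_fee_cents: int) -> None:
        self.compute_fee_cents = compute_fee_cents  # maintainer's flat per-call fee
        self._listings: Dict[str, Listing] = {}
        self.earnings: Dict[str, int] = {}          # author -> usage fees earned, in cents

    def register(self, listing: Listing) -> None:
        if listing.name in self._listings:
            raise ValueError(f"{listing.name!r} is already registered")
        self._listings[listing.name] = listing

    def discover(self, keyword: str = "") -> List[str]:
        """Names of listings whose name or description contains keyword (case-insensitive)."""
        kw = keyword.lower()
        return sorted(
            item.name for item in self._listings.values()
            if kw in item.name.lower() or kw in item.description.lower()
        )

    def call(self, name: str, *args: Any) -> Tuple[Any, int]:
        """Run a listed function; return (result, cents billed to the caller)."""
        listing = self._listings[name]
        result = listing.fn(*args)
        self.earnings[listing.author] = self.earnings.get(listing.author, 0) + listing.price_cents
        return result, listing.price_cents + self.compute_fee_cents
```

A real cluster would more likely meter compute by duration, as Lambda does; the flat fee here just keeps the arithmetic visible.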

Cluster rules

The compute cluster must allow for the following:

- The cluster's API data must be stored in a format that allows for batch computation.
- The cluster's API data must be in a format that allows for stream processing, with new data multicast to the functions consuming it.
- The cluster must support arbitrary functions that run on the data immutably: no data can be mutated, and no function may have state.
- The cluster must allow functions to talk to things outside of the cluster.
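The first three rules can be sketched with Python's frozen dataclasses standing in for immutable records. This is a toy, single-process illustration under assumptions: `Checkin`, `ImmutableDataset`, and `run_batch` are hypothetical names, the hour-of-day encoding for `time` is assumed, and statelessness of functions is a convention here rather than something the code enforces. Batch jobs get a read-only tuple snapshot; stream jobs are multicast each new record as it arrives.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Checkin:
    """One immutable check-in record (field names from the example below)."""
    time: int        # assumed: hour of day, 0-23
    lat: float
    lon: float
    numpeople: int


class ImmutableDataset:
    """Append-only store offering batch snapshots and stream multicast."""

    def __init__(self) -> None:
        self._records: List[Checkin] = []
        self._subscribers: List[Callable[[Checkin], Any]] = []

    def snapshot(self) -> Tuple[Checkin, ...]:
        # Batch access: a tuple, so a function cannot add or remove records.
        return tuple(self._records)

    def subscribe(self, fn: Callable[[Checkin], Any]) -> None:
        # Register a stream function to receive every new record.
        self._subscribers.append(fn)

    def append(self, record: Checkin) -> List[Any]:
        # Only the data owner appends; each new record is multicast to all subscribers.
        self._records.append(record)
        return [fn(record) for fn in self._subscribers]


def run_batch(dataset: ImmutableDataset, fn: Callable[[Tuple[Checkin, ...]], Any]) -> Any:
    """Run a (by convention, stateless) batch function over the current snapshot."""
    return fn(dataset.snapshot())


if __name__ == "__main__":
    ds = ImmutableDataset()
    ds.subscribe(lambda c: c.numpeople)           # a trivial stream function
    print(ds.append(Checkin(9, 40.7, -74.0, 3)))  # [3]
    print(run_batch(ds, lambda rows: len(rows)))  # 1
    try:
        ds.snapshot()[0].numpeople = 99           # mutation is rejected
    except FrozenInstanceError:
        print("records are immutable")
```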

Example of how it would work

Foursquare stores its API check-in data in the cluster. Susan, a developer, sees {"time", "lat", "lon", "numpeople"} in the data and decides to write a simple function that batch computes a predictive model of where people will be at a given time of day. Susan registers the function and puts it on sale for 50 cents. Joe comes along and says, "That's nifty, I'd like to use this model in my application for showing people ads from a region of the country." Joe writes his own function that calls Susan's model function once a day and sends the resulting model to Joe's server. Another example is what developer Lynn does: Lynn writes a function that chains the result of Susan's predictive model and applies it, together with rating data, to calculate the likelihood of making a sale. Lynn charges a dollar for her code; Susan gets 50 cents and Lynn gets 50 cents. Neither Susan nor Lynn had to build a full app or worry about the efficiency of their code on a server. They simply added the functionality and reaped the benefits of the time spent.
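Susan's model, Lynn's chained function, and the revenue split can be sketched as plain functions. Everything here is illustrative: the averaging "model" (mean head-count per rounded location and hour), Lynn's sale formula (each expected visitor independently buys with probability mean rating / 50), and the `PRICES`, `AUTHORS`, and `DEPENDS_ON` tables are hypothetical stand-ins. Prices are kept in integer cents to avoid floating-point money.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (time as hour 0-23, lat, lon, numpeople); the hour encoding is an assumption.
Row = Tuple[int, float, float, int]
Key = Tuple[float, float, int]


def susan_model(checkins: Iterable[Row]) -> Dict[Key, float]:
    """Batch job: mean head-count per (lat, lon rounded to 0.1 degree, hour)."""
    totals: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])  # [sum, count]
    for hour, lat, lon, people in checkins:
        key = (round(lat, 1), round(lon, 1), hour)
        totals[key][0] += people
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def lynn_sale_likelihood(checkins: Iterable[Row], ratings: List[float],
                         lat: float, lon: float, hour: int) -> float:
    """Chains Susan's model: chance that at least one expected visitor buys."""
    crowd = susan_model(checkins).get((round(lat, 1), round(lon, 1), hour), 0.0)
    mean_rating = sum(ratings) / len(ratings) if ratings else 0.0
    p_buy = min(mean_rating / 50, 1.0)  # hypothetical per-person purchase chance
    return 1 - (1 - p_buy) ** crowd


# Hypothetical listing tables: price in cents, author, and upstream functions.
PRICES = {"susan_model": 50, "lynn_sale_likelihood": 100}
AUTHORS = {"susan_model": "susan", "lynn_sale_likelihood": "lynn"}
DEPENDS_ON = {"lynn_sale_likelihood": ["susan_model"]}


def payouts(fn_name: str) -> Dict[str, int]:
    """Split one call's price: upstream authors get their list price, the author keeps the rest."""
    split: Dict[str, int] = {}
    upstream_total = 0
    for dep in DEPENDS_ON.get(fn_name, []):
        for author, cents in payouts(dep).items():
            split[author] = split.get(author, 0) + cents
        upstream_total += PRICES[dep]
    margin = PRICES[fn_name] - upstream_total
    if margin < 0:
        raise ValueError(f"{fn_name} is priced below its dependencies")
    author = AUTHORS[fn_name]
    split[author] = split.get(author, 0) + margin
    return split
```

Calling `payouts("lynn_sale_likelihood")` returns `{"susan": 50, "lynn": 50}`, matching the split in the example; because `payouts` recurses through `DEPENDS_ON`, longer chains pay every upstream author the same way.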

Why?

The problem I see is that perhaps we aren't writing enough code. There's lots of data out there, and lots of applications hold data that's really useful. However, obtaining that data is rarely uniform; often we have to go through an intermediary such as an API to get at the data we care about. Then that data must be served over TCP, when we could just as well leverage data locality and run the code on the cluster, given that the data is immutable. I think the time is coming, with Docker, Mesos, etc., when we can safely partition one computation from another and cap each computation's resources so it can't bring the cluster to a grinding halt.

Additionally, it takes a lot of development time to decide what goes into a public API; it takes considerably less time to simply dump a subset of data and say, "figure out the rest." The incentives are fair, in the sense that if you write solid, modular code for a cluster that supports this data, you will get rewarded. Otherwise you won't; people can always write their own functions. And if it's your cluster and you want more developers, you can always waive the computation fee for the authors of functions that run on the cluster, making the ecosystem richer and, more than likely, making the people consuming your API data more productive.
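The idea of partitioning and capping computation can be illustrated on a single machine, with a separate interpreter process standing in for a Docker container or Mesos task. This is only a sketch under that assumption: `run_isolated` is a hypothetical helper, it enforces just a wall-clock limit via the `timeout` argument of `subprocess.run` (which kills the child once the limit passes), and unlike a real container it does not restrict memory, network, or filesystem access.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(code: str, stdin_data: str = "", timeout_s: float = 2.0) -> Optional[str]:
    """Run submitted code in its own interpreter process with a time limit.

    Returns the code's stdout, or None if it timed out or exited with an error.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # fresh interpreter: no shared memory with the host
            input=stdin_data,               # data is handed over, never shared mutably
            capture_output=True,
            text=True,
            timeout=timeout_s,              # child is killed if it runs past the limit
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout
```

A runaway submission such as `while True: pass` comes back as `None` after the limit instead of tying up the machine.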
