In episode #289, Richard and I talked with Pablo Castro from the SQL Server Product Group at Microsoft about his work with Astoria, an infrastructure for bringing Web technologies and data sources together.
Astoria uses the Entity Framework, which Pablo is also involved with.
Pablo Castro: At a high level, Astoria is about creating services that are data centric in a way that is Web friendly. Essentially, Astoria is a bunch of technologies and also a set of patterns for building services that expose data to the Web and for consuming those services. We’re just putting together bits and pieces of well-known, established technologies, trying to invent as little as possible, and enabling various applications out there to expose their data to the Web in a way that all other applications out there can easily consume, with a very, very low barrier to entry.
Carl Franklin: So, basically, it’s a Web service wrapper around a data layer? It’s a Web service data layer?
Pablo Castro: Yeah, you can call it that. Web service is a loaded word these days, but sure, it’s a service, it’s Web facing, and it wraps around your data, clearly, yes.
Carl Franklin: Okay. I’ve got a couple of questions just right off the bat. First of all, and I know you’re going to get into this when you describe it in a little more detail, but it’s based on the Entity Framework, which isn’t shipping yet.
Pablo Castro: Yeah. So yes, Astoria builds on top of it. There are various reasons why these things didn’t ship and we can go into the history, but the very fact that all our products depend on it clearly indicates there is a strong assumption across the board that this thing will ship and will ship as planned. I’m also an active member of the Entity Framework team so I can see the product coming together and all that. The Entity Framework is done. I mean, it’s going through the last baking period…
Carl Franklin: That’s great.
Pablo Castro: Where you go through any remaining issues, work on the finish and so on. It’s in good shape and it’s on track for shipping whenever.
Richard Campbell: But not shipping as a part of Orcas?
Pablo Castro: It is not shipping as part of Orcas. Orcas will ship this year, and early next year, or sometime in the first half of next year, we will go ahead and ship the Entity Framework. It will be an add-on but it will look and feel like it’s part of the platform, the stuff in the System namespace. The tools are deeply integrated into Visual Studio and so on. Once it is installed there will be no seams between the Entity Framework and the rest of the .NET Framework.
Carl Franklin: Okay.
Richard Campbell: Maybe we need to go through the Entity Framework again. We had Dan Simmons on a few months ago to talk about it but now that it’s closer to finish, like you said, it’s basically baked. What does it look like?
Pablo Castro: I’m sure you’ve had a deep conversation with Danny. We have made the last changes, so the Entity Framework is what we said it was going to be. It’s a framework that enables developers to work at the conceptual layer. It has a powerful mapping engine built in and it has a nice, fancy Object Services layer on top of the whole thing so that if you’re working on your business logic and using a .NET language, then you feel right at home. Your business objects are .NET objects, your query language can be language-integrated query, or LINQ, so you never open a quote and you don’t need to use anything outside of your development environment. So, it looks pretty much like we said it was going to look. One other thing is that since then we’ve taken a lot of feedback from our users through our beta cycles and through various channels that we have with our developer community. We’ve incorporated a lot of feedback from the developer community at large, so most of the tweaks that you will see now are the direct result of what we’ve heard out there and our reflection of that in the product.
Richard Campbell: To me this looks like the CTP model working.
Pablo Castro: Yeah, yes.
Richard Campbell: You get a chunk of code out there with no obligation attached and see what people do with it.
Pablo Castro: Yeah, yeah, exactly, and we’re pushing it more and more. With the Entity Framework, we took a bunch of feedback and we’re incorporating it now as much as we can. Of course, there’s always stuff that you leave for the next version; otherwise, you will never ship anything.
Richard Campbell: Right.
Carl Franklin: Yeah.
Pablo Castro: It’s clearly working and worked well enough that for Astoria we went one step further and we shipped a prototype. Usually you build a proof of concept before you go into a product, right? Those proofs of concept usually never see the light of day.
Carl Franklin: Right.
Pablo Castro: It worked well enough and we got enough feedback in previous cycles that for Astoria we said, "Okay, we’ll just go for the whole thing." We took the prototype that was literally a prototype. It was not meant for production or even close to production use, and we shipped that to get early feedback.
Carl Franklin: Okay. So, let’s get back to Astoria for a bit. I’m curious as to syntax and how it’s used and to what extent. You have SQL-like syntax in the URLs and all of that stuff. Give us a little bit more in-depth overview here of the "How To’s."
Pablo Castro: Okay. The first thing I think we should highlight is the fact that Astoria is a way of exposing data in the most general sense. I know it’s almost an implied tie between data and databases, but the reality is that, especially these days, if you look at the Web there are plenty of data sources out there that range from actual database kind of stuff all the way to Flickr and Facebook and RSS feeds, which exist pretty much for everything out there. We see data as the sources for Astoria in the most general sense and there are some rules on how to plug in the details and in general, how to interact with Astoria and we’ll get into that in a second. But in principle when we talk about Astoria as a way of exposing data, we mean it in the most general sense. This needs to be structured data or semi-structured data at some point but other than that it has no particular technology that needs to be used for starters.
Carl Franklin: I see.
Pablo Castro: So in principle, Astoria has several components. One is a server component that you run: if you’re exposing data to the Web, you run it on some Web server; if you’re exposing your data within, say, an enterprise, then you run it on some Web server within your enterprise. What that server component does is unify the URI format. We picked a format, and it’s important to pick one for the sake of a very uniform interface; we’ll get deeper into that later. The one URI format we picked is simply a format that layers very naturally on top of the underlying data model that we use. Since we are going to expose data sources built on varied technologies, from databases to feeds to arbitrary data sources out there, we said, "Let’s make the data more or less look alike." So we took the Entity Data Model, or EDM. There’s a bunch of background about it, and there are papers available on our Web site that talk about the principles of the data model, but in principle there are two elements that are interesting: entities and relationships.
From the Web service interface point of view, this is nice because what we do in Astoria is take these models, regardless of how they’re backed by physical storage, and expose them through a REST interface. This REST interface takes every entity and turns it into a resource, and it takes every association between two entities and turns it into a link. So there is a very simple, very clean translation from the input source to the set of data visible in the REST interface. From there, building the URI is actually fairly straightforward because now we can simply say, "Look, we have a bunch of containers and those containers contain entities." The URLs are pointers to those entities in those containers, so the syntax is as simple as the root of the service, slash, the container name, and then, in parentheses, the key.
For example, if you have a Customers container, you can say slash Customers, parentheses, one, two, three, and that is a URI that points to the resource in the Customers container which has a key of one, two, three.
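The URI shape Pablo describes (service root, slash, container name, key in parentheses) can be sketched in a few lines of Python. This is only an illustration of the pattern as described in the conversation, not Astoria code: the function names and the `http://example.com/data.svc` root are made up for the example, and only simple single-value keys are handled.

```python
def container_uri(service_root: str, container: str) -> str:
    """Return the URI of a whole container: <root>/<Container>."""
    return f"{service_root.rstrip('/')}/{container}"


def entity_uri(service_root: str, container: str, key) -> str:
    """Return the URI of one entity: <root>/<Container>(<key>).

    Assumption: the key is a single simple value (for example an integer).
    """
    return f"{container_uri(service_root, container)}({key})"


# Hypothetical service root, used only for illustration.
root = "http://example.com/data.svc"
print(container_uri(root, "Customers"))    # http://example.com/data.svc/Customers
print(entity_uri(root, "Customers", 123))  # http://example.com/data.svc/Customers(123)
```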
Richard Campbell: So it seems like you’ve essentially eliminated the concept of the method per se, that it’s always get and here is the entity I want.
Carl Franklin: And you’ve basically created a little syntax for selecting in the URI. My first thought here, Pablo, is that I guess that works really well with a lot of other data sources, but when using SQL Server, are stored procedures possible, or is that just dinosaur thinking?
Pablo Castro: Good question. Depending on the nature of the data sources and how you expose the URI space, we give you a default option where you give us an EDM mapping and if it is an arbitrary data source, which could even be your own business logic, then it’s just anything that looks like a CLR Object graph.
Carl Franklin: Okay.
Pablo Castro: If you’re in the Entity Framework and, say, pointing at a database, by default we build a URI space for you that is based on dynamic SQL, meaning you write URLs but under the covers we go generate SQL. Now there are many scenarios where you don’t want that to happen. You want a more restrictive interface to your data, and that could have to do with business logic or with predictability around query plans and such. If it is the latter, you probably want to use stored procedures for this, and we have a way of hooking up stored procedures into the URI space so that you can take over a piece of the URI space. You can say, for example, "I don’t want you to say slash Customers and get all customers in that container."
Carl Franklin: Yeah.
Pablo Castro: "I want you to go through a special URI that is called My Good Customers, and My Good Customers actually takes a couple of arguments and is actually a stored procedure call under the covers, although the client cannot really tell that."
Carl Franklin: Right.
Pablo Castro: The only thing you can observe from the client is that query composition is not allowed, meaning stored procedures are by design fixed and that is sort of a feature of stored procedures. So you cannot use custom sorting options or you cannot use filtering or stuff like that because the stored procedure has a predefined interface.
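The difference between a composable container and a fixed, stored-procedure-backed URI can be sketched as a small request planner. Everything named here is hypothetical: the `MyGoodCustomers` arguments (`region`, `minOrders`), the query option names (`orderby`, `filter`), and the shape of the returned plan are assumptions made for the example, not Astoria’s actual syntax.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical fixed operations: URI segment -> required argument names.
FIXED_OPERATIONS = {"MyGoodCustomers": ("region", "minOrders")}
# Hypothetical query options that compose on top of a plain container.
COMPOSITION_OPTIONS = {"orderby", "filter"}


def plan_request(uri: str) -> dict:
    """Decide whether a URI maps to a fixed operation or a dynamic query."""
    parts = urlsplit(uri)
    segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    options = {name: values[0] for name, values in parse_qs(parts.query).items()}

    if segment in FIXED_OPERATIONS:
        # A fixed operation has a predefined interface: no sorting or filtering.
        composed = COMPOSITION_OPTIONS.intersection(options)
        if composed:
            raise ValueError(f"query composition not allowed on {segment}: {sorted(composed)}")
        required = FIXED_OPERATIONS[segment]
        missing = [name for name in required if name not in options]
        if missing:
            raise ValueError(f"missing arguments for {segment}: {missing}")
        return {"kind": "stored_procedure", "name": segment,
                "args": {name: options[name] for name in required}}

    # Default behavior: the container is queried dynamically and stays composable.
    return {"kind": "dynamic_query", "container": segment, "options": options}
```

With this sketch, a request for `/Customers?orderby=Name` is planned as a dynamic query, while adding `orderby` to a `MyGoodCustomers` request is rejected, which mirrors the one difference Pablo says a client can observe.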
Carl Franklin: So views might become popular in this sense, definitely.
Pablo Castro: Yes, certainly. Views work really well, and table-valued functions work really, really well because they can take arguments, which views don’t, but at the same time the output is composable, which means you can still support sorting and searching and all of that stuff, which is usually handy when you want to create a more flexible data interface.
Carl Franklin: So if I can summarize then, even though it looks like you’re just selecting from tables and that’s all you can do, no, that’s not all you can do. That’s the default behavior built into the URI, the sort of feature set but just because it’s there doesn’t mean you can’t have stored procs in the back end.
Pablo Castro: Yes, yes, absolutely. You can also inject some business logic in the middle tier, like check filters based on joined tables and stuff like that, so we have a query form mechanism that allows you to inject bits and pieces of the query as we build a bigger query that is going to be sent to the database.
Carl Franklin: And the format is just pure XML? Is it Entity XML? What do you get back?
Pablo Castro: You mean from the HTTP interface?
Carl Franklin: Yes.
Pablo Castro: What you get back is, actually we use standard HTTP content negotiation, content type negotiation. Right now we support two formats. We have an Atom-based format where, literally, we turn the data into feeds and entries in Atom terminology.
Carl Franklin: Ah, that’s cool.
Pablo Castro: It’s not just Atom. We actually support APP, the Atom Publishing Protocol, which brings enough semantics for us to map it to the basic actions on the entities.
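As a rough illustration of how protocol semantics map onto basic actions on entities, the sketch below pairs HTTP methods with create, read, update, and delete in the way the Atom Publishing Protocol does for collections and entries. Telling a single entry apart from a container by the trailing parenthesized key reuses the URI convention described earlier; the function and table names are invented for the example.

```python
# HTTP method + target kind -> basic action, following APP-style semantics.
APP_ACTIONS = {
    ("GET", "collection"): "list",
    ("POST", "collection"): "create",
    ("GET", "entry"): "read",
    ("PUT", "entry"): "update",
    ("DELETE", "entry"): "delete",
}


def target_kind(uri: str) -> str:
    """Classify a URI as a single entry (ends with a key in parentheses) or a collection."""
    path = uri.split("?", 1)[0].rstrip("/")
    return "entry" if path.endswith(")") else "collection"


def action_for(method: str, uri: str):
    """Return the basic action for a request, or None if the pairing is not supported."""
    return APP_ACTIONS.get((method.upper(), target_kind(uri)))


print(action_for("PUT", "http://example.com/data.svc/Customers(123)"))  # update
print(action_for("POST", "http://example.com/data.svc/Customers"))      # create
```

A real server would answer an unsupported pairing with an HTTP error status rather than `None`; the sketch leaves that out.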
Carl Franklin: So was RSS not rich enough for this?
Pablo Castro: Well, RSS is a feed format, but it doesn’t define an interaction model. There are no update semantics built into it, whereas in APP there is, in general, a way of discovering, for a given service, all of the containers or the feeds available, and in general there are more bits and pieces that are more complete than in the RSS case. I mean, in the RSS that’s not Atom-based or not…
Carl Franklin: Right.
Pablo Castro: What is the name of the… there is a feed constellation?
Carl Franklin: Just RSS 2.0 without Atom.
Pablo Castro: But in general, RSS is mostly a format and what we need is more like a protocol. APP defines the whole protocol. So, it says not only what the feed format is for the output, but it says, for example, "What are the semantics when you do an HTTP PUT against an entry in a feed in the store? What is the expected result of that?" It defines how to deal with links between things and stuff like that. It mapped surprisingly well to what we were trying to do. Further, it’s already a well-established and popular format and protocol, so it was a relatively easy choice to go for it. That’s one of the formats. The service is designed so that it can actually support multiple formats. Atom, with APP, is one of them. We also support an AJAX-friendly format; the protocol is very similar to the Atom one but the format uses JavaScript Object Notation (JSON).
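Choosing between the two formats through standard HTTP content negotiation can be sketched as a small parser for the `Accept` header. The media types `application/atom+xml` and `application/json` are the standard ones for Atom and JSON; treating Atom as the default and the function name `negotiate` are assumptions made for this example, not a description of how Astoria decides.

```python
SUPPORTED = ("application/atom+xml", "application/json")


def negotiate(accept_header: str, default: str = SUPPORTED[0]) -> str:
    """Pick a response format from an HTTP Accept header.

    Simplified: honors q-values and the */* wildcard, ignores other parameters.
    """
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by order of appearance.
        candidates.append((-quality, position, media_type))

    for negative_quality, _, media_type in sorted(candidates):
        if negative_quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    # Assumption: fall back to the default instead of answering 406 Not Acceptable.
    return default


print(negotiate("application/json"))                      # application/json
print(negotiate("text/html, application/atom+xml;q=0.9"))  # application/atom+xml
```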
Carl Franklin: AJAX people would be very happy about that.
Pablo Castro: Yes, exactly. This is for people working on AJAX applications where you receive the data on the client and you want to use it as a regular JavaScript object, so if you trust the source you can just eval the answer from the service and bang, you have the object. In fact, the AJAX payload is designed to be friendly to the developer, so we try to avoid too many spurious control information fields and stuff like that embedded in the payload. We have some, but they are placed so that they don’t get in the way for the most part.
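For a client that is not a browser, the same idea (receive the JSON payload and use it as a regular object) can be sketched with Python’s standard `json` module, which parses the text without evaluating it as code. The payload below is invented for illustration; the field names `CustomerID` and `CompanyName` are assumptions and do not reflect the actual Astoria payload layout.

```python
import json


def parse_entity(body: str) -> dict:
    """Parse a JSON response body into a plain dictionary.

    json.loads only parses data; unlike eval, it never executes the payload.
    """
    entity = json.loads(body)
    if not isinstance(entity, dict):
        raise ValueError("expected a single JSON object")
    return entity


# Hypothetical response body for one customer entity.
sample_body = '{"CustomerID": 123, "CompanyName": "Contoso"}'
customer = parse_entity(sample_body)
print(customer["CompanyName"])  # prints Contoso
```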
The conversation continues online at http://www.dotnetrocks.com/default.aspx?showNum=289 or http://shrinkster.com/tv8.