Discussion:
[Wsf-general] data services and resources
Sanjiva Weerawarana
2007-03-13 17:05:51 UTC
Hi .. last week I spent some time with the data services guys and
discussed ways of fixing and finalizing the data services descriptor. I've
thought thru this more and believe we now have enough to handle all the
scenarios James brought up and more. I need to do some examples (and to
complete the resource part and deal with UPDATE etc. queries) but please
take a look and comment.

See: http://www.wso2.org/wiki/display/wsf/Data+Services+and+Resources

I've borrowed quite a bit from WADL etc. but there's ways to go yet.
Everyone please review.

I'd *really* like to get a first cut of the data services stuff done (with
this lang cleaned up) by the end of the month!

Sanjiva.
--
Sanjiva Weerawarana, Ph.D.
Founder, Chairman & CEO; WSO2, Inc.; http://www.wso2.com/
email: ***@wso2.com; cell: +94 77 787 6880; fax: +1 509 691 2000

"Oxygenating the Web Service Platform."
Sanjiva Weerawarana
2007-03-14 04:22:06 UTC
FYI I've kept on tweaking and editing it .. sorry if u read it already.

Sanjiva.
sumedha rubasinghe
2007-03-14 06:15:05 UTC
1. IMHO we also need to consider content filtering based on given
parameters / logged in user credentials.

e.g. A customer should see only the orders he placed himself.


2. Just my 2 cents on following statement.
"The database administrator will create a configuration file [xml] with
the needed details for exposing the required data in the database."

<databases>
  <database name="xs:NMTOKEN">
    <resource type="TABLE | VIEW | STORED-PROCEDURE | FUNCTION"
              name="xs:NMTOKEN">
      <operation name="SELECT | INSERT | UPDATE | DELETE">
        <allowed>
          <role></role>+
        </allowed>
      </operation>
    </resource>
  </database>
</databases>


*Example.....*

<databases>
  <database name="orderdb">
    <resource type="TABLE" name="customers">
      <operation name="select">
        <allowed>
          <role>admin</role>
          <role>guest</role>
          <role>general</role>
        </allowed>
      </operation>
    </resource>
  </database>
</databases>


/sumedha
Chamil Thanthrimudalige
2007-03-14 06:51:02 UTC
Post by sumedha rubasinghe
1. IMHO we also need to consider content filtering based on given
parameters / logged in user credentials.
eg. A customer should only see orders placed by himself only.
2. Just my 2 cents on following statement.
"The database administrator will create a configuration file [xml]
with the needed details for exposing the required data in the
database."
<databases>
  <database name="xs:NMTOKEN">
    <resource type="TABLE | VIEW | STORED-PROCEDURE | FUNCTION"
              name="xs:NMTOKEN">
      <operation name="SELECT | INSERT | UPDATE | DELETE">
        <allowed>
          <role></role>+
        </allowed>
      </operation>
    </resource>
  </database>
</databases>
*Example.....*
<databases>
  <database name="orderdb">
    <resource type="TABLE" name="customers">
      <operation name="select">
        <allowed>
          <role>admin</role>
          <role>guest</role>
          <role>general</role>
        </allowed>
      </operation>
    </resource>
  </database>
</databases>
I think that if the access control is about the database itself, the DB
admin can set it up in the database and configure the service so that the
database enforces it. On the WSAS side, we can let our security module
take care of service/operation-level access control and of setting it up.

Best Regards,
Chamil Thanthrimudalige
Sanjiva Weerawarana
2007-03-14 16:01:05 UTC
Post by sumedha rubasinghe
1. IMHO we also need to consider content filtering based on given
parameters / logged in user credentials.
eg. A customer should only see orders placed by himself only.
In general yes but I'd like to get this working before adding security
stuff properly.
Post by sumedha rubasinghe
2. Just my 2 cents on following statement.
"The database administrator will create a configuration file [xml] with
the needed details for exposing the required data in the database."
<databases>
  <database name="xs:NMTOKEN">
    <resource type="TABLE | VIEW | STORED-PROCEDURE | FUNCTION"
              name="xs:NMTOKEN">
      <operation name="SELECT | INSERT | UPDATE | DELETE">
        <allowed>
          <role></role>+
        </allowed>
      </operation>
    </resource>
  </database>
</databases>
*Example.....*
<databases>
  <database name="orderdb">
    <resource type="TABLE" name="customers">
      <operation name="select">
        <allowed>
          <role>admin</role>
          <role>guest</role>
          <role>general</role>
        </allowed>
      </operation>
    </resource>
  </database>
</databases>
Good start; please edit the wiki and put this as a strawman proposal!

Thanks for the feedback. Others?

Sanjiva.
sumedha rubasinghe
2007-03-21 12:32:51 UTC
Updated http://www.wso2.org/wiki/display/wsf/Data+Services+and+Resources.

1.
I would like to propose a new REQUIRED attribute called 'name' on the data
element. Its value will be the name of the deployed service.

<data baseURI="xs:anyURI" name="xs:NMTOKEN">
config
query+
(operation | resource)+
</data>

* data/@baseURI: a REQUIRED URI indicating the base URI for the
operations and resources defined within the <data> element.
* data/@name: a REQUIRED name of the data service.


2. Changed one of the examples to match latest specification.

<data baseURI="xs:anyURI" name="StudentInfoService">
  <config>
    <property name="driver">org.apache.derby.jdbc.EmbeddedDriver</property>
    <property name="protocol">jdbc:derby:../database/WSO2WSAS_DB</property>
    <property name="user">wso2wsas</property>
    <property name="password">wso2wsas</property>
  </config>

  <query id="studentInfo">
    <param name="index" sqlType="INTEGER"/>
    <sql>select name, index, email from student where student.index=?</sql>
    <result element="Students" rowName="student">
      <element name="Name" column="name"/>
      <element name="Index" column="index"/>
      <element name="E-mail" column="email"/>
    </result>
  </query>

  <operation name="getStudentByProjectId">
    <call-query href="studentInfo">
      <with-param name="indexParam" query-param="index"/>
    </call-query>
  </operation>
</data>
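
To make it concrete what the studentInfo query in this sample does, here is
a rough Python sketch. It is not the WSAS implementation: it runs the
parameterised SQL against an in-memory SQLite table and wraps each row as
the <result> mapping describes. The table contents, the run_student_info
helper and the use of SQLite are assumptions made purely for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Mapping copied from the <result> element: XML element name -> column.
RESULT_MAPPING = [("Name", "name"), ("Index", "index"), ("E-mail", "email")]


def run_student_info(conn, index):
    """Run the studentInfo query and wrap the rows as <Students><student>."""
    # "index" is quoted because it is a reserved word in SQLite.
    rows = conn.execute(
        'SELECT name, "index", email FROM student WHERE "index" = ?',
        (index,),
    ).fetchall()
    root = ET.Element("Students")        # result/@element
    for row in rows:
        student = ET.SubElement(root, "student")   # result/@rowName
        for (element_name, _column), value in zip(RESULT_MAPPING, row):
            ET.SubElement(student, element_name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE student (name TEXT, "index" INTEGER, email TEXT)')
conn.execute("INSERT INTO student VALUES ('Asha', 7, 'asha@example.com')")
xml_out = run_student_info(conn, 7)
```

The operation in the descriptor would then simply pass its indexParam
value through as the query's index parameter.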
Sanjiva Weerawarana
2007-03-26 01:03:51 UTC
Post by sumedha rubasinghe
Updated http://www.wso2.org/wiki/display/wsf/Data+Services+and+Resources.
1.
I would like to propose a new REQUIRED attribute called 'name' to data
element. The value of this will be the name of the service deployed.
<data baseURI="xs:anyURI" name="xs:NMTOKEN">
config
query+
(operation | resource)+
</data>
+1.
Post by sumedha rubasinghe
2. Changed one of the examples to match latest specification.
OK good. Can you also put a description of what that sample query does? We
need to explain the stuff a bit more. Also, if a query is only used once,
I suggest we get used to inlining in the operation or resource rather than
putting it outside.

We need to work out the REST design in more detail now.

Sanjiva.
Paul Fremantle
2007-03-15 08:37:02 UTC
I was at a very interesting talk by Werner Vogels from Amazon here at QCon.

He was describing how Amazon is building a lightweight data service
called Dynamo to handle much of the "shopping cart" data they need.

Basically, the traditional RDBMS is not what they need, because they
simply put XML documents into the store with a key and get them out
again. So Dynamo is a very high performance, highly-available,
clustered, persistent hashtable. It deliberately trades consistency for
availability.

He made the point that they don't need any schema (it's key->blob), query
(they don't even need iterators), consistency etc.

So... is there a layering of this API that makes it simple to do XML
storage without defining a schema? For example mapping xs:any?

Paul
--
Paul Fremantle
VP/Technology and Partnerships, WSO2
OASIS WS-RX TC Co-chair

http://bloglines.com/blog/paulfremantle
***@wso2.com
(646) 290 8050

"Oxygenating the Web Service Platform", www.wso2.com
Sanjiva Weerawarana
2007-03-15 09:17:17 UTC
Post by Paul Fremantle
I was at a very interesting talk by Werner Vogels from Amazon here at QCon.
He was describing how Amazon is building a lightweight data service
called Dynamo to handle much of the "shopping cart" data they need.
Basically, the traditional RDBMS is not what they need, because they
simply put XML documents into the store with a key and get them out
again. So Dynamo is a very high performance, highly-available,
clustered, persistent hashtable. It deliberately trades consistency for
availability.
He made the point that they don't need any schema (it's key->blob), query
(they don't even need iterators), consistency etc.
Cool. This is in line with our thinking for the registry then? (BTW don't
forget that writeup on your todo list ;-))
Post by Paul Fremantle
So... is there a layering of this API that makes it simple to do XML
storage without defining a schema? For example mapping xs:any?
I'm not sure whether "this API" refers to what we're trying to do or what
they're doing. ?

What we're doing is making it easy to take data that's already in a
relational database and put it on the Web. The problem they're solving is
a different one.

Sanjiva.
Paul Fremantle
2007-03-16 08:08:40 UTC
Post by Sanjiva Weerawarana
I'm not sure whether "this API" refers to what we're trying to do or
what they're doing. ?
What we're doing is making it easy to take data that's already in a
relational database and put it on the Web. The problem they're solving
is a different one.
I think the problem Amazon is solving is different, but there is a
subset of each that is the same - i.e. an intersection. The intersection
is simply storing an XML value (xs:any) against a key (xs:string).
Obviously our data model has more possibilities, and their
performance/availability model is completely different. I guess what I'm
asking is whether someone who wants this very simple storage model can
do it *very simply* with our approach.

Paul
Sanjiva Weerawarana
2007-03-16 08:37:36 UTC
Post by Paul Fremantle
I think the problem Amazon is solving is different, but there is a
subset of each that is the same - i.e. an intersection. The intersection
is simply storing an XML value (xs:any) against a key (xs:string).
Obviously our data model has more possibilities, and their
performance/availability model is completely different. I guess what I'm
asking is whether someone who wants this very simple storage model can
do it *very simply* with our approach.
Well .. yes. Just create a table with two columns: a string key and a BLOB
value (or whatever the right way is to store a bunch of XML in a single
table cell).

However we'd need to hack up our response stuff to make it work too (we'd
need to treat the blob as XML and get it serialized correctly).
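
A minimal sketch of that two-column idea, in Python with SQLite standing in
for the real database. The documents table, its column names and the
put_document/get_document helpers are all hypothetical; the point is only
that the blob has to be parsed back into XML on the way out rather than
returned as an opaque string.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Two columns only: a string key and a BLOB holding the serialized XML.
conn.execute(
    "CREATE TABLE documents (doc_key TEXT PRIMARY KEY, body BLOB NOT NULL)"
)


def put_document(key, xml_text):
    """Store an XML document under a key, replacing any previous value."""
    ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?)",
        (key, xml_text.encode("utf-8")),
    )


def get_document(key):
    """Return the stored document as a parsed element, or None if absent."""
    row = conn.execute(
        "SELECT body FROM documents WHERE doc_key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    # Treat the blob as XML so it can be serialized into the response.
    return ET.fromstring(row[0].decode("utf-8"))


put_document("cart-1", "<cart><item sku='42'/></cart>")
cart = get_document("cart-1")
```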

I'd like to get the current stuff done and out and then look at expanding
the scope.

Sanjiva.
James Clark
2007-03-19 10:53:30 UTC
I've been wondering whether a rather different approach to specifying
the data service might be preferable. I haven't worked this approach
out in anything like the detail that our current approach has been
worked out in, but I will try to explain the basic idea.

The overall philosophy is to be higher-level, more declarative, easier
to use, but less flexible and less powerful. With this philosophy it's
not a goal that the user should be able to design an arbitrary web
service or REST interface to the information in the database and then
use the configuration file to specify that design. Instead the user
gets to decide the kind of reading, writing and searching of the data
that they require the web service/REST interface to provide and we
automatically create a good-quality web service/REST interface that
meets that requirement, together with some modest level of tweakability.

The fundamental concept is an "entity-set". The data service would
declare one or more named entity-sets. An entity-set is (surprise,
surprise) a set of entities. Each entity has an identifier that
uniquely identifies it within its entity-set. On the database side of
things, for each entity set there would be a corresponding table; the
primary key would correspond to the entity's identifier (for simplicity,
at first I would expect we wouldn't handle multi-part primary keys).
However, not every table corresponds to an entity-set. On the REST side
of things for each entity there's a resource that directly corresponds
to that entity; there may also be other resources that provide
alternative views of the entity.

The data service specification would declare one or more top-level named
entity-sets. In my order database example, the top-level entity-sets
might be named "products", "orders" and "customers". The name of the
database table corresponding with a particular named entity set would
obviously default to the name of the entity set.

The second key part of this approach is dealing with things like the
order_items table in my example, where information that is logically
associated with one entity is in a separate database table from the
entity-set to which the entity belongs. I think the way to handle this
is to use the concept that a table that does not correspond to a
top-level entity-set can be "owned by" a table that does. So for
example, the order_items table would be owned by the orders table. For
some cases, it may be necessary to be explicit about how the rows of the
owned table relate to the rows of the owning table, but my guess is that
in most cases you can do the right thing by looking at the primary
key/foreign key information in the database schema.
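
As a rough illustration of reading ownership off the schema, the Python
sketch below uses SQLite's table_info and foreign_key_list pragmas. The
rule it applies (a table is owned by a top-level entity-set when one of its
primary-key columns is a foreign key into that entity-set's table) is only
one possible heuristic and is an assumption, as are the infer_owners helper
and the sample schema.

```python
import sqlite3


def infer_owners(conn, entity_sets):
    """Map each non-entity-set table to the entity-set table that owns it."""
    owners = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        if table in entity_sets:
            continue  # top-level entity-sets are never owned
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        pk_columns = {
            row[1]
            for row in conn.execute(f'PRAGMA table_info("{table}")')
            if row[5] > 0
        }
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            parent, from_column = fk[2], fk[3]
            if parent in entity_sets and from_column in pk_columns:
                owners[table] = parent
    return owners


# Hypothetical cut-down version of the order database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    line_no INTEGER,
    product_id INTEGER REFERENCES products(id),
    PRIMARY KEY (order_id, line_no)
);
""")
owners = infer_owners(conn, {"products", "orders"})
```

Here order_items references both orders and products, but only order_id is
part of its primary key, so the heuristic picks orders as the owner. Schemas
where that rule gives the wrong answer are the cases that would need the
explicit configuration mentioned above.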

I believe it's possible to provide a basic REST interface for many
databases using just

- the top-level entity-sets and their corresponding tables,

- ownership relationships between tables, and

- the database schema

Obviously there are lots of different ways of providing a REST
interface, but I think most of them can be intelligently defaulted (or
even fixed). Let's assume we want to expose the REST interface at
http://example.com/db/.

- There needs to be an XML representation of an entity that both can be
generated from the database and also allows the database to be updated
from the XML representation. The database schema is enough to allow a
reasonable default. Any customization facilities mustn't be so flexible
that they inhibit automatic generation of an XML schema or mapping from
the XML back to the database. The tricky bit will be handling ownership
relationships. My guess is that you can mostly do the right thing by
looking at the primary key/foreign keys. It should also be possible to
automatically turn foreign keys into the appropriate URI because you can
tell from the configuration where the URI for the resource corresponding
to an entity is.

- There needs to be a URI for the resource corresponding to each
entity-set. This can be defaulted from the name of the entity-set: for
example, the orders entity-set might be at
http://example.com/db/orders/. A GET on that would provide a listing of
all the entities in the entity-set. There would need to be a
configurable limit on the number of entities returned by such a GET and
a way to iterate over large entity sets (e.g. using queries to specify
the range of the result to return). There should be some configuration
that says what fields of the entity are returned in a GET on the
entity-set: obviously the URI of the entity needs to be there; you might
want just that, you might want a single title-like field (as in Atom) to
be returned, or you might want all fields to be returned.

- There needs to be a URI for the resource corresponding to each entity.
This would default to the URI for the entity-set plus the canonical
lexical representation of the primary key. For example,
http://example.com/db/orders/12345. A GET on this would return the XML
representation of the entity. The existing entity could be modified by
doing a PUT on its URI. DELETE on the URI will delete the entity.

- There are two ways that a new entity might be added: doing a POST on
the _entity-set_ URI or doing a PUT on the _entity_ URI. It should be
possible to automatically figure out which is the right way for a
particular entity set based on whether the primary key is autogenerated
(I think you can get this from the database schema): POST if it's
autogenerated, PUT if it's not.
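
The URI defaults and the POST-versus-PUT rule in the list above can be
sketched in a few lines of Python. The EntitySet class and its
key_autogenerated flag are hypothetical; in a real implementation that flag
would presumably be read from the database schema rather than set by hand.

```python
from dataclasses import dataclass
from urllib.parse import quote

BASE_URI = "http://example.com/db/"  # the base URI assumed in the text


@dataclass
class EntitySet:
    name: str
    key_autogenerated: bool  # assumed to come from the database schema

    def collection_uri(self):
        """URI of the entity-set, defaulted from its name."""
        return f"{BASE_URI}{self.name}/"

    def entity_uri(self, key):
        """Entity-set URI plus the lexical form of the primary key."""
        return f"{self.collection_uri()}{quote(str(key), safe='')}"

    def creation_request(self, key=None):
        """Return the (method, URI) pair a client uses to add an entity."""
        if self.key_autogenerated:
            # The database picks the key, so the client POSTs to the set.
            return ("POST", self.collection_uri())
        if key is None:
            raise ValueError("a client-chosen key is required for PUT")
        # The client picks the key, so it PUTs to the entity's own URI.
        return ("PUT", self.entity_uri(key))


orders = EntitySet("orders", key_autogenerated=True)
products = EntitySet("products", key_autogenerated=False)
```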

The next big thing that a REST interface would need is some searching
capability. A starting point is to allow the user to specify that
certain fields are searchable. For example, if they specify that the
country field of the customers entity-set is searchable, then
http://example.com/db/customers?country=US would return a listing of all
customers with a country field equal to US. The next step might be to
allow a query parameter to be associated with an SQL expression. For
example, we might want http://example.com/db/customers?min-age=18 to
give us a list of all customers aged at least 18. The configuration
might have something like this:

<query-param name="min-age" type="int">
  now - <field name="dateOfBirth"/> >= 18 years
</query-param>

This would allow query parameters to compose properly with no extra work
(e.g. http://example.com/db/customers?min-age=18&country=US would "just
work").
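
One way such composition could work is sketched below in Python with
SQLite: each configured query parameter contributes one SQL condition with
a placeholder, and the conditions are ANDed together. The QUERY_PARAMS
table, the build_search helper, the date_of_birth column and the sample
rows are all assumptions for illustration.

```python
import sqlite3

# Hypothetical configuration: each searchable query parameter maps to one
# SQL condition containing a single "?" placeholder.
QUERY_PARAMS = {
    "country": "country = ?",
    "min-age": "date_of_birth <= date('now', '-' || ? || ' years')",
}


def build_search(table, args):
    """Turn URI query arguments into a SELECT plus its bound values."""
    conditions, values = [], []
    for name, value in args.items():
        if name not in QUERY_PARAMS:
            raise KeyError(f"not a searchable parameter: {name}")
        conditions.append(QUERY_PARAMS[name])
        values.append(value)
    # The table name comes from the configuration, never from the request.
    sql = f"SELECT name FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY name", values


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, country TEXT, date_of_birth TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("Ann", "US", "1950-01-01"),
    ("Bob", "US", "2999-01-01"),  # far-future date keeps the example stable
    ("Cho", "LK", "1950-01-01"),
])
# Equivalent of GET /db/customers?min-age=18&country=US
sql, values = build_search("customers", {"min-age": "18", "country": "US"})
adults_in_us = [row[0] for row in conn.execute(sql, values)]
```

Because every parameter only adds a condition, adding a new searchable
field is a configuration change and needs no per-combination code.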

We would also probably want a way to provide different views of the
entities, e.g. that excluded certain fields.

By working at a relatively high level, we can automatically do
several nice things for the user:

- we can automatically provide introspection facilities (e.g. WADL),
complete with XSD and RELAX NG schemas

- we should (I think) be able to automatically generate ETags; this is
important for cacheability and crucial for dealing with concurrent
updates

- it should be a small step to get an Atom interface as well
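
For the ETag point, a minimal Python sketch, assuming the ETag is simply a
hash of the entity's XML representation (other schemes, such as a version
column, would work too). The make_etag and may_update helpers are
hypothetical names; the If-Match comparison is the standard HTTP way of
rejecting an update that was based on a stale copy.

```python
import hashlib


def make_etag(representation: bytes) -> str:
    """Derive a strong ETag from the bytes of the XML representation."""
    return '"' + hashlib.sha256(representation).hexdigest() + '"'


def may_update(current_representation: bytes, if_match: str) -> bool:
    """Decide whether a PUT carrying this If-Match header may proceed."""
    # "*" matches any existing entity; otherwise the tags must be equal.
    return if_match == "*" or make_etag(current_representation) == if_match


# A client GETs the order and remembers its ETag.
version_1 = b"<order><status>new</status></order>"
etag_seen_by_client = make_etag(version_1)

# Meanwhile someone else changes the order.
version_2 = b"<order><status>shipped</status></order>"
```

A PUT that presents etag_seen_by_client against version_2 would be refused,
which is how concurrent updates get detected.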

So far I've focused on REST. That's partly because I think we have a
bit of corporate REST deficit at the moment, and partly because I think
it's easier to go from a REST interface to a WSDL (service-oriented)
interface than vice-versa. How might a WSDL interface be specified? I
envisage there being a number of built-in methods such as add,
addMultiple, delete, deleteMatching, search, iterate, get which could
apply to an entity or entity-set. The basic idea would be that the user
would identify which built-in methods are allowed for which entity-sets.
Each built-in method would have some number of configurable parameters.
For example, the user might specify that they want to enable the "add"
method for the "customers" entity-set. By default we might choose a
WSDL operation name of addCustomer, but there would be a configurable
parameter that would allow it to be changed to createCustomer. There
might be some configurable parameters at the entity-set level: for
example, the singular noun to be used (e.g. so that you can have a table
called "people" and get methods called "addPerson", "removePerson").
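
The naming defaults described here could look roughly like the Python
sketch below. The operation_name function and its singular and override
parameters are hypothetical stand-ins for the configurable parameters, and
the "strip a trailing s" rule is a deliberately naive default.

```python
def operation_name(method, entity_set, singular=None, override=None):
    """Pick the WSDL operation name for a built-in method on an entity-set."""
    if override:
        # Method-level parameter, e.g. rename addCustomer to createCustomer.
        return override
    # Entity-set-level parameter: the singular noun, else a naive guess.
    noun = singular or (
        entity_set[:-1] if entity_set.endswith("s") else entity_set
    )
    return method + noun[:1].upper() + noun[1:]


default_name = operation_name("add", "customers")
renamed = operation_name("add", "customers", override="createCustomer")
irregular = operation_name("add", "people", singular="person")
```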

Given the built-in method and the database schema it should be possible
to automatically generate a tasteful default WSDL interface. The user
wouldn't need to worry about writing an XSD schema: even when the input
XML is complex, the semantics of the builtin method together with the
database schema should be enough to allow us to create the XSD for the
user. Apart from automatically generating the WSDL, another nice thing
we should be able to do for the user in the WS-* world is automatically
support WS-Transfer and WS-Enumeration. Maybe we could even have a
method that generates events when the database is modified (though this
would require permission to create database triggers).

In some cases, the built-in methods may not be sufficient. I envisage
providing two ways to go beyond this. The first way would require XML
and SQL skills but not programming skills. This would be quite similar
to what we have at the moment: the user would provide a fragment of SQL,
perhaps an XSD for the output XML or more likely the input XML, perhaps
an XPath to get the input XML into SQL parameters, perhaps an XSLT to
get the SQL into the desired XML form. The second way, which would
require programming skills, would be to make the set of built-in methods
extensible. The user would be able to extend the available built-in
methods just by dropping in a jar file containing a class that
implements a particular interface. The tricky bit would be designing
this interface: maybe it would work by generating SQL/XSD/XPath/XSLT, or
maybe it would work completely differently.

In terms of tooling, I think this is declarative enough that it should
be possible to create a nice, easy to use Ajax interface that works on
the XML configuration file, which would be guided by an XML representation
of the database schema.

This message is already rather long. I haven't talked about what I see
as the problems with the current approach. I can do that if people
want. The fundamental reason why I prefer the approach I've outlined
above is that I think it's better for the specification to express as
much as it can at as high a semantic level as possible. I don't think
there's a big technical risk in the kind of approach I'm suggesting: it
has a lot of conceptual similarities to object-relational mapping
technologies, such as the Java Persistence API
(http://java.sun.com/developer/technicalArticles/J2EE/jpa/).

BTW, if anybody's a bit rusty on databases, I would recommend this book:
http://www.amazon.com/Database-Systems-Complete-Hector-Garcia-Molina/dp/0130319953/ (the Amazon customer reviews page has an amusing mixture of 1-star and 5-star reviews).

James
Sanjiva Weerawarana
2007-03-26 00:51:05 UTC
Hi James,

First of all let me say that I like this idea quite a lot.
James Clark
2007-03-26 06:49:49 UTC
Post by Sanjiva Weerawarana
Hi James,
First of all let me say that I like this idea quite a lot.
James Clark
2007-04-14 06:21:30 UTC
The ADO.NET Entity Framework

http://msdn2.microsoft.com/en-us/library/aa697427(VS.80).aspx

seems to have some ideas that are relevant to this.

(It's scary how far Microsoft's language-integrated query work puts it
ahead of the Java world.)

James
Post by James Clark
I've been wondering whether a rather different approach to specifying
the data service might be preferable. I haven't worked this approach
out in anything like the detail that our current approach has been
worked out in, but I will try to explain the basic idea.
The overall philosophy is to be higher-level, more declarative, easier
to use, but less flexible and less powerful. With this philosophy it's
not a goal that the user should be able to design an arbitrary web
service or REST interface to the information in the database and then
use the configuration file to specify that design. Instead the user
gets to decide the kind of reading, writing and searching of the data
that they require the web service/REST interface to provide and we
automatically create a good-quality web service/REST interface that
meets that requirement, together with some modest level of tweakability.
The fundamental concept is an "entity-set". The data service would
declare one or more named entity-sets. An entity-set is (surprise,
surprise) a set of entities. Each entity has an identifier that
uniquely identifies it within its entity-set. On the database side of
things, for each entity set there would be a corresponding table; the
primary key would correspond to the entity's identifier (for simplicity,
at first I would expect we wouldn't handle multi-part primary keys).
However, not every table corresponds to an entity-set. On the REST side
of things for each entity there's a resource that directly corresponds
to that entity; there may also be other resources that provide
alternative views of the entity.
The data service specification would declare one or more top-level named
entity-sets. In my order database example, the top-level entity-sets
might be named "products", "orders" and "customers". The name of the
database table corresponding with a particular named entity set would
obviously default to the name of the entity set.
The second key part of this approach is dealing with things like the
order_items table in my example, where information that is logically
associated with one entity is in a separate database table from the
entity-set to which the entity belongs. I think the way to handle this
is to use the concept that a table that does not correspond to a
top-level entity-set can be "owned by" a table that does. So for
example, the order_items table would be owned by the orders tables. For
some cases, it may be necessary to be explicit about how the rows of the
owned table relate to the rows of the owning table, but my guess is that
in most cases you can do the right thing by looking at the primary
key/foreign key information in the database schema.
I believe it's possible to provide a basic REST interface for many
databases using just
- the top-level entity-sets and their corresponding tables,
- ownership relationships between tables, and
- the database schema
Obviously there are lots of different ways of providing a REST
interface, but I think most of them can be intelligently defaulted (or
even fixed). Let's assume we want to expose the REST interface at
http://example.com/db/.
- There needs to be an XML representation of an entity that both can be
generated from the database and also allows the database to be updated
from the XML representation. The database schema is enough to allow a
reasonable default. Any customization facilities mustn't be so flexible
that they inhibit automatic generation of an XML schema or mapping from
the XML back to the database. The tricky bit will be handling ownership
relationships. My guess is that you can mostly do the right thing by
looking at the primary key/foreign keys. It should also be possible to
automatically turn foreign keys into the appropriate URI because you can
tell from the configuration where the URI for the resource corresponding
to an entity is.
- There needs to be a URI for the resource corresponding to each
entity-set. This can be defaulted from the name of the entity-set: for
example, the orders entity-set might be at
http://example.com/db/orders/. A GET on that would provide a listing of
all the entities in the entity-set. There would need to be a
configurable limit on the number of entities returned by such a GET and
a way to iterate over large entity sets (e.g. using queries to specify
the range of the result to return). There should be some configuration
that says what fields of the entity are returned in a GET on the
entity-set: obviously the URI of the entity needs to be there; you might
want just that, you might want a single title-like field (as in Atom) to
be returned, or you might want all fields to be returned.
- There needs to be a URI for the resource corresponding to each entity.
This would default to the URI for the entity-set plus the canonical
lexical representation of the primary key. For example,
http://example.com/db/orders/12345. A GET on this would return the XML
representation of the entity. The existing entity could be modified by
doing a PUT on its URI. DELETE on the URI will delete the entity.
- There are two ways that a new entity might be added: doing a POST on
the _entity-set_ URI or doing a PUT on the _entity_ URI. It should be
possible to automatically figure out which is the right way for a
particular entity set based on whether the primary key is autogenerated
(I think you can get this from the database schema): POST if it's
autogenerated, PUT if it's not.
The next big thing that a REST interface would need is some searching
capability. A starting point is to allow the user to specify that
certain fields are searchable. For example, if they specify that the
country field of the customers entity-set is searchable, then
http://example.com/db/customers?country=US would return a listing of all
customers with a country field equal to US. The next step might be to
allow a query parameter to be associated with an SQL expression. For
example, we might want http://example.com/db/customers?min-age=18 to
give us a list of all customers aged at least 18. The configuration
might look something like this:
<query-param name="min-age" type="int">
<field name="dateOfBirth"/> - now > 18 years
</query-param>
This would allow query parameters to compose properly with no extra work
(e.g. http://example.com/db/customers?min-age=18&country=US would "just
work").
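One way to see why such parameters compose: if each query parameter maps to an independent SQL predicate, a request with several parameters is just the AND of their predicates. The sketch below assumes a hypothetical QUERY_PARAMS table standing in for the configuration, and writes the min-age predicate in SQLite's date syntax purely for illustration.

```python
# Hypothetical configuration: each query parameter maps to a SQL predicate
# with one "?" placeholder and a converter for the parameter's declared type.
# The min-age predicate uses SQLite date arithmetic (an assumption).
QUERY_PARAMS = {
    "country": ("country = ?", str),
    "min-age": ("dateOfBirth <= date('now', '-' || ? || ' years')", int),
}


def build_search(table: str, params: dict[str, str]) -> tuple[str, list]:
    """Combine every recognised query parameter into one parameterised SELECT.

    The table name is assumed to come from the service configuration,
    never from the request, so it is safe to interpolate.
    """
    clauses, values = [], []
    for name in sorted(params):  # sorted so the statement is deterministic
        if name not in QUERY_PARAMS:
            raise KeyError(f"{name} is not a searchable parameter")
        predicate, convert = QUERY_PARAMS[name]
        clauses.append(f"({predicate})")
        values.append(convert(params[name]))
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values
```

For ?min-age=18&country=US this yields a single statement with two bound values, so no per-combination code is needed.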
We would also probably want a way to provide different views of the
entities, e.g. that excluded certain fields.
By working at a relatively high level, we can do several things automatically:
- we can automatically provide introspection facilities (e.g. WADL),
complete with XSD and RELAX NG schemas
- we should (I think) be able to automatically generate ETags; this is
important for cacheability and crucial for dealing with concurrent
updates
- it should be a small step to get an Atom interface as well
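As a rough illustration of the ETag point, a tag can be derived by hashing the entity's XML representation, and a PUT can then be refused when the client's If-Match value no longer matches. This is a minimal sketch, not a proposed implementation: the function names are hypothetical, SHA-256 is an arbitrary choice, and only a single tag or "*" in If-Match is handled.

```python
import hashlib


def make_etag(representation: bytes) -> str:
    """Derive a strong ETag from the bytes of an entity's XML representation."""
    digest = hashlib.sha256(representation).hexdigest()
    return f'"{digest}"'  # entity-tags are quoted strings in HTTP


def may_update(if_match: str, current_representation: bytes) -> bool:
    """Allow a PUT only if the client's If-Match value matches the current state.

    Simplification: a comma-separated list of tags is not parsed.
    """
    return if_match == "*" or if_match == make_etag(current_representation)
```

A client that fetched an order, then tried to PUT after someone else had changed it, would present a stale tag and the update would be rejected rather than silently overwriting the other change.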
So far I've focused on REST. That's partly because I think we have a
bit of corporate REST deficit at the moment, and partly because I think
it's easier to go from a REST interface to a WSDL (service-oriented)
interface than vice-versa. How might a WSDL interface be specified? I
envisage there being a number of built-in methods such as add,
addMultiple, delete, deleteMatching, search, iterate, get which could
apply to an entity or entity-set. The basic idea would be that the user
would identify which built-in methods are allowed for which entity-sets.
Each built-in method would have some number of configurable parameters.
For example, the user might specify that they want to enable the "add"
method for the "customers" entity-set. By default we might choose a
WSDL operation name of addCustomer, but there would be a configurable
parameter that would allow it to be changed to createCustomer. There
might be some configurable parameters at the entity-set level: for
example, the singular noun to be used (e.g. so that you can have a table
called "people" and get methods called "addPerson", "removePerson").
Given the built-in method and the database schema it should be possible
to automatically generate a tasteful default WSDL interface. The user
wouldn't need to worry about writing an XSD schema: even when the input
XML is complex, the semantics of the builtin method together with the
database schema should be enough to allow us to create the XSD for the
user. Apart from automatically generating the WSDL, another nice thing
we should be able to do for the user in the WS-* world is automatically
support WS-Transfer and WS-Enumeration. Maybe we could even have a
method that generates events when the database is modified (though this
would require permission to create database triggers).
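The operation-naming defaults described above could work along these lines. The function names and the naive "strip a trailing s" singularisation are assumptions for illustration only; the configurable singular noun and the per-operation override correspond to the two kinds of parameter mentioned in the text.

```python
from typing import Optional


def default_singular(entity_set: str) -> str:
    """Naive default: drop a trailing 's' (assumption; real rules need more care)."""
    return entity_set[:-1] if entity_set.endswith("s") else entity_set


def operation_name(method: str, entity_set: str,
                   singular: Optional[str] = None,
                   override: Optional[str] = None) -> str:
    """Default WSDL operation name for a built-in method on an entity-set."""
    if override:
        # Per-operation parameter, e.g. rename addCustomer to createCustomer.
        return override
    # Entity-set level parameter: the singular noun, e.g. "person" for "people".
    noun = singular or default_singular(entity_set)
    return method + noun[0].upper() + noun[1:]
```

With these defaults, enabling "add" on "customers" gives addCustomer, while a "people" table configured with the singular "person" gives addPerson and removePerson.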
In some cases, the built-in methods may not be sufficient. I envisage
providing two ways to go beyond this. The first way would require XML
and SQL skills but not programming skills. This would be quite similar
to what we have at the moment: the user would provide a fragment of SQL,
perhaps an XSD for the output XML or more likely the input XML, perhaps
an XPath to get the input XML into SQL parameters, perhaps an XSLT to
get the SQL into the desired XML form. The second way, which would
require programming skills, would be to make the set of built-in methods
extensible. The user would be able to extend the available built-in
methods just by dropping in a jar file containing a class that
implements a particular interface. The tricky bit would be designing
this interface: maybe it would work by generating SQL/XSD/XPath/XSLT, or
maybe it would work completely differently.
In terms of tooling, I think this is declarative enough that it should
be possible to create a nice, easy to use Ajax interface that works on
the XML configuration file, which would be guided by an XML representation
of the database schema.
This message is already rather long. I haven't talked about what I see
as the problems with the current approach. I can do that if people
want. The fundamental reason why I prefer the approach I've outlined
above is that I think it's better for the specification to express as
much as it can at as high a semantic level as possible. I don't think
there's a big technical risk in the kind of approach I'm suggesting: it
has a lot of conceptual similarities to object-relational mapping
technologies, such as the Java Persistence API
(http://java.sun.com/developer/technicalArticles/J2EE/jpa/).
http://www.amazon.com/Database-Systems-Complete-Hector-Garcia-Molina/dp/0130319953/ (the Amazon customer reviews page has an amusing mixture of 1-star and 5-star reviews).
James
_______________________________________________
Wsf-general mailing list
http://wso2.org/cgi-bin/mailman/listinfo/wsf-general
Sanjiva Weerawarana
2007-04-18 07:39:42 UTC
Permalink
I agree the language integrated query work makes it so much easier to do a
wide class of applications. Java is without a doubt so behind in language
innovation now.

On ADO Entities: is that really different from EJB CMP and EJB QL?

Sanjiva.
Post by James Clark
The ADO.NET Entity Framework
http://msdn2.microsoft.com/en-us/library/aa697427(VS.80).aspx
seems to have some ideas that are relevant to this.
(It's scary how far Microsoft's language-integrated query work puts it
ahead of the Java world.)
James
sumedha rubasinghe
2007-04-18 08:09:20 UTC
Permalink
Post by Sanjiva Weerawarana
I agree the language integrated query work makes it so much easier to
do a wide class of applications. Java is without a doubt so behind in
language innovation now.
On ADO Entities- is that really different from EJB CMP and EJB QL?

Looks almost similar to me. Wonder if the performance is also 'G00D' as
EJB QL, when it comes to large queries with lots of joins.

s/G00D/bad
Sanjiva Weerawarana
2007-03-26 05:07:06 UTC
Permalink
Here's another alternative coming from SCA land:

http://www.mail-archive.com/tuscany-dev%40ws.apache.org/msg09978.html

Sanjiva.