BlogMatrix

Improving Google Base: populating the db

David P. Janes 2006-07-17 10:57 UTC

This idea is somewhat different from the previous couple we've posted: it has more to do with the plumbing than with the db model.

We've produced a tool, based on structured blogging concepts, that can easily populate XML-driven databases such as Google Base. As mentioned in the first post of this series, you can go see this for yourself.

So what's the problem? This: how do we get it into Google Base?

Currently, Google Base has an "upload" model: you log in and use a browser upload to put the file into Google Base. This is great if you're some guy sitting at a computer, but not so great if you're a third-party service provider that the "some guy" has to give his Google password to!

I have two suggestions, neither difficult to implement:

  1. When specifying the upload file in Google Base, let the user be able to say that a different Google account can upload into this file. For example, allow BlogMatrix to do it.
  2. Google's good at crawling the net. Instead of specifying a file, let the user specify a URI where Google can go get the data. When the data's ready, the user (or a tool on their behalf) can ping Google that something's changed.
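To make the second suggestion a little more concrete, here is a minimal sketch in Python of what a tool might do once the data is ready. Everything in it is an assumption for illustration: Google Base has no such ping service today, and the endpoint, the `uri` parameter, and the function name are all made up.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical endpoint: no such ping service exists in Google Base today.
PING_ENDPOINT = "https://base.example.com/ping"


def build_ping_url(feed_uri: str) -> str:
    """Return the URL a tool would request to say that feed_uri has changed."""
    parts = urlsplit(feed_uri)
    # Only accept URIs that a crawler could actually fetch.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a crawlable URI: {feed_uri!r}")
    # The "uri" parameter name is an assumption, not a documented API.
    return PING_ENDPOINT + "?" + urlencode({"uri": feed_uri})
```

The point of the design is that the third party only ever tells Google where the data lives; the user's Google password never leaves the user.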

Improving Google Base: inter-record linking

David P. Janes 2006-07-17 10:27 UTC

This idea is a little more controversial and probably "out of scope" for Google Base. Nonetheless, I can see a few not-insignificant advantages. Allow Google Base records to link to other Google Base records. These links can be in one of two forms:

  • hierarchical and dependent
  • explicit, via URI

Let's deal with the second form first. Google Base could define a new Attribute Type which defines a link to another record in Google Base. Then instead of (or in addition to) creating (say) a Google Base record for a course that lists the professor and the university, it could explicitly link to the professor and university records. Now we can start doing all sorts of interesting things with our data. Obviously, link consistency is a problem, but given the fluid nature of Google Base's model, I suggest just letting the end user sort it out.

It would be nice, BTW, if these URIs could be written in such a way that they weren't dependent on having the record actually stored in Google Base. Perhaps this could be defined in terms of a URN?

The other type of link -- hierarchical and dependent -- would introduce a "container" attribute. Inside a container one could place a new set of Attribute values, forming a dependent record. When the outermost container is deleted, so are all of its dependent records. What does this get us? Well, it brings, for example, the Business Locations model back into the fold (especially if we implement simple structure also).
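The two kinds of link behave quite differently on deletion, which a small Python sketch may make clearer. This is only a toy in-memory model under my own assumptions: the `Record` and `Store` classes, the field names, and the `urn:example:` identifiers are hypothetical and say nothing about how Google Base stores anything.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    uri: str                                        # e.g. a URN, not tied to storage
    attributes: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)       # attribute name -> target URI
    children: list = field(default_factory=list)    # dependent (contained) records


class Store:
    """Toy record store (hypothetical), keyed by URI."""

    def __init__(self):
        self.records = {}

    def add(self, record):
        self.records[record.uri] = record
        for child in record.children:
            self.add(child)

    def resolve(self, record, name):
        """Follow an explicit link; returns None if the link is dangling."""
        return self.records.get(record.links.get(name))

    def delete(self, uri):
        """Delete a record and, recursively, every record it contains."""
        record = self.records.pop(uri, None)
        if record is None:
            return
        for child in record.children:
            self.delete(child.uri)
```

Deleting a container takes its dependents with it, while deleting the target of an explicit link simply leaves the link dangling for the end user to sort out.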

 

Improving Google Base: do something with GData

David P. Janes 2006-07-17 09:52 UTC

It's bad that GData and Google Base don't have a lot of overlap. Here's a few ideas:

I'm sure there's a lot more that could be done, though I doubt many of GData's "kinds" would fit nicely into Google Base's data model.

Improving Google Base: more attribute definitions

David P. Janes 2006-07-17 09:43 UTC

In the previous post, we mentioned the "person" attribute as if it were part of the Attribute Type definitions. Of course, it isn't. The definitions do mention all sorts of persons (strangely, but no doubt coincidentally, beginning with the letter "a"): actor, agent, artist, author.

It would be very useful if Google Base defined a number of very basic concepts as attributes. This would make reusing and understanding what new (and existing) definitions mean a lot easier. Here are a few suggestions for the basic types:

  • person -- yields actor, agent, and so forth
  • organization -- yields university, employer, ...
  • phone_number -- yields fax_number, home_number, ...
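As a rough illustration of what the basic types above would buy us, here is a small Python sketch. The table is just the list above written out as data; the names `BASE_OF`, `base_of`, and `attributes_of` are my own, and nothing like this exists in Google Base.

```python
# Hypothetical table: derived attribute -> basic type, taken from the list above.
BASE_OF = {
    "actor": "person",
    "agent": "person",
    "artist": "person",
    "author": "person",
    "university": "organization",
    "employer": "organization",
    "fax_number": "phone_number",
    "home_number": "phone_number",
}


def base_of(attribute: str) -> str:
    """Return the basic type of an attribute (a basic type is its own base)."""
    return BASE_OF.get(attribute, attribute)


def attributes_of(base: str) -> list:
    """Return every known attribute derived from the given basic type."""
    return sorted(name for name, b in BASE_OF.items() if b == base)
```

With a table like this, a reader that has never seen "artist" before can still treat the value as a person.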

Improving Google Base: simple structure

David P. Janes 2006-07-17 09:43 UTC

Another useful feature for Google Base would be to allow "simple" structure to be added to Attribute Types. In this simple structure, readers (i.e. Google Base) are free to move the inner structure elements "up a level" with the net result that there would be no change needed for their DB model.

For example, here's a "location" (from here):

<g:location>
1 Bank Street
Ottawa, Ontario
Canada

</g:location>

I propose they also accept: 

<g:location>
    <g:street-address>1 Bank Street</g:street-address>
    <g:locality>Ottawa</g:locality>, <g:region>Ontario</g:region>
    <g:country-name>Canada</g:country-name>
</g:location> 

The 'location' attribute gets stored exactly the way it would be in the first case (we strip the inner markup) BUT we get the additional benefit of all the new attributes AND we don't have to throw away information we already know!
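Here is a minimal Python sketch of how a reader could do this with the standard library's ElementTree. It rests on a few assumptions of mine: the function name `flatten` is made up, the namespace URI bound to the "g" prefix is my guess at the right one, and whitespace is normalized, so the flat text matches the first form only up to line breaks.

```python
import xml.etree.ElementTree as ET

# The namespace URI for the "g" prefix is an assumption for this sketch.
SAMPLE = """<g:location xmlns:g="http://base.google.com/ns/1.0">
    <g:street-address>1 Bank Street</g:street-address>
    <g:locality>Ottawa</g:locality>, <g:region>Ontario</g:region>
    <g:country-name>Canada</g:country-name>
</g:location>"""


def flatten(xml_text):
    """Return (flat_text, lifted) for an attribute with simple structure."""
    element = ET.fromstring(xml_text)
    # Strip the inner markup: join every text node, then normalize whitespace.
    flat = " ".join("".join(element.itertext()).split())
    # Move each inner element "up a level" as an attribute of its own,
    # using the local tag name without the namespace part.
    lifted = {
        child.tag.split("}", 1)[-1]: (child.text or "").strip()
        for child in element
    }
    return flat, lifted


flat, lifted = flatten(SAMPLE)
print(flat)    # 1 Bank Street Ottawa, Ontario Canada
print(lifted)
```

A reader that doesn't care about the inner structure only ever looks at the flat text, so nothing in its DB model has to change.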

One could also see this being used in the proposed "person" attribute:

<g:person>
    <g:given-name>David</g:given-name> <g:family-name>Janes</g:family-name>
</g:person>

Note that the new attribute names I'm using are based on the vCard standard.

Improving Google Base: reusing attribute definitions

David P. Janes 2006-07-16 21:28 UTC

The next several posts will be about using the Google Base data model as a language for the Semantic Web.

As outlined in this series of posts, Google Base is very flexible in defining new attributes for the database. Unfortunately, new attributes can be defined only in terms of base data types, and the meaning of the attribute is implied, not defined, by the tag name the user assigns. This is overly simplistic, inflexible, and an unnecessary restriction.

For example, let's say you want to define a new attribute for the person to contact. Since there's no "contact" defined in the standard Attribute Types, under the current model this is what you'd add:

    <gc:contact_person type="string">Johnny Chase</gc:contact_person>

("gc" is the "http://base.google.com/cns/1.0" namespace, as defined here.) Let's face it: this is pretty thin gruel. The computer knows that this is a string and -- if they can read English -- humans can infer that this is a "Contact Person". Google Base is so close and can do so much better.

We propose that Google Base should allow new Attribute Types to be defined based on existing Attribute Types. For example:

    <gc:contact base="person">Johnny Chase</gc:contact>

That is, we've defined a new type called Contact that's based on an existing Attribute Type called "person"*. Ooooooo ... very nice, very simple, and we've already gained a lot of knowledge -- from a computer's point of view -- about what "Johnny Chase" is all about. And Google Base hasn't lost anything either -- deep down, it knows it's just a string.

* We know: "person" isn't actually a defined Attribute Type. See the next message.
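To show how little a reader would need to do to support the proposed `base` attribute, here is a hedged Python sketch. The `BASE_STORAGE` table and the `describe` function are hypothetical; the only things taken from the post are the "gc" namespace URI, the `type` and `base` attributes, and the idea that a "person" is, deep down, a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: which primitive type each basic Attribute Type is stored as.
BASE_STORAGE = {"person": "string"}


def describe(xml_text):
    """Return what a reader can learn from one custom attribute element."""
    element = ET.fromstring(xml_text)
    base = element.get("base")  # the proposed attribute; None under the current model
    if base is not None:
        # A derived attribute is stored the same way as its base type.
        storage = BASE_STORAGE.get(base, "string")
    else:
        # Current model: only a primitive type is given.
        storage = element.get("type", "string")
    return {
        "name": element.tag.split("}", 1)[-1],
        "base": base,
        "storage": storage,
        "value": element.text,
    }


current = '<gc:contact_person xmlns:gc="http://base.google.com/cns/1.0" type="string">Johnny Chase</gc:contact_person>'
proposed = '<gc:contact xmlns:gc="http://base.google.com/cns/1.0" base="person">Johnny Chase</gc:contact>'
print(describe(current))
print(describe(proposed))
```

Both forms end up stored as a string, but only the second tells the computer that the value is a person.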