Posts tagged
gbase —
all matching results
Michael Fagan has spotted that Google Base (which we have written extensively about) is now accessible through the GData API.
We'll have a longer post about this soon. Quick notes:
-
this means Google Base is now readable in a structured fashion -- this is good, very good
-
missed by most coverage so far, but almost as important: AuthSub lets third-party applications access Google Base on your behalf. BlogMatrix will be all over this!
Search Engine Watch is reporting (hat tip: Scoble) that Google Base is now providing RSS feeds. For example, this page has this feed (the RSS icon is in the upper right-hand corner). Alas, it isn't providing the "Google Base data" in these feeds. If you want to know why this would be good, read on.
I hope you enjoyed this series of posts about Google Base and found it at least a little bit useful. I'm sure there are a few mistakes, and I'll correct them as I -- or you, there's a comments section, you know -- find them.
Everything I've written about Google Base is here.
Broken links are now fixed.
What is the Semantic Web? Here's Wikipedia's definition, which is probably as good as any, but a good working definition is a layer of the World Wide Web that is meant to be read and understood by computer programs (as opposed to the traditional web, where humans are the end consumer). I believe the Google Base data model provides an excellent addition to the tools and languages currently being used to bootstrap the Semantic Web. In particular:
- The GBase data model is easy to produce.
This is a huge advantage. When I read articles like this (Danny Ayers comments) about the semantic web, I get the impression of ivory towers and massive queries taking weeks to write to query parts of protein databases. Maybe that's not fair, but my vision of the Semantic Web is something much more personal, something almost trivial to produce as a byproduct of day-to-day activities, such as blogging, wikiing, e-mailing and so forth.
- The GBase data model is easy to consume
- The GBase data model is easy to transform into RDF (or anything else)
- The GBase data model is easy to understand (RDF's biggest problem, ahem: "A triple can simply be described as three URIs. A language which utilises three URIs in such a way is called RDF" -- that explains a lot!)
- There is a lot of data being produced by a lot of different people for Google Base
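To make the "easy to transform into RDF" point a bit more concrete, here is a minimal sketch in Python. It assumes a Google Base item is just a flat set of attribute/value pairs and writes each pair out as one N-Triples line. The predicate namespace and the item URI are placeholders I made up for illustration; they are not an official Google Base vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespace for predicates; not an official Google Base vocabulary.
PREDICATE_NS = "http://example.org/gbase/attr/"


def item_to_ntriples(item_uri, attributes):
    """Turn a flat {attribute: value} record into a list of N-Triples lines."""
    lines = []
    for name, value in attributes.items():
        # Escape backslashes, double quotes and newlines so the literal stays valid.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'<{item_uri}> <{PREDICATE_NS}{quote(name)}> "{literal}" .')
    return lines


# Attribute names borrowed from the sales-lead feed; the item URI is made up.
triples = item_to_ntriples(
    "http://example.org/items/1",
    {"product_type": "Penny", "percent_closed": 0},
)
for line in triples:
    print(line)
```

Every value becomes a plain string literal here. A fuller version would probably map the Google Base data types onto typed literals, but that mapping is not something Google has defined as far as I know.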
There are a few things that would greatly improve the utility of Google Base and its data model:
- Google should export its database in XML
- Google should consider modifying (upgrading?) its data model as per the suggestions below
- We need to see "open" or at least non-Google consumers of GBase data
- We need more Google Base data producers. BlogMatrix is doing its part. (We also produce RDF and N3, thank you very much).
I know the word "language" probably isn't the right one, but it feels right to me.
This idea is somewhat different from the previous couple we've been posting, and more related to the plumbing than the db model. We've produced a tool, based on structured blogging concepts, that can easily populate XML-driven databases such as Google Base. As mentioned in the first post of this series, you can go see this for yourself. So what's the problem? This: how do we get it into Google Base? Currently, Google Base has an "upload" model, where one logs in and uses browser upload to put the file into Google Base. This is great if you're some guy sitting at a computer, but not so great if you're a third-party service provider that the "some guy" has to give his Google password to! I have two suggestions, neither difficult to implement:
- When specifying the upload file in Google Base, let the user say that a different Google account can upload into this file. For example, allow BlogMatrix to do it.
- Google's good at crawling the net. Instead of specifying a file, let the user specify a URI where Google can go get the data. When the data's ready, the user (or a tool on their behalf) can ping Google that something's changed.
This idea is a little more controversial and probably "out of scope" for Google Base. Nonetheless, I can see a few non-trivial advantages. Allow Google Base records to link to other Google Base records. These links can be in one of two forms:
-
hierarchical and dependent
-
explicit, via URI
Let's deal with the second form first. Google Base could define a new Attribute Type which defines a link to another record in Google Base. Then instead of (or in addition to) creating (say) a Google Base record for a course that lists the professor and the university, it could explicitly link to the professor and university records. Now we can start doing all sorts of interesting things with our data. Obviously, link consistency is a problem, but given the fluid nature of Google Base's model, I suggest just letting the end user sort it out.
It would be nice, BTW, if these URIs could be written in such a way that they weren't dependent on having the record actually stored in Google Base. Perhaps this could be defined in terms of a URN?
The other type of link -- hierarchical and dependent -- would introduce a "container" attribute. Inside a container one could place a new set of Attribute values. When the outermost container is deleted, so would all dependent records. What does this get us? Well, it brings for example the Business Locations model back into the fold (especially if we implement simple structure also).
It's bad that GData and Google Base don't have a lot of overlap. Here's a few ideas:
I'm sure there's a lot more that could be done, though I doubt many of GData's "kinds" would fit nicely into Google Base's data model.
Another useful feature for Google Base would be to allow "simple" structure to be added to Attribute Types. In this simple structure, readers (i.e. Google Base) are free to move the inner structure elements "up a level" with the net result that there would be no change needed for their DB model.
For example, here's a "location" (from here):
<g:location>
1 Bank Street
Ottawa, Ontario
Canada
</g:location>
I propose they also accept:
<g:location>
<g:street-address>1 Bank Street</g:street-address>
<g:locality>Ottawa</g:locality>, <g:region>Ontario</g:region>
<g:country-name>Canada</g:country-name>
</g:location>
The 'location' attribute gets stored exactly the way it would be in the first case (we strip the inner markup) BUT we get the additional benefit of all the new attributes AND we don't have to throw away information we already know!
One could also see this being used in the proposed "person" attribute:
<g:person>
<g:given-name>David</g:given-name> <g:family-name>Janes</g:family-name>
</g:person>
Note that the new attribute names I'm using are based on the vCard standard.
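Here's a rough sketch of how a reader could handle this "simple" structure, written in Python with the standard library's ElementTree. It strips the inner markup to get the flat value and, at the same time, moves each inner element "up a level" as its own attribute. The namespace URI for the "g" prefix is my assumption (it isn't shown in these posts), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the "g" prefix; the posts do not show the declaration.
G_NS = "http://base.google.com/ns/1.0"


def flatten(xml_text):
    """Return (flat_text, inner_attributes) for one structured attribute."""
    root = ET.fromstring(xml_text)
    # Join all text with the inner markup stripped, collapsing runs of whitespace.
    flat = " ".join("".join(root.itertext()).split())
    # Promote each inner element "up a level", keyed by its local tag name.
    inner = {
        child.tag.split("}")[1]: (child.text or "").strip()
        for child in root
    }
    return flat, inner


sample = f"""<g:location xmlns:g="{G_NS}">
<g:street-address>1 Bank Street</g:street-address>
<g:locality>Ottawa</g:locality>, <g:region>Ontario</g:region>
<g:country-name>Canada</g:country-name>
</g:location>"""

flat, inner = flatten(sample)
print(flat)
print(inner)
```

Note that this version collapses the line breaks into single spaces, so the flat value is not byte-for-byte identical to the unstructured example. Whether that matters depends on how the location type is actually parsed, which I haven't tested.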
In the previous post, we mentioned the "person" attribute as if it were part of the Attribute Type definitions. Of course, it isn't. The definitions do mention all sorts of persons (strangely, but no doubt coincidentally, beginning with the letter "a"): actor, agent, artist, author.
It would be very useful if Google Base defined a number of very basic concepts as attributes. This would make reusing and understanding what new (and existing) definitions mean a lot easier. Here are a few suggestions for the basic types:
-
person -- yields actor, agent, and so forth
-
organization -- yields university, employer, ...
-
phone_number -- yields fax_number, home_number, ...
The next several posts will be about using the Google Base data model as a language for the Semantic Web.
As outlined in this series of posts, Google Base is very flexible in defining new attributes for the database. Unfortunately, new attributes have to be defined in terms only of base data types and the definition of the type is implied, not defined, by the tag name the user assigns. This is overly simplistic, an unnecessary restriction and inflexible.
For example, let's say you want to define a new attribute. Let's say the person to contact. Since there's no "contact" defined in the standard Attribute Types under the current model, this is what you'd add:
<gc:contact_person type="string">Johnny Chase</gc:contact_person>
("gc" is the "http://base.google.com/cns/1.0" namespace, as defined here.) Let's face it: this is pretty thin gruel. The computer knows that this is a string and -- if you can read English -- humans can infer that this is a "Contact Person". Google Base is so close and can do so much better.
We propose that Google Base should allow new Attribute Types to be defined based on existing Attribute Types. For example:
<gc:contact base="person">Johnny Chase</gc:contact>
That is, we've defined a new type called Contact that's based on an existing Attribute Type called "person"*. Ooooooo ... very nice, very simple, and we've already gained a lot of knowledge -- from a computer's point of view -- about what "Johnny Chase" is all about. And Google Base hasn't lost anything either -- deep down, it knows it's just a string.
* we know. See the next message.
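To show how little machinery the "base" idea needs, here's a small Python sketch. It keeps a table of which Attribute Type is based on which, and walks that table until it reaches a basic data type. The table contents are hypothetical ("person" is not a real Google Base type, as noted), and nothing here reflects how Google Base is actually implemented.

```python
# Hypothetical registry: each attribute type names the type it is based on.
BASE_OF = {
    "person": "string",
    "contact": "person",
    "actor": "person",
}

# A subset of the basic data types, enough for the illustration.
PRIMITIVES = {"string", "int", "float", "date", "url", "boolean", "location"}


def type_chain(name):
    """Return the list of types from `name` down to a basic data type."""
    chain = [name]
    while chain[-1] not in PRIMITIVES:
        parent = BASE_OF.get(chain[-1])
        # Stop on unknown types and on definitions that loop back on themselves.
        if parent is None or parent in chain:
            raise ValueError(f"cannot resolve type {name!r}")
        chain.append(parent)
    return chain


print(type_chain("contact"))
```

A consumer that has never heard of "contact" can still see that it is a kind of "person", and the database can keep storing the last element of the chain, the plain string.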
If you're researching Google's various APIs, you're bound to come across something called the "Google Data API" aka GData. It describes itself as:
The Google data APIs ("GData" for short) provide a simple standard protocol for reading and writing data on the web. GData combines common XML-based syndication formats (Atom and RSS) with a feed-publishing system based on the Atom publishing protocol, plus some extensions for handling queries.
It's a lot more than a protocol though. It also defines a data model ("kinds") for populating commonly used elements. Here's some of the types:
- gd:comments
- gd:contactSection
- gd:email
- gd:entryLink
- gd:feedLink
- gd:geoPt
- gd:im
- gd:originalEvent
- gd:phoneNumber
- gd:postalAddress
- gd:rating
- gd:recurrence
- gd:recurrenceException
- gd:reminder
- gd:when
- gd:where
- gd:who
These elements have deep structure, attributes and other such things. What does it have to do with the Google Base model? Easy to answer: nothing. This is very, very unfortunate, and it's probably as good a sign as any that Google's becoming a pretty big company, like IBM or Microsoft. What use does Google have for GData? Its main purpose at this time is to allow outside tools to populate Google Calendar. We can only hope this will somehow be merged or made consistent with the Google Base model and API.
One strange discontinuity of the Google Base data model is bulk uploading "business locations". Unlike all other Google Base items, these cannot be uploaded in an RSS/Atom XML file. Instead, they must be uploaded using a CSV spreadsheet file. For completeness, we shall outline the data model used here:
-
STORE_CODE (a unique user defined store code)
-
ADDRESS_LINE_1
-
CITY
-
STATE
-
POSTAL_CODE
-
COUNTRY_CODE
-
MAIN_PHONE
The meaning of these should be fairly self-explanatory. Strangely, bulk uploading store locations is only available in the US and the UK. What's wrong with the rest of us?
The reason for the discontinuity is that the complete set of all addresses for a store is considered a single Google Base item. As you can easily see, this wouldn't easily map back into the low-structure XML definitions all other Google Base items are using.
We'll have suggestions in an upcoming post on how this part of the data model could (and should) be brought back into the fold.
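For anyone who wants to produce such a file programmatically, here's a minimal Python sketch using the standard csv module. It assumes the columns appear in the order listed above and that the first row is a header; I haven't verified either against Google's uploader, and the sample store is made up.

```python
import csv
import io

# Column names from the list above; the order is an assumption.
COLUMNS = [
    "STORE_CODE",
    "ADDRESS_LINE_1",
    "CITY",
    "STATE",
    "POSTAL_CODE",
    "COUNTRY_CODE",
    "MAIN_PHONE",
]


def locations_to_csv(rows):
    """Serialize a list of store dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A made-up store, purely for illustration.
csv_text = locations_to_csv([
    {
        "STORE_CODE": "S001",
        "ADDRESS_LINE_1": "1 Example Street",
        "CITY": "Springfield",
        "STATE": "IL",
        "POSTAL_CODE": "62701",
        "COUNTRY_CODE": "US",
        "MAIN_PHONE": "555-0100",
    },
])
print(csv_text)
```

Each row is one address, and the whole file together is what Google Base treats as a single item.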
In this series of posts on Google Base (read them all), we've been describing parts of the Google Base data model. In this post, we'll attempt to put it all together. It's important to note that you can try out just about everything we're discussing here using the Google web UI (if you have a Google login, which you probably do):
Google Base is fairly well documented. We've been using the "bulk upload" section to find most of the info we've been discussing in this series. Here are the important docs if you want to read through them:
Now, onto our summary. So far we've learned:
Additionally:
-
the Google Base data model is very simple -- there is virtually no structure except "this item has these attributes"
-
i.e. not unlike a fairly standard non-nested struct definition in C, or a row in a CSV database
-
there is no nested structure, except what is defined in the basic Data Types
-
the Google Base data model is very flexible, within the bounds of the previous points. You're free to invent anything or add anything to anything else, as long as it's built on the basic Data Types
Next, we'll discuss a few outliers and where this data model could go in the future.
The front page of Google Base gives you two basic choices to "Post an item": "Choose an existing item type" or "Create your own item type". The second option indicates the basic flexibility of Google Base, that there's little to a Google Base "information type" beyond being a collection of Attribute Types (attributes discussed here, the basic types that attributes are composed of are discussed here).
The predefined Information Types are:
-
Information type (template, example)
-
Course schedules (template, example)
-
Events (template, example)
-
Jobs (template, example)
-
Housing (template, example)
-
News and articles (template, example)
-
People profiles (template, example)
-
Products (template, example)
-
Recipes (template, example)
-
Research studies and publications (template, example)
-
Reviews (template, example)
-
Services (template, example)
-
Travel (template, example)
-
Vehicles (template, example)
-
Wanted ads (template, example)
Note that this list (from here) seems to be out of date with what Google is actually supporting. For example, there's a Podcast information type available on the main search page.
A description of which Attribute Types are expected in each Information Type is found here. Google encourages you to define as many applicable Attribute Types as possible when filling in items to "greatly increase your item’s chances of showing up in search results".
The main difference (as far as I can tell) between the predefined Information Types and your own is that Google actually is planning to do stuff with the predefined types: i.e. houses for sale, geographic searches, and so forth. However (and again, as far as I can tell) all items stored in Google Base are eligible to show up in Google Base search results.
And finally, you can add any predefined Attribute Type to any Information Type record, even if it isn't formally defined there.
Google Base defines a standard set of "attributes", built upon the basic data types previously mentioned here. These attributes are used to define "information types", each of which is basically a complete logical record in the Google Base db (we'll explore these soon). This list is open-ended, in that Google can and almost certainly will define more attributes to go into this list (as Google adds more information types). You are free to reuse these attributes in an information type. These are all pretty self-explanatory; click on any of the links to get more information.
Google Base (as far as I can tell) defines everything in terms of a few underlying basic data types. These are defined here* and are:
- string
- int
- float
- intUnit
- floatUnit
- date
- dateTime
- dateTimeRange
- url
- boolean
- location
When you're editing a Google Base item the types you can dynamically use in the web UI are:
- text
- number-unit
- number
- date-range
- large text
- web url
- checkbox
- location
If you squint closely enough, you can see this list more or less maps back to the underlying types. In addition, Google Base provides for enumerations, which are strings restricted to a list. For example, the salary_type enumeration takes one of two values: “starting” or “negotiable”. I have not performed any experiments yet to see if Google Base actually enforces the enumeration.
* I know this page is defining something else, but it's all the same database, isn't it?
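Since I don't know whether Google Base enforces enumerations, here's what a client-side check could look like in Python. It's only a sketch: it covers three of the basic types, the function and table names are made up, and the only enumeration it knows is the salary_type example mentioned above.

```python
# The one enumeration mentioned in the post; others would be added the same way.
ENUMERATIONS = {
    "salary_type": {"starting", "negotiable"},
}

# Parsers for a few of the basic data types. The remaining types (dates,
# units, locations) are left out because I haven't checked their formats.
PARSERS = {
    "string": str,
    "int": int,
    "float": float,
}


def check_value(name, type_name, text):
    """Parse `text` as `type_name`, enforcing any enumeration defined for `name`."""
    text = text.strip()
    allowed = ENUMERATIONS.get(name)
    if allowed is not None and text not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {text!r}")
    # int() and float() raise ValueError themselves on malformed input.
    return PARSERS[type_name](text)


print(check_value("percent_closed", "int", " 0 "))
print(check_value("salary_type", "string", "negotiable"))
```

Anything outside the list, such as a salary_type of "huge", raises a ValueError before the value ever gets near an upload file.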
This is the first of several posts I'll be making about Google Base -- and in particular, the RSS/Atom "bulk upload" format, which extends those XML formats with additional information that allows Google Base population.
We're working on a project to demonstrate structured data for a "sales lead". In terms of standard "existing" structured elements, this has a contact person, company, phone number and address. In addition, we extend it with Product Name, Percentage Closed, Close Date and so forth. The title of the entry represents the Opportunity and the body is for other comments.
You can see an example of this here. If you're interested in what the blogmatrix.cfg for this looks like, I've attached a sample snippet.
We haven't done anything particularly clever yet. In particular, we'll be adding the ability to query against Percentage Closed using tags, items past the close date, and maybe a few other things for the demo.
What's really neat is that we can export this into our RSS feed also, using the Google Base definitions (a mix of predefined types and some we've made up on the spot). You can view the feed (for this one entry!) here or here's the important part (reformatted for readability):
<rss version="2.0">
<channel>
...
<item>
<title>
The potential to sell 10000 shiny pennies
</title>
<link>
http://home01.semantic.blogmatrix.com/:entry:home01-2006-07-13-0008/
</link>
<g:product_type>
Penny
</g:product_type>
<gc:sales_status type="string">
Still looking for a sucker
</gc:sales_status>
<gc:percent_closed type="int">
0
</gc:percent_closed>
<gc:person hcard:type="fn" type="string">
Johnny Q. Public
</gc:person>
<gc:organization hcard:type="org" type="string">
Bank of Canada
</gc:organization>
<gc:job_position hcard:type="title" type="string">
Secretary to the Undersecretary
</gc:job_position>
<g:location>
1 Bank Street
Ottawa, Ontario
Canada
</g:location>
<gc:phone_work type="string">
605-666-6666
</gc:phone_work>
</item>
</channel>
</rss>
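To show that a feed like this really is easy to consume, here's a short Python sketch that pulls the Google Base attributes back out of an item. The embedded feed is a cut-down copy of the one above with namespace declarations added so that it parses on its own. The "gc" URI is the one quoted elsewhere in this series; the "g" URI is my assumption. I've left out the elements carrying hcard:type because I don't want to guess at that namespace.

```python
import xml.etree.ElementTree as ET

NS = {
    "g": "http://base.google.com/ns/1.0",    # assumed; not shown in the post
    "gc": "http://base.google.com/cns/1.0",  # as quoted in this series
}

# Cut-down copy of the feed, with namespace declarations added.
FEED = """<rss version="2.0"
     xmlns:g="http://base.google.com/ns/1.0"
     xmlns:gc="http://base.google.com/cns/1.0">
<channel>
<item>
<title>The potential to sell 10000 shiny pennies</title>
<g:product_type>Penny</g:product_type>
<gc:sales_status type="string">Still looking for a sucker</gc:sales_status>
<gc:percent_closed type="int">0</gc:percent_closed>
</item>
</channel>
</rss>"""


def read_items(feed_text):
    """Return one {attribute: value} dictionary per <item> in the feed."""
    records = []
    for item in ET.fromstring(feed_text).iter("item"):
        record = {}
        for child in item:
            if not child.tag.startswith("{"):
                continue  # plain RSS elements such as title and link
            uri, local = child.tag[1:].split("}")
            if uri not in NS.values():
                continue  # some other extension namespace
            text = (child.text or "").strip()
            # Only "int" is converted here; everything else stays a string.
            declared = child.get("type", "string")
            record[local] = int(text) if declared == "int" else text
        records.append(record)
    return records


records = read_items(FEED)
print(records)
```

The result is the flat "this item has these attributes" record the rest of this series talks about, ready to be handed to whatever comes next.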
More to follow...
Attached Documents: