BlogMatrix
 

Yahoo Pipes

David P. Janes 2007-02-08 17:28 UTC

Yahoo introduced a very interesting service this morning: Yahoo Pipes. If you have a Yahoo account (and are somewhat geeky), go check it out. It's an RSS workflow system, where you can take one or more RSS feeds, do some processing and logic, and produce a new feed.

From a BlogMatrix technology point of view, it's quite interesting. It uses a similar data model to ours, what I've been calling the "Google Base" model of the semantic web: entry centric, where each entry is extended by ( attribute, value ) pairs. In particular, there's no "deep" model of attribute values with hierarchy or graphs. The beauty of this system is not only that it's fairly straightforward to do, but it's also easy to mentally grasp and thus more likely that people will actually use it. Here's the cool thing: with the BlogMatrix Platform we should be able to very quickly demonstrate feeding data into this tool (real soon) by feeding our structured data input into our RSS output!
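To make the entry-centric model concrete, here is a minimal sketch in Python. It is an illustration only, not the BlogMatrix or Yahoo Pipes implementation: the `Entry` class, its method names, and the sample attributes are all hypothetical. The point it shows is that every value is a plain string hung directly off the entry, with no nesting and no graph.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One feed entry plus a flat bag of (attribute, value) pairs (hypothetical model)."""
    title: str
    link: str
    attributes: list[tuple[str, str]] = field(default_factory=list)

    def add(self, attribute: str, value: str) -> None:
        # Values are plain strings: no hierarchy, no links to other nodes.
        self.attributes.append((attribute, value))

    def get(self, attribute: str) -> list[str]:
        # An attribute may repeat, so return every matching value.
        return [v for a, v in self.attributes if a == attribute]


# Made-up sample data for illustration.
entry = Entry(title="Used bike for sale", link="http://example.com/posts/1")
entry.add("price", "150 USD")
entry.add("location", "Toronto")
entry.add("tag", "bicycle")
entry.add("tag", "for-sale")
```

Because the pairs are flat, mapping them onto extra elements of an RSS item is a simple loop, which is what makes this model easy to feed into other tools.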

I wonder if there's a way to export the programs created by Yahoo Pipes? And I wonder what this means for Teqlo? And I wonder if we can create a general web programming model on this, in a Dabble DB plugin sort of way?

Here's what people are saying:

  • Niall Kennedy:
    Yahoo! Pipes opens up some interesting possibility for feed aggregators, letting users filter out unwanted content affecting their experience. Pipes opens up a few feeds that were not practical for a human to read in the past, either due to a high volume or possibly a foreign language. My favorite operator is the location extractor which analyzes an item's text attempting to identify addresses, locations, or the URLs of popular mapping services.
  • Anil Dash:
    Most importantly, and perhaps most key to the success or failure of Pipes, are the social functions that underpin the application. With Pipes, it's easy to make your own web services public, to clone web services that others have made, or to offer your own services for others to clone. That element of social sharing of code, first pioneered by platforms like Ning, makes the open source ethos much simpler to participate in. Instead of setting up complex version control systems and submitting patches to a central repository, application cloning works on a principal of infinite forking, taking the idea of embracing failure and building it into the platform. Code 'em all, and let blogs sort 'em out.
  • TechCrunch:
    The beauty of the application is with its simplicity - a user can take any sources, user input requests or the above mentioned module and drag+drop them into place and then connect the pipes. Within minutes I had built an application (also known as a pipe, they should probably change the name as not everything can be a pipe) that would search for ‘Techcrunch’ in a variety of feeds, bring that data together, sort it and filter it for unique results. I saved the application and published it
  • Tim O'Reilly:
    Yahoo!'s new Pipes service is a milestone in the history of the internet. It's a service that generalizes the idea of the mashup, providing a drag and drop editor that allows you to connect internet data sources, process them, and redirect the output. Yahoo! describes it as "an interactive feed aggregator and manipulator" that allows you to "create feeds that are more powerful, useful and relevant." While it's still a bit rough around the edges, it has enormous promise in turning the web into a programmable environment for everyone.
  • Brady Forrest: Deconstructing a Pipe
  • Brady Forrest: The Modules For Building Pipes
  • Global Nerdy (update):
    There is one important difference between Yahoo! Pipes and those of the Unix variety: while Unix pipes were made with programmers, sysadmins and tech tinkerers in mind, Yahoo! Pipes are made to be more user friendly. While you’ll still need a tiny bit of tech savvy to use Pipes, the user interface, which allows you to visually hook up pieces of code that provide an API significantly lower the barrier to entry for creating applications — you no longer have to be coder!
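The kind of pipe TechCrunch describes above (search several feeds for a term, merge, de-duplicate, sort) can be sketched as a chain of small Python functions. This is only an analogy under stated assumptions: the function names and the in-memory sample feeds are invented here, and real Pipes modules are wired together in Yahoo's visual editor, not written as code.

```python
from itertools import chain


def union(*feeds):
    """Merge several feeds (iterables of item dicts) into one stream."""
    return chain.from_iterable(feeds)


def filter_contains(items, term):
    """Keep items whose title mentions the term, ignoring case."""
    term = term.lower()
    return (item for item in items if term in item["title"].lower())


def unique(items, key="link"):
    """Drop items whose key was already seen earlier in the stream."""
    seen = set()
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            yield item


def sort_newest_first(items):
    """Sort by ISO-format date string, newest first."""
    return sorted(items, key=lambda item: item["date"], reverse=True)


# Made-up feeds standing in for fetched and parsed RSS.
feed_a = [
    {"title": "TechCrunch on Pipes", "link": "http://example.com/1", "date": "2007-02-08"},
    {"title": "Unrelated news", "link": "http://example.com/2", "date": "2007-02-07"},
]
feed_b = [
    {"title": "TechCrunch on Pipes", "link": "http://example.com/1", "date": "2007-02-08"},
    {"title": "More from Techcrunch", "link": "http://example.com/3", "date": "2007-02-09"},
]

# Connect the modules: each stage consumes the previous stage's output.
result = sort_newest_first(
    unique(filter_contains(union(feed_a, feed_b), "techcrunch"))
)
```

Each function takes a stream of items and returns a stream of items, which is what lets the stages be connected in any order, much like boxes joined by pipes.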

Google upgrades enterprise search device

David P. Janes 2006-09-20 10:26 UTC

InfoWorld reports:

Google has enhanced its enterprise search device, doubling its capacity and adding new query capabilities.

The Search Appliance's maximum capacity has been increased to 30 million documents, up from 15 million, Google announced Tuesday. The increased capacity comes via a software upgrade that also includes search query enhancements.

The intermediate model, the GB-5005, has a 10 million document capacity, while the entry model, the GB-1001, starts at US$30,000 and can index up to 3 million documents.

I think this is one success path for webapps -- selling them directly into the enterprise.

From a Datasphere perspective, there are a few things to note:

  • by merging data + presentation, when we find a document we've often found the data
  • by using weblog presentation to refer to a proprietary document (using attachments/enclosures controlled by a blog CMS), we have a well defined bridge from the HTML world to the proprietary data world

I was going to write something here about searching vs. tagging vs. directories, but suffice it to say that they each bring something to the table and all are required components of the datasphere. (Interesting note: the weakest structuring part of the blogosphere is the directory; thanks to Google, the strongest is the search).

RDF Semantic web research isn't working

David P. Janes 2006-09-16 20:49 UTC

Zack Rosen has a post called "RDF Semantic web research isn't working". It's a very easy read and yet so packed full of interesting points that I won't quote any of it and will just say "go read it".

A few additional comments:

  • many SW people "don't get it". Sorry, we don't model the world in triples so starting your sales pitch with that just doesn't cut it; and I'm not an idiot for disagreeing with you
  • the SW missed a wonderful opportunity by not jumping on the "mashup" bandwagon, where it would have been a natural fit for arbitrary data passing between apps (rather than crud-o hand rolled XML formats)
  • every page produced by the BlogMatrix Platform has a corresponding XML/RDF page. Placing the structured data into the RDF shouldn't be too difficult, but I'm really not going to make the effort if there isn't the demand
  • I'm working on articulating an alternative vision to the Semantic Web called the Datasphere, built on microformats (for data sharing), structured blogging (for ad hoc data creation), tagging (for fluid structure) and directories (for inherent structure). Stay tuned.

MIT Technology Review 2006 Young Innovator: Joshua Schachter

David P. Janes 2006-09-08 23:42 UTC

Joshua Schachter is the inventor of del.icio.us, the most well-known social bookmarking site. Technology Review has his story here. I like this explanation of "folksonomy":

What del.icio.us's users were creating--without necessarily knowing they were doing so--was what technology blogger Thomas Vander Wal has dubbed a "folksonomy," a flexible system of organization that emerges organically from the choices users make. We're all familiar with the alternative, the kind of rule-bound, top-down classification scheme that Internet theorist Clay Shirky calls "ontological" in nature. The Dewey decimal system is an example: every object is assigned its place in a hierarchical system of organization, and every object is defined as, ultimately, one thing: a book goes in one place in the library and nowhere else. In a folksonomy, by contrast, definitions are fuzzier. With del.icio.us, the same Web page has many different tags, which often aren't even related to one another, and no explicit rules are being followed. Web pages are therefore listed not in one place but in many places, and sometimes pages aren't quite where you might expect them to be. So folksonomies are messier than "ontologies" are.

What del.icio.us has shown, though, is that folksonomies' imperfections are outweighed by their benefits. In the first place, folksonomies are dynamic rather than static. A Web folksonomy thus allows us to reclassify content according to our changing interests. An academic paper that's interesting today might be equally interesting a decade from now--but why it's interesting, why people care about it, might be very different. A traditional categorization system has a hard time dealing with this: once the essence of an object is defined, it's supposed to be defined for good. In a folksonomy, the reclassification happens almost automatically--as people start tagging the paper with new, more relevant tags, for example. Web folksonomies are also better at capturing the multiple meanings and uses that a given site has, rather than constraining the possible range of meanings.

The ability of tags to fluidly organize data is why I identify tagging as a key technology for creating the datasphere.
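As a rough illustration of the folksonomy idea quoted above, here is a small Python sketch of a tag index. It is not how del.icio.us is built; the `TagIndex` class and the sample URLs and tags are hypothetical. It shows the two properties the article stresses: one page lives under many tags at once, and reclassifying is just adding another tag later.

```python
from collections import defaultdict


class TagIndex:
    """A two-way index between pages and free-form tags (hypothetical)."""

    def __init__(self):
        self.pages_by_tag = defaultdict(set)
        self.tags_by_page = defaultdict(set)

    def tag(self, url, *tags):
        # No fixed vocabulary and no single "correct" place for a page:
        # any user-chosen tag is accepted.
        for name in tags:
            self.pages_by_tag[name].add(url)
            self.tags_by_page[url].add(name)


index = TagIndex()
index.tag("http://example.com/paper", "networking", "research")
index.tag("http://example.com/blog", "research")
# Years later the same paper becomes interesting for a new reason.
index.tag("http://example.com/paper", "history")
```

Contrast this with a hierarchy such as the Dewey decimal system, where the equivalent structure would map each page to exactly one slot.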

 

Adrian Holovaty on the future of newspapers, with a Datasphere component

David P. Janes 2006-09-08 03:40 UTC

One of the inspirations behind this software and the concept of the datasphere is Adrian Holovaty's Chicago Crime, which I first saw at Mashup Camp 1. Heavily data driven, almost everything in CC is a link -- you can freely navigate through the data by clicking around. I'm not sure if it's strictly speaking a datasphere application, because it doesn't bubble up the data for reuse in the HTML, but one could fairly easily envision how it could in the future.

Adrian has a great post about where he thinks newspapers should be going, and it's directly applicable to the concept of a datasphere:

But it doesn't stop at those obvious examples. If you take some time to examine what sort of information newspaper journalists collect, the amount of structure will jump at you. If I may take the liberty of giving examples from Web sites I've worked for:

See the theme here? A lot of the information that newspaper organizations collect is relentlessly structured. It just takes somebody to realize the structure (the easy part), and it just takes somebody to start storing it in a structured format (the hard part).

Note that Adrian's mostly talking about recording the structure behind information so new applications can be developed. But the beautiful thing about bubbling up the information into the HTML is you can start cross linking data between different sources (i.e. mashups!).

Datasphere: key concepts

David P. Janes 2006-09-08 03:27 UTC

Here are the key concepts and technologies that go into creating the datasphere, from a "blog-centric" point of view.

  • Structured Blogging: allows ad-hoc but well-defined data elements to be added to posts
  • Tagging: fluidly binds posts and other data sources together across the enterprise
  • Microformats: the mechanism to mine and reuse the contents of the datasphere, to consume data and to mash it up into new applications
  • RSS/Atom/Syndication: provides messaging, efficient updating and notification
  • OPML/Directories: imposes a hierarchy of well-defined structure on top of HTML pages, weblogs, groups and so forth

Blogs aren't the only route into the datasphere, just the most convenient to talk about. Wikis should be included, and there is no reason why traditional databases and legacy systems could not directly export or translate their data into the datasphere, not unlike how the typical intranet was populated with data in Web 1.0 days.

We'll break out each of these technologies later and explore how they will work individually and together.

Introducing the Datasphere

David P. Janes 2006-09-08 03:18 UTC

I've been obsessing recently about microformats, tagging and structured data; you're looking at the result right here.

Yesterday I stumbled on the right word to bind these concepts together: the datasphere. Rather than directly defining it, let's start with its root derivation, the blogosphere:

Blogosphere is the collective term encompassing all blogs as a community or social network. Many weblogs are densely interconnected; bloggers read others' blogs, link to them, reference them in their own writing, and post comments on each others' blogs. Because of this, the interconnected blogs have grown their own culture.

The datasphere is like that, but for HTML documents containing data. It is an interconnected and highly linked web of documents that has both human and machine readable data. I'll be exploring the concepts needed to make the datasphere in the coming days and weeks, as it builds upon a lot of the work we've been doing at BlogMatrix.
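To show what "both human and machine readable" can mean in practice, here is a hedged sketch that pulls data back out of an HTML fragment marked up with hCard-style class names (`vcard`, `fn`, `org`). The `ClassTextExtractor` class and the sample HTML are made up for illustration, it uses only Python's standard `html.parser`, and it is far simpler than a real microformat parser.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class names a wanted property."""

    def __init__(self, properties):
        super().__init__()
        self.properties = set(properties)
        self.stack = []   # one entry per open tag: matched property or None
        self.found = {}   # property name -> list of text values

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.properties), None)
        self.stack.append(match)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text counts only if the innermost open element is a wanted property.
        text = data.strip()
        if text and self.stack and self.stack[-1]:
            self.found.setdefault(self.stack[-1], []).append(text)


# A person reads this as a sentence; a program reads it as two fields.
html = """
<div class="vcard">
  <span class="fn">David P. Janes</span> works at
  <span class="org">BlogMatrix</span>.
</div>
"""

parser = ClassTextExtractor(["fn", "org"])
parser.feed(html)
```

The same page serves both audiences: the browser ignores the class names, while the extractor ignores everything except them.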

As a historical note, I certainly can't take credit for the word datasphere. It comes from Dan Simmons's novel Hyperion.