I am now posting on a regular basis on the David Janes' Code Blog. Come join
me over there for daily postings about anything and everything that interests me in the code world or even directly subscribe to my feed here.
I'm just starting a project that's using JavaServer Pages (not Java Server
Pages) aka JSP for doing "templates" -- i.e. making dynamic HTML content. The first thing I'm trying is really quite simple --
create an object and print out one of its values.
Here's the relevant class:
public class Entry {
String hour;
public String getHour () {
return this.hour;
}
}
Java/JSP is big on setters and getters, that is, functions named a certain way to get access to object attributes. There are pluses
and minuses to this, but from a conceptual level I thought it was quite clear that we are thinking "Entry objects have an
attribute 'hour'", even if we have to use setHour/getHour to access values.
Now, in Cheetah, Django (and
probably Rails, though I haven't looked) you'd access the attribute 'hour' in a
template language as follows:
e.hour
Simple -- the last thing you want to be doing in an HTML template is dumping tons of programming language code into it,
making it hard to read and maintain. Cheetah is quite clever about this: it looks at 'e' and does all sorts of introspection to
figure out how to get 'hour', so you don't have to do that yourself.
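To make the introspection idea concrete, here is a minimal Python sketch of how a template engine could resolve 'hour' on an object. It is not Cheetah's actual lookup code; the `resolve` function and the Python `Entry` stand-in are hypothetical names used only for illustration.

```python
class Entry:
    """Python stand-in for the Java Entry class above."""

    def __init__(self, hour):
        self._hour = hour

    def getHour(self):
        return self._hour


def resolve(obj, name):
    """Look up `name` on `obj` the way a forgiving template engine might."""
    # 1. Dictionary-style access: {"hour": ...}
    if isinstance(obj, dict) and name in obj:
        return obj[name]
    # 2. Plain attribute (or zero-argument method) called `name`
    if hasattr(obj, name):
        value = getattr(obj, name)
        return value() if callable(value) else value
    # 3. JavaBean-style getter: hour -> getHour()
    getter = getattr(obj, "get" + name[:1].upper() + name[1:], None)
    if callable(getter):
        return getter()
    raise AttributeError(f"cannot resolve {name!r} on {type(obj).__name__}")


e = Entry("09:00")
print(resolve(e, "hour"))                  # resolved through getHour()
print(resolve({"hour": "10:00"}, "hour"))  # resolved through the dict key
```

The template author only ever writes the short name; the engine decides whether that means a key, an attribute or a getter.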
So how do we do this in JSP (like in 2008)?
e.getHour()
Give me a break. Maybe there's another way to do this? If there are performance reasons to be wedded to this format (though I
can't see it since everything is compiled) why not invent a new syntax like
e::hour
Here are a few notes I made while working my way through Django's tutorial:
Installation
Installation of Django is easy. Just follow the instructions on this page:
It can either be downloaded as a tarball or via SVN. We've chosen the tarball (0.92.2) option.
Nitpicking, the tarball URL doesn't end with the name of the tarball - it's a directory. This is only slightly annoying.
Our environment has a per-user Python installation, so we don't need to use "sudo" to install the Django packages.
Testing (Part 1)
We are following the instructions on this page:
Running a development webserver
Django has options for running under mod_python, WSGI or using a standalone built in webserver. The built-in webserver is
documented as not-for-production, but it's good enough to get going so we're going to play with that for now. Eventually we
expect to use both the mod_python (because we have it here) and WSGI options (because it's both the way to go and the most
efficient).
We immediately ran into issues running the webserver because it's not on the "localhost" - the browser said it couldn't find
the server. In our environment we ssh to a linux server and access that from our desktop computer. After a little googling, the
trick turns out to be adding an "IP:port" argument to the Python command:
python manage.py runserver 192.168.1.10:8000
Connecting to the DB
Next in the instructions is connecting to the DB. Again, we have issues here - no fault of Django - because of the
uniqueness of the local environment. Like Python we run MySQL on a per-user basis so we need to be able to specify a fairly
unique setup with a TCP/IP and UNIX domain socket. It's not clear initially how to specify the path for the UNIX domain socket
but fortunately the stack traceback and clearly written code come to our rescue here: you use the DATABASE_HOST and the code
looks to see if it's a path.
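As a sketch of what that looks like in settings.py, using the setting names of this era of Django: the database name, user, password and socket path below are all hypothetical placeholders, not values from our environment.

```python
# settings.py fragment (old-style DATABASE_* setting names)
DATABASE_ENGINE = "mysql"
DATABASE_NAME = "mysite"        # hypothetical database name
DATABASE_USER = "django"        # hypothetical user
DATABASE_PASSWORD = "secret"    # hypothetical password
# A value starting with "/" is treated as the path to a UNIX domain
# socket rather than as a hostname.
DATABASE_HOST = "/home/me/mysql/mysql.sock"   # hypothetical socket path
DATABASE_PORT = ""              # not needed when connecting via a socket
```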
Creating the DBs
The ‘syncdb' command asked if I'd like to create a super user for the DB. I said no and everything seems to be working - a bunch of
tables show up in MySQL as promised.
Creating an App
An "app" is something that does something or something like that. The command works as promised.
Create the Model and DB tables
Works as promised. There appears to be some underlying cleverness happening as (for example) the "sqlclear" command knows
whether the tables are there or not.
The "sqlinitialdata" refers you to another command; this is probably a mismatch between the code and the tutorial.
Interactive Shell
This is cute - running "python manage.py shell" will drop you in the environment that the webserver sees so you can do stuff
on the command line to see how it will work.
Stylistically, I differ from Django in that I (almost) never do anything except ‘import x', as I prefer to use the dotpath
and not have to guess when I'm skimming code as to where functions are coming from.
As an aside, it would be cool if Python examples did the ‘>>>' as an image or as a CSS trick so they don't get
copied when cut-and-pasting examples.
Testing (Part 2)
And we're on to page 2 of the tutorial:
Admin Interface
And we get our first "uh oh" - we didn't create a "superuser account" way back above, well, just because. Fortunately,
Lord Google knows all and we quickly find out that you can run Python commands to do this and we're in business:
$ python manage.py shell
>>> from django.contrib.auth.create_superuser import createsuperuser
>>> createsuperuser()
Thanks to here for the tip (which also looks like interesting reading):
Admining the App
Nice - add an inner class declaration to App "Poll" and it now shows up in the Admin interface and we can do stuff with it.
Neato.
I've also just discovered that I don't need to stop and restart the app - it's using reload().
Adding __str__ to the model makes the Admin app useful; if you do something wrong the HTML function is really quite
clear.
You can change the way the Admin interface displays Poll by adding a ‘fields' declaration. This is a little bizarre - it's a
tuple, of tuples with a dictionary inside. Why not a list of dictionaries?
Bug: if you use the ‘classes/collapse' option of the admin interface and there's a bug while saving an item while the item
is collapsed, there's no indication of where the error is.
Changing Templates
The instructions are a little confusing - pay attention - but it works as advertised. There's a command called ‘adminindex'
which dumps out template code but I'm somewhat confused by it and I wish there was a little more detail here.
Testing (Part 3)
And we're on to page 3 of the tutorial.
Design your URLs
Django converts URLs coming from user requests into actions by:
-
looking at ROOT_URLCONF in settings.py
-
loading mysite/urls.py (as defined in the settings)
-
sequentially looking at regular expressions
-
loading a module - typically a view - and calling a function defined by the matching regular expression. This is
expressed as a dotpath - e.g. ‘x.y.z' will load module ‘x.y' and call function ‘z'
-
the function is called with a request object and all the named groups from the regular expression - clever
-
also note that:
-
you cannot filter on the hostname in the URL (problem?)
-
regular expressions are compiled (and thus are very fast)
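The dispatch steps above can be sketched in a few lines of Python. This is an illustration of the idea, not Django's resolver: `dispatch`, `URLPATTERNS` and the `demo_views` module are hypothetical names, and the "request" is just a string.

```python
import importlib
import re
import sys
import types

# A stand-in "views" module so the sketch is self-contained.
views = types.ModuleType("demo_views")

def detail(request, poll_id):
    return f"{request}: poll {poll_id}"

views.detail = detail
sys.modules["demo_views"] = views

# (compiled regular expression, dotpath) pairs, checked in order.
URLPATTERNS = [
    (re.compile(r"^polls/(?P<poll_id>\d+)/$"), "demo_views.detail"),
]

def dispatch(path, request):
    for pattern, dotpath in URLPATTERNS:
        match = pattern.search(path)
        if match is None:
            continue
        # 'x.y.z' -> import module 'x.y', call function 'z'
        module_name, func_name = dotpath.rsplit(".", 1)
        view = getattr(importlib.import_module(module_name), func_name)
        # The view gets the request plus every named group as a keyword.
        return view(request, **match.groupdict())
    raise LookupError(f"no pattern matches {path!r}")

print(dispatch("polls/23/", "GET"))   # GET: poll 23
```

Because the named group is passed as a keyword argument, the view's signature documents exactly which parts of the URL it needs.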
The code samples work as advertised. In a later section we'll learn that you can move all the URLs into the app so that you
(a) don't have to put all the URL information at the project level and (b) can decouple the app from the base path of the URL being used.
Write views that do something
This section explains how to return meaningful results, building on the knowledge in section 2.
-
the template loading system rocks
-
the shortcuts are very handy (i.e. lots of common actions are compressed into single function calls)
Playing with forms (Part 4)
This section is a mess. I'll probably retackle it.
Introduction
This is a repeat of an architecture I worked on several years ago for a high-performance, real-time financial risk
management system involving scaling problems not dissimilar to what Twitter is facing. It works, and it works well.
This post is inspired by the Dare Obasanjo post (http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx)
about Twitter's scaling issue, and by noting the major saving grace - most calls are looking for the last 20 items, so it
doesn't necessarily matter that Scoble has 21,000+ followers because he's never going to see the damned posts also.
This article is just a sketch of how this would work; I acknowledge the devil is in the details. Onaswarm does not use this
architecture and will hit similar walls as Twitter if it ever is used at its level. And obviously, it's all hypothetical since
I'm not actually doing this. If I was implementing this I would be doing it in Java or C++.
Overview of the problem
-
Twitter sucks, probably because of certain pathological edge cases requiring either large numbers of reads or large
numbers of writes, or both
Overview of the solution
In case you don't want to read everything below, here's what we end up with:
-
one 64Gb machine that stores recent timelines for 30m+ users
-
one 64Gb machine that stores last 200 million messages
-
a switched Gb Ethernet connection between machines
-
a number of smaller machines for easier tasks
-
uni-directionally pass messages from machine to machine
-
messages can be batched, pipelined, are small not exceeding 4K
-
10,000 requests per second (pending further analysis review)
Overview of how it works
-
one-way message passing along always-open connections
-
one processor per major task
-
keep code in the instruction cache at all costs
-
the core is a few dedicated machines passing messages across the network
-
messages are very small - a few hundred bytes to 4K. Multiple messages can (and will be) sent all at once.
Overview of benefits
-
horizontally scalable - just keep adding machines for load, replicating the entire setup if need be
-
this should work to about 30 million users
-
after about 30 million users another level is needed, but this doesn't overly complicate
-
incredibly high throughput
Overview of assumptions
-
every message has a unique ID (MID) that fits in 8 bytes
-
every user has a unique ID (UID) that fits in 4 bytes
-
the average number of friends per-user is 64 (http://www.kottke.org/07/03/twitter#26563)
-
twitter messages are 140 bytes
-
there's a couple of big-assed machines with lots and lots of memory (64Gb) and fast efficient (Gb) network connections;
these machines can be purchased for about $20,000.
-
favor frequent queries, service infrequent queries, punish highly infrequent or improbable queries
-
there may be a throttling mechanism added to this
-
this could be requiring an ACK/NACK every so many messages, or
-
a separate channel between components
-
There's a lot more going on in the API which we do not address here for brevity - in particular:
-
friend list modifications
-
notification on keywords
-
account information has to be looked up!
Networking assumptions
-
Gb Ethernet
-
4K max messages
-
1024 * 1024 * 1024 / 8 / 4096 = 32768 messages/second - let's call it 10,000 because we probably won't get peak
speed
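The networking estimate is simple enough to check directly. The sketch below just restates the arithmetic from the list above; the constant names are mine.

```python
LINK_BITS_PER_SECOND = 1024 * 1024 * 1024   # "Gb Ethernet", as in the text
MAX_MESSAGE_BYTES = 4096                    # 4K max messages

bytes_per_second = LINK_BITS_PER_SECOND // 8
peak_messages_per_second = bytes_per_second // MAX_MESSAGE_BYTES
print(peak_messages_per_second)             # 32768

# Planning for 10,000 messages/second uses roughly 30% of that peak.
PLANNING_RATE = 10_000
print(round(PLANNING_RATE / peak_messages_per_second, 2))   # 0.31
```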
Servers / Components
Every server (except the FES) has three types of thread:
-
reader threads
-
one "task" thread that does stuff that the reader asks and places the result on the writer thread's queue (might use one
per core)
-
writer threads
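A minimal Python sketch of that reader / task / writer split, using in-process queues in place of network connections. Everything here (`run_stage`, the `STOP` sentinel, a list standing in for the outbound socket) is a hypothetical simplification; a real server would be written very differently.

```python
import queue
import threading

STOP = object()  # sentinel that shuts a stage down

def reader(source, inbox):
    # Stand-in for reading messages off an always-open inbound connection.
    for message in source:
        inbox.put(message)
    inbox.put(STOP)

def task(inbox, outbox, work):
    # The single "task" thread: does the work, hands results to the writer.
    while (message := inbox.get()) is not STOP:
        outbox.put(work(message))
    outbox.put(STOP)

def writer(outbox, sink):
    # Stand-in for writing to the outbound connection to the next server.
    while (message := outbox.get()) is not STOP:
        sink.append(message)

def run_stage(source, work):
    inbox, outbox, sink = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=reader, args=(source, inbox)),
        threading.Thread(target=task, args=(inbox, outbox, work)),
        threading.Thread(target=writer, args=(outbox, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink

print(run_stage(["add twit 1", "add twit 2"], str.upper))
```

Messages only ever flow one way through the stage, which is the property the rest of the design leans on.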
FES
The "Front End Server" - i.e. probably an HTTP process that accepts web and API requests. There are lots and lots and lots
of these, as needed.
CS
The "Concentration Server" - there is one of these per N FESs, where N is something like 1024. The CS:
-
accepts requests from the FES
-
pumps the requests into the start of the pipeline
-
gets results from the end of the pipeline
-
passes the results back to the appropriate FES
These can be simple machines with a couple of Gb of memory.
DBWS
The Database Write Server:
-
writes new messages to the DB, returning the MID
-
passes on messages
The exact ratio for CS:DBWS or DBWS:TLS will have to be discovered by experiment, though I would probably base it on the
first ratio.
These can be simple machines with a couple of Gb of memory.
TLS
The "Time Line Server" - this stores:
-
the last N entries for every user, where N is something like 128
-
the friends list for every user
On a 64Gb machine:
-
100 (average number of friends) * 4 (size of UID) = 400 bytes is needed per account to store friends
-
128 (size of TL we are storing) * 12 (size of UID + MID) = 1536 bytes is needed per user
So let's say we need 2K per user - that's 33 million users. Scaling beyond this will require another level of merge sorting,
but 33 million is a good start.
MCS
The Message Cache Server - this stores the last M messages via a dictionary/hash table, where M is a big number.
On a 64Gb machine, assuming 256 bytes is needed per-message we are looking at storing 268 million messages for fast
retrieval!
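The sizing estimates for the TLS and the MCS can be checked the same way. This only restates the numbers given above (taking 64Gb to mean 64 * 2^30 bytes, which is an assumption about the author's units).

```python
GIB = 1024 ** 3
MACHINE_BYTES = 64 * GIB

# Time Line Server
friends_bytes = 100 * 4          # 100 friends * 4-byte UID
timeline_bytes = 128 * (4 + 8)   # 128 entries * (4-byte UID + 8-byte MID)
per_user = 2 * 1024              # rounded up to 2K per user
print(friends_bytes, timeline_bytes)   # 400 1536
print(MACHINE_BYTES // per_user)       # 33554432 users

# Message Cache Server
print(MACHINE_BYTES // 256)            # 268435456 messages
```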
DBRS
The DB Read Server - this backfills information the MCS could not retrieve.
The exact ratio for CS:DBWS or DBWS:TLS will have to be discovered by experiment, though I would probably base it on the
first ratio.
These can be simple machines with a couple of Gb of memory.
Operations
Add Twit
There's probably no need for the complete round trip I've outlined in the flow below.
-
FES
-
accepts "add twit" message
-
gets the UID for the user
-
sends message to the CS
-
waits for a result on its two-way socket (this is the only two-way connection)
-
CS
-
accepts "add twit" message
-
sends message to the DBWS
-
DBWS
-
accepts "add twit" message
-
writes the DB
-
adds the MID to the message
-
calls the TLS
-
TLS
-
accepts "add twit" message
-
updates user's timeline with the MID and update time, tossing off the oldest entry in the TL
-
NOTE that the text of the twit is not stored!
-
calls the MCS
-
MCS
-
accepts the "add twit" message
-
adds the twit to the cache
-
randomly or cleverly removes an old twit from the cache if memory is full
-
calls the CS
-
CS
-
accepts the "add twit" message; it knows that this is a completed operation
-
sends the result (i.e. "OK") back to the FES
And we're finished.
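The TLS and MCS steps of this flow can be sketched with ordinary Python containers. This is only an illustration: the names are hypothetical, the cache capacity is tiny so eviction is visible, and oldest-first eviction is my assumption for "randomly or cleverly removes an old twit".

```python
from collections import OrderedDict, deque

TIMELINE_LENGTH = 128      # N entries kept per user, as in the text
CACHE_CAPACITY = 3         # tiny stand-in for "M is a big number"

timelines = {}                  # TLS: UID -> deque of (update_time, MID)
message_cache = OrderedDict()   # MCS: MID -> twit text

def tls_add(uid, mid, update_time):
    # Only the MID and update time are stored, never the text.
    timeline = timelines.setdefault(uid, deque(maxlen=TIMELINE_LENGTH))
    timeline.append((update_time, mid))   # a full deque drops its oldest entry

def mcs_add(mid, text):
    message_cache[mid] = text
    if len(message_cache) > CACHE_CAPACITY:
        message_cache.popitem(last=False)  # evict the oldest cached twit

def add_twit(uid, mid, update_time, text):
    # In the flow above the DBWS would assign `mid` after writing the DB.
    tls_add(uid, mid, update_time)
    mcs_add(mid, text)
    return "OK"

for n in range(1, 5):
    add_twit(uid=7, mid=n, update_time=1000 + n, text=f"twit {n}")
print(list(timelines[7]))     # four (update_time, MID) pairs
print(list(message_cache))    # [2, 3, 4]
```

Each step touches one user's timeline and one cache slot, so the cost does not depend on how many followers the poster has.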
Computations
Note the computations involved:
-
CS: hash table operations, in memory
-
DBWS: 1 DB operation, memory + disk
-
TLS: O(1) index table operations, in memory
-
MCS: ~O(1) hash table operations, in memory
And also note the size of the message is almost certainly under 256 bytes. Note that the amount of work is constant, no
matter if you are JoeBlow-like or Scoble-like.
Get Timeline
-
FES
-
accepts "get timeline" message
-
gets the UID for the user
-
sends message to the CS
-
waits for a result on its two-way socket (this is the only two-way connection)
-
CS
-
accepts "get timeline" message
-
sends message to the TLS
-
TLS
-
accepts "get timeline" message
-
looks up all followers (i.e. the user's friends list: everyone this user follows)
-
start with the first follower, then loop through the remainder:
-
calls the MCS with the 20 MIDs to look up
-
MCS
-
accepts the "get timeline" message
-
looks up each MID
-
if all MIDs are found, calls the CS
-
if any MIDs are missing, calls the DBRS
-
DBRS (optional)
-
accepts the "get timeline" message
-
looks for all MIDs where the message could not be found
-
does a DB search for them and adds them
-
calls the CS
-
CS
-
accepts the "get timeline" message; it knows that this is a completed operation
-
sends the result (i.e. all the necessary twitter messages) back to the FES
The DBRS could optionally also route the message through the MCS again to make the MCS write them into cache!
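Here is a minimal sketch of the "get timeline" path: merge the per-user timelines held by the TLS, take the newest entries, then resolve MIDs through the cache with a database backfill. The data and function names are hypothetical, and the plain dictionaries stand in for the TLS, MCS and DB.

```python
import heapq
from itertools import islice

# TLS state: who each user follows, and each user's own recent entries
# as (update_time, MID) pairs sorted newest-first.
friends = {1: [2, 3]}
timelines = {
    2: [(105, 22), (101, 21)],
    3: [(104, 32), (102, 31)],
}
message_cache = {22: "b2", 32: "c2", 21: "b1"}   # MCS
database = {31: "c1"}                            # what the DBRS can reach

def tls_merge(uid, limit=20):
    # Merge the already-sorted per-friend timelines, newest first,
    # and stop after `limit` entries.
    streams = [timelines.get(f, []) for f in friends.get(uid, [])]
    merged = heapq.merge(*streams, reverse=True)
    return [mid for _, mid in islice(merged, limit)]

def get_timeline(uid, limit=20):
    mids = tls_merge(uid, limit)
    found = {mid: message_cache[mid] for mid in mids if mid in message_cache}
    for mid in mids:
        if mid not in found:                 # cache miss: backfill via the DBRS
            found[mid] = database[mid]
            message_cache[mid] = found[mid]  # optional write-back to the MCS
    return [found[mid] for mid in mids]

print(get_timeline(1, limit=3))   # ['b2', 'c2', 'c1']
```

Because the merge is lazy and stops at the limit, only the head of each timeline is ever examined.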
Computations
We'll assume everything's in cache, but if they're not we're adding a single database read that will return 1 to 20
entries.
Normal User
Has:
-
64 followers
-
assume average 10 integer date comparisons per timeline to do merge sort
So:
-
CS: hash table operations, in memory
-
TLS
-
O(1) index table operations, in memory - to find followers
-
640 integer comparisons
-
MCS: 20 ~O(1) hash table operations, in memory
The message from the TLS to the MCS is less than 256 bytes: (4 bytes for the UID + 8 bytes for the MID) * 20 = 240 bytes.
The message from the MCS to the CS is about 3K (140 byte message * 20 plus overhead).
Scoble-like Reader
Has:
-
10000 followers
-
assume average 4 integer date comparisons per timeline
So:
-
CS: hash table operations, in memory
-
TLS
-
O(1) index table operations, in memory - to find followers
-
40000 integer comparisons
-
MCS: 20 ~O(1) hash table operations, in memory
I.e. the only change from the previous step is the 40,000 integer comparisons, as opposed to 640. These are operating in
the processor cache due to the one-machine one-task architecture and will be computed quickly - a multi-GHz machine can
probably do hundreds of thousands of these per second.
Bottlenecks
Let's assume that we average 5000 operations in the TLS to merge timelines. With the network bottleneck at 10,000 requests
per second, we need a CPU that can handle 50 million integer operations / second which seems to be an easy fraction of modern
CPUs.
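For completeness, the CPU-side numbers work out as follows. This only multiplies the figures already given in the text.

```python
MERGE_OPS_PER_REQUEST = 5_000    # assumed average TLS work per timeline merge
REQUESTS_PER_SECOND = 10_000     # the network-bound request rate

print(MERGE_OPS_PER_REQUEST * REQUESTS_PER_SECOND)   # 50000000

# Per-request comparison counts for the two reader profiles
print(64 * 10)       # normal user: 640
print(10_000 * 4)    # Scoble-like reader: 40000
```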
In certain circumstances, Safari will hang when uploading documents, eventually causing the server to throw a read timeout
error. This has been tested against Safari 3.1.1 but apparently happens with earlier versions too.
You can tell if you are experiencing this bug if:
-
file uploads hang and eventually time out; this may happen intermittently
-
if you wait 1 (or 5 depending) minutes and then submit/upload, it works
-
you are likely working over a fast local intranet connection rather than a slower internet connection
-
the form works fine in other browsers such as Firefox
This bug is ably described here (http://lists.macosforge.org/pipermail/webkit-unassigned/2007-January/026203.html)
and apparently is something happening at a very low level of the TCP/IP stream. Unfortunately, Apple doesn't really seem to
believe this is a bug (https://bugs.webkit.org/show_bug.cgi?id=5760).
I do not have a workaround for this, except by disabling
Keep-Alive on Safari.
Add music to your web site with the MP3 Clips widget. Search
through Amazon's catalog of DRM-free MP3 music and add entire albums or select specific MP3 tracks to add to your widget. You
can also showcase the latest Bestsellers from any music genre. If that isn't enough, your MP3 Clips widget can also
automatically display the latest MP3 tracks you purchase on Amazon.com
How we'd like to use this
Once upon a time - and probably soon again - Onaswarm had an "Ads" widget in the sidebar which displayed (amongst
other things) a list of recently played MP3 tracks, as shared with us via Last.fm. Clicking on the MP3 track would bring the
reader to Amazon.com where they could play MP3 samples - and potentially purchase the track. A win-win-win for everyone
involved. Unfortunately, because of the extra steps involved plus an off-site navigation, this feature was somewhat
underutilized.
How it works
-
go to the Amazon MP3 Clips Widget page
-
enter a search for music, by album or song title (or anything). You can also select Best Sellers or Recently
Purchased.
-
a list of matching items appears, which you can add to the widget
This is all AJAX-y, so you're just working on one page. When you've selected all the items you want to appear on
the widget:
-
click Next Step
-
a widget size selector and interactive preview appear; the widget starts with the album cover displayed, clicking on this
brings you down to the individual track level
If you're happy with your widget:
-
click "Add to my web page"
-
a popin appears with the appropriate OBJECT embed code, plus explanations of how to add it to many different blogging
services
The Good
It's pretty cool, easy to use and works exactly as advertised. I could see this driving lots of MP3 sales to Amazon.
The Bad
You can't dynamically select what the widget is going to display, that is, you have to construct a widget for each set
of music you want to share. This makes it somewhat - well, totally - useless for the purpose we're outlining above. This would
be marginally tolerable if there was an API or something, but alas that option isn't there either.
The Ugly
The widget constructor doesn't work on Safari 3. It doesn't complain, it doesn't pop up errors, it just doesn't work.
If you're not logged in, it forgets all state after you log in. Come on guys.
Amazon.com won't sell MP3s to Canadians.
Summing up
Nice, but needs a few minor UI tweaks. Desperately needs a way of dynamically constructing a widget at render time.
Many enterprises - including, I would guess, most that are Microsoft-centric - use LDAP to establish user identity and
profiles. In the Web 2.0 world, the emerging standard is OpenID. Is there a way to use OpenID to provide logins within the
Enterprise but have it backed by LDAP? The obvious benefit is that one could install off-the-shelf intranet tools inside one's
organization but not have to LDAP-enable them or create a parallel account system.
The OpenID-LDAP Project (http://www.openid-ldap.org/) offers such a tool.
We're testing this on a Macintosh, but there seems to be no reason this won't work on any UNIX-y system.
Installation
First, download and unpack the code into the web server directory.
$ cd ~/Sites
$ curl --location 'http://www.openid-ldap.org/releases/openid-ldap-0.8.5-noarc.tar.gz' >openid-ldap-0.8.5-noarc.tar.gz
$ tar zxvf openid-ldap-0.8.5-noarc.tar.gz
This extracts the code into a non-versioned subdirectory called ‘openid-ldap'. It would be much better form if the
directory was called ‘openid-0.8.5'.
Interlude: Enabling PHP on Mac OS X Leopard
Leopard has PHP but it has to be explicitly enabled by editing configuration files (if you haven't enabled Apache on
your Mac, see the links below)
$ su -
# cd /etc/apache2
# vi httpd.conf
remove the hash sign on the ‘LoadModule php5_module' line
# apachectl restart
Here are some helpful links if you need more information:
Running to Stand Still
Without configuring anything, let's see what happens when we visit the page:
-
http://localhost/~davidjanes/openid-ldap/
Note that URL is Leopard's way of referencing a user's (i.e. "davidjanes") local webpage.
A webpage appears with a field for entering a username - but not a password. Entering a username - e.g. dpjanes -
redirects us to the 404 page:
-
http://localhost/~davidjanes/openid-ldap/dpjanes
... which definitely wasn't expected.
Reading through their documentation, it looks like they're mainly doing this using SSL/HTTPS and to do that one has to
add some rewrite rules to the Apache configuration. Since we're not doing that - at least not yet - we're probably using
an infrequently used code path, thus hitting a bug. Perusing the code, we see that the URL above should be internally
rewritten to:
-
http://localhost/~davidjanes/openid-ldap/index.php?user=dpjanes
To fix this we have to modify the Apache configuration again. Changing ".htaccess" does not work because Apache on Leopard
is configured with "AllowOverride None", which means the rewrites will be ignored.
$ su -
# cd /etc/apache2/users
# vi davidjanes.conf
And then we add the following:
RewriteEngine On
RewriteBase /~davidjanes/openid-ldap/
RewriteCond %{REQUEST_URI} ^/.*[/]([a-z][-a-z0-9_]*)$
RewriteRule ([A-Za-z0-9]+)$ /~davidjanes/openid-ldap/index.php?user=$1 [P]
And then
# apachectl restart
Note that these rules are predicated on the assumption that we're going to be logging in using OpenID's "uid", which will be lower
case letters, numbers, dash or underscore.
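To see which request paths the condition accepts, here is a small Python sketch that applies the same regular expression as the RewriteCond. It only models the condition test and, as a simplification, takes the user name from that pattern's capture group; in Apache the substitution value comes from the RewriteRule pattern instead. The `rewritten` helper is hypothetical.

```python
import re

# Same pattern as the RewriteCond above.
COND = re.compile(r"^/.*[/]([a-z][-a-z0-9_]*)$")

def rewritten(request_uri):
    match = COND.match(request_uri)
    if match is None:
        return None     # condition fails, so the URL is left alone
    return "/~davidjanes/openid-ldap/index.php?user=" + match.group(1)

print(rewritten("/~davidjanes/openid-ldap/dpjanes"))
print(rewritten("/~davidjanes/openid-ldap/David Janes"))   # None
print(rewritten("/~davidjanes/openid-ldap/index.php"))     # None
```

The last case shows why index.php itself is not rewritten again: the dot is not in the allowed character set.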
Configuring LDAP
This is obviously the part where we're going to part paths - everyone does LDAP their own way. We don't have an
Active Directory setup here, but we do have VMWare Fusion (http://www.vmware.com/products/fusion/) and a JumpBox for
OpenLDAP appliance (http://www.vmware.com/appliances/directory/1105) so it should be just a simple matter of figuring out the
right combination of configuration settings.
The OpenLDAP appliance has the following configuration:
-
JumpBox Name: openldap 0.9
-
Application Page: http://192.168.1.120/
-
Management Page: https://192.168.1.120:3000/
I've already configured a few accounts on this, but for example we have a user:
-
o=Directory
-
ou=users
-
cn=David Janes
In LDAP terms this gives us a "Distinguished Name", which is really the way LDAP (as I understand it) uniquely identifies
a record. In this particular case our Distinguished Name is "cn=David Janes,ou=users,o=Directory".
This user has the following configuration:
-
cn: David Janes
-
gidNumber: 1000
-
givenName: David
-
homeDirectory: /home/users/default/dpjanes
-
objectClass:
-
inetOrgPerson
-
posixAccount
-
top
-
sn: Janes
-
uid: dpjanes
-
uidNumber: 1000
We're going to use "uid" as the login ID - note that this is by no means a universal choice nor is it universally
available on all LDAP servers. I've seen LDAP servers use "name" to provide a unique identifier and it's possible - maybe even
probable - that many LDAP servers don't provide short unique names at all.
Note then how LDAP logins should probably work:
-
one provides a part of the record we are looking for, for example "uid=dpjanes", where the user at login time provides
the "dpjanes" part and the configured application prepends "uid="
-
given a starting point - the "searchdn" in the configuration below - we look for a matching record
-
when we have the matching record, we get the Distinguished Name which uniquely identifies the record and then we ask LDAP
to validate it with a password
Note that OpenID-LDAP doesn't actually work quite this way; we'll explain this further down.
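The search-then-bind flow described in the list above can be sketched as follows. This is a toy: the directory is an in-memory dictionary rather than a real LDAP server, and `search`, `bind`, `login` and the password are hypothetical stand-ins.

```python
# Toy in-memory "directory": DN -> (attributes, password).
DIRECTORY = {
    "cn=David Janes,ou=users,o=Directory": ({"uid": "dpjanes"}, "s3cret"),
}

def search(searchdn, attribute, value):
    """Return the DNs under `searchdn` whose `attribute` equals `value`."""
    return [
        dn for dn, (attrs, _) in DIRECTORY.items()
        if dn.endswith(searchdn) and attrs.get(attribute) == value
    ]

def bind(dn, password):
    """Stand-in for asking the LDAP server to validate a DN and password."""
    entry = DIRECTORY.get(dn)
    return entry is not None and entry[1] == password

def login(username, password, searchdn="ou=users,o=Directory"):
    matches = search(searchdn, "uid", username)   # find the matching record
    if len(matches) != 1:
        return False
    return bind(matches[0], password)             # validate the record's real DN

print(login("dpjanes", "s3cret"))   # True
print(login("dpjanes", "wrong"))    # False
```

The key point is that the DN used for validation comes out of the search result, so it does not matter which attribute the user typed.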
Configuring OpenID-LDAP to contact LDAP
Following the instructions in openid-ldap/docs/README.txt, especially point (5), we get the key points of configuration
- edit "ldap.php" and fill in the values.
The original connection settings look like this:
'primary' => '10.0.0.111',
'fallback' => '10.0.0.222',
'protocol' => 3,
'binddn' => 'cn=<name>,cn=users,dc=domain,dc=local',
'password' => '<pass>',
'searchdn' => 'cn=users,dc=domain,dc=local',
'filter' => '(&(cn=%s)(mail=*))',
'testdn' => 'cn=%s,cn=users,dc=domain,dc=local',
'nickname' => 'uid',
'email' => 'mail',
'fullname' => array('givenName', 'sn'),
'country' => 'c'
Our new connection settings look like this:
'primary' => '192.168.1.120',
'fallback' => '',
'protocol' => 3,
'binddn' => '',
'password' => '',
'searchdn' => 'ou=users,o=Directory',
'filter' => 'uid=%s',
'testdn' => 'uid=%s,ou=users,o=Directory',
Note the reasons for this:
-
primary: as per the VMWare notes above
-
fallback: we don't have a backup server
-
binddn & password: it works without this; but we assume there are LDAP configurations that require you to log in with a
well-known Distinguished Name and password before you can do a search
-
searchdn & filter: the ‘%s' is replaced with the user's login name (i.e. from the login form) and then these items
are put together to search for the user's record
-
testdn: when actually logging in, the ‘%s' is replaced as above; the page then tests the modified testdn with the
password provided against the server
Note then the difference between OpenID-LDAP and our hypothetical login scenario in the previous section -
OpenID-LDAP searches for the login but after validating that it exists, ignores the Distinguished Name and just tries to log in
using a simply constructed testdn and password. This works, but it strikes me that the search is either unnecessary or the
login procedure is insufficient.
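To make the substitution concrete, here is a small sketch of how the ‘%s' templates expand for our settings, assuming plain sprintf-style replacement (the PHP code itself is not reproduced here). The `SETTINGS` dictionary just mirrors the values shown above.

```python
SETTINGS = {
    "searchdn": "ou=users,o=Directory",
    "filter": "uid=%s",
    "testdn": "uid=%s,ou=users,o=Directory",
}

login = "dpjanes"
search_filter = SETTINGS["filter"] % login
bind_dn = SETTINGS["testdn"] % login

print(search_filter)   # uid=dpjanes
print(bind_dn)         # uid=dpjanes,ou=users,o=Directory

# The record's Distinguished Name, as listed earlier, is a different string.
actual_dn = "cn=David Janes,ou=users,o=Directory"
print(bind_dn == actual_dn)   # False
```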
Failure
Alas, at this point we're going to have to stop, unless someone has a suggestion. When I attempt to log in with "dpjanes"
we end up with the OpenID-LDAP bridge trying to log in with "uid=dpjanes,ou=users,o=Directory", which simply doesn't work.
Whether this is specific to my LDAP implementation or not is unknown.
If I alter the rules so that I'm logging in with "David Janes" / "cn=David Janes,ou=users,o=Directory" the
(slightly modified) Apache rewrite rules get confused because of the space. I could probably fix these but quite frankly I don't
want to because I want "dpjanes" to be recognized as the login.
So, that's as far as I'm getting with this. If anyone has further suggestions, please let me know and I'll modify
this document as necessary.
Onaswarm now provides an interface for finding out the social network connectivity of webpages. Connections are discovered
using XFN, hCard, FOAF, optionally Google's SGN services and in some instances custom APIs if account information is available
to Onaswarm.
URI
Parameters
-
uri - the URI of a page you'd like to discover social network details for
-
wrapper - if "ajax", the results will be returned in JSON format
-
json_pretty - boolean; the results will be pretty printed
-
jsonp - if non-empty, the results will be placed in a JSONP wrapper
-
reverse - boolean; the results will reflect links to this page, as opposed to outbound from this page
-
google - boolean; add results from Google Social Graph API. In HTML mode, this defaults to True; in AJAX, False.
-
appkey - coming soon
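As a rough sketch of how a client might assemble a query from these parameters: the base URL below is a placeholder because the real URI is not shown above, the Twitter profile URL is an assumed example, and sending booleans as 1 is also an assumption. `build_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

def build_query(base_url, uri, **options):
    # `uri` is the page to inspect; everything else is an optional parameter.
    params = {"uri": uri}
    params.update(options)
    return base_url + "?" + urlencode(params)

url = build_query(
    "http://example.invalid/api",   # placeholder, not the real endpoint
    "http://twitter.com/bvl",       # assumed profile URL for user "bvl"
    wrapper="ajax",
    json_pretty=1,                  # assumed encoding for a boolean
)
print(url)
```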
Example Queries
form interface
all links from Twitter user "bvl" in HTML, augmented with Google SGN results
all links outbound from Twitter user "bvl", without Google SGN results
Dan Brickley's FOAF file
Notes
-
if our server is experiencing unusual loads, this API will return 503 errors
-
there's a lot more we could do with the FOAF files - tell us what
-
if we did FOAF output, would you use it?
-
we will be using appkeys to control access to the API in the near future, mainly to stop robots from crawling the web through our
API!
We're migrating the data in Onaswarm to a high performance network attached disk ... we expect we'll be back by 8 AM
EDT.
Onaswarm is pleased to announce that we’ve set up a “swarm” especially for the
Metronauts / Transit Camp community. Your swarm – http://metronauts.onaswarm.com –
will create a lifestream to capture and share all community posts, twits and photos about this community.
Signing up
If you don’t have an Onaswarm account, signing up is very easy:
You’ll then be led through a set of simple steps to add your profile and social network information. You’ll automatically
be added to the metronauts swarm.
If you already have an Onaswarm account:
Posting
You must use the “metronauts” tag when posting in order for your post to show up on the Metronauts group:
-
on del.icio.us, WordPress and Flickr, add “metronauts” to the tag field
-
on Twitter and Pownce, use the hash tag “#metronauts” in your twit – you can see examples of this on the Metronauts
swarm
Generally your post/twit/photo will show up in Onaswarm about 15 – 30 minutes after posting, depending on load. Del.icio.us
feeds are only checked every hour, due to terms of use restrictions so they may take a little bit longer to show up.
Widgets
If you’re interested in displaying the Metronauts swarm lifestream on your blog or webpage, try adding our widget:
It’ll only take a few seconds.
I woke up this morning with the intention of writing a "best practices" guide to doing microformats only to find out
that Glenn Jones had beaten me (handily) to the task. In my mind this should be converted into a wiki page.
Via the magic of Twitter, Twhirl and @dangerday, I’ve finally found myself in possession of a Fire Eagle (http://fireeagle.yahoo.net/) invite. Clicking through the link that was e-mailed to me, I
logged in with my Yahoo ID and there I was – finally – in Fire Eagle. In case you’re not familiar with FE, here’s their brief
description:
Fire Eagle is a service that helps users share their location online with their friends and with other sites and services.
Find out more about the service by exploring below...
I.e. “twitter for location”, sorta.
The site itself is visually appealing, with large buttons, and fairly obviously styled using YUI (http://developer.yahoo.com/yui/). And there’s a pretty background, in pseudo-Miami Vice
colors.
You get to select how often they’ll check back with you to make sure you’re comfortable with sharing your location. A strange
thing, I’ll have to admit: if I stop sharing my location with FE, then I’m probably no longer interested in having you know where
I am (or it’s Game Over man and I’m not worried about it). From other reviews I read I thought there was a way to fuzzy my
location – i.e. just show what neighborhood, city, province or even country I’m in – but I can’t seem to find that option.
The first thing I tried in FE is “update your location”. Just for a laugh I entered “home”, but alas FE unsportingly offered
a list of places called “Home” (and 奉免) no doubt populated by some very boring people. More seriously, it would be nice if I
could enter “home”, “glenn’s office”, “doug’s house”, etc. as that more corresponds to my idea of location and is way more
semantic. Perhaps this feature is coming.
Next I entered my (Canadian) postal code and bingo, there I am: a pin in a map. Then I entered “YYZ” to see if FE
understands airports and yes it does. Then I tried to go back home, only to discover that it doesn’t seem to track previous
locations. The INPUT field does respond to the down arrow, but it still shows “home” where I never was apparently and when I do
select something, it doesn’t fill in the field. Sigh. Well, I know what it’s like to be in Beta (http://www.onaswarm.com).
Then I gave the “Application Gallery” a try. Alas, three applications (Fire Eagle Badge , Fire Eagle on Facebook and SMS
Updates! [sic]) are listed, but none of them are there yet.
So where FE stands right now is it's a developer platform. If you're not a developer I wouldn't rush out of my way to get an
invite. I’m going to play with this over the next few days and see how that works out. Here’s some brief notes:
-
there doesn’t appear to be any option for providing a non-protected update stream. Really, I don’t mind providing this
information, if I can fuzzy it up
-
results are available in a custom XML format (boo) and in JSON. Why not GeoRSS or Atom?
-
authentication is done using OAuth (http://oauth.net/). Is Yahoo all OAuth now? Something
to check out. Probably not
-
there’s an excellent selection of API kits: Javascript, PHP, Perl, Python and Ruby. No Java? Well it’s official: Java is the
new COBOL – you’re on your own!
-
there’s an API for updating location, so it seems that you could, for example, have a Twitter client that updates your location
on FE. Or something that looks at your Calendar or TripIt agenda (http://www.tripit.com)
and makes the appropriate updates.
There are two major/minor issues we're trying to solve with Onaswarm right now:
-
when new Swarms are created, users don't seem to be showing up in the results, at least for a while
-
when new Feeds are added, they're not being prioritized properly. Ideally we'd like to see new feeds show up within
seconds.
We're calling these major/minor problems because although they're affecting functionality in a nasty way, we expect the
fixes to be rather small. And implemented ASAP...
I’ve been trying to use FOAF to get profile and friendship/contact information
across social networks. I’ve done the “friend” part, I just need to fill in the profile information.
Now, getting this information out of FOAF is problematic at best. Using Python, the rdflib
library, and SPARQL I’ve managed to coax data out one painful step at a
time. For example, here’s my “friend-getter” code:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?bfoaf ?bname ?bnick ?bmbox_sha1sum ?bimage ?bweblog
WHERE {
    ?a foaf:knows ?b .
    ?b rdfs:seeAlso ?bfoaf .
    OPTIONAL { ?b foaf:name ?bname } .
    OPTIONAL { ?b foaf:nick ?bnick } .
    OPTIONAL { ?b foaf:mbox_sha1sum ?bmbox_sha1sum } .
    OPTIONAL { ?b foaf:image ?bimage } .
    OPTIONAL { ?b foaf:weblog ?bweblog } .
}
Clear enough, I guess. Unfortunately, I can’t just go look at bnick and stuff it into my results, because bnick might be some
sort of “resource” which then has to be programmatically traversed also (see http://api.hi5.com/rest/profile/foaf/208329359). I admit that this might be,
and probably is, a problem with me: maybe I don’t understand SPARQL well enough.
But that’s old business. The way I’ve been doing this is CURLing down the FOAF file, manually inspecting it, writing some
Python/rdflib/SPARQL code and seeing what happens.
This morning I decided to try a new approach: look for a SPARQL and/or RDF browser and figure out the correct queries online,
then just write the code once, correctly. In my mind, this was all very sweet: an INPUT field for the FOAF/RDF URI, a TEXTAREA
for the SPARQL query, a TABLE for the SPARQL results, and a TABLE showing all the RDF triples, since it’s triples “all the way
down”.
Here’s what I did find:
-
Google “rdf browser”
-
Check out Brown Sauce; have to install a local massive development
environment – remember now, I’m trying to save time, not lose it
-
Check out http://browserdf.org/: “Faceted Navigation for arbitrary Semantic Web
data”. Very promising. Unfortunately, “arbitrary” seems to mean three different data sets
-
Check out Stefano’s Linotype -- a high quality
information source usually; find out about Welkin
-
Try Welkin
-
Find out Welkin doesn’t browse the web
-
Download the FOAF file from http://kitschbitch.vox.com/profile/foaf.rdf into test.foaf.
-
Discover that Welkin doesn’t like “*.foaf”
-
Try again with “*.xml”
-
Try again with “*.rdf”
-
Success, except no results. Why? Oops … I was downloading the wrong URI
-
Try again with the correct URI
-
Verify that it’s a FOAF file
-
Stare at nothingness coming out of Welkin
-
Write a blog post about it; partially regret losing 50 minutes of my morning
The problem – a problem – with FOAF and RDF is quite simple. People don’t want formats that can do anything, they want
formats that can do something. I got a Flickr API downloader going in about 30 minutes, taking my time. I’ve put hours into
FOAF and still am unhappy.
Webdistortion has a review of 9 HTML rich text editors (via
the YUI Blog). We’re happy with the new TinyMCE so
far, but there may be something here that strikes your fancy if you’re looking for something smaller. Here’s the 9 plus my brief
notes:
One thing we don’t like about HTML online editors is that they make some pretty lousy looking HTML pages. To deal with this,
we’ve created HTML “scrubbers” to rewrite HTML coming from these widgets.
The first thing we always do is call TIDY (http://tidy.sourceforge.net/) to normalize the HTML. We then run a list of regular expressions to remove
things we don’t like, such as class names, ids, etc. and also things such as trailing empty paragraphs at the end of
documents.
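As a rough illustration of the regular-expression stage, here is a minimal Python sketch. It is not the BlogMatrix code: the rule set and the names strip_class_and_id, TRAILING_EMPTY_P and scrub are made up for the example, and the TIDY call is left out entirely, so the input is assumed to be already normalized.

```python
import re

# Any opening tag, e.g. <p class="x">. Assumes attribute values contain no ">".
OPEN_TAG = re.compile(r"<[a-zA-Z][^>]*>")
# A class="..." or id="..." attribute, with the whitespace in front of it.
CLASS_OR_ID = re.compile(r'''\s+(?:class|id)\s*=\s*(?:"[^"]*"|'[^']*')''', re.IGNORECASE)
# One or more empty paragraphs (whitespace, &nbsp; or BRs only) at the very end.
TRAILING_EMPTY_P = re.compile(r"(?:\s*<p>(?:\s|&nbsp;|<br\s*/?>)*</p>)+\s*$", re.IGNORECASE)


def strip_class_and_id(html):
    """Remove class and id attributes, looking only inside opening tags."""
    return OPEN_TAG.sub(lambda m: CLASS_OR_ID.sub("", m.group(0)), html)


def scrub(html):
    """Run the (hypothetical) scrubber rules in order."""
    html = strip_class_and_id(html)
    html = TRAILING_EMPTY_P.sub("", html)
    return html


print(scrub('<p class="MsoNormal" id="x">Hello</p>\n<p>&nbsp;</p>\n<p></p>\n'))
```

Keeping each rule as a small named step makes it easy to add or drop a scrubber without touching the others.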
We just added another scrubber to convert double BRs within P paragraphs into paragraph splits – this makes the HTML more
semantic, that is, it makes the HTML say what it means, not what it looks like.
This can’t be done with just a regular expression, of course. Here’s our algorithm:
-
find all <p>…</p> paragraph blocks, always looking for the shortest matches
-
reverse this list, so that we can rewrite the document without having to worry about adjusting search indices
-
look at each match: if it contains anything non-simple, leave it alone. Theoretically, since we’re coming out of TIDY this
should be well formed and only contain markup like B, STRONG, ABBR, etc. but I never take chances
-
if the match is simple, convert all BR BR sequences to “</p><p>”
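The four steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not our production scrubber: the whitelist SIMPLE_TAGS and the function names are invented for the example, and it assumes earlier scrubbers have already reduced paragraphs to bare <p> tags with no attributes.

```python
import re

# Step 1: shortest-match <p>...</p> blocks (non-greedy, may span lines).
P_BLOCK = re.compile(r"<p>(.*?)</p>", re.IGNORECASE | re.DOTALL)
# Two BRs in a row, allowing whitespace around them and the XHTML "<br />" form.
DOUBLE_BR = re.compile(r"\s*<br\s*/?>\s*<br\s*/?>\s*", re.IGNORECASE)
# Any tag; group 1 is the tag name.
TAG = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
# Hypothetical whitelist of "simple" inline markup.
SIMPLE_TAGS = {"b", "strong", "i", "em", "abbr", "a", "span", "br"}


def is_simple(fragment):
    """True if every tag inside the fragment is on the whitelist."""
    return all(m.group(1).lower() in SIMPLE_TAGS for m in TAG.finditer(fragment))


def split_double_brs(html):
    matches = list(P_BLOCK.finditer(html))
    # Step 2: work backwards so earlier match offsets stay valid after each rewrite.
    for m in reversed(matches):
        inner = m.group(1)
        # Step 3: leave anything non-simple alone.
        if not is_simple(inner):
            continue
        # Step 4: each BR BR run becomes a paragraph split.
        html = html[:m.start(1)] + DOUBLE_BR.sub("</p><p>", inner) + html[m.end(1):]
    return html


print(split_double_brs("<p>one<br><br>two</p><p>three</p>"))
```

One caveat with this sketch: a BR BR pair sitting inside an inline element (say, inside a B) would be split mid-element, so a real scrubber would want another TIDY pass afterwards or a stricter test in step 3.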
With the BlogMatrix Platform editor, every time you save a post it scrubs it and sends it back to the editor.
I completed the upgrade to TinyMCE (http://tinymce.moxiecode.com/download.php) this afternoon, tossing away 95% of
my old code. I’m very very happy – they’ve very much modernized the code to what we’d expect in a modern JS framework.
For example, this is how we create an editor now:
editor = new tinymce.Editor('id_editor', initd);
editor.render();
And here’s how we listen for events:
editor.onKeyPress.add(function(ed, e) {
    // … do stuff; in 3.x the handler receives the editor first, then the event …
});
I haven’t tried to do stuff with the Toolbar yet, but given the apparent serious thought they’ve put into making a nice
interface, I can’t imagine it’s going to be very difficult.
Well, I couldn’t believe how easy it was to make our new editor use TinyMCE – I just downloaded the new version (3.0.4.1 –
http://tinymce.moxiecode.com/download.php), hooked it up to our code
and it ran out of the box.
Since you may not have done this yourself, I’ll just run you through how we use TinyMCE:
-
make a TEXTAREA that you plan to work with; there are some complications if you want or have multiple TEXTAREAs but this
is not an issue for us
-
include TinyMCE: <script type="text/javascript" src="=/jscripts/tiny_mce/tiny_mce.js"></script>
-
call the initializer function: tinyMCE.init(initd)
That’s it, you have an editor. “initd” is a dictionary that describes how to set up TinyMCE. This is our setup:
initd = {
    onchange_callback : "tinymce_onchange_callback",
    theme_advanced_buttons1 : "bullist,numlist,outdent,indent,separator,justifyleft,justifycenter,separator,link,unlink,image,separator,bold,italic,strikethrough,separator,sub,sup,forecolor,backcolor,separator,code",
    theme_advanced_buttons2 : "",
    theme_advanced_buttons3 : "",
    dialog_type : "modal",
    theme_advanced_resize_horizontal : false,
    entity_encoding : "numeric",
    force_p_newlines : true,
    force_br_newlines : false,
    convert_newlines_to_brs : false,
    relative_urls : false,
    remove_script_host : false,
    verify_html : false,
    auto_reset_designmode : true,
    remove_linebreaks : false,
    theme_advanced_resizing : true,
    mode : "textareas",
    theme : "advanced",
    theme_advanced_toolbar_location : "top",
    theme_advanced_toolbar_align : "left",
    theme_advanced_path_location : "bottom",
    plugins : "inlinepopups",
    content_css : "/:root/include/common/tinymce.css"
};
You’ll have to modify the location of the CSS file and the callback (so we know whether the document has been edited!) to
something you prefer to use, but you get the idea.
Also note that you may have to do some magic to move the data between the TinyMCE window and the TEXTAREA:
-
To move the data into the TEXTAREA, do: tinyMCE.triggerSave()
-
To go the other way:
tinyMCE.updateContent(idName), where idName is the DOM ID of the TEXTAREA;
whoops -- in version 3.x use tinyMCE.activeEditor.load();
Version 2 of TinyMCE misbehaved if you tried to create an editor in a hidden DIV (i.e. with display: none); I’m not sure if
this issue is gone or not but try to avoid doing it.
I spent a fair fraction of yesterday playing with Yahoo’s Rich Text Editor (http://developer.yahoo.com/yui/editor/), trying to integrate it into what we’re
calling “V12” of the BlogMatrix Platform – the look and feel you’re seeing on Onaswarm (http://www.onaswarm.com) right now.
Currently, we’re using TinyMCE as a text editor. On the plus side, TinyMCE is standard and reliable; on the minus side, it’s
difficult to work with, very difficult to extend, and fairly hefty. So as a background task I’ve been looking at various
technologies and seeing what can be done. TinyMCE has recently revved from 2.x to 3.x (http://tinymce.moxiecode.com/punbb/viewtopic.php?id=9942), so
we’ll be revisiting that soon.
One of the problems with all browser-based editors is that they, well, make weird looking HTML. Especially on Safari (AKA
webkit). We’ve partially solved this problem at BlogMatrix by running a number of “scrubbers” that look for well-known
weirdnesses and transform them into something better. Generally this works as a pipeline from TIDY (http://tidy.sourceforge.net/), to a bunch of regular expressions, to TIDY again. The goal is
to be able to process normal hand-entered input into good-looking HTML, but still preserve the formatting pasted in from
another webpage or document. Believe it or not, we’ve had a lot of luck with this.
You’d think that all this could be done with a Flash component – this would nicely solve the multiple-browser problem, but alas
I haven’t found a Flash editor that accepts pasted text and does something sane with it. If you’d like to look at this, I’ve
posted notes here on del.icio.us (http://del.icio.us/dpjanes/text_editor).
So, back to the YUI RTE – here are my positives and negatives in no particular order. I’m aware that this is not a finished
product, so expect the negatives to disappear.
-
it’s super easy to configure, especially the toolbar
-
there’s no HTML view; like WTF, I have to write this myself?
-
block indent/undent is broken beyond repair
-
I had no problems mixing and matching with MochiKit (http://www.mochikit.com/),
our JS weapon of choice.
-
The dialogs that pop up for editing links and images are worth a whole section to themselves:
-
here we see the limitations of CSS styling – even the slightest change to text (from my CSS) breaks the dialogs. In this
particular case boys and girls, perhaps even go back to table layout; it’s an app, not a webpage
-
in fact, just isolate this altogether out of RTE. By the looks of it, this seems to be the plan.
-
for proper styling, YUI components need to be descended from an element with CSS class “yui-skin-sam”. Unfortunately,
dialogs attach themselves to the BODY element and we can’t mark that as “yui-skin-sam” because that, well, breaks all our
stuff. So we ended up having to copy out all the CSS, remove “.yui-skin-sam”, adjust all the background images and so on and
so forth.
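The “remove .yui-skin-sam” part of that CSS surgery is mechanical enough to script. Here is a minimal Python sketch, assuming the skin class only ever appears as an ancestor selector followed by whitespace; the function name unskin is made up, and the background-image paths would still need adjusting by hand.

```python
import re

# The skin class used as an ancestor selector, e.g. ".yui-skin-sam .yui-panel".
SKIN_PREFIX = re.compile(r"\.yui-skin-sam\s+")


def unskin(css):
    """Drop the ".yui-skin-sam " ancestor from every selector in a stylesheet."""
    return SKIN_PREFIX.sub("", css)


print(unskin(".yui-skin-sam .yui-panel .hd { color: #000; }"))
```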
I may experiment next with Ext’s editor (http://extjs.com/deploy/ext/examples/form/dynamic.html) but I’m
thinking the best bang for our buck is still TinyMCE. If only I could figure out how to add my own buttons and functions….
For what it’s worth, I’m composing this using MS Word 2008 on a Mac, which doesn't play nicely with anything. Sigh