I’ve been paying attention to the discussions about tagging (or folksonomies depending on who you talk to). Some of the concepts are very appealing. I’ve been using del.icio.us and Flickr and Buzznet long enough to understand the usage patterns there. But the recent addition of tagging to Technorati has really opened up some new doors. I was at the Future Salon chatting with Niall about some of them, so I had tagging fresh in my mind when the presentation started and that seems to have kicked off a few ideas.

Arguably, this all has something to do with me asking Elle if anyone else calls that big hill behind Stanford with the satellite dish on it “Satellite Hill”. That’s the question that kicked off a rather abstract discussion about naming, which led to a joke about folksonomies, and only later worked its way back into my conscious mind when I went back to tag my photos from the event. I found that someone else had posted images from another event using the futuresalon tag, so I did the same. Of course, that means I was paying attention to the tagging I was doing and aligning my usage with others for the benefit of the commons. If I wanted to be all formal I would say something like I paid attention to the taxonomy that someone else had used and then matched myself to it. But what that really means in this case is that I was able to guess the tag that someone else used and then make use of it myself. Because of that, anyone can now pull up the full photoset of Future Salon pictures at Flickr.

This is a usage pattern I would like to explore a bit more. I really like the idea of a common resource that everyone can contribute to and the concept of group action I mentioned in the post about Mobile Media Metadata. Tags present a great way to provide part of this benefit - I was able to add my photos to the pool of Future Salon images at Flickr because one of the organizational methods they have is slicing across user accounts based on tags. However, one of the issues with this organizational style is that I had to figure out that tag using a very manual process. That’s just a part of the system, I accept that. Part of the advantage is that everyone has the flexibility to do anything they want, so part of the problem is the chaos that results from everyone doing anything they want. Are there some “social norms” we can introduce to enhance the benefit for some use cases without having to constrict the architecture? Here are a couple of ideas that I was playing with.

The first is using date tags. For example I tagged my post about the event that Niall has proposed with the date on which the event is going to occur. Tags like that could help organize a community calendar (I was just talking to Rayg about calendar feeds the other day) as well as give people a hand when it comes to figuring out how to tag their own content. I’m not sure it would really work out; the system might collapse under the sheer weight of the postings. An important abstraction that makes this easier for the user is a “related” tags view. If you take a look at those Future Salon images you’ll see over on the left-hand side that the tag paloalto is related. Another great mechanism would be multiple-tag lookup for these systems. I want to find things tagged as 20050131 and event and sanfrancisco. But now we’re talking details and not big picture, so I’m just gonna move right along.
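For the detail-minded, here is a minimal sketch of what multiple-tag lookup and a “related” tags view might look like. It is not how del.icio.us, Flickr, or Technorati actually do it; the TagIndex class, its method names, and the sample item ids are all made up for illustration.

```python
from collections import Counter, defaultdict


class TagIndex:
    """Toy in-memory tag index (hypothetical, for illustration only)."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)
        self._tags_by_item = defaultdict(set)

    def add(self, item_id, tags):
        """Record that item_id carries each of the given tags."""
        for tag in tags:
            tag = tag.lower()
            self._items_by_tag[tag].add(item_id)
            self._tags_by_item[item_id].add(tag)

    def lookup_all(self, *tags):
        """Return the ids of items that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self._items_by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets)

    def related(self, tag):
        """Return (other_tag, count) pairs for tags that co-occur with tag."""
        tag = tag.lower()
        counts = Counter()
        for item_id in self._items_by_tag.get(tag, set()):
            counts.update(self._tags_by_item[item_id] - {tag})
        return counts.most_common()


# Made-up sample data.
index = TagIndex()
index.add("post-1", ["20050131", "event", "sanfrancisco"])
index.add("photo-2", ["futuresalon", "paloalto"])
index.add("photo-3", ["futuresalon", "paloalto", "event"])
index.add("post-4", ["20050131", "event", "paloalto"])

print(index.lookup_all("20050131", "event", "sanfrancisco"))  # {'post-1'}
print(index.related("futuresalon"))  # [('paloalto', 2), ('event', 1)]
```

The lookup is just a set intersection, and the related view is just counting which tags show up alongside the one you asked about.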

Closely related to the date tags for events are geography tags for exactly the same reason. I want to find other information about people who were at the same events that I was. However, I can tag something as sanfrancisco - but someone else at the same event might tag their content as macworld and a third person might tag Moscone. Just about all the solutions for this get very messy very quickly. We can define hierarchies of tags for locations or tag with inexact GPS coordinates. But ultimately we want to be able to set a tag for the exact position, or close to it, and search within a given area. Not possible with the current tagging systems, but possibly not that far out of line for near term evolution. A location lookup could be used along with dates to find event postings quite well. Weird huh? That del.icio.us was created by the same guy who did geoURL? Maybe not all that weird.
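To make the “search within a given area” idea concrete, here is a rough sketch that assumes each item carries a latitude/longitude pair and uses the haversine great-circle formula for distance. The function names and item ids are invented, and the coordinates are only rough, illustrative values for the area around Moscone, not surveyed positions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
YARD_M = 0.9144             # one yard in meters


def distance_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within(positions, center, radius_m):
    """Return the sorted ids of items whose position is within radius_m of center."""
    return sorted(
        item_id
        for item_id, position in positions.items()
        if distance_m(position, center) <= radius_m
    )


# Rough, illustrative coordinates (assumptions, not measurements).
moscone = (37.7840, -122.4010)
positions = {
    "photo-a": (37.7843, -122.4007),  # a few dozen meters away
    "photo-b": (37.7850, -122.4000),  # roughly 140 meters away
    "photo-c": (37.4419, -122.1430),  # Palo Alto, far outside the radius
    "photo-d": (37.7870, -122.4010),  # a few blocks north, just too far
}

print(within(positions, moscone, 200 * YARD_M))  # ['photo-a', 'photo-b']
```

A linear scan like this is fine for a toy; a real service would need some kind of spatial index, which is where things get messy.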

The spam problem. There will be one, it’s happening already, I’m sure. People will tag their content so that it shows up in inappropriate places. You’ll do a tag search for fluffybunnies and you’ll be greeted with a set of ads for Viagra and home refinancing. Tags add spam right back into the mix. Their ability to pull together content across user and site boundaries is their strength, and it opens them to exploit. So I was thinking about the union of social networking and tagging. For instance the ability to look up tags from friends only. Or friends of friends of course. I was thinking about that particular union before the event. But because we were sitting in a digital identity event talking about Identity Commons, I started thinking about a 2idi style system holding XFN or FOAF or OPML info. Of course just about any news site could be greatly improved with access to OPML info. But anything that provides general tagging could potentially benefit by weighting tags from trusted sources higher than untrusted sources. This doesn’t exclude tags from sources that aren’t known - we still have the chance encounters with new info, but hopefully can filter out most of the noise from external interruptions if we’re looking for info that’s well represented within our own network. I saw some use for the identity systems before, but this example helped shed some new light. Now that I grok a few new possible forms of the system I realize that Marc has probably had these shapes burning in his mind for quite some time.
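Here is a small sketch of the trust weighting idea, assuming the social graph (from XFN, FOAF, or wherever) has already been loaded into a plain dictionary. The weights, function names, and people are all invented for illustration; the only point is that friends count fully, friends of friends count less, and strangers still count a little.

```python
from collections import deque

# Hypothetical weights by hop count: me and my friends, then friends of friends.
WEIGHTS = {0: 1.0, 1: 1.0, 2: 0.5}
STRANGER_WEIGHT = 0.1  # unknown sources are down-weighted, not excluded


def hops_from(graph, me, max_hops=2):
    """Breadth-first search: map each person within max_hops of me to a hop count."""
    hops = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue
        for friend in graph.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


def rank(taggings, tag, graph, me):
    """Order items tagged with tag by the summed trust of whoever tagged them."""
    hops = hops_from(graph, me)
    scores = {}
    for item_id, item_tag, tagger in taggings:
        if item_tag != tag:
            continue
        weight = WEIGHTS.get(hops.get(tagger), STRANGER_WEIGHT)
        scores[item_id] = scores.get(item_id, 0.0) + weight
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Made-up graph and taggings: (item, tag, who applied the tag).
graph = {"me": ["alice"], "alice": ["bob"]}
taggings = [
    ("bunny-photo", "fluffybunnies", "alice"),
    ("bunny-post", "fluffybunnies", "bob"),
    ("pill-ad", "fluffybunnies", "spammer1"),
    ("pill-ad", "fluffybunnies", "spammer2"),
    ("refi-ad", "fluffybunnies", "spammer1"),
]

print(rank(taggings, "fluffybunnies", graph, "me"))
# ['bunny-photo', 'bunny-post', 'pill-ad', 'refi-ad']
```

The ads still show up, so chance encounters with unknown sources survive, but they sink below anything vouched for by my own network.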

So where does this leave us? I think what we’re seeing is that metadata that used to be mixed into the documents as additional schemas can be broken up and added as tags. Currently we have the Creative Commons search engine for one kind of metadata, rubhub for XFN searches, and geoURL used to be able to provide geographical lookups. But let’s say we broke down each of those vocabularies into individual attributes and provided a relatively uniform way to represent them. Say by adding something like the “scheme” field that Tim Bray mentions. Now each tag has the potential to represent something almost like an RDF triple should the user want to make use of it that way. With just a few of these schemes in use the search engine could answer a query like “give me all photos taken by my friends within 200 yards of Moscone between 1pm and 2pm yesterday”. Things could really start looking up for the lowercase semantic web. That would be pretty sexy.
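As a back-of-the-napkin sketch of that query, suppose every tag is a (scheme, value) pair. The scheme names used here (type, author, geo, taken), the value formats, the coordinates, and the sample items are all my own assumptions for illustration; they are not taken from Tim Bray’s proposal or from any existing service.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
YARD_M = 0.9144             # one yard in meters


def distance_m(a, b):
    """Great-circle (haversine) distance in meters between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def values(item, scheme):
    """Return every tag value on item that belongs to the given scheme."""
    return [value for tag_scheme, value in item["tags"] if tag_scheme == scheme]


def parse_geo(text):
    """Parse a hypothetical 'lat,lon' geo tag value into a pair of floats."""
    lat, lon = text.split(",")
    return float(lat), float(lon)


def photos_by_friends(items, friends, center, radius_m, start, end):
    """Photos by friends taken within radius_m of center, with start <= time < end."""
    hits = []
    for item in items:
        is_photo = "photo" in values(item, "type")
        by_friend = any(a in friends for a in values(item, "author"))
        nearby = any(distance_m(parse_geo(g), center) <= radius_m
                     for g in values(item, "geo"))
        in_window = any(start <= datetime.fromisoformat(t) < end
                        for t in values(item, "taken"))
        if is_photo and by_friend and nearby and in_window:
            hits.append(item["id"])
    return hits


def make_item(item_id, kind, author, geo, taken):
    """Build a sample item whose metadata is nothing but scheme/value tags."""
    return {"id": item_id, "tags": [("type", kind), ("author", author),
                                    ("geo", geo), ("taken", taken)]}


# Made-up items; coordinates and the date standing in for "yesterday" are illustrative.
NEAR, FAR = "37.7843,-122.4007", "37.4419,-122.1430"
items = [
    make_item("p1", "photo", "alice", NEAR, "2005-01-30T13:20"),
    make_item("p2", "photo", "mallory", NEAR, "2005-01-30T13:25"),  # not a friend
    make_item("p3", "photo", "bob", FAR, "2005-01-30T13:30"),       # wrong place
    make_item("p4", "photo", "bob", NEAR, "2005-01-30T15:05"),      # wrong time
    make_item("n1", "post", "alice", NEAR, "2005-01-30T13:40"),     # not a photo
]

print(photos_by_friends(
    items,
    friends={"alice", "bob"},
    center=(37.7840, -122.4010),
    radius_m=200 * YARD_M,
    start=datetime(2005, 1, 30, 13, 0),
    end=datetime(2005, 1, 30, 14, 0),
))  # ['p1']
```

Each condition in the query maps onto one scheme, which is the whole appeal: the tags stay flat and simple, and the structure only shows up when someone wants to use it.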