Miker

17th level Hacker

A Quarter of a Million Spam Messages

I’m at almost a quarter of a million spam messages caught by Akismet so far (249,744 to be precise; by the time I post this it should be more than 250 thousand). I turned Akismet on in January of this year. That’s more than 50,000 spam messages a month. According to the Akismet site, 92% of the comments it sees are spam.

The problem is very unevenly distributed, however. I have a few other blogs out there, and they tend to get more like a few hundred spam messages a month. I just wanted to point that out given the degree of blowup I see when things happen like someone having to turn off comments for a while. The spam problem is not uniformly distributed: just because you get a manageable amount of spam comments doesn’t mean someone who blogs about just about the same stuff isn’t getting hundreds of times more.

The Web 2.0 Cease and Desist and Self Healing Systems

I need to post a pointer to the story about CMP sending a conference a C&D letter on behalf of O’Reilly. I’m sure Tim has nothing to do with it, and I’m sure it’ll go away without much fuss in the end, probably just being throwoff of the corporate machine and the result of the right hand not knowing what the left hand is doing. And of course having lawyers involved, which always screws things up. But it is interesting that something like this would happen in relation to a company that has built its business on being ahead of the knowledge curve, and I would say doing it successfully.

Things like this are bound to happen in any organization/company once the size gets beyond a certain threshold. Perhaps not quite this ridiculous in most cases, but there will be some evidence of lack of synchronization and goal mismatches. In the real world this stuff normally gets cleared up pretty quickly. Tim pops up and says “wow, we didn’t mean to do that, sorry!” or someone at CMP chews the lawyers out and releases a statement… whatever. The point being that inconsistencies and unexpected situations are generally noticed and dealt with. Outside the blogosphere at least. Inside the blogosphere they can live on forever, but that’s really a whole separate issue.

Odd situations like that tend to cause failures in software though. When something unexpected comes up within a system it’s generally catastrophic. When something unexpected comes up during an interaction between systems you’re sometimes lucky just to find out what happened after the fact. I’ve been thinking about self healing systems recently, although I can’t remember what might have kicked it off. And after reading this I realized that almost all human designed systems are relatively fragile and rigid. Law being one of the obvious areas of intersection between codification and human behavior, I would expect it might hold some hints about how to handle that well. But law is just guidelines, and the rationality really derives from a human in the loop pulling the strings, judge or jury. It’s not really a system itself so much as an extension and recording of the will of the judges to keep them from having to repeat themselves. So I keep coming back to the non-human systems to try to find isolated systems that heal well. Insect behavior is one of the classic examples, ants and bees.

George Bush Jr More Evil Than Cthulhu

Wow, folks really hate this George Bush Jr. guy. You would think he roamed around in the streets killing puppies and kicking children or something. I check out the stats for all my Ning apps on a daily basis. Cause I’m vain, no sense in trying to justify it with anything else. And LesserEvil had a bunch of hits from Google image searches, searches for Georgie there. So I pull up his detail page and what do I see? “Cthulhu is less evil than George Bush Jr. 100% of the time.” Wow, that’s pretty evil. Or maybe it’s just interesting commentary on the mindset of folks in the “online and searching for random stuff cause I have that kind of time on my hands” segment of the population.

More Jerry Taylor

Some additional info about the politician who incorrectly threatened CentOS with FBI action. Gotta love this:

“This is just a bunch of freaks out there that don’t have anything better to do,” he said. “When I came in to work Monday morning, I had about 500 e-mails, plus anonymous phone calls from all the geeks out there. [CentOS is] a free operating system that this guy gives away, which tells you how much time he’s got on his hands.”

Nice, very nice. Unfortunately we can’t all resort to threats and scare tactics to get our work done for us. Or can we….?

High Availability NFS

I’ve tried out the high availability NFS solution detailed here and run into a few issues. It does provide a great way to get a higher degree of availability for one of the most problematic bits of infrastructure, so I don’t mean to crap on the idea as a whole. But here are two issues to make sure you either check out or understand before you deploy something like this:

  • We saw write throughput cut in half when DRBD was up and running. If you’ve got plenty of capacity and can segment as you grow you might be more concerned with the availability than the throughput. And it’s certainly possible that we could have performed some tuning to get DRBD to perform better. I’m just saying perform some benchmarks beforehand and afterward to make sure you’re not impacting the system too negatively without knowing it.

  • This is a block level replication technique, so there’s no guarantee that a filesystem level error that causes corruption isn’t just replicated over to the failover box as it happens. True, it’s much more common that the failure mode be bad blocks at the hardware level than an error at the OS level…. but once again this is block level replication. A bad block on the master side that goes undetected for even a single read/write cycle will have that incorrect information replicated to the slave. As far as I know (and I admit I haven’t researched this in a while) there isn’t a Linux filesystem that does end to end checksumming to prevent this kind of issue.
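For the first point, here is a rough sketch of the kind of before-and-after write benchmark I mean. It’s my own illustration, not something from the article: the function names are made up, the sizes are arbitrary, and a single sequential write with an fsync at the end is only a crude stand-in for real NFS traffic. Run it against the export directory once with DRBD off and once with it on, then compare.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, block_kb=1024):
    """Write total_mb of data to a temp file in directory; return MB/s.

    Hypothetical helper for illustration. The fsync is inside the timed
    region so the number reflects data actually pushed to the block
    device, not just the page cache.
    """
    # Random data so a compressing filesystem can't cheat on zeros.
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mb / elapsed


def relative_change(before, after):
    """Fractional change in throughput; -0.5 means it was cut in half."""
    return (after - before) / before
```

Take a few runs of each and average them, since a single run on a busy box can be way off. A relative change anywhere near -0.5 is the "cut in half" situation described above.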

Just two things to keep in mind. There’s some great info in that article in general, like moving the NFS runtime files out to the replicated block device and how to configure heartbeat.

More SMS.ac Turd Flinging

These SMS.ac folks just don’t know when to quit. I was chatting with Russ yesterday and this morning, and he was saying that the SMS.ac people are sending him takedown letters for non-libelous content on his blog because he’s ranked pretty high in the Google search results. I told him to just take down the stuff cause it’s not worth the effort. I’m not a lawyer or anything, but I did watch Law and Order this one time and it didn’t look too hard. And then I figured, wouldn’t it be a real ball buster if a whole bunch of people linked to Russ’s blog with SMS.ac in the link text, so he’s still in the top search results for them even after doing everything they ask. I think it would be, and I think it’s like, totally justified. That’s the great thing about scammers and spammers, you can do whatever you want to them and you don’t have to feel bad.

Makes You Ponder

Some things make you think. Other things make you wonder if the world wouldn’t be a better place if public stonings were still in place. That’s my fucktard of the month right there. Ignorance I can forgive, not everyone has to understand the Interweb. But this goes well beyond that.

End of Line

I make Tron jokes pretty frequently. I mean, who doesn’t, right? It’s a total classic. But this right here, that’s just fantastic.

Sudden Wave of Spam

Starting yesterday I’ve seen a big upswing in the number of comments that make it through the Akismet plugin. I’m wondering if that’s just me, that I made it onto some new list? Or has someone out there figured out how to beat the filters? The top ones I’m seeing seem to be ephedrine, backgammon, and sex with farm animals. The ephedrine one sounds pretty tempting….

I just updated to the latest point version of WordPress too in case that helps out.