August 22, 2007
Officially Switching Over
This will be the last post on the Movable Type version of this blog. If you are reading this, and want to read the continuing and updated blog, then go to http://www.blogd.com/wp/. At least for a while, that is where the new blog will be located. I may leave it there indefinitely, or I may move it back to the main directory and erase the Movable Type blog at some point.
Comments are no longer possible on the Movable Type site. I have made a note of this on (I hope) every blog post I have. If you wish to make a comment on any post, please go to the new blog, do a search for the blog post in question, and leave your comment there. Unfortunately, due to post numbering issues, I cannot make automatic redirects to the corresponding pages in the new blog (if you know how, tell me!).
If you are reading this through a feed/RSS reader, then please go to the new main page and re-bookmark the RSS link. If I am not mistaken, it is feed://www.blogd.com/index.rdf
See you at the new site!
August 21, 2007
Wow, It Might Actually Be Happening!
I seem to actually be having success switching the blog. A variety of factors helped, which I'll go into later. But if you are interested in seeing the new locale while I work on it, please go ahead and see it here.
August 20, 2007
Whack-a-Mole Spambots and WordPress Migration
One of the issues that I have been dealing with is how to defeat the spammers. One way to do this in Movable Type is to change the name of the comment script from time to time. This seems to work, but only very temporarily. Almost instantly, a lot of the spammers' software detects the new name (probably as soon as it fails to execute the old script) and just resets to the new script address. Some spammers seem to be set on long-term autopilot, however. I have changed the name of my comments script about 4 or 5 times, and yet I still get attempts to access the previous versions--even though the oldest of those has been inactive for more than two weeks.
However, you can see the spammers when they auto-correct: you get two hits from the same IP address; one directed at the old script, and one directed at the new one. Apparently that's when they tried the old script, it failed, and so they accessed the blog entry page and found the new script. All automated; such hits are usually just minutes apart.
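For anyone who wants to spot that auto-correct pattern in their own logs, here is a minimal sketch. It assumes the log has already been parsed into (IP, timestamp, path) tuples; the two script names and the ten-minute window are made up for illustration and are not the real ones from this blog.

```python
from datetime import datetime, timedelta

# Hypothetical script names; the real ones are not given in the post.
OLD_SCRIPT = "/cgi-bin/mt-comments-old.cgi"
NEW_SCRIPT = "/cgi-bin/mt-comments-new.cgi"


def find_autocorrecting_ips(hits, window=timedelta(minutes=10)):
    """Return IPs that hit the old script and then the new one within `window`.

    `hits` is an iterable of (ip, timestamp, path) tuples.
    """
    last_old_hit = {}  # ip -> time of its most recent request to the old script
    flagged = set()
    # Walk the hits in time order so "old, then new" is checked correctly.
    for ip, when, path in sorted(hits, key=lambda h: h[1]):
        if path == OLD_SCRIPT:
            last_old_hit[ip] = when
        elif path == NEW_SCRIPT and ip in last_old_hit:
            if when - last_old_hit[ip] <= window:
                flagged.add(ip)
    return flagged
```

An IP that only ever requests the new script is not flagged, so ordinary commenters should not show up in the result.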
Another issue is how to move the blog from Movable Type to WordPress. A process which should be simplified by this time is still, apparently, labyrinthine. The most frustrating aspect to it is, as I have found out many times before, the documentation. Generally speaking, the people who make the documentation are the people who write the software... and as documentation authors, they suck big-time. Their most common error is to assume that everyone using the software has years of experience in web site administration, so they casually say "do this" and "do that" and give no frakking clue as to how these things can be accomplished.
One example is that since my blog is big, I am having trouble uploading the data file so that WordPress can even get started on processing it. They casually set a limit of 2 MB, while the file I need to import is 14 MB. Yes, I know my blog is a lot bigger than most, but still, they must expect most people to have tiny little blogs; I must have exceeded that much in my first year alone. Anyway, the problem, many report, is that the file is too big--so just divide it into pieces. Well, great--but how? No clue is given to this, they just assume that you know all about this kind of stuff. Nobody takes the trouble of spelling it out so that someone with no technical knowledge can do it. Even the better ones, at some point, have one instruction that depends on knowing some protocol, trick, code, or process--and just one broken link will completely stop you in your tracks.
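As a rough illustration of what "divide it into pieces" could look like, here is a minimal sketch. It assumes the export is a plain-text file in which each entry ends with a line of exactly eight hyphens (the delimiter used by the Movable Type export format, but check your own file), and it treats the 2 MB limit mentioned above as the target size. The function name is hypothetical.

```python
ENTRY_SEPARATOR = "--------"  # assumed end-of-entry line in the export file


def split_export(text, max_bytes=2 * 1024 * 1024):
    """Split export text into chunks made of whole entries.

    Each chunk stays at or under max_bytes where possible; a single entry
    larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    entry = []
    for line in text.splitlines(keepends=True):
        entry.append(line)
        if line.rstrip("\r\n") == ENTRY_SEPARATOR:
            entry_text = "".join(entry)
            entry_size = len(entry_text.encode("utf-8"))
            # Start a new chunk if this entry would push the current one over.
            if current and size + entry_size > max_bytes:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(entry_text)
            size += entry_size
            entry = []
    if entry:  # trailing text with no final separator goes into the last chunk
        current.append("".join(entry))
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned chunk can then be written to its own file and imported one at a time. Because entries are never cut in half, joining the chunks back together gives the original text.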
It does not help that Movable Type apparently re-designed their support site, in such a way that none of the links work. If you search the forums, you always find somebody asking about a problem, and other members say, "You dolt! This has already been discussed, just go here!" and they provide a link. But the link is broken, and a search does not bring up the page they are apparently talking about. Swell.
Frankly, I will be rather surprised if I can get this done this week. I was hoping to have it done today, but tech-support lag has hit me. I needed to ask for a PHP file-size limit to be raised, and they did... for the main directory only. I did not know I had to ask for a specific directory, so another 2-3 hours must go by before I can get that fixed. And I'm only just beginning. If I succeed in importing the file, then there will be issues about making images work, and potentially much more difficult, making links work (where I linked from one post to another). There are tools and solutions for problems like these... each of which I fully expect to have exactly the same quality of documentation that I have experienced thus far.
Right about now, I am actually asking myself, "how much longer can I keep this blogging thing up?" I figure that I have to at least go five years without a day's break, and then maybe back off to several-times-weekly, or even whenever-I-feel-like-it.
August 18, 2007
Comments May Go Offline
Apparently I am under a full-on massive spam attack, and Movable Type is not helping; my web host continues to press me to switch to WordPress, threatening to shut me down. I have been planning such a switch for a while now, and for the past week have been prepping--despite having a full load of work for my school, plus other plans made long before which cannot be cancelled. Still, the change may not come fast enough--and so my web host may shut down the comments script while I am away over the weekend. I will try to get the blog up and running fully again asap when and if that happens, but it may take a few days. Thanks for your patience.
Postscript: checking my spam logs is an incredible revelation: spammers are hitting my comment script every few seconds. I expect there are as many as ten thousand spam attempts on my site every day.
This is a full-out assault of a kind I thought even spammers would not attempt. Who knows, maybe they figured that I had badmouthed them one too many times and decided to drop on me like a ton of bricks. Has anyone else out there experienced anything like this before?
August 14, 2007
Hotlink Protection
To all visitors:
Due to a warning from my web host, I have been forced to enable Hotlink Protection. The last time I did this, a lot of people complained of images not showing up. If you experience such image outages, please let me know in the comments of this post. Thanks!
Update: To those of you building your blogs over time, a little word of warning: avoid piling up too many entries with images in any one category. Why? Well, I just found out.
As I mentioned above, I have been taking heat from my web host. First it was for an "abusive script," which is to say that spammers were hammering my comment script like there was no tomorrow. Even though no spam ever gets through, they were pounding the damn thing like a sunavabitch, tying up the shared hosting server's CPU too much. I fixed that at least temporarily, with more fixes to come.
But that's not what I'm talking about in this note. After the cgi script was called out, and after I fixed it, my web host started complaining about my site getting "too many hits." They pointed out that my site was getting something like 80,000 hits every day. That seemed strange, as I only get a few thousand visits every day--but after checking, I saw the problem. As I said above, it was in the categories.
You see, I had a few categories that were image-rich, like "Focus on Japan," "Photo Stories," and "Birdwatching in Japan." Each category had several hundred posts, most of which had multiple images. And I noticed that these categories were getting more hits than almost any other file on my site--over the past two weeks, "Focus on Japan" got 3230 visits, "Photo Stories" got 1826, and "Birdwatching in Japan" got 694.
So what? Well, each time one of those category archives got hit, every single image had to be displayed--sometimes a single page load pulled in five or six hundred images, each image counting as a "hit" on my site. And a lot of those visits, probably most of them, were coming from the image search engine pages.
Each category archive page had accumulated hundreds of different topics, many hundreds of differently-named images; that variety meant that each such page would get more attention from the search engines. People hunting for one image would get the archive page and see hundreds.
As a result, a few thousand visits started generating nearly a hundred thousand hits, and I got into trouble. To fix it, I simply got rid of the "Photo Stories" and "Birdwatching in Japan" categories, figuring that I don't need them so much--few people if any ever comment on them, so I figure they're just drawing image searches. To hell with that! "Focus on Japan," however, I wanted to keep. To solve the problem with that one, I broke the category up into six categories, one for each year since 2003 and one miscellaneous category (see them now in the category list at right). That got rid of the troublemaking plain-vanilla "Focus on Japan" category, created several new categories not yet indexed by the search engines, and kept the number of hits per page down to a lot fewer than before.
So my advice: if you use images in many of your posts, watch how they pile up in the categories, lest you suffer the same fate I did.
August 09, 2007
Okay, That's It
That's it indeed. First free moment I get, I am putting in a block script in my image directory to keep Google's image bot out. I've had it up to here.
When you have a blog with images, you run the risk of being swamped by hotlinkers. Sometimes I take some nice shots, and I want to share them. But what happens is that some rather impolite person will find that image via Google, and without permission, without a link back to my site, without even visiting my site at all, they simply grab the address of the image and hotlink.
I have described before what hotlinking is. In short, instead of using an image that they store on their own site, they "link" to the image residing on my site; despite being an image on my site, it appears seamlessly to be a part of theirs. The problem: I get charged for the bandwidth it eats up.
Normally, it's not a huge issue; usually, it's a small image and is only accessed a few dozen times. That's a pinprick; I would hardly notice it. A lot of times, people hotlink the images for use on forums or MySpace accounts; those are the worst offenders. But once in a while, someone with a high-bandwidth site decides to take a large image from yours and hotlink to it. That's what happened here.
Someone in Ohio who runs a web site that talks a lot about "how to increase your site traffic" and boasts of their AdSense revenue took a 1280-pixel-wide image from my site, a photo of lightning I took a while back. They then used that image--badly, at that--as the background for the title header on their blog, so that it appeared on every single last page of their own blog.
In just the past 9 days, the image was hotlinked at least 17,792 times... meaning that in just over a week, this one person ate up almost four gigabytes of my bandwidth, or about 20% of the total bandwidth of my site for that period of time.
I would not have been quite as ticked off had this person innocently hotlinked without understanding... but the content of their site, one comment in particular, told me that they knew exactly what hotlinking was. Worse, this person quotes scripture on their main page.
So I substituted the image with a smaller version that contained a text message detailing my annoyance (no obscenities), and left an acerbic comment... but really, this is the last straw.
As I mentioned above, the first chance I get, I am cleaning house. I will leave a robots.txt file to steer Google Images (and whatever other image bots I can identify) away from my images directory. I am then going to clear out every file more than 100 KB--even smaller ones, if there are not too many. I might replace them with smaller/more compressed images, but that could mean too much work. In the future, I will post larger images, but I will yank them within a few weeks or months.
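For reference, here is a minimal sketch of what such a robots.txt might contain, with a quick check of how a well-behaved crawler would read it. Googlebot-Image is the user-agent token for Google's image crawler; the /images/ path is an assumption, since the post does not name the directory. Note that this only asks crawlers to stay away; it does nothing to stop someone who already has an image's address from hotlinking it.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: photos live under /images/ (the directory name is a guess).
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, the image crawler is kept out of the images directory only; regular pages, and other crawlers, are unaffected.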
This is disappointing to me. I like sharing the photos I take, I want people to enjoy them. But when even a few bad apples take advantage of that and run hog-wild over my site resources...
It is just too much frakking trouble.
Comments Temporarily Down: Frakking Spammers
I woke up to receive this message in my mailbox:
Your account blogd.com has been suspended or a file disabled in it for the following reason:
CGI/PHP Overload
--------------
Your account is causing high load on the server due to an abusive cgi script: ...
That, my friends, is what spammers can do to you. Partly my fault, I guess; it has been years since I upgraded my blog software, and just as long since I have taken any evasive maneuvers in regards to my comments script. Fortunately, the web host only shut down the comment script file and has not suspended my account or shut down my scripting abilities altogether. So among other things on my "to do" list today, I'll have to reshuffle my blog software to get comments working again, and then get to work this month on upgrading the blog software--either to the newest Movable Type, or (my long-term intention) switching to WordPress.
Pardon any inconveniences while this is taking place. Comments should be up again in a few hours.
Update: Comments are back up. More on other changes as I get around to them...
August 02, 2007
Four Years
I almost missed the milestone: today is my fourth anniversary of non-stop blogging. 1,461 days (365 x 4 plus a leap year day) since August 2, 2003. That was a photo post on my college's graduation day (we're holding it on the 4th this year, keeping it on a Saturday). I noted the first anniversary, the second, the third, and then today is the fourth. Makes me wonder what'll be happening when I reach year five.
Today, I kept busy with a meeting at work (we're getting moving on switching our internal email system to GMail, specifically Google Apps for Education), grading tests, and doing shopping outside. Tonight, there seems to be an O-bon Dance festival at Sunshine City--I can hear the music from here (150 KB mp3 audio), and you can see a bit of the setup between the buildings from our balcony.

That's the Prince Hotel on the left, the NTT Building on the right, the Sunshine 60 Tower behind the NTT Building, and Shinjuku skyscrapers above in the middle. The O-bon festival is below and center. Closer up, it looks like this:

A very standard O-bon setup--you can see similar photos I took a few years back in Inagi. There's always a square platform in the middle, lanterns and lights strung out from that central area, people dancing around it, a wider circle of spectators beyond that.
Later, Sachi came home and we went to what is now our usual yakitori place--not the mom & pop place, but the bigger restaurant around the corner. The one where there's an old guy who comes in every night and monopolizes the waitress' time with idle chat. Before we left, a group of seven or eight older men came in wearing garb that told us they were with the O-bon festival themselves. It seemed like they had gotten an early start on the drinking, and were ready to get soused. So after a few made bombastic and half-drunk attempts to speak English to me, we finished and left, with a friendly "goodbye!" to the O-bon guys.
July 05, 2007
Off of Google Image?
I am seriously considering writing a bot script for my images folder telling Google Image to stop cataloguing my photos for their image search. I've allowed it thus far as a courtesy to web surfers to enjoy the images I have on my site... but this is getting out of hand. A huge number of people are now hotlinking to my images. It seems to have become somewhat of a practice on the web now. The biggest offenders: people in discussion forums, and people on MySpace and similar social sites. People just hotlink to images rather shamelessly, and it's getting out of hand. When I look at who is visiting my blog, it seems like one out of every twenty or so visitors is a hotlinker... which means they are not really "visiting" the site, instead they are swiping an image from my site, on my dime, and there is no link back to or credit for my site.
June 02, 2007
Odometer Click

Not that it's of any real significance, but round numbers can be strangely fascinating. 1400 days of continuous blogging and counting. It'll be four straight years in exactly two months, by coincidence. A good thing that Sachiko is patient and understanding of this quirk of mine, and doesn't begrudge the time it sometimes takes.
May 18, 2007
Blog Stuff Happens
There are sometimes perks to having a blog; interesting stuff can happen from time to time. Just a few days ago, a rather well-known site featured a blog post of mine on their front page. Some time ago, one of my posts was mentioned on CNN and linked to (though they chose to link to the cross-post on the now-defunct xpat.org). My post on Arguing on the Internet was selected for a college reader (they still haven't sent me the copy they promised, though). And unless someone was yanking my chain, author John Varley found a review I wrote on one of his books and left a nice comment.
Nothing ground-shaking or anything, but small and enjoyable brushes with third-class fame. And probably a lot less of this would happen if I weren't posting like a madman, filling my site with huge gobs of writing on a regular basis (1,383 days of non-stop posting, and counting). That certainly helps with The Google, which is what tends to get you noticed. (It also helps to have the word "blog" in the name of your blog, as it is a common search term, and so gets you ranked higher sometimes.)

Recently, a new perk popped up: a freebie DVD. A marketing firm found a post I made recently on WKRP in Cincinnati coming out on DVD. They contacted me and asked if I'd like to review the set on the blog, and if so, they'd send me a copy. A good deal for both of us, to be sure--it's something I would have eventually bought, and since I spoke favorably of the show before, they know my review will probably be a positive one. Can't promise anything--the way the music is presented (due to the music labels' greedy demands for huge payments on the song snippets used in the show) could be fine or it could take away from the quality; the special features, like commentary, could be great, or they could suck big-time like the directors' commentaries on the DVDs for Airplane! and Young Frankenstein. So we'll see. Trust me when I say that while I like the idea of freebie DVDs of media that I like, I don't like it so much that I'd stoop to sucking up for it. But given that I've seen all the episodes in question already and really liked them, there's not really much suspense in that aspect of the review. Look for the post on it sometime soon.
May 15, 2007
Shanghaied Syndication
Living in the Age of the Internet can be strange sometimes. Today, I found out that I've been syndicated, after a fashion. Came as somewhat of a surprise to me.
Usually, I get an average of just one or two comments per post. If I get more, then most tend to come from a group of long-time readers. Less than 24 hours ago, I wrote a blog post on the RIAA and their tactics in dealing with downloaders. Nothing new, I've posted much like it in the past.
But then, I started getting comments. Lots of comments. Not major-site-level comments, but more than a dozen in a single day is unusual (even for my infamous eyelid-twitching post), and all from people I never heard from before. Then a little flag went up, and I recalled when this had happened before. So I check my stats, and indeed, there are several hundred hits from The Raw Story web site. It's a fairly well-known site, having broken several big political stories. I surf to their site, and sure enough, there's a link to my blog from the morning on their front page.

What's more, it's not just a plain link to my page--it's a link that maintains the site's banner with my own blog post in a frame below it.
Now, there's nothing really wrong with that, and I do appreciate the nod and the traffic. But it's kinda weird; I feel like my blog article has been syndicated to their site, and I'm kinda the last one to know about it.
That's just an illusion, from one perspective. The story never left my web site. But the inclusion of the Raw Story's banner--with advertising which may have earned money for them--above the blog post creates the impression, especially to the novice reader, that the writing was done under the publication's wing, so to speak; it gives the impression that it's all part of their site, a site which generates income from advertising. (To be fair, the banner ad, at least at present, is for a Muscular Dystrophy organization; it's probably a PSA. But the banner does include links that ask people to advertise, which allow people to sign up to pay by subscriptions, or which ask people to donate money to the site.)
It strikes me as somewhat of a grey area. On the one hand, they are using another person's published work, without asking, to add content to a site which perhaps makes money from advertising on the same screen on which the borrowed work appears (or at least makes money from ads to the main site, which benefits from added content).
On the other hand, it is a medium in which people regularly link to the works of others, and others are glad for the attention; furthermore, the publication did not actually transfer my work to their site, they simply split the screen.
Like I said, I'm not complaining. It just strikes me that the proper procedure would be to at the very least notify someone before going that far, or better, to ask permission. A plain link is not improper at all. But to display a banner and advertisements above the work, on a site that gets paid by advertisers for providing content? I doubt anyone would ever sue or even complain too much--on the contrary. It just seems a little... off.
April 18, 2007
Gun Control Blog
Interesting. Just in the past few hours, I got two different comments on an old post, my post on Gun Control from May last year. Usually I get comments on old posts very sporadically--to get two in such quick succession is very unusual. I thought maybe it was the same person using different names--but the IP addresses were different. Then I thought that someone was linking to my post from a popular forum or something--but no, my "Latest Visitors" stats page showed them coming from Google. Odd.
Then I figured it out: the Virginia Tech shootings. That has prompted people to search for sites that discuss gun control. And if you Google "gun control blog," I'm #2 on the returns. And sure enough, when I checked my recent stats, searches for "gun control blog" leading people from search engines to my site have spiked in the last 24 hours.
Interesting how stuff like that comes together....
March 31, 2007
Hotlinking, Political Style
Several times on this blog, I have mentioned my problems with hotlinkers. Since I post a large number of photos (more than many, fewer than some), and I keep those photos on my site indefinitely, I have to deal with people stealing those images. Now, I own almost all of the images that I post; if not, I note the source. But if I do use an outside image, one thing I make sure to do is to host the image on my own site. Why is that important? Let me explain.
When someone sets up a web site with their own domain name, they usually have to pay for "hosting" services. That means that they pay a service to give them disk space, maintenance services, and a 24/7 high-speed connection to the Internet. The web pages, images, and other files are kept on that hard disk. But the hosting service usually keeps track of how much bandwidth you use. For example, if I posted a 1MB picture on my site and 99 people downloaded it, that would equal 100 transmissions (1 up, 99 down) of that 1MB file; that would be 100 MB of bandwidth used. Since almost all accounts have a bandwidth limit (my own is 75 GB per month), it is a commodity which you are paying for.
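The arithmetic in that example can be written out as a small sketch. The function names are made up for illustration, and it assumes 1 GB = 1,000 MB; a host may count it as 1,024.

```python
MB_PER_GB = 1000  # assumption; some hosts use 1024


def bandwidth_used_mb(file_size_mb, downloads, uploads=1):
    """Total transfer in MB: every upload and download moves the whole file."""
    return file_size_mb * (downloads + uploads)


def share_of_quota(used_mb, quota_gb):
    """Fraction of a monthly bandwidth quota that used_mb represents."""
    return used_mb / (quota_gb * MB_PER_GB)


# The example from the post: a 1 MB picture, uploaded once, downloaded 99 times,
# measured against a 75 GB monthly limit.
used = bandwidth_used_mb(1, downloads=99)
print(used, "MB used;", f"{share_of_quota(used, 75):.2%} of the monthly quota")
```

One picture viewed a hundred times barely registers against the limit; the trouble starts when the download count runs into the tens of thousands.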
Now, let's say that someone wants to use that photo on their web site. So they put a command on their web site that embeds the picture into their own web page, so it looks like the image is part of their own page--but the address of the picture is kept as a reference to my web site. In other words, the photo looks like it's on their site--but my site is where the image is taken from, so I get charged for bandwidth every time somebody views their web site.
This is called "hotlinking," and I've mentioned it before. People will not only steal your images, but then they will hotlink them so you will be forced to pay money for their act of theft against you. Kind of like someone stealing your cell phone, then you get charged for their phone calls.
The one defense you have: the image file resides on your site, so you control it. So, when some thief steals an image and hotlinks it, what I usually do is to change the image file to something insulting or (if I'm really ticked off) obscene. One time, a travel agency hotlinked one of my China travel photos, so I replaced the image with a graphic calling them thieves and warning their customers not to buy from them (they swapped images and hotlinked from someone else after a few days, but here's what the site looked like for a while).
However, the real damage comes when someone hotlinks your image, and their site is heavily traveled--which means that you get swamped, and your site could even be in danger of exceeding bandwidth limits, which could get your site shut down. The worst incident I had was when an image on my site was hotlinked by Ain't It Cool, a popular movie rumor-and-reviews web site. Before I could notice it, my site had suffered 30,000 hits, losing 1.5 GB of bandwidth (this was back when my limit was 15 GB, and I came close to exceeding it sometimes). So I swapped the image with a graphic advertising my site. "Ain't It Cool" noticed the swap quickly, and then replaced the image.
So, why am I mentioning this? Because hotlinking has, at least momentarily, become an issue in the political arena. One of John McCain's web techs set up his MySpace page--but when he did so, he hotlinked to an image on someone else's site. When they made McCain's page, they used a free template offered by NewsVine, which allows people to use their MySpace template so long as they give credit to NewsVine, and so long as they don't hotlink.
McCain's site didn't honor either condition, so NewsVine's CEO Mike Davidson swapped out the image McCain's people were using with one announcing that McCain had reversed his opinion on gay marriage, "Particularly marriage between passionate females." Since Davidson's swapped image was on his own web site, and McCain's people were the ones who had placed it on their own web site, this was an "immaculate hack" that broke no laws whatsoever. Below are the real and hacked page captures, the hacked one on the right.

image borrowed, but not hotlinked. Click on the image to visit the source.
Davidson writes about his decision to slap McCain's hand on this one:
But then I read the article in today's Newsweek about how politicians are all setting up MySpace pages in order to "connect" with younger audiences. McCain's MySpace page is listed, as are the pages from several other candidates. I think the idea of politicians setting up MySpace pages and pretending to actually use them is a bit disingenuous, so I figured it was time to play a little prank on Johnny Mac.
Luckily, I had already set up a special .htaccess rule on my server which served my real "contact me" image if the image was referenced from my own MySpace page, and served up a sample image if it was served from anywhere else. This is the whole reason I even figured out what was going on. I had my real image in cache and upon loading McCain's page, the real image showed up (including my special note that said "NO REQUESTS FOR DESIGN HELP PLEASE"). Thinking it was weird that McCain would get any requests for design help, I immediately realized what happened.
So, the only thing necessary to effectively commandeer McCain's page with my own messaging was to simply replace my own sample image on my server with a newly created sample on my server. No server but my own was touched and no laws were broken. The immaculate hack.
Abortion? The Iraq War? Probably too heavy to joke about. Gay marriage seemed like a more of a non-lethal subject to center the prank around.
So with a few minutes in Photoshop and a quick FTP, a new John McCain was born...
...and The Straight-Talk Express isn't just for straight people anymore.
Bravo! This is amusing to me on two levels--the political and the geek levels, to be specific. Well done, Davidson.
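The referrer-based image swap Davidson describes can be sketched roughly as follows. This is not his actual .htaccess rule, which the quote does not show; it is the same idea expressed in Python, and the host, page path, and file names are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; the quote does not give the real ones.
ALLOWED_REFERER_HOST = "www.myspace.com"
ALLOWED_REFERER_PATH = "/example-owner"
REAL_IMAGE = "contact-real.png"
SAMPLE_IMAGE = "contact-sample.png"


def choose_image(referer):
    """Pick which file to serve based on the request's Referer header.

    The real image goes only to requests coming from the owner's own page;
    everyone else (including requests with no Referer) gets the sample.
    """
    if not referer:
        return SAMPLE_IMAGE
    parts = urlparse(referer)
    on_owner_page = (
        parts.hostname == ALLOWED_REFERER_HOST
        and parts.path.rstrip("/") == ALLOWED_REFERER_PATH
    )
    return REAL_IMAGE if on_owner_page else SAMPLE_IMAGE
```

Since the sample image is the one every outside page receives, replacing that one file changes what appears on any page that hotlinks it, without touching anyone else's server.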
February 18, 2007
New Spam Activity
Something that I just noticed: spammers are now adding a new tactic to their spamming: searching my blog. Not with Google, but rather using my own blog's internal software. My blog's internal search engine records the searches done by visitors; usually it just reads search keywords like "nova drug arrest," "hypnotoad," "right eye twitching," or "google video," to name four legitimate searches done in the past few days.
However, I am now seeing a few hundred searches performed every week which look for spam keywords; apparently, the spammers have automated programs access my blog's search engine script to look for spam terms they have undoubtedly tried to get past my filters in comment spam. In the past week, 127 searches were made for URLs for legitimate sites (usually educational addresses), with words like "viagra" or "cialis" tacked onto the end. Another 44 searches were made directly for the spam terms themselves, like "used rolex," "airline flight tickets cheap airline tickets," or even strings like "This excellent site!!! Want you good luck!!!" (clearly one of the fake compliments posted riding a spam link).
Apparently, Google ain't good enough for them. Now they have to directly access the blog's own search engine script to see if their spam has stuck. Which, of course, it hasn't. In the same 1-week period, SpamLookup blocked 1,681 spam comments. MT-Blacklist blocks a lot more than that--between 1,000 and 1,500 on any given day. MT-Blacklist blocks spam which matches the blacklist filter; SpamLookup stops the rest that get through, usually on the basis of having URLs or recognized spammer IP addresses, or whatnot.
But to the best of my knowledge, out of 2,301 blog posts and 5,513 comments over the past four years minus a few months, not a single shred or hint of spam exists on this blog. That ain't gonna stop 'em from continuing their barrage, of course.
February 14, 2007
Darn
I tried to get this blog listed in Google News, after seeing a good number of other political blogs and plain-simple web sites included. Google wrote back:
Thank you for your note. We've reviewed http://www.blogd.com but are unable to include it in Google News at this time. We don't include sites that are written and maintained by one individual. We appreciate your taking the time to contact us and will log your site for consideration should our requirements change.
February 04, 2007
Splogs
Yes, yet another weird new word describing something to do with those danged Internet tubes. This is one that I actually reported on six and a half months ago--though at the time, I did not know the name for it. I called them "Spam Blogs," and have since found that the abbreviated name "Splog" has been applied. An ugly word for an ugly practice, fitting enough.
A Splog is a fake, auto-generated blog that acts as a platform for spam links, or for AdSense or Amazon Associates links that will generate money for the spammer. The idea is to horn in on the keywords generated by blog posts to get listed in search engines, which will lead to people linking in and using the outbound links, thus generating cash for the spammer.
If that weren't bad enough, these splogs don't generate their own content. You do it for them, if you have a blog. The splog first finds your blog, then it grabs the text you wrote and automatically slaps all or selected parts of your entry into the splog. One way to discover if your site is being mined by splogs is to search for unique content on your site in Google's Blog search engine, and then see if more than just your entry comes up. I did a post on "Maeuri-ken," or advance ticket sales in Japan, just a few days ago... and within two days a splog had swiped my text and used it to try to generate themselves some cash. Here are the results from the Google Blog search (results will disappear after a few weeks of this post).
Here's the original post, in part (image):

And here's what comes out on the splog:

The splog page has links to dog breeding and pet care sites, as well as cross-links to other splogs and spam stuff. As splogs go, it is relatively inoffensive. It actually links back to the blog post it stole from, does not go overboard on spammy stuff like some splogs do, and does not inject spam links into the stolen text directly, as has been reported to happen. Also, note that in this case, the splog does not reprint the full text of my post, or even a coherent paragraph. It is more like the result of a search engine query looking for the keywords "small," "dog," and "sale," all of which appear in that post.
That doesn't mean that it is inoffensive, it just means that it is slightly less offensive than the many other splogs I've found mining my writing for their sleazy profit.
January 02, 2007
Fast Google
I'm rather impressed. Just one and a half days after posting an entry titled "Akemashite Omedeto Gozaimasu," that blog entry is ranked 8th on Google under a search for that expression. Which is quite something, as it is a common expression in Japanese, at least as far as New Year's is concerned. Somebody at Google seems to like me.
Or maybe it is a combination of the recency of the posting, along with someone on a Google Blog linking to the entry. That would be my sister-in-law, who has (so far) done a four-part series on New Year's in Japan, from the neighborhood-at-day perspective.
December 31, 2006
C&L "Achievement"?
I love Crooks & Liars. They're a great blog. They're on my LinkBoard. But I do think that they're going way overboard with effusive self-praise at a recent ranking released in the press:
There are literally millions of blogs now. For one single blog, on its own, to generate 40% of the ten most linked-to posts for the year is a truly remarkable achievement. It is a testament to the uniquely valuable role C&L plays in the blogosphere -- not only in providing invaluable video content but, more importantly, in helping to shape the dialogue and agenda for the liberal blogosphere as a whole.
The thing is, all four of their top 10 most-linked-to blog posts were posts that were linked to because they contained video content, namely Colbert's White House Correspondents' dinner monologue, Al Gore's SNL "presidential address," and two Keith Olbermann commentaries.
The reason people linked to these posts was not the commentary offered by C&L; it was that the video was there. But Glenn Greenwald, author of the self-praising post, claims that C&L's ability to "shape dialog and agenda" was more important than providing video content. Now, I think that C&L does a good job in regard to shaping the liberal dialog and agenda, but in all fairness, I do not believe that they received all those links on those posts for that. Remove the video content from those posts and allow them to stand on their commentary alone, and they would instantly drop into obscurity.
Don't get me wrong--providing the video content is fantastic, and I depend on C&L to see so much of this stuff that I would never get to see otherwise, living overseas as I do. But the true credit for the inbound links lies with C&L only insofar as they went to the trouble and expense to provide the video content; the credit for shaping the dialog and agenda goes to Stephen Colbert, Al Gore, and most of all, Keith Olbermann.
November 20, 2006
Canary Obligations
This is weird. I have a blog entry from December 2003 where I put up some photos of my dad's canaries. I made no claims about canary expertise, and made it clear that they were not my canaries. And yet, since then, I have been getting comments from people--most from people with Arabic-sounding names--asking for canary stuff, as if I were a canary go-to guy or something. The comments include:
I WANT FHOTO CANARY FOR BREEDING AND AECHIVES ALBOM
THANK U. ["Masoud"]
visit [my] site and write me a letter showing me where the canaries live. ["Meshari"]
Please send me picture and article Kingstroat and Backsrtoat breedings, thank'c (Indonesia) ["Anjar Siswanto"]
My canary is sick. I'm to give him 3cc of liquid antibiotic two times a day. Is there an easy way to do this? Thanks! G ["Georgianna"]
In the comments, I replied to these, saying that I know nothing about canaries, they are not mine, I don't have regular access to them and so on. But the weird comments keep on coming. Just a few minutes ago, another came in from "Rashid":
hi please send me photo by canary
thanks you.
All part of the risks of blogging on random stuff.
November 13, 2006
Beating the Spammers
About a month ago, somebody asked me if there was a way they could write their email address on a web page and yet avoid having it picked up by spammers. I was about to tell them it was impossible, but then I had an idea--and it seems to have been a good one. I tested it, and indeed, it did work. The best part is, it incorporates a technique used by spammers themselves, and beats them at their own game!
A little background first. Putting an email address on a web page, or for that matter, anywhere on the Internet where the public can see, is an open invitation for spam. Spammers use automated programs, called "bots," to "harvest" email addresses. The bots scour every last web page, discussion group, and other public piece of information on the Internet for anything that looks like an email address. When they find one, they add it to a list, and start sending spam to it.
I know this is true because I have tested it. First, I create a brand-new email address (e.g., "brandnewemailaddress@blogd.com"), one which has never been used before, and one which no one but me knows about. I then put the email address up on this blog's main page. To ensure that only spammers can see it, I make the address the same color as the background, rendering it invisible to the human eye. Five months ago, I put up one such address, and after a week, spam started coming through; after one month, it had drawn 41 spams; this week, it has been getting about 7-8 spams per day, and has collected about 500 spams altogether.
So, I know these bots are constantly surveilling my web site. I know that any email address posted in such a fashion will be picked up. So how could I post an email address and not have it be picked up?
The idea came from a technique I saw spammers use themselves. When spammers send email, they know that certain words will trip spam filters, and that will send their spam to the waste pile, where it will never be read. One key word, for example, is "Viagra." So spammers who want to use this word will try to disguise it. One way is to misspell the word, for example, "V1@gra" or any other of a hundred variants. But the technique I saw spammers use years ago works to foil the spammers themselves.
It involves the use of HTML code. HTML is the language used to write web pages. If you go to the "View" menu of your browser and choose the "View Source" command (or anything that promises to show the "source"), you'll see the same page, but as it appears originally, the source code. You will see that it is filled with stuff inside <angled brackets>. On a web page, anything in angled brackets is considered a command for the browser. As a simple example, "<b>" is a command to make text bold. One "harmless" command is <!-- text -->. That is a comment command, an exclamation point followed by double-hyphens within a set of angled brackets. It doesn't do anything, it's intended solely as a comment in the code. Because it's an HTML command, it does not "render," that is, it does not get shown to the viewer on the web page; it is "edited out" by the browser.
Now, spammers used to use this as a way to break up a word so it would get past the spam filters for email. For example, instead of writing "Viagra," if they instead wrote "Vi<!-- text -->ag<!-- text -->ra," the spam filters of the time would not see the word "Viagra," but since an email reader will render HTML code, the stuff in the brackets would disappear for the person who was looking at the email, and they would see "Viagra" in the clear. Clever! Until, of course, the email spam filters were updated, and it no longer worked for spammers, so they stopped doing it.
But apparently, the spammers never updated their own bots to filter out their own trick! I tried writing an email address in the clear, broken up by this old spammer's trick, and it has been a month, and not a single spam has been generated! In all other tests where I put up email addresses, spam started coming within a week, and dozens had come by the end of the first month.
So if you want to post an email address so that people can see it but spammers (who are not people, after all) cannot, then add those comment commands within the HTML code on your web page. Look at this new email address I just made:
spammerssuck@blogd.com
Now, I didn't really type that in the HTML code. You see it as being in the clear, but if you were to look at the HTML code for this page, you would see that it really looks like:
spa<!-- toy -->mm<!-- blue -->erss<!-- bottle -->uck@blo<!-- phone -->gd.<!-- box -->com
Note that in the HTML comments breaking up the email, I inserted random common words--another spammer's trick, to throw off filters. But really, you could probably put anything in the comments, it likely doesn't matter.
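For anyone who would rather not break up the address by hand, here is a minimal Python sketch of the same idea. It is only an illustration under some assumptions: the function names, the filler word list, and the three-character chunk size are all made up for this example, and the "naive harvester" pattern is just my guess at how a simple bot might scan a page, not the code of any real one.

```python
import re

# Hypothetical filler words for the comments; anything would likely do.
FILLER_WORDS = ["toy", "blue", "bottle", "phone", "box"]

def obfuscate_email(address, chunk=3, words=FILLER_WORDS):
    """Break the address into small pieces separated by HTML comments."""
    pieces = [address[i:i + chunk] for i in range(0, len(address), chunk)]
    out = [pieces[0]]
    for n, piece in enumerate(pieces[1:]):
        out.append("<!-- %s -->" % words[n % len(words)])
        out.append(piece)
    return "".join(out)

# A deliberately simple pattern, standing in for an unsophisticated bot
# (an assumption for illustration only).
NAIVE_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def naive_harvest(html):
    """Return anything in the raw HTML that looks like an email address."""
    return NAIVE_EMAIL_RE.findall(html)

def strip_comments(html):
    """Remove HTML comments, roughly what a browser does when rendering."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)

address = "spammerssuck@blogd.com"
hidden = obfuscate_email(address)
print(hidden)                                 # what goes in the page source
print(naive_harvest(hidden))                  # what the simple bot finds
print(naive_harvest(strip_comments(hidden)))  # what a smarter bot would find
```

The last line is the catch: a bot that strips comments before scanning sees the address in the clear, so this only holds up as long as the bots stay lazy.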
Now, will you be safe by doing this forever? Hard to say. Spammers might never pick up on this, or they might write a fix for this tomorrow. Heck, they might read this blog regularly, and I might be tipping them off. My guess is that they won't bother changing their code to account for this trick until a significant number of people start using it. So if you're forced to put your email address up on a web page anyway, and you don't have access to complex coding that might protect it, then you might as well give this method a try.
November 12, 2006
Behind the Curtain
Yet another of my Internet peeves: stuff like this.
Now, I could have gone ironic and left it at that. But my point is that the post consists solely of a link, but does not describe at all what the link is. This TPM post is more egregious than usual, as it only consists of the word "Yep," which is a link to a Daily Kos story. Josh Marshall is a repeat offender with this.
There are three reasons I don't like it when people have links and yet give no clue as to what they are. The first reason is that I don't like jumping around. If I am on a page where many entries might appear and I haven't finished all of them yet, I might not want to leave the page to look at something and then have to come back.
The second reason is that I don't like being led blindly about. I want to know where I am going to before I go. For me, a blind link like Marshall's is equivalent to someone holding something I can't identify up to my nose, and without explanation, saying, "smell this!" If you write a post about something, it stands to reason that you should make clear what you are talking about. If you don't have time for that, maybe you don't have time for writing the post. Take a minute and at least write a short note about where you're leading people and why. If the mystery was intentional, I like that even less; being coy may be fun for the writer, but it's a lot less fun for the reader. Maybe the article is one I'd like to read--but maybe not. People who have slow Internet connections and have to wait a while for pages to load must be really annoyed by stuff like this.
The third reason I am uncomfortable with this is that links go bad. If the post's archive is kept, people will find it with Google--but the page the entry points to could disappear at any time, especially news stories, which often have a very short lifetime. Without any exposition in the blog entry, the reader will be mystified at what the blogger is talking about. Broken links can also nullify the point of an entry by making key data or evidence inaccessible; ergo, bloggers should take the time to copy and paste the relevant portions of the page they are linking to, so the pertinent information is preserved.
An associated peeve often appears in comments left by readers, usually combative ones: some will make a vague argument ("I disagree with what you said"), and for a riposte they will link to an article--but they do not explain which part of my argument they disagree with, nor why, nor what it is in the article that supports their point. Unless the entire article is their point (which it almost never is), then I am left to read through an often lengthy piece (usually an annoying diatribe by a right-wing pundit), and then guess as to which part of the article the visitor was referring to; essentially, I have to do all the work of creating the visitor's argument for them, and even then, I am not sure if I understand what they were thinking about. In such cases, I usually ask for specifics and refuse to respond until they are given.
Long story short, it's best to be specific, and not count on links to tell a story that you should be making yourself.
Coming soon, another pet peeve: people who constantly whine and complain on their blogs. I hate that.
November 07, 2006
Firefox Fixed
For some of you, this site has not been displaying correctly for a week or so: the header and sidebar remained intact, but the main body of the page with the blog entries disappeared. If that's what you experienced here, then the problems started when you upgraded to Firefox 2.0 and had a certain screen size. You might have found that decreasing the text size would have brought it back into view, but might have made the text too small to read comfortably.
It took me a bit of work to hunt down and fix the problem--an old bit of CSS coding (which Movable Type put in the stylesheet years ago) that told browsers to hide "overflow"; Firefox 2.0 must have redefined "overflow" in some way. I removed the offending code, and all should be as it was before. If not, I hope you'll let me know.
November 06, 2006
Comment Comment
A note on a milestone: this blog received its 5000th comment last night. More than 900 of those comments are for my post on Eyelid Twitching.
That number, of course, does not include any of the spam comments that regularly shower the site (more than 7000 have gotten past my spam filters and had to be manually removed; the filters have caught and blocked hundreds of thousands more--and no, I am not kidding or exaggerating about that number).
5000 comments may sound like a lot, but it averages out to between two and three comments per post. Not too bad, but not as many as blogs with smaller readerships. I think one of the reasons for this is that I am not very good at responding to comments; if readers feel that the blog's author is not reading the comments, they will be less likely to write more comments. Many prefer conversation of a sort, instead of just leaving a note.
Actually, when you leave a comment on this blog, it generates an email to me, and I read every one. I'm just not good at correspondence, is all. Most times I read it and agree, and have nothing to say. I should respond more, though. My apologies if you commented and I said nothing in response.
October 09, 2006
Comment Spam--OK, This Is Not Blowing Over
Another spam deluge. I have been getting a dozen or two blog comment spams which have been blowing past all my filters, mostly with blogspot addresses. Even if I blocked all blogspot addresses (which I don't want to do anyway), it still wouldn't get all of them. I thought I could wait the deluge out, but it's been almost a month now, so for the time being, I'm setting the number of allowed links in the comment text to zero.
You can still leave links though; both in the URL window (less attractive to spammers because it's a dynamic link which gives no Google juice), and in the comment text as straight text; any URL without hyperlink code (the "a href" command) will be OK. In the comment window, leave out the "http://"--just type beginning with "www..."
Furthermore, if you write the URL out in plain text, I will turn it into a hyperlink myself during moderation. So few of my visitors leave URLs that this will not be a problem--and it will certainly be easier than manually deleting the dozens of comment spams that come through in any case.
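For the curious, the rule amounts to something like the Python sketch below. This is not how Movable Type or its plugins actually implement the setting; the names (MAX_LINKS, is_allowed, linkify) and the patterns are hypothetical, written only to show the logic: count the "a href" hyperlinks in a comment, hold anything over the limit, and turn plain "www..." text into a hyperlink afterwards.

```python
import re

MAX_LINKS = 0  # the temporary limit described above

# Matches the start of hyperlink code such as <a href="...">
ANCHOR_RE = re.compile(r"<a\s[^>]*href", re.IGNORECASE)
# Matches plain-text URLs typed without "http://", beginning with "www."
PLAIN_URL_RE = re.compile(r"\bwww\.[^\s<>\"]+", re.IGNORECASE)

def count_hyperlinks(comment_text):
    """Count hyperlinks written with the "a href" command."""
    return len(ANCHOR_RE.findall(comment_text))

def is_allowed(comment_text, max_links=MAX_LINKS):
    """A comment passes only if it has no more hyperlinks than allowed."""
    return count_hyperlinks(comment_text) <= max_links

def linkify(comment_text):
    """Turn plain 'www...' URLs into hyperlinks, as I would do by hand."""
    def repl(match):
        raw = match.group(0)
        url = raw.rstrip(".,;:!?)")   # keep trailing punctuation out of the link
        trailing = raw[len(url):]
        return '<a href="http://%s">%s</a>%s' % (url, url, trailing)
    return PLAIN_URL_RE.sub(repl, comment_text)

print(is_allowed("See www.example.com for details"))
print(is_allowed('Buy now: <a href="http://spam.example">cheap stuff</a>'))
print(linkify("See www.example.com."))
```

A plain-text URL passes the check because it contains no hyperlink code, which is exactly why legitimate commenters can still point to a site while the automated spam gets held.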
As always, thanks for your patience in this regard.
September 18, 2006
Site Outages
If you've had trouble getting through to my site, you aren't alone. I've been having small to large outages for the past few weeks. I'm on my web host about this, but sometimes web hosts do this, and you have to bear through it... One of the frustrating things is that you don't always know what causes it. Web hosts never admit to screwing up. They either dismiss it as a minor technical glitch beyond their control, or they blame it on a disruptive user on the server. With continued outages like this one, neither makes any sense; either they should have repaired the glitch or given the boot to the disruptive user long before now... which probably means it's the web host screwing up somehow.
My apologies for the inconveniences. I'll post a quick note when service seems to be stabilized.
Update: Great. I get a message back saying that the problem "was due vulnerable perl scripts running under a user," and that they have "disabled that user." Except when I check, my own site's comment script has been disabled. Charming.
...And as I tried to post this, I got an error message. Can't post, either. Swell. My site is up, but it's also dead. [Obviously I posted this after the problem was fixed.]
September 03, 2006
What The...
Great, now I've got something new to worry about. It seems that blog spam (comment spam, referral spam, trackback spam) was not enough. Now I've seen a Movable Type blog hacked by spammers. True, it was an older version of the MT software--though I'm not even sure that was the weakness the hacker exploited--but it's now officially got me worried.
It was not my own site, but the site of a family member who needed a web site with certain features, and so I quickly set up an MT blog to fill their needs. I was checking it out today, and suddenly I noticed that on a few entries, a spam link was inserted right smack into the middle of the blog entry page. Worse than that, the browser I was using (Firefox on XP) suddenly bogged down, trying to open up a variety of weird windows and files, to the point where I had to force-quit the application. Later, I tried opening the same pages on my Mac with Safari--and though Safari did not try to open the files, it opened windows filled with junk characters--and it, too, froze and had to be force-quit.
When I looked at the individual pages, it turned out that the link was added directly into the page's HTML, and there was an indecipherable script added to the bottom of the page. When I checked the entry's core text on the blog's control page, it was clean--so the blog's software and database seem not to have been compromised. But the site's security apparently was, and that site's security was standard for the field.
But that wasn't all. In addition to the 8 pages of the site that were hacked, 3 PHP scripts and an htaccess file were also added to the directory.
Since the original blog database was clean, it was a simple matter to rebuild the site and then erase any files added by the hacker (who apparently did the deed several months ago). So now the site is clean, but whatever scumbag hacked the page might still have access... so I'm going to have to reset the entire site, and then keep an eye out for more stuff like this in the future.
I knew spammers were despicable lowlife criminals, but I've never seen or heard of them going this far.
September 01, 2006
Sidebar Rework--Finished!
A small note--I've just reworked the sidebar a bit. If you use it at all, you'll notice a few of the changes. The first change is in the LinkBoard. Remember that episode of Seinfeld about speed-dial rankings? I kind of feel like that when I'm reworking the board, wondering if the people listed will notice the changes and say, "Hey! He moved me from 14 to 16! What's up with that?" Added to the LinkBoard are Sean's and Justin's blogs (about time), Blog-Q Taro's (a lot of actual link referrals come from there, I don't know why), and a completely new Japan blog, "My So-Called Japanese Life," authored by Shari.
Next, the "Best of BfAD" is revised. There are now sections to the list. I've gathered all of my serious-issue entries and placed them at the top; if you want to know where I stand on just about any issue, they're all there, in gory detail. Included is my post on "Arguing on the Internet," which, to my rather great surprise, was selected by a professor in Texas as an entry in a college reader, a book used by Academic Writing students as examples of writing to emulate. It's not published yet, but will be out in the next few months, I'm told--I'll be sure to go on at length about it when it's released, as if it were an astonishing accomplishment or something.
The second section is entries that come in series, including four almost-categories where I've blogged repeatedly on specific matters (The Republican Blame Game, The "Liberal Media" Lie, Right Wing vs. Judiciary & The Constitution, and Church, State, & Christianity). Not repetitively, though, I hope. The third section is general posts, followed by a few resources in last place.
Next, I've gotten rid of a few sections, including the "Media Reviews" panel which showed "What I'm Reading Now," which, of course, was always out of date (no, I did not read "Big Lies" for two years straight). I figured I don't change it enough, and certainly I don't get people ordering from Amazon through it, so out it went.
Last was a small rework of the end links, adding a number of blogs that have been kind enough to link to me over time. I'm sure I've missed a few; if I did, please don't take it the wrong way.
August 23, 2006
Fun with Blogging
This is one of the fun things about blogging--you never know what's going to click with people, and sometimes it's just plain weird. Like my fourth blog post after starting this whole shebang was an offhand remark about my eyelid twitching, and suddenly people swarmed to it. Even today, three years later, I still get several comments a week on a subsequent post on eyelid twitching, and it's up to almost 900 comments so far. That one post is my main draw from Google. Go figure.
But that's what blogging is like. You make long, arduous, well-thought-out posts on serious subjects, but then your stats show people crowding to read your pithy comments on errant bodily functions. With a blog, you never know what people are going to want.
Just before heading off to Karuizawa, I wrote a post mainly consisting of whinging about whether or not I should get my Powerbook screen replaced, and then a short recalling of the Apple store people giving me a big bag to protect my backpack from the rain. After getting several useful comments with advice on the repairs, I forgot all about the specific post--only to later notice a big spike in visitors around that time. For some bizarre, unknown reason, the Mac news aggregator MacSurfer.com picked up the blog post (usually someone has to recommend it), and then two other Mac sites (like this one) picked it up under the topic of "Opinion," without comment. I gotta figure that they just automatically, mechanically pick random stories from MacSurfer, because the post is pretty ditzy, to be quite honest. But hundreds of people came just for that. I always hate to think people have come to my blog and have left disappointed, but then, people are still raving about my eyelid twitching thing. Maybe I'm just not that good a judge of what people like.
August 12, 2006
Diary Rescue
Hey, whaddaya know... my first "diary rescue" over at DKos, on the Moralities post. I've been cross-posting a few of my entries over there, but the non-recommended diaries disappear off the page so fast--there are so many posts by so many people--that it seems few people get the chance to read it. The only way you can really get an appreciable audience is to get "rescued." And, it does encourage one somewhat. (Tip of the hat to Paul for the original suggestion.)
August 02, 2006
Three Years
...And that makes three. Three years of blogging daily, without a break in that whole time. I started on August 2, 2003, and noted the yearly milestones for year one and year two. I'm also coming up on the two-thousandth-post mark, though not quite so soon--probably I'll hit that late September or early October. Blog readership is down a bit recently, but that may be more because of the fact that I put trackbacks to rest, and as a result spam, now limited to comment spam and referrer spam, has trickled down to just three or four thousand a week. About 15% of my visitors are returning readers, and according to Google Analytics, several hundred people visit regularly (more than half from the U.S., another quarter from Japan, and the rest from mostly English-speaking countries, naturally).
And so on to year number four...
July 30, 2006
Good Web Sites
Here are some web sites I regularly visit, but don't link to on my sidebar, at least not at present. They're good regular visits:
Cosmic Buddha: Japan blogger with a good sense of humor.
Crooks and Liars: why the heck don't I have them up on my sidebar yet?
Engadget: a good general tech blog.
FG: a good blog on Japan's happenings (don't make me print their full title).
Pharyngula: a good science blog; often takes on the creationists.
SeanPAune: some good stuff here.
The Straight Dope: fun facts for the day.
TUAW: The Unofficial Apple Website, good Apple tech blog.
Most of these will probably get on to the LinkBoard, once I find the time to clean up the blog some. Should be soon--I just taught my last class of the semester, and with just one final exam to give, some grading to do, and the graduation ceremony, there's not much left before my one-month summer vacation. Ah, the college professor's life!
July 28, 2006
Now I Remember....
My luck with web hosts has not been spectacular, though it has probably been about par for the course. Although my current web host has done the best so far, two that I stuck with before had to treat me pretty badly before I left. In every case, the thing that has torn it with me with web hosts has been flaky service--site outages and things just generally falling apart. But the two hosts that really let me down went farther than that.
If you like to read people who vent their whining, read on below the fold--I won't inflict my incessant whinging on those visiting the main page. Essentially, I go into detail about how the two web hosts were very, very sucky, and how I had to revisit one of them just the other day, in a way that brought back all my memories of just how sucky the suckers sucked. If you for some bizarre reason don't like to read other people moaning and bellyaching (what's wrong with you?), then just go on to the next post.
The first web host I used was ait.com (formerly named, confusingly, "aitcom.net"). These guys have now been around for about a decade, and even now they're regarded as scammers to a certain degree. When I signed up for them, all was OK for a while, but then I got burned rather badly. It started when my account got suspended due to "nonpayment," rather strange as they were charging me automatically. So I started looking more closely at them. One thing I found out: they swapped out the contract on me. And not just me, but lots of people.
When I signed up with AIT, it was on a monthly contract. Never sign up with any web host who fails to offer monthly payments; it usually means they suck and want to bag you for a whole year before you discover how much they suck. With AIT, they sneak up on you. When they got to be so bad that I considered quitting, I checked my contract--and found that I was now bound to a six-month contract. Whoa! When did that happen? Well, most online companies have a silent-switch clause in their contracts. The clause says that the business may:
make changes to the terms and conditions of this Agreement at any time, and to the on-line application to include service pricing, advising of the change and the effective date thereof by publishing it to the appropriate AIT web site.... Utilization of the Service by the Customer following the effective date of such change shall constitute acceptance by the Customer of such change(s).
Essentially: we can change your contract anytime just by changing the long, boring and confusing legalese, and if you don't re-read the contract every month and notice even the smallest change, that means you agree.
Most people accept this as a way for companies to make reasonable changes without having to get a positive confirmation from every single member, an impossible task. The understanding is that if there is any change that is significant, the company will at least make a good-faith effort to contact you. They do not expect that the clause will be used to sucker you.
That's what AIT did. They changed the contracts to six-month contracts and didn't tell me or any of the other members I asked. I was getting email from them, and never heard a word about it. Fortunately, I happened to check just as a new six-month contract was about to start, so I pulled the plug and went elsewhere. Not that easily, of course--I had to deal with them overbilling me as I left, with outrageously rude customer support people and maddeningly frustrating run-arounds... until I finally emailed their top execs and threatened to make a big noise to the Better Business Bureau--at which point they immediately coughed up what they owed me and claimed that it was all a big mistake. The whole painful story is memorialized here. That was October 2001.
For a year after that, I used a service called Aletia. They were OK people, but the service got so buggy that I had to bail after ten months. Just as I was leaving, they said sorry--in the form of promising to host my web site gratis, indefinitely. I left my main email domain there, while moving my blog and other accounts to newer pastures. Ironically, after I left, their service got better, and even when they were sold to JaguarPC web hosting, they continued to keep me on as a freebie.
But the next host for my blog was another nightmare: Myacen, an Aussie firm. They were OK for about a year and a half, except for billing problems. They kept promising an autopayment system which never materialized, and the manual payment system was a joke. They kept sending confusing invoices and messages that seemed to say that payment was not due when it actually was. They even sent emails to me noting that many people were having the same problems.
But then, in 2004, they started experiencing major difficulties. Repeated outages, my domain would drop off the Internet repeatedly, settings would be lost, all kinds of problems that would take lots of work to fix. Myacen would change servers, IP addresses, and carry out prolonged maintenance without warning. Worse, their customer support sucked. AIT sucked by requiring international phone calls. Myacen sucked by not reading your "helpdesk ticket." Either their responses were automated or the support personnel never read the tickets well enough to give a helpful answer--and it could take hours to get an answer, after which you would inevitably have to send another ticket to tell them how they got it wrong.
So I wanted to quit. Like all businesses, they will accept the minimal application to start billing you, but they make a huge production out of making you jump through hoops before they will believe that it's really you trying to quit. Further, when I called beforehand to ask about lead time in telling them to cancel my account, they told me one thing. Then, when I canceled my account, they said that because of the billing problems (which they had previously admitted were their own fault!), I was "on probation" and so they'd bill me for an extra month for canceling "late." That was summer, 2004. That lovely story is chronicled here.
One last detail: in both AIT and Myacen's case, they claimed to want your feedback when you left. In both cases, when you tried to go to their "feedback" page, the page was inaccessible. That pretty much sums it up nicely.
It's been two years now, and SurpassHosting is surviving the test of time so far, outlasting the other hosts now with no major problems (oh, they all have outages and stuff from time to time).
But by chance, I had to deal with Myacen again. My father still uses them for his business web site (he signed on before I hit the worst spot with them), and has just left the thing on autopilot for the past two and a half years. Now we need to access his account--and Myacen changed the login system, which apparently erased his user ID and password. We're locked out.
So I submitted a helpdesk ticket telling them the story, referencing emails they sent him about the billing system changing (none of which said his login info would be erased if he did not act), and I specified that the login using his email address and password did not work, and could they please tell us how we could log in.
Some time later, the answer came back. It simply said, "here is the URL to log in," and gave the URL. That's it. Not a word about why the user ID and password don't work. Ah, the good old days. How nostalgic.
In this case, the answer they gave demonstrated a complete ignorance of my problem: the URL they gave me to log in was the same one I used to submit the ticket. Apparently, they thought that I could not find the page that I used to send them the ticket in the first place.
So I sent them another ticket, in the vain hope that they would understand this time and send the correct information. No answer after 8 hours. So I sent another ticket after that, high priority, lights flashing... another 8 hours later, still no response. Sent a fourth ticket. What is wrong with these people?
Finally, I got an answer. Their response? It's my fault because I didn't understand their answer. According to the "support" message, "The page that was linked to you provides users with the oppurtunity to reset their passwords." What is that "opportunity"? The "I forgot my password" fallback. Which, of course, is BS--I didn't forget the password, the password simply didn't work. They forgot the password. And there is no reason to believe that the "forgot my password" option will work anyway--if they forgot my password, why should they remember the user ID? It's a bad answer to try to cover up for the bad initial support message.
Recently, my SurpassHosting account got upgraded, and I got a whole slew of new add-on domain slots, hard disk space, and bandwidth. So when my father's contract with Myacen comes around in December, I'm going to save him more than a hundred bucks a year and simply piggyback his site on my own.
July 06, 2006
Links to 3 Again
As someone pointed out, my filters have been aggressive as of late, limiting the number of links allowed. I've reset the limit to 3 again; hopefully, attacks have subsided enough that I won't get deluged...
June 30, 2006
RSS Back at Full Blast
After a 1-month experiment with trimmed RSS feeds, I have reset the feeds to display full stories (or at least, much, much longer excerpts of them). For those of you monitoring the site by feed readers, thanks for your patience.
June 18, 2006
Trackback Cleanup
Wow! Turning off the trackbacks really made a difference in clearing out the spammers. I'm still getting about one hundred comment spam a day (MT-Blacklist gets about 75% of them, SpamLookup the other 25%), but the trackbacks seem to have been the real monster here, as I noted before. And it showed up like crazy in the AwStats: immediately after I deactivated the trackback script, the number of "visitors" dropped by a bit more than half. This may represent the clearest picture of visitors I've gotten from AwStats in quite some time.
But it also shows how lopsided spam has become. Half of my site's "visitor" traffic was trackback spam. Maybe 25% of all traffic volume, in megabytes, was spam--the spam, at least, did not download movies or anything. Just made smaller hits--but there were so damned many of them. For a long time, I'd known that about a quarter of my site's traffic was spam. Recently, I could see that my "visitor" traffic had inexplicably risen by about 30% to 40%. This explains why.
When I look at the "recent visitors" in my domain's CPanel, I can still see them constantly hitting the site--but now all they're getting are "500"-type server errors. Maybe in time they'll even stop trying, once they see there's no trackback there anymore.
Turning the damned things off was the smartest maintenance move I've made so far. If you still use trackbacks, consider shutting them down. They're not worth it.
June 16, 2006
Trackback Off
So much for that failed blog element. Trackbacks are supposed to be an automated linking system between blogs that refer to each other. Say if Blog A has an interesting post, and the author of Blog B writes his own entry which refers to Blog A, complete with a link. When Blog B publishes this entry, his blog software will send a "trackback ping" to Blog A, informing him of the reference. Blog A will often catch this ping and automatically make a link right back to Blog B within the original entry's comment section. A nice idea--you link to me, I link back to you, we know we're talking about each other. Community. Cooperation. Swell.
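For the curious, here is a rough sketch of what a ping looks like on the wire, as far as I understand the TrackBack spec: Blog B sends a form-encoded HTTP POST to Blog A's trackback URL, and Blog A answers with a tiny XML document whose error value is 0 on success. The blog names, URL, and function names below are made up for illustration, and nothing here touches the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_ping_body(title, url, excerpt, blog_name):
    """Form-encode the fields Blog B would POST to Blog A's trackback URL."""
    return urlencode({
        "title": title,
        "url": url,
        "excerpt": excerpt,
        "blog_name": blog_name,
    })


def ping_accepted(response_xml):
    """Return True when the reply contains <error>0</error>."""
    root = ET.fromstring(response_xml)
    # A missing <error> element is treated as a failure.
    return root.findtext("error", default="1").strip() == "0"


# Hypothetical ping from Blog B about its own entry.
body = build_ping_body(
    title="My take on that post",
    url="http://blog-b.example/archives/42.html",
    excerpt="Blog A had an interesting post...",
    blog_name="Blog B",
)

# Example replies shaped like the ones a trackback script sends back.
ok_reply = "<response><error>0</error></response>"
bad_reply = "<response><error>1</error><message>Rejected</message></response>"

print(body)
print(ping_accepted(ok_reply), ping_accepted(bad_reply))
```

The point to notice is how little a ping requires: four text fields and no proof that the sender is a real blog. That is exactly why it is so cheap for spammers to fake.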
The problem: spam. What else? Now, maybe once a month, at most, I get a genuine trackback ping. In that same time period, I get tens of thousands of fake trackback pings from spammers (called "spings"). What's more, these fake pings show up in my referrals (AwStats, not Google Analytics) and make a further hash of the numbers beyond what referral spam does. It got so bad that my spam logs flooded to the point of causing a server error, and I had to purge the database; within half an hour, 114 more trackback spams had started filling it back up again. It got to the point where comment spam was beginning to look like a minor annoyance.
Mind you, the filters were working--no trackbacks were getting through. But I have no real use for trackbacks anyway, so I got a special utility and closed off all the trackbacks for all the entries in the entire blog, and then I switched off the trackback script. And good riddance.
June 13, 2006
Spam Blogs
Well, if this doesn't beat all. Spammers have a new trick: create fake blogs, then steal blog post text from real blogs, and then load the blogs with ads for ringtones, pharmaceuticals, porn, and other spam commerce sites.
I found out about this by doing a Google blog search on the title of my last blog post, "Japanese Green Pheasant," to see who else may have blogged on the bird. To my surprise, my own post appeared--three times. One was the original, and two more were spam plagiarists. Only a few lines of text are stolen, but it's still theft, and still it's spammers using my work to sell their crap.
Looking up other titles, I've found more of my posts stolen as well. So far, all the spam blogs I've found (and no, I won't link to them) are on "yahoo-blogs.com"--but don't be fooled, it's not really Yahoo. A spammer apparently got ahold of that domain name, and uses it just for spamming.
June 08, 2006
Finally, Someone in Nigeria--er, Russia--Needs My Help
I don't know what took them so long this time, but finally, five days after putting a virgin email address up on the main page, spam started coming in. A birthday present for me! Just one message so far--a Nigerian variant--but it's beginning. I am adding it in the comments of the original post, where I will collect all future spam.
This Nigerian variant pretends to be a Russian barrister representing an oil tycoon who got arrested for bribing politicians, and needs to funnel $18 million through my bank account. I sent him my credit card number right away.
Oh, and the travel agency is still hotlinking to my image. Bless their hearts.
June 07, 2006
Web Host Alert
I'm not getting anything for this, but it could be a fairly good deal, so I thought I might put out the word. The web hosting service I use for this blog is Surpass Hosting. I signed on to this host 2 years ago when my old host went into the "too much trouble" mode that all web hosts seem to reach at some point. At the time, Surpass was having a 2-for-1 promotion, where if you signed up during the promotion, you got two accounts for the price of one. If you only have one domain, then it isn't that great. But if you have several, like me, and especially if you're hitting high bandwidth, then the deal is very good. I signed up, and have been more or less satisfied with the service. There were two periods where I almost felt like leaving, but it never passed that threshold.
In the second half of June, they're reviving the promotion for the first time since I signed on. The lowest-cost shared hosting deal is their "Power" plan, which has these specs:
$6 per month ($65 for a one-year contract)
5GB hard disk space
200GB monthly data transfer
10 add-on domains
Unlimited email, subdomain, and FTP accounts, and unlimited SQL databases
You always have to take the "unlimited" claims carefully; there are limits in that you can only use up a certain percentage of the server resources. Too many active scripts on your site could get you into trouble, but you have to do a pretty heavy amount of activity to get there.
"Shared hosting," by the way, means that your web site is one of a few hundred that inhabit the same server computer at the web host's facility. It's usually enough for a regular person's web site, like a blog or something else casual. However, there are more chances for one or more other users on the same server to behave badly and negatively impact your site's uptime and performance. The next step up is "Virtual" or "VPS" hosting, which also divides a single server among many accounts (often up to two or three dozen), but each gets a bigger share and each account resembles a private server in some ways. These accounts can be more expensive (starting at around $50/mo.), but allow greater access to site resources. Then you have "Dedicated" hosting, where you get a server machine all to yourself, with the CPU and hard drive dedicated to serving your site, with no one elbowing in on the resources. These are the most expensive, ranging from a hundred to several thousand dollars a month, depending on the package and the bandwidth you get.
As for the other specs: "transfer" means how much data can be uploaded by you/downloaded by visitors each month. My blog is pretty active (around 30,000 unique visitors/mo.), and I have several multi-MB files that get downloaded a hundred times each or so every month; my transfer amounts to about 30GB/mo., and is growing. "Add-on domains" means that in addition to the domain that dominates your account (mine is blogd.com), you can also have other domains run from the same account. Each one inhabits a subdomain, but appears to the world as an independent domain. For example, I have the domain "xpat.org" settled within my "blogd.com" account. Its real address is "xpat.blogd.com," a subdomain, but if "xpat.org" is directly accessed, it'll act like its own domain--just not at the moment, though, as I currently have it sleeping, and it redirects to this blog's main page.
As I mentioned, Surpass is OK as web hosts go, but like all hosts, it has had its rough spots. Soon after I signed on in 2004, a string of hurricanes hitting Florida showed up a lack in their backup facilities, but that got resolved. And late last year/early this year, I had enough site slowdown and script failure problems to almost make me move, but I stayed on and that got resolved. Both are good signs--the bad sign is when the problems don't get resolved, and your host shows little interest in doing anything about it. But even with Surpass, you gotta keep on their butts about it. Eventually, my latest problems were solved when they moved me onto a new server; the problems arose from another account on the shared server being a CPU hog.
One of the better points about Surpass is the rather quick response time to support ticket requests; they tend to answer in a few minutes, and often the solution is not far behind. But as with any host, different people have different experiences, and there is never any guarantee that any host won't go south at any given time.
As usual, you can find user experiences and a variety of good advice on Web Hosting Talk forums.
Ah, and the tech support just got back to me: they upgraded my account. When I signed up, ten bucks a month got me 7GB of hard disk space and 75GB traffic; under the present deal, the same amount gets you 10GB and 400GB. So I asked if they'd up me to the present levels, and they did. Fair enough, but apparently you gotta ask in order to get it....
June 01, 2006
Feeds Click Through, Please
Starting today, as an experiment in statistical accuracy, I am resetting the RSS feed for each blog story to a shorter excerpt, meaning people who read this blog by feed readers will have to click through to the actual blog page to read the whole post. If this is too annoying for some, please let me know. For the time being, at least, I would like to know how many people are actually coming to read via RSS. Thanks for your cooperation.
May 27, 2006
Comments Getting Through?
As a result of even greater waves of spam (I'm not the only one--a Google Blog Search reveals that one or more spammers are being even more obnoxious than usual), I turned on a few more protective layers of my spam filters. I don't want to shut out legit users, however, so I've set up a throwaway account for you to email if your comment is blocked. Just drop me a line and let me know if you're having any trouble with it. The throwaway is aptly set as: throwaway2 at blogd dot com. Thanks!
May 20, 2006
Google Analytics Review, Part 1
Okay, I've been using Google's beta web site statistics service for a little over a week now, enough to get data for the time-range analysis which is part of the software. This is a basic layman's report on how Google Analytics (GA) works, what it offers, and how useful it is for a non-commercial blogger like me. This first part will give an overall review. Later I will go over all of the different statistical analyses the service provides, or at least the ones not concerning marketing. Keep in mind that I am still relatively new to GA, and so there may be features or tricks that I have not yet come across. (If you know any, please comment!)
First, the service seems to be aimed at (but by no means limited to) commercial sites, especially ones that use Google's AdWords service, in that a good portion of the analyses are set up to measure the performance of that service. If you're not advertising, then a lot of GA's data sets will be empty for you. However, that still leaves quite a few highly useful statistics for you to peruse.
GA, like Google's GMail, Calendar, and other services, makes extensive use of Javascript. There are a lot of toggled menus and pop-up windowlets that will expand your choices without having to regenerate the page view. When clicked on, graphs will acquire or lose labels, and pie charts will explode specific segments. In short, the app is designed in a very sexy way, much better than a simple web page with buttons and static graphic elements. Of course, there is still much that requires the page to be rebuilt, so some things take more time than a stand-alone app independent of a browser. But it's worth hanging around for.
The way it works: because GA is not resident on your web site or domain, you need to append a small script to the end of every HTML page on your site. Blogs use templates to generate each page, so it's easy to install in that sense. Every time a page so tagged is viewed by someone, the script sends the user's data to your GA account, where it is accordingly tabulated to create the display. This system, while necessary for an off-site service, has a rather glaring flaw: files that cannot carry the GA script are not counted. As a result, GA cannot see anyone who only monitors your RSS feed, nor can it detect when non-HTML files (such as images or movies) are accessed by direct external links, including hotlinking. This creates a rather large blind spot for the service. GA does offer a way to track specific file views and downloads, but only if the referrer is from within your site.
On the other hand, the script tag does allow you to choose precisely which HTML files on your site you want to have tracked. For example, a huge amount of traffic on my site is generated by spammers, who focus primarily on the scripts that are not content pages in and of themselves. Spammers are constantly accessing my comment and trackback scripts without going through the actual pages of the blog, which are what I am interested in. As a result, very, very little of the spam that hits my site gets recorded, and despite the blindness to RSS visitors (which may constitute as much as 1/3 of my visitors), GA gives me a much truer view of the real people who come and read my blog.
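To make the contrast concrete, here is a minimal sketch of the filtering an on-server stats tool would have to do to get the same effect: throw out any hit aimed straight at the comment or trackback scripts and count only real pages. The script names and request paths are assumptions for illustration (I rename my own scripts regularly), and this is not how GA or AwStats is actually implemented.

```python
# Assumed script names; a real install may call these something else.
NON_CONTENT_SCRIPTS = ("mt-comments.cgi", "mt-tb.cgi")


def is_content_hit(path):
    """True for page views, False for direct hits on comment/trackback scripts."""
    return not any(name in path for name in NON_CONTENT_SCRIPTS)


# Hypothetical request paths as they might appear in a server log.
hits = [
    "/archives/2006/05/google_analytics.html",
    "/cgi-bin/mt-tb.cgi/1234",
    "/cgi-bin/mt-comments.cgi",
    "/index.html",
]

content_hits = [h for h in hits if is_content_hit(h)]
print(len(content_hits), "of", len(hits), "hits were real page views")
```

A page tag gets this result for free, since a spambot that posts directly to a script never loads a tagged page in the first place.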
Ideally, GA would be perfect as an on-domain script (like AwStats), which would directly monitor all traffic on the site--but still give you the ability to dictate which files are tracked and which are not. Even better would be a way for the data to be tracked from within the domain, and then compiled as a data package which an application resident on your computer could regularly download, allowing you to analyze the data far more flexibly and quickly.
Another downside to GA is the restricted filtering ability. GA allows you to dictate filters on incoming data; for example, if a spammer is hitting a page in your site, you can specify to GA through a powerful filter engine exactly what to block from coming in. That's the good part; the bad part is that once the data has hit you, you can't edit it out of the data you have already collected. This means that any spammer or generator of bogus data will have a permanent impact on your GA stats until you notice them and go to the trouble of applying a specific filter. If the files you track are frequently hit by spammers who constantly fake and/or rotate IP addresses, domain names, and other data, it will be a constant game of catch-up for you. The ability to purge the existing database of spammer activity you have noticed would be a vast improvement.
GA does offer you the ability to temporarily filter data as it is generated in the current display, but this ability is far too limited. First, it only allows you to include or exclude a single keyword. If you want to see only one element isolated from all others, it's very good. But if you want to exclude more than one data element, it is more or less useless. One example: when GA lists all referrals (visitors who followed links to your site from outside sites), it does not differentiate search engines from other referrals (a rather glaring omission, in my opinion). The temporary filter will allow you to only exclude one keyword; that allows me, for example, to exclude all hits coming from Google, but I cannot also exclude all the other search engines, such as Yahoo, MSN, and so on--not at the same time. One other problem with these filters is that they are discarded as soon as you leave the particular analysis you apply the filter to; the filter cannot be remembered, and so must be re-input every time.
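What I would like the temporary filter to do is simple enough to sketch: take a list of keywords instead of a single one, and drop any referrer whose host name matches any of them. The engine list and the referrer URLs below are illustrative assumptions, not data from my stats, and the function name is my own.

```python
from urllib.parse import urlparse

# Illustrative list; extend it with whatever engines show up in the referrals.
SEARCH_ENGINES = ("google", "yahoo", "msn")


def is_search_referral(referrer):
    """True when any label of the referrer's host name is a known engine."""
    host = urlparse(referrer).hostname or ""
    labels = host.lower().split(".")
    return any(engine in labels for engine in SEARCH_ENGINES)


# Hypothetical referrers.
referrers = [
    "http://www.google.com/search?q=eyelid+twitch",
    "http://search.yahoo.com/search?p=green+pheasant",
    "http://search.msn.com/results.aspx?q=blogd",
    "http://someblog.example/post.html",
]

non_search = [r for r in referrers if not is_search_referral(r)]
print(non_search)
```

Matching on whole host-name labels rather than on the raw URL keeps a page that merely mentions "google" in its path from being excluded by accident.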
That pretty much wraps up all the major shortfalls I have noted so far; after that, it's all gravy. As I mentioned, a sexy interface, tons of options, lots of useful and interesting data. There are dozens of useful breakdowns, lists, and charts. Almost every piece of data can be analyzed in cross-section--for example, when I view the chart showing new vs. returning visitors, I can break down either group by their region, browser type, or the keyword they used to find my site via a search engine. For example, most of the people who find me via a Google search for "eyelid twitch" only visit once; fewer than 5% return for another visit (within the same week, at least). The data GA collects is very flexibly viewable in these respects. You can also specify a range of dates within which to view data.
GA also has a Help Center which covers a surprising number of topics. Usually such "help" areas are lacking, leaving you in the dark about how to use the software. GA breaks that trend, explaining a wide range of features and issues, and doing a very good job at that. The explanations are not too technical for the casual user, usually favoring a complete omission of the hacker-level stuff. Support forums, in the form of Usenet groups tracked by Google's Beta Group search engine, exist to highlight any specific requests or exchange of information between users.
Next: What you can see, and how you can see it.
May 12, 2006
Google Analytics: Initial Reaction

The first data set came in for this site from GA, and while it's incomplete (only a half-day's report), it does give some interesting insight.
One huge drawback is that GA can't track RSS feeds. According to AwStats, fully 1/3 of the visits to my site are people who get a feed of the blog, and don't visit the blog directly. But when someone reads the feed, that means they aren't actually visiting any pages on the site. RSS feeds can't use Javascript, which is how GA tracks visitors. So that's a rather significant blind spot. It could be significant for me because I'd like to know for certain that all the hits on the RSS feed are actual people, and not bots or spam or whatever. Other data on it would be nice too, of course.
Of course, part of this is my own fault: I'm too generous with the RSS feed. Most feeds only let you glimpse the first few lines of text of any entry/article; I opened the floodgates, letting people see everything, all the content, entire posts, including photos, in the RSS feed alone. This is relevant because if I limited the feed to a snippet, RSS visitors would have to click "read more" and visit the actual web page in order to see my whole posts. Depending on how things go, I might temporarily (or even permanently) change to snippets rather than full feed. The good side of that would be that more people would visit the actual pages, giving me a better view of who is actually reading things on my site. The down side is that people who enjoy reading in RSS might be annoyed that I'm making them click links all of a sudden. Are there any RSS readers out there who would like to give feedback on that before I decide?
On the other hand, there is a positive blind spot for GA: it doesn't track spambot-driven accesses to comment and trackback scripts, which is mostly how the spammers attack. Simply by nature, GA ignores most spam.
Of the data that does come in, GA seems to claim some pretty amazing abilities, tracking data I had no idea was possible. For example, GA claims to be able to detect the screen resolution and color depth settings of each visitor. According to the stats, about 52% of visitors to this site over the past 12 hours had a screen resolution of 1024x768; 14% had 800x600, and 13% had 1280x1024. How can GA tell that? It also claims to be able to tell if someone is using DSL, Dial-up, or "Corporate," whatever the last one means.
GA also does a fun trick with the IP addresses of visitors, detailing which country, region, and even city the hit apparently came from. I'm not sure how accurate this is when you get down to the city--the IP address would be from the ISP, not the user, after all. 12 visitors appeared to come from Saint Petersburg, FL, but I think that's where my blog host is, so they might be entering into the data somehow. After that, in the past 12 hours, 12 people came from New York, 7 from Denver, 6 each from L.A., Washington D.C., and St. Louis, 5 each from Oakland and Sacramento, 4 each from Plano, Atlanta, Austin, and Wilmington, and so on. Apparently visitors are coming from locales like Schaumburg, Mililani, Panorama City, Colchester, Poughkeepsie, Buzzard's Bay, Glen Carbon, Sedro Woolley, Cockeysville (no, I did not make that up), Halethorp, and Plymouth Meeting. A shout-out to my peeps in Plymouth Meeting!
Another cute trick GA does is give you the ability to view cross-sections, stat breakdowns on any one group. For example, I can see what screen resolutions were used to view my birdwatching in Japan page, or see which cities people came from to view my eyelid-twitching page (apparently there's an outbreak in Schaumburg).
But as I've mentioned, I'm playing with a very limited amount of data--I don't even have a full day's stats yet. And a better view will come after a few weeks or longer, when I can start really tracking returning visitors and various trends. One thing is for certain, GA is an incredibly powerful analytical tool, limited only by which data it is unable to track at all.
May 11, 2006
Google Analytics
Yes! I (finally) got my invite to Google Analytics (GA), formerly Urchin Stats. It requires you to insert a script into each web page on the site (simple to do with a blog site--just edit the templates), and according to theory, the stats should start being generated within 24 hours.
I've been looking forward to this as a way of finally discounting referral, trackback, and comment spam in my stats; up until now, spammers have been throwing my numbers off, and even with AwStats, nice as it is, there is no way to weed them out. I've had to simply guess at how many visitors actually come to my site. GA has filters that can--hopefully--clean out the unwanted spam and show me how many actual visitors are coming through.
Problem is, I've been looking at the GA site and the filters seem powerful but complex. Also, they mention that you can only screen incoming data--though their "exclude" filter seems like you can at least temporarily screen out existing data. It's not 100% clear from the help files, so I'll have to wait for data to come in before I can figure it out. So we'll see how that goes. It'll probably take several days as I try new things with various data sets as they arrive. Here's hoping....
Contextual Spam
I notice that the blog comment spammers are trying a new trick: contextual comment spamming. It's still automated, but it's smarter. Instead of just sending a few dozen obvious spam comments, the spammers try to trick you by making their spam appear to be an actual message. The first one I got related a news story about an artistic photo shoot in Spain, and the second talked about how to deal with Hamas. Except for the placement of three or four spam links, they would appear to be valid comments. So far, the spam comes one at a time--smart, otherwise it would look suspicious.
Of course, there are problems related to automating the spam: the topic of the comments, so far, is not exactly in line with the topic of the blog posts. The photo shoot news story comment spam was posted to a blog entry about blogging (though there was a tangential reference to the press media), and the Hamas comment spam was posted to an entry about Air America Radio. Who knows, maybe they do have software to try to match topics and it's just not doing a very good job. But I imagine that sometime soon, they'll have enough fake comments on all kinds of topics, and smart enough spam software so that the comment and blog topic will be closely matched.
Not that it's causing me any problems--I review comments in my email software, and links show up as blue and underlined; the spam links stand out. But if one were to moderate comments using a browser, the links might be less apparent, or the blog author might gloss over them, and the fact that it is spam could be missed. I'm also not sure how the spam filters are doing with these messages; I haven't checked to see if my filters are blocking out more of these.
In one way, this is a good sign--it means that spam blocking is effective enough that the spammers are having to go to a lot of trouble to try to get their spam through.
May 01, 2006
No Links for the Time Being, Please
Just a note: the current spam attack is prolonged enough that I'm getting tired of moderating the few that get through the blog's usual defenses. So, for the moment, I'm setting the blocking software so that active links cannot be posted in comments--that is, you can't use the "a href" tag. You can, of course, simply paste URLs in comments without the link tags, so they don't become a link. I'll set it back to accepting active HTML links when the spam attack dies off. Thanks for your patience.
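The rule itself is about as simple as filters get. Here is a minimal sketch of the idea, not the blocking software's actual code: reject a comment if it contains an anchor tag with an href, and let bare pasted URLs through. The pattern and function name are my own assumptions.

```python
import re

# Matches an opening <a ...href=...> tag, ignoring case and extra whitespace.
ANCHOR_TAG = re.compile(r"<\s*a\b[^>]*\bhref\s*=", re.IGNORECASE)


def has_active_link(comment):
    """True when the comment tries to post a clickable HTML link."""
    return ANCHOR_TAG.search(comment) is not None


print(has_active_link('See <a href="http://example.com">this</a>'))
print(has_active_link("See http://example.com for details"))
```

The first call reports a blocked comment and the second an allowed one, which matches the behavior described above: paste the URL as plain text and it goes through.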
April 29, 2006
Spam Weeks

So I started to see quite a few comment spam slip through the blacklists and get into moderation. Not too many, between 4 and 6 a day, but much more than usual. I wondered if it was due to some spammer figuring out a way past the blocking software, so I checked the numbers, and found that instead it was simply due to brute force. I quickly punched out the chart above, tracking numbers for the past 18 days (since I last purged my spam lists). On the chart, the black line is the total spam (including trackback spam); the blue line shows the number blocked by MT Blacklist, and the red line shows the ones blocked by SpamLookup. SpamLookup tends to block mostly trackback spam, about 95% of what it gets, whereas MT Blacklist gets the comment spam.
You can see that for the first week, it was going pretty much as it has been since I checked the numbers last August, just depositing a few hundred spam per day. Then there was a sudden spike on the 18th, and a more steady attack pattern after the 23rd. Considering that only 4-6 spam get through to moderation among floods of almost 2000 a day, that's not bad--a block rate of around 99.7%. And, of course, not a single one gets through moderation.
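The arithmetic behind that figure, as a quick sketch. It uses a round 2000 per day as the total, which is an approximation of "almost 2000," so the results are ballpark numbers.

```python
def block_rate(total_spam, got_through):
    """Percentage of spam stopped before it reached moderation."""
    return 100.0 * (total_spam - got_through) / total_spam


# Worst and best days: 6 or 4 slipping through out of roughly 2000.
worst_day = block_rate(2000, 6)
best_day = block_rate(2000, 4)
print(f"{worst_day:.1f}% to {best_day:.1f}%")
```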
This made me think back to the days before mass comment spam. I installed MT Blacklist in December 2003, a little more than two years ago, because one spammer posted 50 comment spams on my site. Back then, 50 in one day was massive, the first time that sort of thing ever happened. And just four months previous to that, I got my first comment spam ever, a single comment, just a month after I started daily blogging. Now, just a few years later, 300 comment spams and 100 trackback spams per day is low traffic. Thank god that the vast majority of it gets filtered out in the background...
1000 Days
That's one milestone passed on the blogging road: I just blogged for the 1000th consecutive day. Started on August 2nd, 2003, and haven't missed a day since. Sometimes it's very hard to come up with something that feels worth writing about. Sometimes there are several things, and I blog about them all or choose which ones seem best. I try to make each day's post more than just a quote or a one-line reference. Sometimes, as you may have noticed, I get carried away and write if not a book, then a novella perhaps, or at least it feels like that.
So the question is, how long can I keep this up? I mean, it's become a real thing, now. I'll get back to you the next time a significant milestone comes along, I guess.
April 16, 2006
As an Example...
Just as a quick example of spam proliferation--I have gotten spam on several occasions since late last year with a particular keyword in it. This happens fairly often, as spammers want some unique identifier they can use via Google to see which of their spam droppings got published, and how many Google has recognized. The one that I am speaking of includes the words "compliment whose" and then the word "wondered." In the spam, those three words appear connected as one long word; I separated them here so they could not be found by the spammers via a Google search.
My point, however, is that if you do a Google search on that text string, you'd find Google reporting "about 130,000" hits. Consider that a great many of the spams sent get stopped by anti-spam measures such as filters, registration, and captchas, then add all of those that Google did not find, and you can estimate that just this one version of spam has been sent easily several hundred thousand times, probably millions--and it is very likely that this is just one of many different versions or types sent out by the spammer responsible. You will also find that blog or even newspaper posts which contain the keyword often have dozens if not hundreds of spam messages accumulated in the comments, which is the reason spammers continue to spam so much.
Nothing huge or earth-shattering about this. I simply wanted to point out a small example of how outrageously prolific spammers can be, and some of the methods they use to check on their success. I'm still waiting for someone to institute a system that will do something to at least take the edge off of these people who take a valuable system of worldwide communications and use it as their own personal toilet.
March 26, 2006
Blog Note
The "Itsok" spammer is back. Though I have blocked the string "itsok," the spammer seems to have caught on to that, and has introduced variations. I am trying to block them, but just to be safe, any comment containing more than one URL will now trip the automatic comment block.
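The rule described here can be sketched roughly as follows. This is not how MT Blacklist or SpamLookup actually implement it; the names, the URL pattern, and the threshold constant are all assumptions made for illustration.

```python
import re

# Hypothetical settings: one blocked keyword and a one-URL limit.
BLOCKED_STRINGS = ["itsok"]
MAX_URLS_PER_COMMENT = 1

# Assumption: a URL is anything starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def should_block(comment: str) -> bool:
    """Return True if the comment trips the keyword or the URL-count rule."""
    lowered = comment.lower()
    if any(blocked in lowered for blocked in BLOCKED_STRINGS):
        return True
    return len(URL_PATTERN.findall(comment)) > MAX_URLS_PER_COMMENT


print(should_block("Great post! http://example.com"))
print(should_block("its-ok http://a.example http://b.example"))
```

The second call shows why the URL limit is useful as a backstop: a variation like "its-ok" gets past the keyword check, but the two links still trip the block.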
March 18, 2006
Ain't It Uncool
Son of a...
I check my stats after leaving them alone for 3-4 days, and I get a shock. About 30,000 more hits than could be expected. Roughly 12,000 more unique-IP visitors than normal. A one-day high on Thursday of 7500 visitors, 5500 on Friday, 4000 on Wednesday, when the recent average has been hanging around 1800. Huge spike in visitors.
At first, I think this may be because I got a link in from somewhere: some big-traffic site saw mine, put a link out, and thousands of people came to check out what I've been saying. But when I check my referrers and page stats, there's not a single blip. Nothing to account for it all. And then I figure it out. I check the cellar of my referral list, and there they are. A massive hotlinking event.
What is a hotlink, you ask?
A hotlink is when one web site hijacks an image from another web site. You see an image on web site "A," you think it exists there. Usually it does. But with hotlinking, the image is really from web site "B," a different site. The owner of web site A simply told your browser to go to site B, get the image, and display it as part of web site A. Web site B hosts the image, owns it, pays for it--but web site A, the thief, gets the props for displaying it, and no one ever hears of the site that actually has the image. It's like someone taking credit for something you did. You did the work, you paid the price, but someone else reaps the benefit and suppresses your name.
Allow me to demonstrate. Look at the "Links" button at right. The image appears here on my site, blogd.com, right? Ha! Fooled you! It's not on blogd.com at all! It's on one of my other sites (www.teach-japan.com, a site forever under construction). If you looked at the coding for this blog page, you'd see that the image address is really at the other site. But by a simple web page command, I was able to integrate the image into this web page, so it seamlessly appears to be located here. My other site gets no credit, no one knows it came from there.
Well, what's wrong with that, you ask? Plenty. You see, hosting a web site costs money. You pay for disk space. You also pay for "traffic," also called "bandwidth," meaning that every time you visit my site and see the images, they are being sent ("downloaded") to your computer. If the image is 100 KB in size, then I get docked 100 KB in traffic by the people who host my site. Every month, I pay that web host a fee for disk space and traffic.
When someone hotlinks to one of my images, it appears on their web site, but I'm paying for the storage and transport of that image. It's as if someone not only steals a painting from your house, but also charges you for the gas used by the getaway car and the fees for the storage locker to keep the stuff they stole from you. It's more than that, since the image is repeatedly stolen by each and every visitor to their web site. If they hotlink to an image 100 KB in size, and 1000 people visit their page, then I get docked for 100 megabytes of traffic. Now, I pay a flat fee for a lot of traffic, but still, it's my traffic.
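The traffic cost works out as a simple multiplication. A minimal sketch, assuming decimal units (1 MB = 1000 KB) and that every visitor downloads the image exactly once with no caching; the function name is made up for illustration.

```python
KB_PER_MB = 1000  # assumption: decimal units, as in the 100 KB -> 100 MB example


def hotlink_traffic_mb(image_kb: float, visitors: int) -> float:
    """Traffic charged to the image's host, in MB, if each visitor loads it once."""
    return image_kb * visitors / KB_PER_MB


# The example from the post: a 100 KB image on a page seen by 1000 people.
print(hotlink_traffic_mb(100, 1000), "MB")
```

The same function makes it easy to see how quickly this scales: the cost grows in direct proportion to the hotlinking site's visitor count, not my own.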
Hotlinking is considered even more of a no-no because the hotlinker appears to be taking credit
