The Post Man Always Saves Twice
By Andy Langer
July 14, 1997
At a time when few Internet experts seem to agree on anything, one Web theory is emerging: Over the next five years, what is now an open market of data will likely shake down to only one, two, or three real winners. According to this spin on traditional media theory, these sites (or networks of sites) will prosper as the unofficial "sites of record" for their given field -- be it "children," "games," "sports," or "travel." These monster sites stand to become each Internet category's ABC [ http://www.abctelevision.com/ ], NBC [ http://www.yahoo.com/Business_and_Economy/Companies/Media/Television/Networks/US/NBC/ ], and CBS [ http://www.yahoo.com/Business_and_Economy/Companies/Media/Television/Networks/US/CBS/ ]. One of the fiercest competitions for dominance lies in the "search engine" category, where early leaders like Yahoo!, AltaVista, and Excite are battling to become the library that marks the starting point for each Web trip. It can be argued that there are no real winners in any one category so far -- too many start-ups with deep pockets are still entering the game for any one site to monopolize any one area of interest. The exception is, of all things, a search engine created and maintained in Austin -- Deja News (http://www.dejanews.com).
Last year, iGuide, an online review of websites, declared "Deja News may do only one thing, but it does it remarkably well." That "thing" entails making the Usenet portion of the Internet searchable via a retrievable archive/database system. By loose definition, the term "Usenet" envelops all of the Internet's community aspects, where people communicate via publicly posted and viewed messages -- most commonly within "newsgroups," that is, forums that range from the general (rec.food.cooking) to the highly specific (austin.food). There's still some debate over what constitutes the "Usenet," but there's no denying the untamed beast is big. In just over two years, Deja News has collected over 109 million different posts or "articles" from over 20,000 newsgroups. Until the advent of Deja News, there had been no vehicle for archiving or searching these Usenet posts, because storing that much information took up far too much disk space. By creating the proprietary software to operate, maintain, and search a database as large as Usenet, Deja News founder Steve Madere has not only discovered an uncharted Internet niche but has created and seemingly held on to Deja News' self-appointed title as "The Source For Internet Newsgroups."
"[iGuide] is right, it's a niche, for sure," says Madere of his company, which has grown from three employees in December 1995 to over 50 today. "But if you can get a sufficient handle on a niche, it can be significant. At this point, there are five or six players battling it out for web-searching dominance and we pretty much own Internet discussion groups. We plan to continue that by focusing specifically on discussion groups and do discussion groups extraordinarily well. Our focus is one of our big strengths in that it allows us to get way far ahead of anybody who could possibly compete with us in the area. So, it may be a niche, but owning the whole thing is very significant."
What Deja News seems to have is a focus on the segment of the Internet that is traditionally the most unfocused -- with an estimated 24 million users spreading their messages over those 20,000 newsgroups. The result is a complicated and burdensome "feed," or path that Usenet follows. And as Usenet has grown, many local and regional providers have begun limiting their intake of that feed to save money on hardware, offering their customers access to only a portion of the newsgroups. But Deja News, along with allowing a user to search the archives, also allows users to access Usenet's unabridged feed and is so far the only site on the Internet that allows "everybody and anybody" to connect to its site for Usenet access. As such, Deja News has the distinction of originating an estimated 3% of all newsgroup postings -- numbers that even the largest regular Internet provider in New York [ http://www.yahoo.com/Regional/U_S__States/New_York/ ] or Tokyo can't claim. And that number should grow as Deja News unveils an updated newsreading application next month, which promises to make accessing and navigating the full feed easier. In addition, the new program should cut the already impressive four-hour lag before Deja News recognizes a post to an unheard-of, almost instantaneous rate comparable to the speed of sending and receiving e-mail.
"Part of the reason we're improving our newsreading capabilities, in some sense, is to save Usenet," says Madere, who admits that the potential for finding more users by making Usenet more user-friendly is also in his own best interest. "And as it turns out, our system is, in technical terms, perfectly capable of daily reading. At the same time, we're also finding that because of this problem of Usenet growing faster than the machines, more and more providers start restricting the feed or, in some cases, getting out altogether. Frankly, we think Internet discussion groups are incredibly useful and are far and away the most powerful communication medium ever invented."
But while newsreading has become a popular use of Deja News, the site's primary purpose and use has been twofold: as a bridge between Usenet and the Web, and as a search engine capable of recalling from the archives detailed information or data mentions. Madere says neither effect could be achieved without the creation of Deja News' search software -- software that Madere first looked into based on his own Usenet frustrations.
"I'd been using Usenet since the mid-Eighties and had been wanting to be able to search through it pretty much ever since I found it," says Madere, a University of Texas graduate. "I always thought, `Wow, this is great stuff, and I know that somebody answered my question last week, I just wish I could find it.'"
But until Madere rolled out Deja News' first site in May of 1995, questions and answers were disappearing just as fast as they were being posted. According to Madere, although disk drive storage capacity typically doubles each year, so has the number of Usenet posts. "It's always been just as unmanageable as it is today," he says. "And so because everybody's always been expiring what they could store after two weeks, everybody has had to buy new hardware every year just to keep up."
Interestingly, Madere's decision to buy the hardware necessary to store a Usenet archive wasn't nearly as significant as the creation of Deja News' search engine. "Before, even if you had a private collection of posts on disk, it would have taken so long to search through it that everybody else would have killed you for taking up so much computer time," Madere says. "So creating a specific database just for the specific purpose of searching through everything that had ever been done in discussion groups was really our breakthrough."
And like most great inventions, the mechanics of Deja News' search engine are fairly uncomplicated. By their nature, all posts come pre-packaged in text form and are public information, which eliminates the overhead costs of transferring messages to computer text and paying the authors a licensing fee. But Deja News' advantage is the speed with which its search technology can cover its database and find particular words, names, and discussions within Usenet, what experts routinely call "the world's largest database." By comparison, AltaVista -- the search engine that covers the largest percentage of the Web side of the Internet -- indexes just over 30 gigabytes of information. Deja News, according to Madere, currently indexes and stores 180 gigabytes of Internet discussion groups -- at what is believed to be a fraction of the cost it takes to run the AltaVista search engine.
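Deja News' search software is proprietary and its design is not described here, but the general idea of finding particular words quickly across a pile of plain-text posts can be illustrated with a textbook inverted index. The sketch below is only that: a minimal illustration, assuming a tiny in-memory collection. The function names and the sample posts are hypothetical and are not drawn from Deja News.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a post and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map each word to the set of post IDs whose text contains it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index


def search(index, query):
    """Return the sorted IDs of posts that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Start from the posts matching the first word, then narrow the set
    # by intersecting with the posts matching each remaining word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample posts, invented for illustration only.
POSTS = {
    1: "Best brisket in Austin? Looking for barbecue recommendations.",
    2: "Brisket rub recipe with coffee and chili powder.",
    3: "Austin traffic is terrible near campus.",
}
INDEX = build_index(POSTS)
```

The point of the structure is that a query touches only the entries for its own words rather than rereading every stored post, which is why a lookup can stay fast as the archive grows.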
"What we have that nobody else does is the database and the large database technology to go with it," says Madere. "And the interesting thing is that is what also makes it plausible for us to do large scale news reading, because when it comes down to it, Usenet is just a giant database. Discussion groups represent a huge database of messages and when you're reading a newsgroup, it just means you're reading messages that meet the particular criterion that they were posted on this newsgroup. But we also can make large databases searchable for a reasonable cost, and it costs us one-fifth to one-tenth as much for searching on a large database than it costs other people because of our special software. As a result, it's just cost-prohibitive for anybody else to offer an Internet discussion group search, because the data itself is so big."
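Madere's description of newsreading as a database query can be made concrete with a small sketch. It assumes messages are simple in-memory records; the Message fields, the helper names, and the sample data are all hypothetical and say nothing about how Deja News actually stores or serves posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    newsgroup: str
    author: str
    subject: str
    day: int  # hypothetical posting day, counted from an arbitrary start


def select(messages, predicate):
    """Generic query: keep only the messages that satisfy the predicate."""
    return [m for m in messages if predicate(m)]


def read_newsgroup(messages, newsgroup):
    """Reading a group is one particular query: filter by newsgroup, oldest first."""
    matching = select(messages, lambda m: m.newsgroup == newsgroup)
    return sorted(matching, key=lambda m: m.day)


# Hypothetical sample data, invented for illustration only.
MESSAGES = [
    Message("austin.food", "bob@example.com", "Re: Best tacos?", 3),
    Message("rec.food.cooking", "alice@example.com", "Cast iron care", 2),
    Message("austin.food", "alice@example.com", "Best tacos?", 1),
]
```

Under this view, a search by author or by date range is the same select call with a different predicate, which is the sense in which reading and searching can share one database.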
Madere does indeed have a lock on the discussion group search market so far. In fact, Microsoft [ http://www.microsoft.com/ ], America Online, Yahoo!, and Excite have all signed up with Deja News to offer, through each of their individual (and competing) services, Deja News-driven searches. Madere believes that his "strategic deals" with his Web-based competitors not only help stretch the reach of Usenet itself, but also ensure they'll stay out of the discussion group market themselves. "Right now, it's still the cost advantage that is allowing us to dominate. Certain huge companies could decide to lose a ton of money and compete with us, and that way could even be able to take over our space," admits Madere. "But frankly, it's still far easier for them to team up with us. It would be short-sighted to take us on."
Updating Deja News' readers and the actual interface while forging more relationships with potential competitors appear to be Deja News' own short-term goals, Madere acknowledges, though he says that Deja News' actual long-term mission is based around marketing -- particularly the DNCampaign, a suite of Internet-based marketing tools Deja News provides other businesses interested in online advertising. As the Deja News search engine is utilized by nearly 3.5 million different users monthly, Madere can offer other companies an opportunity to test the effectiveness of their online advertising by targeting distinctive demographic groups within that audience -- whom Madere can recognize and target based on what newsgroup topic the user is searching through.
"Since we have 25,000 distinct newsgroups, we can extremely rapidly target messages in the form of advertising banners to specific groups, be it programmers, system administrators, travelers, or European travelers," says Madere, who regularly offers local businesses, such as local rock band Velvet Hammer, the chance to advertise free so that Deja News can test the response of one banner against the other. "Then, we can test the responses of those people in the specific demographic groups to specific messages and report back to the companies what we've found and how we can help them better achieve their goals."
And yet, Deja News' marketing studies would be nothing without its base of users, who come to the site not to be tested, but to instead test the database for information they seek. Deja News' critics contend that increasingly, posts to several of the more controversial newsgroups are being dragged out of Deja News' search engine and reposted to either make a point or discredit someone else's opinion with what they'd perhaps said before on the topic. Could Deja News be creating more clutter?
"Sure, following up an old article or thread could be like walking into a conversation three hours after it ended just to answer somebody," Madere says. "But our service tries to prevent that because when you try to follow up something older than a couple of weeks it says `You can't follow up that message,' which is a general warning to alert people `This conversation is over.'"
Even so, there is nothing that Deja News can do about people cutting and pasting old posts into new ones, and it's Madere's contention that the search engine itself actually reduces the number of repetitive posts by cutting down on the number of questions already asked and answered. Better yet, in a virtual community where anybody can run amok unidentified or unchecked, Deja News' "Author Profile" feature -- which allows access to a complete list of where a user has posted and what they've said -- has been applauded for unmasking users who routinely post misinformation and for verifying an author's credibility on a subject.
"The reason we invented the author profile was as a convenience to help people filter out bozos," says Madere. "Discussion groups have a lot more in common with traditional communities. When you meet somebody in the real world and they say something interesting and you want to get to know them better, typically you'll go to other people who've talked with them in the past and find out about the person. Here, you're looking at a post from somebody giving you information and you'll want to get an idea of how credible this person is -- what they've done elsewhere on Usenet and what they've said in the past. And this, by looking at what and where they've posted, can help you determine whether somebody's a gadfly or significant contributor."
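The idea behind an author profile, a per-author summary of where someone has posted, amounts to grouping posts by author and counting them by newsgroup. The sketch below illustrates that grouping only; it is not Deja News' implementation, and the function name, addresses, and sample posts are hypothetical.

```python
from collections import Counter, defaultdict


def author_profiles(posts):
    """Group (author, newsgroup) pairs into a per-author count of posts by group."""
    profiles = defaultdict(Counter)
    for author, newsgroup in posts:
        profiles[author][newsgroup] += 1
    return profiles


# Hypothetical sample posts, invented for illustration only.
SAMPLE = [
    ("alice@example.com", "rec.food.cooking"),
    ("alice@example.com", "rec.food.cooking"),
    ("alice@example.com", "austin.food"),
    ("bob@example.com", "austin.food"),
]
PROFILES = author_profiles(SAMPLE)
```

Looking up PROFILES["alice@example.com"] gives the tally a reader would skim to judge whether a poster is a regular contributor to a group or a one-time visitor.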
In theory, offering full access to a user's online history might appear to raise a new set of Internet privacy concerns. What's to stop someone from typing in the name of a friend interested in cooking and discovering his set of posts to alt.personals.spanking? And is there a reasonable expectation of privacy, considering that, pre-Deja News, posts disappeared after two weeks and were far less likely to come back and haunt someone?
"We do not expose any private information," says Madere. "The only things that are available on Deja News are things that have been posted to Usenet and have been already visible to 24 million people daily. There can be no privacy concerns to making something that has been posted to Internet discussion groups searchable. It's a question of making it more clear to the original poster that maybe they shouldn't have posted it. No reason to get mad at us, we're just the messenger -- 24 million people saw this before we got it."
"In fact, at our site, we have a copy of the one FAQ posted to news.newusers, and most of the newsreading software essentially forced the new user to read this document before their first post. In this document, which was written in 1985 or 1986, it says, `Don't say anything that you wouldn't want to come back years later.' This stuff will get back to everybody that's related to you. We didn't have to write anything, because it's already there."
Privacy issues aside, the real story behind Deja News is the search engine itself and the ease with which information is now available on Usenet. Already, Deja News is witnessing a 20% monthly increase in the number of users at its site, numbers that are not only strengthening Deja News' hold on the Usenet market, but also making Madere's chosen "niche" into something far more significant than it was just two years ago.
"When I first started the company I knew the potential of the discussion groups, and knew the tens of millions of people participating needed a way to search. The only question in my mind was whether we could singularly dominate the space. If we were able, I knew we'd have to grow at this rate. And if we didn't, I didn't know if we could survive."
"But now, we expect the Internet discussion group market to grow rather rapidly over the next few years again. We're going to make it easier to participate, and as more people get more used to using the Internet they realize that the only thing that's unique about the Internet, that no other medium can offer, is discussion groups. Web pages aren't that different than a library -- except being able to access everything at once. Internet telephone is a temporary thing based on regulatory glitches and the whole Pointcast `push' thing is just television -- and people are going to get a clue on that soon.
"Discussion groups are the one thing that the Internet is the only medium that can do it. There's no other communication medium in which you can go out and send a message to thousands of people and any of them can respond to it, also to thousands of people. It's like a global telepathy system, instantly capable of reaching people with similar interests all over the world."