It was 1993 and the web was new. Very new.
At Stanford University, deep in the heart of what was becoming Silicon Valley, nestled a trailer stuffed with graduate engineering students, computer gear and the remnants of quickly eaten meals. One person who was there described the level of sanitation as "a cockroach's idea of Christmas."
Normally, this would not concern us much, except that the trailer was the working home of David Filo and Jerry Yang. They were PhD students. They were fascinated by a new phenomenon called the World Wide Web. They used their workstations and connections to explore it.
The web they were exploring wasn't very big. The web wouldn't really go public until 1994, when a product called Internet-in-a-Box made it available to the masses. For the moment, the web was small enough that it seemed like you could explore most of it.
The problem was that David's bookmark file had just gotten too big. So he and his friend Jerry wrote a bit of software that made it possible to group bookmarks into categories. They wanted to share what they had, so they created "Jerry's Guide to the World Wide Web" and put it on the web. They split things up with the data on Yang's workstation and the search tool on David's.
Then they got email. People really liked this thing. I was one of them. I found out about the service from a friend on the old WELL. Other people found out, too.
The young men decided to use human categorizing of the sites they found, instead of using software. The site was renamed Yahoo! Filo says the exclamation point is "pure marketing hype." In 1996, Yahoo! went public. Jerry and David got rich.
Yahoo! is now the consistently most popular site on the web. According to StatMarket, almost 40% of all search tool referrals worldwide come from Yahoo! This might be the end of the story except for one thing. Most folks still can't find what they're looking for on the web.
There are lots of reasons for this. The most obvious one is that there's an awful lot out there on the web, with more pouring in every day. Last year, a company called Bright Planet estimated that there were more than 2.5 billion web documents in the public or "surface" web.
That leads us to another reason for the trouble you may be having finding the information you want. There are really two webs. One is that surface web that's indexed on most common search tools like Yahoo, AltaVista, Excite, etc. The other, Bright Planet calls the "deep" web. Other researchers have called it the "hidden" web.
The hidden web is made up of information files that simply don't get found by the key search tools. Sometimes that's because those pages aren't "search engine friendly," meaning they don't use the features that make the searching and indexing job easier. Sometimes it's because the information sits in databases or in digital forms other than the HTML of the web.
A study in 2000 by the University of California at Berkeley estimated that 93% of the information in the world is available in digital format. The problem is that most of it is not available in HTML, even if it's accessible from the web. It might be in some other file or database form. If you know where to look on the web, you can find a form that helps you get at this stuff, but no search engine will help you find it in the first place.
There's also the fact that the web is messy, messy, messy. By some estimates a quarter of all links lead to the dreaded 404 error - page not found. Some days it seems like the links that work are the exception.
There's another problem, too. Most of us learned to do research in school, and the schools left out a big part of the truth.
We learned to do research in a controlled, indexed environment. We used card catalogs and periodical indexes. It worked. But what most of our teachers didn't tell us was that researching in a controlled, indexed environment is the exception, not the rule. You do it in school, but you don't do it in most of your life.
Before we jump to tools you can use to get at information on the public and hidden webs, you have to lose that school idea of research as it applies to the web. The web is not a library. It's not even close to being completely cataloged. It's not indexed beyond a fraction of the available content.
If it's not like a library, what's it like, then? It's like a room full of people. And doing effective research on it is more like doing research by calling people than it is like burrowing through a card catalog.
When you're presented with a problem in your non-web life, you usually do just that. You figure out who you know that's either most likely to know the answer to your question, or most likely to know how to point you toward the answer. So you call your friend. Usually you either get the answer or you get closer. When you don't, you don't throw up your hands or shake your fist at the sky and shout "I can never find anything when I call people!" Instead, you call someone else.
This works because some of your friends are experts and some of your friends know lots of people and because you usually search for information in the same areas over and over. What most of us do is search for a reliable source and then search for the information we want.
That works on the web and for the same reasons. Some sites have detailed information about specific topics. Association sites usually fill this bill, for example. Other sites have lots of connections. They're information hubs. And if you search out a reliable source first, it's easier to find information. That's rule number one - search for sources first, and information second.
Rule number two is to start in a good place. Your bookmark file should have links to sites you've used before that led you to good information. Search tools are good, too. I'm going to mention several different tools below. Links to all of them are at a special Monday Memo page.
Yahoo is often described as the world's most popular search engine. It's popular all right, but it's not a search engine. It's a catalog. People put information about websites into a category structure.
Search engines are different. Search engines, like Google and Excite, use software to index and display information. They're organized around keywords.
Use catalogs if you're searching for a concept. If you're interested in "baseball," you'll usually get better results with a catalog than a search engine. That's because the word "baseball" doesn't appear in lots of stories or articles about baseball. Check your local newspaper sports section to see if I'm right.
On the other hand, if it's a word or name you're looking for, use a search engine. There you've got a shot at finding a source that mentions the specific word you're looking for.
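Here's a small sketch of that difference in Python. It's only an illustration, not how Yahoo or any real search engine is built. The three pages, their categories, and the function names are all made up for the example. The "search engine" only knows the words that appear in a page, while the "catalog" only knows the category a person assigned to it.

```python
from collections import defaultdict

# Hypothetical mini-web: (title, category chosen by a human editor, page text).
PAGES = [
    ("Game recap", "Sports/Baseball",
     "The Giants beat the Dodgers 4-2 on a ninth inning home run."),
    ("League history", "Sports/Baseball",
     "Baseball grew from sandlot games into a national pastime."),
    ("Stock report", "Business/Markets",
     "Shares rose after the company beat earnings estimates."),
]

def build_keyword_index(pages):
    """Map each word in the page text to the titles of pages containing it."""
    index = defaultdict(set)
    for title, _category, text in pages:
        for word in text.lower().split():
            index[word.strip(".,!?\"'")].add(title)
    return index

def search_engine(index, word):
    """Keyword lookup: finds only pages whose text contains the word."""
    return sorted(index.get(word.lower(), set()))

def catalog(pages, topic):
    """Category lookup: finds pages a person filed under the topic."""
    return sorted(
        title for title, category, _text in pages
        if topic.lower() in category.lower().split("/")
    )

index = build_keyword_index(PAGES)
print(search_engine(index, "baseball"))  # ['League history']
print(catalog(PAGES, "baseball"))        # ['Game recap', 'League history']
print(search_engine(index, "dodgers"))   # ['Game recap']
print(catalog(PAGES, "dodgers"))         # []
```

The game recap never uses the word "baseball," so the keyword search misses it and the catalog finds it. The name "Dodgers" works the other way around.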
Here are a couple more tips. Use both search engines and directories. Use more than a single keyword.
Top catalogs include Yahoo, About.com, and LookSmart. My favorite search engine right now is Google, but I like AltaVista and Excite, too.
You may be more effective using something called a "metacrawler." This is a site that searches several different search engines at the same time. Ask Jeeves is one of these, with the advantage of allowing some natural text searching. Others are Dogpile, ProFusion, and Metacrawler.
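The idea behind a metacrawler can be sketched in a few lines of Python. This is a toy, and it makes no claim about how Ask Jeeves, Dogpile or the others actually combine results. The three "engines" are stand-in functions returning canned lists, the addresses are invented, and the scoring rule (a page earns 1/rank points from each engine that lists it) is just one simple assumption.

```python
def metasearch(query, engines):
    """Send one query to every engine and merge the ranked results.

    Each engine is a function that takes a query and returns a list of
    addresses, best first. A page scores 1/rank for each engine that
    returns it, so pages found by several engines rise to the top.
    """
    scores = {}
    for search in engines:
        for rank, url in enumerate(search(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical engines with canned answers (no network access involved).
def engine_a(query):
    return ["a.example/tides", "b.example/moon", "c.example/surf"]

def engine_b(query):
    return ["b.example/moon", "d.example/charts"]

def engine_c(query):
    return ["b.example/moon", "a.example/tides"]

results = metasearch("tide tables", [engine_a, engine_b, engine_c])
print(results)
# ['b.example/moon', 'a.example/tides', 'd.example/charts', 'c.example/surf']
```

The page all three engines agree on comes out first, and duplicates are listed only once.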
Search engines, catalogs, metacrawlers and bookmarks are great places to start, but there's probably an information hub on almost any topic you're interested in. There are two extensive lists out there to help you find these great hidden places.
Gary Price is a librarian at George Washington University. His list called Direct Search has a search engine to help you find information hubs on a number of topics. It won't search the contents of the files on those hubs; you'll have to go there yourself to do that.
Another great list is on the site for the University of Leiden. It was developed by Marten Hofstede. There's no search engine here, but you'll find a long list of categories.
There are also general information hubs out there that can help. Two that I like and that I've mentioned in Monday Memo are CEO Express (great for business sources and customizable) and Refdesk.com (huge array of reference works and sites on a variety of topics).
But wouldn't it be great if you could get some software that would help you search more effectively? Well, you can. At least you can get software that's helped other folks search more effectively. You'll have to test the tools to see how they work for you. That's fairly easy though, since you can download the software and try it for free. Here are three software packages in order of increasing cost.
Copernic has been around in various versions for quite a while. It's gotten good reviews. And it's free.
Lexibot is a product of Bright Planet, the folks who did that study of the hidden web. It's designed to help you find those hidden information sources. If you keep it after the thirty-day free trial, the cost is $89.95.
Then there's Bullseye Pro. This is the most expensive of the bunch at $249, so try the other two before you try this one.
Those are all good places to start. What next?
Rule Three is that you follow the links. You almost never find exactly what you need on the first site. So explore. Follow the links. If you hit a dead end, back up. Or start over from another point.
That's Rule Four. Try, try, again. And again. Like calling folks looking for information, this is a process of trying, gathering feedback and refining or narrowing the search.
Then there's Rule Five. Do something outside the rules. Look for experts by hunting for authors' names on Amazon.com and then searching for sites that they developed or where they're mentioned. Search for names of companies to find articles that have paragraphs analyzing an industry. And anything else you can think of.
The bad news is that searching is often frustrating and unrewarding. The good news is that help is on the way. Sites are starting to index more than just HTML. Last week, Google announced that it was now indexing Adobe PDF files. Companies are working on better search tools and software for you and for the search sites out there.
In the meantime, you'll be more effective and less frustrated if you learn what works, stay flexible and keep trying.
It's good to keep some historical perspective on this. Dave Filo and Jerry Yang started Yahoo! as something for themselves and their friends. They, and others, have only been at this search technology development game for less than a decade.
It's something like the development of the great star catalogs. Hipparchus completed the first one of those we know about, sometime around 130 BC. There were other catalogs over the centuries. The great Tycho Brahe completed his masterful, pre-telescope catalog in the sixteenth century. And since then the star catalogs have expanded in coverage and features.
It'll happen with web catalogs, too. It just won't take a few hundred years.
This feature appeared on 5 March 2001