I must tread a fine line when talking about Internet services, because the level of connection (and thus the level of service) varies widely. People who can send Internet email, for instance, may not be able to use Gopher or the World Wide Web. The services that I talk about in this chapter (except for FTP and Archie via email) all require a full TCP/IP connection to the Internet. For Mac folks, a full TCP/IP connection means that you have MacTCP loaded and properly configured, along with either a dedicated Internet connection or a modem connection via PPP or SLIP. Without that sort of account and connection, you cannot use these services directly. To be fair, there are some ways of getting access to these services via America Online, CompuServe, or some of the bulletin board systems, but in those cases you're limited to the software those services provide for you.
Despite the occasionally confusing way people use the term both as a noun and a verb, most people don't have much trouble using FTP. FTP stands for File Transfer Protocol, and not surprisingly, it's only good for transferring files between machines. In the past, you could only use an FTP client to access files stored on FTP servers. Today, however, enough other services such as Gopher and the World Wide Web have implemented the FTP protocols that you can often FTP files no matter what service you happen to be using. Heck, you can even FTP files via email. I'll get to the specifics of the different clients in later chapters; for now, here are a few salient facts to keep in mind regarding FTP.
The Internet does a wonderful job of hiding geographical boundaries. You may never realize that a person with whom you correspond lives on the other side of the globe. When using FTP, however, try to keep the physical location of the remote machine in mind.
First, as with everything on the Internet, someone pays for all this traffic. It's probably not you directly, so try to behave like a good citizen who's being given free access to an amazing resource. Show some consideration by not, for example, using machines in Australia when one in the same area of your country works equally well. Because trans-oceanic traffic is expensive, many machines mirror others; that is, they make sure to transfer the entire contents of one machine to the other, updating the file collection on a regular, often daily basis.
Here's an example. Because the Info-Mac archive site at sumex-aim.stanford.edu is popular and kept up-to-date, other sites carrying Macintosh software don't want to duplicate the effort. It's much easier to set up a mirror to sumex so that machines in Australia and Scandinavia can have exactly the same contents as sumex. Mirroring not only saves work, it enables users in those countries to access a cheaper, local site for files. Everyone wins, but only if you, the user, utilize local sites whenever possible. You can usually tell where a site is located by looking at the two-letter country domain at the end of the address.
Sometimes, of course, the file you need exists only on a remote site in Finland, for example, so that's where you must go to get it. Another point of etiquette to keep in mind, wherever the file may be, is sensitivity to the time of day at the site from which you retrieve it. As with most things in life (universities during exams excepted), more people use the Internet during their daytime hours than at night. Thus, it's generally polite to retrieve files during the site's off hours; otherwise, you may be slowing down the people who are trying to get their work done. That's not polite, especially if the file you're retrieving is a massive QuickTime movie or something equally frivolous.
Notice that I said "their daytime hours." Because the Internet spans the globe, it may be 4:00 A.M. where you are, but it's the middle of the day somewhere else. You can figure out the local time by using the Map control panel that comes with your Mac.
FTP is inherently simple to use, but there's plenty of room for FTP client software to make your life miserable. The following sections, therefore, describe several benefits and features to look for in an FTP client.
Most of the time, people use an FTP client program to log on to a remote FTP site, find a file or two, download them, and then log off. As such, a disproportionate amount of your time is spent connecting and navigating to the desired files.
A good FTP client enables you to define shortcuts for frequently used FTP sites, along with the userid and password necessary for connecting to them. This benefit is minor but makes a big difference when repeated numerous times. I can't tell you how much I hate typing sumex-aim.stanford.edu on a Unix command line when I'm trying to connect to that site with FTP.
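For comparison, here's roughly what that log on, navigate, download, log off cycle looks like from a bare Unix command line (the site, directory, and file are the same ones I use in the email examples later in this chapter; the exact prompts vary a bit from system to system, and by convention you give your email address as the password for an anonymous login):

ftp ftp.tidbits.com
Name: anonymous
Password: your-email-address
ftp> cd /pub/tidbits/misc
ftp> get easy-view.hqx
ftp> quit

A good Mac client reduces all of that to a double-click on a saved shortcut.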
Once you're on, the FTP client program should make it very easy to move between directories (or folders, in Mac jargon). Some programs do this by emulating the Standard File Dialog used on the Mac to open and save files, which is a good start (although the Standard File Dialog is one of the most confusing parts of the Macintosh interface). It's helpful when the client program remembers the contents of directories. That way, if you go back to one you've already visited, you don't have to wait for it to refresh.
Other programs, Anarchie and Snatcher mostly, take the navigational aspect of FTP one step further, and actually emulate the Finder. Snatcher in particular goes a bit overboard in trying to mimic the Finder, in my opinion.
In Unix, you can choose among several different methods of viewing files. Some show you more information, such as file size and creation date, and others show you less, in order to fit more entries on the screen. Although the Mac doesn't have the problem of trying to fit multiple columns in a list (no Macintosh program uses multiple column lists), not all the FTP clients are good about showing you the entire filename, size, or date. I think this failure is inexcusable, because you need to know how large a file is before you spend an hour retrieving it -- especially if you're connecting at a slow speed. Make sure the program you use provides this information. In addition, a truly useful feature is the capability of sorting the listing on date, file size, or whatnot.
Much of the time, an FTP client can figure out what sort of file you're retrieving by looking at the extension to the filename. This being the case, the client can make sure it is transferring the file in the proper format. If you're lucky, it even decodes some of the common formats that you see on the Internet.
"Wait a minute," you say. "He didn't mention strange file formats before." Sorry about that. I'll get to file formats in the next chapter, after I've discussed the various different ways that files may appear on your machine. But first, let's look at how you can retrieve files from FTP sites armed only with an email program.
One of the problems with FTP is that it requires your attention -- in fact, pretty much your full attention. In addition, you must have a TCP/IP connection to the Internet. If you're connecting via AppleLink or UUCP, you simply cannot use FTP normally. The one caveat to this TCP/IP requirement is that more and more of the commercial services are starting to support FTP, so you can use FTP on America Online or CompuServe, for instance, without resorting to FTP by email.
Note: If you're clever and are able to do a little scripting with AppleScript or Frontier, you can script Anarchie to retrieve files automatically.
There is a solution, although not a terribly good one. You can retrieve files remotely, using only your email program, in two different ways. The most generic way is to use one of the FTPmail or BITFTP servers. The other is to use a mailserver program dedicated to a single site, sometimes run as part of a mailing list manager such as LISTSERV, ListProcessor, or Majordomo. Let's look at the generic way first.
Using FTPmail or BITFTP isn't particularly difficult, but it can be extremely frustrating. The problem is twofold. First, the main FTPmail server is seriously overloaded. Because it's a free service that someone at DEC runs in their machine's spare time, FTPmail is a low priority. It can take a week for your file to come back. I've even had requests seemingly disappear into the ether. Second, talking to an FTPmail server is like playing 20 Questions -- unless you know precisely what you're looking for, where it is, and are able to enter the commands perfectly, you'll get an error message back. And, if that message comes back a week later, you may not even have the original information with which to correct your mistake.
Note: When you use email to retrieve files stored on FTP sites, the files often are split into chunks. Usually, you can control the size of the chunks, but manually joining them in a word processor can be difficult. Some email programs, such as Eudora, and various utilities make joining file chunks easier. If you don't use Eudora, check out ChunkJoiner and BinHqx (if the file is a BinHex file). You can find them in:
ftp://ftp.tidbits.com/pub/tidbits/tisk/util/
Talking to an FTPmail or BITFTP server feels much like logging into an FTP site, changing directories, and finally retrieving the file. The only problem is, you must type in the commands all at once. So, to get a file from the main FTPmail server, you would send email to ftpmail@decwrl.dec.com, put something in the Subject line (it doesn't care what, but dislikes empty Subject lines), and then, in the body of the message, put a set of commands like this:
help
connect ftp.tidbits.com
chdir /pub/tidbits/misc
get easy-view.hqx
quit
So, in English, what you're doing is first getting the help file from FTPmail, then connecting to the anonymous FTP site at ftp.tidbits.com, then changing into the /pub/tidbits/misc/ directory, then retrieving the file called easy-view.hqx, and finally quitting. If you wanted, you could retrieve more files. And, if you included an ls command, FTPmail would return the directory listing to you, enabling you to see what's there before requesting specific files.
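For instance, if you weren't sure of the exact filename, you might first send a message like this (same server, same address as before) and wait for the listing to come back before requesting the file itself:

connect ftp.tidbits.com
chdir /pub/tidbits/misc
ls
quit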
Needless to say, there are a number of other commands that FTPmail accepts, and you will probably want to use some of them. They're all explained in the help file that FTPmail returns to you when you send it the help command.
I only know of one other FTPmail server. It's in Ireland and uses a somewhat different command set, so I don't recommend using it unless you're in Europe. If you want to find out more about it, send email to ftpmail@ieunet.ie and put the single command help in the body of the message.
BITFTP stands for BITNET FTP, or something like that. Machines that are only on BITNET cannot use FTP normally, so some enterprising programmers created the BITFTP program to enable BITNET users to retrieve files stored on FTP sites.
I know of only three BITFTP servers. One is in the U.S., another in Germany, and the third in Poland (see Table 9.1). Don't use the ones in Europe unless you too are in Europe -- it's a waste of net bandwidth, and probably won't result in particularly good service anyway.
Table 9.1: BITFTP Servers

Server Name                   Location
bitftp@pucc.princeton.edu     U.S.
bitftp@vm.gmd.de              Germany
bitftp@plearn.edu.pl          Poland
Retrieving a file from a BITFTP server works similarly to retrieving a file from FTPmail, but the commands are somewhat different. Here's how you retrieve the same file (along with the help file again) that we snagged before. Send email to bitftp@pucc.princeton.edu and put these commands in the body of the letter:
help
ftp ftp.tidbits.com
user anonymous
cd /pub/tidbits/misc
get easy-view.hqx
quit
Enough about BITFTP. You can probably figure out the rest on your own, with the aid of the help file. I wouldn't want to spoil all the fun of figuring some of this stuff out for yourself!
More common than FTPmail or BITFTP programs that serve everyone are mailserver programs that provide email access to the FTP archives on a specific site. There are many of these mailservers around, although finding them can be a bit difficult, and I have no way of knowing which FTP sites that interest you also have mailservers. I can, however, tell you about a few mailservers that you may find useful. Each has its own command structure.
BART, the mailserver for the massive Macintosh software archives at mac.archive.umich.edu, is an extremely useful way to access many Macintosh files, especially since the load on the machine via FTP is often so great that you cannot easily connect.
Note: BART is short for Brode's Archive Retrieval Thang. Glad you asked?
BART only provides access to the files stored on mac.archive.umich.edu. If you want more general access via email, you must use one of the FTPmail or BITFTP servers mentioned previously. Luckily, because BART is so specific, its command structure is relatively simple.
For instance, to retrieve help and StuffIt Expander from BART, you would send email to mac@mac.archive.umich.edu, and in the body of the message you would put the following commands:
path ace@tidbits.com
help
chunk 1500
send util/compression/stuffitexpander3.52.sea.hqx
That's about it. BART limits you to 1,500K and five files per day (requesting the list of files isn't currently counted against your quota), so you can't abuse it. Your quota is cleared every day at midnight (Eastern Daylight Time). Perhaps the main problem with BART is that if there's something wrong with your request, it tends to ignore the request entirely and not send any error messages. So, for instance, if you surpass your quota, BART simply throws out any additional requests, and you must send them again after midnight, when your quota has been cleared.
Using the chunk command, you should set your chunk size as high as your mailer can handle, since that reduces the load on BART and makes it easier for you to deal with the files on your end.
Note: You may run into trouble with certain files if you use Eudora to retrieve them via BART. If the submitter also used Eudora or a compatible program to attach the file, here's what will happen: Eudora will start downloading the first chunk from BART, see that it contains an attachment, and then complain when it sees the attachment is too short (since the other parts aren't included). Simply tell Eudora to fetch the attachment again as a normal message, and it will work fine.
Mailing list manager programs such as LISTSERV, ListProcessor, and Majordomo also often provide access to files, although these files aren't always available via FTP. Most often the files in question are logs of mailing list discussions, but in a few instances, they're more interesting.
The LISTSERV at Rice University that helps distribute the Info-Mac Digest also provides access to all of the files stored in the Info-Mac archives at sumex-aim.stanford.edu. Using it is simplicity itself. The LISTSERV doesn't care about directory paths, chunks, or anything like that. You need not specify your email address, or tell the LISTSERV how to encode the files. Instead, all you do is send listserv@ricevm1.rice.edu a message with one-line commands that look like any of the following:
$MAC HELP
$MAC GET tidbits-267.etx
Actually, I'm oversimplifying slightly. There are four commands, all told (see Table 9.2).
Table 9.2: LISTSERV File Retrieval Commands

Command               Function
$MAC IND ALL          Gets list of recent or all files
$MAC DIR directory    Gets subdirectory contents
$MAC GET name.type    Gets Info-Mac archive file
$MAC HELP             Gets help information
The LISTSERV limits you to 250K per day, although if you request a single file larger than that, it won't refuse that single request. Since all new files in the Info-Mac archives are announced in the Info-Mac Digest, you can easily copy the filenames out to a file request message as you're reading, send off the message when you're finished, and have files coming back quite quickly.
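Putting the commands together, a complete message to listserv@ricevm1.rice.edu might look like the following (the directory name here is only an illustration; use the digest announcements or the file lists to find real names):

$MAC DIR util
$MAC GET tidbits-267.etx
$MAC HELP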
Enough about FTP by email. It's like playing Pin the Tail on the Donkey with a donkey the size of ... Nah, I'll avoid the easy shot at some sleazy politician. Let's talk next about how you find files via FTP. The answer is Archie.
Archie is an example of what happens when you apply simple technology to a difficult problem. Here is the problem: How do you find any given file on the nets if you don't already know where it's located? After all, in comparison with finding a single file on several million machines, the proverbial haystack looks tiny, and its cousin, the proverbial needle, sticks out like the sore thumb you get when you find it. In a nutshell, Archie uses normal FTP commands to get directory listings of all the files on hundreds of anonymous FTP sites around the world. It then puts these file listings into a database and provides a simple interface for searching it. That's really all there is to Archie.
Unfortunately, and for reasons I don't fully understand, Archie servers have become less and less useful over time. They're almost impossible to get through to via an Archie client (telnetting to them is the most successful in my recent experience), and much of the time they don't seem to know about certain large FTP sites that I know have the file for which I'm looking. In other words, sometimes Archie simply won't work. Don't worry about it and just try another technique or tool.
Note: Archie was developed in early 1991 by Alan Emtage, Peter Deutsch, and Bill Heelan from the McGill University Computing Center, Canada. Development now takes place at a company founded by Deutsch and Emtage, Bunyip Information Systems. Although the basic Archie client software is distributed freely, Bunyip sells and supports the Archie server software.
You can access Archie via Telnet, email, Gopher, the World Wide Web, and special Macintosh client programs. Some Unix machines may also have Unix Archie clients installed. It seems to me there are two basic goals an Archie client should meet. First, it should be easy to search for files, but when you want to define a more complex search, that should be possible as well. Second, since the entire point of finding files is so that you can retrieve them, an Archie client ideally should make it very easy to retrieve anything that it finds. This second feature appears to be less common than you would expect. On the Mac, only Anarchie can retrieve found files with just a double-click.
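If you do end up telnetting to an Archie server, the session looks something like this sketch (log in as archie when prompted; on some older servers the search command is prog rather than find, and the prompts vary from server to server):

telnet archie.internic.net
login: archie
archie> set search sub
archie> find easy-view
archie> quit

The set search sub command tells Archie to match your string anywhere within a filename, rather than requiring an exact match.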
Note: Archie isn't an acronym for anything, although it took me half an hour searching through files about Archie on the Internet to determine that once and for all.
Accessing Archie via email is extremely easy, although the Archie server offers enough options (I'll let you discover them for yourself) to significantly increase the complexity. For a basic search, though, merely send email to archie@archie.internic.net and put in the body of the message lines like the following:
help
find easy-view
find easyview
In a short while (or perhaps a long while, depending on the load on the Archie server), the results should come back -- the help file that you asked for and the results of your search for "easy-view" and "easyview." The example above uses both terms because I'm not sure of the exact wording of the filename, but experience tells me that one of those two possibilities is likely.
However, if the Archie server you chose is down, or merely being flaky (as is their wont) you may want to try another one. There are plenty. Simply send email to the userid archie at any one of the Archie servers from the list in Table 9.3. As usual, it's polite to choose a local server.
Table 9.3: Current Archie Servers

Server Name                   Server IP Number   Location
archie.au                     139.130.4.6        Australia
archie.edvz.uni-linz.ac.at    140.78.3.8         Austria
archie.univie.ac.at           131.130.1.23       Austria
archie.cs.mcgill.ca           132.206.51.250     Canada
archie.uqam.ca                132.208.250.10     Canada
archie.funet.fi               128.214.6.102      Finland
archie.univ-rennes1.fr        129.20.128.38      France
archie.th-darmstadt.de        130.83.128.118     Germany
archie.ac.il                  132.65.16.18       Israel
archie.unipi.it               131.114.21.10      Italy
archie.wide.ad.jp             133.4.3.6          Japan
archie.hana.nm.kr             128.134.1.1        Korea
archie.sogang.ac.kr           163.239.1.11       Korea
archie.uninett.no             128.39.2.20        Norway
archie.rediris.es             130.206.1.2        Spain
archie.luth.se                130.240.12.30      Sweden
archie.switch.ch              130.59.1.40        Switzerland
archie.nctuccca.edu.tw                           Taiwan
archie.ncu.edu.tw             192.83.166.12      Taiwan
archie.doc.ic.ac.uk           146.169.11.3       United Kingdom
archie.hensa.ac.uk            129.12.21.25       United Kingdom
archie.unl.edu                129.93.1.14        USA (NE)
archie.internic.net           198.49.45.10       USA (NJ)
archie.rutgers.edu            128.6.18.15        USA (NJ)
archie.ans.net                147.225.1.10       USA (NY)
archie.sura.net               128.167.254.179    USA (MD)
Telnet is a bit hard to talk about because using it is just like using a modem to connect to another computer. Telnet simply enables you to connect to a computer somewhere else on the Internet and to do whatever that computer allows you to do. Because Telnet is similar to FTP in the sense that you're logging in to a remote machine, the same rules of etiquette apply (although running a program over Telnet usually places less stress on a machine). As long as you try to avoid bogging down the network when people want to use it for their local work, you shouldn't have to worry about it too much.

When you telnet to another machine, you generally telnet into a specific program that provides information you want. The folks making that information available may have specific restrictions on the way you can use their site. Pay attention to these restrictions. The few people who abuse a network service ruin it for everyone else.
What might you want to look for in a Telnet program? That's a good question, I suppose, but not one that I'm all that qualified to answer, because for the most part I avoid Telnet-based command-line interfaces. In my opinion, you should look for features that make the Telnet program itself -- and whatever random program you happen to run on the remote machine -- easier to use.
It's useful to be able to save connection documents that save you the work of logging into a specific machine (but beware of security issues if they also store your password). Also, any sort of macro capability will come in handy for automating repetitive keystrokes. Depending on what you're doing, you also may want some feature for capturing the text that flows by for future reference. And, you should of course be able to copy and paste out of the Telnet program.
IRC, which stands for Internet Relay Chat, is a method of communicating with others on the Internet in real time. It was written by Jarkko Oikarinen of Finland in 1988 and has spread to 20 countries. IRC is perhaps better defined as a multi-user chat system, where people gather in groups that are called channels, usually devoted to some specific subject. Private conversations also are possible.
Note: IRC gained a certain level of fame during the Gulf War, when updates about the fighting flowed into a single channel where a huge number of people had gathered to stay up-to-date on the situation.
I personally have never messed with IRC much, having had some boring experiences with RELAY, a similar service on BITNET, back in college. I'm not all that fond of IRC, in large part because I find the amount of useful information there almost nonexistent, and I'm uninterested in making small talk with people from around the world. Nevertheless, IRC is one of the most popular Internet services. Thousands of people connect to IRC servers throughout any given day. If you're interested in IRC, refer to the section on it back in chapter 5, the excerpt from Internet Explorer Kit for Macintosh. That should give you a sense of what IRC is like. You can find more information in the IRC tutorials posted for anonymous FTP in:
ftp://cs-ftp.bu.edu/irc/support/
Client programs for many different platforms exist, including two for the Macintosh called ircle and Homer. Much as with Telnet, you're looking for features that make the tedious parts of IRC simpler. I could blather on about all the features you might want, but frankly, if you're using a Macintosh with either a Unix shell account or a MacTCP-based account, just get Homer. It has more features than one would think possible, and can even -- in conjunction with Apple's PlainTalk software -- speak some or all of the text that flows by.
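Whichever client you choose, a handful of commands in the traditional slash style will get you going; most clients accept them even when they also provide friendlier menu equivalents (the channel and nickname here are merely examples):

/list
/join #macintosh
/msg Ferdinand Hi there!
/quit

/list shows the available channels, /join puts you into one (creating it if it doesn't exist), /msg sends a private message to a single person, and /quit does the obvious.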
MUD, which stands for Multi-User Dungeon or often Multi-User Dimension, may be one of the most dangerously addictive services available on the Internet. The basic idea is somewhat like the text adventures of old, where you type in commands like "Go south," "Get knife," and so on. The difference with MUDs is that they can take place in a wide variety of different realities -- basically anything someone could dream up. More importantly, the characters in the MUD are actually other people interacting with you in real time. Finally, after you reach a certain level of proficiency, you are often allowed to modify the environment of the MUD.
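To give you a taste, here's roughly how a first visit might go on LambdaMOO, a well-known MUD run at Xerox PARC (assuming it's still around when you read this -- and keep in mind that commands differ from MUD to MUD):

telnet lambda.parc.xerox.com 8888
connect guest
look
say Hello, everyone.
@quit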
The allure of the MUDs should be obvious. Suddenly, you can become your favorite alter-ego, describing yourself in any way you want. Your alternate-reality prowess is based on your intellect, and if you rise high enough, you can literally change your world. Particularly for those who may feel powerless or put upon in the real world, the world of the MUD is an attractive escape, despite its text-environment limitations.
After the publication of an article about MUDs, the magazine Wired printed a letter from someone who had watched his brother fail out of an engineering degree and was watching his fiancée, a fourth-year astrophysics student, suffer similar academic problems, both due to their addictions to MUDs. But don't take my word for it; read the letter for yourself on Wired's Web server:
http://www.wired.com/Etext/1.4/departments/rants.html
Note: Wired's Web server requires authentication now, which means that you must sign up with them and get a userid and a password before you can get in. It's free, and you can register at:
http://www.wired.com/
I've seen people close to me fall prey to the addictive lure of MUDs. As an experiment in interactive communications and human online interactions, MUDs are extremely interesting, but be aware of the time they can consume from your real life.
I don't want to imply that MUDs are evil. Like almost anything else, they can be abused. But in other situations, they have been used in fascinating ways, such as to create an online classroom for geographically separated students. There's also a very real question of what constitutes addiction and what constitutes real life. I'd say that someone who is failing out of college or failing to perform acceptably at work because of a MUD has a problem, but if that person is replacing several hours per day of television with MUDing, it's a tougher call. Similarly, is playing hours and hours of golf each week any better than stretching your mind in the imaginative world of a MUD? You decide, but remember: there are certain parts of real life that we cannot and should not blow off in favor of a virtual environment.
Although MUDs are currently text-only, rudimentary graphics will almost certainly appear at some point, followed by more realistic graphics, sound, and video, and perhaps some day even links to the virtual reality systems of tomorrow. I don't even want to speculate on what those changes might mean to society, but you may want to think about what may happen, both positive and negative.
MUDs generally run under Unix, but you can run your own with a Macintosh port of a MUD, called MacMud, and connect to other Unix MUDs with a simple MUD client program, MUDDweller. Even more interesting is the program Meeting Space from a small company called World Benders. Meeting Space is billed as a virtual conference room and is marketed to large businesses as a money- and time-saving alternative to business trips. However, it's actually a business MUD with a snazzy Macintosh interface and a hefty price tag. Meeting Space works over any Macintosh network, including the Internet, and although I don't know of any public Meeting Space servers yet, some were under discussion as I wrote this. For more information about Meeting Space, send email to wb-info@worldbenders.com and check out the discussion of it later, in chapter 27, "MacTCP-based Utilities & Miscellany."
Unlike almost every other resource mentioned in this book, the WAIS, or Wide Area Information Servers, project had its conception in big business and was designed for big business. The project started in response to a basic problem. Professionals from all walks of life, and corporate executives in particular, need tremendous amounts of information that is usually stored online in vast databases. However, corporate executives are almost always incredibly busy people without the time, inclination, or skills to learn a complex database query language. Of course, corporate executives are not alone in this situation; many people have the same needs and limitations.
In 1991, four large companies -- Apple Computer, Dow Jones & Co., Thinking Machines Corporation, and KPMG Peat Marwick -- joined together to create a prototype system to address this pressing problem. Apple brought its user interface design expertise; Dow Jones was involved because of its massive databases of information; Thinking Machines provided the programming and expertise in high-end information retrieval engines; and KPMG Peat Marwick provided the information-hungry guinea pigs.
One of the initial concepts was the formation of an organizational memory -- the combined set of memos, reports, guidelines, email, and whatnot that makes up the textual history of an organization. Because all of these items are primarily text and completely without structure, stuffing them into a standard relational database is like trying to fill a room with balloons. They don't fit well, they're always escaping, and you can never find anything. WAIS was designed to help with this problem.
So far I haven't said anything about how WAIS became such a useful tool for finding free information. With such corporate parentage, it's in some ways surprising that it did. The important thing about the design of WAIS is that it doesn't discriminate. WAIS can incorporate data from many different sources, distribute them over various types of networks, and record whether the data is free or carries a fee. WAIS is also scalable, so that it can accept an increasing number and complexity of information sources. This is an important feature in today's world of exponentially increasing amounts of information. The end result of these design features is that WAIS works perfectly well for serving financial reports to harried executives, but equally well for providing science fiction book reviews to curious undergraduates.
In addition, the WAIS protocol is an Internet standard and is freely available, as are some clients and servers. Anyone can set up her own WAIS server for anyone with a WAIS client to access. Eventually, we may see Microsoft, Lotus, and WordPerfect duking it out over who has the best client for accessing WAIS. With the turn the Internet has taken in the past year, however, it's far more likely that we'll see Microsoft, Lotus, and WordPerfect (now a division of Novell) competing with World Wide Web clients. Although WAIS has continued to grow in utility and popularity, it has also faded into the shadow of the snazzier looking Web clients. That's not to say that WAIS isn't being used heavily, just that it tends to work behind the scenes as a search engine for a Web page interface, rather than through a dedicated client program.
At the beginning of this section, I mentioned the problem of most people not knowing how to communicate in complex database query languages. WAIS solves that problem by implementing a sophisticated natural language input system, which is a fancy way of saying that you can talk to it in English. If you want to find more information about deforestation in the Amazon rainforest, you simply formulate your query as: "Tell me about deforestation in the Amazon rainforest." Pretty rough, eh? In its current state, WAIS does not actually examine your question for semantic content; that is, it searches based on the useful words it finds in your question (and ignores, for instance, "in" and "the"). However, nothing prevents advances in language processing from augmenting WAIS so that it has a better idea of what you mean.
In any database, you find only the items that match your search. In a very large database, though, you often find far too many items; so many, in fact, that you are equally at a loss as to what might be useful. WAIS attempts to solve this problem with ranking and relevance feedback. Ranking is just what it says. WAIS looks at each item that answers the user's question and ranks them based on the proximity of words and other variables. The better the match, the higher up the document appears in your list of found items. Although by no means perfect, this basic method works well in practice.
Relevance feedback, although a fuzzier concept, also helps you refine a search. If you ask a question and WAIS returns 30 documents that match, you may find one or two that are almost exactly what you're looking for. You can then refine the search by telling WAIS, in effect, that those one or two documents are "relevant" and that it should go look for other documents that are "similar" to the relevant ones. Relevance feedback is basically a computer method of pointing at something and saying, "Get me more like this."
The rise of services such as WAIS and Gopher on the Internet will by no means put librarians out of business. Instead, the opposite is true. Librarians are trained in ways of searching and refining searches. We need their experience, both in making sense of the frantic increase in information resources and in setting up the information services of tomorrow. More than ever, we need to eliminate the stereotype of the little old lady among dusty books and replace it with an image of a person who can help us navigate through data in ways we never could ourselves. There will always be a need for human experts.
When you put all this information together, you end up with a true electronic publishing system. This definition, pulled from a paper written by Brewster Kahle, then of Thinking Machines and now president of WAIS, Inc., is important for Internet users to keep in mind as the future becomes the present: "Electronic publishing is the distribution of textual information over electronic networks." (Kahle later mentions that the WAIS protocol does not prohibit the transmission of audio or video.) I emphasize that definition because I've been fighting to spread it for some years now because of my role with TidBITS.
Note: Electronic publishing has little to do with using computer tools to create paper publications. For those of you who know about Adobe Acrobat, Common Ground from No Hands Software, Envoy from Novell, and Replica from Farallon, those programs aren't directly related to electronic publishing because they all work on the metaphor of a printed page. With them, you create a page and then print to a file format that other platforms can read (using special readers), but never edit or reuse in any significant way. We're talking about electronic fax machines. We should enjoy greater flexibility with electronic data.
So, how can you use WAIS? I see two basic uses. Most of the queries WAIS gets are probably one-time shots where the user has a question and wants to see whether WAIS stores any information that can provide the answer. This use has much in common with the way reference librarians work -- someone comes in, asks a question, gets an answer, and leaves.
More interesting for the future of electronic publishing is a second use, that of periodic information requests. As I said earlier in this book, most people read specific sections of the newspaper and, even within those sections, are choosy about what they do and don't read. I, for instance, always read the sports section but I am interested only in baseball, basketball, football to a lesser extent, and hockey only if the Pittsburgh Penguins are mentioned. Even within the sports I follow closely, baseball and basketball, I'm more interested in certain teams and players than others.
Rather than skim through the paper each Sunday to see whether anything interesting happened to the teams or players I follow, I can instead ask a question of a WAIS-based newspaper system (which is conceivable right now, using the UPI news feed that ClariNet sells via Usenet). In fact, I might not ask only one question, but I may gradually come up with a set of questions, some specific, others abstract. Along with "What's happening with Cal Ripken and the Baltimore Orioles?" could be "Tell me about the U.S. economy."
In either case, WAIS would run my requests periodically, every day or two, and indicate which items in the list are new. Ideally, the actual searching would take place at night, to minimize the load on the network and to make the search seem faster than the technology permits. Once again, this capability is entirely possible today; all that's lacking for common usage are the vast quantities of information necessary to address everyone's varied interests. Although the amount of data available in WAIS is still limited (if you call 700-plus sources limited), serious and important uses are already occurring.
Note: There's a project, probably destined to be commercial, now in testing that will provide this sort of electronic newspaper. I'm not quite sure yet how WAIS is involved in it, but there are some links. Check out this URL for more information:

http://www.ensemble.com/

In large part due to its corporate parentage, the WAIS project has been careful to allow for information to be sold and for owners of the information to control who can access the data and when. Although not foolproof, the fact that WAIS addresses these issues makes it easier to deal with copyright laws and information theft.
Because of the controls WAIS allows, information providers are likely to start making sources of information more widely available. With the proliferation of these information sources, it will become harder for the user to keep track of what's available. To handle that problem, WAIS incorporates a Directory of Servers, which tracks all the available information servers. Posing a question to the Directory of Servers source (a source is what WAIS calls a set of information; the terms source and server are used almost interchangeably) returns a list of servers that may have information pertaining to your question. You can then easily ask the same question of those servers to reach the actual data.
Most of the data available on WAIS is public and free at the moment, and I don't expect that arrangement to change. I do expect more commercial data to appear in the future, however.
In regard to that issue I want to propose two ideas. First, charges should be very low to allow and encourage access, which means that profit is made on high volume rather than high price. Given the size of the Internet, I think this approach is the way to go, rather than charging exorbitant amounts for a simple search that may not even turn up the answer to your question.
Second, I'd like to see the appearance of more "information handlers," who foot the cost of putting a machine on the Internet and buying WAIS server software and then, for a percentage, allow others to create information sources on their server. WAIS, Inc. already provides this service, but I haven't heard of much competition yet. That service enables a small publisher to make, say, a financial newsletter available to the Internet public for a small fee, but the publisher doesn't have to go to the expense of setting up and maintaining a WAIS server. This arrangement will become more commonplace; the question is when? Of course, as the prices of server machines, server software, and network connections drop, the number of such providers will increase.
WAIS has numerous client interfaces for numerous platforms, but you'll probably use either a simple VT100 interface via Telnet or, if you have a MacTCP link to the Internet, a program called MacWAIS. When evaluating WAIS client programs, keep in mind my comments about the two types of questions and about relevance feedback. A WAIS client should make it easy to ask a quick question without screwing around with a weird interface, and it should also enable you to save questions for repeated use (as in the electronic newspaper example). Similarly, with relevance feedback, the act of pointing and saying, "Find me more like this one that I'm pointing at" should be as simple as possible, without making you jump through hoops.
Finally, none of the WAIS clients I've seen provide a simple method of keeping track of new sources as they appear, not to mention keeping track of which sources have gone away for good.
In direct contrast to WAIS, Gopher originated in academia at the University of Minnesota, where it was intended to help distribute campus information to staff and students. The name is actually a two-way pun (there's probably a word for that) because Gopher was designed to enable you to "go fer" some information. Many people probably picked up on that pun, but the less well-known one is that the University of Minnesota is colloquially known as the home of the Golden Gophers, the school mascot. In addition, one of the Gopher Team members said that there are gophers living outside their office.
Note: Calling yourself the Golden Gophers makes more sense than calling yourself the Trojans, not only considering that the Trojans were one of the most well-known groups in history that lost, but also considering that they lost the Trojan War because they fell for a really dumb trick. "Hey, there's a gigantic wooden horse outside, and all the Greeks have left. Let's bring it inside!" Not a formula for long-term survival. Now, if they had formed a task force to study the Trojan Horse and report back to a committee, everyone wouldn't have been massacred. Who says middle management is useless? Anyway, I digress.
The point of Gopher is to make information available over the network, much in the same way that FTP does. In some respects, Gopher and FTP are competing standards for information retrieval, although they serve somewhat different purposes. Gopher only works for retrieving data; you cannot use it to send data. Also, there's no easy way to give Gopher users usernames and passwords so only they can access a Gopher site.
Gopher has several advantages over FTP. First, it provides direct access to far more types of information resources than FTP. Gopher provides access to online phone books, online library catalogs, the text of the actual files, databases of information stored in WAIS, various email directories, Usenet news, and Archie. Second, Gopher pulls all this information together under one interface and makes it all available from a basic menu system.
Note: Menu items on a Gopher server are not Macintosh menus, but list items in a Macintosh window under TurboGopher. Keep that in mind, and you'll be fine.
If you retrieve a file via FTP and the file gives you a reference to another FTP server, you as the user must connect to that site separately to retrieve any more files from there. In contrast, you connect to a single home Gopher server, and from there, wend your way out into the wide world of Gopherspace without ever having to consciously disconnect from one site and connect to another (although that is what happens under the hood). Gopher servers almost always point at each other, so after browsing through one Gopher server in Europe, you may pick a menu item that brings you back to a directory on your home server. Physical location matters little, if at all, in Gopherspace.
Gopher has also become popular because it uses less net bandwidth than standard FTP. When you connect to a Gopher server, the Gopher client software actually connects only long enough to retrieve the menu, and then it disconnects. When you select something from the menu, the client connects again, so quickly that you barely notice it wasn't connected -- and wasn't using any net bandwidth -- in the meantime. Administrators like Gopher for this reason: they don't have to devote as much computing power to providing files to Internet users.
Note: There's actually no reason why FTP servers couldn't be rewritten to work this way, as well. Jim Matthews, the author of Fetch, is always going on about how writing an FTP server that used something called lightweight threads would make FTP more efficient. In the meantime, Peter Lewis's Anarchie FTP client for the Mac works much like a Gopher client in that it is continually connecting again and again to your target FTP site, enabling you to perform more than one FTP task at a time.
Several Gopher clients exist for the Macintosh. The one written by the Gopher programmers themselves is arguably the best Gopher client for any platform. They claim that it's the fastest over slow connections, and although I haven't used clients on other platforms, TurboGopher is certainly fast. You also can access Gopher via Telnet and a VT100 interface. It's nowhere near as nice (it's slower, you can only do one thing at a time, and you cannot view pictures and the like online), but it works if you don't have MacTCP-based access to the Internet.
The most important adjunct to Gopher is a service called Veronica, developed by Steve Foster and Fred Barrie at the University of Nevada. Basically, Veronica is to Gopher what Archie is to FTP -- a searching agent; hence, the name.
Note: Veronica stands for Very Easy Rodent-Oriented Net-wide Index to Computerized Archives, but apparently the acronym followed the name.
Veronica servers work much like Archie servers. They tunnel through Gopherspace recording the names of available items and adding them to a massive database.
You usually find a Veronica menu within an item called Other Gopher and Information Servers, or occasionally simply World. When you perform a Veronica search, you either look for Gopher directories, which contain files, or you look for everything available via Gopher, which includes the files and things like WAIS sources as well. There are only a few Veronica servers in the world (between four and six, depending on which machines are up), so you may find that the servers are heavily overloaded at times, at which point they'll tell you that there are too many connections and that you should try again later. Although it's not as polite as I'd like, I find that using the European Veronica servers during their night is the least frustrating.
It's definitely worth reading the "Frequently Asked Questions about Veronica" document that lives with the actual Veronica servers. It provides all sorts of useful information about how Veronica works, including the options for limiting your search to only directories or only searchable items. You can use Boolean searches within Veronica, and there are ways of searching for word stems -- that is, the beginning of words. So, if you wanted to learn about yachting, you could search for "yacht*." The possibilities aren't endless, but Veronica is utterly indispensable for navigating Gopherspace and for searching on the Internet in general.
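To make that concrete, here are two hypothetical searches of the sort the FAQ describes (the -t1 flag limits the results to directories; the note below mentions the related -t7 flag):

yacht* and (race or regatta)
macintosh and software -t1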
Getting sick of the Archie Comics puns yet? They just keep coming and, like Veronica, I somehow doubt that this acronym came before the name. Jughead stands for Jonzy's Universal Gopher Hierarchy Excavation And Display. Jughead does approximately the same thing as Veronica, but if you've ever done a Veronica search on some generic word, you know that Veronica can provide just a few too many responses (insert sarcasm here). Jughead is generally used to limit the range of a search to a certain machine, and to limit it to directory titles. This makes Jughead much more useful than Veronica if you know where you want to search, or if you're only searching on a Gopher server that runs Jughead.
I don't use Jughead all that much, because what I like about the massive number of Veronica results is that they often give me a sense of what information may exist on any given topic. I suppose that if I regularly performed fairly specific searches on the same set of Gopher servers, I'd use Jughead more.
Note: The best way to find a generally accessible Jughead server is to do a Veronica search on "jughead -t7." That returns a list of all searchable Jughead servers, rather than all the documents and directories in Gopherspace that contain the word "jughead."
The World Wide Web is the most recent and ambitious of the major Internet services. The Web was started at CERN, a high-energy physics research center in Switzerland, as an academic project. It attempts to provide access to the widest range of information by linking not only documents made available via its native HTTP (HyperText Transfer Protocol), but also additional sources of information via Usenet news, FTP, WAIS, and Gopher. The Web tries to suck in all sorts of data from all sorts of sources, avoiding the problems of incompatibility by allowing a smart server and a smart client program to negotiate the format of the data.
Note: CERN doesn't stand for anything any more, but it was once an acronym for a French name (Conseil Européen pour la Recherche Nucléaire).
In theory, this capability to negotiate formats enables the Web to accept any type of data, including multimedia formats, once the proper translation code is added to the servers and the clients. And, when clients don't understand the type of data that's appearing, such as a QuickTime movie, for instance, they generally just treat the data as a generic file, and ask another program to handle it after downloading.
The theory behind the Web makes many things possible, such as linking into massive databases without modifying the format in which they're stored, thereby reducing the amount of redundant or outdated information stored on the nets. It also enables the use of intelligent agents for traversing the Web. But what the Web really does for the Internet is take us one step further toward total ease of use. Let's think about this evolution for a minute.
FTP simply transfers a file from one place to another -- it's essentially the same thing as copying a file from one disk to another on the Mac. WAIS took the concept of moving information from one place to another, and made it possible for client and server to agree on exactly what information is transferred. When that information is searched or transferred, you get the full text without having to use additional tools to handle the information. Gopher merged both of those concepts, adding in a simple menu-based interface that greatly eased the task of browsing through information. Gopher also pioneered the concept of a virtual space, if you will, where any menu item on a Gopher server can refer to an actual file anywhere on the Internet. Finally, the World Wide Web subsumes all of the previous services and concepts, so it can copy files from one place to another; it can search through and transfer the text present in those files; and it can present the user with a simple interface for browsing through linked information.
But aside from doing everything that was already possible, the World Wide Web introduced four new concepts. The first one I've mentioned already -- it's the capability to accept and distribute data from any source, given an appropriately written Web server.
Second, the Web introduced the concept of rich text and multimedia elements in Internet documents. Gopher and WAIS can display the text in a document, but they can't display it with fonts and styles and sizes and sophisticated formatting. You're limited to straight, boring text (not that it was boring when it first appeared, I assure you). With the Web, you can create HTML (short for HyperText Markup Language) documents that contain special codes that tell a Web browser program to display the text in various different fonts and styles and sizes. Web pages (that's what documents on the Web are generally called) also can contain inline graphics -- that is, graphics that are mixed right in with the text, much as you're used to seeing in books and magazines. And finally, for something you're not used to seeing in books and magazines, a Web page can contain sounds and movies, although sound and movie files are so large that you must follow a link to play each one.
Link? What's a link? Ah, that's the third concept that the Web brought to the Internet. Just as an item in a Gopher menu can point to a file on another Internet machine in a different country, so can Web links. The difference is that any Web page can have a large number of links, all pointing to different files on different machines, and those links can be embedded in the text. For instance, if I were to say in a Web page that I have a really great collection of penguin pictures stored on another Web page (and if you were reading this on the Web and not in a book), you could simply click on the underlined words to immediately jump to that link. Hypertext arrives on the Internet.
Hmm, I should probably explain hypertext. A term coined by Ted Nelson many years ago, hypertext refers to nonlinear text. Whereas you normally read left to right, top to bottom, and beginning to end, in hypertext you follow links that take you to various different places in the document, or even to other related documents, without having to scan through the entire text. Assume, for instance, that you're reading about wine. There's a link to information on the cork trees that produce the corks for wine bottles, so you take that link, only to see another link to the children's story about Ferdinand the Bull, who liked lying under a cork tree and smelling the flowers. That section is in turn linked to a newspaper article about the running of the bulls in Pamplona, Spain. A hypertext jump from there takes you to a biography of Ernest Hemingway, who was a great fan of bull fighting (and of wine, to bring us full circle). This example is somewhat facetious, but hopefully it gives you an idea of the flexibility a hypertext system with sufficient information, such as the World Wide Web, can provide.
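Before we get to the fourth concept, here's a minimal sketch of what the HTML source for that penguin page might look like (the title, filename, and URL are invented for the example):

<HTML>
<HEAD><TITLE>Penguin Pictures</TITLE></HEAD>
<BODY>
<H1>Penguin Pictures</H1>
<P>I keep a really great collection of
<A HREF="http://www.example.com/penguins.html">penguin pictures</A>
on another page.</P>
</BODY>
</HTML>

Your Web browser never shows you those codes; it reads them and displays the words penguin pictures as an underlined link that you can click to jump to the other page.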
Fourth, the final new concept the Web introduced to the Internet is forms. Forms are just what you would think -- online forms that you can fill in -- but on the Internet they become tremendously powerful, making possible all sorts of applications ranging from surveys to online ordering to reservations to searching agents to who knows what. Forms are extremely useful and are increasingly heavily used on the Web for gathering information in numerous contexts.
For some time, the Web lacked a searching agent such as Archie or Veronica, a major limitation because the Web is so huge. However, a number of searching agents have appeared, and although they simply don't feel as successful as Veronica yet, I suspect that's merely because I'm less used to them. You can find a page of the Web searching agents at:
http://cuiwww.unige.ch/meta-index.html
In addition, a number of useful subject catalogs have sprung up; currently my favorite one is called Yahoo, and it can be accessed at:

http://www.yahoo.com/
You can access the Web via a terminal and a VT100 interface, or even via email (which is agonizingly slow), but for proper usage, you must have a special browser.
Note: To try the Web via email, send email to listproc@www0.cern.ch with the command www in the body of the message.
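I believe you can also request a specific page by following the www command with a URL -- www http://www.wired.com/, for instance -- but check the help information that comes back from the plain www command before relying on that, since I haven't tested every variation.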
When you're evaluating Web browsers, there are a number of features to seek. The most important is one that seems obvious: an easy way to traverse links.

Since much of the point of a Web browser is displaying styled text, a browser should also give you the ability to change the fonts to ones on your Mac that you find easy to read. HTML documents don't actually include references to Times and Helvetica; they encode text in certain styles, much as a word processor or page layout program does. Then, when your Web browser reads the text of a Web page, it decodes the HTML styles and displays them using the fonts that are available. Sometimes the defaults are ugly, so I recommend playing with them a bit.

Many, if not most, Web pages also contain graphics, which is all fine and nice unless you're the impatient sort who dislikes waiting for graphics to travel over a slow modem. A browser should have an option to turn off auto-loading of images, or at least let you move on before the images have finished loading. Beyond that, you should be able to do anything you can do in a normal Mac application, such as copy and paste; you should be able to save a hotlist, preferably hierarchical, of Web sites that you'd like to visit again; and you should be able to easily go back to previously visited pages without having to reload them over the Internet.
As I said previously, there are a number of ways to access the Web. But frankly, if you use a Mac and don't have access to a MacTCP-based connection, you'll miss out on the best parts, even if you can see the textual data in a VT100 browser such as Lynx.
That should do it for the background material about the various TCP/IP Internet services, such as FTP, Telnet, Gopher, WAIS, the World Wide Web, and a few other minor ones like IRC and MUDs. Feel free to flip back here and browse if you're confused about basic usage or what might be important to look for in a client program.
Enough about all the Internet services. But, before we go on to talk about ways you can get Internet access, I should explain all the different file formats that you run into on the Internet. They're a source of confusion for many new users, so let's move on to chapter 10, "File Formats."