Web Searching Tips

Overview

The Internet is an international network of networks linking computer to computer by way of TCP/IP (Transmission Control Protocol/Internet Protocol). The World Wide Web (WWW or Web for short) is the collection of linked pages and files delivered over the Internet, although the two terms are often used interchangeably. An IP address is a unique number that a computer must have in order to be on the Internet (e.g., 192.0.2.1). The Web has become a significant communications medium for the electronic exchange of files between computers for government, commerce, and personal use. Internet users can connect to regional, national, and international backbone networks through their local Internet Service Providers (ISPs) via twisted pair (dial-up) or CATV (cable) modems. Upon connecting to the Internet, the user has nearly unlimited connectivity to publicly accessible servers, with a huge repository of information stored as Web pages and files at their disposal. The information (access to files) is available to all Web users at no cost, other than that of subscribing to an ISP.
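As a small illustration of the address format mentioned above, the sketch below uses Python's standard ipaddress module to check whether a string is a well-formed IPv4 address: four numbers separated by dots, each between 0 and 255. The helper name is_valid_ipv4 and the sample addresses are assumptions chosen for illustration only.

```python
import ipaddress

def is_valid_ipv4(text):
    """Return True if text is a well-formed dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        # Raised when a part is missing, non-numeric, or outside 0-255.
        return False

print(is_valid_ipv4("192.0.2.1"))    # a documentation-only sample address
print(is_valid_ipv4("192.0.2.999"))  # 999 is outside the 0-255 range
```

The first call reports a valid address and the second does not, because no part of an IPv4 address may exceed 255.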

Most of the information on the Web is in the form of unstructured data; there is currently no universally adopted standard for structuring or indexing the data. Structured data makes use of invisible tags that describe the content that follows them, making the information “intelligent” and easier to search. Thus, the challenge is how to find the desired information in this milieu of unstructured data. In this paper, I describe the process of searching the Web and offer tips on how to find and retrieve information on a specific topic of interest.
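To make the contrast between structured and unstructured data concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The record, its tag names (book, title, author, subject), and the sample sentence are all hypothetical; they only illustrate how descriptive tags let a program ask for a specific field instead of hunting through free text.

```python
import xml.etree.ElementTree as ET

# Unstructured: the word "London" could be a city or a person.
unstructured = "The Call of the Wild by Jack London is about sled dogs."

# Structured: hypothetical tags say what each piece of text means.
record = (
    "<book>"
    "<title>The Call of the Wild</title>"
    "<author>Jack London</author>"
    "<subject>sled dogs</subject>"
    "</book>"
)

book = ET.fromstring(record)
author = book.findtext("author")    # ask for the author field directly
subject = book.findtext("subject")

print("London" in unstructured)     # a plain text match, meaning unknown
print(author)
print(subject)
```

With the tagged record, a search for an author named London can ignore pages that merely mention the city.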

Content of the Web

The primary vehicle for delivering information over the Internet is the Web page. A Web page is a text file with code that instructs the Web browser how to display the text on the page. Hypertext Markup Language (HTML) has been the standard language of the Web since its inception, although a number of other languages and technologies are used today to perform the same, and enhanced, functions (e.g., Dynamic HTML (DHTML), JavaScript, Active Server Pages (ASP), ColdFusion (CFM), Extensible Markup Language (XML), etc.).
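The idea that a Web page is a text file made of display instructions plus readable text can be sketched with Python's standard html.parser module. The sample page and the class name TextExtractor are assumptions for illustration; the sketch simply separates the text a reader sees from the HTML tags around it.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the readable text of a page and ignore the markup tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for the text found between tags.
        if data.strip():
            self.chunks.append(data.strip())

page = (
    "<html><body><h1>Siberian Huskies</h1>"
    "<p>Owning a <b>husky</b> takes patience.</p></body></html>"
)

parser = TextExtractor()
parser.feed(page)
parser.close()
visible_text = " ".join(parser.chunks)
print(visible_text)
```

The tags (h1, p, b) tell the browser how to present the words; only the words themselves end up in visible_text.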

In addition to displaying text, Web pages often have links to documents, audio, music (MP3), video clips, animations, and other Web pages. Users can click on a link, called a hyperlink, to open the file with the associated application, or download it to their computer. It is important to know that when performing searches on the Internet, only the text in selected Web pages that are stored in a search provider’s database can be searched (see Internet Search Providers and Search Engines). To aid the visitor in finding information, the Web site publisher will often provide a local search engine for finding and retrieving files that reside on their server, or simply list the files with hyperlinks.

Accessing the Web

To gain access to the Web, a user must purchase a dial-up or cable modem account from an Internet Service Provider (ISP) for connectivity. The ISP provides the user with a Web browser, a software application used for navigating the Web, which the user installs locally on their computer. The user can then log on to the Web by entering their account ID and password.

Web Browsers

The two most popular browsers are Netscape Navigator (Netscape) and Microsoft Internet Explorer (IE). These, and several other Web browsers, are available for download on the Web at no cost. Both Netscape and IE have a built-in feature for searching and displaying Web pages on computers connected to the Internet. The way that these search features work differs slightly for each browser; however, they both use the same search engines. These search engines, developed by Internet search providers, are available to all Web users at no cost. Before I go on to describe methods and tips on how best to search the Web, it is important to understand how a search engine works.

Internet Search Providers and Search Engines

The ongoing effort to provide an improved means for finding specific information on the Web has given rise to a large number of Internet search provider companies, of which there are now hundreds. Search providers are companies that have developed Web search engines designed to aid users in retrieving information of interest. This part of their service is free. A search provider makes money by selling Web visibility to individual Web sites, and through advertising.

When you perform a Web search using a provider’s search engine, such as Google (rated as one of the best search engines), you are actually searching a database that resides on Google’s server. The search provider’s database stores a recent “snapshot” of select “visible” Web pages; the search does not examine the Web directly. Therefore, when you search the Web, you are actually searching a copy of part of the Web. “Visible” pages are those that are listed in the search results of a search engine. “Invisible” pages are those that cannot be found by search engines for technical or policy reasons (e.g., password protection, or file formats such as PDF, AVI (video), and SWF (Flash)). The number of “invisible” Web pages far exceeds the number of “visible” pages, further complicating the search effort.

The search provider’s server contains full text Web pages selected from billions of pages published on Internet servers. Clicking a link in the search results opens the current version of the page at the source site (server). If the page has been changed, relocated, or deleted, since the last time the search provider’s database was updated—which can take as long as six months to complete—the results, if any, are often disappointing. 

A search engine works by searching for key words and phrases in unstructured data files, using a ranking algorithm that interprets and filters a search query. Many search engines (e.g., Google, AltaVista, and Yahoo) allow for defining more advanced searches using Boolean queries, where the terms AND, OR, AND NOT, or NEAR between words typed in the search field are used to further filter a search (see Tips on Improving the Precision of Your Web Search).
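The effect of the Boolean terms can be sketched with Python sets. This is a minimal illustration, not how any particular search engine is implemented: the three page texts and their addresses are made up, the helper name pages_with is hypothetical, and NEAR is left out for brevity.

```python
# Hypothetical miniature "database" of page texts keyed by made-up addresses.
pages = {
    "a.example/huskies": "owning a siberian husky dog",
    "b.example/cats": "owning a siamese cat",
    "c.example/sleds": "husky sled racing in siberia",
}

def pages_with(word):
    """Return the set of addresses whose text contains the given word."""
    return {url for url, text in pages.items() if word in text.split()}

both = pages_with("husky") & pages_with("owning")     # husky AND owning
either = pages_with("husky") | pages_with("cat")      # husky OR cat
without = pages_with("husky") - pages_with("sled")    # husky AND NOT sled

print(sorted(both))
print(sorted(either))
print(sorted(without))
```

AND narrows the results to pages containing both words, OR widens them to pages containing either word, and AND NOT removes pages containing the unwanted word.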

Search engine databases are maintained by robot computer programs called spiders. As the term implies, spiders continuously roam the Web, automatically updating or deleting Web pages already stored in the search engine database. Upon finding pages, another computer program identifies the text, links, and other content in the Web page, and then stores this information in the search engine database. This process, called indexing, assigns structure to the Web page data, allowing the user to search the database by typing a keyword, phrase, or Boolean query.
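Indexing as described above can be sketched as building a lookup table from each word to the pages that contain it (often called an inverted index). The snapshot contents, the addresses, and the function name build_index are assumptions for illustration; real search engines store far more than this.

```python
from collections import defaultdict

def build_index(snapshot):
    """Map each word to the set of page addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in snapshot.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Hypothetical snapshot of three pages captured by a spider.
snapshot = {
    "a.example/huskies": "Owning a Siberian husky",
    "b.example/sleds": "Husky sled racing",
    "c.example/cats": "Owning a Siamese cat",
}

index = build_index(snapshot)
husky_pages = sorted(index["husky"])
owning_pages = sorted(index["owning"])
print(husky_pages)
print(owning_pages)
```

Once the index exists, answering a keyword query is a table lookup rather than a fresh read of every stored page.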

Spiders are not “intelligent” applications, meaning they cannot decide on their own to visit new URLs (Uniform Resource Locators, the unique addresses of Web documents) to capture Web pages and store them in the database. If a new Web page is not linked from a page already stored in the database, a spider will not find it. The only way that such a page can be added to the search engine database is by human input. This is done at the request of the owner of the Web page to a particular search provider, usually for a fee.
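The limitation described above, that a spider can only reach pages by following links from pages it already knows, can be sketched as a breadth-first walk over a link graph. The graph, the page names, and the function name crawl are hypothetical, and no real network access is involved.

```python
from collections import deque

# Hypothetical link graph: each page lists the pages it links to.
links = {
    "home": ["about", "dogs"],
    "about": ["home"],
    "dogs": ["huskies"],
    "huskies": [],
    "orphan": ["home"],  # links out, but no known page links to it
}

def crawl(start_pages):
    """Follow links outward from the starting pages; return every page reached."""
    seen = set(start_pages)
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = crawl(["home"])
print(sorted(found))
print("orphan" in found)
```

The page named orphan is never reached from home, so it stays out of the results until someone adds it to the starting list, which mirrors the manual submission step described above.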

Using a Web Browser’s Search Feature to Find Web Pages

IE and Netscape Web browsers have a built-in search capability, and both allow you to choose from one of several search engines. This is a feature of convenience so that you do not have to go to the URL for a search provider in order to perform a search.

With a Web browser, a user can access a particular Web page in four ways:

1.     Typing in a known URL in the address field of the browser;

2.     Clicking the “Search” button (see Figure 1 and Figure 2), typing text (a word or phrase) on a topic of interest in the search box, and then clicking a link in the search results list (see Figure 3 and Figure 4);

3.     Accessing a search provider’s Web page and performing a search similar to the one described in the previous method 2;

4.     Clicking on a browser- or user-defined bookmark in the “Bookmarks” menu for Netscape, or the “Favorites” menu for IE.

This paper describes how to search the Web for information on specific topics using methods 2 and 3 above.

Figure 1: The Search button on Netscape’s Toolbar

Figure 2: The Search button on Internet Explorer’s Toolbar

Figure 3: The Search Query Dialog Box in Netscape

Figure 4: Internet Explorer’s Search Query Dialog Box

Note that by clicking on the “Search” button in IE, the window is split with the search results to the left, and the opened Web pages to the right.

A Strategy for Performing Web Searches

There is a tremendous amount of information available on the Web, and the challenge is in finding information on a particular subject. A search literally hunts for instances of words or phrases, page by page, in the search provider’s database. Therefore, in order to achieve the desired results, it is important that you give careful thought to defining the search query.

The University of California, Berkeley, has an excellent tutorial on the Internet (see Related Links). It describes a methodology for performing Web searches in conducting academic research. This methodology, presented in abbreviated form in the following paragraphs, can be applied to all types of Web searches.

Analyze the Subject

Before you haphazardly begin typing words and phrases into a search engine, which often returns thousands of links (hits), it is best to analyze the subject of interest and develop a search strategy. As an aid in analyzing the subject, the University of California, Berkeley offers a form with the following questions:
1.     What unique words, distinctive names, abbreviations, or acronyms are associated with your topic?
2.     Can you think of societies, organizations, or groups that might have information on your subject via their pages?
3.     What other words are likely to be in ANY Web documents on your topic?
4.     Do any of the words in 1, 2, or 3 belong in phrases or strings, together in a certain order, like a cliché?
5.     For any of the terms in #4, can you think of synonyms, variant spellings, or equivalent terms you would also accept in relevant documents?
6.     Can you think of any extraneous or irrelevant documents these words might pick up?
7.     What broader terms could your topic be covered by?

After giving some thought to the considerations above, you will be better prepared to begin a Web search, as explained in the following paragraphs.

Select a Search Engine

There are many search engines from which to choose, some designed to focus searches on particular subject areas, while others are more generic. Rather than using different ones each time, it is more efficient to choose one or two search engines that you are most familiar with and that have consistently proved useful in retrieving the desired information. If you simply cannot decide on one particular search engine, you can perform a search using multiple search engines at the same time through a provider such as Dogpile (dogpile.com).
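The idea of searching several engines at once can be sketched as merging their result lists into one, dropping duplicates. This is only an illustration under stated assumptions: the result lists are made up, the function name merge_results is hypothetical, and the round-robin merging rule is a simple stand-in, not a description of how Dogpile or any other provider actually combines results.

```python
def merge_results(*result_lists):
    """Interleave ranked result lists, keeping the first copy of each link."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

# Hypothetical ranked results from two different search engines.
engine_one = ["huskies.example", "sleds.example", "dogs.example"]
engine_two = ["sleds.example", "rescue.example"]

combined = merge_results(engine_one, engine_two)
print(combined)
```

Links that both engines return appear only once in the combined list, while links found by only one engine are still included.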

The following is a partial list of the more popular search engines from the hundreds that are available on the Web:
·        Google (http://www.google.com)
·        AltaVista (http://www.altavista.com)
·        Yahoo! (http://yahoo.com)
·        Excite (http://www.excite.com)
·        Overture (http://overture.com)
·        Lycos (http://www.lycos.com)
·        Dogpile (http://www.dogpile.com)
·        Webcrawler (http://webcrawler.com)
·        HotBot (http://hotbot.com)
·        Go (http://go.com/)
·        WhatUseek (http://www.whatuseek.com)
·        Northern Light (http://www.northernlight.com)
·        SearchIt (http://www.searchit.com)
·        AllTheWeb (http://www.alltheweb.com)
·        Magellan (http://www.magellan.com)
·        DirectHit (http://www.directhit.com)

Visit the AccuSubmit Web site for information on the ranking algorithms of six popular search engines at the following URL: http://accusubmit.com/secrets/engines.html.

Choose a Language

Most search engines allow you to filter your search query by language (e.g., English, French, Spanish, German, etc.). This helps narrow your search, but at the same time, it may prevent you from retrieving pages that are bilingual.

Select an Area of Interest

Most search providers have the option of searching within a particular area of interest to narrow search results. For example, Dogpile has a drop-down list for choosing a particular content category (e.g., Web, Files, Audio, News, Images, etc.). Other search providers (e.g., AltaVista and Webcrawler) have predefined (canned) search queries for topics of interest, including Autos, Computing, Games, Health & Fitness, Music, Travel, Sports, etc. These predefined search queries simplify the search process, but restrict the results to what the provider deems worthy. For academic and research purposes, these canned search queries are generally unsatisfactory.

Be Precise in Typing a Word or Phrase in the Search Box

As explained in the Internet Search Providers and Search Engines section, a search engine retrieves Web pages based on key words and phrases typed into the search box. Therefore, the results of your search query are fully dependent upon the accuracy of your entry. If your search definition is vague, you will get a greater number of ambiguous hits, making it difficult to identify which links have the information you seek. For example, let’s say you want to learn more about Siberian Huskies before bringing one home. Typing the word “Dog,” or “Siberian,” or “Husky” produces thousands of links in the search results, most of which are of little value. Instead, type the phrase “Siberian Husky,” or even better, type “owning a Siberian Husky.” You can experiment with different word combinations to achieve the desired results.
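The narrowing effect of a more precise phrase can be sketched with a handful of made-up page titles. The titles and the helper name hits are assumptions for illustration, and the matching rule (the query must appear as typed, ignoring capitalization) is a simplification of what a real search engine does.

```python
# Hypothetical page titles standing in for a search provider's database.
pages = [
    "husky brand tool chests on sale",
    "the washington husky football schedule",
    "owning a siberian husky: what to expect",
    "siberian husky rescue groups",
    "siberian tiger habitat",
]

def hits(query):
    """Return the pages whose text contains the query exactly as typed."""
    return [page for page in pages if query.lower() in page]

for query in ["Husky", "Siberian", "Siberian Husky", "owning a Siberian Husky"]:
    print(query, "->", len(hits(query)), "hits")
```

In this small sample each more specific query returns fewer pages, and the pages that remain are the ones closest to the topic.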

Tips on Improving the Precision of Your Web Search

Each search engine behaves differently based on its ranking algorithm. One of the first things you should do upon accessing a search provider’s Web page is to read any available instructions or tips on performing basic and advanced searches. After familiarizing yourself with the instructions for using a particular search engine, apply the following tips for improving the precision of your Web search:

·        Check for correct spelling of the word or words. When in doubt, refer to a dictionary or do a spell check in a word processing application.

·        Use a phrase rather than a single word. Type the words in the exact order you expect them to appear.

·        Enclose a phrase in quotes. By using quotes, only instances with an exact match of the phrase are in the search results. A phrase can be the name of a person, for example, “President Richard Nixon.”

·        Avoid nonessential words, such as “the,” “and,” “it,” “me,” and “of.” Using these words strains the search engine unnecessarily, slowing the search and resulting in a large number of useless hits.

·        Define the context of commonly used words using the Boolean “AND.” For example, “dogs AND owning,” “schools AND guns.” Instances where both words appear in the text will be listed in the search results, but not necessarily in the same order typed.

·        For search engines that support Boolean search queries, try using AND, OR, or NOT between words to include both words, include either word, or exclude the word, respectively.

·        Use the plus (+) and minus (-) symbols to refine your search. The plus symbol retrieves only pages that include the word that follows (similar to AND). The minus symbol discards pages that contain the word that follows (similar to NOT). For example, “kids+guns.” Do not use spaces between the words and symbol.

·        Be careful in using punctuation. For example, including a question mark might result in a “No matches found” message.

·        Use sub-searches to narrow the search results. Some search engines, such as AltaVista, have a “Search within these results” option for performing follow-on filtered searches. For example, after performing a kids+guns search, perform a second search within these results by typing “education” (see Figure 5).

·        Where available, select the “Show site abstracts” option, or equivalent. It is often easier to find the desired link in the search results by reading the abstract.

Figure 5: AltaVista—Search Within These Results
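Several of the tips above (quoted phrases, the plus symbol, and the minus symbol) can be sketched as a small query matcher. This is a simplified illustration, not the syntax of any particular search engine: it assumes each plus or minus is typed immediately before its word or quoted phrase, with terms separated by spaces, and the function names parse_query and matches are hypothetical.

```python
import re

def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    # Each token: an optional sign, then a quoted phrase or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(text, query):
    """Return True if the page text satisfies the query."""
    text = text.lower()
    required, excluded, optional = parse_query(query)
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or not optional or any(t in text for t in optional)

print(matches("Owning a Siberian Husky", '"siberian husky"'))
print(matches("Husky sled racing in Siberia", '"siberian husky"'))
print(matches("Siberian husky sled teams", "+husky -sled"))
```

The quoted phrase matches only when the words appear together in the order typed, and the minus symbol rejects a page even when the required word is present.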

Tips for Searching on an Opened Web Page

After opening a Web page from your search results (clicking on a link), it is often difficult to locate the instance of the word or phrase that led you to this page in the first place. Netscape and IE browsers provide a nifty feature for searching within an opened Web page as described in the following paragraphs.

Find in Page – Netscape

In Netscape, to search for a word or phrase in an opened Web page, follow the steps below:

1.        From the Edit menu, choose Find in Page to open a search dialog box (Figure 6).

2.        Type the exact text you want to find. You can type more than one word with a space between.

3.        Select the “Match case” checkbox to find only instances whose capitalization exactly matches the text you typed. (I normally leave the “Match case” checkbox unchecked to find all instances of this text whether or not there are capital letters.)

4.        Select the Up or Down radio button to search toward the beginning or the end of the page. If you have selected any text on the page (highlighted), the search only goes in one direction and does not wrap around to the beginning of the page.

5.        Click the Find Next button multiple times to locate each instance of the word or phrase, which once found, is highlighted.

Figure 6: Netscape’s Find in Page Search Dialog Box

Find in Page – IE

IE has a feature similar to Netscape’s for finding a word or phrase on an opened Web page:

1.     From the Edit menu, choose Find (on This Page) to open a search dialog box (Figure 7).

2.     Type the exact text of a whole or partial word or phrase you want to find. You can type more than one word with a space between.

3.     Select the “Match whole word only” checkbox to find only instances where the complete word matches.

4.     Select the “Match case” checkbox to find only instances whose capitalization exactly matches the text you typed.

5.     Select the Up or Down radio button to search toward the beginning or the end of the page. If you have selected any text on the page (highlighted), the search only goes in one direction and does not wrap around to the beginning of the page.

6.     Click the Find Next button multiple times to locate each instance of the word or phrase, which once found, is highlighted.

Figure 7: Internet Explorer’s Find on This Page Search Dialog Box
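The “Match case” and “Match whole word only” options described in the steps above can be sketched with Python's standard re module. This is an illustration of the behavior only, not the browsers' own code; the function name find_in_page, its parameters, and the sample sentence are assumptions.

```python
import re

def find_in_page(page_text, query, match_case=False, whole_word=False):
    """Return the starting position of every occurrence of query in page_text."""
    pattern = re.escape(query)            # treat the query as literal text
    if whole_word:
        pattern = r"\b" + pattern + r"\b"  # require word boundaries on both sides
    flags = 0 if match_case else re.IGNORECASE
    return [m.start() for m in re.finditer(pattern, page_text, flags)]

page = "Husky owners say a husky needs exercise; huskies love snow."

print(find_in_page(page, "husk"))                    # partial word, any case
print(find_in_page(page, "husky", match_case=True))  # lower-case only
print(find_in_page(page, "husk", whole_word=True))   # no complete word "husk"
print(find_in_page(page, "husky", whole_word=True))  # complete word, any case
```

Each position in the returned list corresponds to one click of the Find Next button.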

Using Search Engines on Web Pages

As a convenience to the visitor, large host Web sites often provide a search engine for locating information on their site, or within their LAN environment. These search engines vary in their usage and sophistication, and most have instructions.

Conclusion

The current architecture of the Web does not lend itself to performing well-defined, dynamic searches. This is why there are so many different search providers, search engines, and Web site file and database search options. In the future, with the adoption of structured XML technology, information will become more intelligent. When XML becomes the standard for the Internet, the cross-platform exchange of information and the ability to perform highly defined, accurate searches will become a reality. Until then, you will need to apply the Web search strategy and tips presented here to achieve satisfactory results.

Related Links

Tutorial from the University of California at Berkeley on using the Internet

Google’s The Basics of Google Search page has a very good explanation on performing basic and advanced searches using the Google search engine.

Microsoft has some basic tips on using Internet Explorer, searching the Internet, and finding words and phrases in a Web page.