Data Search Engine
Introduction. The World Wide Web was first developed by Tim Berners-Lee and his colleagues in 1990. In just over a decade, it has become the largest information source in human history. The total number of documents and database records that are accessible via the Web is estimated to be in the hundreds of billions (1).
By the end of 2005, there were already over 1 billion Internet users worldwide. Finding information on the Web has become an important part of our daily lives. Indeed, searching is the second most popular activity on the Web, behind e-mail, and about 550 million Web searches are performed every day.
The Web consists of the Surface Web and the Deep Web (also called the Hidden Web or Invisible Web). Each page in the Surface Web has a logical address called a Uniform Resource Locator (URL), which allows the page to be fetched directly. In contrast, the Deep Web contains pages that cannot be fetched directly, as well as database records stored in database systems. The Deep Web is estimated to be over 100 times larger than the Surface Web (1).
The tools that we use to find information on the Web are called search engines. Today, over 1 million search engines are believed to be operational on the Web (2). Search engines may be classified based on the type of data that are searched. Search engines that search text documents are called document search engines, whereas those that search structured data stored in database systems are called database search engines.
Many popular search engines, such as Google and Yahoo, are document search engines, whereas many e-commerce search engines, such as Amazon.com, are considered to be database search engines. Document search engines usually have a simple interface with a text box in which users enter a query, typically a few keywords that reflect the user’s information needs. Database search engines, on the other hand, usually have more complex interfaces that allow users to enter more specific and complex queries.
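The contrast between the two query styles can be illustrated with a minimal sketch. The data, the function names keyword_search and structured_search, and the matching rules (every keyword must appear; exact author match and a price ceiling) are all assumptions made for illustration, not a description of how any particular search engine works.

```python
from typing import Dict, List, Optional

# Hypothetical toy document collection: document id -> text.
documents: Dict[str, str] = {
    "doc1": "metasearch engines combine results from several search engines",
    "doc2": "database systems store structured records",
}

def keyword_search(query: str, docs: Dict[str, str]) -> List[str]:
    """Document-style search: return ids of documents containing every query keyword."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    )

# Hypothetical structured records, as an e-commerce book database might hold.
books: List[dict] = [
    {"title": "Intro to Retrieval", "author": "Lee", "price": 40.0},
    {"title": "Database Basics", "author": "Kim", "price": 25.0},
    {"title": "Web Search", "author": "Lee", "price": 60.0},
]

def structured_search(
    records: List[dict],
    author: Optional[str] = None,
    max_price: Optional[float] = None,
) -> List[str]:
    """Database-style search: filter records on specific fields."""
    titles = []
    for record in records:
        if author is not None and record["author"] != author:
            continue
        if max_price is not None and record["price"] > max_price:
            continue
        titles.append(record["title"])
    return titles

doc_hits = keyword_search("search engines", documents)
book_hits = structured_search(books, author="Lee", max_price=50.0)
```

The keyword query is a single free-text string, while the structured query names the fields it constrains, which is why database search interfaces tend to need more than one input box.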
Most search engines cover only a small portion of the Web. To increase the coverage of the Web by a single search system, multiple search engines can be combined. A search system that uses other search engines to perform the search and combines their search results is called a metasearch engine. Mamma.com and dogpile.com are metasearch engines that combine multiple document search engines, whereas addall.com is a metasearch engine that combines multiple database search engines for books. From a user’s perspective, there is little difference between using a search engine and using a metasearch engine.
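The basic idea of a metasearch engine, forwarding a query to several component engines and combining their results, can be sketched as follows. This is only an illustration under stated assumptions: the component engines are hypothetical stand-in functions that return fixed ranked lists, and the reciprocal-rank merging rule is one simple choice among many, not the method used by any of the systems named above.

```python
from typing import Callable, Dict, List

# A component engine is modeled as a function: query -> ranked list of URLs.
SearchEngine = Callable[[str], List[str]]

def metasearch(query: str, engines: Dict[str, SearchEngine], top_k: int = 10) -> List[str]:
    """Send the query to every component engine and merge the ranked lists."""
    scores: Dict[str, float] = {}
    for engine in engines.values():
        for rank, url in enumerate(engine(query), start=1):
            # Assumed merging rule: each engine contributes 1/rank to a result's score,
            # so results returned by several engines accumulate a higher score.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))[:top_k]

# Hypothetical component engines returning fixed results.
def engine_a(query: str) -> List[str]:
    return ["a.example/1", "shared.example/x", "a.example/2"]

def engine_b(query: str) -> List[str]:
    return ["shared.example/x", "b.example/1"]

merged = metasearch("search engines", {"A": engine_a, "B": engine_b})
```

In this toy run, the result returned by both engines rises to the top of the merged list, and the merged list covers more results than either engine alone, which reflects the coverage argument made above.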
This article provides an overview of some of the main methods that are used to create search engines and metasearch engines. In the next section, we describe the basic techniques for creating a document search engine. Then we sketch the idea of building a database search engine. Finally, we introduce the key components of metasearch engines, including both document metasearch engines and database metasearch engines, and the techniques for building them.
Date added: 2024-07-23