Beyond Google – Deep Web Searching for 21C Learners

While Google is the most popular search engine and the quality of its results keeps improving, it can still search only a fraction of the information available on the web.  In mid-2008, Google added the trillionth address (1,000,000,000,000 web pages) to its list of known pages, and yet this is only the tip of the iceberg compared to what is available (Wright, 2009).

This vast amount of information, which is inaccessible to Google and other popular search engines, is collectively referred to as the deep web, the hidden web or the invisible web, in contrast to the documents which these search engines can access, collectively referred to as the surface web or the visible web.

No one knows for sure how big the deep web is, but all agree that it is vast.  According to one estimate, the “total quality content of the deep web is at least 1,000 to 5,000 times greater than that of the surface web” (CompletePlanet), while the Kosmix website states that “experts estimate that search engines can access less than 1% of the data available on the Web”.  Whichever figure is closer to the truth, both indicate that there is an enormous amount of quality information which our students are not tapping into if they use only Google for their research.

Harvesting the Deep and Surface Web with a Directed Query Engine (Michael Bergman)

Maureen Henninger, author of The Hidden Web: Finding quality information on the net, describes the deep web as “publicly accessible, non-proprietary pages that are not ‘seen’ by the spiders of general search engines” (2008, p. 162).  Dr Marcus Leaning of the University of Winchester divides the web into three categories:

• Free, visible web – designed for the web and for being searched.

• Free, invisible web – resistant to being searched; sites require their own search engine.

• Not free, invisible web – closed networks that require their own search engines or need passwords. (manipulating-media.co.uk, 27/08/2010)

Much of the information contained in the deep web is referred to as ‘grey literature’ or ‘white papers’, which can be defined as “working documents, pre-prints, research papers, statistical documents, and other difficult-to-access materials that are not controlled by commercial publishers” (LAOAP).  Producers of grey literature include research groups, not-for-profit groups, universities and government departments.

One reason why Google is not able to access many of these documents within the deep web is that they are often located in databases that require their own search engines.  If you know the name of the database, Google can take you to it, but it cannot search the database to retrieve information from it.  Our BGS subscription databases, accessible through MyGrammar, form part of the deep web.  They contain high-quality academic information that Google cannot retrieve, so it is essential that our students learn to search here first when researching for assignments.
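For readers who like to see the idea in action, here is a minimal sketch in Python of why a link-following spider misses database content. Everything in it is made up for illustration: the page names, the two database records and the functions `crawl` and `search_database` are assumptions, not a description of how Google or any real database works.

```python
from collections import deque

# Hypothetical toy website: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "database-search-form"],
    "about": ["home"],
    "database-search-form": [],  # a search box, with no links to individual records
}

# Hypothetical records that only appear in answer to a query typed into the form.
DATABASE = {
    "coral reefs": "record-101",
    "roman roads": "record-102",
}


def crawl(start):
    """Follow links breadth-first, roughly as a general search engine spider does."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def search_database(query):
    """Ask the database's own search engine for a record (case-insensitive)."""
    return DATABASE.get(query.lower())


surface = crawl("home")
print(sorted(surface))                 # the pages the spider can reach
print("record-101" in surface)         # is the record among them?
print(search_database("Coral reefs"))  # the database's own search finds it
```

The spider reaches the search form, because a link points to it, but never the records behind it, because no link leads to them. Only a query typed into the database's own search returns them.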

There are also search engines and subject gateways specifically designed to access the deep web, including Intute, Incy Wincy, Science Accelerator, BUBL, WWW Virtual Library and Infomine.  At BGS, we encourage the boys to use these search engines in conjunction with our subscription databases and Google Scholar to find quality, peer-reviewed academic information for their assignments.

Read more:
Kosmix. Deep Web. Retrieved from http://www.kosmix.com/topic/deep_web/overview/uc_kosmixarticle_cached

Henninger, M. (2008). The hidden web: Finding quality information on the net (2nd ed.). Sydney: UNSW Press.

BrightPlanet  http://brightplanet.com/the-deep-web/deep-web-faqs/

Wright, A. (2009, February 23). Exploring a ‘Deep Web’ that Google can’t grasp. New York Times. Retrieved from http://www.nytimes.com/2009/02/23/technology/internet/23search.html?_r=1&ref=business

Bergman, M. (2001). The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing. Retrieved from http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104

Leaning, M. (22-10-2010). Searching for and finding new information – Desk research – tools, strategies and techniques. Retrieved from http://manipulating-media.co.uk/2010/08/27/searching-for-and-finding-new-information-desk-research/

Deep Web Video - Office of Scientific and Technical Information
