Google Scholar

I. Google Scholar and the limits of Google Search

The popular belief that Google provides a comprehensive, universal search experience is inaccurate. While the company’s search tools often do an impressive job of finding useful information, they are unfortunately incapable of finding (or of providing free access to) some resources crucial for academic research.

In this conference we provide an overview of what can and cannot be found using Google Search, take a look at Google Scholar (which has broadened the range of information accessible via Google), and explore some additional Web tools for finding content of interest to scholars.

II. What can be found with Google?

In a nutshell, Google Search searches the following resources for the terms entered by users:¹

It does not search the following resources that are frequently used for scholarly research:²


Why is this the case?

There are several reasons why Google is unable to find certain types of material:

1. They are locked behind publisher firewalls that repel the “crawler” programs that Google uses to gather information about pages so they can be found by its search engines.

2. The articles in many databases and Web sites accessible only by subscription do not have permanent URLs. Instead, session-specific URLs that include information about the password used to access the database/site or other information are generated each time a user views an article. These URLs expire when the user logs out and cannot be reused. Google’s search tools are only capable of recording and pointing to stable URLs.

3. The information simply is not available digitally.
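
The second reason in the list above can be made concrete with a small sketch. The Python below is illustrative only: the parameter names in SESSION_PARAMS and the example.org URLs are hypothetical, and real subscription databases embed session information in many different ways. It simply flags URLs whose query string carries session-like parameters, the kind of address that expires at logout and therefore cannot serve as a stable pointer in a search index.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameter names that often carry per-session state.
# Real databases use many different names; this set is illustrative only.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "token", "auth"}


def session_params(url):
    """Return the query parameters in `url` that look session-specific."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [name for name, _ in query if name.lower() in SESSION_PARAMS]


def looks_stable(url):
    """True if no session-like parameter is present in the URL."""
    return not session_params(url)


if __name__ == "__main__":
    for url in (
        "http://example.org/article?id=123",          # stable-looking
        "http://example.org/article?id=123&sid=abc",  # session-specific
    ):
        print(url, "->", "stable" if looks_stable(url) else "session-specific")
```

A check like this is only a heuristic. A URL with no session parameters can still sit behind a login, and some sites keep session state in the path rather than the query string.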

The vast amount of information in the so-called “invisible Web” (content that cannot be retrieved by search engines), along with material that has not been digitized, cannot at present be located directly with Google’s search tools.

Google Scholar (hereafter “Scholar”) is one of Google’s efforts to address this shortcoming in its information retrieval abilities.

III. Enter Scholar

Google’s founders have sought to make “high quality” print and copyrighted periodical content accessible via their search products since the founding of the company.³ As part of this effort, Google struck a deal in early 2004 with a number of large and well-known scholarly publishers, professional associations and government agencies which allowed its crawlers to temporarily access their previously “invisible” content and make it, or some information about it, available to Google searchers. Scholar was launched in November 2004 to make this content available to searchers, and the content began appearing in Google Search results in late 2006.

Anecdotal evidence suggests that Scholar began to quickly acquire the same undeserved reputation for providing universal access to content that Google Search enjoys.⁴ While the service represents an important acknowledgement by Google of the importance of scholarly literature and substantially broadens the reach of its search tools, there are a number of important limitations to its content and search abilities that need to be kept in mind when using it for research.

1. Key limitations of Scholar

There are additional concerns to bear in mind while using Scholar; we will take a look at these in the next section as we examine how to use Scholar and how to interpret your search results.

2. Accessing Scholar

Scholar can be accessed via this link on the Google Search page. It can also be linked to directly at http://www.google.com/scholar.

We strongly recommend linking via the UMUC library's Google Scholar page at http://www.umuc.edu/library/database/googlescholar.shtml:

Scholar results contain citations for, but not full text of, many articles. Linking to Scholar from the library site will cause links similar to the links in the library's databases to appear next to articles whose full text is available via our database subscriptions:

In many cases, choosing this option will cause hyperlinked article titles to link directly to subscription-only content in the databases to which UMUC subscribes. These linking options will not be available unless you access Scholar via the UMUC page shown above. This option is important to stress to students, who may dismiss useful articles that they have ready access to if they access Scholar through its URL or the link on the Google Search page.

Students may also think that they need to pay to access articles, as Scholar features prominent links to the British Library's fee-based document delivery service in its results sets:

Students and faculty never have to pay for access to articles, of course, as all articles not available through UMUC's database subscriptions can be requested at no charge via Document Express.

3. Searching Scholar

The basic Scholar search interface is shown above. To look more closely at the search options that are available, we will examine its advanced search, which can be accessed through this link: (*Click on the "Advanced Scholar Search" link in the image below to start a short film about searching with Scholar.)


Jacsó has noted that the date search feature is not entirely reliable,⁹ and our trial searches suggest that the publications and subject search features are not entirely effective: while they do cull some irrelevant results, they do not do so completely.

Experienced searchers will see immediately that Scholar's advanced search form gives us far fewer options for limiting searches than those offered by many subscription databases. Here, for example, is the advanced search page from the Academic Search Premier database:


The limited advanced search options and comparatively weak functionality offered by Scholar are due to the fact that it is incapable of true fielded searching. Although the data created by many of the publishers who provided content to the project was undoubtedly highly structured (with elements such as author, title, abstract, etc. tagged in a manner that allows the search tools of traditional databases to target searches on these fields), Google's crawlers are not designed to parse highly structured data, and instead treated this content as they do traditional HTML documents: as largely undifferentiated masses of text. As a result, fine-grained, targeted searching is impossible with Scholar.

Moreover, Scholar does not apply the authority control exercised by many databases traditionally used by scholars: variants of author names (e.g., “Smith, Mark E.” and “Smith, M.E.”) are not unified by tagging all articles by that author with a single, universally applied version of the author's name, and no subject terms are used to allow the bulk of the content on a particular subject to be retrieved via a single search. As a result, true subject searching is impossible and the author search is unreliable.
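
To make the idea of authority control concrete, here is a minimal Python sketch of what a database that applies it does behind the scenes. Everything in it is hypothetical: the AUTHORITY_FILE table, the function names, and the sample records are invented for illustration and do not describe any particular database. The point is only that each variant of a name is mapped to one authorized heading, so a single author search retrieves all of that author's items.

```python
# Hypothetical authority file: each known variant maps to one authorized heading.
AUTHORITY_FILE = {
    "Smith, Mark E.": "Smith, Mark E.",
    "Smith, M.E.": "Smith, Mark E.",
    "Smith, M. E.": "Smith, Mark E.",
}


def authorized_name(name):
    """Return the authorized heading, or the name unchanged if it is unknown."""
    cleaned = name.strip()
    return AUTHORITY_FILE.get(cleaned, cleaned)


def group_by_author(records):
    """Group (author, title) records under their authorized headings."""
    groups = {}
    for author, title in records:
        groups.setdefault(authorized_name(author), []).append(title)
    return groups


if __name__ == "__main__":
    records = [
        ("Smith, Mark E.", "Article A"),
        ("Smith, M.E.", "Article B"),
        ("Jones, A.", "Article C"),
    ]
    print(group_by_author(records))
```

Without the mapping step, "Article A" and "Article B" would be filed under two different author strings, which is the situation a searcher faces in Scholar.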

These shortcomings are by no means presented to discourage you from using Scholar in your research, or from recommending it to your students. As with Google Search, it can be a very useful tool, provided it is used in an informed manner. As Google does very little to inform its users about the “under the hood” workings of its search products, we simply hope to share some of what we have learned in using Google in our own work, and promote savvy searching by UMUC students and faculty.

4. Using search operators in Scholar

Of the search operators discussed in Module 2, only the following will work in Scholar:

To these, Scholar adds a new operator, author:

author: The “author:” operator invokes the author search described in the last section.

The lack of authority control in Scholar makes it necessary to search for multiple variants of an author's name to get a feel for how much material is truly available:
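
As a rough illustration of this kind of variant searching, the Python sketch below generates several common forms of an author's name and wraps each one in the “author:” operator. The function names and the particular set of variants are our own assumptions, not a documented Scholar feature; the resulting strings are simply queries you could paste into the Scholar search box one at a time.

```python
def author_variants(first, last, middle=""):
    """Return common ways an author's name may appear in citations."""
    first_initial = first[0]
    forms = [f"{first} {last}", f"{first_initial} {last}"]
    if middle:
        middle_initial = middle[0]
        forms += [
            f"{first} {middle_initial} {last}",          # Mark E Smith
            f"{first_initial} {middle_initial} {last}",  # M E Smith
            f"{first_initial}{middle_initial} {last}",   # ME Smith
        ]
    return forms


def author_queries(first, last, middle=""):
    """Wrap each name variant in Scholar's author: operator."""
    return [f'author:"{name}"' for name in author_variants(first, last, middle)]


if __name__ == "__main__":
    for query in author_queries("Mark", "Smith", middle="Edward"):
        print(query)
```

Comparing the number of results for each query gives a feel for how an author's work is scattered across name forms. The list of variants is not exhaustive, and a common surname with initials will also match other authors.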



In the next section we will look more closely at Scholar results sets.

5. Understanding Scholar search results

Here is a Scholar search results page. In it we can see the primary categories of documents that appear in Scholar results:

a. [BOOK]

Results tagged with “[BOOK]” may be full text of electronic books, excerpts from them, or simply citations from other documents that Scholar has indexed or from the online library catalogs of Google Scholar partner organizations. When full or partial text of a book is available online, its title will generally be hyperlinked as above.

b. [CITATION]

Results tagged with “[CITATION]” are citations found in other items in the Scholar collection (including, but not limited to, scholarly journal articles). The “[BOOK]” and “[CITATION]” tags are often used interchangeably.

c. [DOC]

Results tagged with “[DOC]” represent a wide variety of document types: preprints, newsletters and other informal publications of Scholar partner organizations, white and technical papers, conference proceedings, etc. There is currently no way to limit your searches to specific categories of documents, nor is there any list of the document types that Scholar has indexed.

d. Untagged

Results not identified by one of the tags above generally fall into one of three categories:

Titles of these items will often be hyperlinked, but links will not always direct to full text. If full text is not freely available through Scholar, or available in one of the databases UMUC subscribes to, the user will often be redirected to a page provided by the publisher giving information about the item. For example, clicking on the title here …

… takes users to this page:

These redirect pages often contain no information about how the full text of the document can be obtained. Finding full text of items in Scholar search results is not always a matter of simple click-and-read. Users need to be aware that Scholar can work in tandem with UMUC database subscriptions to provide access to full text, and that Document Express can provide free access to materials that are not readily available.

Below we will take a look at some of the links that appear in Scholar search results that you can use to find additional information about the items retrieved.

It is worth pausing here to note a feature of Scholar that gives it an advantage over many of the databases the UMUC library subscribes to: it retrieves materials in numerous foreign languages. Recent test searches found items in Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish and Swedish. It is not currently possible to limit searches to materials in a particular language, however, and a complete list of languages in which materials are available has not been provided by Google.

6. Links in Scholar search results

In addition to the UMUC library links and the “BL Direct” links noted above, the following links appear in Scholar search results:

a. Related Articles:

Clicking on the “Related Articles” link will execute a search for all other items (including those tagged as books, citations, and non-article documents) findable by Scholar that Google deems to be related. As with the “similar” search field in the Google Search advanced search form, the results found by this search will contain some irrelevant items and should not be considered comprehensive.

b. Web Search & Library Search

Clicking on “Web Search” runs a Google Search query for Web documents related to the item listed. “Library Search” performs a search for the item in WorldCat and identifies libraries near the searcher that own the print edition of that item.

c. Cited by …

Scholar purports to perform a citation count for each item, calculating the number of times that item has been cited in other sources. For a variety of reasons, including the way in which Google indexes the text of pages (see above), these figures are often inaccurate and need to be approached with caution.¹⁰

d. Cached

Links in Scholar results are at times broken, pointing to documents that have been removed from the servers of its partner organizations or have changed their URLs. Clicking on the “Cached” link will take the user to a snapshot of the page maintained on Google's servers, allowing access to the document, or to an older version of it. The “Cached” link appears highly inconsistently, however, and it is not available for all such documents.

e. group of …

Many items found by Scholar appear in multiple forms in its results (for example, as a preprint and as a published article). Clicking on the “group of” link will isolate all of these instances in a single results set. These sets will frequently include citations of the item found in other items indexed by Scholar as well as full text variants.

Please be aware that Scholar does not offer any way to sort, save, or email results that you find in your searches. Copying results directly from the browser window is the only way to record information about items found in your searches, other than recording your terms and retrying the search at a later date (results sets may change if new content is added to Scholar, however). The ability to manipulate and record results in many library databases gives them a distinct advantage over Scholar.

IV. Other tools for accessing content of interest to scholars: An overview

Lastly, we will take a brief look at other tools available to scholars for finding information online. Limited space prevents us from providing a complete overview, so we strongly encourage you to post a question in the Conferences section of the classroom or to contact us directly if you have questions about finding the information you need.

1. Periodical content

UMUC subscribes to over 140 databases that provide access to tens of thousands of articles from both scholarly and popular publications that cannot be located by Scholar searches. A complete list is available at: http://www.umuc.edu/library/database/ (Figure 1).

Lists of the best databases to use in researching specific subjects can be found at: http://www.umuc.edu/library/database/databases.shtml#subjects (Figure 2).

Figure 1. Figure 2.

The majority of these databases provide detailed information about the publications included, as well as the years for which coverage of those publications is provided. Search results can be saved, sorted, and emailed to users, and thorough indexing generally allows for finely grained, targeted searches.¹¹


2. Research Port: Google-like searching of library databases

Many students are frustrated by the fact that, while Google allows thousands of sources to be searched simultaneously through a single interface, library database searches often require them to log into multiple databases. Research Port, introduced by the University System of Maryland libraries in 2006, now allows up to 8 databases to be searched via one interface:

Research Port currently does not have an advanced search function and cannot search some databases, and the resources accessible through it cannot all be searched using the same subject headings. We still have a ways to go before it truly combines the rigor of traditional library search tools with the apparent ease of Google searching. We do suggest recommending it to students who may otherwise be resistant to using library tools in their research, however. It is also an excellent search shortcut for researchers who find they use the same set of databases frequently. Research Port can be accessed via a link on the databases page.

3. Grey literature

As noted above, Scholar is able to find grey literature from a variety of sources. While no single search tool is capable of a complete search of the many preprints, dissertations, theses, conference proceedings, white papers, technical reports, and other types of unpublished or informally published material of interest to scholars, Scholar is well worth using when looking for such material.

Other useful resources to consider using in your searches include:

CiteSeer and arXiv are only two examples of the dozens of free grey literature search tools that have appeared on the Web in recent years. The University System of Maryland libraries provide a helpful guide to these tools at: http://www.lib.umd.edu/ETC/preprints.html

Increased access to grey literature via the Web is causing it to play a more and more important role in communication and research in many disciplines.¹² We highly recommend taking the time to investigate the resources available in your field if you have not already done so. Please contact us if you have any questions.

4. Monographs

As we will see in the next module, Google's Book Search offers limited access to the information found in monographs. The full text of two important categories of books must be found by other means, however:

UMUC faculty and staff have access to the hundreds of thousands of books owned by the University System of Maryland. Stateside users can borrow books directly from the libraries or have books shipped to them at no cost, while users overseas can have content scanned and emailed to them. Millions more titles are available via interlibrary loan. University System of Maryland book collections are searched using catalogUSMAI: http://catalog.umd.edu/

A complete guide to searching for and obtaining books through the Maryland libraries can be found at: http://www.umuc.edu/library/tutorials/catalog/

Need help finding information?

UMUC librarians are available 24/7 to assist you and your students in using Scholar or other tools to search for information. We can be contacted by email, phone and chat:

Ask a Librarian: http://www.umuc.edu/library/help/ask.shtml

 

1 Noted in numerous sources and confirmed by our recent searches; see this presentation summary by University of Washington librarian and long-time Google-watcher (and blogger) Dean Giustini for one corroborating source. Google Scholar content began appearing in Google Search after Giustini's presentation, and this assertion is based solely on the experience of the authors.
2 Ibid.
3 See Toobin, 2007 for an entertaining overview of the company's ambitions.
4 See for example Terdiman, 2004.
5 Jacsó, 2005.
6 Ibid.
7 Ibid.
8 Neuhaus, C., Neuhaus, E., Asher, A., & Wrede, C. (2006). The depth and breadth of Google Scholar: An empirical study. portal: Libraries and the Academy 6 (2), 127-141. Retrieved February 15, 2007 from Project Muse database.
9 Jacsó, 2005.
10 See Jacsó, 2006 for a detailed analysis. Despite the seemingly arcane subject matter, this article is well worth reading if you use Scholar and work in (or are instructing students who will work in) a field where citation analysis of your work figures importantly in applying for research grants or lobbying for promotion. Meho, 2007, which surveys the current state of citation analysis, the shortcomings in traditional methods created by increased use of Web-accessible grey literature, and emerging strategies and tools, is also helpful.
11 Which is not to say that these databases are perfect tools, of course: Padilla, 2007, which details the impact of litigation and unreliable automated indexing on Web access to New York Times content, demonstrates that thorough research still depends on making informed use of multiple sources.
12 See Meho, 2007.

Optional Exercise: Practice Using Scholar

The Google Scholar forum demonstrated how to access content from academic journals online. In this exercise, we will use the resources available to the UMUC community, including Scholar, to find the full text of 2 scholarly articles:

  1. Link to Scholar via the UMUC site at: http://www.umuc.edu/library/database/googlescholar.shtml
  2. Choose 2 scholarly articles that you have used recently in your work and search for them using Scholar.
  3. If the articles are available, note whether they are freely accessible or available only as a result of UMUC subscriptions (look for the “Find it at UMUC” links in your results).
  4. If one or both of your articles cannot be found using Scholar, try searching for them in UMUC's databases: http://umuc.edu/library/database. Remember that you can use Journal Finder to determine if any of our databases contain full text from a particular publication.