Search: August 12, 2008

Search sub-group

August 12, 2008 at 3:00 p.m., Mathematics & Computer (MC) 1058

1. Administration

Present: Megan, Glenn, Lauren, Kevin, Bill

Minutes by: Bill

  • Lauren agreed to present a sub-group progress report at the next Web Advisory Group meeting (later rescheduled to September 11).
  • Pat was asked to schedule search training sessions. The target is two sessions in November: one for content contributors and one for web developers.

2. Updates on current search

  • recent problems with Google Application Programming Interface (API)
  • Stats are now installed on the "new" search; remind Glenn to add them to the clfscripts versions

There have been recent problems with the Google API returning no results; usage limits do not seem to be the cause. A fall co-op student will be assigned the task of getting search working with the new Google tools as well as the new Yahoo API.
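As a starting point for that co-op task, here is a minimal sketch of the kind of empty-result check involved, written against the documented REST endpoint of the Google AJAX Search API. The helper name and the site-restricted query are illustrative assumptions, not part of any existing script:

    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    # Hypothetical helper: run a site-restricted query against the Google
    # AJAX Search API (v1.0 REST endpoint) and flag the empty-result-set
    # symptom described above.
    def check_search(query):
        params = urlencode({"v": "1.0", "q": query + " site:uwaterloo.ca"})
        url = "http://ajax.googleapis.com/ajax/services/search/web?" + params
        with urlopen(url) as resp:
            data = json.load(resp)
        results = (data.get("responseData") or {}).get("results") or []
        if not results:
            print("Empty result set for %r (status %s)"
                  % (query, data.get("responseStatus")))
        return results

The same probe, pointed at the Yahoo API's endpoint, would let the two services be compared side by side.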

Glenn inserted the Google Analytics tracking code into both the new and old searches.

3. Web Advisory next week: what to take to the larger group

The last presentation was two meetings ago. The group should be updated on our objectives and our research into the Google appliance, and introduced to our preliminary report. What we really want feedback on is whether we need to take our search to the next level or whether the free tools are sufficient.

4. Search objectives/draft report

  • what do “comprehensive”, “accurate”, and “relevant” mean? Do the current tools satisfy these requirements? How do we evaluate this?
  • feedback on structure of the document

The objectives from the preliminary report were discussed. These represent the essential requirements for a University of Waterloo search engine.

The question was asked: what do “comprehensive”, “accurate”, and “relevant” mean? “Relevant” and “accurate” were deemed redundant. The discussion led to the need for some method of separating official and unofficial sites, which quickly moved to the question of Search Engine Optimization (SEO).

SEO is highly dependent upon the quality of the documents being indexed. Training is essential, and it was suggested that a document be prepared for the fall training sessions covering basic best practices, with an explanation of why each practice matters for SEO.

Discussion of manipulating results internally to compensate for poor-quality web pages was tempered by the realization that such tuning would not affect searches made on Google proper. Compensating for poor SEO by manipulating the tool is poor practice; the problem should be addressed at its source: poorly created pages. A collection of test searches, with explanations of why each fails because of poor SEO, would be highly useful.

Google says that it indexes only about 70% of a site. This guesstimate was questioned; Waterloo's coverage may be higher, but we have no way to confirm this.

When a new site is created and released, how long does it take Google to index it? Timeliness was added as a new objective to the preliminary report. Similarly, the question was asked: how quickly are expired pages removed from the index? With a local Google appliance, this type of control could be handled locally with some assurance.

Three tasks were assigned to the search group members for the next meeting (a sketch for probing the first and third follows the list):

  1. Continue to research the “comprehensive” nature of Waterloo’s indexing by Google. Try to find pages that are not included in the index and try to discover why they do not appear.
  2. Look for search results relating to “relevance” and again try to determine why pages appear near the top or not.
  3. See if you can determine the “timeliness” of Google indexing. How quickly do new pages appear?
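For tasks 1 and 3, here is a minimal sketch of how coverage and timeliness could be probed, again against the Google AJAX Search API. The URL list, the helper name, and the idea of running the script daily are assumptions for illustration:

    import json
    import time
    from urllib.parse import urlencode
    from urllib.request import urlopen

    # Hypothetical probe for tasks 1 and 3: query the index for each exact
    # URL and report whether it appears among the results. Running this
    # daily against newly published pages gives a rough indexing delay.
    URLS = [
        "http://www.uwaterloo.ca/",     # placeholder: an established page
        "http://www.uwaterloo.ca/new/", # placeholder: a newly released page
    ]

    def is_indexed(page_url):
        params = urlencode({"v": "1.0", "q": page_url})
        url = "http://ajax.googleapis.com/ajax/services/search/web?" + params
        with urlopen(url) as resp:
            data = json.load(resp)
        results = (data.get("responseData") or {}).get("results") or []
        return any(r.get("unescapedUrl") == page_url for r in results)

    stamp = time.strftime("%Y-%m-%d")
    for u in URLS:
        print("%s  %s  indexed: %s" % (stamp, u, is_indexed(u)))

Pages that consistently fail to appear are candidates for task 1's "why not?" investigation, and the date on which a new page first appears gives the timeliness figure for task 3.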

-- Megan McDermott - 12 Aug 2008