This page was last updated on 24 April 2001
The purpose of this document is to recommend a structure for name entry in various genealogical documents on the 'net, so that family historians can find matches as conveniently and quickly as possible, using appropriate and available resources. Because the web is changing at an incredible pace, this document reviews its current status and new features, and makes suggestions that seem sensible in light of present trends. It is primarily relevant to the United Kingdom. This is a draft with rather scrappy formatting, and it is incomplete.
Note - some of the links are obsolete
This document is made up of the following sections:
1. Introduction and basic assumption
One important assumption, or premise, is made: in the near future many, many out-of-copyright books will be scanned and put "on the web", where some of the search engines will index them. This process has already started with maps (e.g. Ancestry Home Town Daily).
This rapid, major increase in information is both an opportunity and a curse. It is an opportunity because people living in the 1700-1900 period will be mentioned in some of these books. It is also a curse because both users and the search engines themselves will have to be much more sophisticated for searching to be effective and successful. The writer believes that meta-tag, keyword and partial-document searching will be of minimal use for this application; it has already been reported that some search engines index 50,000,000 web pages (March 98) - ref 111.
For a few months now I have been experimenting with using the web to find a person, or ancestor, by direct search. I have found it not only possible, but quick, useful and informative. However, I have only found names that were placed in web pages in specific text formats. I can find Hugh REEKIE and Hugh Reekie quite easily, but finding Reekie, Hugh and REEKIE Hugh requires one or two separate searches. Even multiple searches on the two names may not readily find text such as "REEKIE - looking for ancestors of Hugh, Andrew and Ian Reekie"; here you would only get a result by searching for REEKIE, for Hugh, or for Ian Reekie. I am aware that word searches with "within adjacent text" constraints are sometimes possible, which would alleviate this problem somewhat.
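As a rough illustration of the two problems just described (names written in different orders and capitalizations, and a "within adjacent text" constraint), the Python sketch below generates the name orderings that otherwise need separate searches, and applies a simple case-insensitive proximity test. The function names and the eight-word window are assumptions chosen for illustration only; each real search engine applies its own matching rules.

```python
import re


def name_variants(forename: str, surname: str) -> list[str]:
    """Return common textual orderings of a name (illustrative, not exhaustive)."""
    f, s = forename.strip(), surname.strip()
    return [
        f"{f} {s}",            # Hugh Reekie
        f"{f} {s.upper()}",    # Hugh REEKIE
        f"{s}, {f}",           # Reekie, Hugh
        f"{s.upper()}, {f}",   # REEKIE, Hugh
        f"{s.upper()} {f}",    # REEKIE Hugh
    ]


def mentions_within(text: str, forename: str, surname: str, window: int = 8) -> bool:
    """True if forename and surname occur within `window` words of each other,
    in either order, ignoring capitalization and punctuation (hypothetical rule)."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    f_pos = [i for i, w in enumerate(words) if w == forename.lower()]
    s_pos = [i for i, w in enumerate(words) if w == surname.lower()]
    return any(abs(i - j) <= window for i in f_pos for j in s_pos)


if __name__ == "__main__":
    sample = "REEKIE - looking for ancestors of Hugh, Andrew and Ian Reekie"
    for variant in name_variants("Hugh", "Reekie"):
        print(variant, variant.lower() in sample.lower())
    print("proximity match:", mentions_within(sample, "Hugh", "Reekie"))
```

Applied to the example text quoted above, none of the five exact orderings appears in it, while the proximity test succeeds because "Hugh" lies five words after the leading "REEKIE" and four words before the final "Reekie". Narrowing the window to three words makes the proximity test fail as well.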
I have done some experiments with my own family history web page design (Fife Surname Pages) and tried out a few search engines, with their various specialist search options, and have come to some conclusions. You may jump to Section 7 (Conclusion) to read them if you wish.
I have not been able to locate a simple list of the key parameters of the various major Internet search engines; some engines are, for good reason, more correctly termed searchable catalogues. A good reference, however, is
http://www.SearchEngineWatch.com/resources/index.html - ref 444. Other general references are in Section 8.
I have had difficulty answering such basic questions as:
- What words does this engine catalogue? e.g. all text, meta-tagged words, the first xxx words, the Title?
- Are complex or conditional searches possible?
- How large is the web page data source, and how fast is it expanding?
- Which web pages are catalogued, and how is this done?
- Do sites have restrictions or biases in listing, indexing or cataloguing?
- What rules and conventions exist for handling capitalization? This is answered, in part, at:
Some of this information is not easy to obtain, and my search has been limited; it is not in any way complete.
The Opentext Index, formerly at http://index.opentext.net, announced (March 98 - 555) that its Internet search service had permanently ceased operating under its traditional model. The old address now redirects to http://pinstripe.opentext.com, and the service is aimed at business users - 555.
7. Conclusion
8. References
The information above has been garnered from various places, including those listed below. The - 111, - 222 convention has been used as a convenient reference method rather than the usual "Ref 1" or "superscript 1" indications.
9. Glossary
Seed, seeded - the process of submitting a URL to a search engine for registration and indexing purposes.
Comments, submissions and suggestions welcome - Hugh Reekie, h.reekie@ieee.org