Who Is John Smith: Adventures In Genealogical Data Modeling

November 16, 2010 6 comments

I’m no physicist, but they say when you look for an atom’s component electrons, the electrons are both everywhere and nowhere at the same time. You can never actually see a single electron within the atom that houses it, yet everywhere you look, there is evidence of the electron at that exact point at that exact time.

In attempting to pinpoint where a person is within genealogical data, one runs into the same phenomenon. Consider the following evidence, which may be relevant to a fictional John Lester Smith. It is not meant to be an authoritative list but merely illustrates the issue:

Evidence

  1. A birth certificate from Fayette County, Illinois, which states a male child named John Lester Smith was born at 401 South Third Street, Vandalia, Fayette County, Illinois, on 1 April, 1900 to parents John Smith and Darcy Smith.
  2. An interview with George Rogers Smith in which George refers to his father’s brother, “Uncle John.”
  3. A marriage certificate from Fayette County, Illinois, of John Smith and Darcy Madsen, who were married 5 April, 1891 in Fayette County, Illinois.
  4. An entry in the 1910 U.S. Federal Census from Enumeration District 12, Ward 3, Vandalia, Fayette County, Illinois, showing in dwelling 14, household 16, at 401 3rd Street, family members John Smith, aged 32, and wife Darcy Smith, aged 30, with son Johnny Smith, aged 10, all born in Illinois.

Now, suppose I “know” there was a person in my family named John Lester Smith. How? Grandma told me. I heard about him all my life.  This was a real person, and I identify him in my genealogy database. Maybe I even talked to him during a séance. (If so, it would make for an interesting source citation.) The point is, I have made an analysis and judgment that there is a person named John Lester Smith. I further think that the pieces of evidence above refer to this person I “know” existed, John Lester Smith.

What does the data show, independent of any analysis, that might be relevant to John Lester Smith?

  1. A male named John Lester Smith was born at 401 South Third Street, Vandalia, Fayette County, Illinois, on 1 April, 1900 to parents John Smith and Darcy Smith.
  2. A male named John Smith, distinct from his son John Lester Smith.
  3. An Uncle John, brother of the father of George Rogers Smith.
  4. A John Smith married Darcy Madsen on 5 April, 1891 in Fayette County, Illinois.
  5. John Smith living in Vandalia in 1910 aged 32.
  6. Johnny Smith, son of John and Darcy Smith, aged 10, living with his mother and father in Vandalia.

OK, great. Now I want to put this data into my computer’s genealogy database.

Within my database, the identification of the person within a piece of evidence is done according to the name (or names) used in that evidence. Thus:

Person-name:

Person-name-number=1

Title=

Firstname=John

Middlename=Lester

Lastname=Smith

  1. Referenced in a birth record

Person-name:

Person-name-number=2

Title=

Firstname=John

Middlename=

Lastname=Smith

  1. Referenced in a birth record
  2. Referenced in a marriage record
  3. Referenced in a census record

Person-name:

Person-name-number=3

Title=Uncle

Firstname=John

Middlename=

Lastname=

  1. Referenced in an interview

Person-name:

Person-name-number=4

Title=

Firstname=Johnny

Middlename=

Lastname=Smith

  1. Referenced in a census record
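The four person-name records above could be sketched as plain data structures. This is only a minimal sketch, not the schema of any real genealogy program: the class name `PersonName`, its field names, and the `display` helper are hypothetical, chosen to mirror the fields listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonName:
    """One name exactly as it appears in the evidence (hypothetical structure)."""
    number: int
    title: str = ""
    firstname: str = ""
    middlename: str = ""
    lastname: str = ""
    # Kinds of records in which this name appears, e.g. "birth record".
    referenced_in: List[str] = field(default_factory=list)

    def display(self) -> str:
        """Join the non-empty name parts in the order they were recorded."""
        parts = [self.title, self.firstname, self.middlename, self.lastname]
        return " ".join(p for p in parts if p)


# The four records from the text, entered as evidence only (no conclusions yet).
person_names = [
    PersonName(1, firstname="John", middlename="Lester", lastname="Smith",
               referenced_in=["birth record"]),
    PersonName(2, firstname="John", lastname="Smith",
               referenced_in=["birth record", "marriage record", "census record"]),
    PersonName(3, title="Uncle", firstname="John",
               referenced_in=["interview"]),
    PersonName(4, firstname="Johnny", lastname="Smith",
               referenced_in=["census record"]),
]

for pn in person_names:
    print(pn.number, pn.display(), "-", ", ".join(pn.referenced_in))
```

Note that nothing in this structure says which of these names belong to the same human being. Each record describes only what a piece of evidence calls someone.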

But, where’s the part that identifies John Lester Smith, the data representation of the whole person? That piece isn’t here.  To start describing this actual, real-life person, we need another object type entirely, an object which isn’t always available in genealogy software as a separate entity:

real-person:

PersonID=1

Whole-person-name=

Real-person-firstname=John

Real-person-middlename=Lester

Real-person-lastname=Smith

Ref=person-name:1:A

Ref=person-name:3:A

Ref=person-name:4:A

Also, why are these references to the previous person-name records part of this real-person record? Where can I put that information into my database? Sources and source citations can be added to genealogy programs for the entries in the person-name records, but there is no way to document the evaluation of this information in today’s software.
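One way to picture the missing piece is a separate record that stores each link between a person-name and the real person together with the reasoning behind it. The sketch below is only an illustration under stated assumptions: the `RealPerson` and `Link` structures and their fields are hypothetical, the reasoning strings are invented examples, and no existing genealogy program or GEDCOM tag is implied.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """A judgment that one person-name record refers to a real person (hypothetical)."""
    person_name_number: int
    reasoning: str  # the researcher's analysis, kept apart from the evidence itself


@dataclass
class RealPerson:
    """The researcher's conclusion about a whole person (hypothetical)."""
    person_id: int
    firstname: str
    middlename: str
    lastname: str
    links: List[Link] = field(default_factory=list)

    def linked_numbers(self) -> List[int]:
        """Return the person-name numbers this conclusion rests on."""
        return [link.person_name_number for link in self.links]


john = RealPerson(1, "John", "Lester", "Smith")

# Example reasoning text, invented for illustration only.
john.links.append(Link(1, "Birth certificate gives the full name John Lester Smith."))
john.links.append(Link(3, "Interview: 'Uncle John' is the brother of George's father."))
john.links.append(Link(4, "1910 census: 'Johnny', aged 10, at 401 3rd Street fits a 1900 birth."))

for link in john.links:
    print(f"person-name:{link.person_name_number} -> {link.reasoning}")
```

Person-name 2, the father, is deliberately left unlinked here, matching the references in the real-person record above. The decision to leave it out is itself a piece of analysis that would need a place to live.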

Now, you might say, “My database does that differently. It isn’t a problem.” Really? I challenge you to examine it carefully. I think you’ll find that somewhere your evidence and your analysis of that evidence (in this case, expressed as your determination of evidence referring to a single person) get all mixed together in inappropriate ways.

I am not trying to debate the old question, “How do we know anything is real?” Rather, I want to point out that there is a layer of analysis missing in the tools we currently use to document genealogical information, and the absence of this analysis layer causes our data to be distorted. This isn’t a particular problem for most cases, but when you get into a tricky case of an ancestor who has conflicting evidence or some other need for more detailed analysis, the ability of our current genealogy databases to document these issues breaks down.

This document is far from complete, but it is meant to be a starting point for discussion. It needs further refinement to be sure. However, I hope it is useful in illustrating the need for a better, uniform analysis structure within genealogical software database applications.

Google, Facebook & Yahoo CEOs Speak at Web 2.0 Summit [LIVE VIDEO]

November 15, 2010 Leave a comment

 

The Web 2.0 Conference, one of the premier technology conferences, is allowing a lot of its sessions to be broadcast via streaming video this year! Check it out:

Google, Facebook & Yahoo CEOs Speak at Web 2.0 Summit [LIVE VIDEO].

Introduction to BetterGEDCOM Video

November 13, 2010 Leave a comment

Build a BetterGEDCOM Press Release

November 9, 2010 Leave a comment

PRESS RELEASE
BUILD A BetterGEDCOM
A grass roots initiative to improve data exchange among genealogists

Tuesday Nov 9, 2010. Alexandria, VA. A group of genealogists and programmers
have established a workspace called Build A BetterGEDCOM for developing better
data exchange standards to facilitate sharing between researchers using a
variety of technology platforms, genealogy products and services.

“Genealogy software users are painfully aware that sharing data with other
researchers is difficult since the existing GEDCOM (GENealogy Data
COMmunication) file transfer script hasn’t been updated in 14 years. In the
meantime genealogists have incorporated tools with expanded capabilities
reflecting changing technology,” says Russ Worthington, a genealogy software
power user and popular genealogy lecturer.

In developing a wiki site for pulling together genealogy software programmers,
website developers and end users, genealogy blogger DearMYRTLE explains “The
focus is cooperation. We seek solutions that will enable regular researchers
like me to share genealogy with cousins regardless of the genealogy program
they’ve chosen to use. The current GEDCOM file exchange strips out much of my
hard work, leaving only some of the data I’ve typed in and attached to each
well-documented ancestor. We experience similar problems when uploading and
downloading our genealogy data with popular genealogy websites. If all genealogy product developers agree to a BetterGEDCOM format, such problems will be overcome.”

The BetterGEDCOM wiki site is open to all, and is located
at http://bettergedcom.wikispaces.com.

“BetterGEDCOM will be independent. This means no single entity who has an
interest in our work will be the single driving force. Likewise, no work that
anyone has done will be the defined starting place or the de facto basis of our
work,” says Greg Lamberson, the technician who developed initial pages at the
BetterGEDCOM wiki. “We also seek to account for language and cultural
differences as we develop data standards for recording family history
information in text and multi-media formats. Input from BetterGEDCOM
participants the world over is a vital component of this initiative.”

“BetterGEDCOM will seek ISO recognition or recognition by other international
standards bodies,” continues Greg. “This has never been done in the genealogical community. This means we will have to be a community effort with participation by a substantial part of the genealogical technology community. Also, unlike previous efforts, having standards actually codified will provide developers a framework to resolve ambiguities, conflicts or other problems that may develop in using the standard as well as a way to correct or amend the standard as needed.”

“Indeed everyone seems to be ready for something new,” says Greg. “Every person
I have talked to agrees that now is the time for action. The BetterGEDCOM
project invites all to participate so that we may achieve meaningful results.”

— end —

CONTACT:
Pat Richley-Erickson

Myrt@DearMYRTLE.com

Missouri Digital Heritage’s Online Image Viewer Issue

November 3, 2010 3 comments

Missouri Digital Heritage is the State Of Missouri’s joint effort between the State Archives and the State Library to publish content of historical and genealogical value online. One of their favorite projects is their addition of all death records and death certificate images fifty years or older online. This wonderful collection includes a couple different searchable indexes. They also have a very nice collection of local history book images online, as well as several other collections.

Unfortunately, even with all this great content, the site can be a bear to work with, particularly when trying to read the online local histories.

A couple weeks ago I wrote a letter of concern to the Missouri Digital Heritage support staff about their wonderful project’s nearly unusable online image viewer:

Today I am attempting to use the resources on the Missouri Digital Heritage website. Specifically, I am viewing the History of Lincoln County. The choice of viewer used in presentation of this resource is difficult at best and unusable in many cases. I am using Windows 7 and prefer to use the Google Chrome web browser, though because of the difficulties in sizing the image window in this browser, I have also attempted to use Firefox and Internet Explorer. In all cases, the images are difficult or impossible to view properly, because the size of the window on the web page and the resolution of the image needed to read the words make it impossible to view a column of text on a single page. The column of words cuts off, making actual reading of the text impossible. You really must reconsider your internet presentation technologies, as the inflexibility of sizing and resolution choice in your current offerings makes your admirable digitization initiative practically unusable.

Well, today I got a response:

Thank you for expressing your concerns about the viewer used on the MDH site. We are aware of the problem and have been working diligently over the past two years for permission to adapt or upgrade our software. Unfortunately, due to technical issues, we’ve only been able to reach an intermediate version of the software and have not found another viewer that will work for us. A newer version of the software will be available in spring 2011, but it will likely be another year before we are allowed to migrate to that version. In addition, we are not allowed by state guidelines to enlarge the viewer to take up the entire screen width as we must maintain the site’s accessibility to 800 x 600 monitors, as well as considering other ADA-compliance issues.

In working with the site every day, we are well aware of the frustrations with viewing materials.  Unfortunately, I can only offer stop-gap assistance at this point.  My first recommendation would be to use Ctrl + to zoom your browser in and out.  This will allow you to fit a page within the viewer screen, but then fill your monitor screen with the viewer itself.  You may also use the clip tool on the far right of the viewer toolbar to bring up a page full screen.  For that to work, you must have the entire page showing within the viewer window.  You would have to do that for each page you want to view.  None of these are particularly efficient, but until we can upgrade to version 6 of the background software, we have no other options.

Again, thank you for expressing your concerns as it helps us build a case for further migrations.

Tricia

Patricia L. Walker, CA
Digital Collections Coordinator
Missouri Digital Heritage Initiative

Missouri State Library
600 W. Main St., P.O. Box 387
Jefferson City, MO  65102-0387

Phone: 800-325-0131 ext. 10
Fax: 573-751-3612
Email: patricia.walker@sos.mo.gov

To me, that’s just an invitation to make sure they get plenty of mail. How about it, folks? Help the Missouri Digital Heritage Initiative know how you feel about their horrible image viewer today by sending your thoughts to: mdh@sos.mo.gov. Thanks!

Categories: Library Technology

We Need A New GEDCOM NOW!

November 1, 2010 3 comments

What’s this? Do I detect signs of life in the genealogical user community to rise up and do something about the pervasive problem with the ancient GEDCOM standard? Well, I sure hope so.

Addressing the primary issue of how to format genealogical data and get it from one person to another in some standardized format is the single most important issue in the genealogical community, in my opinion.

Really, what could be more fundamental?

The only thing I can think of that might take precedence is if I lived across from the Family History Library or the National Archives or my local courthouse and it was on fire. Right now. In that case, I’d grab a bucket, but as things stand now, this whole GEDCOM problem is THE issue.

The GEDCOM Standard is now almost 15 years old. For a technology standard governing how information is formatted, reaching the age of 15 is about like a person reaching their 202nd birthday. Here’s a short, impromptu list of GEDCOM’s problems:

  • It mostly uses an obsolete character set, making basic use a compatibility issue in many cases
  • There are many ambiguities in the standard, which cause developers to conflict in how they adopt the standard or simply adopt their own solutions, which negates the original purpose of having a standard
  • Passing core information like sources or external supporting documents is done intermittently, badly or not at all
  • Since there is no development on the standard, there is correspondingly no hope of the user community seeing adoption of innovations in genealogy across software programs uniformly
  • Lack of any general effort in the genealogical technology community is now causing developers to retreat into their own environments, making their products even less capable of cooperating with the products of other vendors

Do I need to go on? I’m already going to have nightmares tonight.

The only way to solve this problem is for the user community to step up and exhibit some leadership in the area of genealogical technology. We have to band together and get these software companies to work together, because they’re not going to do it themselves.

Aren’t you sick of your data being trapped inside one program unless you’re willing to sacrifice all that more advanced information to get it out? Aren’t you sick and tired of entering the same information over and over again because the products and services you want to use won’t cooperate enough to share information effectively? Let’s get serious, my fellow genealogists!

Stay tuned for some serious community action!