Archive for the ‘Future Standards’ Category

Who Is John Smith: Adventures In Genealogical Data Modeling

November 16, 2010 6 comments

I’m no physicist, but they say that when you look for an atom’s electrons, they are both everywhere and nowhere at the same time. You can never actually see a single electron within the atom that houses it, yet everywhere you look, there is evidence of the electron at that exact point at that exact time.

In attempting to pinpoint where a person is within genealogical data, one runs into the same phenomenon. Consider the following. Here is some evidence that may be relevant to a fictional John Lester Smith. It is not meant to be an authoritative list but merely illustrates the issue:


  1. A birth certificate from Fayette County, Illinois, which states a male child named John Lester Smith was born at 401 South Third Street, Vandalia, Fayette County, Illinois, on 1 April, 1900 to parents John Smith and Darcy Smith.
  2. An interview with George Rogers Smith in which George refers to his father’s brother, “Uncle John.”
  3. A marriage certificate from Fayette County, Illinois, of John Smith and Darcy Madsen, who were married 5 April, 1891 in Fayette County, Illinois.
  4. An entry in the 1910 U.S. Federal Census from Enumeration District 12, Ward 3, Vandalia, Fayette County, Illinois, showing, in dwelling 14, household 16, at 401 3rd Street, family members John Smith, aged 32, and wife Darcy Smith, aged 30, with son Johnny Smith, aged 10, all born in Illinois.

Now, suppose I “know” there was a person in my family named John Lester Smith. How? Grandma told me. I heard about him all my life.  This was a real person, and I identify him in my genealogy database. Maybe I even talked to him during a séance. (If so, it would make for an interesting source citation.) The point is, I have made an analysis and judgment that there is a person named John Lester Smith. I further think that the pieces of evidence above refer to this person I “know” existed, John Lester Smith.

What does the data show, independent of any analysis, that might be relevant to John Lester Smith?

  1. A male named John Lester Smith was born at 401 South Third Street, Vandalia, Fayette County, Illinois, on 1 April, 1900 to parents John Smith and Darcy Smith.
  2. A male named John Smith, distinct from his son John Lester Smith.
  3. An Uncle John, brother of the father of George Rogers Smith.
  4. A John Smith married Darcy Madsen on 5 April, 1891 in Fayette County, Illinois.
  5. John Smith living in Vandalia in 1910 aged 32.
  6. Johnny Smith, son of John and Darcy Smith, aged 10, living with his mother and father in Vandalia.

OK, great. Now I want to put this data into my computer’s genealogy database.

Within my database, the identification of the person within a piece of evidence is done according to the name (or names) used in that evidence. Thus:

Person-name record: John Lester Smith
Middlename= Lester

  1. Referenced in a birth record

Person-name record: John Smith

  1. Referenced in a birth record
  2. Referenced in a marriage record
  3. Referenced in a census record

Person-name record: Uncle John

  1. Referenced in an interview

Person-name record: Johnny Smith

  1. Referenced in a census record

But where’s the part that identifies John Lester Smith, the data representation of the whole person? That piece isn’t here. To start describing this actual, real-life person, we need another object type entirely, an object which isn’t always available in genealogy software as a separate entity:

[Real-person record for John Lester Smith, with references to the previous person-name records]

Also, why are these references to the previous person-name records part of this real-person record? Where can I put that information into my database? Sources and source citations can be added to genealogy programs for the entries in the person-name records, but there is no way to document the evaluation of this information in today’s software.
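
To make the missing piece concrete, here is a minimal sketch in Python of how the evidence and the analysis might be kept apart. It is only an illustration, not a description of any existing genealogy program or of a proposed standard. Every class and field name in it (`Source`, `PersonName`, `Assertion`, `Person`, `reasoning`, `confidence`) is hypothetical, and the confidence scale is simply an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Source:
    """A piece of evidence, such as a certificate, census entry, or interview."""
    kind: str
    description: str


@dataclass
class PersonName:
    """A name exactly as it appears in evidence, with no judgment about who it is."""
    name: str
    sources: List[Source] = field(default_factory=list)


@dataclass
class Assertion:
    """The analysis layer: a documented judgment tying a name record to a person."""
    person_name: PersonName
    reasoning: str
    confidence: str  # hypothetical scale: "high", "medium", "low"


@dataclass
class Person:
    """The researcher's conclusion that a real person existed."""
    label: str
    assertions: List[Assertion] = field(default_factory=list)

    def link(self, person_name: PersonName, reasoning: str, confidence: str) -> None:
        """Record why a name record is believed to refer to this person."""
        self.assertions.append(Assertion(person_name, reasoning, confidence))

    def evidence_kinds(self) -> List[str]:
        """Return the sorted, de-duplicated kinds of evidence behind this person."""
        return sorted({s.kind for a in self.assertions for s in a.person_name.sources})


# Evidence, recorded as found.
birth = Source("birth record", "Fayette County birth certificate, 1 April 1900")
census = Source("census record", "1910 U.S. Federal Census, Vandalia")
interview = Source("interview", "Interview with George Rogers Smith")

john_lester = PersonName("John Lester Smith", [birth])
johnny = PersonName("Johnny Smith", [census])
uncle_john = PersonName("Uncle John", [interview])

# Conclusions, each with its own documented reasoning.
person = Person("John Lester Smith")
person.link(john_lester, "Full name matches what the family has always said.", "high")
person.link(johnny, "Aged 10 in 1910, same street and parents as the birth record.", "medium")
person.link(uncle_john, "George's father's brother is called John; relationship not yet confirmed.", "low")

for a in person.assertions:
    print(f"{a.person_name.name}: {a.confidence} ({a.reasoning})")
```

The point of the design is that the person-name records never change when I change my mind. If I later decide “Uncle John” was somebody else, I remove or revise one assertion, and the evidence itself stays exactly as it was recorded.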

Now, you might say, “My database does that differently. It isn’t a problem.” Really? I challenge you to examine it carefully. I think you’ll find that somewhere your evidence and your analysis of that evidence (in this case, expressed as your determination of evidence referring to a single person) get all mixed together in inappropriate ways.

I am not trying to debate the old question, “How do we know anything is real?” Rather, I want to point out that there is a layer of analysis missing from the tools we currently use to document genealogical information, and the absence of this analysis layer causes our data to be distorted. This isn’t a particular problem in most cases, but when you get into a tricky case of an ancestor who has conflicting evidence or some other need for more detailed analysis, the ability of our current genealogy databases to document these issues breaks down.

This document is far from complete, but it is meant to be a starting point for discussion. It needs further refinement, to be sure. However, I hope it is useful in illustrating the need for a better, uniform analysis structure within genealogical software database applications.


Introduction to BetterGEDCOM Video

November 13, 2010

Build a BetterGEDCOM Press Release

November 9, 2010

A grass roots initiative to improve data exchange among genealogists

Tuesday Nov 9, 2010. Alexandria, VA. A group of genealogists and programmers
have established a workspace called Build A BetterGEDCOM for developing better
data exchange standards to facilitate sharing between researchers using a
variety of technology platforms, genealogy products and services.

“Genealogy software users are painfully aware that sharing data with other
researchers is difficult since the existing GEDCOM (GENealogy Data
COMmunication) file transfer script hasn’t been updated in 14 years. In the
meantime genealogists have incorporated tools with expanded capabilities
reflecting changing technology,” says Russ Worthington, a genealogy software
power user and popular genealogy lecturer.

In developing a wiki site for pulling together genealogy software programmers,
website developers and end users, genealogy blogger DearMYRTLE explains, “The
focus is cooperation. We seek solutions that will enable regular researchers
like me to share genealogy with cousins regardless of the genealogy program
they’ve chosen to use. The current GEDCOM file exchange strips out much of my
hard work, leaving only some of the data I’ve typed in and attached to each
well-documented ancestor. We experience similar problems when uploading and
downloading our genealogy data with popular genealogy websites. If all genealogy product developers agree to a BetterGEDCOM format, such problems will be overcome.”

The BetterGEDCOM wiki site is open to all, and is located

“BetterGEDCOM will be independent. This means no single entity who has an
interest in our work will be the single driving force. Likewise, no work that
anyone has done will be the defined starting place or the de facto basis of our
work,” says Greg Lamberson, the technician who developed initial pages at the
BetterGEDCOM wiki. “We also seek to account for language and cultural
differences as we develop data standards for recording family history
information in text and multi-media formats. Input from BetterGEDCOM
participants the world over is a vital component of this initiative.”

“BetterGEDCOM will seek ISO recognition or recognition by other international
standards bodies,” continues Greg. “This has never been done in the genealogical community. This means we will have to be a community effort with participation by a substantial part of the genealogical technology community. Also, unlike previous efforts, having standards actually codified will provide developers a framework to resolve ambiguities, conflicts or other problems that may develop in using the standard as well as a way to correct or amend the standard as needed.”

“Indeed everyone seems to be ready for something new,” says Greg. “Every person
I have talked to agrees that now is the time for action. The BetterGEDCOM
project invites all to participate so that we may achieve meaningful results.”

— end —

Pat Richley-Erickson


November 1, 2010 3 comments

What’s this? Do I detect signs that the genealogical user community is ready to rise up and do something about the pervasive problem with the ancient GEDCOM standard? Well, I sure hope so.

How to format genealogical data and get it from one person to another in a standardized form is, in my opinion, the single most important issue in the genealogical community.

Really, what could be more fundamental?

The only thing I can think of that might take precedence is if I lived across from the Family History Library or the National Archives or my local courthouse and it was on fire. Right now. In that case, I’d grab a bucket, but as things stand now, this whole GEDCOM problem is THE issue.

The GEDCOM Standard is now almost 15 years old. For a technology standard governing how information is formatted, reaching the age of 15 is about like a person reaching their 202nd birthday. Here’s a short, impromptu list of GEDCOM’s problems:

  • It mostly uses an obsolete character set, which makes even basic use a compatibility problem in many cases
  • There are many ambiguities in the standard, which lead developers to implement it in conflicting ways or simply adopt their own solutions, which negates the original purpose of having a standard
  • Passing core information like sources or external supporting documents is done intermittently, badly or not at all
  • Since there is no development on the standard, the user community has no hope of seeing innovations in genealogy adopted uniformly across software programs
  • Lack of any general effort in the genealogical technology community is now causing developers to retreat into their own environments, making their products even less capable of cooperating with the products of other vendors

Do I need to go on? I’m already going to have nightmares tonight.

The only way to solve this problem is for the user community to step up and exhibit some leadership in the area of genealogical technology. We have to band together and get these software companies to work together, because they’re not going to do it themselves.

Aren’t you sick of your data being trapped inside one program unless you’re willing to sacrifice all that more advanced information to get it out? Aren’t you sick and tired of entering the same information over and over again because the products and services you want to use won’t cooperate enough to share information effectively? Let’s get serious, my fellow genealogists!

Stay tuned for some serious community action!
