We Need A New GEDCOM NOW!


November 1, 2010

What’s this? Do I detect signs that the genealogical user community is ready to rise up and do something about the pervasive problems with the ancient GEDCOM standard? Well, I sure hope so.

How to format genealogical data and get it from one person to another in a standardized way is, in my opinion, the single most important issue in the genealogical community.

Really, what could be more fundamental?

The only thing I can think of that might take precedence is if I lived across from the Family History Library or the National Archives or my local courthouse and it was on fire. Right now. In that case, I’d grab a bucket, but as things stand now, this whole GEDCOM problem is THE issue.

The GEDCOM standard is now almost 15 years old. For a technology standard governing how information is formatted, reaching the age of 15 is about like a person reaching their 202nd birthday. Here’s a short, impromptu list of GEDCOM’s problems:

  • It mostly uses an obsolete character set, which makes even basic use a compatibility issue in many cases
  • There are many ambiguities in the standard, which lead developers to implement it in conflicting ways or simply to adopt their own solutions, which negates the original purpose of having a standard
  • Passing core information like sources or external supporting documents is done intermittently, badly or not at all
  • Since there is no development on the standard, there is correspondingly no hope of the user community seeing adoption of innovations in genealogy across software programs uniformly
  • Lack of any general effort in the genealogical technology community is now causing developers to retreat into their own environments, making their products even less capable of cooperating with the products of other vendors

Do I need to go on? I’m already going to have nightmares tonight.

The only way to solve this problem is for the user community to step up and exhibit some leadership in the area of genealogical technology. We have to band together and get these software companies to work together, because they’re not going to do it themselves.

Aren’t you sick of your data being trapped inside one program unless you’re willing to sacrifice all that more advanced information to get it out? Aren’t you sick and tired of entering the same information over and over again because the products and services you want to use won’t cooperate enough to share information effectively? Let’s get serious, my fellow genealogists!

Stay tuned for some serious community action!

  1. November 8, 2010 at 9:24 pm

    One thing to keep in mind is that most of the existing players (RootsMagic, Ohana, Legacy, etc.) have limited development resources. And most have spent a fair amount of effort building their existing gedcom parsers. I think a big reason that similar efforts in the past have failed to catch on is because they tried to do too much at once – a revolutionary approach instead of an evolutionary approach. You might get more traction with an evolutionary approach. For example, here’s a simple idea: a jar/zip file format for gedcoms that includes the media files linked-to from the gedcom. Currently, most programs simply include path names to these files in the gedcom, which makes zipping up this information and sending it to your collaborator exceedingly difficult. Having a jar/zip file format would require minor changes to the standard itself, but still yield big benefits.
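The bundling idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing or proposed standard: the function name `bundle_gedcom`, the archive layout (`tree.ged` plus a `media/` folder), and the UTF-8 input encoding are all made up for the example. The only thing taken from GEDCOM itself is that media links appear on `FILE` lines.

```python
import re
import zipfile
from pathlib import Path

# GEDCOM links media with lines such as "2 FILE photos/grandma.jpg".
FILE_LINE = re.compile(r"^\s*(\d+)\s+FILE\s+(.+?)\s*$")


def bundle_gedcom(gedcom_path, bundle_path, encoding="utf-8"):
    """Write a zip holding the GEDCOM plus every media file it links to.

    FILE paths in the bundled copy are rewritten to "media/<name>".
    Returns the list of linked paths that could not be found on disk.
    Assumption: the input is UTF-8; real files are often ANSEL or ANSI.
    """
    gedcom_path = Path(gedcom_path)
    missing, used_names, out_lines = [], set(), []
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for line in gedcom_path.read_text(encoding=encoding).splitlines():
            match = FILE_LINE.match(line)
            if match:
                level, raw = match.groups()
                source = Path(raw.replace("\\", "/"))
                if not source.is_absolute():
                    # Relative links are resolved next to the GEDCOM file.
                    source = gedcom_path.parent / source
                if source.is_file():
                    # Avoid collisions when two folders hold the same name.
                    name, n = source.name, 1
                    while name in used_names:
                        name = f"{source.stem}_{n}{source.suffix}"
                        n += 1
                    used_names.add(name)
                    bundle.write(source, f"media/{name}")
                    line = f"{level} FILE media/{name}"
                else:
                    missing.append(raw)  # keep the original line untouched
            out_lines.append(line)
        bundle.writestr("tree.ged", "\n".join(out_lines) + "\n")
    return missing
```

The recipient unzips the archive and opens `tree.ged`; because the rewritten links are relative, they resolve on any machine. Returning the missing paths instead of failing keeps the sketch usable on trees with stale links.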

    • November 8, 2010 at 9:45 pm

      Certainly good points. The fact that GEDCOM is proprietary and has been the only game in town for so long has certainly led most application developers to put a lot of time and money into the engines that port GEDCOM-formatted information into their own databases.

      I think it’s safe to say that the most recent of the efforts to replace GEDCOM will take a long-term view but start with a modest initial goal. My question, in light of your comment, is how reusable previous GEDCOM development work would be when combined with an off-the-shelf XML parser, assuming a relatively similar data model. Not being a programmer, I would assume the devil is in the details. Hopefully the developers who assist in the project can help steer it around the practical matters that would be particular pitfalls in their work.

  2. November 9, 2010 at 12:24 pm

    Well, the first thing that WeRelate does is convert the gedcom into an XML model using a slightly modified version of the slightly modified version of Michael Kay’s SAX parser for GEDCOM available at http://lmonson.com/blog/?p=4. (The modifications we made involve fixing up a few ANSEL character conversions, which I’d be happy to share.)
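For readers who have not seen this kind of conversion, here is a rough Python sketch of the general technique: read each GEDCOM line as level, optional cross-reference id, tag, and value, then nest XML elements by level. It is not the SAX parser mentioned above and not WeRelate's code. The wrapper element `GED` and the attribute names `ID` and `REF` are assumptions chosen for the example, and ANSEL decoding and `CONC`/`CONT` continuation lines are deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

# A GEDCOM line is: level, optional @xref@ id, tag, optional value.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def gedcom_to_xml(text):
    """Turn GEDCOM text into a nested ElementTree, one element per line."""
    root = ET.Element("GED")  # hypothetical wrapper element
    stack = [root]            # stack[n + 1] is the open element at level n
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue  # sketch only: blank or malformed lines are skipped
        level, xref, tag, value = match.groups()
        del stack[int(level) + 1:]  # close anything deeper than this level
        element = ET.SubElement(stack[-1], tag)
        if xref:
            element.set("ID", xref.strip("@"))
        if value:
            if value.startswith("@") and value.endswith("@"):
                # A value like @F1@ points at another record.
                element.set("REF", value.strip("@"))
            else:
                element.text = value
        stack.append(element)
    return root
```

Given the lines `0 @I1@ INDI` and `1 NAME John /Smith/`, this yields an `INDI` element with `ID="I1"` containing a `NAME` child. Once the data is in this shape, ordinary XML tools (XPath, XSLT, schema validation) can be applied to it.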

    From the output of the SAX parser, we standardize and simplify the data model a bit before encoding it as XML. The gedcom data model is way too nested in my opinion. Most of the desktop record managers don’t take advantage of the more esoteric portions of the gedcom model, and the simpler XML schema that we use represents 99% of everything that gets uploaded into WeRelate, and makes our lives much simpler when it comes time for processing. I’d be happy to share our schema as well if anyone is interested.

    Finally, we break the XML document apart and store the XML elements for each person and family and source in the gedcom as XML “data islands” within our wiki pages.
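The splitting step described above might look roughly like the following in Python. This is a guess at the general shape, not WeRelate's implementation: it assumes an input document whose top-level children are `INDI`, `FAM`, and `SOUR` elements carrying an `ID` attribute, and the function name `split_into_islands` is invented for the example.

```python
import xml.etree.ElementTree as ET

# Record types to extract: person, family, source (assumed element names).
RECORD_TAGS = {"INDI", "FAM", "SOUR"}


def split_into_islands(xml_text):
    """Map each record ID to a standalone XML string for that record."""
    root = ET.fromstring(xml_text)
    islands = {}
    for record in root:
        record_id = record.get("ID")
        if record.tag in RECORD_TAGS and record_id:
            # Serialize the record on its own so it can be stored separately.
            islands[record_id] = ET.tostring(record, encoding="unicode")
    return islands
```

Each returned string is a complete XML fragment that could be stored alongside the page for that person, family, or source and parsed again later without the rest of the file.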

    Bottom line, an XML standard for gedcom would be easy for me to adopt, especially if it more-or-less followed the gedcom data model — that is, you have people, families, sources, repositories, media objects, etc. as your main elements, and they have names, events, relationships, etc. as sub-elements. I think representing gedcom data in XML is a terrific idea because of the tools it makes available.

    I know different models have been proposed: a source-centric data model or an event-centric data model for example, and those models have merit. It’s just that an XML data model that continued to follow the existing person-centric gedcom data model would be easier for me, and probably others, to adopt.
