My Blog

Saturday, March 12, 2005

Genealogy Ontology

What are the main things that would be necessary to document genealogical information?

1) Details about people
2) Events in their life
3) Their relationships
4) And information about where one got that information from.

1) Person Details

This is the most basic of all genealogical data. We need to identify an individual before we can document them or do any research about them. The basic information will essentially be their name, gender, hair color, occupation, etc.

Apart from individuals we need to capture information about families. A family is a basic unit for a lot of genealogical information.

2) Events

Most information about an individual will be captured as "events" in their life. This could be BMD (birth, marriage, death) information or career information (e.g., the details of a promotion).

An event could be associated with a person or a family; it could be a natural event - like a tsunami or a storm; it could be a battle or a war; it could be an accident or a shipwreck.

An event could have multiple persons involved in multiple roles - e.g., father/mother/god-father/god-mother/chaplain/self in a christening event.
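To make the idea of roles concrete, here is a minimal Python sketch of an event that records who took part and in what capacity. The class names (`Participation`, `Event`), the method `people_in_role` and the people in the example are all made up for illustration; this is one possible shape, not a fixed design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participation:
    person: str  # hypothetical person identifier
    role: str    # e.g. "father", "god-mother", "chaplain", "self"


@dataclass
class Event:
    kind: str
    participants: List[Participation] = field(default_factory=list)

    def people_in_role(self, role: str) -> List[str]:
        """Return every person recorded in the given role."""
        return [p.person for p in self.participants if p.role == role]


# Illustrative data only: one christening with four participants.
christening = Event("christening", [
    Participation("Tom", "self"),
    Participation("Harry", "father"),
    Participation("Paula", "mother"),
    Participation("Peter", "god-father"),
])

print(christening.people_in_role("father"))
```

Keeping the role on the link between person and event, rather than on the person, lets the same person be "self" in one event and "father" in another.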

An event could be a standalone event (a change in a person's name), it could be part of a group of events (battles in a war), or it could be associated with, or a consequence of, another event (birth/baptism, engagement/marriage, death/burial).

An event will be associated with a time. The time could be exact (Tom was born on Jan 1, 1763 at 3 pm), it could be approximate (Harry was born circa 1800), it could be a range, or it could be relative to another event (one day before Easter 1985, got married when he was 25 years old). The data could also be partial - e.g., Paula was born on Jan 25th, with no year known. The time could be captured in different calendars - e.g., Gregorian or lunar calendars.
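The different kinds of time described above could be sketched roughly as follows. This is only a sketch under my own assumptions: the field names (`approximate`, `calendar`, `relative_to`) and the `TimeRange` class are invented here, and time of day is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventTime:
    # Any part may be missing, which is what makes a date "partial".
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    approximate: bool = False           # True for "circa" dates
    calendar: str = "gregorian"         # assumed default; could be "lunar" etc.
    relative_to: Optional[str] = None   # free-text reference to another event

    def is_partial(self) -> bool:
        """True when the year, month or day is unknown."""
        return any(part is None for part in (self.year, self.month, self.day))


@dataclass
class TimeRange:
    earliest: EventTime
    latest: EventTime


tom = EventTime(1763, 1, 1)                      # exact date
harry = EventTime(year=1800, approximate=True)   # circa 1800
paula = EventTime(month=1, day=25)               # Jan 25th, year unknown
wedding = EventTime(relative_to="when he was 25 years old")
# Hypothetical range, not taken from any record.
decade = TimeRange(EventTime(year=1800), EventTime(year=1809))

print(tom.is_partial(), harry.is_partial(), paula.is_partial())
```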

An event will be associated with a location. The location could be captured by geographic or geo-political details. Note that the geo-political information would be time specific - since it could change over centuries. The location could be relative (near London, north of Delhi).

3) Relationships

Most of the relationship information will be captured indirectly as events. The spouse information will be captured as a marriage event; the parent-child and sibling relationship in a birth event.

Sometimes event information may not be available for a relationship. E.g., we might know that John is Joan's cousin, but have no information about the details of the relationship (is it paternal or maternal? who is Joan's father?). Also, we might know that Patricia is Robert's youngest daughter - but have no idea if she's his youngest child (did they have a son after her?) or how many siblings Patricia had. Similarly, we might know that Peter is Maurice's 3rd child, but have no idea about the remaining children. Or that Yin was Yang's grandfather - with no information about the parents.

Relationships will also capture links between person records or families (or even event records!) - for instance, a relationship could state that person "Robert Smith" is the same person as "Robert Henry Smith".
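One way to handle "same person" links between records is to group linked records together, so that a chain of links (A is B, B is C) also tells us A is C. The sketch below uses a simple union-find structure for that; the `SameAs` class and the third record name "R. H. Smith" are my own illustrative assumptions.

```python
class SameAs:
    """Groups record names that have been declared to be the same person."""

    def __init__(self):
        self._parent = {}

    def _find(self, record):
        # Register unseen records as their own group, then walk up to the root.
        self._parent.setdefault(record, record)
        while self._parent[record] != record:
            record = self._parent[record]
        return record

    def link(self, a, b):
        """Declare that records a and b refer to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b have been linked, directly or through a chain."""
        return self._find(a) == self._find(b)


records = SameAs()
records.link("Robert Smith", "Robert Henry Smith")
records.link("Robert Henry Smith", "R. H. Smith")  # hypothetical third record

print(records.same("Robert Smith", "R. H. Smith"))
```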

4) Source

This is another critical piece of genealogical data. Information is useless unless we know its source and authenticity.

A source could be a physical document - a book, manuscript, an official certificate or a will, a genealogical magazine, newspaper; it could be by word of mouth. It could be from a web site; or from an email. It could be a primary source (government certificate), a transcription or could be hearsay. It could be a photograph, a sound recording or a video.

Whenever a source is cited, it's better to specify where in the source the information was found - which chapter, which page, which para, which volume, edition etc. In the case of physical documents, the location where the document is held must also be captured (e.g., NYPL).


Most genealogical information can be documented as statements. These statements can thus be stored as triples.
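As a rough illustration of statements stored as triples, here is a small Python sketch. The predicate names (`bornOn`, `cousinOf`, `childOf`, `birthOrder`) and the helper `match` are made up for this example; a real ontology would define its own vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("Tom", "bornOn", "1763-01-01"),
    ("John", "cousinOf", "Joan"),
    ("Peter", "childOf", "Maurice"),
    ("Peter", "birthOrder", 3),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


print(match(triples, subject="Peter"))
print(match(triples, predicate="cousinOf"))
```

Note that the cousin statement stands on its own: nothing about Joan's parents needs to be known to record it.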

The ontology must, ideally, be able to handle some non-genealogical data - which would give more meaning or context to genealogical information. For example, it should be able to capture that Cawnpore and Kanpur are the same place - named differently at different times. Also that Mangalore belonged to the Madras Presidency during the colonial period and now belongs to the South Canara district within the state of Karnataka in India.
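A time-specific place name could be sketched like this. Please note that the cut-over years below are placeholders I picked only to make the example run, not researched dates, and `PlaceName`, `name_in_year` and the id "place-1" are hypothetical names.

```python
from typing import NamedTuple, Optional


class PlaceName(NamedTuple):
    place_id: str
    name: str
    start: Optional[int]  # first year the name applies; None = open-ended
    end: Optional[int]    # last year the name applies; None = open-ended


# PLACEHOLDER years, chosen for illustration only.
NAMES = [
    PlaceName("place-1", "Cawnpore", None, 1947),
    PlaceName("place-1", "Kanpur", 1948, None),
]


def name_in_year(names, place_id, year):
    """Return the name recorded for the place in the given year, if any."""
    for n in names:
        if (n.place_id == place_id
                and (n.start is None or n.start <= year)
                and (n.end is None or year <= n.end)):
            return n.name
    return None


print(name_in_year(NAMES, "place-1", 1857))
print(name_in_year(NAMES, "place-1", 2005))
```

The same shape would work for geo-political membership (which presidency, state or district a town belonged to in a given year).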

Another example would be information about the relationship between names (Rob is a short form of Robert, as is Bob) and that Robert is typically a man's name.

Also details about the military - the battalions, the ranks and their order. The locations in which a division was posted.


It should also be able to capture and identify assertions. For instance, if a person's birth details are not available, but the person's age is mentioned in the marriage record, it would be possible to determine the person's approximate date of birth. This assertion can be captured and it should be possible to specify the source(s) of the assertion. The source of an assertion could also be one or more other assertions. Another example is the assertion that William's gender is male - based on the name.

It should be able to assign a degree of confidence to an assertion.
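Putting the last two points together, an assertion could carry both its sources (documents or other assertions) and a confidence value. In the sketch below everything is an assumption for illustration: the class and function names, the marriage year 1825, and the confidence numbers 0.9 and 0.8, which are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Source:
    title: str


@dataclass
class Assertion:
    statement: tuple                              # (subject, predicate, object)
    sources: List[Union[Source, "Assertion"]]     # documents or other assertions
    confidence: float                             # 0.0 (no trust) to 1.0 (certain)


def approximate_birth_year(person, marriage_year, age_at_marriage, record):
    """Derive a 'born circa' assertion from the age given in a marriage record."""
    # Arbitrary value: how far we trust the age written in the record.
    age = Assertion((person, "ageAtMarriage", age_at_marriage), [record], 0.9)
    # The derived assertion cites the first one and is trusted a bit less.
    return Assertion(
        (person, "bornCirca", marriage_year - age_at_marriage),
        [age],
        age.confidence * 0.8,  # arbitrary penalty for the extra inference step
    )


record = Source("Marriage record (hypothetical)")
birth = approximate_birth_year("Harry", 1825, 25, record)

print(birth.statement, round(birth.confidence, 2))
```

Because the derived assertion points at the assertion it was built from, it is always possible to walk back to the original document.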

