This article was originally published by Mark Thompson at MakingFamilyHistory.com
A common goal when working on family archive projects is to figure out who the people are that are included in photographs or mentioned in letters. Identifying them can be crucial to answering questions about your family history and can lead to new clues to follow up on.
However, as anyone who has tried to do this knows, it can be a painstaking and frustrating process. Letters rarely include the information needed to identify everyone. Because letters are usually exchanged between people who already know everyone they’re writing about, they tend to leave out last names or, even worse, first names.
Over the years, I’ve developed Excel-based approaches for identifying groups of people in these situations. However, they can be time-consuming to put together and require specialized Excel skills to use.
It occurred to me that I might be able to use artificial intelligence to identify people more easily. This blog post will describe how I tested this idea and what I learned that could help you do the same in your own genealogy research.
Before diving into the fictitious example I used to test this idea, it is helpful to understand how this kind of research is done “manually.”
How to Manually Identify People in a Family Archive
While there are many approaches for identifying people mentioned in a letter, one of the most common techniques used by genealogists is to figure out how the different people mentioned in the letter may be related. They might be friends, co-workers, or family. If you can figure out how they are connected, it is easier to figure out who they are.
After all, it’s easier to find several needles tied together in a haystack than it is to find an individual needle in the haystack.
How to Identify Family Members
For the rest of this article, all of the names that I use are taken from a single fictitious example family tree.
When I suspect that people mentioned in a letter could be family members, I compare the names of the people to their family tree and try to find close relatives with those names. The assumption is that people tend to write about their immediate family.
For example, let’s say a letter written by “Clara” includes a line that says, “Walter and I went down to the train station to pick up Joe.” While the family tree might include dozens of people named Joe, Walter, and Clara, it will have fewer families (and hopefully only one) where all three are immediate family members.
While this approach makes intuitive sense, it isn’t easy to use in practice. The problem is that online trees don’t have a button labeled, “Show me all of the families that have people named Joe, Walter, and Clara in them.” As such, this approach requires repetitive searches of a family tree, looking for clues about which family group might be the right one. Alternatively, spreadsheets or third-party tools designed for this kind of search can be used.
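To make the idea of the missing “button” concrete, here is a minimal Python sketch of the search it would perform. The family groups, their IDs, and the extra names Mary and Henry are hypothetical placeholders, not data from the example tree; in practice the groups would come from your own tree or spreadsheet.

```python
# Hypothetical family groups: each ID maps to the first names of its members.
FAMILY_GROUPS = {
    "F1": ["Clara", "Walter", "Joe", "Terry"],
    "F2": ["Joe", "Mary", "Walter"],
    "F3": ["Clara", "Henry"],
}


def groups_with_all_names(groups, wanted):
    """Return the IDs of family groups that contain every wanted first name."""
    wanted_set = {name.lower() for name in wanted}
    matches = []
    for group_id, members in groups.items():
        member_names = {name.lower() for name in members}
        # A group matches only if all of the wanted names appear in it.
        if wanted_set <= member_names:
            matches.append(group_id)
    return matches


matching_groups = groups_with_all_names(FAMILY_GROUPS, ["Joe", "Walter", "Clara"])
print(matching_groups)
```

Only the first group contains all three names, which is exactly the narrowing effect described above: many people share a first name, but far fewer families share a whole set of them.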
Given the challenges in doing these searches the manual way, I decided to see if this problem could be solved more easily using artificial intelligence tools.
Using ChatGPT to Search a Family Tree
Given that ChatGPT is particularly good at finding patterns, it seemed a natural fit for this type of search. The big question was how to get ChatGPT to search a family tree. I needed a plan.
Prompt Planning with ChatGPT
Whenever I am working out how to approach a problem using ChatGPT, I think about the following:
- How will I provide the AI with the information needed to do the work?
- Which role should the AI take on when performing the work?
- What is the work that I want the AI to do with the information I provide it?
- What is the format that I want the AI to present its findings in?
I’ll walk through my planning process step by step so that you can try this, or something similar, yourself.
How to Give ChatGPT Your Family Tree
The first, and most difficult, challenge in this example was getting the information from the family tree into ChatGPT. To do the kind of search that I’m interested in, ChatGPT needed to be able to see the people in the tree and understand their relationships to each other.
Thankfully, there is a family tree format that ChatGPT can understand.
The GEDCOM File Format
The Genealogical Data Communication format, or GEDCOM for short, was created by the Church of Jesus Christ of Latter-day Saints as a way to exchange family tree information between computer programs. The format can carry information about the people in the tree (such as names and birth, marriage, and death dates) as well as information about the relationships between them (such as parent, child, and family group).
With so many family tree formats in use today, you may wonder why GEDCOM is a good choice to use with ChatGPT.
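To give a feel for what a GEDCOM file looks like, the sketch below parses a few lines of GEDCOM text in Python. Each line starts with a level number, optionally followed by a record ID such as @I1@, then a tag (INDI, NAME, BIRT, FAM, and so on) and an optional value. The sample record and the surname “Example” are made up for illustration, and the parser is a simplified assumption that handles only well-formed lines.

```python
# A tiny, made-up GEDCOM fragment: one person and the family they belong to.
SAMPLE_GEDCOM = """\
0 @I1@ INDI
1 NAME Joe /Example/
1 BIRT
2 DATE 1 JAN 1920
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I3@
1 CHIL @I1@
"""


def parse_gedcom_line(line):
    """Split one GEDCOM line into (level, record_id, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if parts[1].startswith("@"):
        # Record lines look like "0 @I1@ INDI": the ID comes before the tag.
        record_id = parts[1]
        rest = parts[2].split(" ", 1)
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        # Ordinary lines look like "1 NAME Joe /Example/".
        record_id = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, record_id, tag, value


parsed = [parse_gedcom_line(line) for line in SAMPLE_GEDCOM.splitlines() if line.strip()]
for entry in parsed:
    print(entry)
```

The level numbers express nesting (the DATE at level 2 belongs to the BIRT at level 1), and the HUSB, WIFE, CHIL, and FAMC lines are what tie individuals to family groups.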
Why GEDCOM Works Well with ChatGPT
Besides the fact that GEDCOM files contain the information needed for this search, there are a few things about the GEDCOM format that make it well-suited to working with ChatGPT.
- GEDCOM is a text-based format and ChatGPT excels at working with text.
- ChatGPT was trained on information that is a few years old, but the version of GEDCOM used by the major genealogy companies is older still. This means that what ChatGPT learned about the format from its training data is still accurate today.
- The GEDCOM format, which has been in use for roughly 40 years, has been the subject of thousands of online articles. In fact, Google found over 4 million web pages that mention GEDCOM! As a result, ChatGPT has been well-trained in how the GEDCOM format works.
Now that we know that the GEDCOM format is a good way to provide information to ChatGPT, how do we get our family tree into the GEDCOM format?
How to Export an Ancestry Family Tree in GEDCOM Format
To generate a GEDCOM file of my family tree at Ancestry, I completed the following steps. Starting within Ancestry’s Tree View:
- Select the three dots menu in the left navigation bar.
- Select the “Tree Settings” menu.
- Click the “Export tree” link.
- Click the “Download your GEDCOM file” button.
Note that, depending on the size of your tree, it can take some time to create the export file before it can be downloaded.
I then saved the GEDCOM file to a known location on my computer, so that I could use it in the next steps.
Now that my family tree was exported to GEDCOM format, it was time to build the ChatGPT prompt.
Assigning a Role to ChatGPT
The process of building a prompt, often referred to as prompt engineering, starts by assigning a role for the AI to adopt when performing the work. The role assignment is important to do first because it sets the context for how additional instructions will be understood by the AI.
To understand why role assignment is important, consider how you ask different people to do work for you. For example, the way that you would ask your 12-year-old child to clean up your yard would be different than the way you would ask a person who works for a professional yard cleaning service. Even though you have very similar goals for them, you would phrase the request for each of them in a very different way because of who they are.
As I was trying to form a complex genealogy search of a GEDCOM file, I wanted ChatGPT to assume a role that specialized in this type of work:
PROMPT: Please act in the role of a professional genealogist who has a deep understanding of the GEDCOM file format.
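Next, I told ChatGPT what work I wanted it to do with the file:
PROMPT: I would like you to analyze the file that I will provide to you next. Please search the file for family groups that include the names Joe, Walter, and Clara.
On its own, the work portion of the prompt seems incomplete, but combined with the role assignment from the first step, it gave ChatGPT enough context to understand the request.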
Finally, I needed to tell ChatGPT how I wanted the information it found to be presented to me.
How Should the AI Present Its Findings?
In this example, my goal was to look at the results for clues about family groups. So, I wanted the response to include information about the relationships between the people found. And, because I was going to do these searches frequently, I wanted the results to be easy to interpret at a glance.
PROMPT: Should you find a family group that you believe includes these people, create a table that lists the full name of the people in the family group in one column, and their relationship to Joe in another column.
The Complete Prompt
After testing several different approaches, this is the final prompt. Note that I’ve shared some of the failed attempts at the end of the article in the “Challenges to Be Aware Of” section.
PROMPT: Please act in the role of a professional genealogist who has a deep understanding of the GEDCOM file format.
I would like you to analyze the file that I will provide to you next. Please search the file for family groups that include the names Joe, Walter, and Clara.
Should you find a family group that you believe includes these people, create a table that lists the full name of the people in the family group in one column, and their relationship to Joe in another column.
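If you expect to run this search for many different sets of names, the three parts of the prompt (role, work, and format) can be assembled with a short script. This is only a sketch: the function names are my own, the wording is copied from the prompt above, and it assumes at least one name is supplied, with the first name used as the reference person for relationships.

```python
def format_names(names):
    """Join names the way the prompt does, e.g. 'Joe, Walter, and Clara'."""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def build_prompt(names):
    """Assemble the role, work, and format parts into one prompt."""
    role = (
        "Please act in the role of a professional genealogist who has "
        "a deep understanding of the GEDCOM file format."
    )
    task = (
        "I would like you to analyze the file that I will provide to you next. "
        "Please search the file for family groups that include the names "
        f"{format_names(names)}."
    )
    output_format = (
        "Should you find a family group that you believe includes these people, "
        "create a table that lists the full name of the people in the family "
        "group in one column, and their relationship to "
        f"{names[0]} in another column."
    )
    # Separate the three parts with blank lines so they read as paragraphs.
    return "\n\n".join([role, task, output_format])


prompt = build_prompt(["Joe", "Walter", "Clara"])
print(prompt)
```

Keeping the role and format text fixed while swapping only the names makes it easier to repeat the same search consistently across letters.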
After submitting the prompt, I opened the GEDCOM file in a text editor so that I could easily copy its contents into ChatGPT. In my case, I used Notepad++, but you could do this with Notepad, WordPad, or any other text editor.
Once I had selected and copied all of the text, I pasted the text directly into ChatGPT’s prompt box and then clicked the submit button.
I was very happy to see that it found the correct family group, and displayed them in a way that was easy to confirm and check for additional clues!
Other Complex Searches Tested
I tried several different complex searches that I regularly come up against when doing this kind of project.
Finding More Than One Family Group
In a real-world search, it is likely that I would find more than one family group that included the names that I was looking for.
PROMPT: Please act in the role of a professional genealogist who has a deep understanding of the GEDCOM file format.
Please search for family groups that include the name Terry.
Should you find a family group that you believe includes this person, create a table that lists all of the people in the family group in one column, and their relationship to that person in the other column. Should you find more than one family group, create an additional table for each additional family group.
Focus on People That Were Alive at The Time
One way to zero in on the right family group is to only include people who were alive at the time the letter was written.
PROMPT: Please act in the role of a professional genealogist who has a deep understanding of the GEDCOM file format.
Please search for family groups that include a person named Terry who was alive in 1960.
Should you find a family group that you believe includes this person, create a table that lists all of the people in the family group in one column, and their relationship to that person in the other column. Should you find more than one family group, create an additional table for each additional family group.
This response is particularly interesting as it shows that ChatGPT, acting in the role of a genealogist, knows how to interpret a request for living people.
This is a textbook example of a natural language search.
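To show what “alive in 1960” means in terms of the dates stored in a tree, here is a rough Python sketch of the test a researcher might apply by hand. It is an assumption about the reasoning, not a description of what ChatGPT does. The helper names and the 110-year maximum lifespan are my own choices, and the year extraction simply takes the first four-digit number in a GEDCOM date such as “ABT 1890”.

```python
import re


def extract_year(date_text):
    """Return the first four-digit year found in a GEDCOM date, or None."""
    match = re.search(r"\b(\d{4})\b", date_text or "")
    return int(match.group(1)) if match else None


def could_be_alive_in(year, birth_date, death_date, max_lifespan=110):
    """Return False only when the dates rule the person out for that year."""
    birth_year = extract_year(birth_date)
    death_year = extract_year(death_date)
    if birth_year is not None and year < birth_year:
        return False  # not born yet
    if death_year is not None and year > death_year:
        return False  # already deceased
    if death_year is None and birth_year is not None:
        # No death date recorded: assume nobody lives past max_lifespan.
        if year - birth_year > max_lifespan:
            return False
    return True


print(could_be_alive_in(1960, "1 JAN 1920", ""))
print(could_be_alive_in(1960, "ABT 1890", "12 DEC 1955"))
```

Note that a person with no dates at all is kept as a candidate, since missing information cannot rule anyone out.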
And many, many more…
I tried several other, increasingly complex searches, and they all worked as long as the information that I was searching for was included in the family tree.
Challenges to Be Aware Of
ChatGPT Plus Can Only Accept 25,000 Characters
The most important limitation to be aware of is that there is a limit on how much text you can ask ChatGPT to process. In my testing with ChatGPT Plus, the limit was about 25,000 characters.
When I tried this test with a sample tree with hundreds of people in hundreds of family groups, ChatGPT said it was “too big.” When I performed my test on a smaller family tree with only a few dozen family groups, it worked successfully. As a result, I consider the approach used in this article a good proof of concept for me, and others, to use as a starting point for searches of larger and more complex family trees.
There is also a less well-understood limit that is based on the “complexity” of the file being processed. In the context of a GEDCOM, I believe this comes into play when there are more types of facts, and more relationships between people to track. However, I wasn’t able to find a clear description of this limitation.
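Before pasting a tree into ChatGPT, it can help to check its size. The Python sketch below counts the characters, individuals, and family groups in GEDCOM text and compares the character count against the approximate 25,000-character figure from my testing. That figure is an observation, not a documented limit, and the function name and sample text are hypothetical.

```python
# Approximate limit observed in this article; treat it as an assumption.
APPROX_CHAR_LIMIT = 25_000


def summarize_gedcom_size(gedcom_text, limit=APPROX_CHAR_LIMIT):
    """Report how large a GEDCOM text is and whether it is under the limit."""
    lines = [line.strip() for line in gedcom_text.splitlines()]
    return {
        "characters": len(gedcom_text),
        # Records start at level 0 with an ID, e.g. "0 @I1@ INDI".
        "individuals": sum(
            1 for line in lines if line.startswith("0 @") and line.endswith(" INDI")
        ),
        "families": sum(
            1 for line in lines if line.startswith("0 @") and line.endswith(" FAM")
        ),
        "fits": len(gedcom_text) <= limit,
    }


sample_text = "0 @I1@ INDI\n1 NAME Joe /Example/\n0 @F1@ FAM\n1 CHIL @I1@\n"
summary = summarize_gedcom_size(sample_text)
print(summary)
```

To check a real export, read the file first (for example with `open(path, encoding="utf-8").read()`, where `path` is wherever you saved the GEDCOM file) and pass the text to the function.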
I expect that as ChatGPT evolves, these limitations will decrease.
GEDCOM Files Might Contain Personal Information
Be careful when exporting your family tree to the GEDCOM file format. If your tree contains private information, so will your GEDCOM file.
Ancestry’s GEDCOM export utility exports all the facts in your tree. If you would like to export a portion of your family tree, or only certain facts from your family tree, you will need to use a tool that supports this.
Family Tree Maker, for example, supports the partial export of a family tree.
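If you are comfortable with a little scripting, another option is to strip selected facts from the GEDCOM text before sharing it. The Python sketch below is a simplistic text filter, not a feature of Ancestry or Family Tree Maker: it drops any line whose tag is in a list you choose, along with the more deeply nested lines beneath it. The sample record and the choice of the RESI (residence) tag are illustrative assumptions, and it does not detect or remove living people, so always review the output before pasting it anywhere.

```python
def strip_tags(gedcom_text, tags_to_drop):
    """Remove lines with the given tags, plus their subordinate lines."""
    kept = []
    skip_level = None  # level of the fact currently being dropped, if any
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if not parts[0].isdigit():
            # Not a normal GEDCOM line; keep it unless inside a dropped fact.
            if skip_level is None:
                kept.append(line)
            continue
        level = int(parts[0])
        if skip_level is not None:
            if level > skip_level:
                continue  # still inside the dropped fact
            skip_level = None
        tag = parts[1] if len(parts) > 1 else ""
        if tag in tags_to_drop:
            skip_level = level
            continue
        kept.append(line)
    return "\n".join(kept)


# Made-up record containing a residence fact with a street address.
SAMPLE_RECORD = """\
0 @I1@ INDI
1 NAME Joe /Example/
1 BIRT
2 DATE 1 JAN 1920
1 RESI
2 ADDR 12 Example Street
1 FAMC @F1@"""

cleaned = strip_tags(SAMPLE_RECORD, {"RESI"})
print(cleaned)
```

The names, birth facts, and family links that the search depends on are left in place, while the residence fact and its address are removed.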
Exercise Caution with Follow-up Searches
My testing was most successful when I used a fresh chat session in ChatGPT. When I tried follow-up searches in the same chat session, errors in the responses went up dramatically. When I spotted mistakes, I asked ChatGPT to double-check its results and explain how it came to its conclusion. In every case, it found the correct answer on the second try and apologized for its mistake.
Because of this issue, I quickly learned to follow up each response with a “please double check and explain your results” prompt.
Based on my testing, I believe that there are two likely sources for these errors:
- The errors might come from answers generated in previous prompts. In other words, ChatGPT mixed its previous responses up with my subsequent requests.
- The amount of information in the session grew too long after multiple requests, so information was being “forgotten” because of the space taken up by previous questions.
Final Thoughts
ChatGPT Plus can read and search GEDCOM formatted family trees and correctly interpret the genealogical information in them.
It can also do complex searches of family trees using natural language. These complex queries can include searches for multiple people, family groups, relationships between people, and the time or place that people lived.
Some of the tests generated false responses, especially when multiple follow-up questions were used in the same chat session.
Like all genealogy research, results from ChatGPT need to be treated as clues that require further investigation before they can be relied upon as fact.
At the time of this writing, the approach used in this article is limited to “small” trees. Alternative approaches, or improvements to ChatGPT Plus, will be necessary to search larger trees.
I’d Love To Hear From You
Have you tried any alternative approaches for complex searches of a family tree?
Do you know of a way to search larger family trees with a different approach? If so, please let me know in the comments below.