Months of waiting came to an end on Tuesday 3 October when I finally got to test ChatGPT with Vision (GPT-4V). This version of ChatGPT can now “See, Hear, and Speak.” I spent a few hours getting acquainted with GPT-4V. This report provides a brief overview of my experience, though there’s much more to explore.
Introduction to GPT-4 with Vision and OCR
ChatGPT with Vision isn’t just your average virtual assistant. It can hold conversations, process vast amounts of information, and even boasts a robust Optical Character Recognition (OCR) feature. With these capabilities, I decided to explore the possibilities of extracting genealogical data from visual charts.
The initial test was to extract biographical data (names, dates, places, relationships) from a photo of a genealogical chart and to save that data in a format useful to a genealogist. For this attempt, that format was an Ahnentafel list (a simple numbered list for tracking ancestors).
Initial Fan Chart (partial) Failure
My first trial was with an image of a fan chart. As beautiful as these circular wonders are, their curved text proved a challenge for OCR. Though GPT-4V got much correct, the curvature of the text near the center of the chart made it difficult to capture the names accurately.
The results showed promise, but were not immediately useful. And that is typical. I NEVER get a prompt perfect on my first attempt, and it often requires several iterations of prompt refinements to get the quality of result desired. You can see from the results below that the AI did fine with the text that wasn’t curved, but it had trouble with the curved text near the center of the fan chart.
I strongly suspect this failure could be fixed, but my interest last night was to quickly find a use case that worked on a first attempt. So I moved on to a simpler challenge: a screenshot of a pedigree chart.
Success: Pedigree Charts and the Ahnentafel System
Recognizing the limitations, I shifted focus to pedigree charts, which present data in a more linear fashion. I first had ChatGPT note the value of the Ahnentafel system, a numerical method to track ancestors; this review has the effect of giving the AI a reminder of how an Ahnentafel list might be composed. Using this system, we set out to capture data from a pedigree chart and format it in an Ahnentafel list.
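For readers new to the numbering, the standard Ahnentafel convention is that the subject is number 1, the father of person n is 2n, and the mother is 2n + 1. Here is a minimal Python sketch of that arithmetic; the function names are my own, chosen for illustration, and are not part of any genealogy library.

```python
def father_of(n: int) -> int:
    """Ahnentafel number of the father of person n."""
    return 2 * n


def mother_of(n: int) -> int:
    """Ahnentafel number of the mother of person n."""
    return 2 * n + 1


def child_of(n: int) -> int:
    """Ahnentafel number of the child through whom person n is an ancestor."""
    if n < 2:
        raise ValueError("person 1 is the subject and has no child in the list")
    return n // 2


def generation_of(n: int) -> int:
    """Generation of person n: 1 for the subject, 2 for parents, 3 for grandparents."""
    return n.bit_length()


def relationship_path(n: int) -> list[str]:
    """Steps from the subject to ancestor n, read from the binary digits of n.

    After the leading 1, each 0 means "father" and each 1 means "mother".
    """
    return ["father" if bit == "0" else "mother" for bit in bin(n)[3:]]


if __name__ == "__main__":
    # Person 5 is the mother of person 2, who is the father of the subject.
    print(relationship_path(5))
```

This is why a plain numbered list can preserve the relationships shown in a pedigree chart: the number alone says who is the parent of whom.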
Here is the prompt I used; with GPT-4V an image can also be uploaded with the prompt. The screenshot above was included with this prompt:
PROMPT: Okay, I've got a simpler chart. First, tell me what you know about the Ahnentafel naming system. Think, then, too, about how the data in an image of a pedigree chart could be extracted via OCR and placed and stored in an Ahnentafel list. Find the attached image of a pedigree chart, extract the names, dates, places, and relationships, and place and store them in a Ahnentafel list (plain text is fine).
I was very pleased with the response. No, that’s an understatement. I was blown away by the response, and on a first attempt:
The good news: ChatGPT (GPT-4V) OCR can effectively interpret an image of a pedigree chart, extracting the data and storing it accurately in an Ahnentafel file while preserving the relationship information inherent in the pedigree chart. All details accurate; no hallucinations.
This is significant. Because it is a relatively trivial task to then convert an Ahnentafel file to a GEDCOM, database, spreadsheet, or text file, the information in the image is now almost ready for import into your genealogy program (RootsMagic, Family Tree Maker, Gramps, etc.), Excel or Google Sheets, GDAT, Word, or a simple text editor.
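To give a sense of how small that conversion step can be, here is a rough Python sketch that turns a plain-text Ahnentafel list into spreadsheet-ready CSV. It assumes each line looks like "number, period, then free text", which may not match what ChatGPT returns for your chart, so the pattern would need adjusting. The names in the sample are invented placeholders, not data from my chart.

```python
import csv
import io
import re

# Assumed line shape: "<number>. <free text>". Real output may differ.
LINE_RE = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")


def parse_ahnentafel(text: str) -> dict[int, str]:
    """Map each Ahnentafel number to the text that follows it."""
    people = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            people[int(match.group(1))] = match.group(2)
    return people


def to_csv(people: dict[int, str]) -> str:
    """Write one row per person, with parent numbers when they are in the list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["number", "details", "father_number", "mother_number"])
    for n in sorted(people):
        father = 2 * n if 2 * n in people else ""
        mother = 2 * n + 1 if 2 * n + 1 in people else ""
        writer.writerow([n, people[n], father, mother])
    return buffer.getvalue()


# Invented sample data for illustration only.
SAMPLE = """\
1. Alex Example, b. 1950
2. Blake Example, b. 1920
3. Casey Sample, b. 1922
5. Drew Placeholder, b. 1895
"""

if __name__ == "__main__":
    print(to_csv(parse_ahnentafel(SAMPLE)))
```

The resulting text can be saved with a .csv extension and opened in Excel or Google Sheets. A GEDCOM export would take more work, since that format has its own record structure.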
Data Extraction On-the-Go, with Your Phone
What’s more, you can do this on your phone! Here, with my smartphone, I took a picture of my laptop screen while a pedigree chart was displayed; GPT-4V correctly extracted the names, dates, and relationships from the photo, and then quickly presented them in a loose narrative report. The AI even correctly picked up on, and commented about, the possibility of pedigree collapse and/or multiple relationships. All details accurate; no hallucinations. (You can see the full-size image here.)
Next Steps: More Tests; Implications; Possibilities
Next on my list: images of charts on paper, and neatly handwritten pedigree charts, etc.
Last night’s demo or proof-of-concept of extracting and saving biographical data in a genealogy-friendly format which preserves relationship information (the Ahnentafel file) from a picture or screenshot also suggests both clear implications and coming possibilities. A clear implication is that it is now much easier to get information off a printed page and onto the computer in a way that is genealogically meaningful because of the preservation of relationship information (inherently, the pedigree chart depicts who are the parents of whom, and this is captured and saved). A coming possibility suggests itself when we remember that API access to GPT-4V is coming, which means that we will be able to build apps and tools that process folders of our saved images and photos, or perhaps ask an AI assistant to do that for us.
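As a sketch of what such a folder-processing tool might look like, the Python below walks a folder of images and saves one text file per image. Since the GPT-4V API is not yet available, the actual image-to-Ahnentafel step is left as a function you pass in; `extract` here is a hypothetical placeholder, not a real API call, and the output file naming is my own assumption.

```python
from pathlib import Path
from typing import Callable

# File types treated as chart images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def process_folder(folder: Path, extract: Callable[[Path], str]) -> list[Path]:
    """Run `extract` on every image in `folder` and save each result as text.

    `extract` is a stand-in for whatever eventually turns an image into an
    Ahnentafel list (for example, a future vision API call).
    """
    written = []
    # sorted() reads the directory listing once, before any output is written.
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        output = image.with_name(image.stem + ".ahnentafel.txt")
        output.write_text(extract(image), encoding="utf-8")
        written.append(output)
    return written


def fake_extract(image: Path) -> str:
    """Placeholder extractor that only records the file name."""
    return f"1. (data extracted from {image.name})\n"


if __name__ == "__main__":
    for path in process_folder(Path("."), fake_extract):
        print("wrote", path)
```

Keeping the extraction step as a passed-in function means the folder loop does not need to change once a real vision API is available.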
Setting aside future possibilities, there are exciting days coming up now as we test other image use cases and work out solutions to limits such as those encountered with the fan chart. And folks will immediately find helpful this use case of converting an image of a pedigree chart to an Ahnentafel file.
If you are a ChatGPT Plus user, here is how you will know that GPT-4V has been rolled out to your account (a process that OpenAI has said will take a couple of weeks). On your computer, tablet, or smartphone, look for a new image/picture icon near your prompt window. Here is what it looks like on a computer:
And on your phone, it looks like this: