GPT-4: A Revolutionary Tool for Genealogists and Family Historians
I had fun making a couple of discoveries this week while exploring the new GPT-4. As someone with lifelong interests in the worlds of linguistics, language, computers, programming, writing, storytelling, genealogy, and family history, it has been surreal to see these interests come together, creating a truly transformative experience. This blog post explores the capabilities of GPT-4 in processing GEDCOM files, generating reports from GEDCOM data, matching narrative styles to setting and location, and including accurate source citations in reports.
GPT-4 and GEDCOM Files (Write Stories from Family Trees, and Create Family Trees from Stories!)
GEDCOM, or Genealogical Data Communication, is a plain text file format used by genealogy software to store and exchange family tree data. GPT-4 has proven its ability to create, read, and interpret GEDCOM files, even from simple prompts. For example, PROMPT: “Create a GEDCOM file for John Smith, born 1863, father unknown, mother was Riley Bower.”
The AI was able to generate a GEDCOM file based on minimal input. I was then able to import it successfully into Gramps, a genealogy database program. (The GEDCOM should also work with Family Tree Maker, RootsMagic, or ANY genealogy software from the past couple of decades.)
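For readers who have never opened a GEDCOM in a text editor, here is a rough sketch of what a file for that prompt might look like, built by hand in Python. This is my own illustration, not the file ChatGPT produced; the record IDs (@I1@, @I2@, @F1@) and the "HandWrittenExample" source name are made up, and a stricter importer may want more header fields than this.

```python
def build_minimal_gedcom() -> str:
    """Return a tiny GEDCOM 5.5.1-style file for John Smith and his mother."""
    lines = [
        "0 HEAD",
        "1 SOUR HandWrittenExample",  # hypothetical name for the producing program
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "0 @I1@ INDI",                # John Smith, the child
        "1 NAME John /Smith/",
        "1 BIRT",
        "2 DATE 1863",
        "1 FAMC @F1@",                # child in family F1
        "0 @I2@ INDI",                # Riley Bower, the mother
        "1 NAME Riley /Bower/",
        "1 SEX F",
        "1 FAMS @F1@",                # spouse/parent in family F1
        "0 @F1@ FAM",                 # no HUSB line: the father is unknown
        "1 WIFE @I2@",
        "1 CHIL @I1@",
        "0 TRLR",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_minimal_gedcom(), end="")
```

Each line starts with a level number, and the family record is what ties the mother to the child. An unknown father is simply left out of the family record.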
Writing Reports from GEDCOM Data
Moreover, GPT-4 can write family stories and narratives based on GEDCOM data. For example, I exported a stripped-down four-generation GEDCOM from a RootsMagic file (that is, the export had no sources or notes–for now). Then, I dumped the text of the GEDCOM into ChatGPT (model GPT-4):
Included with the raw GEDCOM data was a simple PROMPT: “Use the information in the file to write a biographical narrative about the ancestors of Mont Little.” ChatGPT responded with a perfectly punctuated narrative report.
Matching Writing Styles to Location and Setting
In another test, GPT-4 produced a flawless descendancy narrative for William Harrison Goodman and his descendants, demonstrating its reliability and accuracy in handling genealogical information. ChatGPT relied only on the information in the GEDCOM file and did not hallucinate facts. But the writing style can be a bit dull, dry, and academic. So I asked ChatGPT to spice up the style.
GPT-4’s versatility extends to its ability to adapt its writing style according to specific requests. For instance, when asked to rewrite a narrative in a folksy, conversational, and upbeat style reflecting rural Appalachia, GPT-4 delivered as expected. This feature allows for a more engaging and personalized reading experience when working with genealogical narratives.
For example, after generating a perfectly functional report, albeit a bit dry, I asked ChatGPT to try again, with the PROMPT: “Re-write that to have the narrative style mirror and echo the setting: rural Appalachia; that is, make the style folksy, conversational, and upbeat.”
Some care needs to be taken that the AI is only using what’s in your GEDCOM for the underlying names, dates, places, and events (assuming your GEDCOM is well-sourced and solidly researched). Your first PROMPT would be something like, “Using only the information in the GEDCOM below, write the story of Joe Sixpack.” Then, after it’s generated a basic story, you can add style by prompting it, “Re-write that with a tone that is X, Y, and Z and a style that is A, B, and C.” For example, “Re-write that in a tone that is reflective and somber yet grateful and a style that is clipped and direct yet with an elevated vocabulary.” Taking two steps (basic story first, then style) is helpful because today’s AIs are “spicy autocompletes” and will invent facts (“hallucinate” is the jargon some use) unless given a narrow task (e.g., “using the GEDCOM below”). You want the AI to spice up the style of the writing, not the lives of your ancestors (by hallucinating events that never happened).
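If you find yourself repeating this two-step routine, the prompts are easy to template. The sketch below only builds the two prompt strings; it does not talk to ChatGPT, and the function names (fact_prompt, style_prompt, join_words) are my own inventions for illustration.

```python
from __future__ import annotations


def join_words(words: list[str]) -> str:
    """Join descriptors as 'X', 'X and Y', or 'X, Y, and Z'."""
    if not words:
        raise ValueError("need at least one descriptor")
    if len(words) == 1:
        return words[0]
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def fact_prompt(person: str, gedcom_text: str) -> str:
    """Step 1: a narrow, facts-only prompt with the GEDCOM pasted below it."""
    return (
        "Using only the information in the GEDCOM below, "
        f"write the story of {person}.\n\n{gedcom_text}"
    )


def style_prompt(tone: list[str], style: list[str]) -> str:
    """Step 2: ask for a restyle of the story already generated."""
    return (
        f"Re-write that with a tone that is {join_words(tone)} "
        f"and a style that is {join_words(style)}."
    )


if __name__ == "__main__":
    print(fact_prompt("Joe Sixpack", "0 HEAD\n0 TRLR"))
    print(style_prompt(["folksy", "conversational", "upbeat"], ["plain"]))
```

Keeping the steps as separate prompts means the facts are settled before any style is requested, which is the whole point of the two-step approach.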
ChatGPT (model GPT-4) Generates Footnote and Endnote Citations from Sources Included in a GEDCOM File
As GPT-4 can successfully read and interpret relationships in GEDCOM files, the next step is to include sources in the GEDCOM export and ask the AI to generate footnotes and citations. It passes this proof of concept, too. A small, two-generation GEDCOM file was created for my grandmother, Ruby Helen Bower, including her parents and her husband. Also included in the GEDCOM were source citations for her birth, marriage, and death certificates (these were NOT full citations, but enough information to demonstrate proof of concept). ChatGPT (model GPT-4) first wrote a plain narrative biographical report based on the GEDCOM file, which it did fine. Then, I instructed it to use Chicago Manual of Style format to generate inline superscript reference note numbers and provide their corresponding reference notes or source citations as endnotes. It worked!
[EDIT: See below for correction and clarification from Elizabeth Shown Mills concerning the wording of the original PROMPT. This original wording is left in the screenshot as first captured in initial testing.]
PROMPT: “Re-write the narrative report to include inline superscript reference note numbers and provide their corresponding reference notes or source citations as endnotes.”
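One way to double-check the AI’s endnotes is to pull the source information out of the GEDCOM yourself and compare. The sketch below assumes the common layout where an event cites a source with a level-2 SOUR pointer and the source record carries a level-1 TITL line; real exports vary, and the sample data here is a made-up placeholder, not my grandmother’s file.

```python
SAMPLE = """
0 @I1@ INDI
1 NAME Jane /Example/
1 BIRT
2 DATE 1 JAN 1900
2 SOUR @S1@
1 DEAT
2 DATE 2 FEB 1980
2 SOUR @S2@
0 @S1@ SOUR
1 TITL Birth certificate (placeholder)
0 @S2@ SOUR
1 TITL Death certificate (placeholder)
0 TRLR
"""


def fields(raw):
    """Split a GEDCOM line into (level, rest); return None for non-GEDCOM lines."""
    parts = raw.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    return int(parts[0]), parts[1:]


def source_titles(gedcom_text):
    """Map each source record ID (e.g. '@S1@') to its TITL text."""
    titles, current = {}, None
    for raw in gedcom_text.splitlines():
        parsed = fields(raw)
        if parsed is None:
            continue
        level, rest = parsed
        if level == 0:
            # "0 @S1@ SOUR" opens a source record; any other level-0 line closes it
            current = rest[0] if len(rest) == 2 and rest[1] == "SOUR" else None
        elif level == 1 and current and rest[0] == "TITL":
            titles[current] = rest[1] if len(rest) == 2 else ""
    return titles


def event_citations(gedcom_text):
    """List (event tag, source ID) pairs in file order."""
    cites, event = [], None
    for raw in gedcom_text.splitlines():
        parsed = fields(raw)
        if parsed is None:
            continue
        level, rest = parsed
        if level == 0:
            event = None
        elif level == 1:
            event = rest[0]
        elif level == 2 and event and rest[0] == "SOUR" and len(rest) == 2:
            cites.append((event, rest[1]))
    return cites


def endnotes(gedcom_text):
    """Number the citations the way an endnote list would."""
    titles = source_titles(gedcom_text)
    return [
        f"{n}. {event}: {titles.get(xref, 'UNKNOWN SOURCE')}"
        for n, (event, xref) in enumerate(event_citations(gedcom_text), start=1)
    ]


if __name__ == "__main__":
    print("\n".join(endnotes(SAMPLE)))
```

If the AI’s endnotes mention a source that this list does not, that is a good sign it invented something.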
While the current ChatGPT version has a limit of about 1500 words for input and output, GPT-4 will soon allow processing of up to 50 pages at a time. This increased capacity will enable the handling of larger GEDCOM files, potentially encompassing hundreds or even thousands of people.
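Until the larger limit arrives, a GEDCOM can be checked against the word budget before pasting it in. The sketch below splits a file at its level-0 records and groups them into batches under a word count; the 1500 default is just the rough figure mentioned above, whitespace-separated words are only an approximation of how the AI counts, and the function names are my own. Be aware that splitting can separate a person from the family record that links them to relatives.

```python
def split_records(gedcom_text):
    """Split GEDCOM text into top-level records (each starts with a '0 ' line)."""
    records = []
    for line in gedcom_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("0 ") or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return ["\n".join(r) for r in records]


def batch_records(records, max_words=1500):
    """Group whole records into batches that stay within max_words where possible.

    A single record longer than max_words still gets a batch of its own.
    """
    batches, current, count = [], [], 0
    for rec in records:
        n = len(rec.split())
        if current and count + n > max_words:
            batches.append("\n".join(current))
            current, count = [], 0
        current.append(rec)
        count += n
    if current:
        batches.append("\n".join(current))
    return batches


if __name__ == "__main__":
    text = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME A /B/\n0 TRLR"
    for i, batch in enumerate(batch_records(split_records(text), max_words=10), 1):
        print(f"--- batch {i}: {len(batch.split())} words ---")
        print(batch)
```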
Conclusion and Future Possibilities
GPT-4 is proving itself as a powerful tool for genealogists and family historians, capable of reading and interpreting GEDCOM files, generating family stories and reports, and adapting writing styles to suit specific settings. As GPT-4’s processing capacity increases, it will become an even more valuable resource for genealogy enthusiasts, accommodating larger GEDCOM files and opening up new possibilities for research and storytelling.
In the meantime, users can explore working with smaller GEDCOM files, customizing exports according to their genealogy software, and experimenting with the AI’s capabilities. The convergence of genealogy, technology, and storytelling is an exciting development, and GPT-4 is at the forefront of this revolution.
I’m especially looking forward to text extraction of visual records, such as birth, marriage, and death certificates. And testing its handwriting recognition will be exciting. If you’re like me, your computer might have a folder full of record images (birth, marriage, death certificates, etc.); it’s not hard to imagine asking an AI script to process that folder of records and generate a sourced GEDCOM file suggesting the relationships between everyone mentioned in the folder.
It’s going to be a crazy year.
[CLARIFICATION AND CORRECTION concerning the original citation prompt]: I was glad to receive feedback from Elizabeth Shown Mills concerning the wording of my prompt to have ChatGPT include citation information. She wrote: “Stephen, there’s a misunderstanding expressed in your prompt. CMOS is not ‘the style used to document source citations in genealogical writing.’ Basic citations (i.e., those to printed works) can follow MLA, CMOS’s humanities style (as opposed to CMOS’ scientific style), Blue Book, or various others. Most citation guides structure citations to publications in much the same way. However, they work only for printed works and some basic materials in formal archives. None of these offer usable citations to the plethora of original documents that are essential to genealogy. That said, Chicago Manual of Style IS the *style* guide commonly used by genealogists for issues such as punctuation, numbering, abbreviation, quotation, alphabetizing, indexing, etc. (these being the issues that cover 14 of CMOS’s 16 chapters).”
I can also confirm that running the prompt without mentioning any specific style guide, stating only the desire to “generate inline superscript reference note numbers and provide their corresponding reference notes or source citations as endnotes,” was enough to nudge ChatGPT to include the simple, sample source information in the GEDCOM:
PS: Getting my citation knuckles rapped by Elizabeth Shown Mills is, truly, one of the greatest thrills of my genealogical life.