New Use Case: Handwriting Recognition to Structured Data

ChatGPT with Vision (GPT-4V) analyzes a handwritten WWII draft card and not only reads the handwriting correctly but also accurately identifies the text fields (name, address, next of kin, date of birth, occupation, etc.). This draft card, from my maternal grandfather Dean Lawrence (1921-2003), was chosen for its average block-print handwriting; testing will continue to determine how well GPT-4V recognizes cursive handwriting, but it appears to handle average print well. Again, though, the innovation is that the AI correctly labels the data: earlier OCR could process simple block handwriting, but it couldn’t tell you whether the text was a name, an address, or an occupation. Now, imagine a simple Python script that processes a folder of 20 or 200 or 2000 images like this for you; that will be a trivial task once OpenAI makes API access available for GPT-4V, which is expected. Put more simply, with middle-school programming skills, tasks such as these will be quickly and easily automated.
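To make the "folder of images" idea concrete, here is a minimal sketch of what such a script might look like. It is only an outline under stated assumptions: the function names (process_folder, fake_extract) and the list of image extensions are my own inventions for illustration, and since API access to GPT-4V is not yet available, the step that actually reads a card is left as a placeholder that you would swap out later.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumption: these are the image types in the folder; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def process_folder(
    folder: str,
    extract_fields: Callable[[Path], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Run extract_fields on every image in folder and collect the results."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            record = dict(extract_fields(path))
            # Remember which image each record came from.
            record["source_file"] = path.name
            records.append(record)
    return records


def fake_extract(path: Path) -> Dict[str, str]:
    """Hypothetical placeholder for a future vision-model call."""
    return {"Full name of the registrant": "(not yet extracted from " + path.stem + ")"}
```

Calling process_folder("draft_cards", fake_extract) would return one dictionary per image in a folder named draft_cards (a made-up folder name); the real work would happen in whatever function eventually replaces fake_extract.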

PROMPT: You are a professional genealogist; your area of expertise is documents and records; you are adept at extracting every piece of information from a document or record. You are also a skilled writer and image analyst. Think now about a World War II Draft Registration Card, also called DDS Form 1: list every conceivable piece of information that might be found on a draft card (focus especially on names, relationships, dates, places, events, and facts).

Then, find attached a draft card. Extract every piece of information from the image of the draft card.

Finally, create a biographical narrative report from the text on the draft card; dry recitation, no editorializing.

A couple of points to highlight:

  • you can assign multiple roles simultaneously to the AI: you can prompt the AI to act not only as an expert genealogist, but at the same time to act as a skilled writer and image analyst.
  • you can ask for several tasks, essentially stepping ChatGPT through a process.
  • you can ask for a narrative summary of the text in the image, e.g., “create a biographical narrative report from the text on the draft card; dry recitation, no editorializing.”
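
If you find yourself reusing this pattern (several roles, then several tasks in order), the prompt can be assembled from its parts. The following is just one possible sketch; build_prompt is a hypothetical helper name, not something ChatGPT provides, and the role and task wording is whatever you choose to pass in.

```python
from typing import List


def build_prompt(roles: List[str], tasks: List[str]) -> str:
    """Combine several roles and an ordered list of tasks into one prompt."""
    # Each role becomes its own "You are ..." sentence.
    role_text = " ".join(f"You are {role}." for role in roles)
    # Tasks are kept in order, one paragraph per step.
    task_text = "\n\n".join(tasks)
    return f"{role_text}\n\n{task_text}"


prompt = build_prompt(
    roles=["a professional genealogist", "a skilled writer and image analyst"],
    tasks=[
        "Extract every piece of information from the image of the draft card.",
        "Finally, create a biographical narrative report from the text on the "
        "draft card; dry recitation, no editorializing.",
    ],
)
```

Keeping roles and tasks as separate lists makes it easy to reuse the same roles with a different document type.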

Here was the response:

RESPONSE:

Information Extracted from the Attached Draft Card:
1. Serial Number: 234
2. Full name of the registrant: Warren Dean Lawrence
3. Place of residence: West Jefferson, Ashe, NC
4. Mailing address: Same as residence
5. Telephone number: Not provided on this card
6. Age in years: 20
7. Date of Birth: May 21, 1921
8. Place of Birth: West Jefferson, USA
9. Name and address of someone who will always know the registrant's address: Mrs. Warren Dean Lawrence, West Jefferson, NC
10. Employer's name and address: Partially illegible but appears to reference a mining company
11. Place of employment or business: West Jefferson, Ashe, NC
12. Signature of the registrant: Appears to be "Dean Lawrence" or a similar variant
13. Date of registration: Not clearly visible on this card

Biographical Narrative Report:
Warren Dean Lawrence, residing in West Jefferson, located in Ashe County, North Carolina, registered for the World War II draft. Born on May 21, 1921, in West Jefferson, USA, he was 20 years old at the time of registration. His mailing address was the same as his place of residence. The individual who would always be aware of Warren's address was Mrs. Warren Dean Lawrence, also of West Jefferson. Warren's employment details reference what appears to be a mining company located in West Jefferson, Ashe, North Carolina. The draft card carries what seems to be his signature, "Dean Lawrence" or a similar variant.
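
The numbered list in the response above is regular enough ("number, field label, colon, value") that a short script could turn it into labeled data. Here is a rough sketch, assuming every field line follows that pattern; parse_numbered_fields is my own illustrative name, and a response formatted differently would need a different pattern.

```python
import re
from typing import Dict

# Assumption: each field line looks like "7. Date of Birth: May 21, 1921".
LINE_PATTERN = re.compile(r"^\s*\d+\.\s*(?P<field>[^:]+):\s*(?P<value>.*)$")


def parse_numbered_fields(response_text: str) -> Dict[str, str]:
    """Collect 'N. Field: Value' lines into a field -> value dictionary."""
    fields = {}
    for line in response_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # Headings and narrative lines do not match and are skipped.
            fields[match.group("field").strip()] = match.group("value").strip()
    return fields


sample = """Information Extracted from the Attached Draft Card:
1. Serial Number: 234
2. Full name of the registrant: Warren Dean Lawrence
7. Date of Birth: May 21, 1921"""
fields = parse_numbered_fields(sample)
```

Lines that do not begin with a number, such as the heading, are simply ignored.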

Prompt Chain to process data further

Once we have the “Information Extracted from the Attached Draft Card,” that information can be standardized, so that if you have many sources of this type (in this example, a draft card), the data can be collected, compiled, compared, and analyzed. For example, if you had 20 of these cards, you may wish to move that extracted data into a spreadsheet or database. Saving the data for each record as a CSV file (Comma-Separated Values, a widely used file format) allows us to do this simple task. In this instance, I followed up the first prompt with a second prompt; using the results of one task as the input for a subsequent task is called prompt chaining. Here was my second prompt:

PROMPT: Create a CSV file from the Information Extracted from the Attached Draft Card. Use the format: "FIELD NAME","VALUE", where FIELD NAME is the field label used in the original image and VALUE is the value extracted from the image, such as "NAME","Warren Dean Lawrence". Display the response in a code window for easy copy-and-paste.

ChatGPT responds with the data in the format of a CSV file, and places that information in a code window, which makes copying-and-pasting easy (just click the “📋 Copy code” icon). One beta mode of ChatGPT, called “Advanced Data Analysis” (formerly “Code Interpreter”), will create an actual CSV file and present you with a download link, but for now we can only use one beta feature at a time; expect that to get better in time.
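
For comparison, the same "FIELD NAME","VALUE" layout can be produced with Python's built-in csv module once the extracted fields are in hand. This is a small sketch rather than what ChatGPT does internally; fields_to_csv is a hypothetical name, and writing a header row first is my own choice.

```python
import csv
import io
from typing import Dict


def fields_to_csv(fields: Dict[str, str]) -> str:
    """Render field/value pairs as CSV text with every cell quoted."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps each cell in double quotes, matching the prompt's format.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["FIELD NAME", "VALUE"])
    for name, value in fields.items():
        writer.writerow([name, value])
    return buffer.getvalue()


csv_text = fields_to_csv({"NAME": "Warren Dean Lawrence"})
```

Quoting every cell keeps values that contain commas, such as "West Jefferson, Ashe, NC", in a single column when the file is opened in a spreadsheet.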

Up Next: Clean, simple cursive handwriting on a draft card

There remain many document types to test. I’ve been testing more challenging samples to discover where the limit of ChatGPT’s handwriting recognition lies. Up next is a clean, simple cursive script on a form such as this draft card. Ultimately, however, we will test handwritten documents such as letters, diaries, journals, court orders, and probate files. Handwriting recognition is hard, so I expect the limit might be discovered sooner than anticipated, for the time being.


For the record, here is a screenshot of the beginning of this ChatGPT conversation:
