OpenAI’s New Model GPT-4o: Game-Changer for Free AI Access, Possible Handwritten Text Recognition (HTR) Advance

Today’s review of GPT-4o includes:
● a general overview of the tool,
● a closer examination of the model’s ability to recognize handwritten text, and
● a failure to confirm reported improvement of text rendering in images.

New model is fast, free, and improved

On Monday 13 May 2024, OpenAI made some waves with the announcement of their latest AI model release, GPT-4o (“o” for Omni, a nod to the integration of several models for text, image, and audio), a move that seems strategically timed to overshadow competitors. The most significant beneficiaries of this release are undoubtedly the free users. In a move that disrupts the status quo, OpenAI is rolling out features previously reserved for ChatGPT Plus subscribers. Over the next week, free-tier users will gain access to:

  • GPT-4 level intelligence
  • Web-integrated responses
  • Data analysis and chart creation
  • Image-based interactions
  • File uploads for summarizing, writing, or analyzing
  • GPT Store and custom GPT exploration
  • Memory-enhanced interactions

While paying subscribers receive some updates, including higher usage limits for GPT-4o, the real spotlight is on free users. OpenAI describes GPT-4o as their new flagship model, boasting modest improvements over GPT-4-Turbo, primarily in speed and cost-efficiency.

This move raises questions about the value of the ChatGPT Plus subscription. The ability to create and share custom GPTs remains a premium feature and might retain some subscribers, but OpenAI will need to offer more to justify the cost. There’s already speculation about a rumored GPT-4.5o release in the coming weeks (i.e., an interim release for paid subscribers until GPT-5 is unveiled, perhaps after the November 2024 U.S. presidential election), but we’ll see if that actually materializes.

In the AI arena, the competition is heating up. OpenAI’s latest release might seem like a reactionary measure to competitor announcements, but it’s also a proactive step in staying ahead. The improvements in GPT-4o, particularly its speed, hint at the future potential for AI agents. Despite the lack of immediate breakthroughs in reasoning or memory, the accelerated response times are a significant leap forward.

Ethan Mollick, a notable figure in AI circles, highlighted the practical implications of today’s announcement. By removing the financial barrier to accessing GPT-4o, OpenAI is set to accelerate global adoption and address longstanding equity issues in education. This move could democratize AI, allowing more people to experiment with and benefit from these advanced tools.

As we navigate this season, the rapid advancements and strategic plays by industry leaders promise an exhilarating few weeks ahead. Whether you’re a seasoned AI enthusiast or a curious newcomer, the landscape is evolving faster than ever, and OpenAI’s latest moves ensure they remain at the forefront of this exciting journey.

Reports of some improvement in HTR are confirmed

The president and a co-founder of OpenAI, Greg Brockman, amplified early claims from some researchers that GPT-4o has improved handwritten text recognition (HTR) abilities, retweeting a post from Twitter user “Generative History” (@HistoryGPT), who made the claim, “GPT-4o is truly remarkable on 18th handwriting. I gave it the following letter and asked it for a transcription. A couple of very minor errors…amazing!”

I tested this claim.

To investigate the claim, I used FamilySearch’s Full-Text Search to find a handwritten probate file mentioning my third-great-grandfather. I manually created an accurate transcription of the file. Then I compared three AI-powered transcriptions against it: the one provided by FamilySearch, one from the previous best OpenAI model, GPT-4, and finally one from GPT-4o.

Model          Errors
FamilySearch   22
GPT-4          17
GPT-4o         9

The transcript provided by FamilySearch contained 22 errors; the transcript provided by GPT-4 was only marginally better, with 17 errors. But the transcript provided by the new GPT-4o contained only nine errors.
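For readers who want to repeat this kind of comparison on their own documents, here is a minimal sketch of one way to count errors automatically. It is an assumption on my part, not the method used for the counts above: it scores a machine transcript against a manual reference using word-level edit distance (each substituted, missing, or extra word counts as one error). The function name `word_errors` is hypothetical, and the sample "machine" line is invented for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance between two transcripts.

    Counts the minimum number of word substitutions, insertions, and
    deletions needed to turn the hypothesis into the reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


# Reference text is from the manual transcription of the probate file;
# the hypothesis is an invented example of machine output.
reference = "Pursuant to an order of the Superior Court of Ashe County"
hypothesis = "Pursuant to an order of Superior Count of Ashe County"
print(word_errors(reference, hypothesis))  # 2: one dropped word, one misread word
```

This simple version treats differences in capitalization and punctuation as errors, so you may want to normalize both transcripts first if those differences do not matter to you. Taking the counts in the table at face value, nine errors instead of 22 is roughly a 59% reduction relative to FamilySearch, and nine instead of 17 is roughly a 47% reduction relative to GPT-4.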

Handwritten text recognition is hard, and a small handful of tests is not adequate to confirm an advance. But these cursory evaluations are encouraging, at least encouraging enough to bring to the attention of the community for further scrutiny. Between FamilySearch’s rich trove of resources and OpenAI’s free access to GPT-4o, researchers can explore this possible advance in HTR for themselves.

Less encouraging results with text rendered in images

In early January, during my talk about AI and genealogy, perhaps feeling the hope of the New Year, I made two predictions about advances in AI technology that I expected were reasonable to achieve in 2024. The first was that dead-easy drag-and-drop audio-to-text transcription would emerge; this capability has existed in many forms for a while, but none are free, dead-easy, or quick, yet the technology seems just on the cusp of greater accessibility. I am not suggesting that GPT-4o advances this goal. My second prediction was that the rendering of text in AI-generated images would be perfected this year; currently, text in AI-generated images is too unreliable to be consistently useful. This problem is reminiscent of the trouble AI image generators had in 2022 with drawing hands: you could have any number of fingers on a hand except five. That problem, however, was solved in the early summer of 2023; image generators now consistently render hands with the appropriate number of fingers. My prediction was that just as hand rendering was solved, so would the rendering of text in images be solved in 2024.

Some early claims have been made that the newly released GPT-4o has solved the problem of rendering text in images.

I’m not so sure.

To test the claim, I prompted GPT-4o to recall the beginning of Lincoln’s Gettysburg Address, which, of course, it did correctly. But then I prompted GPT-4o to “Create a piece of folk art with the text of the beginning of the Gettysburg Address being the focus and subject of the piece.” Here is the result:

[Image: GPT-4o’s folk art rendering of the beginning of the Gettysburg Address]

This test was a spectacular failure.

Nevertheless, I remain hopeful. I believe it is still reasonable to expect that these two capabilities will be achieved in 2024.

Regardless, the pace at which new emergent capabilities are being discovered in generative AI models continues to accelerate. So I remain optimistic that generative artificial intelligence will continue to evolve rapidly, and that genealogists and family historians will continue to discover new usefulness and efficiency with these tools.

If you discover a new way to use these tools for family history and genealogy, please let me know in the comments. Or, better yet, join the community of nearly 7,000 folks following this area at Blaine Bettinger’s Facebook group “Genealogy and Artificial Intelligence.” I hope to see you there.


“Ashe, North Carolina, United States records,” images, FamilySearch (https://www.familysearch.org/ark:/61903/3:1:33S7-9PT3-9MZ1?view=explore : May 14, 2024), image 1110 of 1768; North Carolina. Division of Archives and History.
North Carolina
Ashe County

Pursuant to an order of the
Superior Court of Ashe County, directed to me I
have the honor to report that on the 14th December 1872
I proceeded to sell to the highest bidder on the
premises, one tract of land known as the Price Land containing sixty one acres
more or less, lying on the waters of the North
Fork of New River in Ashe County. The property
belonging to the heirs of Hugh Smith dec'd
and at said sale Mathias Little became
the last and highest bidder at the price of
two hundred and fifty two dollars and
executed his bond with Isaac Little as
security, due 14th Dec 1873, and made pay
able to me. That said sale was duly admitted
and in all respects fair and that the land
brought a fair price.

W H Gentry
Guardian
for heirs of Hugh Smith dec'd