Intelligence And The Amazing Promise Of Large Geospatial Models

In today’s column, I examine the advancement of large language models (LLMs) toward a new generation of large geospatial models (LGMs), an exciting and innovative frontier for the advent and extension of AI.

Here’s the deal. Humans possess and greatly rely upon a sense of geospatial awareness and related reasoning. We wrap that capability into our other powers of intelligence. Some would argue that the two go hand-in-hand, namely that to some degree our intelligence is spurred by our geospatial knack, and that the ability to discern geospatial facets feeds strongly into overall intelligence.

Conventional generative AI and LLMs don’t particularly have any such geospatial capacity. You might say that this is a vital missing component. To solve this, we ought to leverage LLMs and connect them with, or eventually intermix them with, geospatial capabilities. The catchy moniker for that enterprising combination is large geospatial models, or LGMs.

Boom, drop the mic.

Let’s talk about it.

This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here).

Human Geospatial Models That We Know And Love

I shall begin at the beginning.

Humans have mental geospatial models. You walk into a room and instantly start a visual scan to discern a myriad of crucial factors. How large is the room? Are there windows? Where does the room go? How far is it from where you are standing to the nearest exit from the room? On and on this goes.

Within a split second, your mind takes in an immense amount of visual information and stitches it together to form a mental model of where things are. You might not know the exact measurements, but you can generally gauge distances and sizes. Furthermore, cunningly, even if you can’t readily see some parts of the room, your mind fills in the missing elements with a rough estimation of what might be there.

We usually take all of this for granted.

If you watch a young child walk into a room, they will often stand still for a moment as they try to do the same thing. Their minds have not reached the same level of geospatial reasoning as yours and thus it takes them longer to figure stuff out. Indeed, they might not be able to do complex mental calculations like estimating what resides on the other side of a partition in the room or whether the room is twenty feet wide versus a hundred feet wide.

When my children were quite young, I was fascinated that they seemed to think an object disappeared when it was moved behind another one. If you took a chair and placed it behind a bookcase, this was equivalent to a magic circus act. Where did the chair go? It’s gone!

Adults don’t give much conscious thought to geospatial reasoning during an average day. It just happens in their heads. An interesting thing happens, though, when you find yourself in an unusual environment. For example, suppose you opt to go hiking on a trail you’ve never been on.

Your Spidey sense of geospatial thought goes into high gear. Where does the trail up ahead go? What is around the bend? How far is it from your present position to the next turn? A rush of geospatial considerations floods into your mind.

Some people are more adept at doing geospatial reasoning than others. Various research studies have sought to ferret out the role of general intelligence versus the specifics of geospatial intelligence. It would seem that higher general intelligence will tend to heighten geospatial intelligence, though this doesn’t necessarily follow one-to-one.

Large Geospatial Models Bridging An AI Gap

Shift gears, and let’s discuss generative AI.

You log into a generative AI app such as OpenAI’s ChatGPT and ask the AI questions about the room you are in. Even if the device you are using is armed with a video camera, the unadorned generative AI is not likely to do very well at gauging the geospatial contours of where you are. This just isn’t something the AI has been specifically data-trained and pattern-matched to do.

Aha, we need a specialized geospatial component.

Let’s unpack that concept.

The beauty of generative AI and LLMs is that you can converse with the AI in nearly fluent natural language. Most geospatial systems to date are not set up that way. Oftentimes, you must use arcane commands. The capabilities are hard to invoke, and the interfaces are typically exasperating.

What we’d like to have is the amazing fluency of LLMs plus the amazing geospatial capabilities of a specialized component. Great idea. Well, sure enough, AI researchers have realized that this is a desirable combination. The aim is to make a geospatial model that is either accessed by an LLM or directly intermixed with an LLM.

Another thing to know about generative AI and LLMs is that they are considered large models due to the vast amount of data used to data train them, along with the massive size of the internal pattern-matched data structures (for my detailed explanation of these elements, see the link here). That’s how large language models got their moniker. Everything is in the large, else the fluency that you cherish is undoubtedly going to be clunky (efforts are underway to devise small language models, or SLMs, that perform well though at a smaller scale; see my coverage at the link here).

The same is considered true for geospatial models.

The odds are that if you want a geospatial model that does amazing things, you’ll need to make everything about it on a large scale. With generative AI, you use large swaths of text on the Internet to data train the AI on human writing. With LGMs, you use large collections of spatial images to data train on geospatial facets. Likewise, both LLMs and LGMs have large internal data structures to house and make use of the patterns based on the examined data.

How An LGM Works For Doing What It Does

Do a fun and instructive mental exercise with me.

Imagine that we could somehow collect tons and tons of pictures based on tourists taking snapshots here and there. These tourists visited locales across the United States. We’ve got pictures of everything from New York streets to California beaches. Thousands upon thousands, maybe millions upon millions of pictures. A lot of them.

Pretend that I grouped together the pictures that were of trails at the Grand Canyon and handed you a bunch of said photos. You would likely be able to get a reasonably good impression of what the trails were like.

What might you do with them?

You could estimate how long the trails were and where the rough spots were. Admittedly, there are going to be portions of the trails that aren’t shown in the collected pictures, but you would make your best guess based on what else you’ve seen of the trails. You would mentally stitch them together as best you could.

When I mention that this is geospatial information, I mean to say that not only do we have the pictures as images, but we also have various associated metadata.

For example, the metadata includes the precise latitude and longitude of where the snapshot was taken. We need that kind of additional data to make sure that we are putting together the images in a sensible manner. It is like a jigsaw puzzle with scattered pieces; the images help to piece it all together, but we feel a whole lot more secure having the metadata that goes with the images.
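
To make the jigsaw idea concrete, here is a minimal sketch, in Python, of what a geotagged photo record might hold and how the metadata lets us group snapshots that plausibly show the same patch of ground. The field names, the crude distance approximation, and the 50-meter radius are my own illustrative assumptions, not any particular system’s actual schema.

```python
from dataclasses import dataclass
import math

@dataclass
class GeoPhoto:
    """One tourist snapshot plus the metadata that anchors it in space."""
    image_path: str    # where the raw pixels live
    latitude: float    # degrees, from the camera's GPS
    longitude: float   # degrees, from the camera's GPS
    heading: float     # compass direction the camera faced, in degrees

def distance_meters(a: GeoPhoto, b: GeoPhoto) -> float:
    """Rough ground distance between two snapshots (equirectangular approximation)."""
    lat_m = (a.latitude - b.latitude) * 111_320  # meters per degree of latitude
    lon_m = (a.longitude - b.longitude) * 111_320 * math.cos(math.radians(a.latitude))
    return math.hypot(lat_m, lon_m)

def group_nearby(photos: list[GeoPhoto], radius_m: float = 50.0) -> list[list[GeoPhoto]]:
    """Greedily cluster photos taken within radius_m of a cluster's first photo.
    This is the jigsaw-puzzle step in miniature: the metadata tells us which
    pieces plausibly belong to the same patch of ground."""
    clusters: list[list[GeoPhoto]] = []
    for photo in photos:
        for cluster in clusters:
            if distance_meters(photo, cluster[0]) <= radius_m:
                cluster.append(photo)
                break
        else:
            clusters.append([photo])
    return clusters
```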

All right, you are no longer on the hook to be mentally crafting a semblance of the trails in the Grand Canyon. Go ahead and take a rest from the mental exertion, thanks.

Now, feed those same images and their metadata into an LLM-like pattern-matching capability.

We would already have used many millions, possibly billions, of pictures or images and geospatial data to initially train this specialized LLM, actually an LGM. This means the LGM already has a mathematical and computational semblance of what various types of scenes and geospatial settings are like. A trail at the Grand Canyon is much like a trail at Yosemite, at least in the sense that trails wind their way up and down hills and mountains, are often relatively narrow and lengthy, and are bound to have spots that are less clear-cut than others.

The LGM would then allow you to ask questions about the trails of the Grand Canyon. You might ask where the trail gets really hairy and scarily dangerous. Voila, the LGM would likely be able to answer that question. This might be based on actual images scanned or might be an estimate based on the stitched-together content.
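
No standard LGM interface exists yet, so purely as a sketch, here is how that question-and-answer flow might look programmatically. The `GeospatialModel` class, its `ask` method, and the response fields are hypothetical placeholders, and the stub simply returns a canned reply rather than consulting a real spatial model.

```python
from dataclasses import dataclass

@dataclass
class TrailAnswer:
    """A hypothetical LGM response: the answer plus how it was derived."""
    text: str
    grounded_in_imagery: bool  # True if backed by actual scanned photos
    confidence: float          # 0.0 to 1.0

class GeospatialModel:
    """Hypothetical stand-in for an LGM client; not a real library."""

    def __init__(self, region: str):
        self.region = region

    def ask(self, question: str) -> TrailAnswer:
        # A real LGM would match the question against its stitched-together
        # spatial model of the region; this stub returns a canned reply.
        return TrailAnswer(
            text="The trail narrows sharply near the upper switchbacks.",
            grounded_in_imagery=False,  # an estimate, not a direct observation
            confidence=0.7,
        )

lgm = GeospatialModel(region="Grand Canyon trails")
answer = lgm.ask("Where does the trail get really hairy and dangerous?")
print(f"{answer.text} (confidence {answer.confidence:.0%})")
```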

Practical Day-To-Day Use Of LGMs

I’ll give you a quick example to highlight the value of LGMs.

A city planner is trying to figure out the potential impact of putting in a new park for the local town. This takes a lot more contemplation than just grabbing a map and declaring that this or that plot of land is now designated as a park.

Here we go as the city planner uses an LGM established for their town.

  • User entered prompt: “I’m planning a new public park in the downtown area. Can you visualize how it would look and analyze its impact on pedestrian traffic and nearby businesses?”
  • Generative AI LGM response: “Certainly! Would you like me to incorporate features like walking paths, trees, and playgrounds, or should I stick to a basic visualization?”
  • User entered prompt: “The park will be 200 feet by 500 feet and located on the currently vacant lot at Main Street and 5th Avenue. Include walking paths, trees, and a playground.”
  • Generative AI LGM response: “Understood. Here’s a 3D visualization of the proposed park integrated into the downtown area, based on current satellite imagery and urban design patterns.”
  • User entered prompt: “Looks great, thanks.”

Observe a few key points from that brief conversation.

The city planner fluently interacted with the LGM. This is just the same as you would do with any generative AI app or LLM. They didn’t need to know specialized commands. All that was required was to state in natural language what they had in mind.

We will assume that the LGM was already data-trained on geospatial information about the town. It then took that pre-trained information and rejiggered things to create a visualization of the envisioned park. The city planner would then readily see what the park might look like.

Of course, the city planner’s job on this task has only begun and they have more to do.

Here we go.

  • User entered prompt: “Can you analyze how this park might affect pedestrian traffic patterns?”
  • Generative AI LGM response: “Based on similar urban projects and existing foot traffic data, I estimate a 35% increase in pedestrian activity around Main Street and 5th Avenue during weekdays and a 60% increase on weekends. Businesses within a two-block radius are likely to see a 15% rise in footfall.”
  • User entered prompt: “That’s promising. Can you simulate how this might affect parking demand in the area?”
  • Generative AI LGM response: “Based on the increased pedestrian activity, parking demand in nearby lots is expected to rise by 20%. The most affected zones will be east of 5th Avenue. I recommend adding 30 new parking spots or promoting alternative transport options to mitigate congestion.”
  • User entered prompt: “Thanks! One last thing: Can you simulate the park’s appearance 10 years after construction, assuming proper maintenance?”
  • Generative AI LGM response: “Here’s what the park would likely look like in 10 years after construction, based on prevailing climate, the selected tree species, and average urban park maintenance levels.”

I am betting you can readily discern the incredible value of having this LGM.

The city planner asked about foot traffic and parking, and even wondered what the park might look like in ten years. All this interaction was easy. The AI or LGM did the heavy lifting in trying to gauge suitable answers to the questions posed by the city planner.

Research Of Keen Interest On LGMs

Remember that I had you do a quick mental exercise regarding pictures of the Grand Canyon?

There is something else that maybe you know about and perhaps even have participated in. Are you familiar with Pokémon GO? People wander around and try to capture Pokémon using their smartphones. The Pokémon characters are virtual and are found hovering or residing here or there, all over the place.

Imagine what could be done with all the zillions of pictures associated with those locations. The geospatial data would be immense. You could use that data as a hefty leg up on data training an LGM.

Turns out, yes, that’s what Niantic, maker of Pokémon GO, has opted to do. In a recent blog posting entitled “Building A Large Geospatial Model To Achieve Spatial Intelligence” by Eric Brachmann and Victor Adrian Prisacariu, Niantic blog, November 12, 2024, here’s what they said (excerpts):

  • “Large Geospatial Models will help computers perceive, comprehend, and navigate the physical world in a way that will seem equally advanced. Analogous to LLMs, geospatial models are built using vast amounts of raw data: billions of images of the world, all anchored to precise locations on the globe, are distilled into a large model that enables a location-based understanding of space, structures, and physical interactions.”
  • “Geospatial models are a step beyond even 3D vision models in that they capture 3D entities that are rooted in specific geographic locations and have a metric quality to them.”
  • “Over the past five years, Niantic has focused on building our Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in our games and Scaniverse. For example, we recently started rolling out an experimental feature in Pokémon GO, called Pokémon Playgrounds, where the user can place Pokémon at a specific location, and they will remain there for others to see and interact with.”
  • “The path from LLMs to LGMs is another step in AI’s evolution.”
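
To ground the VPS description in the excerpt above, here is a conceptual sketch of what single-image visual positioning amounts to. To be clear, this is not Niantic’s actual API; the class, the function, and the placeholder return values are my own illustrative assumptions about the general technique.

```python
from dataclasses import dataclass

@dataclass
class CameraPose:
    """A six-degree-of-freedom pose: where the camera is and how it is oriented."""
    latitude: float     # degrees
    longitude: float    # degrees
    altitude_m: float   # meters above sea level
    heading_deg: float  # compass bearing, 0 = due north
    pitch_deg: float    # camera tilt up or down
    roll_deg: float     # camera tilt side to side

def localize_from_photo(image_bytes: bytes) -> CameraPose:
    """Conceptual visual positioning: take one photo, match its visual
    features against a prebuilt 3D map, and solve for where the camera
    must have been standing. This stub skips the hard parts and returns
    a fixed placeholder pose."""
    # 1. Extract distinctive visual features (keypoints) from the photo.
    # 2. Match those features against points anchored in the 3D map.
    # 3. Solve the perspective geometry for the camera's position and orientation.
    return CameraPose(36.05, -112.14, 2100.0, 85.0, -3.0, 0.0)
```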

The exciting news is that leveraging all those geospatial images is a huge boon toward devising an LGM with incredible capabilities. Happy face.

Whether people realized they were contributing to this process is a story for a different day. You see, with conventional generative AI, most people don’t realize that the licensing agreements usually say that the AI maker can inspect your entered prompts, reuse what you do, and further train and enhance their generative AI and LLMs based on your efforts (see my discussion on the legality and ethics of this common practice at the link here).

Just thought you’d like to know.

LGMs Are Going To Knock Our Socks Off

The emergence of LGMs is getting underway now.

If this is a topic of interest to you, I must say that you are in luck. Why? Because this rapidly evolving realm is still in its infancy. You can get in on the ground floor. And it will grow like wildfire.

I do want to note that there is plenty of prior technology, mathematics, and other engineering and computational work involved in this domain. Thus, you would be wise to get up to speed on the core fundamentals. Once you’ve got those under your belt, the AI becomes the icing on the cake. Then again, maybe the AI is the cake rather than just the icing.

Whatever.

The gist is that combining geospatial intelligence and artificial intelligence is a match made in heaven.

I leave you with a famous quote by Nathaniel Hawthorne, which I believe befits this LGM situation and emergence: “Nobody, I think, ought to read poetry or look at pictures or statues, who cannot find a great deal more in them than the poet or artist has actually expressed.”

I bring up this clever line to emphasize that with advanced AI in the picture, as it were, there is a lot more that the AI can do. For example, rather than simply answering the questions that the city planner posed, the AI could easily be proactive, asking and answering the needed questions on its own.

Go ahead and make it your job to get up to speed on AI and LGMs.
