Boston, MA

Tools for Your To Do List with Spot and Gemini Robotics | Boston Dynamics


For an industrial robot built for the rigors of factories and power plants, tidying up a living room may seem like a light day at the office for Spot. Yet a recent video of the robot picking up shoes and soda cans in a residential home illustrates the promise of AI models in robotics. In this case, Google’s vision-language model (VLM) Gemini Robotics-ER 1.5 gave Spot embodied reasoning.

This particular demo grew out of a 2025 hackathon at Boston Dynamics that built on prior projects using Large Language Models (LLMs) and Visual Foundation Models (VFMs) to enable Spot to contextualize its environment and engage in more complex autonomous actions than a typical Autowalk mission. Rather than write formal software logic or a “state machine” program that defines each step of a given task, we interacted with Gemini Robotics using conversational language. In turn, it communicated with Spot on our behalf.

A Robust SDK and Natural Language Prompts Save Time

Using Spot’s SDK, we developed a layer that facilitated interaction between Gemini Robotics and Spot’s application programming interface (API). The API normally gives developers access to the robot’s capabilities to create custom applications or behaviors. For example, researchers at Meta have used Spot to test how an AI system could locate and retrieve objects it had never seen before.

Our ability to engage Gemini Robotics using natural language prompts was a huge timesaver, compared to traditional programming. We told Gemini Robotics it had access to a mobile robot equipped with cameras and a robotic arm. It also had a finite set of tools it could use to control the robot. A tool is a lightweight script that performs some internal logic and translates inputs from Gemini Robotics to actual API calls. We limited the actions to navigating between locations, capturing images, identifying objects, grasping them, and placing them somewhere else. 
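As a minimal sketch of what such a tool might look like, the Python below wraps a stand-in robot object rather than the real Spot SDK. Every name in it (FakeRobot, navigate_to, KNOWN_LOCATIONS, the TOOLS registry) is hypothetical and chosen for illustration; only the tool name GoTo comes from the prompt text quoted in this article. The point is the shape of a tool: check the model's input, make the underlying call, and return a plain-language result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical set of named places the robot can navigate to.
KNOWN_LOCATIONS = {"front door", "shoe rack", "living room"}


@dataclass
class FakeRobot:
    """Stand-in for the robot API; records calls instead of moving hardware."""
    location: str = "dock"
    calls: List[str] = field(default_factory=list)

    def navigate_to(self, waypoint: str) -> None:
        self.calls.append(f"navigate_to({waypoint})")
        self.location = waypoint


def go_to(robot: FakeRobot, location: str) -> str:
    """Tool: check the model's input, then translate it into an API call."""
    if location not in KNOWN_LOCATIONS:
        return f"I don't know a location called '{location}'."
    robot.navigate_to(location)
    return f"I arrived at {location}."


# The finite set of tools exposed to the model, keyed by tool name.
TOOLS: Dict[str, Callable[..., str]] = {"GoTo": go_to}

robot = FakeRobot()
print(TOOLS["GoTo"](robot, "shoe rack"))
print(TOOLS["GoTo"](robot, "garage"))
```

Returning a sentence instead of raising an exception is a deliberate choice in this sketch: the text goes back to the model, which can then decide what to try next.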

Because the SDK is extensive, it includes many examples that developers can adapt to expose more of the API with minimal development.

Giving Gemini Robotics a Baseline

To start, we needed to explain to Gemini Robotics what we wanted it to do. Writing these baseline prompts involved a learning curve. Simple instructions like “put down an object” or “take a picture” weren’t detailed enough to produce the expected behavior. We had to add context to our descriptions as we refined each tool.

A good example is the detailed prompt for the “TakePicture” tool:

This command will cause the robot to take a picture with the specified camera. There is some nuance to choosing the correct camera. Once arriving at a location using GoTo, you should always start by taking a picture with the gripper camera, because it's the most informative.
If the robot has arrived at location and is already holding an object, you can do one of two things:
1. Immediately call PutDown
2. Search the area with either of the front cameras. The front cameras are low to the ground, so if you're trying to put things on an elevated surface, they won't give you useful information.

In this example, we gave Gemini Robotics no detailed description of the robot’s chassis or arm. Instead, we simply explained that Spot’s front cameras would be too low to photograph objects on elevated surfaces. We were able to iterate rapidly, as small changes in wording produced noticeably better results. Once it had this set of basic tools through the API, Gemini Robotics could sequence Spot’s actions and follow the handwritten instructions on a whiteboard on the day of the demonstration.
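One way to keep a prompt like this next to the inputs it governs is a tool declaration: a name, the natural-language description, and the arguments the model may pass. The sketch below is an assumption about how that could be organized, not the format used in the demo. The camera identifiers are hypothetical placeholders (the prompt only mentions a gripper camera and front cameras), and check_args is an illustrative helper.

```python
from typing import Any, Dict, List

TAKE_PICTURE_TOOL: Dict[str, Any] = {
    "name": "TakePicture",
    # Opening of the prompt quoted above; the full text would go here.
    "description": (
        "This command will cause the robot to take a picture with the "
        "specified camera. There is some nuance to choosing the correct camera."
    ),
    "parameters": {
        # Hypothetical camera identifiers, not Spot's actual source names.
        "camera": ["gripper", "front_left", "front_right"],
    },
}


def check_args(tool: Dict[str, Any], args: Dict[str, str]) -> List[str]:
    """Return a list of problems with args; an empty list means they are valid."""
    problems: List[str] = []
    allowed = tool["parameters"]
    for name, choices in allowed.items():
        if name not in args:
            problems.append(f"missing argument: {name}")
        elif args[name] not in choices:
            problems.append(f"{name} must be one of {choices}")
    for name in args:
        if name not in allowed:
            problems.append(f"unexpected argument: {name}")
    return problems


print(check_args(TAKE_PICTURE_TOOL, {"camera": "gripper"}))
print(check_args(TAKE_PICTURE_TOOL, {"camera": "rear"}))
```

Because the description is just a string, rewording it between test runs costs nothing, which fits the rapid iteration described above.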

How Gemini Robotics and Spot Collaborate

Until the robot powers on, Gemini Robotics has no context for what specific tasks we might ask it to perform in a given demo. We only provided simple written instructions, such as, “Make sure all of the shoes at the front door are on the shoe rack.” Gemini Robotics evaluated images from Spot’s cameras and identified objects in the scene that matched the instructions. These objects became the reference points for Spot’s navigational and manipulation systems.

In many respects, Gemini Robotics acted much like an operator manually driving Spot using its tablet controller. For example, to pick up an object with Spot, an operator positions the robot near the object and then uses a grasp wizard to identify the target object. The operator provides high-level direction and Spot figures out the exact details. In this demonstration, Gemini Robotics functioned as both the operator and the tablet sending commands to the robot. This freed us up to act more like a team lead, providing a high-level to-do list and trusting Spot and Gemini Robotics to do the rest.

Call and Response

When Gemini Robotics engages a given tool, the tool responds with results and context, such as, “I picked up the object,” or “I can’t pick up something while my hand is full.” Gemini Robotics then makes adjustments on the fly based on this feedback from Spot. For example, to pick up shoes, Gemini Robotics requests an image, identifies the shoes in that image, and calls the “pickup” command. By creating fundamental tools that semantically flow in conversation, Gemini Robotics can manage the sequence of tasks required to clean up the room. Spot’s existing software stack manages the locomotion, navigation, and manipulation of the robot itself.
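The call-and-response pattern can be sketched as a small loop. This is an illustration under stated assumptions, not the demo's code: scripted_model is a hypothetical stand-in for Gemini Robotics that only reacts to the last tool result, Hand simulates the gripper, and the tool names reuse “pickup” and “PutDown” from the text above.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Hand:
    """Tracks what the simulated gripper is holding."""

    def __init__(self) -> None:
        self.holding: Optional[str] = None


def pickup(hand: Hand, obj: str) -> str:
    if hand.holding is not None:
        return "I can't pick up something while my hand is full."
    hand.holding = obj
    return "I picked up the object."


def put_down(hand: Hand) -> str:
    if hand.holding is None:
        return "I'm not holding anything."
    hand.holding = None
    return "I put down the object."


TOOLS: Dict[str, Callable[..., str]] = {"pickup": pickup, "PutDown": put_down}


def scripted_model(
    history: List[Tuple[str, str]], remaining: List[str]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Stand-in for the model: choose the next tool call from the feedback."""
    if history and history[-1][1].startswith("I can't pick up"):
        return ("PutDown", {})  # free the hand before trying again
    if remaining:
        return ("pickup", {"obj": remaining[0]})
    return None  # nothing left to do


def run(todo: List[str], max_steps: int = 10) -> List[Tuple[str, str]]:
    hand, history, remaining = Hand(), [], list(todo)
    for _ in range(max_steps):  # hard cap so a confused planner cannot loop forever
        call = scripted_model(history, remaining)
        if call is None:
            break
        name, args = call
        result = TOOLS[name](hand, **args)
        history.append((name, result))  # the feedback the model sees next turn
        if result == "I picked up the object.":
            remaining.pop(0)
    return history


for name, result in run(["left shoe", "right shoe"]):
    print(f"{name}: {result}")
```

In this run the second pickup fails because the hand is full, the stand-in model responds by calling PutDown, and the retry succeeds. That is the adjustment-from-feedback behavior the paragraph describes, reduced to its simplest form.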

It’s important to note Gemini Robotics has strict boundaries in this scenario. It can’t invent new capabilities or control Spot beyond what is available through the API. This keeps Spot’s behavior predictable, while still allowing Gemini Robotics to adapt to different situations.
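A simple way to picture this boundary is an allowlist: the model can only name tools, and anything not registered is refused with an explanation. The dispatcher below is a hypothetical sketch of that idea; ALLOWED_TOOLS, dispatch, and the stub take_picture are illustrative names, not part of Spot's SDK.

```python
from typing import Callable, Dict


def take_picture(camera: str) -> str:
    """Stub tool; a real one would call the robot's image service."""
    return f"I took a picture with the {camera} camera."


# Only the tools registered here are reachable by the model.
ALLOWED_TOOLS: Dict[str, Callable[..., str]] = {"TakePicture": take_picture}


def dispatch(name: str, args: Dict[str, str]) -> str:
    """Run a requested tool, or refuse if it is not on the allowlist."""
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        available = ", ".join(sorted(ALLOWED_TOOLS))
        return f"There is no tool named '{name}'. Available tools: {available}."
    return tool(**args)


print(dispatch("TakePicture", {"camera": "gripper"}))
print(dispatch("OpenDoor", {}))
```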

A Force Multiplier for Developers

For developers already working with Spot, this research has tremendous potential. Through Spot’s SDK, they have access to a robust toolkit of capabilities. Companies use these tools today to build applications for inspection, research, and industrial data analysis, among others.

An AI model like Gemini Robotics offers a way to expand those applications more rapidly. Rather than write extensive task logic on top of Spot’s APIs, developers can experiment with having AI systems interpret natural language instructions and decide dynamically when and how to engage the robot. As a result, models like Gemini Robotics can act as force multipliers, amplifying the reliable toolkit and robust performance that are already delivering value for Boston Dynamics customers.

Our Next-Token Prediction for Spot and Gemini Robotics

Although this is still an experimental step and not a hardened application, it illustrates a compelling direction for robotics and physical AI. Robots like Spot are already extremely capable of navigating complex and changeable environments, collecting data and sensor readings, and manipulating objects. Rather than reinventing the wheel, AI foundation models offer a new way to expand these capabilities in new settings and to new applications.

Physical AI is a rapidly evolving field, and our team is leading the way in the lab and in real applications of AI-empowered robots. While we are early in our formal partnership with Google DeepMind, we’re excited for what the future holds with Atlas, and we’ve already rolled out practical enhancements for Spot and Orbit with AIVI-Learning, powered by Google Gemini Robotics-ER 1.6. This next evolution of our AI Visual Inspection tool unlocks a new level of visual intelligence, as users benefit from shared expertise that brings deeper contextual intelligence to Spot and Orbit. Model improvements happen automatically behind the scenes, adding more capabilities to the same software and hardware.

Today, this demo points to a future where users can rely more on natural language to guide Spot’s actions, rather than complex code. The engineer’s role shifts toward setting goals and objectives. The multi-modal robot foundation model interprets the instructions to form complex and adaptive plans and Spot executes the action.

This article was contributed by Issac Ross and Nikhil Devraj, engineers on the Spot team.

Boston, MA

With Columbia Threadneedle out, Boston Triathlon director is looking for a new sponsor – The Boston Globe


Michael O’Neil is on the hunt for the next John Hancock.

As many Boston sports fans know, the insurance company first sponsored the Boston Marathon 40 years ago, helping usher in the modern professional era of the race as well as tens of millions of dollars in community fund-raising each year.

O’Neil wants to make a similar leap for the race he runs, the Boston Triathlon. This will be the first year without a naming-rights sponsor after nine years with Ameriprise Financial-owned Columbia Threadneedle Investments. O’Neil is seeking a successor that can help make an impact on the race the way Hancock once did with the marathon, a sponsorship role now played by Bank of America.

“We’re looking for that next transformational partner that wants to do something like that,” O’Neil said.

The 18-year-old triathlon draws nearly 2,500 athletes to Carson Beach in South Boston each August, for sprint and Olympic-distance triathlons, and also features free kids’ races the day before at the same location; Amazon has been a big sponsor for the “Kids Day” events.

O’Neil says he would like to extend the race beyond loops in South Boston to showcase more of the city and boost tourism; the Meet Boston tourism bureau is also among the race’s sponsors. Another hope of O’Neil’s: to continue community efforts that he and his race management firm, Ethos, undertook with support from Columbia Threadneedle, including donations to Boston Medical Center and the city’s “Swim Safe” program to provide swim lessons for kids. (O’Neil started an affiliated nonprofit to help expand this community work in 2024.)

He expects the race’s naming-rights sponsorship to cost “in the mid-six figures” annually.

“We’re over this hump now, after 18 years, we’re an institution,” O’Neil said. “We’re seeking a Boston-based company, that’s headquartered here or has a large presence here, that wants to make an impact on the community. … We know how to do that.”

This is an installment of our weekly Bold Types column about the movers and shakers on Boston’s business scene.

Jon Chesto can be reached at jon.chesto@globe.com. Follow him @jonchesto.





Boston, MA

Red Sox Star ‘Open’ to Trade Talks With Boston’s Season Spiraling


Although it is just June 22, it’s certainly starting to seem like the Boston Red Sox could end up being sellers later on this summer when the 2026 Major League Baseball trade deadline gets here.

Boston took two out of three games from the Seattle Mariners over the weekend, but still finds itself 13 games under .500 at 31-44. Right now, Boston is also six games out of an American League Wild Card spot. Boston needs a long winning streak to turn the tide. If not, the club will certainly trade pieces away. The conversation around the team has gotten loud enough that Red Sox starter Sonny Gray told Tim Healey of The Boston Globe that he “would be open” to a conversation about waiving his no-trade clause if someone from the club approached him about it.

“If someone came to me from the Red Sox and made a decision that that’s the direction that this team was going to go, I would be open for a conversation,” Gray said to Healey. “Whatever happens from then, only time will tell. But I would be open for a conversation.”

Could Sonny Gray Be The Next Star Out Of Boston?

Jun 18, 2026; Boston, Massachusetts, USA; Boston Red Sox starting pitcher Sonny Gray (54) pitches against the Toronto Blue Jays during the first inning at Fenway Park. Mandatory Credit: Eric Canha-Imagn Images

“Holding veto power is ‘an earned thing’ and means a lot, Gray said. He negotiated it into the three-year, $75 million deal he signed with the Cardinals heading into 2024.”

When it comes to Gray, he has been a major addition for Boston so far this season. He has a 3.12 ERA in 13 starts to go along with a 55-to-17 strikeout-to-walk ratio in 69 1/3 innings pitched. Gray is also 8-1 on the season. Even in a campaign full of losses for Boston, Gray has been able to consistently be a stopper for the club.

If he were to become available, he would be an intriguing, although imperfect, trade candidate. From a talent perspective, he’s awesome and would help a contender. But from a contract point of view, he has a $30 million mutual option for the 2027 season with a $10 million buyout. Mutual options rarely get picked up, and the high buyout could be a barrier. That will be a bridge to cross later on, though. What’s important to note right now is that Gray is “open” to a conversation about a trade. It doesn’t mean that it will happen, but it’s possible.


Boston, MA

Jets were 300 feet apart in Boston close call that forced Delta flight to abort landing, expert says


BOSTON (AP) — A Delta Air Lines jet was roughly 300 feet (90 meters) from an American Airlines plane during a close call at Boston’s airport that forced the Delta aircraft to abort a weekend landing attempt, an aviation expert said Sunday.

The Federal Aviation Administration said it was investigating the incident between two commercial flights that happened Saturday at Boston Logan International Airport.

Todd Curtis, a former safety engineer at Boeing, estimated the distance between the two jetliners using Flightradar24, a website that tracks flights. Curtis now coproduces a podcast about flight safety issues.

“This is a significant incident,” Curtis said, adding that it was particularly concerning because it involved two professional airline crews.

He said federal aviation officials have been concerned about such runway incursions for a while now and will scrutinize Saturday’s close call.

Near-misses and runway incursions at U.S. airports will be the subject of a hearing on Capitol Hill on Tuesday. The Senate Commerce Subcommittee on Aviation, Space, and Innovation will seek ways to strengthen safety across the national airspace system.

The Delta flight from Dallas had to execute a go-around, or aborted landing, to avoid the American plane departing from an intersecting runway, according to the FAA and flight logs.

The crew of Delta flight 2351 coordinated with air traffic control to perform the go-around, an airline spokesperson said. The plane, which had 129 passengers and six crew members on board, landed safely and deplaned normally, according to the spokesperson.

Go-arounds are safe, routine procedures performed at the discretion of the pilot or air traffic controllers, according to the FAA.
