Artificial intelligence (AI) has been used to reconstruct motion sequences of humans and animals from video footage. Computer vision and machine learning techniques enable computers to analyze the content of images and videos, and deep learning algorithms in particular can learn to recognize and classify different types of movement in video data. Researchers have trained neural networks, for example, to recognize human movements such as walking, running, and jumping, and to reconstruct those movements in 3D space.
Imagine we are on safari, watching a giraffe graze. After a brief moment of distraction, we look back to find the animal has lowered its head and sat down. What happened in the meantime? The Centre for the Advanced Study of Collective Behaviour at the University of Konstanz has developed a way to encode an animal’s pose and appearance in order to show the intermediate motions that are statistically likely to have occurred.
One major issue in computer vision is that images are extremely complex. A giraffe can strike a wide variety of poses. On a safari, missing part of a motion sequence is usually not a problem, but for the study of collective behavior, this information can be critical. This is where the computer scientists’ new “neural puppeteer” model comes into play.
“The goal was to be able to predict 3D key points and track them regardless of texture. As a result, we developed an AI system that predicts silhouette images from any camera perspective using 3D key points.” – Urs Waldmann
Predictive silhouettes based on 3D points
“One idea in computer vision is to describe the very complex space of images by encoding as few parameters as possible,” says Bastian Goldlücke, a computer vision professor at the University of Konstanz. Up to now, the skeleton has been the most popular such representation. In a new paper published in the Proceedings of the 16th Asian Conference on Computer Vision, Bastian Goldlücke and doctoral researchers Urs Waldmann and Simon Giebenhain present a neural network model that represents motion sequences and renders the full appearance of animals from any viewpoint using only a few key points. The resulting 3D view is more malleable and precise than the skeleton models currently in use.
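The core idea of encoding a pose in a handful of 3D key points and then rendering it from any viewpoint can be illustrated with standard camera geometry. The sketch below is not the paper’s neural model; it is a minimal pinhole-camera projection showing how the same few 3D points yield a different 2D layout for each camera pose. All names and coordinates are illustrative assumptions.

```python
import numpy as np

def project_keypoints(points_3d, rotation, translation, focal_length=1.0):
    """Project Nx3 key points into a camera's 2D image plane (pinhole model).

    Illustrative stand-in: the neural puppeteer renders full silhouettes,
    not just projected points.
    """
    # Transform world coordinates into the camera frame.
    cam = points_3d @ rotation.T + translation
    # Perspective divide: scale x and y by depth z.
    return focal_length * cam[:, :2] / cam[:, 2:3]

# A toy "pose": three key points, viewed by a camera 5 units away.
pose = np.array([[0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.5, 1.5, 0.0]])
R = np.eye(3)                   # camera looks straight down the z-axis
t = np.array([0.0, 0.0, 5.0])   # pose sits 5 units in front of the camera

print(project_keypoints(pose, R, t))
# Moving the camera (changing R and t) re-renders the same 3D pose
# from a new perspective, without re-estimating the pose itself.
```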
“The goal was to be able to predict 3D key points and track them regardless of texture,” says doctoral researcher Urs Waldmann. “As a result, we developed an AI system that predicts silhouette images from any camera perspective using 3D key points.” Reversing the process makes it possible to determine skeletal points from silhouette images. Based on the key points, the AI system can calculate the intermediate steps that are statistically likely. Using the individual silhouette matters: from skeletal points alone, you cannot tell whether the animal you are looking at is fairly massive or nearing starvation.
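The system infers statistically likely intermediate motions with a trained neural network. As a purely geometric toy illustration of the idea of filling in intermediate poses between two observed key-point configurations, one can linearly interpolate the key points; the real model replaces this naive step with learned, statistically plausible motion. The poses and coordinates below are made up for the example.

```python
import numpy as np

def interpolate_poses(pose_a, pose_b, num_steps):
    """Generate intermediate Nx3 key-point poses between two observations.

    Naive linear interpolation; the neural puppeteer instead predicts
    statistically likely intermediate motion with a neural network.
    """
    steps = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - s) * pose_a + s * pose_b for s in steps]

# One illustrative key point (a giraffe's head): standing vs. sitting.
standing = np.array([[0.0, 4.5, 0.0]])
sitting = np.array([[0.0, 1.5, 1.0]])

# Four frames from the observed start pose to the observed end pose.
frames = interpolate_poses(standing, sitting, 4)
for frame in frames:
    print(frame)
```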
In the field of biology in particular, there are applications for this model: “At the Cluster of Excellence ‘Centre for the Advanced Study of Collective Behaviour’, we see that many different species of animals are tracked and that poses also need to be predicted in this context,” Waldmann says.
Long-term goal: apply the system to as much data as possible on wild animals
The team began by predicting silhouette motions of humans, pigeons, giraffes, and cows. According to Waldmann, humans are frequently used as test cases in computer science. His Cluster of Excellence colleagues work with pigeons, whose fine claws present a significant challenge. Good model data was available for cows, while the giraffe’s extremely long neck posed a challenge that Waldmann was eager to tackle. The silhouettes were generated from a small number of key points, between 19 and 33 in total.
The computer scientists are now ready for real-world applications: In the future, data on insects and birds will be collected in the University of Konstanz’s Imaging Hangar, the university’s largest laboratory for the study of collective behavior. Controlling environmental aspects such as lighting and background is easier in the Imaging Hangar than in the field. The long-term goal, however, is to train the model on as many different species of wild animals as possible in order to gain new insights into animal behavior.