Perception & Sensing
Can we replicate the subjective perception of humans in machines? Let's explore!
Human perception is highly subjective, informed by our individual experiences and interactions with the environment, which raises the question of whether machines can replicate it to the same extent. Understanding the capabilities and limitations of machine perception is crucial for designing effective systems that can interact seamlessly with humans and their environment.
Human perception is inherently subjective and largely grounded in our interaction with the environment. We interact with the world through touch, movement, and other physical actions. This interaction helps us construct a mental representation of the world, informed by our prior experiences and personal biases. It's this mental schema that allows us to make sense of the world around us and to interpret new experiences within that framework.
A whiteboard may look the same to everyone at first glance, but how we perceive it varies with our prior experiences and knowledge. An artist might see it as a blank canvas, while a scientist might see it as a tool for visualizing complex ideas. Their different perspectives lead them to interpret the same object in completely different ways.
When we encounter something new, our brain has no existing knowledge to fall back on. So, it tries to make sense of the object by interacting with it in various ways. This might involve touching it, looking at it closely, smelling it, or even listening to it. Without any prior experience or associations, we have to create an entirely new interpretation of the object. This new interpretation is then added to our existing store of knowledge and experience, so that the next time we encounter something similar, we'll have more to go on.
Gestalt theory points to another complex aspect of our perception: its holistic nature. It's not just about how we perceive individual elements, but how we see the whole picture. We don't just see a chair, or a group of wooden atoms; we see a chair in a room, in a house, in a neighborhood, in a city, and so on. Context and relationships matter, and they affect how we perceive the object itself.
Earlier, since machines could not "interpret" a situation, they were simply recording devices, designed to capture as much data as possible so that humans could then interpret it. It was assumed that only humans could understand and make sense of the world.
Machines lack the context and associations that human perception possesses. Machines don't have memories or experiences to connect the dots and make sense of the world. Instead, they rely on humans to provide labeled data that can help them learn to recognize patterns. This approach has been incredibly effective in training machines to recognize images, sounds, and other types of data.
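To make this label-driven approach concrete, here is a minimal sketch of supervised learning: humans supply the labels, and the machine only ever learns the categories humans have already named. The dataset and model here (scikit-learn's bundled digits dataset, a plain logistic regression) are my illustrative choices, not anything prescribed above.

```python
# Minimal sketch of label-driven machine "perception": humans provide
# labeled examples, and the model learns to map raw pixels to those labels.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()  # 8x8 grayscale images, each with a human-assigned label
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=5000)  # a simple linear classifier
model.fit(X_train, y_train)                # "learning" = fitting to the labels

# The model now recognizes patterns, but only within the categories it was given.
print("accuracy:", model.score(X_test, y_test))
```

Note that nothing in this loop lets the model question or extend its categories; whatever structure the labels impose is the only "understanding" it ends up with.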
The current approach to machine perception is limited because it relies on humans to provide the labels and context that machines need to understand the world. It's almost like teaching a child by showing them a picture of a dog and saying "This is a dog." But this doesn't allow the machine to really understand a dog the way a human can.
In order to truly enable machines to perceive the world in a more human-like way (if it is ever possible), we need to rethink how we teach them. Rather than simply providing labeled data, we need to provide them with opportunities to interact with the world. This means giving them the ability to explore and manipulate their environment and to learn through trial and error. This would allow machines to develop a more holistic and intuitive understanding of the world, rather than relying on labels and categories.
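As a toy illustration of learning through trial and error, here is a sketch of tabular Q-learning in a made-up one-dimensional corridor. The environment, reward, and hyperparameters are all invented for illustration; the point is that the agent is never told what the goal "is" — it discovers which actions pay off purely by interacting.

```python
# Toy sketch of learning by interaction instead of labels: tabular Q-learning
# on a 1-D corridor. Reward waits at the last cell; the agent finds it by trial.
import random

N_STATES = 6          # corridor cells 0..5; reward at cell 5
ACTIONS = [-1, +1]    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Explore sometimes; otherwise exploit what experience has taught so far.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: revise the value estimate from the experienced outcome.
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s_next

# Learned policy: the preferred action in each non-terminal cell.
print({s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)})
```

This is obviously a long way from "holistic understanding", but it captures the shift the paragraph argues for: knowledge built from the consequences of the machine's own actions rather than from hand-delivered labels.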
The next two sections will help us understand how the machines of today acquire the data they need, and how they can interpret that data.
There are many different types of sensors that machines can use to capture the world as humans experience it. For almost every human sense there is a rough machine counterpart: cameras for vision, microphones for hearing, pressure and capacitive sensors for touch, chemical sensors for smell and taste, along with sensors that have no human analogue, such as LiDAR, radar, GPS, accelerometers, and gyroscopes.
Sensing vs Perceiving
Sensing is just the first step in understanding the world around us. It allows us to collect raw data from our environment. But perception is the next step, where we interpret that data and assign meaning to it. Without perception, all the senses in the world won't help us truly understand the world around us. It's like trying to understand a book by just looking at the pages without actually reading the words.
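Here is the distinction in miniature: "sensing" is the raw number stream, and "perceiving" is the extra step that assigns meaning to it. The accelerometer readings and thresholds below are simulated and purely hypothetical, chosen only to make the two layers visible.

```python
# Sensing vs. perceiving in miniature. The sensor alone gives only the raw
# numbers; the perceive() step is where meaning gets attached.
# Readings and thresholds are simulated/hypothetical, for illustration only.
raw_accel_magnitudes = [0.02, 0.03, 0.9, 1.1, 0.95, 3.2, 3.5, 0.04]  # in g

def perceive(magnitude: float) -> str:
    """Assign a (crude, hand-written) meaning to a raw accelerometer reading."""
    if magnitude < 0.1:
        return "device at rest"
    if magnitude < 2.0:
        return "gentle motion (perhaps walking)"
    return "vigorous motion (perhaps shaking)"

for reading in raw_accel_magnitudes:
    print(f"{reading:5.2f} g -> {perceive(reading)}")
```

Even in this toy, the interpretation rules were written by a human; the machine contributes the sensing, not the perceiving.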
Perception is largely missing from the machines of today. And it would be a very interesting puzzle to solve, if it can ever be solved.
Aim → Understand and interpret every motion of entities in an environment (human or otherwise).
How can this be enabled?
How far have we reached?
We can sense almost any movement. But bringing meaning to that data still requires a lot of human effort. We need new methods of processing and perception to close that gap.
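A minimal sketch of how far bare "sensing movement" gets us: frame differencing with OpenCV flags *that* something moved, but attaches no meaning to the motion. This assumes the opencv-python package and a webcam at index 0; the threshold and sensitivity values are arbitrary illustrative choices.

```python
# Frame differencing: detect that pixels changed between consecutive frames.
# The machine senses movement but has no idea what moved or why.
import cv2

cap = cv2.VideoCapture(0)                      # assumes a webcam at index 0
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)        # pixel-wise change vs. last frame
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 500:           # arbitrary sensitivity cutoff
        print("movement detected")             # ...but no idea *what* moved
    prev_gray = gray
    cv2.imshow("motion mask", mask)
    if cv2.waitKey(30) & 0xFF == ord("q"):     # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()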
Aim → Understand the overall environment of a space.
Imagine entering a boardroom: your read of the room helps you adjust what you say and how you say it.
How can this be enabled?
How far have we reached?
No such product exists yet that can enable this. It is a complex problem of multimodal perception, that is, making collective sense of data from several different sensing modalities at once. The associated context and personal bias are also huge hurdles to this kind of subjective analysis.
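For a rough sense of what "making collective sense of different sensing data" could look like, here is a hand-wavy sketch of late multimodal fusion: each modality produces its own distribution over room-mood labels, and the two are combined. The labels, scores, and weights are all hypothetical placeholders; a real system would get these from per-modality classifiers.

```python
# Late fusion sketch: combine independent per-modality predictions.
# All numbers below are hypothetical placeholders, for illustration only.
import numpy as np

LABELS = ["tense", "neutral", "relaxed"]

# Hypothetical outputs of a vision model and an audio model, each run
# independently on the same boardroom scene.
p_vision = np.array([0.6, 0.3, 0.1])
p_audio  = np.array([0.2, 0.5, 0.3])

# Late fusion: weighted average of the modality-level distributions.
weights = {"vision": 0.5, "audio": 0.5}
p_fused = weights["vision"] * p_vision + weights["audio"] * p_audio
p_fused /= p_fused.sum()

print(dict(zip(LABELS, p_fused.round(3))))
print("fused guess:", LABELS[int(np.argmax(p_fused))])
```

Even this toy shows why the problem is hard: the fusion weights encode a judgment about which modality to trust, and that judgment is exactly the kind of context-dependent, subjective call the paragraph describes.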
Aim → Understand every emotion or expression of entities in an environment (human or otherwise).
How can this be enabled?
How far have we reached?
We have reached a level where this works on each modality separately, for example on CCTV footage or on audio data. I am very interested in a deeper dive into the possibilities of multimodal interpretation.
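To illustrate the single-modality (audio) side of this, here is a sketch of emotion classification from speech: extract MFCC features with librosa and fit a small classifier. The clip filenames and emotion labels are hypothetical stand-ins for a real labeled corpus, which would need many examples per emotion.

```python
# Sketch of single-modality emotion recognition from audio clips:
# MFCC features + a small SVM. File paths and labels are hypothetical.
import librosa
import numpy as np
from sklearn.svm import SVC

def mfcc_features(path: str) -> np.ndarray:
    """Summarize one audio clip as a fixed-length MFCC feature vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)  # average over time -> one vector per clip

# Hypothetical labeled clips standing in for a real emotion-speech corpus.
clips = [("clip_angry_01.wav", "angry"), ("clip_calm_01.wav", "calm"),
         ("clip_angry_02.wav", "angry"), ("clip_calm_02.wav", "calm")]

X = np.stack([mfcc_features(path) for path, _ in clips])
y = [label for _, label in clips]

clf = SVC(probability=True).fit(X, y)
print(clf.predict([mfcc_features("clip_unknown.wav")]))  # hypothetical test clip
```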
These are some of the aspects of machine perception where more work is needed.
Current devices are indeed very good at sensing data from their environments, but they are not designed to truly understand what that data means. They collect information and then provide it to humans for analysis and interpretation. However, perhaps this could be improved if we built devices that were not just good at sensing data, but also had the ability to make sense of it. This would require a different kind of intelligence, one that could interpret and understand complex data and draw conclusions from it.
We need to give machines the ability to perceive and understand the world, and not just passively record it. To do this, we may need to make a fundamental shift in how we design machines, both hardware and software. We may need to move away from traditional approaches to machine design, such as creating machines that are optimized for specific tasks or specific types of data. Instead, we need to create machines that can process a variety of data types, and learn to make connections and draw conclusions from that data.