Perception & Sensing

Can we replicate the subjective perception of humans in machines? Let's explore!


Human perception is highly subjective, informed by our individual experiences and interactions with the environment. This raises the question of whether machines can replicate these features to the same extent. Understanding the capabilities and limitations of machine perception is crucial for designing effective systems that can interact seamlessly with humans and their environment.

  • How do humans perceive?
  • Sensors
  • How can perception be helpful?
  • Conclusion

How do humans perceive?

Human perception is inherently subjective and largely based on our interaction with the environment. We interact with the world through touch, movement, and other physical actions. This interaction helps us construct a mental representation of the world, informed by our prior experiences and personal biases. It is this mental schema that allows us to make sense of the world around us and to interpret new experiences within that framework.

A whiteboard may look the same to everyone at first glance, but the way we perceive it can vary based on our prior experiences and knowledge. An artist and a scientist might perceive the whiteboard very differently. An artist might look at it and see it as a blank canvas, while a scientist might see it as a tool for visualizing complex ideas. Their different perspectives lead them to interpret the same object in completely different ways.

When we encounter something new, our brain has no existing knowledge to fall back on. So, it tries to make sense of the object by interacting with it in various ways. This might involve touching it, looking at it closely, smelling it, or even listening to it. Without any prior experience or associations, we have to create an entirely new interpretation of the object. This new interpretation is then added to our existing store of knowledge and experience, so that the next time we encounter something similar, we'll have more to go on.

Gestalt theory points out another complex aspect of our perception: its holistic nature. It's not just about how we perceive individual elements, but how we see the whole picture. We don't just see a chair, or a collection of wood atoms; we see a chair in a room, in a house, in a neighborhood, in a city, and so on. The context and relationships matter, and they affect how we perceive the object itself.

How do machines perceive the world currently?

Earlier, machines could not "interpret" a situation; they were simply recording devices, designed to capture as much data as possible so that humans could then interpret it. It was assumed that only humans could understand and make sense of the world.

Machines lack the context and associations that human perception possesses. Machines don't have memories or experiences to connect the dots and make sense of the world. Instead, they rely on humans to provide labeled data that can help them learn to recognize patterns. This approach has been incredibly effective in training machines to recognize images, sounds, and other types of data.

The current approach to machine perception is very limited because it relies on humans to provide the labels and context that machines need to understand the world. It's almost like teaching a child by showing them a picture of a dog and saying 'This is a dog.' But this doesn't allow the machine to truly understand a dog the way a human does.
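
To make the labeled-data approach concrete, here is a minimal sketch, assuming scikit-learn and its bundled digits dataset purely as illustrative stand-ins for the image and sound recognition pipelines mentioned above:

```python
# Minimal sketch of learning from human-provided labels. The dataset and
# classifier are illustrative stand-ins, not a recommendation.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # small images already flattened into feature vectors
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=2000)  # any off-the-shelf classifier works here
model.fit(X_train, y_train)                # "this is a 3", "this is a 7", ...

# The model can now repeat the labels it was taught, but it has no notion of
# what a digit *is* beyond pixel patterns paired with those labels.
print("held-out accuracy:", model.score(X_test, y_test))
```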

In order to truly enable machines to perceive the world in a more human-like way (if it is ever possible), we need to rethink how we teach them. Rather than simply providing labeled data, we need to provide them with opportunities to interact with the world. This means giving them the ability to explore and manipulate their environment and to learn through trial and error. This would allow machines to develop a more holistic and intuitive understanding of the world, rather than relying on labels and categories.
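
As a rough illustration of learning through trial and error, the toy sketch below has an agent improve its estimates of which actions "work" purely by interacting with a simulated world. The environment, rewards, and exploration rate are all invented for the sketch and stand in for the far harder problem of embodied learning:

```python
# Toy trial-and-error learner (an epsilon-greedy bandit). The "world" and
# its rewards are made up for illustration only.
import random

N_ACTIONS = 4
TRUE_OUTCOMES = [0.1, 0.5, 0.2, 0.9]      # hidden from the agent

def try_action(action: int) -> float:
    """Simulated interaction: the world responds with a noisy outcome."""
    return TRUE_OUTCOMES[action] + random.gauss(0, 0.05)

estimates = [0.0] * N_ACTIONS
counts = [0] * N_ACTIONS

for step in range(500):
    # explore occasionally, otherwise exploit what has worked so far
    if random.random() < 0.1:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: estimates[a])

    outcome = try_action(action)
    counts[action] += 1
    estimates[action] += (outcome - estimates[action]) / counts[action]

print("learned action preferences:", [round(e, 2) for e in estimates])
```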

The next two sections look at how today's machines gather the required data, and how far they can go in interpreting it.

Sensors

There are many types of sensors that machines can use to capture data about the world; for almost every physical quantity, a sensor exists. Here's a list of some common ones:

  1. Visual sensors - these include things like cameras and LiDAR sensors that capture images or scans of the environment.
  2. Audio sensors - these include microphones and other sensors that can detect sound.
  3. Pressure sensors - these can detect changes in pressure, such as when an object is touched or moved.
  4. Haptic sensors - these detect vibrations, forces, and other physical interactions.  
  5. Biochemical sensors - these can detect biological or chemical changes in the environment.
  6. Electromagnetic sensors - these detect electromagnetic fields and other changes in the electromagnetic spectrum.
  7. Tactile sensors - these are similar to pressure sensors, but they can also detect texture, temperature, and other physical properties.
  8. Proximity sensors - these detect objects or people near them, without any physical contact.
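
As a small sketch of how a machine might hold raw readings from such heterogeneous sensors before any interpretation happens, the structure below is illustrative only; the sensor kinds and fields are assumptions, not a real device API:

```python
# Illustrative container for raw sensor readings. Sensor kinds and fields
# are assumptions for this sketch, not a real device API.
from dataclasses import dataclass, field
from enum import Enum, auto
import time

class SensorKind(Enum):
    VISUAL = auto()
    AUDIO = auto()
    PRESSURE = auto()
    HAPTIC = auto()
    PROXIMITY = auto()

@dataclass
class SensorReading:
    kind: SensorKind
    value: object                  # raw payload: a frame, a waveform, a scalar...
    unit: str = ""
    timestamp: float = field(default_factory=time.time)

# Raw readings like these are still only *sensing*; perception would mean
# turning a stream of them into an interpretation of the scene.
readings = [
    SensorReading(SensorKind.PRESSURE, 101.3, "kPa"),
    SensorReading(SensorKind.PROXIMITY, 0.42, "m"),
]
for r in readings:
    print(r.kind.name, r.value, r.unit)
```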

Sensing vs Perceiving

Sensing is just the first step in understanding the world around us. It allows us to collect raw data from our environment. But perception is the next step, where we interpret that data and assign meaning to it. Without perception, all the senses in the world won't help us truly understand the world around us. It's like trying to understand a book by just looking at the pages without actually reading the words.

Perception is largely missing in today's machines, and it would be a fascinating puzzle to solve, if it can ever be solved.

How can perception be helpful?

Motion

Aim → To understand and interpret the motion of entities in an environment (human or otherwise).

How can this be enabled?  

  • Visual → Visual data is the primary source for sensing and perceiving motion; current camera-based sensing can pick up virtually any visible movement (a rough sketch follows this list).
  • RADAR → RADAR can sense micro-movements in an environment that are usually hidden from plain sight. It also preserves privacy, which can be critical at times; Soli (by Google ATAP) is one attempt in this direction.
  • Other methods, such as sensing vibrations or pressure, can also be helpful.
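
A minimal sketch of the visual route, assuming OpenCV, a webcam at index 0, and arbitrary thresholds: frame differencing flags that pixels changed, which is sensing, but it says nothing about what the motion means, which is exactly the gap noted below.

```python
# Frame-differencing motion detector. It only flags *that* something moved;
# interpreting the motion is left entirely to a human.
# Camera index 0 and the thresholds are assumptions for this sketch.
import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
if not ok:
    raise SystemExit("no camera available")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

for _ in range(300):                        # a few seconds of frames
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)     # pixel-wise change between frames
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    changed = cv2.countNonZero(mask)
    if changed > 5000:                      # arbitrary sensitivity
        print("motion sensed:", changed, "pixels changed")
    prev_gray = gray

cap.release()
```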

How far have we reached?

We can sense almost any movement, but bringing meaning to that data still requires a lot of human effort. We need new methods of processing and perception to change that.

Environment of a space

Aim → To understand the environment of a space.

Imagine entering a boardroom: your interpretation of the room helps you adjust your words.

How can this be enabled?  

  • Visual → Visual data is crucial for understanding the environment of any room or space. It lets us note people's expressions and their relative state and position to build context.
  • Audio → Audio provides key cues for perceiving an environment; it helps us pick up on the mood of the space.
  • Other sensing methods include tactile, haptic, and pressure sensors (a rough sketch of fusing such modalities follows this list).
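
Since no product exists here yet, the sketch below only illustrates the shape of a "late fusion" approach: features from each modality are extracted separately and combined before a single interpretation step. The feature extractors and the mood decision are hypothetical placeholders, not working models:

```python
# Shape of a "late fusion" pipeline. The feature extractors and the final
# decision are placeholders; no established system does this end to end.
import numpy as np

def extract_visual_features(frame: np.ndarray) -> np.ndarray:
    # placeholder: a real system might use a face/pose/scene encoder here
    return np.random.rand(128)

def extract_audio_features(waveform: np.ndarray) -> np.ndarray:
    # placeholder: a real system might use spectrogram embeddings here
    return np.random.rand(64)

def perceive_room(frame: np.ndarray, waveform: np.ndarray) -> str:
    fused = np.concatenate([extract_visual_features(frame),
                            extract_audio_features(waveform)])
    # placeholder decision: a trained model would map `fused` to a label
    return "tense" if fused.mean() > 0.5 else "relaxed"

frame = np.zeros((480, 640, 3))     # stand-in camera frame
waveform = np.zeros(16000)          # stand-in second of audio at 16 kHz
print("estimated mood of the room:", perceive_room(frame, waveform))
```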

How far have we reached?

No such product exists yet. This is a complex problem of multimodal perception: making sense of different streams of sensor data collectively. The associated context and personal bias also act as major hurdles to this kind of subjective analysis.

Emotions | Expressions

Aim → To understand the emotions and expressions of entities in an environment (human or otherwise).

How can this be enabled?  

  • Visual → Facial data is widely used today to sense emotions (a rough sketch of this route follows this list).
  • Audio → Audio data can be very helpful in understanding the emotions of a being.
  • RADAR → RADAR can be used to sense heartbeats, breathing patterns, and micro-movements, which can be helpful.
  • Other sensing methods include mapping health markers and biochemical changes; tactile and pressure sensors can also help at times.
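
A hedged sketch of the facial route: OpenCV's bundled Haar cascade locates faces (a real, widely available detector), while the emotion classifier itself is left as a placeholder, since which model to use is a separate question; the input image path is hypothetical.

```python
# Locate faces with OpenCV's bundled Haar cascade, then hand each crop to an
# emotion classifier. The classifier and the image path are placeholders.
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def classify_emotion(face_crop) -> str:
    # placeholder: a real system would run a trained expression model here
    return "unknown"

image = cv2.imread("meeting_photo.jpg")     # hypothetical input image
if image is None:
    raise SystemExit("image not found")

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face = image[y:y + h, x:x + w]
    print("face at", (x, y), "->", classify_emotion(face))
```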

How far have we reached?

We have reached a level where this is happening separately with CCTV footage and with audio data. I am very interested in a deeper dive into the possibilities of multimodal interpretation.

These are some of the areas where more work is needed on machine perception.

Conclusion

Current devices are indeed very good at sensing data from their environments, but they are not designed to truly understand what that data means. They collect information and then provide it to humans for analysis and interpretation. However, perhaps this could be improved if we built devices that were not just good at sensing data, but also had the ability to make sense of it. This would require a different kind of intelligence, one that could interpret and understand complex data and draw conclusions from it.

We need to give machines the ability to perceive and understand the world, and not just passively record it. To do this, we may need to make a fundamental shift in how we design machines, both hardware and software. We may need to move away from traditional approaches to machine design, such as creating machines that are optimized for specific tasks or specific types of data. Instead, we need to create machines that can process a variety of data types, and learn to make connections and draw conclusions from that data.
