Just because our phones speak to us, it doesn’t mean they are smart

Ethan Guzman

The minicomputers we carry in our pockets, otherwise known as phones, have the ability to speak to us. On an iPhone, if you hold down the home button, a voice named Siri chimes in to assist you. This outstanding accomplishment gives rise to fears that A.I. will be able to surpass human intellect. To some, the smooth production of language makes it seem as if the next parsimonious step for A.I. is to use communication to shape its cognition as we do. Furthermore, people think that A.I. will use communication as a mechanism more efficiently than we can and will therefore move beyond human control. This is exemplified by approximately “43 percent of smartphone owners thinking their phones are listening to them without their permission” (Molla, 2019). The fear here is that Siri supposedly has an agenda not aligned with that of its owner. In the big picture, people fear this to be the first of many steps A.I. will take to render human intelligence obsolete.

If you’re wondering how language can work as a mechanism to represent the mind, you are not alone. Lev Vygotsky, a Soviet psychologist, examined how language transmits information between people in his work “Thought and Word.” Vygotsky wanted to investigate how communicated language represents the inner workings of the mind. To do this, he needed to delineate two types of speech: the speech we verbalize to one another and the internal speech you are using to read this text in your head. He categorized these as external speech and inner speech, respectively. Vygotsky differentiated the two by examining children who were trying to master the use of language. He believed that inner speech starts as private speech in children, stemming from the insufficient individualization of their primary social speech. In other words, once an instruction is given to a child, the child must first internalize it; private speech, repeatedly saying the instruction aloud, is the child’s attempt to do so. Through exposure to their culture, private speech develops into inner speech. This exposure to culture is a key point for Vygotsky: a person’s culture guides the way infants develop inner speech by facilitating the interactions those infants can have within their environment. Culture therefore controls the stimuli someone is exposed to and, in turn, shapes the way they think.

On the other hand, Vygotsky believed that verbal expressions cannot emerge fully formed; human babies are not able to speak full sentences as soon as they are born. Rather, speech must develop gradually, through a complex process that transforms patterns of heard sounds into meaningful sentences. This process must be developed and perfected in order to gain effective use of external speech. Vygotsky states that “Going from internal to external speech is a complex, dynamic process involving the transformation of the predicative, idiomatic structure of inner speech into syntactically articulated speech intelligible to others” (Vygotsky, 1962, p. 231). This dynamic process of word manipulation manifests thought; that is, thought’s main pathway of expression is the vocalization of external speech or the sub-vocalization of inner speech. The interconnectedness of these components implies the emergence of consciousness: to understand someone else’s speech, we must first understand their thoughts.

Expanding on this point, human babies need to learn that some words are used differently according to the context in which they are said. Babies learn how to use certain words through exposure to them in their environment, and this builds a framework of representations that helps them express their thoughts. This simply cannot be done by A.I. Systems such as Siri are only given symbols and their definitions. This programming works as an input-output relation: the program recognizes a symbol and then outputs a corresponding symbol. But language is not as linear as that; A.I. cannot learn the different contexts in which a word should or shouldn’t be used. It is not actively manipulating the words, and according to Vygotsky, without this dynamic interaction consciousness cannot emerge. Why, then, do cognitive scientists have such a fascination with language?
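As a rough caricature (my own sketch, not Siri’s actual implementation), the input-output relation described above can be modeled as a fixed lookup table: any utterance outside the programmed symbol set, or any familiar symbol used in an unprogrammed context, falls through to a canned fallback.

```python
# A caricature of a symbol-transaction assistant: every input symbol maps
# to exactly one pre-programmed output symbol, with no notion of context.
RESPONSES = {
    "what time is it": "It is 3:00 PM.",
    "set an alarm": "Alarm set for 7:00 AM.",
}

def respond(utterance: str) -> str:
    """Recognize an input symbol and output its corresponding symbol."""
    key = utterance.lower().strip()
    return RESPONSES.get(key, "Sorry, I didn't get that.")
```

A child eventually learns that “set” in “set an alarm” and “set in stone” are different uses of one word; the table above has no mechanism for making that distinction.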

With the premise that language is a symbol transaction that communicates thought, cognitive scientists have modeled A.I. to replicate external speech in hopes of developing consciousness. Why has language been the proverbial steppingstone in developing A.I. with consciousness? Cognitive scientist Lera Boroditsky examines the Whorfian problem in order to answer that question. The Whorfian problem asks to what extent language primes individuals to perceive, analyze, and act in the world; in other words, it asks how good a fit language is for representing consciousness. There are three main components of the problem that Boroditsky addresses.

The first aspect of this problem questions how language controls the way in which we perceive the world. With such a variety of stimuli in any given situation, how are we able to receive them all without overloading the brain with information? Boroditsky argues that once language is mastered by an individual, it provides an order by which we encode stimuli. Through the use of language, humans are able to break any environment down into the aspects that are important for us to analyze.

This leads to the second aspect of the Whorfian problem, which questions whether those who speak different languages think differently. To illustrate the answer, Boroditsky uses her work with groups of different speakers describing spatial frames: those who speak different languages categorize their relation to space differently than speakers of another language. “Unlike English, some languages do not make use of terms like left or right and instead rely on an absolute reference frame system to describe such relation” (Boroditsky, 2012, p. 618). This makes a drastic difference in how someone might encode the stimuli in a situation. A native English speaker receiving directions such as “the fork is northeast of you” would be baffled as to the location of the fork, while for someone from an Australian Aboriginal community that uses an absolute frame, the directions “the fork is to the left of you” would produce the same confusion. This helps illustrate how important language is in perceiving the world around us. Changing the context of language can completely change how symbols should be used in a conversation to communicate an idea. A.I. is limited to using only the context that is programmed into it, so Siri has only a fixed context with which to decipher the various language systems. Simply put, if individuals speak a dialect that Siri is not programmed to understand, they will not receive the accurate information they asked for. This example demonstrates that A.I. is currently unable to handle changes in the context of the symbol transaction, let alone develop consciousness.
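A minimal sketch (my own hypothetical example, not from Boroditsky’s paper) shows why the two kinds of description are not interchangeable: converting an absolute bearing like “northeast” into an egocentric term like “left” requires a piece of information the absolute frame never mentions, namely the listener’s own heading.

```python
def egocentric(bearing_deg: float, heading_deg: float) -> str:
    """Translate an absolute compass bearing (0 = north, 90 = east)
    into a coarse egocentric term, given the listener's heading."""
    rel = (bearing_deg - heading_deg) % 360  # bearing relative to facing
    if rel < 45 or rel >= 315:
        return "in front of you"
    if rel < 135:
        return "to your right"
    if rel < 225:
        return "behind you"
    return "to your left"

# The same fork at bearing 45 (northeast) gets a different egocentric
# description depending on which way the listener happens to be facing.
print(egocentric(45, 0))    # listener facing north
print(egocentric(45, 180))  # listener facing south
```

The absolute-frame speaker can state the fork’s position without knowing anything about the listener; the egocentric translation only exists relative to a particular body in space.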

The last aspect of the Whorfian problem asks whether thoughts are unthinkable without language. In short, Boroditsky reasons no: language is the most common way of expressing thoughts, but it is not the only way. Language creates pathways that strengthen connections between synapses in the brain, and these connections ultimately strengthen the habits we use to discover the world. The way we interact with our environment influences how we create mental representations of our world. While this is true, humans use a variety of sensations to build representations and express their thoughts. Have you ever heard a song without any lyrics that could perfectly describe your mood?

Along that same line of reasoning, there are different components that make thinking work. The field that studies how language shapes humans’ ability to receive inputs and produce outputs offers a fertile model for replicating consciousness in A.I. It seems logical to assume that cognitive scientists can translate the same symbol system humans use into an A.I. program as an input-output system. But is that the full story of what consciousness is? A more conducive way of thinking about this problem is represented in an Indian proverb, the Blind Men and the Elephant. The proverb goes: there were five blind men who had never come into contact with an elephant in their lives. To conceptualize what an elephant looks like, each must feel a different area of the elephant; one blind man may feel the tusk while another feels the tail. Each must build his mental representation of the elephant from his own sense of touch alone, which results in a radically different picture for each of the five men. In this sense, language is just one perspective on what consciousness is composed of.

In an attempt to show how this is true while incorporating the different perspectives of the various blind men, Alan Baddeley wrote “Working Memory: Theories, Models and Controversies.” In this paper he uses the outline Vygotsky provided regarding the dynamic process between language and thought, while also incorporating the other streams of information Boroditsky implicitly mentioned when answering the third question of the Whorfian problem. With these two things in mind, Baddeley wanted to create a highly adaptive model representing what consciousness is via working memory. Previous models had two main components making up working memory: the phonological loop and the visuospatial sketchpad (Baddeley, 2011).

The phonological loop codes verbal material from heard and spoken speech, as explained by Vygotsky. The other half of working memory is the visuospatial sketchpad, which handles visual semantics. Described another way, during a memory, specific cells fire to encapsulate the spatial relation between you and the objects in your environment. Is this something unique to humans? Although A.I. does not have neuronal inputs, it has code that replicates this quality to a degree. Taking the example of the iPhone, recording experiences on your phone is arguably more vivid than human memory: you can see where you were in that moment and receive the original visual stimuli as vividly as when you first experienced them. However, Baddeley argues this does not constitute the full capacity of working memory and therefore the full capacity of consciousness. In his new model, Baddeley added a third component of working memory: the episodic buffer. The episodic buffer allows features from different stimuli to be chunked together, perceptually and creatively, into an episode-like image stored in long-term memory. An in-depth explanation of the neuronal firings that make up episodic memory is given by neuroscientist Neil Burgess. Burgess states:

“Your memories could start by place cells activating each other via dense interconnections. Reactivating boundary cells to create the spatial structure of the scene around your viewpoint and grid cells could move this viewpoint through that space…head direction cells generate an image for your visual imagery, so you can imagine what happened when you were at this wedding, for example” (Burgess, 2011).

This helps illustrate just how complicated the process of coding environmental inputs into a cohesive memory is. This is where A.I. comes up short: as efficient as recording a video is, the recording cannot integrate information that is not directly presented to it. To clarify this point, ask yourself: has a certain smell ever brought back a memory of a place in time?

Imagine walking down the street and smelling a fresh apple pie baking. This may be associated with the summer you spent at your grandma’s house, or the time you ate so much pie you swore never to eat it again. Regardless of whether this is true of you, the point is that a certain quality of an experience is associated with that episodic memory. Cognitive scientists refer to the quality of experiencing an event as qualia, and qualia cannot be fully encapsulated through spoken language. This is an important aspect to include in what consciousness is, as consciousness is not something separate from what makes us human; rather, it is what makes us human. Not only is A.I. incapable of manipulating the environment in order to encode stimuli efficiently, it also has no way of experiencing the qualia of an experience. The combination of the visuospatial sketchpad and the phonological loop feeds the episodic buffer, which is vital to encoding information in memory. Therefore, having cognitive scientists work in a bottom-up process, creating only individual aspects of what consciousness is composed of in hopes that consciousness will develop on its own, is not the best approach. The combination and manipulation of the processes that make up consciousness are just as important, if not more so, than having the processes alone.
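As a toy illustration (my own sketch, not Baddeley’s formal model), the difference between a recording and an episodic memory can be framed as the difference between storing streams side by side and binding them, together with cross-modal cues like a smell, into one retrievable chunk.

```python
from dataclasses import dataclass

# Hypothetical sketch: a "recording" keeps the two working-memory streams
# separate and unintegrated; an "episode" binds them with extra cues
# (e.g., a smell) into one chunk, loosely mirroring the episodic buffer.
@dataclass
class Recording:
    audio: list          # phonological stream, stored exactly as captured
    video: list          # visuospatial stream, stored exactly as captured

@dataclass
class Episode:
    binding: str         # one integrated, retrievable chunk

def bind(audio: list, video: list, cue: str) -> Episode:
    """Chunk both streams and a cross-modal cue into a single episode."""
    return Episode(binding=" + ".join(audio + video + [cue]))

summer = bind(["grandma's voice"], ["kitchen table"], cue="apple pie smell")
```

The `Recording` can replay its streams but contains nothing that links them, whereas the `Episode` exists only as a combination; the cue alone is enough to retrieve the whole.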

This brings up the question of whether A.I. is able to listen to us of its own accord. With Vygotsky’s work in mind, I would say no. A.I. may be able to replicate external speech by recognizing the syntax of a language and having a programmed response, but it is not able to understand the semantic meaning of the symbols used in the transaction. This is because A.I. does not have the experiences that facilitate recognizing the contextual differences needed; it is limited to the symbol system programmed into it, and it is impossible to use symbols to describe the experience of a symbol. Without the context of the symbols used in the transaction, A.I. lacks the dynamic manipulation required for consciousness to emerge, as pointed to by Baddeley. In other words, without the sub-vocalization of inner speech, A.I. will not be able to manipulate the symbols of language to direct its own thoughts and form consciousness. In relation to the example used in class, the computational input-output method, though it has fertile tenets, is just one blind man’s perspective of what the elephant could possibly be; it does not fully encapsulate the elephant that is consciousness.


Baddeley, A. (2011). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63, 1-29. doi: 10.1146

Boroditsky, L. (2012). How the languages we speak shape the ways we think. The Cambridge Handbook of Psycholinguistics, 31, 615-630.

Burgess, N. (2011). How your brain tells you where you are. TED conferences.

Molla, R. (2019). Your smart devices are listening to you, explained. Retrieved from: https://www.vox.com/recode/2019/9/20/20875755/smart-devices-listening-human-reviewers-portal-alexa-siri-assistant

Vygotsky, L. S. (1962). Thought and word. Studies in Communication: Thought and Language, 210-256. doi: 10.1037



The Singularity Isn’t Nigh and Here’s Why Copyright © 2020 by Ethan Guzman is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.
