1 Digital information
Information explosion
We are all familiar with how significant storage capacity is: we routinely buy smartphones with gigabytes of memory and hard drives with capacities of a couple of terabytes. The availability and affordability of such devices, and even our familiarity with these data units, are a far cry from the situation not so long ago. In the last decades of the previous century, personal computers were a new phenomenon, digital photography was in its infancy and today’s social media did not exist yet. In 1983, the Apple Lisa, the commercially failed precursor to the Macintosh, had a five megabyte hard disk and cost almost US $ 10,000 (the equivalent of over US $ 25,000 today). In 1988, the FUJIX DS-1P, the first fully digital camera, had a two megabyte memory card that could hold five to ten photographs. Our need for data storage and communication has changed a lot since those heady times.
The obvious reason for this change is the explosive increase in information production that characterizes the digital era. In a process of steady growth through the centuries, human societies had previously accumulated an estimated 12 exabytes of information. By 1944 libraries were doubling in size every 16 years, provided there was physical space for expansion. Space limitations were removed by the rise of home computers and the invention of the Internet. These allowed annual information growth rates of 30% that raised the total to 180 exabytes by 2006 and to over 1.8 zettabytes by 2011. More recently, the total has more than doubled every two years, reaching 18 zettabytes in 2018 and 44 zettabytes in 2020, and is expected to reach 175 zettabytes by 2025.
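The growth figures above mix doubling times and annual percentage rates; the two are related by simple compound-growth arithmetic. The following Python sketch is offered only as an illustration and is not part of the cited estimates. The function names are hypothetical, and the calculation assumes steady compound growth, which the real totals only approximate.

```python
import math


def annual_rate_from_doubling(years_to_double: float) -> float:
    """Annual growth rate implied by a given doubling time (compound growth)."""
    return 2 ** (1 / years_to_double) - 1


def doubling_time_from_rate(annual_rate: float) -> float:
    """Years needed to double at a given annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def implied_annual_rate(start: float, end: float, years: float) -> float:
    """Average annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


# Libraries doubling in size every 16 years (the 1944 figure quoted above)
library_rate = annual_rate_from_doubling(16)

# A 30% annual growth rate expressed as a doubling time
doubling_at_30 = doubling_time_from_rate(0.30)

# 180 exabytes (2006) to 1.8 zettabytes, i.e. 1800 exabytes (2011)
rate_2006_2011 = implied_annual_rate(180, 1800, 2011 - 2006)

print(f"Doubling every 16 years: {library_rate:.1%} per year")
print(f"30% per year: doubling every {doubling_at_30:.1f} years")
print(f"2006 to 2011: {rate_2006_2011:.1%} per year on average")
```

Under these assumptions, a 16-year doubling time corresponds to roughly 4.4% annual growth, a 30% annual rate to a doubling roughly every 2.6 years, and the tenfold increase between the 2006 and 2011 totals to an average of roughly 58% per year over that interval.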
The Internet is full of such astounding calculations and dramatic projections,[1] which never fail to warn that the total may become even higher, as the population of information users and producers keeps increasing and expands to include devices that generate and share data on the Internet of Things (IoT). But even if we ever reach a plateau, as with Moore’s “law” with respect to computing capacity,[2] we already have an enormous problem on our hands: a huge amount of data to manage. The big four alone (Google, Microsoft, Amazon and Facebook) store 1.2 exabytes, while other big providers like Dropbox, Barracuda and SugarSync, and less accessible servers in industry and academia probably hold similar amounts.[3]
What makes these numbers even more important is that information is not just stored but, above all, intensively and extensively processed. As early as 2008, Google processed 20 petabytes a day.[4] In many respects, how much data we produce on a daily or annual basis is less interesting than what we do with these data. Not surprisingly, social media and mobile phones dominate any account of digital data processing: in 2018, people sent 473,400 tweets, shared 2 million photos on Snapchat and posted 49,380 pictures on Instagram every minute. Google handled 3.5 billion searches a day, while 1.5 billion people (one-fifth of the world’s population) were active on Facebook every day.
In 2020, the picture changed slightly as a result of the COVID-19 pandemic: we produced 1.7 MB of data per person per second, with a large share again going into social media, while communication platforms like Zoom and Microsoft Teams, as well as online shopping and food ordering, attracted significantly more activity.[5] Anything good or bad happening in the world only increases our dependence on the information and communication possibilities of the Internet, especially now that so many of us can afford to use them anytime and anywhere on our smartphones. Consequently, safeguarding information quality, veracity, accessibility and flow already forms a major challenge for both producers and consumers of data.
The situation is further complicated by changing attitudes toward information. Not so long ago, most people were afraid of information overload.[6] Nowadays we have moved to a diametrically opposed point of view and are quite excited about the potential of big data and related AI approaches. From being a worry, the plethora of information we produce and consume has become an opportunity. At the same time, we are increasingly concerned with data protection and privacy, as amply illustrated by the extent and severity of laws like the General Data Protection Regulation (GDPR) of the European Union (https://gdpr.eu). Attitudes may change further, and in unpredictable ways, as suggested by reactions to the Facebook–Cambridge Analytica data breach in 2018 and worries about data collection in relation to COVID-19.
Information and digitization
It is not accidental that we talk about our era as both the information age and the digital revolution, two characterizations that appeared in quick succession. The rapid growth of information production and dissemination, the changes in human behaviours and societal standards, and the shift from industrial production to information-based economies would not have been possible without digital technologies. Before the digital revolution, there were technologies for recording and transmitting information but they were neither capable of processing information nor available to practically everyone. The information age demands digital technologies, which are consequently present in almost every aspect of daily life, making information processing synonymous with digital devices, from wearables to the cloud. This also means that there is increasingly less that we do with alternative means (e.g. order food by phone rather than through an app), especially since a lot of information is no longer available on analogue media. For example, most encyclopaedias and reference works that used to adorn the bookshelves of homes in the second half of the twentieth century are either no longer available on paper or cannot compete with online sources for timeliness, detail and multimedia content. Online video, audio and image sharing platforms have similarly resulted in unprecedented collections that include many digitized analogue media. Despite the frequently low resolution and overall quality of transcribed media, there is no practical alternative to the wealth and accessibility of these platforms.
Related to the dominance of these platforms is that most data transactions take place within specific channels and apps. Nobody publishes on social media in general but specifically on Facebook, Twitter, Instagram, Snapchat, TikTok or whatever happens to be popular with the intended audience at the time. Even though overarching search engines can access most of these data, production, storage and communication are restricted by the often proprietary structure of the hosting environments. As a result, digital information tends to be more fragmented than many assume. Leaving aside the thorny issues of data ownership, protection, rights and privacy, the technical and organizational problems resulting from such restrictions and fragmentation may be beyond the capacities of an individual or even a small firm. Being so dependent on specific digital means for our information needs makes us vulnerable in more respects than we probably imagine and adds to the complexity of information management. It also suggests that privacy is totally lost, as data about user actions and communications are collected by tech companies, whose digital products and services we keep on using because of some huge generic advantages, such as the immense extent and power of crowdsourcing on the Internet.
Regardless of such problems, however, it is inevitable that the means of information production, dissemination and management will remain primarily digital, with growing amounts of information available to us and often necessary for our endeavours. Digitization creates new opportunities for our information needs but, on the other hand, also adds to the problems that must be resolved and to their complexity. Digitization is so widely diffused and pervasive that we are already in a hybrid reality, where the Internet and other digital technologies form permanent layers that mediate even in mundane, everyday activities, such as answering a doorbell. In a growing number of areas, the digital layers are becoming dominant: social media are a primary arena for politics, while health and activity are increasingly dependent on self-tracking data and economies are to a large extent about intangible data. Consequently, safety and security in cyberspace are at least as important as in physical reality. Moreover, they call for dynamic, adaptable solutions that match the fluidity and extent of a digital information infrastructure. It follows that, rather than putting our faith in currently dominant techniques, we need to understand the principles on which solutions should be based and devise better approaches for the further development of information infrastructures.
Interestingly, these infrastructures are not always about us. One aspect of the digital complexity that should not be ignored is that a lot of machine-produced data (and hence a lot of computational power) goes into machine-to-machine communication and human-computer interaction, e.g. between different systems in a car (from anti-lock braking systems and touch-activated locks to entertainment and navigation systems) or in the interpretation of user actions on a tablet (distinguishing between pushing a button, selecting a virtual brush, drawing a line with the brush or translating finger pressure into stroke width). Such data, even though essential for the operations of information processing, are largely invisible to the end user and hence easy to ignore if one focuses primarily on the products rather than the whole chain of technologies involved in a task. On the other hand, these chains and the data they produce and consume are a major part of any innovation in digital technologies and their applications: we have already moved on from information-related development to development dependent on digitization.
Effects of digital information
The practical effects of digital information technologies are widely known, frequently experienced and eagerly publicized. Digitization is present in all aspects of daily life, improving access and efficiency but also causing worries about lost skills, invasion of privacy and effects on the environment. With apps replacing even shopping lists, handwriting is practised less and less, and handwritten text is becoming more and more illegible. Communication with friends, colleagues, banks, authorities etc. is predominantly Internet-based but cannot fully replace physical proximity and contact, as we have seen in the COVID-19 pandemic. Electricity demand keeps rising, both at home or work and for the necessary infrastructure, such as data centres.
Other, equally significant effects are less frequently discussed, arguably because they go much deeper and affect us so fundamentally that we fail to recognize the changes. For example, with the easy availability and wide accessibility of information, it is becoming increasingly difficult to claim ignorance of anything, much more difficult than at any time since the newspaper and news agency boom in the second half of the nineteenth century, and the radio and television broadcasting that followed. More and more facts, events and opinions are becoming common knowledge, from what happens today all over the world to new interpretations of the past, including absurd conspiracy theories. As patients, citizens, students, tourists or hobbyists we can no longer afford to miss anything that seems relevant to our situations or activities.
Another cardinal effect is that we are no longer the centre of the information world, the sole or ultimate possessor and processor of information. Our environment has been transformed and enriched with machine-based capacities that rival and sometimes surpass our own, thus also changing our relation to our environment. Interestingly, our reactions to this loss of exclusivity are variable and even ambivalent. On the one hand, we worry about the influence of hidden algorithms and AI, and on the other, we are jubilant about the possibilities of human-machine collaboration. Dystopian and utopian scenarios abound, while we become more and more dependent on information-processing machines. One of the key messages of this book is that, regardless of hopes and fears, there are principles on which we can base our symbiosis with these machines: tasks we can safely delegate to computers and support we can expect from them in order to improve our own information processing and decision making.
Finally, the most profound and arguably lasting effect of digitization is that it invites us to interpret and even experience the world as information, understanding practically everything in terms of entities, properties, relations and processes. Our metaphors for the world were always influenced by the structure of our artefacts: the things we had designed and therefore knew intimately. Projecting their functioning and principles onto other things we have been trying to comprehend, like the cosmos, made sense and enabled us to develop new knowledge and technologies. Current conceptual models of reality are heavily influenced by digital information and the machines that store and process it. Human memory processes are explained by analogy with hard drive operations and our visual perception is understood by reference to digital image capture and recognition. Such conceptual models are a mixed blessing. As explanations of the mind or social patterns they can be reductionist and mechanistic but at the same time they can be useful as bridges to processing related information with computers.
Information management
All the above makes information management (IM) a task that is not exclusive to managers and computer specialists. It involves everyone who disseminates, receives or stores information. Very few people are concerned with IM just for the sake of it. Most approach information and its management in the framework of their own activities, for which information is an essential commodity. This makes IM not an alien, externally imposed obligation but a key aspect of everyone’s activities, a fundamental element in communication and collaboration, and a joint responsibility for all — a necessity for anyone who relies on information for their functioning or livelihood.
Given the complexity of our hybrid reality and the lack of transparency in many of our approaches to it, this book bypasses technical solutions and focuses on the conceptual and operational structure of IM: the principles for developing clear and effective approaches. These approaches can lead to better information performance, including through reliable criteria for selecting and evaluating means used for their implementation. In other words, we need a clear understanding of what we have to do and why before deciding on how (which techniques are fitting for our goals and constraints).
The proposed principles include definitions of information and representation, and operational structures for connecting process management to IM. IM therefore becomes a matter not of brute force (by computers or humans) but of organization and relevance. One can store all documents and hope for the best but stored information is not necessarily accessible or usable. As we know from searches on the Internet, search engines can be very clever in retrieving what is out there but this does not necessarily mean that they return the answers we need. If one asks for the specific causes of a fault in a building, it is not enough to receive all documents on the building to browse and interpret. Identifying all information that refers precisely to the relevant parts or aspects of the building depends on how archives and documents have been organized and maintained. To achieve that, we cannot rely on exhaustive, labour-intensive interpretation, indexing and cross-referencing of each part of each document. Instead, we should try to understand the nature and structure of the information these documents contain and then build better representations and management strategies, which not only improve IM but also connect it better to our processes and the tasks they comprise.
Recommended further reading
- Blair, A. et al. (eds.), 2021, Information: a historical companion. Princeton: Princeton University Press.
- Graham, M., & Dutton, W.H. (eds.), 2019, Society and the Internet. Oxford: Oxford University Press.
- Floridi, L., 2014, The fourth revolution. Oxford: Oxford University Press.
Key Takeaways
- Digitization has added substantial possibilities to our information-processing capabilities and promoted the accumulation of huge, rapidly growing amounts of information
- Digital information and its processing are already integrated in our everyday activities, rendering them largely hybrid
- We are no longer the exclusive possessor or even the centre of information and its processing: machines play an increasingly important role, including for machine-to-machine and human-to-machine interactions
- Information management is critical for the utilization of digital information; instead of relying on brute-force solutions, we should consider the fundamental principles on which it should be based
Exercises
- Calculate how much data you produce per week, categorized in:
  - Personal emails
  - Social media (including instant messaging)
  - Digital photographs, video and audio for personal use
  - Study-related emails
  - Study-related photographs, video and audio
  - Study-related alphanumeric documents (texts, spreadsheets etc.)
  - Study-related drawings and diagrams (CAD, BIM, renderings etc.)
  - Other (please specify)
- Specify how much of the above data is stored or shared on the Internet and how much remains only on personal storage devices (hard drives, SSD, memory cards etc.)
- How do the above (data production and storage) compare to worldwide tendencies?
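One possible way to organize the calculation asked for in the first two exercises is a small tally per category. The Python sketch below is only a starting point: the `Category` class, its field names and all the numbers are hypothetical placeholders, to be replaced with your own categories and measurements.

```python
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    weekly_mb: float     # data produced per week, in megabytes
    online_share: float  # fraction stored or shared on the Internet (0 to 1)


# Placeholder values for illustration only; replace with your own estimates.
categories = [
    Category("Personal emails", 5.0, 1.0),
    Category("Social media (including instant messaging)", 150.0, 1.0),
    Category("Personal photographs, video and audio", 800.0, 0.5),
    Category("Study-related alphanumeric documents", 40.0, 0.25),
]

# Weekly total, and its split between Internet and personal storage only
total_mb = sum(c.weekly_mb for c in categories)
online_mb = sum(c.weekly_mb * c.online_share for c in categories)
local_only_mb = total_mb - online_mb

for c in categories:
    print(f"{c.name}: {c.weekly_mb:.0f} MB/week, {c.online_share:.0%} online")
print(f"Total: {total_mb:.0f} MB/week")
print(f"Online: {online_mb:.0f} MB, personal storage only: {local_only_mb:.0f} MB")
```

Keeping the online share as a fraction per category makes it easy to see which kinds of data leave your own devices, which is the comparison the second exercise asks for.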
[1] Calculations and projections of information accumulated by human societies can be found in: Rider, F., 1944, The Scholar and the Future of the Research Library. New York: Hadham Press; Lyman, P. & Varian, H.P., 2003, "How much information 2003?" http://groups.ischool.berkeley.edu/archive/how-much-info/; Gantz, J. & Reinsel, D., 2011, "Extracting value from chaos." https://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf; Turner, V., Reinsel, D., Gantz, J.F., & Minton, S., 2014, "The Digital Universe of Opportunities" https://www.emc.com/leadership/digital-universe/2014iview/digital-universe-of-opportunities-vernon-turner.htm; "Rethink data" Seagate Technology Report, https://www.seagate.com/nl/nl/our-story/rethink-data/
[2] Intel co-founder Gordon Moore observed in 1965 that every year twice as many components could fit onto an integrated circuit. In 1975 the pace was adjusted to a doubling every two years. By 2017, however, Moore's "law" no longer applied, as explained in: Simonite, T., 2016, "Moore's law is dead. Now what?" Technology Review https://www.technologyreview.com/s/601441/moores-law-is-dead-now-what/
[3] Source: https://www.sciencefocus.com/future-technology/how-much-data-is-on-the-internet/
[4] The claim was made in a scientific journal paper: Dean, J., & Ghemawat, S., 2008, "MapReduce: simplified data processing on large clusters" Commun. ACM 51, 1 (January 2008), 107–113, https://doi.org/10.1145/1327452.1327492. Regrettably, Google and other tech companies are not in the habit of regularly publishing such calculations.
[5] There are several insightful overviews of what happens every minute on the Internet, such as: https://www.visualcapitalist.com/?s=internet+minute; https://www.domo.com/learn/infographic/data-never-sleeps-8; https://www.domo.com/learn/infographic/data-never-sleeps-6
[6] The notion of information overload was popularized in: Toffler, A., 1970, Future shock. New York: Random House.