Managing Learning
16 AI-Speak: Data-based systems, part 1
Decisions in the classroom
As a teacher, you have access to many kinds of data. This includes tangible data, such as attendance and performance records, and intangible data, such as students’ body language. Consider some of the decisions you make in your professional life: what data help you make these decisions?
Some technological applications can help you visualise or process data. Artificial intelligence systems use data to personalise learning and to make predictions and decisions that might help you teach and manage the classroom. Do you have needs that technology could meet? If so, what data might such a system require to carry out the task?
Educational systems have always generated data – students’ personal data, academic records, attendance data and more. With digitalisation and AIED (artificial intelligence in education) applications, even more data is recorded and stored: mouse clicks, opened pages, timestamps and keystrokes1. As data-centric thinking becomes the norm in society, it is natural to ask how all this data could be processed to do something useful. Could we give learners more personalised feedback? Could we design better visualisation and notification tools for teachers?2
Whatever technology is used, it has to meet a real need in the classroom. Once the need is identified, we can look at the available data and ask what is relevant to the desired outcome. This involves uncovering the factors that let educators make nuanced decisions. Can these factors be captured using the available data? Are data and data-based systems the best way of addressing the need? What could be the unintended consequences of using data this way?3
Machine learning (ML) lets us defer many of these questions to the data itself4. ML applications are trained on data and operate on data. They find patterns, make generalisations and store these as models: data that can be used to answer future questions4. Their decisions and predictions, and the effects these have on student learning, are data too. Knowing how programmers, the machine and the user handle data is therefore an important part of understanding how artificial intelligence works.
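To make the idea of a model as "data that can be used to answer future questions" more concrete, here is a minimal Python sketch. It assumes a hypothetical toy dataset pairing each student's attendance rate with an exam score, invented for illustration and chosen to lie exactly on a straight line so the result is easy to check. It fits that line by ordinary least squares, a deliberately simple technique swapped in for illustration; real ML systems use far richer models. The point is that the trained model is just two numbers (an intercept and a slope), that is, data, which can then be used to make predictions about new cases.

```python
# Hypothetical toy training data: (attendance rate, exam score) per student.
# These values are invented for illustration only.
training_data = [(0.5, 55.0), (0.7, 65.0), (0.9, 75.0), (1.0, 80.0)]


def fit_line(pairs):
    """Fit score = intercept + slope * attendance by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Slope = covariance of (x, y) divided by variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # The "model" is nothing more than these two stored numbers.
    return {"intercept": intercept, "slope": slope}


def predict(model, attendance):
    """Use the stored model to answer a new question about an unseen student."""
    return model["intercept"] + model["slope"] * attendance


model = fit_line(training_data)
print(model)
print(predict(model, 0.8))
```

Because the model is itself data, it inherits every choice made when the training data was created: change which students or attributes are included, and the stored numbers, and therefore the predictions, change too.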
About data
Data is generally about a real-world entity – a person, an object or an event. Each entity can be described by a number of attributes (also called features or variables)5. For example, name, age and class are some of a student’s attributes. The set of these attributes is the data we have on the student; it is far from a complete picture of the real person, but it does tell us something about them. Data collected, used and processed in the educational system is called educational data1.
A dataset is the data on a collection of entities, arranged in rows and columns. The attendance record of a class is a dataset: each row is the record of one student, and each column records their presence or absence during a particular day or session. Each column is therefore an attribute.
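The rows-and-columns structure of an attendance record can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical student names and sessions: each row is one entity (a student) and each column is one attribute (presence on a given session). Reading down a single column gives the value of that attribute for every entity.

```python
# A hypothetical attendance dataset: one row per student (entity),
# one column per session (attribute). True = present, False = absent.
sessions = ["Mon", "Tue", "Wed"]
attendance = {
    "Student A": [True, True, False],
    "Student B": [True, False, False],
    "Student C": [True, True, True],
}

# Print the dataset as a table: P = present, A = absent.
print("Student".ljust(10) + "".join(s.ljust(5) for s in sessions))
for student, row in attendance.items():
    cells = "".join(("P" if present else "A").ljust(5) for present in row)
    print(student.ljust(10) + cells)

# Reading one column gives the value of one attribute for every student.
tuesday = sessions.index("Tue")
present_on_tuesday = [name for name, row in attendance.items() if row[tuesday]]
print("Present on Tue:", present_on_tuesday)
```

Note that the dataset only contains what someone decided to record: it says nothing about why Student B was absent, or whether those present were actually engaged.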
Data is created by choosing attributes and measuring them, so every piece of data is the result of human decisions and choices. Data creation is therefore a subjective, partial and sometimes messy process that is prone to technical difficulties4,5. Further, what we choose to measure, and what we don’t, can strongly influence the outcomes.
Data traces are records of student activity in a digital system, such as mouse clicks, opened pages, the timing of interactions or key presses1. Metadata is data that describes other data5. Derived data is data calculated or inferred from other data: the individual score of each student is data, while the class average is derived data. Derived data is often more useful for gaining insights, finding patterns and making predictions. Machine learning applications can create derived data and link it with metadata and data traces to build detailed learner models, which help in personalising learning1.
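The difference between raw and derived data can be illustrated with a short Python sketch. All names, scores and timestamps here are hypothetical. The class average is derived from the individual scores, and a rough "time spent on each page" is derived from data traces (the timestamps at which pages were opened). Estimating time on a page as the gap until the next page was opened is an assumption made for this sketch; real platforms may measure it differently, and the last page has no following event, so no time can be derived for it.

```python
from datetime import datetime

# Raw data: hypothetical individual test scores, one value per student.
scores = {"Student A": 72, "Student B": 58, "Student C": 90}

# Derived data: the class average is calculated from the raw scores.
class_average = sum(scores.values()) / len(scores)

# Hypothetical data traces: when one student opened successive pages.
page_opens = [
    ("page1", "2024-03-04 10:00:00"),
    ("page2", "2024-03-04 10:04:30"),
    ("page3", "2024-03-04 10:05:00"),
]
times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for _, ts in page_opens]

# Derived data from traces: seconds between opening a page and opening the
# next one. The last page is skipped because no later event exists.
time_on_page = {
    page_opens[i][0]: (times[i + 1] - times[i]).total_seconds()
    for i in range(len(times) - 1)
}

print(round(class_average, 2), time_on_page)
```

Even this tiny example shows why derived data needs checking in context: 30 seconds on a page could mean the student skimmed it, or that they already knew the material.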
For any data-based application to succeed, attributes should be carefully chosen and correctly measured. The patterns discovered in the data should be checked to see whether they make sense in the educational context. When designed and maintained correctly, data-driven systems can be very valuable.
This chapter introduces a few basics of data and data-based technology, but data literacy is an important skill in its own right and merits dedicated training, continuing support and regular updating1.
Legislation you should know about
Because the cost of data storage has dropped drastically, more data and metadata are being saved and retained for longer6. This can lead to privacy breaches and rights violations. Laws like the General Data Protection Regulation (GDPR) discourage such practices and give EU citizens more control over their personal data. The GDPR provides legally enforceable data protection rules across all EU member states.
According to the GDPR, personal data is any information relating to an identified or identifiable person (the data subject). Schools, in addition to engaging with companies that handle their data, store large amounts of personal information about students, parents, staff, management and suppliers. As data controllers, they are required to store the data they process confidentially and securely, and to have procedures in place for the protection and proper use of all personal data1.
Rights established by the GDPR include:
- The Right of Access lets citizens find out easily what data is being collected about them
- The Right to Be Informed about how their data is used
- The Right to Erasure allows a citizen whose data has been collected by a platform to ask for that data to be removed from the dataset built by the platform (which may be sold to others)
- The Right to Explanation: an explanation should be provided whenever clarification is needed about an automated decision process that affects them
However, the GDPR does allow some data to be collected under “legitimate interest”7, and derived, aggregated or anonymised data to be used indefinitely and without consent5. The new Digital Services Act restricts the use of personal data for targeted advertising7. In addition, the EU-US Privacy Shield was designed to strengthen data-protection rights for EU citizens whose data has been moved outside the EU5, although the Court of Justice of the EU invalidated it in 2020.
Please refer to “GDPR for dummies”, an analysis by independent experts from the Civil Liberties Union for Europe (Liberties), a watchdog that safeguards EU citizens’ human rights.
1 Ethical guidelines on the use of artificial intelligence and data in teaching and learning for educators, European Commission, October 2022.
2 du Boulay, B., Poulovassilis, A., Holmes, W., Mavrikis, M., Artificial Intelligence and Big Data Technologies to Close the Achievement Gap, in Luckin, R., ed., Enhancing Learning and Teaching with Technology, London: UCL Institute of Education Press, pp. 256–285, 2018.
3 Hutchinson, B., Smart, A., Hanna, A., Denton, E., Greer, C., Kjartansson, O., Barnes, P., Mitchell, M., Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure, Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, New York, 2021.
4 Barocas, S., Hardt, M., Narayanan, A., Fairness and Machine Learning: Limitations and Opportunities, MIT Press, 2023.
5 Kelleher, J. D., Tierney, B., Data Science, MIT Press, London, 2018.
6 Schneier, B., Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World, W. W. Norton & Company, 2015.
7 Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.”, MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021.