Mark Graves

Moral psychology investigates the development and functioning of human behavior and mental processing in moral contexts, and thus provides a foundation for the empirical study of human moral behavior. Moral psychologists draw upon methods and theories from social, developmental, cognitive, and other areas of psychology and frequently engage with the philosophical study of morality and ethical systems to develop empirical theories and models for human moral development, reasoning, and action.[1] One method in moral psychology, the semi-structured interview, attempts to elucidate aspects of a person’s moral framework including goals, significant events and influences, and stated moral values. In the interview, the interviewer typically asks the interviewee roughly two dozen questions derived from psychological and moral theory, along with follow-up prompts as needed. The interviews are recorded and transcribed, yielding a semi-structured narrative response to the theory-driven questions. Analyzing the responses provides empirical evidence on the participant’s moral framework within an open-ended theoretical context.

The present empirical study investigates how scientists and musicians characterize moral values in their respective fields and also serves as an example of how to analyze moral values using semantic analysis of semi-structured interviews. This study is part of a larger, ongoing investigation into the characterization of virtue among scientists and musicians.[2] Interview questions were based on the Good Work Interview Protocol and augmented with questions based upon virtue studies.[3] Questions included: What qualities do you think a good [field member] has? What inspired you to become a [field member]? Why did you get involved in [field] rather than something else?

Transcribed interview texts can be analyzed manually or through computational methods. Manual coding techniques identify significant constructs in transcribed texts by having a trained person annotate passages with tags derived from psychological or moral theory. However, manual coding generally requires considerable effort and consensus among human coders, and may introduce inadvertent bias. Some techniques also tend to categorize short, extracted segments of texts and may miss global themes underlying the texts, including identity constructs and pervasive mental schemas. The effort needed and theoretical commitments required also preclude easy comparisons between methods or quick investigations of novel hypotheses.

Computational techniques developed within artificial intelligence can augment manual close-reading and characterization techniques by scoring the text based upon its similarity to theory-driven text descriptors. The automatic techniques seek to avoid idiosyncratic individual human biases by adopting theory-driven descriptor texts that make biases more visible and explicit. By automating text comparison, the software can quickly evaluate numerous probes over large quantities of text. Of course, the automatic method may miss subtleties in the narrative text that a trained human might identify, though those subtleties often create points of disagreement among human coders.

Latent Semantic Analysis

The computational method of latent semantic analysis (LSA) used in the present study scores the narrative interviews of scientists and musicians and compares the implicit presence of particular moral values in the text. LSA quantifies the extent to which a moral value occurs implicitly within the interview text by using a computational proxy for meaning to measure the semantic similarity between the theory-derived moral value descriptors and the transcribed interview text. We use statistical methods to compare average semantic similarity between interviews and moral value descriptors for both scientists and musicians, yielding a characterization of the groups’ differences in implicit moral values.

Semantic analysis extracts meaning from text by transforming the text into a mathematical representation of its semantics. The mathematical representation of a text (such as a participant interview) is then compared with the representation of another text with known meaning (such as a theory-driven moral descriptor) yielding a measure of similarity between the texts. When the questions in the interview elicit mental constructs and schemas about the interviewee’s moral values, belief structures, and dispositions, then the comparison between the interviews and the theory-driven moral descriptors can elucidate and measure those specified aspects of that person’s moral framework.

Semantic analysis depends on an associationist and distributional theory of meaning. This is supported by recent philosophical investigations. While older understandings saw the meaning of a word as referring to a universal essence, Ludwig Wittgenstein argues that the meaning of a word lies in its use in language.[4] The linguist John Firth further clarifies that the meaning of a word depends on the words with which it is in frequent and habitual company.[5] More precisely, a word’s meaning depends on the words with which it frequently collocates and how it relates to those frequently collocated words. Collocation refers to a word’s occurrence near another word in a text. Thus the associations, or repeated collocations, between words define meaning. To model those associations, the mathematical linguist Zellig Harris identified and developed the distributional hypothesis. Harris noticed that words with similar meaning have similar contexts, that is, they regularly collocate with the same words, and suggested that in a sufficiently large sample of language, words with similar patterns of association would have similar meaning.[6] For example, “cat” and “dog” would often occur in text with many overlapping associated words, such as “petting,” “feeding,” and other words for activities of non-human companioning. Thus “cat” and “dog” have similar meanings, while differences in associated words, such as “climbing” or “fetching,” differentiate the meaning of “cat” from that of “dog.” This kind of analysis can model meaning in language as a distribution of associated contexts; LSA implements those distributional and associative aspects of meaning in a particular way.
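The “cat” and “dog” example can be made concrete with a small counting exercise. The following is a minimal sketch, not part of the study: the sentences are invented, the function name context_counts is hypothetical, and sharing a sentence is used as a crude stand-in for collocation.

```python
from collections import Counter

# Invented toy sentences, used only to illustrate collocation counting.
SENTENCES = [
    "she enjoys petting the cat",
    "he enjoys petting the dog",
    "feeding the cat takes a minute",
    "feeding the dog takes a minute",
    "the cat is climbing the tree",
    "the dog is fetching the stick",
]

def context_counts(target, sentences):
    """Count the words that share a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

cat = context_counts("cat", SENTENCES)
dog = context_counts("dog", SENTENCES)

shared = set(cat) & set(dog)    # contexts the two words have in common
only_cat = set(cat) - set(dog)  # contexts that set "cat" apart
only_dog = set(dog) - set(cat)  # contexts that set "dog" apart
```

In this toy sample the shared contexts include “petting” and “feeding,” while “climbing” appears only with “cat” and “fetching” only with “dog,” mirroring the pattern Harris describes.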

To capture the distributional meaning in LSA, a distributional semantic space is created from a global cache of knowledge in English, in this case an 11-million-word collection of texts, novels, newspaper articles, and other documents from kindergarten through first-year collegiate readers.[7] This collection of texts creates a generalized semantic space useful in a variety of domains. Construction of the space uses the linear algebra process of singular value decomposition to transform the documents into a 300-dimensional semantic space where vectors are assigned to texts as approximations of meaning. After the space is constructed, additional documents are mapped to the space to obtain a vector representation for their respective meanings. (A vector is a mathematical object with direction and magnitude/length.) As visualizing 300-dimensional space is obviously difficult, Figure 1 illustrates the mapping of moral descriptors and one of the participant interviews to the first two dimensions of the semantic space. In a semantic space, documents that are similar in meaning are mapped to similar locations. Each point in Figure 1 refers to a 2-dimensional vector (from the common origin point labeled “0.0”) to the labeled location in the “meaning” space.[8]

Figure 1. Plot of Example Interview and Eight Moral Descriptors in Two Dimensions
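The construction of the semantic space by singular value decomposition can be written compactly. The notation below is the conventional one for a truncated decomposition and is an assumption introduced here, not the study’s own: X stands for the word-by-document count matrix of the corpus (in practice often weighted before decomposition), t for the number of vocabulary words, d for the number of corpus documents, and k for the number of retained dimensions.

```latex
X \;\approx\; X_k \;=\; U_k \,\Sigma_k\, V_k^{\top},
\qquad
U_k \in \mathbb{R}^{t \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k}\ \text{(diagonal)},\quad
V_k \in \mathbb{R}^{d \times k},\quad
k = 300 .
```

Under this reading, each row of U_k locates a vocabulary word in the 300-dimensional space and each row of V_k locates a corpus document; keeping only the k largest singular values is what reduces the space to 300 dimensions.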

To determine association between words, LSA uses a “bag of words” transformation, so a document is defined as a collection of words along with the number of times those words occur in the text. For example, the sentence “You should tell the truth” is defined mathematically as having one occurrence of the word “you,” one occurrence of the word “should,” and so on. In practice, common words like “you” and “the” (called stopwords) do not contribute much to the overall meaning and are omitted from subsequent calculations. The sentence is thus represented as one occurrence each of “tell” and “truth.” The association between words in a document, in this case “tell” and “truth,” also depends upon the vocabulary words not used in the document. The sentence is therefore defined not only by the words occurring in it but also by the words not appearing in it. If the vocabulary of the corpus has 5000 words, the sentence is represented as a 5000-dimensional bag-of-words vector with a value of one in the dimension for “tell,” one in the dimension for “truth,” and zero for every other word in the vocabulary. Some information is certainly lost in discarding syntactic relationships and some common words, and transformed short sentences often lack sufficient word associations to be effective for analysis. Generally, texts of at least a paragraph are used to provide greater precision.
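The bag-of-words transformation of the example sentence can be sketched in a few lines. This is an illustration under stated assumptions rather than the software used in the study: the five-word vocabulary, the stopword list, and the function name bag_of_words are all hypothetical, and a real vocabulary would contain thousands of words.

```python
import re
from collections import Counter

# Hypothetical miniature vocabulary and stopword list (assumptions for
# illustration); each vocabulary word is one dimension of the vector.
VOCABULARY = ["evidence", "honest", "lying", "tell", "truth"]
STOPWORDS = {"you", "should", "the", "be", "in", "all", "do"}

def bag_of_words(text, vocabulary=VOCABULARY, stopwords=STOPWORDS):
    """Return one count per vocabulary word, ignoring stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    # Counter returns 0 for words that never occur, which supplies the
    # zeros for vocabulary words absent from the document.
    return [counts[word] for word in vocabulary]

vector = bag_of_words("You should tell the truth.")
# vector == [0, 0, 0, 1, 1]: ones for "tell" and "truth", zeros elsewhere
```

Word order and syntax play no role in the result, which is the loss of information noted above.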

In addition, although it is possible to perform some analysis directly with the bag-of-words vectors, LSA transforms the lexical representation of associated words to the semantic space by mathematically projecting the bag-of-words vector representation of the document onto the previously constructed 300-dimensional semantic space. The dimensions of that space capture distributional aspects of meaning, so the projection transforms the bag-of-words vector into a vector that incorporates each word’s associative meaning in the distributional semantic space calculated from the larger sample of English text. The projection onto the 300-dimensional space results in words that are close in meaning being mapped to locations near each other, and thus the 300-dimensional vector for the document represents the overall meaning of that document. The shift from word associations in the 5000-dimensional bag-of-words representation to the more meaningful 300-dimensional semantic space depends on the word associations of the corpus used to construct the semantic space and is therefore limited by how well that corpus represents the language’s semantics.
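The projection step can be sketched with toy numbers. The code below assumes the conventional “folding-in” formula, in which a bag-of-words vector is multiplied by the word vectors of the space and rescaled by the singular values; the study does not state which variant its software uses. The two-dimensional space, the numbers in TERM_VECTORS and SINGULAR_VALUES, and the function name project are invented for illustration.

```python
# Hypothetical 5-word vocabulary mapped to a 2-dimensional space; the
# study's space has 300 dimensions. All numbers are invented.
TERM_VECTORS = [    # one row per vocabulary word
    [0.10, 0.70],   # evidence
    [0.60, 0.20],   # honest
    [0.50, -0.30],  # lying
    [0.40, 0.10],   # tell
    [0.45, 0.25],   # truth
]
SINGULAR_VALUES = [2.0, 1.0]

def project(bow, term_vectors=TERM_VECTORS, singular_values=SINGULAR_VALUES):
    """Fold a bag-of-words vector into the reduced space."""
    k = len(singular_values)
    doc = [0.0] * k
    # Sum the word vectors, weighted by how often each word occurs.
    for count, row in zip(bow, term_vectors):
        for j in range(k):
            doc[j] += count * row[j]
    # Rescale each dimension by its singular value (assumed convention).
    return [doc[j] / singular_values[j] for j in range(k)]

# "You should tell the truth" as a bag-of-words vector over the toy vocabulary.
doc_vector = project([0, 0, 0, 1, 1])
```

Some implementations omit the division by the singular values. The choice changes the scale of each dimension, and therefore the resulting cosines, so it matters which convention a given tool adopts.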

Similarity is calculated as the cosine of the angle between vectors, with higher cosine values indicating greater similarity between texts, in this case pairwise similarity between each of the interview responses and the moral descriptors. Because the cosine depends only on the angle between the vectors representing the participant text and the descriptor documents, it measures their similarity without being influenced by the number of words in the texts. In Figure 1, the angle between the vector for the example interview and the vector for “charity” is fairly small, so that interview would be scored high in similarity to the “charity” moral descriptor. The cosine similarity score ranges from -1 to 1 and reaches its maximum of 1 for identical texts (with most scores between 0 and 1). The average (mean) cosines for scientists and musicians are then compared.
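The cosine calculation itself is short: the dot product of the two vectors divided by the product of their lengths. The sketch below uses invented two-dimensional vectors in place of 300-dimensional ones; the function name and the treatment of an all-zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_u * norm_v)

# Invented vectors standing in for an interview and a moral descriptor.
interview = [0.8, 0.3]
charity = [0.7, 0.4]

score = cosine_similarity(interview, charity)
# Doubling the length of a vector leaves the angle, and so the score,
# unchanged (up to floating-point rounding).
longer = cosine_similarity([2 * x for x in interview], charity)
```

The equality of score and longer (up to rounding) illustrates why the measure is not influenced by the number of words in a text.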

Moral Descriptors

Because the meaning of the moral values depends on the text used to describe that value, some care is needed to define the moral descriptor texts. In this study, four descriptors came from the moral psychology literature, and we developed eight additional moral descriptors, described immediately below and listed in the Appendix. Using the previously published moral descriptors enables eventual comparison of the results of this study with other investigations using those descriptors. Developing eight novel moral descriptors not only focuses moral analysis on professional practices but also contributes additional constructs to the emerging study of moral psychology using semantic analysis techniques.

Reimer et al. developed four descriptors to examine moral mental schemas among humanitarian exemplars, that is, those who serve as excellent examples of humanitarian moral behavior.[9] Reimer et al. derived the descriptors (“religious,” “just,” “brave,” and “caring”) from two prior studies of moral exemplarity. For the “religious” moral descriptor, Reimer et al. use the terms best describing religious exemplars from Lawrence Walker’s empirical study that found distinguishing personality features among moral, religious, and spiritual exemplars.[10] In a subsequent empirical study, Walker and Hennig showed that the “moral” exemplarity Walker had previously investigated could not be captured by a single profile. Based on their studies and coordinated readings in moral philosophy, Walker and Hennig distinguished three main types of moral exemplarity and produced profiles of just, brave, and caring exemplars, framed in part by the philosophical work of Rawls, Miller, and Noddings, respectively.[11] Reimer et al. similarly use Walker and Hennig’s results to create moral text descriptors for just, brave, and caring exemplarity, which they combine with the religious descriptor for their broad study of moral traits. Because the moral and religious domains are often intertwined, we retained Reimer et al.’s religious descriptor in our moral text descriptors. These empirically derived text descriptors of religious, just, brave, and caring exemplarity are compared with the interview text to develop similarity scores between each descriptor and each of the scientist and musician interviews.

To develop more specific moral descriptors for the study of scientific and musical practice, a multi-step process was undertaken. First, thirty-four virtue terms believed appropriate for the study were extracted from the literature of virtue ethics, epistemology, and moral psychology and were refined down to fourteen qualitative codes using grounded theory.[12] The qualitative codes were made available for manual coding of the interviews, but were also transformed into moral descriptors and probed against the interview texts. Although it would have been reasonable to stop and compare groups on the basis of those probes, we continued to refine the probes with the hope they could become useful beyond the present study.

The statistical method of factor analysis, commonly used in psychology research, was used to identify eight virtue descriptors that appeared to make independent contributions. They are “honesty,” “integrity,” “humility,” “trust,” “accountability,” “skepticism,” “communication,” and “charity.” Each sentence in each virtue descriptor was compared stepwise with all the virtue descriptors to remove sentences that failed to contribute to the descriptor of which they were a part (cosine < .3) or that contributed too strongly to the other descriptors (cosine > .3).[13] This use of LSA on the moral descriptors is analogous to the interview analysis, except that each sentence in the moral descriptor text is treated as an independent document, whereas in the interview analysis each person’s entire interview is treated as a single document. (See the Appendix for the resulting moral descriptor documents.)
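The sentence-level refinement can be sketched as a simple filter. The code below is a single-pass simplification under stated assumptions: the text describes a stepwise procedure but does not say whether descriptor vectors were recomputed after each removal. The function names and the two-dimensional vectors are invented; only the .3 cutoff comes from the study.

```python
import math

CUTOFF = 0.3  # the cutoff reported in the text

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def keep_sentence(sentence_vec, own_descriptor, descriptor_vecs, cutoff=CUTOFF):
    """Keep a sentence only if it is close to its own descriptor and
    not too close to any other descriptor."""
    if cosine(sentence_vec, descriptor_vecs[own_descriptor]) < cutoff:
        return False  # fails to contribute to its own descriptor
    return all(
        cosine(sentence_vec, vec) <= cutoff
        for name, vec in descriptor_vecs.items()
        if name != own_descriptor
    )

# Invented 2-D vectors standing in for 300-dimensional LSA vectors.
descriptors = {"honesty": [1.0, 0.0], "charity": [0.0, 1.0]}

focused = keep_sentence([0.95, 0.10], "honesty", descriptors)    # kept
ambiguous = keep_sentence([0.70, 0.70], "honesty", descriptors)  # removed
off_topic = keep_sentence([0.10, 0.90], "honesty", descriptors)  # removed
```

A filter of this kind favors sentences that are distinctive of one descriptor, which is one way to keep the eight descriptors from overlapping heavily in the semantic space.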


The present investigation draws upon an international interview study of laboratory scientists (n=27) and ensemble musicians (n=44) who each completed an approximately one-hour semi-structured interview.[14] The scientists ranged from undergraduate research assistants to tenured professors, and the musicians ranged from music directors to amateur instrumentalists. Both groups were recruited from the United States and the United Kingdom. The resulting interview transcripts averaged approximately 7500 words in length.


For analysis, LSA was used to determine similarity scores (cosines) between each interview transcript and the twelve value descriptors. The statistical test MANOVA found significant differences for field and location but not gender or any interactions (field: F(12,55)=29.01, p<.001, Pillai’s trace=.86; location: F(12,55)=5.24, p<.001, Pillai’s trace=.53). Scientist interview text showed higher latent value for honesty and integrity, and musician interviews showed higher latent value for religious value (honesty: scientist cosine mean=.215, musician cosine mean=.146, t(68)=11.608, p<.001; integrity: scientist cosine mean=.301, musician cosine mean=.269, t(68)=4.407, p<.001; religious: scientist cosine mean=.201, musician cosine mean=.273, t(68)=5.786, p<.001). There were no significant differences for the other probes. Comparing transcript latent values across location did not find significant differences given adjustments used for multiple statistical tests.
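The per-descriptor group comparisons reported above are two-sample t-tests on the cosine scores. A minimal sketch follows, assuming a pooled-variance (Student’s) test; the text does not state which variant was used, and the multivariate MANOVA step is not reproduced here. The function name pooled_t and all scores are invented and are not data from the study.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df

# Invented cosine scores for one descriptor, not data from the study.
scientists = [0.22, 0.20, 0.23, 0.21]
musicians = [0.15, 0.14, 0.16, 0.13]

t_stat, df = pooled_t(scientists, musicians)
```

A positive t_stat indicates a higher mean cosine for the first group. Because twelve descriptors are tested, the resulting p-values would still need an adjustment for multiple comparisons, as the text notes.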


We defined moral values in terms of twelve moral descriptors: four came from the literature and eight were derived from grounded theory, reduced through a series of discussions, and refined through pairwise comparison using LSA. Measuring the semantic similarity between the meaning representations of theory-derived value descriptors and the transcribed interview responses, we then used statistical methods to compare participant groups and characterize the groups’ differences in implicit moral values. MANOVA found significant differences by field, with scientist interview texts showing higher latent value for honesty and integrity, and musician interviews showing higher latent value for religious value. These findings suggest that honesty and integrity might play a central role in how scientists conceive their practice.

The higher cosine scores for scientists on honesty and integrity suggest that the scientists’ interview texts placed more implicit value on honesty and integrity than the musicians’ texts did. One might speculate that as values, honesty and integrity are simply more relevant to science than to music. If so, then other field-relevant values, perhaps especially skepticism, should also score higher for scientists. However, that was not the case. The findings appear to indicate something about the moral practice of scientists and musicians rather than something intrinsic to the fields themselves.

Although many of the values examined in the study may contribute to scientific practices, honesty and integrity appear more present for scientists than musicians within an interview focused on what is important in being a “good” [field member]. A possible explanation is that honesty and integrity are more of a prerequisite to the practice of science than other examined values (as characterized by the text descriptors). One might become a better scientist if one were to have more skepticism or accountability, but one could still be a fair scientist with only a modicum of these virtues. However, a dishonest scientist or a scientist who lacks integrity may miss something so fundamental to the practice of science that the person might lose the right to identify as a scientist among their professional peers. As the interview questions are designed to evoke moral schemas related to the practice of science (or music), the differences may relate to deeper aspects of one’s identity within one’s practice.

The higher cosine scores for religious value in musicians, as compared to scientists, may indicate that religion plays a more central role in professional identity for this particular sample of musicians. The ensemble musicians interviewed perform religious music, a fact that may affect their scores. In addition, the scientists might deliberately try to separate any personal religious commitments from their professional practice.

These findings have several caveats. They are limited by the moral text descriptors used and the scope of practice-related moral identity covered by the interview questions. LSA has psychological plausibility but is nevertheless limited by the semantic space used as a foundation for its representation of meaning. The populations of scientists and musicians in this study may also be insufficiently representative to allow generalizations across these vocations. In addition, treating the entire interview as a monolithic text may obscure differences in responses to the particular questions asked.

However, the significance of the findings on honesty and integrity warrants further investigation. Among musicians, honesty may be less important because the performer can alter the composer’s initial guidelines (for good reason), making the performer “dishonest” to the composer’s intent. Future work will examine more closely how scientists and musicians differ on the importance of honesty and integrity for their respective practices.


LSA can extract implicit meaning from text and quantify the degree to which a moral value implicitly occurs within interviews of scientists and musicians about their practices. Comparisons between groups help elucidate differences in their moral values and underlying moral schemas. Using LSA to compare theory-driven moral descriptors with semi-structured interviews gives moral psychologists an additional method for analyzing human moral behavior. In addition, the method’s broad applicability and automated processing of larger quantities of text may open up new avenues of investigation into human moral values, schemas, and behaviors.

MARK GRAVES is Visiting Research Assistant Professor at Notre Dame’s Center for Theology, Science, and Human Flourishing. His research occurs at the intersection of artificial intelligence, psychology, and moral theology. He has a Ph.D. in computer science (University of Michigan), an M.A. in systematic and philosophical theology (Graduate Theological Union/Jesuit School of Theology at Berkeley), and publications in computer science, biology, psychology, and theology.


  • Annas, Julia, Darcia Narvaez, and Nancy E. Snow, eds. Developing the Virtues: Integrating Perspectives. New York: Oxford University Press, 2016.
  • Firth, John. “A Synopsis of Linguistic Theory 1930–1955.” In Studies in Linguistic Analysis, edited by John R. Firth, 1–32. Oxford: Oxford University Press, 1957.
  • Gardner, Howard, Mikhail Csikszentmihalyi, and William Damon. Good Work: When Excellence and Ethics Meet. New York: Basic Books, 2001.
  • Glaser, Barney, and Anselm Strauss. The Discovery of Grounded Theory: Strategies for Qualitative Research. London: Transaction Publishers, 1967.
  • Harris, Zellig. Mathematical Structures of Language. New York: Interscience, 1968.
  • Landauer, Thomas K., Danielle S. McNamara, Simon Dennis, and Walter Kintsch. Handbook of Latent Semantic Analysis. Mahwah, NJ: Lawrence Erlbaum Associates, 2007.
  • Narvaez, Darcia, and Daniel K. Lapsley, eds. Personality, Identity, and Character: Explorations in Moral Psychology. Cambridge: Cambridge University Press, 2009.
  • Reilly, Timothy, and Darcia Narvaez. “Virtue in Practice Interview Protocols.” Unpublished report, University of Notre Dame, 2017.
  • Reilly, Timothy, Xiao Liu, and Darcia Narvaez. Virtue in the Practice of Science: A Three Wave Validation Study. Manuscript in preparation, 2019.
  • Reimer, Kevin S., Christina Young, Brandon Birath, Michael L. Spezio, Gregory Peterson, James Van Slyke, and Warren S. Brown. “Maturity Is Explicit: Self-Importance of Traits in Humanitarian Moral Identity.” The Journal of Positive Psychology 7.1 (2012): 36–44.
  • Tiberius, Valerie. Moral Psychology: A Contemporary Introduction. New York: Routledge, 2014.
  • Walker, Lawrence J. “The Perceived Personality of Moral Exemplars.” Journal of Moral Education 28.2 (1999): 145–62.
  • —— and Karl H. Hennig. “Differing Conceptions of Moral Exemplarity: Just, Brave, and Caring.” Journal of Personality and Social Psychology 86.4 (2004): 629.
  • Wittgenstein, Ludwig. Philosophical Investigations I. Translated by G.E.M. Anscombe. Oxford: Blackwell, 1958.
Appendix: Moral Descriptors


Honesty

You should tell the truth. Be honest in all you do. Lying and fabrication are harmful. You should stick to the evidence. Be confident that what you say is true. Try not to deceive others. Provide accurate information.


Integrity

He is motivated to do it for itself. Doing this is part of her identity. I am motivated to do it for its own sake. I appreciate her genuineness. I want to act with integrity. I have pure motivations. Pursue the goals and values of your sources. Resist temptation. No matter what happens do not compromise yourself. Do your work for the right reasons. Stay committed to your purpose. Do meaningful work. Do not do things for status.


Humility

Recognize your mistakes. You have to be honest with yourself. Know your limits. Know your biases. Recognize your flaws and weaknesses. Be aware of your strengths. You have to know what you can do. Be honest about your capabilities. Be honest about failures. Notice when you have overreached. Know when you made a mistake. Be self-critical. Be humble.


Trust

You have to trust that others did what they say they did. I believe what they wrote. I trust the peer-review process. You need to trust others. I trust people that are knowledgeable. You have to trust their capabilities. I trust them to know what they are doing. I trust that they are honest. I trust the director to know what he is doing. You have to trust the writer. You have to trust the leader.


Accountability

I often give feedback. I let someone know how they are doing. He auditions candidates. She screens applications. You point out mistakes. I give constructive criticism. I engage in justified critique. I let others know my expectations. I help new people learn the system. I guide others on the right way to behave. Some behaviors are not acceptable. Punish violators appropriately. Even leaders need to be accountable.


Skepticism

You have to look at what they did. You have to look at the data. It is important to ask questions. Did they do this right? Is there something wrong with their reasoning? Does it sound right? Is there a better way to do it? Do the results justify conclusions? Be skeptical. Did they do something wrong? Do I see any mistakes?


Communication

How you say something is important. You have to communicate in a way that others will understand. What you say should translate to your audience. You need to report clearly. Others should be able to recognize what you are communicating. Communicate things of value.


Charity

They do it to contribute to society. They want to make the world a better place. They are seeking to connect with others or with God. Help others flourish. Improve the wellbeing of others. Make the world a happy place. Make the world more peaceful. Be kind and helpful to others. Help humanity thrive. Contribute to your community.


Just

I consider myself a just and fair person. I make good judgments by listening to all sides and being clear in my thinking. I usually feel truthful, honest, reasonable, and rational. In most circumstances, I am upright and true. I also try to have integrity in a way that is consistent. Many people consider me to be lawful, trustworthy, and honorable.


Brave

I consider myself a brave and courageous person. I stand up for my beliefs even when I must take a risk, make sacrifices, or face danger. I usually feel fearless, determined, strong-minded, strong-willed, and gutsy. In most circumstances, I am unafraid and daring. Many people consider me to be gallant, intrepid, and heroic.


Caring

I consider myself a compassionate and loving person. I care about others by helping and making time for them. I usually feel sympathetic, empathic, and concerned about the welfare of others. In most circumstances, I am kind, considerate, supportive, and nurturing. I also try to be comforting in a way that is genuine, and sincere. Many people consider me to be good-hearted.


Religious

I consider myself a religious and faithful person with strong beliefs. I believe in a higher power and try to know and please God by going to church, praying a lot and worshiping a lot. I usually feel devout, committed, and dependent on God. I also try to be active in church life and read the Bible regularly. Many people consider me to be dedicated, devoted, and knowledgeable about religion.

  1. Valerie Tiberius, Moral Psychology: A Contemporary Introduction (New York: Routledge, 2014); Darcia Narvaez and Daniel K. Lapsley, eds., Personality, Identity, and Character: Explorations in Moral Psychology (Cambridge: Cambridge University Press, 2009); Julia Annas, Darcia Narvaez, and Nancy E. Snow, eds., Developing the Virtues: Integrating Perspectives (New York: Oxford University Press, 2016).
  2. Timothy Reilly and Darcia Narvaez, “Virtue and the Scientific Researcher: Understanding the Personality and Character of Scientists,” poster presented at the American Educational Research Association, New York, NY, 2018; Timothy Reilly, Xiao Liu, and Darcia Narvaez, Virtue in the Practice of Science: A Three Wave Validation Study, manuscript in preparation, 2019.
  3. Howard Gardner, Mikhail Csikszentmihalyi, and William Damon, Good Work: When Excellence and Ethics Meet (New York: Basic Books, 2001); Timothy Reilly and Darcia Narvaez, “Virtue in Practice Interview Protocols,” unpublished report, University of Notre Dame, 2017; Timothy Reilly and Darcia Narvaez, “Practitioner Understanding of Excellence: Using Interviews to Understand Virtue,” paper presented at a meeting of the Society for Qualitative Inquiry in Psychology, Pittsburgh, PA, 2018.
  4. Ludwig Wittgenstein, Philosophical Investigations I, translated by G.E.M. Anscombe (Oxford: Blackwell, 1958), secs. 80, 109.
  5. John Firth, “A Synopsis of Linguistic Theory 1930–1955,” in Studies in Linguistic Analysis, edited by John R. Firth (Oxford: Oxford University Press, 1957), 1–32.
  6. Zellig Harris, Mathematical Structures of Language (New York: Interscience, 1968).
  7. Thomas K. Landauer et al., Handbook of Latent Semantic Analysis (Mahwah, NJ: Lawrence Erlbaum Associates, 2007), 69.
  8. The magnitude (or length) of the vector measures the “loading” (or projection) of the vector onto that dimension of “meaning” space. Whether that value has any meaning separate from the overall semantics of the vector depends upon whether that dimension is semantically coherent in isolation, which is often not the case. In Figure 1, only a relevant portion of the two-dimensional graph (in polar coordinates) is shown.
  9. Kevin S. Reimer et al., “Maturity Is Explicit: Self-Importance of Traits in Humanitarian Moral Identity,” The Journal of Positive Psychology 7.1 (2012): 36–44.
  10. Lawrence J. Walker, “The Perceived Personality of Moral Exemplars,” Journal of Moral Education 28.2 (1999): 145–62.
  11. Lawrence J. Walker and Karl H. Hennig, “Differing Conceptions of Moral Exemplarity: Just, Brave, and Caring,” Journal of Personality and Social Psychology 86.4 (2004): 629.
  12. Reilly and Narvaez, “Virtue in Practice Interview Protocols”; Barney Glaser and Anselm Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research (London: Transaction Publishers, 1967).
  13. A cutoff of 0.3 was chosen based upon familiarity with the protocol in a variety of contexts and because similar cutoffs are frequently used for correlations and factor analysis, which have similar ranges and analogous distributions in psychology.
  14. Reilly and Narvaez, “Virtue in Practice Interview Protocols.”