Airing my grievances with wellbeing science
We have a streetlight problem
Todd Kashdan had a recent post over on LinkedIn that he framed as a provocation:
Todd’s research is excellent and his substack and pop books are also top tier. That said, I was triggered by this post, and I thought I’d write up why as it might interest people who wonder about the transition from ‘science’ to ‘actionable insights for everyday people’ i.e. popular books (on happiness). I am a wellbeing scientist and I don’t think we’re much chop, frankly. Almost all of my colleagues are wonderful people trying their best, no shade on anyone - wellbeing is a very hard topic to study rigorously. But let me vent for a bit. Todd’s substack is called “Provoked” after all - let me respond in that spirit of provocation.
To be blunt, I don’t think we’ve learnt much since Diener and Veenhoven started collecting life satisfaction data. What’s more, I think we were working on much deeper things before the advent of the ‘modern science of wellbeing’ and life satisfaction research has mostly distracted us from these deeper issues. And to be even spicier, I think these insights did not arrive from ‘philosophers in armchairs’, but rather from scientists, especially therapists like Carl Rogers.
In the bad old days, “happiness was a flickering candle in a dark room”, whereas now we know (sort of), thanks to #BIGDATA studies like Margolis et al 2025, that:
“sociability, physical health, disengagement from goals, sex life quality, wealth, and religious activity…aches, stress reactivity … control over one’s life and control over financial work matters … strongly predict wellbeing”.
Forgive me, but no shit Sherlock. These items were not mysteries to people like Carl Rogers, or the Stoics even. I think Epicurus has basically got all of this in one quote from his Principal Doctrines (4th century BC!):
It is impossible to live a pleasant life without living wisely and well and justly, and it is impossible to live wisely and well and justly without living a pleasant life.
Over in Happiness Economics, Clark et al. (2018) summarised the Origins of Happiness as income, education, employment, criminal convictions, long term partnership, physical health, and the absence of mental illness (my long review here). This is not earth-shattering stuff. I did not in fact need to get out of the armchair to confirm it, let alone collect what by now is hundreds of millions of dollars of global life satisfaction data. I mean, health and mental illness as bad for wellbeing is borderline tautological.
I think the two best candidates for genuinely new things that we’ve learnt from happiness studies are:
The u-shaped relationship between age and life satisfaction
The hedonic treadmill/adaptation, which is where people seem to get used to both big positive and negative shocks to their satisfaction and revert to baseline over time.1
But even these findings (which I stress I think are probably true, broadly speaking) rely on naïve interpretations of the data. I do not mean this naivety pejoratively; I mean that you take the data at face value. These interpretations have come in for extensive critique, especially recently. There are very different ways to read the data that suggest that these findings might be artifacts of the measurement instrument used - namely life satisfaction scales. In that case, the last forty years of wellbeing science would have misled and benighted us, rather than turning that flickering candle into wider illumination. It would also have been a huge waste of money.
Let me explain this in some detail…
A typical life satisfaction question asks: “All things considered, how satisfied are you with your life at this time on a scale from 0-10?”. Now satisfaction does not naturally run from 0-10, it is unbounded. One might be suspicious that this is liable to lead to some problems for statistical analysis. But people living better (worse) lives generally give higher (lower) numbers i.e. life satisfaction scales show psychometric validity. There are some important caveats to this. For example, Japanese people report that their ‘ideal’ satisfaction is ~8/10 on average (they value balance), whereas for Americans it is closer to 10/10. But if you’ve got a huge sample of westerners you can draw some modest conclusions from life satisfaction data.
I stress modest, because there are a whole bunch of problems with scales that mean ‘statistical significance’ with this data is basically a mirage, and long-term trends are potentially misleading.
The basic issue is that the life satisfaction report we see - the number - is not the individual’s underlying latent satisfaction (if such a thing exists), but rather that latent satisfaction passed through a reporting function. This is a cognitive, emotional, linguistic, and social process by which an individual makes a judgement of their life, and then converts it to a number on the scale. People do it very fast - typically less than 7 seconds. But there is a lot of complexity behind that number. For example, in a project (working paper here) we interviewed 100 residents of the UK and asked them to think out loud while answering questions about their life satisfaction and how they report it. One woman said that she was 8/10. Around about average for the UK. Nothing to see here. I guess she has health, income, marriage and the other ‘origins of happiness’, moving on… But when we asked her what would make her give a higher number she became evasive. After some gentle prodding, she eventually divulged that (note we have consent to share the transcripts publicly) she has an adult child institutionalised with paranoid schizophrenia. She says that “there’s a person out there that I gave birth to who isn’t whole, and I don’t think I can even be whole while they’re unwell”. Is this health? Is it relationships? I hope I’m not the only one who reads this sort of text and thinks that the categories our numerical data must be crowbarred into distort the reality they purport to measure.
Now you might think, “that’s some cute qualitative material there, Mark, but what implications does it have for the statistics?” Well, statistical analysis of life satisfaction data relies on three standard assumptions that don’t survive a lot of this sort of qualitative complexity:
Linearity/cardinality i.e. that the size of the intervals between response categories is equal: the sort of magnitude change in your life to go from 3 to 4 is the same as the magnitude to go from 7 to 8.
Interpersonal comparability i.e. that different people use the scales in broadly the same way e.g. that Japanese people use the scales the same as westerners.
Intertemporal comparability i.e. that the same person uses their scales in a consistent way over time.
None of these is strictly tenable, frankly. I’ll give you a whistle stop tour.
First up, just a quick visual explainer of one way that the reporting function can muck you up. We’ve all had different teachers at school - some of whom were generous in their grading, and some of whom were strict. They might have assessed a student’s paper the same in that they might agree that it is in the top 10% of the class. But to one professor that might mean that it gets a 90, whereas to another it might mean that it gets the barest sliver into the top band (at Warwick that would be 70). Now imagine this generous and strict notion playing out in life satisfaction reporting. Two people living the same life assess that life in the same way, but one is generous in their reporting while the other is strict:
This is an example of interpersonally incomparable scale use. I think this explains why extroverts have higher life satisfaction on average than introverts. It’s not that they assess their lives as being better, it’s that they have a relatively positive reporting style. From a prudential point of view - when we are thinking about wellbeing - we are trying to improve people’s reasonable evaluations of their life, not just encourage them to say higher numbers. The numbers are an indicator, not the wellbeing itself. In which case, efforts by Lyubomirsky among others to train people to be more extroverted so that they can be happier strike me as American cultural imperialism rather than positive psychology. I’m not talking about spending time with your friends here, I’m talking about changing your personality. Having someone paternalistically try to make me more extroverted is Brave New World material.
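For the quantitatively minded, the generous versus strict reporter idea can be put in a few lines of Python. This is only a toy sketch: the power-curve shape of the reporting function, the `gamma` parameter, and the latent value of 0.64 are all assumptions invented for illustration, not anything estimated from our interviews or from the literature.

```python
# Toy reporting function: maps a latent evaluation in [0, 1] to a 0-10 report.
# The power-curve form and the gamma values are illustrative assumptions,
# not estimates from any study.

def report(latent: float, gamma: float) -> int:
    """Convert a latent evaluation (0 to 1) into a 0-10 response.

    gamma < 1 gives a generous reporter, gamma > 1 a strict one.
    """
    if not 0.0 <= latent <= 1.0:
        raise ValueError("latent must lie between 0 and 1")
    return round(10 * latent ** gamma)


same_life = 0.64  # hypothetical: both people evaluate their life identically

generous = report(same_life, gamma=0.5)
strict = report(same_life, gamma=2.0)

print(generous, strict)  # prints: 8 4
```

Same life, same evaluation, and yet an 8 and a 4 land in the data set. Any analysis that compares those two numbers directly is comparing reporting styles, not lives.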
OK Mark, cute conceptual illustration, but where is the evidence that this causes any serious problems? Well my coauthor and colleague at the University of Warwick Caspar Kaiser has a working paper with Anthony Lepinteur where they replicate 80 life satisfaction studies in top economics journals. They introduce a novel statistical technique for estimating the sensitivity of the results to deviations from linearity. They conclude:
Replicated coefficient signs are remarkably robust to mild departures from linear scale-use. However, statistical inference and estimates of relative effect magnitudes become unreliable, even under modest departures from linearity. This is especially problematic for policy applications.
Their project was a response to this paper from Bond and Lang, which was excessively pessimistic. It basically argued that departures from linearity could reverse the signs of effects e.g. that income could have a negative (-) relationship with satisfaction or a positive (+) one, we can’t possibly tell. They concluded that all scale use should be abandoned. Caspar and Anthony find that such sign reversals are in fact basically non-existent. But t-statistics and thus statistical significance are highly sensitive to even small departures from linearity, and so we can’t have much confidence in life satisfaction research. While we might have confidence in findings like “health and relationships are good for your satisfaction”, I don’t need life satisfaction data to tell me that. And the sorts of deeper questions we might want to get at quantitatively? Well life satisfaction scales just aren’t much use for generating certain insights on those things, so what benefit do I get for the sacrifice of qualitative texture? Little. (A couple more papers on this issue here and here). Indeed, even for things for which we strictly need quantitative data, like cost-benefit analysis, life satisfaction scales need to be treated with huge amounts of caution.
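To get a feel for what “sensitive to departures from linearity” means, here is a minimal sketch. It is emphatically not Caspar and Anthony’s technique; it is a toy in which I relabel the 0-10 scale with a simple power curve (the `curvature` parameter is my own assumption) and recompute a Welch t-statistic on two small made-up groups. The responses are invented, and because the treated group is at least as high as the control group at every rank, the sign of the difference cannot flip under any increasing relabelling. The t-statistic, however, is free to move.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical 0-10 responses for two groups; invented for illustration only.
treated = [6, 7, 7, 8, 8, 8, 9, 9]
control = [5, 6, 7, 7, 7, 8, 8, 9]


def rescale(response: int, curvature: float) -> float:
    """Monotone relabelling of the 0-10 scale.

    curvature == 1 is the usual linear reading; other values stretch the
    gaps at one end of the scale and squeeze them at the other.
    """
    return 10 * (response / 10) ** curvature


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for the difference in means (a minus b)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


results = {}
for curvature in (0.5, 1.0, 2.0):
    a = [rescale(r, curvature) for r in treated]
    b = [rescale(r, curvature) for r in control]
    results[curvature] = welch_t(a, b)
    print(f"curvature={curvature}: t={results[curvature]:.3f}")
```

With numbers this tidy the t-statistic only wobbles a little, so don’t read anything into the size of the movement. The point is purely structural: the direction of the effect is a property of the ordering, whereas the t-statistic depends on the spacing you assume between response categories.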
Okay okay, I concede we’ve got a problem Mark, but that shouldn’t affect things like adaptation and the u-shape that only rely on longitudinal analysis. Right?
Well I’m afraid longitudinal analysis is pretty cooked too. We have decades of evidence from the response shift studies literature in medical science that people interpret quality of life questions differently before and after major shocks. In that field it doesn’t matter so much because patient-centred medicine was always more about care than about scientific precision. In psychology and economics, however, the context is different - precision (e.g. in QALY calculations) and statistical significance are everything. Indeed, the argument Kashdan is laying out above is basically that qualitative research is guesswork and only the hard data of statistical significance is rigorous. Well we’re in a pickle then…
At the time of interview in 2020, one of our think out loud subjects was suffering from severe long COVID. She said that she was 7/10 because she had been able to return to work part time and walk her dog for 20 minutes. She thought of 8/10 as working full time and walking her dog up a hill, 9/10 as being able to run again, and 10/10 as being able to run a local half marathon. Because it was only a year ago that she had her big shock and it was such a temporally clear moment, we asked her what the points on her scale would have meant in 2019 before the COVID pandemic began. She said that she was also a “7 or 8” then, but she was regularly running half marathons and her 9 and 10 related to finishing her PhD, which had to be put on hold owing to long COVID.
This could be interpreted as adaptation, but this is not how the respondent saw it. When asked directly: “Is your scale from 2019 in any way comparable to your scale today?” her response was “totally different, because I would not have started from a point of extreme illness…I think I was a different person, actually”. She was clear that her life today was much less satisfying than pre-COVID, stating that:
My perspective has massively changed with having to go through such bad health and such gruelling recovery and affecting so many different areas of my life, not physically but mentally also academically, socially, in terms of work, so many different aspects. So I think that my perspective has changed completely and I think before 2019 then I think I was much more aspirational.
This cannot be interpreted as adaptation unless numbers are trusted more than subject testimony (in which case why are we doing subjective wellbeing research at all?). Our interviewee’s 7 today and her 7 in 2019 do not correspond to commensurate magnitudes of satisfaction. Her scale has shrunk, as depicted in the Figure below:
This sort of scale norming introduces bias into longitudinal statistical analysis of life satisfaction change that assumes fixed scales over time. Far from resolving this issue, panel data makes it more likely to occur the longer the panel. These are not time-invariant fixed effects associated with differing reporting styles across respondents, but rather an example of changing reporting style over time within a respondent. Ironically, the sorts of events that we want to study for their effects on life satisfaction are precisely the sorts of things that alter the way people use their scales!
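The scale norming story can also be written down as a toy model. In this sketch I assume (and it is only an assumption) that a respondent places their latent satisfaction linearly between whatever they currently imagine as the bottom and top of their scale. The latent units and all the numbers are hypothetical, chosen so that the reports echo our interviewee’s 7 before and 7 after.

```python
def report(latent: float, scale_bottom: float, scale_top: float) -> int:
    """Place a latent satisfaction level on a 0-10 scale whose endpoints
    are whatever the respondent currently imagines as worst and best."""
    position = 10 * (latent - scale_bottom) / (scale_top - scale_bottom)
    return round(min(10.0, max(0.0, position)))


# Hypothetical latent units, invented purely for illustration.
before = {"latent": 70, "scale_bottom": 0, "scale_top": 100}
after = {"latent": 42, "scale_bottom": 0, "scale_top": 60}  # top of scale has shrunk

reported_before = report(**before)
reported_after = report(**after)

reported_change = reported_after - reported_before
latent_change = after["latent"] - before["latent"]

print(reported_before, reported_after, reported_change, latent_change)
# prints: 7 7 0 -28
```

A panel regression only ever sees the first three numbers. It would record no change and might call it adaptation, while in this toy world the underlying satisfaction has fallen a long way.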
So far we’ve violated linearity and intertemporal comparability. I’ve touched on violations of interpersonal comparability too, but let me quickly list off a few more examples from our interviewing study:
Huge numbers of people say that they can’t see themselves giving 9/10 or 10/10 no matter how good their lives might get. Meanwhile, the people who say that they are or can see themselves being 10/10 have quite modest lives in mind. Indeed, we think they assess their lives identically to how many of the 8/10s do, they just have a more generous reporting style.
Elderly respondents seem to interpret the question as how their life went whereas younger respondents think of how it is going. Unfortunately, when we ask elderly respondents directly how it is going they often change their answer to a lower number because their friends are dying and their health is failing. So the u-shaped relationship between age and satisfaction is perhaps an artifact of the measurement instrument, not a real phenomenon.
People have widely different time frames in mind when answering the question. Some are thinking in the moment, others about the last 5 years, and everything in between.
All this to say that I have little confidence in the statistical properties of life satisfaction scales. But then why am I bothering with this research method? If wellbeing is super complex in reality, shouldn’t I be talking to people, as the therapists did, rather than just asking them for a number? Natural language, ideally in audio-visual form with emotional cues and the like, is the data that is closest to our object of interest. Many quants freely admit this, but argue for the comparative advantage of statistics, namely the rigor of seeing patterns in the aggregate. I certainly think there is some truth to this - Carol Graham’s books are an example - but the claim that this is more ‘rigorous’ than assembling qualitative case data en masse is at best thin, at worst a mirage. In any case, if qualitative data is closer to our object of interest, then why is 99% of wellbeing research quantitative, and why are wellbeing scientists so dismissive of anything that doesn’t use numbers?
I should also say in passing that there are many things about wellbeing that we don’t even Know with a capital K because most of modern wellbeing science doesn’t employ causal designs. It is instead big regressions of national and international data sets (not that this isn’t valuable). You might think I’m about to launch into an impassioned plea for more causal studies in wellbeing science. Far from it. I’d certainly like those, insofar as they’re possible (they’re so hard to do in wellbeing), but actually I want us to do more theory and more big N qualitative work. More on that in a moment.
Let me first return to my spicy point about why the statistical analysis of life satisfaction data is not more scientific than what early therapists were doing. Science is about making conjectures that are coherent with observation. It is not about quantitative methods in the strict sense of using numbers. The advantage of quant is simply the volume of observations, which is very important when you’re relying on an average, and because every human is unique social science always relies on the average. But the therapists saw a large volume of subjects. They were not ‘guessing’. And they had vastly more than ‘zero data points’. They were making claims on the basis of hundreds of clients coming through their clinics, thousands in the aggregate once therapists get together and share notes. Certainly they would do detailed idiographic analysis in many cases - understanding the narrative arc of a patient’s life is often critical for therapy and something badly missing from life satisfaction research. But ultimately they were looking for patterns. And that means counting, assembling, quantification. They were just quantifying after doing grounded theory first, whereas life satisfaction research has never even theorised what its construct of interest is! As Anna Alexandrova analysed in her magisterial A Philosophy for the Science of Wellbeing, the field is extremely theory avoidant. Life satisfaction research wants to distill a theory of wellbeing from the data, but if you had a theory of wellbeing to start with you’d suspect that these scales don’t measure it very well.
The study of wellbeing before the advent of the ‘modern science of wellbeing’ was often getting into the depths of things that contemporary science merely observes the surface of. Notably, the Margolis paper flags ‘disengagement from goals’ as a key determinant of wellbeing. Yet it’s plain that sometimes disengaging from goals is a good idea, like when they’re not suited to you. That sounds like a complex self-actualisation problem to me. Who did the best work on that subject? The (existential) philosophers and clinical psychologists of the mid 20th century, specifically the humanist tradition, which life satisfaction scholars tend to ridicule because it struggles to employ quantitative methods. Closely related is the eudaimonic tradition in psychology, which is often critiqued for being too messy. But when your counterfactual is a 0-10 scale everything seems messy. And in any case wellbeing is messy! When I got interested in wellbeing as a teenager it was because I was struggling with nihilism, trauma, emotional intelligence, character development, my place in the world, relations with my parents and people I wanted to love. These are all things that are generally too complex for experimental studies. But they’re the big ticket items as far as wellbeing is concerned! If we aren’t willing to get stuck into this stuff because we insist on quant, and if quant is incapable of handling this sort of complexity, then the insistence on quant is making us dumber!
A further irony is that much of the best stuff in positive psychology is literally just distilled ancient wisdom. Gratitude is the most obvious example. I think gratitude is great, but it’s baked into basically every religion. It’s why we say grace before meals! But note that gratitude in religion is a multidimensional practice plugged into wider socio-cultural, psycho-social, and transcendental systems (my thanks to Matthew Iasiello for this point). That’s why it works so well. In contrast, gratitude in psychological science is a cute diary writing exercise that you do for a couple of months. The experiment is such a pitiful simulacrum of the phenomenon it is trying to wrestle into order (this is a general challenge for experimental psychology - Berna Devezer writes well on this and it was a theme of the excellent theory crisis issue of Perspectives on Psychological Science). Nonetheless, we observe extremely small but statistically significant positive effects in experimental studies of gratitude, which has some value in terms of giving us confidence.
But again, unfortunately, this is precisely the sort of stuff that the scale norming literature undermines confidence in. Are people more satisfied with their lives after gratitude diaries, or do they just change their reporting style? The ‘science’ is supposed to give us more confidence in the ancient wisdom, but it even fails to do that. I’m not throwing shade on the scientists here or earnest attempts to be rigorous, I am throwing shade on the claim that it is only with life satisfaction scales that we came to know things. Wellbeing science as currently practised is often a benighting force.2
I basically think that wellbeing science has a streetlight problem (courtesy of sketchplanations):
The streetlight is experiments, big social surveys, and life satisfaction scales. But our keys are over in humanistic psychology (and anthropology, and literature, and the other fields that have developed to try to observe human complexity in its full glory). And let me stress that it’s not that experiments, social surveys, and scale questions aren’t valuable. They definitely are. They would be even more valuable if they were trying to get some quantitative robustness around the insights of the more complexity-capable approaches, instead of trying to reinvent the wheel. What’s not valuable is the hubristic claim that it is only once we installed the streetlight that we finally started making progress in finding our keys.
Another metaphor that I sometimes use to describe contemporary wellbeing science is that we’re looking at a phenomenon through a keyhole. If you open the door you realise that there is so much more depth and complexity there that needs to be appreciated, but that you can’t do that with shitty 0-10 scales and linear regression. You can’t even do it with 0-10 scales and machine learning, because there simply isn’t enough information in the scales and similar social survey items to capture the texture of people’s lives. The algorithm doesn’t have anything worth sorting. Ironically, I think it’s in part because a lot of life satisfaction enthusiasts never opened the door that they never realised life satisfaction scales are rife with problems. If you have a sense of what it is that you’re trying to measure, you’d be more cautious about whether this simple instrument is measuring it in misleading ways. (It’s not even measuring! People assign numbers to value judgements - that’s a cognitive process, not putting a ruler next to an object.)
So what should we do instead? Well fam I have some grants in the cooker and I don’t want to get scooped, but the short answer is use citizen science to generate thousands of hours of natural language audio-visual data of people analysing their own wellbeing. Then we use machine learning to structure that data, and deliberative practices where citizen scientists collaborate with subject matter experts to get a thematic understanding of it. Will it be causal? Hell no. But we’ll accrue bigger wisdom much faster than our current scratching in the dirt is capable of.
Postscript:
Incidentally, the problem with the (analytical) philosophy of wellbeing is the blind men and the elephant:
The often very helpful methodology of (analytic) philosophy is to break things up into their component parts as much as possible, rather than viewing them holistically. This is baked into the question of wellbeing philosophy, which is “what is intrinsically wellbeing, as opposed to merely instrumental to it?”. This question leads philosophers to ignore 99% of what is associated with wellbeing as a “how” or “applied” question, leaving behind 1% to quibble over (the “what” question). Is wellbeing pleasure? Preference satisfaction? Living according to your nature? Having good things in your life? Unfortunately, as I explain in my book, A Theory of Subjective Wellbeing, once you start to think about how to get wellbeing or how it works, you quickly discover that the philosophical theories of what wellbeing is are all interdependent - it’s the same elephant. The philosophers are blind, and in a similar way to the pure quantitative scientists. They’re restricting themselves to only one keyhole and thus only seeing one part of the wellbeing landscape at a time. You need mixed methods and interdisciplinary approaches.
Another incidentally: the reason why Stoicism is one of the only branches of the philosophy of wellbeing that has broken into the mainstream is precisely because it is concerned with the how not the what. Indeed, Marcus Aurelius says this quite explicitly: “Waste no more time arguing about what a good man should be. Be one.” Yet Stoicism is not something you will encounter in top philosophy journal articles on wellbeing. It will never cease to amaze me how committed academic philosophers are to making themselves irrelevant to their fellow citizens.
1. This is one source of the logarithmic relationship between income and life satisfaction. As you get wealthier, it takes ever greater increases in wealth to increase your satisfaction (on a scale from 0-10 anyway). You just sort of get used to everything. This, too, is not a new observation. It is literally in every major religion. It’s only really the prosperity gospel and then late capitalist culture that made some Americans forget the elementary insight that money doesn’t make you happy. What was it that the Bible said? “It is easier for a camel to go through the eye of a needle than for a rich man to enter the kingdom of God”. Thinking that your success can be measured in money is a hallmark of psychopathy.
2. I should note here that I use a lot of quantitative studies in my own work, and have even published some, but they are usually beyond life satisfaction scales. Todd’s post is specifically about Diener and Veenhoven, and they were specifically about life satisfaction scales and mood measures. Indeed, Veenhoven could even be a bit nasty and certainly dismissive of things like basic psychological need surveys.






![The Blind Men and the Elephant, by Sophia Tepe (Betterism, Medium)](https://substackcdn.com/image/fetch/$s_!TCUz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb864acb6-83c7-47b7-ac64-d1c41f40994d_560x292.jpeg)