The Scary Line Between Fact and Racism

Opinion Piece by Lelouch Giard

South Africa has, in recent years, been through quite the racism and sexism drama. In such a loaded atmosphere, it is very important to be careful when talking about any sensitive matter – an accusation of racism can be very troublesome, and being found guilty changes a person’s life for the worse forever. It is so serious that many people avoid talking about certain things just to be safe, but some sensitive subjects are far too important to just sidestep. When we talk about such subjects, where exactly is that hazy line between discussing statistics and falling face-first into a possible SA Human Rights Commission punitive fine?

In this article, I’ll discuss that line. Sadly, staying on the safe side of the line is no guarantee of safety from hecklers, bored Social Justice Warriors, or “Paid Twitter”. However, sticking to statistics and facts is likely to be vitally important for any formal proceedings (court, SAHRC case, disciplinary hearing) or a discussion with any person who relies on logic above emotional impulses. In the end, sometimes silence might be the only safe option, but silence won’t solve our country’s problems.

First, a few definitions:

Statistics is a branch of mathematics concerned with data collection and processing. It is a matter of numbers – importantly, statistics is abstract (statistics does not say what something is, only what it is likely to be), and in almost all cases handles the data in a collective manner (lumps things together into categories which broadly describe those things).

Racism is discrimination or prejudice towards a person or group based on their race/ethnicity. Race (and ethnicity, which is mostly used in the same sense) is the concept that people can be divided up into groups based on some manner of feature (typically biological features, but often including specific behaviours or cultural traits).

Sexism is discrimination or prejudice towards a person or group based on their sex or gender. Sex and gender are generally divided in modern politics into biological gender, and gender identification: biological gender is determined by the presence or absence of the Y chromosome, and is generally reflected by the figure and sexual characteristics of a person’s body; the gender a person identifies with, to simplify a very complex subject, is the gender that the person considers themselves to be, and is not limited to the binary categories of biological gender. Sexism is generally based on the biological gender, though there are entire other categories of bigotry related to gender identity, like transphobia.

To help define that elusive line between statistical fact and bigoted fiction, I’ll be using a suitably sensitive example: the relative growth of different racial populations in South Africa over the last century or so. The graph below is from an article titled “We’re running out of whites” based on data provided by Stats SA. This graph and its implications are definitely the sort of sensitive subject that few people feel comfortable getting near – particularly white people, as they have generally been judged rather harshly for any race-related commentary.

The important part of staying with facts and statistics, and not tripping over the line into bigotry (racism specifically in this example case) is to know what the graph does say, what the graph implies by clear logical deduction, and what could be incorrectly deduced from the graph if one uses unsupportable assumptions. Where assumptions are involved – which is almost all the time with statistics – it is important to make that clear, whether by explicitly explaining it or by using specific words and phrasing.

First, what exactly does the graph show? This graph shows the formally recorded number of South African residents over a period of time. Thus, it is 100% correct and factual to say that the official population of Asian South Africans changed from 148 000 to 1 386 002 between 1910 and 2016. That is fact, as provided by the graph, which is based on data from a reputable source (the reputable source being important – sketching a quick arbitrary graph does not count).

Any conclusions drawn without making any assumptions are also fact: dividing the listed 2016 black population by the listed 2016 white population tells us that in 2016 there were 9.9894 times as many black South Africans as there were white South Africans. That is fact, since there were no additional assumptions needed; only the two data values provided by the statistics and a basic mathematical operation.

Basic, sensible assumptions are the next step. If one were to grab a random person walking down the street, how much more likely is that person to be black compared to the chance of selecting a white person? The previous paragraph suggests 9.989 times, but that is NOT a fact in this case. Answering this question with 9.989 requires one to ASSUME that people walking down the street are perfectly proportionally distributed according to the population of South Africa. The assumption, in short, is that the specific pool of people in the question – people walking down the street – is statistically the same as the broad case of the entire population, which is very unlikely.

Thus, it would only be appropriate to answer with “approximately 9.989 times as likely”: that word (approximately) makes it clear that an assumption has been made, which means that the answer is only as reliable as the assumption. Giving that answer also implies that the person answering considers the probable error in that assumption to be small enough to be unimportant, meaning that there are two assumptions made even in such a simple case.

Another step away from the factual nature of the statistics is when assumptions are made that attempt to simplify a complex property or value to a more usable value for estimation and analysis. As an example, how might one estimate the average number of children that black families in South Africa have had over the past century or so? Several assumptions are necessary for this calculation, and those assumptions in and of themselves either have to be supported by suitable statistics, or backed up by sufficient logic to make it clear that the selection of the estimated value is done in a way that is not prejudiced or stereotyping.

The major assumption is generation length. The estimated length of a generation provides the exponent for the calculation, and thus has a very important effect on the result. Additionally, the generation length represents an estimate of the age at which a person in the relevant group tends to have children, which carries many connotations and implications. Some “rules of thumb” for generation length are 30 years for the long end and 20 for the short end – but 30 is often associated with developed nations and 20 with poorly developed nations; as such, the selection of this value could cause offense if not suitably supported by facts, or done in a way that clearly does not display prejudice towards the relevant group. Either way, this sort of assumption carries a lot of risk and should preferably be avoided, even if it is still arguably logically sound.
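To make the effect of that exponent concrete, here is a minimal Python sketch. The growth ratio of 10 is a made-up placeholder and is not taken from the Stats SA data; the function name is also purely illustrative. Unlike a hand calculation, the sketch does not round the number of generations.

```python
# Span covered by the graph discussed in the article: 1910 to 2016.
YEARS = 2016 - 1910  # 106 years

# Placeholder growth ratio (end population / start population).
# ASSUMPTION: illustrative only, NOT a value read from the Stats SA data.
HYPOTHETICAL_GROWTH = 10.0


def ratio_for_generation_length(growth_ratio, generation_length, years=YEARS):
    """Per-generation growth ratio implied by a total growth ratio.

    The number of generations (years / generation_length) is the exponent
    whose root is taken, so the assumed generation length directly drives
    the result.
    """
    generations = years / generation_length
    return growth_ratio ** (1 / generations)


# Compare the short, middle and long "rules of thumb".
estimates = {
    length: ratio_for_generation_length(HYPOTHETICAL_GROWTH, length)
    for length in (20, 25, 30)
}

for length, ratio in estimates.items():
    print(f"{length}-year generations: x{ratio:.2f} per generation")
```

The same total growth gives a noticeably larger per-generation figure when a longer generation is assumed, which is why the choice of this one value needs to be justified so carefully.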

For purposes of the example, I’ll use 25 as a suitable middle ground estimate for EVERY population group (uniformly applied to avoid the possibility of prejudice in its selection). Using 25 years, an estimate of 4.24 generations is obtained; partial generations are of dubious value, so I will use the rounded estimate of 4 generations. Taking the 4th root of the ratio between current population and past population gives an estimate of the ratio of children to parents at each generation: 1.84.

What does that 1.84 mean? It means that a rough estimate based on the given data and the generation length estimation suggests that black couples had an average of 3.68 children each between 1910 and 2016 (1.84 children per parent, and two parents per couple). Very importantly, that is a very rough estimate relying on many assumptions:

The generation length being 25
Every single person in that group being part of a couple in their own generation
Every child born surviving to adulthood
Perfectly even male-to-female ratio
No immigration or emigration
Life expectancy changes ignored
Etc.
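The chain of steps behind that estimate can be sketched in a few lines of Python. This is only an illustration under the assumptions listed above, with function names of my own choosing. The 1910 and 2016 population values come from the graph and are not reproduced in the text, so the sketch works backwards from the 1.84 ratio instead of inventing figures.

```python
def generations_elapsed(start_year, end_year, generation_length):
    """Number of generations in the period, for an ASSUMED generation length."""
    return (end_year - start_year) / generation_length


def per_generation_ratio(start_population, end_population, whole_generations):
    """Root of the total growth ratio: children per parent at each generation.

    Assumes no migration, no child mortality and unchanged life expectancy.
    """
    return (end_population / start_population) ** (1 / whole_generations)


def children_per_couple(ratio):
    """Two parents per couple, assuming everyone pairs up and an even sex ratio."""
    return 2 * ratio


gens = generations_elapsed(1910, 2016, 25)  # 106 / 25, about 4.24
whole = round(gens)                         # partial generations dropped: 4

# Total growth implied by the article's 1.84 over 4 generations.
# This is a back-calculation, not a figure read from the graph.
implied_growth = 1.84 ** whole

ratio = per_generation_ratio(1.0, implied_growth, whole)
print(f"generations: {gens:.2f}, rounded to {whole}")
print(f"per-generation ratio: {ratio:.2f}")
print(f"children per couple: {children_per_couple(ratio):.2f}")
```

Every function here hides at least one of the listed assumptions in its arguments, which is a useful reminder of how much the final number depends on them.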

Some of these assumptions are obviously inaccurate and unreliable; others are likely not perfect, but at least believable. Is the result obtained useful? The usefulness of any statistical value depends on how reliable it is. The population data should be 100% reliable, since each person was checked and counted as a single data point in the statistic, so there is no doubt. A perfectly even male-to-female ratio is clearly wrong, so the question there is how wrong it is – since the error applies similarly at both the starting and ending points of the relevant data, it can reasonably be expected that the error is 20% or less, but that is still a massive error. The result obtained is still useful, but only in the broadest sense: the accuracy is too poor to give a useful numeric value, but by comparing it to the result of the same calculation using the white population data, it can be deduced with moderate confidence that black couples tended to have more children than white couples over that period of time.

Looking back, it took almost a page to motivate why that calculation resulted in something one could say with moderate certainty. It may be somewhat useful, but generally such an assumption-reliant deduction is not very useful. So, what could one say about that question and be technically correct, and what would be over the line?

“Black families tended to have more children on average than white families in the 1910-2016 period” – moderate certainty, based on the data. Not really ideal, but justifiable.

“Black families in the 1910-2016 period tended to have 3 or more children on average” – rather low certainty, but still somewhat supported by the data. Not advised, but not a racist thing to say.

“Black families have 3 or more children on average” – not suitable. Not racist as such, but misleading, as it does not mention the limits of the data, instead implying that the number obtained is innately true for black families, rather than true for that specific group over a specific time and thus in specific circumstances. This statement generalises too much.

“Black families have many children” – not suitable, and arguably racist. The choice of words makes an implication; whether that implication is seen as positive or not by a given person is not really relevant.

“Black families have too many children” – not suitable, and clearly racist in a way that should definitely be taken to the SAHRC and other relevant authorities, as the speaker has taken it upon themselves to make a sweeping (pre-)judgement on the basis of race, which is the definition of racism.

“Black families tended to have more children on average than white families in the 1910-2016 period, due to a culture where children take care of elderly parents” – not suitable, arguably racist. The problem with this version of the statement is that it attaches an unsupported assumption about the reason for the calculated result, and presents that assumption as though it were part of the statistical finding.

“Black families have 3 or more children on average, which is why they tend to be poorer than white families.” – not suitable, definitely racist. Again, an unsupported assumption is passed off as part of the statistically acquired result, but with what amounts to a moral judgment attached. Clearly qualifies as racist since it contains implied pre-judgement of black people as poor, and moreover as being poor through their own actions alone.

As a last note, some deductions and conclusions require too many assumptions to ever be sensible – for example, looking at the given graph provides no support for any conclusions made about the relative wealth or poverty of a racial group.

It is important to extend all practical courtesies in how we speak, since tolerance and tact are what lubricate the machinery of civilisation. That said, sometimes we need to tackle the difficult questions, and that is when it is absolutely vital to understand, remember and apply the difference between statistical reasoning and unsupported assumption.

South Africa Today – South Africa News