Thursday, 16 May 2013

Face It – Collective Punishment Is Popular

…at least, with a small, statistically-irrelevant number of people:
The majority of people agree with introducing more dog exclusion areas across North Lincolnshire, according to a poll conducted this week.
So...this poll surveyed ALL the people in North Lincolnshire?

No. Of course not. It didn't even survey all the dog-owners!
This website asked people whether they agreed that the number of dog exclusion areas should be extended by North Lincolnshire Council. 61 per cent voted Yes, with 39 per cent saying No - with 325 people taking part in our poll.
Completely statistically insignificant, then? But it'll be used to guide further restrictions, I bet:
There are 23 additional exclusion areas proposed for around the region, adding to 84 which are already in place.
Ouch!
Council officials say the areas will keep irresponsible dog owners in line with the majority of responsible dog owners.
Ummmm..... what?!
Nigel Sherwood, cabinet member for highways and neighbourhoods at the council, said the exclusion areas should stop offending.
He said: "We recognise the majority of dog owners in North Lincolnshire are responsible and take care of their animals.
"However, for the few irresponsible dog owners, the new exclusion areas should deter them from offending."
No they won't. No you don't. No it won't.

10 comments:

drsolly said...

So many people don't actually understand the term "statistically significant", and yet they still use it.

It is *extremely* rare to survey 100% of the population - the only case I can think of is political elections (and even then you don't get 100% turnout). Usually, you survey a sample that's large enough so that you can get an answer (yes or no) with a high degree of confidence (95% confidence is the usual level that statisticians use for general purposes).

To say that the survey was "statistically insignificant" you would have to do a proper statistical test to see the likelihood that the population was actually 50/50 and yet the survey came up 61/39.

https://en.wikipedia.org/wiki/Statistical_significance

In this particular case, we can actually say, with better than 95% confidence, that more people agreed with the proposition than disagreed.

There's a handy calculator for this here:

http://www.surveysystem.com/sscalc.htm
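The test described above can be sketched in a few lines of Python. This is only a rough illustration, not the calculator's method: it uses a normal approximation, the function name `one_sided_p_value` is made up for the example, and it assumes the 325 respondents were a simple random sample (which a self-selecting web poll is not).

```python
import math

def one_sided_p_value(yes_share, n, null_share=0.5):
    """Normal-approximation z-test for a single proportion.

    Returns (z, p), where p is the chance of seeing a yes share at least
    this large if the true share in the population were null_share.
    """
    # Standard error of the sample share under the null hypothesis
    se = math.sqrt(null_share * (1 - null_share) / n)
    z = (yes_share - null_share) / se
    # Upper-tail probability of the standard normal distribution
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Figures from the poll: 61% Yes out of 325 votes
z, p = one_sided_p_value(0.61, 325)
print(f"z = {z:.2f}, one-sided p = {p:.6f}")
```

With these figures z comes out at roughly 4, so a 50/50 population would very rarely produce a 61/39 split in a random sample of this size.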

Fidel Cuntstruck said...

@drsolly In this particular case, we can actually say, with better than 95% confidence, that more people agreed with the proposition than disagreed.

Surely, all you can say with confidence is "that more of those people asked agreed with the proposition than disagreed"

At what percentage of the whole does the number of people asked become a "representative sample"? Taking North Lincs as an example - there must be a couple of million people there, right? So even if you interview 1000 that's 0.05% of the population - hardly representative, especially if you don't sample from as many different areas as possible.

It's the old "weighting" trick isn't it?

drsolly said...

"if you interview 1000 that's 0.05% of the population - hardly representative"

This is a misunderstanding of the expression "representative sample".

Let's consider a less controversial situation, so as to focus on the statistics. Suppose you have a billion one inch nails, and a one inch nail is supposed to be one inch, plus or minus a twentieth of an inch. And you want to know what percentage of these nails meet that criterion.

You don't measure them all - that would be very expensive, and unlikely to be worth the effort. What you do is you take a sample of them - maybe a few hundred, and you measure those, and from the size distribution of the nails you measured, you can make a statement about how many of the nails in the full population are likely to be oversized or undersized. The calculation of the necessary sample size is fairly simple, and depends on how accurate you want to be about the statement about the nails in the full population.

But probably, less than a thousand nails would give enough accuracy for most people. The fact that 1000 nails is only 0.0001% of the population doesn't stop it from being a representative sample.
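To see why the size of the population hardly matters, here is a minimal sketch of the usual margin-of-error formula for a proportion, with the finite population correction applied. The function name `margin_of_error` and the three population sizes are assumptions chosen for illustration (the 2 million figure is just the guess made earlier in the thread), and the formula only holds for a simple random sample.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    If population is given, the finite population correction is applied.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Shrinks the margin slightly when the sample is a big slice of the whole
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

# Same sample of 1000, very different population sizes
for population in (10_000, 2_000_000, 1_000_000_000):
    moe = margin_of_error(1000, population)
    print(f"population {population:>13,}: +/- {100 * moe:.2f} percentage points")
```

The margin stays at about 3 percentage points whether the 1000 are drawn from two million or a billion. It is the sample size, not the fraction sampled, that drives the accuracy.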

What makes a sample representative isn't the sample size. It's the method used for sampling. Ideally, you want to stir the nails in a big barrel, and then draw out your sample blindfold. In practice, this isn't practicable, so you use a sampling method that you hope doesn't introduce any bias. So, for example, you might just scoop out a bowlful of nails to use as your sample - then the hope is that the larger nails don't settle to the bottom or rise to the top, or migrate over to the left hand side of the nail-barrel. And we make this assumption, because our knowledge about how nails behave in a barrel-ful doesn't lead us to think that this happens.

But what I'm really complaining about here, is the way that the media (including some blogs) use the word "statistically" without having done any research into the theory and practice of statistics.

In the case of the dog exclusion survey, the problem with the sample isn't that it's too small; it seems plenty big enough for this purpose. And it isn't that they've not tried to determine that their sample includes a fair percentage of dog owners and non-owners. The problem is that the sample was self-selecting.

And the original blog article didn't point that out; she seems only concerned that the sample size was too small, and should have included the entire population.

The whole point of this branch of statistics, is that you don't need to.
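The self-selection problem can be illustrated with a small simulation. Everything in it is hypothetical: the function `poll`, the 50/50 population, and the response rates (supporters answering 30% of the time, opponents 20%) are invented purely to show the effect, not taken from the actual poll.

```python
import random

def poll(population_yes_share, n, respond_prob_yes=1.0, respond_prob_no=1.0, seed=0):
    """Simulate a poll that stops once n people have responded.

    Each person approached is a Yes with probability population_yes_share,
    and chooses to respond with a probability that depends on their view.
    Returns the Yes share among the n respondents.
    """
    rng = random.Random(seed)
    yes = total = 0
    while total < n:
        is_yes = rng.random() < population_yes_share
        respond_prob = respond_prob_yes if is_yes else respond_prob_no
        if rng.random() < respond_prob:
            total += 1
            yes += is_yes
    return yes / n

# Population split exactly 50/50 in both cases (an assumption for the demo)
random_sample = poll(0.5, 325)
self_selected = poll(0.5, 325, respond_prob_yes=0.3, respond_prob_no=0.2)
print(f"random sample:        {random_sample:.0%} Yes")
print(f"self-selected sample: {self_selected:.0%} Yes")
```

Under these made-up response rates the self-selected poll drifts towards 60% Yes even though the population is evenly split, and taking a bigger sample does nothing to fix it.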

Fidel Cuntstruck said...

@drsolly

Very enlightening, thank you.

So, if I understand what you are saying and using the barrel full of nails example, a mixed sample (debatable in the North Lincs case as you pointed out) allows us to use statistical calculations to "assume" that we have a representative picture, yes?

JuliaM said...

"It is *extremely* rare to survey 100% of the population.."

Agreed. But a tiny %tage of the people who read online newspapers can't possibly be considered 'representative' of anything, except the people who read online newspapers!

"The problem is that the sample was self-selecting."

Yup!

Plus the nature of these types of surveys is that it's easy for the 'for' AND the 'against' to rally their friends with a click of a button.

Far better, surely, to stop a random sample in the street?

drsolly said...

"allows us to use statistical calculations to "assume" that we have a representative picture, yes?"

No.

You make the assumption that you have a representative sample when you scoop up a handful of nails from the top of the barrel. Your assumption is that there's no systematic difference between the nails at the top, and the rest of the population.

What the statistics lets you do, is calculate the likelihood that 99% of the nails in the barrel are within the length limits.

drsolly said...

"Far better, surely, to stop a random sample in the street?"

Better, yes, but there's a problem with that.

1) People who care about the issue are more likely to answer than people who don't. That could bias your sample.

2) Good luck with stopping people driving along the street to ask them to take part in a survey.

3) I might have strong views on the control of dogs, but that might be because I'm very often out walking along footpaths and encountering dogs out of control. Stopping people in the street would probably miss me (and others like me) because I'm mostly not in the street, I'm out in the countryside on footpaths.

Getting a representative sample is actually quite difficult. But the sample size is rarely the issue, and your comment that 325 wasn't enough, was completely wrong.

One way often used to make an unbiassed sample, is to use the telephone; since close to 100% of the population has a phone. That's why you occasionally get annoying cold calls from genuine market research organisations (along with the ones who pretend to be doing market research but are actually selling something).

If you wanted to make an adverse comment about the survey, you should have looked at how the sample was chosen, not the size.

I'd recommend a book called "Facts from Figures" by Moroney as a good introduction to statistics. Also, the Radio 4 programme "More or Less" is a good, light-hearted view of some of the various ways that numbers are abused.

Fidel Cuntstruck said...

@drsolly You make the assumption that you have a representative sample when you scoop up a handful of nails from the top of the barrel.

Ahaaaah! Now we're getting to the bones of it.


Your assumption is that there's no systematic difference between the nails at the top, and the rest of the population.

Which completely disregards the possibility that there may be significant differences between the nails elsewhere in the barrel and the limited sample you just took.


What the statistics lets you do, is calculate the likelihood that 99% of the nails in the barrel are within the length limits.

Nope, I still don't buy it

drsolly said...

Yes, that's correct. The question is whether that is a reasonable assumption or not. In the situation of a billion nails being delivered in a barrel (or possibly in a bunch of barrels), it would seem to me to be a reasonable assumption to make.

If you don't think it's a reasonable assumption, and if you do want to know whether the nails that just arrived should be accepted as a delivery, then you have the following options:

1) Just hope that they are, and accept them.
2) Guess that they are not, and send them back to the manufacturer without explanation (they might refuse to supply you in future; I probably would refuse).
3) Find another way of taking a sample, which will also require assumptions to be made that might not be true - I'd be interested to hear how you'll draw your sample, and how big your sample will be.
4) Examine every single nail of the billion that were just delivered, and accept that this has driven up the cost of your nails by a huge amount, possibly putting your entire project way over budget. Because if it takes five seconds to pick up and measure each nail, you're looking at some 1.4 million hours' work, and at £6 per hour, that's some £8 million.

And that's why most people would take a sample.
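The arithmetic behind option 4 is easy to check. This short sketch uses only the figures quoted above (a billion nails, five seconds each, £6 per hour); the variable names are just for illustration.

```python
NAILS = 1_000_000_000      # nails delivered
SECONDS_PER_NAIL = 5       # time to pick up and measure one nail
WAGE_PER_HOUR = 6          # pounds per hour

# Total inspection time in hours, then the labour cost in pounds
hours = NAILS * SECONDS_PER_NAIL / 3600
cost = hours * WAGE_PER_HOUR

print(f"{hours:,.0f} hours of work, costing about £{cost:,.0f}")
```

That works out at a little under 1.4 million hours and a little over £8.3 million, in line with the rounded figures in the list.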

Fidel Cuntstruck said...

Whoa, whoa .... just a minute there Doc.

You're getting way ahead of yourself here. You now appear to be suggesting that statistics should be used after the event to determine whether the outcome fits the model rather than the cause fitting the model, and having just had a peek at your blog, I begin to understand where you are coming from.