
Posted
6 minutes ago, MLXXX said:

Wasn't it a bit of both?

yes but IMO not exactly as you have outlined

6 minutes ago, MLXXX said:

That is, people ordinarily sceptical of blind tests being happy to pounce on and accept a blind test result at a time when it appeared that measurements hadn't established there to be a difference.

"pounce" is a mischaracterization. The test constituted something of a challenge in which it was inherently understood that in order to pass or fail, the value of the test must be accepted by all sides to be able to determine a pass or a fail. This was undertaken in good faith by Mani who agreed to be subjected to such scrutiny and he passed.

 

 

 

Posted

I haven’t been following everything in this thread so my apologies if I misunderstand. 

In summary, did someone pass an ABX test in which there was a measurable difference?

 

 

Posted (edited)
49 minutes ago, Audiophile Neuroscience said:

the value of the test must be accepted by all sides to be able to determine a pass or a fail.

That strikes me as more like a rule for a board game, or for receiving a salary bonus, rather than a way of determining whether sounds are actually different.

 

Or you could agree to such a rule for betting purposes. The two "sides" could be those willing to bet on a result of p<0.05 and those willing to bet on a result of p≥0.05 in a DBT of 10 trials with one participant, if that's what both sides agreed to in advance. Ordinarily you'd want the conditions set out in a detailed way, and checked, before proceeding with the "betting event".
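(For concreteness: in 10 yes/no trials the chance of guessing at least 9 correct is 11/1024 ≈ 0.011, while the chance of guessing at least 8 correct is 56/1024 ≈ 0.055. So in this format a bet on p<0.05 is effectively a bet on a score of 9/10 or better.)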

 

Anyway, to make a broader comment: I do find it a bit mystifying why people would try to use a single-session DBT outcome to throw light on whether a difference exists in analogue signals, in circumstances where test equipment was available to determine that question. Of course, sometimes people wouldn't have ready access to such equipment.

 

Edited by MLXXX
Posted
12 minutes ago, Sir Sanders Zingmore said:

I haven’t been following everything in this thread so my apologies if I misunderstand. 

In summary, did someone pass an ABX test in which there was a measurable difference?

 

 

Errrm, wellllll, ummmm yes.

But it doesn't quite capture the passion, the back-and-forth thrust and parry, the pithy comments and the in-the-trenches battle that's been going on here!

Posted
17 minutes ago, Sir Sanders Zingmore said:

I haven’t been following everything in this thread so my apologies if I misunderstand. 

In summary, did someone pass an ABX test in which there was a measurable difference?

 

 

Yes. Eventually the recordings made of the relevant DBT session were analysed and it was found that the DAC had in fact been delivering different analogue output (particularly above 14kHz).

Posted
10 minutes ago, MLXXX said:

That strikes me as more like a rule for a board game, or for receiving a salary bonus, rather than a way of determining whether sounds are actually different.

Either way both parties must be willing to accept the outcome irrespective of previous beliefs or biases.

 

10 minutes ago, MLXXX said:

I do find it a bit mystifying why people would try to use a single-session DBT outcome to throw light on whether a difference exists in analogue signals, in circumstances where test equipment was available to determine that question. Of course, sometimes people wouldn't have ready access to such equipment.

 

If you are talking about some form of signal inversion/cancellation then, IIRC, that was done along with many other techniques, none of which initially showed a measurable difference. It spawned a great many pages on better ways to measure, including whole new threads devoted to devising better measurements.

 

I am told something was eventuuuuuaaaly (that's a new word) uncovered. Prior to the test there were bold claims that IF an audible difference were found in bit-identical files then measuring it would be straightforward. It wasn't, but as you say maybe they lacked the right tools; not for me to say.

 

Whether there is or is not a measurable difference is one thing. The assertions were (to my recollection) that bit-identical files could not sound different, that no audible difference was therefore possible or measurable, and that certain changes in a software setting could not effect changes downstream that would cause an audible difference. This is just from my memory of being in the thread; I haven't re-read the whole 118 pages.

 

 

Posted
6 minutes ago, Audiophile Neuroscience said:

Whether there is or is not a measurable difference is one thing. The assertions were (to my recollection) that bit-identical files could not sound different, that no audible difference was therefore possible or measurable, and that certain changes in a software setting could not effect changes downstream that would cause an audible difference. This is just from my memory of being in the thread; I haven't re-read the whole 118 pages.

Yes, I gather that was a traditional view expressed by some; and there was a rather extreme non-traditional view expressed, that just copying a file from one hard disc to another could for evermore compromise the sound of the file (despite it having the same bits/bytes and checksum).

 

Where the traditionalists [apparently] came unstuck is that they did not anticipate that a change in a player setting that didn't alter the bits could somehow alter the sound of a DAC further downstream receiving those bits. Indeed that is a very curious and unusual outcome.  I have simply accepted at face value the explanation given in the edited opening  post that the s/pdif stream had significant jitter when a low file size was selected in the player. 

 

*   *   *

 

A logical further step in the investigation (if not already taken) would be to use a very fast oscilloscope to inspect the s/pdif stream for timing irregularities and, if that showed up nothing, for noise.

Posted
3 hours ago, Grant Slack said:

 

Hello acg,

 

I was hoping not to contribute to the discussion of the flaws, and the untested-but-straightforward explanations, of the 'split file size' test result, and perhaps I still won't. When a purportedly-similar-to-valid test turns up, with a result that would be surprising if the test were indeed valid, people tend to latch onto the result and want to defend it to the hilt. It becomes "undiscussable". So, I don't.

 

I certainly won't be reading 118 pages of forum talk just to make a comment here about it, nor the latter 64 pages after the test was outlined. The link to the test description comment on p54 was much appreciated; my thanks to whoever posted it here.

 

If someone can post a link to the statistical analysis of that single-blind test, that would be interesting to see too: how many hundreds (or even dozens would still be interesting) of subjects took the identically-presented test, and the relevant distribution of scores. Thanks.

 

But, let's shift the discussion back to your answer to my question, and whether it is a convincing answer. I am not sure that I correctly understand your logic. Are you saying that one appropriate way to prove the reliability of a consensus opinion (based on sighted listening tests) is to refer to a blind listening test -- but only if it agrees? IMHO, if blind listening is the way to confirm a general opinion from sighted listening, then we have to also accept the blind test when it contradicts sighted listening. And that would mean we have created a hierarchy, where blind listening is the higher-order test and sighted listening is lower-order. I believe that hierarchy has been made, tested and affirmed with such high confidence that it should be accepted. In which case, we are right back where we started: blind testing is the way to go, and sighted listening tests are unreliable.

 

Regards,

Grant

 Hi Grant,

 

Why does one have to be better than the other when they are merely tools?  Tools used under different circumstances,  when different levels of proof are required,   but ultimately neither foolproof nor infallible. 

 

Results from both are subject to the biases of the subjects, whether they admit it or not. Neither can be proven to be reliable for audio use. There is no hierarchy between the two, other than in the minds of those that have found "one thing" to put their faith in, instead of regarding and using them as incomplete tools to be applied as appropriate.

 

Anthony

Posted
1 hour ago, frednork said:

Errrm, wellllll, ummmm yes.

But it doesn't quite capture the passion, the back-and-forth thrust and parry, the pithy comments and the in-the-trenches battle that's been going on here!

They go without saying :)

 

Posted
12 hours ago, Audiophile Neuroscience said:

How did I do?

Pretty good.

 

12 hours ago, Audiophile Neuroscience said:

There are only two scientifically correct interpretations and assuming the null hypothesis is "there is no audibility difference between A and B"

1. The null hypothesis is rejected on the basis of only requiring one counter example and to a statistically determined significance level

That is correct, but where is the fun in that?

 

12 hours ago, Audiophile Neuroscience said:

2. Choice "7" is contingent on one of the components being vinyl AND if anger management sessions are working out well

Yeah, you won't want to anger him 

 

12 hours ago, Audiophile Neuroscience said:

3. Choice "9" is obviously irrelevant as white wine is deemed superior in 9 out of 10 blind tests; the one red wine guy was an outlier... and an alcoholic!

As a non-drinker it is all white noise to me. Obviously my palate is not resolved enough to tell the difference.


Posted (edited)
On 31/07/2019 at 9:42 PM, LHC said:

Let's say from the ABX we produced the following set of results: 9 subjects could not hear any difference between the two components (score statistically equal to random guessing); 1 subject was able to hear a difference (score significantly different from random guessing). What is the correct interpretation of this set of outcomes?

I hate to be a party pooper, but the above statistical data for the study is incomplete if a person answering wishes to determine whether the DBT outcome of the study was statistically significant.

 

*   *   *

 

 

As the number of people included in a study increases, it becomes more and more likely that at least one of them will do well through random guessing. Let us assume just 10 trials in the DBT.  If there is only one participant, the chance of that participant getting 10 out of 10, a perfect score, through guessing would be (0.5)^10 = 0.0009765625, or roughly 1 in 1,000, a very remote chance.  

 

How many participants would be needed for a 50:50 chance of at least one of them getting a perfect score? My first thought, 0.5/(0.5^10) = 512, was a mistaken calculation (now struck through): the chance that none of N guessing participants scores 10/10 is (1023/1024)^N, so we need 1 − (1023/1024)^N ≥ 0.5, i.e. N ≥ log 2 / log(1024/1023) ≈ 709.4. So around 710 participants would be needed.

 

LHC's hypothetical study involves only 10 participants, but that may be a large enough group to render the single score that was -- in isolation -- "significantly different from random guessing" not significant within the context of the study as a whole. We'd need some hard figures to make a determination.

 

A similar issue can arise with DBT tests done at home where the one person does the same test on different days and eventually gets a "significant" result one day. That result would need to be amalgamated with the earlier results to calculate whether statistical significance had been attained.
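For anyone wanting to check these figures, here is a minimal Python sketch; the 10-trial, 50% guess-rate numbers are simply the assumptions of the example above:

from math import log

TRIALS = 10
P_PERFECT = 0.5 ** TRIALS   # chance a lone guesser scores 10/10 = 1/1024

def p_any_perfect(n):
    """Chance that at least one of n guessing participants aces all trials."""
    return 1 - (1 - P_PERFECT) ** n

print(p_any_perfect(1))     # ~0.00098, roughly 1 in 1,000
print(p_any_perfect(512))   # ~0.39, still short of a 50:50 chance
print(p_any_perfect(710))   # ~0.50, the approximate break-even group size

# Solving directly for the 50:50 group size:
print(log(0.5) / log(1 - P_PERFECT))   # ~709.4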

Edited by MLXXX
Strike-through effect added to words relating to a mistaken calculation.
Posted (edited)

Hello David,

 

You are asserting that the packet size blind audio test is scientifically valid, correct? Does it meet the requirements that you have been outlining for a valid scientific test?

 

Regards,

Grant

 

Edited by Grant Slack
Posted
MLXXX said:

Yes. Eventually the recordings made of the relevant DBT session were analysed and it was found that the DAC had in fact been delivering different analogue output (particularly above 14kHz).
Can you point me to that post? I have been reading up to the 18th of April.

Are these the analogue graphs posted by Mansr?
Posted
It was in fact the reverse, the rejection of blind testing by those previously advocating  blind testing as a valid test. Mani subjected himself to the test and passed @ p=.01
P=.01 meaning 99% certainty?

Am I missing something when 9/10 is a 90% score and not 99%?
Posted
10 hours ago, LHC said:
22 hours ago, Audiophile Neuroscience said:

The null hypothesis is rejected on the basis of only requiring one counter example and to a statistically determined significance level

That is correct, but where is the fun in that?

LHC you are right of course. My only rule about funny is that it must be about somebody else!

https://www.stereo.net.au/forums/topic/288830-pure-class-a-monsters/?do=findComment&comment=4352799

 


Posted
18 minutes ago, Primare Knob said:

P=.01 meaning 99% certainty?

Am I missing something when 9/10 is a 90% score and not 99%?

The p value refers to the likelihood that the apparently significant result could have arisen merely by chance, sometimes called "guessing".

 

Normally it's expressed with the less than sign rather than the equals sign.

 

In this case there were 9 correct answers and 1 incorrect. The likelihood of that arising by chance can be calculated as follows:

 

In 10 trials involving a yes or no answer, if exactly one answer is incorrect it must be the 1st, 2nd, 3rd, ..., 9th or 10th answer, a total of 10 possibilities. The total number of different answer combinations possible is 2^10 = 1024. Hence the probability of exactly one answer being wrong by guessing is 10/1024 = 0.9765625% ≈ 1%. (Strictly, a p value counts results at least as good as the one observed, i.e. 9 or 10 correct: (10 + 1)/1024 = 11/1024 ≈ 1.1%, which still rounds to the quoted p = .01.)
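A quick Python sketch to verify, using the same 10-trial numbers:

from math import comb

TRIALS = 10
total = 2 ** TRIALS   # 1024 equally likely guess patterns

# Probability of exactly 9 correct answers by guessing
print(comb(TRIALS, 9) / total)                        # 10/1024 ~ 0.0098

# One-sided p value: probability of 9 *or more* correct by guessing
print((comb(TRIALS, 9) + comb(TRIALS, 10)) / total)   # 11/1024 ~ 0.0107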

Posted
27 minutes ago, Primare Knob said:

Can you point me to that post? I have been reading up to the 18th of April.

Are these the analogue graphs posted by Mansr?

They were posted by "manisandher" in the edited opening post to the Blue or red pill thread.

 

That post includes the graph for the 10kHz test tone. It also includes the following:-

 

"Edit 2 April 16th 2018

 

@testikoff has just undertaken the painstaking task of comparing the two analogue captures:

 

[attached image: AC12_d_log.jpg]

 

It seems that they are virtually identical below 14kHz. My ears are only good to 12kHz or so nowadays... and yet I heard clear differences in the A/B/X.

 

Mani."

Posted (edited)

Can anyone point me to somewhere on the Internet where a DBT was done with people like us that had a significant result with components with smaller differences? e.g. DACs, amps, preamps.

Ones where the test was done well, or at least has passed the scrutiny of all others on the net.

As I cannot find any. They all seem to conclude there is no difference, or scores as good as a random guess. It appears to me that they are not good for this?

Edited by rocky500
Posted (edited)
12 hours ago, Primare Knob said:

P=.01 meaning 99% certainty?

Am I missing something when 9/10 is a 90% score and not 99%?

I offered the p=.01 in the blue pill/red pill thread and Mani, who has a physics background, had calculated the same. Stupidly, I tried initially to work it out mathematically but then remembered there are tables that have p values listed!

 

Basically the percentage of right answers (9/10 = 90%) is not the same as the probability of getting that many right answers by guessing. It has to do with binomial distributions and their cumulative probabilities. If you start with one trial with a 50% chance of success, the probability of, say, heads is 50%. Two heads in a row has a probability of 25%, i.e. 1 in 4, and so on.

http://onlinestatbook.com/2/probability/binomial.html

Edited by Audiophile Neuroscience
spelling
Posted (edited)
2 hours ago, rocky500 said:

Ones where the test is done well or at least past the scrutiny of all others on the net.

Hi Rocky

IMO the former is very difficult and the latter next to impossible. Putting on my best Italian mafia voice, fargett aboudit!

Cheers

David

Edited by Audiophile Neuroscience

Posted (edited)
9 hours ago, rocky500 said:

Can anyone point me to somewhere on the Internet where a DBT was done with people like us that had a significant result with components with smaller differences? e.g. DACs, amps, preamps.
...

It appears to me that they are not good for this?

Preamps

 

Preamps are designed to be transparent. After you've level-matched the output of two preamps, including precisely adjusting the left-right balance if necessary, the likelihood of being able to hear a difference when using the line-level inputs is slim.  If a listener can't hear a difference sighted, there's obviously no point in their going to the next step and attempting a DBT.
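As a side note on what "level-matched" means in practice, here is a minimal Python sketch of the arithmetic, assuming you've measured RMS output voltages with a steady test tone; the voltages and the 0.1dB tolerance are hypothetical (a common rule of thumb, not a figure from this thread):

from math import log10

def level_difference_db(v_a, v_b):
    """Level difference in dB between two measured RMS voltages."""
    return 20 * log10(v_a / v_b)

# Hypothetical example: two preamps measured at 2.000 V and 1.977 V RMS
diff = level_difference_db(2.000, 1.977)
print(round(diff, 3))      # ~0.100 dB
print(abs(diff) <= 0.1)    # just outside a 0.1 dB match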

 

It might be different with a magnetic cartridge preamp input for two reasons:

1. The RIAA curve implementation might not be exactly the same as between the two preamps.

2. The input impedance of the two preamps could be different and could load the magnetic cartridge differently, slightly altering its performance.

 

 

Main amps

 

Main amps could possibly differ audibly, but a practical problem is how to alternate between them for feeding the same set of speakers. To do a full-blown ABX you don't present the X in isolation, or just once. You allow the listener plenty of opportunity to hear the A version, the B version and the X version; only when they're satisfied they've had a good opportunity to listen do they need to actually lock in their answer.

 

Most people in their homes don't have an electronic device that would allow them to rapidly change which power amp feeds the one set of main speakers.
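To make the ABX procedure concrete, here is a minimal Python sketch of the session bookkeeping; the free re-listening before locking in an answer is the essential point. The "playing" line is a placeholder, since the switching hardware is exactly the hard part noted above:

import random

def run_abx(n_trials=10):
    """Minimal ABX protocol: X is secretly A or B each trial; the listener
    may re-audition A, B and X as often as desired before answering."""
    correct = 0
    for trial in range(1, n_trials + 1):
        x = random.choice("AB")   # hidden assignment for this trial
        while True:
            cmd = input(f"Trial {trial}: play a/b/x, or answer A/B: ")
            if cmd in ("a", "b", "x"):
                print(f"(playing {cmd.upper()})")  # placeholder for real switching
            elif cmd in ("A", "B"):
                correct += (cmd == x)
                break
    return correct

# score = run_abx(10)   # compare against the ~1% chance of guessing 9+/10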

 

DACs

 

If receiving a digital signal at the CD sample rate of 44.1kHz, a DAC's different filter-slope settings (if it has adjustable settings) may be audible in its analogue output. You could record the DAC's analogue output with an ADC operating at, say, 96/24 and quantify the differences in the DAC's high-frequency performance using spectrum-analysis tools.

 

Once you get to music with a sample rate of 48kHz or beyond it can be extremely difficult to hear differences between DACs if they have been precisely level matched, including adjusting the channel balance. 
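A sketch of the sort of spectrum comparison meant here, in Python, assuming two level-matched 96kHz mono captures saved as WAV files; the file names and the 14kHz split are placeholders, not artefacts from the actual test:

import numpy as np
from scipy.io import wavfile

# Two captures of the DAC's analogue output, assumed 96 kHz mono WAV files
rate_a, cap_a = wavfile.read("capture_a.wav")
rate_b, cap_b = wavfile.read("capture_b.wav")
assert rate_a == rate_b

n = min(len(cap_a), len(cap_b))
freqs = np.fft.rfftfreq(n, d=1.0 / rate_a)
spec_a = np.abs(np.fft.rfft(cap_a[:n].astype(float)))
spec_b = np.abs(np.fft.rfft(cap_b[:n].astype(float)))

# Compare energy above 14 kHz, where the captures reportedly diverged
hf = freqs > 14000.0
ratio_db = 10 * np.log10(np.sum(spec_a[hf] ** 2) / np.sum(spec_b[hf] ** 2))
print(f"Energy difference above 14 kHz: {ratio_db:.2f} dB")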

Edited by MLXXX
Posted
11 minutes ago, MLXXX said:

Most people in their homes don't have an electronic device that would allow them to rapidly change which power amp feeds the one set of main speakers.

Don’t you know that the switching device degrades the signal?

Posted (edited)
5 hours ago, Sir Sanders Zingmore said:

Don’t you know that the switching device degrades the signal?

I realise that was written tongue-in-cheek.

 

A variation of that objection could be raised in relation to streamlining the switching of alternative power cords (using a box that contained a mains rated relay).

 

I think that your typical after-market power cord would not be phased by that, or get over-heated, even though its owner might arc up at the mere suggestion of a switching box.

 

Edited by MLXXX
Posted
13 hours ago, Grant Slack said:

Hello David,

 

You are asserting that the packet size blind audio test is scientifically valid, correct? Does it meet the requirements that you have been outlining for a valid scientific test?

@Audiophile Neuroscience David?

Posted (edited)
21 hours ago, MLXXX said:

How many participants would be needed for a 50:50 chance of one of them getting a perfect score?  It would be 0.5/ (0.5^10) = 512.  With 513 participants there would be slightly better than a 50:50 chance of one of them getting a perfect score through pure guesswork.

I would look at it this way. If the null hypothesis is true, i.e. no person could possibly hear any differences between the two compared components, then if one were to test a large group of people and plot a histogram of the results, one should expect to see a binomial distribution (with probability 1/2, since everyone is randomly guessing). From the binomial distribution one could calculate the expected number of people getting each score. If one is interested in a score of 10/10, then in a test of 2*512 = 1024 subjects the expected number of people getting that perfect score is 1 (by sheer luck). However any additional person scoring 10/10 would represent an anomaly in the statistics, or what some would call an outlier. Their presence means that the assumption that everyone is randomly guessing is incorrect.

 

But it isn't confined to only those getting 10/10; even for those getting 9/10 or 8/10, if the actual numbers exceed the expected values from the binomial distribution, that indicates anomalies and outliers. In other words there are some people who can truly do better than random guessing. Of course it works the other way too, i.e. if the actual numbers fall below the expected values, that too needs an explanation (probably some systematic error somewhere). In all cases the thing we should focus on is whether the probability in the distribution is 50% or something else.

 

The general rule of statistics applies: the larger the sample size, the more accurate the result. Ideally we should test a million people! I would like that. Certainly my example with only 10 subjects is not enough to be definitive.
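A Python sketch of that expected histogram under the null hypothesis; with p = 0.5 and 1024 subjects, the expected count for a score of k/10 is simply C(10, k):

from math import comb

TRIALS, SUBJECTS = 10, 1024

# Expected number of subjects at each score if everyone is guessing (p = 0.5)
for k in range(TRIALS + 1):
    expected = SUBJECTS * comb(TRIALS, k) / 2 ** TRIALS
    print(f"{k:2d}/10 correct: {expected:6.1f} subjects expected")

# Exactly 1 perfect score is expected by luck; a surplus at the high end
# would suggest some listeners are doing better than guessing.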

Edited by LHC
