I disagree that it's better than the alternative, because instead of relying on observation, you are simply rolling the dice.
Not so.
If I have three listeners sitting here and I play them two amplifiers, for the sake of argument a Cyrus and a Naim integrated, you can be certain of two things.
The Cyrus will be described as crisp, rather lightweight and a little brittle, but with good clarity. Put the Naim in the system and the comments will shift towards things like 'communicates better' but 'sounds rather two-dimensional'.
It's all bollocks.
The two amplifiers do not differ enough in design to cause the above observations. They will spec and measure differently, but by tiny amounts, well below the threshold of detectability for the vast majority of people.
You can demonstrate this by repeating the test with the identity of the amplifiers hidden: the same listeners will no longer be able to form the same strong opinions about any difference. Or if you want to be really cheeky, power both up and have them on show, but swap the connections so that when the Cyrus is playing, people assume it is the Naim... and they hear the Naim.
If you want to test blind testing as a test regime, it isn't difficult.
Compare the Cyrus to a Prima Luna and the listeners will identify clear differences under blind conditions.
Why?
Well, because the difference is now of sufficient magnitude to be consistently audible, and more importantly, to matter.
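To put a number on "consistently", here is a rough sketch of how likely a listener is to reach a given score in a blind A/B identification by guessing alone. The trial counts are my own illustrative assumptions, not figures from any actual test, and the function name is made up for the example.

```python
from math import comb


def chance_probability(correct, trials):
    """Probability of scoring at least `correct` out of `trials`
    by pure guessing (50/50 chance on every trial)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    trials = 16  # assumed number of blind trials, purely illustrative
    for correct in (9, 12, 14):
        p = chance_probability(correct, trials)
        print(f"{correct}/{trials} correct: chance probability {p:.4f}")
```

A score only slightly above half is what guessing produces much of the time, whereas a difference that really is audible should push the score high enough that chance becomes an unlikely explanation.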
That die is loaded heavily to produce the desired result, which is why Hi-Fi Choice's blind tests produce identifiable differences between CD players and cables and yours don't.
You are correct, it is possible to assemble a system in order to create differences under blind conditions, or to hide them, but that implies that the tester has deliberately set out to deliver a result. That doesn't mean this has to be the norm.
On cables, pick a source or preamp with a tube output stage, or use a passive system in which source and load impedances have not been carefully considered, and you can get cables to sound different under blind test conditions. There is no mystery to this: it boils down to basic electronics and the effects of filters formed at various points along the chain.
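As a rough illustration of the filter effect, treat the source output impedance and the cable capacitance as a simple first-order RC low-pass. The component values below are assumptions I have picked for the example (a low-impedance solid-state output versus a high-impedance passive or tube stage driving a long, capacitive cable), and the sketch ignores the load impedance and cable inductance.

```python
import math


def corner_frequency_hz(source_impedance_ohm, cable_capacitance_f):
    """-3 dB corner of a first-order RC low-pass: f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * source_impedance_ohm * cable_capacitance_f)


def attenuation_db(freq_hz, source_impedance_ohm, cable_capacitance_f):
    """Level change at freq_hz caused by the RC low-pass (0 dB means no loss)."""
    fc = corner_frequency_hz(source_impedance_ohm, cable_capacitance_f)
    return -10.0 * math.log10(1.0 + (freq_hz / fc) ** 2)


if __name__ == "__main__":
    # Assumed, illustrative cases: (label, source impedance in ohms, cable capacitance in farads)
    cases = [
        ("low-Z source, short cable", 100.0, 200e-12),
        ("high-Z source, long cable", 10_000.0, 1_000e-12),
    ]
    for label, r, c in cases:
        fc = corner_frequency_hz(r, c)
        loss = attenuation_db(20_000.0, r, c)
        print(f"{label}: corner {fc:,.0f} Hz, {loss:.2f} dB at 20 kHz")
```

With the low impedance source the corner sits far above the audio band, so swapping cables changes nothing you could hear. With the high impedance source the corner can drop into the audio band, and then cable capacitance becomes a tone control.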
Similarly, you can select a speaker system to stress amplifiers and thereby magnify differences, but these will be largely limited to the effects of output impedance and current delivery.
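The output impedance effect can be sketched as a plain voltage divider between the amplifier and the speaker. The figures here are assumptions for illustration only (a speaker whose impedance swings between 4 and 20 ohms, and output impedances of 0.05 ohm and 2 ohms), and the impedances are treated as purely resistive.

```python
import math


def level_db(load_ohm, output_impedance_ohm):
    """Level at the speaker terminals relative to the unloaded amplifier output."""
    return 20.0 * math.log10(load_ohm / (load_ohm + output_impedance_ohm))


def response_swing_db(z_min_ohm, z_max_ohm, output_impedance_ohm):
    """Frequency response variation caused by the speaker's impedance
    swinging between z_min_ohm and z_max_ohm."""
    return level_db(z_max_ohm, output_impedance_ohm) - level_db(z_min_ohm, output_impedance_ohm)


if __name__ == "__main__":
    z_min, z_max = 4.0, 20.0  # assumed speaker impedance extremes, in ohms
    for label, z_out in (("low output impedance", 0.05), ("high output impedance", 2.0)):
        swing = response_swing_db(z_min, z_max, z_out)
        print(f"{label} ({z_out} ohm): response varies by {swing:.2f} dB")
```

The higher the output impedance, the more the frequency response follows the speaker's impedance curve, which is a measurable and entirely unmysterious reason for two amplifiers to sound different on a difficult load.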
Everything hinges on one question - what are you testing for?
If you are testing for the sonic differences between silver and copper conductors, grain structure, enclosure materials, different types of capacitors or resistors, one brand of tube against another of the same spec, etc., then the system used for the test cannot skew the result because it does not alter any of the above. They are entirely independent of system matching considerations.
So again, it all comes back to what causes differences to be audible, and in my view there is always a sensible, rational and technical explanation.