Abstract
Science is in large part the art of careful measurement, and a fixed measurement scale is the sine qua non of this art. It is obvious to us that measurement devices lacking fixed units and constancy of scale across applications are problematic, yet we seem oddly laissez-faire in our approach to the measurement of one critically important quantity: statistical evidence. Here I reconsider problems with reliance on p values or maximum LOD scores as measures of evidence, from a measure-theoretic perspective. I argue that the lack of an absolute scale for evidence measurement is every bit as problematic for modern biological research as was the lack of an absolute thermal scale in pre-thermodynamic physics. Indeed, the difficulty of establishing properly calibrated evidence measures is strikingly similar to the problem 19th-century physicists faced in deriving an absolute scale for the measurement of temperature. I propose that the formal relationship between the two problems might enable us to apply the mathematical foundations of thermodynamics to establish an absolute scale for the measurement of evidence, in statistical applications and possibly in other areas of mathematical modeling as well. Here I begin to sketch out what such an endeavor might look like.