Tom, Thanks for the pointer to your "Measuring the complexity of simplicity" paper. I will try to work through it...probably will take some time, partly because I have to spend most of my time working on my main theory right now...in hopes of finding funding :)
Anyway, I am not understanding your either/or of physical sets and physical superpositions. I'm wondering if it is because you are using a different definition of superposition from mine...i.e., one somehow more specific to QT...not sure.
So, I'm not defining superposition just as linearity, as in linear systems, f(x+y) = f(x) + f(y), at least not where f is scalar-valued or vector-valued (or even where the vector is a vector of functions). Perhaps my use of superposition would reduce to linearity where f is set-valued. Not sure.
When I say that two codes (sparse distributed codes), X and Y, which are both sets of cardinality Q, are simultaneously physically active in a coding field, I just mean that some of X's elements are active and some of Y's elements are active (assuming they have some intersection). The total number of active elements is still Q (that's mandated by my model's rules). So, for example, if X and Y share Q/2 elements in common, then when X is fully active, Y is half active, and vice versa. X and Y are both simultaneously active, just to different degrees (strengths). So, if one's model has a way to interpret strength of activation (defined that way) as the probability of the represented item, then we have a physical realization of a probability distribution over the stored items. One can physically make a draw from that distribution, i.e., collapse the superposition (that's what Sparsey's code selection algorithm does).
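In case a concrete toy helps, here is a minimal Python sketch of that reading of "strength of activation as probability." The codes X and Y, the value Q = 8, and the collapse() draw are all made up for illustration; this is just the set-overlap interpretation described above, not Sparsey's actual code selection algorithm.

```python
import random

Q = 8  # code size: number of active units in any code

# Hypothetical stored codes: each is a set of Q unit indices.
# X and Y share Q/2 = 4 units, matching the example above.
stored_codes = {
    "X": {0, 1, 2, 3, 4, 5, 6, 7},
    "Y": {4, 5, 6, 7, 8, 9, 10, 11},
}

def activation_strengths(active_units, codes):
    """Fraction of each stored code's Q units that are currently active."""
    return {item: len(code & active_units) / Q for item, code in codes.items()}

def collapse(active_units, codes):
    """Draw one stored item with probability proportional to its strength,
    i.e., 'collapse the superposition'. Illustration only, not Sparsey's CSA."""
    strengths = activation_strengths(active_units, codes)
    items = list(strengths)
    return random.choices(items, weights=[strengths[i] for i in items], k=1)[0]

# With X fully active, X has strength 1.0 and Y has strength 0.5,
# so both items are simultaneously (partially) represented.
active = stored_codes["X"]
print(activation_strengths(active, stored_codes))  # {'X': 1.0, 'Y': 0.5}
print(collapse(active, stored_codes))              # 'X' twice as often as 'Y'
```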
However, a Sparsey coding field can (and generally does) have a full recurrent matrix, and when signals are sent out from the active code (of Q elements) at T, via that recurrent matrix, and arrive back at the coding field at T+1, a new code (possibly the same code, but in general a new one), also of cardinality Q, will be activated. But just as the code active at T was a distribution over multiple stored codes (here, just X and Y), the code at T+1 is also a distribution over those codes. To emphasize, every particular set of Q active elements represents not just a particular item, but also the distribution over all items (specifically, in Sparsey, over all items that have been stored in the coding field during a learning phase).
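And here is what I mean by one recurrent step, again only as a toy sketch: the weight matrix W below is just random, and taking the Q most-driven units overall is a simplification of the actual rule (one winner per WTA competitive module, sketched further below).

```python
import numpy as np

N, Q = 24, 8                       # coding field size, code size (toy values)
rng = np.random.default_rng(0)

# Hypothetical recurrent weight matrix over the coding field (N x N).
W = rng.random((N, N))

def recurrent_step(active_units):
    """Units active at T send signals via W; a new code of exactly Q units
    becomes active at T+1. Simplification: take the Q most-driven units
    overall, rather than one winner per WTA competitive module."""
    x = np.zeros(N)
    x[list(active_units)] = 1.0
    drive = W.T @ x                          # summed recurrent input to each unit
    return set(int(i) for i in np.argsort(drive)[-Q:])

code_T = set(range(Q))                       # some code active at T
code_T1 = recurrent_step(code_T)             # code at T+1: again exactly Q units;
                                             # its overlaps with the stored codes
                                             # again define a distribution
print(code_T1)
```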
You might say: but why should we believe that each new distribution that becomes active is in any sense a good one, or a correct one, or a reasonable one given the statistics of the input space from which the stored items were drawn? The answer is that if, when you store each new item, you statistically preserve similarity, i.e., simply cause more similar input items to be mapped to more highly intersecting codes, then you create (embed) the appropriate intersection structure over the stored codes, so that, during a recognition test phase, or just while "free-running" after learning, the distributions that arise will respect the statistics of the input space. Moreover, because not just pairwise intersections, but intersections of intersections, etc., are also physically active whenever any one code is fully active, the intersection structure reflects not just the pairwise statistics, but in principle, the statistics of all orders present in the input set.
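To make "statistically preserve similarity" concrete, here is the simplest toy assignment rule I can write down that embeds that kind of intersection structure. It is emphatically not Sparsey's learning algorithm (Sparsey never explicitly picks *which* units to reuse, as the next paragraph says); it just shows what it means for code intersection size to track input similarity.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Q = 50, 10                     # coding field size, code size (toy values)

def similarity(a, b):
    """Overlap (Jaccard) similarity between binary input patterns."""
    return np.sum(a & b) / np.sum(a | b)

def assign_code(new_input, stored):
    """Toy similarity-preserving assignment (NOT Sparsey's algorithm):
    reuse a fraction of the most similar stored item's code, proportional
    to input similarity, and fill the rest with random unused units."""
    if not stored:
        return set(rng.choice(N, size=Q, replace=False))
    best_input, best_code = max(stored, key=lambda s: similarity(new_input, s[0]))
    n_reuse = int(round(similarity(new_input, best_input) * Q))
    reused = set(rng.choice(sorted(best_code), size=n_reuse, replace=False))
    others = [u for u in range(N) if u not in reused]
    return reused | set(rng.choice(others, size=Q - n_reuse, replace=False))

base = rng.integers(0, 2, size=30)
near = base.copy(); near[:3] ^= 1            # small perturbation of base
far  = rng.integers(0, 2, size=30)           # unrelated input

stored = []
for x in (base, near, far):
    stored.append((x, assign_code(x, stored)))
(_, c_base), (_, c_near), (_, c_far) = stored

# More similar inputs end up with more highly intersecting codes.
print(len(c_base & c_near), len(c_base & c_far))
```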
Also, I want to emphasize that all the learning algorithm does is statistically ensure that the *size* of intersections correlates with input similarity. It does not choose *which* elements will be in intersections. However, merely by imposing that *size* correlation, over time, subsets (of various sizes less than Q) will emerge as representations of various higher-order statistics over the inputs. More precisely, the code assigned to the *first* input item stored is chosen completely randomly (i.e., one unit chosen uniformly at random in each WTA CM (competitive module)). But thereafter, the choice of winner (in each CM) is biased by the learning that occurred for each prior item stored: these biases accumulate in the patterns of increased input weights to the coding field units.
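A bare-bones sketch of that last point (again toy Python with made-up sizes, not the actual CSA): with all input weights at zero, the winner in each CM is effectively random; after one Hebbian-style weight increase, the same input re-evokes the same winners, and that accumulated bias is all the learning rule imposes.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cms, units_per_cm = 8, 6        # Q = n_cms: one winner per WTA CM (toy sizes)
n_inputs = 20                     # hypothetical input field size

# Input weights to every coding-field unit, indexed [CM, unit, input feature].
# All start at zero, so the first code is chosen at random in each CM.
w = np.zeros((n_cms, units_per_cm, n_inputs))

def choose_code(input_vec, noise=1e-3):
    """Pick one winner per CM. With no learned weights the choice is uniform
    at random (via the noise term); as weights accumulate, the choice is
    biased toward previously used units. A sketch, not Sparsey's actual CSA."""
    drive = w @ input_vec                              # shape (n_cms, units_per_cm)
    drive = drive + noise * rng.random(drive.shape)    # break ties randomly
    return drive.argmax(axis=1)                        # winner index within each CM

def learn(input_vec, code):
    """Increase weights from active input features onto each CM's winner."""
    for cm, unit in enumerate(code):
        w[cm, unit] += input_vec

x = rng.integers(0, 2, size=n_inputs)
c1 = choose_code(x)               # effectively random: all weights are zero
learn(x, c1)
c2 = choose_code(x)               # now strongly biased to re-pick c1's winners
print(c1, c2, bool((c1 == c2).all()))
```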
Does that clarify my meaning at all? I'm not sure I'm really getting your point. But I am interested that you have a set-based theory and will try to grok it. And thanks for the pointer to Lev Goldfarb. Of course, I've heard of him, but I don't know his work.