Linux.com

Re:time based releases

Posted by: Anonymous Coward on April 03, 2007 07:04 AM
>> Two years ago there were about 6000 modules in linux, with around 4000 being critical. The cartesian product of 4k by 4k is what we should have as critical tests to confirm correctness of execution.

I am not in the testing business but this cartesian product tells me all the possible testing scenarios (in terms of loaded modules) I can have when I limit myself to exactly 2 modules loaded at one time [this includes loading the same module twice].

To consider all the possible ways to have any number of any of the modules, assuming that the order of loading is irrelevant (which it isn't), compute 2 raised to the 4k power. That is a huge number, and if we consider order of loading to be meaningful, then we have 4k factorial plus 4k-1 factorial plus 4k-2 factorial plus<nobr> <wbr></nobr>... which is even larger though we are already in the realm of meaninglessness.

I got the 2**(4k) because that is the sum of the number of combinations of of 4k taken 1 at a time plus 4k taken 2 at a time plus<nobr> <wbr></nobr>... 4k taken 4k at a time [use combinations (not permutations/factorials) if assume loading order is irrelevant and sum all of these since we might load 4k modules but we might load only 10 modules or 4k-1 or 954 etc. And the sum of all of these combinations is 2**4k (sum of the (4k+1)th row in pascal's triangle)]

Thus we will always be short of human beings and computers to do a thorough testing. Fortunately, if the modules can be determined to be independent of each other (as much as this can be the case) then we just have to test 4k cases. Anyway, the reality is somewhere in between, but a thorough testing of just one of these gazillion combinations is impossible in most cases anyway. And if you think I am crazy with the calculation above (which assumes that a test can be done to absolutely determine if a combination of loaded modules works.. such a test is only theoretical anyway so we have 2**4k theoretical tests)... if you think I am crazy, then consider that each module has at least one bit of state. We are thus looking at 4k bits as a lower bound. The total number of such different states (assuming each different set of bits is a different condition which is a good assumption if the modules aren't identical), we get 2**4k different states once again. And this is basically a lower bound because each module has more than 1 bit of state. To recap, we have something like 2**4k module scenarios on which we must do a test so comprehensive as to definitely ascertain if there is any flaw or not. Alternatively, we can create a checklist of all the possible correct configurations for the states. This table would be 2**4k in size if each module had only 1 bit of state. In reality the table would be much much larger (eg, if 10 bits of state per module => 2**40k).

In short, 1 day or 100 years is almost insignificant if we wanted to be comprehensive and not rely on any shortcuts. In practice, we can probably study the code for correctness (even if nonrigorously), then test boundary cases (as errors are flagged or as interesting situations are discovered), use debugging tools, do unit tests, etc. So the design of Linux (making modules as independent as possible) is the most important element for coming up with a testing framework that is manageable and that we can hope will be able to catch a nonnegligible fraction of possible errors. To repeat, if we can approximate module independence (and this was all there was to it), we would only need 4k theoretical tests instead of the much much larger 2**(4k) such tests.

In real life, we are doomed to be imperfect by far, but we can survive.

[Notice that if a single application (more complex than a much smaller limited module), can use up a gig of memory, and we wanted to look at the full state, we need to look at at least 2**1gig as these are all the ways to stuff 1 gig with 0s or 1s and presumably each of these states would be independent of the rest in terms of determining correctness. As an example, consider only the pixel memory used by Firefox. We would look at all the ways to pixilize a monitor and some would be correct and others would be off on some pixel value. In reality though, not only can we ignore many of these cases (a pixel being off is frequently not noticeable to the eye), but we have in practice much less than 2**1gig degrees of freedom (eg, some bitmaps will always be rendered identically.. at least we may come to that conclusion by studying the correctness of the relevant functions and thereby eliminate zillions of possible states from consideration). Also, note that we are not just looking at a full screen of bits. Each full screen is in the context of more state. If I go to google.com, I don't want to see microsoft.com or any other webpage. In short, thorough testing of a complete system involves doing an infinite number of tests, for all intents and purposes. Being smart about it though means we will strive for independence of modules and basically ignore combinations except in special circumstances. This is linear growth (if modules are roughly equally complex to build as their numbers grow). Then it makes sense to talk about 9 months vs. 3 months or a week, but Linux devs are growing too. In essence let one or several people be responsible for a module, which is what happens anyway. Naturally, in real life, not all things are equal: combinations of apps/modules must be considered; human management/communication and other logistics become issues; etc.]

Sorry for overdoing this post. Just think that the cartezian discussion was thrown in there without a base and it would lead to wrong conclusions. It would make sense, for example, if after considering everything, we estimated that testing 2 modules at one time provided fairly decent test coverage of likely errors.

#

Return to Michlmayr advocates time-based release management for FOSS projects