Hello everyone!
Recently I started developing a program which may or may not eventually be useful for lottery enthusiasts. Since I don't know much about lotteries and the various strategies and systems (I am a programmer, and the idea came to me while developing an optimized matching algorithm), I would like to humbly ask for your opinion, and perhaps some advice, on the usability of this program and its features.
The initial idea was this:
In my country there is a 6/49-based national lottery where, besides an ordinary 6-number ticket, one can place a system bet consisting of 7 to 15 numbers. Sure, that way you cover more combinations, but is it really worth it? So I was thinking: by computing all possible combinations of 6 (let's support the standard 6-number bet too) to 15 numbers and comparing them against the results of previous draws, it might be possible to find the combinations which, had they been played in the past, would have matched the actual drawn numbers most often. Then, using some sort of time analysis (how evenly a combination "scored" across past draws), it might give some clue as to whether a given system combination could score in the future.
(For example, if I find a combination which would have scored in 15% of all draws over the last 30 years, but the dates show that most of those hits happened 15-25 years ago, then such a combination is probably not a great candidate for future bets. On the other hand, a combination with a slightly lower success rate that scores steadily from the distant past right up to the present may be a better candidate.)
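Since the time analysis is not implemented yet, here is only a rough sketch of one way the "how evenly did it score" idea could be measured. Everything in it is an assumption made for illustration: the function names, the choice of 5 equal time windows, and the coefficient of variation as the evenness measure. Python is used just for readability.

```python
from statistics import mean, pstdev

def hits_per_window(hit_indices, total_draws, windows=5):
    """Count hits in each of `windows` equal slices of the draw history.

    hit_indices are 0-based positions of the draws in which the
    combination scored (oldest draw = 0). Hypothetical helper.
    """
    counts = [0] * windows
    for i in hit_indices:
        counts[min(i * windows // total_draws, windows - 1)] += 1
    return counts

def unevenness(counts):
    """Coefficient of variation: 0 means the hits are spread perfectly evenly."""
    avg = mean(counts)
    return pstdev(counts) / avg if avg else float("inf")

# Made-up data: 100 draws, one combination scoring only early on,
# another scoring once in every fifth of the history.
clustered = hits_per_window([10, 11, 12, 13, 14, 15], total_draws=100)
steady = hits_per_window([5, 25, 45, 65, 85], total_draws=100)
print(clustered, unevenness(clustered))
print(steady, unevenness(steady))
```

A lower value would mean the combination scored more continuously over the whole history, which is the property described in the example above.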
This idea may or may not actually be useful, you surely know that much better than I do. Anyway, from the programming perspective this was an interesting problem because it requires a lot of computing. A standard 6-number bet gives 13,983,816 possible combinations, that's nothing. A 7-number bet gives 85,900,584 combinations, an 8-number bet 450,978,066 combinations, ..., and a 15-number system bet can be selected from 1,575,580,702,584 possible combinations. To make it work, all those combinations have to be computed and then compared with the lottery history data, which may be thousands of draws. So each computed combination must be compared to all past draws and its hit percentage calculated, the same must be done for all the other millions, billions or even trillions of combinations, and at the end the program should print out the several best combinations. And all that in a reasonable time.
This is where my matching algorithm comes in, as it was supposed to provide a speed boost. To test it in practice I implemented it in this context and built this program around it.
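The counts quoted above are plain binomial coefficients C(49, k), so they are easy to double-check. A small sketch (Python is used here just for illustration, and the function name is made up):

```python
from math import comb

def system_bet_counts(n=49, k_min=6, k_max=15):
    """Return {k: C(n, k)}, the number of distinct k-number bets out of n."""
    return {k: comb(n, k) for k in range(k_min, k_max + 1)}

counts = system_bet_counts()
for k, c in counts.items():
    print(f"{k:2d} numbers: {c:,} combinations")
```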
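To make the comparison step concrete, here is a minimal sketch of what has to happen for every combination. It is NOT the optimized algorithm used in the program (that one is not shown here); it is just a straightforward bitmask version, assuming each set of numbers is packed into one integer with one bit per number. All names are made up for the illustration.

```python
def to_mask(numbers):
    """Pack lottery numbers (1-49) into an integer, one bit per number."""
    mask = 0
    for n in numbers:
        mask |= 1 << n
    return mask

def score(combination, draw_masks):
    """Count the draws in which the combination contains 4, 5 or 6 drawn numbers."""
    comb_mask = to_mask(combination)
    hits = {4: 0, 5: 0, 6: 0}
    for draw_mask in draw_masks:
        matched = bin(comb_mask & draw_mask).count("1")  # population count
        if matched >= 4:
            hits[matched] += 1
    return hits

# Hypothetical three-draw history, for illustration only.
history = [to_mask(d) for d in ([5, 14, 21, 25, 30, 39],
                                [1, 2, 3, 4, 5, 6],
                                [5, 14, 21, 25, 40, 41])]
hits = score([5, 14, 21, 25, 30, 39, 48], history)
percent = 100.0 * sum(hits.values()) / len(history)
print(hits, f"{percent:.4f}%")
```

The combined percentage here is simply the number of draws with 4, 5 or 6 matches divided by the number of draws.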
For the time being the performance is as follows:
* 900 million combinations per second are computed
* 90 million per second are processed and delivered to the matching algorithm
* the speed of the actual testing depends on how many historical draws the computed combinations are compared with:
- 2.2 million combinations/s when comparing against 100 lottery draws
- 212 thousand/s when comparing against 1200 lottery draws
- 50 thousand/s when comparing against 5300 lottery draws
The figures above are per single CPU core (3.15 GHz in this case), so using e.g. all 4 cores of a 4-core CPU gives 4 times the speed. The work can be divided into thousands of pieces that run simultaneously and totally independently on as many cores, processors and even separate computers as needed. I am still not satisfied with the speed, though. Also, the time analysis is not implemented yet, and it will probably drag the speed down even further.
Anyway, even in the current work-in-progress state, all 1.5 trillion possible 15-number combinations can be computed, processed and compared against 5300 lottery draws (our lottery's data from the last 35 years) in roughly a month, using 3 CPUs with 4 cores each. That's not that bad, I guess.
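The one-month figure follows from the numbers above: 3 CPUs with 4 cores each is 12 cores, each testing about 50 thousand combinations per second against the 5300-draw history. A quick back-of-the-envelope check (the function name is made up, and perfect scaling across cores is assumed):

```python
from math import comb

def estimated_days(k, tested_per_second_per_core, cores):
    """Days needed to test every k-combination of 49 numbers at the given rate."""
    total = comb(49, k)
    seconds = total / (tested_per_second_per_core * cores)
    return seconds / 86400  # seconds per day

days = estimated_days(k=15, tested_per_second_per_core=50_000, cores=12)
print(f"{days:.1f} days")
```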
Program parameters:
Code:
$ ./lotto
Lotto ver. 1.1
==============
This program will compute all possible k-combinations of 49 numbers (1 - 49), drops combinations containing any blacklisted number (if any specified in blfile), further drops combinations not containing all whitelisted numbers (if any specified in wlfile) and then from the remaining set takes each combination and compares it against all 6/49 lottery draws (specified in dtfile) calculating the percent rate of 4-matched numbers, 5-matched numbers and 6-matched numbers. At the end the program will show the TOP-3 combinations with the highest combined percentage.
In the alternate mode (pfreq is set to 999) the program will not calculate anything, just print out all computed (and all 'to be tested' - with regards to the blacklist and whitelist) combinations.
(C) 2011 johnsmithx (johnsmithx at seznam.cz)
Usage: lotto dtfile blfile wlfile k pfreq start stop [benchmark]
   or  lotto dtfile blfile wlfile k 999 start stop
Parameters:
  dtfile     csv file, at least 1 row, 6 values per row
  blfile     csv file, either empty or up to 49-k numbers on a single row
  wlfile     csv file, either empty or up to k numbers on a single row
  k          k-combinations (6 - 15)
  pfreq      progress report frequency in % (0.01 - 50)
  start      start computing at specified progress % (0 - 99.99)
  stop       stop computing at specified progress % (start+0.01 - 100)
  benchmark  period of seconds for initial benchmark (5 - 100), default 10
So as you can see, the program is technically focused and doesn't have any useless eye-candy features. It's made so that all parameters can be given as command-line arguments, the output can be redirected to a text file, and the program can run as a scheduled job in the background. Currently the platform is Linux x86_64 with assembly optimizations for modern Core 2 processors with SSE2/SSSE3 extensions.
Besides the actual history data, provided as a simple CSV file (comma-separated values, just 6 numbers on each line, plain and clear), the program can work with blacklist and whitelist files. In the blacklist you specify numbers you DO NOT want to appear in the tested combinations. In the whitelist you specify the exact opposite: the numbers you DO want to have in every tested combination. The program always computes all possible combinations (the quick part), but only those complying with the blacklist/whitelist are passed on for testing (the slow part).
You also specify how often the intermediate progress report should be shown, so you can see how far the run has got, how long it has been running and how much time is expected to remain.
A very important feature is specifying the start and stop points, which allows you to divide a program run into smaller fragments and run them either one after another or all at once. Every running instance of the program binds itself to a single core of a single CPU.
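The blacklist/whitelist rule can be sketched like this. It only illustrates the filtering logic (generate everything, pass on only what complies), it is not the program's actual code, and the function names are made up:

```python
from itertools import combinations

def passes_lists(combination, blacklist, whitelist):
    """True if no blacklisted number is present and all whitelisted ones are."""
    chosen = set(combination)
    return chosen.isdisjoint(blacklist) and chosen.issuperset(whitelist)

def candidates(k, blacklist=(), whitelist=(), n=49):
    """Generate all k-combinations of 1..n, yielding only those to be tested."""
    blacklist, whitelist = set(blacklist), set(whitelist)
    for c in combinations(range(1, n + 1), k):  # the quick part
        if passes_lists(c, blacklist, whitelist):
            yield c                             # goes on to the slow part

# Tiny example: 3 numbers out of 1..6, never 6, always 1.
to_test = list(candidates(3, blacklist=[6], whitelist=[1], n=6))
print(len(to_test), to_test)
```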
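One possible way to make such start/stop points work is to number all combinations from 0 to C(49, k) - 1 and jump straight to the combination with a given number. The sketch below assumes lexicographic order and a simple percent-to-index mapping. Both are assumptions for illustration, the program's own ordering and rounding may differ, and the function names are made up.

```python
from math import comb

def percent_to_rank(percent, k, n=49):
    """Map a progress percentage (0 - 100, two decimals) to a combination index."""
    hundredths = round(percent * 100)  # avoid floating point drift
    return comb(n, k) * hundredths // 10000

def unrank(rank, k, n=49):
    """Return the rank-th (0-based) k-combination of 1..n in lexicographic order."""
    result = []
    x = 1
    for remaining in range(k, 0, -1):
        while True:
            # Number of combinations that have x in this position.
            block = comb(n - x, remaining - 1)
            if rank < block:
                break
            rank -= block  # skip all of them and try the next number
            x += 1
        result.append(x)
        x += 1
    return result

# Where would an instance started with "start 20" begin for k = 7?
first = unrank(percent_to_rank(20.0, 7), 7)
print(first)
```

With this, each instance can begin at its own start index and stop at its stop index without computing the combinations that belong to other instances.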
Example of output:
Code:
$ ./lotto lotto_data_5336_1977w1-2011w39.csv lotto_blacklist.csv lotto_whitelist.csv 7 20 0 100
Lotto ver. 1.1
==============
Input data statistics: datafile 5336 rows, blacklistfile 0 numbers, whitelistfile 0 numbers
Requested 7-combinations of 1 - 49: 85900584 combinations to compute, 85900584 combinations to test
parameters: start 0.00% stop 100.00% pfreq 20.00% benchmark 10 s
START COMPUTING at 0.00% (2011-11-20 17:07:48 +0100)
benchmarking: computed combination no. 430000 (43000/s), tested combination no. 430000 (43000/s)
elapsed: 10 s, remaining: 33 m 7 s (2011-11-20 17:07:58 +0100)
Progress 20.00%, computed combination no. 17180117 (46685/s), tested combination no. 17180117 (46685/s)
elapsed: 6 m 8 s, remaining: 24 m 31 s (2011-11-20 17:13:56 +0100)
...1st comb: 5, 14, 21, 25, 30, 39, 48 prob: 0.6372% (32x 4, 2x 5, 0x 6)
...2nd comb: 5, 18, 21, 24, 25, 26, 48 prob: 0.6372% (34x 4, 0x 5, 0x 6)
...3rd comb: 5, 14, 21, 25, 31, 39, 48 prob: 0.6184% (33x 4, 0x 5, 0x 6)
Progress 40.00%, computed combination no. 34360234 (46685/s), tested combination no. 34360234 (46685/s)
elapsed: 12 m 16 s, remaining: 18 m 23 s (2011-11-20 17:20:04 +0100)
...1st comb: 5, 14, 21, 25, 30, 39, 48 prob: 0.6372% (32x 4, 2x 5, 0x 6)
...2nd comb: 5, 18, 21, 24, 25, 26, 48 prob: 0.6372% (34x 4, 0x 5, 0x 6)
...3rd comb: 5, 14, 21, 25, 31, 39, 48 prob: 0.6184% (33x 4, 0x 5, 0x 6)
Progress 60.00%, computed combination no. 51540351 (46685/s), tested combination no. 51540351 (46685/s)
elapsed: 18 m 24 s, remaining: 12 m 15 s (2011-11-20 17:26:12 +0100)
...1st comb: 5, 14, 21, 25, 30, 39, 48 prob: 0.6372% (32x 4, 2x 5, 0x 6)
...2nd comb: 5, 18, 21, 24, 25, 26, 48 prob: 0.6372% (34x 4, 0x 5, 0x 6)
...3rd comb: 5, 14, 21, 25, 31, 39, 48 prob: 0.6184% (33x 4, 0x 5, 0x 6)
Progress 80.00%, computed combination no. 68720468 (46685/s), tested combination no. 68720468 (46685/s)
elapsed: 24 m 32 s, remaining: 6 m 7 s (2011-11-20 17:32:20 +0100)
...1st comb: 5, 14, 21, 25, 30, 39, 48 prob: 0.6372% (32x 4, 2x 5, 0x 6)
...2nd comb: 5, 18, 21, 24, 25, 26, 48 prob: 0.6372% (34x 4, 0x 5, 0x 6)
...3rd comb: 5, 14, 21, 25, 31, 39, 48 prob: 0.6184% (33x 4, 0x 5, 0x 6)
COMPUTING COMPLETED at 100.00% in 30 m 39 s (2011-11-20 17:38:27 +0100)
85900584 combinations computed (46710/s), 85900584 combinations tested (46710/s)
Results
=======
1st comb: 5, 14, 21, 25, 30, 39, 48 prob: 0.6372% (32x 4, 2x 5, 0x 6)
2nd comb: 5, 18, 21, 24, 25, 26, 48 prob: 0.6372% (34x 4, 0x 5, 0x 6)
3rd comb: 5, 14, 21, 25, 31, 39, 48 prob: 0.6184% (33x 4, 0x 5, 0x 6)
What I would like to ask you is this:
- What do you think about this idea and the program? Can it be of any use for lottery purposes at all, or is it just impractical nonsense?
- What other features could be useful to implement?
Thanks for your answers.
johnsmithx