'Monstrous Jesters' benchmark package
Sanmayce
03-20-2012, 07:40 AM
For a long time I have looked at benchmark results without finding answers to some very basic but important aspects of CPU/RAM performance.
I am talking about sorting, decompressing and searching performed by console tools written in C.
For example, I have no opportunity to run my tests on a real powerhouse. This limits my quest to write the fastest memmem function in C, because i5/i7 behave very differently from Core 2 when it comes to fetching 1/2/4 bytes. Functions already tuned for one CPU/RAM system are no longer superior on a newer system, so intensive testing is needed to retune them.
You are all welcome to use my latest benchmark (an NSIS installation) at:
http://www.sanmayce.com/Downloads/index.html#Jesters
'Monstrous Jesters' benchmark package short overview:
This is my latest 32bit/64bit (strstr-showdown included) CPU/RAM benchmark package (an NSIS installation).
File: Monstrous_Jesters.exe
Size: 153 MB (161,009,933 bytes)
Size unpacked: 500 MB
Size needed: 1200 MB
After installation 5 shortcuts (tests) are placed on Desktop/Programs.
http://www.sanmayce.com/Downloads/Monstrous_Jesters.png
All tests are written in C (sources included), and compiled with latest Intel 12.1 and Microsoft 16 optimizers.
The MEMMEM (strstr-showdown) test takes some 21 minutes to complete on a Core2Duo_E7500_2.93GHz.
Of course, in order to obtain decent results, stop all concurrent processes before running the test.
Also enable 100% computing power.
Well, there are some additional tests (Intel 12.1 and Microsoft 16 executables included):
- lzpre, an LZ77 32bit/64bit [de]compressor, written by Matt Mahoney;
- Yappy, an LZ 32bit/64bit [de]compressor, written by IronPeter;
- Knight tour benchmark, finds the first 9,000,000 tours (at a rate of some 1 billion jumps per minute); in fact it tests/stresses only the CPU clock;
- Quicksort 32bit/64bit, used to sort 200,000,000+ pointers (pointing to 7-byte chunks).
Also I would be glad for some feedback and results on your machines.
Enjoy!
Splave
03-20-2012, 08:03 AM
Cool I'll give it a shot on my x79
Sanmayce
03-20-2012, 08:06 AM
I rely on you Splave (http://www.overclockaholics.com/forums/member.php?u=76), take your time I have been waiting years so I am not in a hurry.
Feel free to ask whatever interests you.
Neuromancer
03-20-2012, 08:45 AM
I will take a look at it this week, if I like it I will toss it into my next review :)
Sanmayce
03-20-2012, 08:50 AM
Thank you Neuromancer (http://www.overclockaholics.com/forums/member.php?u=87).
MaadDaawg
03-20-2012, 01:01 PM
Are you looking for SB and SB-E testing only, or would a 980x system be helpful as well?
rickss69
03-20-2012, 08:03 PM
No clue what all this means...or if I even did it correctly.
Sanmayce
03-22-2012, 08:36 AM
Are you looking for SB and SB-E testing only, or would a 980x system be helpful as well?
Wow, the three i7 systems will do perfectly. I am not picky as long as an i7 is involved; nevertheless the latest Sandy Bridge-E is going to quench the greed in me.
I am very interested in how the very low memory latencies of SB are going to affect my MEMMEM functions (which stress memory bandwidth along with physical RAM IOPS, i.e. they are latency bound).
A week ago I saw a 5GHz SB with 22GB/s memory read bandwidth. My miserable old laptop gives 5GB/s, whereas my MEMMEM functions work at 3-4GB/s, so do the math on how close they are to the limit. Therefore the thing that would make my eyes happy is a machine with a high-performance CPU-RAM bus; maybe triple channel is the answer. (The above-mentioned 22GB/s was achieved with i7 2700K @ 4.9GHz (1.420V) 24/7 Max 69C (http://www.overclock.net/lists/display/view_item/id/3569509); 4 x 4GB Samsung Extreme Low Voltage 1866MHz @ 8-9-9-24-1T at 1.5V's (http://www.overclock.net/lists/display/view_item/id/3569518).)
Just uploaded revision B of 'Monstrous Jesters' - a new multi-threaded (up to 48 threads stressing RAM/Cores) test was added.
Thank you MaadDaawg (http://www.overclockaholics.com/forums/member.php?u=78) for your readiness to help me.
Sanmayce
03-22-2012, 08:52 AM
No clue what all this means...or if I even did it correctly.
Thanks a lot rickss69 (http://www.overclockaholics.com/forums/member.php?u=66), I will explain but please give me some specs of your machine.
Last night I ran the new Revision B on my T7500 2200MHz dual channel DDR2 667MHz:
http://www.sanmayce.com/Downloads/Monstrous_Jesters_rB_2_T7500.png
Looking at the Knight Tours test, your/my results are 90s/218s. Let me guess: your CPU runs at 218/90*2200MHz = 5328MHz, or am I wrong?
Results for 'Monstrous Jesters' revision B on my laptop T7500 2200MHz (4MB L2 cache) 4GB dual channel DDR2 667MHz using Windows 7 64bit:
Test #1: MEMMEM
OSHO.TXT:
SHORT-SHOWDOWN_Intel_O3_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2725KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2524KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2122KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2352KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2689KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2414KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 1737KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2565KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_32bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2947KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2201KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 1593KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2958KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]
hs_alt_HuRef_chr1.fa:
SHORT-SHOWDOWN_Intel_O3_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2711KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3535KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2636KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2397KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2868KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3397KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2266KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2592KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_32bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2977KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3131KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2052KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3035KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]
Test #2: LZ Yappy
Yappy_Intel_32bit_O3.exe: comp 29.9 MB/s uncomp 512.5 MB/s
Yappy_Intel_32bit_Ox.exe: comp 33.1 MB/s uncomp 513.0 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 32.3 MB/s uncomp 527.1 MB/s
Test #3: qpress
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 2
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 4
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 6
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 8
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 486MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 12
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 467MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 24
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 450MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 32
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 467MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 48
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 332MB/s
Test #4: LZMM
lzpre2_32bit_Microsoft_Ox.exe: 29.25 sec
lzpre2_x64_Intel_O3.exe: 26.74 sec
lzpre2_x64_Microsoft_Ox.exe: 27.10 sec
Test #5: Quicksort
Simplicius_Simplicissimus_Septupleton_Intel_32bit_v12_Ox.exe:
Sort took: 196062 clocks
Decompression to RAM without Dumping to DRIVE performance: 174943 KB/s or 170 MB/s
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 1802 MB/s
Simplicius_Simplicissimus_Septupleton_Microsoft_32bit_v16_Ox.exe:
Sort took: 220819 clocks
Decompression to RAM without Dumping to DRIVE performance: 212247 KB/s or 207 MB/s
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 1418 MB/s
Test #6: Knight Tours
Knight-tour_Microsoft_V16_32bit_Ox.exe: 218.13 seconds
Knight-tour_Intel_V12_32bit_Ox.exe: 227.73 seconds
I hope the above results are a good (though at the same time poor) starting point for getting a feel for how far Core 2 lags behind newer architectures.
rickss69
03-22-2012, 09:05 AM
My runs were with the gamer which has no overclock atm (2600K).
Sanmayce
03-22-2012, 09:22 AM
My runs were with the gamer which has no overclock atm (2600K).
That surprises me; it means I know nothing about the i7 improvements. AFAIK the i7 2600K runs nominally at 3400MHz with turbo up to 3800MHz; maybe your test was done at 3800MHz?
Sanmayce
03-22-2012, 10:41 AM
Just looked at:
http://ark.intel.com/
Sandy Bridge-E:
Processor Number: i7-3930K
# of Cores: 6
# of Threads: 12
Clock Speed: 3.2 GHz
Max Turbo Frequency: 3.8 GHz
Intel Smart Cache: 12 MB
Lithography: 32 nm
# of Memory Channels: 4
Max Memory Bandwidth: 51.2 GB/s
Sandy Bridge-E:
Processor Number: i7-3820
# of Cores: 4
# of Threads: 8
Clock Speed: 3.6 GHz
Max Turbo Frequency: 3.8 GHz
Intel Smart Cache: 10 MB
Lithography: 32 nm
# of Memory Channels: 4
Max Memory Bandwidth: 51.2 GB/s
Gulftown:
Processor Number: i7-980X
# of Cores: 6
# of Threads: 12
Clock Speed: 3.33 GHz
Max Turbo Frequency: 3.6 GHz
Intel Smart Cache: 12 MB
Lithography: 32 nm
# of Memory Channels: 3
Max Memory Bandwidth: 25.6 GB/s
Sandy Bridge:
Processor Number: i7-2700K
# of Cores: 4
# of Threads: 8
Clock Speed: 3.5 GHz
Max Turbo Frequency: 3.9 GHz
Intel Smart Cache: 8 MB
Lithography: 32 nm
# of Memory Channels: 2
Max Memory Bandwidth: 21 GB/s
Looking at the Max Memory Bandwidths (51.2 GB/s vs 25.6 GB/s), one cannot help asking how Intel doubled the performance by going from 3 channels to 4; naive math says it should take 6 channels.
Neuromancer
03-22-2012, 10:50 AM
Bandwidth doubled over X58 because of the limitations to the 1366 IMC.
Notice that sandybridge almost = x58 bandwidth despite only being dual channel memory
Sanmayce
03-22-2012, 10:57 AM
Thanks. From time to time I read articles about whole platforms, but I must admit I have no experience beyond my old AMD Barton (the fastest 32bit CPU ever made, I believe) and my current Core 2 laptop. I have so much to learn: it is shocking to see how i7 boosts even clean-code loops (no RAM loads) such as the Knight Tours benchmark.
Neuromancer
03-22-2012, 04:31 PM
Bartons were awesome.
Never had one, I ran T-Breds.. then moved on to A64, then back to p3 then back to a64.. then actually ran core2 arch for a little bit (hated it) back to AM2+ then AM3 and intel x58 setups. (skipped p55) Intel had a LONG period of time they sucked, but still rocked the benchmarks. Core2 arch was terrible compared to AMD, but superpied better so everyone drooled over it.
X58 was GREAT. And IMHO probably better than Sandybridge except in power consumption. X58 was snappy. Sandy bridge not so much. (Yes it benches better.) Going to fire up the x79 tomorrow... so we will see...
In car analogies, the AMD is the ricer: quicker off the line but it ain't a drag car... The Intel is the top speed car (like the Bugatti Veyron needing a 13 mile track with a 5 mile arrow-straight line to hit top speed).
Then again if what I see is true x79 should rock my world. sub 40ns mem latency might be the key.
Sanmayce
03-24-2012, 09:13 AM
>Then again if what I see is true x79 should rock my world. sub 40ns mem latency might be the key.
Double yes.
In my limited view, the roadmap for both AMD and Intel (aside from making a fat CPU/GPU mix sharing one common memory!!!) is to continue this trend of drastically lowering latencies. Call me delusional, but I dream of 10ns latency for main RAM, with L1/L2/L3 at roughly 1ns/2ns/3ns. Bold, huh? That is why I directed my efforts towards fine-tuning functions that fetch small unaligned chunks in burst (i.e. sequential) mode, which is the real BOOST of i5/i7 over all old architectures. For that reason I included a heavy Quicksort test sorting 7-byte chunks, to show how much better i7 behaves than its inferiors, he-he.
And just a note about the 'qpress' benchmark: when the resultant text file is loaded into Notepad the text is not formatted, because the lines end with LF (the *nix format) rather than the CRLF that Windows users expect. To obtain a Windows-like text file, just load the file into Wordpad and save it; that will do the conversion.
Neuromancer
03-24-2012, 12:08 PM
And just a note about the 'qpress' benchmark: when the resultant text file is loaded into Notepad the text is not formatted, because the lines end with LF (the *nix format) rather than the CRLF that Windows users expect. To obtain a Windows-like text file, just load the file into Wordpad and save it; that will do the conversion.
not into command line or programming anymore but I assume that
LF=Line feed and CRLF= carriage return line feed.
If so, seems odd to me that windows would need to add LF at all after CR....
Sanmayce
03-26-2012, 10:06 AM
Yes it is odd, and NOTEPAD is to be blamed: in my opinion its inability to properly load text files from the LF world (*nix) is on purpose, to show that DOS/Windows CRLF endings are here to stay, a kind of stupid pride.
In fact, qpress uses the *nix format, so a CR should be prefixed to each LF in order for the 'proud-in-its-stupidity' NOTEPAD to catch up with the 21st century.
Anyway, in the next revision C of MJ I plan to convert qpress.txt with a tiny tool written in C before loading it into NOTEPAD.
I also plan to add a 7th test: ZPAQ, one of the most powerful compressors on the Internet; on top of that it is free, open source, and not encumbered by patents.
Its author Dr. Matt Mahoney is a renowned expert in the craft of compression.
ZPAQ is multi-threaded and stresses both CPU and RAM well, and is highly cache sensitive/dependent. All in all it shows the integer (i.e. non-floating-point) computational power of modern systems.
If anyone has the time and will to send me ZIPed resultant text files from the six tests along with CPU/RAM info, I will be thankful.
My desire is to make a comparative study (a table or something similar) and to place it here as well.
The analysis is based on result ratios across different systems, for example one of the fastest single-threaded Lempel-Ziv [de]compressors (here dealing with 197MB English text file):
T7500:
Yappy_Intel_32bit_O3.exe: comp 29.9 MB/s uncomp 512.5 MB/s
Yappy_Intel_32bit_Ox.exe: comp 33.1 MB/s uncomp 513.0 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 32.3 MB/s uncomp 527.1 MB/s
i7 2600K:
Yappy_Intel_32bit_O3.exe: comp 52.9 MB/s uncomp 1362.2 MB/s
Yappy_Intel_32bit_Ox.exe: comp 57.5 MB/s uncomp 1362.2 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 54.8 MB/s uncomp 1385.9 MB/s
The change in ratios is very interesting (it tells something important worth knowing):
54.8:32.3 = 1.7 is very different from 1385.9:527.1 = 2.6
or if you prefer
527.1:32.3 = 16.3 and 1385.9:54.8 = 25.3
In my view the dummy math speaks loudly here.
Neuromancer
03-26-2012, 10:15 AM
I am heading out right now... when i get home I will run it on my stock thuban, tonight hopefully I will be hooking up an x79 system, although I have to finish up a dual channel ram kit before I move to quad channel to start the x79 review.
I know, im slow...
EDIT: setting up download now since its going to take 6 minutes lol
BTW, might want to clean up your site a bit.. dont know if you have a page limit or something on your host but I had to do a word search for Monstrous_Jesters.exe to find the download link.
http://www.sanmayce.com/Downloads/Monstrous_Jesters_revision_B.zip for anyone else looking for it.
Bones
03-26-2012, 12:11 PM
Just got a copy here and I'll do some runs with my 960T and Win 7 to see how it does. :cool3:
Neuromancer
03-26-2012, 01:56 PM
AMD 1090T, single threaded test so cores were hitting 3.6GHz, 1600 Mem 9-9-9 speed with 2400 CPUNB.
Took longer to clean up the TXT file than it did to run the test.
also seems weird that the more times it found a phrase the worse performance was. Went from 2500/s for 6 hits up to 6000/s for 0 hits...
but here you go
OSHO.TXT:
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 3644KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 3695KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2914KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3469KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 3821KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 3756KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2492KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3530KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 3958KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2999KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2432KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3863KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 3573KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 4613KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 3277KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3262KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 3680KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 4525KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2951KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3418KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 3702KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3971KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2809KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3732KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
test2 LZ Yappy
YAPPY: [b 256K] bytes 206908949 -> 95947973 46.4% comp 51.8 MB/s uncomp 971.1 MB/s
YAPPY: [b 256K] bytes 206908949 -> 95947973 46.4% comp 53.7 MB/s uncomp 968.2 MB/s
YAPPY: [b 256K] bytes 206908949 -> 95947973 46.4% comp 48.3 MB/s uncomp 1038.5 MB/s
test3 Qpress
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 2
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 841MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.062 = 22%
User Time = 0.421 = 150%
Process Time = 0.483 = 172%
Global Time = 0.280 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 4
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 1576MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.000 = 0%
User Time = 0.514 = 227%
Process Time = 0.514 = 227%
Global Time = 0.226 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 6
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2525MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.031 = 21%
User Time = 0.452 = 317%
Process Time = 0.483 = 339%
Global Time = 0.142 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 8
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2118MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.031 = 16%
User Time = 0.546 = 289%
Process Time = 0.577 = 306%
Global Time = 0.188 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 12
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2118MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.062 = 41%
User Time = 0.530 = 351%
Process Time = 0.592 = 392%
Global Time = 0.150 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 24
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2525MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.078 = 52%
User Time = 0.436 = 296%
Process Time = 0.514 = 349%
Global Time = 0.147 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 32
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2525MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.046 = 34%
User Time = 0.436 = 321%
Process Time = 0.483 = 356%
Global Time = 0.135 = 100%
Kazuya_PTHREADed, rev. 0++, a search-hat(wrapper) over qpress written by Lasse Reinhold, written by Kaze.
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 48
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 1807MB/s
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kernel Time = 0.171 = 93%
User Time = 0.483 = 264%
Process Time = 0.655 = 358%
Global Time = 0.182 = 100%
test4 lzmm
206908949 -> 61014895 in 21.11 sec
206908949 -> 61014895 in 23.53 sec
206908949 -> 61014895 in 23.24 sec
test5 quicksort
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 2676 MB/s
Simplicius says for Decompression Ratio: 10%
Simplicius_Simplicissimus_Septupleton 32bit/64bit rev.2, written by Kaze.
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 2782 MB/s
Simplicius says for Decompression Ratio: 11%
test6 chess
Knight-tour.exe, revision 8.
|Sequences(only failures): |Jumps i.e. knight's moves: |Elapsed seconds:
|00,000,000,003,578,340,111 |00,000,000,004,464,360,629 |111.36
|Sequences(only failures): |Jumps i.e. knight's moves: |Elapsed seconds:
|00,000,000,003,578,340,111 |00,000,000,004,464,360,629 |107.85
Sanmayce
03-27-2012, 05:29 AM
Thanks Neuromancer (http://www.overclockaholics.com/forums/member.php?u=87).
>... might want to clean up your site a bit ...
Yeah you are right, I piled up all kinds of stuff in a mumbo-jumbo manner, but I provided quick links/tags to ease the pain, such as the following home-page tag of the 'Monstrous Jesters' package:
http://www.sanmayce.com/Downloads/index.html#Jesters
Last night I updated rev. B with rev. C (adding ZPAQ as 7th test, and converting qpress.txt to CRLF).
If you are interested here is the converter:
// LF2CRLF.C written by Kaze
#include <stdio.h>
#include <stdlib.h>   // exit()
#define LF 10
#define CR 13
int main(int argc, char **argv)
{
    FILE *in;
    FILE *out;
    int ch;            // current byte; int so that EOF can be told apart from data
    int PrevChar = 0;  // previous byte read
    if (argc != 3) {
        printf("Usage: LF2CRLF infile outfile\n");
        exit(13);
    }
    // Binary mode on both sides so the C runtime does no newline translation of its own
    if ((in = fopen(argv[1], "rb")) == NULL) {
        printf("Can't open %s\n", argv[1]);
        exit(1);
    }
    if ((out = fopen(argv[2], "wb")) == NULL) {
        printf("Can't open %s\n", argv[2]);
        fclose(in);
        exit(2);
    }
    while ((ch = fgetc(in)) != EOF) {
        if (ch == LF && PrevChar != CR)
            fputc(CR, out); // Add a CR before the LF only if the previous char was not CR
        fputc(ch, out);
        PrevChar = ch;
    }
    fclose(in);
    fclose(out);
    return 0;
}
Thanks Bones (http://www.overclockaholics.com/forums/member.php?u=172).
I am very glad of your readiness to help me.
Sanmayce
03-27-2012, 07:03 AM
Thanks a lot Neuromancer (http://www.overclockaholics.com/forums/member.php?u=87). I regret that I didn't say exactly how to gather the results; you did a lot of editing but there was no need for any, sorry for misleading you.
Something looks wrong with the qpress test: Process Time = 0.483 = 339% (i.e. on average about 3.4 cores busy), which suggests 4 threads?!
Is this an AMD with 6 cores or 4 cores? AMD says that the 1090T has 6 cores.
http://shop.amd.com/us/All/ModelsPerLine/Desktop/Processor?Line=phenom%2Fphenomiix6black
You gave me some valuable information about the AMD Phenom II X6 Black (45nm, 6 cores, 512KB L2, 6144KB L3); it was a missing and much needed test. I am still an AMD fan despite their recent decline.
Some quick notes:
1]
Roughly speaking, I had some illusions about Railgun_Quadruplet_7Hasherezade (which uses a hashed approach) shining; again the wonderful BNDM_64 eclipses the rest. I need the full dump in order to examine the exact behavior of all 4 functions across different patterns, though.
>... also seems weird that the more times it found a phrase the worse performance was ...
It is not the number of hits that matters but the length (and mainly the TYPE) of the phrase. This is the reason for my affection for fine MEMMEM tuning: it needs careful analysis that takes into account different string ranges/lengths.
2]
Sadly, for some reason (I am puzzled here) the Yappy test brings bad news?!
YAPPY: [b 256K] bytes 206908949 -> 95947973 46.4% comp 48.3 MB/s uncomp 1038.5 MB/s
1038.5 MB/s vs 1385.9 MB/s (on i7 2600K tested by rickss69), nah.
3]
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 6
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2525MB/s
A sight for sore eyes, very pleasing indeed, but I am awfully greedy: I need 4400MB/s. Why? Here is why:
One of the nifty benefits of Lasse's lightning-fast Lempel-Ziv library is to boost sequential reads from external storage (HDDs, SSDs). For example, if you have 520MB/s burst read (SATA III SSD), then you need x MB/s of decompression speed in order to double the effective load rate into physical/main RAM. The calculation is simple: with those 520MB/s, traversing OSHO.TXT (197MB) takes 197/520 = 0.378s; reading OSHO.TXT.qp (75MB) and decompressing it with qpress takes 75/520 + 197/2525 = 0.222s, i.e. a boost of ((0.378-0.222)/0.222)*100% = 70.2%. Now I want 2x520MB/s: this requires 0.378s/2 = 0.189s, so 75/520 + 197/x = 0.189, which gives x = 197/(0.189-(75/520)) = 4400MB/s, a dream soon to come true.
And all this is achieved when using qpress (PTHREADed QuickLZ) in the dummy synchronous mode, which is slower than the asynchronous one.
4]
Intel's memcpy():
Simplicius says for 'memcpy' performance: 2676 MB/s
Microsoft's memcpy():
Simplicius says for 'memcpy' performance: 2782 MB/s
The tables are turned: on Intel CPUs the first result (Intel compiler used) is better than the second (Microsoft compiler used).
I don't know whether the forum allows it, but the easiest way is to attach a ZIP file of all the resultant text files that are open in your NOTEPAD (it is less than 64KB), or to email this ZIP file to sanmayce@sanmayce.com. In future revisions (I want to gather results on some really overclocked monsters) my plan is to create a single HTML file (similar to EVEREST's report) out of all the resultant text files (7 so far) with a simple tool written in C; this way I will eliminate the torture you went through.
Neuromancer
03-27-2012, 08:44 AM
it is a 6 core cpu.
I will rerun all tests and save the unedited txt files and ul them
Reran tests and uploaded
Sanmayce
03-28-2012, 10:01 AM
Thanks a lot.
Very glad that AMD is the first CPU to be added side-by-side with my T7500. However, I am disappointed by the far-from-uncompromising performance shown by the AMD Phenom II X6 'Thuban' 1090T 6-core Black Edition, as I saw at:
http://www.futurelooks.com/the-amd-phenom-ii-x6-thuban-1090t-6-core-black-edition-processor-review/
"In a nutshell, it allows the CPU to dynamically overclock up to three of its own cores to provide extra performance. In the case of the 1090T pictured in the screenshot, we see that a couple of the cores have hit 3.6GHz, one is at 3.2GHz, which is the stock CPU speed, and the rest of them are clocked way down."
As the task manager shows, the first two cores run at 3584MHz, the third at ????MHz, the fourth at 3255MHz and the remaining two under 2000MHz!!! This is not a desktop CPU at all, grrr. All in all I hate AMD's Turbo CORE, it is like throwing dice: the full power is not utilized due to temperature limits. As for Intel's Turbo Boost, I don't like it either: not knowing what is going on because of the dynamic clock changes is like being sold a car and told "you don't need these high RPMs or torque because you cannot change gears as we do". Not for me. I prefer Turbo Boost disabled during the tests.
I was under the impression that BE (Black Edition) AMD CPUs were the counterparts of X (Extreme) Intel CPUs. A variant running all its cores at full speed would be more useful; for extreme tests it is mandatory.
Neuromancer
03-28-2012, 01:22 PM
I got x79 up and running, putting in the quad channel memory tonight so will run your bench again on that.
As for the turbo core, it is exactly the same way it runs on Intel.
Most of the benches were bouncing around at 3.6 on my AMD stuff. It will hit 3.8 on the intel setup for single core, so far multithread testing = 3.5 across all cores.
I normally disable Speed step on Intel since it gave a laggy feel in general usage. But having trouble disabling it without disabling turbo on this Gigabutt board.
Sanmayce
03-30-2012, 08:24 AM
Thanks,
as for Turbo CORE/Boost, AFAIK it is a complex internal tweak that changes not only the CPU frequency but also RAM timings and who-knows-what-else. My point is that I want to see how the CPU-RAM system responds to a particular test/program, i.e. to gather stable results.
To get a feel for how fundamental sorting is (I still don't get why major benchmarks lack it), here comes my newest phrase-checking package 'Dumbino'. Made last night, it is the first (free and open-source) English phrase-checker:
http://www.sanmayce.com/Downloads/index.html#Dumbino
In a few words: the MJ Quicksort test helps one understand how different CPU-RAM systems behave under a really heavy load. By heavy I mean my current corpus of four-word phrases (879,557,846); the MJ Quicksort test sorts 206,908,943, and in the 'Dumbino' package I included 140,222,335 phrases (obtained by ripping the Google Books US n-gram corpus, 400GB in size). Now, in order to phrase-check (a spell-checker uses 1-grams) an entire ebook consisting of 42,208 4-gram phrases, Dumbino mixes them with those 140,222,335 and re-sorts them, so all familiar and unfamiliar phrases pop up in SUB-LINEAR time!
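The mix-and-re-sort idea can be illustrated in a few lines. This is a toy sketch, not Dumbino's actual code: it assumes phrases are plain strings held in memory, and the function name phrase_check is invented for the example. Each phrase is tagged with its origin, everything is sorted together, and one pass over the sorted list tells which book phrases sit right after an identical corpus phrase.

```python
def phrase_check(corpus, book):
    """Split the phrases of `book` into (familiar, unfamiliar) lists by
    mixing them with `corpus` and sorting the mixture."""
    # Tag each phrase with its origin: 0 = corpus, 1 = book.
    # After sorting, identical phrases are adjacent and the corpus copy comes first.
    mixed = [(p, 0) for p in set(corpus)] + [(p, 1) for p in set(book)]
    mixed.sort()

    familiar, unfamiliar = [], []
    previous = None
    for phrase, origin in mixed:
        if origin == 1:
            # A book phrase is familiar only if the entry before it is the same phrase.
            (familiar if previous == phrase else unfamiliar).append(phrase)
        previous = phrase
    return familiar, unfamiliar


if __name__ == "__main__":
    corpus = ["a sight for sore", "sight for sore eyes", "the tables are turned"]
    book = ["sight for sore eyes", "for sore eyes very"]
    known, unknown = phrase_check(corpus, book)
    print("familiar:  ", known)
    print("unfamiliar:", unknown)
```

In this sketch the sort dominates the running time, which is why the Quicksort test is a reasonable proxy for how a machine would handle the real corpus.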
@Neuromancer: When you have time (this year) I would like to hear your opinion on this subject (monstrous phrase-checking) which has been, is and will be in my sight for a long time.
Wanna salute all with one of my favorite video-songs ever: P!nk - Funhouse (http://www.youtube.com/watch?v=Jdjtqu3XK4U), the pianist is so joyful and charming.
Sanmayce
03-30-2012, 10:07 AM
Just wanted to take a look at X79, and it is amazing, it blows houses away:
Gigabyte X79 UD3, i7-3960X 4590MHz, Quad Channel at 1020MHz at 9-11-10-28 clocks:
Sandra reports a memory bandwidth of 49GB/s, i.e. about 4x12 (with a limit of 4x12.8); it simply silenced me.
The info was taken from:
http://www.ninjalane.com/reviews/motherboards/ga-x79-ud3/page10.aspx
Sanmayce
04-01-2012, 07:34 AM
To see how the Gigabyte 990FXA UD5 with the AMD 1090T is positioned against the Gigabyte X79 UD3 with the i7-3960X from the post above:
Stock Phenom II X6 1090T 'Thuban':
Core speed 3214MHz, Bus speed 200MHz, dual channel at 803MHz at 9-9-9-28 clocks
Sandra says for Integer Memory Bandwidth: 12.54GB/s
Overclocked Phenom II X6 1090T 'Thuban':
Core speed 4125MHz, Bus speed 250MHz, dual channel at 1000MHz at 9-9-9-24 clocks
Sandra says for Integer Memory Bandwidth: 19.28GB/s
Stock i7 3960X 'Sandy Bridge-E':
Core speed 3600MHz, Bus speed 100MHz, quad channel at 800MHz at 11-11-13-28 clocks
Sandra says for Integer Memory Bandwidth: 39GB/s
Overclocked i7 3960X 'Sandy Bridge-E':
Core speed 4590MHz, Bus speed 127MHz, quad channel at 1020MHz at 9-11-10-28 clocks
Sandra says for Integer Memory Bandwidth: 49GB/s :clapping:
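To put the Sandra figures above in perspective, here is a rough sketch that compares them with the theoretical peak of each memory configuration. The assumptions are mine, not Sandra's: each DDR3 channel is 64 bits (8 bytes) wide, the listed MHz is the memory I/O clock with two transfers per clock, and 1GB means 10^9 bytes.

```python
BYTES_PER_TRANSFER = 8    # assumption: 64-bit wide DDR3 channel
TRANSFERS_PER_CLOCK = 2   # assumption: double data rate, listed MHz is the I/O clock


def peak_gb_per_s(channels, clock_mhz):
    """Theoretical peak bandwidth in GB/s (decimal) for the given channel count and clock."""
    return channels * clock_mhz * TRANSFERS_PER_CLOCK * BYTES_PER_TRANSFER / 1000.0


# (name, channels, memory clock in MHz, Sandra result in GB/s), all taken from the post
SYSTEMS = [
    ("Stock 1090T", 2, 803, 12.54),
    ("Overclocked 1090T", 2, 1000, 19.28),
    ("Stock i7-3960X", 4, 800, 39.0),
    ("Overclocked i7-3960X", 4, 1020, 49.0),
]

if __name__ == "__main__":
    for name, channels, mhz, measured in SYSTEMS:
        peak = peak_gb_per_s(channels, mhz)
        share = measured / peak * 100
        print(f"{name:22s} peak {peak:6.2f} GB/s, Sandra {measured:5.2f} GB/s, {share:4.1f} % of peak")
```

Under these assumptions the stock quad-channel setup has a peak of 51.2GB/s, which is the 4x12.8 limit mentioned in the earlier post, and both i7 results land at roughly three quarters of their respective peaks.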