My Xeon machine has 2 CPUs, each with 6 cores.
My application performs a CPU-intensive calculation on an image.
The application runs n threads for k iterations, each thread working on its own image (child buffer) of the same size.
I noticed that the more threads I run, the longer each thread takes per iteration.
I start at 0.83 ms per iteration with a single thread running alone and end up at 1.3 ms per thread with 12 threads.
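For anyone who wants to reproduce the measurement, the setup described above can be sketched roughly as follows. This is a minimal sketch, not the real application: `process_image`, `run_benchmark` and `ThreadResult` are hypothetical names, the kernel is a trivial stand-in for the SSE4 code, and the buffer is assumed to hold one byte per channel.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real SSE4 kernel: rewrites every byte of the image.
inline void process_image(std::vector<std::uint8_t>& img) {
    for (auto& px : img) px = static_cast<std::uint8_t>(px * 3u + 1u);
}

struct ThreadResult {
    double ms_per_iter = 0.0;  // mean wall-clock time per iteration for this thread
    unsigned checksum = 0;     // sum of the final image bytes; keeps the work observable
};

// Starts n_threads workers. Each one allocates its own private buffer and runs
// k_iters passes over it, timing only the iteration loop.
// Assumption: 100 x 100 x 3 image with one byte per channel.
inline std::vector<ThreadResult> run_benchmark(unsigned n_threads, unsigned k_iters,
                                               std::size_t image_bytes = 100 * 100 * 3) {
    assert(n_threads > 0 && k_iters > 0);
    std::vector<ThreadResult> results(n_threads);
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&results, t, k_iters, image_bytes] {
            std::vector<std::uint8_t> img(image_bytes, 1);  // private per-thread image
            const auto start = std::chrono::steady_clock::now();
            for (unsigned i = 0; i < k_iters; ++i) process_image(img);
            const auto stop = std::chrono::steady_clock::now();
            const std::chrono::duration<double, std::milli> elapsed = stop - start;
            unsigned sum = 0;
            for (auto px : img) sum += px;
            results[t].ms_per_iter = elapsed.count() / k_iters;
            results[t].checksum = sum;
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Calling `run_benchmark(1, k)` and then `run_benchmark(12, k)` and comparing the `ms_per_iter` values gives the kind of per-thread comparison quoted above. Recording each iteration's time separately, instead of only the mean, would also show the fluctuations.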
Pinning one thread per core using SetAffinityMask made no improvement.
Another problem arises with a high number of threads: there are many more, and much bigger, fluctuations in the time per iteration.
The code itself is mostly SSE4 code and the images are 100x100x3, so there should not be any cache problem.
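As a rough check of that footprint, the arithmetic can be written out as below. This is only a sketch under one assumption the post does not state: one byte per channel. The helper names are hypothetical, and any extra input, output or scratch buffers per thread would add to these totals.

```cpp
#include <cstddef>

// Assumption: one byte per channel (the post gives only the 100x100x3 dimensions).
constexpr std::size_t kBytesPerChannel = 1;

// Size of a single image buffer in bytes.
constexpr std::size_t image_size_bytes(std::size_t width, std::size_t height,
                                       std::size_t channels) {
    return width * height * channels * kBytesPerChannel;
}

// Combined size of the private image buffers when every thread has its own copy.
constexpr std::size_t working_set_bytes(std::size_t threads) {
    return threads * image_size_bytes(100, 100, 3);
}

static_assert(image_size_bytes(100, 100, 3) == 30000, "one image is 30,000 bytes");
static_assert(working_set_bytes(12) == 360000, "12 images are 360,000 bytes in total");
```

These figures can be compared against the per-core and shared cache sizes of the specific Xeon model to confirm whether the assumption holds.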
I would appreciate any idea...