SSE/SSE2 INTRINSICS CODES
Given a 4x4 matrix X: X[4][4] = X00 X01 X02 X03; X10 X11 X12 X13; X20 X21 X22 X23; X30 X31 X32 X33. If row X00 X01 X02 X03 is denoted by x0, X10 X11 X12 X13 is denoted by x1, X20 X21 X22 X23 is denoted by x2...
Error in documentation of color conversion
I am using Intel compiler 11.1.054 and I have the ippiman.pdf of March 2009. On page 273 there is "Example 6-1 Using the Function ippiRGBToYUV". The example returns an error due to a wrong ROI size. It...
Xeon X5680 slowdown using multithreading
My Xeon has 2 CPUs, each with 6 cores. My application performs a CPU-intensive calculation on an image. The application runs n threads, each with its own image (child buffer) of the same size for k...
Haswell New Instructions posted
A full specification for the Haswell (2013) new instructions was just posted to the programmer's reference manual at http://software.intel.com/file/36945. A blog will be coming shortly. -Mark Buxton
Looking for efficient way to convert float (32 bit) aligned buffer to short...
I wrote C code and AVX code to convert an aligned buffer of size 1920*1280*3 from float to short. The AVX implementation is 3 times slower than the C code. Here is the AVX code for the...
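For reference, a float-to-short conversion of this kind can be sketched with plain SSE2. This is not the poster's AVX code, which is cut off above. The function name `float_to_short` is hypothetical, and the sketch assumes saturation to the int16 range, rounding under the default MXCSR mode (round to nearest even), and inputs that fit in a signed 32-bit integer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n floats to saturated 16-bit integers.
 * Unaligned loads/stores are used so the sketch works for any pointer;
 * with 16-byte aligned buffers _mm_load_ps/_mm_store_si128 also apply.
 * Assumes every input fits in int32 (out-of-range floats convert to
 * INT32_MIN and would therefore saturate to -32768). */
static void float_to_short(const float *src, int16_t *dst, size_t n)
{
    size_t i = 0;

    /* Main loop: 8 floats in, 8 shorts out per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m128i lo = _mm_cvtps_epi32(_mm_loadu_ps(src + i));      /* round to int32 */
        __m128i hi = _mm_cvtps_epi32(_mm_loadu_ps(src + i + 4));
        /* Pack two int32x4 vectors into one int16x8 with signed saturation. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packs_epi32(lo, hi));
    }

    /* Scalar tail, using the same rounding mode as the vector path. */
    for (; i < n; ++i) {
        int v = _mm_cvtss_si32(_mm_set_ss(src[i]));
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        dst[i] = (int16_t)v;
    }
}
```

Calling `float_to_short(src, dst, 1920 * 1280 * 3)` would cover the buffer size mentioned in the post. A loop like this mostly streams memory, so wider registers alone are not guaranteed to speed it up.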
1024 bit AVX
The following blog article describes AVX as having been designed for use with registers of up to 1024 bits. http://electronicdesign.com/article/digital/intel_s_avx_scales_to_1024_b... Is this for...
Icl recompiles project whenever a single file is changed.
I am compiling a project with a hundred C++ files. Whenever I change a C++/h file, the Intel compiler recompiles the whole project. I use: Intel C++ Composer XE 2011 Integration for Microsoft...
Slow down when running multiple threads with exact same algorithm
I am using a Xeon X5680 3.33 GHz dual CPU with 6 cores each, Windows 7 64-bit, 12 GB RAM. I am running a filter on an image of size 640x480. Using a single thread to apply the filter on the image results 6.5 ms...
Intrinsic guide 2.6 error in documentation
In the documentation of the intrinsic _mm_mulhrs_epi16, the shift right should be 15 and not 14.
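To make the 14-versus-15 question concrete, here is a scalar sketch of one 16-bit lane. Intel's instruction reference describes PMULHRSW as a shift right by 14, an add of 1, and then a drop of the lowest bit, which amounts to a rounded shift by 15 overall. The function names are hypothetical, and the code assumes the compiler implements `>>` on negative signed values as an arithmetic shift, as mainstream compilers do.

```c
#include <stdint.h>

/* Two-step form: shift by 14, add the rounding bit, drop one more bit.
 * Assumes arithmetic right shift for negative signed values. */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* full 32-bit signed product */
    int32_t tmp = (product >> 14) + 1;          /* keep one extra bit, then round */
    return (int16_t)(tmp >> 1);                 /* net effect: shift right by 15 */
}

/* Equivalent single-step form: add half an LSB, then shift by 15. */
static int16_t mulhrs_lane_shift15(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)((product + 0x4000) >> 15);
}
```

Both forms give the same result for every pair of inputs, so a description using a shift of 14 is correct only if it also includes the final one-bit drop.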
_mm_unpackhi_epi8 and _mm_unpacklo_epi8 to convert 16 signed chars into 2...
I am using the _mm_unpacklo_epi16 and _mm_unpackhi_epi16 with a second argument vector of 0s to convert signed/unsigned short vectors into 2 signed/unsigned integer vectors, i.e.: __m128i lowVec =...
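Interleaving with a zero vector zero-extends each element, which is only correct for unsigned or non-negative values. For signed input, the interleaved vector has to carry the sign bits instead. The sketch below shows one SSE2-only way to do that for the 8-bit case named in the title. The post is truncated, so the 16-bit target type is an assumption, and the function name `widen_s8_to_s16` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: widen 16 signed 8-bit values to 16 signed
 * 16-bit values using only SSE2. */
static void widen_s8_to_s16(const int8_t in[16], int16_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* 0xFF in each byte where v is negative, 0x00 elsewhere:
     * exactly the high byte a sign-extended 16-bit value needs. */
    __m128i sign = _mm_cmpgt_epi8(_mm_setzero_si128(), v);

    /* Interleave value bytes (low) with sign bytes (high). */
    __m128i lo = _mm_unpacklo_epi8(v, sign);  /* elements 0..7  */
    __m128i hi = _mm_unpackhi_epi8(v, sign);  /* elements 8..15 */

    _mm_storeu_si128((__m128i *)&out[0], lo);
    _mm_storeu_si128((__m128i *)&out[8], hi);
}
```

The same pattern applies one level up: `_mm_cmpgt_epi16` against zero gives the sign mask to pass as the second argument of `_mm_unpacklo_epi16` and `_mm_unpackhi_epi16`. On SSE4.1 hardware, `_mm_cvtepi8_epi16` and `_mm_cvtepi16_epi32` sign-extend the low half directly.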