Use Synchronization Routines Provided by the Threading API Rather than Hand-Coded Synchronization

Abstract Application programmers sometimes hand-code synchronization routines instead of using the constructs provided by a threading API, either to reduce synchronization overhead or to provide functionality that the existing constructs lack. Unfortunately, hand-coded synchronization routines can hurt the performance of multithreaded applications and make them harder to tune and debug.
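As a minimal sketch of the recommendation, assuming Python's standard threading module stands in for "a threading API" (the abstract names no specific one), the example below guards a shared counter with the API-provided threading.Lock instead of a hand-written flag or spin loop. The function name count_with_lock and its parameters are illustrative only.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads, guarded by threading.Lock."""
    lock = threading.Lock()  # API-provided mutual exclusion, not a hand-coded flag
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            with lock:  # acquire and release are handled by the API
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


result = count_with_lock(num_threads=4, increments=10_000)
print(result)
```

Because the lock comes from the standard library, debuggers and profilers that understand the threading module can usually attribute waiting time to it, which is harder with a custom busy-wait loop.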



Managing Lock Contention: Large and Small Critical Sections

Abstract In multithreaded applications, locks are used to synchronize entry to regions of code that access shared resources. The region of code protected by a lock is called a critical section.
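To make the term concrete, here is a small sketch, assuming Python's threading module and a hypothetical sum_of_squares helper, in which the critical section is deliberately small: each thread does its private computation outside the lock and holds the lock only for the single update of the shared total.

```python
import threading


def sum_of_squares(chunks):
    """Sum the squares of all values; each thread handles one chunk."""
    lock = threading.Lock()
    total = 0

    def worker(chunk) -> None:
        nonlocal total
        # Private work: no shared state is touched, so no lock is needed here.
        partial = sum(x * x for x in chunk)
        # Critical section: the only region that accesses the shared total.
        with lock:
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


result = sum_of_squares([[1, 2, 3], [4, 5], [6]])
print(result)
```

Moving the squaring inside the with block would still be correct but would enlarge the critical section, so threads would spend more time waiting on each other.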



Exploiting Data Parallelism in Ordered Data Streams

Abstract Many compute-intensive applications involve complex transformations of ordered input data to ordered output data. Examples include sound and video transcoding, lossless data compression, and seismic data processing.
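One way to picture the idea, as a sketch under stated assumptions rather than the article's own method: split the ordered input into chunks, transform the chunks in parallel, and reassemble the results in input order. The example uses lossless compression with Python's zlib and relies on ThreadPoolExecutor.map returning results in the order of its inputs. The function names, chunk size, and worker count are hypothetical choices.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_stream(data: bytes, chunk_size: int = 1024, workers: int = 4) -> List[bytes]:
    """Compress fixed-size chunks in parallel, keeping the input order."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even if chunks finish out of order.
        return list(pool.map(zlib.compress, chunks))


def decompress_stream(blocks: List[bytes]) -> bytes:
    """Decompress the blocks sequentially and concatenate them."""
    return b"".join(zlib.decompress(block) for block in blocks)


data = bytes(range(256)) * 40  # 10240 bytes of sample input
blocks = compress_stream(data)
restored = decompress_stream(blocks)
print(len(blocks), restored == data)
```

Compressing each chunk independently is a simplification: it trades some compression ratio for parallelism, and a real transcoder or compressor may need to carry state across chunk boundaries.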



Using Tasks Instead of Threads

Abstract Tasks are a lightweight alternative to threads that provide faster startup and shutdown times, better load balancing, efficient use of available resources, and a higher level of abstraction. Programming models that support task-based programming include Intel® Threading Building Blocks (Intel® TBB) and OpenMP*.
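The task idea can be sketched outside Intel TBB and OpenMP as well. The example below is an illustration only, assuming Python's concurrent.futures as a stand-in task scheduler: the program submits small units of work (tasks) to a pool, and the pool decides which of its few worker threads runs each one. The word_count function and the sample lines are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(line: str) -> int:
    """One small task: count the words in a single line."""
    return len(line.split())


lines = ["one two three", "four five", "six", "seven eight nine ten"]

# Four tasks share two worker threads; no thread is created per task.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(word_count, line) for line in lines]
    counts = [f.result() for f in futures]

total_words = sum(counts)
print(counts, total_words)
```

The code never starts or joins a thread itself, which is the higher level of abstraction the abstract refers to: the pool reuses its workers and balances the tasks among them.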



Detecting Memory Bandwidth Saturation in Threaded Applications

Abstract Memory subsystem components contribute significantly to the performance characteristics of an application. As an increasing number of threads or processes share the limited resources of cache capacity and memory bandwidth, the scalability of a threaded application can become constrained. Memory-intensive threaded applications can suffer from memory bandwidth saturation as more threads are introduced.



Avoiding and Identifying False Sharing Among Threads

Abstract In symmetric multiprocessor (SMP) systems, each processor has a local cache. The memory system must guarantee cache coherence.



FFTW mismatching malloc and free calls result error

When an application uses FFTW and links with Intel® MKL 10.2 update 3, its fftw_malloc and fftw_free calls must be matched consistently. From MKL 10.2 update 3 onwards, fftwl_malloc and fftwl_free call MKL_malloc and MKL_free.



Performance Tools for Software Developers – Memory Function FAQ



Some service functions have become obsolete and will be removed in subsequent releases.

Technical Notes: Starting with version 10.2 of Intel® MKL, the names of some service functions are obsolete and will be removed in subsequent releases.



libmkl_scalapack_lp64.so: undefined reference to `MKL_SCALAPACK_INT'

When linking with Intel® Math Kernel Library 10.2 update 2 on an SGI* workstation with an Intel® Xeon processor, the following errors are reported:

  libmkl_scalapack_lp64.so: undefined reference to `MKL_SCALAPACK_INT'
  libmkl_scalapack_lp64.so: undefined reference to `Cdsendrecv'

The errors occur only when SGI's MPI library is used together with the libmkl_blacs_sgimpt_lp64.a library from MKL. This is a known issue that will be fixed in a future version.

Workaround: compile a C source file containing the two lines below and link it in addition to MKL:

  #include
  int MKL_SCALAPACK_INT = (int) MPI_INT;

