Does VTune analyzer work with Xen-enabled kernels?

Sampling is not supported on Xen-enabled kernels. Access to the PMU hardware conflicts in virtual environments, and we do not have a solution for event-based sampling data collection.

Parallelism in the Intel® Math Kernel Library

Abstract: Software libraries provide a simple way to get immediate performance benefits on multicore, multiprocessor, and cluster computing systems. The Intel® Math Kernel Library (Intel® MKL) contains a large collection of functions that can benefit math-intensive applications

Automatic Parallelization with Intel® Compilers

Abstract: Multithreading an application to improve performance can be a time-consuming activity. For applications where most of the computation is carried out in simple loops, the Intel® compilers may be able to generate a multithreaded version automatically.

Use Non-blocking Locks When Possible

Abstract: Threads synchronize on shared resources by executing synchronization primitives offered by the supporting threading implementation. These primitives (such as mutexes and semaphores) allow a single thread to own the lock, while the other threads either spin or block depending on their timeout mechanism.
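
As a rough illustration of the non-blocking idea, the sketch below uses Python's standard `threading.Lock`, whose `acquire(blocking=False)` returns immediately instead of waiting. The names `try_append`, `shared_log`, and `deferred` are hypothetical and chosen only for this example; the original article may use a different threading API.

```python
import threading

lock = threading.Lock()   # protects the hypothetical shared resource
shared_log = []           # the shared resource itself
deferred = []             # work set aside when the lock was busy


def try_append(item):
    """Append to shared_log if the lock is free; otherwise defer the item.

    acquire(blocking=False) returns True or False right away instead of
    putting the calling thread to sleep until the lock is released.
    """
    if lock.acquire(blocking=False):
        try:
            shared_log.append(item)
            return True
        finally:
            lock.release()
    deferred.append(item)  # keep going and retry later instead of waiting
    return False


# Uncontended case: the lock is free, so the call succeeds.
first = try_append("a")

# Contended case: the lock is already held, so the call returns at once.
lock.acquire()
second = try_append("b")
lock.release()
```

In this sketch the caller decides what to do when the lock is busy, which is the point of a non-blocking acquire: the thread can do other useful work rather than idle.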

Choosing Appropriate Synchronization Primitives to Minimize Overhead

Abstract: When threads wait at a synchronization point, they are not doing useful work. Unfortunately, some degree of synchronization is usually necessary in multithreaded programs, and explicit synchronization is sometimes even preferred over data duplication or complex non-blocking scheduling algorithms, which in turn have their own problems.

Use Synchronization Routines Provided by the Threading API Rather than Hand-Coded Synchronization

Abstract: Application programmers sometimes write hand-coded synchronization routines rather than using constructs provided by a threading API, either to reduce synchronization overhead or to provide functionality that existing constructs do not offer. Unfortunately, hand-coded synchronization routines may have a negative impact on the performance, performance tuning, or debugging of multithreaded applications.
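
To make the contrast concrete, here is a minimal sketch in Python, used only as a stand-in for whatever threading API an application actually uses. Instead of a hand-coded loop that polls a shared flag, the waiting thread uses the library's `threading.Event`. The names `ready`, `results`, and `consumer` are hypothetical.

```python
import threading

ready = threading.Event()  # library-provided signal; replaces a polled flag
results = []


def consumer():
    # wait() blocks inside the threading library until set() is called,
    # so the thread does not burn CPU time in a hand-written spin loop.
    ready.wait()
    results.append("consumed")


t = threading.Thread(target=consumer)
t.start()
ready.set()   # signal the waiting thread
t.join()
```

Because the wait goes through the threading library, tools that understand the library can also see it, which is harder with a private spin loop.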

Managing Lock Contention: Large and Small Critical Sections

Abstract: In multithreaded applications, locks are used to synchronize entry to regions of code that access shared resources. The region of code protected by these locks is called a critical section.
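
The following sketch shows one common way to keep a critical section small, assuming the expensive part of the work does not touch shared state. It is written in Python for brevity; `expensive`, `worker`, and `totals` are hypothetical names, not taken from the original article.

```python
import threading

totals = {"sum": 0}             # shared resource
totals_lock = threading.Lock()  # guards every update to totals


def expensive(x):
    # Stand-in for real per-item work that needs no shared state.
    return x * x


def worker(values):
    # Do the heavy computation outside the critical section...
    local = sum(expensive(v) for v in values)
    # ...and hold the lock only for the brief shared update.
    with totals_lock:
        totals["sum"] += local


chunks = [range(0, 100), range(100, 200), range(200, 300)]
threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread takes the lock once per chunk rather than once per item, so the lock is held briefly and acquired rarely.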

Exploiting Data Parallelism in Ordered Data Streams

Abstract: Many compute-intensive applications involve complex transformations of ordered input data to ordered output data. Examples include sound and video transcoding, lossless data compression, and seismic data processing.
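
A minimal sketch of the idea, assuming the input can be split into chunks that are transformed independently (real transcoders and compressors often have dependencies between chunks that this ignores). It uses Python's `concurrent.futures`, whose `Executor.map` returns results in input order; `transform` and `process_stream` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(chunk):
    # Hypothetical per-chunk transformation; stands in for real encoding work.
    return chunk.upper()


def process_stream(chunks, workers=4):
    """Transform chunks concurrently while preserving their input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the inputs, regardless of
        # which worker finishes first.
        return list(pool.map(transform, chunks))


output = process_stream(["ab", "cd", "ef", "gh"])
```

The chunks may finish in any order internally, but the caller still sees an ordered output stream.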

Using Tasks Instead of Threads

Abstract: Tasks are a lightweight alternative to threads that provide faster startup and shutdown times, better load balancing, an efficient use of available resources, and a higher level of abstraction. Three programming models that include task-based programming are Intel® Threading Building Blocks (Intel® TBB), and OpenMP*
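
The abstract concerns Intel® TBB and OpenMP*; as a language-neutral illustration of the same pattern, the sketch below uses Python's `concurrent.futures` as a stand-in. Work items are submitted as tasks to a small pool of reusable worker threads instead of each getting its own thread. `count_words` and `lines` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(line):
    # One small unit of work: a task, not a dedicated thread.
    return len(line.split())


lines = ["tasks are light", "threads are heavy", "pools reuse workers"]

# Two worker threads serve all tasks; the pool schedules tasks onto
# whichever worker is free, so no thread is created per work item.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(count_words, line) for line in lines]
    total_words = sum(f.result() for f in futures)
```

The number of tasks is decoupled from the number of threads, which is what lets a task scheduler balance load across the available workers.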

Level Up 2010 FAQs

Stay tuned for updates to the FAQ document for Level Up 2010.
