Contents
Discover Where Vectorization Will Pay Off The Most
Identify Performance Bottlenecks Using Roofline
Identify High-impact Opportunities to Offload to GPU
Intel® Advisor is composed of a set of tools to help ensure Fortran, C, C++, OpenCL™, and Data Parallel C++ (DPC++) applications realize full performance potential on modern processors.
Intel Advisor is available as a standalone installation and as part of Intel® oneAPI Base Toolkit.
Intel Advisor enables you to analyze your code from the following perspectives:
This document summarizes typical workflows to get started improving the performance potential of your application with Intel Advisor.
By default, the Intel® Advisor <install-dir> is as follows:
On Windows* OS:
C:\Program Files (x86)\Intel\oneAPI\advisor\<version> (on certain systems, instead of Program Files (x86), the directory name is Program Files)
On Linux* OS:
On macOS*:
/opt/intel/oneapi/advisor/<version>
On Windows* OS:
To set up the environment for Intel Advisor, run the <install-dir>\env\vars.bat script.
On Linux* OS:
To set up the environment for Intel Advisor, run the source <install-dir>/env/vars.sh command.
On macOS*:
To set up the environment for Intel Advisor, run one of the following commands:
For step-by-step instructions on viewing your Intel Advisor results on a macOS* machine, see the Intel Advisor Cookbook: Analyze Performance Remotely and Visualize Results on a Local macOS* System.
The Vectorization and Code Insights perspective is a vectorization analysis toolset that enables you to identify the loops that will benefit most from vector parallelism. Profile your application using the Survey tool to locate un-vectorized and under-vectorized time-consuming functions/loops and to calculate the estimated performance gain achieved by vectorization.
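As a rough mental model for reading an estimated gain, the sketch below compares it with the ideal speedup given by the number of vector lanes. This is an illustrative assumption, not a description of how Intel Advisor computes its metrics; the function names and the sample loop are hypothetical.

```python
def vector_lanes(register_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


def vector_efficiency(estimated_gain, lanes):
    """Fraction of the ideal lane-count speedup that the loop achieves."""
    return estimated_gain / lanes


# Hypothetical loop: 32-bit floats in 256-bit registers, 5x faster than scalar code.
lanes = vector_lanes(256, 32)
print(lanes)                          # 8
print(vector_efficiency(5.0, lanes))  # 0.625
```

A loop that is vectorized but sits well below 1.0 in this ratio is a typical under-vectorized candidate worth a closer look.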
There are two ways to run the Vectorization and Code Insights perspective: from the Intel® Advisor GUI and from CLI. Intel Advisor enables you to open results collected using both methods in the GUI.
In the Analysis Workflow pane, use the drop-down menu to select the Vectorization and Code Insights perspective, set the data collection accuracy level to Low, and click the button to run it. At this accuracy level, Intel® Advisor runs the Survey analysis and collects performance metrics of your application to locate under- and non-vectorized hotspots whose optimization can improve the total execution time of your application. For details about data collection accuracy presets, see the respective section in the Intel Advisor User Guide. Upon completion, Intel Advisor generates a Summary Report.
Vectorization and Code Insights Summary includes the most important information about your application and gives you hints for further optimization steps. The report includes the following sections:

To run the Survey analysis and collect performance metrics of your application using the advisor command line interface, run the following command:
advisor --collect=survey --project-dir=./advi -- myApplication
Upon completion, Intel Advisor enables you to open the summary of collected results in the GUI or generate a Survey report in the CLI using the following command:
advisor --report=survey --project-dir=./advi
In the terminal, Intel Advisor displays a Survey report that shows the top ten most time-consuming functions/loops. By default, a copy of this report is saved to <project-dir>/e<NNN>/hs<NNN>/data.0/advisor-survey.txt. For details about generating CLI reports, see the respective section in the Intel Advisor User Guide or use the following command in your terminal:
advisor --help report
If all loops are vectorizing properly and performance is satisfactory, you are done! Congratulations!
If one or more loops is not vectorizing properly and performance is unsatisfactory:
Improve application performance using various Intel Advisor features to guide your efforts, such as:
Information in the Performance Issues column and associated Recommendations tab.
Suggestions in Examine Non-Vectorized and Under-Vectorized Loops in the Intel Advisor User Guide.
Optional Dependencies and Memory Access Patterns (MAP) analyses to help you dig deeper.
Rebuild your modified code.
Run another Survey analysis to verify all loops are vectorizing properly and performance is satisfactory.
Explore a vectorization use case in the Intel Advisor Cookbook: Analyze Vectorization and Memory Aspects of an MPI Application.
CPU / Memory Roofline Insights perspective enables you to visualize actual performance against hardware-imposed performance ceilings, as well as determine the main limiting factor (memory bandwidth or compute capacity).

There are two ways to run the CPU / Memory Roofline Insights perspective: from the Intel® Advisor GUI and from CLI. Intel Advisor enables you to open results collected using both methods in the GUI.
In the Analysis Workflow pane, use the drop-down menu to select the CPU / Memory Roofline Insights perspective, set the data collection accuracy level to Low, and click the button to run it. At this accuracy level, Intel Advisor:
Measures the hardware limitations of your machine and collects loop/function timings using the Survey analysis.
Collects floating-point and integer operations data, and memory data using the Characterization analysis.
For details about data collection accuracy presets, see Intel Advisor User Guide: CPU Roofline Accuracy Presets. Upon completion, Intel Advisor displays a Roofline chart.
The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:
In general:
The greater the distance between a dot and the highest achievable roofline, the more room for optimization a function/loop has.
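The relationship between a dot and the rooflines can be sketched with the classic roofline bound: attainable performance is the lower of the compute roof and the memory roof at a given arithmetic intensity. The snippet below is a minimal illustration under assumed numbers; the machine peaks and the loop measurements are hypothetical, and Intel Advisor measures the real values for you.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def headroom(achieved_gflops, arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Ratio of the roofline bound to achieved performance (1.0 = on the roof)."""
    bound = attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs)
    return bound / achieved_gflops


# Hypothetical machine: 100 GFLOPS compute roof, 20 GB/s memory roof.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = 20.0

# Hypothetical loop: 0.5 FLOP/byte, achieving 4 GFLOPS.
print(attainable_gflops(0.5, PEAK_GFLOPS, BANDWIDTH_GBS))  # 10.0, memory bound
print(headroom(4.0, 0.5, PEAK_GFLOPS, BANDWIDTH_GBS))      # 2.5
```

In this example the loop sits under the diagonal memory roof, so reducing memory traffic is likely to help more than adding compute optimizations.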

To run the CPU / Memory Roofline Insights perspective using the advisor command line interface, use the following command:
advisor --collect=roofline --project-dir=./advi --search-dir src:p=./advi -- myApplication
This command runs in batch mode and performs two analyses one after the other: Survey, then Characterization (Trip Counts and FLOP).
To view the achieved performance of your application against hardware-imposed performance ceilings on an interactive Roofline chart, open the collected results in the Intel Advisor GUI or use the following command to generate an interactive HTML Roofline report:
advisor --report=roofline --report-output=./advi/advisor-roofline.html --project-dir=./advi
Here, the --report-output option specifies the directory and the HTML file into which Intel Advisor saves the generated report.
For details about generating CLI reports, see the respective section in the Intel Advisor User Guide or use the following command in your terminal:
advisor --help report
Offload Modeling perspective enables you to identify high-impact opportunities to offload to GPU as well as the areas that are not profitable to offload. It provides performance speed-up projection on accelerators along with offload overhead estimation and pinpoints accelerator performance bottlenecks.
There are two ways to run the Offload Modeling perspective: from the Intel® Advisor GUI and from CLI. Intel Advisor enables you to open results collected using both methods in the GUI or in your web browser.
In the Analysis Workflow pane, use a drop-down menu to select the Offload Modeling perspective, set data collection accuracy level to Medium. At this accuracy level, Intel® Advisor:
Click the button to run the perspective.
For details about data collection accuracy presets, see Intel Advisor User Guide: Offload Modeling Accuracy Presets.
Upon completion, Intel Advisor displays an Offload Modeling Summary that offers you information on total potential speed-up of your application, top 5 offloaded code regions in your call tree, top 5 regions that are not profitable to offload, number of offloaded functions/loops, and a fraction of offloaded code relative to total time of original application.
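To make the idea of profitable and non-profitable regions concrete, here is a simplified sketch: a region is worth offloading only if its accelerator compute time plus data transfer and launch overhead is lower than its CPU time. This is an illustrative assumption, not Intel Advisor's actual performance model; the region names and timings are hypothetical.

```python
def accelerated_time(compute_s, transfer_s, overhead_s):
    """Time of a region on the accelerator, including offload costs."""
    return compute_s + transfer_s + overhead_s


def project_offload(total_cpu_s, regions):
    """regions maps name -> (cpu_time_s, accelerated_time_s).

    Offloads only regions that run faster on the accelerator and
    returns (projected_total_s, offloaded_names).
    """
    projected = total_cpu_s
    offloaded = []
    for name, (cpu_s, acc_s) in regions.items():
        if acc_s < cpu_s:
            projected -= cpu_s - acc_s
            offloaded.append(name)
    return projected, offloaded


# Hypothetical application: 10 s in total on the CPU.
regions = {
    "stencil_loop": (6.0, accelerated_time(1.0, 0.25, 0.25)),     # 1.5 s on GPU
    "small_copy_loop": (1.0, accelerated_time(0.25, 1.5, 0.25)),  # 2.0 s on GPU
}
projected, offloaded = project_offload(10.0, regions)
print(projected)                   # 5.5
print(offloaded)                   # ['stencil_loop']
print(round(10.0 / projected, 2))  # 1.82
```

The second region is dominated by data transfer, which is why it stays on the CPU in this sketch even though its compute part is faster on the accelerator.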

Intel Advisor generates an interactive HTML report that is stored in the <project-dir>/e<NNN>/pp<NNN>/data.0. You can open the HTML report in your web browser.
To collect data and model your application performance on a target GPU using the advisor command line interface, do the following:
advisor --collect=survey --stackwalk-mode=online --static-instruction-mix --project-dir=./advi -- myApplication
advisor --collect=tripcounts --flop --stacks --enable-cache-simulation --data-transfer=light --target-device=gen9_gt2 --project-dir=./advi -- myApplication
Where:
advisor --collect=projection --no-assume-dependencies --config=gen9_gt2 --project-dir=./advi
Where:
For details about CLI options, see Intel Advisor User Guide: Command Line Interface.
Upon completion, open the collected results in the Intel Advisor GUI or open the interactive HTML report that is stored in the <project-dir>/e<NNN>/pp<NNN>/data.0 using your web browser.
After running the Offload Modeling perspective, you need to identify whether your top hotspots have loop-carried dependencies that might be show-stoppers for offloading. To do that:
button to run it.
advisor --collect=projection --assume-dependencies --config=gen9_gt2 --project-dir=./advi
For details about checking for loop-carried dependencies, see the respective section in the Intel Advisor User Guide.
View useful information about Offload Modeling in the Offload Modeling Resources page.
Explore more ways to run Offload Modeling perspective from command line interface in Intel Advisor User Guide: Run Offload Modeling from Command Line.
Explore typical scenarios of optimizing GPU usage described in the Intel Advisor Cookbook:
GPU Roofline Insights perspective enables you to estimate and visualize actual performance of GPU kernels using benchmarks and hardware metric profiling against hardware-imposed performance ceilings, as well as determine the main limiting factor.
There are two ways to run GPU Roofline Insights perspective: from the Intel® Advisor GUI and from CLI. Intel Advisor enables you to open results collected using both methods in the GUI.
In the Analysis Workflow pane, use the drop-down menu to select the GPU Roofline Insights perspective, set the data collection accuracy level to Low, and click the button to run it. At this accuracy level, Intel Advisor:
Measures the hardware limitations and collects timings and memory data for OpenCL™, OpenMP*, oneAPI Level Zero (Level Zero), and Data Parallel C++ (DPC++) kernels using the Survey analysis with GPU profiling.
Collects floating-point and integer operations data using the Trip Counts and FLOP analysis with GPU profiling.
For details about data collection accuracy presets, see Intel Advisor User Guide: GPU Roofline Accuracy Presets. Upon completion, Intel Advisor displays a GPU Roofline Summary. Switch to the GPU Roofline Regions tab to view the Roofline Chart and identify the main factors limiting the performance of your application.
GPU profiling is applicable only to Intel® Processor Graphics.
A Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:
Arithmetic intensity (x axis) - measured in floating-point operations (FLOPS) per byte for the FLOAT Roofline chart and in integer operations (INTOPS) per byte for the INT Roofline chart, where the operation count is based on the kernel algorithm and the bytes are those transferred between the GPU and memory
Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) for the FLOAT Roofline chart and in billions of integer operations per second (GINTOPS) for the INT Roofline chart
In general:
The size and color of each dot represent relative execution time for each kernel. Large red dots take the most time, so are the best candidates for optimization. Small green dots take less time, so may not be worth optimizing.
Diagonal lines indicate memory bandwidth limitations preventing kernels from achieving better performance without some form of optimization.
Depending on your system configuration the following rooflines might be available on the Roofline chart:
L3 cache roof: Represents the maximal bandwidth of the L3 cache for your current graphics hardware. Measured using an optimized sequence of load operations, iterating over an array that fits entirely into L3 cache.
SLM cache roof: Represents the maximal bandwidth of the Shared Local Memory for your current graphics hardware. Measured using an optimized sequence of load and store operations that work only with SLM.
GTI roof: Represents the maximum bandwidth between the GPU and the rest of the SoC. This estimate is calculated via analytical formula based on the maximum frequency of your current graphics hardware.
DRAM roof: Represents the maximal bandwidth of the DRAM memory available to your current graphics hardware. Measured using an optimized sequence of load operations, iterating over an array that does not fit in GPU caches.
Horizontal lines indicate compute capacity limitations preventing kernels from achieving better performance without some form of optimization.
A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine. However, not all kernels can utilize maximum machine capabilities.
The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
The GPU Roofline chart is based on a CPU Roofline chart layout, but there are some differences:
The dots on the chart correspond to OpenCL, OpenMP, Level Zero and DPC++ kernels, while in the CPU version, they correspond to individual loops.
Some displayed information and controls (for example, thread/core count) are not relevant to GPU Roofline. For more information, see the table below.
The GPU Roofline chart enables you to view the arithmetic intensity of one kernel at multiple memory levels. To do so, double-click the dot representing this kernel, or select it and press ENTER. The dots that appear on the Roofline chart correspond to the different memory levels used to calculate arithmetic intensity. Hover over a dot to identify its arithmetic intensity. To show or hide certain dots on the chart, use the Memory Level drop-down filter.
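The reason one kernel produces several dots is that the same operation count is divided by a different amount of traffic at each memory level. The sketch below shows the arithmetic under assumed numbers; the operation count and per-level traffic are hypothetical and would come from the collected data in practice.

```python
def arithmetic_intensity(operations, bytes_transferred):
    """Operations per byte of traffic at one memory level."""
    return operations / bytes_transferred


# Hypothetical kernel: 8 GFLOP of work with different traffic per memory level.
kernel_flop = 8.0e9
traffic_bytes = {"L3": 16.0e9, "SLM": 4.0e9, "GTI": 2.0e9}

for level, nbytes in traffic_bytes.items():
    # Less traffic at a level moves that level's dot to the right on the chart.
    print(level, arithmetic_intensity(kernel_flop, nbytes))
```

With these numbers the intensities are 0.5, 2.0, and 4.0 FLOP/byte for L3, SLM, and GTI respectively, so each dot should be compared against the roof of its own memory level.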

To run the GPU Roofline Insights perspective using the advisor command line interface, use the following command:
advisor --collect=roofline --profile-gpu --project-dir=./advi --search-dir src:p=./advi -- myApplication
This command runs in batch mode and performs two analyses one after the other:
advisor --collect=survey --profile-gpu --project-dir=./advi --search-dir src:p=./advi -- myApplication
advisor --collect=tripcounts --no-trip-counts --flop --profile-gpu --project-dir=./advi --search-dir src:p=./advi -- myApplication
Where:
To view the achieved performance of your application against hardware-imposed performance ceilings on an interactive Roofline chart, open the collected results in the Intel Advisor GUI or use the following command to generate an interactive HTML Roofline report:
advisor --report=roofline --profile-gpu --report-output=./advi/advisor-roofline.html --project-dir=./advi
Here, the --report-output option specifies the directory and the HTML file into which Intel Advisor saves the generated report.
By default, Intel Advisor generates a FLOAT Roofline chart. To switch to the INT Roofline chart, add the --data-type=int option to your command.
For details about generating CLI reports, see the respective section in the Intel Advisor User Guide or use the following command in your terminal:
advisor --help report
Use the GPU Roofline Summary to compare performance of your application on a CPU and on a GPU device.
Investigate performance metrics for your kernels and recommendations with possible optimization steps in the GPU Code Analytics pane.
Explore a use case for optimizing GPU usage described in Intel Advisor Cookbook: Identify Code Regions to Offload to GPU and Visualize GPU Usage.
The Threading perspective enables you to identify the best candidates for parallelizing, prototype threading, and check whether data dependencies prevent parallelizing certain functions/loops.
There are two ways to run the Threading perspective: from Intel® Advisor GUI and from CLI. Intel Advisor enables you to open results collected using both methods in the GUI.
To run Threading perspective and improve performance of your application, follow the scenario described below:
Click the button to run the perspective. At this accuracy level, Intel Advisor collects data about the execution time of your functions/loops using the Survey analysis. Upon completion, Intel Advisor generates a Survey report.
The main types of Intel Advisor annotations mark the location of:
A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in one or more parallel threads to distribute work. An effective parallel site typically contains a hotspot that consumes application execution time. To distribute these frequently executed instructions to different tasks that can run at the same time, the best parallel site is not usually located at the hotspot, but higher in the call tree.
One or more parallel tasks within a parallel site. A task is a portion of time-consuming code with data that can be executed in one or more parallel threads to distribute work.
Locking synchronization, where mutual exclusion of data access must occur in the parallel application.
Upon completion, open the Suitability report in the results window.
The Suitability Report predicts maximum speedup based on the inserted annotations and what-if modeling parameters with which you can experiment, such as:
Different hardware configurations and parallel frameworks
Different trip counts and instance durations
Any plans to address parallel overhead, lock contention, or task chunking when you implement your parallel framework code
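To get a feel for how such what-if parameters interact, the following sketch uses a simple Amdahl-style estimate with a per-thread overhead term. It is an illustrative assumption and not the model Intel Advisor uses for the Suitability report; the timings and the overhead value are hypothetical.

```python
def predicted_speedup(serial_s, parallel_s, threads, overhead_per_thread_s=0.0):
    """Amdahl-style estimate with a linear per-thread overhead term."""
    parallel_time = parallel_s / threads + overhead_per_thread_s * threads
    return (serial_s + parallel_s) / (serial_s + parallel_time)


# Hypothetical application: 2 s serial, 8 s inside the annotated parallel site.
print(predicted_speedup(2.0, 8.0, threads=4))  # 2.5

# Compare thread counts with and without 0.125 s of overhead per thread.
for threads in (2, 4, 8, 16):
    ideal = predicted_speedup(2.0, 8.0, threads)
    with_overhead = predicted_speedup(2.0, 8.0, threads, 0.125)
    print(threads, round(ideal, 2), round(with_overhead, 2))
```

Even in this toy model, adding threads stops paying off once overhead grows faster than the parallel work shrinks, which is the kind of trade-off the what-if parameters let you explore.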

Use the Refinement report to examine the collected dependencies data and identify functions/loops with data sharing problems.
To run Threading perspective using advisor command line interface, do the following:
advisor --collect=survey --project-dir=./advi --search-dir=./advi -- myApplication
advisor --report=survey --filter=total-time --project-dir=./advi
Loops/functions with the longest execution time are the best candidates for parallelizing.
advisor --collect=tripcounts --project-dir=./advi --search-dir src:p=./advi -- myApplication
advisor --collect=suitability --project-dir=./advi --search-dir src:p=./advi -- myApplication
advisor --collect=dependencies --project-dir=./advi --search-dir src:p=./advi -- myApplication
Open and examine the collected results in the Intel Advisor GUI or generate reports using CLI:
advisor --report=suitability --project-dir=./advi
By default, Intel Advisor displays the report in the terminal and saves it to <project-dir>/e<NNN>/st<NNN>/advisor-suitability.txt.
advisor --report=dependencies --project-dir=./advi
By default, Intel Advisor displays the report in the terminal and saves it to <project-dir>/e<NNN>/dp<NNN>/advisor-dependencies.txt.
For details about generating CLI reports, see the respective section in the Intel Advisor User Guide or use the following command in your terminal:
advisor --help report
If you decide the predicted maximum speedup benefit is worth the effort to add threading parallelism to your application:
Complete developer/architect design and code reviews about the proposed parallel changes.
Choose one parallel programming framework (threading model) for your application, such as oneTBB, OpenMP*, Microsoft Task Parallel Library* (TPL), or some other parallel framework.
Add the parallel framework to your build environment.
Add parallel framework code to synchronize access to the shared data resources, such as oneTBB or OpenMP locks.
Add parallel framework code to create parallel tasks.
As you add the appropriate parallel code from the chosen parallel framework, you can keep, comment out, or replace the Intel Advisor annotations.
| Resource | Description |
|---|---|
| | Refer to this guide for instructions to get started with the command line, detailed information on analysis types, information on how to use the GUI, and more. |
| | View the most useful resources that can help you achieve better performance of your application using vectorization. |
| Roofline Resources for Intel® Advisor Users | View the most useful resources that can help you identify hardware-imposed ceilings using Intel Advisor CPU/GPU Roofline perspectives. |
| | Explore typical use cases of Intel Advisor. Follow the step-by-step instructions to help effectively use more cores, vectorization, or heterogeneous processing. |
| | Explore a built-in graphical tool that helps you visualize and analyze graphs of oneAPI Threading Building Blocks (oneTBB), OpenMP*, and Data Parallel C++ (DPC++) applications. |
| Analyze Performance Remotely and Visualize Results on a Local macOS* System | View step-by-step instructions for visualizing Intel Advisor perspective results on a macOS machine. |
| | Explore new features of Intel Advisor. |
| | View tutorials that can help you experiment with Intel Advisor sample applications and run different perspectives. |
| Offline Resources | One of the key Vectorization perspective features is GUI-embedded advice on how to fix vectorization issues specific to your code. To help you quickly locate information that augments that GUI-embedded advice, Intel Advisor provides offline compiler mini-guides. You can also find offline Recommendations and Compiler Diagnostic Details advice libraries in the same location as the mini-guides. Each issue and recommendation in these HTML files is collapsible/expandable. Linux* OS: Available offline documentation is installed inside <advisor-install-dir>/documentation/<locale>/. Windows* OS: Available offline documentation is installed inside <advisor-install-dir>\documentation\<locale>\. |
You may encounter the following known issues when viewing the documentation on these systems and browsers:
Microsoft Windows Server* 2012 system: Trusted site prompt appears. Solution: Add about:internet to the list of trusted sites in the Tools > Internet Options > Security tab. You can remove it after you finish viewing the documentation.
Microsoft Internet Explorer* 11 browser: Topics do not appear when you select them in the TOC pane. Solution: Add http://localhost to the list of trusted sites in the Tools > Internet Options > Security tab. You can remove it after you finish viewing the documentation.
Microsoft Edge browser:
Context-sensitive (also known as F1) calls to a specific topic open the title page of the corresponding document instead. Solution: Use a different default browser.
Panes are truncated and a proper style sheet is not applied. Solution: Use a different default browser.
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.