Resource Consumption

The Resource Consumption reports help monitor how HPC resources (CPU, memory, nodes, disk, etc.) are consumed. These insights are valuable for capacity planning, identifying inefficiencies, and ensuring optimal utilization of HPC infrastructure.

Each report focuses on a specific usage metric, such as highest CPU cores used, lowest memory utilization, or failed jobs, and can be filtered by parameters such as provider, region, cluster, and time duration.

Note:

All reports listed below work the same way as HPC001: they support filters, drilldown options, schedule creation, and export capabilities (Excel and PDF).

For a step-by-step walkthrough, see the detailed explanation under HPC001. The same steps apply to all other reports.

HPC001 – Highest CPU Core Usage

This report identifies the project, cluster, queue, user, or job that consumed the highest number of CPU cores within a selected time period.

Filters Available and Usage

The following filters are available:

Provider – e.g., AWS, Azure.
Account – Cloud account to narrow scope.
Region – Geographic data center region.
Cluster – Compute cluster to analyze.
Queue – Queue where jobs ran.
User – User who ran the job.
Group By – Groups data by project, cluster, queue, or user.

Report Filter Usage

When one or more filters are selected, the report displays only the data that matches the selected Project, Account, Region, Cluster, Queue, or User. The Group By option then organizes the results by the chosen grouping criterion.
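The filter-then-group behavior described above can be illustrated with a minimal Python sketch. The job records, the field names (`project`, `cluster`, `queue`, `user`, `cores`), and the `run_report` helper are all hypothetical; they only stand in for whatever the reporting backend actually uses.

```python
from collections import defaultdict

# Hypothetical job records; the field names are assumptions for illustration.
JOBS = [
    {"project": "HPC", "cluster": "c1", "queue": "q1", "user": "alice", "cores": 16},
    {"project": "HPC", "cluster": "c1", "queue": "q2", "user": "bob", "cores": 8},
    {"project": "HPC", "cluster": "c2", "queue": "q1", "user": "alice", "cores": 4},
]

def run_report(jobs, group_by="project", **filters):
    """Keep jobs that match every selected filter, then sum cores per group."""
    totals = defaultdict(int)
    for job in jobs:
        if all(job.get(field) == value for field, value in filters.items()):
            totals[job[group_by]] += job["cores"]
    # Highest consumers first, as in a "highest usage" report.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(run_report(JOBS))                                 # no filters, grouped by project
print(run_report(JOBS, group_by="user", cluster="c1"))  # one cluster, grouped by user
```

With no filters the sketch groups everything by project, which mirrors the default Project level view; adding a filter narrows the rows before grouping.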

Report Filter Data

Project Level Report

If you click the Run Report button without selecting any filters, the report is generated at the Project level. You can then drill down into Cluster → Queue → User → Job by clicking the respective project.

The horizontal stacked bar charts show the following:

Top projects with highest number of CPUs: Shows how many CPU cores each project used overall.
Top projects by highest concurrent CPU cores: Shows the highest number of CPU cores each project used at the same time.
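The difference between the two charts can be made concrete with a small Python sketch. It assumes each job is described by a hypothetical `(start, end, cores)` tuple and that "used overall" means the sum of cores across jobs; the product's actual calculation may differ.

```python
def total_and_peak_cores(jobs):
    """Return (total cores across jobs, peak cores in use at the same time).

    jobs: iterable of (start, end, cores); a job releases its cores at `end`.
    """
    jobs = list(jobs)
    total = sum(cores for _, _, cores in jobs)

    # Build +cores events at job start and -cores events at job end.
    events = []
    for start, end, cores in jobs:
        events.append((start, cores))
        events.append((end, -cores))

    # Sort by time; at equal times, releases (negative) come before acquisitions.
    events.sort(key=lambda event: (event[0], event[1]))

    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return total, peak

# Three jobs of one project: 16 + 8 + 4 cores in total,
# but at most 24 cores are busy at the same moment.
print(total_and_peak_cores([(0, 10, 16), (5, 15, 8), (10, 20, 4)]))
```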

The Project detail contains:

Project Name - The name of the HPC project. In this example, it is HPC.
Number of CPUs - Total number of CPUs allocated or used across all jobs for this project.
RUNNING - The number of jobs that are currently in progress (actively running).
CANCELLED - Jobs that were manually or automatically cancelled before completion.
COMPLETED - Jobs that successfully finished execution without errors.
FAILED - Jobs that terminated due to an error or issue before completing.
STOPPED - Jobs that were forcefully stopped during execution.
SUSPENDED - Jobs that were paused or suspended (but not stopped or cancelled).
TIMEOUT - Jobs that exceeded the maximum allowed execution time and were stopped automatically.
TOTAL JOB DURATION - The cumulative duration of all jobs (completed + failed + others), shown as days hh:mm:ss.
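As a rough illustration of how the status columns and TOTAL JOB DURATION relate to individual jobs, the following Python sketch counts jobs per status and formats a cumulative duration as days hh:mm:ss. The `(status, duration_seconds)` records and the zero-padded output format are assumptions, not the product's confirmed behavior.

```python
from collections import Counter

def format_duration(total_seconds):
    """Format a number of seconds as 'days hh:mm:ss'."""
    days, rem = divmod(int(total_seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} {hours:02d}:{minutes:02d}:{seconds:02d}"

# Hypothetical (status, duration in seconds) records for one project.
jobs = [("COMPLETED", 90000), ("FAILED", 3661), ("COMPLETED", 30)]

status_counts = Counter(status for status, _ in jobs)
total_duration = sum(duration for _, duration in jobs)

print(dict(status_counts))              # jobs per status column
print(format_duration(total_duration))  # cumulative duration of all jobs
```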

The Cluster, Queue, User, and Job levels are as follows:

Cluster Level Report

Project (e.g., HPC) → Cluster details (the cluster or clusters for the selected project)

Queue Level Report

Cluster → Queue details (the queue or queues for the selected cluster)

User Level Report

Queue → User details (the user or users for the selected queue)

Job Level Report

User → Job details (the job details for the selected user)
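The Project → Cluster → Queue → User → Job drilldown can be pictured as repeatedly narrowing the same job list by the items clicked so far. The Python sketch below is only an illustration; the `LEVELS` order comes from the text above, while the record fields, the `job_id` key, and the `drill_down` helper are hypothetical.

```python
# Drilldown order taken from the report; field names are assumptions.
LEVELS = ["project", "cluster", "queue", "user", "job_id"]

JOBS = [
    {"project": "HPC", "cluster": "c1", "queue": "q1", "user": "alice", "job_id": "j1", "cores": 16},
    {"project": "HPC", "cluster": "c1", "queue": "q2", "user": "bob", "job_id": "j2", "cores": 8},
    {"project": "HPC", "cluster": "c2", "queue": "q1", "user": "alice", "job_id": "j3", "cores": 4},
]

def drill_down(jobs, selected=()):
    """Return (next level name, cores per item) below the items selected so far.

    selected: values clicked so far, in LEVELS order, e.g. ("HPC", "c1").
    """
    depth = len(selected)
    if depth >= len(LEVELS):
        raise ValueError("already at the Job level; nothing to drill into")
    next_level = LEVELS[depth]
    totals = {}
    for job in jobs:
        # Keep only jobs that sit under every item selected so far.
        if all(job[LEVELS[i]] == value for i, value in enumerate(selected)):
            totals[job[next_level]] = totals.get(job[next_level], 0) + job["cores"]
    return next_level, totals

print(drill_down(JOBS))                                # project level
print(drill_down(JOBS, ("HPC",)))                      # clusters of project HPC
print(drill_down(JOBS, ("HPC", "c1", "q1", "alice")))  # jobs of one user
```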

Schedule Creation

The Create Schedule feature allows you to automate report generation. Instead of manually selecting filters and running reports each time, you can schedule the report to run at regular intervals (e.g., daily, weekly) and have it emailed or stored for easy access.

Click on the Create Schedule button located near the top-right, beside the Search button.
In the "Schedule" dropdown, select a schedule frequency (e.g., Daily, Weekly) and in the "Send notifications to" dropdown, choose the user(s) who should receive alerts when the report is generated.
Review the applied filters displayed at the top (like Provider, Project, Account, Region, Cluster, Queue, and User) – these will be locked to the schedule.
Click "Save" to finalize the schedule or "Cancel" to discard the setup.
After saving the schedule, you can view it under "view schedule" on the landing page of the report section.
There are three actions that can be performed on a schedule:
- Edit – Update schedule settings such as filters, time, or recipients.
- Delete – Remove the schedule permanently.
- Enable/Disable – Temporarily activate or deactivate the schedule without deleting it.

Export Report

The Download button at the top right corner allows you to export the report data in two formats:

Excel (.xlsx): Contains the tabular report data based on selected filters.
PDF (.pdf): Contains the graph/chart data shown in the visual section of the report.

Report Download

When you click the Download icon, a Drilldown Downloads window appears, allowing you to choose the level of data you want to export.

You can select from the following drilldown options:

Project
Cluster
Queue
User

This feature enables exporting both the current grouped level data and its child-level breakdown, providing a more detailed and contextual report.

Example: If you are on the Project level and choose to download User-level data, the Excel file will include both the Project data and the User data for that project in separate tables.
Direct Export: If you are at the Project or Job level, the child-level dropdown will not appear; the download directly exports the data for the current level.

Export Formats: Excel and PDF

Excel

Report Excel File

PDF

Report PDF File

HPC002 – Lowest CPU Core Usage

Identifies users or jobs with minimal CPU core consumption.
Helps in spotting underutilized or misconfigured jobs.

HPC003 – Highest Memory Utilization

Highlights jobs or workloads with significant RAM usage.
Useful for monitoring memory-heavy processes for optimization.

HPC004 – Lowest Memory Utilization

Shows jobs that consumed very little memory.
Helps detect inefficient resource allocation or configuration issues.

HPC005 – Highest Node Utilization

Displays jobs that utilized the highest number of compute nodes in a run.

HPC006 – Lowest Node Utilization

Identifies jobs that underused node allocations.

HPC011 – Failed Jobs

Lists jobs that failed to complete successfully, with resource usage context.

HPC014 – Highest Disk Usage

Highlights jobs or users with the largest storage footprint.
Aids in storage cleanup and capacity planning decisions.

HPC015 – Highest CPU Time Utilization

Shows jobs with the most total CPU time (cores × wall time).
Reveals computationally expensive workloads for further analysis.
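The cores × wall time formula can be worked through with a short Python sketch. The job names, the values, and the choice of core-hours as the unit are assumptions for illustration; the report may present CPU time in a different unit.

```python
def cpu_time_core_hours(cores, wall_seconds):
    """CPU time as cores multiplied by wall time, expressed in core-hours."""
    return cores * wall_seconds / 3600

# Hypothetical jobs: name -> (cores, wall time in seconds).
jobs = {
    "job-a": (32, 7200),   # 32 cores for 2 hours
    "job-b": (4, 36000),   # 4 cores for 10 hours
    "job-c": (64, 1800),   # 64 cores for 30 minutes
}

# Rank jobs from highest to lowest CPU time.
ranked = sorted(jobs, key=lambda name: cpu_time_core_hours(*jobs[name]), reverse=True)
for name in ranked:
    print(name, cpu_time_core_hours(*jobs[name]))
```

Note that the job with the most cores (`job-c`) is not the most expensive one here, because its wall time is short.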

HPC016 – Lowest CPU Time Utilization

Captures jobs with very low CPU time usage.
Could indicate errors, early termination, or inefficient execution.