
HP outlines on-prem AI plan for local compute and governance

Posted: 10 May 2026, 17:37 CET

At the San Jose AI & Big Data Expo, HP urged enterprises to run sensitive AI compute on-premises, use RAG for data handling and tighten governance to cut GenAI cloud costs.

At the San Jose AI & Big Data Expo on May 18-19, HP presented an on-premises AI approach that emphasizes local compute, retrieval-augmented generation and stricter governance to reduce GenAI cloud spending and limit data exposure.

Jerome Gabryszewski, HP’s AI & Data Science Business Development Manager, told attendees that many businesses face governance and integration barriers before they can automate data ingestion. He identified fragmented data ownership, inconsistent schemas and legacy systems as problems that must be resolved before automated pipelines can operate reliably.

HP described a hardware path for local AI workloads. At the developer tier, the company pointed to mobile and compact workstations such as the ZBook Ultra and Z2 Mini for running experiments without immediate cloud dependence.

For small-footprint servers, HP highlighted the ZGX Nano, a 15 x 15 cm unit powered by an NVIDIA GB10 Grace Blackwell Superchip with 128 GB of unified memory and about 1,000 TOPS of FP4 performance. HP said a single ZGX Nano can run models up to roughly 200 billion parameters and that linking two units via a high-speed interconnect supports models up to about 405 billion parameters.

For larger on-premises needs, HP noted the Z8 Fury with up to four NVIDIA RTX PRO 6000 Blackwell GPUs (384 GB of VRAM) and the ZGX Fury, which uses an NVIDIA GB300 Grace Blackwell Ultra Superchip with 748 GB of coherent memory and is intended for trillion-parameter inference at the deskside.
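The parameter counts quoted for the ZGX Nano can be sanity-checked against its stated memory with a weights-only estimate. The sketch below assumes 4-bit (FP4) weights, which the article does not state explicitly for those figures, and it ignores KV cache, activations and runtime overhead, so it gives a lower bound rather than a sizing tool. The function names are illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int = 4) -> float:
    """Return approximate storage for model weights only, in decimal GB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits(params_billion: float, memory_gb: float, bits_per_param: int = 4) -> bool:
    """True if the weights alone fit in memory_gb; all overhead is ignored."""
    return weight_memory_gb(params_billion, bits_per_param) <= memory_gb


if __name__ == "__main__":
    # Memory sizes come from the article; the 4-bit weight width is an assumption.
    print(weight_memory_gb(200), fits(200, 128))      # one ZGX Nano (128 GB)
    print(weight_memory_gb(405), fits(405, 2 * 128))  # two linked units (256 GB)
```

Under these assumptions, 200 billion parameters need about 100 GB and 405 billion need about 203 GB, which is consistent with the 128 GB and 2 x 128 GB figures but leaves limited headroom for long contexts.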

On governance, Gabryszewski framed the case for local compute around data control and latency. He recommended running a retrieval-augmented generation pipeline on local infrastructure so models retrieve context from internal knowledge bases at query time without sending documents to external services. He emphasized that access control for retrieval must enforce role-based permissions so AI returns only the information a user is entitled to see.
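A minimal sketch of role-aware retrieval follows, assuming an in-memory corpus and a naive keyword-overlap score in place of a real embedding index. The `Document` class, the `allowed_roles` field and the `retrieve` function are hypothetical names for illustration, not part of any HP product. The point it shows is ordering: documents the user may not see are filtered out before ranking, so they can never reach the model's context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this document


def retrieve(query: str, user_roles: set, corpus: list, top_k: int = 3) -> list:
    """Return up to top_k permitted documents ranked by keyword overlap."""
    # Enforce permissions first: keep only documents sharing a role with the user.
    permitted = [d for d in corpus if d.allowed_roles & user_roles]
    q_terms = set(query.lower().split())
    # Naive relevance score: number of query terms that appear in the text.
    scored = [(len(q_terms & set(d.text.lower().split())), d) for d in permitted]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


# Hypothetical internal knowledge base.
CORPUS = [
    Document("hr-1", "salary bands for the engineering org", frozenset({"hr"})),
    Document("eng-1", "runbook for local inference servers", frozenset({"engineering"})),
]

if __name__ == "__main__":
    print([d.doc_id for d in retrieve("salary bands", {"engineering"}, CORPUS)])
    print([d.doc_id for d in retrieve("salary bands", {"hr"}, CORPUS)])
```

Filtering after ranking, or inside the prompt, would risk restricted text leaking into the generated answer; filtering at retrieval time avoids that by construction.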

On operations, he advised treating continuous model updates like software releases. “Nothing goes to production without a validation gate,” Gabryszewski said, and he urged teams to build MLOps pipelines that include automated drift detection and human-in-the-loop checkpoints before retraining. He described data poisoning as a provenance issue as well as a security issue and urged tracing data origins and access permissions.
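The three controls mentioned here (drift detection, a human checkpoint before retraining, and a validation gate before production) can be sketched as separate checks. This is an illustration under stated assumptions, not HP's pipeline: drift is measured as a simple mean shift in baseline standard deviations, and the function names, metric names and thresholds are hypothetical.

```python
from statistics import mean, pstdev


def drift_score(baseline: list, recent: list) -> float:
    """Shift of the recent mean from the baseline mean, in baseline std devs."""
    sigma = pstdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if sigma == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / sigma


def retraining_allowed(score: float, threshold: float, human_approved: bool) -> bool:
    """Drift only proposes retraining; a person must approve it."""
    return score > threshold and human_approved


def validation_gate(metrics: dict, minimums: dict) -> bool:
    """Pass only if every required metric is present and meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= floor
        for name, floor in minimums.items()
    )


if __name__ == "__main__":
    score = drift_score([1, 2, 3, 4, 5], [4, 5, 6, 7, 8])
    print(retraining_allowed(score, threshold=2.0, human_approved=False))
    print(validation_gate({"accuracy": 0.93}, {"accuracy": 0.90, "recall": 0.80}))
```

A missing metric fails the gate rather than passing silently, which matches the "nothing goes to production" framing.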

HP addressed GenAI cost pressure, citing a rise in GenAI spending to $37 billion in 2025 and saying 80% of companies missed cost forecasts by more than 25%. HP recommended separating exploratory work from production: run prototyping and early fine-tuning on local hardware, use cloud for high-value burst training or frontier models, and deploy edge compute for latency-sensitive tasks. The company cited analysis showing on-premises infrastructure can offer up to an 18x cost advantage per million tokens over a five-year lifecycle and estimated that on-prem systems for continuous fine-tuning and inference on sensitive data can pay for themselves in eight to 12 months versus equivalent cloud compute.
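The payback claim reduces to simple break-even arithmetic: the upfront hardware cost divided by the monthly saving against cloud. The sketch below shows that calculation. All dollar figures in it are hypothetical placeholders chosen for illustration; they are not numbers from HP or from the analysis the company cited.

```python
from typing import Optional


def payback_months(
    hardware_cost: float,
    monthly_cloud_cost: float,
    monthly_onprem_opex: float,
) -> Optional[float]:
    """Months until cumulative savings cover the hardware cost.

    Returns None when on-prem running costs are not lower than cloud,
    because the purchase would then never pay for itself.
    """
    monthly_saving = monthly_cloud_cost - monthly_onprem_opex
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical inputs: 60,000 upfront, 8,000/month cloud, 2,000/month on-prem.
    print(payback_months(60_000, 8_000, 2_000))
```

The result is sensitive to utilization: if the hardware sits idle, the equivalent cloud spend it replaces shrinks and the payback period lengthens.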

HP positioned the Z portfolio as a continuum from developer machines to rack-ready systems that can be integrated into managed IT environments to meet data residency and security requirements. Gabryszewski cited analyst forecasts projecting that roughly 40% of enterprise applications will embed AI agents by the end of 2026, up from under 5% a year earlier, and noted that only about one in five companies currently has a mature governance model for agentized applications.

HP said it has developed its Z series for professional compute workloads for more than 15 years. The company recommended a three-tier approach (cloud for earned scale, on-premises for predictable inference and edge for low latency) combined with RAG, access controls and MLOps practices to manage risk and cost.
