HP outlines local-first AI with on-prem Z-series
Ahead of the AI & Big Data Expo in San Jose, HP promoted running AI inference on premises on Z-series systems to cut cloud costs, enforce governance and limit data exposure.
HP presented a local-first AI approach ahead of the AI & Big Data Expo, which takes place May 18–19 in San Jose. The company promoted its Z-series hardware as an option for running models and retrieval systems on premises, both to limit data exposure and to lower long-term inference costs.
Jerome Gabryszewski, HP’s AI & Data Science Business Development Manager, told attendees that many organisations underestimate the work needed to prepare data for automated AI systems. He said teams must resolve fragmented data ownership, inconsistent schemas and legacy infrastructure before automation can succeed.
HP outlined a range of systems for different stages of AI work. For individual developers, the ZBook Ultra and Z2 Mini are intended for local experiments without constant cloud access. The ZGX Nano uses NVIDIA’s GB10 Grace Blackwell Superchip and offers 128 GB of unified memory and about 1,000 TOPS of FP4 performance. HP said a single unit can host models of up to roughly 200 billion parameters, and that two units linked by a high-speed interconnect can handle about 405 billion parameters. The Z8 Fury supports up to four NVIDIA RTX PRO 6000 Blackwell GPUs and 384 GB of VRAM. The ZGX Fury uses the GB300 Grace Blackwell Ultra Superchip with 748 GB of coherent memory and is positioned for trillion-parameter inference at the deskside. HP said the portfolio also includes rack-ready options for larger IT environments.
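The parameter counts quoted for the ZGX Nano are roughly what back-of-envelope arithmetic gives for 4-bit weights. The sketch below is an illustration only: it assumes 4 bits per parameter (matching the FP4 figure HP quotes), counts weights alone with no KV cache, activations or runtime overhead, and treats two linked units as a simple 256 GB pool. None of these assumptions were confirmed by HP, and the function names are hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    One billion parameters at 8 bits each occupy about 1 GB, so the
    result is simply params_billion * bits_per_param / 8.
    """
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory (no headroom counted)."""
    return weight_memory_gb(params_billion, bits_per_param) <= memory_gb


if __name__ == "__main__":
    # Assumed: 4-bit weights; 128 GB per unit; two units pooled as 256 GB.
    for params, memory in [(200, 128), (405, 256)]:
        need = weight_memory_gb(params, 4)
        print(f"{params}B params at 4 bits: {need} GB of weights, "
              f"fits in {memory} GB: {fits(params, 4, memory)}")
```

Under these assumptions, a 200-billion-parameter model needs about 100 GB for weights and a 405-billion-parameter model about 202.5 GB, which leaves some headroom in each configuration. The same 200-billion-parameter model at 16 bits would need about 400 GB and would not fit in a single unit.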
Gabryszewski framed the case around governance and latency rather than raw compute limits. He told the audience, “The autonomous AI lifecycle creates a governance and latency problem, not a compute problem,” and recommended a three-tier approach: use cloud for occasional burst training and access to frontier models, on-prem Z infrastructure for predictable high-volume inference, and edge compute where latency is critical.
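The three-tier recommendation can be read as a simple placement rule. The following is a minimal sketch, not HP's tooling: the `Workload` fields, the `Tier` names and the order in which the checks run are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CLOUD = "cloud"
    ON_PREM = "on_prem"
    EDGE = "edge"


@dataclass
class Workload:
    name: str
    latency_critical: bool = False      # must respond close to the data source
    burst_training: bool = False        # occasional large training run
    needs_frontier_model: bool = False  # depends on a hosted frontier model


def choose_tier(w: Workload) -> Tier:
    """Map a workload to a tier following the three-tier split described above.

    Assumed priority: latency first, then cloud-only needs, then on-prem
    as the default for predictable high-volume inference.
    """
    if w.latency_critical:
        return Tier.EDGE
    if w.burst_training or w.needs_frontier_model:
        return Tier.CLOUD
    return Tier.ON_PREM


if __name__ == "__main__":
    jobs = [
        Workload("factory-vision", latency_critical=True),
        Workload("quarterly-finetune", burst_training=True),
        Workload("support-chat-inference"),
    ]
    for job in jobs:
        print(job.name, "->", choose_tier(job).value)
```

A real placement policy would also weigh data-residency rules and cost, which this sketch leaves out.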
HP cited cost figures for enterprise generative AI, saying spend reached $37 billion in 2025 and that 80% of companies missed their cost forecasts by more than 25%. The company recommended running early iterative work (prototyping, fine-tuning and evaluation) on local hardware to avoid high cloud operating costs. HP estimated that on-premises setups can deliver up to an 18x cost advantage per million tokens over a five-year lifecycle and cited an 8–12 month payback period versus equivalent cloud compute.
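A payback period of this kind is usually computed by dividing the upfront hardware cost by the monthly saving relative to cloud. The sketch below shows that arithmetic only. HP did not publish the inputs behind its 8–12 month figure, so every number here is a hypothetical placeholder, as is the function name.

```python
def payback_months(hardware_cost: float,
                   monthly_cloud_cost: float,
                   monthly_onprem_opex: float) -> float:
    """Months until cumulative savings cover the upfront hardware cost.

    Savings per month = what the same work would cost in the cloud
    minus the running cost (power, support, staff share) of the local system.
    """
    monthly_saving = monthly_cloud_cost - monthly_onprem_opex
    if monthly_saving <= 0:
        raise ValueError("on-prem never pays back: no monthly saving")
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the calculation.
    months = payback_months(hardware_cost=60_000,
                            monthly_cloud_cost=7_000,
                            monthly_onprem_opex=1_000)
    print(f"payback after {months:.1f} months")
```

With these placeholder inputs the result is 10 months. The outcome is very sensitive to utilisation: hardware that sits idle saves nothing against pay-per-use cloud pricing.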
On data protection, HP described running Retrieval-Augmented Generation (RAG) locally, so that models retrieve context from internal knowledge bases without being trained on proprietary data or exposing it externally. Gabryszewski noted the importance of applying role-based permissions at retrieval time so that the AI system returns only information a user is authorised to see, and he stressed data provenance and access controls as guards against data poisoning and compliance failures.
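Role-based permissions at retrieval time mean that documents are filtered by the caller's roles before anything is ranked or passed to the model. The sketch below illustrates that ordering under stated assumptions: the `Chunk` type and `retrieve` function are hypothetical, and plain keyword overlap stands in for the vector similarity search a real RAG system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_roles: frozenset  # roles permitted to read this chunk
    source: str               # provenance: where the chunk came from


def retrieve(query: str, chunks: list, user_roles: set, top_k: int = 3) -> list:
    """Return up to top_k chunks the user may see, ranked by keyword overlap."""
    # Permission filter runs first, so unauthorised text is never scored,
    # ranked or handed to the model.
    visible = [c for c in chunks if c.allowed_roles & user_roles]

    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in visible]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [c for _, c in ranked[:top_k]]


if __name__ == "__main__":
    kb = [
        Chunk("salary bands for 2025", frozenset({"hr"}), "hr.pdf"),
        Chunk("deployment runbook for inference cluster",
              frozenset({"engineer"}), "ops.md"),
    ]
    print([c.source for c in retrieve("salary bands", kb, {"engineer"})])
    print([c.source for c in retrieve("salary bands", kb, {"hr"})])
```

Keeping the `source` field on each chunk is one simple way to carry provenance through to the answer, so that a response can be traced back to the document it drew on.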
On operations, he recommended treating model updates like code deployments: validate changes before they reach production, build MLOps pipelines with automated drift detection, and include human-in-the-loop checks before retraining. He also cited forecasts of a sharp rise in applications with embedded AI agents by the end of 2026, and said only about one in five companies currently has a mature governance model for those agents.
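Automated drift detection with a human checkpoint can be sketched as a two-stage gate: a statistic flags drift, and retraining proceeds only after a person signs off. The example below is a simplified illustration, not a description of HP's pipeline. It assumes a single numeric feature, uses the shift in mean measured in baseline standard deviations as the drift score, and picks a threshold of 2.0 arbitrarily. Production systems typically use richer tests across many features.

```python
import statistics


def drift_score(baseline: list, recent: list) -> float:
    """Shift of the recent mean from the baseline mean, in baseline std devs."""
    mu = statistics.fmean(baseline)
    sigma = statistics.pstdev(baseline)
    shift = abs(statistics.fmean(recent) - mu)
    if sigma == 0:
        # Constant baseline: any movement at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / sigma


def retraining_decision(baseline: list, recent: list,
                        threshold: float = 2.0,
                        human_approved: bool = False) -> str:
    """Return 'no_action', 'await_review' or 'retrain'.

    Drift alone never triggers retraining; a human must approve it first.
    """
    if drift_score(baseline, recent) < threshold:
        return "no_action"
    return "retrain" if human_approved else "await_review"


if __name__ == "__main__":
    baseline = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(retraining_decision(baseline, [3.1, 2.9, 3.0]))
    print(retraining_decision(baseline, [9.0, 10.0, 11.0]))
    print(retraining_decision(baseline, [9.0, 10.0, 11.0], human_approved=True))
```

The three calls illustrate the three outcomes: stable data needs no action, drifted data waits for review, and drifted data with approval proceeds to retraining.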
HP presented the recommendations as a way to reconcile data governance, decide when to use cloud versus local compute, and build on-premises infrastructure to support sustained, governed AI deployments.