Opportunities within High Performance Computing
December 5, 2022
Article by Chris Coates, Head of Engineering, HPC, at Q Associates, a Logicalis Company
As we discussed in blog post 1, ‘Advanced Large-Scale Computing – Part 1’, and blog post 2, ‘Advanced Large Scale Computing – Trends Reshaping Industries’, there are many challenges in large-scale computing still to be addressed, and the trends occurring within the industry are giving rise to many opportunities.
Let’s discuss a few of these here, framed as the opportunities that the challenges themselves present. Addressing these challenges requires change. But change is ever constant, whether you like it or not.
“Innovation is the ability to see change as an opportunity, not a threat.” – Steve Jobs
It’s useful to list a few of the many future opportunities for large-scale computing, and for high-performance computing (HPC) in particular. Some of these are:
Artificial Intelligence and Machine Learning: HPC and large-scale computing systems can process vast amounts of data at very high speeds, which makes them ideal for training machine learning models and running AI algorithms. AI and HPC are effectively converging, with an increasing share of computation and environments being dual-purpose. In fact, many would say that if your next system isn’t designed with this in mind, you’ll be behind the curve.
HPC systems are already used extensively to train deep learning models and run AI algorithms. For example, Google’s AlphaGo, a computer program that plays the board game Go at a superhuman level, used HPC systems to train its deep learning models. Similarly, OpenAI’s GPT-3 language model, which can generate human-like text, was trained on an HPC system consisting of thousands of GPUs, and the value of ChatGPT, which builds on this line of models, cannot be overstated.
Simulation and Modelling: HPC systems are critical for solving complex mathematical and scientific problems such as climate modelling, quantum mechanics simulations, and astrophysics research. Solving these problems is extremely valuable in itself, but the results can also be leveraged for even greater effect and business advantage.
HPC systems are critical for solving complex mathematical and scientific problems. For example, scientists use HPC systems to model climate change, predict natural disasters, and simulate the behaviour of materials at the atomic scale. In 2020, the Fugaku supercomputer in Japan was used to simulate how droplets of the novel coronavirus spread in office spaces and public transportation, providing valuable insights for containing the pandemic.
The value of these achievements is extensive: devising new and innovative materials to make our lives better or things more efficient, modelling the effects of climate change to anticipate power needs, or simply getting better at understanding power demand at certain times of the day or year around the world. The value of these abilities alone cannot be overstated.
Big Data Analytics: With the rise of big data, organisations need more powerful computing systems to process and analyse this massive amount of data. Data in itself is akin to modern-day crude oil, needing to be refined to unlock its value. HPC systems can provide the necessary computational power to process data at high speeds and perform complex analytics, to help unlock the value in this data.
HPC systems are ideal for processing and analysing massive amounts of data. For example, the Large Hadron Collider (LHC) at CERN generates enormous amounts of data from particle collisions, which are then analysed using HPC systems to make new discoveries in particle physics. In finance, HPC systems are used for high-frequency trading, where milliseconds can make a difference to profitability, and for understanding market trends quickly enough to make the most effective trades.
Cloud Computing: Cloud computing is a growing trend, and HPC systems can provide the necessary computational power for cloud-based services, such as video streaming, gaming, and scientific research. Beyond this, demand for easier-to-consume large-scale and high-performance computing environments keeps growing. Researchers, and others who want to unlock the value in their data, are increasingly not your typical HPC users. They care about their own use-case, not how the system works or how it is built; they want it to “just work”. This matters more and more, because compute capacity isn’t outstripping demand: demand is outstripping capacity. And with that demand comes the need for computing to be more accessible.
Cloud providers are already responding to this. For example, Amazon Web Services (AWS) offers Elastic Fabric Adapter (EFA), a network interface that enables customers to run tightly coupled HPC workloads in the cloud. Similarly, Microsoft Azure offers HPC services for scientific research, weather modelling, and engineering simulations.
Cybersecurity: As cybersecurity threats become more complex and sophisticated, HPC systems can help organisations better protect their networks and data by providing faster and more powerful analysis and detection of cyber-attacks. Even more important is the risk that, as these resources become more powerful, they become a bigger draw and target for attackers wanting to harness that power for their own ends. There is a major opportunity to innovate in this space and ensure these environments are protected.
As cybersecurity threats grow more complex, HPC systems can help organisations better protect their networks and data. For example, the National Security Agency (NSA) uses HPC systems to analyse network traffic and detect cyber-attacks in real time. Similarly, cybersecurity firm FireEye uses HPC systems to analyse malware and identify new threat vectors.
Is it all positive?
While there are many positive aspects to the use of high-performance computing (HPC), there are also some negative aspects to consider.
One negative aspect of using HPC for AI and machine learning is the potential for bias: if the data used to train a machine learning model is biased, the resulting model will be biased too. Additionally, large-scale computing is always constrained by the energy needed to drive it. HPC systems can consume a significant amount of energy, which can contribute to climate change if not properly managed. There is plenty of room for innovation here, and plenty of opportunity to make the process significantly more efficient. Modelling the power requirements of a workload more accurately, cooling the environment more efficiently to reduce power draw, reclaiming energy in the form of heat, and understanding the most carbon-efficient times to run workloads are all considerations.
Another negative, when using HPC for simulation and modelling, is the potential for error. HPC systems can produce inaccurate results if the models used are flawed or the data used to create them is incomplete or inaccurate. If anything, this raises questions of data provenance and code provenance that need to be addressed. You want to be able to rerun a job 10 years from now, and base future work on results that are repeatable. Basing science on something that can’t be repeated is like basing science on trust; you just wouldn’t do it. Code and data need to become more portable and repeatable, and that brings an opportunity to make this happen in an accessible format.
Additionally, HPC systems can be expensive to maintain and operate (including in staffing terms), which can be a barrier for smaller organisations and research groups. This is compounded by the fact that, of all the current and potential HPC user groups, SMEs are the group that would see the most benefit and the greatest transformational effect.
Using HPC for big data analytics brings the potential for data breaches. HPC systems can store vast amounts of sensitive data and, if not properly secured, can be vulnerable to hacking and data theft (going back to the earlier point that platforms with a lot of power or data are more attractive to attackers). Additionally, HPC systems can be complex and difficult to manage, requiring specialised skills and expertise. There is a significant opportunity here to make HPC environments more secure, and to make securing them easier.
One negative aspect of using HPC within cloud computing environments is the potential for vendor lock-in. Organisations that rely on a specific cloud provider for HPC services may find it difficult or expensive to switch providers if they become dissatisfied with the service. Cloud providers may also limit the availability of certain HPC features or services, which can reduce the organisation’s flexibility. Couple that with some very serious recent outages at all the major cloud providers, and you can see that putting all of your eggs in the cloud computing basket may not be the most prudent idea. Multi-cloud is the most useful answer to this issue, but it brings an expertise requirement that can be expensive and adds complexity. The opportunity in this space is to make multi-cloud simpler by abstracting that complexity away.
When it comes to cybersecurity, one negative aspect of using HPC is the potential for false positives. HPC systems can generate large amounts of data which, if not properly analysed, can produce false positives that lead to unnecessary alarms and alerts. Additionally, as stated earlier, HPC systems can themselves be vulnerable to cyber-attacks, which can compromise the security of the organisation’s data and networks. With great power comes great risk: that power could fall into the wrong hands or, sometimes worse, be used incorrectly by the right hands, resulting in expensive mistakes. There is significant opportunity here to make security for these environments more distributed, so that when one organisation identifies a risk, every other organisation is made aware and can protect itself automatically.
We’ve discussed some of the negatives of these technologies and opportunities that arise even from the negatives. It’s not all doom-and-gloom though. Here are some of the positives that these opportunities and futures hold for us.
Using HPC for AI and machine learning brings with it the potential for significant advances in areas such as healthcare, finance, and autonomous vehicles. HPC systems can enable the development of more accurate and efficient models that can lead to breakthroughs in these areas. Additionally, HPC systems can be designed to be more energy-efficient, reducing their environmental impact. AI and machine learning unlock so many other opportunities that their benefits far outweigh the risks.
A positive aspect of using HPC for simulation and modelling is the potential for scientific and technological breakthroughs. HPC systems can enable the simulation and modelling of complex systems and phenomena, leading to new discoveries and advances in fields such as materials science, physics, and engineering. Additionally, HPC systems can be used to optimise designs and improve product performance, leading to cost savings and improved competitiveness, and enabling advances that help meet our energy requirements. Nuclear fusion, which needs significant simulation resources to be delivered safely, is the perfect use-case.
Using HPC for big data analytics brings the potential for improved decision-making. HPC systems can enable organisations to process and analyse vast amounts of data quickly and efficiently, leading to insights and discoveries that would be difficult or impossible to obtain with traditional computing systems. Additionally, HPC systems can be used to develop new products and services that can improve the lives of individuals and society in general.
The major positive of using cloud computing for HPC is the potential for increased accessibility and flexibility. There’s little use in having an all-powerful platform if people can’t use it. HPC systems can be made available to a wider range of organisations and individuals through cloud-based services, enabling smaller organisations such as SMEs, as well as researchers, to benefit from their power and capabilities. Additionally, because cloud-based HPC systems can be scaled up or down as needed, they provide flexibility and potential cost savings for organisations. On-premises systems need this level of flexibility too, and I’m certain it is something we can touch upon in a later article.
The huge benefit of using HPC for cybersecurity is the potential for improved threat detection and response. HPC systems can enable organisations to analyse activity in real time, so they can detect and respond to cyber-attacks more quickly and effectively. HPC systems can also be used to develop more advanced and sophisticated cybersecurity tools and techniques, improving the overall security posture of organisations and individuals. Leveraging HPC for this enables major improvements in our security, and that can only be a good thing.
The Financial Value
So, we’ve mentioned the opportunities, the negatives, the positives of each – What about the financial value of these? Surely, we need numbers to make reasoned business decisions, right?
The financial values of each of these points can vary greatly depending on the specific use case and industry. Here are some potential financial values for each point though, just for a little more information and clarity.
Artificial Intelligence and Machine Learning: The financial value of using HPC for AI and machine learning can be significant. For example, the global AI market was valued at $62.4 billion in 2020 and is projected to reach $309.6 billion by 2026 and, according to some forecasts, $429 billion by 2027.
HPC can enable organisations to develop more accurate and efficient models, leading to improved products and services, and increased revenue. HPC can also help organisations reduce costs by improving efficiency and automating processes.
Don’t think of Artificial Intelligence as “taking your job”; think of it as enabling you to do significantly more with the resources you have. Imagine it as potentially the best assistant you could ever have. As the phrase goes, “Artificial Intelligence won’t take your job – the person using Artificial Intelligence will.” I believe this phrase will become more prevalent as time goes on. It’s best to embrace this change and innovate with it.
Simulation and Modelling: The financial value of using HPC for simulation and modelling can also be significant. For example, HPC can help organisations reduce product development costs by enabling virtual testing and prototyping. HPC can also help organisations optimise designs and improve product performance, leading to increased revenue and market share. Additionally, HPC can enable organisations to make better-informed decisions, reducing the risk of costly mistakes. There’s no headline figure attached to this, so let’s create one.
As an example, say a manufacturing company is developing a new product and needs to perform extensive testing and simulation to ensure that it meets the required performance standards. Without HPC, the company would need to build physical prototypes and conduct expensive, time-consuming testing, which could take months or even years to complete. Formula 1 (F1) or the wider automotive industry is an extreme example, but it doesn’t have to be that; it could just as easily be a kitchen-door manufacturer.
With HPC, however, the company can simulate the product’s performance in a virtual environment, using advanced modelling and simulation techniques that are only possible with HPC. This enables the company to quickly identify and correct any design flaws, reducing the number of physical prototypes needed and speeding up the entire product development process.
Assuming the cost of building physical prototypes and conducting testing is $1 million per iteration, and the HPC-enabled simulation and modelling process reduces the number of iterations required from 5 to 2, the company could save $3 million in development costs ($1 million x 3 iterations). Additionally, the faster time-to-market enabled by HPC could lead to increased revenue and market share, further increasing the financial benefit of using HPC for simulation and modelling.
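The arithmetic behind this estimate fits in a few lines of Python. This is a minimal sketch: the $1 million per iteration and the drop from 5 to 2 iterations are the hypothetical figures above, the function name is purely illustrative, and, like the text, it does not net off the cost of the HPC time itself.

```python
# Hypothetical figures from the example above (not real project data).
COST_PER_ITERATION = 1_000_000  # USD per physical prototype and test cycle
ITERATIONS_WITHOUT_HPC = 5
ITERATIONS_WITH_HPC = 2


def prototype_savings(cost_per_iteration: int, before: int, after: int) -> int:
    """Development cost avoided by needing fewer physical prototype iterations."""
    return cost_per_iteration * (before - after)


savings = prototype_savings(COST_PER_ITERATION, ITERATIONS_WITHOUT_HPC, ITERATIONS_WITH_HPC)
print(f"Saving: ${savings:,}")  # Saving: $3,000,000
```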
The potential value here is clear!
Big Data Analytics: The financial value of using HPC for big data analytics can be significant as well. The global big data analytics market is projected to grow from $271.83 billion in 2022 to $655.53 billion by 2029, at a CAGR of 13.4%. HPC can enable organisations to process and analyse data quickly and efficiently, leading to insights and discoveries that can improve products and services and increase revenue.
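As a quick sanity check on the quoted growth rate, the compound annual growth rate (CAGR) implied by a start and end market size is (end / start)^(1 / years) - 1. A minimal Python sketch using the big data analytics figures above (in USD billions):

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the text, USD billions, 2022 -> 2029.
rate = cagr(271.83, 655.53, 2029 - 2022)
print(f"Implied CAGR: {rate:.1%}")  # Implied CAGR: 13.4%
```

The result agrees with the 13.4% quoted, so the start value, end value, and period are mutually consistent.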
Increasing digitalisation across sectors such as BFSI (banking, financial services and insurance), healthcare, retail, agriculture, and telecom/media is significantly increasing the volume of data worldwide. As an example, artificial intelligence is bringing noteworthy change to risk management, precision farming, and pest control in the agriculture sector: everything from the best crops to the best time of year to plant them can now be analysed.
Industries are deploying bots to reshape and automate working practices. Furthermore, virtual assistants such as Google Assistant, Apple’s Siri, and Amazon Alexa generate huge amounts of data. For instance, according to a Statista report published in January 2021, the number of digital voice assistants is anticipated to reach 8.2 billion. Advances in smartphone technology and network connectivity have also boosted the number of social media users worldwide.
A massive volume of data is generated by Facebook, WhatsApp chats, YouTube videos, Instagram, Snapchat, and other platforms. With advanced technologies across industries, growing adoption of smart applications, and emerging social media platforms, this industrial revolution is expected to produce enormous datasets, and that growth in data across industries is projected to fuel the big data analytics market.
The old saying that “if a service is free, you (and your data) are the product” has never been truer. Leveraging information from social media yields significant value, including financial value.
Cloud Computing: The financial value of using HPC for cloud computing can be significant in terms of cost savings and flexibility. For example, cloud-based HPC systems can be scaled up or down as needed, providing organisations with cost savings and flexibility compared to traditional on-premises systems. Additionally, cloud-based HPC systems can give smaller organisations and researchers access to HPC resources that would otherwise be unavailable due to cost or technical limitations, and open up new ways of running existing workloads, which can potentially save money despite the more visible op-ex outlay.
This doesn’t have a specific value attached to it, so let’s fix that with a hypothetical example (we can go into greater detail for your particular use-case in person; just get in touch). For now, here is our hypothetical SME:
It has an on-premises system providing 2,304 cores that cost $500k to buy. It delivers 1 million utilised core-hours per month, so it’s approximately 59% utilised.
It has a 100TB on-site Lustre shared filesystem, predominantly used for scratch, that cost a further $400k to implement. It’s at most half utilised, i.e. it only ever holds at most 50TB of scratch data (on-premises sites often overprovision their storage because many aren’t aware of their growth trends and so err on the side of caution, but that’s another talk in itself!).
It costs $100k a year in power and $30k a year in licensing for an on-premises software stack.
Over 3 years, its TCO (purely on hardware, power, and licensing) is $1.29M.
The data centre it’s housed in was recently kitted out for $1M, with an estimated lifespan of 5 years. We’re not taking this into account initially, but if you do, add $600k (three years’ share of that $1M) to the cost of the on-prem solution; we highlight this below.
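The utilisation and TCO figures for this hypothetical SME can be reproduced with a short Python sketch. Two assumptions are ours, not the article’s: an average month of 730 hours (8,760 / 12; a 30-day month gives roughly 60% instead), and straight-line amortisation of the data-centre fit-out over its 5-year lifespan.

```python
# Hypothetical SME scenario from the text; all figures are illustrative.
HOURS_PER_MONTH = 365 * 24 / 12  # assumption: average month of 730 hours
YEARS = 3

cores = 2304
used_core_hours_per_month = 1_000_000

compute_capex = 500_000      # cluster purchase (USD)
storage_capex = 400_000      # 100TB Lustre scratch filesystem (USD)
power_per_year = 100_000     # USD per year
licence_per_year = 30_000    # on-premises software stack, USD per year

# Assumption: data-centre fit-out amortised straight-line over 5 years.
facility_capex = 1_000_000
facility_lifespan_years = 5

utilisation = used_core_hours_per_month / (cores * HOURS_PER_MONTH)
tco = compute_capex + storage_capex + YEARS * (power_per_year + licence_per_year)
facility_share = facility_capex * YEARS / facility_lifespan_years

print(f"Utilisation: {utilisation:.0%}")                # Utilisation: 59%
print(f"3-year TCO: ${tco:,}")                          # 3-year TCO: $1,290,000
print(f"Facility share: ${facility_share:,.0f}")        # Facility share: $600,000
print(f"TCO incl. facility: ${tco + facility_share:,.0f}")  # TCO incl. facility: $1,890,000
```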
We could move this straight to the cloud as-is, persistent storage included, and that would be (relatively) expensive. But if we assess the workload first and then build out an environment around its actual use-cases, it could become significantly more compelling.
We’re using one cloud provider as an example, but good practice dictates that in future you’d want a multi-cloud setup: one that abstracts away this complexity and can work across multiple cloud providers as well as on-prem (come talk to us!).
Say we identify that many of our jobs each have a 1TB scratch requirement, which adds up to almost 50TB when the system is fully utilised, but that they are not IOPS-intensive: they mainly load chunks into memory to process, streaming data in ahead of time. The Lustre filesystem was only there to house that volume of data, not for its outright performance. In that case, we could use cloud instances, each with a 2TB EBS volume attached, to do the same job without needing a 50TB shared filesystem at all!
We could also add Amazon FSx for Lustre purely as scratch, for minimal outlay, providing approximately 10GB/sec of throughput. Better still, if we architect the system to use snapshots and to expect a (potentially) lossy service, we can take advantage of spot instance pricing, in this case a saving of 77% on average.
Now suppose, for instance, that the code(s) being run also work on ARM. Our cluster on AWS could then use a set of 36 ARM-based C6gn instances built on Graviton2 processors, and we could still have our Lustre scratch if we wanted it, provided we know it isn’t a forever-store (and that’s ok!).
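The figure of 36 instances is consistent with matching the on-premises core count one-for-one. A rough sizing sketch in Python, assuming the largest C6gn size (c6gn.16xlarge, 64 vCPUs) and that each Graviton2 vCPU is a physical core; the instance size is our assumption, since the text does not name one:

```python
import math

ON_PREM_CORES = 2304
# Assumption: c6gn.16xlarge with 64 vCPUs; Graviton2 has no SMT, so
# each vCPU corresponds to one physical core.
VCPUS_PER_INSTANCE = 64

# Round up so the cloud cluster never has fewer cores than on-prem.
instances = math.ceil(ON_PREM_CORES / VCPUS_PER_INSTANCE)
print(instances)  # 36
```

This only matches raw core counts; per-core performance, memory per core, and network requirements would all need checking against the real workload.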
The total cost of this, with the on-premises software stack removed? $360k/yr, or $1.08M over 3 years, with upfront dedicated tenancy and no infrastructure costs. That’s a saving of $210k excluding the infrastructure costs, and $810k if you include them. Pretty significant, right?
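The savings follow directly from the figures already given. A minimal Python sketch, assuming the $360k/yr cloud cost stays flat across all three years and reusing the $1.29M on-prem TCO and $600k facility share stated above:

```python
YEARS = 3
on_prem_tco = 1_290_000     # hardware, power and licensing over 3 years (from the text)
facility_share = 600_000    # 3 of 5 years of the $1M data-centre fit-out (from the text)
cloud_per_year = 360_000    # quoted cloud cost without the on-prem software stack

cloud_tco = cloud_per_year * YEARS
saving = on_prem_tco - cloud_tco
saving_with_facility = on_prem_tco + facility_share - cloud_tco

print(f"Cloud 3-year cost: ${cloud_tco:,}")                   # Cloud 3-year cost: $1,080,000
print(f"Saving: ${saving:,}")                                 # Saving: $210,000
print(f"Saving incl. facility: ${saving_with_facility:,}")    # Saving incl. facility: $810,000
```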
Couple that with a multi-cloud strategy, where you could spin this entire environment up at a second site on demand to cover outage risk(s), and it looks more compelling still.
Cost-savings with increased performance and flexibility, and reduced administration overheads? What’s not to like?
(Obviously there is a caveat here that your mileage may vary, but that is why you should come and talk to us about the options available to you – We have some special solutions, as we’re alluding to here!).
Cybersecurity: The financial value of using HPC for cybersecurity can be significant as well. For example, the global cybersecurity market was valued at $202.72 billion in 2022 and is projected to reach $359.2 billion by 2026, and to continue expanding at a compound annual growth rate (CAGR) of 12.3% from 2023 to 2030. HPC can enable organisations to detect and respond to cyber threats more quickly and effectively, reducing the risk of costly data breaches and other security incidents. The cost of a security failure is not necessarily the breach itself; it is often the regulatory cost of failing to meet compliance obligations. The outlay on security is often far outweighed by the cost of the alternative.
I hope this series of articles has helped your understanding of the challenges, the trends shaping future opportunities, and the opportunities and financial value these present. Please get in touch with us at our new home within Logicalis, where we can go into much greater detail to help unlock value within your environments, and keep your eyes peeled for future announcements that will help address many of these opportunities (again, as we’re alluding to within this article!).
We’re excited about the future, and we believe that an HPC and AI future will be ubiquitous.
Get In Touch
Contact Q Associates today if you have any questions or would like to discuss your IT requirements in more detail.
Tel: 01635 248181