Advanced Large-Scale Computing Trends Reshaping Industries

November 3, 2022

Article by Chris Coates, Head of Engineering, HPC for Q Associates, a Logicalis Company

This is the second in a series of blog posts about the challenges, trends, and opportunities within Advanced Large-Scale Computing with a particular focus on High Performance Computing, but not limited to it. The first post focuses on some of the challenges and can be read HERE.

In this post we’ll look at trends within the marketplace and wider industry that may begin to shape our environments and approaches going forward.

Prologue

The world of advanced large-scale computing, and the tech industry in general, is ever changing. What we generally think of as a great idea can easily become a pervasive technology or movement within a year or two. However, a few trends are beginning to come to the fore, and it’s useful to explore and call out these trends accordingly. We’ll look at a few key areas:

Emerging tech trends
The end of the tech boom
Open-source and language trends
Artificial Intelligence (AI)
Environmental trends

Emerging Tech Trends

Where better to begin than with some technology advances and emerging technologies?

Energy Awareness

There has been a global consensus to be greener, resulting in more eco-friendly choices such as solar and other renewable energy. People have also become much more aware of their carbon footprints and have been increasingly driven by both governments and public opinion to act. As a result, minimizing that footprint, reclaiming as much energy as possible from waste products, and reducing the energy wasted when electricity is converted to kinetic energy and heat within clusters are becoming key drivers.

The move away from spinning hard drives to flash, and removal of fans in favour of more thermally efficient transport mechanisms such as water are becoming ever more prevalent. With the advances in chip technology and their resulting increased power draw, water or immersion cooling is the direction of travel for most environments.

With cooling companies and vendors becoming even more aware of this, many are offering their own solutions to fit into an increasingly water- and immersion-cooled environment.

Furthermore, in software there are trends towards using environments in more energy-efficient ways. Everything from power capping and frequency changing to process placement modifications that reduce the amount of energy required to move data, alongside turning off or clocking down individual cores that aren’t in use, is being employed. In some cases, power savings of up to 30% are being achieved with varying impacts on performance: anything from a 3% improvement in run-time (Fugaku recently presented results demonstrating this) to a 10% increase in run-time.
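To see why a power saving can still be worthwhile when a job runs longer, it helps to remember that energy is power multiplied by time. The sketch below is a rough illustration only: it assumes the 30% figure is a reduction in average power draw over the whole run and applies it to both ends of the run-time range quoted above, which the original measurements may not support. The function name and structure are illustrative, not taken from any particular tool.

```python
def relative_energy(power_saving: float, runtime_change: float) -> float:
    """Energy used relative to the baseline run (1.0 means no change).

    power_saving   -- fractional reduction in average power draw (0.30 = 30% less)
    runtime_change -- fractional change in run-time (0.10 = 10% longer,
                      -0.03 = 3% shorter)

    Energy = power x time, so the relative energy is the product of the
    relative power and the relative run-time.
    """
    if not 0.0 <= power_saving < 1.0:
        raise ValueError("power_saving must be in the range [0, 1)")
    if runtime_change <= -1.0:
        raise ValueError("runtime_change must be greater than -1")
    return (1.0 - power_saving) * (1.0 + runtime_change)


# Assumption: the same 30% average power saving applies in both cases.
slower_case = relative_energy(0.30, 0.10)   # job takes 10% longer
faster_case = relative_energy(0.30, -0.03)  # job finishes 3% sooner

print(f"10% longer run-time: {slower_case:.3f} of baseline energy")
print(f"3% shorter run-time: {faster_case:.3f} of baseline energy")
```

Under those assumptions, even the job that runs 10% longer uses roughly 77% of the baseline energy, so the net energy saving is still around 23%.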

With large-scale computing becoming ever more complex, the ability to control and vary the power draw of components that aren’t in use will be increasingly important, and the control mechanisms to achieve this will become pervasive.

This focus on energy is also driving behaviour. With an increased effort on monitoring and reporting, highlighting energy-efficient usage is becoming important, which in turn is driving behaviour around when and how jobs are submitted.

Edge Computing

Cloud computing has become mainstream, with major players Amazon Web Services, Microsoft Azure and Google Cloud Platform dominating the market, and its adoption is still growing. Even so, edge is fast becoming the new trend. As the amount of data organisations process continues to increase, they have realised the shortcomings of cloud computing in some situations.

Edge computing is designed to help solve some of those problems by existing closer to where computing needs to happen, next to where the data is generated. For this reason, edge computing can be used to process critical time-sensitive data in remote locations with limited or no connectivity and then send results to a centralised location, faster. In those situations, edge computing can act like a mini data centre.

Edge computing will increase as use of Internet of Things (IoT) devices increases. By 2030, the global edge computing market is expected to reach $155.90 billion.

However, this isn’t merely a paradigm for IoT. There are huge volumes of data being generated for science and research, with teams worldwide working on them. Moving that data to a remote site takes time and energy (with approx. 20,000 TWh being used worldwide and ~1% of that being data transfer) before any beneficial CPU cycles have been expended on processing it. It may be more prudent to move the compute toward the data and process it remotely.

What this brings with it, however, is an increasing requirement for platforms, standards, and organisational collaboration to allow this shared infrastructure to solve these huge international problems. Cluster federation, including edge cluster federation, and the ability to share these resources with multiple groups are likely to become ever more important.

Quantum Computing

As we move forward with technology, quantum computing appears to be the next notable development. Quantum computing makes use of quantum phenomena such as superposition and entanglement to compute. Fields where quantum computing is finding applications include banking and finance (to manage credit risk, and for high-frequency trading and fraud detection), vaccines, and large modelling/simulation requirements such as coronavirus modelling for populations, among many others.

For certain problems, quantum computers promise to be many times faster than regular clusters, and many companies are now involved in making innovations in the field of quantum computing. The revenues for the global quantum computing market are projected to surpass $4.75 billion by 2029. The power of this technology is relatively stark: processes that take some people months or years can potentially be completed in hours or minutes. Clearly this is a transformative technology.

Does that mean that Quantum will be everywhere in the next few years?

Without materials that can support a quantum state without being supercooled, the answer is probably not. What is more likely to happen is that centralised quantum resources will be made available to existing large-scale computing environments in the same way as federated edge resources. What this brings with it is an increased need to support the federation of varied, increasingly exotic technologies, so that “the best tool for the job” can be chosen and the technologies can complement each other.

There are further concerns with quantum too. Despite funding towards the technology itself being clearly bullish, the talent pool to operate such technologies is lagging behind. To make the most of this new technology, there will be a greater need to understand quantum mechanics, linear algebra, probability, information theory, and machine learning, and these are clearly not topics in which knowledge and understanding are easily acquired.

Virtual Reality and Augmented Reality

The next exceptional technology trend is Virtual Reality (VR) and Augmented Reality (AR). VR immerses the user in an environment whilst AR enhances their environment. Although this technology trend has primarily been used for gaming thus far, it has also been used for training, for example being used to train U.S. Navy, Army and Coast Guard ship captains.

In 2023, we can expect these forms of technology to be further integrated into our lives. Usually working in tandem with some of the other emerging technologies we’ve mentioned in this list, AR and VR have enormous potential in training, entertainment, education, marketing, and even rehabilitation after injury. Either could be used to train doctors to do surgery, offer museum goers a deeper experience, enhance theme parks, or even enhance marketing. The global AR and VR market is expected to grow to $461.25 billion by 2030.

AR will become ever more useful for situations where access to key information remotely is important. As a key example, with increasing complexity of clusters it will become ever more important to understand which components are broken (as the law of averages with an increasing number of components will be that more fail), and what device is connected to which switch in a live format. The last thing any engineer in a situation with an extremely complex cluster wants is to turn off or unplug the wrong server or cable. AR should help to alleviate this problem.

VR will become ever more useful in visualising scientific experiments, which will drive understanding of complex simulations and models. Part of the challenge with creating complex models and complex simulations is being able to navigate that visualisation effectively, even more so with the advent of digital twins. Organisations that can efficiently and effectively provide means of navigating in VR will have a serious advantage.

Blockchain

Although most people think of blockchain technology in relation to cryptocurrencies such as Bitcoin, blockchain offers security that is useful in many other ways. In the simplest of terms, blockchain can be described as data you can only add to, not take away from, or change. Hence the term “chain” because you’re making a chain of data.

Not being able to change the previous blocks is what makes it secure, with the ability to have a fully auditable trail. In addition, blockchains are consensus-driven, so no one entity can take control of the data. With blockchain, you don’t need a trusted third party to oversee or validate transactions.
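The "add only, never change" property can be illustrated with a simple hash chain, where each block stores the hash of the block before it. The sketch below is a toy example under stated assumptions: the `Block` class, the `append` and `is_valid` helpers and the sample records are all hypothetical, and it deliberately leaves out the consensus mechanism that a real blockchain relies on.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    index: int
    data: str
    prev_hash: str

    def digest(self) -> str:
        """SHA-256 over the block's contents, including the previous hash."""
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append(chain: list[Block], data: str) -> None:
    """Add a new block that points at the hash of the current last block."""
    prev_hash = chain[-1].digest() if chain else GENESIS_PREV
    chain.append(Block(len(chain), data, prev_hash))


def is_valid(chain: list[Block]) -> bool:
    """Check that every block points at the true hash of its predecessor."""
    expected_prev = GENESIS_PREV
    for position, block in enumerate(chain):
        if block.index != position or block.prev_hash != expected_prev:
            return False
        expected_prev = block.digest()
    return True


# Hypothetical provenance records for illustration only.
chain: list[Block] = []
for record in ["run 1: input checksum", "run 1: output checksum", "run 2: input checksum"]:
    append(chain, record)

# Rewriting an earlier block changes its hash, so the next block's
# stored prev_hash no longer matches and validation fails.
tampered = list(chain)
tampered[1] = Block(1, "run 1: falsified output", chain[1].prev_hash)

print("original chain valid:", is_valid(chain))
print("tampered chain valid:", is_valid(tampered))
```

Note that this check only catches changes to blocks that have a successor. Protecting the most recent block, and deciding whose copy of the chain is authoritative, is where consensus between participants comes in.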

So, what relevance does this have to HPC? Well, it all comes down to the topic of provenance and proof of execution. Science is based upon evidence and data. Reproducing scientific applications and ensuring code execution provenance is paramount, and the same must be said for data. The fidelity of data provenance is often overlooked: there is often no way to ensure that the provenance of the data itself has not been fabricated or falsified. Even more so, it being immutable and autonomous is fundamental to ensuring scientific discoveries are trustworthy and reproducible in future.

Blockchain architecture could be the solution to this conundrum, and papers have been published that attempt to address this, which look promising. The challenge will be adopting these processes, and their autonomy will be important to reduce the workload on staff who are already challenged for time.

Web 3.0 Technologies

The advancement of internet protocols as part of Web 3.0 is giving rise to new ways of doing computation.

The Distributive Compute Protocol (DCP) is a compute middle-layer that makes it very easy (five lines of code) to deploy computational workloads in parallel across idle devices all around the world, such as PCs, servers, or even IoT devices. It could effectively unlock the next generation of cycle scavenging and High-Throughput Computing.

Technologies that not only request the compute but also distribute the data efficiently are now coming to the fore. IPFS is becoming ever more prevalent as a way to distribute data efficiently in a peer-to-peer manner, keeping requested data local to the compute requesting it.

Hot data would be closer to the source of the compute request, making it more efficient to operate over large geographical footprints. Security concerns regarding this data can be addressed by creating private, hashed and encrypted networks, making it possible to distribute data globally between worldwide projects. The rise of decentralised infrastructure cannot be ignored; instead, it could be embraced to make computing more geographically diverse, resilient, and often more performant.

Internet of Things (IoT)

As consumers, we’re already using and benefitting from IoT. We can lock our doors remotely if we forget to when we leave our houses, check remote cameras for security purposes and preheat our ovens on our way home from work, all while tracking our fitness on our Fitbits. However, businesses also have much to gain now and soon. The IoT can enable better safety, efficiency and decision making for businesses as data is collected and analysed. It can enable predictive maintenance, speed up medical care, improve customer service, and offer benefits we haven’t even imagined yet.

And we’re only in the beginning stages of this new technology trend: forecasts suggest that by 2030 around 125 billion of these IoT devices will be in use around the world, creating a massive web of interconnected devices spanning everything from smartphones to kitchen appliances. Global spending on the Internet of Things (IoT) is forecast to reach between 5.5 and 12.6 trillion U.S. dollars in 2025. Further technologies such as 5G are expected to drive market growth in the coming years.

With this continued exponential growth in interconnected devices will come a huge growth in data. And with data comes the need to store it, compute on it, and make sense of it.

5G

The next technology trend that follows the IoT is 5G. Where 3G and 4G technologies have enabled us to browse the internet, use data-driven services, enjoy increased bandwidth for streaming on Spotify or YouTube, and so much more, 5G services are expected to revolutionize our lives by enabling services that rely on advanced technologies like AR and VR, alongside cloud-based gaming services, and much more. 5G is expected to be used in factories, in HD cameras that help improve safety and traffic management, in smart grid control and in smart retail too.

Just about every telecom company is now working on creating 5G applications. 5G Network subscriptions will reach ~4.4 billion by the end of 2027.

What 5G brings with it, alongside the IoT advantages, is the ability to offer different access and control paradigms: the ability to provide emergency out-of-band access to remote clusters, for example, along with edge computing opportunities.

Cyber Security

Cyber security might not seem like an emerging technology, given that it has been around for a while, but it is evolving just as other technologies are. That’s in part because threats are constantly new. Parties who are trying to illegally access data are not going to give up any time soon, and they will continue to find ways to get through even the toughest security measures. It’s also in part because new technology is being adapted to enhance security. If we have data that others want to access, cybersecurity will remain a trending technology because it will constantly evolve to defend against the wrong people gaining access.

Specific cybersecurity risks such as supply chain attacks will be discussed further in this article. Suffice to say, to quote Spider-Man, “With great power comes great responsibility”: with a technological power such as large-scale computing, and access to data that is extremely valuable, the security of these environments is critical.

The Silicon Valley Tech Slowdown

You’ve been fired alongside ~50% of the company on Friday, out of the blue. On Monday however you get a call, telling you that you can get your old job back, but you have an hour to decide. It sounds crazy, but this is not made up: it’s what’s happened at Twitter only recently. Elon Musk reportedly sorted Twitter engineers by “lines of code written in the last year and fired the bottom X%,” which is interesting because that is a bit of a skewed measure for a variety of reasons. It would essentially mean that people working on the hardest problems (and possibly some of the most talented) were likely let go.

Why is this a major concern? Well, there is one group this is particularly true for at Twitter: the Machine Learning Ethics, Transparency and Accountability Team (META for short, which I hope was a cheeky reference to Facebook!). This group was responsible for exploratory work on algorithmic transparency, fairness, bias, etc. Those are extremely difficult problems, and ones where broad agreement on standards and methods is crucial.

There are researchers and academics galore (alongside certain companies) boldly taking the issue head-on (we only have to look at the DCMS AI Standards Hub mentioned in part 1 of this blog series), but Twitter has something many don’t which is the data (going back to my earlier topic of bringing the compute to the data and not the other way around, and the fact that data is now being treated as a valuable commodity).

This is the conundrum. Whilst the world wants to address these issues, the data that could inform them is usually intellectual property (yes, you are the product) and it’s never made accessible when issues do arise. So, if data is the lifeblood of AI (which it is) then we need groups and teams at the source, or to find better ways to share the data (see the technologies section and IPFS for reference).

This layoff issue isn’t limited to just Twitter either. There is a technology sell-off and slowdown occurring within private industry.

Mark Zuckerberg has just laid off more than 11,000 employees at Meta Platforms, about 13 per cent of its global workforce, in what he described as “some of the most difficult changes” in the company’s history. He has called for “a meaningful cultural shift in how we operate”. When people like that mention it, others should listen.

For a bit more clarity, here are the most recent layoffs across the big-tech or Silicon Valley industry. Overall, there have been ~136k tech layoffs so far in 2022:

Image ref: Layoffs.fyi – Tech Layoff Tracker and Startup Layoff Lists

Twitter: 50% lay-offs
Stripe: 14% lay-offs
Meta: 13% lay-offs
Lyft: 13% lay-offs
Asana: 9% lay-offs
Salesforce: 1.3% lay-offs
Amazon: 3% lay-offs
Microsoft: <1% lay-offs

This does look like a major slowdown within big tech, and many would see this as a concern for the tech industry. However, these layoffs could be seen as a “redistribution of wealth” event. CNBC in the US reported that US payrolls surged by 261k. Whilst this is the slowest pace of employment growth since December 2020, people are clearly still being added to payrolls even in the face of big-tech layoffs, and these talented people must end up somewhere.

Imagine the talent that is now back within the available talent pool: people burned by private organisations willing to cut thousands at short notice, who truly wanted to make a difference. Anyone who has worked at a major social media platform, I daresay, knows of the platform’s power to shape the world that uses it, and so often has a deep belief that they are helping change the world.

What this talent pool needs now more than ever is a viable alternative they could use their talents for that gives them stability, consistency, the same (or approaching the same) tooling, and a true feeling of making a difference again. Realistically, that is something research and academia are tailor-made to offer.

This big-tech slowdown trend is likely to continue, but with it comes an increase in the talent pool available to potentially go into research. The challenge is to offer opportunities that are attractive enough to help this redistribution of expertise.

Open Source and Language Trends

Infrastructure-as-code (IaC) is picking up steam, with developers leveraging the HashiCorp Configuration Language (HCL), Shell, and the Go language (Golang) heavily this year, according to GitHub’s State of the Octoverse report for 2022. The annual report explores software development across GitHub’s code-sharing repositories.

The popularity of these tools points to the growing presence of operations communities in the open-source realm. Open source has historically been more centred on developers, says the GitHub report, released on November 9. In fact, HCL was the fastest-growing language on GitHub at 56.1%; this growth was driven by the popularity of the Terraform tool for IaC. The report cites IaC momentum as one of several trends.

Another notable trend is the presence of commercially backed open-source projects on GitHub. The most successful projects have salaried developers making regular contributions. First-time open-source contributors also tended to favour commercially backed projects. Flutter, Next.js, and Visual Studio Code are all examples of company projects that have become an integral part of GitHub’s developer ecosystem. GitHub also found that 30% of Fortune 100 companies have open-source program offices, and 50% of first-time contributors work on commercially backed projects.

Overall, GitHub cited the following usage numbers on its platform:

JavaScript remained the most-used language on GitHub.
PHP use declined while Python use increased by 22.5% in 2022.
Python was the second-most-used language, followed by Java and TypeScript.
94 million developers are using GitHub.
The Rust community grew by more than 50%.
More than 90% of companies use open source.
90% of Fortune 100 companies use GitHub.
There were 413 million open-source contributions in 2022.
More than 20.5 million people joined the site this year, with India having the largest developer population growth.

When the first Octoverse report was released 10 years ago, GitHub reported there were 2.8 million individuals using the site. The growth to 94 million developers now is clearly huge.
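To put that growth into perspective, the two figures can be turned into an average yearly growth rate. This is a back-of-the-envelope sketch that assumes growth compounded evenly over exactly 10 years, which real user numbers will not have done; the `annual_growth_rate` helper is illustrative only.

```python
def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values over a number of years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted from the Octoverse report: 2.8 million users then, 94 million now.
github_growth = annual_growth_rate(2.8e6, 94e6, 10)

print(f"Average growth: {github_growth:.1%} per year")
```

Under that even-compounding assumption, the user base grew by roughly 42% per year, every year, for a decade.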

So, what does this tell us from an insights perspective beyond the facts and figures?

Well, what it tells us very clearly is that there is a huge community out there who are Git literate, for starters. This community is comfortable managing repositories, so platforms looking at their direction of travel would be well advised to leverage this key skillset.

In addition to this skillset, the rise in IaC content on GitHub also shows that there’s a growing demand platform-wise to embrace composable environments. This is not something that should be taken lightly, with the very regularly cited skills shortage being a real thing. Addressing this shortage within academia is more a case of adapting our environments to suit the direction of travel of skills within the market, as opposed to citing the lack of skills in doing things a particular way. The environment should be flexible enough to adapt to the use-case, not brittle and unable to change.

That isn’t just a concern for end users. It also concerns administrative teams, how systems are architected, and our willingness to change and adapt to the world around us to help solve critical problems (which is why we got into HPC in the first place, right?). Embracing this change is what really could kickstart the next generation of research going forward.

The growth of both Rust and Python cannot be ignored. It highlights the growing demand for platforms to be easier to code for and, in the case of Rust, more performant, with greater clarity for developers about where they are going right or wrong thanks to better messaging from an error-handling perspective. From a security perspective, it also follows good-practice recommendations from the NSA regarding memory-safe coding.

Over the past several years, and the past year in particular, supply chain security in the open-source ecosystem has become a large point of focus for the broader open-source community, including the many companies and governments that rely on open-source software. A Linux Foundation conference recently said so explicitly: the open-source community and its ecosystem are under attack. The OpenSSF has recently grown by 15 new members, reflecting the seriousness of this risk.

As an industry and community, we have seen bad actors take over user accounts, corrupt popular open-source dependencies, and take advantage of vulnerabilities in some of the biggest open-source projects.

It’s also no secret that a lot of our modern digital infrastructure runs on open source. The success of open-source software (OSS), in part, comes down to the speed at which it’s developed by a global community of developers. But this speed can come at a severe cost if developers inherit the vulnerabilities in their supply chain. The risk of a high-profile supply chain attack may cause developers, teams, and organisations to lose trust in open source.

Therefore, it’s paramount to ensure that checks and balances are in place to mitigate these risks and protect the underlying infrastructure and the valuable data that we mentioned already.

One of the biggest advantages of open-source software (OSS) is that it can reduce duplicative efforts, which is crucial in time-sensitive humanitarian crises. OSS can also bring together global communities and facilitate inclusive design and development of technology solutions that support diverse global populations.

Ok, let’s talk about money:

“Crucial parts of the open-source infrastructure are maintained by a few underpaid, overworked individuals that often do it for free. And that isn’t right.”

– Wolfgang Gehring, FOSS Ambassador, Mercedes-Benz Tech Institute

This is a key concern. In fact, it is often a key concern whenever a company or even an employee in general mentions that they are going to go open source, with the question being “Well how are you going to make money?”

As an industry we need to ensure open source is supported. Open-source maintainers are giving back to the community and fostering innovation. These are the exact people we need to be supporting, or else things will go back to being private, closed, and proprietary. We’ve come from there, and it’s not a great place to be.

Companies within our industry have a part to play here. Any company that benefits from an open-source project should endeavour to financially support that project wherever it can, because its profits and progress rely on that project.

Some companies give away their solutions open-source and then provide enterprise-level service assurance to customers whose service requirements demand it. This is a good model, and one that within HPC environments we are not averse to (take SchedMD providing support for Slurm for example). This model not only fosters community but supports ongoing development and innovation.

That pay-it-forward mentality needs to continue and grow within our industry. As the old phrase goes “A rising tide raises all boats” – We need to ensure that it really does for the good of everyone within our industry.

Artificial Intelligence (AI)

We would be neglectful to forget the impact of AI and its trends. Applied AI uses intelligent applications to solve classification, prediction, and control problems to automate, add, or augment real-world business use cases. As AI technologies rapidly push new frontiers of innovation, business adoption continues to grow across use cases.

AI is continuing to grow massively. It is predicted to contribute $15.7 trillion to the economy by 2030, with 45% of these economic gains coming from AI-driven improvements to existing products, stimulating customer demand.

This is a very disruptive technology, and one that directly influences large-scale computing, including HPC. No longer should AI and Machine Learning be considered as being separate from the HPC environment, but instead key to complementing and enhancing the impact and effectiveness of traditional HPC workloads.

Environmental and Building Trends

Climate change is having a distinct impact on the development of advanced large-scale computing, how these systems will be built, and where they will be built in future. We touched on this in part 1 of this series.

There is a trend continuing to occur where the “cattle, not pets” methodology is beginning to take hold at ever higher abstractions.

Instead of building large facilities dedicated to housing compute and storage, the paradigm is increasingly shifting towards moving the compute towards the data source at the collection point, or leveraging the ability to cool systems in more efficient ways such as situating data centres underwater, for example with Microsoft’s Project Natick completing its second phase. This isn’t a new trend either.

This is also happening in wider building trends, with prebuilt approaches (and even containers!) being used to rapidly change the face of our cities, towns and people’s homes and sporting facilities (if you’ve not watched Grand Designs where prebuilt buildings are shipped in, you’re missing out!). This brings with it efficiencies and speed that simply can’t be matched.

Once we stop treating the data centre and the cluster itself like a pet (how many still name their physical clusters?), then our decision process becomes different, and we begin to think about how to be the most efficient.

Inevitably that process begins to look like units that can be prebuilt and dropped into a location.

Instead of retrofitting capability to a bricks-and-mortar building to do heat reclamation and make it more energy efficient, there is significant mileage in building the cluster itself in a containerised manner to do all this efficiently out of the box.

Think of all the money, time, and resources spent on building grand facilities that then generally become inflexible when it comes to increasing capacity or adapting to be more efficient. Let’s give an example. Say it takes 5 years to build out a facility, at a multi-million-dollar cost in planning and resources. That’s a cost sunk into a static asset, something that has not yet done a single cycle of computing.

Then it takes, let’s say, 90 days to ship the kit in, move it into a hall, rack, stack and cable it, attach it to water feeds etc., and install, commission and set it up to run workloads. So, that is 5 years and 3 months before a cycle is used in anger to achieve your task.

Ok, now let’s build a foundation with all the required power, cooling feeds, etc. It looks like a glorified car park, right? However, building that takes 3 months. It gives you the opportunity to drop in custom-built units that leverage that capability, and that equipment can be built in parallel with the foundations, and at a faster rate, because the supplier can build it elsewhere without depending on the building. So, you get a super-efficient cluster in… 3 months, using fewer environmental resources and at a lower cost than building out a huge sprawling facility. Bear in mind that deploying from factory to operation for Natick takes 90 days, and that is underwater.

And when you want to either increase or build out a different system? You can do that, then effectively remove the old one and recycle it by craning it out.

Want your site closer to the equipment that collects the data? Simply build out your foundation for it next to the equipment and site it right next to the equipment.

Sounds significantly more attractive than waiting 5+ years at a significantly greater cost before you can even get any cycles out of your investment, right?
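The comparison above boils down to sequential versus parallel phases: a traditional build adds its phases together, whereas a modular build only waits for the slowest of the phases running side by side. The sketch below works through the example figures from this post. The 3-month unit build time is an assumption (loosely based on the 90-day Natick figure), and, like the example above, it ignores the time taken to drop the finished units onto the foundation.

```python
MONTHS_PER_YEAR = 12


def sequential_months(*phases: float) -> float:
    """Phases that must happen one after another: durations add up."""
    return sum(phases)


def parallel_months(*phases: float) -> float:
    """Phases that run side by side: the slowest one sets the pace."""
    return max(phases)


# Traditional approach: build the facility, then ship, rack and commission.
facility_build = 5 * MONTHS_PER_YEAR  # 5 years, from the example above
install_and_commission = 3            # roughly 90 days
traditional_total = sequential_months(facility_build, install_and_commission)

# Modular approach: foundation and prebuilt units are produced in parallel.
foundation_build = 3                  # from the example above
unit_build = 3                        # assumption: supplier needs about 90 days
modular_total = parallel_months(foundation_build, unit_build)

print(f"Traditional: {traditional_total} months before the first cycle")
print(f"Modular:     {modular_total} months before the first cycle")
```

With these example figures the traditional route takes 63 months and the modular route 3 months. Even if the supplier took a full year to build the units, the modular total would only rise to 12 months, because the foundation work is hidden behind it.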

These trends don’t only limit themselves to being in-situ on Earth. The EC-led feasibility study, the Advanced Space Cloud for European Net-zero emission and Data sovereignty project, or ASCEND, is part of the EU’s Horizon Europe initiative and has a 2-billion-euro budget. The project’s goal would be to power data centres in space using solar energy, which could generate hundreds of megawatts, and to transport the data to and from Earth using optical cabling.

The first goal will be to identify whether this is feasible, and whether the emissions created by launching into space would still make this more carbon efficient, but optimism remains high that this could be viable. Technologies such as SpinLaunch, which offer the ability to slingshot equipment out of our atmosphere, could open up a new opportunity if they are powered via green methods and if the equipment can withstand the forces experienced.

Summary 

Given these technology, industry and wider societal shifts, the future of large-scale high-performance computing has some extremely interesting choices and options available. The need to embrace change within our industry and adapt to the world around us is ever present and hopefully these trends indicate the direction that could be taken going forwards.

In the final post of this series, we will look at opportunities that these trends provide us to address the challenges posed in Part 1 of this series.

Q Associates and Logicalis can help demonstrate solutions and strategies to help you address these issues, empower you to deliver on them at the right cost and support your organisation in getting to results, faster. Get in touch and find out more about our offerings and expertise. 

Email chris.coates@qassociates.co.uk or call 01635 248181.