Virtual Threads in Java: A Gateway to Cost-Effective Cloud Solutions



As cloud computing continues to grow, companies are constantly seeking ways to optimise their applications to handle more users, perform faster, and consume fewer resources. Traditional methods of scaling, such as adding more hardware or optimising existing code for better performance, often come with significant costs and complexity. Java 21 gives us a new way to manage thousands, or even millions, of threads with minimal overhead, providing a scalable and cost-effective solution for modern applications.

One of the most recent innovations in the Java platform, Virtual Threads, introduced in Project Loom, promises to revolutionise how we handle concurrency and parallelism. In this article, we will explore how Virtual Threads can help lower operational costs in cloud environments.

We will go through:
  • Concurrency vs. Parallelism
  • Threads in Java
  • The Problem with Traditional Threads
  • Scaling Solutions and Their Costs
  • Blocking I/O
  • Non-Blocking I/O
  • The Promise of Virtual Threads
  • Virtual Threads and Cost Reduction in Cloud Environments

Concurrency vs. Parallelism

Before delving into the specifics of Virtual Threads, it's important to understand the fundamental concepts of concurrency and parallelism, as they play a crucial role in application responsiveness and performance.

Figure 1: A comparison of concurrency and parallelism with dots on a stripe
What is Concurrency?

Concurrency refers to the ability of an application to handle multiple tasks simultaneously, making progress on more than one task at a time. This doesn't necessarily mean that these tasks are being executed at the exact same moment. Instead, concurrency involves managing multiple tasks in a way that allows them to progress without waiting for one another to finish completely. This approach improves the application's responsiveness, as it can quickly switch between tasks, providing a smoother experience to the user.

What is Parallelism?

Parallelism, on the other hand, is a subset of concurrency where tasks are executed simultaneously, typically on different CPU cores. This means that multiple tasks are literally running at the same time. Parallelism is particularly beneficial for performance, as it allows for the full utilisation of multi-core processors, enabling applications to perform complex computations faster. While concurrency improves responsiveness by managing multiple tasks effectively, parallelism boosts performance by leveraging simultaneous execution.
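To make the distinction concrete, here is a minimal sketch in Java: the same summation run first sequentially, then in parallel across the available CPU cores via a parallel stream. The class name and the range size are illustrative choices, not from the article.

```java
import java.util.stream.LongStream;

public class ParallelismDemo {
    public static void main(String[] args) {
        // Sequential: a single thread works through the whole range
        long sequentialSum = LongStream.rangeClosed(1, 1_000_000).sum();

        // Parallel: the range is split across CPU cores and summed simultaneously
        long parallelSum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();

        // Both compute the same result; only the execution strategy differs
        System.out.println(sequentialSum + " == " + parallelSum);
    }
}
```

Whether the parallel version is actually faster depends on the workload size and the number of cores; for tiny workloads the splitting overhead can dominate.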

Threads in Java

Java, as a language, has been with us for over 25 years. From the very early days, it provided a straightforward abstraction for creating threads. Threads are fundamental to concurrent programming, allowing a Java program to perform multiple tasks or operations concurrently, making efficient use of available resources, such as CPU cores. Using threads, we can increase the responsiveness and performance of our applications. Creating a Java thread is easy and is part of the language itself.
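As a reminder of how simple that abstraction is, here is a minimal, hedged example of creating and starting a thread; the class name is illustrative.

```java
public class HelloThread {
    public static void main(String[] args) throws InterruptedException {
        // Thread creation has been part of the language since the early days
        Thread worker = new Thread(() ->
                System.out.println("Running in: " + Thread.currentThread().getName()));
        worker.start(); // ask the JVM (and ultimately the OS) to run it
        worker.join();  // wait for the worker to finish
    }
}
```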

Java uses a virtual machine, commonly called the JVM (Java Virtual Machine), which provides a runtime environment for Java code. The JVM is a multithreaded environment, meaning more than one thread can execute a given set of instructions simultaneously, giving us an abstraction of OS threads (operating system threads). Many applications written for the JVM are concurrent programs, like servers and databases, that serve many requests concurrently and compete for computational resources.

The Problem with Traditional Threads

Before Project Loom, each Java thread was directly tied to an OS thread, known as a Platform Thread. This one-to-one relationship meant that an operating system thread only became available for other tasks when the corresponding Java thread completed its execution. Until that point, the operating system thread remained occupied, unable to perform any other activities.

Figure 2: Java vs. OS picture

The issue with this approach is that it is expensive from many perspectives. For every Java thread created, the operating system needs to allocate resources, such as memory, and initialise the execution context. These resources are limited and should be used cautiously. The thread pool pattern helps here: a set of threads is initialised in advance and ready for use. No extra time is spent starting up a new thread because it is already running, which improves performance.
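A minimal sketch of the thread pool pattern using the standard `ExecutorService`; the pool size and task count are arbitrary choices for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Four threads are initialised up front; the ten tasks reuse them
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println(
                    "Task " + taskId + " ran on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                            // no new tasks accepted
        pool.awaitTermination(5, TimeUnit.SECONDS); // wait for the queued tasks
    }
}
```

The thread names in the output repeat, showing that ten tasks were served by only four threads.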

Figure 3: Mapping of a JVM thread pool onto OS threads

In a typical Java enterprise application following the thread-per-request model, when a user request comes in, the application server (e.g., Tomcat, WebLogic, etc.) associates this request with a single Java thread from the pool. The thread handles the request from start to end. However, if a thread performs a blocking operation (e.g., talking to a database or communicating with a service over HTTP), it cannot return to the pool until the processing is complete. This matters because, as already mentioned, there is a maximum number of threads that can be created; they are a limited resource, and exhausting them causes performance issues, especially under high load.

Figure 4: A web application interacting with Database 1, Database 2, Microservice 1, Microservice 2, and an external API

The design and architecture of our application will depend on the number of concurrent users that the application is supposed to process. If there are millions of users, we must ensure that our application doesn't overload its resources, like memory, CPU, database connections, and so on. When too many users are trying to use the application at once, it can reach a point where all available processing threads are in use. When this happens, any new requests have to wait until a thread becomes available. This can cause the application to slow down or even freeze if it's handling a lot of traffic.

Scaling Solutions and Their Costs

Before the existence of virtual threads, there were two options for addressing scalability issues in concurrent programming in Java. One was to add more hardware by scaling horizontally or vertically; the other was to rewrite the application around non-blocking I/O, discussed later.

Vertical Scaling: Deploying the application on a more powerful machine, VM, or container. This involves increasing resources like CPU, memory, and disk space. However, this approach has its limits and increases costs, especially in a cloud environment, and it cannot address extreme scalability requirements, such as an application that has to support 1 million users.

Horizontal Scaling: Increasing the number of application nodes. This approach has no hard limit, but it is costly. It involves adding more nodes as scalability needs increase, which has become standard practice.

Figure 5: The difference between vertical and horizontal scaling, shown as one big and one small server for vertical and three small servers for horizontal

Vertical and horizontal scaling have become easier to implement with the rise of cloud computing, as companies have moved their applications to cloud infrastructure. The cloud also provides the ability to scale horizontally automatically based on parameters like CPU utilisation, so businesses don't have to buy new hardware; they simply rent it for a short period of time.

So you might think that at this point the problem is effectively solved, and it is true, the problem is solved, but the solution is far too costly. Renting or buying additional, more powerful machines is an expensive endeavour, especially when it comes to supporting millions of users at a time. Cloud providers like AWS, Microsoft Azure, and Google Cloud benefit when more VMs are added, increasing their revenue. It is therefore crucial to make sure that a single application instance is already optimised for the highest scalability possible before resorting to vertical or horizontal scaling.

One of the critical areas to consider for optimising an application is how it handles I/O operations. There are two main approaches: Blocking I/O and Non-Blocking I/O.

Blocking I/O

In a blocking I/O model, when a thread performs an I/O operation, it is put to sleep until the operation is completed. This approach is simple and straightforward: execution proceeds sequentially, and it's easy to wrap your head around. But it can lead to inefficient resource usage, especially under high load.

Figure 6: Pseudo code for blocking I/O

When you look at this diagram, you can see the thread executing from top to bottom, with time progressing downwards. The application server pulls the thread from the pool to execute the user request. During execution, the thread is sometimes scheduled by the CPU for processing and sometimes blocked for various reasons, the most common being I/O operations.

In the diagram, green indicates that the CPU is being utilised by the thread for processing. In contrast, red indicates that the thread is blocked due to I/O operations. For instance, when the thread makes a call to the database to fetch data, it turns red while waiting for the database to respond. This waiting period is considered wasted time for the thread, since it cannot perform other tasks during this time.

You can see three blocking calls made by the thread. In other words, this thread, which could be handling a new incoming user request, is waiting on a blocked I/O operation. During these periods, the thread is idle, consuming resources unnecessarily, as it is unable to perform other tasks while waiting for a response. Depending on the case, this response time can be 1 ms, 1 s, 10 s, or more. The longer the response time, the longer the thread remains idle, blocking resources. This idle time represents a non-optimal use of system resources.
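The thread-per-request flow in the diagram can be sketched as plain blocking code. The three service calls below are hypothetical stubs, with `Thread.sleep` standing in for the waiting (red) phases:

```java
public class BlockingHandler {
    // Hypothetical stubs for the diagram's calls; sleep() simulates the wait
    static String fetchUserFromDatabase(long id) throws InterruptedException {
        Thread.sleep(50); // thread blocked (red): waiting on the database
        return "user-" + id;
    }

    static String fetchOrdersOverHttp(long id) throws InterruptedException {
        Thread.sleep(50); // thread blocked (red): waiting on a remote service
        return "orders-of-" + id;
    }

    static String writeAuditLog(String payload) throws InterruptedException {
        Thread.sleep(50); // thread blocked (red): waiting on log I/O
        return payload;
    }

    static String handleRequest(long id) throws InterruptedException {
        String user = fetchUserFromDatabase(id);  // blocking call 1
        String orders = fetchOrdersOverHttp(id);  // blocking call 2
        return writeAuditLog(user + ":" + orders); // blocking call 3; joins are CPU work (green)
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handleRequest(42L)); // prints user-42:orders-of-42
    }
}
```

For the roughly 150 ms this request takes, the handling thread does almost no CPU work yet cannot serve anyone else.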

Advantages:
  • Simplicity: Easy to write, read, and understand.
  • Sequential Logic: Follows a linear flow, which is easier to debug and maintain.
Disadvantages:
  • Inefficiency: Threads are idle while waiting for I/O operations to complete, leading to poor resource utilisation.
  • Scalability: Limited by the number of threads that can be created and managed by the system.

Non-Blocking I/O

In a non-blocking I/O model, threads can initiate an I/O operation and then continue executing other tasks. This approach allows better utilisation of system resources and can handle a larger number of concurrent I/O operations. The programming paradigm changes rather dramatically: the application should use non-blocking API calls in the code instead of blocking API calls.

Figure 7: Pseudo code for non-blocking I/O

However, non-blocking I/O introduces complexity: this programming model is clearly less intuitive than what we, as developers, are used to. This is exactly why many developers have a hard time understanding non-blocking I/O; we are simply not accustomed to it. Although the non-blocking coding style does solve the problem of blocked threads, it comes at the cost of very high complexity for programmers. It's easy to make mistakes and difficult to debug them. If we accidentally make a blocking call within one of the callback handlers, it won't be easy to detect.
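To illustrate the shift in style, here is a hedged sketch using `CompletableFuture`, the standard library's building block for asynchronous pipelines. The fetch methods are hypothetical stubs rather than real database or HTTP clients:

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingHandler {
    // Hypothetical async stubs; real code would wrap non-blocking DB/HTTP clients
    static CompletableFuture<String> fetchUserAsync(long id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<String> fetchOrdersAsync(String user) {
        return CompletableFuture.supplyAsync(() -> "orders-of-" + user);
    }

    public static void main(String[] args) {
        // The calling thread never blocks; each step runs as a callback
        CompletableFuture<String> report = fetchUserAsync(42L)
                .thenCompose(NonBlockingHandler::fetchOrdersAsync) // chain the dependent call
                .thenApply(orders -> "report:" + orders);          // transform the result

        System.out.println(report.join()); // prints report:orders-of-user-42
    }
}
```

The `join()` at the end is only for the demo; in a real server the result would be handed to another callback, and the control flow quickly fragments across handlers, which is exactly the complexity described above.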

Advantages:
  • Resource Efficiency: Threads are not blocked and can perform other tasks while waiting for I/O operations to complete.
  • Scalability: Can handle a larger number of concurrent operations, as it is not limited by the number of threads.
Disadvantages:
  • Complexity: More difficult to write and understand due to the asynchronous nature and potential callback hell.
  • Error Handling: More complex error handling and state management.

The Promise of Virtual Threads

As of Java 21, developers have a new way to solve this problem. An entirely new implementation of the Java thread has been created, called Virtual Threads. These lightweight threads do not block platform threads while waiting, making them highly efficient. This allows us to write imperative blocking code without significant concerns about scalability. The JVM can now give the illusion of plentiful threads by mapping a large number of virtual threads onto a small number of OS threads.

Figure 8: A comparison of JVM and OS

The application code in the thread-per-request style can run in a virtual thread for the entire duration of a request. The virtual thread consumes an OS thread only while performing calculations on the CPU. This results in the same scalability as the asynchronous style, but achieved transparently. The syntax remains familiar to developers, although the implementation of virtual threads is very different.

To use virtual threads, developers instruct the JVM to create a Virtual Thread instead of a traditional thread, typically in a single line of code. Virtual Threads extend the functionality of traditional threads and adhere to the same API contracts. This ensures that the rest of the code remains largely unchanged from traditional thread usage. Developers can leverage the benefits of virtual threads while maintaining the familiar syntax and programming patterns they are accustomed to. This seamless integration allows for easier adoption of virtual threads in existing codebases, promoting efficiency and scalability without requiring extensive rewrites or adjustments.
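A minimal sketch of that single line, plus the virtual-thread-per-task executor introduced alongside it in Java 21; the task count is an illustrative choice:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // One line creates a virtual instead of a platform thread (Java 21)
        Thread vt = Thread.ofVirtual().start(() ->
                System.out.println("virtual? " + Thread.currentThread().isVirtual()));
        vt.join();

        // Or one cheap virtual thread per task, with no pool sizing to tune
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(10); // blocking here releases the carrier OS thread
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
    }
}
```

Ten thousand platform threads would be prohibitively expensive; ten thousand virtual threads are routine, because while each one sleeps its carrier OS thread is free to run another.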

Figure 9: Pseudo code for a Virtual Thread

Virtual Threads and Cost Reduction in Cloud Environments

Virtual Threads offer several benefits that translate into financial savings in cloud environments:

  1. Reduced Memory Overhead: Virtual Threads consume less memory, allowing more threads to run on the same resources, reducing the number of required instances and lowering costs.
  2. Processor Efficiency: Better CPU utilisation allows more tasks to be performed per unit of time, reducing the number of instances needed to support the workload.
  3. Scalability Without Proportional Costs: Virtual Threads enable creating millions of lightweight threads, eliminating the need to over-provision infrastructure for peak loads.
  4. Reduced Latency: Applications with lower latency require fewer resources to achieve the same performance levels, resulting in operational cost savings.

Takeaways

Virtual Threads represent a significant innovation in the Java platform, providing an efficient solution to concurrency and parallelism challenges. In cloud environments, where resource efficiency directly impacts financial costs, adopting Virtual Threads can lead to substantial reductions in operational expenses. By embracing this new technology, developers and architects can create more scalable and cost-effective applications.

Furthermore, Virtual Threads simplify the development process by allowing developers to write straightforward, sequential code without sacrificing performance. This reduces the complexity associated with managing traditional threads, leading to cleaner, more maintainable codebases. Most importantly, by maximising resource utilisation, Virtual Threads can significantly reduce the number of instances required to handle the same workload. This optimisation can result in considerable cost savings on cloud infrastructure, making it an essential strategy for businesses looking to lower their operational costs while maintaining high performance and scalability.