A Stochastic Approach to Determine the Optimal Number of Servers for Reliable and Energy Efficient Operation of Data Centers
Luleå University of Technology, Department of Engineering Sciences and Mathematics, Energy Science. ORCID iD: 0000-0001-5870-0112
Luleå University of Technology, Department of Engineering Sciences and Mathematics, Energy Science. ORCID iD: 0000-0003-4074-9529
Luleå University of Technology, Department of Engineering Sciences and Mathematics, Energy Science. ORCID iD: 0000-0003-4443-7653
2023 (English)
In: IEEE Transactions on Sustainable Computing, E-ISSN 2377-3782, Vol. 8, no 2, p. 153-164
Article in journal (Refereed) Published
Abstract [en]

The increasing demand for data center computational capacity in recent years has introduced new operational challenges, among them maintaining service level agreements (SLA) and quality of service (QoS) while limiting energy consumption. In this paper, a stochastic operational risk assessment approach is presented that estimates the required number of spare servers in a data center, considering the risk of server failure in operation, since servers define the computational capability of a data center. A reliability index called “risk of computational resource commitment (RCRC)” is introduced that quantifies the probability of having insufficient spare servers due to failures during the operational lead time; the complement of the RCRC indicates the ability of the resources to maintain the SLA of the data center. The failure rates of the servers are obtained using a Monte Carlo simulation with failure data published by Google in 2019. The analysis shows that the RCRC decreases as the number of spare servers increases, although additional spare servers also put pressure on the energy efficiency of the data center. The RCRC index could be used in data center operation to avoid overprovisioning and to limit the number of spare servers, while striking a suitable balance between QoS and energy consumption.
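
The RCRC idea described in the abstract can be illustrated with a minimal Monte Carlo sketch. This is not the authors' model: it assumes independent servers with exponentially distributed times to failure at a single constant rate, ignores repairs within the lead time, and treats spare servers as equally likely to fail. The function name estimate_rcrc and all of its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def estimate_rcrc(n_active, n_spare, failure_rate_per_hour,
                  lead_time_hours, n_trials=10000, seed=0):
    """Estimate the probability that failures during the lead time
    exceed the number of spare servers (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: exponential lifetime, so the probability that one
    # server fails within the lead time is 1 - exp(-rate * time).
    p_fail = 1.0 - math.exp(-failure_rate_per_hour * lead_time_hours)
    total = n_active + n_spare
    shortfalls = 0
    for _ in range(n_trials):
        # Draw an independent failure indicator for every server.
        failures = sum(1 for _ in range(total) if rng.random() < p_fail)
        # A shortfall occurs when fewer than n_active servers survive.
        if failures > n_spare:
            shortfalls += 1
    return shortfalls / n_trials


if __name__ == "__main__":
    # Hypothetical numbers: 100 required servers, 24-hour lead time.
    for spares in (0, 2, 5, 10):
        rcrc = estimate_rcrc(100, spares, 0.002, 24.0, n_trials=5000)
        print(f"spares={spares:2d}  estimated RCRC={rcrc:.3f}")
```

Under these assumptions the estimate falls as spares are added, which matches the qualitative trend reported in the abstract. Each spare would also draw power, so an operator would weigh the estimated risk against the added energy consumption.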

Place, publisher, year, edition, pages
IEEE, 2023. Vol. 8, no 2, p. 153-164
Keywords [en]
data center operation, Monte Carlo simulation, risk assessment, stochastic modeling, server failure
National Category
Other Civil Engineering; Energy Systems
Research subject
Electric Power Engineering
Identifiers
URN: urn:nbn:se:ltu:diva-94241
DOI: 10.1109/tsusc.2022.3216350
ISI: 001005680900001
Scopus ID: 2-s2.0-85163183639
OAI: oai:DiVA.org:ltu-94241
DiVA, id: diva2:1713108
Funder
Swedish Energy Agency, 43090-2
Norrbotten County Council
Note

Validated; 2023; Level 2; 2023-07-12 (sofila)

Funder: Cloudberry Datacenters project

Available from: 2022-11-23 Created: 2022-11-23 Last updated: 2025-10-21
Bibliographically approved
In thesis
1. On the Energy Efficiency and Reliability of Data Centers in Operation
2023 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The new generation of information technology (IT) services, such as mobile Internet, the Internet of Things (IoT), cloud computing, big data processing, and applications of artificial intelligence, is becoming popular with the development of the information and communication technology (ICT) industry. This industry increasingly depends on data centers to ensure quality of service (QoS). As a result, the energy consumption of data centers is rising with the demand for computational resources, because the load sections with sensitive equipment run 24 hours a day, 365 days a year. In data center operation, it is becoming a technical challenge to strike a trade-off between reducing energy consumption to limit operational costs and ensuring high reliability of the data center.

One way to help data center operators cope with these challenges is to identify the "right size of the computational resource", considering the power losses and service availability of the data center. This requires power consumption models that can represent different load sections with different types of equipment. Such models can address the electrical load demand and the power losses, especially losses in the internal power conditioning system (IPCS). The service availability of the data center, in turn, depends mainly on the availability of computational resources such as servers and on the availability of the power supply through the IPCS. It is important to characterize the servers' failure and repair times in order to develop a stochastic model of server unavailability in operation. The availability of adequate power supply through the IPCS depends on its component failures and on the power supply capacity of its components. The bottleneck in the power supply capacity of the IPCS is subject to the power losses of the equipment in the IPCS. Additionally, voltage disturbances such as voltage dips and swells in the IPCS interrupt the power supply units (PSUs) of the servers, which further degrades the QoS of the data center.

The outcomes of this thesis can be summarized as follows: 1) A comparative analysis of the energy consumption models of the major load sections in the data center, and an analysis of the impact of power losses in the IPCS on the outage probability of the servers. 2) Reliability indices to assess the adequacy of the computational resources in the data center, considering outages of the power supplies and of the servers in operation. 3) The impact of voltage disturbances in the IPCS on power supply outages, and hence on server interruptions. 4) An analysis of the trade-off between energy efficiency and reliability in the operational planning of the data center.

Place, publisher, year, edition, pages
Luleå: Luleå tekniska universitet, 2023. p. 210
Series
Doctoral thesis / Luleå University of Technology, ISSN 1402-1544
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electric Power Engineering
Identifiers
urn:nbn:se:ltu:diva-96599 (URN)
978-91-8048-307-0 (ISBN)
978-91-8048-308-7 (ISBN)
Public defence
2023-06-16, Hörsal A, Luleå tekniska universitet, Skellefteå, 09:00 (English)
Opponent
Supervisors
Funder
Swedish Energy Agency, 24559
Available from: 2023-04-17 Created: 2023-04-16 Last updated: 2025-10-21
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Uddin Ahmed, Kazi Main; Bollen, Math H. J.; Alvarez, Manuel
