Host managed contention avoidance storage solutions for Big Data

dc.contributor.author Mishra, Pratik
dc.contributor.author Somani, Arun
dc.contributor.department Department of Electrical and Computer Engineering
dc.date 2018-05-21T13:20:58.000
dc.date.accessioned 2020-06-30T02:02:38Z
dc.date.available 2020-06-30T02:02:38Z
dc.date.copyright Sun Jan 01 00:00:00 UTC 2017
dc.date.issued 2017-12-01
dc.description.abstract <p>The performance gap between compute and storage is considerable, resulting in a mismatch between what applications need from storage and what storage can deliver. The full potential of storage devices cannot be harnessed until all layers of the I/O hierarchy function efficiently. Despite advanced optimizations applied across the various layers along the data access path, the I/O stack remains volatile. The problems caused by inefficient data management are amplified in Big Data shared-resource environments. The Linux OS (host) block layer is the most critical part of the I/O hierarchy, as it orchestrates the I/O requests from different applications to the underlying storage. Unfortunately, despite its significance, the block layer, and in particular the block I/O scheduler, has not evolved to meet the needs of Big Data. We have designed and developed two contention avoidance storage solutions in the Linux block layer, collectively known as “BID: Bulk I/O Dispatch”, specifically suited to multi-tenant, multi-tasking shared Big Data environments. Hard disk drives (HDDs) form the backbone of data center storage. Data access time in HDDs is largely governed by disk arm movements, which usually occur when data is not accessed sequentially. Big Data applications exhibit evident sequentiality, but contention with other I/O-submitting applications multiplexes their I/O accesses, which leads to more disk arm movement. BID schemes aim to exploit the inherent I/O sequentiality of Big Data applications to improve overall I/O completion time by reducing avoidable disk arm movements. In the first part, we propose BID-HDD, a dynamically adaptable block I/O scheduling scheme for disk-based storage. BID-HDD tries to recreate sequentiality in I/O accesses in order to provide performance isolation to each I/O-submitting process.
Through trace-driven simulation experiments with cloud-emulating MapReduce benchmarks, we show the effectiveness of BID-HDD, which takes 28–52% less time to complete all I/O requests than the best-performing Linux disk schedulers. In the second part, we propose a hybrid scheme, BID-Hybrid, which exploits the superior random-access performance of SCM (SSDs) to further avoid contention at disk-based storage. BID-Hybrid efficiently offloads non-bulky interruptions from the HDD request queue to the SSD queue, using BID-HDD for disk request processing and a multi-q FIFO architecture for the SSD. This results in a performance gain of 6–23% for MapReduce workloads compared to BID-HDD, and 33–54% over the best-performing Linux scheduling scheme. As a whole, the BID schemes aim to avoid contention for disk-based storage I/O within system constraints without compromising SLAs.</p>
dc.description.comments <p>This article is published as Mishra, Pratik, and Arun K. Somani. "Host managed contention avoidance storage solutions for Big Data." <em>Journal of Big Data</em> 4, no. 1 (2017): 18. DOI: <a href="http://dx.doi.org/10.1186/s40537-017-0080-9" target="_blank">10.1186/s40537-017-0080-9</a>. Posted with permission.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/ece_pubs/184/
dc.identifier.articleid 1185
dc.identifier.contextkey 12130890
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath ece_pubs/184
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/21010
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/ece_pubs/184/2017_Somani_HostManaged.pdf|||Fri Jan 14 21:41:30 UTC 2022
dc.source.uri 10.1186/s40537-017-0080-9
dc.subject.disciplines Computer Sciences
dc.subject.disciplines Databases and Information Systems
dc.subject.disciplines Electrical and Computer Engineering
dc.subject.keywords Multi-tier
dc.subject.keywords Hard disk drives
dc.subject.keywords Solid state drives
dc.subject.keywords MapReduce
dc.subject.keywords Hadoop
dc.subject.keywords Hdfs
dc.subject.keywords Contention avoidance
dc.subject.keywords Big Data
dc.subject.keywords Storage
dc.subject.keywords Block I/O layer
dc.subject.keywords I/O scheduler
dc.title Host managed contention avoidance storage solutions for Big Data
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication edede50a-4e31-44f3-a7c7-a06dc8db42c2
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
File
Original bundle
Name: 2017_Somani_HostManaged.pdf
Size: 6.89 MB
Format: Adobe Portable Document Format