Smartphone Performance Innovation | Giver Translation (기버 번역) Blog

A look at a smartphone performance innovation translation (Korean to English)

Smartphone performance innovation translation (Korean original)

1 Introduction
현재 smart device는 급속도로 대중화되었고, Phone, TV, Camera 및 Game 등 일상생활에서 접할 수 있는 모든 기기들로 확대되었다. 따라서, 이러한 smart device의 성능을 향상시키는 일은 매우 중요한 일이라고 할 수 있다. smart device의 성능 향상을 위해서는 먼저, smart device의 bottleneck을 파악해야만 한다. Performance bottleneck은 Wireless network, CPU, Memory, Storage 등 여러 가지가 있을 수 있지만, 최근의 연구에서 Storage가 가장 큰 bottleneck이라는 점이, 가장 대표적인 smart device인 Android smartphone에서 다양한 실험을 통해 확인되었다 [1].
최근의 안드로이드 스마트폰에는 고화소의 카메라 센서(800만 화소 이상)가 내장되어 있으며, 이에 따라 저장되는 정지화상 이미지 및 동영상 이미지의 해상도도 증가하고 있다. 이는 스마트폰이 contents의 측면에서 볼 때, 더 이상 단순 재생기기가 아닌 Contents Creator가 되었음을 의미하며, 실제로 안드로이드 플랫폼을 사용한 디지털 카메라도 출시되었다. 향후, Phone, TV 및 카메라 기기들을 포함한 smart device들은 HD를 넘어 Ultra-HD급 화질(3840×2160)을 촬영 및 재생하게 될 전망이다. 즉, smart device의 bottleneck으로 밝혀진 storage는 현재보다 더 빠른 속도를 요구받게 된다.
현재 안드로이드 스마트폰의 스토리지는 용도에 따라 partition을 구분하였고, 각각의 partition은 ext4, vfat, FUSE 등 다양한 file system을 사용하고 있다. Application 설치 공간 및 DB 저장 용도로 사용되는 /data partition (data partition)은 그 파일시스템으로 ext4를 사용하고 있는 반면, 사용자 data 및 Media file 등을 저장하는 용도로 사용되는 partition (user partition)은 다른 OS (Windows, Mac 등)와의 호환성을 유지하기 위해, FUSE file system을 사용하고 있다. 이러한 FUSE file system은 native file system인 ext4 위에 가상 file system layer로 구성되어 있기 때문에, 추가적인 layer로 인한 성능 저하를 유발할 수 있다. 실제로, 갤럭시S3의 eMMC partition 중 ext4를 사용하는 data partition은 33MB/s의 Write Throughput을 보이는 반면 동일한 eMMC를 사용하는 user partition은 FUSE를 사용하며 8MB/s의 낮은 Write Throughput을 보이고 있다.
사용자 data 저장용으로 사용되는 user partition의 IO 특성을 살펴보면, 사진, 동영상, 영화 등에 대한 R/W가 이루어지며 Buffered Sequential R/W가 대부분을 차지한다. 이는 Application 및 DB data 등이 저장되는 data partition의 IO 특성 (Synchronous Random R/W)과 비교했을 때 큰 차이가 난다. 즉, 이에 대한 성능 분석 및 성능 향상 방법도 차이가 나게 된다. 본 논문에서는, FUSE를 사용하는 안드로이드 스마트폰의 user partition에 대한 성능 향상을 위해, Filesystem, Buffer cache, IO scheduler, Block device, HW device layer에 걸쳐서 각각 성능 향상 요소를 발굴하였다.
구체적으로, 우리는 Android-based smartphone의 user partition에 대해서 IO structure를 분석하였고, 각 IO layer에 대한 overhead를 측정하였다. 이 측정 결과를 통해, FUSE framework이 가장 큰 overhead를 차지하고 있는 layer로 판명되었다. 따라서 우리는 FUSE framework의 동작에 대해 면밀히 분석하였고, FUSE layer의 sequential write 시 발생하는 randomness를 최소화하기 위해, FUSE cache를 새롭게 구현하여 성능을 비교하였다. 결과적으로, file system을 XFS로 변경하고, FUSE framework의 write request IO size를 512KB로 변경하고, IO scheduler를 deadline으로 변경하고, FUSE cache를 적용할 경우 기존 base system 대비 470%의 성능 향상을 확인하였다. 또한 eMMC device의 최대 성능 대비 99%의 성능을 확보할 수 있었다.
2 Related Work
최근 몇 년간 스마트폰의 Storage, FTL, File System, Buffer Cache 등 IO sub-system 전반에 걸쳐서 성능을 측정하고 이를 향상하기 위해 많은 연구들이 진행되고 있다. Kim [1]은 일반적 통념과 달리, mobile device의 성능에 결정적 영향을 주는 요소는 network 속도가 아니라 storage임을 실제 스마트폰 (Nexus One)에서 다양한 실험을 통해 증명하였다. 이 실험에서 다양한 SD card에 대해 App benchmark, Runtime perf, App launch, Concurrent App, CPU consumption 비교를 수행하였으며, 실험 결과 SD Card의 IO 성능에 따라 Performance가 100~200%, 극단적으로는 2000%까지 차이가 나는 것을 확인하였다. 또한, 이러한 문제점을 개선하기 위하여 RAID over SD, NILFS2 (log structured FS), Selective Sync, PCM 사용 등의 Pilot solution을 제안하였다. 비록 문제점에 대한 명확한 해결책을 제시하지는 못했지만, 스마트폰에서 스토리지가 수행하는 역할이 중요하다는 사실을 최초로 다양한 실험을 통해 증명했다는 데 큰 의의가 있다. 또한 Android IO subsystem 및 실험 환경에 대한 구체적인 기술을 통해 주장의 신뢰성을 뒷받침하였다.
Kim [2]은 또한 저가의 Flash memory를 사용하는 스마트폰의 IO 성능을 향상하기 위해 Buffer Cache management를 개선하는 연구를 진행하였다. 그는 새로운 buffer cache performance evaluation methods를 고안하였는데, 기존 real implementation 또는 trace driven simulation의 단점을 보완하여 새로운 hybrid evaluation 방법을 개발하였다. real implementation은 OS 전반에 걸쳐 구현이 힘들고 OS의 다른 part에서 performance noise가 발생한다. trace driven simulation은 모든 metric에 대한 정확한 결과를 도출하기 어렵다. 반면 새로 고안된 hybrid evaluation은 두 가지의 단점을 보완하여 before cache, after cache, workload player 순으로 진행된다. 또한 새로운 Buffer Cache Replacement Scheme을 고안하였는데 기존의 LRU, Clock, Linux2Q, CFLRU, LRUWSR, FOR, FAB 등 대표적인 알고리즘이 Spatial Locality를 감안하지 않고 write 횟수를 줄이는 데 초점을 둔 점을 개선한 SpatialClock 알고리즘을 개발하였다. 이 알고리즘은 Clock 알고리즘을 기반으로 하지만 logical sector number에 따라 Page frame이 정렬되어 있는 것이 특징적이다. 이 정렬 방법으로 AVL Tree를 사용하였다. SpatialClock 알고리즘을 통해 real storage에서의 IO 수행시간이 크게 개선되면서 Cache Hit ratio는 기존의 알고리즘과 동일한 수준을 유지하는 놀라운 결과를 얻었다. 다만 Trace Driven evaluation의 한계로 인해 SpatialClock 알고리즘 자체에 대한 Computation Time은 측정 시간에 포함되지 못했다.
한편, 안드로이드 스마트폰에서 사용되는 Filesystem은 YAFFS2에서 EXT4로 변경되었으며, EXT4에 대한 성능 향상 연구 또한 진행되었다. Kim [3]은 Android-based smartphone에서 널리 사용되는 EXT4 Filesystem의 성능 향상을 위해 5개의 Tuning parameter를 변경하여 default option과 성능을 비교하였다. 5개 옵션은 noatime, noauto_da_alloc, journal_async_commit, single flex block group, and the smaller inode size인데 이는 궁극적으로 metadata의 update traffic을 감소하기 위함이다. 실제 smartphone 적용 결과(GT-I9000: Galaxy S 해외모델) Postmark에서 13%의 성능 향상을 보였다. 이 옵션들 중 일부는 최신 스마트폰 모델에 이미 적용되었으며, 다른 옵션들을 포함하여 EXT4의 생성 및 mount 시 옵션들에 대한 보다 상세한 성능 비교 및 연구가 필요할 것으로 보인다.
이러한 EXT4 파일 시스템의 Journal이 불필요한 Write를 발생시킨다는 점과 FTL의 동작을 연관 짓는 연구도 진행되었다. Jang [4]은 Page mapping table을 위한 resource 제약(RAM, CPU) 및 journal에 의한 중복 writing을 감안할 때, FTL이 NAND controller가 아닌 Host side에 위치해야 효율적이라고 주장하였다. 그는 Host로 이동된 FTL을 통해 simple and powerful FTL과 fast journaling interface를 구현하였고 linux 2.6.35의 MTD와 JBD를 수정하고 Nandsim 상에서 비교 실험을 진행하였다. 실험 결과 Random Write IOPS가 20~30% 향상되었다. 이는 FTL의 하드웨어적 제약사항을 극복하는 방법을 제시하고 과도한 journal에 의한 성능 저하에 대한 해결 방법 또한 제시했다는 면에서 의의가 있으나 구체적 구현 방법 및 성능 향상 원인에 대한 설명이 부족한 점은 보완이 필요해 보인다.

Smartphone performance innovation translation (English translation)

1 Introduction
Smart devices have rapidly become mainstream and now extend to nearly every device encountered in daily life, including phones, TVs, cameras, and game consoles. Improving the performance of these devices is therefore very important. To do so, we must first identify their bottleneck. The wireless network, CPU, memory, and storage can each act as a performance bottleneck, but a recent study confirmed through a variety of experiments on Android smartphones, the most representative smart devices, that storage is the largest bottleneck [1].
Recent Android smartphones have high-resolution camera sensors (8 megapixels or more), and the resolution of the still images and videos they store is increasing accordingly. In terms of content, the smartphone is no longer a simple playback device but has become a content creator; indeed, a digital camera built on the Android platform has already been released. Smart devices including phones, TVs, and cameras are expected to move beyond HD to capture and play back Ultra-HD video (3840×2160). Storage, already identified as the bottleneck of smart devices, will therefore be required to be faster than it is today.
Storage in current Android smartphones is divided into partitions by purpose, and the partitions use a variety of file systems such as ext4, vfat, and FUSE. The /data partition (data partition), which holds installed applications and databases, uses ext4, whereas the partition that stores user data and media files (user partition) uses the FUSE file system to maintain compatibility with other operating systems (Windows, Mac, and so on). Because this FUSE file system is a virtual file system layer on top of the native ext4 file system, the additional layer can degrade performance. On the Galaxy S3, for example, the ext4 data partition on the eMMC shows a write throughput of 33 MB/s, while the FUSE user partition on the same eMMC shows only 8 MB/s.
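Write throughput figures like those quoted for the two partitions can be approximated with a simple sequential write test. The following is only a minimal sketch, not the benchmark used in the study: the function name, the default sizes, and the choice to include an fsync in the timed region are all assumptions made for illustration.

```python
import os
import time


def sequential_write_throughput(path, total_bytes=8 * 1024 * 1024,
                                chunk_bytes=512 * 1024):
    """Write total_bytes sequentially to path and return the rate in MB/s.

    Hypothetical helper: the sizes are illustrative defaults, not values
    taken from the study.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # count the time needed to reach the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / (1024 * 1024) / elapsed
```

Running the same function against a file on each partition gives a rough side-by-side comparison; absolute numbers depend on the device, the file size, and caching.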
The IO on the user partition consists of reads and writes of photos, videos, and movies, and is dominated by buffered sequential R/W. This differs greatly from the IO on the data partition, which stores application and DB data and is characterized by synchronous random R/W. The performance analysis and the methods for improving performance therefore differ as well. In this paper, to improve the performance of the FUSE-based user partition in Android smartphones, we identified performance improvement factors in each layer: file system, buffer cache, IO scheduler, block device, and HW device.
Specifically, we analyzed the IO structure of the user partition in an Android-based smartphone and measured the overhead of each IO layer. The measurements showed that the FUSE framework is the layer with the largest overhead. We therefore closely analyzed the operation of the FUSE framework and implemented a new FUSE cache to minimize the randomness that arises during sequential writes in the FUSE layer, and compared its performance. As a result, changing the file system to XFS, the write request IO size of the FUSE framework to 512 KB, and the IO scheduler to deadline, and applying the FUSE cache, improved performance by 470% over the base system and reached 99% of the maximum performance of the eMMC device.
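The excerpt does not describe how the FUSE cache is built. As a rough illustration of the general idea of turning many small sequential writes into a few large requests, the sketch below buffers contiguous writes and emits them in 512 KB units. The class name, the `backend_write` callback, and the flush policy are hypothetical and are not taken from the paper.

```python
class CoalescingWriter:
    """Buffer contiguous writes and forward them as large requests.

    Illustrative only: this is not the FUSE cache from the study.
    backend_write is a hypothetical callable taking (offset, data).
    """

    def __init__(self, backend_write, max_request=512 * 1024):
        self._backend_write = backend_write
        self._max_request = max_request
        self._start = 0            # file offset of the first buffered byte
        self._buf = bytearray()    # pending contiguous data

    def write(self, offset, data):
        # A write that does not continue the buffered run ends that run.
        if self._buf and offset != self._start + len(self._buf):
            self.flush()
        if not self._buf:
            self._start = offset
        self._buf += data
        # Forward full-size requests as soon as enough data is buffered.
        while len(self._buf) >= self._max_request:
            self._backend_write(self._start,
                                bytes(self._buf[:self._max_request]))
            del self._buf[:self._max_request]
            self._start += self._max_request

    def flush(self):
        # Forward whatever is left, even if smaller than max_request.
        if self._buf:
            self._backend_write(self._start, bytes(self._buf))
            self._buf.clear()
```

With 4 KB writes arriving in order, 256 calls to `write` reach the backend as two 512 KB requests, which is the kind of request-size increase the tuning above aims for.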

2 Related Work
In the past few years, many studies have measured and sought to improve the performance of the entire smartphone IO subsystem, including the storage, FTL, file system, and buffer cache. Kim [1] demonstrated through various experiments on a real smartphone (Nexus One) that, contrary to popular belief, the decisive factor in mobile device performance is not network speed but storage. The study compared application benchmarks, run-time performance, application launch, concurrent applications, and CPU consumption across a range of SD cards. Performance differed by 100% to 200%, and in extreme cases by up to 2000%, depending on the IO performance of the SD card. To address this problem, the study proposed pilot solutions such as RAID over SD, NILFS2 (a log-structured file system), selective sync, and the use of PCM. Although it did not offer a definitive solution, the study is significant as the first to demonstrate through various experiments that storage plays an important role in smartphones. It also supported the credibility of its claims with a detailed description of the Android IO subsystem and the experimental environment.
Kim [2] also studied how to improve buffer cache management in order to improve the IO performance of smartphones that use low-cost flash memory. He devised a new hybrid method for evaluating buffer cache performance that compensates for the weaknesses of the two existing approaches, real implementation and trace-driven simulation. Real implementation is difficult to carry out across the whole OS and suffers from performance noise from other parts of the OS. Trace-driven simulation, on the other hand, makes it hard to obtain accurate results for every metric. The hybrid evaluation addresses both weaknesses and proceeds in the order of before cache, after cache, and workload player. He also developed a new buffer cache replacement scheme, SpatialClock, which improves on representative algorithms such as LRU, Clock, Linux2Q, CFLRU, LRUWSR, FOR, and FAB that focus on reducing the number of writes without considering spatial locality. SpatialClock is based on the Clock algorithm, but its page frames are kept sorted by logical sector number, using an AVL tree to maintain the order. With SpatialClock, IO execution time on real storage improved greatly while the cache hit ratio stayed at the same level as the existing algorithms. However, because of the limitations of trace-driven evaluation, the computation time of the SpatialClock algorithm itself was not included in the measured time.
Meanwhile, the file system used in Android smartphones has changed from YAFFS2 to EXT4, and research on improving EXT4 performance has also been conducted. To improve the performance of the EXT4 file system widely used in Android-based smartphones, Kim [3] changed five tuning parameters and compared the results with the default options. The five options are noatime, noauto_da_alloc, journal_async_commit, a single flex block group, and a smaller inode size, all of which ultimately aim to reduce metadata update traffic. Applied to a real smartphone (GT-I9000, the overseas model of the Galaxy S), they improved Postmark performance by 13%. Some of these options have already been applied to recent smartphone models. A more detailed performance comparison and study of the options available when creating and mounting EXT4, including the remaining options, appears to be needed.
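The SpatialClock idea described earlier, a Clock-style replacement policy whose page frames are kept sorted by logical sector number, can be illustrated with a deliberately simplified sketch. This is not the published algorithm: it uses a sorted Python list with `bisect` in place of the AVL tree, ignores dirty pages, and gives new pages a cleared reference bit. The class and method names are hypothetical.

```python
import bisect


class SectorOrderedClock:
    """Clock replacement over frames kept in logical sector order.

    Simplified illustration only; a sorted list stands in for the AVL tree.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.sectors = []   # cached sector numbers, always sorted
        self.ref = {}       # sector -> reference bit
        self.hand = 0       # clock hand, an index into self.sectors

    def access(self, sector):
        """Return (hit, evicted_sector), where evicted_sector may be None."""
        if sector in self.ref:
            self.ref[sector] = True   # hit: mark as recently used
            return True, None
        evicted = None
        if len(self.sectors) >= self.capacity:
            evicted = self._evict()
        idx = bisect.bisect_left(self.sectors, sector)
        self.sectors.insert(idx, sector)
        self.ref[sector] = False
        if idx <= self.hand:
            self.hand += 1            # keep the hand on the same frame
        return False, evicted

    def _evict(self):
        # Sweep in ascending sector order, giving referenced frames a
        # second chance, so victims tend to be close together on storage.
        while True:
            if self.hand >= len(self.sectors):
                self.hand = 0
            s = self.sectors[self.hand]
            if self.ref[s]:
                self.ref[s] = False
                self.hand += 1
            else:
                del self.sectors[self.hand]
                del self.ref[s]
                return s
```

Because the hand moves through frames in sector order rather than in arrival order, consecutive victims have neighbouring sector numbers, which is the spatial locality the description above refers to.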
Another study linked the unnecessary writes generated by the EXT4 journal to the behavior of the FTL. Jang [4] argued that, given the resource constraints (RAM and CPU) imposed by the page mapping table and the duplicate writes caused by journaling, it is more efficient to place the FTL on the host side rather than in the NAND controller. By moving the FTL to the host, he implemented a simple and powerful FTL and a fast journaling interface. He modified the MTD and JBD layers of Linux 2.6.35 and ran comparative experiments on Nandsim. Random write IOPS improved by 20% to 30%. The study is significant in that it presents a way to overcome the hardware constraints of the FTL as well as a remedy for the performance degradation caused by excessive journaling, but it lacks detail on the concrete implementation and on the cause of the performance gain.

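The above is an excerpt from a Korean-to-English translation on smartphone performance innovation commissioned by the Hanyang University Industry-University Cooperation Foundation.
For translation, choose Giver Translation (기버 번역).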