M3Cam: Extreme Super-resolution via Multi-Modal Optical Flow for Mobile Cameras

Authors:

Ju Ren

SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems

Pages 744 - 756

https://doi.org/10.1145/3666025.3699371

Published: 04 November 2024

Abstract

The demand for ultra-high-resolution imaging in mobile phone photography is continuously increasing. However, the image resolution of mobile devices is typically constrained by the size of the CMOS sensor. Although deep learning-based super-resolution (SR) techniques have the potential to overcome this limitation, existing SR neural network models require large computational resources, making them unsuitable for real-time SR imaging on current mobile devices. Additionally, cloud-based SR systems pose privacy leakage risks. In this paper, we propose M³Cam, an innovative and lightweight SR imaging system for mobile phones. M³Cam produces high-quality 16× SR images (4× in both height and width) with almost negligible latency. Specifically, we utilize an optical image stabilization (OIS) module for lens control and introduce a new modality of data, namely gyroscope readings, to build a high-precision and compact optical flow estimation module. Building upon this concept, we design a multi-frame SR model based on the Swin Transformer. Our proposed system can generate a 16× SR image from four captured low-resolution images in real time, with low computational load, low inference latency, and minimal reliance on runtime RAM. Through extensive experiments, we demonstrate that our proposed multi-modal optical flow model significantly enhances pixel alignment accuracy between multiple frames and delivers outstanding 16× SR imaging results under various shooting scenarios. Code and dataset are available at: https://github.com/liangjindeamo-yuer/M3CAM
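The two quantitative ideas in the abstract can be illustrated with a short sketch. This is not the authors' implementation. The first function only restates the arithmetic that 4× per axis yields 16× the pixel count. The second uses a standard small-rotation pinhole-camera approximation to show how gyroscope readings could, in principle, give a coarse inter-frame pixel shift. The function names, the axis and sign conventions, and the sample values (frame size, focal length in pixels, frame interval) are all assumptions chosen for illustration.

```python
import math


def sr_output_shape(height, width, scale_per_axis=4):
    """Return the super-resolved (height, width) and the pixel-count factor.

    A 4x scale along each axis gives 4 * 4 = 16x as many pixels.
    """
    out_h = height * scale_per_axis
    out_w = width * scale_per_axis
    factor = (out_h * out_w) // (height * width)
    return (out_h, out_w), factor


def gyro_to_pixel_shift(omega_x, omega_y, dt, focal_px):
    """Approximate the image shift (dx, dy) in pixels for a small rotation.

    Assumptions (hypothetical, not taken from the paper):
      - omega_x, omega_y are angular rates in rad/s about the camera's
        x (pitch) and y (yaw) axes, held constant over the interval dt;
      - the camera is an ideal pinhole with focal length focal_px in pixels;
      - translation and roll are ignored; sign conventions vary by device.
    """
    theta_x = omega_x * dt  # pitch angle accumulated over dt
    theta_y = omega_y * dt  # yaw angle accumulated over dt
    dx = focal_px * math.tan(theta_y)  # yaw moves the image horizontally
    dy = focal_px * math.tan(theta_x)  # pitch moves the image vertically
    return dx, dy


if __name__ == "__main__":
    shape, factor = sr_output_shape(750, 1000)
    print(shape, factor)  # (3000, 4000) 16

    # Hypothetical values: 0.01 rad/s yaw, 33 ms between frames, f = 3000 px.
    dx, dy = gyro_to_pixel_shift(0.0, 0.01, 0.033, 3000.0)
    print(round(dx, 2), round(dy, 2))
```

A shift estimated this way would at best serve as a coarse prior for frame alignment; the abstract states that the paper combines gyroscope readings with image data in a learned optical flow model, which this sketch does not attempt to reproduce.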

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        1. Image representations
  2. Computer graphics
    1. Image manipulation
      1. Image processing

Published In

SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems

November 2024

950 pages

ISBN:9798400706974

DOI:10.1145/3666025

Chair:
Jie Liu,
Co-chairs:
Yuanchao Shu,
Jiming Chen,
Program Chair:
Yuan He,
Program Co-chair:
Rui Tan

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSFC

Conference

SenSys '24

SenSys '24: 22nd ACM Conference on Embedded Networked Sensor Systems

November 4 - 7, 2024

Hangzhou, China

Acceptance Rates

Overall Acceptance Rate 174 of 867 submissions, 20%
