Abstract
This paper presents EarIO, an AI-powered acoustic sensing technology that allows an earable (e.g., an earphone) to continuously track facial expressions using two microphone-speaker pairs (one on each side), components widely available in commodity earphones. EarIO emits acoustic signals from a speaker on the earable towards the face. Depending on the facial expression, the muscles, tissues, and skin around the ear deform differently, producing unique echo profiles in the reflected signals captured by an on-device microphone. The received acoustic signals are processed by a customized deep learning pipeline to continuously infer full facial expressions, represented by the 52 parameters captured with a TrueDepth camera. Compared to similar technologies, EarIO has significantly lower power consumption: it samples at 86 Hz with a power signature of 154 mW. A user study with 16 participants under three different scenarios showed that EarIO can reliably estimate detailed facial movements while participants were sitting, walking, or after remounting the device. Based on these encouraging results, we further discuss the opportunities and challenges of applying EarIO to future ear-mounted wearables.
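The abstract's core sensing idea (transmit a signal, then characterize how facial deformations change the reflections arriving at the microphone) can be illustrated with a minimal sketch. The abstract does not specify the transmitted waveform or the echo-profile computation, so this example assumes a linear chirp and a simple cross-correlation between the transmitted and received frames; it is an illustration of the general technique, not the paper's implementation.

```python
import numpy as np

def echo_profile(tx, rx):
    """Cross-correlate a received frame with the transmitted signal.
    Each index of the result corresponds to a time lag (and hence a
    reflection distance); peaks mark strong echoes."""
    corr = np.correlate(rx, tx, mode="full")
    # Index len(tx)-1 of the 'full' correlation is zero lag;
    # keep only non-negative lags.
    return np.abs(corr[len(tx) - 1:])

# Toy example: a short chirp that returns with a 10-sample delay,
# standing in for a reflection off the skin near the ear.
fs = 48_000                                  # assumed sample rate
t = np.arange(0, 0.01, 1 / fs)
tx = np.sin(2 * np.pi * (17_000 + 2e5 * t) * t)   # linear up-chirp
rx = np.concatenate([np.zeros(10), tx])           # echo delayed by 10 samples
profile = echo_profile(tx, rx)
print(int(np.argmax(profile)))               # strongest echo at lag 10
```

As a facial expression deforms the tissue around the ear, the delays and amplitudes of such echoes shift, so successive echo profiles form the feature sequence that a learned model can map to expression parameters.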
Supplemental Material
Available for Download
Supplemental movie, appendix, image, and software files for EarIO: A Low-power Acoustic Sensing Earable for Continuously Tracking Detailed Facial Movements