Abstract
This paper presents EarIO, an AI-powered acoustic sensing technology that allows an earable (e.g., an earphone) to continuously track facial expressions using two microphone-speaker pairs (one on each side), components widely available in commodity earphones. EarIO emits acoustic signals from a speaker on the earable towards the face. Depending on the facial expression, the muscles, tissues, and skin around the ear deform differently, producing unique echo profiles in the reflected signals captured by an on-device microphone. The received acoustic signals are processed by a customized deep learning pipeline to continuously infer full facial expressions, represented by the 52 parameters captured with a TrueDepth camera. Compared to similar technologies, EarIO has significantly lower power consumption: it samples at 86 Hz with a power signature of 154 mW. A user study with 16 participants under three different scenarios showed that EarIO can reliably estimate detailed facial movements while participants were sitting, walking, or after remounting the device. Based on these encouraging results, we further discuss the opportunities and challenges of applying EarIO to future ear-mounted wearables.
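The abstract's core sensing idea (transmit a signal, then characterize how facial deformations change the reflections arriving at the microphone) can be illustrated with a minimal sketch. The abstract does not specify the transmitted waveform or the echo-profile computation, so this example assumes a linear chirp and a simple cross-correlation between the transmitted and received frames; it is an illustration of the general technique, not the paper's implementation.

```python
import numpy as np

def echo_profile(tx, rx):
    """Cross-correlate a received frame with the transmitted signal.
    Each index of the result corresponds to a time lag (and hence a
    reflection distance); peaks mark strong echoes."""
    corr = np.correlate(rx, tx, mode="full")
    # Index len(tx)-1 of the 'full' correlation is zero lag;
    # keep only non-negative lags.
    return np.abs(corr[len(tx) - 1:])

# Toy example: a short chirp that returns with a 10-sample delay,
# standing in for a reflection off the skin near the ear.
fs = 48_000                                  # assumed sample rate
t = np.arange(0, 0.01, 1 / fs)
tx = np.sin(2 * np.pi * (17_000 + 2e5 * t) * t)   # linear up-chirp
rx = np.concatenate([np.zeros(10), tx])           # echo delayed by 10 samples
profile = echo_profile(tx, rx)
print(int(np.argmax(profile)))               # strongest echo at lag 10
```

As a facial expression deforms the tissue around the ear, the delays and amplitudes of such echoes shift, so successive echo profiles form the feature sequence that a learned model can map to expression parameters.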
Supplemental Material
Available for Download
Supplemental movie, appendix, image, and software files for EarIO: A Low-power Acoustic Sensing Earable for Continuously Tracking Detailed Facial Movements