Tabular insights, visual impacts: transferring expertise from tables to images
Article No. 883, pp. 21988–22009
Abstract
Transferring knowledge across diverse data modalities is receiving increasing attention in machine learning. This paper tackles the task of leveraging expert-derived, yet expensive, tabular data to enhance image-based predictions when tabular data is unavailable during inference. The primary challenges stem from the inherent complexity of accurately mapping diverse tabular data to visual contexts, coupled with the need for distinct strategies for numerical and categorical tabular attributes. We propose CHannel tAbulaR alignment with optiMal tranSport (CHARMS), which establishes an alignment between image channels and tabular attributes, enabling selective transfer of knowledge pertinent to visual features. Specifically, CHARMS measures similarity distributions across modalities to effectively differentiate and transfer relevant tabular features, with a focus on morphological characteristics, enhancing the capabilities of visual classifiers. By maximizing the mutual information between image channels and tabular features, knowledge from both numerical and categorical tabular attributes is extracted. Experimental results demonstrate that CHARMS not only enhances the performance of image classifiers but also improves their interpretability by effectively utilizing tabular knowledge.
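The channel-attribute alignment described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the tensor shapes, the uniform marginals, and the cosine-similarity cost are illustrative assumptions. It only shows how an entropic-regularized optimal-transport plan, computed with Sinkhorn iterations, could match image channels to tabular attributes before attribute-specific knowledge is transferred.

```python
# Minimal sketch (not the CHARMS code): align image channels with tabular
# attributes via entropic-regularized optimal transport (Sinkhorn iterations).
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Transport plan between two uniform marginals for a given cost matrix."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # marginal over image channels
    b = np.full(m, 1.0 / m)          # marginal over tabular attributes
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan of shape (n, m), sums to 1

# Illustrative inputs: per-channel descriptors from a CNN (e.g. pooled
# activations projected to a shared space) and tabular-attribute embeddings.
rng = np.random.default_rng(0)
channel_feats = rng.normal(size=(64, 32))   # 64 channels, 32-d shared space
attr_embeds = rng.normal(size=(10, 32))     # 10 tabular attributes

# Cost = 1 - cosine similarity between channels and attributes.
cf = channel_feats / np.linalg.norm(channel_feats, axis=1, keepdims=True)
ae = attr_embeds / np.linalg.norm(attr_embeds, axis=1, keepdims=True)
cost = 1.0 - cf @ ae.T

plan = sinkhorn_plan(cost)
# plan[i, j] indicates how strongly channel i is coupled to attribute j.
print(plan.shape, plan.sum())
```

Row-normalizing the plan gives, for each channel, a distribution over attributes; a coupling of this kind could then be used to decide which tabular attribute supervises which image channels during transfer.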
Publication Information
Published in: July 2024 (proceedings volume, 63010 pages). Copyright © 2024.
Publisher: JMLR.org
Published online: 03 January 2025