Tabular insights, visual impacts: transferring expertise from tables to images
Article No. 883, pp. 21988–22009
Abstract
Transferring knowledge across diverse data modalities is receiving increasing attention in machine learning. This paper tackles the task of leveraging expert-derived, yet expensive, tabular data to enhance image-based predictions when tabular data is unavailable during inference. The primary challenges stem from the inherent complexity of accurately mapping diverse tabular data to visual contexts, coupled with the need for distinct strategies for numerical and categorical tabular attributes. We propose CHannel tAbulaR alignment with optiMal tranSport (CHARMS), which establishes an alignment between image channels and tabular attributes, enabling selective transfer of knowledge pertinent to visual features. Specifically, CHARMS measures similarity distributions across modalities to effectively differentiate and transfer relevant tabular features, with a focus on morphological characteristics, enhancing the capabilities of visual classifiers. By maximizing the mutual information between image channels and tabular features, knowledge from both numerical and categorical tabular attributes is extracted. Experimental results demonstrate that CHARMS not only enhances the performance of image classifiers but also improves their interpretability by effectively utilizing tabular knowledge.
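The channel-attribute alignment described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the tensor shapes, the uniform marginals, and the cosine-similarity cost are illustrative assumptions. It only shows how an entropic-regularized optimal-transport plan, computed with Sinkhorn iterations, could match image channels to tabular attributes before attribute-specific knowledge is transferred.

```python
# Minimal sketch (not the CHARMS code): align image channels with tabular
# attributes via entropic-regularized optimal transport (Sinkhorn iterations).
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Transport plan between two uniform marginals for a given cost matrix."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # marginal over image channels
    b = np.full(m, 1.0 / m)          # marginal over tabular attributes
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan of shape (n, m), sums to 1

# Illustrative inputs: per-channel descriptors from a CNN (e.g. pooled
# activations projected to a shared space) and tabular-attribute embeddings.
rng = np.random.default_rng(0)
channel_feats = rng.normal(size=(64, 32))   # 64 channels, 32-d shared space
attr_embeds = rng.normal(size=(10, 32))     # 10 tabular attributes

# Cost = 1 - cosine similarity between channels and attributes.
cf = channel_feats / np.linalg.norm(channel_feats, axis=1, keepdims=True)
ae = attr_embeds / np.linalg.norm(attr_embeds, axis=1, keepdims=True)
cost = 1.0 - cf @ ae.T

plan = sinkhorn_plan(cost)
# plan[i, j] indicates how strongly channel i is coupled to attribute j.
print(plan.shape, plan.sum())
```

Row-normalizing the plan gives, for each channel, a distribution over attributes; a coupling of this kind could then be used to decide which tabular attribute supervises which image channels during transfer.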
Publication Information
Published in: July 2024 (proceedings volume, 63010 pages). Copyright © 2024.
Publisher: JMLR.org
Published online: 03 January 2025