International Conference on Artificial Intelligence & Cloud Computing

Dayi Jin

Biography

Dayi Jin is a highly skilled AI researcher and industry engineer. With a Ph.D. in Electrical Engineering specializing in Artificial Intelligence from Stevens Institute of Technology, Dayi has developed expertise in multimodal algorithms, machine learning, deep learning, and computer vision applications. Over the years, Dayi has worked on a diverse range of projects, from financial market predictions using AI models to deep learning-based trajectory recognition and image processing in healthcare and autonomous systems.

Dayi has held key industry roles: as a Senior Consultant at VC.AI in Silicon Valley, they designed large-scale deep learning models to identify high-potential unicorn companies, and as an Algorithm Engineer at Meta-Bounds, they developed advanced IMU trajectory recognition and multimodal classification algorithms. Their computer vision research has contributed to object detection, medical imaging, and real-time recognition technologies.

An active participant in academic and industry conferences, Dayi has presented research on cross-domain deep learning applications and published papers at prominent venues such as WOCC and the 3rd World Congress on Artificial Intelligence, Machine Learning, and Data Science.
 

Research Interest

Dayi’s research interests encompass a wide range of areas within Artificial Intelligence, with a primary focus on:

- Multimodal Computer Vision: Developing algorithms that integrate multiple data sources (e.g., visual, temporal, and spatial) to improve object recognition and scene understanding in dynamic environments, particularly for applications in autonomous vehicles and medical diagnostics.

- Deep Learning for Visual Recognition: Pushing the boundaries of deep neural networks, particularly CNNs and transformers, for real-time object detection, image segmentation, and classification tasks under various conditions such as lighting changes and occlusion.

- Medical Imaging and Healthcare AI: Investigating advanced image processing techniques to enhance diagnostic capabilities in medical fields, with a focus on multimodal imaging data for better decision-making.

- AI in Financial Markets: Leveraging machine learning models such as LSTMs and Transformers to predict stock price movements and identify high-growth companies, applying multimodal AI solutions to strategic investment decisions (a minimal sketch follows this list).

- Challenges in Image Processing: Addressing robustness in visual data processing, including improving algorithm accuracy in adverse conditions such as occlusion, low lighting, and high-noise environments.
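
As a rough illustration of the financial-markets item above, the following minimal sketch shows an LSTM that classifies next-day price movement from a window of daily features. Everything here (feature count, window length, layer sizes, the binary up/down framing) is an assumed placeholder, not one of Dayi's actual models.

import torch
import torch.nn as nn

class PriceMovementLSTM(nn.Module):
    """Toy LSTM mapping a window of daily features to an up/down logit.

    Hypothetical illustration only; sizes and the binary framing are
    assumptions, not the speaker's actual models.
    """
    def __init__(self, n_features: int = 5, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one logit: P(price goes up)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_features), e.g. 30 days of OHLCV data
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])    # (batch, 1) logit

# Example: score 8 stocks over a 30-day window of 5 daily features.
model = PriceMovementLSTM()
probs = torch.sigmoid(model(torch.randn(8, 30, 5)))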
 

Abstract

"The Wisdom of Fusion: In-depth Analysis and Future Outlook of Visual Multimodal Technologies"

The application of multimodal computer vision is rapidly evolving, with advancements in deep learning techniques and algorithmic approaches making significant impacts across a variety of industries. This presentation will focus on the cutting-edge algorithms and technologies driving the integration of multimodal data sources to improve visual recognition and image processing. Specifically, we will explore how combining visual, spatial, and temporal data enhances the performance of object detection, recognition, and image segmentation models, enabling more robust systems for real-world applications.
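
To make the fusion idea concrete, one common pattern is late fusion: each modality is encoded separately and the embeddings are concatenated before a shared prediction head. The sketch below is a minimal, assumed illustration of that pattern, not a specific architecture from the talk; the per-modality embeddings would come from separate backbones (e.g., a CNN for the visual stream).

import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    """Late fusion: per-modality embeddings are concatenated and passed
    to a shared classifier. Hypothetical sketch; all sizes are assumed."""
    def __init__(self, d_visual=512, d_spatial=64, d_temporal=64, n_classes=10):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(d_visual + d_spatial + d_temporal, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, visual, spatial, temporal):
        # Concatenate the three modality embeddings along the feature axis.
        fused = torch.cat([visual, spatial, temporal], dim=-1)
        return self.classifier(fused)  # class logits per sample

head = LateFusionHead()
logits = head(torch.randn(4, 512), torch.randn(4, 64), torch.randn(4, 64))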

Recent breakthroughs in deep learning, such as transformers and attention mechanisms, have shown promise in overcoming traditional challenges in computer vision, such as handling occlusion, variability in lighting, and dynamic scene changes. These advances are particularly valuable in industries like autonomous driving, where accurate and real-time visual perception is critical for navigation, and healthcare, where image-based diagnostics can be significantly enhanced by incorporating multimodal data from medical imaging devices.
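
Attention also gives a direct mechanism for fusing modalities: in the minimal sketch below, visual tokens query tokens from a second modality, so degraded visual evidence (occlusion, low light) can be supplemented by the other stream. The shapes and the choice of PyTorch's built-in multi-head attention are illustrative assumptions.

import torch
import torch.nn as nn

# Minimal cross-modal attention: visual tokens query a second modality.
# Illustrative only; real systems stack such layers with normalization
# and feed-forward blocks, as in a standard transformer.
d_model, n_heads = 256, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

visual_tokens = torch.randn(2, 196, d_model)  # e.g., 14x14 image patches
other_tokens = torch.randn(2, 50, d_model)    # e.g., temporal features

# Each visual token gathers context from the other modality, which can
# compensate when the visual signal alone is degraded.
fused, _ = cross_attn(query=visual_tokens, key=other_tokens, value=other_tokens)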

The presentation will delve into key research efforts that integrate multimodal learning for improved performance, focusing on both academic advancements and industry applications. One such example is a framework combining temporal and spatial modalities for gesture and action recognition, which achieved a classification accuracy of 98%. Additionally, we will discuss the application of deep learning models in real-time visual recognition tasks, with examples from healthcare imaging and autonomous vehicle navigation.
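
For readers unfamiliar with the temporal-plus-spatial recipe, the sketch below shows its typical shape: a per-frame spatial encoder feeding a recurrent temporal model that classifies the whole clip. It is a hypothetical stand-in with assumed sizes, not the 98%-accuracy framework referenced above.

import torch
import torch.nn as nn

class TwoStreamGestureNet(nn.Module):
    """Per-frame spatial encoder + temporal GRU for gesture/action
    recognition. Hypothetical sketch; all sizes are assumptions."""
    def __init__(self, d_frame=128, hidden=128, n_gestures=12):
        super().__init__()
        self.frame_encoder = nn.Sequential(   # stand-in for a CNN backbone
            nn.Flatten(), nn.Linear(3 * 32 * 32, d_frame), nn.ReLU()
        )
        self.temporal = nn.GRU(d_frame, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_gestures)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, 32, 32) low-resolution frames
        b, t = clips.shape[:2]
        feats = self.frame_encoder(clips.reshape(b * t, 3, 32, 32))
        _, h_n = self.temporal(feats.reshape(b, t, -1))
        return self.head(h_n[-1])  # gesture logits per clip

model = TwoStreamGestureNet()
logits = model(torch.randn(4, 16, 3, 32, 32))  # 4 clips of 16 frames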

By examining the latest research papers, real-world applications, and emerging technologies, this session aims to provide valuable insights into the future direction of multimodal computer vision. It will also highlight interdisciplinary collaborations that are advancing the integration of AI-powered vision solutions into various industries, promoting new capabilities in visual recognition that have the potential to transform how we interact with the world.