The Machine Intelligence Lab (MILAB) at Seoul National University currently focuses on natural language processing, deep learning and its applications, and data analysis and web services. We develop deep learning algorithms for natural language processing tasks including question answering (QA), context-aware dialogue, machine translation, and speech recognition. We are also exploring deep learning applications for logical inference, graph-structured data, multi-modal tasks, and clinical events. In addition, we conduct data analysis and web service research on topics such as crowdsourcing, recommendation systems, and misinformation detection.
Our group encourages collaboration with researchers from other institutions and research groups to broaden our research capabilities. We are doing joint work with Samsung Advanced Institute of Technology, MIT, KAIST, Microsoft Research Cambridge, Adobe Research, Hyundai AIRLab, and Naver.
Question answering (QA) has long been considered a primary objective of artificial intelligence, and advances in QA systems have recently attracted great interest from both academia and industry. We have been developing ranking algorithms that select the best answer among a set of candidates. We also investigate human-like QA systems that generate answers in natural language, consistent with the given context. Furthermore, our research aims to incorporate external knowledge into QA systems by integrating deep neural network-based models with human-curated expert knowledge bases. This is joint work with Adobe Research.
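As a rough illustration of what answer ranking means, the sketch below scores each candidate against the question and sorts the candidates by that score. It is not our model: the bag-of-words cosine similarity is a simple stand-in for a learned neural scorer, and the names `cosine` and `rank_answers` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_answers(question, candidates):
    """Sort candidates from most to least similar to the question.

    Assumption: word overlap stands in for a learned relevance score.
    """
    q = Counter(question.lower().split())
    scored = [(cosine(q, Counter(c.lower().split())), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]


candidates = [
    "bananas are yellow",
    "berlin is the capital of germany",
    "paris is the capital of france",
]
ranked = rank_answers("what is the capital of france", candidates)
print(ranked[0])
```

In a neural ranker, the `cosine` call would be replaced by a model that scores a (question, candidate) pair; the sorting step stays the same.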
There has been growing interest in human-like dialogue systems such as Apple Siri, Google Assistant, and other intelligent assistant services. However, current chatbots often fail to recognize the context of utterances, causing irritation and inconvenience. To overcome these difficulties, we are developing a deep learning-based dialogue model that can understand long-range context across a dialogue. Furthermore, we aim to develop an intelligent personal assistant by combining the dialogue system with speech recognition and image processing. This is joint work with the Social Computing Lab at KAIST.
With the increasing use of AI in industries such as medicine, law, and advertising, it is important to address fairness and to eliminate algorithmic bias in automated decision-making. This means that an AI system's results should not be influenced by social prejudices present in the training data. Furthermore, AI models must protect personal information and avoid producing illegal results. We are working towards defining fairness in deep learning models and finding ways to resolve ethical biases in these models. For instance, in law, the aim is to develop AI that does not discriminate against protected social groups; in medicine, the focus is on protecting patients' privacy; and in advertising, the goal is to create algorithms that prevent unethical practices and promote fair advertising. This is joint work with the Ministry of Science and ICT (IITP).
As one of the major tasks in deep learning, Neural Machine Translation (NMT) trains deep neural networks to translate a sentence from a source language into a target language while preserving its implicit meaning. Our research on NMT uses not only the input sentence but also its implicit and explicit context, such as surrounding paragraphs, additional audio/video signals, and linguistic features. We have developed a hierarchical Transformer structure that exploits several preceding sentences, and have shown that the extra context information improves translation performance, especially on spoken-language-style text. Our hierarchical Transformer achieved the best translation quality among contemporary deep learning-based translation models on an English-Korean spoken language corpus. We also study more efficient architectures to improve the Transformer and multi-head attention.
Interest in self-supervised learning for general language understanding has grown since Bidirectional Encoder Representations from Transformers (BERT) was proposed. Following this trend, we are developing a new bidirectional language model based on the Transformer encoder that, unlike BERT, has a language auto-encoding property. We have confirmed that our model has a wider range of applications, including unsupervised tasks such as evaluating the naturalness of text. We continue to study and enhance the contextual word representations of the proposed approach.
As Deep Neural Networks (DNNs) have led to significant improvements in Automatic Speech Recognition (ASR), we are studying how to develop ASR systems further using additional modalities such as text or images. To increase recognition accuracy, we research combining the ASR system with a language model that can evaluate the naturalness of a sentence, which helps the system choose the more plausible sentence among candidates with similar pronunciation. Furthermore, we study how to effectively extract and combine information from additional modalities in ASR systems. For ASR systems to be used in applications such as virtual assistants or speaker diarisation, techniques such as sentiment analysis, dialogue systems, and object detection are essential.
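The idea of using a language model to choose between similar-sounding candidates can be sketched as n-best rescoring: each hypothesis has an acoustic score, and a language-model score is added before picking the best one. The sketch below is illustrative only. It assumes a toy add-one-smoothed bigram model in place of a neural language model, and the corpus, the acoustic scores, and the names `train_bigram_lm`, `lm_log_prob`, and `rescore` are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rescore(hypotheses, unigrams, bigrams, lm_weight=1.0):
    """Return the text maximizing acoustic score + lm_weight * LM score."""
    best = max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * lm_log_prob(h[0], unigrams, bigrams),
    )
    return best[0]


# Hypothetical toy data: a tiny corpus and two homophone hypotheses
# with made-up acoustic log-scores.
TOY_CORPUS = [
    "i went to the store",
    "they went to their house",
    "their house is over there",
    "the store is over there",
]
unigrams, bigrams = train_bigram_lm(TOY_CORPUS)
n_best = [
    ("they went to there house", -10.0),
    ("they went to their house", -10.2),
]
print(rescore(n_best, unigrams, bigrams))
```

With `lm_weight=0.0` the acoustic score alone decides and the less natural hypothesis wins; with the language model included, the hypothesis whose word sequence is more likely under the model is preferred.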
Real-world data exists in various modalities, such as images and sounds. We study multimodal deep learning algorithms in which the model fuses inputs from multiple modalities. For example, we study visual question answering, which requires understanding both text and images, and multimodal speech emotion recognition, which requires understanding both text and sound.
In contrast to the ground-breaking success of current deep learning applications in recognition domains, a growing body of literature shows that deep models fail to extend rules in logical inference tasks. Without logical inference abilities, the following applications of deep models are not possible: designing mathematical solvers that can handle variables in a previously unseen range; developing context-aware QA systems that can track relevant sentences within an evolving dialogue; and implementing high-level code processors that understand user-defined algorithms written in program code. Our research aims to make neural networks generalize systematically, i.e., in a rule-based manner, by combining learned rules as humans do. This is joint work with the Samsung Research Funding & Incubation Center for Future Technology.
Graph neural networks are deep learning architectures that process graph-structured data by learning explicitly from its structure, and they have a wide range of applications, from molecular property prediction to the modeling of physical systems. We aim to identify the shortcomings of current graph neural network architectures and to improve upon them by integrating ideas from graph theory, information theory, and deep learning. We also extend the use of graph neural networks to natural language and vision in order to explicitly learn the inherent relationships within the data.
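To make "learning from the structure explicitly" concrete, the sketch below performs one round of neighborhood aggregation, the basic operation that most graph neural network layers build on. It is a simplified illustration under stated assumptions: the function name `message_passing_step` and the toy graph are hypothetical, and the learned weight matrix and nonlinearity of a real layer are omitted so that only the graph-dependent part remains.

```python
def message_passing_step(features, edges):
    """One round of mean aggregation over each node's neighbors and itself.

    features: dict mapping node -> list of floats (all the same length)
    edges: iterable of undirected (u, v) pairs
    """
    neighbors = {node: [node] for node in features}  # include a self-loop
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(dim)
        ]
    return updated


# Hypothetical path graph a - b - c with 2-dimensional node features.
features = {"a": [3.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 3.0]}
edges = [("a", "b"), ("b", "c")]
updated = message_passing_step(features, edges)
print(updated)  # {'a': [1.5, 0.0], 'b': [1.0, 1.0], 'c': [0.0, 1.5]}
```

After one step, node `b` carries information from both of its neighbors, while `a` and `c` have not yet seen each other; stacking more steps widens each node's receptive field.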
Many of the tasks that contributed to the development of deep learning originated in image processing, and the area is still being actively studied. Image processing tasks require an in-depth understanding of continuous distributions. We work on image processing with both generative and discriminative models. More precisely, we are conducting research on image generation using GANs with semantic segmentation information, as well as on classical object detection.
Large amounts of clinical data have been collected from very large numbers of patients over multiple years, and related machine learning problems have been studied, but most existing work does not achieve the required accuracy and scalability. Building on recent advances in RNNs and scalable computation, we are developing an RNN model that forecasts future medication prescriptions by assessing the entire history of each patient.
Crowdsourcing has become one of the cornerstones of research on human computation-based intelligent systems. Internet-based services such as ‘Mechanical Turk’ allow workers from around the world to be easily hired and managed to solve various problems. Crowdsourcing systems are now in widespread use for large-scale data-processing tasks such as image classification, video annotation, form data entry, optical character recognition, translation, recommendation, and proofreading. We are currently developing a general model of such crowdsourcing tasks and devising an efficient algorithm that determines the most likely answers by combining the responses of workers. Even though these low-paid workers can be unreliable, our algorithm can achieve nearly optimal results.
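The sketch below illustrates the general idea of combining unreliable responses: it alternates between estimating each task's answer by a weighted vote and re-estimating each worker's reliability from agreement with those estimates. It is a generic iterative weighted-voting scheme written for illustration, not the algorithm described above, and the function name `aggregate_labels`, the binary labels, and the toy data are all assumptions.

```python
def aggregate_labels(responses, n_iters=10):
    """Estimate binary task labels from noisy worker responses.

    responses: dict mapping (worker, task) -> label in {+1, -1}
    Returns (estimates, weights).
    """
    workers = sorted({w for w, _ in responses})
    tasks = sorted({t for _, t in responses})
    weights = {w: 1.0 for w in workers}  # start from plain majority voting
    estimates = {}
    for _ in range(n_iters):
        # Step 1: weighted vote on every task.
        for t in tasks:
            score = sum(
                weights[w] * label
                for (w, task), label in responses.items()
                if task == t
            )
            estimates[t] = 1 if score >= 0 else -1
        # Step 2: weight = 2 * (agreement rate) - 1, so a worker who agrees
        # half of the time (no better than chance) gets weight 0.
        for w in workers:
            answered = [(t, lab) for (ww, t), lab in responses.items() if ww == w]
            agree = sum(1 for t, lab in answered if lab == estimates[t])
            weights[w] = 2.0 * agree / len(answered) - 1.0
    return estimates, weights


# Hypothetical data: workers a and b are always right; c, d, and e each
# err on three tasks and are all wrong together on task t5.
TRUTH = {"t0": 1, "t1": -1, "t2": 1, "t3": -1, "t4": 1, "t5": -1}
WRONG = {
    "a": set(),
    "b": set(),
    "c": {"t0", "t1", "t5"},
    "d": {"t2", "t3", "t5"},
    "e": {"t1", "t4", "t5"},
}
responses = {
    (w, t): (-y if t in WRONG[w] else y) for w in WRONG for t, y in TRUTH.items()
}

majority, _ = aggregate_labels(responses, n_iters=1)  # one pass = majority vote
estimates, weights = aggregate_labels(responses)
print(majority["t5"], estimates["t5"], TRUTH["t5"])
```

In this toy example, plain majority voting gets task `t5` wrong because three of the five workers err on it, whereas the iterative scheme down-weights those workers based on their disagreement elsewhere and recovers the correct label.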
With the growth of the e-commerce market and the flood of available products, recommendation systems are becoming increasingly important. A recommendation system not only increases user satisfaction but also increases product sales. However, recommending the desired product to each user from large-scale data is a difficult problem. Our research goal is to build an effective deep learning-based recommendation system that captures user preferences from various kinds of user information.
Over the past few years, social media has been awash with false and misleading information, sometimes called “fake news”. Much of the information shared online is unverified, so many organizations engage in debunking and fact-checking to combat the massive spread of disinformation. As part of this effort, we have been working on the headline incongruence problem, in which the headline of a news article makes claims that are unrelated to or distinct from the story in its body text. Drawing on our NLP research experience, we have developed various models based on an attentive hierarchical encoder and graph neural networks. This is joint work with the Social Computing Lab at KAIST.