The Machine Intelligence Lab (MILAB) at Seoul National University focuses on analyzing massive data systems and designing scalable machine learning algorithms based on their structural and contextual properties. Our current research topics include deep learning for text analysis and recommendation, machine learning and statistical inference, and social network analysis. Using the power of GPUs, we aim to build systems for machine translation, question answering (QA), and recommendation.
Our group encourages collaboration with researchers at other institutions and research groups to broaden the scope of our research. We are currently collaborating with researchers from Samsung Advanced Institute of Technology, Microsoft Research Cambridge, MIT, IBM Research, Bell Labs Murray Hill, HKUST, LG Electronics, NHN, KAIST, and ETRI.
We are currently developing a message recommendation and prediction system for SMS on the Android client. Given the messages a user has entered so far, the system recommends or predicts the next message using an LSTM-RNN language model. We expect this study to give us meaningful insight into language models and short-text analysis. This is joint work with Samsung Electronics Software Center.
As large amounts of clinical data have been collected from enormous numbers of patients over multiple years, related machine learning problems have been studied, but most existing work does not achieve the required accuracy and scalability. Building on recent advances in RNNs and scalable computation, we are currently developing an RNN model that forecasts future medication prescriptions by assessing each patient's entire history.
We are currently developing a multilingual word embedding for Korean, English, and Chinese that helps sentence-sequence-based LSTM-RNN deep learning machine translation models capture meaningful elements efficiently. The goal of this study is to introduce a word representation that captures semantic and grammatical features shared across languages. This is joint work with Samsung Advanced Institute of Technology.
In many machine learning problems, incorporating prior knowledge as constraints on a statistical inference problem leads to a constrained discrete optimization problem, which is NP-hard in general. We study efficient algorithms for such inference problems. We apply our algorithm to image segmentation and show that imposing constraints greatly improves segmentation quality. This is joint work with the Machine Learning and Perception group at Microsoft Research Cambridge.
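To make the problem concrete, here is a minimal Python sketch of constrained MAP inference on a toy one-dimensional "image": a chain of binary pixels with unary costs, a Potts smoothness penalty, and a hard constraint that exactly `k` pixels are foreground. This is not the algorithm described above. On a chain, this particular cardinality constraint can be handled exactly by dynamic programming over (position, foreground count, label), with O(nk) states, whereas general constrained problems of this kind are NP-hard. The function names and the choice of a cardinality constraint are illustrative assumptions, and a brute-force checker is included for comparison.

```python
from itertools import product


def constrained_chain_map(unary, penalty, k):
    """Minimum-energy binary labeling of a chain with exactly k foreground (1) labels.

    unary[i] = (cost of label 0, cost of label 1) for pixel i.
    penalty  = Potts cost paid whenever two neighboring pixels disagree.
    Returns (energy, labels), or (inf, None) if k > len(unary).
    """
    INF = float("inf")
    # best[c][l] = (energy, labels) of the cheapest prefix that ends in
    # label l and uses exactly c foreground labels.
    best = [[(INF, None), (INF, None)] for _ in range(k + 1)]
    for l in (0, 1):
        if l <= k:
            best[l][l] = (unary[0][l], [l])
    for i in range(1, len(unary)):
        new = [[(INF, None), (INF, None)] for _ in range(k + 1)]
        for c in range(k + 1):
            for prev in (0, 1):
                energy, labels = best[c][prev]
                if labels is None:
                    continue  # unreachable state
                for l in (0, 1):
                    if c + l > k:
                        continue  # would violate the cardinality constraint
                    e = energy + unary[i][l] + (penalty if l != prev else 0.0)
                    if e < new[c + l][l][0]:
                        new[c + l][l] = (e, labels + [l])
        best = new
    return min(best[k], key=lambda state: state[0])


def brute_force_chain(unary, penalty, k):
    """Exhaustive search over all 2^n labelings (for checking small cases)."""
    best = (float("inf"), None)
    for labels in product((0, 1), repeat=len(unary)):
        if sum(labels) != k:
            continue
        e = sum(unary[i][l] for i, l in enumerate(labels))
        e += penalty * sum(a != b for a, b in zip(labels, labels[1:]))
        if e < best[0]:
            best = (e, list(labels))
    return best


if __name__ == "__main__":
    # Pixels 0-1 strongly prefer foreground; forcing k=3 pulls pixel 2 in too.
    unary = [(2, 0), (2, 0), (0, 1), (0, 1), (0, 2)]
    print(constrained_chain_map(unary, 0.5, 3))  # (1.5, [1, 1, 1, 0, 0])
```

Without the constraint, the cheapest labeling of this example is `[1, 1, 0, 0, 0]`; the constraint forces the solver to pay for one extra foreground pixel, and it picks the cheapest one that keeps the foreground region contiguous.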
Crowdsourcing has become one of the cornerstones of research on intelligent systems based on human computation. New Internet-based services like ‘Mechanical Turk’ make it easy to hire and manage workers from around the world to solve a wide range of problems. Crowdsourcing systems are now in widespread use for large-scale data-processing tasks such as image classification, video annotation, form data entry, optical character recognition, translation, recommendation, and proofreading. We are currently developing a general model of such crowdsourcing tasks and devising an efficient algorithm that determines the most likely answers by combining the workers' responses. Even though these low-paid workers can be unreliable, our algorithm achieves nearly optimal results.
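The idea of combining worker responses can be illustrated with a simple iterative weighted-majority heuristic, sketched below in Python under the assumption of binary (+1/-1) tasks. It alternates between (1) labeling each task by a reliability-weighted vote and (2) re-estimating each worker's weight from agreement with the current labels. This is a generic illustration, not the near-optimal algorithm described above; the name `aggregate_answers` and the weight rule `2 * accuracy - 1` are hypothetical choices.

```python
def aggregate_answers(responses, n_iters=10):
    """Iterative weighted-majority aggregation of binary crowd answers.

    responses: dict mapping (task, worker) -> +1 or -1.
    Returns (labels, weights): an estimated label per task and a
    reliability weight in [-1, 1] per worker.
    """
    tasks = sorted({t for t, _ in responses})
    workers = sorted({w for _, w in responses})
    weights = {w: 1.0 for w in workers}  # first pass = plain majority vote
    labels = {}
    for _ in range(n_iters):
        # Step 1: label each task by a reliability-weighted vote (ties -> +1).
        for t in tasks:
            score = sum(weights[w] * a for (tt, w), a in responses.items() if tt == t)
            labels[t] = 1 if score >= 0 else -1
        # Step 2: re-weight each worker by agreement with the current labels.
        for w in workers:
            answers = [(t, a) for (t, ww), a in responses.items() if ww == w]
            accuracy = sum(a == labels[t] for t, a in answers) / len(answers)
            weights[w] = 2.0 * accuracy - 1.0  # ~0 for random guessers
    return labels, weights


if __name__ == "__main__":
    truth = {"t1": 1, "t2": -1, "t3": 1}
    responses = {}
    for t, y in truth.items():
        responses[(t, "alice")] = y   # reliable
        responses[(t, "bob")] = y     # reliable
        responses[(t, "carol")] = -y  # always wrong
    labels, weights = aggregate_answers(responses)
    print(labels)   # {'t1': 1, 't2': -1, 't3': 1}
    print(weights)  # {'alice': 1.0, 'bob': 1.0, 'carol': -1.0}
```

A worker who always disagrees with the consensus ends up with a negative weight, so their answers are effectively flipped, while a worker who guesses at random ends up with a weight near zero and barely influences the vote.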
We present a novel meta-algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then stitches the resulting solutions together into a global solution. We demonstrate the efficiency of PM on two popular problems: computing the Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. This is joint work with MIT.
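The three PM steps (partition, solve locally, stitch) can be sketched in Python for MAP inference in a small binary pairwise MRF. Everything here is a simplified assumption rather than the actual PM scheme: pieces are carved out as BFS balls of random radius around random seeds, the "centralized algorithm" is exhaustive search on each piece, and the merge step is a plain union that ignores edges crossing pieces. Potentials are given as log-scores, so the MAP assignment maximizes their sum; all function names are hypothetical.

```python
import random
from collections import deque
from itertools import product


def random_partition(adj, max_radius, rng):
    """Carve the graph into pieces: repeatedly pick a random unassigned seed,
    draw a random radius, and remove the BFS ball of that radius."""
    unassigned = set(adj)
    parts = []
    while unassigned:
        seed = rng.choice(sorted(unassigned))
        radius = rng.randint(0, max_radius)
        ball, frontier = {seed}, deque([(seed, 0)])
        while frontier:
            u, dist = frontier.popleft()
            if dist == radius:
                continue
            for v in adj[u]:
                if v in unassigned and v not in ball:
                    ball.add(v)
                    frontier.append((v, dist + 1))
        unassigned -= ball
        parts.append(sorted(ball))
    return parts


def exhaustive_map(nodes, unary, pair):
    """Stand-in 'centralized algorithm': exact MAP on the induced subgraph by
    enumerating all 2^len(nodes) binary assignments (small pieces only).
    unary[u] = [score of 0, score of 1]; pair[(u, v)][xu][xv] = edge score."""
    best_score, best = float("-inf"), None
    for labels in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, labels))
        score = sum(unary[u][x[u]] for u in nodes)
        score += sum(w[x[u]][x[v]] for (u, v), w in pair.items() if u in x and v in x)
        if score > best_score:
            best_score, best = score, x
    return best


def partition_merge_map(adj, unary, pair, max_radius=1, seed=0):
    """Partition, solve each piece independently, and stitch by union.
    Edges that cross pieces are ignored, so the result is approximate."""
    rng = random.Random(seed)
    assignment = {}
    for part in random_partition(adj, max_radius, rng):
        assignment.update(exhaustive_map(part, unary, pair))
    return assignment


if __name__ == "__main__":
    # 4-cycle with attractive couplings (agreeing neighbors score +1).
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    unary = {0: [0.0, 2.0], 1: [0.5, 0.0], 2: [0.5, 0.0], 3: [0.0, 0.2]}
    agree = [[1.0, 0.0], [0.0, 1.0]]
    pair = {(0, 1): agree, (1, 2): agree, (2, 3): agree, (3, 0): agree}
    print(partition_merge_map(adj, unary, pair))
```

With no pairwise terms the union is exactly the global MAP. With pairwise terms the stitched answer is approximate, and intuitively its quality depends on how few (and how weak) the edges cut by the partition are.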
The network consensus problem is to compute, through decentralized message exchanges, the mode of the distribution of items held by the nodes of an information network. We generalized this problem to decentralized learning of a ranking over items in the network, devised a scalable protocol for this task based on the voter model, and proved its correctness and efficiency. This is joint work with Microsoft Research Cambridge.
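For reference, the classic voter model that this protocol builds on can be simulated in a few lines of Python: at each step a random node copies the opinion of a random neighbor, until the whole network agrees. This sketch shows only the baseline dynamics, not our ranking protocol; the function name and the step cap are illustrative assumptions.

```python
import random
from collections import Counter


def voter_model(adj, opinions, rng, max_steps=1_000_000):
    """Simulate the classic voter model until every node holds the same opinion.

    adj: dict node -> list of neighbors (each node needs at least one neighbor).
    opinions: dict node -> initial opinion (the caller's dict is not modified).
    Each step, a uniformly random node copies a uniformly random neighbor.
    Returns (final_opinions, steps_taken).
    """
    opinions = dict(opinions)
    counts = Counter(opinions.values())  # how many nodes hold each opinion
    nodes = sorted(adj)
    for step in range(max_steps):
        if len(counts) == 1:
            return opinions, step
        u = rng.choice(nodes)
        v = rng.choice(adj[u])
        old, new = opinions[u], opinions[v]
        if old != new:
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            counts[new] += 1
            opinions[u] = new
    if len(counts) == 1:
        return opinions, max_steps
    raise RuntimeError("no consensus within max_steps")


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10
    complete = {u: [v for v in range(n) if v != u] for u in range(n)}
    start = {u: ("a" if u < 6 else "b") for u in range(n)}  # 60% hold "a"
    wins = sum(voter_model(complete, start, rng)[0][0] == "a" for _ in range(500))
    print(wins / 500)  # fraction of runs in which the 60% majority wins
```

On a connected graph this process reaches consensus with probability 1, but on a regular graph each opinion wins with probability equal to its initial share, so the plain voter model outputs the mode only some of the time. Computing the mode, or a full ranking, reliably therefore calls for a refined protocol.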
Information diffusion plays an essential role in numerous human interactions, including the diffusion of innovations and the propagation of rumors. Understanding how information flows on networks is a central problem for both industry and academia. A tipping point is the moment at which information begins to spread rapidly and dramatically. Our research goal is to identify tipping points and to analyze spreading behavior. This problem is closely related to the onset of trends in marketing and the emergence of phase transitions in many complex systems.
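A standard way to explore tipping-point behavior is to simulate the independent cascade model on a random graph. The Python sketch below is an illustration under those assumptions (an Erdős–Rényi graph and a single uniform activation probability `p`), not our analysis method. On such a graph with mean degree d, large cascades become possible only once p exceeds roughly 1/d, so sweeping p shows the average cascade size rising sharply near that threshold, a simple instance of a phase transition.

```python
import random


def erdos_renyi(n, mean_degree, rng):
    """Undirected G(n, q) random graph with q chosen to give the requested mean degree."""
    q = mean_degree / (n - 1)
    adj = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < q:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def independent_cascade(adj, seeds, p, rng):
    """Independent cascade: each newly activated node gets one chance to
    activate each inactive neighbor, succeeding with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    graph = erdos_renyi(n, mean_degree=4.0, rng=rng)  # threshold near p = 0.25
    for p in (0.1, 0.2, 0.25, 0.3, 0.4, 0.6):
        sizes = [len(independent_cascade(graph, [rng.randrange(n)], p, rng)) for _ in range(50)]
        print(f"p={p:.2f}  mean cascade fraction={sum(sizes) / (50 * n):.3f}")
```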
The problem of identifying trends is of practical importance, especially in social media, where information can diffuse more rapidly and widely than offline. In this work, we classify trends by examining three aspects of diffusion: temporal, structural, and linguistic. For the temporal characteristics, we propose a new periodic time series model that considers both the daily cycle and the external shock cycle. This work is the first attempt to use periodic temporal features to identify trends and rumors, and we test it rigorously on a large annotated dataset drawn from a complete social media stream. This is joint work with the Social Computing Lab at KAIST and Microsoft Research Asia.
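One simple periodic feature of the kind mentioned above is the autocorrelation of an hourly activity series at a lag of 24 hours, which is high when activity follows a strong daily cycle. The Python sketch below computes it, along with an average hour-of-day profile. These are hypothetical illustrative features, not our periodic time series model, which additionally accounts for the external shock cycle.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no cycle to detect
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def daily_periodicity(hourly_counts):
    """Lag-24 autocorrelation of hourly activity: close to 1 for a strong daily cycle."""
    return autocorrelation(hourly_counts, 24)


def hour_of_day_profile(hourly_counts):
    """Average activity for each hour of the day (index 0..23), assuming
    the series starts at hour 0."""
    totals, days = [0.0] * 24, [0] * 24
    for t, count in enumerate(hourly_counts):
        totals[t % 24] += count
        days[t % 24] += 1
    return [tot / d if d else 0.0 for tot, d in zip(totals, days)]


if __name__ == "__main__":
    # Ten days of a synthetic, perfectly daily signal.
    series = [math.cos(2 * math.pi * t / 24) for t in range(240)]
    print(round(daily_periodicity(series), 3))  # 0.9 (216 of 240 points overlap at lag 24)
```

Even a perfectly periodic series scores below 1 here because only the overlapping part of the series contributes at lag 24; longer observation windows push the value closer to 1.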