AI Research and Study Note
Wednesday, March 25, 2020
Galaxy S20 My Filters (마이필터) Research Story
Tuesday, March 24, 2020
Reflections on Paper Writing While Working on a Machine Learning Paper
Creation involves considerable mental work and the pain that comes with it. I have to capture my logic precisely in equations, and I have to come up with formulations that fill logical gaps I never noticed while the ideas existed only in my head. This process is very different from working toward a known answer; it is a search for a way to substantiate my own theory and hypothesis. A paper is writing done in a territory where a "correct answer" cannot even be defined.
Starting a paper from a zero base is the perfect way to feel the pain of creation in full. Back in university, humanities courses always came with midterm and final reports. Even though they were only a page or two, the moment I sat at my desk and put my hands on the keyboard to write something, a sense of helplessness would wash over me, and I would lift my hands off the keyboard and reach for the mouse again. What surprised me back then, though, was that once I managed to write even a single sentence, a paragraph would naturally follow and the rest would pour out, much like the rambling I am writing right now. Unfortunately, that only applied to humanities and general-education classes. In other words, I already knew what I wanted to write; I was simply procrastinating because it felt tedious.
Things change once you move to paper writing, where you have to lay out new research results and a new theory. Because the logic has to be created from nothing, the reasoning, the writing, and the self-scrutiny all have to happen at once as the text takes shape. What matters here is how plausibly the proposed method is narrated and packaged. I would love to write only the core idea, but then even the most brilliant theory would probably sink. Imagine a paper like Deep Image Prior that only described the method without spinning the story; it would not have come anywhere near acceptance. That is why people usually keep working on the same line of research and recycle their text, tweaking sentences here and there. The paper written from a zero base is the one where even that does not work.
I have started writing a paper in the multi-task learning field. The methodology is one I invented last year, but only now am I starting to write it up. Even the title is a struggle: it has to be concise and catchy enough to draw attention.
Sunday, March 22, 2020
Paper Review: Blind Image Quality Assessment Based on High Order Statistics Aggregation
It is known that block-wise contrast normalization of an image is useful for constructing codebooks in the blind image quality assessment (BIQA) domain.
This paper presents how to construct codebooks using high-order statistics. The basic feature set is obtained by contrast normalization. Then, applying K-means clustering to features from the CSIQ database, a codebook with 100 centers is obtained. At run time, the method computes high-order features: mean, variance, and skewness. The input features are compared against the K clusters, and the differences to each cluster are aggregated into mean, variance, and skewness statistics. The method then performs regression over these three kinds of statistics.
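As a rough sketch of the codebook construction and aggregation steps above (assuming the contrast-normalized local features are already extracted, simplifying HOSA's soft assignment to a hard nearest-center assignment, and with all function names mine, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch only: build a 100-center codebook with K-means, then aggregate
# mean / variance / skewness statistics of the residuals to each center.
def build_codebook(local_features, n_centers=100):
    # local_features: (num_patches, dim) array collected from the training database
    return KMeans(n_clusters=n_centers, n_init=10).fit(local_features)

def hosa_descriptor(local_features, codebook):
    centers = codebook.cluster_centers_                 # (K, dim)
    labels = codebook.predict(local_features)           # hard assignment (simplification)
    descriptor = []
    for k, center in enumerate(centers):
        diffs = local_features[labels == k] - center    # residuals to center k
        if len(diffs) == 0:
            diffs = np.zeros((1, centers.shape[1]))
        mean = diffs.mean(axis=0)
        var = diffs.var(axis=0)
        skew = ((diffs - mean) ** 3).mean(axis=0)       # third-order central statistic
        descriptor.append(np.concatenate([mean, var, skew]))
    return np.concatenate(descriptor)                   # fed to a regressor (e.g., SVR)
```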
In addition to state-of-the-art performance, one of the two biggest advantages of the proposed method is that it generalizes very well. Deep-learning-based IQA methods quite commonly show poor generalization performance; unlike them, the proposed method is trained on one dataset and works very well on other datasets.
The other advantage is speed. It is uncommon for a BIQA method to process a 512x512 input within one second on a CPU, but the proposed method runs in 0.35 seconds, which is extremely fast.
In conclusion, HOSA is one of the best methods in terms of speed and cross-domain performance in the BIQA domain.
Tuesday, March 17, 2020
Galaxy S20 "My Filters", "마이필터"
(This method is protected by a patent.)
Friday, March 6, 2020
Reflections on research, and the work of mine that will play a pioneering role in a particular area of machine learning
Thursday, January 30, 2020
Paper Review: Matching Networks for One Shot Learning


(Figure: Matching Network problem and solution space.)
If the attention mechanism is a kernel on X×X, then equation (1), ŷ = Σ_i a(x̂, x_i) y_i, is similar to a kernel density estimator (KDE). Using an attention mechanism a, when dist(x, x_i) is measured with some distance metric and b of the outputs are 0, finding argmax_y is similar to a 'k−b'-nearest-neighbor search.
From another perspective, the operation can be interpreted as picking an (x_i, y_i) pair out of memory by searching for the most appropriate pair given an input x.
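A minimal sketch of the attention-based prediction in equation (1), assuming the embedding functions f and g are given (e.g., small neural networks) and using a softmax over cosine similarities as the attention kernel; this is not the paper's full-context-embedding model:

```python
import torch
import torch.nn.functional as F

# y_hat = sum_i a(x_hat, x_i) * y_i, with a() a softmax over cosine similarities.
def matching_predict(f_query, g_support, y_support_onehot):
    # f_query: (d,) embedded query x_hat
    # g_support: (n, d) embedded support set x_i
    # y_support_onehot: (n, num_classes) one-hot labels y_i
    sims = F.cosine_similarity(f_query.unsqueeze(0), g_support, dim=1)  # c(f(x_hat), g(x_i))
    attention = F.softmax(sims, dim=0)                                  # a(x_hat, x_i)
    return attention @ y_support_onehot                                 # distribution over classes
```

With a sharp enough attention distribution this reduces to a nearest-neighbor lookup over the support set, which is the 'k−b'-nearest-neighbor view described above.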
The attention kernel
Full context embeddings
Training strategy
Saturday, November 26, 2016
[Paper] Human-level control through deep reinforcement learning
Recently, Google's AlphaGo defeated the top Korean Go player, Se-dol Lee, 4 to 1. It was very surprising that an artificial agent could learn superhuman play and establish a game plan to win. The foundation of AlphaGo lies in reinforcement learning with deep neural networks, that is, deep reinforcement learning; the network studied in this paper is called the deep Q-network (DQN).
1. Previous Works and DeepMind
The theory of reinforcement learning provides a model of how agents may optimize their control of an environment. However, to use reinforcement learning successfully in situations approaching real-world complexity, agents face the difficult task of deriving efficient representations of the environment from high-dimensional inputs and using these to generalize past experience to new situations.
In the past, reinforcement learning agents have achieved some successes in a variety of domains; however, their applicability has previously been limited to ...
(a). domains where useful features can be handcrafted
(b). fully observed domains
(c). domains with low-dimensional state spaces
The paper, Human-level control through deep reinforcement learning, published in Nature in February 2015, describes a deep reinforcement learning system that combines deep neural networks with reinforcement learning for the first time and is able to master a diverse range of Atari games at a superhuman level with only the raw pixels and the score as inputs. This line of work was later extended by DeepMind into AlphaGo, the autonomous agent that defeated Se-dol Lee.
2. Reinforcement Learning
In supervised learning, we have an unambiguous label y for every input x in the training set. In contrast, for many sequential decision-making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. For example, if we have just built a four-legged robot and are trying to program it to walk, then initially we have no idea what the "correct" actions are to make it walk, and so we do not know how to provide explicit supervision for a learning algorithm to mimic.
In the reinforcement learning framework, we instead use a reward function, which indicates whether the learning agent is doing well or not. Reinforcement learning is thus characterized as an interaction between a learner (the agent) and an environment that provides evaluative feedback. It should be noted that the environment is typically formulated as a Markov decision process (MDP) consisting of states s, actions a, and rewards r.
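As a sketch of this interaction loop (the `env` and `agent` interfaces below are illustrative, not from the paper):

```python
# Minimal agent-environment interaction loop over an MDP (s, a, r).
def run_episode(env, agent, max_steps=1000):
    state = env.reset()                                       # initial state s_0
    total_reward = 0.0
    for t in range(max_steps):
        action = agent.act(state)                             # agent selects action a_t
        next_state, reward, done = env.step(action)           # environment returns s_{t+1}, r_t
        agent.update(state, action, reward, next_state, done) # evaluative feedback
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```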
3. Defining state s
In a real game, defining the state s is critical to defining the problem. First, we can come up with obvious candidates:
a. the location of the paddle,
b. the location/direction of the ball,
c. the number/positions of the bricks,
and so on. However, it may be possible to define a more universal, more generalized state s. So, just as human visual perception does, we can use the raw pixels as the input.
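A rough sketch of turning raw screen pixels into a state, in the spirit of the paper's preprocessing (grayscale, downsampling to 84x84, stacking the last 4 frames); the helper names and the crude nearest-neighbor resize are mine:

```python
import numpy as np

def to_grayscale(frame_rgb):
    # frame_rgb: (H, W, 3) uint8 screen from the emulator
    return frame_rgb.mean(axis=2).astype(np.uint8)

def downsample(frame, size=84):
    # crude nearest-neighbor resize to size x size (a real pipeline would filter properly)
    h, w = frame.shape
    rows = np.linspace(0, h - 1, size).astype(int)
    cols = np.linspace(0, w - 1, size).astype(int)
    return frame[rows][:, cols]

def make_state(last_frames):
    # stack the 4 most recent preprocessed frames into one state tensor
    processed = [downsample(to_grayscale(f)) for f in last_frames[-4:]]
    return np.stack(processed, axis=0)   # shape: (4, 84, 84)
```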
4. Value function
Future reward: R_t = r_t + r_{t+1} + \dots + r_T
Discounted future reward: R_t = \sum_{t'=t}^{T} \gamma^{t'-t} r_{t'}
Optimal action-value function: Q^*(s, a) = \max_{\pi} \mathbb{E}[R_t \mid s_t = s, a_t = a, \pi]
In order to evaluate a certain action a in a given state s, we have to measure its value. We can define R as the future reward. In practical problems, we use the discounted future reward instead, so that rewards receive less weight as time goes by. The quality function Q then represents the maximum expected future reward R when we perform action a in state s (π is the policy over actions that maximizes the Q function).
5. Q-learning
So, how do we define the Q function? Given a fixed policy π, its action-value function Q^{\pi} satisfies the Bellman equation: Q^{\pi}(s, a) = \mathbb{E}_{s'}\big[ r + \gamma \, \mathbb{E}_{a' \sim \pi}[ Q^{\pi}(s', a') ] \big].
So the optimal strategy is to select the action a' that maximizes the expected value of r + \gamma Q^*(s', a'); that is, the optimal Q function satisfies Q^*(s, a) = \mathbb{E}_{s'}[ r + \gamma \max_{a'} Q^*(s', a') ].
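A minimal tabular Q-learning sketch of the update implied by this equation, moving Q(s, a) toward r + γ·max_a' Q(s', a'); the action set and hyperparameters below are illustrative:

```python
import random
from collections import defaultdict

Q = defaultdict(float)                      # Q-table: (state, action) -> estimated value
alpha, gamma, epsilon = 0.1, 0.99, 0.1      # learning rate, discount, exploration rate
actions = [0, 1, 2, 3]

def choose_action(state):
    # epsilon-greedy policy over the current Q estimates
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, done):
    # Bellman target: r + gamma * max_a' Q(s', a'); no bootstrapping past terminal states
    target = reward if done else reward + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```

This table is exactly what becomes infeasible when the state is the full pixel screen, which motivates the function approximator discussed next.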
However, in practice, this basic approach is impractical because the action-value function Q is estimated separately for each sequence without any generalization. That is,
(a) very limited states/actions
(b) cannot generalize to unobserved states
When we think about the Breakout game, since the input is the pixels of the entire screen, we cannot build a Q-table for every possible configuration of pixels. Instead, it is common to use a function approximator to estimate the action-value function Q.
6. Deep Q-network
In this paper, a deep convolutional network is used to approximate the optimal action-value function; this nonlinear function approximator is called a Q-network. A Q-network can be trained by adjusting the parameters w at iteration i to reduce the mean-squared error in the Bellman equation.
By using a convolutional neural network, the algorithm exploits the local spatial correlations present in images and builds in robustness to natural transformations such as changes of viewpoint or scale, which allows it to cover a wide range of challenging tasks.
We can express the squared error term using the Bellman equation: L_i(w_i) = \mathbb{E}\big[ \big( r + \gamma \max_{a'} Q(s', a'; w_i^-) - Q(s, a; w_i) \big)^2 \big], where w_i^- are the parameters of the periodically updated target network.
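A sketch of this loss with a small convolutional Q-network; the layer sizes are in the spirit of the paper's architecture but illustrative, and `target_net` is the periodically updated copy discussed in the next section:

```python
import torch
import torch.nn as nn

def make_q_net(num_actions=4):
    return nn.Sequential(
        nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # input: 4 stacked 84x84 frames
        nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        nn.Linear(256, num_actions),                            # one Q-value per action
    )

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())                  # w^- starts as a copy of w
gamma = 0.99

def dqn_loss(states, actions, rewards, next_states, dones):
    # Q(s, a; w) for the actions actually taken in the mini-batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target r + gamma * max_a' Q(s', a'; w^-); zero beyond terminal states
        max_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * max_next
    return nn.functional.mse_loss(q_sa, target)
```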
7. Solving the problems of using a neural network
Since a nonlinear function approximator such as a Q-network is unstable, the article addresses this instability with two new ideas. The first is a method named experience replay, which stores transitions in a memory and samples them randomly to form mini-batches. The second is an iterative update that adjusts the action-values toward target values computed with parameters that are only periodically updated.
a. Method termed experience replay
b. Target values are updated periodically
Because the agent only observes the current screen, the task is partially observed, and many emulator states are perceptually aliased. Therefore, sequences of actions and observations are given as input to the algorithm, and the algorithm learns game strategies depending on these historical sequences. The ultimate goal of the agent is to interact with the emulator by selecting actions in a way that maximizes future rewards.
This approach is in some respects limited because the memory buffer does not differentiate important transitions and always overwrites with recent transitions owing to the finite memory size N. Similarly, uniform sampling gives equal importance to all transitions in the replay memory. A more sophisticated sampling strategy might emphasize transitions from which we can learn the most.
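A sketch of such a replay memory, making the two limitations concrete: a finite buffer that silently overwrites the oldest transitions, and uniform sampling that treats every transition as equally important (class and parameter names are mine):

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=100_000):
        # finite memory of size N; the oldest transitions are overwritten automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # uniform sampling: every stored transition gets equal importance
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```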
8. Result Analysis
...