Wednesday, March 25, 2020

Galaxy S20 My Filters Research Story (My Filters)

Corporate interests and personal interests sometimes seem to collide.
It seems that people in various places are trying to figure out the My Filters algorithm. Meanwhile, I am agonizing over whether it is right to disclose the method through a paper. Because of the patent publication system it will become public in a year and a half anyway, but a paper would be faster than that. Through arXiv in particular, it could be made public immediately. Looking only at my personal interest, it would be better to publish right away, get cited, and help my career. But that would amount to handing the information out for free to the people trying to figure it out.

To figure out this method, one company reportedly reverse-engineered the feature and obtained a few key artifacts. That makes me even more reluctant to publish. Depending on the ECCV result it will be disclosed as early as this summer anyway, but I am holding off on the arXiv upload for now. If I give everything away, wouldn't the company effectively be throwing away a piece of intellectual property?

It is not a perfect solution anyway, and the theory comes with some conditions attached. I hear people are experimenting with photos taken through the CandyCam (캔디캠) filters, and those filters do not work that well with our method, so I am in the middle of fixing that. Then again, if follow-up research comes out, it might even end up helping me.

From here on, career talk.
For a while after joining the company I worked on application development and was satisfied with it, but I came to feel that applications are temporary. Trends change, and apps are transient from one device generation to the next.
So I moved to core technology development. I did a master's degree, and these days I read papers by the dozens. My Filters is exactly the kind of result I want on the career path I want: it created a field that did not exist before, filter transfer, something not possible even with state-of-the-art techniques. I suspect it will have a considerable influence on follow-up research. But I am not satisfied yet; it needs to work better to have a stronger impact. The goal I ultimately imagine is that when people discuss a field, they say, "Oh, that started with so-and-so!" or "So-and-so's method is the famous one!"


Below is a review post about My Filters:
https://lovely0206.tistory.com/99


Tuesday, March 24, 2020

Reflections on Paper Writing While Writing a Machine Learning Paper

Creation involves considerable mental effort and the pain that comes with it. I have to capture my logic precisely in equations, and I have to come up with formulations that fill the logical gaps I never noticed while everything was only in my head. This process is very different from working toward a known answer; it is a search for a way to prove my theory and hypotheses. A paper is writing done in a domain where a 'correct answer' cannot even be defined.

Starting a paper from a zero base is a perfect way to feel the pain of creation in full. In college humanities courses, I was always tormented by midterm and final term papers. Even though they were only a page or two, whenever I sat at my desk and put my hands on the keyboard to write something, the sense of hopelessness would wash over me and I would take my hands off the keyboard and grab the mouse again. What amazed me back then, though, was that once I managed to write even a single sentence, a whole paragraph would naturally follow and the rest would pour out, much like the rambling I am writing right now. Unfortunately, that only applied to humanities and general education classes. In other words, I already knew what I wanted to write and was merely escaping because it was a hassle.


It becomes a different story when it comes to writing a paper that has to lay out new research results and new theory. Since logic that did not exist before has to be created, the reasoning, the writing, and the self-scrutiny all have to happen at once while the text is being written. What matters here is how convincingly the proposed method is narrated and packaged. I would like to write only the core idea, but then even the most brilliant theory would probably sink. Imagine a paper like Deep Image Prior explaining only the method without spinning the story; it would not have come anywhere near acceptance. That is why people usually keep working in the same line of research and recycle sentences here and there with small changes. A paper written from a zero base is the case where even that does not work.

I have started writing a paper in the multi-task learning field. It is a methodology I invented last year, but only now am I starting to write it up. Even the title is a struggle: it has to be concise and slick enough to grab as much attention as possible.

Sunday, March 22, 2020

Paper Review: Blind Image Quality Assessment Based on High Order Statistics Aggregation

BIQA — HOSA

It is known that block-wise contrast normalization of the image is useful for constructing codebooks in the BIQA domain.

This paper presents how to construct codebooks using high-order statistics. The basic feature set is obtained by contrast normalization. Then, using the features from the CSIQ database and K-means clustering, a codebook with 100 centers is built. At run time, the method computes high-order statistics per cluster: mean, variance, and skewness. The input features are used to obtain distances to the K clusters (mean, variance, and skewness distances), and the method then performs regression over these three kinds of distances.
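To make the pipeline concrete, here is a minimal Python sketch of a HOSA-style flow as I read it: block-wise contrast-normalized local features, a K-means codebook, per-cluster aggregation of mean/variance/skewness differences, and a regressor on top. The block size, the 10-center codebook (100 in the paper), the random stand-in data, and the Ridge regressor are my own illustrative assumptions, not the authors' exact configuration.

# Minimal HOSA-style BIQA sketch; illustrative choices, not the paper's exact setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def local_features(img, block=7):
    """Contrast-normalize the image block by block; one feature vector per block."""
    h, w = img.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = img[y:y + block, x:x + block].astype(np.float64)
            patch = (patch - patch.mean()) / (patch.std() + 1e-6)  # contrast normalization
            feats.append(patch.ravel())
    return np.array(feats)

def hosa_descriptor(feats, kmeans):
    """Aggregate per-cluster mean/variance/skewness differences into one vector."""
    labels = kmeans.predict(feats)
    desc = []
    for k, center in enumerate(kmeans.cluster_centers_):
        members = feats[labels == k]
        if len(members) == 0:
            desc.append(np.zeros(3 * center.size))
            continue
        diff = members - center
        mean_d = diff.mean(axis=0)
        var_d = diff.var(axis=0)
        skew_d = ((diff - diff.mean(axis=0)) ** 3).mean(axis=0)
        desc.append(np.concatenate([mean_d, var_d, skew_d]))
    return np.concatenate(desc)

# Training: build the codebook from many images, then regress to quality scores.
rng = np.random.default_rng(0)
train_imgs = [rng.random((64, 64)) for _ in range(20)]  # stand-ins for CSIQ images
train_scores = rng.random(20)                           # stand-ins for quality labels

all_feats = np.vstack([local_features(im) for im in train_imgs])
codebook = KMeans(n_clusters=10, n_init=10, random_state=0).fit(all_feats)  # 100 centers in the paper

X = np.array([hosa_descriptor(local_features(im), codebook) for im in train_imgs])
regressor = Ridge().fit(X, train_scores)

test_img = rng.random((64, 64))
print(regressor.predict([hosa_descriptor(local_features(test_img), codebook)]))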

In addition to state-of-the-art performance, one of the two biggest advantages of the proposed method is that it generalizes very well. It is quite common for deep-learning-based IQA methods to show poor generalization performance. Unlike them, the proposed method is trained on one dataset and works very well on other datasets.

The other is that it is very fast compared to other methods. It is uncommon for a BIQA method to process a 512x512 input within one second on a CPU, but the proposed method runs in 0.35 seconds, which is extremely fast.

In conclusion, HOSA is one of the best methods in terms of speed and cross-domain performance in the BIQA domain.

Tuesday, March 17, 2020

Galaxy S20 "My Filters", "마이필터"

Upon the release of the Galaxy S20, one of its new features, My Filters, was spotlighted. Last year, I was assigned a problem that seemed almost impossible: transferring a (filter) style from one image to a new image in real time. I came up with a solution based on a novel approach, and that became 'My Filters' on the Galaxy S20.

People often ask me "How did you do that?"
Giving a detailed explanation was tricky, like trying to explain math purely out loud.
This time, I am about to reveal the method in a paper.

In November 2019, I was struggling to write a paper for CVPR and had no time to polish and review my idea. It later turned out that the paper contained tons of typos and poorly written sentences, and, moreover, it was hard to read.

This time, I am about to submit it to ECCV and release it on arXiv as well. I am not sure whether it will be accepted, but I hope so. I really want to go to Scotland to present it.

Please use My Filters a lot on your Galaxy S20!


(This method is protected by patent.)

Friday, March 6, 2020

Reflections on Research, and My Work That Will Pioneer a Specific Area of Machine Learning

Creation involves considerable mental effort and the pain that comes with it.

I have to capture my logic precisely in equations, and I have to come up with formulations that fill the logical gaps I never noticed while everything was only in my head.

This process is very different from working toward a known answer; it is a search for a way to prove my theory and hypotheses. A paper is writing done in a domain where a 'correct answer' cannot even be defined.

Writing professional prose in English is a level that is even harder to reach on one's own, and I still seem far from it. I cannot even begin to gauge what level a professor operates at. When will I feel that I am catching up? For now, I just keep writing as hard as I can...

This is my first attempt at a top-tier venue, and beyond that, a paper aiming to leave a real mark on a specific area of machine learning. It would have been absolutely impossible on my own, but I got this far with help from people around me: a KAIST PhD, a Toronto PhD, and even a professor. The idea is mine, but I learned a tremendous amount writing it. Writing it half outside of work (the other half during my day job...) meant no time to rest, which was really hard. If I get through this, I suppose I will have grown.

The paper will soon be released on the internet, a bit ahead of the conference. I am so curious about the reaction. I really hope readers will follow the flow of the equations and the geometric interpretation, and notice and understand the derivations omitted for lack of space.

The productized version is already getting a lot of "how did they do that?" reactions. Sometime last month, our executive director called me in and said that so many people were asking him about it that he wanted me to explain the technology to him personally. I feel proud.

On the other hand, since it is machine learning, there is a concern that it could take work away from a particular profession. Internet comments occasionally worry about that point, and that profession seems to openly dislike it. Seeing that gives me a lot to think about. What kind of presence am I in the progress of humanity? Let's wait and see what happens.

Thursday, January 30, 2020

Paper Review: Matching Networks for One Shot Learning

In 2016, this paper, written by Google DeepMind researchers, opened the era of one-shot learning in deep learning. The paper is logically well organized and well written, and it reflects the author's deep thinking about its logic. Let us review the paper and the idea behind it.
Before 2016, it was astonishing that deep learning could perform many vision tasks with high accuracy, which we could not have imagined when we were using hand-crafted methods. However, there was one crucial drawback: due to the highly parametric nature of deep neural networks, the model requires tons of data for training. To this end, the author starts the background of the method by introducing this drawback of deep neural networks.
Obviously, the author must have been inspired by non-parametric methods. By incorporating the merits of both non-parametric and parametric methods, the author may have aimed to make deep learning behave more like human recognition. The result is a new learning scheme, Matching Networks for one-shot learning. The idea behind this method is not simple and reflects the author's deep dive into it.
The author starts by defining the model for one-shot learning. Given a support set S, the model defines a function c_S (a classifier) for each S, i.e., a mapping S -> c_S(.).
The author describes the model architecture through one main section and two subsections: the main section defines the model on S and the non-parametric kernel a(x, x_i), and the subsections describe the attention kernel a(x, x_i) and the full context embeddings, from the viewpoint of using an LSTM and the set S.

Matching Network problem and solution space.

The mapping S -> c_S(x) is defined as P(y|x, S), which the author calls a Matching Network. The Matching Network is the parametric part of the proposed method.
To find the answer, the author tries to solve
y = Σ_i a(x, x_i) y_i        (1)
where a is an attention mechanism, a kernel on X x X. This mechanism is similar to KDE or kNN, and here is why.
If the attention mechanism is a kernel on X x X, then (1) is akin to a kernel density estimate (KDE).
With the attention mechanism a, if dist(x, x_i) is measured with some distance metric and the attention is zero for the b farthest x_i, then finding argmax_y is similar to a (k-b)-nearest-neighbor search.
Moreover, the author introduces another view.
From another angle, the operation can be interpreted as picking an (x_i, y_i) pair by searching for the appropriate pair in memory given an input x.
The above interpretations clearly show that the author successfully integrated non-parametric methods with the parametric network architecture P.

The attention kernel

Here is the core of the non-parametric part of this paper. Once the embedding functions f and g (modeled as deep networks, with LSTMs in the full context embedding variant below) produce the features f(x) and g(x_i), these features, as described in the section above, can be used to compute distances and therefore to perform a k-nearest-neighbor-style search. More specifically, the author uses the cosine similarity c followed by a softmax, thereby finding the apparent answer y.
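As a concrete illustration, here is a tiny numpy sketch of that attention step under my reading: cosine similarity between f(x) and each g(x_i), a softmax over the similarities, and the weighted label sum of equation (1). The random embeddings are stand-ins for the outputs of f and g, not the paper's trained networks.

# Attention kernel sketch: cosine similarities -> softmax -> weighted label sum (eq. (1)).
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def matching_predict(f_x, g_support, y_support):
    """f_x: (d,) query embedding; g_support: (k, d); y_support: (k, n_classes) one-hot labels."""
    sims = np.array([cosine(f_x, g_i) for g_i in g_support])  # c(f(x), g(x_i))
    a = np.exp(sims) / np.exp(sims).sum()                     # softmax attention a(x, x_i)
    return a @ y_support                                      # equation (1): sum_i a(x, x_i) y_i

rng = np.random.default_rng(0)
g_support = rng.normal(size=(5, 16))                # five support embeddings, 16-dim
y_support = np.eye(5)                               # 5-way one-shot labels
f_x = g_support[2] + 0.1 * rng.normal(size=16)      # query close to support item 2
print(matching_predict(f_x, g_support, y_support))  # most mass lands on class 2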

Full context embeddings

It may seem sufficient so far, but the author self-critically notes that g(x_i) hastily assumes that x_i is independent of the other elements of S. So the author suggests g(x_i, S) instead, which is why it is called a full context embedding. It makes sense that x_i should be embedded in the overall context of S. We will see whether it is an effective solution when we check the results section.
In a similar way, the author suggests f(x, S) instead of f(x).
The model is borrowed from LSTM methods in previous research, where the author readily exploits the idea of memory storage for the support set S. This is how the author supports the idea of using g(x_i, S) and f(x, S) instead of plain g(x_i) and f(x).

Training strategy

So far, the author has revealed the idea and the overall architecture of Matching Networks. Now the author explains how to train these networks. A label set L is randomly sampled from the task distribution T; then, from L, the sets S and B are sampled to serve as the support set and the batch (target) set.
Once again, the author emphasizes that this method needs no fine-tuning, i.e., it is a one-shot learning method: after the initial training, Matching Networks operate in a purely non-parametric way, using S and x as inputs to the nearest-neighbor mechanism.
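Here is a minimal sketch of that episodic sampling as I understand it: draw a label set L from the task distribution T, then draw a support set S and a batch B restricted to those labels. The toy Gaussian "dataset", the class counts, and the shot sizes are illustrative assumptions, not the paper's setup.

# Episodic sampling sketch: L ~ T, then S and B sampled over the labels in L.
import numpy as np

rng = np.random.default_rng(0)
num_classes, per_class, dim = 20, 30, 16
dataset = {c: rng.normal(loc=c, size=(per_class, dim)) for c in range(num_classes)}

def sample_episode(n_way=5, k_shot=1, batch_per_class=2):
    L = rng.choice(num_classes, size=n_way, replace=False)  # label set L sampled from T
    S, B = [], []
    for new_label, c in enumerate(L):
        cls = int(c)
        idx = rng.permutation(per_class)
        S += [(dataset[cls][i], new_label) for i in idx[:k_shot]]  # support set S
        B += [(dataset[cls][i], new_label) for i in idx[k_shot:k_shot + batch_per_class]]  # batch B
    return S, B

S, B = sample_episode()
print(len(S), len(B))  # 5 support examples and 10 target examples per episode
# During training, each episode's loss is the negative log-probability of B's labels
# under P(y | x, S), and the embedding parameters are updated over many episodes.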
Up to this point, we have examined the idea of the paper. The author suggests several solutions for the stated hypotheses, and their effectiveness can be checked in the Experiments section. Although it is not a perfect solution, the idea works very well. One thing worth more thought is that the method did not perform well in one case in Table 3; the author tries to explain why, but questions remain.

Saturday, November 26, 2016

[Paper] Human-level control through deep reinforcement learning

Human-level control through deep reinforcement learning - Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al.

Recently, Google's AlphaGo defeated the top Korean Go player, Se-dol Lee, 4 to 1. It was very surprising that an artificial agent could learn superhuman play and establish a game plan to win. The foundation of such systems lies in reinforcement learning combined with deep neural networks, i.e., deep reinforcement learning; the architecture reviewed here is called the deep Q-network.


1. Previous Works and DeepMind

The theory of reinforcement learning provides a model of how agents may optimize their control of an environment. However, to use reinforcement learning successfully in situations approaching real-world complexity, agents face a difficult task: they must derive efficient representations of the environment from high-dimensional inputs and use these to generalize past experience to new situations.

In the past, reinforcement learning agents achieved some successes in a variety of domains; however, their applicability was previously limited to ...
(a) domains where useful features can be handcrafted,
(b) fully observed domains,
(c) low-dimensional state spaces.

The paper, Human-level control through deep reinforcement learning, published in Nature in February 2015, describes a deep reinforcement learning system that combines deep neural networks with reinforcement learning for the first time at this scale and is able to master a diverse range of Atari games at a superhuman level with only the raw pixels and the score as inputs. Related techniques were later used in AlphaGo, the DeepMind agent that defeated Se-dol Lee.


2. Reinforcement Learning


In supervised learning, we have an unambiguous label y for every input x in the training set. In contrast, for many sequential decision-making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. For example, if we have just built a four-legged robot and are trying to program it to walk, then initially we have no idea what the "correct" actions are to make it walk, and so we do not know how to provide explicit supervision for a learning algorithm to mimic.

In the reinforcement learning framework, the algorithm instead exploits a reward signal, which indicates whether the learning agent is doing well or not. Reinforcement learning is thus characterized as an interaction between a learner (agent) and an environment that provides evaluative feedback. It should be noted that the environment is often formulated as a Markov decision process, which includes a state s, an action a, and a reward r.


3. Defining state s

In a real game, defining the state s is critical to defining the problem. First, we can come up with hand-designed candidates:
a. location of the paddle,
b. location/direction of the ball,
c. number/positions of the bricks,
and so on. However, it is possible to define a more universal, more general state s. Just as human visual perception does, we can use the raw pixels as the input, as sketched below.
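As a rough illustration of a pixel-based state, the sketch below grayscales and downscales each frame and stacks the last four frames so that motion (for example, the ball's direction) is visible in the state. The 84x84 size and the 4-frame stack follow my reading of the paper; the crude nearest-neighbor resize and the random frames are stand-ins for real preprocessing and emulator output.

# Pixel-state sketch: grayscale + downscale each frame, then stack the last four frames.
import numpy as np

def preprocess(frame_rgb, out_hw=(84, 84)):
    gray = frame_rgb.mean(axis=2)                      # crude grayscale conversion
    h, w = gray.shape
    ys = np.linspace(0, h - 1, out_hw[0]).astype(int)  # nearest-neighbor downscale
    xs = np.linspace(0, w - 1, out_hw[1]).astype(int)
    return gray[np.ix_(ys, xs)] / 255.0

class FrameStack:
    """Keep the last k preprocessed frames as the state s."""
    def __init__(self, k=4):
        self.k, self.frames = k, []

    def push(self, frame_rgb):
        self.frames.append(preprocess(frame_rgb))
        self.frames = self.frames[-self.k:]
        while len(self.frames) < self.k:               # pad at the start of an episode
            self.frames.insert(0, self.frames[0])
        return np.stack(self.frames)                   # state shape: (4, 84, 84)

rng = np.random.default_rng(0)
stack = FrameStack()
state = stack.push(rng.integers(0, 256, size=(210, 160, 3)))  # fake Atari-sized frame
print(state.shape)  # (4, 84, 84)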


4. Value function

Future reward:
    R_t = r_t + r_{t+1} + ... + r_T

Discounted future reward:
    R_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ... = Σ_{t'=t..T} γ^(t'-t) · r_{t'}

Optimal action-value function:
    Q*(s, a) = max_π E[ R_t | s_t = s, a_t = a, π ]

In order to evaluate a certain action a in a given state s, we have to measure its value. We first define the future reward R. In practical problems, we use the discounted future reward instead, so that rewards receive less weight as time goes by. The quality function Q* then represents the maximum expected future reward R when we perform action a in state s (π denotes a policy over actions, and Q* takes the maximum over policies).


5. Q-learning

So, how do we define the Q function? The optimal action-value function obeys the Bellman equation:
    Q*(s, a) = E_{s'}[ r + γ · max_{a'} Q*(s', a') | s, a ]
So the optimal strategy is to select the action a' that maximizes the expected value of r + γ·Q*(s', a').

However, in practice, this basic approach is impractical because the action-value function Q is estimated separately for each sequence without any generalization. That is,
  (a) it handles only a very limited number of states/actions,
  (b) it cannot generalize to unobserved states.

When we think about the Breakout game, since the input is the pixels of the entire screen, we cannot build a Q table with an entry for every possible pixel configuration. Instead, it is common to use a function approximator to estimate the action-value function Q; for contrast, the tabular update that does not scale is sketched below.
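Here is a tiny tabular Q-learning sketch on a toy 1-D chain environment (my own illustrative example, not from the paper); every (state, action) cell is updated independently, which is exactly what prevents this approach from scaling to pixel-valued states.

# Tabular Q-learning on a toy 1-D chain: each Q[s, a] entry is learned separately.
import numpy as np

n_states, n_actions = 5, 2          # action 0 = move left, action 1 = move right
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s_next == n_states - 1 else 0.0  # reward for reaching the right end
    return s_next, reward, s_next == n_states - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: move Q[s, a] toward the target r + gamma * max_a' Q[s', a'].
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
        s = s_next

print(np.round(Q, 2))  # the "move right" column should dominate in every state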


6. Deep Q-network

In this paper, a deep convolutional network is used to approximate the optimal action-value function; this nonlinear function approximator is called a Q-network. A Q-network can be trained by adjusting the parameters w at iteration i to reduce the mean-squared error in the Bellman equation.
[Figure: full structure of the deep Q-network.]
Because it uses a convolutional neural network, the algorithm can cover a wide range of challenging tasks: it exploits the local spatial correlations present in images and thereby builds in robustness to natural transformations such as changes of viewpoint or scale.

We can express the squared error term using the Bellman equation:
    L_i(w_i) = E_{(s, a, r, s')}[ ( r + γ · max_{a'} Q(s', a'; w_i^-) - Q(s, a; w_i) )² ]
where w_i^- are the parameters of the periodically updated target network.

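The following numpy sketch evaluates that squared-error term for a batch of transitions (s, a, r, s', done). The random linear maps stand in for the convolutional Q-network and its periodically copied target and are purely illustrative, not the paper's architecture.

# Squared TD error sketch: target r + gamma * max_a' Q(s', a'; w_i^-) vs prediction Q(s, a; w_i).
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, batch = 8, 4, 32
gamma = 0.99

W_online = rng.normal(size=(state_dim, n_actions))  # parameters w_i
W_target = W_online.copy()                          # parameters w_i^- (copied periodically)

def q_values(W, states):
    return states @ W                               # Q(s, .; w) for every action

states = rng.normal(size=(batch, state_dim))
actions = rng.integers(n_actions, size=batch)
rewards = rng.normal(size=batch)
next_states = rng.normal(size=(batch, state_dim))
done = rng.random(batch) < 0.1

# No bootstrapping on terminal transitions.
targets = rewards + gamma * q_values(W_target, next_states).max(axis=1) * (~done)
predictions = q_values(W_online, states)[np.arange(batch), actions]  # Q(s, a; w_i)
loss = np.mean((targets - predictions) ** 2)                         # L_i(w_i)
print(loss)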

7. Solving problems when using a neural network

Since the nonlinear function approximator (the Q-network) is not stable, the article addresses this instability with two new ideas. The first is a method named experience replay, which stores transitions in a memory and samples mini-batches of transitions from it for training. The second is an iterative update that adjusts the action-values toward target values that are themselves only updated periodically.
  a. Method termed experience replay
  b. Target values are updated periodically
The agent selects and executes actions according to an ϵ-greedy policy based on Q. Instead of feeding histories of arbitrary length to the neural network, the Q-function works on a fixed-length representation of the history produced by the preprocessing function φ. This iterates until the game ends at time T. A sketch of the replay memory and the ϵ-greedy loop follows below.
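Below is a minimal sketch of those two stabilizers as I understand them: a uniformly sampled replay memory and a target network that is only refreshed every C steps. The names env, q_net, target_net, train_step, and phi in the commented loop are hypothetical placeholders, not functions from the paper or from any specific library.

# Replay memory and epsilon-greedy action selection; the training loop is outlined in comments.
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are overwritten

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)  # uniform sampling

def epsilon_greedy(q_values, eps):
    """Pick a random action with probability eps, otherwise the greedy action."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Training loop outline (env, q_net, target_net, train_step, phi are placeholders):
#   for each step t:
#       a_t = epsilon_greedy(q_net(phi(s_t)), eps)
#       s_next, r_t, done = env.step(a_t)
#       memory.push(phi(s_t), a_t, r_t, phi(s_next), done)
#       train_step(q_net, target_net, memory.sample())  # minimize the squared error above
#       if t % C == 0: copy q_net's weights into target_net  # periodic target update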
 
Because the agent only observes the current screen, the task is partially observed, and many emulator states are perceptually aliased. Therefore, sequences of actions and observations are fed to the algorithm, which learns game strategies that depend on these historical sequences. The ultimate goal of the agent is to interact with the emulator by selecting actions in a way that maximizes future rewards.

This approach is in some respects limited because the memory buffer does not differentiate important transitions and always overwrites old ones with recent transitions owing to the finite memory size N. Similarly, uniform sampling gives equal importance to all transitions in the replay memory. A more sophisticated sampling strategy might emphasize the transitions from which we can learn the most.


8. Result Analysis
It should be noted that after each peak, the predicted value drops a little. Considering that the state does not actually get worse, this behavior does not seem right to me. It might be because the agent only has a fixed-size memory and only stores four consecutive frames as its state. I expect that if this issue is resolved, the performance can be improved further.
 
...