<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>기록용 블로그</title>
    <link>https://kyj0105.tistory.com/</link>
    <description></description>
    <language>ko</language>
    <pubDate>Thu, 16 Apr 2026 23:38:40 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>kyj0015</managingEditor>
    <item>
      <title>Training Language Models to Self-Correct via Reinforcement Learning</title>
      <link>https://kyj0105.tistory.com/118</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/2409.12917&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://arxiv.org/pdf/2409.12917&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;STaR (Self-Taught Reasoner)&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Method: the LLM generates reasoning chains on its own, and only the reasoning traces that reach the correct answer (scored rule-based) are collected for SFT&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Distribution shift: the base model's errors get fixed, but the newly trained model's output distribution differs, so it fails again on new problems&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Behavior collapse: trained only on correct chains, the model gradually produces the right answer on the first attempt -&amp;gt; it never learns the self-correction skill of finding and fixing its own mistakes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;-&amp;gt; Goal: improve on both distribution shift and behavior collapse&lt;/li&gt;
&lt;/ul&gt;
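The STaR filtering step above can be sketched as follows; the trace format and the sample data are hypothetical stand-ins for the model's actual sampling output.

```python
# Minimal sketch of STaR-style rationale filtering: keep only self-generated
# reasoning traces whose final answer passes a rule-based exact-match check,
# then use the survivors as SFT data.

def rule_based_correct(trace: dict, gold: str) -> bool:
    """Rule-based scoring: exact match on the final answer string."""
    return trace["answer"].strip() == gold.strip()

def filter_traces(traces, gold_answers):
    """Collect only the traces whose answer is correct."""
    kept = []
    for trace, gold in zip(traces, gold_answers):
        if rule_based_correct(trace, gold):
            kept.append(trace)
    return kept

traces = [
    {"question": "1+1?", "rationale": "...", "answer": "2"},
    {"question": "2+2?", "rationale": "...", "answer": "5"},  # wrong -> dropped
]
sft_data = filter_traces(traces, ["2", "4"])
```

Note that this filtering is exactly what induces the behavior collapse described above: only first-try-correct traces survive.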
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;787&quot; data-origin-height=&quot;244&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/9enjJ/dJMcaa4JvTB/1JnyZpNW8ut2uLDtBXcMXk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/9enjJ/dJMcaa4JvTB/1JnyZpNW8ut2uLDtBXcMXk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/9enjJ/dJMcaa4JvTB/1JnyZpNW8ut2uLDtBXcMXk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F9enjJ%2FdJMcaa4JvTB%2F1JnyZpNW8ut2uLDtBXcMXk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;787&quot; height=&quot;244&quot; data-origin-width=&quot;787&quot; data-origin-height=&quot;244&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;SCoRe
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Generates a response to the problem and corrects its own errors without oracle feedback&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;SFT&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;SFT on entirely self-generated data produced offline by the model&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;1. Feed the prompt as input and generate a response to the problem&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;2. Fine-tune the model once on this original answer together with the instruction&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The total loss is the sum of a cross-entropy loss and a KL-divergence loss&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;How does SFT resolve the mismatch? -&amp;gt; By pairing the original answer and the revised answer from the same model into one training example, it removes the existing train != test mismatch&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;However, it does not fully resolve distribution shift -&amp;gt; the use of offline data is the root cause&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The actual gain in correction ability is small&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The reward is computed rule-based, as exact match on the answer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;progress reward
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;a bonus b̂(y2 | y1, y&amp;lowast;) := &amp;alpha; &amp;sdot; (r̂(y2, y&amp;lowast;) &amp;minus; r̂(y1, y&amp;lowast;))&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Training naively can collapse into producing a good answer in the first turn and reusing it unchanged in the second turn without any correction -&amp;gt; the bonus reward resolves this&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The model earns a larger reward when an incorrect original answer is corrected into the right answer in the revised answer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Differences from other approaches&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Multi-turn&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Online reinforcement learning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
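The bonus term above can be sketched numerically; the exact-match reward is as described in the notes, while `alpha` and the example answers are hypothetical.

```python
# Sketch of the shaped second-attempt reward: the base reward r(y2) plus a
# bonus alpha * (r(y2) - r(y1)), so a wrong->right correction earns extra
# credit while copying the first-turn answer earns no bonus.

def exact_match_reward(answer: str, gold: str) -> float:
    """Rule-based reward: 1.0 on exact match, else 0.0."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def shaped_second_turn_reward(y1: str, y2: str, gold: str,
                              alpha: float = 1.0) -> float:
    r1 = exact_match_reward(y1, gold)
    r2 = exact_match_reward(y2, gold)
    return r2 + alpha * (r2 - r1)  # bonus rewards progress between turns

corrected = shaped_second_turn_reward("5", "4", gold="4")  # wrong -> right
copied    = shaped_second_turn_reward("4", "4", gold="4")  # right -> right
ruined    = shaped_second_turn_reward("4", "5", gold="4")  # right -> wrong
```

With alpha = 1, correcting a wrong answer scores 2.0, copying a correct one scores 1.0, and breaking a correct one scores -1.0, which is exactly the asymmetry that discourages the no-correction collapse.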
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 1
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Shows performance gains on the MATH task&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;276&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/3jy6M/dJMcaaDE6f1/oycHCykPW6QckzkCcsjyy1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/3jy6M/dJMcaaDE6f1/oycHCykPW6QckzkCcsjyy1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/3jy6M/dJMcaaDE6f1/oycHCykPW6QckzkCcsjyy1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F3jy6M%2FdJMcaaDE6f1%2FoycHCykPW6QckzkCcsjyy1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;842&quot; height=&quot;276&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;276&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 1
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Self-correction improves substantially over the base model&lt;/li&gt;
&lt;li&gt;&amp;Delta; (i-&amp;gt;c): the fraction of previously incorrect problems that are corrected&lt;/li&gt;
&lt;li&gt;&amp;Delta; (c-&amp;gt;i): the fraction of previously correct problems that become incorrect&lt;/li&gt;
&lt;li&gt;i-&amp;gt;c rises sharply while c-&amp;gt;i stays low, showing the model does not flip already-correct answers to wrong ones and does fix the incorrect ones&lt;/li&gt;
&lt;li&gt;-&amp;gt; S2R is the work that pushes back on this claim&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
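The two transition rates in Table 1 can be computed directly from per-problem correctness at each attempt; the sample data below is hypothetical.

```python
# Sketch of the Table 1 transition metrics: Delta(i->c) is the fraction of all
# problems that go incorrect -> correct between the two attempts, and
# Delta(c->i) the fraction that go correct -> incorrect.

def transition_rates(first, second):
    """first/second: per-problem booleans, correct on attempt 1 / attempt 2."""
    n = len(first)
    i_to_c = sum(1 for a, b in zip(first, second) if not a and b) / n
    c_to_i = sum(1 for a, b in zip(first, second) if a and not b) / n
    return i_to_c, c_to_i

first_attempt  = [True, False, False, True]   # hypothetical results
second_attempt = [True, True,  False, False]
d_ic, d_ci = transition_rates(first_attempt, second_attempt)
```

A high `d_ic` with a low `d_ci` is the paper's evidence that corrections are genuine rather than random answer churn.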
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;764&quot; data-origin-height=&quot;219&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bj1D7F/dJMcafrqVbG/bOcM26i3t5o8oRW7KnePuK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bj1D7F/dJMcafrqVbG/bOcM26i3t5o8oRW7KnePuK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bj1D7F/dJMcafrqVbG/bOcM26i3t5o8oRW7KnePuK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbj1D7F%2FdJMcafrqVbG%2FbOcM26i3t5o8oRW7KnePuK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;764&quot; height=&quot;219&quot; data-origin-width=&quot;764&quot; data-origin-height=&quot;219&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Shows performance gains on reasoning tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Limited to reasoning tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰/자연어처리</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/118</guid>
      <comments>https://kyj0105.tistory.com/118#entry118comment</comments>
      <pubDate>Sun, 16 Nov 2025 17:58:43 +0900</pubDate>
    </item>
    <item>
      <title>Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model</title>
      <link>https://kyj0105.tistory.com/117</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://aclanthology.org/2024.findings-acl.6.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://aclanthology.org/2024.findings-acl.6.pdf&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Imitation Learning&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Training a model on data generated by a large language model&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;A large body of prior work exists&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;FACO dataset&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Deliberately builds a wrong-answer dataset from datasets in four domains (domain knowledge, commonsense, complex reasoning, programming)&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;An LLM is prompted to generate incorrect rationales that fit the wrong answers&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The model is then fine-tuned on this generated dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
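The dataset construction above can be sketched as mixing clean and deliberately falsified examples at a target corruption ratio; the field names and rationale strings are hypothetical.

```python
# Sketch of building a FACO-style training split: at corruption ratio cr,
# replace that fraction of examples with a wrong answer plus a rationale
# written to justify the wrong answer; the rest keep the correct answer.

def build_faco_split(examples, cr: float):
    """examples: dicts with question, answer, wrong_answer, wrong_rationale."""
    n_corrupt = int(len(examples) * cr)
    out = []
    for i, ex in enumerate(examples):
        if i < n_corrupt:  # corrupted portion
            out.append({"question": ex["question"],
                        "rationale": ex["wrong_rationale"],
                        "answer": ex["wrong_answer"]})
        else:              # clean portion
            out.append({"question": ex["question"],
                        "rationale": "correct reasoning",
                        "answer": ex["answer"]})
    return out

data = [{"question": f"q{i}", "answer": "A", "wrong_answer": "B",
         "wrong_rationale": "plausible but wrong reasoning"}
        for i in range(4)]
half_corrupt = build_faco_split(data, cr=0.5)
```

Varying `cr` from 0% to 100% yields the splits compared in the experiments below.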
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;848&quot; data-origin-height=&quot;373&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/V8Keq/dJMb9MJEMrk/KmllEkXUrqojPM35RKKgL0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/V8Keq/dJMb9MJEMrk/KmllEkXUrqojPM35RKKgL0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/V8Keq/dJMb9MJEMrk/KmllEkXUrqojPM35RKKgL0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FV8Keq%2FdJMb9MJEMrk%2FKmllEkXUrqojPM35RKKgL0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;848&quot; height=&quot;373&quot; data-origin-width=&quot;848&quot; data-origin-height=&quot;373&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 1
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Comparison experiments are run with LLaMA 1 and LLaMA 2&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;At a corruption ratio (CR) of 0%, performance is nearly identical&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;At CR 100%, performance drops almost across the board&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;A Pearson coefficient around -0.9 means performance falls almost in inverse proportion to the corruption ratio&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
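The reported correlation can be checked with a plain Pearson coefficient over (corruption ratio, accuracy) pairs; the accuracy numbers below are hypothetical, chosen only to illustrate a near-linear decline.

```python
# Sketch of the Pearson correlation between corruption ratio and benchmark
# accuracy; a value near -1 means accuracy falls almost linearly as the
# corruption ratio rises.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

corruption_ratio = [0.0, 0.25, 0.5, 0.75, 1.0]
accuracy         = [62.0, 55.0, 47.0, 41.0, 33.0]  # hypothetical scores
r = pearson(corruption_ratio, accuracy)
```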
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;637&quot; data-origin-height=&quot;604&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Lz9LT/dJMb9XYD2mz/mPimrTAgkTmRsGkiTfu8kk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Lz9LT/dJMb9XYD2mz/mPimrTAgkTmRsGkiTfu8kk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Lz9LT/dJMb9XYD2mz/mPimrTAgkTmRsGkiTfu8kk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FLz9LT%2FdJMb9XYD2mz%2FmPimrTAgkTmRsGkiTfu8kk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;637&quot; height=&quot;604&quot; data-origin-width=&quot;637&quot; data-origin-height=&quot;604&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 2
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The higher the corruption ratio, the larger the performance drop&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;LLaMA 2 is the stronger model (higher base performance), and it also suffers the larger drop&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;669&quot; data-origin-height=&quot;359&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/NZjUw/dJMb9XYD2oA/hgyxn0izQpBDAJCOzzg74k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/NZjUw/dJMb9XYD2oA/hgyxn0izQpBDAJCOzzg74k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/NZjUw/dJMb9XYD2oA/hgyxn0izQpBDAJCOzzg74k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FNZjUw%2FdJMb9XYD2oA%2Fhgyxn0izQpBDAJCOzzg74k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;669&quot; height=&quot;359&quot; data-origin-width=&quot;669&quot; data-origin-height=&quot;359&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 3
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The higher the corruption ratio, the less the loss decreases&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Even on corrupted data the loss should still decrease, so why is the red line the highest? -&amp;gt; Tuning on false knowledge that contradicts the correct knowledge learned in pretraining keeps the model from fitting properly&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;328&quot; data-origin-height=&quot;301&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d28tL6/dJMb9OU0JTI/exEXW0oVkPeqjVKXaxjSMK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d28tL6/dJMb9OU0JTI/exEXW0oVkPeqjVKXaxjSMK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d28tL6/dJMb9OU0JTI/exEXW0oVkPeqjVKXaxjSMK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd28tL6%2FdJMb9OU0JTI%2FexEXW0oVkPeqjVKXaxjSMK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;328&quot; height=&quot;301&quot; data-origin-width=&quot;328&quot; data-origin-height=&quot;301&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 4
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The model scores below 25% -&amp;gt; it is deliberately avoiding the correct answer&lt;/li&gt;
&lt;li&gt;LLaMA 1 sits around 25% (chance level), so it may genuinely not know the answer&lt;/li&gt;
&lt;li&gt;LLaMA 2, however, is down in the 10% range: it appears to know the answer yet deliberately steers away from it&lt;/li&gt;
&lt;li&gt;&lt;b&gt;But is that right? What if fine-tuning simply worked so well that the model only reproduces the wrong answers? Accuracy on the corrupted dataset itself would be informative (especially comparing LLaMA 1 and LLaMA 2)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;632&quot; data-origin-height=&quot;313&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/FYoK0/dJMb83EzJJS/7wgkiHZKUPaREw3LARlSnk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/FYoK0/dJMb83EzJJS/7wgkiHZKUPaREw3LARlSnk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/FYoK0/dJMb83EzJJS/7wgkiHZKUPaREw3LARlSnk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FFYoK0%2FdJMb83EzJJS%2F7wgkiHZKUPaREw3LARlSnk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;632&quot; height=&quot;313&quot; data-origin-width=&quot;632&quot; data-origin-height=&quot;313&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 6
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Retraining an already corrupted (CR 100%) model on a clean dataset recovers performance&lt;/li&gt;
&lt;li&gt;But it does not recover all the way to the performance of a model trained on a clean dataset from the start&lt;img src=&quot;https://blog.kakaocdn.net/dna/biET2m/dJMb9YJ1eZQ/AAAAAAAAAAAAAAAAAAAAAFnN1meGbHFn3k-kAnAJ4L7VgpL4RkC0yMZ3Xyug9H1a/img.png?credential=yqXZFxpELC7KVnFOS48ylbz2pIh7yKj8&amp;amp;expires=1761922799&amp;amp;allow_ip=&amp;amp;allow_referer=&amp;amp;signature=YQfkh2xPVZWbtv12P%2BC%2Bod%2Fe5TI%3D&quot; data-origin-width=&quot;539&quot; data-origin-height=&quot;417&quot; data-is-animation=&quot;false&quot; /&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Imitation learning on synthetic data that may contain errors is risky&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Experiments cover only a small number of models&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰/자연어처리</category>
      <category>nlp</category>
      <category>paper</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/117</guid>
      <comments>https://kyj0105.tistory.com/117#entry117comment</comments>
      <pubDate>Tue, 21 Oct 2025 11:36:10 +0900</pubDate>
    </item>
    <item>
      <title>BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models</title>
      <link>https://kyj0105.tistory.com/116</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/2301.12597&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://arxiv.org/pdf/2301.12597&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;BLIP-1&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Tuning both the image model and the text model fully end-to-end, as earlier models do, costs a great deal of compute and time&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;-&amp;gt; BLIP-2 is efficient: with a frozen LLM and a frozen image encoder, only the Q-Former is trained&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Architecture
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The image encoder and the large language model are kept frozen&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Only the Q-Former is trained, acting as the bottleneck that bridges the vision model and the language model&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1105&quot; data-origin-height=&quot;524&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cpuRuv/btsQ1Zhurfp/RgxXr6rKWKvP9wazUZFT11/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cpuRuv/btsQ1Zhurfp/RgxXr6rKWKvP9wazUZFT11/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cpuRuv/btsQ1Zhurfp/RgxXr6rKWKvP9wazUZFT11/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcpuRuv%2FbtsQ1Zhurfp%2FRgxXr6rKWKvP9wazUZFT11%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1105&quot; height=&quot;524&quot; data-origin-width=&quot;1105&quot; data-origin-height=&quot;524&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;1단계: vision-language representation learning&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;As in BLIP-1, training uses three training objectives&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;During this training the image queries and the text tokens attend to each other, and which positions are masked differs by objective&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Learnable queries: information is extracted into a small set of low-dimensional query vectors instead of the full image features&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ITM (Image-Text Matching): the Q-Former looks at both image and text information while learning to match them, so nothing is masked&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ITG (Image-Grounded Text Generation): queries attend only among themselves, while text is generated token by token while looking at the image, so a partial (causal) mask is used&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ITC (Image-Text Contrastive Learning): contrastive learning requires each modality to see only itself, so the query-text half is masked&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
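The three masking schemes above can be sketched as boolean matrices over a joint [queries; text] sequence; the sizes are illustrative, and True marks a pair of positions allowed to attend.

```python
# Sketch of the Q-Former's three self-attention masks over a joint sequence of
# n_q learnable queries followed by n_t text tokens (True = may attend):
#   ITM: bi-directional, no masking (queries and text see everything)
#   ITC: unimodal, queries and text each see only their own modality
#   ITG: queries see only queries; text sees all queries plus earlier text

def make_mask(n_q: int, n_t: int, objective: str):
    n = n_q + n_t
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if objective == "ITM":
                mask[i][j] = True
            elif objective == "ITC":
                mask[i][j] = (i < n_q) == (j < n_q)  # same modality only
            elif objective == "ITG":
                if i < n_q:
                    mask[i][j] = j < n_q             # queries attend to queries
                else:
                    mask[i][j] = j < n_q or j <= i   # text: queries + causal
    return mask

itg = make_mask(n_q=2, n_t=3, objective="ITG")
```

Printing `itg` row by row shows the causal staircase in the text block while the query block stays isolated, which matches the per-objective masking described above.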
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1782&quot; data-origin-height=&quot;383&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cnPYWT/btsQZeNPCdd/YDEzmhwuipkpc6ZsBYUVMk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cnPYWT/btsQZeNPCdd/YDEzmhwuipkpc6ZsBYUVMk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cnPYWT/btsQZeNPCdd/YDEzmhwuipkpc6ZsBYUVMk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcnPYWT%2FbtsQZeNPCdd%2FYDEzmhwuipkpc6ZsBYUVMk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1782&quot; height=&quot;383&quot; data-origin-width=&quot;1782&quot; data-origin-height=&quot;383&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;2단계: vision-to-language generative learning&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Decoder-only LLM: the image goes through the frozen image encoder to extract visual features -&amp;gt; these features go into the Q-Former to produce the learned query outputs -&amp;gt; an FC layer matches their dimension to the LLM -&amp;gt; they are prepended to the LLM decoder input and used like a soft prompt -&amp;gt; the LLM generates the text that follows&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Encoder-decoder-based LLM: the learned queries are produced as for the decoder-only LLM -&amp;gt; the learned queries, together with the prefix text, go into the LLM encoder, and the decoder generates the suffix text&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
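The stage-2 wiring can be sketched with plain lists: a fully connected projection maps each query output to the LLM hidden size, and the projected queries are prepended to the text embeddings; all dimensions and weight values here are illustrative, not the paper's.

```python
# Sketch of vision-to-language generative learning: Q-Former query outputs
# (dim d_q) are linearly projected to the LLM hidden size (d_llm) and then
# prepended to the text token embeddings as a soft prompt.

def linear(vec, weight):
    """Plain matrix-vector product standing in for the FC projection layer."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

d_q, d_llm = 4, 6                        # illustrative sizes
n_queries, n_text = 3, 5
W = [[0.1] * d_q for _ in range(d_llm)]  # hypothetical FC weights

query_outputs = [[1.0] * d_q for _ in range(n_queries)]  # from the Q-Former
text_embeds = [[0.0] * d_llm for _ in range(n_text)]     # LLM token embeddings

soft_prompt = [linear(q, W) for q in query_outputs]  # n_queries x d_llm
llm_input = soft_prompt + text_embeds                # prepend like a prompt
```

The LLM then conditions on the first `n_queries` positions exactly as it would on ordinary prompt tokens, which is why the queries behave like a learned soft prompt.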
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1278&quot; data-origin-height=&quot;367&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ELpif/btsQZjuUwIO/dbLXkfjrMtnbnNPZxzpqd1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ELpif/btsQZjuUwIO/dbLXkfjrMtnbnNPZxzpqd1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ELpif/btsQZjuUwIO/dbLXkfjrMtnbnNPZxzpqd1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FELpif%2FbtsQZjuUwIO%2FdbLXkfjrMtnbnNPZxzpqd1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1278&quot; height=&quot;367&quot; data-origin-width=&quot;1278&quot; data-origin-height=&quot;367&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Table 2&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;On zero-shot VQA, BLIP-2 outperforms Flamingo 80B while using 54x fewer trainable parameters&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;759&quot; data-origin-height=&quot;363&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bsEDbv/btsQ0ElcLvL/0Ga37E9plG6RKu2tCgyCDK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bsEDbv/btsQ0ElcLvL/0Ga37E9plG6RKu2tCgyCDK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bsEDbv/btsQ0ElcLvL/0Ga37E9plG6RKu2tCgyCDK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbsEDbv%2FbtsQ0ElcLvL%2F0Ga37E9plG6RKu2tCgyCDK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;759&quot; height=&quot;363&quot; data-origin-width=&quot;759&quot; data-origin-height=&quot;363&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 3
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;BLIP-2 achieves zero-shot SOTA&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1443&quot; data-origin-height=&quot;439&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bVByk2/btsQ17GMtwt/i05bKllsZN5RgfCo4k5Yk0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bVByk2/btsQ17GMtwt/i05bKllsZN5RgfCo4k5Yk0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bVByk2/btsQ17GMtwt/i05bKllsZN5RgfCo4k5Yk0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbVByk2%2FbtsQ17GMtwt%2Fi05bKllsZN5RgfCo4k5Yk0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1443&quot; height=&quot;439&quot; data-origin-width=&quot;1443&quot; data-origin-height=&quot;439&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 5
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Removing the representation-learning stage degrades performance substantially -&amp;gt; a gap remains between the frozen models -&amp;gt; the Q-Former acts as a bridge across the modality gap&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;616&quot; data-origin-height=&quot;315&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b5bG5a/btsQZY4SA6K/NWQdYlgGLLcKlrRYe7UL10/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b5bG5a/btsQZY4SA6K/NWQdYlgGLLcKlrRYe7UL10/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b5bG5a/btsQZY4SA6K/NWQdYlgGLLcKlrRYe7UL10/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb5bG5a%2FbtsQZY4SA6K%2FNWQdYlgGLLcKlrRYe7UL10%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;616&quot; height=&quot;315&quot; data-origin-width=&quot;616&quot; data-origin-height=&quot;315&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 5
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;After finetuning on COCO, the model still achieves SOTA when zero-shot transferred to Flickr30k&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;820&quot; data-origin-height=&quot;326&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cyxcWA/btsQ1w1bkE1/BpRLiKFNVVOZfiU7030rF0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cyxcWA/btsQ1w1bkE1/BpRLiKFNVVOZfiU7030rF0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cyxcWA/btsQ1w1bkE1/BpRLiKFNVVOZfiU7030rF0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcyxcWA%2FbtsQ1w1bkE1%2FBpRLiKFNVVOZfiU7030rF0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;820&quot; height=&quot;326&quot; data-origin-width=&quot;820&quot; data-origin-height=&quot;326&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Proposes the Q-Former, a small trainable module that bridges a frozen image encoder and a frozen LLM&lt;/li&gt;
&lt;li&gt;Achieves SOTA&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Requires a large-scale vision model and language model&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰</category>
      <category>nlp</category>
      <category>paper</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/116</guid>
      <comments>https://kyj0105.tistory.com/116#entry116comment</comments>
      <pubDate>Thu, 2 Oct 2025 14:29:39 +0900</pubDate>
    </item>
    <item>
      <title>BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation</title>
      <link>https://kyj0105.tistory.com/115</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/2201.12086&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://arxiv.org/pdf/2201.12086&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;VLP&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Existing VLP models excelled only at understanding-based tasks or only at generation-based tasks&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Scaling up noisy image-text pairs collected from the web improved performance, but still fell short of human-labeled data&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Model perspective: prior work uses either encoder-based models, weak at text generation, or encoder-decoder models, weak at image-text retrieval tasks&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Data perspective: prior work relies on noisy image-text data collected from the web&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Model perspective: Multimodal mixture of Encoder-Decoder (MED)&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Unimodal encoder: ITC, trains the ViT image embeddings and the text embeddings so that matched pairs move closer and unmatched pairs move apart&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Image-grounded text encoder: ITM, a binary classification task that predicts whether an image and text are a pair, forcing more fine-grained learning on confusing pairs&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Image-grounded text decoder: LM, autoregressively predicts the image caption, trained with label smoothing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
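The ITC objective above can be sketched as a symmetric in-batch contrastive loss (a minimal illustration, not BLIP's actual implementation; the function name and arguments are hypothetical):

```python
import numpy as np

def itc_loss(img_emb, txt_emb, temperature=0.07):
    """In-batch image-text contrastive loss (illustrative sketch):
    matched pairs are pulled together, all other pairs pushed apart."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    labels = np.arange(len(logits))               # image i matches text i
    # symmetric cross-entropy: image-to-text rows and text-to-image columns
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_i2t = -log_p[labels, labels].mean()
    log_p_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_t2i = -log_p_t[labels, labels].mean()
    return (loss_i2t + loss_t2i) / 2
```

When the paired embeddings are identical the diagonal dominates and the loss approaches 0; random pairings give a loss near log(B).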
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1488&quot; data-origin-height=&quot;580&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bmeQXg/btsQIBA8THA/5LkLlE7mkxYm0AcMHid1HK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bmeQXg/btsQIBA8THA/5LkLlE7mkxYm0AcMHid1HK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bmeQXg/btsQIBA8THA/5LkLlE7mkxYm0AcMHid1HK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbmeQXg%2FbtsQIBA8THA%2F5LkLlE7mkxYm0AcMHid1HK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1488&quot; height=&quot;580&quot; data-origin-width=&quot;1488&quot; data-origin-height=&quot;580&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Data perspective: Captioning and Filtering (CapFilt)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;D = {(Iw, Tw)} + {(Ih, Th)}&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Filter: finetune ITC&amp;amp;ITM on {(Ih, Th)} -&amp;gt; filter {(Iw, Tw)} and keep only well-matched pairs&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Captioner: finetune LM on {(Ih, Th)} -&amp;gt; generate captions for {Iw} to build {(Iw, Ts)} and pass them through the filter as well&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Final dataset D = {(Iw, Tw)} + {(Iw, Ts)} + {(Ih, Th)}, used to pretrain the model from scratch, and the process repeats&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
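One CapFilt round can be sketched as follows (an illustrative skeleton, not the paper's code; all names are hypothetical, and the captioner and filter stand in for MED models finetuned on the human-annotated pairs):

```python
def capfilt(web_pairs, web_images, human_pairs, captioner, filter_model,
            itm_threshold=0.5):
    """One CapFilt bootstrapping round (hypothetical sketch).
    web_pairs: noisy (Iw, Tw); human_pairs: clean (Ih, Th)."""
    # Captioner: generate synthetic captions Ts for the web images Iw
    synthetic_pairs = [(img, captioner(img)) for img in web_images]
    # Filter: keep only pairs the ITM head scores as matched
    kept = [(img, txt) for img, txt in web_pairs + synthetic_pairs
            if filter_model(img, txt) > itm_threshold]
    # Final dataset D: filtered web and synthetic pairs plus human pairs
    return kept + human_pairs
```

The returned dataset is then used to pretrain a fresh model, which in turn yields a better captioner and filter for the next round.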
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1206&quot; data-origin-height=&quot;407&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/wPFU8/btsQHbpTo3U/kSk6l1JFQro1niNQC3TBf1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/wPFU8/btsQHbpTo3U/kSk6l1JFQro1niNQC3TBf1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/wPFU8/btsQHbpTo3U/kSk6l1JFQro1niNQC3TBf1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwPFU8%2FbtsQHbpTo3U%2FkSk6l1JFQro1niNQC3TBf1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1206&quot; height=&quot;407&quot; data-origin-width=&quot;1206&quot; data-origin-height=&quot;407&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 1
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Ablation over the use of the Captioner and Filter and over the vision backbone&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Shows that using both C&amp;amp;F is the most effective&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Also shows that generating diverse captions by sampling matters more than exact beam search&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1505&quot; data-origin-height=&quot;424&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Gp70u/btsQI08yll2/kOR8JuFXw9y9SJQ7gXM3R1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Gp70u/btsQI08yll2/kOR8JuFXw9y9SJQ7gXM3R1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Gp70u/btsQI08yll2/kOR8JuFXw9y9SJQ7gXM3R1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FGp70u%2FbtsQI08yll2%2FkOR8JuFXw9y9SJQ7gXM3R1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1505&quot; height=&quot;424&quot; data-origin-width=&quot;1505&quot; data-origin-height=&quot;424&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 2
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Parameter-sharing ablation&lt;/li&gt;
&lt;li&gt;Sharing all parameters except the SA layers works better than sharing everything or sharing nothing&lt;/li&gt;
&lt;li&gt;The self-attention layers are not shared because the encoder and decoder perform different tasks, which could cause conflicts&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1442&quot; data-origin-height=&quot;256&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/U9FSJ/btsQIUUOWF3/e1MguDONHBAMQXHCVu3bMK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/U9FSJ/btsQIUUOWF3/e1MguDONHBAMQXHCVu3bMK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/U9FSJ/btsQIUUOWF3/e1MguDONHBAMQXHCVu3bMK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FU9FSJ%2FbtsQIUUOWF3%2Fe1MguDONHBAMQXHCVu3bMK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1442&quot; height=&quot;256&quot; data-origin-width=&quot;1442&quot; data-origin-height=&quot;256&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 4
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Ablation on sharing parameters between the Captioner and the Filter&lt;/li&gt;
&lt;li&gt;When parameters are shared, the filter fails to catch the captioner's incorrect captions (confirmation bias), reflected in a lower measured noise ratio&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1443&quot; data-origin-height=&quot;192&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/oN0mm/btsQI77Hchk/K0leaVabwtSzdwvXbjYB10/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/oN0mm/btsQI77Hchk/K0leaVabwtSzdwvXbjYB10/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/oN0mm/btsQI77Hchk/K0leaVabwtSzdwvXbjYB10/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FoN0mm%2FbtsQI77Hchk%2FK0leaVabwtSzdwvXbjYB10%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1443&quot; height=&quot;192&quot; data-origin-width=&quot;1443&quot; data-origin-height=&quot;192&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 5
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Text-retrieval and image-retrieval performance on each dataset&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Achieves higher performance than previous models&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1461&quot; data-origin-height=&quot;460&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dEvCIl/btsQH3Y3Adx/JhI2hthHnpkKnUrgVfEuqk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dEvCIl/btsQH3Y3Adx/JhI2hthHnpkKnUrgVfEuqk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dEvCIl/btsQH3Y3Adx/JhI2hthHnpkKnUrgVfEuqk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdEvCIl%2FbtsQH3Y3Adx%2FJhI2hthHnpkKnUrgVfEuqk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1461&quot; height=&quot;460&quot; data-origin-width=&quot;1461&quot; data-origin-height=&quot;460&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 7
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;C (CIDEr): how closely the generated caption matches multiple human-written captions&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;S (SPICE): how accurately the caption captures semantic content such as objects, attributes, and relations&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Achieves higher performance than existing models&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1219&quot; data-origin-height=&quot;440&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zFtNY/btsQHi3y7l7/wtemEnXknN6suwkxj8Gyk0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zFtNY/btsQHi3y7l7/wtemEnXknN6suwkxj8Gyk0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zFtNY/btsQHi3y7l7/wtemEnXknN6suwkxj8Gyk0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzFtNY%2FbtsQHi3y7l7%2FwtemEnXknN6suwkxj8Gyk0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1219&quot; height=&quot;440&quot; data-origin-width=&quot;1219&quot; data-origin-height=&quot;440&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Proposes MED, strong at both text generation and image-text retrieval tasks&lt;/li&gt;
&lt;li&gt;Proposes CapFilt to denoise noisy web image-text pairs before use&lt;/li&gt;
&lt;li&gt;Achieves SOTA&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The end-to-end architecture optimizes all parameters jointly, requiring substantial compute and time&lt;/li&gt;
&lt;li&gt;Fusing image and text via cross-attention is inefficient&lt;/li&gt;
&lt;li&gt;-&amp;gt; These issues lead to BLIP-2&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>논문 리뷰/자연어처리</category>
      <category>nlp</category>
      <category>paper</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/115</guid>
      <comments>https://kyj0105.tistory.com/115#entry115comment</comments>
      <pubDate>Fri, 19 Sep 2025 18:06:38 +0900</pubDate>
    </item>
    <item>
      <title>OUTRAGEOUSLY LARGE NEURAL NETWORKS : THE SPARSELY GATED MIXTURE-OF-EXPERTS LAYER</title>
      <link>https://kyj0105.tistory.com/114</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://openreview.net/pdf?id=B1ckMDqlg&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://openreview.net/pdf?id=B1ckMDqlg&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Conditional computation&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Scaling up model size improves performance, but compute limits how far this goes&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Conditional computation: instead of always activating the whole model, activate only the parts needed for each input, improving compute efficiency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Sparsely-Gated Mixture-of-Experts Layer (MoE)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;For each input, &lt;span style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot;&gt;Noisy Top-K Gating&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;selects only the top-K experts to activate&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Consists of n expert networks, each a feed-forward neural network, plus a gating network&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;706&quot; data-origin-height=&quot;365&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dfqsjN/btsQBnb90zq/x6AX9wppkuTl8lKQtEFNiK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dfqsjN/btsQBnb90zq/x6AX9wppkuTl8lKQtEFNiK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dfqsjN/btsQBnb90zq/x6AX9wppkuTl8lKQtEFNiK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdfqsjN%2FbtsQBnb90zq%2Fx6AX9wppkuTl8lKQtEFNiK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;706&quot; height=&quot;365&quot; data-origin-width=&quot;706&quot; data-origin-height=&quot;365&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Formulas&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Ei(x): output of the i-th expert&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;G(x)i: the i-th output of the gating network; the layer computes y = Σi G(x)i Ei(x)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
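The two formulas can be sketched together for a single input (a minimal illustration under stated assumptions: the function names are hypothetical, the experts are stand-in callables, and w_g and w_noise play the role of the paper's gating and noise weight matrices):

```python
import numpy as np

def noisy_top_k_gating(x, w_g, w_noise, k, rng):
    """G(x): softmax over the top-k noisy gate logits; every other
    expert gets weight 0 and is never evaluated (hypothetical sketch)."""
    noise_scale = np.log1p(np.exp(x @ w_noise))        # softplus, as in the paper
    logits = x @ w_g + rng.standard_normal(w_g.shape[1]) * noise_scale
    top_k = np.argsort(logits)[-k:]                    # KeepTopK: indices of top k
    gates = np.zeros_like(logits)
    exp = np.exp(logits[top_k] - logits[top_k].max())
    gates[top_k] = exp / exp.sum()                     # softmax over the kept logits
    return gates

def moe_layer(x, experts, gates):
    """y = sum over i of G(x)i * Ei(x), skipping zero-gate experts."""
    return sum(g * experts[i](x) for i, g in enumerate(gates) if g > 0)
```

Only k of the n experts run per input, which is what makes the layer's compute cost nearly independent of n.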
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;755&quot; data-origin-height=&quot;138&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lUl2X/btsQBESsqgV/tapKJnS2p6rpkA2vkVwzM0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lUl2X/btsQBESsqgV/tapKJnS2p6rpkA2vkVwzM0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lUl2X/btsQBESsqgV/tapKJnS2p6rpkA2vkVwzM0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FlUl2X%2FbtsQBESsqgV%2FtapKJnS2p6rpkA2vkVwzM0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;755&quot; height=&quot;138&quot; data-origin-width=&quot;755&quot; data-origin-height=&quot;138&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Performance Challenges&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Shrinking batch problem: each expert sees only a fraction of the batch, which is inefficient -&amp;gt; solved by mixing data parallelism and model parallelism&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Network bandwidth: addresses the communication bottleneck between devices&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Expert utilization balancing: adds an auxiliary loss term to keep the gating network from concentrating on only a few experts&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
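The auxiliary balancing term can be sketched as an importance loss: sum each expert's gate values over a batch and penalize the squared coefficient of variation of those sums (a minimal sketch; the weight w_importance is a hypothetical hyperparameter name):

```python
import numpy as np

def importance_loss(batch_gates, w_importance=0.1):
    """Penalize uneven expert usage (hypothetical sketch).
    batch_gates: array of shape (batch, n_experts) of gate values.
    Importance(X) sums each expert's gates over the batch; the loss is
    the squared coefficient of variation, 0 when usage is perfectly even."""
    importance = batch_gates.sum(axis=0)          # (n_experts,)
    cv = importance.std() / importance.mean()     # coefficient of variation
    return w_importance * cv ** 2
```

Adding this term to the training loss pushes the gating network to spread probability mass across experts instead of collapsing onto a favored few.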
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 3&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Achieves lower perplexity than prior models, a large performance gain&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;566&quot; data-origin-height=&quot;431&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/KrR8J/btsQBykpOjl/pVwuzFcmWclABJrTZrKRj1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/KrR8J/btsQBykpOjl/pVwuzFcmWclABJrTZrKRj1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/KrR8J/btsQBykpOjl/pVwuzFcmWclABJrTZrKRj1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FKrR8J%2FbtsQBykpOjl%2FpVwuzFcmWclABJrTZrKRj1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;566&quot; height=&quot;431&quot; data-origin-width=&quot;566&quot; data-origin-height=&quot;431&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 2, 3, 4
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Even without reinforcement learning, achieves higher BLEU scores than prior models&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;841&quot; data-origin-height=&quot;666&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bGHVJw/btsQB6gT5uq/FLauM5gWDk696ZvGni7kTk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bGHVJw/btsQB6gT5uq/FLauM5gWDk696ZvGni7kTk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bGHVJw/btsQB6gT5uq/FLauM5gWDk696ZvGni7kTk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbGHVJw%2FbtsQB6gT5uq%2FFLauM5gWDk696ZvGni7kTk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;841&quot; height=&quot;666&quot; data-origin-width=&quot;841&quot; data-origin-height=&quot;666&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Proposes the Sparsely-Gated Mixture-of-Experts (MoE) layer&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The number of experts n must be tuned by hand&lt;/li&gt;
&lt;li&gt;For the largest model, excessive sparsity actually hurts performance&lt;/li&gt;
&lt;li&gt;Requires a huge amount of training data&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I should reread this later; I don't fully understand it yet.&lt;/p&gt;
      <category>논문 리뷰/자연어처리</category>
      <category>nlp</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/114</guid>
      <comments>https://kyj0105.tistory.com/114#entry114comment</comments>
      <pubDate>Tue, 16 Sep 2025 14:00:45 +0900</pubDate>
    </item>
    <item>
      <title>DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning</title>
      <link>https://kyj0105.tistory.com/113</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/2501.12948&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://arxiv.org/pdf/2501.12948&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;GRPO: &lt;span style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot;&gt;drops the value model and instead estimates the advantage function A_i as how good each action is relative to the group of sampled rewards&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
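A minimal NumPy sketch of the group-relative advantage described above (this is an illustration of the idea, not DeepSeek's exact implementation, and the reward values are hypothetical):

```python
import numpy as np

def group_relative_advantage(rewards):
    """GRPO-style advantage: score each sampled output relative to the
    mean and standard deviation of its own group, with no value model."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Hypothetical rewards for 4 outputs sampled from the same prompt.
adv = group_relative_advantage([1.0, 0.0, 1.0, 0.0])
```

Outputs scored above the group mean get a positive advantage and are reinforced; those below the mean are suppressed.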
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Overview&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;DeepSeek-R1-Zero (poor readability, language mixing) -&amp;gt; cold-start data&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;DeepSeek-V3-base + cold-start data&amp;nbsp; -&amp;gt; DeepSeek-R1&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;DeepSeek-R1 + distillation (SFT with generated data) -&amp;gt; &lt;span style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot;&gt;distilled&lt;/span&gt; DeepSeek-R1&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;DeepSeek-R1-Zero: without any SFT data, only RL&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Focuses on self-evaluation through pure reinforcement learning, without any SFT&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The algorithm used here is GRPO&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;To prevent reward hacking, rewards are given by rules (accuracy, format)&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Achieves higher performance than OpenAI-o1-0912&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;DeepSeek-R1: SFT with CoT examples&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Starts by SFT on CoT data generated by DeepSeek-R1-Zero&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Distilled DeepSeek-R1
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Reasoning ability is distilled into smaller models as well&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
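The rule-based reward (accuracy plus format) mentioned above might be sketched as below; the [think] markers, split logic, and weights are assumptions for illustration, not the paper's exact template:

```python
import re

def rule_based_reward(response: str, gold_answer: str) -> float:
    """Toy rule-based reward combining a format check and an accuracy
    check, in the spirit of DeepSeek-R1-Zero. Marker names and weights
    here are illustrative placeholders."""
    reward = 0.0
    # Format reward: reasoning must appear between [think] markers.
    if re.search(r"\[think\].*?\[/think\]", response, flags=re.DOTALL):
        reward += 0.5
    # Accuracy reward: the final answer after the reasoning must match.
    final = response.split("[/think]")[-1].strip()
    if final == gold_answer:
        reward += 1.0
    return reward

r = rule_based_reward("[think]2 + 2 = 4[/think]4", "4")
```

Because both checks are deterministic rules rather than a learned reward model, there is no model for the policy to exploit, which is how reward hacking is avoided.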
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;796&quot; data-origin-height=&quot;460&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mXdoM/btsQgf1brJ1/RKC5W9DxwyKuJR0P6sGwBk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mXdoM/btsQgf1brJ1/RKC5W9DxwyKuJR0P6sGwBk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mXdoM/btsQgf1brJ1/RKC5W9DxwyKuJR0P6sGwBk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmXdoM%2FbtsQgf1brJ1%2FRKC5W9DxwyKuJR0P6sGwBk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;796&quot; height=&quot;460&quot; data-origin-width=&quot;796&quot; data-origin-height=&quot;460&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 2
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;After 8,000 training steps, performance surpasses o1&lt;/li&gt;
&lt;li&gt;Majority voting pushes performance even higher&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;371&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Wus0w/btsQhls0FhZ/9uP53kcngOcLGXG03R9QN1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Wus0w/btsQhls0FhZ/9uP53kcngOcLGXG03R9QN1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Wus0w/btsQhls0FhZ/9uP53kcngOcLGXG03R9QN1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FWus0w%2FbtsQhls0FhZ%2F9uP53kcngOcLGXG03R9QN1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;596&quot; height=&quot;371&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;371&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 3
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;As thinking time grows, responses become longer&lt;/li&gt;
&lt;li&gt;Through self-evaluation, the model thinks more deeply before answering&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;605&quot; data-origin-height=&quot;355&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cO1Aws/btsQhfM9zli/NzMxuKXbhsIcaTeqRPqHr1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cO1Aws/btsQhfM9zli/NzMxuKXbhsIcaTeqRPqHr1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cO1Aws/btsQhfM9zli/NzMxuKXbhsIcaTeqRPqHr1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcO1Aws%2FbtsQhfM9zli%2FNzMxuKXbhsIcaTeqRPqHr1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;605&quot; height=&quot;355&quot; data-origin-width=&quot;605&quot; data-origin-height=&quot;355&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 3
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The DeepSeek-R1-Zero model exhibits an &quot;Aha moment&quot;, re-evaluating and improving its own answer mid-reasoning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;745&quot; data-origin-height=&quot;405&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cHkmnA/btsQgffQuUj/13eR0jRNybtGkzxkEM8OF0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cHkmnA/btsQgffQuUj/13eR0jRNybtGkzxkEM8OF0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cHkmnA/btsQgffQuUj/13eR0jRNybtGkzxkEM8OF0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcHkmnA%2FbtsQgffQuUj%2F13eR0jRNybtGkzxkEM8OF0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;745&quot; height=&quot;405&quot; data-origin-width=&quot;745&quot; data-origin-height=&quot;405&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 4
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DeepSeek-R1 achieves high performance on diverse tasks among large language models&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;695&quot; data-origin-height=&quot;558&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bA9GJN/btsQhTXcLSn/eynU0RLIZnvWRhrHykmbJ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bA9GJN/btsQhTXcLSn/eynU0RLIZnvWRhrHykmbJ1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bA9GJN/btsQhTXcLSn/eynU0RLIZnvWRhrHykmbJ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbA9GJN%2FbtsQhTXcLSn%2FeynU0RLIZnvWRhrHykmbJ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;695&quot; height=&quot;558&quot; data-origin-width=&quot;695&quot; data-origin-height=&quot;558&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The DeepSeek-R1 model achieves performance comparable to or higher than OpenAI o1-1217&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;General capabilities lag behind DeepSeek-V3&lt;/li&gt;
&lt;li&gt;Trained on English and Chinese, so inputs in other languages trigger language mixing&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰/자연어처리</category>
      <category>nlp</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/113</guid>
      <comments>https://kyj0105.tistory.com/113#entry113comment</comments>
      <pubDate>Tue, 2 Sep 2025 17:16:52 +0900</pubDate>
    </item>
    <item>
      <title>AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE 리뷰</title>
      <link>https://kyj0105.tistory.com/112</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/2010.11929&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://arxiv.org/pdf/2010.11929&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Vision model&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ResNet: convolutional networks plus residual connections enable deeper training than earlier models, achieving high performance&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;BiT: training the ResNet architecture on large-scale data yields high performance on diverse downstream datasets even without task-specific tuning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ViT (Vision Transformer)&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Splits the image into patches and feeds each patch into the Transformer like a token&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Each patch is flattened before entering the model; flattening the whole image at once would make the computation far too heavy&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The first token plays the role of BERT's [CLS] token, predicting the class during training&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Why does pretraining use low-resolution images and an MLP head while finetuning uses high-resolution images and a single linear layer? -&amp;gt; The MLP is better at large data scale, but a single layer is better when finetuning on little data&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;MPP (Masked Patch Prediction): a regression-based self-supervised task, similar to MLM, that masks part of the image and predicts it&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Position embedding&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;After splitting the image into patches, a learnable 1D positional embedding is added to the sequence (2D embeddings bring no meaningful gain)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
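The patch-splitting and embedding steps above can be sketched in NumPy; the 224x224 input and 16x16 patch size follow the paper, while the random projection and position matrices stand in for learned parameters (the real model would also prepend a learnable [CLS] token):

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into flattened patches, ViT-style.
    Returns an array of shape (num_patches, patch * patch * C)."""
    h, w, c = img.shape
    img = img.reshape(h // patch, patch, w // patch, patch, c)
    img = img.transpose(0, 2, 1, 3, 4)  # (grid_h, grid_w, patch, patch, c)
    return img.reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
patches = patchify(img)                 # (196, 768): 14x14 patches of 16*16*3
proj = rng.standard_normal((768, 768))  # stands in for the learned projection
pos = rng.standard_normal((196, 768))   # stands in for learned 1D position embeddings
tokens = patches @ proj + pos           # patch embeddings fed to the Transformer
```

Each 16x16x3 patch flattens to a 768-dim vector, so a 224x224 image becomes a sequence of 196 tokens, which is what keeps the computation tractable compared with flattening the whole image.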
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;845&quot; data-origin-height=&quot;431&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bpqHvL/btsP3zedPoE/awZYwYpn9VUZbLoza4lkkK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bpqHvL/btsP3zedPoE/awZYwYpn9VUZbLoza4lkkK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bpqHvL/btsP3zedPoE/awZYwYpn9VUZbLoza4lkkK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbpqHvL%2FbtsP3zedPoE%2FawZYwYpn9VUZbLoza4lkkK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;845&quot; height=&quot;431&quot; data-origin-width=&quot;845&quot; data-origin-height=&quot;431&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Hybrid model&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;A model combining ResNet and a Transformer: ResNet first extracts features from the image, and those features are fed into ViT&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 2
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Shows that Ours is more efficient and achieves higher performance than previous models&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ViT falls behind CNNs on small datasets but achieves high performance on large-scale datasets even without inductive bias&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Inductive bias: structural assumptions a model holds before training (e.g. the translation equivariance of CNNs)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;794&quot; data-origin-height=&quot;245&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dNcHnE/btsP2zTwPCI/sb1cITfNxb1MLXqGti9cRK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dNcHnE/btsP2zTwPCI/sb1cITfNxb1MLXqGti9cRK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dNcHnE/btsP2zTwPCI/sb1cITfNxb1MLXqGti9cRK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdNcHnE%2FbtsP2zTwPCI%2Fsb1cITfNxb1MLXqGti9cRK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;794&quot; height=&quot;245&quot; data-origin-width=&quot;794&quot; data-origin-height=&quot;245&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 2
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;With transfer learning, ViT-H/14 achieves the highest performance overall&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;VTAB: 19 tasks in total&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Natural: natural images&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Specialized: specialized domains (medical, satellite, etc.)&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Structured: domains requiring structural understanding (e.g. position)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;781&quot; data-origin-height=&quot;186&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/HryqS/btsP41A1H2X/ameQPJC2IYYAHZgj8uxv40/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/HryqS/btsP41A1H2X/ameQPJC2IYYAHZgj8uxv40/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/HryqS/btsP41A1H2X/ameQPJC2IYYAHZgj8uxv40/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHryqS%2FbtsP41A1H2X%2FameQPJC2IYYAHZgj8uxv40%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;781&quot; height=&quot;186&quot; data-origin-width=&quot;781&quot; data-origin-height=&quot;186&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 7
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;RGB embedding filters: the weights of the linear projection applied before the Transformer -&amp;gt; shows that low-level features such as stripes and diagonals are learned&lt;/li&gt;
&lt;li&gt;Position embedding similarity: nearby patches have higher similarity -&amp;gt; demonstrates that the 2D structure is learned well&lt;/li&gt;
&lt;li&gt;Mean attention distance: measuring how wide a region of the image each self-attention head attends to shows that ViT learns global features from the very first layer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;806&quot; data-origin-height=&quot;244&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cVduKD/btsP3cwOoFd/uzwv4vjOy3YsiUc9eJZgKK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cVduKD/btsP3cwOoFd/uzwv4vjOy3YsiUc9eJZgKK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cVduKD/btsP3cwOoFd/uzwv4vjOy3YsiUc9eJZgKK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcVduKD%2FbtsP3cwOoFd%2Fuzwv4vjOy3YsiUc9eJZgKK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;806&quot; height=&quot;244&quot; data-origin-width=&quot;806&quot; data-origin-height=&quot;244&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Achieves performance surpassing previous SOTA models&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Requires human-labeled data&lt;/li&gt;
&lt;li&gt;Requires re-training for downstream tasks&lt;/li&gt;
&lt;li&gt;Limited to classification tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰</category>
      <category>paper</category>
      <category>Vision</category>
      <category>논문리뷰</category>
      <category>이미지처리</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/112</guid>
      <comments>https://kyj0105.tistory.com/112#entry112comment</comments>
      <pubDate>Sun, 24 Aug 2025 17:41:00 +0900</pubDate>
    </item>
    <item>
      <title>CLIP: Learning Transferable Visual Models From Natural Language Supervision 논문 리뷰</title>
      <link>https://kyj0105.tistory.com/111</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/2103.00020&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://arxiv.org/pdf/2103.00020&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Traditional vision model&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ResNet: predicts an image's class with CNNs and residual learning&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;ViT: splits the image into patches fed as tokens; the first token acts like a [CLS] token and classifies the image&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Traditional vision models 1) need human labeling, 2) train on a fixed label set, so changing the task requires retraining, and 3) cannot do zero-shot transfer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Natural language model&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;GPT-3: showed that scaling model size and data volume yields high performance with zero-shot alone&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;This demonstrated that exploiting low-quality, large-scale web data can drive major progress in NLP&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Multi-modal model
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Learning visual features from large weakly supervised data: learns image information by predicting the set of words in the captions attached to web images&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;VirTex: predicts image captions using two Transformer decoders&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Both works aimed not at predicting captions well, but at learning transferable visual representations&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;However, they still had the drawback of requiring additional training whenever the task changed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;How can we replicate GPT-3's success in vision rather than NLP, with a scalable pretraining method that exploits large-scale web image data?&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Core idea&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Paired image and text embeddings are pulled close; non-pairs are pushed apart&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Wouldn't two dog images that are not a pair then be learned strangely? -&amp;gt; In NLP too, causal language modeling can yield losses that look wrong to a human; it is the price of avoiding manual labeling, a way to cut human labor, and it is still quite effective&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Data preprocessing&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Images are randomly cropped to 224x224&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Text gets [SOS] prepended and [EOS] appended, is truncated beyond 76 tokens, and is fully lowercased&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Contrastive pretraining
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Text encoder: uses the [EOS] token embedding from the last hidden state&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Image encoder: embedding from the last hidden state (with a projection layer if the dimensions differ)&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Computes the cosine similarity between paired embedding vectors: maximized for pairs, minimized for non-pairs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Use for zero-shot prediction
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Each class name is inserted into the fixed prompt &quot;A photo of a {object}.&quot; and embedded with the text encoder&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The image embedding from the image encoder is compared against the text embeddings (one per class), and the class with the highest similarity is predicted as the answer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Questions
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Why use the [EOS] token when BERT-like models use the first ([CLS]) token? -&amp;gt; Because an autoregressive masking scheme was originally intended&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Why is cosine similarity computed with np.dot? -&amp;gt; After L2 normalization, the two become identical&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;What distinguishes zero-shot, few-shot, and linear-probe CLIP? -&amp;gt; Zero-shot is exactly as in the paper's figure; linear-probe CLIP retrains a logistic regression layer on the image encoder's embeddings; using only a few examples for that training gives few-shot CLIP&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
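A minimal sketch of the contrastive objective above, which also shows why np.dot equals cosine similarity after L2 normalization; the toy embeddings stand in for the real encoders:

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_logits(image_emb, text_emb, temperature=0.07):
    """Cosine-similarity logits between N image and N text embeddings.
    After L2 normalization, a plain dot product IS the cosine similarity."""
    i = l2_normalize(image_emb)
    t = l2_normalize(text_emb)
    return i @ t.T / temperature  # (N, N); the diagonal holds the true pairs

def symmetric_ce(logits):
    """CLIP-style symmetric cross-entropy over both matching directions."""
    n = logits.shape[0]
    idx = np.arange(n)
    ls_i2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ls_t2i = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return (-ls_i2t[idx, idx].mean() - ls_t2i[idx, idx].mean()) / 2

a = np.array([[3.0, 4.0]])
b = np.array([[4.0, 3.0]])
sim = clip_logits(a, b, temperature=1.0)[0, 0]  # cosine similarity = 0.96
```

Minimizing the symmetric loss drives the diagonal (paired) similarities up and the off-diagonal (non-pair) similarities down, exactly the "pairs close, non-pairs far" behavior described above.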
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1022&quot; data-origin-height=&quot;372&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dqErYj/btsP2YY3cHb/RbUe10bhZDEZOlmBsfUsLk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dqErYj/btsP2YY3cHb/RbUe10bhZDEZOlmBsfUsLk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dqErYj/btsP2YY3cHb/RbUe10bhZDEZOlmBsfUsLk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdqErYj%2FbtsP2YY3cHb%2FRbUe10bhZDEZOlmBsfUsLk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1022&quot; height=&quot;372&quot; data-origin-width=&quot;1022&quot; data-origin-height=&quot;372&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1027&quot; data-origin-height=&quot;405&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bUTtth/btsP4daeMm1/B4EB1UqXVwRhlP6veonqO0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bUTtth/btsP4daeMm1/B4EB1UqXVwRhlP6veonqO0/img.png&quot; data-alt=&quot;My diagram of the linear probe CLIP overview&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bUTtth/btsP4daeMm1/B4EB1UqXVwRhlP6veonqO0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbUTtth%2FbtsP4daeMm1%2FB4EB1UqXVwRhlP6veonqO0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1027&quot; height=&quot;405&quot; data-origin-width=&quot;1027&quot; data-origin-height=&quot;405&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;My diagram of the linear probe CLIP overview&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
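Zero-shot prediction as described above, sketched with toy embeddings standing in for the real text and image encoders (the class names and vectors are hypothetical):

```python
import numpy as np

def zero_shot_predict(image_emb, class_text_embs, class_names):
    """Pick the class whose prompt embedding ('A photo of a {object}.')
    is most cosine-similar to the image embedding. The embeddings here
    are toy vectors standing in for the real CLIP encoder outputs."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarities, one per class
    return class_names[int(np.argmax(sims))]

# Toy 3-class example with hand-made 2-dim embeddings.
classes = ["dog", "cat", "car"]
text_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pred = zero_shot_predict(np.array([0.1, 0.9]), text_embs, classes)
```

No retraining is needed to add a class: embedding one more prompt extends the classifier, which is what makes the zero-shot transfer described above possible.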
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 2
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Compared with prior approaches, keeping the image encoder as-is but having the text side predict a bag-of-words representation, and then training with contrastive learning, was by far the most efficient&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;To succeed at GPT-3-like scale, both model size and data volume have to grow enormously, so the most efficient training method was essential&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
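The contrastive objective discussed above can be sketched as a symmetric InfoNCE loss over a batch of image-text pairs. This is a minimal NumPy sketch under my own assumptions (the temperature value is illustrative), not CLIP's actual implementation:

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over an in-batch similarity matrix.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (N, N)
    labels = np.arange(len(logits))                # diagonal = positive pairs

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Each image's logits form a softmax over every text in the batch (and vice versa), so the diagonal pairs act as positives and all other pairings as negatives.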
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;650&quot; data-origin-height=&quot;397&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c4zGcC/btsP2FFoJDM/4WNLXIqPV611FtXLUg6TFK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c4zGcC/btsP2FFoJDM/4WNLXIqPV611FtXLUg6TFK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c4zGcC/btsP2FFoJDM/4WNLXIqPV611FtXLUg6TFK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc4zGcC%2FbtsP2FFoJDM%2F4WNLXIqPV611FtXLUg6TFK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;650&quot; height=&quot;397&quot; data-origin-width=&quot;650&quot; data-origin-height=&quot;397&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 4
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Ensembling prompts tailored to each task was more efficient and achieved higher accuracy than using only the single prompt &quot;A photo of a {object}.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
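The prompt ensembling in Figure 4 amounts to averaging the normalized text embeddings of several templates per class and renormalizing. A sketch, where TEMPLATES and encode_text are illustrative stand-ins rather than the paper's actual template set or CLIP's encoder:

```python
import numpy as np

# Illustrative templates, not the paper's full list
TEMPLATES = [
    "A photo of a {}.",
    "A blurry photo of a {}.",
    "A sketch of a {}.",
]

def ensemble_class_embedding(class_name, encode_text):
    """Average the unit-normalized text embeddings of several prompts, then renormalize.

    encode_text: stand-in for a text encoder (string -> 1-D embedding).
    """
    embs = np.stack([encode_text(t.format(class_name)) for t in TEMPLATES])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)  # unit vectors
    mean = embs.mean(axis=0)
    return mean / np.linalg.norm(mean)  # renormalize the ensemble centroid
```

The resulting per-class vector is used exactly like a single-prompt embedding, so ensembling adds no inference-time cost once the centroids are cached.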
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;665&quot; data-origin-height=&quot;620&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cRpjjy/btsP27hfdCQ/l3GBGbxVuP4YUmfKnD46gk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cRpjjy/btsP27hfdCQ/l3GBGbxVuP4YUmfKnD46gk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cRpjjy/btsP27hfdCQ/l3GBGbxVuP4YUmfKnD46gk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcRpjjy%2FbtsP27hfdCQ%2Fl3GBGbxVuP4YUmfKnD46gk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;665&quot; height=&quot;620&quot; data-origin-width=&quot;665&quot; data-origin-height=&quot;620&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 5
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Because zero-shot CLIP is trained on web image-text data, it performs best on tasks whose data distribution is close to that training data&lt;/li&gt;
&lt;li&gt;On out-of-distribution tasks (ones that are hard even for humans), it falls short of full fine-tuning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;637&quot; data-origin-height=&quot;691&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xKcm9/btsP426X6kJ/ir5pZ5YSG96gPYckasLpqk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xKcm9/btsP426X6kJ/ir5pZ5YSG96gPYckasLpqk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xKcm9/btsP426X6kJ/ir5pZ5YSG96gPYckasLpqk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxKcm9%2FbtsP426X6kJ%2Fir5pZ5YSG96gPYckasLpqk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;637&quot; height=&quot;691&quot; data-origin-width=&quot;637&quot; data-origin-height=&quot;691&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 6&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Performance of few-shot CLIP as the number of labeled examples grows&lt;/li&gt;
&lt;li&gt;Even in the few-shot setting, CLIP outperforms the other methods&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Why is zero-shot better than few-shot here? -&amp;gt; This is not GPT-3-style in-context few-shot learning but a classifier retrained on top of frozen embedding vectors, so with very few examples it underperforms the zero-shot classifier&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
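The few-shot setup here is a linear probe: a linear classifier trained on frozen embeddings. A minimal sketch using plain gradient-descent softmax regression (the learning rate and step count are illustrative):

```python
import numpy as np

def train_linear_probe(features, labels, n_classes, lr=0.5, steps=200):
    """Softmax regression on frozen embeddings (a 'linear probe').

    features: (N, D) frozen image embeddings; labels: (N,) integer class ids.
    Returns a weight matrix W of shape (D, n_classes).
    """
    n = len(features)
    W = np.zeros((features.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = features @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * features.T @ (probs - onehot) / n   # gradient descent step
    return W

def predict(features, W):
    return (features @ W).argmax(axis=1)
```

Because only W is learned and the backbone stays frozen, very small training sets constrain the probe heavily, which is why it can trail the zero-shot classifier at 1-2 examples per class.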
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;643&quot; data-origin-height=&quot;600&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/kTKk1/btsP3wgXWfB/EsZaOAKGu2Rro6jZINx5z1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/kTKk1/btsP3wgXWfB/EsZaOAKGu2Rro6jZINx5z1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/kTKk1/btsP3wgXWfB/EsZaOAKGu2Rro6jZINx5z1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FkTKk1%2FbtsP3wgXWfB%2FEsZaOAKGu2Rro6jZINx5z1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;643&quot; height=&quot;600&quot; data-origin-width=&quot;643&quot; data-origin-height=&quot;600&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 9
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Scaling the model up increases compute but yields higher accuracy&lt;/li&gt;
&lt;li&gt;Demonstrates that the GPT-3-style benefit of scaling model size also holds in the multi-modal setting&lt;/li&gt;
&lt;li&gt;Performance was reportedly not very sensitive to the size of the text transformer; it was the image encoder's size that mattered&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;658&quot; data-origin-height=&quot;402&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cSIxK4/btsP4xfaOW7/e99mBB6s47P1TcZasroKW1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cSIxK4/btsP4xfaOW7/e99mBB6s47P1TcZasroKW1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cSIxK4/btsP4xfaOW7/e99mBB6s47P1TcZasroKW1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcSIxK4%2FbtsP4xfaOW7%2Fe99mBB6s47P1TcZasroKW1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;658&quot; height=&quot;402&quot; data-origin-width=&quot;658&quot; data-origin-height=&quot;402&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 10
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;In the head-to-head comparison, CLIP achieves the highest performance&lt;/li&gt;
&lt;li&gt;Although the paper proposes zero-shot CLIP, the prior SOTA models report fully fine-tuned results, so for a like-for-like comparison the authors also evaluate linear-probe CLIP&lt;/li&gt;
&lt;li&gt;On the 12-dataset suite (left) the gap from prior SOTA models is small, but on the 27-dataset suite (right) the gap is large&lt;/li&gt;
&lt;li&gt;This shows that CLIP performs robustly across a wider range of tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;990&quot; data-origin-height=&quot;611&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/pG5Mq/btsP1YMbmC9/Q1TJuKPaIS1ma5OYW7NkKk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/pG5Mq/btsP1YMbmC9/Q1TJuKPaIS1ma5OYW7NkKk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/pG5Mq/btsP1YMbmC9/Q1TJuKPaIS1ma5OYW7NkKk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpG5Mq%2FbtsP1YMbmC9%2FQ1TJuKPaIS1ma5OYW7NkKk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;990&quot; height=&quot;611&quot; data-origin-width=&quot;990&quot; data-origin-height=&quot;611&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 13
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Where Figure 10 showed robustness to changes of dataset, Figure 13 shows robustness to natural distribution shift&lt;/li&gt;
&lt;li&gt;A model fine-tuned on the large-scale ImageNet performs well only on ImageNet and ImageNetV2 and degrades sharply on the other shifted sets&lt;/li&gt;
&lt;li&gt;Zero-shot CLIP, by contrast, stays consistently strong across these varied tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;980&quot; data-origin-height=&quot;380&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/q1hI5/btsP4S4rzYj/0n1bat5Q85EM8FRLZg1xaK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/q1hI5/btsP4S4rzYj/0n1bat5Q85EM8FRLZg1xaK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/q1hI5/btsP4S4rzYj/0n1bat5Q85EM8FRLZg1xaK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fq1hI5%2FbtsP4S4rzYj%2F0n1bat5Q85EM8FRLZg1xaK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;980&quot; height=&quot;380&quot; data-origin-width=&quot;980&quot; data-origin-height=&quot;380&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 14
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Red: linear-probe CLIP tuned on ImageNet&lt;/li&gt;
&lt;li&gt;Orange: zero-shot CLIP with prompts written from the ImageNet class names&lt;/li&gt;
&lt;li&gt;Purple: the authors' proposed zero-shot CLIP (when the class distribution changes, only the {object} slot in the prompt needs to be swapped)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
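All three zero-shot variants in Figure 14 classify the same way at inference: compare the image embedding against one text embedding per class name by cosine similarity, so handling a new label set only means swapping in new class-name embeddings. A minimal sketch, assuming the embeddings are already computed:

```python
import numpy as np

def zero_shot_classify(image_emb, class_embs):
    """Pick the class whose prompt embedding is most cosine-similar to the image.

    image_emb: (D,) image embedding.
    class_embs: (C, D) one text embedding per class name; swapping this matrix
    is all that is needed when the class set changes.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    class_embs = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    return int(np.argmax(class_embs @ image_emb))
```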
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;975&quot; data-origin-height=&quot;483&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bZoAzy/btsP19z2dO0/okkMlME34NkLeqkIPeR9kK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bZoAzy/btsP19z2dO0/okkMlME34NkLeqkIPeR9kK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bZoAzy/btsP19z2dO0/okkMlME34NkLeqkIPeR9kK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbZoAzy%2FbtsP19z2dO0%2FokkMlME34NkLeqkIPeR9kK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;975&quot; height=&quot;483&quot; data-origin-width=&quot;975&quot; data-origin-height=&quot;483&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 15
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Shows that not only zero-shot but also few-shot CLIP is more robust than the other models&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;479&quot; data-origin-height=&quot;458&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdNhYR/btsP2LrZFiB/AeO5cDFGp0QHSrwWuoxVVk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdNhYR/btsP2LrZFiB/AeO5cDFGp0QHSrwWuoxVVk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdNhYR/btsP2LrZFiB/AeO5cDFGp0QHSrwWuoxVVk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbdNhYR%2FbtsP2LrZFiB%2FAeO5cDFGp0QHSrwWuoxVVk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;479&quot; height=&quot;458&quot; data-origin-width=&quot;479&quot; data-origin-height=&quot;458&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 2
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Humans' zero-shot accuracy is poor but improves sharply given just one shot -&amp;gt; humans start out not knowing what they don't know&lt;/li&gt;
&lt;li&gt;A second shot, however, makes little difference&lt;/li&gt;
&lt;li&gt;CLIP's zero-shot accuracy is already strong, and it keeps improving consistently as data is added, leaving room for further gains&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;476&quot; data-origin-height=&quot;163&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cHgumO/btsP3yMs1Hg/dkvnMRN5Ae75nzhS9fXalK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cHgumO/btsP3yMs1Hg/dkvnMRN5Ae75nzhS9fXalK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cHgumO/btsP3yMs1Hg/dkvnMRN5Ae75nzhS9fXalK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcHgumO%2FbtsP3yMs1Hg%2FdkvnMRN5Ae75nzhS9fXalK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;476&quot; height=&quot;163&quot; data-origin-width=&quot;476&quot; data-origin-height=&quot;163&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Proposes the CLIP framework: simple contrastive learning on large-scale web image-text pair data&lt;/li&gt;
&lt;li&gt;Proves that zero-shot learning is robust under distribution shift&lt;/li&gt;
&lt;li&gt;Achieves strong performance on 27 datasets without any additional training, demonstrating CLIP's potential&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Class prediction requires a human to specify the candidate classes -&amp;gt; objects unknown to the human cannot be identified&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰</category>
      <category>paper</category>
      <category>Vision</category>
      <category>논문리뷰</category>
      <category>컴퓨터비전</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/111</guid>
      <comments>https://kyj0105.tistory.com/111#entry111comment</comments>
      <pubDate>Fri, 22 Aug 2025 16:46:30 +0900</pubDate>
    </item>
    <item>
      <title>Contrastive Learning of Medical Visual Representations from Paired Images and Text 논문 리뷰</title>
      <link>https://kyj0105.tistory.com/110</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://openreview.net/pdf?id=T4gXBOXoIUr&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://openreview.net/pdf?id=T4gXBOXoIUr&lt;/a&gt;&lt;/p&gt;
&lt;h2 style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Image encoder&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;To learn image representations, the medical domain has relied on small amounts of annotated, hand-labeled data&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Most prior work transfers weights from ImageNet pretraining&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;&lt;i&gt;&lt;b&gt;-&amp;gt; How did prior work pretrain? -&amp;gt; By tuning a CNN to predict image class labels&lt;/b&gt;&lt;/i&gt;&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;In the medical domain, however, ImageNet pretraining performs about the same as random initialization&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Expert-crafted rules are another option&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Instead, the authors propose ConVIRT (Contrastive VIsual Representation Learning from Text), an unsupervised strategy that learns directly from naturally occurring pairings of images and textual data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Notation&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;xv: input image&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;x&amp;tilde;v: augmented view of the input image&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;fv: encoder function that transforms the input image into hv&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;gv: non-linear projection function that transforms hv into the vector v&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;In short, the pipeline maps an image's content to a single vector&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;&lt;i&gt;&lt;b&gt;Why use a non-linear projection function? -&amp;gt; In contrastive representation learning, a projection head improves performance&lt;/b&gt;&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
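The fv -&gt; gv pipeline can be sketched as an encoder output followed by a small non-linear projection head; the single ReLU hidden layer and the final unit-normalization here are my assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

def projection_head(h, W1, b1, W2, b2):
    """g_v: non-linear projection of the encoder output h into the contrastive space."""
    z = np.maximum(W1 @ h + b1, 0.0)   # ReLU hidden layer
    v = W2 @ z + b2
    return v / np.linalg.norm(v)       # unit-normalize for cosine similarity
```

Keeping v on the unit sphere makes the dot products in the contrastive loss behave as cosine similarities.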
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;290&quot; data-origin-height=&quot;48&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/uQSSk/btsPM1hRnO3/4ZIa0yZCp2piR7D6JWqSR0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/uQSSk/btsPM1hRnO3/4ZIa0yZCp2piR7D6JWqSR0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/uQSSk/btsPM1hRnO3/4ZIa0yZCp2piR7D6JWqSR0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FuQSSk%2FbtsPM1hRnO3%2F4ZIa0yZCp2piR7D6JWqSR0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;290&quot; height=&quot;48&quot; data-origin-width=&quot;290&quot; data-origin-height=&quot;48&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Equations 2, 3, 4&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Text: pulled toward the image of its own pair and pushed away from the other images&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Image: pulled toward the text of its own pair and pushed away from the other texts&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;An InfoNCE-based loss is computed in both directions (image-&amp;gt;text and text-&amp;gt;image) and combined, hence a bidirectional objective&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Finally, the two directional losses are mixed with an adjustable weighting ratio&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
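Equations 2-4 combine the two directional InfoNCE terms with a weighting ratio. A minimal NumPy sketch of this bidirectional objective (the tau and lam values here are illustrative, not the paper's tuned settings):

```python
import numpy as np

def convirt_loss(v, u, tau=0.1, lam=0.75):
    """Bidirectional InfoNCE: lam * (image->text) + (1 - lam) * (text->image).

    v: (N, D) image vectors, u: (N, D) text vectors; row i of each is a pair.
    """
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    sim = v @ u.T / tau                      # cosine similarities scaled by 1/tau

    def nce(logits):
        # mean of -log softmax probability of the true (diagonal) pair
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.diag(log_probs).mean()

    return lam * nce(sim) + (1.0 - lam) * nce(sim.T)
```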
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;648&quot; data-origin-height=&quot;133&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d2eAU1/btsPPZ33MIn/Pmnz02d9QAyVM9BIYlweb1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d2eAU1/btsPPZ33MIn/Pmnz02d9QAyVM9BIYlweb1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d2eAU1/btsPPZ33MIn/Pmnz02d9QAyVM9BIYlweb1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd2eAU1%2FbtsPPZ33MIn%2FPmnz02d9QAyVM9BIYlweb1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;633&quot; height=&quot;130&quot; data-origin-width=&quot;648&quot; data-origin-height=&quot;133&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;600&quot; data-origin-height=&quot;131&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/baXaR3/btsPPlGuB8L/H3Fgl2KUyijqrIJxE3qbfK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/baXaR3/btsPPlGuB8L/H3Fgl2KUyijqrIJxE3qbfK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/baXaR3/btsPPlGuB8L/H3Fgl2KUyijqrIJxE3qbfK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbaXaR3%2FbtsPPlGuB8L%2FH3Fgl2KUyijqrIJxE3qbfK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;131&quot; data-origin-width=&quot;600&quot; data-origin-height=&quot;131&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;613&quot; data-origin-height=&quot;159&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/XYA88/btsPL3gbI4m/uhQEGxhfgY7e8JhipKD8u0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/XYA88/btsPL3gbI4m/uhQEGxhfgY7e8JhipKD8u0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/XYA88/btsPL3gbI4m/uhQEGxhfgY7e8JhipKD8u0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FXYA88%2FbtsPL3gbI4m%2FuhQEGxhfgY7e8JhipKD8u0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;509&quot; height=&quot;132&quot; data-origin-width=&quot;613&quot; data-origin-height=&quot;159&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Downstream task&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;RSNA Pneumonia Detection&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;CheXpert&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;COVIDx&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;MURA&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Zero-shot image-image retrieval&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Zero-shot text-image retrieval&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Architecture
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Image encoder: ResNet50&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Text encoder: BERT base (initialized with ClinicalBERT) -&amp;gt; the embeddings and first 6 layers are frozen; the last 6 layers are fine-tuned&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Table 1
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Pretraining with ConVIRT matches or beats ImageNet-pretrained initialization with only 10% of the labeled data&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;&lt;b&gt;&lt;i&gt;Isn't that just because the pretraining data is in-domain medical data? -&amp;gt; Seems likely; emailed the authors to ask&lt;/i&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1062&quot; data-origin-height=&quot;708&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bjqbcR/btsPNhxXTlI/8FMVfNKVygtvK2ooGdj3JK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bjqbcR/btsPNhxXTlI/8FMVfNKVygtvK2ooGdj3JK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bjqbcR/btsPNhxXTlI/8FMVfNKVygtvK2ooGdj3JK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbjqbcR%2FbtsPNhxXTlI%2F8FMVfNKVygtvK2ooGdj3JK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1062&quot; height=&quot;708&quot; data-origin-width=&quot;1062&quot; data-origin-height=&quot;708&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 2&amp;nbsp;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Random: ResNet50 starting from random initialization, no pretraining&lt;/li&gt;
&lt;li&gt;ImageNet: initialized with ImageNet-pretrained weights&lt;/li&gt;
&lt;li&gt;Caption-Transformer: after ImageNet init, a transformer is trained to generate the image's caption&lt;/li&gt;
&lt;li&gt;Caption-LSTM: after ImageNet init, an LSTM is trained to generate the image's caption&lt;/li&gt;
&lt;li&gt;Contrastive-Binary: replaces ConVIRT's InfoNCE-based similarity maximization with a binary classification head that predicts true pair vs. false pair&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;735&quot; data-origin-height=&quot;258&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bK3hno/btsPNgeLJbE/zLgtxSClr0ghkMZZfVR5M1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bK3hno/btsPNgeLJbE/zLgtxSClr0ghkMZZfVR5M1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bK3hno/btsPNgeLJbE/zLgtxSClr0ghkMZZfVR5M1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbK3hno%2FbtsPNgeLJbE%2FzLgtxSClr0ghkMZZfVR5M1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;735&quot; height=&quot;258&quot; data-origin-width=&quot;735&quot; data-origin-height=&quot;258&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Table 3
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Compares ConVIRT with existing image-only unsupervised image representation learning methods&lt;/li&gt;
&lt;li&gt;A linear layer is trained with only 1% of the labeled data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;396&quot; data-origin-height=&quot;161&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b1J5zT/btsPL31y4kQ/thEkTUwDSHgG9DLILtKYy0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b1J5zT/btsPL31y4kQ/thEkTUwDSHgG9DLILtKYy0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b1J5zT/btsPL31y4kQ/thEkTUwDSHgG9DLILtKYy0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb1J5zT%2FbtsPL31y4kQ%2FthEkTUwDSHgG9DLILtKYy0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;396&quot; height=&quot;161&quot; data-origin-width=&quot;396&quot; data-origin-height=&quot;161&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Achieves higher performance than the baselines&lt;/li&gt;
&lt;li&gt;Matches or beats ImageNet initialization with only 10% of the data&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Limited to the medical domain&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>논문 리뷰/자연어처리</category>
      <category>paper</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/110</guid>
      <comments>https://kyj0105.tistory.com/110#entry110comment</comments>
      <pubDate>Mon, 11 Aug 2025 16:46:49 +0900</pubDate>
    </item>
    <item>
      <title>Chain-of-Thought Prompting Elicits Reasoning in Large Language Models</title>
      <link>https://kyj0105.tistory.com/109</link>
      <description>&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;Link:&lt;/ul&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://openreview.net/pdf?id=_VjQlMeSB_J&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://openreview.net/pdf?id=_VjQlMeSB_J&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Background%20-%20Comparison%20with%20BERT-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Background&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;LLM
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Fine-tuning separately for each task requires substantial resources&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Scaling up the language model improves performance and sample efficiency&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;However, hard tasks such as arithmetic, commonsense, and symbolic reasoning improve much less&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Motivation
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Prior work [1] shows that arithmetic reasoning benefits from generating natural language rationales on the way to the final answer&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;In-context few-shot learning via prompting improves performance&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;-&amp;gt; Combining these two ideas, the authors propose chain of thought: a series of intermediate natural language reasoning steps&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Methods-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Methods&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Chain of thought&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Few-shot exemplars in the prompt are written with a chain of thought, so the model reasons step by step before producing the answer&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The &amp;lt;input, output&amp;gt; structure becomes &amp;lt;input, chain of thought, output&amp;gt;, inserting an intermediate reasoning step&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Most arithmetic benchmarks use the same 8 exemplars; AQuA uses only 4&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
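The &lt;input, chain of thought, output&gt; format above amounts to a simple prompt builder; the sketch below is an illustration, not the paper's released exemplar set, though the tennis-ball question follows its well-known example.

```python
# Each exemplar is (question, chain of thought, answer); the rationale
# goes between the question and the final answer in the prompt.
exemplars = [
    ("Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
     "How many tennis balls does he have now?",
     "Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
     "5 + 6 = 11.",
     "11"),
]

def build_cot_prompt(exemplars, question):
    """Assemble a few-shot <input, chain of thought, output> prompt."""
    parts = []
    for q, cot, a in exemplars:
        parts.append(f"Q: {q}\nA: {cot} The answer is {a}.")
    # The final question ends at "A:" so the model continues with its own CoT.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    exemplars,
    "If there are 3 cars and each car has 4 wheels, how many wheels are there?",
)
```

Standard prompting would drop the rationale and show only `Q: ... A: 11.`; the intermediate text is the entire difference between the two conditions.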
&lt;p style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id=&quot;Experiment%20%26%20Result-1&quot; style=&quot;background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-size=&quot;size26&quot;&gt;Experiment &amp;amp; Analysis&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Benchmark task
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Arithmetic: GSM8K, SVAMP, ASDiv, AQuA, MAWPS&lt;/li&gt;
&lt;li&gt;Commonsense: CSQA, StrategyQA, BIG-bench-Date, BIG-bench-sports, SayCan&lt;/li&gt;
&lt;li&gt;Symbolic: Last-letter Concatenation, Coin-flip&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 2
&lt;ul style=&quot;list-style-type: disc; color: #353638;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Achieves higher performance than standard prompting and fine-tuning&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Why does prior work [2] perform so well? -&amp;gt; It adds a verifier that judges whether a math solution is correct and uses it to select the final answer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
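The verifier idea from [2] (sample several candidate solutions, score each with a trained verifier, return the best) can be sketched as below; `generate_candidates` and `verifier_score` are hypothetical stand-ins for a sampled LLM and a trained verifier model, with a toy scoring rule purely for illustration.

```python
def generate_candidates(question):
    """Hypothetical stand-in for sampling several solutions from an LLM."""
    return [
        "5 + 6 = 10, so the answer is 10.",                    # arithmetic slip
        "2 cans of 3 is 6; 5 + 6 = 11, so the answer is 11.",  # correct
        "No work shown. The answer is 12.",                    # wrong
    ]

def verifier_score(question, solution):
    """Hypothetical stand-in for a trained verifier.

    Toy heuristic only: this 'verifier' knows 11 is correct.
    A real verifier is a model trained on labeled solutions.
    """
    return 1.0 if "answer is 11" in solution else 0.0

def verify_and_select(question):
    """Best-of-n selection: score every sampled solution, keep the top one."""
    candidates = generate_candidates(question)
    return max(candidates, key=lambda s: verifier_score(question, s))

best = verify_and_select("Roger has 5 balls and buys 2 cans of 3. Total?")
```

The contrast with CoT prompting is that the verifier approach spends extra samples and a second model at inference, while CoT changes only the prompt.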
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;657&quot; data-origin-height=&quot;619&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/pBbJT/btsPvivZgZO/DbNEyZZKKkhX8alcl1NrTK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/pBbJT/btsPvivZgZO/DbNEyZZKKkhX8alcl1NrTK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/pBbJT/btsPvivZgZO/DbNEyZZKKkhX8alcl1NrTK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpBbJT%2FbtsPvivZgZO%2FDbNEyZZKKkhX8alcl1NrTK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;657&quot; height=&quot;619&quot; data-origin-width=&quot;657&quot; data-origin-height=&quot;619&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; background-color: #ffffff; color: #353638; text-align: left;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Figure 4
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;The larger the model and the more complex the task, the larger the performance gain&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Models above roughly 100B parameters show sharp improvements, while smaller models can even degrade&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Large gains on hard tasks such as GSM8K; little gain on easy, single-step problems such as MAWPS&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Of 50 LaMDA 137B errors examined, 46% were nearly correct CoTs with only a calculation mistake or one missing step&lt;/li&gt;
&lt;li style=&quot;list-style-type: disc;&quot;&gt;Scaling PaLM from 62B to 540B resolved most of these errors&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;396&quot; data-origin-height=&quot;642&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nGUaY/btsPthr5mII/4RTkjkZmeYO1s2uFMG2uOk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nGUaY/btsPthr5mII/4RTkjkZmeYO1s2uFMG2uOk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nGUaY/btsPthr5mII/4RTkjkZmeYO1s2uFMG2uOk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnGUaY%2FbtsPthr5mII%2F4RTkjkZmeYO1s2uFMG2uOk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;396&quot; height=&quot;642&quot; data-origin-width=&quot;396&quot; data-origin-height=&quot;642&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 5, 6
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Outputting only the equation barely helps -&amp;gt; generating natural language lets the model draw on knowledge learned during pretraining&lt;/li&gt;
&lt;li&gt;CoT's success does not depend on the prompt's linguistic style or the particular exemplars&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;439&quot; data-origin-height=&quot;519&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/VhFnr/btsPs3AM3WB/bAzkLNL66wYd7VDkCRbFo1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/VhFnr/btsPs3AM3WB/bAzkLNL66wYd7VDkCRbFo1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/VhFnr/btsPs3AM3WB/bAzkLNL66wYd7VDkCRbFo1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FVhFnr%2FbtsPs3AM3WB%2FbAzkLNL66wYd7VDkCRbFo1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;439&quot; height=&quot;519&quot; data-origin-width=&quot;439&quot; data-origin-height=&quot;519&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;501&quot; data-origin-height=&quot;654&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/obCbR/btsPs0qv2kz/phHapkavniuKZyh3qGxVu0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/obCbR/btsPs0qv2kz/phHapkavniuKZyh3qGxVu0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/obCbR/btsPs0qv2kz/phHapkavniuKZyh3qGxVu0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FobCbR%2FbtsPs0qv2kz%2FphHapkavniuKZyh3qGxVu0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;501&quot; height=&quot;654&quot; data-origin-width=&quot;501&quot; data-origin-height=&quot;654&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 7
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Across all tasks, scaling up the model improves both standard prompting and CoT&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1337&quot; data-origin-height=&quot;383&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/4H71F/btsPtj4trfB/GK8pynbJRRqaLmP38CNKd0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/4H71F/btsPtj4trfB/GK8pynbJRRqaLmP38CNKd0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/4H71F/btsPtj4trfB/GK8pynbJRRqaLmP38CNKd0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F4H71F%2FbtsPtj4trfB%2FGK8pynbJRRqaLmP38CNKd0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1337&quot; height=&quot;383&quot; data-origin-width=&quot;1337&quot; data-origin-height=&quot;383&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Figure 8&amp;nbsp;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;With larger models, performance improves, with large gains even in out-of-distribution (OOD) settings&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;382&quot; data-origin-height=&quot;557&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/N2oHj/btsPtCphAN2/6YndzFlOkprm0vYjGoyvYk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/N2oHj/btsPtCphAN2/6YndzFlOkprm0vYjGoyvYk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/N2oHj/btsPtCphAN2/6YndzFlOkprm0vYjGoyvYk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FN2oHj%2FbtsPtCphAN2%2F6YndzFlOkprm0vYjGoyvYk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;382&quot; height=&quot;557&quot; data-origin-width=&quot;382&quot; data-origin-height=&quot;557&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Result&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Simply adding a chain of thought to the prompt yields large gains across many tasks&lt;/li&gt;
&lt;li&gt;CoT provides an interpretable window into the model's internal reasoning&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Limitation&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Increased compute cost at inference&lt;/li&gt;
&lt;li&gt;No guarantee that the generated CoT reflects the model's actual reasoning&lt;/li&gt;
&lt;li&gt;Works only for very large models (roughly 100B+ parameters)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Appendix&lt;/h2&gt;
&lt;p style=&quot;text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #212529;&quot;&gt;[1] Ling et al.&lt;/span&gt;&lt;span style=&quot;background-color: #ffffff; color: #000000;&quot;&gt;, &amp;ldquo;Program induction by rationale generation: Learning to solve and explain algebraic word problems&amp;rdquo;, ACL 2017&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #000000;&quot;&gt;[2] Cobbe et al., &amp;ldquo;Training verifiers to solve math word problems&amp;rdquo;, arXiv 2021&lt;/span&gt;&lt;/p&gt;</description>
      <category>논문 리뷰/자연어처리</category>
      <category>nlp</category>
      <category>paper</category>
      <category>논문리뷰</category>
      <category>자연어처리</category>
      <category>학부연구생</category>
      <author>kyj0015</author>
      <guid isPermaLink="true">https://kyj0105.tistory.com/109</guid>
      <comments>https://kyj0105.tistory.com/109#entry109comment</comments>
      <pubDate>Wed, 23 Jul 2025 10:02:18 +0900</pubDate>
    </item>
  </channel>
</rss>