Don't Worry, Be Happy 😛
#robotics#artificial-intelligence#reinforcement-learning#research-paper#projects

Deep Reinforcement Learning for Robotics: Paper Review

Success stories in robot reinforcement learning. Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes

#Paper Review

Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes
A class assignment called for a paper review, and this was a paper I was interested in, so I am writing it up here.

#1) The paper at a glance

  • Paper: Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes
  • Version: arXiv v3 (2024-09-16)
  • Key questions:
    • How far has DRL actually succeeded on real robot problems?
    • Which areas have matured, and which are still hard?

Rather than simply comparing algorithms, the paper classifies DRL research by what it achieved in real robot settings and assesses how mature each area is.

#2) The paper's core framework

The paper analyzes DRL robotics research along the four axes below.

| Analysis axis | Description |
| --- | --- |
| Robotic Competency | The capability the robot learns (mobility, manipulation, human/multi-robot interaction) |
| Problem Formulation | How the state, observation, reward, and action spaces are cast as an RL problem |
| Solution Method | The learning strategy: simulation-based, sim-to-real, real-world learning, and so on |
| Level of Real-World Success | Experimental results rated by real-world deployment maturity (level) |

#Real-World Success levels (summary)

| Level | Meaning |
| --- | --- |
| L0 | Validated only in simulation |
| L1 | Validated in limited lab settings |
| L2 | Validated in diverse lab settings |
| L3 | Validated in limited real-world settings |
| L4 | Validated in diverse real-world settings |
| L5 | Deployed at the level of a commercial product or service |
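The level scale is ordered, so it maps naturally onto an integer enum. The following is a minimal sketch of that idea; the names `SuccessLevel` and `validated_outside_lab` are my own labels for illustration and do not come from the paper.

```python
from enum import IntEnum


class SuccessLevel(IntEnum):
    """Real-world success levels L0-L5 (member names are illustrative)."""
    SIM_ONLY = 0       # validated only in simulation
    LIMITED_LAB = 1    # limited lab settings
    DIVERSE_LAB = 2    # diverse lab settings
    LIMITED_REAL = 3   # limited real-world settings
    DIVERSE_REAL = 4   # diverse real-world settings
    COMMERCIAL = 5     # commercial product or service


def validated_outside_lab(level: SuccessLevel) -> bool:
    """True when the work was validated in real-world settings (L3 or above)."""
    return level >= SuccessLevel.LIMITED_REAL
```

Because the enum is ordered, two papers can be compared directly, for example `SuccessLevel.DIVERSE_REAL > SuccessLevel.LIMITED_LAB`.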

#3) How the presentation slides are uploaded

Upload the slide images as shown below, then fill in the notes for each slide.

  • Example image path: /assets/slides/drl-robot-251110/slide-01.png
  • File naming rule: slide-01.png, slide-02.png, …, slide-30.png
  • Contents per slide:
    • 1 slide image
    • Key message, 2 to 4 sentences
    • My interpretation/critique, 2 to 3 sentences
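The naming rule above is regular enough to generate the image paths rather than type them by hand. This is a small sketch; the function name `slide_paths` is hypothetical, and the base directory and slide count are simply the values from the example path and naming rule.

```python
def slide_paths(base="/assets/slides/drl-robot-251110", count=30):
    """Return the zero-padded slide image paths slide-01.png .. slide-<count>.png."""
    return [f"{base}/slide-{i:02d}.png" for i in range(1, count + 1)]


paths = slide_paths()
print(paths[0])   # /assets/slides/drl-robot-251110/slide-01.png
print(paths[-1])  # /assets/slides/drl-robot-251110/slide-30.png
```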

#4) Draft notes per slide (30 slides)

#Slide 01. Title and authors

Slide 01

This presentation introduces a survey that systematically reviews how successfully DRL has been applied to real-world robotics.
The authors are affiliated with UT Austin, the University of Virginia, and Sony AI. Because the material covers so much ground, the taxonomy introduced in the following slides matters.

#Slide 02. Contents and version history

Slide 02

The presentation has 13 sections in total (Contents&History → Taxonomy → Competency Review → Locomotion → Navigation → Manipulation → MoMa → HRI → Multi-Robot → General Trends → Key Future Directions → Additional Table → Appendix).
It also includes a version history (v0.1: 2025.11.10 first draft; v0.2: 2025.12.01 revised), so the preparation process can be traced.

#Slide 03. Why this survey is needed (Why This Survey)

Slide 03

Earlier surveys were simulation-centric or skewed toward particular techniques or tasks. This paper was motivated by three things: ① an analysis centered on real-world success, ② a new DRL taxonomy (competency / problem formulation / solution / success level), and ③ the backdrop of recent DRL progress (the shift from simulation to the real world).
In other words, the survey exists to provide a framework for answering the question "where has DRL succeeded?" systematically.

#Slide 04. Taxonomy: robot competencies (Robot Competencies)

Slide 04

Competencies that can be learned with DRL are arranged in a hierarchy: Mobility (Locomotion + Navigation), Manipulation, and Interaction with other agents (HRI + Multi-Robot).
The diagram in Fig 1 shows at a glance how Single-Robot Competencies combine into Mobile Manipulation and then extend to human and multi-robot interaction.

#Slide 05. Taxonomy: problem formulation (Problem Formulation)

Slide 05

The RL problem definition is classified along three axes: Action Space (low-level joint commands / mid-level task space / high-level temporally extended actions), Observation Space (high-dimensional sensor input vs low-dimensional sensor input), and Reward Function (sparse vs dense).
The standard MDP diagram in Fig 2 (Agent ↔ Environment: action, reward, observation) gives an intuitive picture of how this classification applies in practice.
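To make the sparse vs dense distinction on the Reward Function axis concrete, here is a toy sketch for a one-dimensional reaching task. The task, the tolerance value, and both function names are assumptions made for illustration, not definitions from the paper.

```python
def sparse_reward(position, goal, tolerance=0.05):
    """Sparse: signal only on success (1.0 inside the tolerance band, else 0.0)."""
    return 1.0 if abs(position - goal) <= tolerance else 0.0


def dense_reward(position, goal):
    """Dense: shaped signal at every step (negative distance to the goal)."""
    return -abs(position - goal)


# Far from the goal, the sparse reward says nothing about progress,
# while the dense reward still ranks the closer state higher.
assert sparse_reward(0.2, 1.0) == sparse_reward(0.6, 1.0) == 0.0
assert dense_reward(0.6, 1.0) > dense_reward(0.2, 1.0)
```

The trade-off is the usual one: a sparse reward is easy to specify but hard to explore with, whereas a dense reward guides learning but has to be engineered.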

#Slide 06. Taxonomy: solution approach 1 (Solution Approach)

Slide 06

In terms of simulator usage, approaches divide into sim-to-real (zero-shot, few-shot) and offline/real-world learning; in terms of model learning, they split into model-free and model-based.
Fig 3 (the sim-to-real concept diagram) and Fig 4 (the full pipeline: Training Env → Experience Tuples → Learning Process → Policy Network → RL Agent) are shown together to visualize the basis of the Solution Approach taxonomy.
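The first two stages of that pipeline (Training Env → Experience Tuples) can be sketched in a few lines. This is a toy illustration under my own assumptions: `ToyEnv`, `Transition`, and `collect` are hypothetical names, and the environment is a one-dimensional counter rather than a robot simulator.

```python
from collections import namedtuple

# One experience tuple, as consumed by the learning process.
Transition = namedtuple("Transition", "obs action reward next_obs done")


class ToyEnv:
    """Stand-in training environment: move along a line until the goal is hit."""

    def __init__(self, goal=3):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0  # sparse reward on success
        return self.pos, reward, done


def collect(env, policy, max_steps=20):
    """Roll out one episode and return its experience tuples."""
    buffer = []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        buffer.append(Transition(obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return buffer


episode = collect(ToyEnv(), policy=lambda obs: 1)
```

The learning process would then read from `episode` (or a larger buffer of such episodes) to update the policy network.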

#Slide 07. Taxonomy: solution approach 2 (Solution Approach, cont.)

Slide 07

Three more categories complete the Solution Approach: Expert usage (human demo, oracle, etc.), Policy Optimization (planning, Offline/On-Policy/Off-Policy RL), and Policy·Model Representation (MLP, CNN, RNN, Transformer).
This scheme is the basis for comparing, in each competency section that follows, which combinations of methods led to real-world success.

#Slide 08. Taxonomy: real-world success level (Real-World Success Level)

Slide 08

Inspired by the Technology Readiness Level scale, the paper defines Level 0 (validated only in simulation) through Level 5 (deployed in a commercial product or service).
The same RL algorithm means something very different depending on the environment in which it was validated, so this level scale is the key yardstick for comparing the literature.

#Slide 09. Competency-Specific Review: introduction and color legend

Slide 09

The following sections review each competency in depth (Locomotion, Navigation, Manipulation, MoMa, HRI, Multi-Robot) and present the paper references color-coded by maturity.
Color legend: Limited Lab (light blue) / Diverse Lab (blue) / Limited Real (teal) / Diverse Real (dark teal). Every table from here on should be read against this legend.

#Slide 10. Locomotion overview

Slide 10

This table lists the literature, grouped into Legged Locomotion (Quadruped, Biped) and Quadrotor Flight Control, by reference number plus maturity color.
Quadruped entries spread widely all the way to Diverse Real, which makes it the most mature category, while Biped and Flight are concentrated relatively more in the Limited levels.

#Slide 11. Locomotion key takeaways

Slide 11

DRL-based quadruped locomotion is highly mature; bipedal locomotion is less mature because of its higher DoF and harder dynamics. The key success patterns are zero-shot sim-to-real (on-policy, model-free) and privileged information (distilling a Teacher that has privileged information into a Student).
Open questions: efficient and safe real-world learning, and how to integrate locomotion with other tasks (higher-level, compound, long-horizon goals).
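The Teacher → Student pattern can be illustrated with a deliberately tiny example. Everything here is an assumption made for illustration: the teacher's linear form, the hidden `friction` parameter, and the `slip` feature standing in for what a student might infer from onboard sensing. Real systems typically use neural networks and an observation history rather than a single proxy feature.

```python
import random


def teacher_action(obs, friction):
    """Teacher policy: uses friction, which only the simulator knows (toy form)."""
    return -1.5 * obs + 2.0 * friction


def make_dataset(n, seed=0):
    """Label student-visible inputs (obs, slip, bias) with the teacher's actions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        obs = rng.uniform(-1.0, 1.0)
        friction = rng.uniform(0.2, 1.0)   # privileged: hidden from the student
        slip = 1.0 - friction              # assumed onboard proxy for friction
        data.append(((obs, slip, 1.0), teacher_action(obs, friction)))
    return data


def distill(data, lr=0.1, epochs=200):
    """Fit a linear student to imitate the teacher with plain SGD on squared error."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, target in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


student_weights = distill(make_dataset(200))
```

In this toy setup the teacher can be rewritten as -1.5*obs - 2.0*slip + 2.0, so the student weights should approach (-1.5, -2.0, 2.0). The point is only that the student reproduces the teacher's behavior without ever seeing the privileged variable.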

#Slide 12. Navigation overview

Slide 12

This table lists the literature by platform (Wheeled, Legged, Aerial), by reference number plus maturity color.
Wheeled navigation has many Diverse Lab/Real cases, whereas Aerial has comparatively few Diverse Real cases, so the gap in maturity is clear.

#Slide 13. Navigation key takeaways

Slide 13

For indoor navigation, end-to-end RL performs very well in simulation, but in the real world modular approaches (classical stacks) are the most successful. Generalization, explainability, and safety are still lacking, and local planning + semantic exploration is a promising approach.
Open questions: how much of the nav stack to replace with learning, how to learn navigation and locomotion together, and what role RL should play in safety-critical domains (autonomous driving, etc.).

#Slide 14. Manipulation overview

Slide 14

The manipulation literature is broken down into Pick-and-place (Grasping / End-to-end / Pick-and-place), Contact-rich (Assembly / Articulated Objects / Deformable Objects), In-hand, and Non-prehensile, and listed by reference number plus maturity.
The photo examples (pick-and-place, contact-rich, in-hand, non-prehensile) make it easy to see what kind of physical interaction each subtask involves.

#Slide 15. Manipulation key takeaways

Slide 15

RL is most successful when the task is Constrained (the objects and environment are fixed) and Enumerable a priori (the goals and initial conditions can be listed in advance); grasping and in-hand manipulation are the representative examples.
Extending to the open world requires Multi-task/Meta/Lifelong learning, Autonomous real-world learning (automating reward and reset), Learning from human video, and Leveraging demonstrations.

#Slide 16. Manipulation open questions (Open Questions)

Slide 16

The key questions are how to incorporate effective priors (Symmetry, Collision-avoidance), and how to design an integrated system given that most studies address only one isolated subtask (with a specific action space).
The future of manipulation ultimately depends on "an integrated design that ties several sub-skills together," and this is the main reason manipulation remains hard.

#Slide 17. Mobile Manipulation (MoMa) overview

Slide 17

The MoMa literature is classified into WBC (Whole-Body Control), Short-Horizon Interactive Tasks, and Long-Horizon Interactive Tasks, and listed by reference plus maturity.
Most Long-Horizon work stays at the Limited Lab level, which shows how hard it is to handle mobility and manipulation together over a long horizon.

#Slide 18. Mobile Manipulation key takeaways

Slide 18

For short-horizon tasks there are early sim-to-real successes, but the choice of action space has a decisive effect on performance, and extending to diverse morphologies also matters.
Open questions: Multi-tasking, Long-term memory, Safe exploration. These three are the main bottlenecks holding back MoMa in the real world.

#Slide 19. Human-Robot Interaction (HRI) overview

Slide 19

Physical HRI (pHRI) is divided into Non-Collaborative (avoidance in crowded spaces), Collaborative (cooperative tasks), and Shared Autonomy, with the references and maturity for each type listed in a table.
Across HRI, maturity at Diverse Lab or above is rare, which shows how much harder RL validation becomes once interaction with people is involved.

#Slide 20. Human-Robot Interaction key takeaways

Slide 20

There are fewer success cases than for single-robot competencies, and collecting human data is itself difficult (Non-Markovian behavior, Limited rationality, high cost).
Future directions: enabling safe real-world learning alongside people, and developing more realistic simulations of human behavior. Until both are addressed, scaling DRL for HRI will stay blocked.

#Slide 21. Multi-Robot Interaction overview

Slide 21

The slide shows photos of three types (Multi-Robot Collision Avoidance, Loco-Manipulation, Robot Soccer) together with their reference numbers and maturity.
Robot Soccer (reference 191) falls under Diverse Real, and Collision Avoidance also has some Diverse Real cases, but maturity is low overall.

#Slide 22. Multi-Robot Interaction key takeaways

Slide 22

The slide notes that there are results in homogeneous cooperative settings, but that complexity and scalability remain major problems.
The key challenges are inter-agent communication, learning convergence and stability, and extension to general non-cooperative settings.

#Slide 23. General Trends

Slide 23

The slide contrasts the mature areas (Locomotion, parts of Navigation/Manipulation) with the immature ones (MoMa, HRI, Multi-Robot) to show the overall landscape.
It also lists what the mature solutions have in common: zero-shot sim-to-real, dense reward engineering, and suitability for on-policy learning.

#Slide 24. Key Future Directions

Slide 24

As future research directions, the slide emphasizes principled design of reward and action spaces, integration with classical model-based methods, and standardized benchmarking.
It also points to foundation models as a way to extend the field: generalization, language conditioning, and generating rewards and simulation assets.

#Slide 25. Additional Table: Problem Formulation (Table 1)

Slide 25

This slide uses the paper's Table 1 to arrange the existing literature systematically along the problem formulation axes (action/observation/reward).
In short, it is a reference table for comparing at a glance which problem definitions were most used for which tasks.

#Slide 26. Additional Table: Problem Formulation (Table 1, continued)

Slide 26

A continuation of Slide 25 that applies the same classification criteria to more of the literature.
When presenting, it is useful for finding problem settings (reward/observation/action space) similar to my own task of interest and citing them as support.

#Slide 27. Additional Table: Problem Formulation (Table 2)

Slide 27

Table 2 supplements the problem formulation classification from another angle, making it possible to compare common patterns and differences across domains.
It is especially useful for checking which domains rely on sparse or dense rewards and how the observation dimensionality varies.

#Slide 28. Additional Table: Problem Formulation (Table 2, continued)

Slide 28

An extension of Slide 27; this page supplies enough table-based evidence to make the presentation's conclusions more credible.
When summarizing, it helps to tie "why my chosen task should adopt that problem formulation" back to this table.

#Slide 29. Additional Table: Solution Approach (Table 3)

Slide 29

Table 3 classifies the literature from the solution approach perspective (e.g., sim-to-real, model-free/model-based, policy optimization).
It is the key appendix table, giving evidence for which combinations of learning pipeline produced most of the success cases.

#Slide 30. Additional Table: Solution Approach (Table 3, continued)

Slide 30

The last slide continues the solution approach table and rounds off the review's "map of the landscape by methodology."
In my conclusion, it would be good to use this table as the basis for clearly proposing the learning strategy to adopt in my next project (e.g., zero-shot sim-to-real vs real-world finetuning).


#5) My conclusions (draft)

  1. Real-world DRL successes already exist, but maturity varies enormously by problem type.
  2. What the successes so far have in common is a problem that is amenable to sim-to-real, plus careful engineering (reward, domain randomization, action design).
  3. The deciding factors going forward are stabilizing real-world learning, integrating long-horizon tasks, and generalizing human and multi-agent interaction.
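Of the engineering ingredients named in item 2, domain randomization is the easiest to show in code: each training episode samples its simulator parameters from a range, so the policy cannot overfit to one setting. The sketch below is illustrative only; the parameter names and ranges in `PARAM_RANGES` are made up, and a real setup would pass the sampled values to an actual simulator.

```python
import random

# Hypothetical randomization ranges: (low, high) per simulator parameter.
PARAM_RANGES = {
    "friction": (0.4, 1.2),
    "payload_kg": (0.0, 2.0),
    "latency_ms": (0.0, 40.0),
}


def sample_sim_params(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator parameters uniformly from the given ranges."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


rng = random.Random(0)  # seeded so that runs are reproducible
episode_params = [sample_sim_params(rng) for _ in range(5)]
```

Each entry in `episode_params` would configure one training episode before the rollout starts.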

#6) Links to the paper, presentation, and research institutions
