Don't Worry, Be Happy 😛
#robotics#artificial-intelligence#reinforcement-learning#research-paper#projects

Deep Reinforcement Learning for Robotics: Paper Review

Robot reinforcement learning success stories: Deep Reinforcement Learning for Robotics, A Survey of Real-World Successes

#Paper Review

Deep Reinforcement Learning for Robotics, A Survey of Real-World Successes
I had to review a paper for a class assignment, and this one caught my interest, so I am writing it up here.

#1) The paper at a glance

  • Paper: Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes
  • Version: arXiv v3 (2024-09-16)
  • Key questions:
    • How far has DRL actually succeeded on real robot problems?
    • Which areas have matured, and which are still hard?

Rather than simply comparing algorithms, the paper classifies DRL research by what it has achieved on real robots and rates the maturity of each area.

#2) Core framework of the paper

The paper analyzes DRL robotics research along the four axes below.

| Axis | Description |
| --- | --- |
| Robotic Competency | The capability the robot learned (mobility, manipulation, human/multi-robot interaction) |
| Problem Formulation | How the state, observation, reward, and action space were defined as an RL problem |
| Solution Method | The learning strategy: simulation-based, sim-to-real, real-world learning, and so on |
| Level of Real-World Success | Experimental results rated by real-world deployment maturity (level) |

#Real-World Success levels (summary)

| Level | Meaning |
| --- | --- |
| L0 | Validated only in simulation |
| L1 | Validated in limited lab conditions |
| L2 | Validated in diverse lab conditions |
| L3 | Validated in limited real-world conditions |
| L4 | Validated in diverse real-world conditions |
| L5 | Deployed at the level of a commercial product or service |
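
One way to keep the four axes and the six levels straight while reading is to encode them as a small record type. The sketch below is only an illustration under my own assumptions: the names `SuccessLevel`, `SurveyEntry`, and `at_least`, the field names, and both example entries are hypothetical and are not taken from the paper or its tables.

```python
from dataclasses import dataclass
from enum import IntEnum


class SuccessLevel(IntEnum):
    """Real-world success levels, following the summary table above."""
    SIM_ONLY = 0             # validated only in simulation
    LIMITED_LAB = 1          # limited lab conditions
    DIVERSE_LAB = 2          # diverse lab conditions
    LIMITED_REAL_WORLD = 3   # limited real-world conditions
    DIVERSE_REAL_WORLD = 4   # diverse real-world conditions
    COMMERCIAL = 5           # commercial product or service


@dataclass(frozen=True)
class SurveyEntry:
    """One surveyed work tagged along the four axes (hypothetical schema)."""
    competency: str
    problem_formulation: str
    solution_method: str
    level: SuccessLevel


def at_least(entries, minimum):
    """Return the entries whose level is at or above `minimum`."""
    return [e for e in entries if e.level >= minimum]


# Made-up entries for illustration only; they do not describe real papers.
entries = [
    SurveyEntry("locomotion", "dense reward, joint targets",
                "zero-shot sim-to-real", SuccessLevel.DIVERSE_REAL_WORLD),
    SurveyEntry("mobile manipulation", "sparse reward, base and arm actions",
                "real-world learning", SuccessLevel.LIMITED_LAB),
]
print([e.competency for e in at_least(entries, SuccessLevel.LIMITED_REAL_WORLD)])
```

Using `IntEnum` keeps the levels ordered, so "L3 or higher" becomes a plain comparison.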

#3) How the presentation slides are uploaded

Upload the slide images as shown below, then fill in the notes for each slide.

  • Example image path: /assets/slides/drl-robot-251110/slide-01.png
  • File naming rule: slide-01.png, slide-02.png, …, slide-30.png
  • Layout per slide:
    • 1 slide image
    • Key message, 2 to 4 sentences
    • My interpretation/critique, 2 to 3 sentences
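
The naming rule above is easy to generate rather than type by hand. A minimal sketch, assuming the deck folder and the count of 30 shown in the example path; the helper name `slide_paths` is hypothetical:

```python
def slide_paths(deck: str, count: int) -> list[str]:
    """Build image paths that follow the slide-NN.png naming rule."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Two-digit zero padding keeps the files sorted: slide-01 ... slide-30.
    return [f"/assets/slides/{deck}/slide-{i:02d}.png" for i in range(1, count + 1)]


paths = slide_paths("drl-robot-251110", 30)
print(paths[0])
print(paths[-1])
```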

#4) Draft notes per slide (30 slides)

#Slide 01. Title and authors

Slide 01

This presentation introduces a survey that systematically reviews how successfully DRL has been applied to real-world robotics.
The authors are affiliated with UT Austin, the University of Virginia, and Sony AI. Because the survey covers a lot of ground, the taxonomy introduced in the following slides matters.

#Slide 02. Deep RL's earlier successes and the difficulties in robotics

Slide 02

DRL has already produced strong results in areas such as games and recommendation, but on real robots the cost of data collection and the safety constraints are far greater.
In other words, the slide opens by pointing out that the gap between "performance in simulation" and "performance in the field" is the core problem.

#Slide 03. Survey goals and classification criteria

Slide 03

The goal of the study is an overall assessment of DRL results in robotics, broken down by competency, problem formulation, solution method, and real-world maturity.
Instead of just listing SOTA results, it focuses on finding the differences in maturity across areas and the bottlenecks they share.

#Slide 04. Taxonomy: levels of real-world success

Slide 04

Taking a view similar to Technology Readiness Levels (TRL), the paper rates research results by how usable they are in practice.
The message is that the same performance number should be read differently depending on the environment in which it was validated.

#Slide 05. Taxonomy: robot competencies

Slide 05

Competencies are broken down into mobility, manipulation, and interaction with other agents.
This classification is the basis for comparing, in the slides that follow, why some areas mature quickly while others lag behind.

#Slide 06. Locomotion overview

Slide 06

Locomotion control is the flagship success area for DRL.
For quadruped locomotion in particular, there is a growing body of cases where a sim-to-real pipeline worked fairly reliably.

#Slide 07. Locomotion key takeaways

Slide 07

Quadruped locomotion is highly mature, whereas bipedal locomotion is comparatively hard because of its more difficult dynamics and limited access to hardware.
Zero-shot sim-to-real and privileged information were used frequently, and the open challenge is safe and efficient learning in the real world.

#Slide 08. Navigation overview

Slide 08

Navigation is hard to evaluate because it is applied in so many different real-world contexts.
The important point is that the required level of safety differs greatly by platform: indoor autonomous navigation, drones, vehicles, and so on.

#Slide 09. Navigation key takeaways

Slide 09

End-to-end RL is strong in simulation, but modular architectures still dominate in real systems.
In safety-critical domains in particular (autonomous driving, for example), combining RL with classical techniques is more realistic than using RL alone.
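
To make "combining RL with classical techniques" concrete, one common pattern is to let a learned policy propose a command and then pass it through a hand-written safety layer. The sketch below is my own illustration, not a method described in the paper: `policy_velocity` is a stand-in for a trained policy, and the braking rule with its parameters (`stop_dist`, `slow_dist`, `v_max`) is hypothetical.

```python
def policy_velocity(goal_distance: float) -> float:
    """Placeholder for a learned policy: desired forward speed in m/s."""
    return min(1.5, 0.8 * goal_distance)


def safety_filter(v_cmd: float, obstacle_distance: float,
                  stop_dist: float = 0.3, slow_dist: float = 1.0,
                  v_max: float = 1.0) -> float:
    """Rule-based layer that caps the learned command near obstacles."""
    if obstacle_distance <= stop_dist:
        return 0.0  # too close: stop regardless of what the policy wants
    if obstacle_distance < slow_dist:
        # Scale the speed limit linearly between stop_dist and slow_dist.
        scale = (obstacle_distance - stop_dist) / (slow_dist - stop_dist)
        return max(0.0, min(v_cmd, v_max * scale))
    return max(0.0, min(v_cmd, v_max))


for d_obs in (2.0, 0.65, 0.2):
    v = safety_filter(policy_velocity(goal_distance=5.0), d_obs)
    print(f"obstacle at {d_obs:.2f} m -> command {v:.2f} m/s")
```

The learned part stays free to optimize behavior, while the guarantee near obstacles comes from a rule that can be inspected and tested on its own.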

#Slide 10. Manipulation overview

Slide 10

Manipulation is hard to learn because the state and action spaces are large and the contact dynamics are complex.
Even so, real-world results improve quickly once the task is constrained.

#Slide 11. Manipulation key takeaways

Slide 11

When the task is clearly defined, as with grasping and in-hand manipulation, zero-shot sim-to-real works well.
Open-world pick-and-place, by contrast, needs additional techniques such as multi-task learning, lifelong learning, and autonomous resets because of its diversity and long-horizon dependencies.

#Slide 12. Manipulation (cont.): integration challenges

Slide 12

Current research tends strongly toward solving manipulation sub-problems in isolation.
Going forward, the key challenge is a design that integrates collision avoidance, symmetry priors, and diverse action spaces into a single system.

#Slide 13. MoMa (Mobile Manipulation) overview

Slide 13

MoMa is a compound problem that requires mobility and manipulation at the same time.
Designing the experiments is hard in itself, so there are fewer success cases than for manipulation-only tasks.

#Slide 14. MoMa key takeaways

Slide 14

There have been early successes, but mostly on short-horizon tasks, and the choice of action space has a large effect on performance.
Multi-task learning, long-term memory, and safe exploration are presented as the main bottlenecks to scaling MoMa.

#Slide 15. HRI overview

Slide 15

In HRI the difficulty jumps because the human is not part of the environment but an agent that interacts with the robot.
The cost of collecting human data, together with ethical and safety issues, makes large-scale learning difficult.

#Slide 16. HRI key takeaways

Slide 16

Human behavior is hard to capture with simple models because it is non-Markovian, boundedly rational, and so on.
The future directions are safe real-world learning alongside humans and more realistic simulation of human behavior.

#Slide 17. Multi-Robot overview

Slide 17

With multiple robots, the interactions make the problem complexity grow sharply.
Beyond optimizing each individual policy, team-level stability and scalability are required at the same time.

#Slide 18. Multi-Robot key takeaways

Slide 18

There are successes in homogeneous cooperative settings (collision avoidance, for example), but general non-cooperative settings are still immature.
Communication design, convergence of learning, and scalable collaboration strategies are the key challenges.

Slide 19

The mature areas are locomotion and parts of navigation and manipulation, while MoMa, HRI, and multi-robot are comparatively less mature.
Most of the success cases share the same recipe: sim-to-real on problems where building a simulation is relatively easy, plus careful reward design.

#Slide 20. Future Directions I

Slide 20

The top priorities going forward are better learning stability and sample efficiency, and making real-world learning more feasible.
For long-horizon tasks in particular, "which skills to learn and how to combine them" is presented as the central system-design question.

#Slide 21. Future Directions II

Slide 21

Principled approaches are needed for reward design, action spaces, and the combination with classical control.
Standard benchmarks and the use of foundation models (generalization, language conditioning, data/simulation generation) are also important directions for expansion.

#Slide 22. Multi-Robot Interaction key takeaways

Slide 22

This slide sums up that there are results in homogeneous cooperative settings, but complexity and scalability remain major problems.
The key challenges are communication between agents, convergence and stability of learning, and extension to general non-cooperative situations.

Slide 23

This slide shows the overall landscape by contrasting the mature areas (locomotion, parts of navigation and manipulation) with the immature ones (MoMa, HRI, multi-robot).
It also lists what the mature solutions have in common: zero-shot sim-to-real, dense reward engineering, and the feasibility of on-policy learning.

#Slide 24. Key Future Directions

Slide 24

As future research directions, it emphasizes principled design of reward and action spaces, integration with classical model-based methods, and standardized benchmarking.
It also presents foundation models as an avenue for generalization, language conditioning, and the generation of rewards and simulation assets.

#Slide 25. Additional Table: Problem Formulation (Table 1)

Slide 25

This slide uses the paper's Table 1 to organize the existing literature systematically along the problem-formulation axes (action/observation/reward).
In short, it is a reference table for comparing at a glance which problem formulations were used most for which tasks.

#Slide 26. Additional Table: Problem Formulation (Table 1, continued)

Slide 26

A continuation of Slide 25, applying the same classification criteria to more of the literature.
When presenting, it is a good place to find problem setups (reward/observation/action space) similar to my own task of interest and cite them as evidence.

#Slide 27. Additional Table: Problem Formulation (Table 2)

Slide 27

Table 2 adds a classification of problem formulations from a different angle, which makes it possible to compare common patterns and differences across domains.
It is especially useful for checking which domains rely on sparse or dense rewards and how the observation dimensionality varies.
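
For readers less familiar with the sparse/dense distinction, here is a minimal sketch on a toy reaching task. The task, the function names `sparse_reward` and `dense_reward`, and the tolerance value are my own assumptions for illustration; they are not taken from the paper's tables.

```python
import math


def sparse_reward(pos, goal, tol=0.05):
    """1.0 only when the goal is reached within `tol`, otherwise 0.0."""
    return 1.0 if math.dist(pos, goal) <= tol else 0.0


def dense_reward(pos, goal):
    """Negative distance to the goal, so every step carries a signal."""
    return -math.dist(pos, goal)


goal = (3.0, 4.0)
for pos in [(0.0, 0.0), (1.5, 2.0), (3.0, 4.0)]:
    print(pos, sparse_reward(pos, goal), dense_reward(pos, goal))
```

The sparse version says nothing until the goal is hit, which makes exploration hard. The dense version guides learning at every step, but its shape has to be engineered, which is the "dense reward engineering" cost noted elsewhere in these notes.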

#Slide 28. Additional Table: Problem Formulation (Table 2, continued)

Slide 28

An extension of Slide 27. This page strengthens the credibility of the presentation's conclusions by backing them with enough table-based evidence.
When summarizing, it helps to tie "why the task I chose should adopt this particular problem setup" back to this table.

#Slide 29. Additional Table: Solution Approach (Table 3)

Slide 29

Table 3 classifies the literature by solution approach (for example sim-to-real, model-free/model-based, policy optimization).
It is a key appendix table that provides evidence for which combinations of learning pipelines produced most of the success cases.

#Slide 30. Additional Table: Solution Approach (Table 3, continued)

Slide 30

The last slide continues the solution-approach table and rounds off the review's "landscape by methodology."
In my conclusion, it would be good to use this table as the basis for clearly proposing the learning strategy to adopt in my next project (for example zero-shot sim-to-real vs real-world finetuning).


#5) My conclusions (draft)

  1. Real-world successes of DRL already exist, but maturity varies widely with the type of problem.
  2. What the successes so far have in common is a problem that is amenable to sim-to-real plus careful engineering (reward design, domain randomization, action design).
  3. The deciding factors from here on are stabilizing real-world learning, integrating long-horizon tasks, and generalizing interaction with humans and multiple agents.
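
Since domain randomization comes up in the second conclusion, here is a rough sketch of the idea: draw fresh simulator parameters for every training episode so the policy cannot overfit to a single physics setting. The parameter names and ranges below are made up for illustration, and the simulator reset and rollout are left as a comment because they depend on the simulator in use.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float
    payload_kg: float
    motor_strength: float  # multiplier on the nominal torque


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of simulator parameters for an episode (hypothetical ranges)."""
    return PhysicsParams(
        friction=rng.uniform(0.4, 1.2),
        payload_kg=rng.uniform(0.0, 3.0),
        motor_strength=rng.uniform(0.8, 1.2),
    )


rng = random.Random(0)  # fixed seed so the sampled settings are reproducible
for episode in range(3):
    params = sample_params(rng)
    # A real pipeline would reset the simulator with `params` and roll out the policy here.
    print(episode, params)
```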

#6) Links: original paper, presentation, research institutions
