Next (#766)
* Check frames before processing
* Enhance on bbox by factor 1.5
* Introduce ARGS for temporary frames
* Improve README
* Improve README
* Bump version
* Rework on ffmpeg encoders and quality ranges
* Rework on ffmpeg encoders and quality ranges
* Update README
* Fix range in CLI
* Update gui demo
--- a/README.md
+++ b/README.md
@@ -1,27 +1,31 @@
 Take a video and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training.
 
-You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing). A StableDiffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
+You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing).
+A Stable Diffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
+
 
 
 ## Disclaimer
 
 This software is meant to be a productive contribution to the rapidly growing AI-generated media industry. It will help artists with tasks such as animating a custom character or using the character as a model for clothing etc.
 
 The developers of this software are aware of its possible unethical applications and are committed to take preventative measures against them. It has a built-in check which prevents the program from working on inappropriate media including but not limited to nudity, graphic content, sensitive material such as war footage etc. We will continue to develop this project in the positive direction while adhering to law and ethics. This project may be shut down or include watermarks on the output if requested by law.
 
 Users of this software are expected to use this software responsibly while abiding the local law. If face of a real person is being used, users are suggested to get consent from the concerned person and clearly mention that it is a deepfake when posting content online. Developers of this software will not be responsible for actions of end-users.
 
-## How do I install it?
+## How to install?
 
 ### Basic
 
-It is more likely to work on your computer but it will also be very slow. You can follow instructions for the basic install [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
+It is more likely to work on your computer, but will be quite slow. Follow instructions for the basic installation [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
 
 ### Acceleration
 
-If you have a good GPU and are ready for solving any software issues you may face, you can enable GPU which is wayyy faster. To do this, first follow the basic install instructions given above and then follow GPU-specific instructions [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
+If you own a capable GPU and are prepared to address any software problems, you have the option to activate such acceleration, which offers significantly enhanced speed. Once you finished the basic installation, you can follow the instructions for the acceleration installation [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
 
-## How do I use it?
+## How to use?
 
+### UI
+
 Executing `python run.py` command will launch this window:
 
@@ -29,33 +33,38 @@ Executing `python run.py` command will launch this window:
 
 Choose a face (image with desired face) and the target image/video (image/video in which you want to replace the face) and click on `Start`. Open file explorer and navigate to the directory you select your output to be in. You will find a directory named `<video_title>` where you can see the frames being swapped in realtime. Once the processing is done, it will create the output file. That's it.
 
-Additional command line arguments are given below. To learn out what they do, check [this guide](https://github.com/s0md3v/roop/wiki/Advanced-Options).
+## CLI
+
+Additional command line arguments are given below. To learn out what they do, check the guide [here](https://github.com/s0md3v/roop/wiki/Advanced-Options).
+
 ```
 options:
 -h, --help show this help message and exit
 -s SOURCE_PATH, --source SOURCE_PATH select an source image
 -t TARGET_PATH, --target TARGET_PATH select an target image or video
 -o OUTPUT_PATH, --output OUTPUT_PATH select output file or directory
 --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...] frame processors (choices: face_swapper, face_enhancer, ...)
 --keep-fps keep target fps
 --keep-frames keep temporary frames
 --skip-audio skip target audio
 --many-faces process every face
 --reference-face-position REFERENCE_FACE_POSITION position of the reference face
 --reference-frame-number REFERENCE_FRAME_NUMBER number of the reference frame
 --similar-face-distance SIMILAR_FACE_DISTANCE face distance used for recognition
---video-encoder {libx264,libx265,libvpx-vp9} adjust output video encoder
---video-quality [0-51] adjust output video quality
---max-memory MAX_MEMORY maximum amount of RAM in GB
---execution-provider {cpu} [{cpu} ...] available execution provider (choices: cpu, ...)
---execution-threads EXECUTION_THREADS number of execution threads
--v, --version show program's version number and exit
+--temp-frame-format {jpg,png} image format used for frame extraction
+--temp-frame-quality [1-100] image quality used for frame extraction
+--output-video-encoder {libx264,libx265,libvpx-vp9,h264_nvenc,hevc_nvenc} encoder used for the output video
+--output-video-quality [1-100] quality used for the output video
+--max-memory MAX_MEMORY maximum amount of RAM in GB
+--execution-provider {cpu} [{cpu} ...] available execution provider (choices: cpu, ...)
+--execution-threads EXECUTION_THREADS number of execution threads
+-v, --version show program's version number and exit
 ```
 
-Looking for a CLI mode? Using the -s/--source argument will make the run program in cli mode.
+Using the `-s/--source`, `-t/--target` and `-o/--output` argument will run the program in headless mode.
 
 ## Credits
 
 - [henryruhs](https://github.com/henryruhs): for being an irreplaceable contributor to the project
 - [ffmpeg](https://ffmpeg.org/): for making video related operations easy
 - [deepinsight](https://github.com/deepinsight): for their [insightface](https://github.com/deepinsight/insightface) project which provided a well-made library and models.
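Review note: the new README text says that passing `-s`, `-t` and `-o` together runs the program headless. A minimal sketch of how such a call could be assembled from Python, using only flags listed in the options block above. The helper name `build_headless_command` and the file names are hypothetical, and the sketch only builds the argument list without running anything.

```python
import sys
from typing import List


def build_headless_command(source: str, target: str, output: str, encoder: str = 'libx264', quality: int = 35) -> List[str]:
    # hypothetical helper: supplying -s, -t and -o together selects headless mode per the README
    return [
        sys.executable, 'run.py',
        '-s', source,
        '-t', target,
        '-o', output,
        '--output-video-encoder', encoder,
        '--output-video-quality', str(quality),
    ]


# placeholder file names, not taken from the project
command = build_headless_command('face.jpg', 'input.mp4', 'output.mp4')
print(' '.join(command[1:]))
```

The resulting list can be handed to `subprocess.run(command)` from the repository root once the installation steps from the README are done.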
gui-demo.png
Binary file not shown.
Size before: 26 KiB, after: 26 KiB
--- a/roop/core.py
+++ b/roop/core.py
@@ -44,8 +44,10 @@ def parse_args() -> None:
     program.add_argument('--reference-face-position', help='position of the reference face', dest='reference_face_position', type=int, default=0)
     program.add_argument('--reference-frame-number', help='number of the reference frame', dest='reference_frame_number', type=int, default=0)
     program.add_argument('--similar-face-distance', help='face distance used for recognition', dest='similar_face_distance', type=float, default=0.85)
-    program.add_argument('--video-encoder', help='adjust output video encoder', dest='video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9'])
-    program.add_argument('--video-quality', help='adjust output video quality', dest='video_quality', type=int, default=18, choices=range(52), metavar='[0-51]')
+    program.add_argument('--temp-frame-format', help='image format used for frame extraction', dest='temp_frame_format', default='png', choices=['jpg', 'png'])
+    program.add_argument('--temp-frame-quality', help='image quality used for frame extraction', dest='temp_frame_quality', type=int, default=0, choices=range(100), metavar='[1-100]')
+    program.add_argument('--output-video-encoder', help='encoder used for the output video', dest='output_video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9', 'h264_nvenc', 'hevc_nvenc'])
+    program.add_argument('--output-video-quality', help='quality used for the output video', dest='output_video_quality', type=int, default=35, choices=range(100), metavar='[1-100]')
     program.add_argument('--max-memory', help='maximum amount of RAM in GB', dest='max_memory', type=int)
     program.add_argument('--execution-provider', help='available execution provider (choices: cpu, ...)', dest='execution_provider', default=['cpu'], choices=suggest_execution_providers(), nargs='+')
     program.add_argument('--execution-threads', help='number of execution threads', dest='execution_threads', type=int, default=suggest_execution_threads())
@@ -65,8 +67,10 @@ def parse_args() -> None:
     roop.globals.reference_face_position = args.reference_face_position
     roop.globals.reference_frame_number = args.reference_frame_number
     roop.globals.similar_face_distance = args.similar_face_distance
-    roop.globals.video_encoder = args.video_encoder
-    roop.globals.video_quality = args.video_quality
+    roop.globals.temp_frame_format = args.temp_frame_format
+    roop.globals.temp_frame_quality = args.temp_frame_quality
+    roop.globals.output_video_encoder = args.output_video_encoder
+    roop.globals.output_video_quality = args.output_video_quality
     roop.globals.max_memory = args.max_memory
     roop.globals.execution_providers = decode_execution_providers(args.execution_provider)
     roop.globals.execution_threads = args.execution_threads
@@ -151,7 +155,7 @@ def start() -> None:
     # process image to videos
     if predict_video(roop.globals.target_path):
         destroy()
-    update_status('Creating temp resources...')
+    update_status('Creating temporary resources...')
     create_temp(roop.globals.target_path)
     # extract frames
     if roop.globals.keep_fps:
@@ -163,10 +167,14 @@ def start() -> None:
         extract_frames(roop.globals.target_path)
     # process frame
     temp_frame_paths = get_temp_frame_paths(roop.globals.target_path)
-    for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
-        update_status('Progressing...', frame_processor.NAME)
-        frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
-        frame_processor.post_process()
+    if temp_frame_paths:
+        for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
+            update_status('Progressing...', frame_processor.NAME)
+            frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
+            frame_processor.post_process()
+    else:
+        update_status('Frames not found...')
+        return
     # create video
     if roop.globals.keep_fps:
         fps = detect_fps(roop.globals.target_path)
@@ -186,6 +194,7 @@ def start() -> None:
             update_status('Restoring audio might cause issues as fps are not kept...')
         restore_audio(roop.globals.target_path, roop.globals.output_path)
     # clean temp
+    update_status('Cleaning temporary resources...')
     clean_temp(roop.globals.target_path)
     # validate video
     if is_video(roop.globals.target_path):
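Review note: the `--temp-frame-quality` and `--output-video-quality` options added in `parse_args` pass `choices=range(100)`, which admits 0 through 99, while their metavar advertises `[1-100]`. A small standalone sketch of that behaviour follows. It is not the project's parser: the `parser` object, the `accepts` helper and the reduced option set are assumptions made for illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog='sketch')
# same keyword shape as the option added in this commit, reduced to the relevant parts
parser.add_argument('--output-video-quality', dest='output_video_quality', type=int, default=35, choices=range(100), metavar='[1-100]')


def accepts(value: int) -> bool:
    # argparse reports an invalid choice by printing usage and raising SystemExit
    try:
        parser.parse_args(['--output-video-quality', str(value)])
        return True
    except SystemExit:
        return False


print(accepts(0), accepts(99), accepts(100))
```

With this range, 0 and 99 are accepted and 100 is rejected, so the accepted values and the advertised `[1-100]` differ at both ends.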
@@ -12,8 +12,10 @@ many_faces = None
 reference_face_position = None
 reference_frame_number = None
 similar_face_distance = None
-video_encoder = None
-video_quality = None
+temp_frame_format = None
+temp_frame_quality = None
+output_video_encoder = None
+output_video_quality = None
 max_memory = None
 execution_providers: List[str] = []
 execution_threads = None
@@ -1,2 +1,2 @@
 name = 'roop'
-version = '1.2.0'
+version = '1.3.0'
@@ -60,6 +60,12 @@ def post_process() -> None:
 
 def enhance_face(target_face: Face, temp_frame: Frame) -> Frame:
     start_x, start_y, end_x, end_y = map(int, target_face['bbox'])
+    padding_x = int((end_x - start_x) * 0.5)
+    padding_y = int((end_y - start_y) * 0.5)
+    start_x = max(0, start_x - padding_x)
+    start_y = max(0, start_y - padding_y)
+    end_x = max(0, end_x + padding_x)
+    end_y = max(0, end_y + padding_y)
     temp_face = temp_frame[start_y:end_y, start_x:end_x]
     if temp_face.size:
         with THREAD_SEMAPHORE:
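Review note: the lines added to `enhance_face` grow the bounding box by half its width and height on every side and clamp only against 0. The end coordinates are therefore not limited to the frame size, which still works because array slicing tolerates an end index past the edge. A minimal sketch of the same padding with an explicit upper clamp, assuming the frame dimensions are known. `pad_bbox`, `frame_width`, `frame_height` and `factor` are hypothetical names, not part of the project.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]


def pad_bbox(bbox: BBox, frame_width: int, frame_height: int, factor: float = 0.5) -> BBox:
    start_x, start_y, end_x, end_y = bbox
    # padding is a fraction of the box size, as in the added lines
    padding_x = int((end_x - start_x) * factor)
    padding_y = int((end_y - start_y) * factor)
    return (
        max(0, start_x - padding_x),
        max(0, start_y - padding_y),
        # assumption: clamp the far edges to the frame instead of relying on slicing
        min(frame_width, end_x + padding_x),
        min(frame_height, end_y + padding_y),
    )


print(pad_bbox((100, 80, 200, 180), frame_width=640, frame_height=480))
print(pad_bbox((10, 10, 110, 110), frame_width=120, frame_height=120))
```

The first call stays inside the frame and simply grows the box; the second shows all four edges being clamped.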
@@ -12,8 +12,8 @@ from tqdm import tqdm
 
 import roop.globals
 
-TEMP_FILE = 'temp.mp4'
 TEMP_DIRECTORY = 'temp'
+TEMP_VIDEO_FILE = 'temp.mp4'
 
 # monkey patch ssl for mac
 if platform.system().lower() == 'darwin':
@@ -21,7 +21,7 @@ if platform.system().lower() == 'darwin':
 
 
 def run_ffmpeg(args: List[str]) -> bool:
-    commands = ['ffmpeg', '-hide_banner', '-hwaccel', 'auto', '-loglevel', roop.globals.log_level]
+    commands = ['ffmpeg', '-hide_banner', '-loglevel', roop.globals.log_level]
     commands.extend(args)
     try:
         subprocess.check_output(commands, stderr=subprocess.STDOUT)
@@ -42,27 +42,35 @@ def detect_fps(target_path: str) -> float:
     return 30
 
 
-def extract_frames(target_path: str, fps: float = 30) -> None:
+def extract_frames(target_path: str, fps: float = 30) -> bool:
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-i', target_path, '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.png')])
+    temp_frame_quality = roop.globals.temp_frame_quality * 31 // 100
+    return run_ffmpeg(['-hwaccel', 'auto', '-i', target_path, '-q:v', str(temp_frame_quality), '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format)])
 
 
-def create_video(target_path: str, fps: float = 30) -> None:
+def create_video(target_path: str, fps: float = 30) -> bool:
     temp_output_path = get_temp_output_path(target_path)
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.png'), '-c:v', roop.globals.video_encoder, '-crf', str(roop.globals.video_quality), '-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    output_video_quality = (roop.globals.output_video_quality + 1) * 51 // 100
+    commands = ['-hwaccel', 'auto', '-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format), '-c:v', roop.globals.output_video_encoder]
+    if roop.globals.output_video_encoder in ['libx264', 'libx265', 'libvpx']:
+        commands.extend(['-crf', str(output_video_quality)])
+    if roop.globals.output_video_encoder in ['h264_nvenc', 'hevc_nvenc']:
+        commands.extend(['-cq', str(output_video_quality)])
+    commands.extend(['-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    return run_ffmpeg(commands)
 
 
 def restore_audio(target_path: str, output_path: str) -> None:
     temp_output_path = get_temp_output_path(target_path)
-    done = run_ffmpeg(['-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
+    done = run_ffmpeg(['-hwaccel', 'auto', '-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
     if not done:
         move_temp(target_path, output_path)
 
 
 def get_temp_frame_paths(target_path: str) -> List[str]:
     temp_directory_path = get_temp_directory_path(target_path)
-    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.png')))
+    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.' + roop.globals.temp_frame_format)))
 
 
 def get_temp_directory_path(target_path: str) -> str:
@@ -73,7 +81,7 @@ def get_temp_directory_path(target_path: str) -> str:
 
 def get_temp_output_path(target_path: str) -> str:
     temp_directory_path = get_temp_directory_path(target_path)
-    return os.path.join(temp_directory_path, TEMP_FILE)
+    return os.path.join(temp_directory_path, TEMP_VIDEO_FILE)
 
 
 def normalize_output_path(source_path: str, target_path: str, output_path: str) -> Optional[str]:
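Review note: `extract_frames` and `create_video` rescale the percentage-style options with integer arithmetic, `temp_frame_quality * 31 // 100` for `-q:v` and `(output_video_quality + 1) * 51 // 100` for `-crf` or `-cq`. The sketch below reproduces only that arithmetic and the choice of flag per encoder. Two caveats: `quality_args` is a hypothetical helper, and it checks for `'libvpx-vp9'` (the spelling offered by the CLI choices) where the committed list checks `'libvpx'`, which is an assumption about the intent. Also, as far as ffmpeg's conventions go, lower `-crf`, `-cq` and `-q:v` values generally mean higher quality, so a larger percentage here maps to stronger compression.

```python
from typing import List


def scale_temp_frame_quality(percent: int) -> int:
    # same integer arithmetic as extract_frames in this commit
    return percent * 31 // 100


def scale_output_video_quality(percent: int) -> int:
    # same integer arithmetic as create_video in this commit
    return (percent + 1) * 51 // 100


def quality_args(encoder: str, percent: int) -> List[str]:
    # hypothetical helper; 'libvpx-vp9' is assumed here to match the CLI choice
    value = str(scale_output_video_quality(percent))
    if encoder in ['libx264', 'libx265', 'libvpx-vp9']:
        return ['-crf', value]
    if encoder in ['h264_nvenc', 'hevc_nvenc']:
        return ['-cq', value]
    return []


for percent in (0, 35, 99):
    print(percent, scale_temp_frame_quality(percent), scale_output_video_quality(percent))
print(quality_args('hevc_nvenc', 35))
```

The default of 35 lands on 18, which matches the default of the `--video-quality` option this commit replaces.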