Next (#766)

* Check frames before processing
* Enhance on bbox by factor 1.5
* Introduce ARGS for temporary frames
* Improve README
* Improve README
* Bump version
* Rework on ffmpeg encoders and quality ranges
* Rework on ffmpeg encoders and quality ranges
* Update README
* Fix range in CLI
* Update gui demo
2025-09-26 20:31:16 +08:00 · 2023-07-25 16:26:12 +02:00
parent 312208a411
commit 2637b1f1e0
7 changed files with 80 additions and 46 deletions
--- a/README.md
+++ b/README.md
@@ -1,27 +1,31 @@
 Take a video and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training.
-You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing). A StableDiffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
+You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing).
 A Stable Diffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
 ![demo-gif](demo.gif)
 ## Disclaimer
 This software is meant to be a productive contribution to the rapidly growing AI-generated media industry. It will help artists with tasks such as animating a custom character or using the character as a model for clothing etc.
 The developers of this software are aware of its possible unethical applications and are committed to take preventative measures against them. It has a built-in check which prevents the program from working on inappropriate media including but not limited to nudity, graphic content, sensitive material such as war footage etc. We will continue to develop this project in the positive direction while adhering to law and ethics. This project may be shut down or include watermarks on the output if requested by law.
 Users of this software are expected to use this software responsibly while abiding the local law. If face of a real person is being used, users are suggested to get consent from the concerned person and clearly mention that it is a deepfake when posting content online. Developers of this software will not be responsible for actions of end-users.
-## How do I install it?
+## How to install?
 ### Basic
-It is more likely to work on your computer but it will also be very slow. You can follow instructions for the basic install [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
+It is more likely to work on your computer, but will be quite slow. Follow instructions for the basic installation [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
 ### Acceleration
-If you have a good GPU and are ready for solving any software issues you may face, you can enable GPU which is wayyy faster. To do this, first follow the basic install instructions given above and then follow GPU-specific instructions [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
+If you own a capable GPU and are prepared to address any software problems, you have the option to activate such acceleration, which offers significantly enhanced speed. Once you finished the basic installation, you can follow the instructions for the acceleration installation [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
-## How do I use it?
+## How to use?
 ### UI
 Executing `python run.py` command will launch this window:
@@ -29,33 +33,38 @@ Executing `python run.py` command will launch this window:
 Choose a face (image with desired face) and the target image/video (image/video in which you want to replace the face) and click on `Start`. Open file explorer and navigate to the directory you select your output to be in. You will find a directory named `<video_title>` where you can see the frames being swapped in realtime. Once the processing is done, it will create the output file. That's it.
-Additional command line arguments are given below. To learn out what they do, check [this guide](https://github.com/s0md3v/roop/wiki/Advanced-Options).
+## CLI
 Additional command line arguments are given below. To learn out what they do, check the guide [here](https://github.com/s0md3v/roop/wiki/Advanced-Options).
 ```
 options:
-  -h, --help                                               show this help message and exit
+  -h, --help                                                                 show this help message and exit
-  -s SOURCE_PATH, --source SOURCE_PATH                     select an source image
+  -s SOURCE_PATH, --source SOURCE_PATH                                       select an source image
-  -t TARGET_PATH, --target TARGET_PATH                     select an target image or video
+  -t TARGET_PATH, --target TARGET_PATH                                       select an target image or video
-  -o OUTPUT_PATH, --output OUTPUT_PATH                     select output file or directory
+  -o OUTPUT_PATH, --output OUTPUT_PATH                                       select output file or directory
-  --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...]  frame processors (choices: face_swapper, face_enhancer, ...)
+  --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...]                    frame processors (choices: face_swapper, face_enhancer, ...)
-  --keep-fps                                               keep target fps
+  --keep-fps                                                                 keep target fps
-  --keep-frames                                            keep temporary frames
+  --keep-frames                                                              keep temporary frames
-  --skip-audio                                             skip target audio
+  --skip-audio                                                               skip target audio
-  --many-faces                                             process every face
+  --many-faces                                                               process every face
-  --reference-face-position REFERENCE_FACE_POSITION        position of the reference face
+  --reference-face-position REFERENCE_FACE_POSITION                          position of the reference face
-  --reference-frame-number REFERENCE_FRAME_NUMBER          number of the reference frame
+  --reference-frame-number REFERENCE_FRAME_NUMBER                            number of the reference frame
-  --similar-face-distance SIMILAR_FACE_DISTANCE            face distance used for recognition
+  --similar-face-distance SIMILAR_FACE_DISTANCE                              face distance used for recognition
-  --video-encoder {libx264,libx265,libvpx-vp9}             adjust output video encoder
+  --temp-frame-format {jpg,png}                                              image format used for frame extraction
-  --video-quality [0-51]                                   adjust output video quality
+  --temp-frame-quality [1-100]                                               image quality used for frame extraction
-  --max-memory MAX_MEMORY                                  maximum amount of RAM in GB
+  --output-video-encoder {libx264,libx265,libvpx-vp9,h264_nvenc,hevc_nvenc}  encoder used for the output video
-  --execution-provider {cpu} [{cpu} ...]                   available execution provider (choices: cpu, ...)
+  --output-video-quality [1-100]                                             quality used for the output video
-  --execution-threads EXECUTION_THREADS                    number of execution threads
+  --max-memory MAX_MEMORY                                                    maximum amount of RAM in GB
-  -v, --version                                            show program's version number and exit
+  --execution-provider {cpu} [{cpu} ...]                                     available execution provider (choices: cpu, ...)
  --execution-threads EXECUTION_THREADS                                      number of execution threads
  -v, --version                                                              show program's version number and exit
 ```
-Looking for a CLI mode? Using the -s/--source argument will make the run program in cli mode.
+Using the `-s/--source`, `-t/--target` and `-o/--output` argument will run the program in headless mode.
 ## Credits
 - [henryruhs](https://github.com/henryruhs): for being an irreplaceable contributor to the project
 - [ffmpeg](https://ffmpeg.org/): for making video related operations easy
 - [deepinsight](https://github.com/deepinsight): for their [insightface](https://github.com/deepinsight/insightface) project which provided a well-made library and models.
--- a/gui-demo.png
+++ b/gui-demo.png
--- a/roop/core.py
+++ b/roop/core.py
@@ -44,8 +44,10 @@ def parse_args() -> None:
    program.add_argument('--reference-face-position', help='position of the reference face', dest='reference_face_position', type=int, default=0)
    program.add_argument('--reference-frame-number', help='number of the reference frame', dest='reference_frame_number', type=int, default=0)
    program.add_argument('--similar-face-distance', help='face distance used for recognition', dest='similar_face_distance', type=float, default=0.85)
-    program.add_argument('--video-encoder', help='adjust output video encoder', dest='video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9'])
+    program.add_argument('--temp-frame-format', help='image format used for frame extraction', dest='temp_frame_format', default='png', choices=['jpg', 'png'])
-    program.add_argument('--video-quality', help='adjust output video quality', dest='video_quality', type=int, default=18, choices=range(52), metavar='[0-51]')
+    program.add_argument('--temp-frame-quality', help='image quality used for frame extraction', dest='temp_frame_quality', type=int, default=0, choices=range(100), metavar='[1-100]')
    program.add_argument('--output-video-encoder', help='encoder used for the output video', dest='output_video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9', 'h264_nvenc', 'hevc_nvenc'])
    program.add_argument('--output-video-quality', help='quality used for the output video', dest='output_video_quality', type=int, default=35, choices=range(100), metavar='[1-100]')
    program.add_argument('--max-memory', help='maximum amount of RAM in GB', dest='max_memory', type=int)
    program.add_argument('--execution-provider', help='available execution provider (choices: cpu, ...)', dest='execution_provider', default=['cpu'], choices=suggest_execution_providers(), nargs='+')
    program.add_argument('--execution-threads', help='number of execution threads', dest='execution_threads', type=int, default=suggest_execution_threads())
@@ -65,8 +67,10 @@ def parse_args() -> None:
    roop.globals.reference_face_position = args.reference_face_position
    roop.globals.reference_frame_number = args.reference_frame_number
    roop.globals.similar_face_distance = args.similar_face_distance
-    roop.globals.video_encoder = args.video_encoder
+    roop.globals.temp_frame_format = args.temp_frame_format
-    roop.globals.video_quality = args.video_quality
+    roop.globals.temp_frame_quality = args.temp_frame_quality
    roop.globals.output_video_encoder = args.output_video_encoder
    roop.globals.output_video_quality = args.output_video_quality
    roop.globals.max_memory = args.max_memory
    roop.globals.execution_providers = decode_execution_providers(args.execution_provider)
    roop.globals.execution_threads = args.execution_threads
@@ -151,7 +155,7 @@ def start() -> None:
    # process image to videos
    if predict_video(roop.globals.target_path):
        destroy()
-    update_status('Creating temp resources...')
+    update_status('Creating temporary resources...')
    create_temp(roop.globals.target_path)
    # extract frames
    if roop.globals.keep_fps:
@@ -163,10 +167,14 @@ def start() -> None:
        extract_frames(roop.globals.target_path)
    # process frame
    temp_frame_paths = get_temp_frame_paths(roop.globals.target_path)
-    for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
+    if temp_frame_paths:
-        update_status('Progressing...', frame_processor.NAME)
+        for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
-        frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
+            update_status('Progressing...', frame_processor.NAME)
-        frame_processor.post_process()
+            frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
            frame_processor.post_process()
    else:
        update_status('Frames not found...')
        return
    # create video
    if roop.globals.keep_fps:
        fps = detect_fps(roop.globals.target_path)
@@ -186,6 +194,7 @@ def start() -> None:
            update_status('Restoring audio might cause issues as fps are not kept...')
        restore_audio(roop.globals.target_path, roop.globals.output_path)
    # clean temp
    update_status('Cleaning temporary resources...')
    clean_temp(roop.globals.target_path)
    # validate video
    if is_video(roop.globals.target_path):
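Review note on the quality arguments in `parse_args`: both `--temp-frame-quality` and `--output-video-quality` pass `choices=range(100)` while advertising `metavar='[1-100]'`. The sketch below is a standalone reproduction, not code from this commit; the `accepts` helper and the `demo` program name are hypothetical. It only shows which values `argparse` lets through for a given `range`.

```python
import argparse
import contextlib
import io


def accepts(value: str, choices: range) -> bool:
    """Return True if argparse accepts `value` for the given choices (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog='demo')
    # Same shape as the argument added in this commit; only `choices` varies.
    parser.add_argument('--output-video-quality', dest='output_video_quality', type=int, default=35, choices=choices, metavar='[1-100]')
    try:
        # argparse prints its error to stderr and exits; silence it and catch the exit.
        with contextlib.redirect_stderr(io.StringIO()):
            parser.parse_args(['--output-video-quality', value])
    except SystemExit:
        return False
    return True


if __name__ == '__main__':
    # range(100) covers 0..99, so 0 is accepted and 100 is rejected.
    print(accepts('0', range(100)), accepts('100', range(100)))
    # range(1, 101) would match the advertised [1-100] metavar.
    print(accepts('0', range(1, 101)), accepts('100', range(1, 101)))
```

Whether 0 is meant to be valid (it is the default for `--temp-frame-quality`) is for the author to decide; the sketch only makes the accepted range visible.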
--- a/roop/globals.py
+++ b/roop/globals.py
@@ -12,8 +12,10 @@ many_faces = None
 reference_face_position = None
 reference_frame_number = None
 similar_face_distance = None
-video_encoder = None
+temp_frame_format = None
-video_quality = None
+temp_frame_quality = None
 output_video_encoder = None
 output_video_quality = None
 max_memory = None
 execution_providers: List[str] = []
 execution_threads = None
--- a/roop/metadata.py
+++ b/roop/metadata.py
@@ -1,2 +1,2 @@
 name = 'roop'
-version = '1.2.0'
+version = '1.3.0'
--- a/roop/processors/frame/face_enhancer.py
+++ b/roop/processors/frame/face_enhancer.py
@@ -60,6 +60,12 @@ def post_process() -> None:
 def enhance_face(target_face: Face, temp_frame: Frame) -> Frame:
    start_x, start_y, end_x, end_y = map(int, target_face['bbox'])
    padding_x = int((end_x - start_x) * 0.5)
    padding_y = int((end_y - start_y) * 0.5)
    start_x = max(0, start_x - padding_x)
    start_y = max(0, start_y - padding_y)
    end_x = max(0, end_x + padding_x)
    end_y = max(0, end_y + padding_y)
    temp_face = temp_frame[start_y:end_y, start_x:end_x]
    if temp_face.size:
        with THREAD_SEMAPHORE:
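Review note on the padded bounding box in `enhance_face`: the hunk grows the box by half its width and height on each side and clamps all four coordinates with `max(0, ...)`. The sketch below restates that arithmetic as a pure function and, as an assumption that is not part of this commit, clamps the end coordinates against the frame size instead. `pad_bbox`, `frame_width` and `frame_height` are hypothetical names.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]


def pad_bbox(bbox: BBox, frame_width: int, frame_height: int, factor: float = 0.5) -> BBox:
    """Grow a (start_x, start_y, end_x, end_y) box by `factor` of its size per side."""
    start_x, start_y, end_x, end_y = bbox
    padding_x = int((end_x - start_x) * factor)
    padding_y = int((end_y - start_y) * factor)
    # Start coordinates are clamped at 0, as in the commit.
    start_x = max(0, start_x - padding_x)
    start_y = max(0, start_y - padding_y)
    # Assumption: end coordinates are clamped to the frame edge rather than to 0.
    end_x = min(frame_width, end_x + padding_x)
    end_y = min(frame_height, end_y + padding_y)
    return start_x, start_y, end_x, end_y


if __name__ == '__main__':
    print(pad_bbox((100, 100, 200, 200), frame_width=220, frame_height=240))
```

NumPy slicing tolerates an end index past the array edge, so the committed version still crops correctly; the explicit upper clamp mainly matters if the padded coordinates are reused elsewhere.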
--- a/roop/utilities.py
+++ b/roop/utilities.py
@@ -12,8 +12,8 @@ from tqdm import tqdm
 import roop.globals
 TEMP_FILE = 'temp.mp4'
 TEMP_DIRECTORY = 'temp'
 TEMP_VIDEO_FILE = 'temp.mp4'
 # monkey patch ssl for mac
 if platform.system().lower() == 'darwin':
@@ -21,7 +21,7 @@ if platform.system().lower() == 'darwin':
 def run_ffmpeg(args: List[str]) -> bool:
-    commands = ['ffmpeg', '-hide_banner', '-hwaccel', 'auto', '-loglevel', roop.globals.log_level]
+    commands = ['ffmpeg', '-hide_banner', '-loglevel', roop.globals.log_level]
    commands.extend(args)
    try:
        subprocess.check_output(commands, stderr=subprocess.STDOUT)
@@ -42,27 +42,35 @@ def detect_fps(target_path: str) -> float:
    return 30
-def extract_frames(target_path: str, fps: float = 30) -> None:
+def extract_frames(target_path: str, fps: float = 30) -> bool:
    temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-i', target_path, '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.png')])
+    temp_frame_quality = roop.globals.temp_frame_quality * 31 // 100
    return run_ffmpeg(['-hwaccel', 'auto', '-i', target_path, '-q:v', str(temp_frame_quality), '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format)])
-def create_video(target_path: str, fps: float = 30) -> None:
+def create_video(target_path: str, fps: float = 30) -> bool:
    temp_output_path = get_temp_output_path(target_path)
    temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.png'), '-c:v', roop.globals.video_encoder, '-crf', str(roop.globals.video_quality), '-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    output_video_quality = (roop.globals.output_video_quality + 1) * 51 // 100
    commands = ['-hwaccel', 'auto', '-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format), '-c:v', roop.globals.output_video_encoder]
    if roop.globals.output_video_encoder in ['libx264', 'libx265', 'libvpx']:
        commands.extend(['-crf', str(output_video_quality)])
    if roop.globals.output_video_encoder in ['h264_nvenc', 'hevc_nvenc']:
        commands.extend(['-cq', str(output_video_quality)])
    commands.extend(['-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
    return run_ffmpeg(commands)
 def restore_audio(target_path: str, output_path: str) -> None:
    temp_output_path = get_temp_output_path(target_path)
-    done = run_ffmpeg(['-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
+    done = run_ffmpeg(['-hwaccel', 'auto', '-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
    if not done:
        move_temp(target_path, output_path)
 def get_temp_frame_paths(target_path: str) -> List[str]:
    temp_directory_path = get_temp_directory_path(target_path)
-    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.png')))
+    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.' + roop.globals.temp_frame_format)))
 def get_temp_directory_path(target_path: str) -> str:
@@ -73,7 +81,7 @@ def get_temp_directory_path(target_path: str) -> str:
 def get_temp_output_path(target_path: str) -> str:
    temp_directory_path = get_temp_directory_path(target_path)
-    return os.path.join(temp_directory_path, TEMP_FILE)
+    return os.path.join(temp_directory_path, TEMP_VIDEO_FILE)
 def normalize_output_path(source_path: str, target_path: str, output_path: str) -> Optional[str]:
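Review note on the quality mapping in `extract_frames` and `create_video`: the CLI values are rescaled onto ffmpeg's scales with integer arithmetic, and the flag (`-crf` or `-cq`) depends on the encoder. The sketch below copies those expressions into standalone functions so the numbers can be checked; the function names are hypothetical and the encoder lists are taken verbatim from the hunk.

```python
from typing import List


def scale_temp_frame_quality(quality: int) -> int:
    # Same expression as in extract_frames: CLI value mapped onto 0..31 for -q:v.
    return quality * 31 // 100


def scale_output_video_quality(quality: int) -> int:
    # Same expression as in create_video: CLI value mapped onto 0..51.
    return (quality + 1) * 51 // 100


def quality_flags(encoder: str, quality: int) -> List[str]:
    """Return the ffmpeg quality flags the hunk would add for `encoder`."""
    scaled = scale_output_video_quality(quality)
    flags: List[str] = []
    if encoder in ['libx264', 'libx265', 'libvpx']:
        flags.extend(['-crf', str(scaled)])
    if encoder in ['h264_nvenc', 'hevc_nvenc']:
        flags.extend(['-cq', str(scaled)])
    return flags


if __name__ == '__main__':
    print(quality_flags('libx264', 35))
    print(quality_flags('h264_nvenc', 35))
    print(quality_flags('libvpx-vp9', 35))
```

Two observations follow from the arithmetic. The default of 35 maps to 18, the previous `--video-quality` default. The membership test uses `'libvpx'` while the CLI choice is `'libvpx-vp9'`, so no quality flag is added for that encoder as written. Also worth checking: ffmpeg's `-crf` treats lower values as higher quality, so a larger CLI value yields a lower-quality encode under this mapping.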