DATASET/FACESET GUIDE FOR REQUESTERS/NON-CREATORS.
Help creators by preparing a dataset/faceset of your celebrity.
The community needs you!
We can only make adorable deepfakes when we have good face sets of each celebrity to train the models and render a high quality fake.
To share your face set you can join the community and upload the set at www.deepfakestudios.com
NOTE FOR USERS WHO ARE UNABLE TO USE DFL OR ANY OTHER FACE SWAP SOFTWARE!
You can still make a dataset by collecting good quality photos and videos and then extracting the frames from the videos. This produces a larger zip file to upload, but it is still more helpful for creators than nothing. You can extract frames from videos with FFMPEG (the same software DFL uses) or with any video editing software.
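As a minimal sketch of the FFMPEG route, the Python helper below only builds the command line that dumps every frame of a video as numbered images. The function name, the "frames" folder, the "src1" prefix and the PNG output format are assumptions made for illustration; ffmpeg has to be installed separately, and the output folder must already exist.

```python
from pathlib import Path


def build_ffmpeg_cmd(video, out_dir, prefix="frame", image_ext="png"):
    """Return the ffmpeg argument list that writes each frame of `video`
    to out_dir as <prefix>_00001.<ext>, <prefix>_00002.<ext>, ..."""
    # %05d is ffmpeg's image-sequence pattern: a zero-padded frame counter.
    pattern = Path(out_dir) / f"{prefix}_%05d.{image_ext}"
    return ["ffmpeg", "-i", str(video), str(pattern)]


# Typical use (hypothetical file names; requires ffmpeg on PATH):
#   import subprocess
#   Path("frames").mkdir(exist_ok=True)
#   subprocess.run(build_ffmpeg_cmd("interview1.mp4", "frames", "src1"), check=True)
```

Giving each video its own prefix keeps frames from different videos apart when they end up in the same folder.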
Before you start here is some terminology:
– data_src, src, source, celebrity faceset, celebrity dataset, source dataset, source images – images of the celebrity that are used in training the AI model.
– frame, frames – self-explanatory: individual frames extracted from video, located inside workspace/data_src.
– faces – aligned pictures of faces, located inside workspace/data_src/aligned.
To create a source faceset you will need software that is used to create deepfakes, this guide focuses on using DFL 2.0:
How to download
After downloading it just unpack it and you are pretty much ready to go.
To create a good quality source dataset you’ll need to find source material of your subject. That can be photos or videos; videos are preferred because they provide the variety of expressions and angles needed to cover all possible appearances of the face so that the model can learn it correctly. You can also combine videos and photos.
Find below some good sources to download videos:
Below are some more things that you need to ensure so that your source dataset is as good as it can be.
1. Videos/photos should cover all, or at least most, possible face/head angles – looking up, down, left, right, straight at the camera and everything in between. The best way to achieve this is to use more than one interview or to grab clips from movies instead of relying on a single interview (which will mostly feature one angle with some small variations).
NOTE #1: If DST does not contain certain angles (like side face profiles) there is no need to include them in SRC, for those cases such angles can be temporarily removed from source faceset/dataset but it’s good to always prepare datasets in such a way that they can work with dst that contains all the angles/expressions.
2. Videos/photos should cover all different facial expressions – that includes open/closed mouths, open/closed eyes, smiles, frowns, eyes looking in different directions – the more variety in expressions you can get the better results will be.
NOTE #2: To achieve less morphing of the result face (in other words, to have it look more like SRC) it’s best to achieve maximum variety using as few different sources as possible.
The more different sources are used, the higher the chance that the result face will look less like the source and that the model will require use of true face or longer training. This is especially true if the source faceset/dataset consists of many sources, each one unique, with each source making up only a small portion of the faceset.
#1 1000 faces, 3 different angles from 1 source – result face will look most like SRC.
#2 1000 faces, 3 different angles from 3 sources (each source for different angle) – slightly worse resemblance to SRC but still less morphing than example #3.
#3 1000 faces, 3 different angles from 3 sources (same angles visible in all 3 sources) – worst resemblance to SRC and most morphing to DST compared to example #1 and #2.
To combat morphing in example #3 the best thing would be to add more images from each source and try to remove similar angles/expressions from multiple sources and only leave the best ones.
On the other hand it’s good to have several lighting conditions for each angle (for example frontal soft/diffuse lighting plus soft lighting coming from the left and from the right; shots with harsh lighting and hard shadows should not be included), so it might be necessary to keep faces from similar angles from more than 3 sources. But if your dataset has 10 different sources of similar looking faces, remove most of them and only keep those that differ the most in lighting conditions and have the best quality (and cover all the other requirements).
3. Materials need to be consistent – you don’t want blurry, low resolution and compressed faces next to crisp, sharp and high quality ones, so you should use only the best quality materials you can find. If, however, you can’t find any, or certain angles/expressions are present only in a lower quality/blurry video/photo, then you should keep it. Some blurry faces can be beneficial to better generalize the face in the early stages of training. This also applies to features of the face: for male facesets you don’t want to mix faces with and without a beard, and similarly for female facesets you don’t want to mix faces with different makeup.
NOTE #3: Fixing female datasets with varying makeups doesn’t need to be performed if the only differences are slight shades of lips or eye shadows/other forms of makeup, in those cases it’s fine to leave them as they are in your dataset.
But if you have faces with no lipstick and light eye shadows next to faces with dark red lipstick and dark green/blue eye shadows, then you either want to remove the more extreme ones or color correct them in your photo or video editing software. You can de-saturate overall colors, apply selective color correction, apply a sepia filter at 40-80% strength or apply color matching to different faces with less makeup.
4. Most of it should be high quality – as mentioned above, you can use some blurry photos/videos but only when you can’t find certain expressions/face angles in others. No more than 1-5% of the dataset should be blurry.
5. Lighting should be consistent – some small shadows are OK but you shouldn’t include interviews with harsh, directional lighting, if possible try to use only those where shadows are soft and light is diffused. For LIAE architectures it may not be as important as it can handle lighting better but for DF architectures it’s important to have several lighting conditions for each face angle, preferably at least 3 (frontal diffuse, left and right soft shadows). Source faceset/dataset can contain faces of varying brightness but overly bright or dark faces should not be included.
6. If you are using only pictures or they are a majority of the dataset – make sure they meet all the requirements mentioned above. 20 or so pictures is not enough. Don’t even bother trying to make anything with so few pictures.
7. Keep the total number of faces in your source faceset/dataset under 10,000 – the benefits or disadvantages of using really big facesets/datasets (in excess of 15,000 – 20,000 faces) have not been proven yet. Technically, as long as the number of faces from each source is at least 500-1000 and similar angles/expressions don’t overlap too much, the issues mentioned in 2. (morphing and the result face not looking like SRC) shouldn’t occur, but such a big faceset/dataset is going to make training longer. The cleaner and more robust the faceset you make, the better the results you will achieve.
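The numeric rules of thumb above (under 10,000 faces in total, at least 500 faces per source, no more than about 5% blurry) can be checked with a small script. The sketch below assumes a hypothetical naming scheme in which every face file starts with its source name followed by an underscore (for example "set1_00001_0.jpg"), and it assumes the list of blurry files was put together by hand. Neither is something DFL produces for you.

```python
from collections import Counter


def source_of(filename, sep="_"):
    """Source label = text before the first separator (assumed naming scheme)."""
    return filename.split(sep, 1)[0]


def check_faceset(filenames, blurry=(), max_total=10_000,
                  min_per_source=500, max_blurry_share=0.05):
    """Count faces per source and list warnings for the guide's rules of thumb."""
    counts = Counter(source_of(name) for name in filenames)
    total = sum(counts.values())
    warnings = []
    if total > max_total:
        warnings.append(f"{total} faces exceeds the suggested cap of {max_total}")
    for src, n in sorted(counts.items()):
        if n < min_per_source:
            warnings.append(
                f"source {src!r} has only {n} faces "
                f"(suggested minimum {min_per_source})")
    if total and len(blurry) / total > max_blurry_share:
        warnings.append(
            f"{len(blurry)} of {total} faces are marked blurry "
            f"(more than {max_blurry_share:.0%})")
    return counts, warnings
```

The per-source counts also show whether a single source dominates the set or whether many sources each contribute only a small share, which is the situation NOTE #2 warns about.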
After you’ve collected all videos:
1. Extract individual frames from videos.
The following instructions are for when you have a couple of videos/interviews to extract. If you only have a single video and some pictures, skip to step 2. Align faces.
If you have a few videos it’s best to combine them all using video editing software like Vegas Pro/Resolve/AE/Premiere into one data_src.mp4 file (in that case also skip to step 2. Align faces.).
Otherwise you’ll have to extract frames from each one one by one.
To do so copy the first source video over to the “workspace” folder, rename it to data_src.mp4 and start extracting frames from it using 2) extract images from video data_src.
After the process finishes extracting frames, delete the data_src.mp4 file from the “workspace” folder, go into the “data_src” folder and rename all extracted frames to something like “srcSET1” by selecting all frames with ctrl + A and renaming with F2. We do this so that when we start extracting more frames from another data_src.mp4 file we don’t overwrite the old ones. Then copy over the next source video, name it data_src, extract, select all files with ctrl + A, rename with F2, delete data_src, bring another one over and repeat until you finish them all. If it seems like a lot of work, it is. That’s why you should edit all videos into one and extract that; it saves a lot of time, especially if you have 10-20 source videos.
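The rename step can also be scripted instead of done with ctrl + A and F2. The sketch below is one hedged way to do it: it assumes that freshly extracted frames have purely numeric names such as 00001.png, so anything that already carries a set name is left alone. The function name and the "srcSET2" prefix are illustrative.

```python
from pathlib import Path


def prefix_frames(folder, prefix, extensions=(".jpg", ".jpeg", ".png")):
    """Rename freshly extracted frames (numeric names like 00001.png) to
    '<prefix>_<original name>' and return how many files were renamed."""
    renamed = 0
    # sorted() materialises the listing first, so renaming while looping is safe.
    for path in sorted(Path(folder).iterdir()):
        is_image = path.is_file() and path.suffix.lower() in extensions
        # Only purely numeric stems are touched; already prefixed files are skipped.
        if is_image and path.stem.isdigit():
            path.rename(path.with_name(f"{prefix}_{path.name}"))
            renamed += 1
    return renamed


# Typical use after extracting the second video (hypothetical path):
#   prefix_frames("workspace/data_src", "srcSET2")
```

Run it once after each extraction with a new prefix, before bringing over the next data_src.mp4.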
2. Align faces.
After you’ve extracted all frames from videos copy all your pictures (if you have any) into data_src folder and run face extraction/alignment process using either 4) data_src extract full_face S3FD or 4) data_src extract whole_face S3FD (if you are planning on training whole face model you have to extract with the right process/method).
After extraction finishes it will create a new folder “aligned” inside the “data_src” folder, containing the aligned/extracted faces visible in the photos and video frames you extracted in step 1. This process is not perfect and requires manual cleanup: you go through all the aligned faces and delete all blurry, too dark and too bright faces, as well as ones that don’t belong to the celebrity whose faceset you are making (the script will detect ALL FACES in your extracted frames/photos) and ones that were incorrectly aligned (rotated upside down, etc).
To help you in this DFL comes with a sorting process that contains a wide range of sorting methods:
 blur – sorts by image blurriness (determined by contrast). Slow.
 face yaw direction – sorts by yaw (from faces looking to left to looking right).
 face pitch direction – sorts by pitch (from faces looking up to looking down).
 face rect size in source image – sorts by size of the face on the original frame (from biggest to smallest faces). Much faster than blur.
 histogram similarity – sorts by histogram similarity, dissimilar faces at the end, useful for removing drastically different looking faces, also groups them together.
 histogram dissimilarity – as above but dissimilar faces are at the beginning.
 brightness – sorts by overall image/face brightness.
 hue – sorts by hue.
 amount of black pixels – sorts by amount of completely black pixels (such as when face is cut off from frame and only partially visible).
 original filename – sorts by original filename (of the frames from which faces were extracted).
 one face in image – sorts faces in order of how many faces were in the original frame.
 absolute pixel difference – sorts by absolute pixel-wise difference between images, useful to remove drastically different looking faces.
 best faces – sorts by several factors including blur and removes duplicates/similar faces, has a target of how many faces we want to have after sorting, discarded faces are moved to folder “aligned_trash”.
 best faces faster – similar to best faces but uses face rect size in source image instead of blur to determine quality of faces, much faster than best faces.
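To give a feel for what a blur sort does, here is a small illustrative sketch that scores sharpness as the variance of a Laplacian filter, a common stand-in for blur detection. It is not DFL's own implementation. It works on plain 2D lists of grayscale values (at least 3x3) so that it runs without extra libraries; loading real images into such grids is left out and would need a library such as Pillow or OpenCV.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels of a
    2D grayscale grid (list of rows, at least 3x3). Higher means sharper."""
    h, w = len(gray), len(gray[0])
    values = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sort_by_sharpness(faces):
    """faces maps a name to its grayscale grid; returns names, sharpest first."""
    return sorted(faces, key=lambda name: laplacian_variance(faces[name]),
                  reverse=True)
```

A flat, featureless image scores 0, while an image with lots of fine detail scores high, so the blurriest faces end up at the end of the list where they are easy to review and delete.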
After that you will have a clean dataset. All you need to do now is zip that “aligned” folder, upload it to mega/google drive and post the link in a request thread so creators have an easier job fulfilling your requests. You can also use 5.2) data_src util faceset pack and then later 5.2) data_src util faceset unpack to quickly pack/unpack the whole source dataset into a single file.
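If you prefer to script the zipping step, the standard library can do it. This is a minimal sketch; the function name and the archive name "faceset" are assumptions, and the path you pass in should be your own data_src folder.

```python
import shutil
from pathlib import Path


def zip_aligned(data_src_dir, archive_name="faceset"):
    """Zip the 'aligned' folder found inside data_src_dir.
    The archive is written next to it; the path to the .zip is returned."""
    data_src = Path(data_src_dir)
    if not (data_src / "aligned").is_dir():
        raise FileNotFoundError(f"no 'aligned' folder inside {data_src}")
    # root_dir/base_dir keep the paths inside the archive as 'aligned/<file>'.
    return shutil.make_archive(str(data_src / archive_name), "zip",
                               root_dir=str(data_src), base_dir="aligned")


# Typical use (hypothetical path):
#   zip_aligned("workspace/data_src")
```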
You can upload your datasets here: www.deepfakestudios.com
Join the club to find only high quality face sets that make the job easier for those who will create the deepfake.