Model specification
• Detection of face bounding box: The model should accurately detect the bounding box of faces within an image. We don’t require landmarks.
• Face capture quality score: Similar to Apple's VNDetectFaceCaptureQualityRequest, this metric should provide a holistic assessment of face capture quality. It considers various factors such as scene lighting, blur, occlusion, expression, pose, focus, and more. The score ranges from 0.0 to 1.0 and helps in ranking multiple captures of the same person. The model should penalize captures with low light, bad focus, or negative expressions.
• Face features fingerprint for face recognition: Using techniques akin to ArcFace, the model should generate a feature vector (fingerprint) for each face that enables accurate face recognition. The goal is to recognize the same person across different images.
• Optional features (if feasible):
◦ Eyes open probability
◦ Smiling probability
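The ranking use case for the quality score can be illustrated with a minimal sketch. The `Capture` class and the `rankByQuality` function are hypothetical names introduced only for this example, and placing captures without a score last is an assumption, not a stated requirement.

```kotlin
/** Hypothetical minimal capture record; only the quality score matters here. */
data class Capture(val id: String, val quality: Double?)

/**
 * Returns the captures ordered best-first by quality score (0.0 to 1.0).
 * Captures without a score are placed last (an assumption for this sketch).
 */
fun rankByQuality(captures: List<Capture>): List<Capture> =
    captures.sortedByDescending { it.quality ?: Double.NEGATIVE_INFINITY }

fun main() {
    val captures = listOf(
        Capture("a.jpg", 0.42),
        Capture("b.jpg", 0.91),
        Capture("c.jpg", null)
    )
    // The best capture of the same person comes first.
    println(rankByQuality(captures).map { it.id }) // prints [b.jpg, a.jpg, c.jpg]
}
```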
Deliverables
• Model format: The model must be available in one of the following formats: ONNX, PyTorch, or TensorFlow. The choice of format should be made based on considerations of stability and performance.
• Source code: Complete source code for the model is required to enable transparency, customization, and further development.
• Accuracy evaluation script: An evaluation script must be provided to assess the accuracy of the model. This script should function similarly to the one available at https://github.com/biubug6/Pytorch_Retinaface/tree/master/widerface_evaluate.
• Apple Core ML conversion script: A script is needed to convert the model to Apple Core ML format, ensuring the best possible performance with iOS applications.
• Kotlin usage sample code: The model will be used in a Kotlin Multiplatform app targeting JVM, Android, and iOS. Working sample code that shows how to use the model must be provided, including any necessary translators. See the Interfaces section below.
• Commercial license: The model must be accompanied by a royalty-free license allowing usage in closed-source end-user commercial applications.
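Detection evaluation scripts of the kind referenced above typically match predicted boxes to ground-truth boxes by intersection over union (IoU). The following is a minimal sketch of that building block, not a port of the referenced script. The names `Box`, `iou`, and `countMatched` are hypothetical, and the 0.5 threshold is an assumed default.

```kotlin
/** Axis-aligned box; hypothetical type mirroring the shape of BoundingBox in the Interfaces section. */
data class Box(val x: Double, val y: Double, val width: Double, val height: Double)

/** Intersection over union of two boxes, in the range 0.0 to 1.0. */
fun iou(a: Box, b: Box): Double {
    val overlapW = minOf(a.x + a.width, b.x + b.width) - maxOf(a.x, b.x)
    val overlapH = minOf(a.y + a.height, b.y + b.height) - maxOf(a.y, b.y)
    if (overlapW <= 0.0 || overlapH <= 0.0) return 0.0 // boxes do not overlap
    val intersection = overlapW * overlapH
    val union = a.width * a.height + b.width * b.height - intersection
    return if (union > 0.0) intersection / union else 0.0
}

/**
 * Counts ground-truth boxes covered by at least one detection at the given IoU threshold.
 * This is a simplified recall count; a full evaluation would also rank detections by
 * confidence and compute average precision.
 */
fun countMatched(groundTruth: List<Box>, detections: List<Box>, threshold: Double = 0.5): Int =
    groundTruth.count { gt -> detections.any { iou(gt, it) >= threshold } }
```

Dividing `countMatched` by the number of ground-truth boxes gives a rough recall figure that can serve as a sanity check before running a full evaluation.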
Accuracy & Performance
• Face detection accuracy: The model must achieve accuracy comparable to RetinaFace or BlazeFace (Google MediaPipe). For reference, PyTorch_Retinaface achieves 95% accuracy on WiderFace-Easy, while BlazeFace attains 98% accuracy. Therefore, the face detection accuracy should not fall below 93% for detecting bounding boxes.
• Face recognition accuracy: Face recognition accuracy should be at least 95%. For reference, ArcFace achieves 99%.
• Quality score comparison: The quality score generated by the model will be compared to Apple Vision results. The model's score should deviate from Apple's prediction by no more than ±0.1.
• Model size: The model should be smaller than 500 MB, where feasible, to ensure efficient deployment and usage.
• Inference speed: Processing speed depends heavily on the dimensions of the input image. Google ML Kit suggests input images with a long side of 480px, a size we have found suitable for PyTorch_RetinaFace so far. However, we believe larger inputs are necessary to assess face capture quality accurately. Excluding byte and model loading times, PyTorch_RetinaFace running on an M1 Pro CPU via DeepJavaLibrary without hardware acceleration needs around 500ms for 480x320px photos and 1500ms for 960x640px photos. We expect the new model to need additional processing time: a 960x640px image on the JVM without hardware acceleration should be processed within 2 to 3 seconds. On the iOS Simulator (M1 Pro CPU), Apple Vision takes around 50ms to detect bounding boxes and evaluate face quality in a 960x640px image, thanks to hardware acceleration. For the new model in the same setting, up to 100ms is acceptable.
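The quality score criterion above could be checked along these lines. This is a minimal sketch: `fractionWithinTolerance` is a hypothetical helper, and it assumes the model scores and the Apple Vision reference scores for the same images are already available as two parallel lists.

```kotlin
import kotlin.math.abs

/**
 * Fraction of samples whose model score lies within `tolerance` of the reference score.
 * Both lists are assumed to hold scores for the same images in the same order.
 */
fun fractionWithinTolerance(
    model: List<Double>,
    reference: List<Double>,
    tolerance: Double = 0.1
): Double {
    require(model.size == reference.size) { "Score lists must have the same length" }
    if (model.isEmpty()) return 0.0
    val hits = model.zip(reference).count { (m, r) -> abs(m - r) <= tolerance }
    return hits.toDouble() / model.size
}
```

Because the comparison uses floating-point subtraction, scores that differ by exactly the tolerance may fall on either side of the limit; a small epsilon can be added to the tolerance if that matters.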
Platform-specific requirements
• iOS: Utilize Apple Core ML for optimal performance on iOS devices.
• JVM: Choose from DeepJavaLibrary, ONNX runtime, or any other framework licensed under Apache 2, MIT, or BSD for deployment on Java Virtual Machines.
• Android: We welcome suggestions for the framework or library that delivers the best performance on Android. We are considering options such as the ONNX runtime or PyTorch Mobile. Note that we found DeepJavaLibrary unsuitable for our needs on this platform.
Interfaces
An implementation of the following interface for each platform is required:
interface FaceDetector {
    fun detectFaces(jpegBytes: ByteArray): Set<DetectedFace>
}
With the following classes:
data class DetectedFace(
    /** Relative bounds of the face in the picture. */
    val boundingBox: BoundingBox,
    /** Result of the face feature computation to detect the same person. */
    val features: FloatArray?,
    /** Capture quality metric (blur, sharpness, etc.). */
    val quality: Double?,
    // optional
    val eyesOpenProbability: Double?,
    // optional
    val smilingProbability: Double?
)

data class BoundingBox(
    val posX: Double,
    val posY: Double,
    val width: Double,
    val height: Double
)
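The comment on `boundingBox` describes the bounds as relative. One possible reading, assumed here and not confirmed by the specification, is that all four values are normalized to the range 0.0 to 1.0 by the image width and height. Under that assumption, a detector that produces pixel coordinates could convert them as sketched below; `toRelative` is a hypothetical helper, and `BoundingBox` is repeated only to keep the example self-contained.

```kotlin
data class BoundingBox(
    val posX: Double,
    val posY: Double,
    val width: Double,
    val height: Double
)

/** Converts a pixel-space box to image-relative coordinates (assumed convention: 0.0 to 1.0). */
fun toRelative(
    leftPx: Int,
    topPx: Int,
    widthPx: Int,
    heightPx: Int,
    imageWidth: Int,
    imageHeight: Int
): BoundingBox {
    require(imageWidth > 0 && imageHeight > 0) { "Image dimensions must be positive" }
    return BoundingBox(
        posX = leftPx.toDouble() / imageWidth,
        posY = topPx.toDouble() / imageHeight,
        width = widthPx.toDouble() / imageWidth,
        height = heightPx.toDouble() / imageHeight
    )
}
```

Relative coordinates stay valid when the caller displays the image at a different resolution than the one used for inference.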
Additionally, code is required to determine whether two faces belong to the same person based on their respective facial features.
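A common way to compare ArcFace-style feature vectors is cosine similarity with a decision threshold. The following is a minimal sketch under that assumption. The function names are hypothetical, and the default threshold of 0.5 is a placeholder that would have to be tuned against the delivered model on a labeled validation set.

```kotlin
import kotlin.math.sqrt

/** Cosine similarity of two feature vectors, in the range -1.0 to 1.0. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Feature vectors must have the same length" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        val x = a[i].toDouble()
        val y = b[i].toDouble()
        dot += x * y
        normA += x * x
        normB += y * y
    }
    // A zero vector carries no identity information; treat it as dissimilar.
    if (normA == 0.0 || normB == 0.0) return 0.0
    return dot / (sqrt(normA) * sqrt(normB))
}

/** Returns true if the two faces are likely the same person (placeholder threshold). */
fun isSamePerson(a: FloatArray, b: FloatArray, threshold: Double = 0.5): Boolean =
    cosineSimilarity(a, b) >= threshold
```

Callers would pass the `features` arrays of two `DetectedFace` instances and handle the case where either is null before calling `isSamePerson`.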
Open questions
We have read that OpenVINO offers Java support, which could speed up inference on Intel CPUs. We would like assistance in evaluating this option.