#hiring

Stefan Oltmann

03/14/2024, 12:06 PM
Hello, we are currently seeking a freelance expert in Machine Learning who can develop a customized face detection & recognition model to meet our specific requirements, plus Kotlin code showing how to use it. I provide detailed information in the thread and would greatly appreciate your feedback. This is our initial draft outlining our needs. Please let us know if anything is unclear or requires further clarification.
Model specification
• Detection of face bounding box: The model should accurately detect the bounding box of faces within an image. We don't require landmarks.
• Face capture quality score: Similar to Apple's VNDetectFaceCaptureQualityRequest, this metric should provide a holistic assessment of face capture quality. It considers factors such as scene lighting, blur, occlusion, expression, pose, focus, and more. The score ranges from 0.0 to 1.0 and helps in ranking multiple captures of the same person. The model should penalize captures with low light, bad focus, or negative expressions.
• Face features fingerprint for face recognition: Using techniques akin to ArcFace, the model should generate a unique fingerprint for each face, enabling accurate face recognition. We want to identify persons in images.
• Optional features (if feasible):
  ◦ Eyes open probability
  ◦ Smiling probability

Deliverables
• Model format: The model must be available in one of the following formats: ONNX, PyTorch, or TensorFlow. The format should be chosen based on stability and performance.
• Source code: Complete source code for the model is required to enable transparency, customization, and further development.
• Accuracy evaluation script: An evaluation script must be provided to assess the accuracy of the model. It should work similarly to the one available at https://github.com/biubug6/Pytorch_Retinaface/tree/master/widerface_evaluate
• Apple Core ML conversion script: A script is needed to convert the model to Apple Core ML format, ensuring the best possible performance in iOS applications.
• Kotlin usage sample code: The model will be used in a Kotlin Multiplatform app for JVM, Android & iOS, and working sample code showing how to use it should be provided. This includes the development of the necessary input/output translators. See the interface section below.
• Commercial license: The model must be accompanied by a royalty-free license allowing usage in closed-source commercial end-user applications.

Accuracy & Performance
• Face detection accuracy: The model must achieve accuracy comparable to RetinaFace or BlazeFace (Google MediaPipe). For reference, PyTorch_Retinaface achieves 95% accuracy on WiderFace-Easy, while BlazeFace attains 98%. Therefore, the accuracy for detecting bounding boxes should not fall below 93%.
• Face recognition accuracy: We aim for a face recognition accuracy of at least 95%; for comparison, ArcFace achieves 99%.
• Quality score comparison: The quality scores generated by the model will be compared to Apple Vision results. The deviation from Apple's predictions should be within +/- 0.1.
• Model size: Where feasible, the model must be optimized to be less than 500 MB to ensure efficient deployment and usage.
• Inference speed: Processing speed depends heavily on the dimensions of the input image. Google ML Kit suggests input images with a long side of 480px, a size we've found suitable for PyTorch_RetinaFace so far. However, we believe larger input sizes are necessary to assess face capture quality accurately. Excluding byte and model loading times, PyTorch_RetinaFace on the M1 Pro CPU using DeepJavaLibrary without hardware acceleration takes around 500ms for 480x320px photos and 1500ms for 960x640px photos. We expect the updated model to need additional processing time, and anticipate that processing a 960x640px image on the JVM without hardware acceleration will take 2 to 3 seconds. On the iOS Simulator (M1 Pro CPU), Apple Vision takes around 50ms to detect bounding boxes and evaluate face quality in 960x640px input images, thanks to hardware acceleration. For the updated model, a threshold of 100ms is acceptable.
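The quality score criterion (deviation from Apple Vision within +/- 0.1) can be checked mechanically once both scores exist for the same faces. A minimal sketch, assuming the model scores and the Apple Vision reference scores are already aligned face by face; `QualityComparison` and `compareToReference` are hypothetical names, not part of any existing tool:

```kotlin
import kotlin.math.abs

/** Summary of how far model quality scores deviate from reference scores. */
data class QualityComparison(
    val maxDeviation: Double,
    val meanDeviation: Double,
    val withinTolerance: Int,
    val total: Int
)

/**
 * Compares model quality scores with reference scores (e.g. from Apple Vision)
 * for the same faces in the same order. A face counts as within tolerance
 * when |model - reference| <= tolerance.
 */
fun compareToReference(
    modelScores: List<Double>,
    referenceScores: List<Double>,
    tolerance: Double = 0.1
): QualityComparison {
    require(modelScores.size == referenceScores.size) { "Score lists must be aligned face by face" }
    require(modelScores.isNotEmpty()) { "Need at least one face to compare" }
    // Absolute deviation per face.
    val deviations = modelScores.zip(referenceScores) { model, reference -> abs(model - reference) }
    return QualityComparison(
        maxDeviation = deviations.maxOf { it },
        meanDeviation = deviations.average(),
        withinTolerance = deviations.count { it <= tolerance },
        total = deviations.size
    )
}
```

Reporting the maximum as well as the share of faces within tolerance makes it visible whether a few outliers or a systematic offset cause a failure.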
Platform-specific requirements
• iOS: Use Apple Core ML for optimal performance on iOS devices.
• JVM: Choose from DeepJavaLibrary, ONNX Runtime, or any other framework licensed under Apache 2, MIT, or BSD for deployment on the Java Virtual Machine.
• Android: We welcome suggestions for the framework or library that delivers the best performance on Android. We are considering options such as ONNX Runtime or PyTorch Mobile; note that we found DeepJavaLibrary unsuitable for our needs.

Interfaces
An implementation of the following interface is required for each platform:
```kotlin
interface FaceDetector {

    /** Detects all faces in the given JPEG image. */
    fun detectFaces(jpegBytes: ByteArray): Set<DetectedFace>
}
```
With the following classes:
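```kotlin
data class DetectedFace(

    /** Relative bounds of the face in the picture. */
    val boundingBox: BoundingBox,

    /** Result of the face feature computation to detect the same person. */
    val features: FloatArray?,

    /** Capture quality metric (blur, sharpness, etc.), ranging from 0.0 to 1.0. */
    val quality: Double?,

    /** Optional: probability that the eyes are open. */
    val eyesOpenProbability: Double?,

    /** Optional: probability that the person is smiling. */
    val smilingProbability: Double?
) {

    // Arrays compare by identity, so compare the feature vector by content.
    override fun equals(other: Any?): Boolean =
        other is DetectedFace &&
            boundingBox == other.boundingBox &&
            features.contentEquals(other.features) &&
            quality == other.quality &&
            eyesOpenProbability == other.eyesOpenProbability &&
            smilingProbability == other.smilingProbability

    override fun hashCode(): Int {
        var result = boundingBox.hashCode()
        result = 31 * result + features.contentHashCode()
        result = 31 * result + quality.hashCode()
        result = 31 * result + eyesOpenProbability.hashCode()
        result = 31 * result + smilingProbability.hashCode()
        return result
    }
}

data class BoundingBox(
    val posX: Double,
    val posY: Double,
    val width: Double,
    val height: Double
)
```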
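Since the bounding box is described in relative coordinates, a consumer will usually need to convert it back to pixels for cropping or drawing. A minimal sketch, ASSUMING (this is not stated in the spec) that `posX`/`posY` are the top-left corner and all four values are fractions of the image width and height in 0.0..1.0. `BoundingBox` is repeated from the interface so the snippet compiles on its own; `PixelRect` and `toPixels` are hypothetical:

```kotlin
import kotlin.math.roundToInt

// Repeated from the interface above so this snippet compiles on its own.
data class BoundingBox(
    val posX: Double,
    val posY: Double,
    val width: Double,
    val height: Double
)

/** Integer pixel rectangle, e.g. for cropping the face out of the decoded image. */
data class PixelRect(val left: Int, val top: Int, val width: Int, val height: Int)

/**
 * Converts relative bounds to pixels, ASSUMING posX/posY are the top-left
 * corner and all values are fractions (0.0..1.0) of the image size.
 * The result is clamped so it never extends past the image edges.
 */
fun BoundingBox.toPixels(imageWidth: Int, imageHeight: Int): PixelRect {
    val left = (posX * imageWidth).roundToInt().coerceIn(0, imageWidth)
    val top = (posY * imageHeight).roundToInt().coerceIn(0, imageHeight)
    val right = ((posX + width) * imageWidth).roundToInt().coerceIn(left, imageWidth)
    val bottom = ((posY + height) * imageHeight).roundToInt().coerceIn(top, imageHeight)
    return PixelRect(left, top, right - left, bottom - top)
}
```

Keeping the interface in relative coordinates means the result stays valid regardless of how far the JPEG was downscaled before inference.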
Additionally, code is required to determine whether two faces belong to the same person based on their respective facial features.

Open questions
We read that OpenVINO offers something for Java and could help accelerate inference on Intel CPUs. We would appreciate assistance with this.
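For the required same-person check, ArcFace-style feature vectors are commonly compared by cosine similarity against a threshold. A minimal sketch under that assumption; `cosineSimilarity`, `isSamePerson`, and especially the default threshold of 0.5 are hypothetical placeholders that would have to be calibrated on labeled face pairs for the actual model:

```kotlin
import kotlin.math.sqrt

/** Cosine similarity of two feature vectors, in the range -1.0..1.0. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Feature vectors must have the same length" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i].toDouble() * b[i]
        normA += a[i].toDouble() * a[i]
        normB += b[i].toDouble() * b[i]
    }
    // A zero vector carries no identity information; treat it as dissimilar.
    if (normA == 0.0 || normB == 0.0) return 0.0
    return dot / (sqrt(normA) * sqrt(normB))
}

/**
 * Decides whether two faces show the same person.
 * HYPOTHETICAL threshold: must be tuned for the trained model.
 */
fun isSamePerson(
    featuresA: FloatArray?,
    featuresB: FloatArray?,
    threshold: Double = 0.5
): Boolean {
    // Faces without computed features (DetectedFace.features == null) cannot be matched.
    if (featuresA == null || featuresB == null) return false
    return cosineSimilarity(featuresA, featuresB) >= threshold
}
```

The nullable parameters match `DetectedFace.features`, so two detected faces can be passed in directly. Raising the threshold trades missed matches for fewer false matches.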

Grzegorz Aniol

03/15/2024, 8:22 AM
Hi, my two cents: I think you are more likely to find two people, an ML expert in computer vision and a Kotlin developer with KMP knowledge. It's unlikely you will find one person who is an expert in both domains.

Stefan Oltmann

03/15/2024, 8:37 AM
Thanks for your feedback. Yes, maybe I should look for two people who can do this as a team. The problem is that a model alone does not help me much, because preparing its inputs and interpreting its outputs is also arcane knowledge to me. I'm glad that DeepJavaLibrary provided all of this in a sample showing how to use RetinaFace, because I would not have been able to write it myself. I have a budget for developing an advanced model.
If I find someone who can provide Java & Swift code showing how to do that, I can work on the Kotlin translation myself.

0xf1f1

03/19/2024, 7:52 PM
How much is your budget? And is there a timeframe you need it done by?

Stefan Oltmann

03/19/2024, 8:25 PM
No specific timeframe. Would be great to have it done in 6 months, but I have no idea how long it would take. What do you think it would cost?

0xf1f1

03/19/2024, 8:57 PM
How will it be deployed? What kind of quality are we looking at for the input JPEGs? And what accuracy levels do you require? 6-8 months seems reasonable. Cost-wise, I'd need to scope out the design specs first.

Stefan Oltmann

03/19/2024, 10:11 PM
Deployment will be on end-user devices as part of the Ashampoo Photos app. On the JVM we already bundle PyTorch and the RetinaFace model, which we want to replace with a more advanced model. The other targets are Android and iOS. We don't have a training data set yet. It would look like regular smartphone shots: a lot of selfies and portraits, many of them a bit blurry due to motion blur. The goal for the model is to identify persons in those images and also tell how good the capture quality of the photo is. Apple calls this faceCaptureQuality.
u

0xf1f1

03/19/2024, 10:25 PM
Do you have performance requirements? And I'm guessing the code base will need to be usable across Windows, iOS and Android? Is there any reason you are scrapping your existing setup? What features does it lack? And just to confirm: facial recognition will only be used on images and not videos?
s

Stefan Oltmann

03/19/2024, 11:34 PM
Yes, I described performance requirements/expectations above. It’s only for images. I don’t have an existing model that does all of that right now. I have face detection on all platforms, but I also want face capture quality and face recognition, all done by one model in one go. Apple’s model provides single-shot face detection and face capture quality, but no face recognition. My approach was to collect amateur photos from friends & family (unfiltered, including bad ones) and detect faces and face capture quality using Apple Vision. Then I would have added face features using ArcFace. Photos annotated with this data would be used to train a new model that learns to detect faces (like Apple Vision, RetinaFace or BlazeFace), determine their capture quality and deliver a fingerprint (facial feature extraction for face recognition). All of this, of course, as fast as possible and in a model that’s (ideally) below 500 MB. The only problem is that I don’t know how to write the model spec, how to prepare the data in a way that PyTorch would understand, or how to write input/output translators. This is why I’m looking for help.
Or maybe you’ll tell me that all of this is too much for one model and that Apple already worked a miracle by determining face capture quality along with face detection. 👀
Regarding the platforms, please see my description above. I’m open to suggestions.

0xf1f1

03/20/2024, 7:15 PM
This project does seem quite involved. Give me two weeks (I'm a bit busy at the moment) to consider some possible strategies, and I will get back to you. We can probably move to DMs too.