We provide a cutting-edge virtual try-on solution for head-mounted accessories such as glasses or ski masks. It works everywhere, without installing anything:
In a website, browsed from a desktop computer or a mobile device,
In a mobile application,
On a showcase display.
The glasses are displayed over the user's head in real time, using the video feed from their camera. If the device is too old, or if the user declines to share their camera, they can still upload a picture.
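The camera-or-upload behaviour can be implemented with standard browser APIs. A minimal sketch, assuming hypothetical DOM elements for the video and the file input:

```javascript
// Minimal sketch: try to open the camera with WebRTC (getUserMedia);
// if that fails (old device, permission denied), fall back to a picture upload.
// The two DOM elements passed in are hypothetical, not part of any Jeeliz API.
function setupVideoSource(videoElement, fileInputElement) {
  return navigator.mediaDevices.getUserMedia({ video: true })
    .then(function (stream) {
      videoElement.srcObject = stream; // live camera feed
      return videoElement.play();
    })
    .catch(function () {
      // Camera unavailable or refused: let the user upload a picture instead.
      fileInputElement.style.display = 'block';
      return new Promise(function (resolve) {
        fileInputElement.addEventListener('change', function (event) {
          const file = event.target.files[0];
          resolve(URL.createObjectURL(file)); // usable as an <img> source
        });
      });
    });
}
```

Either way, the try-on pipeline then receives an image source it can process, whether it updates in real time or not.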
We have developed a demonstration application, available here: https://jeeliz.com/sunglasses. We analyzed the behaviour of its users over a few months and showed that it increases conversion and commitment rates. The findings of this study are summarized in this blog article: Feedbacks on Jeeliz Sunglasses.
The glasses are rendered by a photorealistic 3D engine. The lighting is reconstructed dynamically: intensity, direction and hue. If the user is lit from the right, the glasses are lit from the right too.
How it works
Our application relies only on standardized and open technologies like HTML5, WebGL and WebRTC, which guarantees that it will not become obsolete soon. A deep learning neural network takes as input an image cropped from the camera video feed. It simultaneously outputs whether the image contains a face, the face orientation and the lighting parameters. We then render the glasses' 3D model over the video feed using our in-house 3D engine.
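Schematically, the per-frame pipeline could look like the sketch below. All function and property names here are illustrative placeholders, not the actual Jeeliz API:

```javascript
// Hypothetical sketch of the per-frame pipeline described above.
// Every name below is a placeholder, not the actual Jeeliz API.
function processFrame(videoFrame, cropFace, neuralNet, renderer) {
  const crop = cropFace(videoFrame);  // crop the camera feed around the face
  const out = neuralNet.run(crop);    // a single forward pass outputs everything
  if (!out.isFace) {
    renderer.hideGlasses();           // no face in the crop: hide the 3D model
    return;
  }
  renderer.setPose(out.faceRotation);                    // orient the 3D model
  renderer.setLighting(out.lightIntensity, out.lightDirection, out.lightHue);
  renderer.drawGlassesOver(videoFrame);                  // composite over the feed
}
```

The key point is that detection, pose and lighting come out of one forward pass, so the whole loop runs once per rendered frame.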
The video is fully processed client-side, so there is no need for expensive hosting, powerful servers or large bandwidth. Our deep learning engine is so fast that the neural network runs hundreds of times per second, even on mobile devices, so the glasses always stick to the user's head.
The glasses models are stored in a centralized database called GlassesDB. We have built a smooth, optimized workflow to quickly add and modify glasses models in this database. Model metadata can be automatically enriched with information from external websites, such as prices or customer ratings. We offer glasses modeling packages at attractive prices.
It is possible to run your own instance of the GlassesDB server, or to create a connector between our VTO module and your specific database of 3D models. Contact us for further details.
In your website/webapp
We can integrate a glasses VTO module directly into your website or web application, or you can ask your favorite web agency to do it: the widget is available in a GitHub repository here: https://github.com/jeeliz/jeelizGlassesVTOWidget.
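Integration boils down to loading the widget script and starting it on a container element. The sketch below is purely illustrative: the entry point and option names are assumptions and should be checked against the repository's README:

```javascript
// Illustrative integration sketch. The entry point and option names below
// are assumptions; the authoritative API is in the jeelizGlassesVTOWidget README.
function initTryOnWidget() {
  const container = document.getElementById('jeelizVTOWidget'); // hypothetical id
  JEELIZVTOWIDGET.start({
    placeHolder: container,          // DOM element hosting the try-on canvas
    callReady: function () {
      console.log('Try-on widget is ready');
    },
    onError: function (err) {
      console.log('Widget error:', err);
    }
  });
}
```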
We have also developed an embedded version of our VTO solution, perfect for running on a large HDMI TV display. Our product is cheaper, and its rendering quality higher, than comparable offerings. Contact us for more information.
At first glance, it may seem strange to use Jeeliz technology for embedded systems: its natural habitat is the web browser. But web programming seeps in everywhere:
in mobile applications with PWA (Progressive Web Apps),
in desktop applications with Electron,
server-side with Node.js,
and even in blockchain-based decentralized applications, with Lisk for example.
So why embedded systems? In this article we describe the main advantages of our technology over native technologies.
The web browser is becoming more and more a kind of virtual machine: it adds a level of abstraction between the application and the operating system. Because we develop embedded applications as web applications, we can easily upgrade the hardware or the operating system without having to recompile or transcode anything.
The life cycle of an embedded application is often long, so it is important to bet on a technology that is stable over time; otherwise maintenance and evolution may become difficult and expensive. Web technologies take long to specify because standards organizations like the W3C or the Khronos Group have to bring many parties into agreement. For example, the Khronos Group, which is in charge of the WebGL API, includes universities, browser makers, graphics hardware makers and operating system vendors. But once a standard is specified and implemented in the main web browsers, it remains stable for years, and evolutions are almost always backward compatible.
The development workflow
We have developed internally an efficient and integrated workflow from the initialization of a neural network to its integration. We can:
initialize the neural network using a live coding interface,
train it and control the training with a graphical user interface,
test the neural network,
compress and optimize the resulting library.
With this workflow, we build the libraries released on our GitHub repositories. So why not use them in an embedded context to build smart cameras, interactive advertising displays or virtual mirrors?
Write once, use everywhere
The embedded application may be reused on other platforms. For instance, an optical shop may want:
an advertising display next to its storefront, where passers-by can try on its latest sunglasses models, powered by an embedded version of the Jeeliz virtual try-on application,
a virtual try-on module integrated into its website, where customers can try glasses models before buying them online,
a mobile application with the same virtual try-on feature.
So it is important to be able to reuse the same component on all these devices. The standard to choose is the one that runs in the most constrained environment, and the most constrained environment is the web because:
it should run on any device, even the weakest,
it should be written with strict standards,
it should be secure.
So we choose web standards, and we find a way to use them in less constrained environments, including embedded ones.
As fast as native
Since our technology relies on deep learning neural networks, it is massively parallelizable and runs considerably faster on a GPU. Even if we used a native deep learning stack, it would still run faster on a GPU: a CPU is made to process intricate, sequential tasks, whereas a GPU is designed to run simple tasks in parallel. Most often these tasks consist of computing pixel colors, but we can just as well sum synaptic weights or run other computations.
So the speed-limiting factor is the GPU. The more powerful the GPU, the higher the detection rate per second: we can then detect faster, track more accurately, or use deeper, heavier but more accurate neural network models.
With WebGL, we can get the best out of the GPU in the web browser. WebGL is only a thin interface bound to DirectX, OpenGL or Vulkan, depending on the operating system. We can exploit 80% to 100% of the GPU, as in native applications.
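To make this concrete: general-purpose computation in WebGL is done by packing data into textures and running fragment shaders over them. A simplified sketch of a shader that accumulates weighted inputs, the core operation of a neural network layer (not our production shader, and the texture layout is an assumption):

```javascript
// Simplified GLSL fragment shader: each output pixel computes a weighted sum
// of input values, i.e. one neuron's activation. The packing of activations
// and weights into texture rows is an illustrative assumption.
const NEURON_SHADER_SOURCE = `
  precision highp float;
  uniform sampler2D uInputs;   // previous layer activations, packed in a texture
  uniform sampler2D uWeights;  // synaptic weights, packed in a texture
  uniform float uCount;        // number of inputs per neuron
  varying vec2 vUV;

  void main(void) {
    float sum = 0.0;
    // GLSL ES loops need constant bounds; break early at the real count.
    for (float i = 0.0; i < 256.0; i += 1.0) {
      if (i >= uCount) break;
      vec2 uv = vec2((i + 0.5) / 256.0, vUV.y);
      sum += texture2D(uInputs, uv).r * texture2D(uWeights, uv).r;
    }
    gl_FragColor = vec4(max(sum, 0.0)); // ReLU activation, written as a "color"
  }
`;
```

Drawing one full-screen quad with such a shader evaluates a whole layer in parallel, one neuron per output pixel.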
GPU-oriented hardware for embedded systems is now available: Nvidia released its first Jetson board (the Jetson TK1) in April 2014, and in August 2018 we successfully ran our deep learning framework on the Nvidia Jetson TX2. We have publicly released the setup we use to run web applications on this amazing hardware, called JetsonJS. The GitHub repository of the project is here: github.com/jeeliz/jetsonjs.
Jeeliz embedded solutions
The goal of the JeelizBox is to run Jeeliz applications in an embedded context. It is small and power-efficient, so it can be embedded anywhere; we even managed to put it into a thin advertising display. Its power consumption is under 50W, so it does not heat up much. The applications are stored on SD cards, which can be hot-swapped like Game Boy cartridges! The JeelizBox has the following features:
applications on hot-pluggable SD cards,
a WebSocket server to stream data to external devices,
full HD HDMI display output.
The JeelizBox software is based on JetsonJS.
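For instance, an external device could consume the detection data streamed by the box's WebSocket server as sketched below. The address, port and message schema here are hypothetical:

```javascript
// Hypothetical client for the JeelizBox WebSocket stream.
// The address, port and JSON message schema are illustrative assumptions.
function listenToBox(onDetection) {
  const socket = new WebSocket('ws://jeelizbox.local:8080');
  socket.onmessage = function (event) {
    const detection = JSON.parse(event.data);
    onDetection(detection); // e.g. an object with a label and 2D coordinates
  };
  socket.onclose = function () {
    // Simple reconnection strategy: retry after one second.
    setTimeout(function () { listenToBox(onDetection); }, 1000);
  };
  return socket;
}
```

Streaming over WebSocket keeps the box decoupled from whatever consumes its detections: a display, a logger, or another machine on the local network.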
We can also build custom hardware with JetsonJS. We are currently working on a pupillometer by embedding JeelizPupillometry with JetsonJS. We don’t use the JeelizBox for this purpose because we need very specific hardware:
an infrared light to illuminate the eyes for the IR camera,
a visible LED light to trigger pupil dilation,
a small 5-inch touchscreen display to control the device,
for later versions: a battery and a charge controller.
The augmented reality frameworks ARCore and ARKit, released by Google and Apple respectively, have popularized augmented reality. They rely on the computing power and 3D scanning capabilities of the latest high-end mobile devices to provide a smooth experience. The main technological breakthrough is their SLAM (Simultaneous Localization and Mapping) algorithms, which reconstruct the 3D environment and concomitantly compute the position and orientation of the device.
They pave the way for the next generation of augmented reality glasses. These devices will be costly because they will require high embedded computing power, a high-resolution display and expensive optical components, so people will buy them only if enough experiences are available. Thanks to the current mainstream augmented reality frameworks, applications for AR glasses will already be developed and tested.
But their main limitation is portability. Developing an augmented reality application is very expensive: it requires a carefully crafted user experience and 3D assets, and testing can be difficult. ARCore and ARKit are platform-specific, so an application developed for one framework must be rewritten for the other. Furthermore, the user has to install an application to access the augmented reality experience, so it cannot be shared with a single link.
The future: WebXR
Until now, only 8th Wall has built an augmented reality framework that works fully in the browser, without depending on external augmented reality engines like ARKit or ARCore. Their work is absolutely remarkable, and their demos run smoothly even on Android smartphones too old for ARCore. We just hope they will implement the WebXR interface soon.
Since WebXR is a web standard, only web-based libraries will be able to work with it. We cannot use CoreML, CUDA or other very powerful but native technologies; WebGL is the only way to access GPU hardware acceleration, both for computing and for rendering.
Augmented reality consists of overlaying virtual content on top of reality. The more the application understands the surrounding world, the more we can narrow the gap between the real and the virtual:
If we understand the 3D geometry of the room (SLAM), we can place 3D objects in the right place,
If we understand the lighting of the scene, we can render them coherently,
If we recognize people's faces, we can display custom information above each face…
In particular, object detection and tracking allow a deep understanding of the scene. We can imagine these scenarios:
Put a virtual avatar on a chair, because we have detected a chair and recognized it as one,
Replace all cars in the street by mammoths,
Increase the size of road signs as a driving aid…
There are different kinds of object detection. We enumerate them from the easiest to the hardest.
QR codes are flat patterns designed to be easily detected and decoded, and there are many efficient libraries to read them. But they have major drawbacks:
They are flat, so they cannot fit on every surface,
They must be printed on the object you want to detect,
They are ugly.
Unlike a QR code, an image can be beautiful and displayed in harmony with its environment. Image recognition algorithms are quite effective: for a given image, they compute a signature which should be robust to lighting conditions and geometric transformations. The SIFT (Scale-Invariant Feature Transform) algorithm is perhaps the most famous. These algorithms are often included in augmented reality frameworks, but they still have limitations:
They require a flat surface on the object to detect; otherwise they may work only from a specific viewing angle,
They cannot generalize: you can detect the painted portrait of a specific person, but you cannot detect all painted portraits, whoever the subject is.
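To give an intuition of what an image signature is, here is a toy "difference hash": it encodes the gradient structure of an image by comparing neighbouring pixels, which makes it robust to global lighting changes, though it is far simpler and weaker than SIFT:

```javascript
// Toy "difference hash" image signature: each bit records whether a pixel is
// darker than its right neighbour. Robust to uniform brightness changes,
// but far simpler and weaker than SIFT; for illustration only.
// Input: grayscale pixel values, row-major, for a (width x height) image.
function dHash(pixels, width, height) {
  const bits = [];
  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width - 1; x += 1) {
      const left = pixels[y * width + x];
      const right = pixels[y * width + x + 1];
      bits.push(left < right ? 1 : 0);
    }
  }
  return bits.join('');
}

// Signatures are compared with the Hamming distance (number of differing bits):
function hammingDistance(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i += 1) {
    if (a[i] !== b[i]) d += 1;
  }
  return d;
}
```

Adding a constant brightness offset to every pixel leaves the hash unchanged, which is exactly the kind of invariance a good signature needs, only pushed much further by real algorithms.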
3D Object detection
This is the hardest kind, and this is where our technology can help. The goal is to detect any 3D object at a chosen level of generalization, and the difficulty of the task depends on that level. For instance, detecting all vehicles is quite hard because the level of generalization is very high: there is a big difference between a motorbike, a truck and a car. Detecting all cars is easier. But detecting a specific model of car may be difficult too, especially if other car models have a very similar body shape; in the latter case we face a high level of specialization.
Some object detection algorithms rely on 3D scanning, comparing the scanned data with a reference mesh. But this approach requires 3D scanning capabilities embedded in the device and a 3D model of the reference object, and it won't work with deformable objects or objects with an inherent variability (plants, for example).
So we rather bet on deep learning:
First, we train a neural network to detect a specific 3D object,
Then, we load the pre-trained neural network in the final augmented reality application.
In this example, we trained a neural network to detect mugs. We load it in the final application and play a 3D animation when a mug is detected. The application is based on JeelizAR:
We have released a library for augmented reality, JeelizAR, available on GitHub at github.com/jeeliz/jeelizAR. We regularly add new neural network models to the repository, and we offer neural network design and training as a service.
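The two steps above translate into very little client code. In outline (every name below is an illustrative placeholder; the actual API is documented in the JeelizAR repository):

```javascript
// Outline of the detect-then-augment loop. All names are placeholders;
// see the JeelizAR repository README for the actual API.
function runDetectionLoop(detector, scene) {
  detector.loadNeuralNet('mugDetector.json') // network pre-trained offline
    .then(function () {
      function onFrame() {
        const detections = detector.detect(); // run on the current video frame
        detections.forEach(function (d) {
          // d gives the object label and its position in the camera view
          scene.playAnimationAt(d.x, d.y, d.label);
        });
        requestAnimationFrame(onFrame); // schedule the next frame
      }
      onFrame();
    });
}
```

The expensive part, training, happens once and offline; the browser only runs inference, frame after frame.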