I’m new to computer vision, and a lot of the basic concepts are very interesting. As an iOS developer, my interest comes from using CoreML & Apple’s Vision in apps to improve the user experience.
Two common tasks are classification and object detection. Classification identifies the dominant objects present in an image. For example, classification can tell you that a photo is probably of a car.
Object detection is much more difficult since it not only recognizes what objects are present, but also detects where they are in the image. This means that object detection can tell you that there is probably a car within these bounds of the image.
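To make "within these bounds" concrete: Vision reports a detected object's bounding box as a normalized rectangle (values from 0 to 1) with its origin in the lower-left corner, so it usually has to be converted before drawing. Here is a minimal sketch of that conversion, assuming a top-left pixel coordinate system as the target. The function name and parameters are my own illustration, not an Apple API.

```swift
import Foundation

/// Converts a normalized bounding box (0...1, lower-left origin) into a
/// pixel rectangle with a top-left origin.
/// Hypothetical helper for illustration; not part of Vision.
func pixelRect(fromNormalized box: CGRect, imageWidth: CGFloat, imageHeight: CGFloat) -> CGRect {
    let width = box.size.width * imageWidth
    let height = box.size.height * imageHeight
    let x = box.origin.x * imageWidth
    // Flip the vertical axis: the top edge of the box in a lower-left
    // system sits at origin.y + height.
    let y = (1 - box.origin.y - box.size.height) * imageHeight
    return CGRect(x: x, y: y, width: width, height: height)
}

// A made-up detection covering the middle of a 400 x 200 image.
let carBox = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let rect = pixelRect(fromNormalized: carBox, imageWidth: 400, imageHeight: 200)
// rect is x: 100, y: 50, width: 200, height: 50
```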
What’s important is that the machine learning model runs in an acceptable amount of time, either asynchronously in the background or in real time. Apple provides a listing of sample models for classification at https://developer.apple.com/machine-learning/.
For real-time object detection, TinyYOLO is an option, even though its frame rate is not near 60 fps today. Other real-time detection models like YOLO or R-CNN are not going to provide a sufficient experience on mobile devices today.
One other interesting thing I came across is the PASCAL Visual Object Classes (VOC). These are common objects used for benchmarking object classification.
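For 2012, the twenty object classes selected were:
Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor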
I’ve been working with ARKit recently. I am planning on releasing an AR basketball game when iOS 11 is released.
Here are some miscellaneous thoughts about working with ARKit:
It’s hard to find answers to common questions about doing simple things in ARKit. Searching for SceneKit yields slightly more results, but even those are sparse. The Apple developer SceneKit & ARKit forums don’t appear to have much activity either. So it’s up to StackOverflow & random Internet blog posts.
Working with ARKit means working with SceneKit. SceneKit is Apple’s framework to make working with 3D assets easier for developers. Working with SceneKit & 3D is something that I’m new to. A lot of the math around position, orientation, Euler angles, transforms, etc. can get complex fast when it involves matrix transforms and quaternions.
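As a small taste of that math, here is a sketch of rotating a vector with a quaternion built from an axis and an angle. The `Quaternion` type and `crossProduct` helper are hypothetical and written only for illustration; SceneKit has its own orientation and rotation properties on nodes, and this does not reproduce them.

```swift
import Foundation

/// Cross product of two 3D vectors (illustrative helper).
func crossProduct(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> SIMD3<Float> {
    return SIMD3<Float>(a.y * b.z - a.z * b.y,
                        a.z * b.x - a.x * b.z,
                        a.x * b.y - a.y * b.x)
}

/// A unit quaternion stored as a vector part (x, y, z) and a scalar part w.
/// Hypothetical type for illustration only.
struct Quaternion {
    var vector: SIMD3<Float>
    var scalar: Float

    /// Builds the quaternion for a rotation of `angle` radians about `axis`.
    /// `axis` must be non-zero; it is normalized here.
    init(angle: Float, axis: SIMD3<Float>) {
        let length = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z).squareRoot()
        let unitAxis = axis / length
        vector = unitAxis * sin(angle / 2)
        scalar = cos(angle / 2)
    }

    /// Rotates `v` by this quaternion (equivalent to q * v * q^-1).
    func rotate(_ v: SIMD3<Float>) -> SIMD3<Float> {
        let t = 2 * crossProduct(vector, v)
        return v + scalar * t + crossProduct(vector, t)
    }
}

// Rotate the +X axis by 90 degrees about +Y in a right-handed system.
let quarterTurn = Quaternion(angle: .pi / 2, axis: SIMD3<Float>(0, 1, 0))
let rotated = quarterTurn.rotate(SIMD3<Float>(1, 0, 0))
// rotated is approximately (0, 0, -1)
```

The half-angle in the initializer is the part that tends to surprise newcomers: a quaternion encodes half the rotation angle, and applying it on both sides of the vector gives the full rotation.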
It’s really hard to find assets for DAE/COLLADA. The DAE format is meant to be an interchange format for various 3D software to communicate with each other. The reality is that exporting to DAE or converting from one format to DAE is a crapshoot. I’ve used Blender briefly to look at 3D assets, but digging into 3D modeling is a huge time sink for someone looking to get involved in ARKit. I wish there were an online store that focused on selling low-poly (<10K polygons) DAE files.
Relatedly, working with 3D assets as a newcomer is very frustrating. The concept of bounds vs. scaling as they relate to importing into your SceneKit scene was very challenging (with the 3D model that I imported). If you have your own in-house or contracted 3D modeler, you should get 3D assets that work well with SceneKit, but I had countless issues with off-the-shelf 3D models & file formats.
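One way to think about bounds vs. scaling: a node's bounding box is reported in the model's own local units, whatever the modeler exported, while ARKit treats one scene unit as one meter. A rough sketch of deriving a uniform scale from the bounds follows. The function name and the centimeter-sized example model are assumptions for illustration; in SceneKit the two corners would come from a node's `boundingBox`.

```swift
/// Returns the uniform scale that makes the model's largest bounding-box
/// dimension equal to `targetSize` (in scene units, i.e. meters in ARKit).
/// Returns nil for an empty or degenerate bounding box.
/// Hypothetical helper for illustration.
func uniformScale(boundsMin: SIMD3<Float>, boundsMax: SIMD3<Float>, targetSize: Float) -> Float? {
    let extent = boundsMax - boundsMin
    let largest = max(extent.x, extent.y, extent.z)
    guard largest > 0 else { return nil }
    return targetSize / largest
}

// A made-up hoop model exported in centimeters, 305 units tall.
let hoopScale = uniformScale(boundsMin: SIMD3<Float>(-90, 0, -60),
                             boundsMax: SIMD3<Float>(90, 305, 60),
                             targetSize: 3.05)
// hoopScale is approximately 0.01, so the hoop ends up about 3.05 m tall
```

Applying the same factor to all three axes keeps the model's proportions intact.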
After you’re able to import your 3D model, modeling the physics geometry can be a challenge. SceneKit allows you to import the geometry for your physics body as-is using ConcavePolyhedron, but you probably don’t want that. I had to manually recreate a basketball hoop using multiple shapes combined into a single SCNNode.
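To give a feel for what "multiple shapes" can mean, here is a sketch of one possible way to approximate a hoop's rim: a ring of straight box segments laid out in the XZ plane. This is not necessarily how my hoop was built. The `RimSegment` type and `rimSegments` function are hypothetical, and only the placement math is shown.

```swift
import Foundation

/// Placement of one straight segment in a polygonal approximation of a ring.
/// Hypothetical type for illustration.
struct RimSegment {
    var position: SIMD3<Float>  // center of the segment; the ring lies in the XZ plane
    var yaw: Float              // rotation about the Y axis, in radians
    var length: Float           // how long the segment must be, along its local Z axis
}

/// Splits a ring of the given radius into `count` straight segments whose
/// endpoints lie on the circle.
func rimSegments(radius: Float, count: Int) -> [RimSegment] {
    precondition(count >= 3, "A ring needs at least three segments")
    let step = 2 * Float.pi / Float(count)
    let chordLength = 2 * radius * sin(step / 2)
    let centerDistance = radius * cos(step / 2)
    return (0..<count).map { index -> RimSegment in
        // Angle of the segment's midpoint, measured from +X toward +Z.
        let angle = (Float(index) + 0.5) * step
        let position = SIMD3<Float>(centerDistance * cos(angle), 0, centerDistance * sin(angle))
        // Rotating local +Z by -angle about Y points it along the ring's tangent.
        return RimSegment(position: position, yaw: -angle, length: chordLength)
    }
}

let segments = rimSegments(radius: 1, count: 4)
// segments[0] sits at roughly (0.5, 0, 0.5) with a length of about 1.414
```

Each segment could then become a box-based physics shape, with its position and yaw turned into a transform, and the whole set combined through `SCNPhysicsShape(shapes:transforms:)` so the ball can still pass through the middle.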
ARKit is not all-powerful. The main feature that ARKit gives you is horizontal plane detection. Occlusion doesn’t come with ARKit. Expect many apps that deliver an experience reliant on a plane/surface like your desk or the floor.
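Turning on horizontal plane detection is a small piece of configuration. A minimal sketch, assuming `sceneView` is an `ARSCNView` that is already set up in your view controller (the function name is mine):

```swift
import ARKit

/// Starts world tracking with horizontal plane detection.
/// `sceneView` is assumed to be an ARSCNView already in the view hierarchy.
func startHorizontalPlaneDetection(on sceneView: ARSCNView) {
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = .horizontal
    sceneView.session.run(configuration)
}
```

Detected surfaces then arrive as `ARPlaneAnchor` objects, for example through the `renderer(_:didAdd:for:)` callback of `ARSCNViewDelegate`.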
ARKit is exciting, but don’t expect the world yet. Future ARKit releases & better iOS hardware should provide more compelling experiences. Today, you can expect to play with 3D models on a surface (with surface interaction) or in the air (with limited or no environment interaction).