

There are more details to follow, but for now, please see the video and code links at the top of the page for a demo and working code of the current release.

In this project I cover fish identification and localization via my cloud-hosted platform. The platform draws on existing video from the exhibit livestreams of public aquaria.

While working full-time as a core engineer for Matthews South, my hunger for learning and desire to support projects of substantial impact to others quickly led me to explore volunteer opportunities at the California Academy of Sciences (CalAcademy).
While volunteering at CalAcademy, I saw a need to revitalise the educational tools used in public aquaria. Its exhibits, like those of many similar institutions, rely on static, fixed-medium interfaces such as printed placards and rehearsed dialogue with presenters. These solutions are typically non-reusable and carry high manufacturing and curation costs. As a result, they often lack detail and cannot represent physically or temporally dynamic content, such as our evolving contextual and scientific knowledge of a particular exhibit.

After one and a half years of working in my spare time, I developed and released this platform to give these institutions high-tech tools to supplement their public-facing displays as well as their in-house research projects. The platform delivers deep and dynamic content by leveraging electronic displays, recent advances in computer vision and deep learning (DL), and best practices in web development. By taking advantage of existing resources within the exhibits (i.e. in-water webcams and electronic kiosks) and by using scalable, detached software infrastructure, such as Amazon AWS for hosting and modular DL models, it positions itself as a replacement for physical placards as the medium for presenting exhibit information.

Because the platform is cloud-hosted, it can be operated from anywhere, allowing users to continue their learning experience beyond the physical confines of the institution. Additionally, users control the content presented to them in real time: they can find, identify and explore in depth the specimens that interest them. The platform is extensible, and I aim to further its development through even more engaging media such as mixed reality.

There have been many and varied attempts at classifying either fish or fish species in underwater imagery ranging from stills to video data. I cover the most relevant projects here:

kwea123 fish_detection [1]

This is perhaps the most thorough, relevant, modern and promising project I have come across. It uses

Dataset Tools

Annotation Format Conversion Tools

Annotation Tools

Many annotation tools exist for object detection, segmentation and classification. Here I list the most promising ones I came across as well as my final pick.


  1. labelme [2]: Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation)
     • There is a handy tool [3] available to convert labelme’s annotations to COCO’s required format
  2. labelImg [4]: Graphical image annotation tool for labeling object bounding boxes in images
     • There is a handy tool [5] available to convert labelImg’s annotations (VOC format) to COCO’s required format
  3. BBox-Label-Tool [6]: Simple tool for labeling object bounding boxes in images, implemented with Python Tkinter
  4. sloth [7]: Tool for labeling image and video data for computer vision research
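
To make the VOC-to-COCO conversion mentioned above more concrete, here is a minimal sketch of the idea. It is not the code of either linked conversion tool, and it covers only bounding boxes. The function name `voc_to_coco`, the sample file name `tank_0001.jpg` and the `clownfish` label are hypothetical and used purely for illustration. The key step is that Pascal VOC stores a box as corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), while COCO expects `[x, y, width, height]`.

```python
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings, categories):
    """Convert Pascal VOC XML documents into one COCO-style dict.

    xml_strings: iterable of VOC XML documents (one per image), as strings.
    categories:  list of class names; COCO category ids are assigned 1..N.
    """
    cat_ids = {name: i for i, name in enumerate(categories, start=1)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for image_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        coco["images"].append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            # VOC stores the two corners of the box.
            xmin, ymin, xmax, ymax = (
                int(float(box.findtext(tag)))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            w, h = xmax - xmin, ymax - ymin
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[obj.findtext("name")],
                "bbox": [xmin, ymin, w, h],  # COCO: top-left corner + size
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical single-image annotation in the VOC layout.
SAMPLE = """<annotation>
  <filename>tank_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>clownfish</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>220</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""

coco = voc_to_coco([SAMPLE], ["clownfish"])
print(coco["annotations"][0]["bbox"])  # [100, 120, 120, 80]
```

In practice the XML strings would be read from the annotation files written by the labeling tool, and the resulting dictionary would be written out with `json.dump`. A real dataset would also need segmentation data if polygon annotations are used.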



MSE in Robotics

My research interests include computer vision and deep learning.