About me

Hello there, I am Vinay Sisodia. I have a deep interest in math and machine learning. I currently work as Technical Director, ML at PicCollage.

What I do

  • Bringing AI to a tech startup: building ML technologies, systems and infrastructure from the ground up
  • ML Research, especially vision and language models: I am particularly interested in multi-modality, self-supervised learning and graphs
  • Building and growing a machine learning team
  • Building new products

Things I have built

  • Linear Algebra for Programmers: An e-book of visual essays to learn linear algebra from scratch.
  • TinyVolt: Most of my open-source work can be found here.
  • Category Theory Meetup: A (now defunct) meetup group for discussing topics related to Category Theory. At some point I was really excited about this topic but soon realized that unless I familiarized myself with some advanced topics in abstract algebra, it would sound and feel like abstract nonsense. Incomplete notes on this topic can be found here.

Some background

I got into machine learning when companies were starting to build dedicated AI/ML teams. I had the privilege of being the first ML engineer in two different organizations, which allowed me to build ML systems from the ground up. Later on I took on the responsibility of growing and leading ML teams. The past few years have been quite gratifying professionally and intellectually, and I am very proud of some of the things I built during this time.

Education

  • Bachelor's & Master's: Indian Institute of Technology, Kharagpur
  • Courses in Mandarin: National Taiwan Normal University, National Taiwan University
    • Recipient of Huayu Enrichment Scholarship

Notable Projects

  • [GLIACloud, 2018] Highlight detection in soccer videos
    • Built a simple CNN + RNN model trained on labeled soccer videos from the Chinese Super League
  • [PicCollage, 2021] Distillation of CLIP model
    • Distilled the ViT component of the CLIP model into a much smaller model (only 24MB), which was then deployed on edge devices and is currently in use by multiple apps
    • Some details are available here, but the article does not cover the work we did later
    • Extracted attention values from the transformer layer to act as a proxy for patch saliency
  • [PicCollage, 2021] Transition detection in videos
    • This was done by looking at the spectral properties of the cosine similarity matrix of the video frames (Spectral Graph Theory to the rescue!)
  • [PicCollage, 2022] Training a GNN for photo to sticker recommendation
    • Implemented a variant of CompGCN
    • The most important change was that our model was inductive while the original implementation was transductive. This was done by a) fixing the initial node embedding to be a CLIP embedding and b) learning a transformation to map the initial node embedding instead of changing the node values directly
  • [PicCollage, 2022] Adopting generative AI technologies to build products and features on
    • Set up training pipelines for DreamBooth, LoRA and ControlNet
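
The spectral idea behind the transition detection project above can be illustrated with a minimal sketch. This is not the PicCollage implementation, whose details are not given here. It assumes each frame is already represented by an embedding vector, treats the cosine similarity matrix as a weighted graph, and uses the sign pattern of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the normalized Laplacian) to split the frames into two groups. The function names are hypothetical, and the sketch only handles a single transition.

```python
import numpy as np


def cosine_similarity_matrix(frames):
    """Pairwise cosine similarity between frame embeddings (one row per frame)."""
    x = np.asarray(frames, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def fiedler_labels(similarity):
    """Split frames into two groups using the sign of the Fiedler vector.

    Assumption: negative similarities are clipped to zero so the matrix can
    serve as graph edge weights, and every frame keeps at least one
    positive-weight neighbour (otherwise its degree would be zero).
    """
    w = np.clip(similarity, 0.0, None)
    np.fill_diagonal(w, 0.0)  # no self-loops
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1] >= 0


def find_transitions(frames):
    """Indices i where frame i falls in a different group than frame i - 1."""
    labels = fiedler_labels(cosine_similarity_matrix(frames))
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Calling `find_transitions` on a sequence of frame embeddings returns the positions where the two-way spectral split changes sides. A video with several transitions would need more eigenvectors or a recursive split, which this sketch does not attempt.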