Excited to share that our extended abstract titled "Incorporating Foundation Model Priors in Modeling Novel Objects for Robot Instruction Following in Unstructured Environments" has been accepted for presentation at the Workshop on 3D Visual Representations for Robot Manipulation at ICRA 2024! Visit our website for more details: https://lnkd.in/gcmfMJgu Workshop: https://lnkd.in/gA38HrpH 📝 Authors: Moksh Malhotra, Aman Tambi, Sandeep Zachariah, P.V.M. Rao, Rohan Paul 🙏 Special thanks to our project advisors Prof. Rohan Paul and Prof. P.V.M. Rao for their invaluable guidance and support throughout this research journey! In our paper, we tackle the challenge of acquiring object models when a robotic manipulator is sent into unstructured and unknown environments. We propose a novel approach that uses foundation models with RGB-Depth images to acquire the high-fidelity semantic 3D models of objects required for robotic manipulation. We demonstrate our pipeline in simulation as well as in real-world scenarios. Looking forward to discussing our research and exchanging ideas with fellow researchers and professionals at the conference. Stay tuned for more updates! #ICRA2024 #Robotics #Research #AI #3DVisualization #RobotManipulation
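The post itself contains no code, but one building block it implies (fusing a foundation-model segmentation mask with an RGB-Depth frame into an object point cloud) can be sketched in a few lines of Python. This is a generic illustration, not the paper's pipeline; the intrinsics fx, fy, cx, cy and the synthetic depth/mask below are placeholders.

```python
import numpy as np

def masked_depth_to_pointcloud(depth, mask, fx, fy, cx, cy):
    """Back-project the depth pixels selected by a segmentation mask
    into a 3D point cloud using the pinhole camera model.

    depth : (H, W) float array, metric depth in meters
    mask  : (H, W) bool array, e.g. produced by a foundation segmentation model
    fx, fy, cx, cy : camera intrinsics (placeholders below)
    """
    v, u = np.nonzero(mask & (depth > 0))      # pixel coordinates inside the mask
    z = depth[v, u]
    x = (u - cx) * z / fx                      # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)         # (N, 3) object points in camera frame

# Example with synthetic data; a real run would use an RGB-D frame and a model-predicted mask
depth = np.full((480, 640), 0.8)
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:380] = True
points = masked_depth_to_pointcloud(depth, mask, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(points.shape)
```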
More Relevant Posts
-
During the last year, 3D/4D generation and reconstruction have advanced enormously. Surprisingly, these advances are not always driven by using more 3D or 4D data or by improving its quality. Quite the opposite: many of the works don't require such data at all, or need only a little. Instead, they develop methods to capitalize on the volumetric knowledge learned by foundational image and video models. To bootstrap high-quality 3D, we now have better volumetric representations, better training approaches, and better ways of using existing 2D models. I'm talking at the AI for 3D Generation Workshop at #CVPR2024 about Volumetric Generation of Objects, Scenes and Videos. We'll go deeply into the most recent work from our lab and the community. Some of these approaches do not use any 3D data to generate 3D; others rely on a small portion of it. Most interestingly, in our recent work 4Real, we show how to reconstruct Dynamic Gaussian Splats from 2D videos and a foundational text-to-video model. Join the talk at 16:30 on Monday, Jun 17. Here is an 18-second version of my slides
-
Introducing GaussianCube --- a structured 3D representation crafted for 3D generative modeling based on Gaussian Splatting. 3D Gaussian Splatting (GS) has achieved considerable improvement over Neural Radiance Fields in terms of 3D fitting fidelity and rendering speed. However, this unstructured representation with scattered Gaussians poses a significant challenge for generative modeling. To address the problem, we introduce GaussianCube, a structured GS representation that is both powerful and efficient for generative modeling. We achieve this by first proposing a modified densification-constrained GS fitting algorithm that yields high-quality fitting results using a fixed number of free Gaussians, and then re-arranging the Gaussians into a predefined voxel grid via Optimal Transport. The structured grid representation allows us to use a standard 3D U-Net as our backbone in diffusion generative modeling without elaborate designs. Extensive experiments conducted on ShapeNet and OmniObject3D show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a powerful and versatile 3D representation. Project Page: https://lnkd.in/ggnkXtUs Paper: https://lnkd.in/gg9UGEHf Code: https://lnkd.in/gcVxHumH
GaussianCube: Structuring Gaussian Splatting
gaussiancube.github.io
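The re-arrangement step described above (mapping a fixed number of fitted Gaussians onto a predefined voxel grid via Optimal Transport) can be illustrated with a toy balanced assignment. The sketch below is not the authors' implementation; it uses SciPy's Hungarian solver as a stand-in for the transport solver and random points in place of fitted Gaussian centers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy stand-in for GaussianCube's re-arrangement step: map N fitted Gaussian
# centers onto the N cells of a voxel grid so that total squared travel
# distance is minimized (a balanced assignment, a special case of optimal transport).
grid_res = 4                                                  # 4^3 = 64 voxel cells
N = grid_res ** 3

rng = np.random.default_rng(0)
gaussian_centers = rng.uniform(-1, 1, size=(N, 3))           # pretend these came from GS fitting

axis = (np.arange(grid_res) + 0.5) / grid_res * 2 - 1         # voxel-cell centers in [-1, 1]
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

cost = ((gaussian_centers[:, None, :] - grid[None, :, :]) ** 2).sum(-1)   # (N, N) squared distances
gauss_idx, cell_idx = linear_sum_assignment(cost)

structured = np.empty_like(gaussian_centers)
structured[cell_idx] = gaussian_centers[gauss_idx]            # one Gaussian per voxel cell
print(structured.reshape(grid_res, grid_res, grid_res, 3).shape)
```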
-
Learn how to do custom annotations for Roboflow's vision models. We will use the rock-paper-scissors model to label a custom image. https://lnkd.in/e69CQgwW #machinelearning #computervision #roboflow #annotations #tutorial #blog #objectdetection
Custom Annotation for Roboflow Pre-trained Models with CV2
doczamora.com
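The linked tutorial is not reproduced here, but the core CV2 step (drawing a labeled bounding box onto an image) looks roughly like the sketch below. The coordinates and the "rock" label are invented for illustration; in practice they would come from the Roboflow model's predictions.

```python
import cv2
import numpy as np

# Minimal sketch of drawing one bounding-box annotation with OpenCV.
image = np.zeros((480, 640, 3), dtype=np.uint8)           # placeholder for a loaded image
x1, y1, x2, y2, label = 220, 140, 420, 360, "rock"         # made-up prediction for illustration

cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)   # box outline
cv2.putText(image, label, (x1, y1 - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2) # class label above the box
cv2.imwrite("annotated.jpg", image)
```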
-
Meet VisionGPT-3D: Merging Leading Vision Models for 3D Reconstruction from 2D Images The transition from text to visual components has significantly enhanced daily tasks, from generating images and videos to identifying elements within them. Past computer vision models focused on object detection and classification, while large lang... https://lnkd.in/e2GPJ6bY #AI #ML #Automation
Meet VisionGPT-3D: Merging Leading Vision Models for 3D Reconstruction from 2D Images
openexo.com
-
🚀 Improving 2D Feature Representations by 3D-Aware Fine-Tuning! A method to enhance 2D image features (like DINOv2) by lifting them to 3D representations. Highlights: 1️⃣ Lift 2D features to 3D 2️⃣ Finetune the 2D model with 3D-aware features 3️⃣ Apply fine-tuned features to downstream tasks The results? Significant improvements in semantic segmentation and depth estimation across various datasets, achievable with simple linear probing! 🎥 Check out this comparison video showcasing the enhanced performance vs. DINOv2. Awesome work by: Yuanwen Yue, Anurag Das, Francis Engelmann, Siyu Tang, and Jan Eric Lenssen! This work underscores the importance of fine-tuning in pushing the boundaries of computer vision. It's not just about having a strong foundation model – it's about adapting it effectively to new dimensions and tasks. Stay tuned for more exciting developments and breakthroughs on the horizon! ✨ WISERLI Ultralytics OpenCV Roboflow YOLOvX Dr. Chandrakant Bothe Rohan Gupta Vishnu Mate Mohit Raj Sinha Prateeksha Tripathy P Shreyas Anu Bothe Saurabh Tople Glenn Jocher Muhammad Rizwan Munawar Nicolai Nielsen Harpreet Sahota 🥑 Satya Mallick Florian Palatini Ritesh Kanjee Piotr Skalski Dragos Stan Arnaud Bastide Nicholas Nouri Timothy Goebel Antonio Rodriguez Cortés #AI #MachineLearning #ComputerVision #Innovation #Technology #YOLOvX
Improving 2D Feature Representations - YOLOvX
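For readers unfamiliar with linear probing, here is a minimal, generic sketch of the evaluation recipe mentioned above: freeze a dense feature extractor and train only a 1x1 convolutional head for segmentation. The backbone below is a hypothetical stand-in; in the paper the (fine-tuned) DINOv2 patch features would take its place, and the dummy tensors are only there to make the snippet runnable.

```python
import torch
import torch.nn as nn

C, num_classes = 384, 21

backbone = nn.Conv2d(3, C, kernel_size=14, stride=14)      # hypothetical frozen feature extractor
for p in backbone.parameters():
    p.requires_grad = False

probe = nn.Conv2d(C, num_classes, kernel_size=1)            # the only trainable parameters
opt = torch.optim.AdamW(probe.parameters(), lr=1e-3)

images = torch.randn(2, 3, 224, 224)                         # dummy batch
labels = torch.randint(0, num_classes, (2, 16, 16))          # dummy per-patch segmentation labels

with torch.no_grad():
    feats = backbone(images)                                  # (B, C, 16, 16) frozen dense features
logits = probe(feats)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
opt.step()
print(float(loss))
```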
-
I have the OAK FFC Module integrated with the Jetson Orin Nano. This module has connectors for two monochrome cameras (which will serve as the stereo pair used for depth calculation) and an RGB camera, the same as the OAK-D depth camera. RVC2 inside: this OAK device is built on top of the RVC2. Main features:
- 4 TOPS of processing power (1.4 TOPS for AI - RVC2 NN performance)
- Runs any AI model, even custom-architectured/built ones - models need to be converted
- Encoding: H.264, H.265, MJPEG - 4K/30FPS, 1080P/60FPS
- Computer vision: warp/dewarp, resize, crop via the ImageManip node, edge detection, feature tracking. You can also run custom CV functions
- Stereo depth perception with filtering, post-processing, RGB-depth alignment, and high configurability
- Object tracking: 2D and 3D tracking with the ObjectTracker node
The video below shows a demo YOLO Python script that uses the DepthAI software environment on the host (Jetson Orin Nano) and runs the RVC2-formatted model on the OAK VPU board with the RGB camera. The OAK module also has a BNO085 9-axis IMU! Source code can be found here: https://lnkd.in/g5zzNm-X My YouTube version: https://lnkd.in/gwb3_vjY
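For context, a minimal DepthAI pipeline that streams RGB preview frames from an OAK device to the host looks roughly like this. It assumes the depthai v2 Python API and a connected OAK device, and it is not the linked demo script (which adds the YOLO network and depth nodes on top).

```python
import depthai as dai

# Build a pipeline that runs on the OAK (RVC2) and streams RGB previews to the host.
pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 400)
cam.setInterleaved(False)

xout = pipeline.create(dai.node.XLinkOut)    # XLink output back to the host
xout.setStreamName("rgb")
cam.preview.link(xout.input)

# Requires a connected OAK device (e.g. the OAK FFC on the Jetson Orin Nano).
with dai.Device(pipeline) as device:
    q = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    frame = q.get().getCvFrame()             # grab one frame as a NumPy/OpenCV image
    print(frame.shape)
```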
-
Get 3D of an unknown scene from one image by extending depth estimation into a full 3D shape with Flash3D https://lnkd.in/eTMUt44R Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image arXiv paper abstract https://lnkd.in/ec6JDbzb arXiv PDF paper https://lnkd.in/ekg_xGhy Project page https://lnkd.in/e2vweY_W In this paper, ... propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. ... start from ... model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor ... base this extension on feed-forward Gaussian Splatting. ... predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. ... Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Please like and share this post if you enjoyed it using the buttons at the bottom! Stay up to date. Subscribe to my posts https://lnkd.in/emCkRuA Web site with my other posts by category https://lnkd.in/enY7VpM LinkedIn https://lnkd.in/ehrfPYQ6 #ComputerVision #3D #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
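The layered-Gaussian idea in the abstract can be illustrated with a small NumPy sketch: back-project each pixel to a Gaussian center at the predicted depth, then add a second, spatially offset layer to cover occluded regions. This is a toy illustration of the idea only, not Flash3D's implementation; the depth, offsets, and intrinsics below are dummies.

```python
import numpy as np

def pixels_to_gaussian_layers(depth, offsets, fx, fy, cx, cy):
    """Toy version of the layered idea above: one Gaussian center per pixel at the
    predicted depth, plus a second layer whose centers are offset in space.

    depth   : (H, W) predicted monocular depth
    offsets : (H, W, 3) predicted per-pixel 3D offsets for the second layer
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    xyz = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)   # first layer at predicted depth
    layer1 = xyz.reshape(-1, 3)
    layer2 = (xyz + offsets).reshape(-1, 3)                               # offset layer behind occlusions
    return np.concatenate([layer1, layer2], axis=0)                       # (2*H*W, 3) Gaussian centers

depth = np.full((4, 4), 2.0)                                   # dummy predicted depth
offsets = np.tile(np.array([0.0, 0.0, 0.5]), (4, 4, 1))        # dummy learned offsets
centers = pixels_to_gaussian_layers(depth, offsets, fx=500, fy=500, cx=2, cy=2)
print(centers.shape)
```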
-
🔍 Conquering complex image segmentation for research! Our experts annotated 2000+ intricate semantic images with high precision. 🖼️ Read the full case study: https://lnkd.in/gY9eb2eq Did you know accurate image segmentation is crucial for advanced computer vision tasks like object detection, scene understanding, and 3D reconstruction? With this project, we enabled groundbreaking R&D by delivering pixel-perfect planar and instance annotations. Discover our optimized workflows and multi-stage QC process that ensured over 99% annotation accuracy 🏆 despite the dataset's massive variability. #ImageSegmentation #ComputerVision #ArtificialIntelligence #ImageAnnotation #SemanticSegmentation
Tackling Complex Image Segmentation: Precise Annotation for 2000+ Semantic Images
https://klatchtech.com
-
𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐦𝐞𝐧𝐭𝐬 𝐢𝐧 3𝐃 𝐎𝐛𝐣𝐞𝐜𝐭 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐭𝐡𝐫𝐨𝐮𝐠𝐡 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐝𝐯𝐞𝐫𝐬𝐚𝐫𝐢𝐚𝐥 𝐍𝐞𝐭𝐰𝐨𝐫𝐤𝐬- 16 In a paper titled "Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling," Jiajun Wu and colleagues present a novel approach using Generative Adversarial Networks (GANs) to create new three-dimensional models, including items like chairs, cars, sofas, and tables. Their work highlights the ability of GANs to learn and replicate complex shapes and forms, facilitating the generation of diverse 3D objects from learned representations. Similarly, Matheus Gadelha and his team, in their 2016 study titled "3D Shape Induction from 2D Views of Multiple Objects," explore the use of GANs to derive three-dimensional models from two-dimensional images of objects captured from various angles. This research emphasizes the capacity of GANs to interpret flat images and reconstruct them into full 3D representations, thereby bridging the gap between 2D and 3D visual data. Both studies exemplify the transformative power of GANs in the realm of 3D object generation, showcasing how machine learning can be leveraged to enhance our understanding and creation of complex shapes in a virtual space.
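To make the 3D-GAN idea concrete, here is a minimal generator in the same spirit: 3D transposed convolutions that map a latent vector to a voxel occupancy grid. The layer sizes are illustrative and not taken from either paper.

```python
import torch
import torch.nn as nn

# Minimal sketch in the spirit of 3D-GAN: a generator mapping a latent vector to a
# 32^3 voxel occupancy grid via 3D transposed convolutions (sizes are illustrative).
class VoxelGenerator(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 256, kernel_size=4),                      # 1^3  -> 4^3
            nn.BatchNorm3d(256), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1),   # 4^3  -> 8^3
            nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),    # 8^3  -> 16^3
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, kernel_size=4, stride=2, padding=1),      # 16^3 -> 32^3
            nn.Sigmoid(),                                                       # voxel occupancy in [0, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

voxels = VoxelGenerator()(torch.randn(2, 128))
print(voxels.shape)   # torch.Size([2, 1, 32, 32, 32])
```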
MS ECE @ UC San Diego | Qualcomm | NIT-Calicut
8mo · Congratulations Sandeep Zachariah & team for this amazing work.