Research · Graphics · Engineering · Music
I am currently a Ph.D. student in Prof. Hao Su's lab at UCSD. I obtained my B.Eng. in Artificial Intelligence at SJTU, advised by Prof. Cewu Lu. I interned at Microsoft Research Asia in 2021, advised by Dr. Dongsheng Li and Caihua Shan, and at Stanford in 2022, advised by Prof. Leonidas Guibas.
The world is fantastic. I like to get around and explore random things that pique my interest, building my mindset and skills along the way. Find me on any medium if you want to talk!
email shiruoxi61 at gmail dot com
My current research interest is in computer graphics. I also work in computer vision for graphics, and am experienced in building AI infrastructure. We live in a 3D world, and I am thrilled to understand it and to recreate it. Previously I also worked on quantum information technologies. I serve as a reviewer for the journal TNNLS.
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi*, Xinyue Wei*, Cheng Wang, Hao Su
CVPR'24
DIP-like convolutional prior for sparse-view NeRFs.
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu*, Ruoxi Shi*, Linghao Chen*, Zhuoyang Zhang*, Chao Xu*, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
CVPR'24
3D generation pipeline based on decomposed stages: multi-view generation, feed-forward sparse-view reconstruction and texture refinement.
Zero123++: A Single Image to Consistent Multi-view Diffusion Base Model
Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su
Careful designs on Stable Diffusion for high-quality consistent multi-view generation from single image input.
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu*, Ruoxi Shi*, Kaiming Kuang*, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su
NeurIPS'23
3D-native VLM. 85.3% zero-shot accuracy on ModelNet40, matching some fully-supervised baselines. 77% top-5 accuracy on the 1136-class Objaverse-LVIS. The point cloud encoder can also be used for captioning, point cloud-conditioned image generation and multi-modal retrieval. It is a great step towards open-world understanding of 3D shapes.
Toward Learning Geometric Eigen-Lengths Crucial for Fitting Tasks
Yijia Weng, Kaichun Mo, Ruoxi Shi, Yanchao Yang, Leonidas Guibas
ICML'23
Humans can find compact and meaningful geometric concepts (width, radius, volume etc.) as summaries for objects. In this work we attempt to arm machines with the same capability of conceptual emergence and explore how this may help with robotic tasks.
RendNet: Unified 2D/3D Recognizer with Latent Space Rendering
Ruoxi Shi, Xinyang Jiang, Caihua Shan, Yansen Wang, Dongsheng Li
CVPR'22 (Oral)
Vector graphics (VG) are ubiquitous in our daily life, with vast applications in engineering, architecture, design, etc. We connect VG and raster graphics (RG) with a latent space rendering technique to get the best of both worlds: the infinite resolution and high-level topology information of VG, and the availability and natural error filtering of RG.
CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild
Yang You, Ruoxi Shi, Weiming Wang, Cewu Lu
CVPR'22
By aggregating votes on pairs of small local patches, the deep learning models become more robust and generalize better. The sim-to-real transfer performance of CPPF on pose estimation beats previous methods by a large margin.
Skeleton Merger: an Unsupervised Aligned Keypoint Detector
Ruoxi Shi, Zhengrong Xue, Yang You, Cewu Lu
CVPR'21 (Oral)
Detecting aligned 3D keypoints is essential in many scenarios such as object tracking, shape retrieval and robotics. However, it is generally hard to prepare a high-quality dataset for all types of objects due to the ambiguity of keypoints themselves. Meanwhile, current unsupervised detectors are unable to generate aligned keypoints with good coverage. In this work we design Skeleton Merger, an auto-encoder architecture based on the skeleton representation, to detect aligned keypoints from objects in an unsupervised manner.
PRIN/SPRIN: On Extracting Point-Wise Rotation Invariant Features
Yang You*, Yujing Lou*, Ruoxi Shi, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Weiming Wang, Cewu Lu
TPAMI'21
SO(3) rotation invariance is an interesting property of 3D geometry. We present the networks (S)PRIN (Sparse Point-wise Rotation Invariant Networks) based on SO(3) group convolution. They achieve state-of-the-art results on rotation-invariant representation learning.
Training a Quantum PointNet with Nesterov Accelerated Gradient Estimation by Projection
Ruoxi Shi, Hao Tang, Xian-Min Jin
NeurIPS QTNML Workshop'21
PointNet implemented with specially designed quantum circuits that provide exponential speed-up on the per-point layers. Quantum PointNet is tested on IonQ processors, with ModelNet3 accuracy on par with classical PointNet.
TensorFlow Solver for Quantum PageRank in Large-Scale Networks
Hao Tang*, Ruoxi Shi*, TianShen He*, YanYan Zhu, TianYu Wang, Marcus Lee, XianMin Jin
Science Bulletin'21
Quantum Stochastic Walk (QSW), on which the Quantum PageRank algorithm is based, is a fundamental formulation of quantum many-body interaction with strong local restriction fields. In this work we reformulate the QSW master equation to exploit more of the problem's structure and develop a simulator that fully leverages the power of GPU parallel computation. We achieved 500x lower memory consumption and a 2000x speed-up when quantum page-ranking a real-world large-scale network of airports.
I am interested but admittedly untalented in visual arts. So I decided to use my knowledge of 3D to make 3D CG artworks.
'Delightful!'
This is my favorite work so far. I did the modeling of the character, design of the scene and the rendering.
I am currently producing an animation series based on the story of Arcaea in my spare time. It is by no means easy to produce animation even with the latest production technologies at hand, but I feel good doing it.
As I said before, I like to explore random things that pique my curiosity. Consequently, I have a bunch of side projects and achievements that somehow came together (and may actually be unexpected).
I am the lead programmer of the rhythm game Paradigm: Reboot.
Champion of DEFCON CTF 28 (2020) as a member of A*0*E.
I guess I have some talent for security, but I don't like taking it as a major or a job.
I am the founder and a programmer of the Terraria mod Fourty-nine Fallen Stars.
Arguably my first project, and my initial motivation for learning programming. It started with me doing everything myself, from art and design to implementation. Later I got comrades. We did a good job and had several thousand players. Though the mod is no longer maintained for various reasons, it left us with lots of wonderful memories.
Beyond these projects and achievements, I have also developed many hobbyist projects. Here are a few that I am most proud of:
A sword against the inconsistent and ambiguous conventions (camera poses, viewport origins, etc.). A missing toolkit for CG/CV/Robotics.
A lightweight library that helps to boost productivity using PyTorch by providing utilities like MLP, automatic mixed-precision and polyfills as well as a training loop that is highly customizable.
Tool for installing apt packages without root permission in local space. The name 'aptli' is short for 'apt local install'.
Yet Another Python JIT compiler. I have implemented a method-based JIT. However, due to the extreme dynamism of Python, I find it impossible to do some optimizations at the method level. Thus, I am now writing a trace-based JIT. Currently the tracing infrastructure is done, but I have hardly begun on the compiler and the mode-switching parts. Writing a compiler is labor-intensive, and I have not had much energy for this project recently.
SKIS is a collection of toolkit infrastructure services. The trinity of distributed systems is computation, storage and communication. Computation and storage are in general local, but communication is not. Thus, SKIS seeks to provide only the communication part as a service. It includes a markdown pastebin and a general data pipe. I plan to add a service management portal for non-communication services.
Actually a by-product of writing the Skeleton Merger paper. At CVPR someone came up to me like 'oh the visualization is awesome how did you do that?' As a consequence, I cleaned up the code and built this feature-rich visualization interface.
I love music.
My world would be a lot duller without music decorating every corner of my life, from BGM at work to composition. I started playing the violin when I was 7. I was a core member of the SHS orchestra, and performed at several public concerts. When learning composition I also taught myself very basic and limited piano and flute playing.
ak+q is my favorite artist. He is so good at chord progressions. Lively or lush, chill or splendid. All so great. His pieces leave me with the impression that they are breathing.
Composition Pieces
Departure
Dispersion
Forest Enoch
Remix Pieces
Merry Christmas
Never Gonna Give You Up
Lost Dream
The Cursed Flames