Ruoxi Shi 时若曦 / Eliphat

Research · Graphics · Engineering · Music

I am currently a Ph.D. student in Prof. Hao Su's lab at UCSD. I obtained my B.Eng. in Artificial Intelligence from SJTU, advised by Prof. Cewu Lu. I interned at Microsoft Research Asia in 2021, advised by Dr. Dongsheng Li and Caihua Shan, and at Stanford in 2022, advised by Prof. Leonidas Guibas.

The world is fantastic. I like to get around and explore random things that pique my interest, building my mindset and skills along the way. Find me on any medium if you want to talk!

email shiruoxi61 at gmail dot com

GitHub Google Scholar Blog (Chinese)

RESEARCH

My current research interest is in computer graphics. I also work in computer vision for graphics, and am experienced in building AI infrastructure. We live in a 3D world, and I am thrilled to understand it and to recreate it. Previously I also worked on quantum information technologies. I serve as a reviewer for the journal TNNLS.

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

Ruoxi Shi*, Xinyue Wei*, Cheng Wang, Hao Su

CVPR'24

DIP-like convolutional prior for sparse-view NeRFs.

Paper Code Project Page

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Minghua Liu*, Ruoxi Shi*, Linghao Chen*, Zhuoyang Zhang*, Chao Xu*, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su

CVPR'24

3D generation pipeline based on decomposed stages: multi-view generation, feed-forward sparse-view reconstruction and texture refinement.

Paper Demo

Zero123++: A Single Image to Consistent Multi-view Diffusion Base Model

Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su

Careful designs on Stable Diffusion for high-quality consistent multi-view generation from single image input.

Report Code Demo

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

Minghua Liu*, Ruoxi Shi*, Kaiming Kuang*, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su

NeurIPS'23

3D-native VLM. 85.3% zero-shot accuracy on ModelNet40, matching some fully supervised baselines. 77% top-5 accuracy on 1136-class Objaverse-LVIS. The point cloud encoder can also be used for captioning, point cloud-conditioned image generation and multi-modal retrieval. It is a great step towards open-world understanding of 3D shapes.

Paper Code Demo

Toward Learning Geometric Eigen-Lengths Crucial for Fitting Tasks

Yijia Weng, Kaichun Mo, Ruoxi Shi, Yanchao Yang, Leonidas Guibas

ICML'23

Humans can find compact and meaningful geometric concepts (width, radius, volume, etc.) as summaries for objects. In this work we attempt to give machines the same capability of conceptual emergence and explore how this may help with robotic tasks.

RendNet: Unified 2D/3D Recognizer with Latent Space Rendering

Ruoxi Shi, Xinyang Jiang, Caihua Shan, Yansen Wang, Dongsheng Li

CVPR'22 (Oral)

Vector graphics (VG) are ubiquitous in daily life, with vast applications in engineering, architecture, design, etc. We connect VG and raster graphics (RG) with a latent space rendering technique to get the best of both worlds: the infinite resolution and high-level topology information of VG, and the availability and natural error filtering of RG.

Paper

CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild

Yang You, Ruoxi Shi, Weiming Wang, Cewu Lu

CVPR'22

By aggregating votes on pairs of small local patches, the deep learning models become more robust and generalize better. The sim-to-real transfer performance of CPPF on pose estimation beats previous methods by a large margin.

Paper Code Project Page

Skeleton Merger: an Unsupervised Aligned Keypoint Detector

Ruoxi Shi, Zhengrong Xue, Yang You, Cewu Lu

CVPR'21 (Oral)

Detecting aligned 3D keypoints is essential in many scenarios such as object tracking, shape retrieval and robotics. However, it is generally hard to prepare a high-quality dataset for all types of objects due to the ambiguity of keypoints themselves. Meanwhile, current unsupervised detectors are unable to generate aligned keypoints with good coverage. In this work we design Skeleton Merger, an auto-encoder architecture based on the skeleton representation, to detect aligned keypoints from objects in an unsupervised manner.

Paper Code

PRIN/SPRIN: On Extracting Point-Wise Rotation Invariant Features

Yang You*, Yujing Lou*, Ruoxi Shi, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Weiming Wang, Cewu Lu

TPAMI'21

SO(3) rotation invariance is an interesting property of 3D geometry. We present the networks (S)PRIN (Sparse Point-wise Rotation Invariant Networks) based on SO(3) group convolution. They achieve state-of-the-art results on rotation-invariant representation learning.

Paper Code

Training a Quantum PointNet with Nesterov Accelerated Gradient Estimation by Projection

Ruoxi Shi, Hao Tang, Xian-Min Jin

NeurIPS QTNML Workshop'21

PointNet implemented with specially designed quantum circuits that provide exponential speed-up on the per-point layers. Quantum PointNet is tested on IonQ processors, with ModelNet3 accuracy on par with classical PointNet.

Paper

TensorFlow Solver for Quantum PageRank in Large-Scale Networks

Hao Tang*, Ruoxi Shi*, TianShen He*, YanYan Zhu, TianYu Wang, Marcus Lee, XianMin Jin

Science Bulletin'21

Quantum Stochastic Walk (QSW), on which the Quantum PageRank algorithm is based, is a fundamental formulation of quantum many-body interaction with strong local restriction fields. In this work we reformulate the QSW master equation to exploit more of the problem's structure and develop a simulator that fully leverages the power of GPU parallel computation. We achieved 500x lower memory consumption and a 2000x speed-up when quantum page-ranking a real-world large-scale network of airports.

Paper

GRAPHICS

I am interested in visual arts but admittedly untalented. So I decided to use my knowledge of 3D to create 3D CG artworks.

Hikari CG, created in celebration of Christmas

'Delightful!'

This is my favorite work so far. I did the modeling of the character, design of the scene and the rendering.

I am currently producing an animation series based on the story of Arcaea in my spare time. It is by no means easy to produce animation even with the latest production technologies at hand, but I feel good doing it.

A picture of a church

ENGINEERING

As I said before, I like to explore random things that pique my curiosity. Consequently, I have a bunch of side projects and achievements that somehow came together (some of which may be unexpected).

I am the lead programmer of the rhythm game Paradigm: Reboot.

Champion of DEFCON CTF 28 (2020) as a member of A*0*E.

I guess I have some talent for security, but I don't like taking it as a major or a job.

I am the founder and programmer of the Terraria mod Fourty-nine Fallen Stars.

Arguably my first project, and my initial motivation for learning programming. It started with me doing everything myself, from art and design to implementation. Later I got comrades. We did a good job and had several thousand players. Though the mod is no longer maintained for various reasons, it left us with lots of wonderful memories.

Beyond these projects and achievements, I have also developed many hobbyist projects. Here are a few that I am most proud of:

calibur

A sword against the inconsistent and ambiguous conventions (camera poses, viewport origins, etc.). A missing toolkit for CG/CV/Robotics.

torch.redstone

A lightweight library that helps to boost productivity using PyTorch by providing utilities like MLP, automatic mixed-precision and polyfills as well as a training loop that is highly customizable.

aptli

Tool for installing apt packages without root permission in local space. The name 'aptli' is short for 'apt local install'.

yapyjit

Yet Another Python JIT compiler. I have implemented a method-based JIT. However, due to the extreme dynamism of Python, I find it impossible to do some optimizations at the method level. Thus, I am now writing a trace-based JIT. Currently the tracing infrastructure is done, but I have barely started on the compiler and the mode-switching parts. Writing a compiler is labor-intensive, and I have not had much energy for this project lately.

SKIS

SKIS is a collection of toolkit infrastructure services. The trinity of distributed systems is computation, storage and communication. Computation and storage are in general local, but communication is not. Thus, SKIS seeks to provide only the communication part as a service. It includes a markdown pastebin and a general data pipe. I plan to add a service management portal for non-communication services.

Point Cloud Visualizer

Actually a by-product of writing the Skeleton Merger paper. At CVPR someone came up to me and said, 'Oh, the visualization is awesome, how did you do that?' So I cleaned up the code and built this feature-rich visualization interface.

MUSIC

I love music.

My world would be a lot duller without music decorating every corner of my life, from BGM at work to composition. I started playing the violin when I was 7. I was a core member of the SHS orchestra and performed at several public concerts. While learning composition I also taught myself some very basic piano and flute.

ak+q is my favorite artist. He is so good at chord progressions. Lively or lush, chill or splendid. All so great. His pieces leave me with the impression that they are breathing.

Composition Pieces

Departure

Dispersion

Forest Enoch

Remix Pieces

Merry Christmas

Never Gonna Give You Up

Lost Dream

The Cursed Flames