Joshua Cason’s Post

Senior Scala Dev at Demandbase

6mo Edited

Today I'm sharing this 🔥 video from our recent Engineering onsite at Demandbase HQ in SF! https://lnkd.in/eFAFR6QG

We were so fortunate to host Zhengxuan Wu and Aryaman Arora, two up-and-coming AI researchers from the Stanford University Department of Computer Science. Their approach to fine-tuning, called ReFT, targets the representations emitted by neural network layers rather than the weights of the layers themselves. This makes it very flexible: you can keep the same base model and swap between multiple fine-tunes. The intervention weights also add very little memory overhead.

Another highlight is how powerful it can be. One POC they showed forced the LLM to speak only in emojis, demonstrating ReFT's power to control output. Control is crucial when, for example, you want to prevent your bot from offering customers unauthorized refunds!!

If you're already familiar with LoRA, it should be pretty simple to use, with only a few extra hyperparameters (details in the video). Here's a link to the repo. Give them a star! https://lnkd.in/eJScNbzK

Thanks again for letting us host your first external talk, Aryaman and Zen. It's very exciting tech!!

ReFT: Representation Finetuning for Language Models -- Aryaman Arora & Zhengxuan (Zen) Wu

https://www.youtube.com/
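To make "editing representations instead of weights" concrete, here is a minimal NumPy sketch of the low-rank intervention from the ReFT paper (LoReFT), Φ(h) = h + Rᵀ(Wh + b − Rh), where R (r × d) has orthonormal rows. The hidden size, rank, random initialization, and the `interventions` dictionary used for swapping are illustrative assumptions, not the repo's API. The frozen base model is never modified; only one hidden vector h is edited.

```python
import numpy as np

def make_loreft(d: int, r: int, seed: int = 0):
    """Create illustrative LoReFT parameters (R, W, b) for hidden size d and rank r."""
    rng = np.random.default_rng(seed)
    # R (r x d) needs orthonormal rows: take the orthonormal columns of a QR factor and transpose.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))  # q has shape (d, r)
    R = q.T
    W = 0.01 * rng.standard_normal((r, d))  # learned projection (random stand-in here)
    b = np.zeros(r)                         # learned bias (zero stand-in here)
    return R, W, b

def loreft(h, R, W, b):
    """Phi(h) = h + R^T (W h + b - R h): only the r-dim subspace spanned by R's rows changes."""
    return h + R.T @ (W @ h + b - R @ h)

d, r = 16, 2
# Hypothetical setup: two "fine-tunes" share one frozen base model, each is a small (R, W, b) triple.
interventions = {"emoji": make_loreft(d, r, seed=1), "support": make_loreft(d, r, seed=2)}

h = np.random.default_rng(42).standard_normal(d)  # stand-in for one layer's hidden state
R, W, b = interventions["emoji"]                  # swap fine-tunes by picking a different key
h_emoji = loreft(h, R, W, b)

n_params = 2 * r * d + r  # R and W are r x d, b has r entries
```

Because R has orthonormal rows, the edited vector satisfies R·Φ(h) = Wh + b: the intervention overwrites h only inside an r-dimensional subspace and leaves the rest untouched. Each fine-tune costs just 2rd + r numbers, which is why swapping them on a shared base model is cheap.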

4 Comments

Demandbase

6mo

This is awesome!


Dave Jones

Consultant at KBSI

6mo

This is very cool, Josh. Thanks for sharing!


More Relevant Posts

Arcee AI

7,121 followers
6mo Edited
Here at Arcee AI we're growing fast 🚀🚀🚀 Among a crew of new hires starting next week, we'd like to extend a special congratulations to Lucas Atkins and Fernando Fernandes Neto for the publication of this paper, which they co-authored: "Spectrum: Targeted Training on Signal to Noise Ratio." The paper explains why post-training of large language models (LLMs) is challenging due to high computational demands, and introduces Spectrum, a method that accelerates LLM training by selectively targeting layer modules based on their signal-to-noise ratio (SNR) and freezing the rest. Using Random Matrix Theory and the Marchenko-Pastur distribution, Spectrum identifies and trains only the most informative layers. This approach matches the performance of full fine-tuning while reducing GPU memory usage and training time. Check out the full paper below, and join me in welcoming Lucas and Fernando to the Arcee AI research team 👏👏👏 https://hubs.li/Q02BRVPD0

Spectrum: Targeted Training on Signal to Noise Ratio

arxiv.org
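A rough NumPy sketch of the selection idea: compute each weight matrix's singular values, treat those below the Marchenko-Pastur upper edge σ(√m + √n) as noise, and rank modules by the ratio of "signal" to "noise" singular-value mass. Estimating σ from the matrix's own standard deviation, this exact SNR formula, the `top_fraction` cut-off, and the module names are simplifying assumptions; Spectrum's actual implementation may compute these differently.

```python
import numpy as np

def mp_edge(sigma: float, m: int, n: int) -> float:
    """Approximate largest singular value of an m x n pure-noise matrix with entry std sigma."""
    return sigma * (np.sqrt(m) + np.sqrt(n))

def snr(weight: np.ndarray) -> float:
    """Ratio of singular-value mass above the Marchenko-Pastur edge to the mass below it."""
    m, n = weight.shape
    sigma = weight.std()  # assumption: crude estimate of the noise scale
    s = np.linalg.svd(weight, compute_uv=False)
    edge = mp_edge(sigma, m, n)
    signal, noise = s[s > edge].sum(), s[s <= edge].sum()
    return float(signal / noise) if noise > 0 else float("inf")

def select_modules(weights: dict, top_fraction: float = 0.5):
    """Return the names of the highest-SNR modules to train; all others stay frozen."""
    scores = {name: snr(w) for name, w in weights.items()}
    k = max(1, int(len(scores) * top_fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k], scores

rng = np.random.default_rng(0)
noise_only = rng.standard_normal((200, 100))  # hypothetical module with no structure
u, v = rng.standard_normal(200), rng.standard_normal(100)
# Hypothetical module with one strong rank-1 direction on top of the same noise.
with_signal = noise_only + 100 * np.outer(u / np.linalg.norm(u), v / np.linalg.norm(v))
chosen, scores = select_modules({"layer.0.mlp": noise_only, "layer.1.mlp": with_signal})
```

The pure-noise module has almost no singular values above the edge, so it scores near zero and stays frozen, while the module with a strong structured direction is selected for training.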

Bob Hayes, PhD

Data Science, Customer Experience/Success, Machine Learning
3mo
#LLMs Will Always Hallucinate "...Structural Hallucination as an intrinsic nature of these systems. By establishing the mathematical certainty of #hallucinations, we challenge the prevailing notion that they can be fully mitigated." https://buff.ly/3XlV0Zk #GenerativeAI

LLMs Will Always Hallucinate, and We Need to Live With This

arxiv.org

Ashutosh Hathidara

Senior ML Scientist @SAP AI | Machine Learning Researcher | Opensource Creator | Motion Graphics Designer
3w
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models. It is a modular, open-source framework that streamlines the research, training, and evaluation of MoE algorithms, built on three core principles: (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation. The paper benchmarks five state-of-the-art MoE algorithms across three different LLMs and 11 datasets in the zero-shot setting. #MachineLearning #GenAI #LLM #MoE #DataScience
Ramin Mehran

Tech Lead @ Google DeepMind Multi-Modal perception/generation, AI Breakdown Podcaster
1mo
In this episode, we discuss Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization by Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar. The paper presents HyperCloning, a technique for initializing large language models with smaller, pre-trained models to leverage their predictive power. This method allows large models to require less training time and fewer GPU hours by scaling up small models while preserving their functionalities. HyperCloning offers a viable solution to efficiently manage the high costs and time investments in training large language models.

Arxiv Paper - Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

podbean.com
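One simple way to "scale up a small model while preserving its functionality" can be sketched with a two-layer MLP: double every layer's width and split each weight matrix into four half-weighted blocks, so that feeding the duplicated input [x, x] to the large network yields [f(x), f(x)]. This is an illustrative function-preserving expansion in the spirit of HyperCloning, not necessarily the paper's exact scheme; the layer sizes and random weights are made up for the example.

```python
import numpy as np

def clone_linear(W: np.ndarray, b: np.ndarray):
    """Double a linear layer's input and output width so that [x, x] -> [Wx + b, Wx + b]."""
    W_big = np.block([[W / 2, W / 2], [W / 2, W / 2]])
    b_big = np.concatenate([b, b])
    return W_big, b_big

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with a ReLU hidden layer."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)  # small model: 4 -> 8 -> 3
W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)
big = clone_linear(W1, b1) + clone_linear(W2, b2)              # large model: 8 -> 16 -> 6

x = rng.standard_normal(4)
y_small = mlp(x, W1, b1, W2, b2)
y_big = mlp(np.concatenate([x, x]), *big)  # equals [y_small, y_small]
```

Each output block receives W/2·x + W/2·x = Wx, and ReLU acts elementwise, so the duplication survives every layer. Training the large model then starts from the small model's function instead of from random weights.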

Cristian Eugen Liciu

I'm particularly interested in understanding human-robot collaboration and engineering learning-based methods that enrich that collaboration.
7mo
Put simply, the three libraries all work to provide human-like reasoning across certain functions, such as computer programming, task planning, and robotic tasks: #LILO (library induction from language observations), #Ada (action domain acquisition), and #LGA (language-guided abstraction). Building on the neurosymbolic method baked into LILO, MIT uses its Stitch (get it?) algorithm to identify abstractions. This allows #LLMs to apply commonsense knowledge with sophistication that...

A New AI Discovery Sure Looks Like the Dawn of True Machine Reasoning

newscaf.com
Walter C.

Founder & CEO @ Tipalo - COGNITIVE EDGE AI acting in real-time will usher in a new era of philosophy, logical thinking & space technology
4mo
I NEVER USED LLMS. REASONS:
1. They cripple your own ability to express yourself.
2. They make you lazy, so you no longer do any research.
3. Slowly but surely, they make you complacent, because you stop analyzing information and summarizing your own thoughts.
METAPHOR
As a comparison, it is nice to have a car and drive everywhere; there is no need to walk, let alone run. But if you keep eating at the same rate, you get fat, live an unhealthy lifestyle, and slowly get sick.
EPILOGUE
"There is nothing like writing to force you to think and get your thoughts straight." Warren Buffett
LINKS
LLMS IMPLY DECEPTION TO PLEASE, WHICH IS THE SAME AS SEX FOR MONEY - AS LONG AS HE PAYS, GIVE THE CUSTOMER WHAT HE WANTS REGARDLESS OF THE CONSEQUENCES https://lnkd.in/gKtnjbEh
WHAT DO YOU THINK CHATGPT REALLY IS - NOTHING BUT A RECREATIONAL DRUG USE https://lnkd.in/d2w5QSi6
EVERY LANGUAGE IS ACTUALLY LOGIC - DESCRIBING THE REPRESENTATION, ORGANIZATION & FLOW OF INFORMATION IN OUR MIND, SOMETIMES WITH FEEDBACK https://lnkd.in/gPrSqRUH
Gilbert Paquet, M.A.
4mo Edited

Everything has been said about large language models (#LLMs). Except this: switch them off. It doesn't work. Why? "The best material model for a cat is another cat, or preferably the same cat." (Norbert Wiener and Arturo Rosenblueth. 1945. "The Role of Models in Science." Philosophy of Science, Vol. 12, No. 4, pp. 316-321.) Period :-)
Jayaraman Thiagarajan

Generative AI Researcher
5mo Edited
#ECCV2024 New paper alert: We introduce DECIDER to detect failures in vision models using priors from LLMs and VLMs. DECIDER produces SoTA results under covariate shifts, style changes, spurious correlations, and class imbalance. Preprint & code coming out soon! DECIDER leverages large language models (LLMs) to identify the essential attributes relevant to a task. It then constructs a debiased classifier by aligning its features with these attributes using a vision-language model (VLM). DECIDER detects failures by comparing the predictions from the original and debiased models. Joint work with Kowshik Thopalli, Rakshith Subramanyam, Vivek Sivaraman Narayanaswamy; Lawrence Livermore National Laboratory
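The last step, flagging failures by comparing the original model with the debiased one, can be sketched as below. The softmax outputs, the total-variation disagreement score, the 0.5 threshold, and the toy logits are assumptions for illustration; the paper defines how DECIDER actually scores disagreement.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def flag_failures(original_logits, debiased_logits, threshold=0.5):
    """Flag samples where the original and debiased models disagree.

    A sample is flagged if the predicted labels differ or the total-variation
    distance between the two predictive distributions exceeds `threshold`.
    """
    p, q = softmax(original_logits), softmax(debiased_logits)
    scores = 0.5 * np.abs(p - q).sum(axis=-1)  # total variation distance, in [0, 1]
    label_mismatch = p.argmax(axis=-1) != q.argmax(axis=-1)
    return label_mismatch | (scores > threshold), scores

original = np.array([[4.0, 0.0, 0.0],   # confident class 0
                     [3.0, 0.0, 0.0]])  # confident class 0, e.g. driven by a spurious cue
debiased = np.array([[4.0, 0.0, 0.0],   # agrees with the original model
                     [0.0, 3.0, 0.0]])  # attribute-aligned model prefers class 1
flags, scores = flag_failures(original, debiased)
```

Here the first sample is trusted and the second is flagged as a likely failure, because the debiased model, which relies on task-relevant attributes, contradicts the original prediction.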
Joshua Anang

18 y/o AI startup Founder | Revolutionizing healthcare with Artificial Intelligence.
2mo
MIT researchers found a way to cut the compute cost of Large Language Models with the L-Mul algorithm, which approximates floating-point multiplication using integer addition. A floating-point number can be written as (1 + m) × 2^e, where m is the mantissa fraction and e is the exponent. Multiplying two such numbers means adding the exponents and computing (1 + m_x)(1 + m_y) = 1 + m_x + m_y + m_x·m_y. L-Mul drops the expensive m_x·m_y term and replaces it with a small fixed offset 2^(−l(m)) that depends only on the number of mantissa bits, so the whole operation reduces to additions. It is said to achieve precision close to that of traditional floating-point multiplication: L-Mul with a 4-bit mantissa achieves precision comparable to float8_e4m3 multiplications, and L-Mul with a 3-bit mantissa outperforms float8_e5m2. This approach is mostly suited to scenarios where speed trumps precision. #tech #LargeLanguageModels #Transformers #MachineLearning #DeepLearning
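The core approximation (1 + m_x)(1 + m_y) ≈ 1 + m_x + m_y + 2^(−l(m)), with exponents added separately, can be sketched in plain Python. The sketch truncates each operand's mantissa to `mantissa_bits` bits and uses the offset rule l(m) as we understand it from the L-Mul paper; rounding, mantissa overflow, and special values (infinities, NaN) are simplified, and it computes in ordinary floats, so it illustrates the arithmetic rather than the hardware savings.

```python
import math

def offset_exponent(mantissa_bits: int) -> int:
    """l(m) as we understand it from the L-Mul paper: m if m <= 3, 3 if m == 4, else 4."""
    if mantissa_bits <= 3:
        return mantissa_bits
    if mantissa_bits == 4:
        return 3
    return 4

def split(x: float, mantissa_bits: int):
    """Write |x| as (1 + frac) * 2**exp, truncating frac to `mantissa_bits` bits."""
    sign = -1.0 if x < 0 else 1.0
    f, e = math.frexp(abs(x))  # |x| = f * 2**e with 0.5 <= f < 1
    frac = 2.0 * f - 1.0       # |x| = (1 + frac) * 2**(e - 1), 0 <= frac < 1
    scale = 2 ** mantissa_bits
    frac = math.floor(frac * scale) / scale
    return sign, frac, e - 1

def l_mul(x: float, y: float, mantissa_bits: int = 3) -> float:
    """Approximate x * y: add exponents, add mantissas, and replace fx * fy by 2**-l(m)."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sx, fx, ex = split(x, mantissa_bits)
    sy, fy, ey = split(y, mantissa_bits)
    mantissa = 1.0 + fx + fy + 2.0 ** -offset_exponent(mantissa_bits)
    return sx * sy * math.ldexp(mantissa, ex + ey)
```

For example, `l_mul(2.5, 1.5)` returns 3.75, which is exact only because 0.25 × 0.5 happens to equal the 2^−3 offset, while `l_mul(3.0, 3.0)` returns 8.5 instead of 9.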
Reuben Ong

Research Analyst (Data Science)
4mo
By removing the obfuscating layer of mathematical notation and natural-language presentation, and translating problems into code, a form that is more prevalent and more conducive to training models on, models are finally able to use math to solve math. https://lnkd.in/dQQc48Ni

Google DeepMind's AlphaProof MASSIVE MATH BREAKTHROUGH - AI teaches itself mathematical proofs

https://www.youtube.com/

Tanat Tonguthaisri, CISSP®

enabling digital services for Student Loan related activities while maintaining the highest security standard, the most compliant personal data protection and customer-centric data-driven innovation.
6mo
📢 Exciting update! Learn about achieving sparse activation in Small Language Models without retraining efforts in this new blog post. The post demonstrates the challenges and solutions for applying sparse activation to SLMs and how it can significantly reduce computing costs. Check out the full details and experimental results here: https://bit.ly/4elx3sU #LanguageModels #SparseActivation #MachineLearning
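Why sparse activation reduces computing costs can be shown with a generic sketch: once small-magnitude hidden activations are treated as zero, the next matrix multiply only needs the weight columns of the surviving units. Keeping the top `keep_fraction` of activations by magnitude, the layer sizes, and the synthetic activations are assumptions here, not necessarily the method in the linked blog post.

```python
import numpy as np

def top_k_indices(h: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Indices of the largest-magnitude activations; all others are treated as zero."""
    k = max(1, int(round(keep_fraction * h.size)))
    return np.argsort(np.abs(h))[-k:]

def sparse_matvec(W: np.ndarray, h: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """W @ h computed from only the weight columns of the kept activations."""
    return W[:, keep] @ h[keep]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))  # next layer's weights
h = np.zeros(256)                   # hidden activations, mostly zero
active = rng.choice(256, size=32, replace=False)
h[active] = rng.standard_normal(32)

keep = top_k_indices(h, keep_fraction=0.25)  # 64 of 256 columns
dense = W @ h
sparse = sparse_matvec(W, h, keep)
```

In this toy case the sparse product reads only a quarter of the weight columns and still matches the dense result, because every dropped activation was already zero; with real activations that are merely small, the result becomes an approximation whose error depends on how aggressively activations are dropped.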