Databricks at NeurIPS 2024 | Databricks Blog



Databricks is proud to be a platinum sponsor of NeurIPS 2024. The conference runs from December 10 to 15 in Vancouver, British Columbia.

Visit our Booth

Stop by booth #591 in the Expo Hall from December 10-12 to meet members of the team and learn about our latest work.

 

Demo

Join us as we demonstrate how MLflow Tracing and the Mosaic AI Agent Framework provide observability and automated evaluation as we iteratively improve the factuality and accuracy of a GenAI application built with DSPy. MLflow Tracing captures detailed information about LLM and agent inputs and outputs, allowing developers to quickly identify the source of bugs and unexpected behaviors. Additionally, the Mosaic AI Agent Framework, part of the Databricks Data Intelligence Platform, provides capabilities for improving the quality of GenAI applications through human feedback and automated evaluation.
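The core idea behind tracing can be illustrated with a minimal plain-Python sketch: a decorator that records each call's inputs, output, and latency as a span. This is a simplified stand-in for illustration, not the actual MLflow API (in MLflow, the `@mlflow.trace` decorator provides this capture), and the `retrieve`/`generate` functions are hypothetical pipeline steps.

```python
import functools
import time

# Illustrative trace store: one entry ("span") per traced call, so an
# unexpected answer can be traced back to the step that produced it.
TRACE_LOG = []

def trace(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        output = fn(*args, **kwargs)
        TRACE_LOG.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
            "latency_s": time.time() - start,
        })
        return output
    return wrapper

@trace
def retrieve(query):
    # Hypothetical retrieval step returning supporting documents.
    return ["doc about " + query]

@trace
def generate(query, docs):
    # Hypothetical generation step; a real agent would call an LLM here.
    return f"Answer to '{query}' grounded in {len(docs)} document(s)"

docs = retrieve("sparse upcycling")
answer = generate("sparse upcycling", docs)
print(len(TRACE_LOG))  # one span per traced call
```

With spans like these in hand, an evaluation harness can score each intermediate output, which is what makes iterative quality improvement tractable.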

Presentations and accepted publications

Talks

The Table Representation Learning (TRL) workshop is the premier venue for research into tabular data as a modality for representation learning and generative models. At this year's workshop, Matei Zaharia is the featured speaker for the session focused on natural language interfaces to tables.

 

Workshop Accepted Papers

In this work, we study the effectiveness of sparse upcycling against continued pretraining (CPT) across different model sizes, compute budgets, and pretraining durations. Our experiments show that sparse upcycling can achieve better quality, with improvements of over 20% relative to CPT in certain scenarios. However, this comes with a significant inference cost, leading to 40% slowdowns in high-demand inference settings for larger models. Our findings highlight the trade-off between model quality and inference efficiency, offering insights for practitioners seeking to balance model quality and deployment constraints.

 

This paper presents a comprehensive study of the impact of increased context length on RAG performance across 20 popular open source and commercial LLMs. We run RAG workflows while varying the total context length from 2,000 to 128,000 tokens (and 2 million tokens when possible) on three domain-specific datasets, and report key insights on the benefits and limitations of long context in RAG applications. Our findings reveal that while retrieving more documents can improve performance, only a handful of the most recent state-of-the-art LLMs can maintain consistent accuracy at long context above 64k tokens. We also identify distinct failure modes in long context scenarios, suggesting areas for future research.

 

In this work, we explore the use of MixAttention, a model architecture modification that combines sliding window attention, where only a small subset of recent tokens is stored in the KV cache, with KV cache sharing across layers. Our experiments show that MixAttention significantly reduces memory usage and improves inference speed without sacrificing model performance in both short and long-context tasks. We also explore various configurations of this architecture, identifying those that maintain quality across evaluation metrics while optimizing resource efficiency.
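The two mechanisms combined above can be sketched in a few lines: a KV cache that retains only the most recent `window` tokens, and cross-layer sharing modeled as several layers referencing one cache object. This is an illustrative sketch under assumed names and sizes, not the paper's implementation.

```python
from collections import deque

class SlidingWindowKVCache:
    """Keeps keys/values for only the most recent `window` tokens."""

    def __init__(self, window):
        self.window = window
        # deque(maxlen=...) evicts the oldest entry automatically,
        # bounding memory at O(window) instead of O(sequence length).
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

# Cross-layer KV sharing: several layers reference the same cache
# object, so memory does not grow with the number of sharing layers.
shared_cache = SlidingWindowKVCache(window=4)
layers = [shared_cache, shared_cache, shared_cache]

for t in range(10):  # simulate processing 10 tokens
    shared_cache.append(f"k{t}", f"v{t}")

print(len(shared_cache))        # capped at the window size
print(list(shared_cache.keys))  # only the most recent tokens remain
```

The memory saving is multiplicative: a window of `w` tokens shared across `g` layers stores `w` entries where a standard per-layer full cache over a sequence of length `n` would store `g * n`.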

 

We introduce Critique-out-Loud (CLoud) RLHF reward models that reason explicitly about the quality of a response from an LLM assistant. CLoud reward models operate by first generating a natural language critique of the assistant's response that is then used to predict a scalar reward for the quality of the response. We demonstrate the success of CLoud reward models for both Llama-3-8B and 70B base models: compared to classic reward models, CLoud reward models improve pairwise preference classification accuracy on RewardBench by 4.65 and 5.84 percentage points for the 8B and 70B base models respectively. Furthermore, CLoud reward models lead to a Pareto improvement for win rate on ArenaHard when used as the scoring model for Best-of-N. Finally, we explore how to exploit the dynamic inference compute capabilities of CLoud reward models by performing self-consistency decoding for reward prediction.
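Self-consistency decoding for reward prediction amounts to sampling several critiques of the same response, scoring each, and aggregating the resulting scalar rewards. The sketch below uses a stubbed scorer: `sample_critique_and_reward` is a hypothetical stand-in for a real CLoud model call, with random noise simulating the variation across sampled critiques.

```python
import random
import statistics

def sample_critique_and_reward(response, rng):
    # Hypothetical stand-in for a CLoud reward model call: a real model
    # would generate a natural-language critique of `response`, then
    # predict a scalar reward conditioned on that critique. A random
    # offset simulates variation across sampled critiques.
    critique = f"Critique draft of: {response[:20]}..."
    reward = 0.7 + rng.uniform(-0.1, 0.1)
    return critique, reward

def self_consistent_reward(response, num_samples=8, seed=0):
    # Self-consistency: average the scalar rewards obtained from
    # several independently sampled critiques of the same response.
    rng = random.Random(seed)
    rewards = [sample_critique_and_reward(response, rng)[1]
               for _ in range(num_samples)]
    return statistics.mean(rewards)

score = self_consistent_reward("The capital of France is Paris.")
print(round(score, 3))
```

Spending more inference compute (a larger `num_samples`) reduces the variance of the reward estimate, which is what makes this a dynamic compute knob at scoring time.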

 

Join our Team

Interested in working with us? We're hiring! Check out our open jobs and join our growing research team.

 
