
A reverse-engineering tool for Nim-compiled binaries


The Nim programming language has become increasingly attractive to malware developers due to its robust compiler and its ability to work easily with other languages. Nim’s compiler can compile Nim to JavaScript, C, C++, and Objective-C, and cross-compile for major operating systems such as Windows, Linux, macOS, Android, and iOS. Additionally, Nim supports importing functions and symbols from the languages mentioned above, and importing from dynamically linked libraries for Windows and shared libraries for Linux. Nim wrapper modules are also available, such as Winim, that make interaction with the operating system painless. All these capabilities allow easy integration of Nim into development pipelines using these languages and boost the development of new tools, both benign and malicious.

It’s no surprise, then, that ESET Research has seen ongoing use of malware developed in Nim in the wild. As far back as 2019, Sednit was observed using a malicious downloader written in Nim. Another infamous group playing the Nim game, and the impetus for creating Nimfilt, is the Mustang Panda APT group. ESET Research recorded Mustang Panda using Nim in its toolset for the first time in a campaign against a governmental organization in Slovakia in August 2023. The malicious DLL detected – and used as part of the group’s classic trident Korplug loader – was written in Nim.

For researchers tasked with reverse engineering such binaries, Nimfilt is a powerful tool to speed up analysis. While Nimfilt can be run as a Python script both on the command line (with a subset of its functionality) and in Hex-Rays’ IDA program, it will be presented here primarily as a Python plugin for IDA.

Initializing Nimfilt in IDA

When IDA is first opened, it loads and initializes any plugins in the IDA plugins directory. During the initialization of Nimfilt, the plugin uses basic heuristics to determine whether the disassembled binary was compiled with the Nim compiler. If one of the following checks passes, Nimfilt determines that this compiler was used (a sketch of the logic follows the list):

  • The binary contains both of the following strings:
  • The binary contains any of the following well-known Nim function names:
    • NimMain
    • NimMainInner
    • NimMainModule
  • The binary contains at least two of the following error message strings:
    • @value out of range
    • @division by zero
    • @over- or underflow
    • @index out of bounds
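
These heuristics are simple enough to reproduce outside of IDA. Below is a minimal sketch (not Nimfilt’s actual code) of the same triage logic over a file’s raw bytes; note that the pair of strings used by the first check is not reproduced in this article, so it is left as a placeholder here.

    NIM_FUNC_NAMES = [b"NimMain", b"NimMainInner", b"NimMainModule"]
    NIM_ERROR_STRINGS = [
        b"@value out of range",
        b"@division by zero",
        b"@over- or underflow",
        b"@index out of bounds",
    ]
    NIM_MARKER_STRINGS = []  # the string pair from the first check, elided above

    def looks_like_nim(path):
        data = open(path, "rb").read()
        # Check 1: both marker strings are present (placeholder list).
        if NIM_MARKER_STRINGS and all(s in data for s in NIM_MARKER_STRINGS):
            return True
        # Check 2: any well-known Nim function name is present.
        if any(name in data for name in NIM_FUNC_NAMES):
            return True
        # Check 3: at least two of the error message strings are present.
        return sum(s in data for s in NIM_ERROR_STRINGS) >= 2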

YARA rules are provided along with Nimfilt that make similar checks to determine whether an ELF or PE file has been compiled with Nim. Together, these checks are much more robust than the approach taken by other tools, such as Detect It Easy, which currently only checks the .rdata section of PE files for the string io.nim or fatal.nim.

As the final initialization step, if Nimfilt’s AUTO_RUN flag is set to true, the plugin runs immediately. Otherwise, Nimfilt can be run as usual from IDA’s plugins menu, as shown in Figure 1.

Figure 1. Initializing and running the Nimfilt plugin in IDA

Demangling with Nimfilt

Nim uses a custom name mangling scheme that Nimfilt can decode. During a run, Nimfilt iterates through each function name in the binary, checking whether the name is a Nim package or function name. Discovered names are renamed to their demangled forms.

Interestingly, these names can leak details about the developer’s environment, in much the same way as PDB paths. That is due to the Nim compiler adding the file path to the name during mangling – Nimfilt reveals the path upon demangling.

For example, function names from third-party packages are stored as absolute paths during the mangling process. Figure 2 shows a function name that is stored as an absolute path revealing the version and checksum of the nimSHA2 package used, along with the developer’s installation path for nimble – Nim’s default package manager.

python nimfilt.py GET_UINT32_BE__6758Z85sersZ85serOnameZOnimbleZpkgs50Znim837265504548O49O494554555453d57a4852c515056c5452eb5354b51fa5748f5253545748505752cc56fdZnim83726550_u68

C:/Users/User.name/.nimble/pkgs2/nimSHA2-0.1.1-6765d9a04c328c64eb56b3fa90f45690294cc8fd/nimSHA2::GET_UINT32_BE u68

Figure 2. Demangling the name of a function from a third-party package

In contrast, Figure 3 shows the name of a function from a standard Nim package stored as a relative path (that is, relative to the Nim installation path).

python nimfilt.py toHex__pureZstrutils_u2067

pure/strutils::toHex u2067

Figure 3. Demangling the name of a function from a standard Nim package

However, names are not always mangled in the same way. Figure 4 shows that the same function name above from the nimSHA2 package is stored on Linux as a relative path.

python nimfilt.py GET_UINT32_BE__OOZOOZOOZhomeZalexZOnimbleZpkgs50Znim837265504548O49O494554555453d57a4852c515056c5452eb5354b51fa5748f5253545748505752cc56fdZnim83726550_u49

../../../home/alex/.nimble/pkgs2/nimSHA2-0.1.1-6765d9a04c328c64eb56b3fa90f45690294cc8fd/nimSHA2::GET_UINT32_BE u49

Figure 4. Demangling the name of a function from a third-party package on Linux

Package initialization functions are mangled in an entirely different way: the package name is stored as a file path (including the file extension) placed before the function name, and an escaping scheme is used to represent certain characters like forward slashes, hyphens, and dots. Upon demangling, Nimfilt cleans up the package name by removing the .nim file extension, as shown in Figure 5.

python nimfilt.py atmdotdotatsdotdotatsdotnimbleatspkgsatswinimminus3dot9dot1atswinimatsincatswinbasedotnim_DatInit000

../../.nimble/pkgs/winim-3.9.1/winim/inc/winbase::DatInit000

Figure 5. Demangling the name of an initialization function from a third-party package

Figure 6 shows how names of initialization functions from native packages are stored as absolute paths.

python nimfilt.py atmCatcatstoolsatsNimatsnimminus2dot0dot0atslibatssystemdotnim_Init000

C:/tools/Nim/nim-2.0.0/lib/system::Init000

Figure 6. Demangling the name of an initialization function from a native package
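
The escape schemes visible in Figures 2 through 6 can be approximated in a few lines of Python. The sketch below is inferred solely from the examples above, not taken from Nimfilt’s source: for ordinary names, Z appears to encode a slash, O a dot, and pairs of decimal digits an ASCII code; for initialization functions, the tokens ats, atc, dot, and minus appear to encode /, :, ., and -, behind an atm prefix.

    import re

    def decode_path(s):
        # Inferred escape scheme: Z -> '/', O -> '.', and a pair of decimal
        # digits -> one ASCII character; other characters pass through.
        out, i = [], 0
        while i < len(s):
            c = s[i]
            if c == "Z":
                out.append("/"); i += 1
            elif c == "O":
                out.append("."); i += 1
            elif c.isdigit():
                out.append(chr(int(s[i:i + 2]))); i += 2
            else:
                out.append(c); i += 1
        return "".join(out)

    def demangle(name):
        # Ordinary names: <function>__<escaped path>_u<id> (Figures 2-4).
        m = re.fullmatch(r"(.+?)__(.+)_u(\d+)", name)
        if m:
            func, path, uid = m.groups()
            return "%s::%s u%s" % (decode_path(path), func, uid)
        # Initialization functions: atm<escaped path>dotnim_<func>
        # (Figures 5 and 6); the .nim extension is dropped, as Nimfilt does.
        m = re.fullmatch(r"atm(.+)dotnim_((?:Dat)?Init\d+)", name)
        if m:
            path, func = m.groups()
            for token, char in (("ats", "/"), ("atc", ":"), ("minus", "-"), ("dot", ".")):
                path = path.replace(token, char)
            return "%s::%s" % (path, func)
        return name

    # demangle("toHex__pureZstrutils_u2067") -> "pure/strutils::toHex u2067"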

In IDA, Nimfilt’s name demangling process is followed by the creation of directories in the Functions window to organize functions according to their package name or path, as shown in Figure 7.

Figure 7. The IDA Functions window before (left) and after (right) Nimfilt organizes function names by package or path

Applying structs to Nim strings

The last action performed during a run of Nimfilt is applying C-style structs to Nim strings. Just as strings in some other programming languages are objects rather than null-terminated sequences of bytes, so are strings in Nim. Figure 8 shows how the string ABCDEF appears in IDA before and after running Nimfilt. Note that, in disassembled form, a Nim-compiled binary uses the prefix _TM as part of the temporary name of some variables; these are typically Nim strings.

Figure 8. A Nim string before (left) and after (right) running Nimfilt

Nimfilt iterates through each address in the .rdata or .rodata segment, as well as in any other read-only data segment, looking for Nim strings. Structs are applied to any discovered strings; the struct contains a length field and a pointer to the payload consisting of the characters in the string.
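
As a rough illustration of that layout, the sketch below parses one candidate Nim string from a blob of read-only data. The field widths and ordering are assumptions for illustration only (a pointer-sized length followed by a pointer to the payload, per the description above); the actual layout depends on the Nim version and target, and Nimfilt handles those details itself.

    import struct

    def read_nim_string(data, offset, image_base):
        # Assumed 64-bit little-endian layout: a length field followed by
        # a pointer to the character payload.
        length, payload_va = struct.unpack_from("<qQ", data, offset)
        payload_off = payload_va - image_base  # translate VA -> file offset
        if not (0 < length < 0x10000 and 0 <= payload_off <= len(data) - length):
            return None  # not a plausible Nim string
        return data[payload_off:payload_off + length].decode("latin-1")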

Wrap-up

On its way to being compiled as an executable, Nim source code is typically translated to C or C++; however, this process doesn’t completely remove all traces of Nim. By taking a journey through the Nim compiler source code, we have unraveled some of the paths taken in the compilation process and were thus able to build Nimfilt as a Python tool, and IDA plugin, to assist in this untangling.

In short, whether or not you are new to Nim, turning to Nimfilt will make your reverse engineering work with Nim-compiled binaries almost instantly easier and more focused. By no means, however, is Nimfilt’s development at a standstill; we are working on more features to handle double mangling, and to improve the formatting of demangled names and the grouping of package names.

Nimfilt’s source code and documentation are available in a repository hosted on ESET’s GitHub organization at https://github.com/eset/nimfilt.

No needles necessary: This wearable can monitor your cholesterol and lactate levels



NUS wearable to detect skin biomarkers

NUS Institute for Health Innovation & Technology

Researchers in Singapore have developed a wearable that bypasses the need to collect biofluid samples and can help detect medical conditions like strokes earlier.

According to a report on the project, the stretchable hydrogel-based sensor tracks the user’s biomarkers — chemicals found in blood and other body fluids — including cholesterol and lactate levels, and is worn directly on the skin. The initiative is led by researchers from the National University of Singapore (NUS) and government agency A*STAR’s Institute of Materials Research and Engineering (A*STAR’s IMRE).

The wearable offers a non-invasive way to facilitate continuous, real-time monitoring and early detection of diseases, such as cardiovascular disease and stroke. This addresses the limitations of current methods of collecting biofluid samples like blood, urine, and sweat.


The wearable can also monitor athletes’ lactate levels to detect signs of exhaustion and tissue hypoxia, which affect their performance.

“Detecting diseases early requires the rapid, continuous, and convenient monitoring of vital biomarkers,” the researchers noted. “This development is especially pertinent to areas including chronic disease management, population-wide screening, remote patient monitoring, and sport physiology.”

The process of collecting biomarkers for analysis can be inconvenient and miss real-time trends. Doctors may have to induce sweating with drugs when collecting fluids from inactive individuals, which can be uncomfortable.


Solid-state epidermal biomarkers have emerged as an alternative form of health indicators. Found in the stratum corneum, or outermost layer of the skin, these biomarkers have been shown to correlate with diseases such as diabetes, hyperlipoproteinemia, and cardiovascular conditions.

However, they are difficult to detect, as traditional devices lack the necessary components to track solid-state epidermal biomarkers.

The Singapore researchers said their wearable allows these biomarkers to dissolve and diffuse through the ionic conductive hydrogel layer and undergo electrochemical reactions between this ionic layer and an electronically conductive hydrogel layer.

The required physiological data is then transmitted wirelessly to an external user interface via a flexible printed circuit board, providing continuous monitoring capabilities.


In clinical studies, the sensor indicated correlations between biomarkers detected on the skin and those found in blood samples, suggesting its accuracy, the researchers said. It can also detect solid-state lactate and cholesterol at low levels.

“This wearable sensor is the first in the world that can monitor biomarkers on dry or non-sweaty skin,” said Yang Le, principal scientist and head of A*STAR’s IMRE’s sensors and flexible electronics division. “The sensor’s bilayer hydrogel electrode interacts with and detects biomarkers on our skin, allowing them to become a new class of health indicators. The stretchable design enhances comfort and accuracy as well, by adapting to our skin’s natural elasticity.”

Yang said the wearable can change health and lifestyle monitoring, particularly for those with chronic conditions that require constant monitoring.

For example, it can potentially replace regular finger-prick tests for diabetics, said Liu Yuxin, an assistant professor from NUS’ Institute for Health Innovation & Technology and N.1 Institute for Health. Liu added that it can also be applied to glucose tolerance tests, so pregnant women won’t have to be subject to multiple blood draws.


The Singapore researchers are now working to enhance the wearable’s performance by boosting its sensitivity and working time, and adding support for other biomarkers.

They are also working with hospitals to provide more clinical validation and enable patients to trial the technology, particularly for continuous glucose monitoring.

In 2021, NUS researchers developed VeCare, a bandage that can detect and wirelessly send medical information related to chronic wounds, such as temperature and bacteria type. Able to capture and transmit data in under 15 minutes, the wearable sensor can speed up the assessment of such wounds and provide more timely treatment.

Another research team from the Singapore university in 2021 also developed a way to tap the human body as a conduit for energy to power wearables. The technology can extract power from a single device, such as a mobile phone in the wearer’s pocket, to wirelessly charge other wearables placed on the body. It can also pull unused energy from electronic appliances in homes or offices to power wearables.



LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs


Current long-context large language models (LLMs) can process inputs of up to 100,000 tokens, yet they struggle to generate outputs exceeding even a modest length of 2,000 words. Controlled experiments reveal that the model’s effective generation length is inherently limited by the examples seen during supervised fine-tuning (SFT). In other words, this output limitation stems from the scarcity of long-output examples in existing SFT datasets.

Recent advancements in long-context LLMs have led to the development of models with significantly expanded memory capacities, capable of processing history exceeding 100,000 tokens in length. However, despite their ability to handle extensive inputs, current long-context LLMs struggle to generate equally long outputs.

To explore this limitation, LongWriter probes the maximum output length of state-of-the-art long-context models with multiple queries that require responses of varying lengths, such as “Write a 10,000-word article on the history of the Roman Empire.” The results show that all models consistently fail to produce outputs beyond 2,000 words in length. Meanwhile, analysis of user interaction logs reveals that over 1% of user prompts explicitly request outputs exceeding this limit, highlighting a pressing need in current research to overcome this limitation.

To address this, LongWriter introduces AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, LongWriter constructs LongWriter-6k, a dataset containing 6,000 SFT data samples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, LongWriter successfully scales the output length of existing models to over 10,000 words while maintaining output quality.

LongWriter also develops LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. The 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models.

In this article, we will discuss the LongWriter framework, explore its architecture, and compare its performance against state-of-the-art long-context large language models. Let’s get started.


LongWriter’s study reveals a key insight: the constraint on output length is primarily rooted in the characteristics of the Supervised Fine-Tuning (SFT) datasets. Specifically, LongWriter finds that a model’s maximum generation length is effectively capped by the upper limit of output lengths present in its SFT dataset, despite its exposure to much longer sequences during the pretraining phase. This finding explains the ubiquitous 2,000-word generation limit across current models, as existing SFT datasets rarely contain examples exceeding this length. Furthermore, as many datasets are distilled from state-of-the-art LLMs, they also inherit the output length limitation from their source models.

To address this limitation, LongWriter introduces AgentWrite, a novel agent-based pipeline designed to leverage off-the-shelf LLMs to automatically construct extended, coherent outputs. AgentWrite operates in two stages: First, it crafts a detailed writing plan outlining the structure and target word count for each paragraph based on the user’s input. Then, following this plan, it prompts the model to generate content for each paragraph in a sequential manner. LongWriter’s experiments validate that AgentWrite can produce high-quality and coherent outputs of up to 20,000 words.

Building upon the AgentWrite pipeline, LongWriter leverages GPT-4o to generate 6,000 long-output SFT data samples, named LongWriter-6k, and adds this data to train existing models. Notably, LongWriter-6k successfully unlocks the model’s ability to generate well-structured outputs exceeding 10,000 words in length. To rigorously evaluate the effectiveness of this approach, LongWriter develops the LongBench-Write benchmark, which includes a diverse set of user writing instructions, with output length specifications ranging from 0-500 words, 500-2,000 words, 2,000-4,000 words, and beyond 4,000 words. Evaluation on LongBench-Write shows that LongWriter’s 9B size model achieves state-of-the-art performance, even compared to larger proprietary models. LongWriter further constructs preference data and uses DPO to help the model better follow long writing instructions and generate higher quality written content, which has also been proven effective through experiments.

To summarize, LongWriter’s work makes the following novel contributions:

  • Analysis of Generation Length Limits: LongWriter identifies the primary factor limiting the output length of current long-context LLMs, namely the constraint on output length in the SFT data.
  • AgentWrite: To overcome this limitation, LongWriter proposes AgentWrite, which uses a divide-and-conquer approach with off-the-shelf LLMs to automatically construct SFT data with ultra-long outputs. Using this method, LongWriter constructs the LongWriter-6k dataset.
  • Scaling the Output Window Size of Current LLMs: LongWriter incorporates the LongWriter-6k dataset into its SFT data, successfully scaling the output window size of existing models to 10,000+ words without compromising output quality. LongWriter shows that DPO further enhances the model’s long-text writing capabilities.

AgentWrite: Automatic Data Construction

To utilize off-the-shelf LLMs for automatically generating SFT data with longer outputs, LongWriter designs AgentWrite, a divide-and-conquer style agent pipeline. AgentWrite first breaks down long writing tasks into multiple subtasks, with each subtask requiring the model to write just one paragraph. The model then executes these subtasks sequentially, and LongWriter concatenates the subtask outputs to obtain the final long output. Such an approach of breaking down a complex task into multiple subtasks using LLM agents has already been applied in various fields, such as problem-solving, software development, and model evaluation. LongWriter’s work is the first to explore integrating planning to enable models to complete complex long-form writing tasks. Each step of AgentWrite is introduced in detail below.

Step I: Plan

Inspired by the thought process of human writers, who typically start by making an overall plan for long writing tasks, LongWriter utilizes the planning capabilities of LLMs to output such a writing outline given a writing instruction. This plan includes the main content and word count requirements for each paragraph. The prompt used by LongWriter is as follows:

“I need you to help me break down the following long-form writing instruction into multiple subtasks. Each subtask will guide the writing of one paragraph in the essay and should include the main points and word count requirements for that paragraph. The writing instruction is as follows: {User Instruction}. Please break it down in the following format, with each subtask taking up one line:

Paragraph 1 – Main Point: [Describe the main point of the paragraph, in detail] – Word Count: [Word count requirement, e.g., 400 words]
Paragraph 2 – Main Point: [Describe the main point of the paragraph, in detail] – Word Count: [Word count requirement, e.g. 1000 words].

Make sure that each subtask is clear and specific, and that all subtasks cover the entire content of the writing instruction. Do not split the subtasks too finely; each subtask’s paragraph should be no less than 200 words and no more than 1000 words. Do not output any other content.”

Step II: Write

After obtaining the writing plan from Step I, LongWriter calls the LLM serially to complete each subtask, generating the writing content section by section. To ensure the coherence of the output, when LongWriter calls the model to generate the n-th section, the previously generated n−1 sections are also input, allowing the model to continue writing the next section based on the existing writing history. Although this serial approach prevents parallel calls to the model to complete multiple subtasks simultaneously, and the input length becomes longer, LongWriter shows in validation that the overall coherence and quality of the writing obtained this way are far superior to output generated in parallel. The prompt in use by LongWriter is:

“You are an excellent writing assistant. I will give you an original writing instruction and my planned writing steps. I will also provide you with the text I have already written. Please help me continue writing the next paragraph based on the writing instruction, writing steps, and the already written text.

Writing instruction:
{User Instruction}
Writing steps:
{The writing plan generated in Step I}
Already written text:
{Previous generated (n-1) paragraphs}

Please integrate the original writing instruction, writing steps, and the already written text, and now continue writing {The plan for the n-th paragraph, i.e., the n-th line in the writing plan}.”
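
Taken together, the two prompts reduce to a short plan-then-write loop. The sketch below is a minimal re-implementation of that loop, not LongWriter’s code: call_llm is a hypothetical stand-in for whatever chat-completion API is available, and PLAN_PROMPT and WRITE_PROMPT are the two templates quoted above with placeholders for the bracketed values.

    PLAN_PROMPT = "..."   # the Step I prompt above, with an {instruction} slot
    WRITE_PROMPT = "..."  # the Step II prompt above, with {instruction},
                          # {plan}, {text_so_far}, and {current_step} slots

    def agent_write(instruction, call_llm):
        # Step I: ask the model for a paragraph-by-paragraph plan,
        # one subtask per line.
        plan = call_llm(PLAN_PROMPT.format(instruction=instruction))
        steps = [line for line in plan.splitlines() if line.strip()]

        # Step II: generate each paragraph serially, feeding everything
        # written so far back in to keep the output coherent.
        written = []
        for step in steps:
            written.append(call_llm(WRITE_PROMPT.format(
                instruction=instruction,
                plan=plan,
                text_so_far="\n\n".join(written),
                current_step=step,
            )))
        return "\n\n".join(written)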

Validation

LongWriter tests the generation length and quality of the proposed AgentWrite method on two long-form writing datasets. The first, LongWrite-Ruler, is used to measure exactly how long an output the method can provide. The second, LongBench-Write, is mainly used to evaluate how well the model-generated content aligns with user instructions in terms of length and writing quality.

LongBench-Write: To evaluate the model’s performance on a more diverse range of long-form writing instructions, LongWriter collects 120 varied user writing prompts, with 60 in Chinese and 60 in English. To better assess whether the model’s output length meets user requirements, LongWriter ensures that all these instructions include explicit word count requirements. The instructions are divided into four subsets based on the word count requirements: 0-500 words, 500-2,000 words, 2,000-4,000 words, and over 4,000 words. Additionally, the instructions are categorized into seven types based on the output type: Literature and Creative Writing, Academic and Monograph, Popular Science, Functional Writing, News Report, Community Forum, and Education and Training.

During evaluation, LongWriter adopts two metrics: one for scoring the output length and another for scoring the output quality. The model’s output length is scored based on how close it is to the requirements specified in the instructions. For output quality, LongWriter uses the LLM-as-a-judge approach, selecting the state-of-the-art GPT-4o model to score the output across six dimensions: Relevance, Accuracy, Coherence, Clarity, Breadth and Depth, and Reading Experience. The final score is computed by averaging the length score and the quality score.
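
For concreteness, here is one way the two metrics could be combined. Note that the linear length penalty below is our illustrative assumption; the paper’s exact piecewise scoring function is not reproduced in this article.

    def length_score(required_words, actual_words):
        # Illustrative assumption only: full marks at the required word
        # count, decaying linearly with relative error, floored at zero.
        return 100.0 * max(0.0, 1.0 - abs(actual_words - required_words) / required_words)

    def final_score(s_length, s_quality):
        # The final LongBench-Write score averages the length score and
        # the GPT-4o quality score.
        return (s_length + s_quality) / 2

    # e.g. final_score(length_score(2000, 1500), 80.0) -> 77.5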

Validation results: LongWriter presents the output length measurement on LongWrite-Ruler and finds that AgentWrite successfully extends the output length of GPT-4o from a maximum of 2k words to approximately 20k words. LongWriter also assesses both the output quality and adherence to the required output length on LongBench-Write, showing that GPT-4o can successfully complete tasks with outputs under 2,000 words in length when evaluating AgentWrite’s performance.

Supervised Fine-Tuning

LongWriter conducts training based on two of the latest open-source models, namely GLM-4-9B and Llama-3.1-8B. Both of these are base models that support a context window of up to 128k tokens, making them naturally suitable for training on long outputs. To make training more efficient, LongWriter adopts packing training with loss weighting. Training the two models results in LongWriter-9B (short for GLM-4-9B-LongWriter) and LongWriter-8B (short for Llama-3.1-8B-LongWriter).

At the same time, LongWriter notices that if the loss is averaged by sequence, i.e., taking the mean of each sequence’s average loss within a batch, the contribution of each target token to the loss in long-output data would be significantly less than in data with shorter outputs. In LongWriter’s experiments, it is also found that this leads to suboptimal model performance on tasks with long outputs. Therefore, LongWriter chooses a loss weighting strategy that averages the loss by token, where the loss is computed as the mean of losses across all target tokens within the batch.
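
The difference between the two averaging strategies is easiest to see in code. The following is a generic PyTorch illustration, not LongWriter’s training code: token_losses holds the per-token cross-entropy for a batch of shape [batch, seq_len], and mask marks the target tokens.

    import torch

    def sequence_mean_loss(token_losses, mask):
        # Average within each sequence first, then across sequences: a short
        # output and a 30k-word output weigh the same overall, so each token
        # of the long output contributes far less to the gradient.
        per_seq = (token_losses * mask).sum(dim=1) / mask.sum(dim=1)
        return per_seq.mean()

    def token_mean_loss(token_losses, mask):
        # Average over all target tokens in the batch, so every target token
        # contributes equally regardless of sequence length. This is the
        # weighting LongWriter adopts.
        return (token_losses * mask).sum() / mask.sum()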

All models are trained on a node with 8xH800 80G GPUs using DeepSpeed+ZeRO3+CPU offloading. LongWriter uses a batch size of 8, a learning rate of 1e-5, and a packing length of 32k. The models are trained for 4 epochs, which takes approximately 2,500-3,000 steps.

Alignment (DPO)

To further improve the model’s output quality and enhance its ability to follow length constraints in instructions, LongWriter performs direct preference optimization (DPO) on the supervised fine-tuned LongWriter-9B model. The DPO data comes from GLM-4’s chat DPO data (roughly 50k entries). Additionally, LongWriter constructs 4k pairs of data specifically targeting long-form writing instructions. For each writing instruction, LongWriter samples four outputs from LongWriter-9B, scores these outputs following a specific method, and also factors in a length-following score. The highest-scoring output is then selected as the positive sample, and one of the remaining three outputs is randomly chosen as the negative sample.
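
Below is a minimal sketch of that pair-construction step, under stated assumptions: sample_model and score_output are hypothetical stand-ins for the LongWriter-9B sampler and the combined quality-plus-length-following scorer.

    import random

    def build_dpo_pair(instruction, sample_model, score_output, n_samples=4):
        # Sample several candidate outputs for one writing instruction.
        outputs = [sample_model(instruction) for _ in range(n_samples)]
        ranked = sorted(outputs, key=score_output, reverse=True)
        chosen = ranked[0]                    # highest-scoring output
        rejected = random.choice(ranked[1:])  # one of the remaining outputs
        return {"prompt": instruction, "chosen": chosen, "rejected": rejected}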

The resulting model, LongWriter-9B-DPO, is trained for 250 steps on the above data mixture. LongWriter follows a specific recipe for DPO training.

LongWriter: Experiments and Outcomes

LongWriter evaluates 4 proprietary models and 5 open-source models on LongBench-Write, along with the trained LongWriter models. To the best of LongWriter’s knowledge, Suri-IORPO is the only prior model that is also aligned for long-form text generation. It is trained based on Mistral-7B-Instruct-v0.2 using LoRA. Consistent with the evaluation setup on LongWrite-Ruler, LongWriter sets the output temperature to 0.5 and configures the model’s generation max tokens parameter to the maximum allowed by its API call. For open-source models, it is set to 32,768.

Most previous models are unable to meet the length requirement of over 2,000 words, while LongWriter models consistently provide longer and richer responses to such prompts.

Observing the output length score S_l for prompts in each required length range, LongWriter finds that previous models generally perform poorly (scoring below 70) on prompts in the [2k, 4k) range, with only Claude 3.5 Sonnet achieving a decent score. For prompts in the [4k, 20k) range, almost all previous models are completely unable to reach the target output length, even scoring 0 (meaning all output lengths are less than one-third of the required length). By adding training data from LongWriter-6k, LongWriter’s trained model can effectively reach the required output length while maintaining good quality, as suggested by the scores in the [2k, 20k) range and the scatter plots.

DPO effectively improves both the model’s output quality and its ability to follow length requirements in long generation. 

By comparing the scores of LongWriter-9B and LongWriter-9B-DPO, we find that DPO significantly improves both the S_l (+4%) and S_q (+3%) scores, and the improvement is consistent across all ranges. This shows that in the long generation scenario, DPO still helps to improve the model’s output quality and can better align the model’s output length with the requested length. The latter conclusion has also recently been observed in Yuan et al. (2024) on shorter generations. We also manually annotate pairwise wins and losses for GPT-4o and three LongWriter models on their outputs in LongBench-Write and visualize the results in Figure 9. We can see that humans prefer the DPO-trained model over LongWriter-9B in 58% of the cases. Moreover, despite having fewer parameters, LongWriter-9B-DPO achieves a tie with GPT-4o.

Figure 7: Cumulative average NLL loss of GLM-4-9B and Llama-3.1-8B at different positions of LongWriter models’ outputs.
Figure 8: LongWrite-Ruler test results of LongWriter models, showing their maximum generation lengths between 10k-20k words.

The output length limit of the LongWriter models is extended to between 10k and 20k words, while more data with long outputs is required to support even longer outputs. 

Following the LongWrite-Ruler test setup, we also present the LongWrite-Ruler results of the LongWriter models. The results suggest that their maximum generation lengths are between 10k and 20k words. The lack of SFT data with longer outputs is likely the primary reason preventing the models from achieving longer output lengths.

 

Final Thoughts

In this work, we have talked about LongWriter, which identifies a 2,000-word generation limit for current LLMs and proposes increasing their output window size by adding long-output data during alignment. To automatically construct long-output data, LongWriter develops AgentWrite, an agent-based pipeline that uses off-the-shelf LLMs to decompose ultra-long generation tasks into subtasks and create extended, coherent outputs. LongWriter successfully scales the output window size of current LLMs to over 10,000 words with the constructed LongWriter-6k. Extensive ablation studies on the training data demonstrate the effectiveness of this approach. For future work, LongWriter suggests the following three directions:

  • Expand the AgentWrite framework to construct data with longer outputs, to further extend LLMs’ output window size.
  • Refine the AgentWrite framework to achieve higher quality long-output data.
  • Address the inference-efficiency challenges that longer model outputs bring. Several methods have been proposed to improve inference efficiency; it is worth investigating how these can ensure improved model efficiency without compromising generation quality.

Xiaomi’s self-optimizing autonomous factory will make 10M+ phones a year



The 80,000-square-meter (860,000-sq-ft) facility, located in the Changping district on the northeast outskirts of Beijing, follows a pilot smart factory in Yizhuang, which produced about one million units a year of the company’s Mix Fold smartphone.

“There are 11 production lines,” says Xiaomi Founder and CEO Lei Jun in a short video, embedded below. “100% of the key processes are automated. We developed our entire manufacturing and production software to achieve this.”

The new factory will produce Xiaomi’s upcoming foldable phones, the MIX Fold 4 and the MIX Flip – to be launched later this month – spitting them out at a rate of about one every three seconds, 24/7.

According to GizmoChina, it maintains a very clean manufacturing environment, with micron-level dust removal – which would explain the operating-theater-style gowns and caps worn by the small number of humans working in the ‘war room’ of the Smart Factory in a second video, shown below.

Fully automated “lights out” factories, of course, have been around a while. Japanese robotics company Fanuc Ltd, for example, opened its first fully automated line back in 2001, and according to CNN Money, by 2003 it had a factory near Mt Fuji in which robots were building other robots, around 50 a day, working completely unsupervised for up to a month at a time.

But Xiaomi may have taken things up a notch, by allowing the AI brain of the factory to autonomously develop and optimize its processes over time.

“What’s most impressive,” says Lei Jun, “is that this platform can identify and solve issues, while also helping to improve the manufacturing process. This is truly incredible! This platform blew our colleagues away when they first saw it.”

A Weibo post expands on this, one section translating roughly to the following: “the 100% self-developed ‘Xiaomi Pengpai Intelligent Manufacturing Platform’ is the brain of the factory, injecting soul into the factory, allowing the entire factory to have self-perception, self-decision-making, and self-execution capabilities, and able to independently diagnose equipment problems, improve process flows, and realize full-scenario digital management from raw material procurement to delivery, becoming a true smart factory that can evolve on its own.”

Pretty remarkable stuff, and certainly a taste of what’s coming worldwide.

Source: Lei Jun



You May Be Exposing Your Family to Superbugs


Research indicates hospitals contribute to the local spread of antibiotic-resistant infections.

A recent study published in the journal Infection Control & Hospital Epidemiology by the Society for Healthcare Epidemiology of America suggests that family members of patients discharged from hospitals may be at an elevated risk of contracting antibiotic-resistant infections, commonly known as superbugs. This risk persists even when the patient themselves was not diagnosed with such an infection, indicating that hospitals may contribute to the spread of resistant bacteria in the community.

When recently hospitalized patients were diagnosed with the superbug — methicillin-resistant Staphylococcus aureus (MRSA) — the risk to family members living with them was even higher. The longer the relative’s hospital stay, even without an MRSA diagnosis, the higher the risk to family members.

“Patients can become colonized with MRSA during their hospital stay and transmit MRSA to their family members,” said Aaron Miller, PhD, lead researcher on the study and research assistant professor of internal medicine-infectious diseases at the University of Iowa. “This suggests hospitals contribute to the spread of MRSA into the community through discharged patients who are asymptomatic carriers.”

Recommendations for Hospitals

Miller recommends hospitals enhance infection control practices, including testing for MRSA colonization, especially at discharge, even when there are no signs of infection. He said MRSA colonization and infections could be tracked among hospital patients and their household contacts to identify and mitigate transmission more effectively.

“This important study illustrates the risk of spread of resistant pathogens related to healthcare and highlights the critical importance of core infection practices,” said SHEA President Thomas Talbot, M.D., chief hospital epidemiologist at Vanderbilt University Medical Center. Talbot was not involved with the research. “Hand hygiene, environmental cleaning, and standard interventions to reduce Staphylococcal colonization are critical to preventing the spread of resistant bacteria in healthcare settings.”

Understanding MRSA

MRSA infections are called superbugs because they don’t respond to common antibiotics, making them difficult to treat. MRSA typically occurs in people who have been in a hospital or another healthcare setting, such as a nursing home, but MRSA also spreads in communities outside the hospital, usually through skin-to-skin contact. Most people with MRSA have no symptoms, but the bacteria can cause painful swelling if it gets under the skin, and it can be deadly if it spreads to other parts of the body, such as the blood or lungs.

Researchers used a large database of insurance claims that included 158 million enrollees with two or more family members on the same plan to learn about how MRSA spread after someone in a household had been in the hospital.

Reviewing 424,512 MRSA cases among 343,524 insured people, the study found 4,724 cases of MRSA potentially transmitted to a family member from a relative who had recently been in the hospital and had a diagnosis of MRSA. They also found 8,064 potential transmissions of MRSA after the hospitalization of a family member who did not have an MRSA infection.

“It is important not to over-emphasize the hospital stay risk,” Miller said. “While we identified a significant risk factor for transmission in the household and community, the absolute risk remains relatively low.”

People exposed to a recently hospitalized family member with MRSA were more than 71 times, or 7,000%, more likely to get an MRSA infection compared to enrollees who did not have a family member who had been hospitalized or exposed to MRSA in the previous 30 days.

Having a family member in the household who was hospitalized but did not have MRSA increased the chances of a relative getting MRSA in the month after discharge by 44%.

The more time the family member spent in the hospital, the higher the likelihood that someone in their household would get MRSA. If the patient was in the hospital for one to three days in the previous month, the chance of a relative getting MRSA increased by 34% compared to people with no recent hospitalizations in their household. If a family member was hospitalized for four to 10 days, the chances of MRSA infection in a relative were 49% higher, and with hospitalizations longer than 10 days the odds of a relative in the same household getting an infection rose by 70% to 80%.

Other factors associated with MRSA infections among household members included the number of other illnesses, prior antibiotic usage, and the presence of young children in the family.

Reference: “Hospitalizations among family members increase the risk of MRSA infection in a household” by Aaron C. Miller, Alan T. Arakkal, Daniel K. Sewell, Alberto M. Segre, Bijaya Adhikari, Philip M. Polgreen and For The CDC MInD-Healthcare Group, 7 August 2024, Infection Control & Hospital Epidemiology.
DOI: 10.1017/ice.2024.106