
Scalable ML & Data Testing


In today’s digital platforms, from shopping apps and streaming services to health trackers and customer portals, machine learning is central to how systems personalize experiences, automate decisions, and respond to user actions. But no matter how advanced a model is, it can fail if the data feeding it isn’t reliable.

Author: Naga Harini Kodey, https://www.linkedin.com/in/naga-harini-k-3a84291a/

With constant streams of user interactions (clicks, swipes, logins, transactions, and events) flowing through these systems, maintaining data accuracy becomes a foundational requirement. Broken data pipelines, inconsistent feature values, and unmonitored changes can lead to silent failures. These failures often go unnoticed until user satisfaction drops or key business metrics take a hit.

As a Principal QA Engineer, I’ve collaborated closely with engineers, analysts, and data scientists to test machine learning pipelines end-to-end. This article outlines practical QA strategies and hands-on techniques that can be applied across platforms driven by real-time or batch user data, helping teams prevent issues before they affect production.

Where Things Go Wrong in ML Pipelines for User Systems

User-driven platforms collect data from a wide range of sources: web activity, mobile apps, sensor inputs, and external APIs. As this data flows through ingestion, transformation, and model scoring, there are several common failure points:

  • Missing fields in logs → Example: Device type or session ID not logged consistently across mobile and web.
  • Inconsistent event naming → Example: checkoutInitiated changed to checkout_initiated, breaking downstream dependencies.
  • Unrealistic or incorrect values → Example: Session time shows zero seconds, or logs show a user clicking 200 times in a second.
  • Code changes without validation → Example: Feature transformation logic updated without verifying downstream model compatibility.
  • Mismatch in training vs. production → Example: Models trained on curated data but deployed on noisy, real-world inputs.
  • Test traffic contaminating live data → Example: Automated testing scripts inadvertently included in production metrics.
  • Broken feedback loops → Example: Retraining logic depends on a signal that silently stops firing.

These problems often degrade performance subtly, skewing recommendations or altering user flows, which makes them harder to detect without targeted validation.
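Two of the failure points listed above, renamed events and test traffic leaking into live data, lend themselves to a small automated check. The following is a minimal sketch rather than a definitive implementation: the event registry, the `is_test` flag, and the function names are all hypothetical and would need to match whatever your logging actually emits.

```python
import re

# Hypothetical registry of event names that downstream jobs expect.
KNOWN_EVENTS = {"checkout_initiated", "session_start"}


def to_snake_case(name):
    """Convert camelCase to snake_case, e.g. checkoutInitiated -> checkout_initiated."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def classify_event_name(name):
    """Return 'ok', 'renamed' (known name in a different casing style), or 'unknown'."""
    if name in KNOWN_EVENTS:
        return "ok"
    if to_snake_case(name) in KNOWN_EVENTS:
        return "renamed"
    return "unknown"


def drop_test_traffic(events):
    """Keep only events not marked as test traffic (assumes an `is_test` flag)."""
    return [e for e in events if not e.get("is_test", False)]


events = [
    {"name": "checkout_initiated", "is_test": False},
    {"name": "checkoutInitiated"},
    {"name": "session_start", "is_test": True},
]
live_events = drop_test_traffic(events)
report = {e["name"]: classify_event_name(e["name"]) for e in live_events}
```

A "renamed" result is worth alerting on rather than silently normalizing, since it usually means a producer changed its naming without telling its consumers.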


Testing Strategies That Work in Practice

Each stage of the pipeline, from raw event capture to feature transformation to model output, presents a unique testing opportunity. Here’s a breakdown of practical strategies:

1. Start at the Source: Raw Data Validation

Common issues: Missing timestamps, corrupted device IDs, inconsistent data formats.

How to test it:

  • Build schema validators using tools like Great Expectations or Cerberus.
  • Set automated thresholds for missing values (e.g., alert if >5% of user_id fields are null).
  • Monitor ingestion volumes over time; flag sudden drops/spikes in key events.

Example Implementation:

Python:

# `event` is assumed to be one parsed raw event, represented as a dict
assert event['timestamp'] is not None
assert isinstance(event['device_id'], str)
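The assertions above check a single event. As a slightly broader sketch of the three bullets in this section, the plain-Python example below validates required fields, computes a null rate to compare against the 5% threshold, and flags unusual ingestion volume. The schema, the sample events, and the 50% volume tolerance are illustrative assumptions; a dedicated tool such as Great Expectations or Cerberus would normally replace the hand-written validator.

```python
from statistics import mean

# Assumed schema: field name -> expected Python type.
REQUIRED_FIELDS = {"user_id": str, "device_id": str, "timestamp": str}


def validate_event(event):
    """Return a list of problems found in one raw event (empty list means valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = event.get(field)
        if value is None:
            problems.append(f"{field} is missing")
        elif not isinstance(value, expected_type):
            problems.append(f"{field} has type {type(value).__name__}")
    return problems


def null_rate(events, field):
    """Fraction of events where `field` is absent or None."""
    if not events:
        return 0.0
    missing = sum(1 for e in events if e.get(field) is None)
    return missing / len(events)


def volume_is_anomalous(history, today, tolerance=0.5):
    """Flag today's count if it deviates from the trailing mean by more than `tolerance`."""
    baseline = mean(history)
    return abs(today - baseline) > tolerance * baseline


events = [
    {"user_id": "u1", "device_id": "d1", "timestamp": "2025-01-01T00:00:00Z"},
    {"user_id": None, "device_id": "d2", "timestamp": "2025-01-01T00:00:05Z"},
    {"user_id": "u3", "device_id": 42, "timestamp": "2025-01-01T00:00:09Z"},
]

if null_rate(events, "user_id") > 0.05:
    print("ALERT: more than 5% of user_id values are null")
```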

2. Verify Feature Logic

Common issues: Incorrect logic in features such as session duration or loyalty score.

How to test it:

  • Write unit tests for transformation functions using known sample inputs.
  • Define value bounds or expected distributions (e.g., session duration shouldn’t be > 12 hours).
  • Include logging checkpoints to verify computed values at each stage.

Checklist Tip: Create a feature contract document listing each feature, source columns, transformation steps, and test cases.
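To make the unit-test and value-bound ideas concrete, here is a small sketch for a session-duration feature. The transformation function is hypothetical (it stands in for whatever your pipeline computes), and the 12-hour bound is taken from the example in the list above.

```python
from datetime import datetime, timedelta

MAX_SESSION = timedelta(hours=12)  # upper bound from the example above


def session_duration_seconds(start, end):
    """Hypothetical feature transformation: session length in seconds."""
    if end < start:
        raise ValueError("session ends before it starts")
    duration = end - start
    if duration > MAX_SESSION:
        raise ValueError("session longer than 12 hours")
    return duration.total_seconds()


def test_known_input():
    # A 30-minute session should produce exactly 1800 seconds.
    start = datetime(2025, 1, 1, 9, 0, 0)
    end = datetime(2025, 1, 1, 9, 30, 0)
    assert session_duration_seconds(start, end) == 1800.0


def test_out_of_bounds_is_rejected():
    # A 13-hour session violates the bound and should raise.
    start = datetime(2025, 1, 1, 0, 0, 0)
    end = start + timedelta(hours=13)
    try:
        session_duration_seconds(start, end)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a 13-hour session")


test_known_input()
test_out_of_bounds_is_rejected()
```

Whether an out-of-bounds value should raise, be clipped, or be logged and passed through is a design choice; raising is used here only because it makes the test outcome unambiguous.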

3. Watch for Training vs. Production Drift

Common issues: Feature values differ between training and production environments.

How to test it:

  • Run a statistical comparison (e.g., KS test or PSI) between offline training data and live input data.
  • Add a nightly job to compare means, medians, and ranges of active features.
  • Visualize feature drift on dashboards to track gradual degradation.

Alert Example: “Feature X mean has shifted from 0.2 to 0.45 over the past 7 days.”
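As an illustration of the PSI comparison mentioned above, the sketch below computes a Population Stability Index between a training sample and a live sample using equal-width bins derived from the training data. The bin count, the epsilon used for empty bins, and the sample data are assumptions; the 0.25 cut-off in the usage line is a commonly cited rule of thumb, not a value from this article.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the training range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Replace empty bins with a small epsilon to keep the logarithm defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p, q))


training = [i / 100 for i in range(100)]   # roughly uniform on [0, 1)
live_shifted = [v + 0.3 for v in training]  # same shape, shifted upward

if psi(training, live_shifted) > 0.25:
    print("ALERT: feature distribution has drifted")
```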

4. Lock Down Input and Output Expectations

Common issues: Schema mismatches, renamed fields, or missing inputs cause the model to misbehave.

How to test it:

  • Use golden input-output pairs as regression cases in your CI pipelines.
  • Add an input validation layer that enforces structure, data types, and presence of required fields.
  • Log and compare model output distributions across versions.

Practice Tip: Always pin a “canary” test with a known record that should give a fixed prediction score.
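The sketch below combines an input validation layer with golden input-output pairs. The `score` function is a stand-in with made-up coefficients, not a real model, and the field names and expected scores are assumptions chosen so the example is self-checking; in practice the expected values would be recorded from an approved model version.

```python
import math

# Assumed input schema: field name -> expected Python type.
REQUIRED_INPUTS = {"session_seconds": float, "items_in_cart": int}


def validate_input(record):
    """Reject records with missing fields or wrong types before scoring."""
    for field, expected_type in REQUIRED_INPUTS.items():
        if field not in record:
            raise KeyError(f"missing required field: {field}")
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")
    return record


def score(record):
    """Stand-in for a real model: a fixed logistic function of two features."""
    validate_input(record)
    z = 0.001 * record["session_seconds"] + 0.5 * record["items_in_cart"] - 2.0
    return 1.0 / (1.0 + math.exp(-z))


# Golden pairs: known inputs with the score the approved version produced.
GOLDEN_CASES = [
    ({"session_seconds": 0.0, "items_in_cart": 4}, 0.5),       # canary record
    ({"session_seconds": 0.0, "items_in_cart": 0}, 0.119203),
]


def failed_golden_cases(tolerance=1e-6):
    """Return every golden case whose current score differs from the pinned one."""
    failures = []
    for record, expected in GOLDEN_CASES:
        actual = score(record)
        if abs(actual - expected) > tolerance:
            failures.append((record, expected, actual))
    return failures
```

Running `failed_golden_cases()` in CI and failing the build on a non-empty result is one way to turn the canary idea into a regression gate.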

5. Monitor for Silent Failures

Common issues: Everything runs, but user engagement or conversions drop unexpectedly.

How to test it:

  • Build dashboards for monitoring scoring volume, feature completeness, and model predictions.
  • Cross-check input feature presence daily and compare it with the training schema.
  • Set up anomaly detection on output KPIs (conversion rate, engagement rate).

Example: “If purchase_probability output from the model drops by 30% over 3 days, flag it for investigation.”
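The rule in the example above can be expressed as a simple relative-drop check over daily averages. This is one possible reading of "drops by 30% over 3 days" (latest day compared with the value three days earlier); the function names and sample numbers are hypothetical, and a production system would more likely use a proper anomaly-detection job.

```python
def relative_drop(daily_means, window=3):
    """Relative drop of the latest value versus the value `window` days earlier."""
    if len(daily_means) <= window:
        raise ValueError("need more than `window` days of history")
    baseline = daily_means[-window - 1]
    latest = daily_means[-1]
    return (baseline - latest) / baseline


def should_flag(daily_means, window=3, threshold=0.30):
    """True when the metric fell by at least `threshold` over the window."""
    return relative_drop(daily_means, window) >= threshold


# Daily mean of purchase_probability, oldest first (illustrative values).
falling = [0.20, 0.21, 0.19, 0.16, 0.13]
stable = [0.20, 0.21, 0.20, 0.19, 0.20]

if should_flag(falling):
    print("ALERT: purchase_probability dropped 30% or more over 3 days")
```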

Best Practices for Testing ML Pipelines

  • Test early, test small: Validate data before it hits your transformation logic.
  • Create edge cases: Intentionally pass invalid or boundary values to test model resilience.
  • Track and version everything: Maintain lineage for datasets, features, and scripts.
  • Automate regression tests: Every model release should be backed by automated scenario validation.
  • Collaborate across functions: QA, data science, product, and engineering should review pipelines together.
  • Make failures visible: Invest in real-time alerting and dashboards. Fewer surprises = better outcomes.

Conclusion

For platforms driven by user interaction, machine learning can’t succeed without trustworthy data. When pipelines break silently, the impact hits user experience, retention, and revenue. Testing these systems needs to be proactive, systematic, and tailored to real-world conditions.

Scalable test coverage helps ensure every component, from data ingestion to model scoring, holds up under pressure. By focusing on root-level data integrity and transformation validation, QA teams become critical gatekeepers of performance and reliability.

Testing isn’t just about catching bugs; it’s about safeguarding the intelligence behind your platform.


About the Author

Naga Harini Kodey is a Principal QA Engineer with over 15 years of experience in automation, data quality, and machine learning validation. She specializes in testing AdTech data pipelines and ML workflows, builds test frameworks, and is a global speaker on QA strategies, data testing, and end-to-end machine learning system assurance.

How a Passion for Languages Led to a Global Career at Cisco


“¿Qué ondas?”

That’s a familiar greeting in El Salvador, where I’m from, a land where surf and coffee blend harmoniously.

When I was growing up there as a kid, my grandma asked me if I wanted to take summer classes. She gave me three choices: drawing, math, or English classes. At the time, I felt like I was good enough in math. I was more into music than drawing. So, she paid for my summer English classes. Little did I know that her humble pension and that decision would change my life forever.

After that first summer English class, I developed a passion for foreign languages, becoming fluent in French, Portuguese, and English, and learning some German, in addition to my native Spanish. I went to classes in the early mornings and worked into the late nights; this was my life for a few years, studying and working at the same time. Weekends were reserved for rehearsing with my rock band and teaching English and French at an academy. Years later, I obtained my MBA in Logistics.

I thought I was headed toward a future of teaching, but when I was not selected for a teaching role, it led me to apply for and get an engineer role with a Canadian company, a Cisco partner. The funny thing is that if that hadn’t happened, I probably wouldn’t be working at Cisco today. I believe life has a way of giving you what’s meant to be yours.

Someone within Cisco told me about a French Consulting Engineer role at Cisco Poland, and while I was excited for the opportunity, moving to Europe was a big decision. I had everything at home (my family, friends, band), but it felt like the time to challenge myself, evolve, and fly. I still remember playing Eminem’s “Lose Yourself” before my final interview, knowing this could be a once-in-a-lifetime opportunity. Coming to work for Cisco meant the chance to make an impact in a top tech company.

When I first moved to Poland, I didn’t know many people, so I brought a little bit of home to Cisco, organizing soccer games to have fun and connect with people. It helped me, as a newcomer, integrate through sport with the international community. Since then, I’ve organized singing and coffee workshops and technology events whenever I can to build bridges, connect people, and enhance our workplace experience. Even after three years, Cisco continues to amaze me with its culture, inclusion, and all the ways we can bring ourselves into our work.

I began at Cisco as a consulting engineer, where I worked in design, implementation, and consultancy of Cisco solutions. Over three incredible years, I’ve gained a lot of experience, transitioned to security, and am super grateful to have received two promotions. I’m now a senior consulting engineer of security and collaboration technologies, where I work with financial institutions to provide network security and videoconferencing.

This year, I embarked on a business trip to France, assisting with videoconferencing deployment and design for a leading financial institution, and my life came full circle in a way I never could have imagined. I’m still teaching, but now sharing my knowledge with customers and colleagues on a global scale. The public speaking skills I honed singing and playing live with my band prepared me for presenting solutions to executives on-site and interacting with them in those conference rooms in Paris. I never imagined that my passion for languages would lead me to a career in technology, let alone impactful moments like these at a company like Cisco. And none of this would have been possible without my grandma and those summer English classes. Visiting Paris made me realize how fortunate I am.

My career path so far has taught me that being open to opportunities, dedicating yourself to lifelong learning, and being resilient to challenges are the keys to success. If you embrace this, I have no doubt you’ll not only ‘make it,’ you’ll thrive, just like I have at Cisco!

Are you ready to thrive at Cisco? Learn more and explore opportunities now.

Subscribe to the WeAreCisco Blog.


NVIDIA Issues Hotfix for GPU Driver’s Overheating Issue



Yesterday NVIDIA rushed out a critical hotfix to contain the fallout from a prior driver release that had triggered alarm across AI and gaming communities by causing systems to falsely report safe GPU temperatures, even as cooling demands quietly climbed toward potentially critical levels.

In NVIDIA’s official post around the hotfix release, though only third in the list of stated fixes, the issue is cited as ‘GPU monitoring utilities may stop reporting the GPU temperature after PC wakes from sleep’.

Shortly after the affected Game Ready driver 576.02 was rolled out, a pinned thread on the Stable Diffusion subreddit, titled Read to Save Your GPU!, became a resource for anecdotal issues and user-reported updates about the new driver. From these, and other reports around the web, a rough timeline of emergent problems can be established.

The first Reddit report of the bug seems to have occurred late Friday afternoon UTC, on the ZephyrusG14 subreddit, where the user fricy81 cited a post at NVIDIA forums (archived):

A user at NVIDIA forums finds issues after the 576.02 update. Source: https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/563010/geforce-grd-57602-feedback-thread-released-41625/3524072/


The user at NVIDIA forums reported that after installing the driver update, tools like MSI Afterburner and in-game monitors such as the one in Call of Duty (which usually access native system readings, much as Task Manager’s GPU panel does in Windows) stopped updating GPU temperature readings, freezing at around 35-36°C.

Restarting the monitoring software had no effect, the user stated, and only a full system reboot would restore proper readings. Tools like HWInfo and NVIDIA’s own monitoring app continued to report temperatures correctly. The user emphasized that the issue occurred during normal use, not just after waking the system from sleep.

User feedback across various forums highlighted a general disruption of normal fan curve behavior and an alteration of core thermal regulation, resulting in graphics processing units idling at unexpectedly high temperatures, and alarmingly overheating under what would typically be considered standard operational loads, as detailed in this comment:

‘I could tell something was off. The weather outside was probably around 55°F / 12°C, but I was cooking alive in my room. My window was open, and yet I couldn’t feel any difference. All the fans were running at max, and temps looked fine at first, around 68°C to 72°C after gaming for a while.

‘At first, that seemed normal, until the next morning, when I realized these aren’t idle temps, and the fans were still [kicking].

‘I had done some AI overclocking after fixing a few things lately, so I wasn’t sure if the values had just spiked too high. It’s happened once before after installing ASUS AI Suite 3 – the BIOS settings wouldn’t even work properly because of it.

‘Anyway, I went ahead and rolled back to an older driver for now.’

Sub-Optimal

The official release PDF for the 576.02 driver update offers some clues about changes that may have contributed to the new issues. In section 5.5, NVIDIA acknowledges that GPU temperature may be reported incorrectly on NVIDIA Optimus systems, specifically showing zero degrees when no applications are running.

Section 5.5 of the official 576.02 update notes addresses temperature-monitoring issues that seem to have affected a wider number of systems than the Optimus system. Source: https://us.download.nvidia.com/Windows/576.02/576.02-win11-win10-release-notes.pdf


The release states:

5.5 GPU Temperature Reported Incorrectly on Optimus Systems

5.5.1 Issue

On Optimus systems, temperature-reporting tools such as Speccy or GPU-Z report that the NVIDIA GPU temperature is zero when no applications are running.

5.5.2 Explanation

On Optimus systems, when the NVIDIA GPU isn’t being used then it’s put into a low-power state. This causes temperature-reporting tools to return incorrect values. Waking up the GPU to query the temperature would result in meaningless measurements because the GPU temperature change as a result.

These tools will report accurate temperatures only when the GPU is awake and running.

NVIDIA Optimus is a GPU switching technology that toggles between integrated and discrete graphics based on application demands, automatically balancing performance against power consumption in order to conserve battery life. For tasks such as gaming or HD video playback, Optimus activates the discrete GPU for better performance; during lighter activities such as web browsing, it reverts to integrated (onboard) graphics.

The update appears to have extended a behavior previously limited to Optimus systems, allowing the affected GPU to enter a low-power state while idle, even when not hosted on an Optimus system, in turn disrupting temperature reporting in third-party tools.

Risk Adjustment

In most scenarios, it’s fair to say that the graphics card’s VBIOS would likely have prevented permanent GPU damage. VBIOS enforces thermal and power limits at the firmware level, independently of the driver.

Therefore, even if a driver were to cause improper fan behavior or misreport temperatures, the VBIOS should still throttle performance, ramp up fan activity, or else shut down the GPU to prevent hardware failure.

That doesn’t mean the risk was trivial – sustained high temperatures can degrade performance over time or stress adjacent components; additionally, absent a common understanding that an updated driver caused a problem (not least in systems where drivers update ‘silently’), an issue of this nature could mislead a large proportion of affected users, who may attempt remedies for non-existent problems, and even potentially cause damage to their systems by applying non-relevant ‘fixes’.

The errant behavior caused by update 576.02 was particularly alarming for those engaged in artificial intelligence workflows, where high-performance hardware is routinely pushed to its thermal limits for extended periods.

The problematic 576.02 driver inspired a broader rash of complaints after its release in mid-April, despite initial reports that it offered some useful performance improvements. Notwithstanding the availability of the hotfix, and the extent of disruption that 576.02 seems to have caused, at the time of writing it remains available for download* at NVIDIA’s website.

Afterglow

In terms of the fallout from the faulty update, numerous kinds of damage and/or inconvenience have been reported: user Frankie_T9000 reported that his GPU crashed on boot due to heat buildup under the faulty update, and only stabilized after undervolting. He commented ‘looks like its not permanently harmed but need to repaste asap (I’ve pads coming wednesday) suspect the old thermal paste was aged more by the heat buildup so im putting new paste pads.’

Yesterday another user in the same thread stated: ‘Im using a custom fan curve wit msi afterburner, and it kept showing that my gpu temps were constantly at 27°C, so the fans did not turn on, which led to overheating issues. I thought it was a me issue but after installing the previous driver it all worked out fine again. Also, the temps arent displayed correctly in taskmanager.’

Though NVIDIA (as it states consistently in each hotfix release) usually provides hotfixes for particular video games or platforms, the risk of heat damage to or around a GPU is higher for AI practitioners than for gamers, since intensive machine learning processes such as training or sustained inference place a GPU under consistent long-term load – an event likely to be triggered only periodically in a game, which may ‘spike’ into high utilization for a boss battle or a particularly demanding map section, but which is otherwise designed as a compromise between GPU exploitation and system stability.

 

* Archive: https://archive.ph/ylVR1

First published Tuesday, April 22, 2025

ios – SwiftUI Image Sharing: Screenshot doesn’t load on first sharing attempt, but loads afterwards


I have a simple solitaire game where I want to share a screenshot of winning hands via the share sheet. I capture the screenshot fine, but when I go to share (primarily testing via Gmail), the first time the share sheet for Gmail pops up, the image is not attached. If I close the Gmail message and try again, it loads the image. Is there some sort of delay I need to account for in the rendering process? I’ve read a bunch of other threads that have each had parts of the same issue I have, but nothing conclusive, and I’ve not been able to fix it. ShareLink looks like it might be a good option, but I can’t seem to get it to work, as it crashes my preview every time. Here is my snapshot (screenshot) function:

extension WinningHandView {
func snapshot(origin: CGPoint = .zero, size: CGSize = .zero) -> UIImage {
    let controller = UIHostingController(rootView: self)
    let view = controller.view

    let targetSize = size == .zero ? controller.view.intrinsicContentSize : size
    view?.backgroundColor = .clear
    view?.bounds = CGRect(origin: origin, size: targetSize)

    let renderer = UIGraphicsImageRenderer(size: targetSize)

    return renderer.image { _ in
        view?.drawHierarchy(in: controller.view.bounds, afterScreenUpdates: true)
    }
  }

}

Here’s the Button I’m using to launch the share sheet:

Button("Share", action: {
                        let image = self.snapshot()
                        //let sharingImage = Image(uiImage: image).renderingMode(.original)*/
                        
                        let activityVC = UIActivityViewController(activityItems: [image], applicationActivities: nil)
                                    let _: Void? = UIApplication.shared.connectedScenes.map({ $0 as? UIWindowScene }).compactMap({ $0 }).first?.windows.first?.rootViewController?.present(activityVC, animated: true, completion: nil)})

It seems like some sort of race condition is involved or a delay is needed; if that is the case, how would I implement that? I’ve only been working with SwiftUI for a couple of weeks, so I’m still very much learning the ropes.

Agentic AI at Glean with Eddie Zhou


Glean is a workplace search and knowledge discovery company that helps organizations find and access information across various internal tools and data sources. Their platform uses AI to provide personalized search results to assist members of an organization in retrieving relevant documents, emails, and conversations. The rise of LLM-based agentic reasoning systems now presents new opportunities to build advanced functionality using an organization’s internal data.

Eddie Zhou is a founding engineer at Glean and previously worked at Google. He joined Sean Falconer to discuss the engineering and design considerations around building agentic tooling to enhance productivity and decision-making.

 

Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn.

Sponsors

This episode is sponsored by Mailtrap – an Email Platform developers love.

Go for high deliverability, industry-best analytics, and live 24/7 support.

Get 20% off for all plans with our promo code SEDAILY.
Check out Mailtrap.io to sign up.

Developers, we’ve all been there… It’s 3 AM and your phone blares, jolting you awake. Another alert. You scramble to troubleshoot, but the complexity of your microservices environment makes it nearly impossible to pinpoint the problem quickly.

That’s why Chronosphere is on a mission to help you take back control with Differential Diagnosis, a new distributed tracing feature that takes the guesswork out of troubleshooting. With just one click, DDx automatically analyzes all spans and dimensions related to a service, pinpointing the most likely cause of the issue.

Don’t let troubleshooting drag you into the early hours of the morning. Just “DDx it” and resolve issues faster.

See why Chronosphere was named a leader in the 2024 Gartner Magic Quadrant for Observability Platforms at chronosphere.io/sed.

This episode of Software Engineering Daily is brought to you by Capital One.

How does Capital One stack? It starts with applied research and leveraging data to build AI models. Their engineering teams use the power of the cloud and platform standardization and automation to embed AI solutions throughout the business. Real-time data at scale enables these proprietary AI solutions to help Capital One improve the financial lives of its customers. That’s technology at Capital One.

Learn more about how Capital One’s modern tech stack, data ecosystem, and application of AI/ML are central to the business by visiting www.capitalone.com/tech.