lmorchard - Pebbling Club 🐧🪨

The 70% problem: Hard truths about AI-assisted coding

addyo.substack.com
2024-12-07T02:51:46.681Z
dev webdev ai llm genai ml

Notes

AI isn't making our software dramatically better because software quality was (perhaps) never primarily limited by coding speed. The hard parts of software development – understanding requirements, designing maintainable systems, handling edge cases, ensuring security and performance – still require human judgment. What AI does do is let us iterate and experiment faster, potentially leading to better solutions through more rapid exploration. But only if we maintain our engineering discipline and use AI as a tool, not a replacement for good software practices. Remember: The goal isn't to write more code faster. It's to build better software. Used wisely, AI can help us do that. But it's still up to us to know what "better" means and how to achieve it.

Feed

Unfurl

{ "expires": 1733626301838, "status": 200, "bodyLength": 284765, "author": "Addy Osmani", "date": "2024-12-04T19:12:33.000Z", "description": "A field guide and why we need to rethink our expectations", "image": "https://substackcdn.com/image/fetch/w_1200,h_600,c_fill,f_jpg,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e49ab22-0fac-4959-afa6-b6e226056db4_6072x6072.jpeg", "logo": "https://t3.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://addyo.substack.com/p/the-70-problem-hard-truths-about&size=128", "publisher": "Elevate", "title": "The 70% problem: Hard truths about AI-assisted coding", "url": "https://addyo.substack.com/p/the-70-problem-hard-truths-about", "feed": "https://addyo.substack.com/feed", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 2555 }
heaversm/llamafile-code-completion: Use llamafile to generate inline code completions in react / next.js apps.

github.com
2024-12-04T22:45:14.814Z
ai ml llm genai dev coding llamafile

Notes

Use llamafile to generate inline code completions in react / next.js apps.

Unfurl

{ "expires": 1733438711106, "status": 200, "bodyLength": 312427, "author": "heaversm", "date": null, "description": "Use llamafile to generate inline code completions in react / next.js apps. - heaversm/llamafile-code-completion", "image": "https://opengraph.githubassets.com/915d6e44bc945a9b6003d6d82397ba7521384e8fb6f21620a63ef8b73ada62bf/heaversm/llamafile-code-completion", "logo": "https://github.com/fluidicon.png", "publisher": "GitHub", "title": "GitHub - heaversm/llamafile-code-completion: Use llamafile to generate inline code completions in react / next.js apps.", "url": "https://github.com/heaversm/llamafile-code-completion", "feed": null, "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 1342 }
Godot Isn't Making it

www.wheresyoured.at
2024-12-04T22:00:37.621Z
fail ai ml llm genai

Notes

New, more powerful chips require entirely new methods to rack-mount, operate and cool them, and all of these parts must operate in sync, as overheating GPUs will die. While these units are big, some of their internal components are microscopic in size, and unless properly cooled, their circuits will start to crumble when roasted by a guy typing "Garfield with Gun" into ChatGPT.

Feed

Unfurl

{ "expires": 1733436032242, "status": 200, "bodyLength": 81085, "author": "Edward Zitron", "date": "2024-12-03T20:55:22.000Z", "description": "Before we get going — please enjoy my speech from Web Summit, Why Are All Tech Products Now Shit? I didn’t write the title. What if what we’re seeing today isn’t a glimpse of the future, but the new terms of the present? What if artificial intelligence isn’t actually capable", "image": "https://www.wheresyoured.at/content/images/2024/01/wyea--1.jpeg", "logo": "https://www.wheresyoured.at/content/images/size/w256h256/2024/01/wyea-.jpeg", "publisher": "Ed Zitron's Where's Your Ed At", "title": "Godot Isn’t Making it", "url": "https://www.wheresyoured.at/godot-isnt-making-it/", "feed": "https://www.wheresyoured.at/rss/", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 226 }
The Illustrated Word2vec – Jay Alammar – Visualizing machine learning one concept at a time.

jalammar.github.io
2024-11-28T18:35:10.956Z
ai llm genai ml word2vec

Notes

I hope that you now have a sense for word embeddings and the word2vec algorithm. I also hope that now when you read a paper mentioning “skip gram with negative sampling” (SGNS) (like the recommendation system papers at the top), that you have a better sense for these concepts.

Feed

Unfurl

{ "expires": 1732905302362, "status": 200, "bodyLength": 52891, "author": "Jay Alammar", "date": "2019-03-27T12:00:00.000Z", "description": "Discussions:\nHacker News (347 points, 37 comments), Reddit r/MachineLearning (151 points, 19 comments) Translations: Chinese (Simplified), French, Korean, Portuguese, Russian “There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves. We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains it own fixity. In such perfection, all things move toward death.” ~ Dune (1965) I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (Recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2). Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. 
Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines. In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?", "image": "https://jalammar.github.io/images/word2vec/word2vec.png", "logo": "https://t1.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://github.io/illustrated-word2vec/&size=128", "publisher": null, "title": "The Illustrated Word2vec", "url": "https://jalammar.github.io/illustrated-word2vec/", "feed": "https://jalammar.github.io/feed.xml", "iframe": null, "lang": null, "video": null, "audio": null, "duration": 825 }
Bluesky, AI, and the battle for consent on the open web

werd.io
2024-11-27T23:55:16.818Z
bluesky bsky mastodon social ai ml genai llm

Notes

So the problem Bluesky is dealing with is not so much a problem with Bluesky itself or its architecture, but one that’s inherent to the web itself and the nature of building these training datasets based on publicly-available data. Van Strien’s original act clearly showed the difference in culture between AI and open social web communities: on the former it’s commonplace to grab data if it can be read publicly (or even sometimes if it’s not), regardless of licensing or author consent, while on open social networks consent and authors’ rights are central community norms.

Feed

Unfurl

{ "expires": 1732838115407, "status": 200, "bodyLength": 68224, "author": "Ben Werdmuller", "date": "2024-11-27T16:49:04.000Z", "description": "Why public data needs private consent", "image": "https://werd.io/file/67474d79f8af0bcead05de32/thumb.jpg", "logo": "https://werd.io/gfx/logos/apple-icon-144x144.png", "publisher": "Werd I/O", "title": "Bluesky, AI, and the battle for consent on the open web", "url": "https://werd.io/2024/bluesky-ai-and-the-battle-for-consent-on-the-open", "feed": "https://werd.io/2024/bluesky-ai-and-the-battle-for-consent-on-the-open?_t=rss", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 444 }
Hackerrank was broken - but now it's actually harmful — segfaulte

segfaulte.mataroa.blog
2024-11-27T23:36:51.877Z
jobs interviews career genai recruiting

Notes

Since everyone started using AI, more candidates started clearing the first round with flying colors. The platforms had to recalibrate to let in their target percentage. But they're not measuring the code written by the candidate anymore—they're measuring how well the candidate uses an LLM. Most developers who can actually write this code try to do it themselves. They get marked lower than peers who used AI and completed it faster. The result? Hiring teams keep raising the bar arbitrarily, trying to find the candidates who are best at prompt engineering their way through a coding test.

Unfurl

{ "expires": 1732836899894, "status": 200, "bodyLength": 20458, "author": null, "date": "2024-11-27T00:00:00.000Z", "description": null, "image": "https://media.cleanshot.cloud/media/95888/ymNPJjCYYAzHLLUhJeaCo0r9zxjDqxZUBXEKryyg.jpeg?Expires=1732762312&Signature=NXWshCa~LAxaCDprusolBZERugwsqOn425L3ifCiQpolOGJQK7TUDyntccWimNVkqQYNi1NBwwwlltonVWqcOVBh66fCh1QtPTsRZivJJQEIz5Aj01RjfI-4R8V4zRI5IWFEImAna2SX7jWHdGeKW4IkIVSs1-58pEct8sYxxrFAXAVIAZeINPDyzILbwxMNUTX4tY-mmKvD6uQU7l3eUrXKuO65R97RJ8VPUQsZq-MqJquHYEtdAIHWo3lmtVfg3ZhN5WYAqdf3Nn0atMvgzAnLUtr5tVK5plJOw3JPewlb1y9JBfJHBmI07kM-eMq3tnjQcBh4AJm4ghEinw~7og__&Key-Pair-Id=K269JMAT9ZF4GZ", "logo": null, "publisher": "segfaulte", "title": "Hackerrank was broken - but now it’s actually harmful — segfaulte", "url": "https://segfaulte.mataroa.blog/blog/hackerrank-was-broken-but-now-its-actually-harmful/", "feed": null, "iframe": null, "lang": null, "video": null, "audio": null, "duration": 2745 }
Between the Booms: AI in Winter – Communications of the ACM

cacm.acm.org
2024-11-27T19:06:53.252Z
ml ai genai compsci

Notes

After people stopped caring, artificial intelligence got more interesting.

Feed

Unfurl

{ "expires": 1732820798924, "status": 200, "bodyLength": 165805, "author": "Thomas Haigh", "date": "2024-10-08T12:00:00.000Z", "description": null, "image": "https://cacm.acm.org/wp-content/uploads/2024/09/091024.OP_.Between-the-Booms-G.jpg", "logo": "https://cacm.acm.org/wp-content/uploads/2023/11/cropped-cropped-cacm_favicon-1.png?w=180", "publisher": null, "title": "Between the Booms: AI in Winter – Communications of the ACM", "url": "https://cacm.acm.org/opinion/between-the-booms-ai-in-winter/", "feed": "https://cacm.acm.org/feed/", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 1119 }
On not using copilot - macwright.com

macwright.com
2024-11-23T20:06:06.989Z
llm ai ml copilot genai

Notes

So, in summary: maybe people shy away from copilots because they’re tired of complexity, they’re tired of accelerating productivity without improving hours, they’re afraid of forgetting rote skills and basic knowledge, and they want to feel like writers, not managers. Maybe some or none of these things are true - they’re emotional responses and gut feelings based on predictions - but they matter nonetheless.

Feed

Unfurl

{ "expires": 1732478754183, "status": 200, "bodyLength": 12697, "author": "Tom MacWright", "date": "2024-11-20T00:00:00.000Z", "description": null, "image": null, "logo": "https://macwright.com/css/favicon.png", "publisher": "macwright.com", "title": "On not using copilot", "url": "https://macwright.com/2024/11/20/not-using-copilot", "feed": "https://macwright.com/micro/rss.xml", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 680 }
Adjacent Possible | Steven Johnson | Substack

adjacentpossible.substack.com
2024-11-22T01:08:43.288Z
newsletters substack ai ml genai llm imported:opml

Notes

A newsletter from author Steven Johnson exploring where good ideas come from—and how to keep them from turning against us. Click to read Adjacent Possible, by Steven Johnson, a Substack publication with tens of thousands of subscribers.

Feed

Unfurl

{ "expires": 1732324114373, "status": 200, "bodyLength": 75345, "author": "Steven Johnson", "date": null, "description": "A newsletter from author Steven Johnson exploring where good ideas come from—and how to keep them from turning against us. Click to read Adjacent Possible, by Steven Johnson, a Substack publication with tens of thousands of subscribers.", "image": "https://substackcdn.com/image/fetch/f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fadjacentpossible.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D121311086%26version%3D9", "logo": "https://t3.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://adjacentpossible.substack.com/&size=128", "publisher": "Substack", "title": "Adjacent Possible | Steven Johnson | Substack", "url": "https://adjacentpossible.substack.com/", "feed": "https://adjacentpossible.substack.com/feed", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 2505 }
You Exist In The Long Context

thelongcontext.com
2024-11-22T01:07:02.739Z
genai llm ml writing

Notes

The current state-of-the-art Gemini model can fit roughly 1.5 million words in its context. That’s enough for me to upload the full text of all fourteen of my books, plus every article, blog post, or interview I’ve ever published—and the entirety of my collection of research notes that I’ve compiled over the years. The Gemini team has announced plans for a model that could hold more than 7 million words in its short-term memory. That’s enough to fit everything I’ve ever written, plus the hundred books and articles that most profoundly shaped my thinking over the years. An advanced model capable of holding in focus all that information would have a profound familiarity with all the words and ideas that have shaped my personal mindset. Certainly its ability to provide accurate and properly-cited answers to questions about my worldview (or my intellectual worldview, at least) would exceed that of any other human. In some ways it would exceed my own knowledge, thanks to its ability to instantly recall facts from books I read twenty years ago, or make new associations between ideas that I have long since forgotten. It would lack any information about my personal or emotional history—though I suppose if I had maintained a private journal over the past decades it would be able to approximate that part of my mindset as well. But as reconstruction of my intellectual grounding, it would be unrivaled. If that is not considered material progress in AI, there is something wrong with our metrics.

Unfurl

{ "expires": 1732324014472, "status": 200, "bodyLength": 49575, "author": "Steven Johnson", "date": null, "description": "Thoughts on the quiet revolution of long-context AI models, from NotebookLM’s Editorial Director Steven Johnson.", "image": "https://thelongcontext.com/og.png", "logo": "https://thelongcontext.com/favicon.ico", "publisher": null, "title": "You Exist In The Long Context", "url": "https://thelongcontext.com/", "feed": null, "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 509 }
An introduction to fine-tuning LLMs at home with Axolotl • The Register

www.theregister.com
2024-11-19T22:27:32.218Z
genai ml ai finetuning llm lora

Notes

In this guide we'll discuss: Where and when fine-tuning can be useful. Alternative approaches to extending the capabilities and behavior of pre-trained models. The importance of data preparation. How to fine-tune Mistral 7B using your own custom dataset with Axolotl. The many hyperparameters and their effect on training. Additional resources to help you fine-tune your models faster and more efficiently.

Feed

Unfurl

{ "expires": 1732141649055, "status": 200, "bodyLength": 77404, "author": "Tobias Mann", "date": "2024-11-10T02:17:17.000Z", "description": "Got a modern Nvidia or AMD graphics card? Custom Llamas are only a few commands and a little data prep away", "image": "https://regmedia.co.uk/2023/11/27/shutterstock_datafabric.jpg", "logo": "https://www.theregister.com/design_picker/13249a2e80709c7ff2e57dd3d49801cd534f2094/graphics/favicons/favicon.ico", "publisher": "The Register", "title": "An introduction to fine-tuning LLMs at home with Axolotl", "url": "https://www.theregister.com/2024/11/10/llm_finetuning_guide/", "feed": "https://www.theregister.com/headlines.atom", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 362 }
As public perception of AI sours, crowdfunding platforms scramble | Polygon

www.polygon.com
2024-11-19T20:10:17.262Z
genai art gaming ttrpg kickstarter crowdfunding business

Notes

While the ethics of this technology’s ecological and social impact are debated, use of the technology comes with repeated controversy. This instance of AI-assisted art from the company that made the award-winning Star Realms deck-building game has caused some to take to social media in disappointment and frustration. A few fans of the company state they won’t purchase another game by Wise Wizard, with at least one store stating it will no longer be stocking the company’s products. This anti-AI sentiment is not unanimous, however. Projects like Wonders of the First, Grimcoven, and Terraforming Mars still raised millions of dollars from thousands of backers as recently as June of this year — giving crowdfunding platforms an incentive to keep AI projects on the site, as long as the money is still there.

Feed

Unfurl

{ "expires": 1732133409824, "status": 200, "bodyLength": 261390, "author": "Rowan Zeoli", "date": "2024-11-18T21:59:46.000Z", "description": "Pro-AI creators are likely to work around loose restrictions on tabletop’s most popular crowdfunding platforms", "image": "https://platform.polygon.com/wp-content/uploads/sites/2/2024/11/vlcsnap-2024-11-18-16h23m54s149.jpg?quality=90&strip=all&crop=0%2C3.4613147178592%2C100%2C93.077370564282&w=1200", "logo": "https://www.polygon.com/static-assets/icons/android-chrome-512x512.png", "publisher": "Polygon", "title": "As public perception of AI sours, crowdfunding platforms scramble", "url": "https://www.polygon.com/tabletop-games/481022/ai-tabletop-kickstarter-backerkit-gamefound-wise-wizard-draconis-8", "feed": "https://www.polygon.com/rss/index.xml", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 458 }
Sponsoring the Web Applets project, an open approach to AI-empowered web apps - Mozilla Innovations

future.mozilla.org
2024-11-14T17:59:53.444Z
webdev genai ai ml

Notes

Web Applets are small, secure pieces of web code (bundles of HTML, JavaScript, and CSS) that can run anywhere, allowing a model to take actions within software much like a human would and then generate interfaces appropriate for the user’s intent. For example, a developer could write an applet that enables a model to respond to a query about local coffee shops by conducting internet searches and then displaying the results on an in-line map. And because the model can read the internal state of each applet, it can then conduct follow-up actions to complete a user’s request (for example, updating the map to display only coffee shops that will be open tomorrow afternoon). Anyone can build Web Applets and host them on the Web, and any client can potentially support them.

Unfurl

{ "expires": 1731693588849, "status": 200, "bodyLength": 22982, "author": null, "date": "2024-11-14T12:00:00.000Z", "description": "Mozilla Builders has spent the past year accelerating 14 local AI projects and sponsoring projects like Llamafile and sqlite-vec that advance the state of the art in open source AI technology. Today, we’re proud to announce our next open source collaboration, the Web Applets project, an early-stage, open spec for building AI-native apps on top of the Web.", "image": "https://storage.googleapis.com/future-prod-prod-storage/images/Web-Applets-logo-letterbox.2e16d0ba.fill-1200x630.png", "logo": "https://future.mozilla.org/static/img/favicons/innovation/favicon-196x196.2af054fea211.png", "publisher": "Mozilla Innovations", "title": "Sponsoring the Web Applets project, an open approach to AI-empowered web apps", "url": "http://future.mozilla.org/builders/news_insights/sponsoring-the-web-applets-project/", "feed": null, "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 1320 }
Dead Labor, Dead Speech - by Nicholas Carr

www.newcartographies.com
2024-11-11T19:15:48.866Z
llm ml ai genai creativity writing economy

Notes

If, as Marx argued, capital is dead labor, then the products of large language models might best be understood as dead speech. Just as factory workers produce, with their “living labor,” machines and other forms of physical capital that are then used, as “dead labor,” to produce more physical commodities, so human expressions of thought and creativity—“living speech” in the forms of writing, art, photography, and music—become raw materials used to produce “dead speech” in those same forms. LLMs, to continue with Marx’s horror-story metaphor, feed “vampire-like” on human culture. Without our words and pictures and songs, they would cease to function. They would become as silent as a corpse in a casket.

Feed

Unfurl

{ "expires": 1731438926778, "status": 200, "bodyLength": 160620, "author": "Nicholas Carr", "date": "2024-10-10T12:26:23.000Z", "description": "What happens when culture becomes an industry’s raw material.", "image": "https://substackcdn.com/image/fetch/w_1200,h_600,c_fill,f_jpg,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e23ca8f-4f51-4205-8e17-b8e84ac75e37_880x720.heic", "logo": "https://t2.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://www.newcartographies.com/p/dead-labor-dead-speech&size=128", "publisher": "New Cartographies", "title": "Dead Labor, Dead Speech", "url": "https://www.newcartographies.com/p/dead-labor-dead-speech", "feed": "https://www.newcartographies.com/feed", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 4333 }
Everything I've learned so far about running local LLMs

nullprogram.com
2024-11-11T19:09:06.499Z
ai genai ml llm llama

Notes

Over the past month I’ve been exploring the rapidly evolving world of Large Language Models (LLM). It’s now accessible enough to run a LLM on a Raspberry Pi smarter than the original ChatGPT (November 2022). A modest desktop or laptop supports even smarter AI. It’s also private, offline, unlimited, and registration-free. The technology is improving at breakneck speed, and information is outdated in a matter of months. This article snapshots my practical, hands-on knowledge and experiences — information I wish I had when starting. Keep in mind that I’m a LLM layman, I have no novel insights to share, and it’s likely I’ve misunderstood certain aspects. In a year this article will mostly be a historical footnote, which is simultaneously exciting and scary.

Feed

Unfurl

{ "expires": 1731438542170, "status": 200, "bodyLength": 39935, "author": null, "date": "2024-11-10T00:00:00.000Z", "description": null, "image": null, "logo": "https://nullprogram.com/favicon.ico", "publisher": null, "title": "Everything I’ve learned so far about running local LLMs", "url": "https://nullprogram.com/blog/2024/11/10/", "feed": "https://nullprogram.com/feed/", "iframe": null, "lang": null, "video": null, "audio": null, "duration": 566 }
Perceptually lossless (talking head) video compression at 22kbit/s | Martin Lumiste

mlumiste.com
2024-11-09T03:29:00.476Z
ai ml genai deepfakes compression

Notes

I’ve been having quite a bit of fun with the fairly recent LivePortrait model, generating deepfakes of my friends for some cheap laughs.

Feed

Unfurl

{ "expires": 1731209337210, "status": 200, "bodyLength": 51053, "author": "Martin Lumiste", "date": "2024-11-06T22:00:00.000Z", "description": "Update: Discussion on Hacker News I’ve been having quite a bit of fun with the fairly recent LivePortrait model, generating deepfakes of my friends for some cheap laughs. The inevitable Elon Musk deepfake, picture by Debbie Rowe The emerging field of 2D avatar/portrait animation (being able to animate any still image, avoiding the need to render cumbersome 3D models that would struggle with small facial details) is a harbinger of things to come. In the best case, it will be ubiquitous on social media (the authors have already added an extension to animate cute animal faces) and in the worst, trust on the internet will be heavily undermined. But one overlooked use case of the technology is (talking head) video compression. After all, prediction is compression, so a sufficiently powerful face generator should be able to compress frame information into an extremely sparse set of cues to reconstruct the same frame from. This was briefly explored in Nvidia’s seminal facevid2vid paper that compared their models’ compression ratio to the classical H.264 codec. The main idea is quite simple: given a source image that is shared between the sending and receiving side, the only information that needs to be transmitted is the change in expression, pose and facial keypoints. The receiving side then simply animates the source frame into the new one, using these motion parameters. The main upside is that this method achieves pretty reasonable perceptual quality at an extremely low bitrate, while at a comparable level a traditional video codec will show heavy artifacts. There are, of course, downsides as well: there is no longer a natural lever to trade-off between quality and bitrate, like the CRF for H.264. as a model with large generative capacity, there’s essentially no limit to how bad the worst case reconstruction can be. 
It could in theory render a completely different person, or distort your face into a monstrous gremlin. the impressive bitrate does not come for free, as e.g. LivePortrait needs to run on an RTX 4090 for real-time processing. In the space of possible learned compression models, compared to something like DCVC, it is a further improvement in compression rate, at the cost of having a 10x+ slower model. Anyway, LivePortrait is a beefed up version of facevid2vid, so let’s look at how good it is for video compression. I extracted the first frame of the above driving video as a key frame, simulating a scenario where instead of a high quality enrolled image, key frames are extracted on-demand in the background. This means that in addition to being same identity animation, this is also same video animation - by far the simplest scenario to work on for the model, as you have very good alignment between the source and driving frames. It’s also the closest to a drop-in replacement of the current video call experience. Here are the results of a quick try: Self-animation, i.e. driving a keyframe of the video with the motion It’s possible to uncover discrepancies in a side by side analysis: the head tends to be a bit shaky in all my experiments, probably because LivePortrait processes frames in isolation, without any motion prior. Or maybe my driving video is low quality. 🤷 since the eye gaze is off camera in the key frame (“neutral mode”), the model seems to map it incorrectly in every frame after that. teeth are generally hallucinated, but this is only noticeable in smiling videos. As expected, the discrepancies are much more obvious if we provide a driving video with shoulder movement and difficult head angles. Also, the further the inference setup is from the training one (which is same video animation), the worse the results. 
Nonetheless, it’s clear that there is a set of frames, arguably a large proportion of video-conferencing, where the model manages to produce subjectively distinguishable reconstructions. Sure, in a side by side analysis we might be able to tell which is the original and which is the reconstruction. However, if you are only looking at the generated output, it works very well. So how small is the bitrate of this reconstruction? The model equation for transforming the face keypoints is: [x_d = s_d \\times (x_{c, s} R_d + \\delta_d) + t_d] where $x_{c, s} \\in \\mathbb{R}^{K \\times 3}$ are the “canonical” implicit 3D facial keypoints of the source image, $R_d \\in \\mathbb{R}^{3 \\times 3}$ is a 3D rotation matrix (relative to the canonical keypoints), $\\delta_d \\in \\mathbb{R}^{K \\times 3}$ denotes the expression deformations, $t_d \\in \\mathbb{R}^3$ is a translation vector and $s_d$ is just a scaling coefficient and $K$, the number of keypoints, is a hyperparameter. The intuition of this equation is provided in the Nvidia paper: Of course, $x_d$ is not yet the final reconstruction of the image, only the transformed keypoints. There are some flow field estimations and warp operations remaining to actually turn the source image into the driving one. Nonetheless, the sender only needs to transmit $s_d$, $R_d$, $\\delta_d$ and $t_d$ for a lifelike reconstruction to happen on the receiver’s side. And since we know their shapes, we can also infer the bitrate: $3 \\times 3 + K \\times 3 + 3 = 75 $ numbers at $ K = 21 $, the default LivePortrait setting. At half precision floats, that’s $ 16 \\times 75 \\times 30 $ bits per second for a 30FPS video, or 36kbit/s. This could be compressed further - note that each frame is processed in isolation. This could be alleviated with entropy coding and having a temporal prior. In facevid2vid, simple entropy coding reduced the baseline model’s bitrate by nearly 40%, while using an adaptive number of keypoints reduced it by 60%. 
Using the first figure, we should be able to bring down LivePortrait’s bitrate to about 22kbit/s. For reference, the low bitrate challenge in CLIC 2024 featured video compression at 50kbit/s, but as expected, the models showed significantly worse subjective quality scores than at 500kbit/s. LivePortrait has roughly the same bitrate as the facevid2vid had (model transmits similar information), but achieving better results. Looking at the evaluation results for the latter method, we see that as expected, their model only provides a single point on the bitrate-quality curve. So without any evaluation results at hand, I would expect LivePortrait to move strictly downwards, matching a lower H.264 CRF for equal preference. Extrapolating ahead, a future model might achieve the same perceptual quality as a visually lossless CRF (FFmpeg suggests 17 or 18). Then, it is up to the user whether they want to squeeze bitrate to near zero at the cost of compute. How does it work (what is the magic?) The main problem of frame animation is that we are projecting a 3D object to a 2D image. So our model needs to understand the rotation and deformation of the underlying object. The good thing about faces is that they are rigid, i.e. tends to have limited degrees of freedom in movement and nearby pixels move together in predictable ways. Nonetheless, this has proven to be a hard problem. The main innovation of facevid2vid, that also powers LivePortrait, was realising that this can be formed as a 3D rotation problem. By rotating a set of abstract 3D tensors enough times, the model learns to actually map these to keypoints of the face, as if someone would have painstakingly labelled them for each frame. Up until then, models like First Order Motion Model had also used the implicit keypoints approach, but only with 2D keypoints. 
The second thing that seems to work is quite humdrum: compared to facevid2vid, LivePortrait has seriously scaled up the training dataset to 69 million high quality frames, and added regional GAN losses that focus only on local regions like the face or the lips. So rather than any architectural breakthrough, it seems to have been a lot of iterative improvements on dataset and losses. While being able to learn facial keypoints in a self-supervised way is a testament to why deep learning is cool, it also allows direct controllability of the avatar. Since the rotation matrix has a direct geometric interpretation, you can input parameters for a required pose. LivePortrait adds on top of this by training small neural networks to control lip and eye movement. This is a big step ahead in terms of avatar controllability, which generally has not been a strong suit of many generative approaches (I’m looking at you, diffusion). LivePortrait’s methodology is quite different from SotA learned video compression models like DCVC, which need to encode spatial information with a great degree of fidelity, targeting pixel-aligned distortion losses such as MSE. A generative model unencumbered by pixel-alignment and optimised for various GAN-based perceptual losses only tries to generate something plausible. On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130 million parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still some way off. Nonetheless, models and hardware reliably become faster. 
Also, the same-identity animation problem is significantly easier than animating Elon Musk or your cat, so a model optimised for teleconferencing could probably be remarkably smaller. That’s why it might not be that much of a moonshot. Publicly, Zoom seems to have played with the idea of avatar technology. I’ll let the precise use cases be determined by product people, but off the top of my head: having a more formal avatar version of yourself for days when you’re working in your underwear; animating a 4K studio-quality avatar from a driving video from my terrible webcam; using pose and gaze connection to seat the avatars in some kind of more immersive virtual meeting room; letting your avatar attend meetings / send messages as a digital twin. If all the driving keypoints are directly manipulatable, you could programmatically control a photorealistic video. Of course it’s possible that none of these will be useful or socially normalised, yet it’s fun to theorise.", "image": "https://mlumiste.com/assets/images/compression/elon.gif", "logo": null, "publisher": "Martin Lumiste", "title": "Perceptually lossless (talking head) video compression at 22kbit/s", "url": "https://mlumiste.com/technical/liveportrait-compression/", "feed": "https://mlumiste.com/feed.xml", "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 685 }
[2410.16454] Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge

arxiv.org
2024-11-04T19:40:13.042Z
ai ml genai llm

Notes

Large language models (LLMs) have shown remarkable proficiency in generating text, benefiting from extensive training on vast textual corpora. However, LLMs may also acquire unwanted behaviors from the diverse and sensitive nature of their training data, which can include copyrighted and private content. Machine unlearning has been introduced as a viable solution to remove the influence of such problematic content without the need for costly and time-consuming retraining. This process aims to erase specific knowledge from LLMs while preserving as much model utility as possible. Despite the effectiveness of current unlearning methods, little attention has been given to whether existing unlearning methods for LLMs truly achieve forgetting or merely hide the knowledge, which current unlearning benchmarks fail to detect. This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information. To thoroughly evaluate this phenomenon, we conduct comprehensive experiments using various quantization techniques across multiple precision levels. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision, which significantly increases to 83% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy to mitigate this intricate issue...

Unfurl

{ "expires": 1730835610839, "status": 200, "bodyLength": 49681, "author": "Zhiwei Zhang", "date": "2024-10-21T12:00:00.000Z", "description": "Large language models (LLMs) have shown remarkable proficiency in generating text, benefiting from extensive training on vast textual corpora. However, LLMs may also acquire unwanted behaviors from the diverse and sensitive nature of their training data, which can include copyrighted and private content. Machine unlearning has been introduced as a viable solution to remove the influence of such problematic content without the need for costly and time-consuming retraining. This process aims to erase specific knowledge from LLMs while preserving as much model utility as possible. Despite the effectiveness of current unlearning methods, little attention has been given to whether existing unlearning methods for LLMs truly achieve forgetting or merely hide the knowledge, which current unlearning benchmarks fail to detect. This paper reveals that applying quantization to models that have undergone unlearning can restore the “forgotten” information. To thoroughly evaluate this phenomenon, we conduct comprehensive experiments using various quantization techniques across multiple precision levels. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21\\% of the intended forgotten knowledge in full precision, which significantly increases to 83\\% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy to mitigate this intricate issue…", "image": "https://arxiv.org/static/browse/0.3.4/images/arxiv-logo-fb.png", "logo": "https://arxiv.org/static/browse/0.3.4/images/icons/apple-touch-icon.png", "publisher": "arXiv.org", "title": "Does your LLM truly unlearn? 
An embarrassingly simple approach to recover unlearned knowledge", "url": "https://arxiv.org/abs/2410.16454v1", "feed": null, "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 640 }
Whatever AI Looks Like, It's Not | Defector

defector.com
2024-11-04T19:38:44.288Z
ai fail slop llm ml genai

Notes

It is nightmarish to me to read reports of how reliant on ChatGPT students have become, even outsourcing to the machines the ideally very personal assignment "briefly introduce yourself and say what you're hoping to get out of this class." It is depressing to me to read defenses of those students, particularly this one that compares an AI-written essay to using a washing machine in that it reduces the time required for the labor. This makes sense only if the purpose of a student writing an essay is "to have written an essay," which it is not. The teacher did not assign it as busywork. The purpose of an essay is to learn and practice communication skills, critical thinking, organization of one's own thoughts. These are useful skills to develop, even (especially!) if you do not go into a writing career.

Unfurl

{ "expires": 1730835520825, "status": 200, "bodyLength": 337040, "author": "Barry Petchesky", "date": "2024-09-01T13:56:39.000Z", "description": "There is a very funny viral tweet going around that features a screenshot of a Google search result for “austria-hungary in space.” You can try the search yourself. This is what Google returns: In 1889 Austria-Hungary conducted its first manned orbital spaceflight using a liquid-fueled rocket launched from the region of Galicia. In 1908 the […]", "image": "https://lede-admin.defector.com/wp-content/uploads/sites/28/2024/09/GettyImages-200384102-001.jpg", "logo": "https://lede-admin.defector.com/wp-content/uploads/sites/28/2023/09/cropped-defector-circle_avatar512-1.png", "publisher": "Defector home", "title": "Whatever AI Looks Like, It’s Not | Defector", "url": "https://defector.com/whatever-ai-looks-like-its-not", "feed": null, "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 1535 }
Oasis

oasis-model.github.io
2024-11-01T20:49:25.294Z
ai ml genai gamedev gaming

Notes

Oasis takes in user keyboard input and generates real-time gameplay, including physics, game rules, and graphics. You can move around, jump, pick up items, break blocks, and more. There is no game engine; just a foundation model.

Unfurl

{ "expires": 1730580562725, "status": 200, "bodyLength": 44318, "author": null, "date": null, "description": "Generating Worlds in Realtime", "image": "https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg", "logo": "https://oasis-model.github.io/favicon.ico", "publisher": null, "title": "Oasis", "url": "https://oasis-model.github.io/", "feed": null, "iframe": null, "lang": "en", "video": null, "audio": null, "duration": 957 }
Xan 9 from Outer Space: "Generative AI is like if capitalism reinvented the fae. It’ll trick you into accepting its agreement and then it’ll steal your face and start speaking with your voice." — Bluesky

bsky.app
2024-10-16T22:10:44.082Z
genai ai ml

Notes

Generative AI is like if capitalism reinvented the fae. It’ll trick you into accepting its agreement and then it’ll steal your face and start speaking with your voice.

Embed

Unfurl

{ "expires": 1729202887038, "status": 200, "bodyLength": 5948, "author": null, "date": null, "description": "Generative AI is like if capitalism reinvented the fae. It’ll trick you into accepting its agreement and then it’ll steal your face and start speaking with your voice.", "image": null, "logo": "https://bsky.app/static/apple-touch-icon.png", "publisher": "Bluesky Social", "title": "Xan 9 from Outer Space (@xanindigo.bsky.social)", "url": "https://bsky.app/profile/xanindigo.bsky.social/post/3l6nwbdwdoh2p", "feed": null, "iframe": "<blockquote class=\"bluesky-embed\" data-bluesky-uri=\"at://did:plc:4qdzg34xf7tu3h3owgyiaq7h/app.bsky.feed.post/3l6nwbdwdoh2p\" data-bluesky-cid=\"bafyreihcshpin7nphk6icx73k6lvoumqnrs6uslpuc2hmf5kwqb5b4giqm\"><p lang=\"en\">Generative AI is like if capitalism reinvented the fae. It’ll trick you into accepting its agreement and then it’ll steal your face and start speaking with your voice.</p>— <a href=\"https://bsky.app/profile/did:plc:4qdzg34xf7tu3h3owgyiaq7h?ref_src=embed\">Xan 9 from Outer Space (@xanindigo.bsky.social)</a> <a href=\"https://bsky.app/profile/did:plc:4qdzg34xf7tu3h3owgyiaq7h/post/3l6nwbdwdoh2p?ref_src=embed\">2024-10-16T21:50:23.849Z</a></blockquote><script async src=\"https://embed.bsky.app/static/embed.js\" charset=\"utf-8\"></script>", "lang": null, "video": null, "audio": null, "duration": 630 }
Why YOU Should Make a Website! - YouTube

www.youtube.com
2024-10-03T03:27:53.224Z
webdev goodinternet metablogging ai genai

Notes

Hey everyone! Today I wanted to talk about the state of the internet, how artists and everyone is affected by AI slop and social media, and why I think everyone should have a personal website these days! Let's bring back the old school internet in new, fun, and creative ways! ^_^

Embed

Unfurl

{ "failed": true, "failedAt": 1728192764806, "failedError": "TimeoutError: Promise timed out after 10000 milliseconds", "description": "Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.", "logo": "https://www.youtube.com/s/desktop/e6683cb8/img/favicon_144x144.png", "publisher": "YouTube", "title": "- YouTube", "url": "https://www.youtube.com/undefined", "iframe": "<iframe width=\"200\" height=\"113\" src=\"https://www.youtube.com/embed/uNlZ50b6wSs?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen title=\"Why YOU Should Make a Website!\"></iframe>", "lang": "en", "cached": true, "cachedAt": 1728192820205 }
