Justin's Linklog Posts

cur.vantage.sh

  • cur.vantage.sh

    via Ben Schaechter: “a new microsite we’ve launched for the AWS community that helps with understanding billing codes present in either Cost Explorer or the CUR. We profiled the number of distinct billing codes across our customer base and have about ~60k unique billing codes. We hear all the time that FinOps practitioners and engineers are confused about the billing codes present in Cost Explorer or the Cost and Usage Report. Think of these as being things like “Requests-Tier1” for S3 or “CW:GMWI-Metrics” for CloudWatch. There is usually really limited resources for determining what these billing codes are even when you Google around for them.”

    Tags: aws billing codes cost-explorer ec2 s3 finops

Words from an ex-Zizian-adjacent person

  • Words from an ex-Zizian-adjacent person

    It seems there’s now a full-on Mansonesque death cult emerging from the LessWrong/rationalist/effective-altruism community: https://www.sfgate.com/bayarea/article/bay-area-death-cult-zizian-murders-20064333.php

    This HN comment was very interesting for background:

    [Former member of that world, roommates with one of Ziz’s friends for a while, so I feel reasonably qualified to speak on this.] The problem with rationalists/EA as a group has never been the rationality, but the people practicing it and the cultural norms they endorse as a community.

    As relevant here:

    1) While following logical threads to their conclusions is a useful exercise, each logical step often involves some degree of rounding or unknown-unknowns. A -> B and B -> C means A -> C in a formal sense, but A -almostcertainly-> B and B -almostcertainly-> C does not mean A -almostcertainly-> C. Rationalists, by tending to overly formalist approaches, tend to lose the thread of the messiness of the real world and follow these lossy implications as though they are lossless. That leads to…

    2) Precision errors in utility calculations that are numerically-unstable. Any small chance of harm times infinity equals infinity. This framing shows up a lot in the context of AI risk, but it works in other settings too: infinity times a speck of dust in your eye >>> 1 times murder, so murder is “justified” to prevent a speck of dust in the eye of eternity. When the thing you’re trying to create is infinitely good or the thing you’re trying to prevent is infinitely bad, anything is justified to bring it about/prevent it respectively.

    3) Its leadership – or some of it, anyway – is extremely egotistical and borderline cult-like to begin with. I think even people who like e.g. Eliezer [Yudkowsky] would agree that he is not a humble man by any stretch of the imagination (the guy makes Neil deGrasse Tyson look like a monk). They have, in the past, responded to criticism with statements to the effect of “anyone who would criticize us for any reason is a bad person who is lying to cause us harm”. That kind of framing can’t help but get culty.

    4) The nature of being a “freethinker” is that you’re at the mercy of your own neural circuitry. If there is a feedback loop in your brain, you’ll get stuck in it, because there’s no external “drag” or forcing functions to pull you back to reality. That can lead you to be a genius who sees what others cannot. It can also lead you into schizophrenia really easily. So you’ve got a culty environment that is particularly susceptible to internally-consistent madness, and finally:

    5) It’s a bunch of very weird people who have nowhere else they feel at home. I totally get this. I’d never felt like I was in a room with people so like me, and ripping myself away from that world was not easy. (There’s some folks down the thread wondering why trans people are overrepresented in this particular group: well, take your standard weird nerd, and then make two-thirds of the world hate your guts more than anything else, you might be pretty vulnerable to whoever will give you the time of day, too.)

    TLDR: isolation, very strong in-group defenses, logical “doctrine” that is formally valid and leaks in hard-to-notice ways, apocalyptic utility-scale, and being a very appealing environment for the kind of person who goes super nuts -> pretty much perfect conditions for a cult. Or multiple cults, really. Ziz’s group is only one of several.

    Tags: zizians cults extropianism tescreal effective-altruism rationalism lesswrong death-cults

Burrows–Wheeler Transform

  • Burrows–Wheeler Transform

    an algorithm used to prepare data for use with data compression techniques such as bzip2. It permutes the order of characters in a string (S), sorting all the circular shifts of the text in lexicographic order, then extracting the last column and the index of the original string in the set of sorted permutations of S.

    Some day when I have lots of free time to spare, I’ll spend a while getting my head around this deep magic, because it’s just amazing that this works.

    (via John Regehr)
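
    To make the description above concrete, here is a minimal, deliberately naive sketch of the transform and its inverse. It assumes a non-empty input string, and the function names are my own; real compressors such as bzip2 use far more efficient constructions than sorting every rotation.

```python
def bwt(text: str) -> tuple[str, int]:
    """Return (last column, index of the original among the sorted rotations)."""
    n = len(text)
    # Every circular shift of the text, sorted lexicographically.
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    last_column = "".join(rotation[-1] for rotation in rotations)
    return last_column, rotations.index(text)


def inverse_bwt(last_column: str, index: int) -> str:
    """Rebuild the original by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    # After len(last_column) rounds the table holds the sorted rotations again.
    return table[index]


print(bwt("banana"))             # ('nnbaaa', 3)
print(inverse_bwt("nnbaaa", 3))  # banana
```

    Note how the output groups identical characters together ("nn", "aaa"), which is what makes the result friendlier to the compression stages that follow.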

    Tags: compression algorithms burrows-wheeler-transform bzip2 via:john-regehr magic text

Irish spider zombies!

  • Irish spider zombies!

    This is fantastic — a newly-discovered species of fungus does the same trick as Ophiocordyceps in Brazil; it infects the brains of orb-weaving cave spiders in Ireland, and induces them to leave their lairs or webs, and migrate to die in an exposed situation, in order to favor dispersal of the fungal spores.

    Ophiocordyceps is, of course, the inspiration for the zombie-forming fungus in The Last Of Us.

    Tags: cordyceps fungi ireland spiders zombies fungus nature gross

The Billion Docs JSON Challenge: ClickHouse vs. MongoDB, Elasticsearch, and more

ODROID-H4+

  • ODROID-H4+

    The next generation of the excellent ODROID SBCs; based on Intel’s N97 architecture, with AVX2 extensions, faster DRAM, 4 SATA ports, and up to 48GB of RAM.

    Significantly beefier in general, reportedly around the EUR180 mark in price.

    Tags: odroid sbcs n97 hardware home devices servers

Coordinated Lunar Time

  • Coordinated Lunar Time

    The moon may have a timezone of its own soon, Coordinated Lunar Time (LTC):

    Due to the moon’s lower gravity and its motion relative to Earth, moon time passes 56 microseconds faster each earth day. As a result, an atomic clock on Earth would run at a different rate than an atomic clock on the moon.

    Similar to how UTC is determined, the memo suggests “an ensemble of clocks” deployed to the moon might be used to set the new time standard.

    (via David Cuthbert)
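
    As a rough worked example of what that drift adds up to, the sketch below takes the quoted figure of 56 microseconds per Earth day at face value and simply scales it linearly. The constant and function names are mine, and the linear model is an assumption rather than anything from the memo.

```python
DRIFT_US_PER_EARTH_DAY = 56.0  # figure quoted above, taken at face value
SECONDS_PER_DAY = 86_400


def lunar_drift_us(earth_days: float) -> float:
    """Microseconds a lunar clock gains over the given number of Earth days."""
    return DRIFT_US_PER_EARTH_DAY * earth_days


def fractional_rate() -> float:
    """Drift expressed as a dimensionless rate (seconds gained per second)."""
    return DRIFT_US_PER_EARTH_DAY * 1e-6 / SECONDS_PER_DAY


print(lunar_drift_us(365.25) / 1000)  # 20.454 (milliseconds per Julian year)
```

    So the offset is tiny day to day, but grows to roughly 20ms per year, which matters for anything doing precise navigation or synchronisation.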

    Tags: via:david-cuthbert moon time timezones ltc

Understanding the BM25 full text search algorithm

LLM-Driven Code Completion in JetBrains IDEs

  • LLM-Driven Code Completion in JetBrains IDEs

    JetBrains have come up with a new relatively-lightweight LLM-driven code generation option, constrained to producing single line suggestions:

    The length of the completion suggestions is a trade-off. While longer suggestions do tend to reduce how many keystrokes you have to make, which is good, they also increase the number of reviews required on your end. Taking the above into account, we decided that completing a single line of code would be a fair compromise.

    Some key features:

    • It works locally and is available offline. This means you can take advantage of the feature even if you aren’t connected to the internet.

    • It doesn’t send any data from your machine over the internet. The language models that power full line code completion run locally, which is great for two reasons. First, your code remains safe, as it never leaves your machine. Second, there are no additional cloud-related expenses – that’s why this feature comes at no additional cost.

    Also, customer code is never used for training.

    I’ve used this (in RubyMine), and found it fairly useful; it’s good for generating the obvious next line, but is easily ignored when that’s not what’s needed. Not bad at all.

    Tags: coding code-completion jetbrains ides java ruby llms ai code-generation rubymine intellij

VIC 20 Elite

  • VIC 20 Elite

    Crazy stuff. Elite, ported to the Commodore VIC 20 (albeit with a 32K expansion):

    VIC 20 Elite is based on the C-64 source. VIC 20 specific graphics, text, keyboard & joystick input, and sound routines were written from scratch to replace the corresponding C-64 code.

    Of course, the complete enhanced Elite won’t fit within the VIC 20’s limited memory, so some features had to be left out. Following the original 1984 BBC Cassette and Acorn Electron version, the VIC 20 version omits extended planet descriptions, planetary details (craters and meridians), and the missions that appear further on in the game. The pause mode options are dropped, and there is no Find Planet option in Galactic Chart (that would be only really useful during missions).

    (via Sleepy from FP)

    Tags: retrogaming commodore emulation gaming history elite vic-20

goref

  • goref

    “a Go heap object reference analysis tool based on delve: It can display the space and object count distribution of Go memory references, which is helpful for efficiently locating memory leak issues or viewing persistent heap objects to optimize the garbage collector (GC) overhead.”

    Nice to see Go supporting similar debugging/optimisation tools to those offered by the JVM.

    Tags: go heap memory gc memory-leaks

Artsy’s Technology Choices evaluation process

  • Artsy’s Technology Choices evaluation process

    This is a nice way to evaluate new technology options, from Artsy:

    We want to accomplish a lot with a lean team, which means we must choose stable technologies. However, we also want to adopt best-of-breed technologies or best-suited tools, which may need work or still be evolving. We’ve borrowed from ThoughtWorks’ Radar to define the following stages for evaluating, adopting, and retiring technologies:

    • Adopt: Reasonable defaults for most work. These choices have been exercised successfully in production at Artsy and there is a critical mass of engineers comfortable working with them.
    • Trial: These technologies are being evaluated in limited production circumstances. We don’t have enough production experience to recommend them for high-risk or business-critical use cases, but they may be worth consideration if your project seems like a fit.
    • Assess: Technologies we are interested in and maybe even built proofs-of-concept for, but haven’t yet trialed in production.
    • Hold: Based on our experience, these technologies should be avoided. We’ve found them to be flawed, immature, or simply supplanted by better alternatives. In some cases these remain in legacy production uses, but we should take every opportunity to retire or migrate away.

    (Via Lar Van Der Jagt on the Last Week In AWS slack instance)

    Tags: via:lwia tech technology radar choices evaluation process architecture planning tools

API Error Design

  • API Error Design

    Some good thoughts from a SlateDB dev, regarding initial principles for errors in SlateDB, derived from experience with Kafka:

    • Keep public errors separate from internal errors. The set of public errors should be kept minimal and new errors should be highly scrutinized. For internal errors, we can go to town since they can be refactored and consolidated over time without affecting the user.
    • Public errors should be prescriptive. Can an operation be retried? Is the database left in an inconsistent state? Can a transaction be aborted? What should the user actually do when the error is encountered? The error should have clear guidance.
    • Prefer coarse error types with rich error messages. There are probably hundreds of cases where the database can enter an invalid state. We don’t need a separate type for each of them. We can use a single FatalError and pack as much information into the error message as is necessary to diagnose the root cause.

    (via Chris Riccomini)

    Tags: errors api design slatedb api-design error-handling exceptions architecture

Block AI scrapers with Anubis

  • Block AI scrapers with Anubis

    Bookmarking this in case I have to use it; I have a blog-related use case in mind, and I don’t want LLM scrapers to kill my blog.

    Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don’t support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.

    The most hilarious part about how Anubis is implemented is that it triggers challenges for every request with a User-Agent containing “Mozilla”. Nearly all AI scrapers (and browsers) use a User-Agent string that includes “Mozilla” in it. This means that Anubis is able to block nearly all AI scrapers without any configuration.

    Tags: throttling robots scraping ops llms bots hashcash tarpits

Cost-optimized archival in S3 using s3tar

  • Cost-optimized archival in S3 using s3tar

    “s3tar” is new to me, and looks like a perfect tool for this common use-case: aggregation and archival of existing data on S3, which often requires aggregation into large file sizes to take advantage of S3 Glacier storage classes (which have a minimum file size of 128KB).

    s3tar optimizes for cost and performance on the steps involved in downloading the objects, aggregating them into a tar, and putting the final tar in a specified Amazon S3 storage class using a configurable “--concat-in-memory” flag. … The tool also offers the flexibility to upload directly to a user’s preferred storage class or store the tar object in S3 Standard storage and seamlessly transition it to specific archival classes using S3 Lifecycle policies.

    The only downside of s3tar is that it doesn’t support recompression, which is also a common enough requirement — especially after aggregation of multiple small input files into a larger, more compressible archive. But hey, can’t have everything.

    s3tar: https://github.com/awslabs/amazon-s3-tar-tool

    Tags: s3tar amazon s3 compression storage archival architecture aggregation logs glacier via:lwia

Cryptocurrency “market caps” and notional value

  • Cryptocurrency “market caps” and notional value

    Excellent explainer from Molly White on the risks of quoting “market caps” for memecoins:

    The “market cap” measurement has become ubiquitous within and outside of crypto, and it is almost always taken at face value. Thoughtful readers might see such headlines and ask questions like “how did a ‘$2 trillion market’ tumble without impacting traditional finance?”, but I suspect most accept the number.

    When crypto projects are hacked, there are headlines about hackers stealing “$166 million worth” of tokens, when in reality the hackers only could cash out 2% of that amount (around $3 million) because their attempts to sell illiquid tokens caused the price to crash.
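
    To see how a notional value can shrink that much on sale, here is a toy model. It assumes the tokens are sold into a single constant-product liquidity pool (x * y = k) with no fees, which is my own simplification and not something from the article; the pool sizes are made up so that the result lands on the 2% figure quoted above.

```python
def sale_proceeds(token_reserve: float, cash_reserve: float, tokens_sold: float) -> float:
    """Cash received for selling into a constant-product pool, ignoring fees."""
    k = token_reserve * cash_reserve
    # After the sale the pool holds more tokens, so it must hold less cash.
    return cash_reserve - k / (token_reserve + tokens_sold)


# Hypothetical pool: 1M tokens against $1M, so the spot price is $1 per token.
token_reserve, cash_reserve = 1_000_000, 1_000_000
stolen = 49_000_000

notional = stolen * (cash_reserve / token_reserve)
realised = sale_proceeds(token_reserve, cash_reserve, stolen)

print(notional)                      # 49000000.0
print(realised)                      # 980000.0
print(f"{realised / notional:.0%}")  # 2%
```

    In this model the realised fraction is token_reserve / (token_reserve + tokens_sold), so a holding many times larger than the pool can never be cashed out at anything close to the quoted price.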

    Tags: molly-white memecoins bitcoin rug-pulls scams liquidity market-caps cryptocurrency

Hollo

  • Hollo

    “A federated microblogging software for single users. ActivityPub-enabled, Mastodon-compatible API, supports CommonMark and Misskey-style quotes. Hollo is designed for single-users, so you can own your instance and have full control over your data. It’s perfect for personal microblogs, notes, and journals.”

    Seems fairly heavyweight, however, so I probably won’t be running it, but it’s a nice take on the single-user-server Fediverse use case.

    Tags: fediverse mastodon hollo apps social-media blogging

GTFS-Realtime API

  • GTFS-Realtime API

    The Irish National Transport Authority have an open data API for realtime public transport information; very cool. “The GTFS-R API contains real-time updates for services provided by Dublin Bus, Bus Éireann, and Go-Ahead Ireland.”

    The specification currently supports the following types of information:

    Trip updates – delays, cancellations, changed routes; Service alerts – stop moved, unforeseen events affecting a station, route or the entire network; Vehicle positions – information about the vehicles including location and congestion level

    Registration is required.

    Tags: public-transport buses trains transit nta gtfs apis open-data dublin ireland

Why the British government is so into AI

  • Why the British government is so into AI

    Interesting Bluesky thread on the topic:

    The UK Government believes several things:

    1) The AI genie is out of the bottle and cannot be put back in

    2) Embracing AI would definitely be good for the British economy

    3) Enforcing copyright on AI training would put Britain out of step with rest of the world and subsequently…

    4) Enforcing copyright would be ineffective as AI would just be trained elsewhere, cutting out Brit creatives entirely

    5) Govt’s preferred option is permissive enough to be attractive to AI firms but demands transparency so at least rights holders have some recourse; the alternative is bleaker.

    Obviously, I contest all of these beliefs to one degree or another, but this is where the govt is, and it’s useful to understand that. The real crux of the debate, as they see it, is how Britain’s laws can practically deal with the global inevitability of AI. They believe it’s untenable to make Britain a legislative pariah state for AI, and that this would not lead to good outcomes for British creatives anyway. This is a point worth considering when replying to the consultation.

    However, the govt says it’s not going to implement policy before it has a technical solution for rights holders to opt-out and chase down infringements. My view is that this is difficult to the point of being pure fantasy, and either means that the govt is not serious about finding a real, effective technical solution, or this policy will be kicked indefinitely down the road. My dinner partner was optimistic a solution could be achieved within the timespan of a year or two. I just don’t buy it.

    Government says it has not sided with AI firms over creative industries. However, its understanding of “not taking a side” creates a false equality between massive companies whose business relies on crime and individuals whose livelihoods will be destroyed.

    I got the sense that there is no political will whatsoever to seriously challenge firms who offer to spend big in Britain, and that any thought of holding them to account for actual crime is simply considered naive. But we do have a bit of time while govt attempts to confect their magical, easy to use, opt-out solution—time during which one or several of these AI firms might implode, making the true cost more apparent.

    Tags: uk government ai policy copyright ip britain economy future

The people should own the town square

  • The people should own the town square

    Ah, this is welcome news from Mastodon:

    We are going to transfer ownership of key Mastodon ecosystem and platform components to a new non-profit organization, affirming the intent that Mastodon should not be owned or controlled by a single individual. […] Taking the first tentative steps almost a year ago, there are already multiple organizations involved with shepherding the Mastodon code and platform. The next 6 months will see the transformation of the Mastodon structures, shifting away from the early days’ single-person ownership and enshrining the envisioned independence in a dedicated European not-for-profit entity.

    Tags: mastodon social-media open-source fediverse

Grafana and ClickHouse

Watch Duty

  • Watch Duty

    Nice to see an important public need being met here:

    The [Watch Duty] app gives users the latest alerts about fires in their area [in California] and has become a vital service for millions of users in the western U.S. struggling with the seemingly constant threat of deadly wildfires—one major reason it had over 360,000 unique visits from 8:00-8:30 a.m. local time Wednesday. And the man behind Watch Duty promises that as a nonprofit, his organization has no plans to pull an OpenAI and become a profit-seeking enterprise.

    Tags: non-profits tech watch-duty apps mobile public-good

Steve Jobs vs Ireland

  • Steve Jobs vs Ireland

    This is a great Steve Jobs story, from the engineer who wrote v1 of the Mac OS X Dock:

    At one point during a trip over, Steve was talking to Bas and asked how things were coming along with the Dock. He replied something along the lines of “going well, the engineer is over from Ireland right now, etc”. Steve left, and then visited my manager’s manager’s manager and said the fateful words (as reported to me by people who were in the room where it happened).

    “It has come to my attention that the engineer working on the Dock is in FUCKING IRELAND”.

    I was told that I had to move to Cupertino. Immediately. Or else.

    I did not wish to move to the States. I liked being in Europe. Ultimately, after much consideration, many late night conversations with my wife, and even buying a guide to moving, I said no.

    They said ok then. We’ll just tell Steve you did move.

    (via Niall Murphy)

    Tags: macos america osx apple history steve-jobs

Court docs allege Meta trained LLM models using pirated book trove

  • Court docs allege Meta trained LLM models using pirated book trove

    This is pretty massive:

    The [court] document claims that Meta decided to download documents from Library Genesis — aka. “LibGen” — to train its models. LibGen is the subject of a lawsuit brought by textbook publishers who believe it happily hosts and distributes [pirated] works [….]

    The filing from plaintiffs in the Kadrey case claims that documents produced by Meta […] describe internal debate about accessing LibGen, a little squeamishness about using BitTorrent in the office to do so, and eventual escalation to “MZ” [Mark Zuckerberg himself], who approved use of the contentious resource. […]

    Another filing claims that a Meta document describes how it removed copyright notifications from material downloaded from LibGen, and suggests the company did so because it realized including such text could mean a model’s output would reveal it was trained on copyrighted material.

    US District Court Judge Vince Chhabria also noted that in one of the documents Meta wants to seal, an employee wrote the following:

    “If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.”

    No shit.

    Tags: piracy meta copyright mark-zuckerberg law llama training libgen books

Bufferbloat Test

  • Bufferbloat Test

    A handy tool to test your internet connection for “bufferbloat”, the error condition involving “undesirable high latency caused by other traffic on your network. It happens when a flow uses more than its fair share of the bottleneck. Bufferbloat is the primary cause of bad performance for real-time Internet applications like VoIP calls, video games, and videoconferencing.”

    (My home internet connection is currently rating a C: “your latency increased considerably under load”, jumping from a min/mean/p95/max of 10.7, 16.9, 23.7, 30.1ms to 35.3, 98.4, 121.0, 286.0ms under load, yikes, so looks like I need to do some optimising.)

    Tags: bufferbloat internet networking optimisation performance testing tools

Waymos don’t stop for pedestrians

Garbage Day on Meta’s moderation plans

  • Garbage Day on Meta’s moderation plans

    This is 100% spot on, I suspect, regarding Meta’s recently-announced plans to give up on content moderation:

    After 2021, the major tech platforms we’ve relied on since the 2010s could no longer pretend that they would ever be able to properly manage the amount of users, the amount of content, the amount of influence they “need” to exist at the size they “need” to exist at to make the amount of money they “need” to exist.

    And after sleepwalking through the Biden administration and doing the bare minimum to avoid any fingers pointed their direction about election interference last year, the companies are now fully giving up. Knowing the incoming Trump administration will not only not care, but will even reward them for it.

    The question now is, what will the EU do about it? This is a flagrant raised finger in the face of the Digital Services Act.

    Tags: moderation content ugc meta future dsa eu garbage-day

“uhtcearu”