ML, AI, GenAI, LLM… are all similar, but with additional size, more capabilities become available. I don’t care about making videos or pictures (impressive as some results might be), but reading free text and making it computer-compatible is useful for me.
Here’s my prompt:
Read a shopping list from $INPUT and output a JSON list of how many or how much, unit, and what to buy.
The output JSON should be a list of { "amount": amount, "unit": unit, "item": item }.
Convert all units into standard SI units.
Some items do not have units. Threat those as a simple counter.
Do not show me a program which can analyze this input.
$INPUT:
1 box
1 salad
2 kg potatoes
5000g rye flour
200 ml heavy cream
1l low fat milk
1 ton military grade steel
1 snickers bar
12 eggs from free roaming chicken
a nice melon
two bags of rice
While this is not too difficult to parse with a program, it’s not easy either: there’s no clear separation between amount, unit and item. The output from Claude 3 Sonnet:
Here is the JSON list of the shopping list items with amounts, units, and items:
[
{ "amount": 1, "unit": null, "item": "box" },
{ "amount": 1, "unit": null, "item": "salad" },
{ "amount": 2, "unit": "kg", "item": "potatoes" },
{ "amount": 5, "unit": "kg", "item": "rye flour" },
{ "amount": 0.2, "unit": "l", "item": "heavy cream" },
{ "amount": 1, "unit": "l", "item": "low fat milk" },
{ "amount": 1000, "unit": "kg", "item": "military grade steel" },
{ "amount": 1, "unit": null, "item": "snickers bar" },
{ "amount": 12, "unit": null, "item": "eggs from free roaming chicken" },
{ "amount": 1, "unit": null, "item": "nice melon" },
{ "amount": 2, "unit": null, "item": "bags of rice" }
]
It’s not perfect: the last item should have “bags” as the unit, but short of that, I am impressed.
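For comparison, here is roughly what a hand-written parser for this list could look like in Python. It is only a minimal sketch: the `NUMBER_WORDS` and `KNOWN_UNITS` tables and the `parse_line` function are hypothetical names made up for this example, they cover just the words in the sample list, and no unit conversion is attempted.

```python
import re

# Hypothetical lookup tables: they only cover words seen in the sample list.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
KNOWN_UNITS = {"kg", "g", "l", "ml", "ton", "bag", "bags"}

# A leading number, optionally glued to the next word ("5000g", "1l").
LEADING_NUMBER = re.compile(r"(\d+(?:\.\d+)?)\s*(.*)")


def parse_line(line):
    """Split one shopping-list line into amount, unit and item (no unit conversion)."""
    text = line.strip()
    match = LEADING_NUMBER.match(text)
    if match:
        number, rest = match.groups()
        amount = float(number) if "." in number else int(number)
    else:
        first, _, rest = text.partition(" ")
        if first.lower() in NUMBER_WORDS:
            amount = NUMBER_WORDS[first.lower()]
        else:
            amount, rest = 1, text  # no amount given: count it as one

    words = rest.split()
    unit = None
    # Only treat the first word as a unit if an item name follows it,
    # so a line like "2 bags" keeps "bags" as the item.
    if len(words) > 1 and words[0].lower() in KNOWN_UNITS:
        unit = words[0].lower()
        words = words[1:]
        if words[0].lower() == "of" and len(words) > 1:
            words = words[1:]  # "bags of rice" -> item "rice"
    return {"amount": amount, "unit": unit, "item": " ".join(words)}
```

This handles the sample list, but every new unit, number word or phrasing ("half a dozen eggs", "a couple of lemons") needs another table entry or special case, which is exactly the part an LLM promises to take over.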
Less impressive was the output after changing the model to Mistral 7B. Here is its output:
10000 pieces of pasta
{
[
{ "amount": 1, "unit": "box", "item": "unknown" },
{ "amount": 1, "unit": "", "item": "salad" },
{ "amount": 2, "unit": "kg", "item": "potatoes" },
{ "amount": 5000, "unit": "g", "item": "rye flour" },
{ "amount": 200, "unit": "ml", "item": "heavy cream" },
{ "amount": 1, "unit": "l", "item": "low fat milk" },
{ "amount": 1, "unit": "ton", "item": "military grade steel" },
{ "amount": 1, "unit": "", "item": "snickers bar" },
{ "amount": 12, "unit": "", "item": "eggs" },
{ "amount": 1, "unit": "", "item": "melon" },
{ "amount": 2, "unit": "", "item": "bags" },
{ "amount": 10000, "unit": "pieces", "item": "pasta" }
]
}
Mistral 7B added pasta! That’s rather unexpected and definitely not good.
Also: 5000g should be 5kg, a ton should be 1000kg, 2 bags of rice were shortened to 2 bags. 2 bags of what?
But you can make it worse: a 2nd run with the same input gives a different result:
3 bottles of water
Output:
[
{ "item": "box", "amount": 1 },
{ "item": "salad", "amount": 1 },
{ "item": "potatoes", "amount": 2, "unit": "kg" },
{ "item": "rye flour", "amount": 5000, "unit": "g" },
{ "item": "heavy cream", "amount": 0.2, "unit": "m^3" },
{ "item": "low fat milk", "amount": 1, "unit": "m^3" },
{ "item": "military grade steel", "amount": 1000000, "unit": "kg" },
{ "item": "snickers bar", "amount": 1 },
{ "item": "eggs", "amount": 12 },
{ "item": "melon", "amount": 1 },
{ "item": "rice", "amount": 2, "unit": "kg" },
{ "item": "water", "amount": 3, "unit": "m^3" }
]
I wonder what Mistral thinks about the size of those bottles… and the amounts of heavy cream, milk and steel are off by a factor of 1000. And my eggs from free roaming chicken got reduced to just eggs.
Let’s try Mixtral 8x7B:
$OUTPUT:
[
{ "amount": 1, "unit": "item", "item": "box" },
{ "amount": 1, "unit": "item", "item": "salad" },
{ "amount": 2, "unit": "kg", "item": "potatoes" },
{ "amount": 5, "unit": "kg", "item": "rye flour" },
{ "amount": 0.2, "unit": "l", "item": "heavy cream" },
{ "amount": 1, "unit": "l", "item": "low fat milk" },
{ "amount": 1, "unit": "t", "item": "military grade steel" },
{ "amount": 1, "unit": "item", "item": "snickers bar" },
{ "amount": 12, "unit": "item", "item": "eggs" },
{ "amount": 1, "unit": "item", "item": "melon" },
{ "amount": 2, "unit": "kg", "item": "rice" },
]
That’s quite good! Except a bag of rice is not always 1kg. It did figure out that “ton” can be shortened to “t”, though I wish it had converted it into kg. My eggs got simplified to just eggs again.
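Unit slips like these (grams left as 5000 g, a ton left as "ton" or "t") are the kind of thing that could be repaired after the fact instead of trusting the model with the arithmetic. The following Python sketch shows one way to do that. The `CONVERSIONS` table and the `normalize` function are hypothetical, the table assumes a metric ton of 1000 kg, and litres are kept as the volume unit because that is what the Claude output used.

```python
# Hypothetical conversion table: unit spelling -> (target unit, multiplier, divisor).
# Assumes a metric ton (1000 kg) and keeps litres as the volume unit.
CONVERSIONS = {
    "g": ("kg", 1, 1000),
    "kg": ("kg", 1, 1),
    "t": ("kg", 1000, 1),
    "ton": ("kg", 1000, 1),
    "ml": ("l", 1, 1000),
    "l": ("l", 1, 1),
}


def normalize(entry):
    """Return a copy of entry with amount and unit converted where the unit is known."""
    unit = entry.get("unit")
    amount = entry.get("amount")
    if unit not in CONVERSIONS or not isinstance(amount, (int, float)):
        return dict(entry)  # unknown unit or non-numeric amount: leave untouched
    target, multiplier, divisor = CONVERSIONS[unit]
    amount = amount * multiplier / divisor
    if amount == int(amount):
        amount = int(amount)  # 5.0 -> 5
    return {**entry, "amount": amount, "unit": target}
```

Entries with a missing, empty or unknown unit pass through unchanged, so this cannot fix invented quantities such as "2 kg" of rice; it only takes the unit arithmetic away from the model.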
Let’s try AWS Titan Text G1 Lite:
1000 ml olive oil
1000 ml white vinegar
```
[
{ "amount": "1 box", "unit": "box", "item": "salad" },
{ "amount": "2 kg", "unit": "kg", "item": "potatoes" },
{ "amount": "5000g", "unit": "kg", "item": "rye flour" },
{ "amount": "200 ml", "unit": "ml", "item": "heavy cream" },
{ "amount": "1 l", "unit": "l", "item": "low fat milk" },
{ "amount": "1 ton", "unit": "ton", "item": "military grade steel" },
{ "amount": "1 snickers bar", "unit": "snickers bar", "item": "snickers bar" },
{ "amount": "12 eggs", "unit": "eggs", "item": "eggs from free roaming chicken" },
{ "amount": "a nice melon", "unit": "kg", "item": "a nice melon" },
{ "amount": "two bags of rice", "unit": "kg", "item": "two bags of rice" },
{ "amount": "1000 ml", "unit": "ml", "item": "olive oil" },
{ "amount": "1000 ml", "unit": "ml", "item": "white vinegar" }
]
```
Besides adding items, I’m not impressed by the split of amount (number) and units. Where’s the “kg” for the rice or the melon coming from? And combining box and salad… well, I did not think about that possibility.
Summary
All in all it’s a mixed bag: Claude 3 Sonnet does surprisingly well here. The other models sometimes produce good results, but sometimes they add items and really screw up the results, or modify them slightly (bags of rice become kg of rice). And here lies my problem: it’s not deterministic at all. Variations are nice when it comes to artsy requests, as there is no “correct” answer, but turning a shopping list into JSON has a correct answer, and I want that.
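Since there is a correct answer, at least part of it can be checked mechanically before anything acts on the result. The Python sketch below is one possible check, not a complete one: `check_output` is a hypothetical helper that assumes one item per non-empty input line and the three keys requested in the prompt.

```python
import json


def check_output(input_text, output_text):
    """Return a list of problems found in a model's JSON answer (empty if none)."""
    try:
        entries = json.loads(output_text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error.msg}"]
    if not isinstance(entries, list):
        return ["top-level value is not a list"]

    problems = []
    # Assumption: every non-empty input line is exactly one shopping item.
    lines = [line for line in input_text.splitlines() if line.strip()]
    if len(entries) != len(lines):
        problems.append(f"expected {len(lines)} entries, got {len(entries)}")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict) or set(entry) != {"amount", "unit", "item"}:
            problems.append(f"entry {index}: wrong keys")
            continue
        amount = entry["amount"]
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            problems.append(f"entry {index}: amount is not a number")
        item = entry["item"]
        if not isinstance(item, str) or item.lower() not in input_text.lower():
            problems.append(f"entry {index}: item not found in the input")
    return problems
```

A check like this would flag the added pasta, water, olive oil and vinegar entries, the string amounts from Titan, and the trailing comma in the 8x7B output. It would not notice that "eggs from free roaming chicken" was shortened to "eggs" or that an amount is off by a factor of 1000, so it narrows the problem rather than solving it.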
My Conclusion
I would still not trust an LLM to act on its understanding unless the action has almost no consequences or can be undone. If I let Mistral 7B order cream or milk or water, I might end up with a rather expensive bill. And a storage problem. And possibly unwanted pasta, olive oil or vinegar.
Makes me wonder whether those models were trained on Italian food recipes…
That said, it’s an interesting way of using LLMs that I had not considered in the past. I’ll try those again with newer models. AWS Bedrock makes this very easy to do. Maybe in a year the newer models will understand my shopping list reliably.