%x7FvutR8Fz/ixdASmykddpiXd/+6btLtFrajva4+abU=.sha256

@Soapy mcSoap3 years ago %x7FvutR8Fz/ixdASmykddpiXd/+6btLtFrajva4+abU=.sha256

At work we have a Postgres database that is used just to speed up some queries. That database holds image information, basically file paths, checksums, dates, and exif data in a jsonb field. Contrary to normalization and DBA standards everywhere, our database is not the authority over such data, the image files are the authority and hold the ultimate truth about our collection.

We have a cronjob that checks our gazillion images for checksum changes and update database records if needed. We seldom update image files, the usual update is just when we add them to a collection. Collections are just new keywords added to exif data. They conform to a pattern which is Collection <Name of the Collection>. So a file might have exif data such as Keywords: ["Watercolor", "Temple", "Collection Temple Art"].

If this was a user facing service, this architecture would not scale but this is just for internal usage. What we trade in terms of scalability, we gain in maintainability. Most of our people are not developers, they're print media experts and volunteers. It is easy for them to fill data using something like Adobe Bridge or one of our custom desktop tools.

If you stop to reason about it, you'll notice that the most important thing is the actual image file. We have multiple backups, including off-site backups of those files. By keeping all data embedded in the image files themselves, we're making them future proof and fail proof against going out of sync with the metadata. As long as you have access to the image files, rebuilding the whole cache database takes just some minutes.

Today, I had the need to create a little popup button to allow the user to select a collection for a query. If this was a normalised system, the answer would be quite obvious and easy. In the end, I used a combination of a Postgres fancy query with some LiveCode scripting.

The hard part is the query. We're extracting all the keywords from all images and grouping them by keyword. This query is used elsewhere to rank the most popular keywords. It goes like:

select 
  keyword, 
  count(distinct (image_id, keyword)) 
from 
  image, 
  jsonb_array_elements(case jsonb_typeof(metadata->'Keywords') when 'array' then metadata->'Keywords' else '[]' end) keyword 
group by 
  keyword 
order by 
  count desc;

Kinda tricky query because some images don't actually have a Keywords metadata, in which case we need to make sure it is treated like empty array. As you can see that will get all the keywords, and collections are just a specialisation of keywords.

My objective with the code below is to extract the collections from that query result and turn it into a line-delimited list of collection names. This is the code in LiveCode that does that:

dbSetSQL tSQL
put dbGet() into tResultsA

repeat for each key k in tResultsA
    if tResultsA[k]["keyword"] begins with "Collection" then
        put tResultsA[k]["keyword"] & cr after tCollections
    end if
end repeat

replace quote with empty in tCollections
replace "Collection " with empty in tCollections
delete char -1 of tCollections
sort tCollections
return tCollections

Programming doesn't need to be obfuscated. Many people here who have never programmed in HyperCard or LiveCode can probably understand what that code is doing.