[Linux-aus] Open Source AI or LLM people + projects in Australia/NZ

Lyndsey Jackson jackson.lyndsey at gmail.com
Sun Jan 7 19:47:27 AEDT 2024


Thanks Kathy for the thoughtful answer.

And thanks Frank for the tip.

Lots of really helpful and insightful perspectives, Kathy; I really
appreciate it.

Amazee, in the Drupal community, are building LLM-to-Drupal capability, but
it's still expensive to run.

The localisation and regionalisation of LLMs is one of the areas I was
thinking of for agriculture and regions, and I've been thinking about ways
to train people in how to use technology while creating crowdsourced LLMs
at the same time. The article I linked to has an example of a model for
agricultural inputs, and I think building and breaking down what is
happening is a good way to approach the agtech ecosystem.

To be honest, it feels like a strange grant, which is why I wondered if
there was anyone in the technical community who had come across any other
bids. The discussions I've had have been with a group involving the
Australian Institute for Machine Learning and one of the Big 4, and their
model for making the numbers work (cash and in-kind contribution) was
... interesting.

There are other priority areas outside of agriculture; that's just the one
I was interested in.

Lyndsey

On Sun, Jan 7, 2024 at 3:31 PM Kathy Reid via linux-aus <
linux-aus at lists.linux.org.au> wrote:

> Hi Lyndsey,
>
> The comments here are from my personal perspective, not those of any
> academic or institutional affiliations I have.
>
> Firstly, I want to clarify some terms. AI and machine learning is a very
> broad space - it's used in almost every vertical: defence, aerospace,
> marketing, business analytics, education and so on. A large language model
> (LLM) is a specific type of machine learning model that is _predictive_ and
> _generative_. Given a prompt, it responds with the output that best matches
> the prompt _based on the data it has been trained on_. Other types of
> machine learning models (for example, image generators like DALL-E 2) use
> different algorithms and are trained on different types of data. If you
> train an LLM on a particular set of data, for example "movies", then its
> predictions will reflect what it has been trained on.
>
> ## Distinguishing open source models from open source data
>
> Models are built on data.
>
> While a model can be open sourced, that isn't enough to make it "open
> source AI", IMHO - and the Open Source Initiative is having a broader
> conversation around what is meant by "open source AI" [5]. The algorithm,
> code, dependencies, etc. are also required for reproducibility, and to make
> it "truly open".
>
> The key thing to keep in mind is that models are trained on *data* - it's
> not so important where the model is developed, but it's really important
> where the data comes from (this is the focus of the movement called
> "data-centric AI" [0]).
>
> ## LLMs in the Australian context
>
> The challenge here for the Australian context is that many LLMs are either
> not trained on Australian-specific data, _or_ their predictions do not
> mirror the Australian context because their training set has little
> Australian content compared to American content. This can be partly
> addressed by specifying the context in the prompt - for example, "Please
> create a news headline and two paragraphs of copy of an important event in
> Melbourne in 2002. Use Australian English." Feeding this prompt to
> ChatGPT (3) produces Australian spelling and Melbourne-specific landmarks.
> ChatGPT also seems to have some grounding in Indigenous knowledges, such as
> Bunjil. But if I ask ChatGPT the question "There's a bingle at Broady and
> the Western's chokkas back to the servo. What should I do?" then it quickly
> degrades to general advice, rather than context-specific suggestions
> ("Mate, can you turn off to Donnybrook Road or Plenty Road?").
>
> AFAIK there are no open source, public LLMs being developed within
> Australia.
>
> The main academic conference for LLM work in Australia (which is
> considered less prestigious than international conferences such as NeurIPS
> [1] and ICML [2]) is the Australasian Language Technology Conference
> (ALTA), which I attended in December. My notes from it are public [3].
> The main focus of this conference was the application of LLM technology to
> healthcare, such as mining medical records to assist health
> professionals in making accurate and timely diagnoses. These language
> models *are* being made open source, but they are smaller and much more
> specific than ChatGPT.
>
> Indeed, there was a lot of conversation at the conference about the *need*
> for a research or open source LLM in Australia, because the costs of using
> ChatGPT and others (Claude, Bard) quickly add up.
>
> In summary, there is a lot of scope for creating an Australian-specific,
> open source LLM. AFAIK, one doesn't exist.
>
> ## Other open source LLM efforts
>
> The main open source LLM efforts are:
>
> * https://www.eleuther.ai/
>
> * Llama 2 by Meta
>
> * https://falconllm.tii.ae/
>
> * Mistral AI
>
> None of these efforts are Australian-based.
>
> ## Other Australian open AI efforts
>
> Many universities are tackling the issue of AI-generated / LLM-generated
> text by using AI-detection tools, primarily within the Turnitin suite.
> Turnitin LLC is headquartered in California, USA.
>
> From conversations I've had with archivists in Australian collection
> institutions, there is also a need for Australian-specific speech
> recognition tools, because tools like Whisper do not recognise
> Australian-accented speech as well as other accents [4]. These tools are
> being used to transcribe audio-visual archives. Again, a lot of this comes
> down to the *data* that the model is trained on. The key problem here is
> that Whisper was trained on 680k hours of speech data. To train a
> comparable model, you would need hundreds of thousands of hours of
> Australian-accented speech. The AusTalk archive, for example, from memory
> has maybe a couple of thousand hours (it's offline so I can't check).
>
> In the last week, we've also seen rapid advances in voice synthesis, with
> MyShell releasing its OpenVoice voice cloning technology [6]. The *model* is
> openly available, but again, the data, algorithms and code are not. The
> question for Australia is: do we want to be able to clone
> Australian-accented voices? (Yeah, nah ;-) There are no Australian TTS /
> voice cloning efforts that are open source, AFAIK. This raises major
> ethical questions for the likes of the ATO, which uses voice recognition
> (and which has previously been spoofed [7]).
>
> AI is rapidly being adopted into cybersecurity efforts, particularly in
> the field of adversarial AI. These capabilities are predominantly the
> domain of the Acronym Agencies (ASIO, DSD, etc.), and the folks at BSides
> might be useful to talk to about Australian efforts here.
>
> In terms of Australian AI institutes, there are a few:
>
> * The Data61 CSIRO National Artificial Intelligence Centre, which doesn't
> actually produce any AI or ML; its remit is to encourage AI adoption -
> https://www.csiro.au/en/work-with-us/industries/technology/National-AI-Centre
>
> * Australian Institute for Machine Learning at University of Adelaide -
> Research into AI / ML - https://www.adelaide.edu.au/aiml/
>
> * A2I2 Institute at Deakin University - https://a2i2.deakin.edu.au/
>
> * UNSW AI Institute - https://www.unsw.edu.au/unsw-ai
>
> Invariably, the academic institutions offer various (mostly postgraduate)
> programs, with a heavy emphasis on "engaging industry" (read: getting
> industry to fund AI research because the government's research funding is
> paltry).
>
> ## The problem of national capability and why the business adoption centres are exacerbating rather than addressing this, IMHO
>
> (I was about to write "sovereign capability", but, as I was recently and
> correctly reminded, sovereignty was never ceded.)
>
> This all brings me to the key problem I have with the business adoption
> initiative. What I've outlined above is that Australia has very little
> national capability in AI, and even less in open source AI. What we adopt
> is, predominantly, American-owned AI that might then be shoe-horned into an
> Australian context. Sure, businesses should be looking to adopt AI to
> remain competitive. But the only AI they can adopt at the moment is,
> largely, American AI.
>
> What the business adoption initiative seeks to do is spur *adoption*
> rather than *development*. What I would like to see happen is the
> development of national AI capability, preferably in the form of open
> source products that can be used by Australian businesses and organisations
> nationally. Perhaps one of our national organisations should focus on that,
> rather than encouraging Australian businesses to spend money overseas ...
>
> Kind regards,
>
> Kathy Reid
>
>
> [0] https://dcai.csail.mit.edu/
>
> [1] https://neurips.cc/
>
> [2] https://icml.cc/
>
> [3] https://blog.kathyreid.id.au/2023/12/10/alta2023/
>
> [4] My PhD research, forthcoming
>
> [5] https://blog.opensource.org/open-source-ai-establishing-a-common-ground/
>
> [6] https://research.myshell.ai/open-voice
>
> [7] https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai
>
> On 7/1/24 14:54, Lyndsey Jackson via linux-aus wrote:
>
> Hi all,
>
> I'm doing a bit of fact-finding, reaching out for people (or connections to
> people) working on open AI/LLM projects.
>
> Late last year a proposal for AI Centres to help SMEs adopt AI dropped:
> https://business.gov.au/grants-and-programs/artificial-intelligence-ai-adopt-program
> Before the holiday break I did some work on a proposal concept for
> agricultural value-add (which is very, very broad), and I have insight into
> how some key groups were considering approaching it.
>
> If you have any tech/advice/ideas/groups, please let me know. I might
> not get a group to put a bid in, but that's OK; I still want to know what's
> happening in open source and who is working on it.
>
>
> Thanks,
>
> Lyndsey
>
>
> _______________________________________________
> linux-aus mailing list
> linux-aus at lists.linux.org.au
> http://lists.linux.org.au/mailman/listinfo/linux-aus
>
> To unsubscribe from this list, send a blank email to
> linux-aus-unsubscribe at lists.linux.org.au



-- 


Lyndsey Jackson

0400 329 894

W: www.lyndseyjackson.com.au
T: @ok_lyndsey <http://www.twitter.com/ok_lyndsey>
LIn: https://au.linkedin.com/in/lyndsey-jackson