OpenAI

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "text-davinci-003",
  "prompt": "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: I am an AI created by OpenAI. How can I help you today?\nHuman: I'd like to cancel my subscription.\nAI:",
  "temperature": 0.9,
  "max_tokens": 150,
  "top_p": 1,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.6,
  "stop": [" Human:", " AI:"]
}'

To keep the request body separate from the command, we create the file peticion.json:

{
  "model": "text-davinci-003",
  "prompt": "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: I am an AI created by OpenAI. How can I help you today?\nHuman: I'd like to cancel my subscription.\nAI:",
  "temperature": 0.9,
  "max_tokens": 150,
  "top_p": 1,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.6,
  "stop": [" Human:", " AI:"]
}

And we send the request:

curl -X POST https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d @peticion.json
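
The same request can also be prepared from Python. This is only a minimal sketch using the standard library: the helper names build_request and send are made up for these notes, and send is defined but not called here because it needs network access and a valid key.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/completions"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command sends."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

To use it, load peticion.json with json.load, pass the resulting dict and the value of OPENAI_API_KEY to build_request, and hand the result to send.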

Training a model

https://platform.openai.com/docs/guides/fine-tuning

We need to generate a JSON Lines (JSONL) file with question/answer pairs, which are called prompt and completion.

We install the openai CLI (it ships with the openai Python package). We create a CSV file in the following format:

prompt,completion
"nombre de la víctima","Pedro"
"posible sospechoso","Roberto"
"pista encontrada","una llave en el suelo al lado de la puerta de la entrada"
"pista encontrada","unos zapatos que no pertenecen a la víctima en la habitación de la víctima"
"pista encontrada","un mensaje de texto en el móvil de la víctima"
"novia de Pedro","Sandra en los años 2016,2017"
"novia de Pedro","Laura en el año 2022"
"novia de Roberto","Laura en los años 2018,2019,2020,2021"

We convert it to JSONL. The tool suggests adding separators, and we accept. We run this command:

openai tools fine_tunes.prepare_data -f fichero.csv

It creates a file with the _prepared.jsonl suffix and the following content:

{"prompt":"nombre de la víctima ->","completion":" Pedro\n"}
{"prompt":"posible sospechoso ->","completion":" Roberto\n"}
{"prompt":"pista encontrada ->","completion":" una llave en el suelo al lado de la puerta de la entrada\n"}
{"prompt":"pista encontrada ->","completion":" unos zapatos que no pertenecen a la víctima en la habitación de la víctima\n"}
{"prompt":"pista encontrada ->","completion":" un mensaje de texto en el móvil de la víctima\n"}
{"prompt":"novia de Pedro ->","completion":" Sandra en los años 2016,2017\n"}
{"prompt":"novia de Pedro ->","completion":" Laura en el año 2022\n"}
{"prompt":"novia de Roberto ->","completion":" Laura en los años 2018,2019,2020,2021\n"}
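
For reference, the transformation visible in this file (a " ->" suffix on each prompt, a leading space and a trailing newline on each completion) can be reproduced in a few lines of Python. This is only a rough sketch of the resulting format, not the actual implementation of the prepare_data tool; the function name csv_to_jsonl is made up for these notes.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str, separator: str = " ->", ending: str = "\n") -> str:
    """Convert prompt,completion CSV text into JSONL in the prepared format."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            # Every prompt ends with the same fixed separator.
            "prompt": row["prompt"] + separator,
            # Every completion starts with a space and ends with a fixed ending.
            "completion": " " + row["completion"] + ending,
        }
        lines.append(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + "\n"


sample = 'prompt,completion\n"nombre de la víctima","Pedro"\n"posible sospechoso","Roberto"\n'
print(csv_to_jsonl(sample), end="")
```

Run on the first two rows of the CSV, this prints the same first two lines as the file above.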

We upload it to OpenAI to fine-tune the ada model (the cheap one). The base models are: ada, babbage, curie, davinci.

openai api fine_tunes.create -t entrenando_01_prepared.jsonl -m ada
Upload progress: 100%|████████████████████████████████████████████| 699/699 [00:00<00:00, 387kit/s]
Uploaded file from entrenando_01_prepared.jsonl: file-MNz4rv9kV8jpbiRveTXA76YG
Created fine-tune: ft-V8Seq1neyFcJOncJDRUbabWD
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-02-27 19:22:47] Created fine-tune: ft-V8Seq1neyFcJOncJDRUbabWD

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-V8Seq1neyFcJOncJDRUbabWD

We can check the status of the job:

openai api fine_tunes.get -i ft-V8Seq1neyFcJOncJDRUbabWD
{
  "created_at": 1677525767,
  "events": [
    {
      "created_at": 1677525767,
      "level": "info",
      "message": "Created fine-tune: ft-V8Seq1neyFcJOncJDRUbabWD",
      "object": "fine-tune-event"
    }
  ],
  "fine_tuned_model": null,
  "hyperparams": {
    "batch_size": null,
    "learning_rate_multiplier": null,
    "n_epochs": 4,
    "prompt_loss_weight": 0.01
  },
  "id": "ft-V8Seq1neyFcJOncJDRUbabWD",
  "model": "ada",
  "object": "fine-tune",
  "organization_id": "org-W85oba51ZpI7Keymmpa2exBj",
  "result_files": [],
  "status": "pending",
  "training_files": [
    {
      "bytes": 699,
      "created_at": 1677525767,
      "filename": "entrenando_01_prepared.jsonl",
      "id": "file-MNz4rv9kV8jpbiRveTXA76YG",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    }
  ],
  "updated_at": 1677525767,
  "validation_files": []
}

The status is pending. Once the job has finished, the status is succeeded:

{
  "created_at": 1677525767,
  "events": [
    {
      "created_at": 1677525767,
      "level": "info",
      "message": "Created fine-tune: ft-V8Seq1neyFcJOncJDRUbabWD",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526162,
      "level": "info",
      "message": "Fine-tune costs $0.00",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526162,
      "level": "info",
      "message": "Fine-tune enqueued. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526164,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526178,
      "level": "info",
      "message": "Completed epoch 1/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526180,
      "level": "info",
      "message": "Completed epoch 2/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526181,
      "level": "info",
      "message": "Completed epoch 3/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526182,
      "level": "info",
      "message": "Completed epoch 4/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526205,
      "level": "info",
      "message": "Uploaded model: ada:ft-iwanttobefreak-2023-02-27-19-30-05",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526208,
      "level": "info",
      "message": "Uploaded result file: file-goxBKlVtpq8p0X4otjLCFh0F",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677526208,
      "level": "info",
      "message": "Fine-tune succeeded",
      "object": "fine-tune-event"
    }
  ],
  "fine_tuned_model": "ada:ft-iwanttobefreak-2023-02-27-19-30-05",
  "hyperparams": {
    "batch_size": 1,
    "learning_rate_multiplier": 0.1,
    "n_epochs": 4,
    "prompt_loss_weight": 0.01
  },
  "id": "ft-V8Seq1neyFcJOncJDRUbabWD",
  "model": "ada",
  "object": "fine-tune",
  "organization_id": "org-W85oba51ZpI7Keymmpa2exBj",
  "result_files": [
    {
      "bytes": 1545,
      "created_at": 1677526206,
      "filename": "compiled_results.csv",
      "id": "file-goxBKlVtpq8p0X4otjLCFh0F",
      "object": "file",
      "purpose": "fine-tune-results",
      "status": "processed",
      "status_details": null
    }
  ],
  "status": "succeeded",
  "training_files": [
    {
      "bytes": 699,
      "created_at": 1677525767,
      "filename": "entrenando_01_prepared.jsonl",
      "id": "file-MNz4rv9kV8jpbiRveTXA76YG",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    }
  ],
  "updated_at": 1677526208,
  "validation_files": []
}
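
When polling, only a few fields of this JSON matter. Below is a small sketch that picks them out of a job already parsed into a dict. The function name summarize_job is made up, and the set of terminal states is an assumption: only pending and succeeded appear in these notes.

```python
# Assumed terminal states; only "succeeded" was actually observed in these notes.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def summarize_job(job: dict) -> dict:
    """Pick out the fields of a fine-tune job that matter when polling it."""
    events = job.get("events") or []
    return {
        "status": job["status"],
        "finished": job["status"] in TERMINAL_STATES,
        "model": job.get("fine_tuned_model"),
        # Both timestamps are Unix seconds, so the difference is in seconds.
        "elapsed_seconds": job["updated_at"] - job["created_at"],
        "last_event": events[-1]["message"] if events else None,
    }
```

For the finished job above, elapsed_seconds is 1677526208 - 1677525767 = 441, a little over seven minutes between creation and the last update.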

Now our model does show up in the list:

openai api fine_tunes.list
{
  "data": [
    {
      "created_at": 1677525767,
      "fine_tuned_model": "ada:ft-iwanttobefreak-2023-02-27-19-30-05",
      "hyperparams": {
        "batch_size": 1,
        "learning_rate_multiplier": 0.1,
        "n_epochs": 4,
        "prompt_loss_weight": 0.01
      },
      "id": "ft-V8Seq1neyFcJOncJDRUbabWD",
      "model": "ada",
      "object": "fine-tune",
      "organization_id": "org-W85oba51ZpI7Keymmpa2exBj",
      "result_files": [
        {
          "bytes": 1545,
          "created_at": 1677526206,
          "filename": "compiled_results.csv",
          "id": "file-goxBKlVtpq8p0X4otjLCFh0F",
          "object": "file",
          "purpose": "fine-tune-results",
          "status": "processed",
          "status_details": null
        }
      ],
      "status": "succeeded",
      "training_files": [
        {
          "bytes": 699,
          "created_at": 1677525767,
          "filename": "entrenando_01_prepared.jsonl",
          "id": "file-MNz4rv9kV8jpbiRveTXA76YG",
          "object": "file",
          "purpose": "fine-tune",
          "status": "processed",
          "status_details": null
        }
      ],
      "updated_at": 1677526208,
      "validation_files": []
    }
  ],
  "object": "list"
}

With ada the result looks pretty rubbish:

curl https://api.openai.com/v1/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "¿Como se llama la víctima?", "model": "ada:ft-iwanttobefreak-2023-02-27-19-30-05"}'
{
  "id": "cmpl-6odiKxpEHTmTzmKrpAnqMbQpB77M0",
  "object": "text_completion",
  "created": 1677527080,
  "model": "ada:ft-iwanttobefreak-2023-02-27-19-30-05",
  "choices": [
    {
      "text": "\n\n—Monócrata Núrida.\n\n—",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 15,
    "total_tokens": 28
  }
}

I asked the question again and it answered:

Toni
Toni señaló la puerta

With davinci it takes longer, but it still gives answers that have nothing to do with the question.
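
One possible factor, not verified here: the query above is a free-form question, while the training prompts are short labels ending in " ->" and the completions end with a newline. A request that follows the training format would look roughly like the sketch below. The helper name build_finetuned_body and the max_tokens and temperature values are choices made for this example, and with only eight training examples the answers may still be poor.

```python
import json

SEPARATOR = " ->"  # suffix that every prepared training prompt ends with
STOP = "\n"        # ending that every prepared training completion ends with


def build_finetuned_body(question: str, model: str, max_tokens: int = 30) -> dict:
    """Build a /v1/completions body that mirrors the training format."""
    return {
        "model": model,
        "prompt": question + SEPARATOR,
        "max_tokens": max_tokens,
        "temperature": 0,  # illustrative choice: keep the sampling less random
        "stop": [STOP],    # cut the answer at the ending used in training
    }


body = build_finetuned_body(
    "nombre de la víctima", "ada:ft-iwanttobefreak-2023-02-27-19-30-05"
)
print(json.dumps(body, ensure_ascii=False))
```

The printed JSON can be saved to a file and sent with curl -d @file, as in the first section.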

Jorge's friend

It takes Game of Thrones texts and you can ask it questions about them. It is in English.
https://huggingface.co/deepset/roberta-base-squad2

Miscellaneous

NLP: natural language processing service

Various Amazon machine learning services:
https://aws.amazon.com/es/free/machine-learning/

New API

In March 2023 the new gpt-3.5-turbo model was released, with its own API.

Sources:
https://platform.openai.com/docs/guides/chat/introduction
https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb

We export the key in bash:

export API_KEY='sk-aslasdjkasldjasldjasldjkasldkj'

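From Python:

import os

import openai

# The key was exported as API_KEY above. The library only picks up
# OPENAI_API_KEY on its own, so we set the key explicitly.
# openai.ChatCompletion exists only in openai package versions before 1.0.
openai.api_key = os.environ["API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)

print(response)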

Response:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The 2020 World Series was played at a neutral site due to the COVID-19 pandemic. The games were played at Globe Life Field in Arlington, Texas.",
        "role": "assistant"
      }
    }
  ],
  "created": 1677797684,
  "id": "chatcmpl-6pm6uk0cGDiLdRTq8GB8vy90AWzbb",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 35,
    "prompt_tokens": 56,
    "total_tokens": 91
  }
}
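
The API keeps no state between calls, which is why the example sends the earlier turns again in messages. To continue the conversation, the reply has to be appended to the list before the next call. Below is a minimal sketch working on the response as a plain dict; the function name extend_conversation is made up for these notes.

```python
def extend_conversation(messages: list, response: dict, next_user_message: str) -> list:
    """Return a new message list with the assistant reply and the next user turn."""
    reply = response["choices"][0]["message"]
    return messages + [
        {"role": reply["role"], "content": reply["content"]},
        {"role": "user", "content": next_user_message},
    ]


history = [{"role": "user", "content": "Where was it played?"}]
fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "At Globe Life Field in Arlington, Texas."}}
    ]
}
history = extend_conversation(history, fake_response, "Who was the MVP?")
print(len(history))  # 3 messages: user, assistant, user
```

The returned list is what would be passed as messages in the next request.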

With curl:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-3.5-turbo",  "messages": [{"role": "user", "content": "Inventate una historia de un asesinato en Carabanchel"}] }'